Ed Toner, CIO

Blog: High Availability

I looked at the 2023 NASCIO Top 10 State Priorities list posted recently and realized that Business Continuity and/or High Availability (HA) was not mentioned. I went back through several years of lists and had to travel all the way back to 2016 to find it, listed at number 10. The 2015 list had it at number 9. High Availability and/or resiliency has been missing from every list since 2016. Is it because we have no competition for our services, so our customers don’t expect as much from us?


High Availability demonstrates to your customers that you want to provide good service and protect your reputation. 2023 opened with outages impacting airlines, the FAA, the New York Stock Exchange, Microsoft’s worldwide Teams and Outlook programs, and Instagram. Just last week, Twitter had a tech failure during the announcement of Ron DeSantis’ run for president. Each incident harmed the reputation of the organization involved and quite possibly led to customer churn. Where is the focus on mitigating risk?


During my last year at First Data, I led a team of IT architects and performance engineers who successfully achieved availability and performance improvement targets, transforming 14 front-end authorization platforms to a Recovery Time Objective (RTO) of zero within that initial year of the program. We architected the infrastructure utilizing two geographically separated Active/Active data centers and defined the standards required to keep the technology running in spite of unavoidable IT disruptions.


High Availability basically involves failover and the following redundancies:

  • Hardware redundancy.
  • Software and app redundancy.
  • Data redundancy.


Every business, private or public, encounters technology issues. Industry leaders publish availability SLAs of 99.9%, which suggests a reliable system. Downtime is inherent in the thousands of hardware and software systems we manage, but what we do to minimize the impact distinguishes us from others. Factors that minimize downtime include a strong change management program that enforces maintenance periods, limiting privileged access to systems, high availability architecture via failover capabilities, and load balancing across geographically separated data centers with independent system instances to ensure availability even if one instance fails. Lastly, redundant network paths and synchronized databases achieve a Recovery Point Objective (RPO) of zero.
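To put the 99.9% figure in perspective, here is a minimal back-of-the-envelope sketch in Python. It is illustrative arithmetic only: the function names are my own, and the redundancy calculation assumes that instances fail independently and that failover is instantaneous, which real systems only approximate.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability fraction (e.g. 0.999)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def redundant_availability(availability: float, instances: int) -> float:
    """Availability of N instances when any single one can carry the load.

    Assumption: failures are independent and failover takes no time.
    """
    return 1.0 - (1.0 - availability) ** instances


single = 0.999
paired = redundant_availability(single, 2)

print(f"single instance: {annual_downtime_hours(single):.2f} hours/year")
print(f"two instances:   {annual_downtime_hours(paired) * 60:.2f} minutes/year")
```

Under those assumptions, a single 99.9% instance allows about 8.76 hours of downtime a year, while two independent instances bring the theoretical figure down to roughly half a minute. Shared dependencies and failover time will push real numbers higher, which is why the other factors above still matter.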


Many of these practices have guided our mission here at the OCIO. IT teams should continuously work to reduce downtime and improve availability, introducing redundancy strategically by starting with mission-critical workloads. Resiliency improves the probability that our systems are fully operational, or improves our chances of recovering business functions in time to meet customer needs. It is the reason for our requirement to have instances of every mission-critical system running in our two data centers with real-time replication of data across these systems, eliminating single points of failure.
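As a rough illustration of that requirement, the sketch below checks a system inventory for anything that is not running in both data centers. The site names, system names, and inventory format are all hypothetical placeholders, not a description of the OCIO's actual tooling.

```python
# Hypothetical labels for the two data centers.
REQUIRED_SITES = {"dc_a", "dc_b"}


def single_points_of_failure(deployments: dict[str, set[str]]) -> list[str]:
    """Return the systems that are not running in every required data center."""
    return sorted(
        name
        for name, sites in deployments.items()
        if not REQUIRED_SITES <= sites  # subset test: both sites must be present
    )


# Hypothetical inventory: system name -> data centers hosting a live instance.
inventory = {
    "payments": {"dc_a", "dc_b"},
    "licensing": {"dc_a"},
}

print(single_points_of_failure(inventory))  # ['licensing']
```

A check like this only confirms that an instance exists in each site. It says nothing about whether data replication is current or whether failover has been tested.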


This high availability approach is vital for mission-critical systems that cause major service disruptions when they run into inevitable failures. Availability is a main component by which our customers judge the OCIO’s ability to deliver quality services. We saw the benefit of our practice with a recent hardware outage. Agencies that had correctly identified their mission-critical applications were not affected and did not see any interruption. Those that did see an interruption learned the value of having an instance in place in each of our data centers.


Minimizing downtime of critical systems should be fundamental. For the State of Nebraska, our number one priority will always be high availability.

As always, I appreciate your efforts to provide quality services to the State and the Citizens of Nebraska!

Ed Toner