When Systems Fail: Building Resilient Laboratory Informatics Through Comprehensive COOP Planning
When the network goes down, how will your laboratory's informatics systems keep running? This question took on urgent importance for Tennessee Department of Health Division of Laboratory Services’ newborn screening program on Christmas Day 2020, when a car bomb devastated downtown Nashville's AT&T service building, bringing down communications and network access across the region.
The explosion rendered the state laboratory's internet infrastructure nonfunctional, creating an unprecedented crisis. Without connectivity, the laboratory information management systems (LIMS) became inoperable, preventing laboratory staff from processing or analyzing any newborn screening samples. For a program that detects potentially life-threatening conditions requiring immediate intervention, the network outage created an urgent public health emergency.
The Single Point of Failure
Tennessee's experience revealed a critical vulnerability: AT&T served as both the primary and backup internet provider for the state laboratory, creating a single point of failure. When the service went down, there was no redundancy to fall back on. Initial attempts to restore connectivity through portable Wi-Fi hotspots, point-to-point Wi-Fi connections and emergency internet service provider (ISP) contracts all proved too slow or too uncertain to meet the immediate need.
As the outage continued, laboratory leadership recognized that temporary connectivity solutions would not be sufficient. This prompted activation of their continuity of operations plan (COOP), involving an interstate partnership with the Florida Department of Health Bureau of Public Health Laboratories.
Interstate Collaboration in Action
Hugh Peeples, clinical application coordinator from the Tennessee Department of Health Division of Laboratory Services, advised on the laboratory's response to the crisis. The COOP implementation involved several critical steps: samples were assigned accession numbers and demographic entry sections were scanned for follow-up; prepared samples were securely packaged and shipped to Florida for testing under emergency arrangements; and Florida conducted the tests within its capabilities before returning samples to Tennessee for additional screening not available in Florida.
However, the collaborative arrangement revealed significant interoperability challenges. Florida's LIMS could not properly process Tennessee's sample control numbers because they were not long enough to meet Florida's data format requirements. This incompatibility required manual workarounds and highlighted the need for standardized identifiers across state systems.
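The length mismatch described above is the kind of problem a pre-shipment check can surface early. The following is a minimal sketch only: the article does not give either state's actual identifier format, so the lengths, the zero-padding rule and every function name here are hypothetical, and padding is shown as one possible workaround, not the one Tennessee and Florida used.

```python
def fits_partner_format(control_number: str, required_length: int) -> bool:
    """Return True if the identifier already has the length the partner LIMS expects."""
    return len(control_number) == required_length


def pad_for_partner(control_number: str, required_length: int, fill: str = "0") -> str:
    """Left-pad a too-short identifier (hypothetical rule); refuse one that is too long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    if len(control_number) > required_length:
        raise ValueError(f"{control_number!r} is longer than {required_length} characters")
    return control_number.rjust(required_length, fill)


def build_crosswalk(control_numbers: list[str], required_length: int) -> dict[str, str]:
    """Map each padded identifier back to its original so returned results can be matched."""
    crosswalk: dict[str, str] = {}
    for original in control_numbers:
        padded = pad_for_partner(original, required_length)
        if padded in crosswalk:
            # Two different originals would become indistinguishable after padding.
            raise ValueError(f"padding collision on {padded!r}")
        crosswalk[padded] = original
    return crosswalk
```

Keeping the crosswalk alongside the shipment means results that come back under the partner's identifier can still be traced to the originating laboratory's own number.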
Results Management Without LIMS
The collaborative testing arrangement with Florida produced results that needed efficient management outside the normal LIMS. Tennessee's informatics team developed a manual procedure involving four key steps:
- Hard copies of test results arrived from Florida with Tennessee's sample identifiers and Florida's testing data.
- Results were physically sorted by Tennessee accession number to create a retrievable filing system.
- Documents were scanned and filenames standardized to match accession numbers, creating a searchable digital archive.
- A special mnemonic was created to alert providers that results came from Florida and required special attention.
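The sorting and filename steps above could be scripted along the following lines. This is a sketch under stated assumptions: the accession number pattern (letters, digits and hyphens), the `.pdf` suffix and the function names are all hypothetical, and the code only plans the renames without touching any files.

```python
import re

# Hypothetical rule: accession numbers contain only letters, digits and hyphens.
_VALID_ACCESSION = re.compile(r"[A-Za-z0-9-]+")


def standardized_name(accession_number: str, suffix: str = ".pdf") -> str:
    """Build the archive filename for one scanned result from its accession number."""
    cleaned = accession_number.strip()
    if not _VALID_ACCESSION.fullmatch(cleaned):
        raise ValueError(f"unexpected accession number: {accession_number!r}")
    return cleaned + suffix


def plan_renames(scans: dict[str, str]) -> list[tuple[str, str]]:
    """Given scanner filename -> accession number, return (old, new) pairs
    ordered by accession number (plain string order), rejecting duplicates."""
    seen: set[str] = set()
    plan: list[tuple[str, str]] = []
    for old_name, accession in sorted(scans.items(), key=lambda item: item[1].strip()):
        new_name = standardized_name(accession)
        if new_name in seen:
            raise ValueError(f"duplicate accession number: {accession!r}")
        seen.add(new_name)
        plan.append((old_name, new_name))
    return plan
```

Because the ordering is plain string order, it matches numeric order only when accession numbers share a fixed width, which is another argument for a consistent identifier format.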
Lessons Learned and System Improvements
Following the 2020 network disaster, Tennessee conducted a comprehensive review of their emergency procedures, leading to several technological and procedural improvements. The laboratory worked with the state information technology department to implement truly redundant internet solutions using different service providers and separate physical routes. They collaborated with their sample puncher vendor to develop capabilities for processing blood spot cards even when internet connectivity is unavailable. Tennessee also revised their sample numbering system to ensure interoperability with partner laboratories during emergency operations.
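The requirement of different providers and separate physical routes can be expressed as a simple inventory check. The sketch below is illustrative only: the `Uplink` record, the route labels and the function name are hypothetical, and a real audit would depend on route information obtained from the carriers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    name: str      # e.g. "primary" or "backup"
    provider: str  # the ISP supplying the link
    route: str     # hypothetical label for the physical path or facility


def shared_dependencies(uplinks: list[Uplink]) -> dict[str, list[str]]:
    """Return every provider and route that more than one uplink relies on."""
    providers = Counter(link.provider for link in uplinks)
    routes = Counter(link.route for link in uplinks)
    return {
        "providers": sorted(p for p, n in providers.items() if n > 1),
        "routes": sorted(r for r, n in routes.items() if n > 1),
    }
```

An inventory in which the primary and backup links name the same provider would be flagged, while one with distinct providers and routes returns two empty lists.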
Building a Comprehensive Framework
The experience of New York's Wadsworth Center with a 2018 server interruption caused by a car accident demonstrated the importance of preparation. The incident left the laboratory unable to perform testing because it could not accession samples, which led it to create a standard operating procedure for COOP labels. The laboratory now prints a set of COOP labels at the start of each year and reviews and updates the procedure regularly.
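Pre-printing labels amounts to reserving a block of identifiers in advance so that accessioning can continue without the server. A minimal sketch of generating such a block follows; the `COOP-year-sequence` format, the default width and the function name are hypothetical and are not the Wadsworth Center's actual scheme.

```python
def coop_labels(year: int, count: int, prefix: str = "COOP") -> list[str]:
    """Generate a fixed block of pre-assigned label identifiers for one year."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Zero-pad to at least four digits so the labels sort correctly as text.
    width = max(4, len(str(count)))
    return [f"{prefix}-{year}-{n:0{width}d}" for n in range(1, count + 1)]
```

For example, `coop_labels(2025, 3)` yields `COOP-2025-0001` through `COOP-2025-0003`, which could be sent to a label printer at the start of the year and reconciled with the LIMS once service is restored.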
Dr. Christina Egan, deputy director of the Division of Infectious Diseases and chief of biodefense and mycology laboratories at the Wadsworth Center, outlined best practices for information technology preparedness at the forum. These include designating an individual within the laboratory whose sole responsibility is information technology, documenting all IT systems and identifying critical assets, developing COOP plans for IT systems, reviewing the COOP every few years to account for changes such as paperless test requests, keeping COOP and disaster planning on IT meeting agendas, and regularly exercising the plan.
A Holistic Approach to Resilience
The experiences of Tennessee and New York demonstrate that effective disaster recovery for public health laboratories requires planning beyond traditional IT concerns. Critical considerations include equipment inventory for minimum required emergency operations, alternate location planning with necessary utilities and security, multi-provider network redundancy through different ISPs with separate physical infrastructure, interstate collaboration agreements with multiple neighboring states using standardized protocols, and offline capabilities that can function without network connectivity or access to primary facilities.
Regular testing of emergency protocols through simulated disaster scenarios is essential to ensure staff familiarity with procedures and identify potential weaknesses before they impact actual operations during a crisis. As public health laboratories become increasingly dependent on digital infrastructure, comprehensive COOP planning has evolved from a recommended practice to an essential requirement for maintaining critical public health services when systems go silent.