The questions health system CIOs have after the global IT outage

Giles Bruce - Wednesday, July 24th, 2024

Jeffrey Ferranti, MD, chief digital officer of Duke University Health System, was awakened in the early morning hours of July 19 with a text that the computers at his hospitals were down. His first thought: ransomware attack.

He wasn't the only health system IT executive to get a similar call or text around that time. Or have the same initial thought.

Tens of thousands of workstations at the Durham, N.C.-based health system had blank blue screens. Not knowing the cause, Duke activated its hospital incident command system (the first time it had done so for a cyber incident). By later that day, over 100 IT staffers had been trained to fix the computers.

"Health systems tend to perform well in crisis situations," Dr. Ferranti told Becker's. "And it brought out the best in our people, and I was pretty impressed by how many of them responded to the event."

Duke was fortunate. Its patient care wasn't affected. Some other health systems had to reschedule appointments and surgeries, divert ambulances and close outpatient clinics. Some healthcare providers were unable to access Epic or other EHRs.

The IT outage hit not only healthcare but industries across the globe, canceling flights and leaving customers unable to access their bank accounts online. The incident was caused by a faulty update from CrowdStrike, one of the biggest cybersecurity vendors in healthcare, sent to computers running on Microsoft Windows.

"It was an all-hands-on-deck effort between information services and our operational partners to remediate the disruption," Michael Restuccia, CIO of Philadelphia-based Penn Medicine, said July 22. "Quite the last 72 hours."

CrowdStrike sent out a fix the morning of July 19, and most hospitals and health systems were able to restore patient care by July 22. But the outage illustrates how healthcare must continually improve its cybersecurity posture at a time when the industry is increasingly interconnected with outside technology companies, health system leaders told Becker's. The event comes just five months after a ransomware attack on UnitedHealth Group subsidiary Change Healthcare massively disrupted claims and payment processing for providers.

"The incident underscored the significant reliance on third-party vendors for critical infrastructure," said Zafar Chaudry, MD, chief digital and information officer of Seattle Children's. "A single point of failure can have catastrophic consequences."

Dr. Chaudry said future interruptions can be prevented by reducing reliance on a single vendor, better evaluating third-party companies' security practices, regularly testing comprehensive incident response plans, and implementing redundant systems and data backups.

After discovering the outage the night of July 18, Seattle Children's activated its incident response protocol. The health system was able to access its Epic EHR, which is hosted by cloud provider Rackspace, via Epic's Rover and Haiku apps on iPads. But the organization's active directory service and virtual desktop infrastructure network were down for 12 hours. All outpatient appointments and nonemergency surgeries were canceled July 19. Seattle Children's restored its systems by July 22.

After the outage was discovered overnight from Thursday into Friday, health systems around the country similarly worked through the weekend to get operations back up and running by Monday morning.

"Through a massive and nonstop all-hands-on-deck deployment of resources, hospitals and health systems rose to the occasion and have made tremendous progress in restoring mission-critical systems and patient care services," said John Riggi, national advisor for cybersecurity and risk at the American Hospital Association. "Some effects, although diminished, still continue — and the true impact to hospitals and health systems may not be known for weeks."

In Arizona, Banner Health Chief Clinical Officer Marjorie Bessel, MD, received a call late the night of July 18 local time about the outage. Dr. Bessel is the Phoenix-based health system's incident commander for emergency response, a role previously utilized during the COVID-19 pandemic. The organization does tabletop drills for these types of cyber incidents. Banner held systemwide calls to provide updates every hour for the first eight hours and every two hours after that.

"This is not just your electronic medical record being down," Dr. Bessel said. "The tube stations that we use to send drugs up from the pharmacy and send specimens to the lab — down. Security alarm systems — down. All pharmacy systems — down. It's not the same as Epic or Cerner going down. It's way larger than that."

Banner Health decided not to open its ambulatory locations July 19. The health system also delayed surgeries July 19 but was able to complete them later in the day, about three or four hours past schedule, after getting systems up and running about 16 hours after the outage was discovered.

"You've got to be willing to make decisions that might turn out not to have been the best decision, retrospectively, but you make the best decision that you can, and you get everybody behind you," Dr. Bessel said. "I just want to say kudos to everybody in the healthcare industry, not just our team members at Banner Health, who, on the tails of a very difficult pandemic a couple years ago, once again rose to the occasion, worked all night, and did everything they could to keep patients safe as well as provide the ability for them to get ongoing care."

At Duke, IT staffers had to repair the computers one at a time. It took about five to eight minutes to manually decrypt each machine, delete the CrowdStrike file and reboot. Operations leaders went around and put yellow sticky notes on the computers to mark which ones should be fixed first. The goal was to get 50% of the workstations up and running as soon as possible so that patient care would not be affected. Clinicians went back to paper records in the early going.

Of the health system's more than 60,000 computers that use CrowdStrike, 40,000 received the bad update. Of those, 18,000 had their screens go blue. The other 22,000 hadn't yet been restarted (the health system notified staff not to reboot their computers if they hadn't already). Fewer than 1,000 nonclinical machines remained unfixed as of July 24 (laptops belonging to people who have been on vacation, for instance).
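For a rough sense of scale, the figures reported above can be combined into a back-of-the-envelope estimate of the manual repair effort. This is only a sketch: it assumes all 18,000 blue-screened machines needed the hands-on fix, that each took the five to eight minutes cited, and that the work was split evenly across 100 staffers (the article says "over 100"). None of these assumptions were confirmed by Duke, and the variable names are illustrative.

```python
# Figures reported in the article
received_update = 40_000
blue_screened = 18_000
not_restarted = 22_000
minutes_per_fix = (5, 8)   # reported range per machine
staff = 100                # "over 100" staffers, so treated as a lower bound

# The two reported subgroups account for every machine that got the update.
assert blue_screened + not_restarted == received_update

# Total hands-on time if every blue-screened machine was fixed manually (assumption).
low_hours = blue_screened * minutes_per_fix[0] / 60
high_hours = blue_screened * minutes_per_fix[1] / 60

# Even split across staffers (assumption).
low_each = low_hours / staff
high_each = high_hours / staff

print(f"Total effort: {low_hours:,.0f} to {high_hours:,.0f} staff-hours")
print(f"Per staffer:  {low_each:.0f} to {high_each:.0f} hours")
```

Under those assumptions the repair work comes to roughly 1,500 to 2,400 staff-hours, or about 15 to 24 hours per person, which is consistent with the weekend-long effort the health systems described.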

"The question every CIO in the country has is, 'Why was this pushed out so broadly?'" Dr. Ferranti said. "Why wasn't it pushed out to a smaller group to make sure everything's OK and do a rolling update? That's just blocking and tackling."

Becker's reached out to CrowdStrike for comment. The company's CEO has been asked to testify before Congress.

"Those are the kinds of things that are going to have to be commonplace in the industry," Dr. Ferranti added. "The stakes are just too high to be pushing things out at that scale. We saw over the last four or five days what can happen from that."
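The rolling update Dr. Ferranti describes can be illustrated with a short sketch. This is not CrowdStrike's deployment mechanism, which the article does not describe. The function `staged_rollout`, its `deploy` and `healthy` callbacks, and the 1%, 10%, 100% wave sizes are all hypothetical, chosen only to show the idea of updating a small group first and stopping if that group fails.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages_percent: Sequence[int] = (1, 10, 100),  # hypothetical wave sizes
) -> tuple[list[str], bool]:
    """Deploy in growing waves and stop at the first wave with an unhealthy host.

    Returns the hosts that received the update and whether the rollout completed.
    """
    done = 0
    for percent in stages_percent:
        # Cumulative number of hosts that should be updated after this wave.
        target = min(len(hosts), max(done + 1, len(hosts) * percent // 100))
        wave = hosts[done:target]
        for host in wave:
            deploy(host)
        if not all(healthy(host) for host in wave):
            return list(hosts[:target]), False  # halt before the next wave
        done = target
    return list(hosts[:done]), True


# Example: a bad update that leaves every updated machine unhealthy.
fleet = [f"ws-{i:04d}" for i in range(1000)]  # hypothetical workstation names
updated, completed = staged_rollout(fleet, deploy=lambda h: None, healthy=lambda h: False)
print(len(updated), completed)
```

In this example the faulty update reaches only the first wave of 10 machines out of 1,000 before the rollout halts, rather than the whole fleet at once.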
