Digital Engineering 24/7

Lessons from the CrowdStrike Meltdown

Industry leaders reflect on system vulnerability and the need for redundancy

A faulty update to CrowdStrike Falcon, an enterprise security software product, was the source of a worldwide IT meltdown that sent many Windows PCs crashing.

By Kenneth Wong  

July 29, 2024

Wikipedia now has a dedicated page explaining the CrowdStrike IT outage that sent many Windows PCs crashing on July 19, 2024. According to CrowdStrike, "As part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques. These updates are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash." 

The irony is that many enterprises rely on CrowdStrike to safeguard their data and systems. Instead, it became the source of a worldwide IT meltdown that cost Fortune 500 companies roughly $5.4 billion, The Guardian estimated. The event is a wake-up call for system resilience, many agreed, but what form should that resilience take?

Design and Test Your Systems like Airplanes

Todd Tuthill, Vice President of Aerospace and Defense, Siemens Digital Industries Software, says the meltdown echoes the disruption the world experienced four years ago during the COVID shutdown. Somehow, certain industries still seemed unprepared. "Disaster planning needs to be a priority and should be a part of any high-level planning for worst-case scenarios," he says. "By and large, the airlines affected by the CrowdStrike IT incident recovered quickly, which inclines me to think that the answer may be to design systems to operate with limited data access for brief (or not so brief) windows while critical infrastructure is repaired."

Most airlines recovered in a couple of days, but Delta Air Lines' recovery stretched into weeks, prompting the Department of Transportation to launch an investigation.

Capt. Sully Sullenberger, known for successfully landing US Airways Flight 1549 on the Hudson River after a bird strike on January 15, 2009, writes in a LinkedIn blog post, "Our systems should be designed more like airplanes, by avoiding single points of failure, having multiple ways of keeping critical functions operational, and with humans in the loop and in control of it all."

Tuthill says, "The strenuous testing an aircraft needs to pass before being cleared for flight is a great example for manufacturing firms before signing off on any system that impacts daily lives to keep critical functions operational."

Third-Party Enterprise Security Software in the Crosshairs

During the COVID shutdown in 2020, many Asian suppliers went offline first, forcing manufacturers to scramble for alternative sources. Many of them turned to on-demand manufacturing service providers like Xometry and Fictiv to make up the capacity and meet their regular demands.

Looking at the CrowdStrike crisis, Matt Leibel, Chief Technology Officer of on-demand manufacturing service provider Xometry, says, "In a world that is growing ever more connected in the cloud and in the integration and use of 3rd party platforms, the events underscore the need for all organizations, especially manufacturers, to have a comprehensive continuity plan in place. Manufacturers need to test their continuity plan regularly and update it as necessary. They also need to have redundancies in place to communicate in real-time with key stakeholders, from employees to partners and especially customers."

Jim Ruga, CTO of Fictiv, says, "Always maintain a staging/testing layer between third-party software and production systems. Never assume that updates from a vendor are flawless. In the case of CrowdStrike, if the update had been tested in a staging environment, the IT departments would have detected the issue before it reached production systems. This incident has raised questions about the reliability of third-party updates and the responsibility of IT departments in managing such updates."
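
To make the idea of a staging layer concrete, here is a minimal Python sketch of a staged rollout: an update is applied to a small set of staging machines first and is promoted to production only if every staging machine stays healthy. This is an illustration only, not how CrowdStrike or any vendor actually distributes updates. The names Host, apply_update, staged_rollout and the is_faulty check are all hypothetical stand-ins for real deployment and health-monitoring tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """A hypothetical machine that receives vendor updates."""
    name: str
    healthy: bool = True
    installed: List[str] = field(default_factory=list)


def apply_update(host: Host, update_id: str,
                 is_faulty: Callable[[str], bool]) -> None:
    """Install an update; a faulty update leaves the host unhealthy.

    is_faulty is an assumed stand-in for whatever real health check
    (boot test, smoke test, monitoring) an IT department would run.
    """
    host.installed.append(update_id)
    if is_faulty(update_id):
        host.healthy = False


def staged_rollout(update_id: str, staging: List[Host],
                   production: List[Host],
                   is_faulty: Callable[[str], bool]) -> bool:
    """Apply to staging first; promote only if all staging hosts survive."""
    for host in staging:
        apply_update(host, update_id, is_faulty)

    if not all(host.healthy for host in staging):
        # Stop here: production never sees the bad update.
        return False

    for host in production:
        apply_update(host, update_id, is_faulty)
    return True
```

In this sketch a bad update takes down only the staging machines, which is the outcome Ruga argues for. The trade-off is delay: every update reaches production later, which matters for security content meant to counter fast-moving threats.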

The Cloud Paradox

In many instances, a faulty update can be reverted or its impact minimized with a subsequent corrective update, often released and installed via the cloud. In other words, the cloud can be both a source of threat and a means of recovery. In the case of the CrowdStrike bug, the update disabled the physical machine itself, leaving many Windows users unable to reboot or reach the cloud to receive the fix.

Ruga says, "The solution for most Windows systems was straightforward--boot into safe mode, remove one file, restart, and done. However, the fact that IT departments had to manually log into systems one at a time significantly delayed the recovery process, highlighting the urgent need for a quicker response to such system failures."

Tuthill says, "The CrowdStrike issue showed a fundamental weakness in many companies' strategy of moving everything to the cloud, and exposed a liability in a part of modern society--our ability to move data securely and quickly."

John McEleney, Cofounder of the cloud-based CAD firm Onshape (now part of PTC), says the CrowdStrike issue didn't affect Onshape users. He says, "Cloud-native has the advantage over older installed software. According to our head of development, our use of cloud (virtual infrastructure) allows us to access and replace our servers at any point in time from any location. Companies spent many days just getting access to the machines they needed to restart."

 

About Kenneth Wong

Kenneth Wong is Digital Engineering's resident blogger and senior editor. Share your thoughts or suggestions at digitaleng.news/facebook.
