CrowdStrike outage: Faulty update exposes gaps in quality control, cripples global systems

While CrowdStrike quickly published steps to fix affected systems, experts warned that full recovery would be time-consuming.


A routine update to CrowdStrike’s widely used cybersecurity software backfired spectacularly on Friday, triggering a global IT outage that crippled businesses, airlines, and government agencies. Security experts are now pointing to inadequate quality checks as a possible cause of the disruption.

The faulty update to CrowdStrike’s Falcon Sensor software, intended to bolster protection against emerging threats, contained a flaw that caused Windows-based computers to crash. The impact was felt globally, with banks, airlines, hospitals, and even government offices experiencing significant disruptions.

While CrowdStrike quickly published steps to fix affected systems, experts warned that full recovery would be time-consuming, because the faulty file had to be removed manually from each affected machine.

“What it looks like is, potentially, the vetting or the sandboxing they do when they look at code, maybe somehow this file was not included in that or slipped through,” said Steve Cobb, chief security officer at SecurityScorecard, whose own systems were affected.

The problem became apparent soon after the update was rolled out, with users flooding social media with images of the dreaded “blue screen of death” (BSOD) accompanied by error messages.

Security researcher Patrick Wardle traced the outage to a file in the update containing configuration information or signatures used to detect malicious code. He suggested that the sheer frequency of such updates might have led to insufficient testing. “It’s very common that security products update their signatures, like once a day… because they’re continually monitoring for new malware and because they want to make sure that their customers are protected from the latest threats,” he said. The frequency of updates “is probably the reason why (CrowdStrike) didn’t test it as much,” he added.

Experts criticised the lack of a phased rollout for the update. “Ideally, this would have been rolled out to a limited pool first,” John Hammond, principal security researcher at Huntress Labs, told Reuters. “That is a safer approach to avoid a big mess like this.”
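To make the idea of a phased rollout concrete, here is a minimal Python sketch of a ring-based deployment. It assumes a hypothetical `deploy` callback that pushes the update to one host and a hypothetical `crash_rate` callback that reports the share of a ring's hosts that failed; the ring sizes and the 1% threshold are illustrative values, not a description of CrowdStrike's actual process.

```python
from typing import Callable, List, Sequence


def phased_rollout(
    hosts: Sequence[str],
    ring_fractions: Sequence[float],
    deploy: Callable[[str], None],              # hypothetical: pushes the update to one host
    crash_rate: Callable[[Sequence[str]], float],  # hypothetical: share of a ring that crashed
    max_crash_rate: float = 0.01,               # illustrative halt threshold (1%)
) -> List[str]:
    """Deploy to progressively larger rings, halting if a ring looks unhealthy.

    Each fraction is the cumulative share of the fleet that should have the
    update once that ring is done. Returns the hosts that received the update.
    """
    updated: List[str] = []
    for fraction in ring_fractions:
        target = int(len(hosts) * fraction)
        ring = list(hosts[len(updated):target])  # only the hosts new to this ring
        for host in ring:
            deploy(host)
        updated.extend(ring)
        # Check this ring's health before moving on to the next, larger ring.
        if ring and crash_rate(ring) > max_crash_rate:
            break
    return updated
```

With an update that crashes every machine, rings of 1%, 10%, 50% and 100% of a 1,000-host fleet would stop after the first 10 hosts instead of reaching all 1,000. The trade-off is speed: each ring adds a waiting period before the next one, which sits uneasily with security content that vendors push out as often as once a day.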

This incident underscores the potential for catastrophic consequences when security updates, intended to protect systems, contain undetected flaws. It also highlights the need for robust quality control measures and cautious deployment strategies to prevent similar widespread outages in the future.

Srirang Srikantha, Founder & CEO of Yethi Consulting, said, “The outages represent how fragile and interconnected our systems are. Companies like MSFT have great practices, and the fact that a bug passes through its process is unfortunate. It reiterates the need for good practices of testing before releasing new software to production systems.”

Sundareshwar K, Partner & Leader – Cybersecurity at PwC India, commented, “This is a black swan event impacting not just businesses but the overall national machinery, and underscores how safeguarding entities against risk involves much more than technology… This development highlights how it is a misnomer that enhanced technology deployment alone will help organisations become more secure and ensure business continuity. While organisations work towards remediation of the current situation, the focus should be on rethinking risks and moving beyond the layers, patches, products and tools to building an inherently strong cyber architecture with complementary interventions that ensure resilience in the face of such unforeseen technology setbacks or failures.”

Athenian Tech said in a statement, “The recent CrowdStrike Falcon sensor incident highlights significant vulnerabilities and operational risks with automatic security updates, leading to widespread system failures, especially in enterprise environments. This underscores the need for rigorous testing and controlled deployment strategies of software updates. While CrowdStrike is addressing the issue, this incident emphasises the importance of balancing robust security with system stability and adopting best practices for software updates to prevent similar incidents in the future.”

Piyush Goel, Founder & CEO of Beyond Key, said, “The complex interactions between CrowdStrike’s update and Microsoft’s infrastructure were likely unforeseen. CrowdStrike quickly identified the bug and rolled back the update, while CERT-In provided guidelines for users to delete the problematic file. This incident underscores the need for diverse and well-tested cybersecurity solutions to prevent similar large-scale outages in the future.”

The global reach of the outage reflects how widely CrowdStrike’s software is deployed: it is used by more than half of the Fortune 500 and by numerous government agencies, including the US Cybersecurity and Infrastructure Security Agency (CISA).

CrowdStrike’s Response

Here’s a breakdown of what happened, according to CrowdStrike’s official statements:

The Timeline:

July 19, 2024, 04:09 UTC: CrowdStrike released a sensor configuration update to Windows systems as part of its ongoing security operations.

July 19, 2024, 05:27 UTC: The faulty update was remediated.

The Impact:

Windows systems running Falcon Sensor version 7.11 and above that were online and downloaded the update between 04:09 UTC and 05:27 UTC were susceptible to a system crash.

Systems running Linux or macOS were not affected.
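CrowdStrike’s stated exposure criteria can be written as a simple check. The Python sketch below is illustrative only: the `Host` record and its field names are hypothetical, treating both ends of the 04:09 to 05:27 UTC window as inclusive is an assumption, and it is not a CrowdStrike tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

# Update window as stated by CrowdStrike (UTC); inclusive bounds are an assumption.
UPDATE_RELEASED = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
UPDATE_REMEDIATED = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


@dataclass
class Host:
    """Hypothetical inventory record for one machine."""
    os: str                                # e.g. "windows", "linux", "macos"
    sensor_version: Tuple[int, int]        # (major, minor), e.g. (7, 11)
    update_received_at: Optional[datetime]  # when the host pulled the update, if at all


def was_susceptible(host: Host) -> bool:
    """Return True if the host matches the published crash-exposure criteria."""
    if host.os != "windows":
        return False  # Linux and macOS systems were not affected
    if host.sensor_version < (7, 11):
        return False  # only sensor version 7.11 and above was in scope
    received = host.update_received_at
    return received is not None and UPDATE_RELEASED <= received <= UPDATE_REMEDIATED
```

A Windows host on sensor 7.11 that pulled the update at 04:30 UTC matches all three conditions, while the same host on Linux, or one that came online only after 05:27 UTC, does not.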

The Technical Details:

The issue stemmed from a flawed update to “Channel File 291,” a configuration file that dictates how Falcon Sensor evaluates named pipe execution on Windows systems.

The update was designed to target malicious named pipes used in cyberattacks, but it triggered a logic error that crashed the operating system.

CrowdStrike has since corrected the logic error and updated Channel File 291.
