Even small IT failures can cost millions, technology leaders say

Dive Brief:

  • The average annual downtime from high-impact IT outages is 77 hours, with an hourly cost of up to $1.9 million, according to a report published Tuesday by New Relic. The observability company commissioned ETR to survey 1,700 tech professionals in April and May.
  • IT teams spend an average of 30 percent of their time resolving outages, respondents said—the equivalent of 12 hours per 40-hour work week. The main causes of unplanned outages reported over the past two years included network failure, issues with third-party services and human error.
  • Major disruptions, such as the global outage triggered in July by a flawed CrowdStrike update to Windows systems, can halt operations, according to Nic Benders, chief technical strategist at New Relic. But minor problems can also snowball. “It doesn’t have to be CrowdStrike for this to be a three-alarm fire,” he told CIO Dive. “You can eliminate the business function of IT with a relatively small technical problem.”
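The arithmetic behind the figures above can be sketched in a few lines of Python. This is only an illustration: the variable and function names are hypothetical, and because the report gives the hourly cost as "up to" $1.9 million, the annual total computed here is an upper bound derived from the two quoted numbers, not a figure stated in the report.

```python
# Figures quoted in the brief above.
AVG_ANNUAL_DOWNTIME_HOURS = 77      # high-impact outages, hours per year
MAX_HOURLY_COST_USD = 1_900_000     # "up to" $1.9 million per hour
OUTAGE_TIME_SHARE_PERCENT = 30      # share of IT team time spent on outages
WORK_WEEK_HOURS = 40


def annual_cost_upper_bound(downtime_hours: int, max_hourly_cost: int) -> int:
    """Worst-case yearly cost: every downtime hour billed at the maximum rate."""
    return downtime_hours * max_hourly_cost


def weekly_outage_hours(share_percent: int, week_hours: int) -> float:
    """Hours per work week that a given percentage of time represents."""
    return share_percent * week_hours / 100


cost = annual_cost_upper_bound(AVG_ANNUAL_DOWNTIME_HOURS, MAX_HOURLY_COST_USD)
hours = weekly_outage_hours(OUTAGE_TIME_SHARE_PERCENT, WORK_WEEK_HOURS)

print(f"Upper bound on annual high-impact outage cost: ${cost:,}")
print(f"Hours per {WORK_WEEK_HOURS}-hour week spent on outages: {hours:g}")
```

Under that worst-case assumption the yearly exposure comes to roughly $146 million, and the 30 percent time share matches the 12 hours per week that respondents reported.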

Dive Insight:

All it took was an automated software update sent shortly after midnight EST on July 19 to knock out millions of Windows-based computers around the globe. The CrowdStrike update was live for a little over an hour, but the impacts were felt for days as several major airlines scrambled to restart workstations and restore operations, grounding thousands of flights.

“The CrowdStrike incident is in a class of its own because it disproportionately affected some of the world’s largest companies — it was a poison pill that those companies had to fix themselves,” Benders said.

As executives tallied the losses, which rose to $5.4 billion among Fortune 500 companies and cost Delta Air Lines $500 million in just five days, IT resiliency and recovery planning took center stage.

“When something like a cloud provider outage occurs, it’s rare that the problem is initially clear,” Benders said. “Your alarms are going off, your support tickets are going off, and you’re in chaos, but the first step is just trying to characterize the nature of the problem.”

While major supplier outages and cyber events tend to steal the headlines, death-by-a-thousand-cuts scenarios involving smaller outages are much more common. The average number of annual outages among respondents was 232, with more than half of companies experiencing low-impact outages on a weekly basis.

Costs can be difficult to assess, especially for low-impact issues. But the minutes or hours it takes engineering teams to identify and defuse even minor IT outages add up. Over the course of a year, teams spend approximately 134 hours, the equivalent of nearly six full days, fixing IT outages at all levels of business impact.

“It all comes down to dollars,” Benders said. “I would take 1,000 incidents a week if it had zero costs. It’s not an incident at all.”