Cloudflare malfunction in 19 data centers worldwide due to its own fault

by Helen J. Wolf

Content delivery network Cloudflare says yesterday's outage at 19 of its data centers resulted from a change made as part of a long-term project to increase resilience in its busiest locations.

Many sites went down as a result, including Discord and Shopify. The company says that although the affected locations, which include Mumbai, Osaka, Singapore, Sydney, and Tokyo, make up just 4% of its total network, the outage affected 50% of total requests.

Cloudflare says the outage started on June 21 at 06:27 UTC. At 06:58 UTC, the company brought the first data center back online, and by 07:42 UTC, all data centers were online and working properly.

In a statement on its blog, Cloudflare said, “We are very sorry for this outage. This was our mistake and not the result of an attack or malicious activity.”

The company has spent 18 months converting its busiest locations to a more flexible and resilient architecture. During this time, Cloudflare has converted 19 of its data centers to this architecture, internally called Multi-Colo PoP (MCP).

Cloudflare says this new architecture offers significant reliability improvements, allowing it to perform maintenance in those locations without disrupting customer traffic. But those locations also carry a substantial portion of Cloudflare traffic, so any problem there can have a broad impact.

Networks like Cloudflare use a protocol called BGP to be reachable on the Internet. As part of this protocol, operators define policies determining which prefixes (a collection of adjacent IP addresses) are advertised to peers (the other networks they connect to) or accepted from peers.

Cloudflare says these policies have individual components, known as terms, which are evaluated sequentially. The result determines whether a given prefix is advertised. A policy change can mean that a previously advertised prefix is no longer advertised, which is known as being “withdrawn,” and those IP addresses are no longer reachable on the Internet.

The company says that while it was changing its prefix advertisement policy, a reordering of the terms caused a critical subset of prefixes to be withdrawn.
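
To illustrate how evaluation order alone can change which prefixes are advertised, here is a minimal Python sketch. It is not Cloudflare's configuration. The term names, the first-match-wins rule, and the prefixes (taken from the address ranges reserved for documentation) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network
from typing import Callable, List, Set


@dataclass(frozen=True)
class Term:
    """One hypothetical policy term: a match condition plus an accept/reject action."""
    name: str
    matches: Callable[[IPv4Network], bool]
    accept: bool


def advertised(policy: List[Term], prefixes: List[IPv4Network]) -> Set[IPv4Network]:
    """Evaluate terms in order; the first term that matches a prefix decides its fate."""
    result = set()
    for prefix in prefixes:
        for term in policy:
            if term.matches(prefix):
                if term.accept:
                    result.add(prefix)
                break  # later terms are never consulted for this prefix
    return result


# Illustrative prefixes from the reserved documentation ranges.
CRITICAL = ip_network("192.0.2.0/24")
GENERAL = ip_network("198.51.100.0/24")
OTHER = ip_network("203.0.113.0/24")

allow_general = Term("allow-general", lambda p: p == GENERAL, True)
allow_critical = Term("allow-critical", lambda p: p == CRITICAL, True)
reject_rest = Term("reject-the-rest", lambda p: True, False)

# Same three terms, different order.
original = [allow_general, allow_critical, reject_rest]
reordered = [allow_general, reject_rest, allow_critical]

prefixes = [CRITICAL, GENERAL, OTHER]
before = advertised(original, prefixes)
after = advertised(reordered, prefixes)
withdrawn = before - after

print(sorted(str(p) for p in withdrawn))  # ['192.0.2.0/24']
```

In this sketch, no term is added or removed. Moving the catch-all reject term ahead of the term that accepts the critical prefix is enough to stop that prefix from being advertised.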

Because of this withdrawal, Cloudflare engineers had additional difficulty reaching the affected locations to revert the problematic change. However, the company says it has backup procedures for handling such an event and used them to take control of the affected locations.

But David Warburton, leader of F5 Labs’ threat research, says this should be a reminder of the dangers of centralizing major cloud solution providers.

“In a traditional internet app deployment model, a server failure or misconfigured application can disable a single website,” he says.

“But similar issues with a cloud solution provider could lead to the shutdown of all of their customers, taking not one website offline, but hundreds or thousands. The impact could affect organizations’ digital experiences, revenues, and reputations.”

Warburton says that cloud solution providers bring immeasurable benefits to their users but that the centralization of the Internet through these cloud solutions is creating the very problems the original design of the Internet was supposed to avoid through redundancy.

ServerChoice Commercial Director Adam Bradshaw says outages could cause businesses serious reputational and financial damage.

“For a small or medium-sized business, the risk of such service disruptions can be critical. Even brief outages in the digital economy can cause damage,” he says.

“Diversifying an IT environment reduces the likelihood that third-party outages will negatively impact a business. Owning the hardware gives an organization greater control, while IT components such as colocation services provide an organization with a backup when cloud services fail.”
