Instant deployment is convenient, but rarely essential.
Cloudflare explains that it is applying the same control process to configuration changes as it does to code.
During software updates, every binary version goes through several validation steps. Each team that owns a service must define a deployment plan, success/failure indicators, and the actions to take if problems arise. An automated system executes the plan and, if necessary, triggers a rollback and can alert the team.
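The process Cloudflare describes can be sketched as a staged rollout with health checks and automatic rollback. This is a minimal illustration, not Cloudflare's actual tooling; all names (DeployPlan, is_healthy, etc.) are assumptions.

```python
# Illustrative sketch of an automated deployment plan: stage-by-stage rollout,
# a success/failure indicator per stage, and rollback plus alerting on failure.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DeployPlan:
    stages: list[str]                      # e.g. canary -> region -> global
    is_healthy: Callable[[str], bool]      # success/failure indicator
    rollback: Callable[[str], None]        # action to take on failure
    alert: Callable[[str], None]           # notify the owning team

def execute(plan: DeployPlan, apply_stage: Callable[[str], None]) -> bool:
    """Roll out stage by stage; on a failed health check, roll back and alert."""
    done: list[str] = []
    for stage in plan.stages:
        apply_stage(stage)
        done.append(stage)
        if not plan.is_healthy(stage):
            for s in reversed(done):       # undo everything applied so far
                plan.rollback(s)
            plan.alert(f"deployment failed at stage {stage!r}")
            return False
    return True
```

The key property is that a failed indicator at any stage stops the rollout before it reaches the remaining stages.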
Cloudflare promises to extend this mechanism to configuration changes across all production environments by the end of March.
The backdrop: the two major outages of November 18 and December 5, 2025. Both were triggered by a configuration change (to the bot classifier in the first case; to a security tool in the second).
Isolating Failures
Cloudflare has made another commitment for the end of March: reviewing the interface contracts between each product and critical service, in order to better anticipate and isolate failures.
The November incident is a case in point. According to Cloudflare, two key interfaces could have behaved differently. First, the one that read the configuration file: a validated set of default values should have allowed traffic to keep flowing. Second, the interface between the core software and the bot-management module: if the latter failed, traffic should not have been blocked by default.
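The two behaviors Cloudflare describes can be sketched together: fall back to known-good defaults when the configuration cannot be read, and fail open when the bot-management module is unavailable. The names (load_config, classify, bot_threshold) are illustrative assumptions, not Cloudflare internals.

```python
# Hedged sketch of the two fail-safe interface behaviors described above.
import json

# A validated, known-good fallback configuration (illustrative values).
VALIDATED_DEFAULTS = {"bot_threshold": 0.9, "enabled": True}

def load_config(raw: str) -> dict:
    """Fall back to validated defaults instead of failing closed on a bad file."""
    try:
        cfg = json.loads(raw)
        if not isinstance(cfg.get("bot_threshold"), (int, float)):
            raise ValueError("invalid bot_threshold")
        return cfg
    except (json.JSONDecodeError, ValueError):
        return dict(VALIDATED_DEFAULTS)

def classify(request_score: float, module_ok: bool, cfg: dict) -> str:
    """If the bot-management module is down, fail open rather than block traffic."""
    if not module_ok:
        return "allow"                     # degraded mode: traffic keeps flowing
    return "block" if request_score >= cfg["bot_threshold"] else "allow"
```

The design choice in both paths is the same: a failure in an auxiliary component degrades a feature instead of taking traffic down.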
Eliminating — or Circumventing — Circular Dependencies
Cloudflare also intends to eliminate circular dependencies, or at least make them quick to circumvent during an incident. Example: during the November incident, the unavailability of Turnstile (an alternative to CAPTCHA) locked customers out of the dashboard unless they had an active session or an API token.
In parallel, Cloudflare plans to update its internal "break glass" procedures (temporary privilege escalations) so that engineers can reach the right tools as quickly as possible.
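A break-glass grant typically combines immediate access, an automatic expiry, and an audit trail. The sketch below is purely illustrative (BreakGlassGrant, AUDIT_LOG, and the 15-minute default are assumptions, not Cloudflare's tooling):

```python
# Illustrative break-glass grant: temporary, self-expiring, audited elevation.
import time
from dataclasses import dataclass, field

@dataclass
class BreakGlassGrant:
    user: str
    role: str
    ttl_seconds: float
    issued_at: float = field(default_factory=time.monotonic)

    def active(self) -> bool:
        """The elevation expires on its own once the TTL has elapsed."""
        return time.monotonic() - self.issued_at < self.ttl_seconds

AUDIT_LOG: list[str] = []    # every elevation is recorded for later review

def break_glass(user: str, role: str, reason: str, ttl: float = 900) -> BreakGlassGrant:
    """Grant elevated access immediately, but leave an audit record."""
    AUDIT_LOG.append(f"{user} elevated to {role}: {reason}")
    return BreakGlassGrant(user, role, ttl)
```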
A “Code Orange” for the Second Time
To implement this resilience plan, Cloudflare declared a "Code Orange," a procedure that directs most technical resources toward incident resolution. It had been activated once before, at the end of 2023, after a power outage at one of Cloudflare's major datacenters (PDX01, in Oregon), which hosts the control plane for many services. The trigger: maintenance work by the electric utility that caused a grounding fault in the facility.