Microsoft Azure Hit by CDN Outage Again

A software defect in the control plane, followed by the inadvertent deletion of a configuration value: on October 9, these two causes produced a double incident affecting the Azure Front Door CDN.

Access to a large number of service management portals was disrupted for more than half a day. Europe and Africa were the most affected regions.

An Invalid Configuration State That Slipped Past Protection Systems

A new Azure Front Door incident occurred on October 29. This time, the impact was not limited to management portals. Microsoft lists about a dozen services affected. Among them are Azure SQL Database, Azure Virtual Desktop, Copilot for Security, Purview, Sentinel and Entra ID (on certain components including IAM and the user management interface).

The trigger was a change to an Azure Front Door tenant configuration. It introduced an invalid state that prevented loading of a large number of nodes, resulting in increased latency and even connection errors on downstream services.

Microsoft then blocked any configuration changes to prevent the faulty state from propagating. It subsequently launched a rollback to the latest “good version” of the configuration. To avoid overloading the system, traffic had to be rebalanced progressively. It took nearly 10 hours from the start of the incident to the official resolution (1:00 a.m. in France on October 30, though client configuration changes remained blocked a little longer).
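
The recovery sequence described here (freeze changes, roll back to the last good configuration, then restore traffic in stages) can be sketched as follows. This is a minimal illustration under stated assumptions, not Azure Front Door's actual mechanism: the `ConfigStore` class, the `is_good` flag, and the four-step ramp are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigStore:
    """Hypothetical versioned config store (an assumption for illustration)."""
    versions: list = field(default_factory=list)  # (config, is_good), oldest first
    frozen: bool = False                          # True while changes are blocked

    def publish(self, config: dict, is_good: bool) -> None:
        # Reject any new configuration while the freeze is in effect.
        if self.frozen:
            raise RuntimeError("configuration changes are blocked")
        self.versions.append((config, is_good))

    def last_known_good(self) -> dict:
        # Walk back from the newest version to the most recent one marked good.
        for config, is_good in reversed(self.versions):
            if is_good:
                return config
        raise LookupError("no good version recorded")


def ramp_schedule(steps: int) -> list:
    """Percentage of traffic restored at each step, ending at 100."""
    return [round(100 * (i + 1) / steps) for i in range(steps)]


store = ConfigStore()
store.publish({"routes": 3}, is_good=True)
store.publish({"routes": None}, is_good=False)  # stands in for the invalid state
store.frozen = True                             # step 1: block further changes
restored = store.last_known_good()              # step 2: roll back
schedule = ramp_schedule(4)                     # step 3: rebalance gradually
```

Restoring traffic in increments rather than all at once reflects the article's point that the rebalancing had to be progressive to avoid overloading the system.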

According to Microsoft, this invalid state slipped past protection mechanisms because of a software defect.

A Faulty Version of the Azure Front Door Control Plane

A “software defect” was also cited as the cause after the October 9 incident, specifically for its first phase, which lasted around eight hours.

The problem lay in the Azure Front Door control plane, in the information it passed to the data plane during customer-initiated operations to create and modify CDN profiles.

The problematic version of the control plane had been deployed six weeks earlier. A specific sequence of profile update operations generated erroneous metadata that crashed the data plane.

Automated protections detected the issue early enough to prevent it from propagating beyond the data plane. Moreover, with the old control plane version still running, it was possible to redirect all requests there.

In this context, on October 9, a cleanup process was launched for the configuration metadata containing the erroneous entries. Because the automated protection system blocked updates to the affected profiles, a temporary workaround was put in place. That workaround, however, allowed the problematic metadata to propagate and disrupted Azure Front Door, mainly in Africa and Europe.

The ensuing load redistribution, coupled with higher traffic as business hours began, drove resource usage to the point that critical thresholds were exceeded. An additional protection layer was then deployed to distribute even more traffic. Manual interventions were required when the automated process took too long.

A Configuration Value Deleted Because Unknown to an API

By early afternoon, Azure Front Door’s availability was fully restored. In the evening, after normalization was validated, Microsoft began routing all traffic back through its CDN.

That’s when a second problem emerged. A script intended to update the load-balancing configuration removed a configuration value. The cause: the API it used did not know about that value.
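
This failure mode, in which a read-modify-write cycle through an API silently drops a field the API does not model, can be illustrated with a short sketch. Microsoft's account does not describe the script at this level of detail, so everything below is an assumption: the field names, the `KNOWN_FIELDS` set, and both update functions are hypothetical.

```python
# Hypothetical: the only fields the API used by the script knows about.
KNOWN_FIELDS = {"pool", "weight"}


def read_via_api(stored: dict) -> dict:
    # The API returns only the fields it models; anything else is dropped.
    return {k: v for k, v in stored.items() if k in KNOWN_FIELDS}


def lossy_update(stored: dict, changes: dict) -> dict:
    # Read-modify-write that overwrites the whole record with the API's view,
    # so unknown fields disappear from the result.
    view = read_via_api(stored)
    view.update(changes)
    return view


def preserving_update(stored: dict, changes: dict) -> dict:
    # Apply changes on top of the full stored record so unknown fields survive.
    merged = dict(stored)
    merged.update(changes)
    return merged


stored = {"pool": "eu-west", "weight": 50, "health_probe_path": "/status"}
after_lossy = lossy_update(stored, {"weight": 80})
after_safe = preserving_update(stored, {"weight": 80})
```

In this sketch, `after_lossy` no longer contains `health_probe_path`, while `after_safe` keeps it. Patching only the changed fields is one common way to avoid this class of accidental deletion.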

Integrity checks for the Azure Front Door endpoint began to fail. As network filters were updated, the issue propagated. It took about four hours to resolve. In the meantime, roughly half of the customers who had used the Azure service management portals experienced some form of impact.

Dawn Liphardt
