AWS Outage: US-East-1 Region, a Known but Persistent Weak Point

Perplexity, Signal, Snapchat, Uber, and the British tax authority’s website… A number of digital services experienced outages on October 19 due to an incident at AWS.

No official post-mortem has been published yet, but the latest update on the AWS status page gives a clear picture of the sequence of events.

It was around 9:00 a.m. in France when the problems began. Error rates and latency rose in the us-east-1 region because of a DNS resolution issue affecting the DynamoDB endpoints.
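To make the failure mode concrete, here is a minimal Python sketch of a DNS resolution check against a regional DynamoDB endpoint. The hostname follows AWS's public endpoint naming scheme; the function name `can_resolve` and the injectable `resolver` parameter are assumptions made for illustration, not part of any AWS tooling.

```python
import socket

# Regional DynamoDB endpoint, following AWS's public naming scheme.
DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` is a hypothetical injection point so the check can be
    exercised without network access; it defaults to the standard
    library's socket.getaddrinfo.
    """
    try:
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # socket.gaierror is raised when name resolution fails.
        return False
```

Calling `can_resolve(DYNAMODB_ENDPOINT)` from a monitoring script would return `False` during a resolution failure of this kind, even if the servers behind the endpoint are healthy.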

By about 11:30 a.m., normal operation had been restored, but problems persisted in components that rely on DynamoDB, such as the subsystem that launches EC2 instances.
AWS also noted load-balancing problems that affected services such as CloudWatch and Lambda. These were resolved at around 6:30 p.m., after AWS temporarily throttled operations such as the processing of SQS queues.

By midnight, everything was back to normal, according to the American company. A backlog nevertheless remained to be cleared on services such as AWS Config, Amazon Connect, and Amazon Redshift.

Multiple levels of dependency on the us-east-1 region

AWS acknowledges, without dwelling on it, that the incident affected services that are not located in the us-east-1 region but depend on it. It specifically mentions IAM.

The latter is one of the “global services unique per partition.” Global, because their control planes and data planes do not exist independently in each region. Their resources are “global,” at least within the AWS partition (here, the public/commercial cloud, as opposed to the partitions dedicated to China or the U.S. government).

For most of these services, the data plane is distributed, while the control plane is hosted in a single AWS region. For IAM, AWS Organizations, Account Management (cloud account management), and Route 53 private DNS, it sits in us-east-1. Its unavailability can therefore jeopardize CRUDL operations (create, read, update, delete, list) worldwide.

Global edge services also have a single-region control plane, while their data plane is distributed across points of presence and potentially across regions. This category includes, among others, Route 53 public DNS, AWS Shield Advanced (anti-DDoS), and the CloudFront CDN (along with its WAF and its access-control manager).

There are also regional or zonal services that depend on other regions. On Amazon S3, for example, various operations (tagging, replication, monitoring, ACLs…) pass through the us-east-1 region. The same region hosts the Route 53 control plane, which is called to create DNS records when resources are provisioned in a range of services: PrivateLink (VPC endpoints), API Gateway (REST and HTTP APIs), ELB (load balancers), OpenSearch Service (domains), ECS (service discovery), etc.
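The dependency chain described above can be pictured as a small lookup table. The Python sketch below transcribes only the services named in this article; it is neither exhaustive nor an official AWS inventory, and the names `CONTROL_PLANE_REGION`, `PROVISIONING_DEPENDENCIES`, and `at_risk` are hypothetical.

```python
# Control-plane locations as described in the article (not an official
# or exhaustive list).
CONTROL_PLANE_REGION = {
    "IAM": "us-east-1",
    "AWS Organizations": "us-east-1",
    "Account Management": "us-east-1",
    "Route 53": "us-east-1",
}

# Services that call another service's control plane when provisioning
# resources, again limited to the examples the article cites.
PROVISIONING_DEPENDENCIES = {
    "PrivateLink (VPC endpoints)": ["Route 53"],
    "API Gateway (REST and HTTP APIs)": ["Route 53"],
    "ELB (load balancers)": ["Route 53"],
    "OpenSearch Service (domains)": ["Route 53"],
    "ECS (service discovery)": ["Route 53"],
}


def at_risk(failed_region):
    """List services whose control-plane operations may fail when
    `failed_region` is unavailable, directly or through a dependency."""
    down = {
        service
        for service, region in CONTROL_PLANE_REGION.items()
        if region == failed_region
    }
    indirect = {
        service
        for service, deps in PROVISIONING_DEPENDENCIES.items()
        if any(dep in down for dep in deps)
    }
    return sorted(down | indirect)
```

With this table, `at_risk("us-east-1")` returns both the global services and the regional services that provision DNS records through Route 53, while a region that hosts no control plane in the table yields an empty list.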

The region of novelty… and tutorials

Located in Virginia, on the Ashburn campus, the us-east-1 region houses, at the latest count, 159 data centers. It is all the more central to the AWS cloud because it was its starting point: this is where everything began in 2006, with the launch of S3.

Beyond its seniority, us-east-1 handles more workloads than any other AWS region, which makes it a potential single point of failure.

Its appeal can be explained by historically lower pricing, although the gap with other regions has narrowed over time. Even today, it is often the first region to host new AWS services. Many tutorials and articles use it, and it is the default value in many tools.
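One way to avoid inheriting such a default is to require the region to be stated explicitly. The Python sketch below assumes the `AWS_REGION` and `AWS_DEFAULT_REGION` environment variables read by AWS SDKs and the CLI; the function name `pick_region` and its error message are hypothetical.

```python
import os


def pick_region(explicit=None, env=None):
    """Return the AWS region to use, refusing to fall back silently.

    Order of precedence in this sketch: the explicit argument, then
    AWS_REGION, then AWS_DEFAULT_REGION. `env` is a hypothetical
    injection point for testing and defaults to os.environ.
    """
    env = os.environ if env is None else env
    region = explicit or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not region:
        # Fail loudly rather than let a tool choose a default region.
        raise ValueError("No AWS region configured; set one explicitly.")
    return region
```

For example, `pick_region("eu-west-3")` returns `"eu-west-3"`, while calling it with no argument and no region variable set raises `ValueError` instead of quietly selecting a region.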

Network fees can also deter those seeking to implement multi-region redundancy, even though AWS has partly changed its policy, notably to help populate its us-east-2 region (price aligned with inter-region traffic).

The centralization of control planes can be justified when immediate consistency is necessary, such as for authentication or billing. More broadly, decentralizing them would carry a significant cost for AWS and could compromise its availability commitments.