No more shards or federated queries: you can consolidate your vector data into a single index.
AWS makes this one of the key talking points for S3 Vectors, which reached general availability at re:Invent.
With S3 Vectors, the promise of a single index
The service had been in preview since July. It brings native vector management to S3, using a dedicated bucket type. On paper, it’s a cheaper alternative to Aurora Serverless and OpenSearch Serverless, at the cost of longer response times (“under a second,” AWS claims).
The preview allowed storing up to 50 million vectors per index (and 10,000 indexes per bucket). With the commercial version, the capability jumps to 2 billion, hence the consolidation argument. Another threshold raised: the maximum number of results per query (now 100, up from 30 in preview). As for latency, it’s now “frequently under 100 ms.”
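A similarity query against such an index can be sketched as follows. The client name (`s3vectors`) and parameter names follow the preview SDK and may differ at GA; treat them as assumptions rather than a definitive reference.

```python
# Hypothetical sketch of a nearest-neighbour query against an S3 Vectors index.
# Field names ("vectorBucketName", "queryVector", "topK") are assumptions based
# on the preview SDK, not a verified reference.

def build_query(bucket: str, index: str, embedding: list[float], top_k: int = 100) -> dict:
    """Assemble the request for a similarity query.

    At GA, a query returns at most 100 results (up from 30 in preview).
    """
    if top_k > 100:
        raise ValueError("S3 Vectors returns at most 100 results per query")
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},
        "topK": top_k,
        "returnDistance": True,
    }

# The actual call would look something like (requires AWS credentials):
#   import boto3
#   client = boto3.client("s3vectors")
#   resp = client.query_vectors(**build_query("my-vector-bucket", "docs-index", emb))

request = build_query("my-vector-bucket", "docs-index", [0.1, 0.2, 0.3], top_k=10)
print(request["topK"])
```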
S3 Vectors integrates with Bedrock Knowledge Bases (RAG) and with Amazon OpenSearch (used as the engine on managed clusters or for injecting a snapshot in the serverless version).
GPU-accelerated OpenSearch
In parallel, a GPU acceleration option arrives on AWS OpenSearch. The promise: build vector databases “up to 10 times faster” for a quarter of the traditional price, thanks to optimized infra usage. Additionally, it becomes possible to tune recall and latency levels to the desired settings.
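In standard OpenSearch k-NN indexes, the recall/latency trade-off is classically exposed through HNSW parameters; the sketch below shows those knobs on a faiss-backed index. How the new GPU option surfaces these settings is an assumption here, not something the announcement details.

```python
# Index settings for an OpenSearch k-NN vector field (standard k-NN plugin
# schema). ef_search is the usual recall-vs-latency dial: higher values give
# better recall at the cost of slower queries.

index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "parameters": {
                        "ef_construction": 200,  # build-time graph accuracy
                        "m": 16,                 # graph connectivity
                        "ef_search": 256,        # query-time recall/latency dial
                    },
                },
            }
        }
    },
}
print(index_body["mappings"]["properties"]["embedding"]["dimension"])
```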
Episodic memory for Bedrock agents
re:Invent also brought news for Bedrock AgentCore. Launched in summer 2025, this offering builds on Bedrock Agents. It extended their capabilities (native MCP management and finer-grained memory control, for example) and broke most of them out into independent modules, also “detached” from Bedrock so they can support technologies not available on the platform.
Bedrock AgentCore thus gains a form of episodic memory. Agents capture “structuring episodes” (context, reasoning process, actions, results) and are expected to act more consistently when they encounter similar situations.
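The episode structure described above can be sketched as a record plus a recall step. The field names and the naive keyword-overlap retrieval below are illustrative assumptions, not AgentCore’s actual schema or matching logic.

```python
# Toy model of episodic memory: store (context, reasoning, actions, results)
# records, then recall the one whose context best overlaps a new situation.

from dataclasses import dataclass

@dataclass
class Episode:
    context: str          # situation the agent faced
    reasoning: str        # how it decided what to do
    actions: list[str]    # tool calls / steps taken
    results: str          # observed outcome

def most_similar(episodes: list[Episode], situation: str) -> Episode:
    """Recall the stored episode whose context shares the most words with the new situation."""
    words = set(situation.lower().split())
    return max(episodes, key=lambda e: len(words & set(e.context.lower().split())))

memory = [
    Episode("refund request for damaged item", "check refund policy", ["lookup_order"], "refund issued"),
    Episode("password reset for locked account", "verify identity first", ["send_otp"], "access restored"),
]
best = most_similar(memory, "customer asks for a refund on a damaged parcel")
print(best.results)  # -> refund issued
```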
AWS also equips AgentCore with bidirectional audio streaming. During voice interactions, the agent can be interrupted and adapt to the new context without having to finish its current action first.
A managed evaluation service is also added, currently in preview. Beyond its built-in evaluations, which analyze indicators such as accuracy, usefulness, conciseness, and safety, it can incorporate custom ones. Results are delivered in CloudWatch.
Another preview: the Policy in AgentCore feature. It intercepts tool calls at the gateway and applies policies defined in natural language or in Cedar.
The latest Mistral and Gemma models added to Bedrock
AWS also used re:Invent to highlight the most recent open models added to Bedrock. Among them:
- Mistral Large 3, Ministral 3 (3B, 8B, 14B), Magistral Small 1.2, Voxtral Mini 1.0, Voxtral Small 1.0
- Gemma 3 (4B, 12B, 27B)
- Kimi K2 Thinking (from Moonshot AI)
- MiniMax M2 (from MiniMax AI)
- Nemotron Nano 2 9B and a 12B vision version (from NVIDIA)
- GPT-OSS-safeguard 20B and 120B (content-moderation models)
- Qwen3-Next-80B-A3B and Qwen3-VL-235B-A22B
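Once enabled, these models are invoked through Bedrock’s usual runtime interface. The sketch below assembles a request in the shape of the Converse API; the model identifier is hypothetical (check the Bedrock model catalog for the real ID), and only the request construction runs locally.

```python
# Building a Bedrock Converse request for one of the newly listed open models.
# MODEL_ID is a hypothetical identifier, used here purely for illustration.

MODEL_ID = "mistral.mistral-large-3-v1:0"  # hypothetical; verify in the catalog

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

# Actual call (requires AWS credentials and model access):
#   import boto3
#   brt = boto3.client("bedrock-runtime")
#   resp = brt.converse(**build_converse_request("Summarize S3 Vectors."))
#   print(resp["output"]["message"]["content"][0]["text"])

req = build_converse_request("Hello")
print(req["messages"][0]["content"][0]["text"])
```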
Nova Sonic: a second generation with broader language support
AWS is expanding its Nova model family as well, most notably with Nova 2 Sonic.
The first-generation model for speech recognition and synthesis was launched in April. The second generation handles alphanumeric input, short utterances, accents, background noise, and telephony-grade audio (8 kHz) more effectively. It also brings “polyglot voices” (the ability to switch languages mid-conversation), asynchronous tool calls, and a sensitivity control for voice detection that gives users more or less time to finish their sentence.
AWS launches Nova into the web automation space
Under the Nova Forge brand, AWS offers to continue training its models from various checkpoints, using off-the-shelf specialized datasets or imported ones. The suite relies on SageMaker AI tooling and can optionally include reinforcement learning.
An Amazon model (Nova 2 Lite) also sits at the base of Nova Act, a service for agent-driven automation of web browsers. It is integrated with the Strands Agents orchestration framework.
Synthetic data through a privacy lens
The MLflow tracking servers that can be grafted onto SageMaker since last year to oversee ML experiments now offer a serverless option, with the ability to share instances across AWS accounts and domains.
The Clean Rooms service now enables creating synthetic datasets (tabular, intended to train regression and classification models; not LLMs). The system uses a model that reproduces the statistical patterns of the original dataset while removing identifying data. In this sense, it is presented as an alternative to anonymization techniques.
AI Factories: AWS also embraces the concept
AWS is adopting the AI Factories concept with a new offering by that name. Not much is known yet, other than that it should enable deploying managed AI clusters (Trainium and NVIDIA GPUs + AWS services) in customers’ data centers, “like a private AWS region.” The first reference client: the Saudi company HUMAIN, which will set up an on-site AI zone with up to 150,000 GPUs.
Durable Lambda functions
Durable Lambda functions aren’t limited to AI workloads, but they could make their execution easier.
By durable, we mean “with a lifetime that can reach up to one year.” They can be paused until certain conditions are met (typically external events). Only the active compute time is billed.
An SDK is woven into the function code to implement these pauses, along with “steps” that avoid restarting execution from the beginning after a failure.
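The “steps” idea amounts to checkpointing: a completed step records its result, so a re-run after a failure replays it instead of executing it again. The surface below (a `step` decorator and an in-memory checkpoint store) is an illustrative assumption, not the actual Lambda SDK.

```python
# Minimal checkpointing sketch: each named step runs at most once; subsequent
# invocations replay the saved result instead of re-executing the body.

checkpoints: dict[str, object] = {}

def step(name: str):
    """Run the wrapped function once, then replay its saved result."""
    def wrap(fn):
        def inner(*args, **kwargs):
            if name not in checkpoints:
                checkpoints[name] = fn(*args, **kwargs)
            return checkpoints[name]
        return inner
    return wrap

calls = []

@step("fetch")
def fetch_order(order_id: str) -> dict:
    calls.append(order_id)  # side effect we don't want repeated on retry
    return {"order": order_id, "status": "paid"}

fetch_order("42")   # executes the step body
fetch_order("42")   # replayed from the checkpoint; body not re-run
print(len(calls))   # -> 1
```

A real durable-function runtime would persist the checkpoint store outside the process, so a crashed or paused execution can resume from the last completed step even on fresh infrastructure.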