KubeCon 2026: From Istio to Dapr — When an Entire Ecosystem Speaks

Inference requests can consume far more resources than traditional API calls. Their routing is therefore all the more critical and must rely on tailored strategies (buffers, streaming, timeouts…).

Starting from this premise, the Istio project introduced, in the summer of 2025, a dedicated extension for the Kubernetes Gateway API. It brings two CRDs: InferenceModel and InferencePool. The former defines logical endpoints. The latter acts as a specialized back-end service that understands the characteristics of AI workloads. Incoming requests still follow the Gateway API’s HTTPRoute rules, but with distinct load-balancing algorithms. These take into account GPU memory usage, how request queues evolve, and adapter affinity (for LoRA models, requests are directed to servers where the correct adapter is already loaded).
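To make the idea concrete, here is a minimal Python sketch of inference-aware endpoint selection. It is not the extension's actual algorithm: the Endpoint class, the score function, the weights and the pod names are all hypothetical, chosen only to illustrate how GPU memory, queue depth and LoRA adapter affinity could be combined into one routing decision.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical snapshot of one model server's state."""
    name: str
    gpu_mem_used: float   # fraction of GPU memory in use, 0.0 to 1.0
    queue_depth: int      # requests currently waiting
    loaded_adapters: set = field(default_factory=set)


def score(ep: Endpoint, adapter: Optional[str], max_queue: int = 32) -> float:
    """Lower is better. The weights are illustrative assumptions."""
    queue_load = min(ep.queue_depth / max_queue, 1.0)
    # Penalize servers that would first have to load the requested adapter.
    missing_adapter = adapter is not None and adapter not in ep.loaded_adapters
    penalty = 1.0 if missing_adapter else 0.0
    return 0.4 * ep.gpu_mem_used + 0.4 * queue_load + penalty


def pick_endpoint(endpoints, adapter: Optional[str] = None) -> Endpoint:
    """Return the endpoint with the lowest score for this request."""
    return min(endpoints, key=lambda ep: score(ep, adapter))


pool = [
    Endpoint("pod-a", gpu_mem_used=0.30, queue_depth=2),
    Endpoint("pod-b", gpu_mem_used=0.70, queue_depth=8, loaded_adapters={"sql-lora"}),
    Endpoint("pod-c", gpu_mem_used=0.95, queue_depth=30, loaded_adapters={"sql-lora"}),
]

print(pick_endpoint(pool, adapter="sql-lora").name)
print(pick_endpoint(pool).name)
```

In this toy pool, a request for the hypothetical "sql-lora" adapter lands on a server that already holds it, even though a less loaded server exists, whereas a request without an adapter simply goes to the least loaded server.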

Istio opens up to Agent Gateway

This extension has recently moved into beta (mid-February, with the release of Istio 1.29).

The ambient multicluster mode reached the same milestone at the same time, following telemetry work.

In a local cluster (or across clusters on the same network), xDS peer discovery makes it possible to know all endpoints. At scale across multiple networks, the mechanism is less practical: information can be lost given the amount to be replicated.

To enable the exchange of peer metadata between endpoints and gateways located on different networks, the HBONE protocol has been enriched with specific headers. Istio pairs this with the concept of namespace sameness: in a multicluster mesh, all namespaces bearing the same name are treated as the same namespace. Each cluster has its own east-west gateway, whose IP serves as the entry point for the zero-trust tunnels (ztunnels) deployed on every node. To secure the traffic, ambient multicluster mode nests two HBONE connections. The outer one encrypts the traffic from the ztunnel to the gateway and allows the two to verify each other’s identities. The inner one provides end-to-end encryption and lets the source and destination ztunnels verify each other’s identities.
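Namespace sameness can be pictured with a short Python sketch. It is only an illustration of the principle, not Istio code: the merge_services function, the cluster names and the addresses are hypothetical. Services that share a namespace and a name are merged into one logical service whose endpoints span clusters.

```python
from collections import defaultdict


def merge_services(clusters):
    """Merge per-cluster service tables into a single mesh-wide view.

    clusters maps a cluster name to {(namespace, service): [endpoint, ...]}.
    Entries with the same (namespace, service) key are treated as one service.
    """
    merged = defaultdict(list)
    for cluster, services in clusters.items():
        for key, endpoints in services.items():
            # Keep track of which cluster each endpoint lives in.
            merged[key].extend((cluster, ep) for ep in endpoints)
    return dict(merged)


# Hypothetical clusters and addresses, for illustration only.
clusters = {
    "east": {("shop", "cart"): ["10.0.0.4"], ("shop", "pay"): ["10.0.0.9"]},
    "west": {("shop", "cart"): ["10.1.0.7"]},
}

mesh_view = merge_services(clusters)
for (namespace, service), endpoints in mesh_view.items():
    print(namespace, service, endpoints)
```

Here the "cart" service of the "shop" namespace ends up with one endpoint in each cluster, while "pay" remains local to the first one.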

Multicluster on a single network remains in alpha. Another new and equally experimental addition is support for Agent Gateway as a data-plane component. This proxy, built by Solo.io, handles the A2A and MCP protocols to deliver more flexible traffic management for AI workloads.

The Inference extension, taken into account for the “AI Certification” of Kubernetes clusters…

Supporting an implementation of the inference extension for the Gateway API is expected to soon become a criterion within the CNCF’s Kubernetes AI Conformance program.

Launched in fall 2025, this framework defines a set of capabilities, APIs and configurations that a cluster certified as “Kubernetes compliant” must offer to run AI/ML workloads reliably and efficiently. The primary objective: prevent fragmentation that would undermine portability.

Each Kubernetes version has its own list of requirements. For the latest, 1.35, the mandatory baseline includes:

Support dynamic resource allocation (DRA)
Support the Gateway API through an implementation that enables advanced traffic management for inference services (weighted traffic distribution, header-based routing, integration with service meshes…)
Allow at least one gang-scheduling solution to be installed and run
Ensure the Horizontal Pod Autoscaler, when present, works properly for pods that use accelerators
Expose metrics on accelerators and on the supported AI/ML workloads
Isolate accelerator access from within containers
Support at least one AI/ML operator that provides a CRD (Ray, Kubeflow…)
When an autoscaler or equivalent mechanism is present, allow node groups containing specific accelerators to be resized based on the pods requesting those accelerators

… like disaggregated architectures

Only the last point is not mandatory with earlier Kubernetes versions (it was recommended with 1.34). With 1.35, the spec has been enriched with three additional recommendations:

A verifiable mechanism to ensure the installation of drivers and proper configurations on nodes equipped with accelerators
The ability to implement at least one static resource-sharing strategy for accelerators that support it
The ability to expose virtualized accelerators

A few weeks ago, several recommendations were added with Kubernetes 1.36 in view, expected on April 22. Supporting an implementation of the Inference extension for the Gateway API is one of them. The others cover the use of DRA to attach pods to multiple network interfaces and support for disaggregated inference architectures (vLLM with prefill and decode instances, llm-d, Dynamo…).

As of now, distro vendors self-certify. A suite of automated tests is expected to take over this responsibility this year.

Dapr Agents in GA

Another project in the CNCF ecosystem that has just crossed a milestone: Dapr Agents. This Python framework for building agent-centric applications on top of the distributed Dapr runtime has reached v1.

With this release, the notion of an “agent” becomes obsolete, replaced by “durable agents.” The terminological shift is meant to highlight what the runtime brings in terms of persistence: state stored in Python lists, vector databases or Dapr-backed stores; automatic retries; deterministic or event-driven orchestration…

The v1 release brings the ability to invoke agents as tools. It also offers a choice between sequential and parallel tool execution, and adds Redis as a basic vector store option.
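The difference between the two execution modes can be sketched with plain asyncio. This does not use the Dapr Agents API: the two tools, run_sequential and run_parallel are hypothetical stand-ins, and the sleep calls merely simulate I/O-bound tool calls.

```python
import asyncio
import time


async def lookup_weather(city):
    """Hypothetical tool: simulates a slow remote call."""
    await asyncio.sleep(0.1)
    return f"weather:{city}"


async def lookup_news(city):
    """Hypothetical tool: simulates a slow remote call."""
    await asyncio.sleep(0.1)
    return f"news:{city}"


async def run_sequential(tools, arg):
    # Each tool starts only after the previous one has finished.
    return [await tool(arg) for tool in tools]


async def run_parallel(tools, arg):
    # All tools are started together; results keep the order of the tool list.
    return list(await asyncio.gather(*(tool(arg) for tool in tools)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


tools = [lookup_weather, lookup_news]
seq_result, seq_time = timed(run_sequential(tools, "Paris"))
par_result, par_time = timed(run_parallel(tools, "Paris"))
print(seq_result, round(seq_time, 2))
print(par_result, round(par_time, 2))
```

Both modes return the same results in the same order. Sequential execution suits tools whose calls depend on one another, while parallel execution shortens the total wait when the calls are independent.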

Kyverno reaches the CNCF’s highest maturity level

Kyverno is not an AI-centric project (although AI/MCP gateway management is on its roadmap), but it has just been elevated within the CNCF to the highest level of maturity.

This policy-as-code engine owes its prominence to the American company Nirmata, which has built governance solutions on top of it.

Under the CNCF umbrella since 2020, Kyverno is today used by Adidas, Bloomberg, Coinbase, Deutsche Telekom, LinkedIn, Spotify, Vodafone and Yahoo, among others. It began as a Kubernetes admission controller and is now available as a CLI, an SDK and a container. The latest release marks full adoption of CEL (Common Expression Language) in addition to YAML.

Platform engineering: CNCF maturity does not always translate into adoption

Kyverno appears in the CNCF’s latest quarterly radar for platform engineering tools, alongside another PaC engine: OPA (Open Policy Agent). Although Kyverno has reached the highest level of maturity within the foundation, it still sits at the first adoption stage among developers, at least among the roughly 400 survey respondents.

The survey covered three domains: workflow automation, application delivery, and security/compliance management. It used four dimensions:

Tool usage (tools that developers use or have used)
Utility (judgment of fit to project needs)
Maturity (stability and reliability assessments)
Propensity to recommend

Three categories emerged: “adopt” (reliable tools applicable to most use cases), “trial” (worth exploring), and “assess” (to be evaluated with caution). The table below summarizes the situation. “S” means a tool is in the CNCF’s sandbox (first maturity level), “I” indicates incubation, and “G” indicates graduated.

	Assess	Trial	Adopt
Workflow automation	Crossplane (I), Flux (G), Knative (G), SpinKube (I), Spinnaker (I), wasmCloud (I)	Jenkins X (I), Karmada (I), Tekton (I), werf (S)	Argo CD (G), Armada (S), Buildpacks (I), Jenkins (G)
Application delivery	Crossplane (I), kcp (S), KusionStack (S), Microcks (S)	Buildpacks (I), Dapr (G), KubeVela (I), Operator Framework (I), Score (S)	Backstage (I), Helm (G), kro (G)
Security/compliance	Falco (G), in-toto (G), KubeArmor (S), Kubewarden (S), Kyverno (I), Sigstore (G)	Bank-Vaults (S), bpfman (S), Capsule (S), Cloud Custodian (I), External Secrets Operator (S), Kubescape (I), Notary (I), OpenCost (I), SPIFFE/SPIRE (G)	cert-manager (G), Keycloak (I), OPA (G)

Four platform approaches, four AI approaches

The survey also provides a snapshot of how organizations handle AI workflows depending on how they assemble their developer platforms.

Most commonly (41% of respondents), several teams (DevOps, SRE, and infrastructure) contribute to delivering capabilities. In 10% of cases, a platform team develops the platform internally; in 18%, it predominantly integrates third-party tools. Some organizations rely largely on market solutions (6%), while in others each team selects and manages its own tooling (16%).

For handling AI workflows, some organizations have extended their existing platform (17%). Others have built a dedicated one (19%), use market platforms (18%), or adopt a hybrid approach (separate experiments, shared production; 35%). Direct team-led management of workflows without platform support is less common (5%).

Platform Approach ↓	AI Approach →	Extension	Dedicated AI Platform	Hybrid	Market Solution
Internal Development		28%	11%	25%	17%
Integration of third-party tools		18%	26%	26%	26%
Collaborative operation		19%	16%	48%	10%
Team-driven tool choices		8%	25%	26%	25%