The Infrastructure Question

If you’re deploying AI agents to production, you face a fundamental infrastructure question: Where should they run?

The options seem endless:

  • Serverless functions (Lambda, Cloud Functions)
  • Managed container services (ECS, Cloud Run)
  • Custom VMs
  • Kubernetes
  • Vendor-specific platforms (LangGraph Cloud, etc.)

Each option has tradeoffs. But there’s a growing consensus in the infrastructure community that Kubernetes is the right foundation for AI agent workloads — and the CNCF is making significant investments to prove it.

Why Not Serverless?

Serverless seems appealing at first. No infrastructure to manage. Pay-per-execution pricing. Automatic scaling.

But AI agents have characteristics that make serverless a poor fit:

Long-Running Connections: AI conversations are stateful and potentially long-lived. WebSocket connections need to stay open for streaming responses. Serverless functions are designed for short request-response cycles, not persistent connections.

Cold Start Latency: Loading models, establishing provider connections, and warming up caches takes time. Serverless cold starts hurt user experience for conversational AI.

Memory Requirements: AI agents often need significant memory for conversation history, embeddings, and tool state. Serverless memory limits can be restrictive.

Cost at Scale: Serverless pricing is attractive at low volume but can become expensive as usage grows. Per-request pricing adds up for chatty conversational workloads.

Why Kubernetes?

Kubernetes addresses these limitations while providing additional capabilities that AI agents need:

1. Native Support for Stateful Workloads

AI agents are inherently stateful — they maintain conversation history, tool state, and session context. Kubernetes provides:

  • StatefulSets for ordered, stable deployments
  • PersistentVolumes for durable storage
  • Headless Services for stable network identities

But more importantly, Kubernetes enables patterns where stateless pods connect to external state stores (Redis, PostgreSQL) with graceful failover.

2. Declarative Configuration

Define your agent infrastructure as YAML manifests:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: support-agent
spec:
  replicas: 3
  promptPackRef:
    name: support-prompts
  providerRef:
    name: claude-provider
  session:
    type: redis

Store configurations in Git. Apply with kubectl. Track changes over time. Roll back when needed. The same GitOps patterns you use for other infrastructure work for AI agents.

3. Autoscaling That Understands AI Workloads

Kubernetes HPA (Horizontal Pod Autoscaler) can scale based on custom metrics:

spec:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    metrics:
      - type: memory
        target: 70%
      - type: custom
        name: active_connections
        target: 100

AI agents are often I/O bound (waiting for LLM API responses), not CPU bound. Scaling on memory utilization or active connections makes more sense than scaling on CPU — and Kubernetes supports both.

For even more sophisticated scaling, KEDA (Kubernetes Event-Driven Autoscaling) enables scale-to-zero — crucial for cost management when you have many agents with variable traffic.

4. Multi-Tenancy and Isolation

Enterprise AI deployments need isolation between teams, environments, and customers:

  • Namespaces provide logical isolation
  • Network Policies control traffic flow
  • Resource Quotas prevent runaway costs
  • RBAC controls who can deploy what

Kubernetes provides these capabilities out of the box, battle-tested at scale across thousands of organizations.

5. Ecosystem Integration

The Kubernetes ecosystem provides pre-built solutions for common AI infrastructure needs:

You don’t have to build this infrastructure — you integrate with existing solutions.

The CNCF AI Bet

The Cloud Native Computing Foundation is making significant investments in Kubernetes for AI:

CNCF AI Conformance Program (November 2025)

The CNCF launched an AI Conformance Program specifically for Kubernetes, establishing standards for AI workloads on K8s. This signals industry-wide recognition that Kubernetes is the deployment target for AI infrastructure.

Kagent (CNCF Sandbox)

Kagent is an open-source framework for AI agents in Kubernetes, backed by Solo.io and recently accepted into the CNCF Sandbox. It provides:

  • Custom Resources for agent definitions
  • K8s-native deployment
  • MCP integration for tool calling

Kubeflow Evolution

Kubeflow, the CNCF’s ML platform for Kubernetes, continues to evolve with better support for inference workloads alongside training.

OpenTelemetry for AI

The CNCF’s OpenTelemetry project is adding semantic conventions specifically for LLM observability — trace attributes for token counts, model identifiers, and AI-specific metrics.

The Operator Pattern for AI Agents

The most powerful approach for Kubernetes-native AI agents is the operator pattern — extending Kubernetes with custom resources and controllers.

Custom Resource Definitions (CRDs)

Define new Kubernetes resource types specific to AI agents:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime       # Deploy and manage agents
---
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: PromptPack         # Version prompt configurations
---
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: ToolRegistry       # Register available tools
---
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: Provider           # Configure LLM providers

Controllers That Understand AI

Kubernetes controllers watch custom resources and reconcile actual state with desired state:

  • AgentRuntimeController: Creates Deployments, Services, ConfigMaps for agents
  • PromptPackController: Manages prompt versioning and canary rollouts
  • ToolRegistryController: Discovers and validates tool endpoints

The controller handles the complexity. You just define what you want.

Automatic Configuration Propagation

When you update a PromptPack, the controller:

  1. Detects the change
  2. Updates the ConfigMap
  3. Triggers a rolling restart of affected agents
  4. Tracks the rollout status

No manual coordination required.

The Sidecar Architecture for AI Agents

A proven pattern for Kubernetes AI agents is the sidecar architecture:

+-----------------------------------------------------+
|                    Agent Pod                         |
|                                                     |
|  +---------------------+  +---------------------+  |
|  |   Facade Container  |  |  Runtime Container  |  |
|  |                     |  |                     |  |
|  |  - WebSocket Server |  |  - LLM Communication|  |
|  |  - Protocol Handling|  |  - Tool Execution   |  |
|  |  - Health Checks    |  |  - Session Logic    |  |
|  |                     |  |                     |  |
|  +----------+----------+  +----------+----------+  |
|             |    gRPC (localhost)     |              |
|             +------------------------+              |
+-----------------------------------------------------+

Facade Container: Thin, stateless, handles client protocols (WebSocket, HTTP). Easy to scale and debug.

Runtime Container: Fat, handles LLM interaction, tool execution, business logic. Can be swapped for different implementations.

Benefits:

  • Separation of concerns: Protocol handling vs. AI logic
  • Independent scaling: More facades if connection-bound, more runtimes if compute-bound
  • Framework flexibility: Different runtime implementations for different frameworks
  • Easier debugging: Isolate issues to protocol or logic layer

Multi-Cloud and Hybrid Deployment

Kubernetes runs everywhere — AWS EKS, GCP GKE, Azure AKS, on-premises, edge locations. The same agent manifests deploy to any Kubernetes cluster.

This matters for enterprises with:

  • Multi-cloud strategies: Avoid lock-in to any single cloud
  • Data sovereignty requirements: Run agents in specific regions
  • Hybrid architectures: Some workloads on-premises, some in cloud
  • Edge deployments: AI agents at the edge for latency-sensitive applications

Define once, deploy anywhere.

The Path Forward

If you’re building AI agent infrastructure, Kubernetes provides:

  1. Battle-tested foundation for production workloads
  2. Declarative, GitOps-friendly configuration
  3. Rich ecosystem for observability, security, and networking
  4. Multi-cloud portability without lock-in
  5. Industry backing from the CNCF and major cloud providers

The question isn’t whether Kubernetes can run AI agents — it’s whether you’re using it effectively.


Key Takeaways

  1. Serverless doesn’t fit AI agent characteristics — long connections, cold starts, memory needs
  2. Kubernetes provides stateful workload support, declarative config, custom scaling, and isolation
  3. The CNCF is betting on K8s for AI with conformance programs, Kagent, and OpenTelemetry extensions
  4. The operator pattern extends Kubernetes with AI-specific custom resources
  5. Sidecar architecture separates protocol handling from AI logic
  6. Multi-cloud portability enables deployment anywhere Kubernetes runs