Insights on AgentOps, platform engineering, and shipping AI to production.
Memory, End-to-End: This Week in Omnia
Everyone on Reddit and X is shipping agentic memory this month. We've had basic memory in Omnia for a while; this week we cranked it up a bit more. Here's what landed between 2026-04-18 and 2026-04-24 — MemoryRetentionPolicy as a CRD, consent-revocation cascade, purpose-filtered retrieval, trust-aware redaction, summarisation-as-an-agent — and the facade auth chain that landed the same week, which is really the other half of the same story.
Skills, End-to-End: This Week in Omnia and PromptKit
For the last month 'Skills' meant slightly different things in each repo. This week both halves finally met — PromptKit shipped the primitives that make a skill safe to load, Omnia shipped the CRD, reconciler, runtime logging and dashboard that make declaring one a one-line change. Here's the status update: what shipped between 2026-04-11 and 2026-04-17, why it matters, and the PromptKit v1.4.5 release that dropped alongside it.
The First Rule of Fine-Tuning Is: You Don't Need to Fine-Tune
Fine-tuning isn't a model upgrade — it's a way of baking whatever data you already have into the model's wiring, permanently, in a way you can't edit afterwards. Here's what's actually happening inside, why LoRA and QLoRA made it cheap without changing anything about inference, and why the teams that win at it are the ones who did the unglamorous data work first.
The Two Families of Generative Inference: Autoregressive and Iterative Refinement
Every generative model in production today belongs to one of two architectural families. Text and music went autoregressive. Images and video went diffusion. Speech is a mess split across both. Here's how the two shapes differ, and why the choice settles almost every interesting operational question about the infrastructure underneath.
Progressive Rollouts for AI Agents: Canary, Blue/Green, and Experiments in Six Phases
Two months ago we wrote about why prompt changes need canaries. This week we shipped the real thing — an Istio-backed, session-aware rollout system for AgentRuntime, built in six phases. Here's how it works and what we learned building it.
Bulletproofing Streaming LLM Calls: Three Layers of Back-Pressure
A single HTTP/2 reset can kill 100 concurrent LLM streams at once. Naively retrying them makes it worse. Here's the three-layer back-pressure stack we built in PromptKit, and the benchmark showing it kept us 6× more efficient than LangChain at 2000 concurrent streams.
What Actually Happens When You Call an LLM API
Inside the token-by-token generation loop, the KV cache, vLLM's PagedAttention, and why 'just retry the request' is harder than it looks when the API you're calling isn't stateless at all.
Why 95% of AI Pilots Fail to Reach Production (And What to Do About It)
The barrier has shifted from AI technology to AI operations. Here's why most pilots die in deployment and what production AI actually requires.
How Transformer Attention Actually Works: A Worked Example
Attention, embeddings, Q/K/V, softmax — walked through by hand with two-dimensional numbers a platform engineer can verify on the back of an envelope. No machine-learning background required.
The Klarna Effect: What Happens When You Scale AI Agents Without Measurement
Klarna's AI went from triumph to cautionary tale. Here's what every CX leader deploying AI in 2026 needs to learn from their journey.
Why Platform Engineers Are the Next AI Engineers
If you've spent five years building on Kubernetes, you already have 90% of the skills needed to operate AI agents in production. Here's why the 'AI skills gap' is mostly a tooling gap.
The Framework Lock-In Trap: Why Your AI Agent Platform Shouldn't Pick Sides
Most agent deployment platforms force you into a single framework. Here's why framework-agnostic infrastructure matters and how to avoid costly lock-in.
Self-Hosted AI Agents: Why You Shouldn't Need an Enterprise Contract
Most AI agent platforms gate self-hosted deployment behind enterprise sales calls. Here's why that model is broken and what self-hosted infrastructure should actually look like.
PromptPack: A Portable Standard for AI Agent Configuration
AI teams face the same configuration chaos that Docker solved for applications. PromptPack provides a portable, versioned standard for packaging AI agent prompts, tools, and configuration.
Arena Fleet: Why AI Agents Need Unified Testing Infrastructure
AI agents require three types of testing -- load, evaluation, and data generation -- but most teams use fragmented tools. Here's why unified testing infrastructure changes the game.
Kubernetes-Native AI Agents: Why the CNCF Is Betting on K8s for AI
Serverless doesn't fit AI agent workloads. Here's why Kubernetes is emerging as the foundation for production AI agent infrastructure, backed by CNCF investments.
Context-Based Isolation: Solving the Multi-Session AI Compliance Problem
Most AI agent platforms have no concept of compliance-grade session isolation. Here's why context-based isolation matters for regulated industries and how to implement it.
Voice AI Agents: The Three Execution Modes You Need to Understand
Building voice AI agents for production requires choosing between VAD pipelines, native audio LLMs, and hybrid architectures. Here's how each mode works and when to use it.
MCP: The Universal Protocol for AI Agent Tool Integration
Every AI framework handles tool integration differently. The Model Context Protocol provides a single standard that works everywhere -- build once, use with any agent.
Observability for AI Agents: What Traditional APM Tools Miss
Your APM dashboard says everything is fine, but users say the AI is broken. Here's what AI-specific observability requires -- from conversation tracing to cost intelligence.
Go vs. Python for Production AI Agents: When Runtime Choice Matters
Python dominates the AI ecosystem, but production AI agents have infrastructure requirements that push teams toward Go. Here's the performance data and a practical hybrid approach.
Canary Deployments for AI Prompts: Reducing the Blast Radius of Prompt Changes
Prompt changes have 100% blast radius by default and fail quietly. Here's how canary deployments -- the same pattern that made code releases safer -- can protect your AI agents.
Multi-Provider LLM Strategy: Why Betting on One Provider Is a Risk
Single-provider lock-in creates outage risk, cost inflexibility, and capability gaps. Here's how to build a multi-provider LLM strategy with practical routing and failover patterns.
Red-Teaming AI Agents: Finding Failures Before Your Users Do
Normal testing proves AI agents work. Red-teaming proves how they fail. Here's how to build automated adversarial testing into your AI agent deployment pipeline.
Cost Intelligence for AI: Your Cloud Bill Doesn't Tell the Whole Story
Your cloud bill says you spent $80K on AI. That tells you almost nothing. Here's how to build application-level cost intelligence that actually enables decisions.
Cloud Agent Platforms Compared: AWS, Azure, Google, and the Open Alternative
Every major cloud provider now offers an AI agent platform. Here's an honest comparison of AWS Bedrock Agents, Azure AI Agent Service, Google Vertex AI, and cloud-agnostic alternatives.
The AI Measurement Paradox: Why 79% Think It Works But Only 29% Can Prove It
Worldwide AI spending will hit $2.5 trillion in 2026, yet most enterprises can't prove their investments are paying off. Here's why measurement is the defining challenge of enterprise AI.
The Knowledge Codification Problem: Why Enterprise AI Is Stuck at Assist
The bottleneck for enterprise AI isn't model quality or infrastructure -- it's the inability to codify institutional knowledge into a form AI systems can execute. Here's how to break through.
From Connectors to Capabilities: Why Your AI Agent Needs More Than API Access
MCP solved the connector problem for AI agents. But connecting to Zendesk isn't the same as knowing how to handle a customer escalation. The next abstraction layer is codified operational knowledge.
The Trust Plateau: Why 79% of Consumers Still Prefer Humans Over AI Agents
Consumer trust in AI agents remains stubbornly low despite massive enterprise investment. Here's why adoption is outpacing trust and how to build a deployment strategy that earns it.
AI Guardrails Stop Being Optional in 2026: What Your Agent Deployment Needs Now
The EU AI Act reaches full enforcement in August 2026. California's AI Transparency Act is already live. Here's what production-grade AI guardrails actually require.
Data Sovereignty and AI: Why Where Your Agent Runs Matters More Than Which Model It Uses
93% of executives now rank data sovereignty as their top technology governance concern. Here's why the physical location of AI inference has become a first-order architecture decision.
The Integration Tax: Why Enterprises Need Six Tools to Run One AI Agent
Deploy one AI agent, watch six vendor contracts appear. The integration tax -- the cumulative cost of a fragmented AI stack -- is a primary driver of AI project failure.
RAG in Production: Why 72% of Enterprise Implementations Fail in Year One
Most enterprise RAG implementations fail not because of model limitations but because of knowledge organization failures. Here are the five failure modes and what actually works.
The Agent Quality Crisis: Why AI-Generated Code Has 1.7x More Issues Than Human Code
AI-generated pull requests contain 1.7x more issues than human-written code, with 2.74x more XSS vulnerabilities. Speed was the 2025 story. Quality will be the 2026 reckoning.
Why Your AI Agent Needs Memory: Building Persistent Relationships, Not Just Conversations
Every conversation with your AI agent starts from zero. Memory infrastructure -- episodic, semantic, and procedural -- is the layer that transforms transactional tools into trusted advisors.
The SI Opportunity: How Consulting Firms Can Turn AI Expertise Into Recurring Revenue
The AI consulting market is $11-22B and growing, but every engagement produces a deliverable, not a product. Here's how SIs can productize domain expertise into reusable, deployable bundles.
Assist, Execute, Operate: A Practical Framework for AI Agent Maturity
40% of agentic AI projects will be canceled by 2027 because organizations skip maturity stages. Here's a three-level framework grounded in what the data shows actually works.
The METR Paradox: When AI Tools Make Experienced Developers 19% Slower
A rigorous randomized controlled trial found that AI coding tools made experienced developers 19% slower, even though those developers believed they were 20% faster. The implications for how we measure AI ROI are profound.
Testing AI Agents at Scale: Why You Can't Ship What You Can't Measure
42% of AI initiatives failed in 2025 and 39% of AI bots were pulled back due to quality issues. The root cause: deploying systems that can't be adequately tested with traditional methods.
Enterprise AI in 2026: What's Real, What's Hype, and What's Next
The AI agent market is growing at 43% CAGR, but only 130 of the thousands of vendors have genuine agent capabilities. Here's how to separate signal from noise.
Building Customer Support Agents That Don't Embarrass Your Brand
AI support agents can resolve 55-70% of tier-1 queries at a fraction of human cost. But 39% of companies pulled back their bots in Q1 2025. Here's what separates success from brand damage.
Reusable AI: Why Every Enterprise Implementation Should Produce a Product, Not Just a Project
42% of AI initiatives fail and each costs $6.8M on average. The root cause: every implementation starts from zero. Here's how to shift from project delivery to product delivery.
Beyond Token Counts: The KPIs That Actually Prove Your AI Agent Works
79% of leaders perceive AI productivity gains but only 29% can measure ROI. Companies with AI-native KPIs see 3x the financial benefit -- yet only 34% have adopted them.