The Problem: Why Most AI Support Deployments Fail
In 2024, Klarna made headlines claiming their AI assistant was doing the work of 700 agents. By 2025, they were rehiring humans after customer satisfaction cratered. They are not alone. Gartner predicts that over 50% of organizations that replaced customer service reps with GenAI will reverse course by 2028.
The pattern is predictable. A team launches an AI chatbot. It handles the easy questions well. Then it hallucinates a refund policy. Fabricates a shipping status. Tells a customer to ship their laptop to a truck stop. Air Canada's chatbot invented a bereavement refund policy, and a tribunal held the airline legally liable for it.
Meanwhile, PwC research shows that 71% of consumers will abandon a brand after one bad AI interaction. The stakes are not hypothetical.
The failure is not in the AI models. The failure is in how teams deploy them: no guardrails, no measurement, no escalation path, and no way to debug what went wrong after the fact. You would never ship a web service without monitoring and alerting. Why would you ship a customer-facing AI agent without them?
The Maturity Path: Assist, Execute, Operate
The organizations that succeed with AI support do not flip a switch. They progress through maturity levels, expanding AI autonomy as they build confidence and measurement. Here is what that looks like for customer support.
Assist
AI suggests. Humans act.
- AI drafts replies for your agents to review and send
- AI surfaces relevant knowledge base articles during live conversations
- AI pre-fills ticket forms with intent classification and priority
- AI provides real-time sentiment analysis and escalation recommendations
Execute
AI resolves tier-1. Humans oversee quality.
- AI autonomously handles password resets, order status inquiries, return initiation
- Guardrails enforce topic boundaries, factual grounding, and brand voice
- Confidence scoring routes uncertain conversations to humans
- Human QA team reviews a sample of AI conversations daily
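The confidence-routing bullet above can be sketched in a few lines. The threshold and labels here are illustrative choices for the example, not an Omnia setting:

```python
# Illustrative confidence-based escalation: send confident AI replies,
# hand everything else to a human. Threshold value is an assumption.
ESCALATION_THRESHOLD = 0.75

def route(confidence: float) -> str:
    """Return 'send' when the AI reply clears the bar, 'human' otherwise."""
    return "send" if confidence >= ESCALATION_THRESHOLD else "human"
```

The useful property is that the threshold is a single tunable knob: you can start conservative (escalate often) and lower it only as your QA data justifies more autonomy.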
Operate
AI manages the queue. Humans handle exceptions.
- AI handles 80%+ of customer interactions end-to-end
- AI executes multi-step workflows: refunds, account changes, billing disputes
- AI proactively reaches out about shipping delays, renewal reminders
- AI identifies systemic issues (product defects, policy confusion) and alerts your team
No vendor can drop you at Operate overnight. Anyone who claims otherwise is selling you the next Klarna headline. The path is sequential, and each level requires measurement infrastructure that most platforms do not provide.
What You Can Measure (And Why It Matters)
Vanity metrics like "number of conversations handled" tell you nothing about whether your AI is actually helping customers. Here are the KPIs that matter, and why.
Resolution Rate
The percentage of conversations where the customer's problem was actually solved — not just where the conversation ended. This is the most gamed metric in AI support. A customer who gives up is not a resolution.
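As a minimal illustration of that distinction, a resolution-rate calculation should count only confirmed resolutions, not merely ended conversations. The outcome labels below are hypothetical:

```python
def resolution_rate(conversations):
    """Only confirmed resolutions count; an abandoned chat is not a win.

    `conversations` is a list of dicts with a hypothetical 'outcome' field.
    """
    if not conversations:
        return 0.0
    resolved = sum(
        1 for c in conversations if c["outcome"] == "confirmed_resolved"
    )
    return resolved / len(conversations)
```

A platform that reports "conversation ended" as "resolved" will show a flattering number that this stricter definition deflates.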
CSAT (Customer Satisfaction)
Post-interaction satisfaction scores, segmented by AI-handled vs. human-handled. If your AI CSAT is significantly lower than human CSAT, you have a quality problem. Track the trend, not a single number.
Cost Per Conversation
Total cost including LLM inference, guardrail overhead, infrastructure, and the human time spent on escalations and QA review. Per-resolution pricing ($0.99 per resolution) sounds cheap until you factor in the full picture.
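A back-of-the-envelope formula makes the point. Every input below is an assumed example figure, not a benchmark:

```python
def fully_loaded_cost(llm, guardrails, infra,
                      escalation_rate, human_minutes, human_cost_per_minute,
                      resolution_rate):
    """Cost per *resolved* conversation, including escalated human time.

    All arguments are per-conversation figures; dividing by the
    resolution rate spreads the cost of failed conversations across
    the ones that actually got solved.
    """
    per_conversation = (
        llm + guardrails + infra
        + escalation_rate * human_minutes * human_cost_per_minute
    )
    return per_conversation / resolution_rate
```

With assumed inputs of $0.07 in compute, a 20% escalation rate costing six human minutes each, and an 80% resolution rate, the "cheap" conversation lands well north of its sticker price.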
Escalation Quality
When the AI escalates, does the human agent have the context they need? One in three agents lacks the customer context needed to resolve the issue. Bad handoffs compound the problem.
Time to Resolve
End-to-end time from first contact to confirmed resolution. AI should reduce this, but only if it resolves correctly on the first attempt. Faster wrong answers make things worse, not better.
If your AI support platform cannot break these metrics down by conversation type, by maturity level, and by time period, you are flying blind. You cannot improve what you cannot measure, and you cannot trust what you cannot verify.
How Omnia Helps
Omnia is an open-core AgentOps platform built on Kubernetes. It is not a customer support product — it is the infrastructure that makes customer support agents reliable, observable, and governable. Here is how its capabilities map to support needs.
Session Management for Conversation Continuity
Omnia's three-tier session storage (Redis hot, Postgres warm, S3/GCS/Azure cold) keeps conversation state durable across channel switches and agent restarts. A customer who starts on chat and moves to phone does not repeat themselves. Session retention policies let you control how long data lives and where.
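The tiering idea can be sketched with plain dictionaries standing in for Redis, Postgres, and object storage. This mirrors the lookup-and-promote pattern only; it is not Omnia's actual API:

```python
class TieredSessionStore:
    """Toy hot/warm/cold session lookup; dicts stand in for real stores."""

    def __init__(self):
        self.hot, self.warm, self.cold = {}, {}, {}

    def get(self, session_id):
        """Check tiers in order; promote a hit back to the hot tier."""
        for tier in (self.hot, self.warm, self.cold):
            if session_id in tier:
                state = tier[session_id]
                self.hot[session_id] = state  # promote for fast reuse
                return state
        return None
```

The promote-on-read step is what makes a channel switch cheap: the second lookup for the same customer hits the hot tier.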
Multi-Provider for Cost Optimization (OSS)
Route conversations to the right model for the job. Simple FAQ lookups do not need your most expensive model. Complex escalation summaries do. Omnia supports all major LLM providers (Claude, OpenAI, Gemini, Ollama, Bedrock, Vertex, AzureAI) and lets you define routing policies per agent, per conversation type.
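A routing policy is, at heart, a lookup from conversation type to model. The table below is a hypothetical illustration, not Omnia's configuration format, and the model names are placeholders:

```python
# Hypothetical routing table: conversation type -> model tier.
ROUTES = {
    "faq": "small-fast-model",
    "order_status": "small-fast-model",
    "escalation_summary": "large-accurate-model",
}

def pick_model(conversation_type, default="small-fast-model"):
    """Fall back to the cheap model for unclassified conversations."""
    return ROUTES.get(conversation_type, default)
```

Defaulting unknown types to the cheap model (rather than the expensive one) is itself a cost decision worth making explicitly.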
Policy Enforcement for Guardrails (OSS)
AgentPolicy CRDs define what your AI can and cannot do — topic boundaries, action permissions, spending limits, data access controls. Policies are declarative, versioned, and auditable. No more "the AI went rogue" incidents because the guardrails are infrastructure, not prompt engineering.
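The policy-as-data idea can be illustrated in miniature. This schema is invented for the example and is not the AgentPolicy CRD; the point is that the check runs as infrastructure, before any action executes:

```python
# Invented policy schema for illustration only (not an AgentPolicy CRD).
POLICY = {
    "allowed_actions": {"refund", "order_status"},
    "max_refund_usd": 100,
}

def allowed(action, amount_usd=0, policy=POLICY):
    """Evaluate a declarative policy before an action runs."""
    if action not in policy["allowed_actions"]:
        return False
    if action == "refund" and amount_usd > policy["max_refund_usd"]:
        return False
    return True
```

Because the policy is data, it can be versioned and diffed like any other config, which is what makes "who changed the refund limit, and when?" an answerable question.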
Observability for Debugging Bad Responses (OSS + Enterprise)
OpenTelemetry tracing on every conversation turn. When a customer gets a bad answer, you can trace exactly what happened: which documents were retrieved, what the model saw in context, which guardrail evaluated what, and why the confidence score was what it was. Prometheus metrics and Grafana dashboards give you aggregate health at a glance.
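A pure-Python stand-in shows what per-turn trace data buys you when debugging. The field names are illustrative, and Omnia emits real OpenTelemetry spans rather than dictionaries like these:

```python
# Toy trace store: each turn records what the model saw and what each
# guardrail decided, so a bad answer can be investigated after the fact.
TRACE = []

def record_turn(turn_id, retrieved_docs, guardrail_verdicts, confidence):
    TRACE.append({
        "turn": turn_id,
        "retrieved_docs": retrieved_docs,   # what the model had in context
        "guardrails": guardrail_verdicts,   # guardrail name -> passed?
        "confidence": confidence,
    })

def why_blocked(turn_id):
    """Return the names of guardrails that failed on a given turn."""
    for rec in TRACE:
        if rec["turn"] == turn_id:
            return [name for name, ok in rec["guardrails"].items() if not ok]
    return []
```

The query pattern is the point: a bad response becomes a lookup, not an archaeology project.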
Arena for Comparative Evaluation (OSS)
Before you promote your support agent from Assist to Execute, test it. Arena lets you run the same conversations through different model configurations, guardrail settings, and prompt versions side by side. Measure which configuration resolves more accurately before it touches a real customer.
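The side-by-side idea reduces to scoring two configurations on the same labeled cases. `agent_a` and `agent_b` below are any callables; this is a sketch of the concept, not Arena's interface:

```python
def compare(agent_a, agent_b, cases):
    """Run the same cases through two agents; return (accuracy_a, accuracy_b).

    `cases` is a list of (prompt, expected_answer) pairs.
    """
    def score(agent):
        return sum(agent(prompt) == expected
                   for prompt, expected in cases) / len(cases)
    return score(agent_a), score(agent_b)
```

Running both configurations on identical inputs is what makes the comparison fair; separate live traffic samples would confound configuration with conversation mix.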
Analytics Export for Business Intelligence (Enterprise)
Stream conversation analytics to Snowflake, BigQuery, or ClickHouse. Build the CSAT dashboards, cost attribution reports, and escalation quality analyses your leadership team actually needs. Your data, your warehouse, your queries.
Omnia runs on your Kubernetes cluster. Your data never leaves your infrastructure. Every component is swappable. For full documentation, see the Omnia docs.
Start with Assist. Scale to Operate.
You do not need to automate everything on day one. Start by giving your agents better tools. Measure what works. Expand autonomy when the data says you are ready.