Early Access · Agent-Agnostic Eval Platform

Evaluate Every Decision
Your Agent Fleet Makes.

CAS Framework is an evaluation platform for enterprise agentic AI. Score your agents across three dimensions — Compliance, Policy Adherence, and Agentic Patterns — and visualise the execution DAG with per-node CAS scores in real time.

3
Evaluation categories
0
Raw data leaves your VPC
90%
LLM eval cost reduction
<5ms
Policy propagation, live
CAS Framework · Execution DAG — Live
EVALUATING
SupervisorAgent · WORKFLOW ADHERENCE · 0.96 ✓
MCPTool.github · TOOL CALLBACK SAFETY · 0.91 ✓
PII_Scanner · COMPLIANCE · DEFAULT · 0.99 ✓
DataAnalysisAgent · CODE EXECUTION SANDBOX · 0.74 ⚠
Slack_Notification · POLICY ADHERENCE · 0.51 ✗ BLOCKED
Agentic Patterns
Tool Safety
Compliance
Code Sandbox
OTel Trace → DAG Reconstruction · Per-Node CAS Scoring · Zero-Egress by Architecture · Google ADK · LangGraph · CrewAI · Any OTel Agent · Auto-Generated Compliance ADRs
Evaluation Platform

Three Dimensions.
One Score Per Agent.

Every agent in your fleet is evaluated across three independent dimensions. Some signatures are defaults — applied to every agent, regardless of type. Others are assigned per agent-class to match its specific responsibilities.

DIMENSION 01 / 03
🔵

Compliance

Ensures every agent output adheres to data governance, privacy regulation, and organisational policy boundaries. Catches PII leakage, data residency violations, and sensitive context egress before it leaves the execution boundary.

PII Density DEFAULT — ALL AGENTS
Data Residency Check DEFAULT — ALL AGENTS
HIPAA / PCI Field Exposure
Regulatory Boundary Adherence
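As a loose illustration of what the default PII Density signature measures (the real check is a DSPy signature evaluated by an LLM, not regex; the patterns and threshold below are invented for this sketch):

```python
import re

# Hypothetical patterns — illustrative only, not the production detectors.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def pii_density(text: str) -> float:
    """PII pattern matches per 100 whitespace-delimited tokens."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = sum(len(p.findall(text)) for p in PII_PATTERNS.values())
    return 100.0 * hits / len(tokens)

def pii_score(text: str, max_density: float = 1.0) -> float:
    """Map density onto a 0..1 score; 1.0 means no PII detected."""
    return max(0.0, 1.0 - pii_density(text) / max(max_density, 1e-9))
```

A score near 1.0 keeps the agent green; anything below the policy threshold would surface as a compliance violation on the DAG.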
DIMENSION 02 / 03
🟡

Policy Adherence

Verifies agents follow your organisation's defined behavioural rules — which MCP tools are permitted, what state mutations are allowed, which output topics are restricted. Scored against CISO-defined DSPy guardrails.

Forbidden State Mutations DEFAULT — ALL AGENTS
MCP Tool Allowlist Enforcement
Output Topic Restrictions
A2A Delegation Permissions
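A toy version of allowlist enforcement, to make the idea concrete. The agent classes and MCP tool names are hypothetical, and real verdicts come from CISO-defined DSPy guardrails rather than a lookup table:

```python
# Hypothetical per-agent-class allowlists (illustrative names).
MCP_ALLOWLIST = {
    "SupervisorAgent": {"mcp.github", "mcp.jira"},
    "DataAnalysisAgent": {"mcp.bigquery"},
}

def check_tool_call(agent: str, tool: str) -> dict:
    """Return a pass/block verdict for one MCP tool invocation."""
    allowed = MCP_ALLOWLIST.get(agent, set())
    if tool in allowed:
        return {"verdict": "PASS", "score": 1.0}
    return {
        "verdict": "BLOCK",
        "score": 0.0,
        "violation_reason": f"{tool} not in allowlist for {agent}",
    }
```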
DIMENSION 03 / 03
🟣

Agentic Patterns

Evaluates whether the agent followed its declared execution pattern — supervisor → specialist routing, loop bounds, tool call ordering, delegation protocols. Different agent types have different expected patterns.

Workflow Adherence (Orchestrators)
Code Execution Sandbox (Executors)
A2A Protocol Delegation (Routers)
Tool Callback Sequence (Tool Agents)
Default — runs on every agent regardless of type
Per-agent-class — assigned based on agent role
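As an illustration, a workflow-adherence check for an orchestrator might compare the observed call sequence against the declared pattern; the scoring weights and loop bound below are invented for the sketch:

```python
def workflow_adherence(calls: list[str], declared: list[str], max_loops: int = 3) -> float:
    """Score how closely an observed call sequence follows the declared
    pattern: order violations and loop-bound breaches each subtract
    from a 1.0 baseline. Purely illustrative scoring."""
    score = 1.0
    # Order: every declared step must appear, in order, in the observed calls.
    idx = 0
    for step in declared:
        try:
            idx = calls.index(step, idx) + 1
        except ValueError:
            score -= 0.3  # missing or out-of-order step
    # Loop bound: no single step may repeat more than max_loops times.
    for step in set(calls):
        if calls.count(step) > max_loops:
            score -= 0.2
    return max(score, 0.0)
```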
DAG Visualiser

See How Your Agent Fleet
Actually Thought.

Your OTel trace spans already contain the full execution graph. CAS Framework reconstructs that as a visual DAG — and overlays per-node CAS scores, signature verdicts, and violation details on every agent and tool in the run.
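A minimal sketch of the trace-to-DAG step: OTel spans carry a span id and a parent span id, which is enough to rebuild the parent → child topology. The simplified span dicts here stand in for real OTel span objects:

```python
from collections import defaultdict

def build_dag(spans: list[dict]) -> dict[str, list[str]]:
    """Reconstruct parent -> children edges from OTel-style spans.
    Each span needs 'span_id', 'parent_span_id' (None for the root),
    and 'name'."""
    by_id = {s["span_id"]: s for s in spans}
    children = defaultdict(list)
    for s in spans:
        parent = s.get("parent_span_id")
        if parent in by_id:
            children[by_id[parent]["name"]].append(s["name"])
    return dict(children)
```

Per-node scores and verdicts would then be attached to each node of this graph before rendering.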

🗺 Trace · a3f8b2c1 · FinanceOps Pipeline · 847ms total
4 ✓ PASS 1 ✗ BLOCK
SupervisorAgent · AGENTIC PATTERNS · WORKFLOW · 0.96 ✓
MCPTool.github · POLICY · TOOL CALLBACK SAFETY · 0.91 ✓
PII_Scanner · COMPLIANCE · DEFAULT SIG · 0.99 ✓
DataAnalysisAgent · AGENTIC PATTERNS · CODE SANDBOX · 0.74 ⚠
BillingAgent · COMPLIANCE · A2A PROTOCOL · 0.88 ✓
Slack_Notification · POLICY ADHERENCE · 0.51 ✗ BLOCKED
Agentic Patterns eval
Policy eval
Compliance eval
Code Sandbox eval
Blocked
SELECTED NODE — DataAnalysisAgent
Eval Dimension
Agentic Patterns · Code Execution Sandbox
DSPy Signature Applied
EvaluateADKCodeExecutionSandbox
0.74
CAS Score
340ms
Span Latency
Violation Reason
Import of requests library detected. Not in allowed_libraries. Network egress attempt flagged.
AVG CAS BY AGENT — THIS TRACE
SupervisorAgent
0.96
PII_Scanner
0.99
MCPTool.github
0.91
BillingAgent
0.88
DataAnalysisAgent
0.74
Slack_Notification
0.51
📋 1 ADR auto-generated from this trace. Slack_Notification blocked — forbidden state mutation on user_permissions. Policy TOOL_SAFETY_004.
Enterprise Architecture

Four Zeros That Make
Enterprise Adoption Inevitable.

Each "Zero" eliminates a category of blockers — from security reviews to developer overhead. Together they mean your platform, security, and AI teams can all say yes at the same time.

Security Architecture

Evaluate Locally.
Egress Only Math.

Raw agent conversations, prompts, and PII are evaluated inside your VPC. Only CAS scores cross the network boundary.

01
Agent emits OTel spans
Your agent executes, generating trace spans. Everything stays inside your network from this moment forward.
INSIDE YOUR VPC
02
OTel Collector strips PII inline
Built-in transform and redaction processors remove cleartext PII and tokens before spans reach the evaluation engine.
INSIDE YOUR VPC
03
DSPy evaluates using local LLMs
FastAPI + DSPy runs Compliance, Policy, and Agentic Pattern signatures using your local model keys. No external API touches your conversations.
INSIDE YOUR VPC
04
Presidio runs offline NLP redaction
Hatchet worker runs heavy Presidio NLP models in memory, replacing any remaining entities with typed placeholders.
INSIDE YOUR VPC
05
Only mathematical scores egress
Trace ID, CAS score, violation flags, sanitised reasons. Zero raw data. Proven by architecture, not policy.
ZERO-EGRESS CONFIRMED
// THE COMPLETE EGRESS PAYLOAD
// ✅ Everything that leaves your VPC
{
  "trace_id": "a3f8b2c1-e4d2-...",
  "agent_name": "SupervisorAgent",
  "cas_score": 0.92,
  "compliance_score": 0.96,
  "policy_score": 0.88,
  "patterns_score": 0.91,
  "violation_flags": [],
  "raw_prompt_present": false,
  "raw_pii_present": false
  // ← No prompts. No PII. Pure math.
}
✓ ENTERPRISE ARCHITECTURE APPROVED
Zero-Egress by architecture means your legal, security, and compliance reviews become straightforward — there is nothing sensitive to review in the data flow. Full architecture diagrams →
Developer Experience

Your Developers
Never Touch Their Repos.

Platform Engineering owns the deployment. AI Engineers add one label. That's the entire integration surface.

01
Platform team installs one Helm chart
A single helm install deploys the Mutating Webhook, Eval Engine, and Control Plane to your cluster.
5-MINUTE DEPLOY
02
AI Engineers add one label to their Deployment
cas-framework.ai/inject: "enabled" — the entire integration commitment from an AI engineering team.
1 LABEL ONLY
03
Mutating Webhook auto-injects sidecars
Intercepts Pod creation, injects the OTel Collector + CAS Eval Sidecar automatically. No containers to manage, no volumes to configure.
ZERO-TOUCH
04
Evaluation starts immediately. Non-blocking.
If the sidecar crashes, your agent keeps running. Compliance evaluation is observational — never in the agent's critical execution path.
ZERO CODE CHANGE
// THE ENTIRE DEVELOPER INTEGRATION
# Platform team: one-time cluster setup
helm upgrade --install seeti-core \
  ./k8s/chart/seeti-core/ \
  --namespace default

# AI Engineering team: one label per agent
metadata:
  labels:
    cas-framework.ai/inject: "enabled"

# That's literally it. No SDK. No PR. No release.
✓ ZERO DEVELOPER FRICTION
No proprietary dependencies in agent codebases. Fully reversible by removing the label. No blast radius on your agents if evaluation infra has issues.
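For readers unfamiliar with mutating webhooks, this sketch shows the kind of AdmissionReview response such a webhook returns to append a sidecar container via JSONPatch. The container name and image are placeholders, not the actual CAS Framework artefacts:

```python
import base64
import json

def admission_response(uid: str) -> dict:
    """Build a Kubernetes AdmissionReview response that appends a
    sidecar container to the Pod spec via JSONPatch. Illustrative
    image and container names."""
    patch = [{
        "op": "add",
        "path": "/spec/containers/-",
        "value": {"name": "cas-eval-sidecar", "image": "example/cas-sidecar:latest"},
    }]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            # Kubernetes requires the patch to be base64-encoded JSON.
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

Because the patch targets Pod creation, removing the injection label really is the whole rollback: the next rollout simply receives no patch.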
Dynamic Policy Engine

New MCP Tool Adopted.
Policy Live in 4 Seconds.

When your agent fleet adopts a new tool dynamically, evaluation coverage follows in milliseconds — not sprints.

01
Policy Author writes rule in plain language
The CAS Policy Commander accepts natural language input: "Block all agents from writing to financial_db unless user_intent confirms authorisation."
NATURAL LANGUAGE
02
LLM Meta-Compiler produces a DSPy Signature
An internal LLM translates the natural language rule into a validated JSON DSL that maps to a strict dspy.Signature class, ready for evaluation.
AUTO-COMPILED
03
Shadow Mode: backtest on 30 days history
Before enforcement, the new signature runs in shadow mode against historical traces. Review what it would have caught — without impacting live agents.
SAFE TESTING
04
One-click propagation. Zero pod restarts.
gRPC/SSE Sync Engine pushes the compiled signature to every connected Sidecar globally in milliseconds. No CI/CD. No deployment window. No engineers paged.
<5ms · ZERO DOWNTIME
// POLICY LIFECYCLE
# 1. Deploy in shadow mode — safe observation
POST /v1/policies/deploy
{
  "rule": "Block financial_db writes without auth",
  "target_agents": ["*"],
  "mode": "shadow"
}

# 2. Review 30-day backtest in dashboard

# 3. Flip to enforce — no redeploy
PATCH /v1/policies/{id}
{ "mode": "enforce" }

# → propagation_ms: 3.1
# → sidecars_updated: 847
# → pods_restarted: 0
✓ COMPLIANCE AT ENGINEERING SPEED
Full policy version history. Rollback in one click. Every enforcement action logged as an immutable audit record.
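The shadow-mode backtest can be pictured as replaying a candidate rule over stored traces and tallying what it would have blocked. This sketch assumes a trace is a plain dict and a rule is any predicate over it; the real engine evaluates compiled DSPy signatures:

```python
def shadow_backtest(traces: list[dict], rule) -> dict:
    """Replay a candidate rule over historical traces without enforcing.
    `rule` is any callable mapping a trace to True (would block).
    Returns the summary a reviewer inspects before flipping the
    policy to enforce mode."""
    would_block = [t["trace_id"] for t in traces if rule(t)]
    return {
        "traces_evaluated": len(traces),
        "would_block": len(would_block),
        "block_rate": len(would_block) / len(traces) if traces else 0.0,
        "sample_trace_ids": would_block[:5],
    }
```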
OTel Native

One Integration.
Every Agent Framework.

OpenTelemetry is the universal observability standard. CAS Framework speaks it natively — so any OTel-emitting agent is automatically supported.

01
Any framework emitting OTel spans is supported
Google ADK, LangGraph, CrewAI, AutoGen, or any custom agent. If it speaks OTel, CAS Framework can evaluate it.
CNCF STANDARD
02
Two-line instrumentation for Python agents
import openlit; openlit.init(endpoint="cas-sidecar:4318") — two lines cover all three evaluation dimensions on any Python agent.
2 LINES
03
Default signatures run on every agent
PII Density, Data Residency, and Forbidden State Mutation signatures apply to all agents automatically — guaranteed baseline coverage regardless of framework.
BASELINE COVERAGE
04
One DAG, one dashboard, across all frameworks
ADK, LangGraph, and CrewAI agents appear on the same DAG, evaluated against the same CAS standards. One compliance posture for your entire fleet.
ZERO LOCK-IN
// SAME INTEGRATION. EVERY FRAMEWORK.
# Google ADK agent
import openlit
openlit.init(endpoint="cas-sidecar:4318")

# LangGraph agent
import openlit
openlit.init(endpoint="cas-sidecar:4318")

# CrewAI agent
import openlit
openlit.init(endpoint="cas-sidecar:4318")

# Custom agent built in-house
import openlit
openlit.init(endpoint="cas-sidecar:4318")

# The Sidecar evaluates them all identically.
✓ FRAMEWORK-AGNOSTIC BY ARCHITECTURE
Switch agent frameworks without re-instrumenting compliance. Your evaluation coverage migrates automatically.
The Aha Moment

Your Fleet Adopted
a New MCP Tool.
Evaluation Live in 4 Seconds.

Every compliance cycle traditionally requires engineering tickets, DSL rewrites, PRs reviewed, staging tested, production windows coordinated. By the time a new threat is addressed, your fleet has already been running unprotected for two weeks.

CAS Framework's Dynamic Signature Sync Engine closes that window to seconds. A policy author writes in plain language, an internal LLM compiles it to a validated dspy.Signature, and a gRPC/SSE persistent connection propagates it globally — without restarting a single Kubernetes pod.

Old workflow
🔴 2–4 weeks: ticket → DSL → PR → staging → prod
CAS Framework
🟢 4 seconds: write → compile → propagate globally
Downtime
🔴 Full deployment window, pod restarts
🟢 Zero restarts. Zero dropped evaluations.
New MCP tool
🔴 Manual signature per tool, each sprint
🟢 DSPy auto-compiled, live in milliseconds
Who acts
🔴 On-call engineer required
🟢 Policy author acts directly, no eng involved
CAS Policy Commander
STEP 1 — POLICY AUTHOR INPUT (NATURAL LANGUAGE)
↓  LLM Meta-Compiler  ↓
STEP 2 — COMPILED DSPy SIGNATURE
class EvaluateMCPToolSafety(dspy.Signature):
    context: str  # tool_call + user_intent
    user_authorized: bool
    violation_reason: Optional[str]

# fires on: MCP spans where
# tool.target contains "financial_db"
↓  gRPC/SSE Sync Engine  ↓
3.2ms
Propagated to 847 Sidecars globally. Zero pods restarted. Policy is live now.
📋 ADR AUTO-GENERATED ON FIRST ENFORCEMENT
agent: DataAnalysisAgent
policy: EvaluateMCPToolSafety v1.2
verdict: BLOCKED
reason: financial_db.write called without
        user_intent authorization key
cas_score: 0.31
Generated automatically · 0.4s after enforcement
Governance Automation

Every Agent Decision.
Documented Automatically.

Architecture Decision Records were always manual — engineers documenting why a system behaved the way it did. For an agent fleet making thousands of routing decisions per second, that's impossible without automation.

CAS Framework generates structured ADRs from every evaluation. When an agent chooses one MCP over another, routes to a specialist, or gets blocked — the reasoning, score, applied policy, and recommendation are captured as an immutable record.
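A rough sketch of how an evaluation result might become an immutable record: hashing the result into the ADR id makes tampering detectable. Field names mirror the example ADR shown below, but the code is illustrative rather than the production generator:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ADR:
    """Minimal shape of an auto-generated decision record."""
    adr_id: str
    agent: str
    dimension: str
    signature: str
    cas_score: float
    verdict: str
    reason: str

def generate_adr(eval_result: dict) -> ADR:
    """Derive an immutable ADR from one evaluation result; the id is a
    content hash of the result, so edits break the id."""
    body = json.dumps(eval_result, sort_keys=True).encode()
    threshold = eval_result.get("threshold", 0.75)  # red-zone default
    return ADR(
        adr_id=hashlib.sha256(body).hexdigest()[:12],
        agent=eval_result["agent"],
        dimension=eval_result["dimension"],
        signature=eval_result["signature"],
        cas_score=eval_result["cas_score"],
        verdict="BLOCKED" if eval_result["cas_score"] < threshold else "PASS",
        reason=eval_result.get("reason", ""),
    )
```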

🌐
Global Fleet ADRs
Organisation-wide compliance posture. Aggregate CAS trends across all agents and all projects. Feeds directly into leadership dashboards.
LEADERSHIP · GOVERNANCE · BOARD REPORTING
📁
Project / Agent ADRs
Per-agent decision history. Why did SupervisorAgent choose this MCP 847 times this week? What triggered the pattern violation on DataAnalysisAgent?
ENGINEERING · AI LEADS · AUDITORS
Real-Time Violation ADRs
Immediate structured record of every blocked action with DSPy reasoning, affected policy, severity score, and remediation suggestion.
SECURITY OPERATIONS · COMPLIANCE RESPONSE
📋
ADR-2026-0341 · AUTO-GENERATED · 0.4s ago
BLOCKED
Agent
DataAnalysisAgent / FinanceOps-Prod
Decision Under Review
Agent attempted to execute Python importing requests for an external network call during code sandbox execution.
Evaluation Dimension
Agentic Patterns · Code Execution Sandbox
DSPy Signature Applied
EvaluateADKCodeExecutionSandbox
CAS Score
0.29 — Red Zone (threshold 0.75)
DSPy Reasoning
requests is in forbidden_libraries for this agent's policy
Network egress attempt detected: allow_network_egress: false
Execution halted before any external call completed
Recommendation
Update agent task prompt to use internal DB adapter. Escalate to AI Engineering for sandbox policy review.
Over-The-Horizon Intelligence

Fleet Health, Risk Posture,
and Cost Projections — in One View.

Aggregate CAS scores, per-agent drift detection, 90-day risk forecasts, and cost shield metrics — from real-time operational data to strategic leadership dashboards.
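The fleet roll-up itself is simple arithmetic. A sketch of the aggregation behind the dashboard's headline numbers, using the 0.75 red-zone threshold from the ADR example elsewhere on this page:

```python
from statistics import fmean

def fleet_summary(scores: dict[str, float], red_zone: float = 0.75) -> dict:
    """Roll per-agent CAS scores up to the fleet-level figures the
    dashboard displays: global average plus agents below threshold."""
    return {
        "global_cas": round(fmean(scores.values()), 2),
        "flagged": sorted(a for a, s in scores.items() if s < red_zone),
    }
```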

📊 CAS Framework — Executive Risk Dashboard · FinanceOps Org
LIVE
GLOBAL CAS SCORE
0.89
Fleet-wide average
↑ +0.04 vs 30d
COST SHIELD
$41k
Saved via LLM Cascade
↑ 89% on local models
BLOCKED TODAY
14
Policy violations blocked
↓ −38% vs yesterday
CAS SCORE BY AGENT
SupervisorAgent
0.96
PII_Scanner
0.99
MCPTool.github
0.91
DataAnalysisAgent
0.74
Slack_Notification
0.51
30-DAY TREND
30d ago → Today · ↗ improving
AI RISK POSTURE — FLAGGED AGENTS
| Agent | CAS | Risk | Top Violation | Recommended Action |
| --- | --- | --- | --- | --- |
| DataAnalysisAgent | 0.74 | MEDIUM | Code Sandbox: forbidden library | Review agent prompt + sandbox policy |
| Slack_Notification | 0.51 | HIGH | Policy: state mutation violation | Immediate review. Enable shadow mode. |
| BillingAgent | 0.77 | MEDIUM | A2A delegation to unknown target | Update allowed specialist targets |

Operational (live) · Tactical (weekly trends) · Strategic (90-day AI risk posture projections)

Integrations

Every Framework.
Two Lines of Code.

OpenTelemetry is the universal language. Any agent that speaks it gets full evaluation coverage — compliance, policy adherence, and agentic pattern scoring — automatically.

🤖
Google ADK
First-class support. Pre-built DSPy signatures for all four ADK agent patterns — Workflow, Tool Safety, A2A Delegation, Code Sandbox.
4 SIGNATURES INCLUDED
🔗
LangGraph
Graph-based workflows emit OTel spans natively. Full DAG reconstruction, compliance scoring, and ADR generation out of the box.
OTEL NATIVE
🚀
CrewAI
Role-based multi-agent crews instrumented in two lines. Evaluate crew coordination, task delegation, and tool usage compliance per-agent.
OTEL NATIVE
⚙️
Any OTel Agent
AutoGen, Semantic Kernel, custom in-house agents — if it emits OTel spans, the Sidecar evaluates it. Bring your own DSPy signatures for custom patterns.
BRING YOUR OWN
OpenTelemetry CNCF + DSPy Semantic Eval + Kubernetes Helm + ClickHouse Analytics + Presidio NLP Redaction · Enterprise Evaluation Platform
Built For

The Three Teams Building
Enterprise AI.

Compliance infra your engineering org will actually adopt.

The Problem: Every observability vendor asks you to fork agent codebases, install proprietary SDKs, manage vendor lifecycle, and coordinate releases across 12 teams. Compliance infra becomes a toil machine that nobody wants to maintain.
CAS Framework: One Helm chart. One Kubernetes label. Mutating Webhook handles the rest. Zero SDK in any agent repo. Fully reversible. Non-blocking if the eval infra has issues.
1 label
The entire integration surface for an AI engineering team to get full evaluation coverage.
Non-blocking
Sidecar failure drops evaluations — your agents never pause. Compliance is never in the critical path.
HPA native
Evaluation workers scale automatically with Kubernetes HPA. No custom autoscaling logic.

Evaluate your fleet the same day you ship it.

The Problem: Your agent patterns evolve weekly. You're adopting new MCP tools constantly. Every compliance update is a sprint. And you have no way to visualise whether your agents are actually following the routing logic you designed.
CAS Framework: Pre-built DSPy signatures for ADK cover your four core agent patterns from day one. The DAG visualiser shows you exactly how your fleet executed — node by node, score by score. Dynamic Signatures mean new tools get eval coverage in 4 seconds.
DAG + scores
Visualise your agent's execution graph with per-node CAS scores on every run.
4 seconds
New MCP tool adopted by your agents? Evaluation signature compiled and live in 4 seconds.
90% cost ↓
LLM Cascade routes 90% of eval calls to local cheap models. Eval at fleet scale stays affordable.

Strategic visibility into your AI fleet's risk posture.

The Problem: You're deploying agents across every business unit. Your board wants to understand AI risk. You have no aggregate view of compliance posture, no cost projection, and no leading indicator of which agents are drifting before they cause incidents.
CAS Framework: Leadership dashboards that roll up CAS scores from every agent, every project, every org unit — into global fleet health, cost shield metrics, 90-day risk projections, and per-agent drift alerts that belong in board decks, not engineering standups.
Fleet HUD
Master CAS dial, cost shield metrics, fleet topology map — one executive view across every agent in your org.
90-day
AI risk posture projections from trend analysis and agent drift pattern detection.
BYO-Vault
Deep-link from any ADR directly into your team's existing Datadog or Splunk instance using the trace ID.
Market Position

Designed for Agent Fleets.
Not Retrofitted from APM.

| Capability | CAS Framework | Traditional Observability / APM |
| --- | --- | --- |
| Evaluation Model | 3-dimension CAS: Compliance, Policy, Agentic Patterns | Metrics and traces only — no semantic evaluation |
| Agent DAG Visualisation | OTel trace → DAG reconstruction with per-node scores | Flat trace waterfalls, no agent topology |
| Data Privacy | Zero-Egress — local eval, offline NLP redaction | Raw prompts and responses egressed to vendor cloud |
| Default Signatures | Pre-built DSPy signatures for ADK patterns, OOTB | No semantic evaluation capability |
| Policy Updates | Dynamic gRPC/SSE push — <5ms, zero pod restarts | Full CI/CD redeploy, 2–4 week cycle |
| Evaluation Cost | LLM Cascade — 90% on fast/cheap local models | Frontier API per evaluation, unbounded cost |
| Compliance ADRs | Auto-generated per violation — structured, immutable | Manual documentation if it exists at all |
| Agent Framework Support | OTel-native — ADK, LangGraph, CrewAI, any agent | Framework-specific, proprietary SDK per vendor |
| Developer Overhead | One K8s label — no SDK, no PR, no release | SDK install and maintenance across every repo |
Get Started

Evaluate Every Agent.
Score Every Decision.

CAS Framework deploys in 5 minutes. Your first DAG with per-node CAS scores appears immediately after.

# Deploy in 5 minutes
helm repo add aidatarefinery https://charts.aidatarefinery.io
helm upgrade --install seeti-core \
  aidatarefinery/seeti-core \
  --namespace default \
  --set sidecar.llmApiKey="your_api_key"

# Your agent fleet DAG appears automatically.