A production-grade autonomous multi-agent AI pipeline built on Claude's Agentic AI framework — turning raw telecom call transcripts into board-ready intelligence. Every call analysed. Every decision traceable. Every stage secured and governed.
Contact centres typically analyse 2–5% of calls manually; the other 95–98% are invisible. Supervisors make coaching decisions, product teams make roadmap decisions, and operations leaders make cost decisions based on small samples and gut feel.
This pipeline analyses 100% of calls, autonomously. Every transcript becomes structured data. Every run produces board-level KPIs, cost-lever estimates, and strategically grounded AI recommendations — in minutes, end to end, with a full audit trail of every decision taken.
This system is a direct implementation of Anthropic's Agentic AI framework: agents that perceive, reason, act, observe, and adapt in orchestrated multi-agent pipelines — with the safety, traceability, and governance that production deployment demands.
| Claude Agentic Principle | How it's implemented |
|---|---|
| Tool use | Formal JSON-schema tool registry; each agent calls only its authorised tools via AgentScopeGuard |
| Multi-agent orchestration | 6 specialised agents + a human approval gate in a 7-node LangGraph StateGraph; each stateless, composable, and replaceable |
| Reasoning loops | ReAct (Observe→Reason→Act) per transcript; 3-pass deliberation (Analyze→Critique→Synthesize) for insights |
| Chain-of-Thought | _cot_reasoning as the mandatory first JSON field forces structured reasoning before every extraction and recommendation |
| Long-term memory | Flat JSON run history + Gemini-embedded semantic vector store; top-K similar historical runs injected into InsightsAgent context |
| Decision traceability | DecisionLogger captures WHY every autonomous decision was made — routing choices, exclusions, provider selections, gap-fills — all serialised to decisions_{ts}.json |
| Human-in-the-loop | Configurable approval gate before export; auto-approves in CI, interactive prompt with timeout in production |
| Defence in depth | InputSanitizer, OutputSanitizer, AgentScopeGuard, SecretGuard, RateLimiter — protecting every agent boundary |
| Graceful degradation | Every component has a fallback path; the pipeline always completes with a full audit trail |
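The Chain-of-Thought row above relies on `_cot_reasoning` being the first field of each JSON response. A minimal sketch of how that ordering could be checked after parsing is shown here; the function name `cot_comes_first` is hypothetical and not taken from the repository, and the real pipeline may validate responses differently.

```python
import json


def cot_comes_first(raw_response: str) -> bool:
    """Return True if `_cot_reasoning` is the first, non-empty field of the JSON object.

    Hypothetical helper for illustration only.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or not payload:
        return False
    # json.loads keeps keys in document order, so the first key of the dict
    # is the first field the model emitted.
    first_key = next(iter(payload))
    return first_key == "_cot_reasoning" and bool(str(payload[first_key]).strip())
```

A response that emits extraction fields before its reasoning would fail this check and could be retried or logged.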
Local CSV (telecom_200k.csv — primary) · HuggingFace stream (fallback, 3.7M turns)
│
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ LangGraph StateGraph · Multi-Agent Pipeline v4.0 │
│ │
│ ┌─────────────────────┐ ┌──────────────────────────────────────────┐ │
│ │ DataIngestion │───▶│ Extraction Agent (2/7) │ │
│ │ Agent (1/7) │ │ Claude Haiku 4.5 · 70 fields · CoT │ │
│ │ stream · validate │ │ ┌─────────── ReAct loop ──────────────┐ │ │
│ │ PII scan · audit │ │ │ Observe : score_field_coverage() │ │ │
│ │ decision log │ │ │ Reason : identify null fields │ │ │
│ └─────────────────────┘ │ │ Act : targeted gap-fill call │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └─────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────────┐ │
│ │ Aggregation │◀───│ Quality Agent (3/7) │ │
│ │ Agent (4/7) │ │ 100-pt QA model │ │
│ │ KPIs · cost levers │ │ dynamic routing │ │
│ └──────────┬──────────┘ │ exclusion logging │ │
│ │ └─────────────────────────┘ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ Insights Agent (5/7) · Vector memory → top-K historical runs │ │
│ │ Provider: NVIDIA NIM (primary) · Claude (fallback) · Rule-based │ │
│ │ ┌──────────── Deliberation loop (self-reflection) ─────────────┐ │ │
│ │ │ Pass 1 Analyze : CoT KPI analysis → initial insights │ │ │
│ │ │ Pass 2 Critique : self-grade each recommendation (A/B/C) │ │ │
│ │ │ Pass 3 Synthesize: rewrite weak/generic recommendations │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────────┐ │
│ │ Approval Gate │───▶│ Export Agent (7/7) │ │
│ │ (6/7) │ │ CSV · JSON · decisions │ │
│ │ human sign-off │ │ manifest · audit log │ │
│ │ auto-approve CI │ │ decision log │ │
│ └─────────────────────┘ └──────────┬──────────────┘ │
└──────────────────────────────────────── │ ───────────────────────────── ┘
│
outputs/ directory
├── summary.json ← Streamlit dashboard
├── call_results_{ts}.csv
├── full_results_{ts}.json
├── qa_report_{ts}.json
├── insights_{ts}.json
├── decisions_{ts}.json ← agent reasoning audit
├── run_manifest_{ts}.json
└── audit_log_{ts}.json
Normal path: DataIngestion → Extraction → Quality → Aggregation → Insights → ApprovalGate → Export
Quality gate failure path: Quality → Export (bypasses aggregation and insights; always completes with audit)
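The two paths above amount to a single routing decision after the Quality node. The sketch below expresses that decision as a plain function of the kind LangGraph accepts for a conditional edge (for example via `StateGraph.add_conditional_edges`). The state key `quality_gate_passed` and the node names are assumptions made for illustration, not the repository's actual schema.

```python
from typing import TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical state key; the real state schema is described in ARCHITECTURE.md.
    quality_gate_passed: bool


def route_after_quality(state: PipelineState) -> str:
    """Pick the next node after Quality.

    A missing or False flag is treated as a gate failure, so the pipeline
    falls through to the export node and still writes its audit outputs.
    """
    return "aggregation" if state.get("quality_gate_passed", False) else "export"
```

Defaulting to the export path when the flag is absent keeps the failure mode safe: an incomplete state never reaches aggregation or insights.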
Every component exists because production autonomous systems need it.
| Capability | Implementation | Why it matters |
|---|---|---|
| ReAct extraction loop | ExtractionAgent: Observe (field-coverage score) → Reason (identify null fields) → Act (targeted gap-fill) up to REACT_MAX_ITERATIONS | Industry-standard agentic control loop; recovers critical fields missed in the first extraction pass |
| Chain-of-Thought | _cot_reasoning is the mandatory first field in every LLM response, so the model must reason before extracting | Forces structured reasoning with no additional API call; measurably reduces enum/boolean hallucination |
| Self-reflection deliberation | InsightsAgent: Pass 1 Analyze → Pass 2 Critique → Pass 3 Synthesize | AI critiques its own output before delivering to the user — produces data-grounded, board-ready recommendations, not platitudes |
| Decision traceability | DecisionLogger in every agent: records decision type, reasoning, evidence, alternatives considered, and confidence | Full audit trail of WHY every autonomous action occurred, for retracing, challenging, and explaining agent decisions |
| Plug-and-play model architecture | EXTRACTION_MODEL in config.py drives everything (rate limiters, API clients, cost estimates, provider labels): one change, zero regressions | Switch from Claude to Gemini or any future model without touching agent code |
| Semantic long-term memory | Gemini text-embedding-004 embeds each run's KPI summary; numpy cosine similarity retrieves top-K most similar historical runs | Agents learn from history: InsightsAgent receives relevant context, not just averages |
| Multi-layer security | InputSanitizer (injection + PII), OutputSanitizer (code execution + secrets), AgentScopeGuard (tool access), SecretGuard, RateLimiter | Prompt injection, cross-agent tool hijacking, secret exfiltration, response bombs: all blocked before they reach downstream agents |
| Human approval gate | Configurable checkpoint before export: interactive in production, auto-approve in CI | Autonomous systems need human oversight options; this is where operators review KPIs before outputs are written |
| Governance layer | BudgetGuard (reads BUDGET_USD from config), QualityGate, PIIScanner, AuditLog | Cost caps, quality thresholds, PII compliance, and event traceability, without manual intervention |
| LangSmith tracing | One env var (LANGCHAIN_TRACING_V2=true) enables full node-level span capture | End-to-end observability across all 7 pipeline nodes and every LLM provider |
| Dynamic routing | LangGraph conditional edge after QualityAgent | Catastrophic extraction failure routes to safe export; the pipeline never silently fails |
| Checkpoint/resume | Every API call persisted to outputs/.checkpoint_{key}.jsonl immediately | Kill a 100-call job at call 73, restart, and it resumes from 74 with zero duplicated API spend |
| Process isolation | run_batches.py → Orchestrator → N subprocesses | A crashed batch cannot corrupt other batches; full batch-level retry with health monitoring |
| 342 unit tests | Security, governance, decision log, memory, orchestrator, tools, config, graph — all tested without API calls | CI completes in under 7 seconds; tests gate every push to main |
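The checkpoint/resume behaviour in the table above can be illustrated with an append-only JSONL file. This is a sketch under stated assumptions: the function names and the `analyse` callback are hypothetical, each record is assumed to carry a `call_id`, and the repository's own checkpoint format may differ.

```python
import json
from pathlib import Path
from typing import Callable


def load_completed(checkpoint: Path) -> dict[str, dict]:
    """Read already-processed calls from a JSONL checkpoint, keyed by call_id."""
    done: dict[str, dict] = {}
    if checkpoint.exists():
        with checkpoint.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    done[record["call_id"]] = record
    return done


def run_with_checkpoint(
    calls: list[dict],
    analyse: Callable[[dict], dict],  # hypothetical stand-in for the LLM extraction call
    checkpoint: Path,
) -> dict[str, dict]:
    """Process calls, skipping any call_id already present in the checkpoint."""
    done = load_completed(checkpoint)
    for call in calls:
        if call["call_id"] in done:
            continue  # already paid for this call in an earlier run
        result = {"call_id": call["call_id"], **analyse(call)}
        # Append immediately so a crash after this point never repeats the call.
        with checkpoint.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(result) + "\n")
        done[call["call_id"]] = result
    return done
```

Because each result is written as soon as it is produced, restarting an interrupted job only spends API budget on calls that are not yet in the file.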
| Role | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| Extraction (70 fields/call) | Claude Haiku 4.5 | Gemini 2.0 Flash Lite | — |
| Insights (strategic recs) | NVIDIA NIM llama-3.3-70b-instruct | Claude Haiku 4.5 | Rule-based |
| Embeddings (vector memory) | Gemini text-embedding-004 | TF-IDF bag-of-words (offline) | — |
Switching models: Change EXTRACTION_MODEL in pipeline/config.py — all API clients, rate limiters, cost estimates, provider labels, and budget guards update automatically.
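One way a single `EXTRACTION_MODEL` value can drive clients, rate limits, and cost estimates is a lookup table of per-model profiles. The sketch below is illustrative only: `ModelProfile`, `MODEL_PROFILES`, the model identifier strings, and every numeric value are placeholders, not the contents of `pipeline/config.py` and not real rate limits or prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    provider: str
    api_key_env: str
    requests_per_minute: int             # placeholder value
    usd_per_million_input_tokens: float  # placeholder value, not real pricing


# Hypothetical registry; identifiers and numbers are illustrative placeholders.
MODEL_PROFILES = {
    "claude-haiku-4-5": ModelProfile("anthropic", "ANTHROPIC_API_KEY", 50, 1.0),
    "gemini-2.0-flash-lite": ModelProfile("gemini", "GEMINI_API_KEY", 30, 0.0),
}

EXTRACTION_MODEL = "claude-haiku-4-5"


def active_profile(model: str = EXTRACTION_MODEL) -> ModelProfile:
    """Resolve every provider-specific setting from the one model name."""
    try:
        return MODEL_PROFILES[model]
    except KeyError:
        raise ValueError(f"Unknown EXTRACTION_MODEL: {model!r}") from None


def estimate_input_cost(tokens: int, model: str = EXTRACTION_MODEL) -> float:
    """Estimate input-token cost in USD from the active profile."""
    return tokens / 1_000_000 * active_profile(model).usd_per_million_input_tokens
```

With this shape, agent code asks for `active_profile()` and never names a provider directly, which is what makes the one-line switch possible.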
| Metric | Value |
|---|---|
| Calls analysed | 100 |
| Fields extracted per call | 70+ |
| QA pass rate | Typically 90–99% |
| Avg QA score | 85–99 / 100 |
| Pipeline runtime | ~8 min (5 batches × 20 calls, 2 s inter-call delay) |
| Extraction cost (Claude Haiku) | |
| Extraction cost (Gemini free tier) | ~$0.00 / 100 calls (free-tier eligible) |
| Deliberation passes | 3 per insights run |
| Decision records per run | 15–60 traceable decisions |
| Test suite | 342 tests · < 7 seconds |
| Checkpoint overhead | Zero — resume is instantaneous |
Prerequisites: Python 3.11+
git clone https://github.com/vindon/telecom-call-intelligence.git
cd telecom-call-intelligence
python -m venv .venv && source .venv/bin/activate
make install-dev
cp .env.example .env
Add your API keys to .env:
# Required for extraction (Claude Haiku — primary)
ANTHROPIC_API_KEY=sk-ant-...
# Required for vector-memory embeddings; also used for fallback extraction (Gemini)
GEMINI_API_KEY=AIza...
# Optional: NVIDIA NIM for InsightsAgent primary (falls back to Claude if absent)
NVIDIA_API_KEY=nvapi-...
# Optional: LangSmith tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__...
make run # 3-call smoke test (~30 seconds)
make run-batches # 100-call production run (5 × 20 batches)
make dashboard # executive dashboard at localhost:8501
make test # 342 unit tests, no API calls required (< 7 seconds)
make lint # ruff linter across all source files
make check # lint + type-check + test (full pre-push gate)
make test-cov # tests with HTML coverage report
make dashboard # Streamlit dashboard on localhost:8501
make clean # remove __pycache__, .pyc, pytest cache
| Pattern | Files | Description |
|---|---|---|
| ReAct | extraction_agent.py, analyzer.py | Per-transcript Observe → Reason → Act control loop with targeted gap-fill retries |
| Chain-of-Thought | analyzer.py, insights_agent.py | _cot_reasoning as mandatory first JSON field across all LLM responses |
| Self-reflection / deliberation | insights_agent.py | 3-pass Analyze → Critique → Synthesize loop with A/B/C recommendation grading |
| Decision traceability | decision_log.py, all agents | Structured WHY records for every autonomous decision, with evidence and alternatives |
| Semantic long-term memory | vector_memory.py | Gemini-embedded KPI histories with cosine similarity retrieval |
| Tool access control | security.py | AgentScopeGuard enforces per-agent authorised tool sets |
| Human-in-the-loop | graph.py | Configurable approval gate before any outputs are written to disk |
| Distributed tracing | graph.py | LangSmith integration via LANGCHAIN_TRACING_V2, with full span capture |
| Plug-and-play providers | config.py, analyzer.py, token_tracker.py | Single EXTRACTION_MODEL config value drives all downstream provider logic |
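The ReAct pattern listed above (Observe coverage, Reason about null fields, Act with a targeted gap-fill) can be sketched in a few lines. The field subset, the iteration limit, and the `gap_fill` callback are assumptions for illustration; in the real pipeline the Act step would be an LLM call asking only for the missing fields.

```python
from typing import Callable

# Illustrative subset of the extraction schema, not the full critical-field list.
CRITICAL_FIELDS = ("fcr_indicator", "primary_cost_driver", "repeat_call_risk")
REACT_MAX_ITERATIONS = 2  # assumed value; the real limit lives in the project config


def score_field_coverage(record: dict, fields: tuple = CRITICAL_FIELDS) -> float:
    """Observe: fraction of critical fields that are not null."""
    return sum(record.get(f) is not None for f in fields) / len(fields)


def react_gap_fill(
    record: dict,
    gap_fill: Callable[[list[str]], dict],  # hypothetical targeted re-extraction call
    max_iterations: int = REACT_MAX_ITERATIONS,
) -> dict:
    """Run Observe → Reason → Act until coverage is complete or the limit is hit."""
    record = dict(record)  # work on a copy so the caller's record is untouched
    for _ in range(max_iterations):
        if score_field_coverage(record) == 1.0:  # Observe
            break
        missing = [f for f in CRITICAL_FIELDS if record.get(f) is None]  # Reason
        patch = gap_fill(missing)  # Act
        # Accept only values for fields that were actually requested and are non-null.
        record.update({k: v for k, v in patch.items() if k in missing and v is not None})
    return record
```

Bounding the loop keeps cost predictable: a transcript that genuinely lacks a field stops consuming calls after the configured number of attempts.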
| Category | Fields |
|---|---|
| Resolution | fcr_indicator, all_issues_resolved, primary_issue_resolved, escalation_required |
| Effort | total_duration_seconds, total_issues_count, hold_count, phase_hold_total_seconds |
| Intent | issue_1..5_category, avoidable_call, agentic_ai_resolvable, could_be_self_served |
| Sentiment | customer_sentiment_start, customer_sentiment_end, customer_sentiment_improved |
| Risk | repeat_call_risk, customer_expressed_dissatisfaction |
| Commercial | upsell_attempted, upsell_outcome, plans_discussed |
| Operations | primary_cost_driver, agent_skill_rating, agent_tool_struggle_detected, handle_time_efficiency |
| Phase durations | phase_welcome/discovery/diagnosis/resolution/upsell/closing_duration_seconds |
| Reasoning | _cot_reasoning (LLM step-by-step analysis before field extraction) |
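To show how per-call fields like these roll up into headline KPIs, here is a small aggregation sketch. It assumes the listed indicator fields are booleans (or null when extraction failed) and that nulls are excluded from the denominator; the function names are hypothetical and the project's aggregator may use different rules.

```python
from typing import Optional


def rate(records: list[dict], field: str) -> Optional[float]:
    """Share of records where `field` is truthy, ignoring records where it is null."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return None  # no usable data for this field
    return sum(bool(v) for v in values) / len(values)


def summarise(records: list[dict]) -> dict:
    """Roll per-call extraction records up into a handful of rate KPIs."""
    return {
        "calls": len(records),
        "fcr_rate": rate(records, "fcr_indicator"),
        "escalation_rate": rate(records, "escalation_required"),
        "avoidable_rate": rate(records, "avoidable_call"),
        "sentiment_improved_rate": rate(records, "customer_sentiment_improved"),
    }
```

Returning `None` rather than `0.0` for a field with no data keeps "nothing measured" distinct from "measured as zero" in downstream reporting.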
The executive dashboard (dashboard/app.py) is a professional light-theme Streamlit application with 6 sections, reading live from outputs/summary.json. Falls back to built-in demo data when no pipeline output is present.
streamlit run dashboard/app.py # http://localhost:8501
| Section | Content |
|---|---|
| Hero | "Telecom Cost Intelligence for Care Calls" — eyebrow, headline, and subline sourced from live analysis |
| Cost Panels | Insights from N Calls Analysed (Cost to Serve P1–P4 / Cost to Sell P5 / Cost to Retain — allocated by phase-time share of AHT) · AI Recovery Opportunity (monthly saving vs baseline) |
| 1 — The Evidence | Issue category bar chart · phase-time waterfall |
| 2 — Phase Drill-Down | Per-phase tabs (Discovery/Diagnosis/Resolution/Upsell) — top-5 intents by avg phase duration, with agent stall rate |
| 3 — Resolution Opportunity | PREVENT · AUTOMATE · HUMAN REQUIRED — with cost impact per segment |
| 4 — AI Agents to Build | Ranked roadmap: intent · calls/month · saving/month · build effort |
| 5 — Actual Performance | FCR · AHT · escalation rate · sentiment improved · avoidable rate |
| 6 — Issue Tree | Per-call-type breakdown: what the agent did (issue_1_resolution_method) → Prevent / Automate / Human segment attribution → ranked build queue (top-4 opportunities by monthly $ impact) + full-picture overview bar |
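The dashboard's fallback to demo data when no pipeline output exists can be sketched as a small loader. `DEMO_SUMMARY` and its keys are placeholders, and treating an unreadable file the same as a missing one is an assumption of this sketch rather than documented behaviour of `dashboard/app.py`.

```python
import json
from pathlib import Path

# Placeholder demo payload; the real built-in demo data is defined by the dashboard.
DEMO_SUMMARY = {"source": "demo", "calls_analysed": 0}


def load_summary(path: Path = Path("outputs/summary.json")) -> dict:
    """Return the pipeline summary, or demo data if the file is missing or unreadable."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEMO_SUMMARY)
    # Guard against a valid JSON file that is not an object.
    return data if isinstance(data, dict) else dict(DEMO_SUMMARY)
```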
telecom-call-intelligence/
│
├── pipeline/
│ ├── config.py ← Single source of truth for all constants
│ ├── security.py ← InputSanitizer, OutputSanitizer, AgentScopeGuard,
│ │ SecretGuard, RateLimiter — full security layer
│ ├── decision_log.py ← DecisionRecord, DecisionLogger, summarize_decisions
│ │ — agent reasoning audit trail
│ ├── vector_memory.py ← Semantic KPI memory (Gemini embeddings + cosine sim)
│ ├── agents/
│ │ ├── data_agent.py ← Agent 1: DataIngestionAgent
│ │ ├── extraction_agent.py ← Agent 2: ExtractionAgent (Claude Haiku + ReAct)
│ │ ├── quality_agent.py ← Agent 3: QualityAgent (100-pt QA scoring)
│ │ ├── aggregation_agent.py ← Agent 4: AggregationAgent (KPIs + cost levers)
│ │ ├── insights_agent.py ← Agent 5: InsightsAgent (NVIDIA NIM deliberation)
│ │ └── export_agent.py ← Agent 6: ExportAgent (CSV/JSON/decisions/audit)
│ ├── graph.py ← LangGraph StateGraph (7 nodes, conditional routing,
│ │ approval gate, LangSmith tracing)
│ ├── orchestrator.py ← WorkPlanner, AgentHealthMonitor, adaptive retry
│ ├── governance.py ← BudgetGuard, QualityGate, PIIScanner, AuditLog
│ ├── memory.py ← Persistent cross-run flat JSON agent memory
│ ├── tools.py ← Formal tool registry with JSON schemas
│ ├── analyzer.py ← LLM client (Claude/Gemini, CoT, ReAct, checkpoint)
│ ├── aggregator.py ← KPI computation + cost-lever estimates
│ ├── hf_loader.py ← CSV/HuggingFace streaming + offset batching
│ ├── token_tracker.py ← Model-aware token cost accounting
│ └── logger.py ← Structured logging (INFO→stdout, DEBUG→file)
│
├── tests/ ← 342 unit tests (zero API calls, < 7 seconds)
│ ├── test_config.py
│ ├── test_decision_log.py ← 24 decision traceability tests
│ ├── test_governance.py
│ ├── test_memory.py
│ ├── test_orchestrator.py
│ ├── test_tools.py
│ ├── test_graph.py
│ └── test_security.py ← 51 security tests
│
├── dashboard/app.py ← Streamlit executive dashboard (6 sections)
├── api/main.py ← FastAPI wrapper (/health, /summary, /analyze)
├── prompts/system_prompt.txt ← 70-field extraction schema + CoT instructions
│
├── run_pipeline.py ← Single-batch entry point (model-aware key validation)
├── run_batches.py ← Multi-batch orchestrator (5 × 20 default)
├── merge_outputs.py ← Merge batch JSONs into combined dataset
├── qa_audit.py ← Standalone QA scoring tool
│
├── pyproject.toml ← ruff, mypy, pytest configuration
├── Makefile ← Developer shortcuts
├── CLAUDE.md ← AI assistant guide
├── ARCHITECTURE.md ← Full technical reference
├── SECURITY.md ← Threat model, PII policy, disclosure process
└── CHANGELOG.md ← Versioned history (v0.1 → v4.0)
# 1. Replace the data loader with your source
# pipeline/agents/data_agent.py — swap load_telecom_transcripts()
# with your Genesys, S3, Snowflake, SFTP, or Twilio connector.
# Transcripts need: call_id, transcript_text, call_date
# 2. Switch models in one line
# pipeline/config.py → EXTRACTION_MODEL = "claude-opus-4-7"
# All downstream pricing, clients, rate limiters update automatically.
# 3. Calibrate cost model to your actuals
# pipeline/aggregator.py → COST_PER_CALL_USD = <your figure>
# 4. Add domain-specific enums to the extraction schema
# prompts/system_prompt.txt → intent categories, cost_driver taxonomy, etc.
The 7-node architecture is designed for this. Only data_agent.py changes when you swap data sources. QA, aggregation, insights, security, and export are data-source agnostic.
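As an illustration of step 1, a replacement loader only has to yield records with the three required keys: `call_id`, `transcript_text`, and `call_date`. The sketch below maps arbitrary source columns onto those keys for a CSV export; the function names and the `column_map` parameter are hypothetical, and dropping rows with an empty required value is a design choice of this sketch.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator

REQUIRED_KEYS = ("call_id", "transcript_text", "call_date")


def normalise_rows(rows: Iterable[dict], column_map: dict[str, str]) -> Iterator[dict]:
    """Map source columns onto the required keys, skipping incomplete rows."""
    for row in rows:
        record = {key: (row.get(column_map[key]) or "").strip() for key in REQUIRED_KEYS}
        if all(record.values()):
            yield record


def load_from_csv(path: Path, column_map: dict[str, str]) -> list[dict]:
    """Hypothetical replacement loader for a CSV export from another platform."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return list(normalise_rows(csv.DictReader(fh), column_map))
```

For example, a source whose columns are named `id`, `text`, and `date` would be loaded with `column_map={"call_id": "id", "transcript_text": "text", "call_date": "date"}`. The same `normalise_rows` step could sit behind an S3 or Snowflake reader, since it only needs an iterable of dictionaries.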
talkmap/telecom-conversation-corpus — MIT License
3.73M turns · ~200K conversations · synthetic telecom customer care (realistic, not real recordings). Pre-downloaded to telecom_200k.csv for zero-latency local runs; HuggingFace streaming activates automatically when the local file is absent.
- ARCHITECTURE.md: Multi-agent design, agentic control loops, decision traceability, security layer, state schema
- SECURITY.md: Threat model, PII handling, API key security, vulnerability disclosure
- CHANGELOG.md: Version history from v0.1 to v4.0
Vinoth N — AI systems engineer with hands-on experience designing and shipping production-grade autonomous agentic AI systems.
This project demonstrates complete ownership of a v4.0 agentic AI system aligned with Anthropic's agentic AI framework: ReAct control loops, Chain-of-Thought prompting, self-reflective deliberation, decision traceability, semantic vector memory, multi-layer security, plug-and-play model architecture, LangGraph orchestration, LLM prompt engineering, quality assurance, governance, observability, and developer tooling — built at the standard a production agentic AI company would actually ship.
Open to partnerships in building the agentic AI future:
| Mode | What that looks like |
|---|---|
| Employee | AI/ML Engineer · LLM Platform Engineer · Agentic Systems Engineer at an AI-first company |
| Co-founder | Technical co-founder for an agentic AI or enterprise SaaS venture |
| Consultant | Agentic AI system design · LLM pipeline architecture · multi-agent frameworks (LangGraph, Claude, NVIDIA NIM) |
| Contract | Fixed-scope delivery: pipeline builds, LLM integrations, autonomous agent systems |
Copyright © 2026 Vinoth N. All rights reserved. Proprietary — not open source.
Dataset: talkmap/telecom-conversation-corpus — MIT License (original authors).