pushpitkamboj/AnimAI

VTX Labs

AnimAI

Prompt -> researched lesson -> planned animation -> Manim code -> rendered STEM video.

Built with Python 3.10+, FastAPI, LangGraph, Manim, Docker, Langfuse, and Cloudflare R2.

AnimAI is the backend and rendering system behind VTX Labs, a prompt-to-video product for STEM explanations. The product goal is simple: let someone type a topic like "draw the equation of SHM" or "explain binary search" and get a short animated video that teaches the idea visually.

The engineering work is less simple. A good educational video needs factual grounding, scene planning, Manim-aware code generation, render isolation, retry logic, video storage, and observability. This repo turns that complexity into a staged production pipeline.

Product Snapshot

[Screenshot: VTX Labs chat interface showing a generated circle explanation video]

The frontend is a chat-style prompt box that returns a playable generated video. The screenshot above shows the product flow after the prompt "draw a circle": AnimAI returns a Manim-rendered explainer video directly inside the chat.

Visual Architecture

This is the current architecture sketch used for the system overview.

Architecture At A Glance

[Figure: AnimAI architecture sketch showing the frontend, auth, Chroma DB, FastAPI pipeline, observability, R2 storage, and the Manim worker]

The Story: From Prompt To Video

The system is easiest to understand as one handoff chain. Each step shrinks a different kind of risk: bad prompts, weak facts, vague visuals, hallucinated APIs, failed renders, or lost video artifacts.

| Step | Agent / layer | Receives | Produces | Why it matters | Source |
|---|---|---|---|---|---|
| 1 | API boundary | POST /run with prompt, language, and client request | Cached URL, non-animation reply, or LangGraph invocation | Owns rate limits, cache lookup, tracing, and HTTP error handling before expensive work starts | src/api/main.py |
| 2 | Prompt Gatekeeper | Raw user prompt | animation=true or short out-of-scope reply | Prevents greetings, nonsense, backend probing, or unsupported requests from reaching the render pipeline | analyze_user_prompt.py |
| 3 | Grounding Router | Accepted animation prompt | Domain, route, time sensitivity, named entities | Keeps generic concepts fast while forcing real-world or time-sensitive topics to be grounded | research_router.py |
| 4 | Topic Researcher | Prompt, route info, optional web evidence | Factual brief with key facts, visual elements, process steps, misconceptions, sources | Turns the user's idea into a lesson plan before visuals or code are invented | research_topic.py |
| 5 | Scene Director | Topic brief and route info | One SceneSpec plus ordered ShotPlan items | Converts knowledge into screen continuity: objects, narration, movement, and teaching goals | plan_video.py |
| 6 | Manim Librarian | Shot plan, scene spec, topic brief | Manim API chunks, examples, allowed symbols, retrieval notes | Reduces hallucinated Manim calls by grounding each shot in real docs and examples | src/rag/retriever.py |
| 7 | Code Architect | Scene plan and retrieval evidence | Code outline: scene class, helpers, shot functions, transitions | Locks structure before final code so the scene does not drift or reset randomly | generate_code.py |
| 8 | Manim Coder | Code outline, shot evidence, target language | Executable Manim Python scene class | Writes renderable code with VoiceoverScene, GTTSService, localized labels, and deterministic validation | generate_code.py |
| 9 | Render Runner | Code, scene name, request id, trace context | Worker job id, status polling, render result | Keeps generated-code execution outside the API and turns worker failures back into graph state | execute_code.py |
| 10 | Render Worker | Worker job payload | MP4 artifact or render error | Runs Manim in a request-scoped temp directory with timeout, artifact lookup, and cleanup | manim-worker/app.py |
| 11 | Repair Specialist / Simplifier | Render error, failed code, scene plan | Repaired code or simpler fallback code | Gives the system another shot when Manim fails, then favors a simpler successful video over a perfect broken one | regenerate_code.py |
| 12 | API response + cache | Final workflow state | Public video URL, cached result, or error payload | Returns something the frontend can play later and avoids regenerating near-identical successful prompts | src/api/main.py |

That full handoff is wired in src/agent/graph.py, with shared state defined in src/agent/graph_state.py. Observability hooks live in src/observability/langfuse.py.
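The repair-and-simplify branch (steps 9 to 11) decides whether a user gets a video at all. The sketch below shows that control flow in plain Python rather than LangGraph. render, repair, and simplify are hypothetical callables standing in for execute_code.py, regenerate_code.py, and its simpler fallback. The limit of two repair attempts is an assumption, not a documented setting.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RenderResult:
    ok: bool
    video_path: Optional[str] = None
    error: Optional[str] = None


def render_with_recovery(
    code: str,
    render: Callable[[str], RenderResult],
    repair: Callable[[str, str], str],
    simplify: Callable[[str, str], str],
    max_repairs: int = 2,  # assumed budget; the real retry count is not documented
) -> tuple[RenderResult, list[str]]:
    """Render once, retry with repaired code, then fall back to a simpler scene."""
    history = ["initial"]
    result = render(code)

    # Repair loop: feed the render error back to the (hypothetical) repair step.
    repairs = 0
    while not result.ok and repairs < max_repairs:
        repairs += 1
        code = repair(code, result.error or "")
        result = render(code)
        history.append(f"repair {repairs}")

    # Last resort: prefer a simpler video that renders over a richer one that does not.
    if not result.ok:
        code = simplify(code, result.error or "")
        result = render(code)
        history.append("simplified")

    return result, history


if __name__ == "__main__":
    def fake_render(code: str) -> RenderResult:
        # Stand-in renderer: only the "simplified" code succeeds.
        if "SIMPLE" in code:
            return RenderResult(ok=True, video_path="scene.mp4")
        return RenderResult(ok=False, error="AttributeError: unknown Manim method")

    result, history = render_with_recovery(
        "class S(Scene): ...", fake_render, lambda c, e: c + "  # patched", lambda c, e: "SIMPLE"
    )
    print(history)  # ['initial', 'repair 1', 'repair 2', 'simplified']
```

Passing the renderer and the two LLM-backed steps in as callables keeps the loop testable without Manim or an API key.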

A few example prompts and what AnimAI has to produce for each. More generated videos are in the gallery: https://vtxlabs.space/gallery

| Prompt idea | What AnimAI has to produce | Video URL |
|---|---|---|
| Draw the equation of SHM | Graph, oscillator intuition, narrated motion | Watch |
| Plot a 3D spiral curve expanding along the Z-axis | 3D path, camera-safe framing, readable labels | Watch |
| Explain binary search in an array | Step-by-step algorithm animation | Watch |
| Draw y = x^3 | Clean axes, curve reveal, key behavior | Watch |
| Visualize electron orbits in a hydrogen atom | Conceptual atomic model with caveats | Watch |
| Draw y = sin(x) from -pi to pi | Periodic curve, marked extrema, narration | Watch |

Agentic RAG Layer

AnimAI does not ask the model to "just write Manim." It first breaks the Manim docs into AST-aware parent/child chunks, adds synthetic symbol chunks, and stores embeddings in Chroma so retrieval has real API structure to aim at.
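The real chunking rules live in src/rag/chunks.py and are not reproduced here. The following is a minimal sketch of the parent/child idea using the standard-library ast module. It assumes chunking runs over Python source files. Chunk, ast_chunks, and the "#symbols" id suffix are hypothetical names. Embedding the chunks and writing them to Chroma (src/chroma_utils.py) are left out.

```python
import ast
from dataclasses import dataclass
from typing import Optional

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


@dataclass
class Chunk:
    chunk_id: str
    kind: str                 # "class", "symbol", "method", or "function"
    parent_id: Optional[str]  # child chunks point back to their parent class
    text: str


def ast_chunks(source: str, module: str) -> list[Chunk]:
    """Split Python source into parent (class) and child (method) chunks."""
    tree = ast.parse(source)
    chunks: list[Chunk] = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            class_id = f"{module}.{node.name}"
            methods = [item for item in node.body if isinstance(item, FUNC_TYPES)]
            # Parent chunk: the whole class, useful as context when a child matches.
            chunks.append(Chunk(class_id, "class", None, ast.get_source_segment(source, node) or ""))
            # Synthetic symbol chunk: a short summary that exact-name queries can hit.
            names = ", ".join(m.name for m in methods)
            chunks.append(Chunk(f"{class_id}#symbols", "symbol", class_id, f"{class_id} methods: {names}"))
            # Child chunks: one per method, small enough to embed precisely.
            for m in methods:
                chunks.append(Chunk(f"{class_id}.{m.name}", "method", class_id,
                                    ast.get_source_segment(source, m) or ""))
        elif isinstance(node, FUNC_TYPES):
            chunks.append(Chunk(f"{module}.{node.name}", "function", None,
                                ast.get_source_segment(source, node) or ""))
    return chunks
```

Because the structure comes from the parser, a split never lands mid-method. The parent_id link lets a retriever match a small method chunk and then hand the model the surrounding class.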

Inside the LangGraph get_chunks node, retrieval is shot-specific: query building follows the planned scene, hybrid search mixes dense Chroma lookup with BM25 and exact symbol matches, and reranking narrows the allowed Manim surface before code generation starts.
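The paragraph above does not say how the dense, BM25, and exact-symbol results are merged. Reciprocal rank fusion (RRF) is one common way to combine ranked lists, so this sketch uses it as a stand-in. It is not confirmed to be what src/rag/retriever.py does. The dense and BM25 rankings are assumed to arrive as ordered lists of chunk ids. The reranking step is not shown. k=60 is the constant usually quoted for RRF.

```python
import re
from collections import defaultdict


def exact_symbol_hits(query: str, symbol_index: dict[str, str]) -> list[str]:
    """Return chunk ids whose symbol name appears verbatim as an identifier in the query."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query))
    return [chunk_id for symbol, chunk_id in symbol_index.items() if symbol in tokens]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_search(query: str, dense_ranking: list[str], bm25_ranking: list[str],
                  symbol_index: dict[str, str], top_k: int = 5) -> list[str]:
    # Exact symbol matches are treated as a third ranked list.
    exact = exact_symbol_hits(query, symbol_index)
    return rrf_fuse([dense_ranking, bm25_ranking, exact])[:top_k]


if __name__ == "__main__":
    dense = ["circle_docs", "axes_docs", "graph_example"]
    bm25 = ["axes_docs", "graph_example", "circle_docs"]
    symbols = {"Axes": "axes_docs", "Circle": "circle_docs"}
    print(hybrid_search("plot sin(x) on Axes", dense, bm25, symbols, top_k=2))
    # ['axes_docs', 'circle_docs']
```

RRF only uses rank positions, so it needs no score normalization between cosine distances and BM25 scores. That is the usual reason to choose it for mixed retrievers.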

To go deeper, read src/rag/chunks.py for AST chunking, src/chroma_utils.py for Chroma storage, src/agent/map_reduce.py for the get_chunks node, and src/rag/retriever.py with src/rag/query_builder.py for retrieval strategy.

Local Setup

Docker

```bash
docker compose up --build
```

Services:

| Service | URL |
|---|---|
| API | http://localhost:8000 |
| Manim worker | http://localhost:8080 |

Direct Run

```bash
python -m uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```

Run the worker separately:

```bash
uvicorn app:app --app-dir manim-worker --host 0.0.0.0 --port 8080 --reload
```
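Step 10 of the pipeline table says the worker runs Manim in a request-scoped temp directory with a timeout, artifact lookup, and cleanup. How manim-worker/app.py actually invokes Manim is not shown in this README. This sketch assumes the Manim Community CLI is on PATH. It also assumes timeout_s and quality_flag correspond to MANIM_RENDER_TIMEOUT_SECONDS and MANIM_QUALITY_FLAG. render_job, find_mp4, and build_manim_command are hypothetical names, and the job queue and HTTP layer are omitted.

```python
import glob
import os
import shutil
import subprocess
import tempfile
from typing import Optional


def build_manim_command(script: str, scene: str, media_dir: str, quality_flag: str = "-ql") -> list[str]:
    # Manim Community CLI shape: manim [OPTIONS] FILE SCENE_NAME
    return ["manim", quality_flag, "--media_dir", media_dir, script, scene]


def find_mp4(media_dir: str, scene: str) -> Optional[str]:
    # The final render is named <SceneName>.mp4 somewhere under media_dir.
    hits = glob.glob(os.path.join(media_dir, "**", f"{scene}.mp4"), recursive=True)
    return hits[0] if hits else None


def render_job(code: str, scene: str, out_dir: str,
               timeout_s: float = 300.0, quality_flag: str = "-ql") -> dict:
    """Render one job in its own temp directory; the directory is deleted on exit."""
    os.makedirs(out_dir, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="render-") as work:
        script = os.path.join(work, "scene.py")
        with open(script, "w", encoding="utf-8") as fh:
            fh.write(code)
        media = os.path.join(work, "media")
        try:
            proc = subprocess.run(
                build_manim_command(script, scene, media, quality_flag),
                capture_output=True, text=True, timeout=timeout_s, cwd=work,
            )
        except subprocess.TimeoutExpired:
            return {"status": "error", "error": f"render timed out after {timeout_s}s"}
        except FileNotFoundError:
            return {"status": "error", "error": "manim executable not found"}
        if proc.returncode != 0:
            # Return the tail of stderr as the error text for the repair step.
            return {"status": "error", "error": proc.stderr[-2000:]}
        video = find_mp4(media, scene)
        if video is None:
            return {"status": "error", "error": "render finished but no MP4 was found"}
        # Copy the artifact out before the temp directory is cleaned up.
        final_path = shutil.copy(video, os.path.join(out_dir, f"{scene}.mp4"))
        return {"status": "success", "video_path": final_path}
```

Keeping every job in its own directory means two concurrent renders of scenes with the same class name cannot overwrite each other's files.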

Try A Prompt

```bash
curl -X POST http://localhost:8000/run \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Draw y = sin(x) from -pi to pi","language":"en"}'
```
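The same request can be sent from Python with only the standard library. This is a minimal client sketch. run_prompt and describe_result are hypothetical helpers. The 600-second timeout is a guess, since rendering can take minutes. Only the status values documented in this section are handled, and fields other than status and result are not assumed.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/run"  # local API address from the services list above


def run_prompt(prompt: str, language: str = "en", url: str = API_URL, timeout_s: float = 600.0) -> dict:
    """POST a prompt to /run and return the decoded JSON body."""
    body = json.dumps({"prompt": prompt, "language": language}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses (e.g. rate limits).
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def describe_result(payload: dict) -> str:
    """Map the documented status values to a short message."""
    status = payload.get("status")
    if status == "success":
        return f"video ready: {payload.get('result')}"
    if status == "non_animation":
        return "prompt was outside the animation scope"
    if status == "error":
        return "generation or rendering failed after recovery attempts"
    return f"unexpected status: {status!r}"


if __name__ == "__main__":
    print(describe_result(run_prompt("Draw y = sin(x) from -pi to pi")))
```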

Success response:

```json
{
  "result": "https://.../scene.mp4",
  "status": "success"
}
```

Other expected statuses:

| Status | Meaning |
|---|---|
| non_animation | The prompt was outside AnimAI's animation scope. |
| error | Generation or rendering failed after recovery attempts. |
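A success response is not always a fresh render. Steps 1 and 12 of the pipeline table describe a cache that avoids regenerating near-identical successful prompts. SEMANTIC_CACHE_ENABLED and the Chroma embedding variables suggest the cache compares prompt embeddings. The metric, threshold, embedding model, and keying are not documented, so everything below is an assumption: SemanticCache is a hypothetical in-memory stand-in, and the toy embedding in the demo replaces a real model.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for an embedding-backed prompt cache (hypothetical)."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold  # assumed cut-off; the real value is not documented
        self.entries: list[tuple[str, list[float], str]] = []  # (language, vector, video_url)

    def lookup(self, prompt: str, language: str) -> Optional[str]:
        vector = self.embed(prompt)
        best_url, best_sim = None, self.threshold
        for lang, cached, url in self.entries:
            if lang != language:  # a video narrated in one language should not answer another
                continue
            sim = cosine(vector, cached)
            if sim >= best_sim:
                best_url, best_sim = url, sim
        return best_url

    def store(self, prompt: str, language: str, video_url: str) -> None:
        # Only call this for successful renders, so failures are never replayed.
        self.entries.append((language, self.embed(prompt), video_url))


if __name__ == "__main__":
    vocab = ["draw", "sin", "cos", "circle"]

    def toy_embed(prompt: str) -> list[float]:
        # Word-count vector over a tiny vocabulary; a stand-in for a real embedding model.
        words = prompt.lower().split()
        return [float(words.count(w)) for w in vocab]

    cache = SemanticCache(toy_embed)
    cache.store("draw sin", "en", "https://example.com/sin.mp4")
    print(cache.lookup("Draw sin", "en"))  # https://example.com/sin.mp4
    print(cache.lookup("draw cos", "en"))  # None
```

A production version would keep the vectors in a vector store instead of scanning a list. The threshold is the main trade-off: too low and "draw cos" could be served a sine video.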

Environment

| Group | Variables |
|---|---|
| Core | OPENAI_API_KEY, MANIM_WORKER_URL |
| Cache + dense retrieval | SEMANTIC_CACHE_ENABLED, CHROMA_OPENAI_API_KEY, CHROMA_OPENAI_EMBEDDING_MODEL, CHROMA_API_KEY, CHROMA_HOST, CHROMA_TENANT, CHROMA_DATABASE |
| Worker | MANIM_RENDER_TIMEOUT_SECONDS, MANIM_QUALITY_FLAG, MANIM_WORKER_POLL_SECONDS, MANIM_WORKER_MAX_WAIT_SECONDS, KEEP_RENDER_ARTIFACTS |
| Publishing | R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET, R2_PUBLIC_BASE_URL, SKIP_UPLOAD, PUBLIC_MEDIA_BASE_URL |
| Tracing | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL, LANGFUSE_HOST, LANGFUSE_TIMEOUT, LANGFUSE_FLUSH_AT, LANGFUSE_FLUSH_INTERVAL, LANGFUSE_TRACING_ENVIRONMENT, LANGFUSE_AUTH_CHECK_ON_STARTUP |
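MANIM_WORKER_POLL_SECONDS and MANIM_WORKER_MAX_WAIT_SECONDS imply that the Render Runner (step 9) polls the worker until a job finishes or a time budget runs out. This is a minimal sketch of such a loop. The fallback values, the "state" field, and the "succeeded"/"failed" values are placeholders. get_status stands in for whatever request execute_code.py actually makes to the worker. sleep and clock are injectable so the loop can be tested without waiting.

```python
import os
import time
from typing import Callable, Mapping


def polling_config(env: Mapping[str, str] = os.environ) -> tuple[float, float]:
    # The fallback values are placeholders; the project's real defaults are not documented here.
    poll = float(env.get("MANIM_WORKER_POLL_SECONDS", "2"))
    max_wait = float(env.get("MANIM_WORKER_MAX_WAIT_SECONDS", "600"))
    return poll, max_wait


def wait_for_render(
    job_id: str,
    get_status: Callable[[str], dict],
    poll_seconds: float,
    max_wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a worker job until it finishes or the wait budget runs out."""
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status(job_id)
        if status.get("state") in ("succeeded", "failed"):
            return status
        if clock() + poll_seconds > deadline:
            # Report a timeout as an ordinary failure so it flows back into graph state.
            return {"state": "failed", "error": f"job {job_id} still running after {max_wait_seconds}s"}
        sleep(poll_seconds)
```

Using time.monotonic for the deadline keeps the budget correct even if the wall clock is adjusted during a long render.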

Deploy

Production backend deployment is configured for a Hostinger VPS.

| Piece | File |
|---|---|
| Docker Compose stack | docker-compose.hostinger.yml |
| GitHub deployment workflow | .github/workflows/deploy-hostinger.yml |

The deployed stack runs:

api -> http://manim-worker:8080 -> Cloudflare R2 or local media publishing

Tests

```bash
pytest -q
```

Coverage focuses on API behavior, graph compilation, configuration, worker behavior, language registry, retrieval, research topic flow, Chroma utilities, and render-client behavior.

Langfuse

Langfuse gives us prompt versions, traces, and logs across the API and worker, so we can inspect failures as one connected run instead of chasing them across services.

It also gives us cost visibility per request and prompt path, which helps us trim expensive branches, compare prompt changes, and keep generation quality high without wasting tokens.

Legacy

These files are retained as older fine-tune graph experiments and are not the active production workflow:

| Legacy path | Status |
|---|---|
| src/agent/graph_fine_tune.py | Reference only |
| src/agent/generate_code_fine_tune.py | Reference only |
| src/agent/regenerate_code_fine_tune.py | Reference only |
| src/agent/fine_tune_agent | Reference only |

The active production workflow is src/agent/graph.py.

License

See LICENSE.

About

Generate educational videos from prompts (more accurate than Sora for maths and science videos).
