PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated Jun 10, 2026 - Python
Extract structured data using local or remote LLMs.
A schema-driven framework for LLM structured extraction enhanced by multi-stage RL training (SFT→DPO→GRPO), with interpretable reward design and end-to-end reproducibility.
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.
Collection of purpose-built MCP servers for AI agent workflows.
Send your low-confidence document extractions. A human reviews them against the PDF and returns a typed Pydantic/Zod response. Managed document verification for AI agents. PDF + handwritten OCR. Client-side fragmentation: full document never leaves your machine. $0.80/page + $5 free credit. Express 30-min SLA. Built on open source awaithumans.
A simple LLM library.
news-summizr extracts structured summaries from headlines, labeling key points such as announcement, products, and region for quick insight.
A new package designed to facilitate structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Structured CV extraction with strict JSON schema and anti-hallucination guarantees.
Local-first eval harness for unstructured-document extraction — compare LLMs, OCR/IDP tools, and strategies (cascade/ensemble/verify) on the same cohort.
Schema-driven evaluation benchmark for LLM JSON extraction (JSON evaluation, structured extraction).
Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.
Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item
AI-powered travel agency assistant: a LangGraph stateful agent on Telegram that captures preferences through natural conversation, generates personalized itineraries via Groq/Llama 3.3, auto-manages leads in Excel, and remembers returning users. Built with LangChain, FastAPI, and python-telegram-bot.
ReAct-based intelligent analysis Agent with 4-layer architecture (Skill-Agent-LLMService-Tool), dual tool-calling modes (Native FC / Prompt-based), triple execution engine (Offline/Fast/Agent), incremental reflection with convergence detection, Skill template system, SSE streaming, Prometheus monitoring, and SFT trajectory export.
Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.