structured-extraction

Here are 40 public repositories matching this topic...

NameetP / pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

python pdf ocr mcp self-healing structured-extraction rag pdf-to-json pdf-extraction ai-agent llm document-parsing pdf-to-markdown docling opendataloader

Updated Jun 10, 2026
Python

jndiogo / sibila

Extract structured data from local or remote LLM models

python ai openai structured-data gpt structured-extraction dataclasses local-models pydantic large-language-models llamacpp llm-inference local-ai gguf structured-generation

Updated Jun 21, 2024
Python

xinyuran / RLRefine

A schema-driven framework for LLM structured extraction enhanced by multi-stage RL training (SFT→DPO→GRPO), with interpretable reward design and end-to-end reproducibility.

nlp reinforcement-learning chinese-nlp lora structured-extraction rlhf reward-model qwen grpo

Updated May 9, 2026
Python

sputnicyoji / Structured-Extractor

Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.

python nlp knowledge-graph structured-extraction claude-code post-processing-pipeline

Updated Feb 9, 2026
Python

doctruthhq / DocTruth

Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.

Updated Jun 7, 2026
Java

adi2355 / MCP-Server-Collection

Collection of purpose-built MCP servers for AI agent workflows.

python typescript mcp web-scraping data-extraction jsonpath ai-agents structured-extraction llm deepseek firecrawl model-context-protocol mcp-server codebase-analysis agent-workflows

Updated Apr 7, 2026
HTML

awaithumans / awaitverify-managed-document-verification-pdf-ocr-extraction

Send your low-confidence document extractions. A human reviews them against the PDF and returns a typed Pydantic/Zod response. Managed document verification for AI agents. PDF + handwritten OCR. Client-side fragmentation: full document never leaves your machine. $0.80/page + $5 free credit. Express 30-min SLA. Built on open source awaithumans.

Updated Jun 2, 2026

vikyw89 / llmtext

A simple LLM library

python agent async asynchronous openai gpt structured-extraction tool-use instructor large-language-models llm chatgpt prompt-optimization agentic-ai

Updated May 19, 2026
Python

chigwell / news-summizr

news-summizr extracts structured summaries from headlines, labeling key points like announcement, products, region for quick insight.

pattern-matching data-analysis structured-extraction reporting-tools news-summary key-information-extraction workflow-integration headline-analysis retry-mechanisms reliable-output concise-summarization labeled-summaries

Updated Dec 21, 2025
Python

chigwell / summaryxtract

A package for structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt

Updated Dec 21, 2025
Python

vigneshc94 / tutorscribe

Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs

tutorial video whisper claude tradingview structured-extraction pine-script anthropic

Updated Apr 28, 2026
Python

BhaveshBytess / Research-Paper-Analyzer

Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.

nlp machine-learning pdf-parsing scientific-papers structured-extraction academic-research evidence-extraction streamlit-app llm research-paper-analysis

Updated Nov 14, 2025
Python

Mathos34 / cv-extract-json

Structured CV extraction with strict JSON schema and anti-hallucination guarantees.

nlp transformers pytorch information-extraction ner structured-extraction pydantic

Updated Jun 16, 2026
Python

jonmoubayed / ezpz-evals

Local-first eval harness for unstructured-document extraction — compare LLMs, OCR/IDP tools, and strategies (cascade/ensemble/verify) on the same cohort.

python benchmark ocr information-extraction gemini openai idp structured-extraction document-extraction llm anthropic evals ollama llm-evaluation

Updated Jun 17, 2026
Python

FAIRmat-NFDI / extract-eval

Schema-driven evaluation for LLM JSON extraction: JSON evaluation, structured-extraction, benchmark

benchmark structured-extraction schema-driven-evaluation-for-llm-json-extraction extracted-json-evaluation

Updated Jun 11, 2026
Python

ademakdogan / prompt-optimizer

Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.

openai structured-extraction ai-automation llm prompt-engineering prompt-versioning prompt-optimization llm-optimization

Updated Feb 2, 2026
Python

LLMSystems / SEC-10-K-Structured-Extraction

Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item

information-extraction financial-data xbrl sec-edgar annual-reports structured-extraction rule-based-parsing form-10k

Updated Jun 10, 2026
Python

Adarsh-Menon / Zarik-Travel-Agency-Support

AI-powered travel agency assistant: a LangGraph stateful agent on Telegram that captures preferences through natural conversation, generates personalized itineraries via Groq/Llama 3.3, auto-manages leads in Excel, and remembers returning users. Built with LangChain, FastAPI, and python-telegram-bot.

Updated May 1, 2026
Python

xinyuran / review-extract-agent

ReAct-based intelligent analysis Agent with 4-layer architecture (Skill-Agent-LLMService-Tool), dual tool-calling modes (Native FC / Prompt-based), triple execution engine (Offline/Fast/Agent), incremental reflection with convergence detection, Skill template system, SSE streaming, Prometheus monitoring, and SFT trajectory export.

sentiment-analysis chinese-nlp keyword-extraction structured-extraction self-reflection pydantic fastapi rlhf vllm llm-agents react-agent tool-calling

Updated May 28, 2026
Python

vstorm-co / blog-content

Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.

retrieval embeddings sparse-vectors structured-extraction rag vector-search vector-database

Updated Oct 30, 2025
Python
