Structured Paper Analysis and Research Memory for Knowledge-Grounded Research Agents
Let every idea have a source, and every judgment have an anchor.
👥 ResearchFlow Community | 💬 WeChat / ResearchFlow WeChat Group
🔥 News: PaperBite is the public evidence vault derived from the ResearchFlow analysis framework, primarily covering L0-L3. If you work on AI-related research, it is a strong starting point for building your own evidence vault.
What is ResearchFlow? ResearchFlow is a local-first workflow framework that transforms paper analysis into structured notes and builds a persistent, reusable research memory.
Who is this for? Researchers building paper-grounded knowledge bases, agent-assisted literature workflows, or evidence-backed idea generation.
🧠 Knowledge first, not execution first. Many AI research tools focus on helping you run experiments or draft papers. ResearchFlow focuses on the upstream question: when an agent makes a research decision, does it have enough structured, searchable paper evidence in hand?
🧩 Turn structured paper analysis into reusable research memory. ResearchFlow organizes paper PDFs and paper lists into layered local assets: source literature, single-paper evidence units, domain knowledge surfaces, cross-domain evidence accumulation, and downstream idea or experiment records.
🪶 Local-first with low lock-in. The default workflow is local files only: PDFs, Markdown notes, JSONL indexes, and idea notes all live under obsidian-vault/. Normal use does not require a server, database, or service deployment.
💡 ResearchFlow is a methodology and local knowledge workflow, not a closed platform. What matters is the layered research assets you keep accumulating.
ResearchFlow is not centered on idea generation in isolation. The core claim is that research directions should emerge from an accumulated, structured, and traceable evidence base, then be stress-tested before execution.
This diagram shows ResearchFlow's six-layer asset hierarchy: L0-L3 (knowledge building, powered by PaperBite), L4 (emergence), and L5 (validation).
The table below follows the diagram from bottom to top:
| Level | Output | Role |
|---|---|---|
| L0 | paper PDFs | preserve source literature |
| L1 | single-paper analysis | extract idea, design, and evidence |
| L2 | Domain Research Vault | support domain-level induction and deduction |
| L3 | Cross-Domain Research Vault | support transfer and idea emergence |
| L4 | Idea Vault | emergence layer |
| L5 | Experiment Vault | validation layer |
Give ResearchFlow a research direction, and it helps you build the knowledge base step by step:
collect candidate papers / import local PDFs
-> batch MinerU PDF parse
-> structured paper analysis
-> index
-> query / ideate / review / export
You can use it in four common modes:
| Mode | Purpose | Typical entry |
|---|---|---|
| Build | Collect candidates, batch-parse PDFs, analyze papers, and refresh the index | research-workflow |
| Query | Retrieve papers by topic, task, method, venue, year, title, or technique tags | papers-query-knowledge-base |
| Decision | Compare methods before choosing baselines, changing a design, or writing related work | papers-query-knowledge-base |
| Idea | Generate, focus, and stress-test research directions grounded in the local knowledge base | research-brainstorm-from-kb, idea-focus-coach, reviewer-stress-test |
git clone https://github.com/<your-username>/ResearchFlow.git
cd ResearchFlow
conda env create -f environment/environment.yml
conda activate researchflow

Create a repo-root .env when you need model keys, model names, or parser overrides. Use environment/.env.example as a reference.
MinerU is the upstream batch PDF parsing stage, not the structured analysis stage itself. ResearchFlow is designed to reuse MinerU outputs before running analysis. Minimal verification: mineru --help should run, or .env should set MINERU_CLI_PATH.
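The minimal verification above can be scripted. The sketch below is an illustration, not part of ResearchFlow: the function names `resolve_mineru_cli` and `mineru_help_runs` are hypothetical, the lookup order (explicit MINERU_CLI_PATH first, then `mineru` on PATH) is an assumption, and it reads MINERU_CLI_PATH from the process environment or a mapping you pass in rather than parsing .env itself.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional


def resolve_mineru_cli(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to a MinerU executable, or None if none is found.

    Assumed order for this sketch: MINERU_CLI_PATH wins, then `mineru` on PATH.
    """
    env = os.environ if env is None else env

    override = env.get("MINERU_CLI_PATH", "").strip()
    if override:
        # Accept a direct path to an executable file ...
        if os.path.isfile(override) and os.access(override, os.X_OK):
            return override
        # ... or a command name that still has to be looked up on PATH.
        return shutil.which(override, path=env.get("PATH"))

    return shutil.which("mineru", path=env.get("PATH"))


def mineru_help_runs(cli: str, timeout: float = 30.0) -> bool:
    """Return True if `<cli> --help` exits with status 0."""
    try:
        result = subprocess.run(
            [cli, "--help"], capture_output=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A typical use would be to call `resolve_mineru_cli()` once at startup and, if it returns a path, pass that path to `mineru_help_runs` before starting a long batch parse.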
For medium and large paper collections, batch MinerU parsing should happen before structured analysis. ResearchFlow analysis should preferentially reuse prepared MinerU outputs through --mineru-output or --mineru-output-root instead of reparsing PDFs during analysis.
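Before a batch run, it can help to see which papers already have a prepared parse. The following is a rough sketch only: the function name `split_by_cached_parse`, the CSV column names `state` and `pdf_path`, and the one-folder-per-PDF-stem layout under the MinerU output root are all assumptions made for illustration. They may not match the real paper_list.csv schema or the actual MinerU output layout.

```python
import csv
from pathlib import Path
from typing import List, Tuple, Union


def split_by_cached_parse(
    paper_list_csv: Union[str, Path],
    mineru_output_root: Union[str, Path],
    state: str = "Downloaded",
) -> Tuple[List[str], List[str]]:
    """Split papers in the given state into (ready, missing) by cached parse.

    Assumptions (hypothetical): the CSV has `state` and `pdf_path` columns,
    and a prepared parse lives at <mineru_output_root>/<pdf stem>/.
    """
    root = Path(mineru_output_root)
    ready: List[str] = []
    missing: List[str] = []

    with open(paper_list_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if (row.get("state") or "").strip() != state:
                continue  # only consider rows in the requested state
            stem = Path((row.get("pdf_path") or "").strip()).stem
            if not stem:
                continue  # skip rows without a usable PDF path
            if (root / stem).is_dir():
                ready.append(stem)
            else:
                missing.append(stem)

    return ready, missing
```

If `missing` is non-empty, those PDFs would need a batch MinerU parse before structured analysis can reuse prepared outputs.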
/research-workflow
I want to build a knowledge base for controllable motion generation from PDFs.
Please tell me the next step and the expected outputs.
Build a topic knowledge base from scratch
/research-workflow
I want to build a knowledge base for text-driven reactive motion generation.
Start by collecting candidate papers and tell me which skill to use at each stage.
Collect candidate papers from a GitHub paper list
/papers-collect-from-github-repo
Collect papers related to controllable human motion generation from this GitHub repository: <URL>
Keep only items related to diffusion, controllability, real-time generation, or long-form motion.
Write a candidate list suitable for the downstream download workflow.
Run the formal local analysis chain
Reuse existing MinerU output first when available:
python3 scripts/run_local_paper_analysis.py \
--mineru-output "<mineru_output_dir>" \
--paper-pdf "obsidian-vault/paperPDFs/<Category>/<Venue_Year>/<Paper>.pdf" \
--conf-year "<Venue_Year>" \
--export-vault

If no cached parse exists, the runner can also invoke MinerU during a single-paper run:
python3 scripts/run_local_paper_analysis.py \
--pdf "obsidian-vault/paperPDFs/<Category>/<Venue_Year>/<Paper>.pdf" \
--conf-year "<Venue_Year>" \
--export-vault

For batch analysis, require reuse of prepared MinerU outputs:
python3 scripts/run_paper_list_analysis.py \
--source obsidian-vault/paper_list.csv \
--state Downloaded \
--mineru-output-root "<mineru_output_root>" \
--require-existing-mineru-output

| Need | Skill |
|---|---|
| Decide the next pipeline step | research-workflow |
| Collect candidates from web pages | papers-collect-from-web |
| Collect candidates from GitHub paper lists | papers-collect-from-github-repo |
| Download PDFs from a triage list | papers-download-from-list |
| Generate a deep single-paper report | paper-report |
| Rebuild the local index | papers-build-index |
| Query or compare papers from local notes | papers-query-knowledge-base |
| Generate grounded research ideas | research-brainstorm-from-kb |
| Focus an idea into an executable plan | idea-focus-coach |
| Run reviewer-style stress tests | reviewer-stress-test |
| Export share-ready Markdown | notes-export-share-version |
See .claude/skills/README.md for the full skill map.
ResearchFlow intentionally stays plain: folders, Markdown, JSONL, CSV, and
SKILL.md. The same research memory can therefore be shared by multiple agents:
- Claude Code / Cursor can read .claude/skills directly.
- Codex CLI can use scripts/setup_shared_skills.py to generate local aliases.
- Other agents can read obsidian-vault/index/index.jsonl and obsidian-vault/analysis/ directly.
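Because the index is plain JSONL, any agent or script can read it with the standard library. The sketch below illustrates that idea under stated assumptions: the helper names `iter_index` and `filter_papers` are hypothetical, and so are the record keys `venue`, `year`, and `tags`. Check a real line of index.jsonl before relying on any of them.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Optional, Union


def iter_index(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def filter_papers(
    records: Iterable[Dict[str, Any]],
    venue: Optional[str] = None,
    year: Optional[Union[int, str]] = None,
    tag: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep records matching every filter that is not None.

    The keys `venue`, `year`, and `tags` are assumed for this sketch.
    """
    matches = []
    for record in records:
        if venue is not None and str(record.get("venue", "")).lower() != venue.lower():
            continue
        # Compare years as strings so 2024 and "2024" are treated alike.
        if year is not None and str(record.get("year", "")) != str(year):
            continue
        if tag is not None:
            tags = [str(t).lower() for t in record.get("tags") or []]
            if tag.lower() not in tags:
                continue
        matches.append(record)
    return matches
```

For example, `filter_papers(iter_index("obsidian-vault/index/index.jsonl"), tag="diffusion")` would return the records whose assumed `tags` list contains that tag.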
<a id="codex-cli-compat"></a>
Codex CLI compatibility
Claude Code and Cursor do not need this step; Codex CLI does.
python3 scripts/setup_shared_skills.py
python3 scripts/setup_shared_skills.py --check

<a id="obsidian-config"></a>
Obsidian setup
- Obsidian is optional but recommended as a visualization layer.
- Open obsidian-vault/ as an Obsidian vault if you want graph view, backlinks, and manual browsing.
- Do not treat Obsidian pages as a separate source of truth.
@misc{lin2026researchflow,
title = {{ResearchFlow}: A Structured Paper Analysis Framework for Knowledge-Grounded Research},
author = {Jingzhong Lin and Ziheng Huang},
year = {2026},
howpublished = {\url{https://github.com/RipeMangoBox/ResearchFlow}},
note = {GitHub repository}
}

License: MIT


