ResearchFlow

Structured Paper Analysis and Research Memory for Knowledge-Grounded Research Agents

Let every idea have a source, and every judgment have an anchor.

🔥 ResearchFlow Community | 💬 WeChat / ResearchFlow WeChat Group

🔥 News: PaperBite is the public evidence vault derived from the ResearchFlow analysis framework, primarily covering L0-L3. If you work on AI-related research, it is a strong starting point for building your own evidence vault.

What is ResearchFlow? ResearchFlow is a local-first workflow framework that transforms paper analysis into structured notes and builds a persistent, reusable research memory.

Who is this for? Researchers building paper-grounded knowledge bases, agent-assisted literature workflows, or evidence-backed idea generation.

🧠 Knowledge first, not execution first. Many AI research tools focus on helping you run experiments or draft papers. ResearchFlow focuses on the upstream question: when an agent makes a research decision, does it have enough structured, searchable paper evidence in hand?

🧩 Turn structured paper analysis into reusable research memory. ResearchFlow organizes paper PDFs and paper lists into layered local assets: source literature, single-paper evidence units, domain knowledge surfaces, cross-domain evidence accumulation, and downstream idea or experiment records.

🪶 Local-first with low lock-in. The default workflow uses local files only: PDFs, Markdown notes, JSONL indexes, and idea notes all live under obsidian-vault/. Normal use does not require a server, database, or service deployment.

💡 ResearchFlow is a methodology and local knowledge workflow, not a closed platform. What matters is the layered research assets you keep accumulating.

🧠 Core Idea

ResearchFlow is not centered on idea generation in isolation. The core claim is that research directions should emerge from an accumulated, structured, and traceable evidence base, then be stress-tested before execution.

🗂️ Asset Levels

This diagram shows ResearchFlow's six-layer asset hierarchy: L0-L3 (knowledge building, powered by PaperBite), L4 (emergence), and L5 (validation).

The table below follows the diagram from bottom to top:

Level	Output	Role
`L0`	paper PDFs	preserve source literature
`L1`	single-paper analysis	extract idea, design, and evidence
`L2`	Domain Research Vault	support domain-level induction and deduction
`L3`	Cross-Domain Research Vault	support transfer and idea emergence
`L4`	Idea Vault	emergence layer
`L5`	Experiment Vault	validation layer
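
If you want to refer to the levels from your own scripts, a small enum keeps the ordering explicit. This is a minimal sketch, not part of ResearchFlow: the names `AssetLevel`, `LEVEL_INFO`, and `stage_of` are hypothetical, and only the outputs, roles, and grouping are taken from the table and text above.

```python
from enum import IntEnum


class AssetLevel(IntEnum):
    """Six asset levels, ordered bottom to top (hypothetical helper)."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5


# (output, role) per level, copied from the table above.
LEVEL_INFO = {
    AssetLevel.L0: ("paper PDFs", "preserve source literature"),
    AssetLevel.L1: ("single-paper analysis", "extract idea, design, and evidence"),
    AssetLevel.L2: ("Domain Research Vault", "support domain-level induction and deduction"),
    AssetLevel.L3: ("Cross-Domain Research Vault", "support transfer and idea emergence"),
    AssetLevel.L4: ("Idea Vault", "emergence layer"),
    AssetLevel.L5: ("Experiment Vault", "validation layer"),
}


def stage_of(level: AssetLevel) -> str:
    """Group levels as the text does: L0-L3, then L4, then L5."""
    if level <= AssetLevel.L3:
        return "knowledge building"
    return "emergence" if level == AssetLevel.L4 else "validation"
```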

🎯 How It Works

Give ResearchFlow a research direction, and it helps you build the knowledge base step by step:

collect candidate papers / import local PDFs
  -> batch MinerU PDF parse
  -> structured paper analysis
  -> index
  -> query / ideate / review / export

You can use it in four common modes:

Mode	Purpose	Typical entry
Build	Collect candidates, batch-parse PDFs, analyze papers, and refresh the index	`research-workflow`
Query	Retrieve papers by topic, task, method, venue, year, title, or technique tags	`papers-query-knowledge-base`
Decision	Compare methods before choosing baselines, changing a design, or writing related work	`papers-query-knowledge-base`
Idea	Generate, focus, and stress-test research directions grounded in the local knowledge base	`research-brainstorm-from-kb`, `idea-focus-coach`, `reviewer-stress-test`

🚀 Quick Start

1. Create the conda environment

git clone https://github.com/<your-username>/ResearchFlow.git
cd ResearchFlow
conda env create -f environment/environment.yml
conda activate researchflow

2. Configure model and parser access

Create a repo-root .env when you need model keys, model names, or parser overrides. Use environment/.env.example as a reference.
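
To inspect what a repo-root .env would provide, a dependency-free reader is enough. The sketch below assumes the file holds plain KEY=VALUE lines with optional quotes and # comments. How ResearchFlow itself loads the file is not shown here, and `load_env_file` is a hypothetical helper.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    # Print only the key names so secrets never reach the terminal.
    print(sorted(load_env_file(Path(".env"))))
```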

3. Install or configure MinerU

MinerU is the upstream batch PDF parsing stage, not the structured analysis stage itself. ResearchFlow is designed to reuse MinerU outputs before running analysis. To verify the setup, confirm that mineru --help runs, or set MINERU_CLI_PATH in .env.
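
The same check can be scripted. This sketch assumes MINERU_CLI_PATH, when set, points at an executable file and takes precedence over a `mineru` found on PATH. That precedence is an assumption, and `find_mineru` is a hypothetical helper. Values from .env must already be exported or passed in as the mapping.

```python
import os
import shutil
from typing import Mapping, Optional


def find_mineru(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable MinerU CLI path, or None if neither check passes."""
    override = env.get("MINERU_CLI_PATH")
    # ASSUMPTION: an explicit override wins over whatever is on PATH.
    if override and os.path.isfile(override) and os.access(override, os.X_OK):
        return override
    # Otherwise fall back to a `mineru` executable found on PATH.
    return shutil.which("mineru")


if __name__ == "__main__":
    path = find_mineru()
    print(path if path else "MinerU not found: install it or set MINERU_CLI_PATH")
```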

4. Batch-prepare MinerU outputs first

For medium and large paper collections, batch MinerU parsing should happen before structured analysis. ResearchFlow analysis should preferentially reuse prepared MinerU outputs through --mineru-output or --mineru-output-root instead of reparsing PDFs during analysis.
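
Before a batch run, it can help to list which PDFs still lack a prepared parse. The sketch below assumes one output directory per PDF, named after the PDF file stem, directly under the MinerU output root. That layout is an assumption, so adjust it to match what your MinerU run actually produces. `pdfs_missing_output` is a hypothetical helper.

```python
from pathlib import Path


def pdfs_missing_output(pdf_root: Path, mineru_output_root: Path) -> list[Path]:
    """List PDFs with no matching directory under the MinerU output root.

    ASSUMPTION: outputs live at <mineru_output_root>/<pdf stem>/.
    """
    missing = []
    for pdf in sorted(pdf_root.rglob("*.pdf")):
        if not (mineru_output_root / pdf.stem).is_dir():
            missing.append(pdf)
    return missing
```

An empty result suggests the collection is ready for analysis with --mineru-output-root, under the layout assumption above.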

5. Start from the workflow skill

/research-workflow
I want to build a knowledge base for controllable motion generation from PDFs.
Please tell me the next step and the expected outputs.

📚 Further Reading

📖 Usage Examples

Build a topic knowledge base from scratch

/research-workflow
I want to build a knowledge base for text-driven reactive motion generation.
Start by collecting candidate papers and tell me which skill to use at each stage.

Collect candidate papers from a GitHub paper list

/papers-collect-from-github-repo
Collect papers related to controllable human motion generation from this GitHub repository: <URL>
Keep only items related to diffusion, controllability, real-time generation, or long-form motion.
Write a candidate list suitable for the downstream download workflow.

Run the formal local analysis chain

Reuse existing MinerU output first when available:

python3 scripts/run_local_paper_analysis.py \
  --mineru-output "<mineru_output_dir>" \
  --paper-pdf "obsidian-vault/paperPDFs/<Category>/<Venue_Year>/<Paper>.pdf" \
  --conf-year "<Venue_Year>" \
  --export-vault

If no cached parse exists, the runner can also invoke MinerU during a single-paper run:

python3 scripts/run_local_paper_analysis.py \
  --pdf "obsidian-vault/paperPDFs/<Category>/<Venue_Year>/<Paper>.pdf" \
  --conf-year "<Venue_Year>" \
  --export-vault
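
The two single-paper invocations above differ only in whether a cached parse is passed. A small wrapper can make that choice explicit. This is a sketch: `build_analysis_command` is a hypothetical helper, and the flags are copied verbatim from the two commands above, including their different PDF flag names.

```python
from typing import Optional


def build_analysis_command(
    pdf: str, conf_year: str, mineru_output: Optional[str] = None
) -> list[str]:
    """Build the argv list for a single-paper run, mirroring the commands above."""
    cmd = ["python3", "scripts/run_local_paper_analysis.py"]
    if mineru_output is not None:
        # Cached parse available: reuse it, as in the first command.
        cmd += ["--mineru-output", mineru_output, "--paper-pdf", pdf]
    else:
        # No cached parse: let the runner invoke MinerU, as in the second command.
        cmd += ["--pdf", pdf]
    cmd += ["--conf-year", conf_year, "--export-vault"]
    return cmd
```

Pass the resulting list to `subprocess.run(cmd, check=True)` from the repository root to execute it.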

For batch analysis, require reuse of prepared MinerU outputs:

python3 scripts/run_paper_list_analysis.py \
  --source obsidian-vault/paper_list.csv \
  --state Downloaded \
  --mineru-output-root "<mineru_output_root>" \
  --require-existing-mineru-output
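
To preview which rows a --state Downloaded run would pick up, you can filter the CSV yourself. The sketch assumes the paper list has a header row with a column named `state`. The real column name is not documented here, so treat it as a placeholder. `rows_in_state` is a hypothetical helper.

```python
import csv
from pathlib import Path


def rows_in_state(
    csv_path: Path, state: str, state_column: str = "state"
) -> list[dict[str, str]]:
    """Return rows whose state column equals `state`.

    ASSUMPTION: the CSV has a header row containing `state_column`.
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return [
            row
            for row in csv.DictReader(handle)
            # Missing or short rows yield None; treat them as empty.
            if (row.get(state_column) or "").strip() == state
        ]
```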

✨ Core Capabilities

Need	Skill
Decide the next pipeline step	`research-workflow`
Collect candidates from web pages	`papers-collect-from-web`
Collect candidates from GitHub paper lists	`papers-collect-from-github-repo`
Download PDFs from a triage list	`papers-download-from-list`
Generate a deep single-paper report	`paper-report`
Rebuild the local index	`papers-build-index`
Query or compare papers from local notes	`papers-query-knowledge-base`
Generate grounded research ideas	`research-brainstorm-from-kb`
Focus an idea into an executable plan	`idea-focus-coach`
Run reviewer-style stress tests	`reviewer-stress-test`
Export share-ready Markdown	`notes-export-share-version`

See .claude/skills/README.md for the full skill map.

🤖 Agent Compatibility

ResearchFlow intentionally stays plain: folders, Markdown, JSONL, CSV, and SKILL.md. The same research memory can therefore be shared by multiple agents:

Claude Code / Cursor can read .claude/skills directly.
Codex CLI can use scripts/setup_shared_skills.py to generate local aliases.
Other agents can read obsidian-vault/index/index.jsonl and obsidian-vault/analysis/ directly.
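
Because the index is plain JSONL, any agent or script can read it with the standard library. The sketch below assumes one JSON object per line. The field names used for filtering, such as `year`, are illustrative assumptions, so check a few lines of your own index.jsonl for the actual keys. `iter_index` and `filter_records` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def iter_index(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one record per non-empty JSONL line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def filter_records(
    records: Iterable[dict[str, Any]], **wanted: Any
) -> list[dict[str, Any]]:
    """Keep records whose fields equal every requested value."""
    return [r for r in records if all(r.get(k) == v for k, v in wanted.items())]


if __name__ == "__main__":
    index = Path("obsidian-vault/index/index.jsonl")
    # `year` is an assumed field name, used only for illustration.
    for record in filter_records(iter_index(index), year=2024):
        print(record)
```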

Advanced Config

<a id="codex-cli-compat"></a>

Codex CLI compatibility

Claude Code and Cursor do not need this step; Codex CLI does.

python3 scripts/setup_shared_skills.py
python3 scripts/setup_shared_skills.py --check

<a id="obsidian-config"></a>

Obsidian setup

Obsidian is optional but recommended as a visualization layer.
Open obsidian-vault/ as an Obsidian vault if you want graph view, backlinks, and manual browsing.
Do not treat Obsidian pages as a separate source of truth.

Citation

@misc{lin2026researchflow,
  title        = {{ResearchFlow}: A Structured Paper Analysis Framework for Knowledge-Grounded Research},
  author       = {Jingzhong Lin and Ziheng Huang},
  year         = {2026},
  howpublished = {\url{https://github.com/RipeMangoBox/ResearchFlow}},
  note         = {GitHub repository}
}

License

MIT
