Self-hosted LLM model merging, entirely from your browser.
Combine open-weight language models without writing a single line of code.
Profile your hardware, pick your models, hit merge, download the result.
No GPU? CPU-only mode is a first-class citizen.
What is MergeForge? · Features · Installation · Quick Start · API Reference · Roadmap
The top open-weight models on the HuggingFace leaderboards are merges of merges of merges, but the tooling that makes that possible has always lived at the Python power-user level. Notebooks, YAML files, command-line incantations. MergeForge tears down that wall.
It's a production-grade, self-hosted web application that wraps the mergekit library and adds everything it's missing: a real UI, hardware awareness, real-time progress, automatic quality evaluation, GGUF compression, and multi-user rate limiting. You open a browser, pick models, click merge, and walk away.
Blend a coding model + a reasoning model → download a 3× smaller GGUF, ready for llama.cpp
Every other option asks you to discover problems the hard way. MergeForge tells you before you start.
| The pain | What other tools do | What MergeForge does |
|---|---|---|
| "Do I have enough RAM?" | You find out 30 min into a crash. | Hardware profiled at boot; impossible merges are hidden before you start. |
| "Will this take 5 min or 5 hours?" | Vague README guesses. | Honest ETA per tier based on your actual CPU/GPU/RAM. |
| A 7B merge silently hangs | The process freezes; you SSH in to investigate. | Stall watchdog + auto-retry + cache cleanup: fails loud with a real error, never hangs. |
| "Is the merged model any good?" | You manually prompt-test it and eyeball the results. | Automatic perplexity-based quality score (0-100) + 3 inference probes on every completed merge. |
| A 14 GB merged model to share | You upload raw safetensors and hope. | Built-in GGUF Q4_K_M export via llama.cpp, typically 3× smaller, works with llama.cpp / ollama / LM Studio. |
| One person abusing a shared server | Single-user notebook. | Tier-based daily rate limits (free / pro / enterprise) with admin override. |
- Wraps mergekit, the de facto open-source LLM merging library
- Merge methods: `linear`, `slerp`, `ties`, `dare_ties`, `passthrough`
- Arbitrary per-source weight and density configuration
- Hard timeout per attempt (stall watchdog) + 2-hour absolute cap
- Auto-retry on transient failures (lazy_unpickle bug, OOM, partial downloads)
- HF cache hygiene: orphaned `.incomplete` files swept before and after every job
- Perplexity score computed on a built-in validation corpus
- 3 inference probes for coherence checks
- `quality_score` (0-100) with human-readable summary stored on every job
- Colour-coded in the UI: 🟢 > 80 · 🟡 > 60 · 🔴 < 60
- Automatic GGUF Q4_K_M export via `llama.cpp` convert + quantize after every successful merge
- Both formats (SafeTensors + GGUF) downloadable from one page
- Failure-safe: a broken GGUF export never blocks or invalidates the merge itself
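The colour bands listed for the quality score amount to a small threshold function. The following is a minimal sketch rather than MergeForge's actual code: `quality_colour` is a hypothetical helper name, the grey state for a missing score is an assumption, and so is treating a score of exactly 60 as red (the list above only gives `> 60` and `< 60`).

```python
from typing import Optional


def quality_colour(score: Optional[float]) -> str:
    """Map a 0-100 quality score to a traffic-light colour name."""
    if score is None:
        # Assumption: no score yet, or the evaluation failed.
        return "grey"
    if score > 80:
        return "green"
    if score > 60:
        return "yellow"
    # Assumption: exactly 60 falls into the red band.
    return "red"


if __name__ == "__main__":
    for value in (92.0, 79.5, 41.0, None):
        print(value, "->", quality_colour(value))
```

Under these assumptions the sample score of 79.5 from the pipeline test would land in the yellow band.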
| Tier | Daily merges | Intended for |
|---|---|---|
| `free` | 3 / day | Hobbyist, evaluation |
| `pro` | 20 / day | Power user, small team |
| `enterprise` | Unlimited | Organisation-wide deployment |
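To make the tier table concrete, here is a rough sketch of a daily counter keyed by user and UTC date, so that the count rolls over at UTC midnight. It is an illustration under stated assumptions, not the project's implementation: `DailyLimiter` and `try_consume` are hypothetical names, and the counts live in memory here, whereas the real backend presumably persists them in MongoDB.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# Limits taken from the tier table; None means unlimited.
DAILY_LIMITS: Dict[str, Optional[int]] = {"free": 3, "pro": 20, "enterprise": None}


class DailyLimiter:
    """In-memory merge counter keyed by (user_id, UTC date). Hypothetical helper."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str], int] = {}

    def try_consume(self, user_id: str, tier: str,
                    now: Optional[datetime] = None) -> bool:
        """Record one merge and return True, or return False if the limit is hit."""
        now = now or datetime.now(timezone.utc)
        # Keying on the UTC date is what makes the counter reset at UTC midnight.
        day = now.astimezone(timezone.utc).date().isoformat()
        limit = DAILY_LIMITS[tier]
        used = self._counts.get((user_id, day), 0)
        if limit is not None and used >= limit:
            return False
        self._counts[(user_id, day)] = used + 1
        return True


if __name__ == "__main__":
    limiter = DailyLimiter()
    print([limiter.try_consume("alice", "free") for _ in range(4)])
```

Because the key uses the UTC date, a user in a timezone ahead of UTC can hit yesterday's limit early in their local morning.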
- Token-based auth via a 30-word mnemonic: no passwords, no email required
- Admin endpoint to change any user's tier via `X-Admin-Secret` header
- Daily counters reset at UTC midnight
- Auto-detects CPU cores, RAM, swap, VRAM (via nvidia-smi), and free disk
- Maps the machine to one of four tiers (CPU-only → Ultra-scale)
- Incompatible models are hidden in the catalog with a clear explanation, not surfaced as runtime errors
- Live resource monitor on the dashboard
- Top 10 merges ranked by automated quality score
- Per-job `is_public` toggle (private by default)
- No auth required to read, ideal for community discovery
- All subprocesses run in their own process group, so they are cancellable from the UI
- Background async worker keeps the API responsive while a merge runs
- Restart-resilient: queued jobs re-enqueued on boot; stale "running" states automatically marked failed
- CORS, env-variable-driven config, MongoDB indexes on all hot paths
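The hardware auto-detection mentioned in the list above can be approximated with the standard library plus a call to `nvidia-smi`. This is a hedged sketch, not the project's profiler: `probe`, `detect_vram_mib` and `parse_vram_mib` are hypothetical names, RAM and swap are left out because the standard library has no portable call for them, and the mapping from these numbers to the four tiers is not described here, so it is not attempted.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def parse_vram_mib(nvidia_smi_output: str) -> int:
    """Sum per-GPU totals (MiB), one integer per line of nvidia-smi CSV output."""
    return sum(int(line) for line in nvidia_smi_output.splitlines() if line.strip())


def detect_vram_mib() -> Optional[int]:
    """Return total VRAM in MiB, or None when nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # CPU-only host
    try:
        out = subprocess.run(
            [exe, "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
        return parse_vram_mib(out)
    except (subprocess.SubprocessError, ValueError, OSError):
        return None


def probe(path: str = ".") -> Dict[str, Optional[int]]:
    """Collect the facts a profiler could base its tier decision on."""
    return {
        "cpu_cores": os.cpu_count(),
        "free_disk_bytes": shutil.disk_usage(path).free,
        "vram_mib": detect_vram_mib(),
    }


if __name__ == "__main__":
    print(probe())
```

Returning `None` for VRAM instead of raising keeps the CPU-only path a normal result rather than an error case.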
| Capability | MergeForge | mergekit CLI | LM Studio | Axolotl | HF AutoTrain |
|---|---|---|---|---|---|
| Browser UI for merging | β | β | β | β | |
| Hardware-aware model filtering | β | β | β | β | |
| Real-time progress + stall watchdog | β | β | β | β | |
| Auto perplexity scoring | β | β | β | β | β |
| Auto GGUF Q4_K_M export | β | β | β | β | β |
| Multi-user with rate limits | β | β | β | β | β cloud |
| Cancellable jobs from UI | β | β | β | β | β |
| Self-hosted, MIT licensed | β | β | β | β | β |
| CPU-only first-class support | β | β | β | β | β |
The short version: mergekit is the engine. MergeForge is the UI, multi-tenancy, quality scoring, compression, and deployment layer on top.
+--------------------------------------------------------------+
|                      BROWSER (React 18)                      |
|   Landing · Auth · Dashboard · Models · Create · Jobs ·      |
|   Job Detail · Hardware · Leaderboard                        |
+------------------------------+-------------------------------+
                               | REST (Bearer token)
+------------------------------v-------------------------------+
|                 FASTAPI BACKEND (port 8001)                  |
|   Auth · Catalog · Validation · Job Queue · Rate Limiting ·  |
|   Worker · Quality Eval · GGUF Export · Admin · Leaderboard  |
+------+---------------------------+------------------+--------+
       |                           |                  |
       v                           v                  v
  +---------+               +--------------+   +-------------+
  | MongoDB |               |   mergekit   |   |  llama.cpp  |
  | users,  |               | (subprocess) |   |  convert +  |
  | jobs    |               +------+-------+   |  quantize   |
  +---------+                      |           +-------------+
                            +------v------+
                            |  HF Cache   |
                            | (workspace) |
                            +-------------+
How a merge flows through the system:
- `POST /api/merge/create`: validate inputs, check rate limit, enqueue job
- Async worker picks the job and writes a mergekit YAML config
- Each HF model is downloaded sequentially under the stall watchdog
- `mergekit-yaml` subprocess runs; stdout is streamed in real time to MongoDB logs
- On success: perplexity eval → GGUF convert + quantize run as background tasks
- HF cache is wiped, output is packaged as a tar, download links become available
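The subprocess steps in this flow rely on two ideas: the child runs in its own process group, and it is killed if it stops producing output. The sketch below shows one way to combine them on a POSIX host. It is an assumption-laden illustration, not MergeForge's worker: `run_with_stall_watchdog` is a hypothetical name, lines are collected in a list instead of being written to MongoDB, and the retry and absolute-cap logic are omitted.

```python
import os
import queue
import signal
import subprocess
import sys
import threading
from typing import List, Tuple


def run_with_stall_watchdog(cmd: List[str],
                            stall_seconds: float) -> Tuple[int, List[str], bool]:
    """Run cmd and collect its output lines.

    If no line arrives for stall_seconds, kill the whole process group.
    Returns (exit_code, lines, stalled).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
        start_new_session=True,  # child becomes leader of its own process group
    )
    lines_q: "queue.Queue" = queue.Queue()
    done = object()  # sentinel marking end of output

    def pump() -> None:
        for line in proc.stdout:
            lines_q.put(line.rstrip("\n"))
        lines_q.put(done)

    threading.Thread(target=pump, daemon=True).start()
    lines: List[str] = []
    stalled = False
    while True:
        try:
            item = lines_q.get(timeout=stall_seconds)
        except queue.Empty:
            stalled = True
            try:
                # The group id equals the child's pid because of start_new_session.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process went away on its own
            break
        if item is done:
            break
        lines.append(item)
    return proc.wait(), lines, stalled


if __name__ == "__main__":
    print(run_with_stall_watchdog([sys.executable, "-c", "print('merging...')"], 5.0))
```

Killing the process group rather than the single pid is what lets a cancel from the UI also take down any helpers the merge process spawned.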
# System packages
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3-pip git \
build-essential cmake nodejs mongodb-org curl
# Yarn (frontend package manager)
sudo npm install -g yarn
# Clone
git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge
# Backend
python3.11 -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt
# Frontend
cd frontend && yarn install && yarn build && cd ..
# MongoDB
sudo systemctl enable --now mongod
# Config
cp backend/.env.example backend/.env
# Edit backend/.env: set your paths and change ADMIN_SECRET
# Start (two terminals)
cd backend && ../venv/bin/python -m uvicorn server:app --host 0.0.0.0 --port 8001
cd frontend && yarn preview --host 0.0.0.0 --port 7070
Open http://localhost:7070
brew tap mongodb/brew  # mongodb-community is distributed through MongoDB's own tap
brew install python@3.11 node yarn mongodb-community cmake git
brew services start mongodb-community
git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge
python3.11 -m venv venv && source venv/bin/activate
pip install -r backend/requirements.txt
cd frontend && yarn install && yarn build && cd ..
cp backend/.env.example backend/.env
cd backend && ../venv/bin/python -m uvicorn server:app --host 0.0.0.0 --port 8001 &
cd frontend && yarn preview --host 0.0.0.0 --port 7070
Apple Silicon: mergekit uses PyTorch, which supports Apple's MPS backend. On M-series hardware the profiler will classify you as Tier 2 and unlock 30B merges.
Kali is Debian-based, so the Ubuntu steps apply. The only difference is that you need to add MongoDB's official apt repo manually first:
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
echo "deb [signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg] \
https://repo.mongodb.org/apt/debian bookworm/mongodb-org/7.0 main" | \
sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt-get update && sudo apt-get install -y mongodb-org python3.11 \
python3.11-venv nodejs yarn build-essential cmake git
sudo systemctl enable --now mongod
# Then follow Ubuntu steps from "Clone" onward.
sudo pacman -Syu --needed python python-pip nodejs yarn \
mongodb-bin cmake base-devel git
sudo systemctl enable --now mongodb
git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge
python -m venv venv && source venv/bin/activate
pip install -r backend/requirements.txt
cd frontend && yarn install && yarn build && cd ..
cp backend/.env.example backend/.env
# Then start backend + frontend (see Ubuntu step 8).
git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge
docker compose up -d
Spins up three containers: mongo, mergeforge-backend (port 8001), and mergeforge-frontend (port 7070). Persistent volumes for the HF cache and merge output are configured by default.
Give the backend container as much CPU and RAM as you can. 8 GB minimum for 1B-scale merges; 24 GB+ recommended for 7B-scale.
- Visit `http://localhost:7070`
- Click Generate signup token and save the 30-word phrase
- Open Model Catalog and pick two models marked compatible with your hardware
- Open New Merge, choose a method (start with `linear`), set weights, click Create
- Watch real-time logs in Merge Jobs → [your job]
- On completion you'll see the quality score, the SafeTensors download, and (after ~30 s) the GGUF download
- Toggle Public to enter your merge on the leaderboard at `/leaderboard`
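Choosing `linear` and setting weights in the UI ultimately has to become a mergekit YAML config. As a rough illustration of what such a file looks like, the sketch below renders one by hand. The keys (`merge_method`, `dtype`, `models`, `parameters.weight`) follow mergekit's documented config format; `linear_merge_yaml` is a hypothetical helper, the model names are placeholders, and the file MergeForge actually writes may contain more fields.

```python
from typing import Dict


def linear_merge_yaml(weights: Dict[str, float], dtype: str = "float16") -> str:
    """Render a mergekit-style config for a `linear` merge as YAML text."""
    if len(weights) < 2:
        raise ValueError("a merge needs at least two source models")
    lines = ["merge_method: linear", f"dtype: {dtype}", "models:"]
    for model, weight in weights.items():
        lines += [
            f"  - model: {model}",
            "    parameters:",
            f"      weight: {weight}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder repository ids, not real models.
    print(linear_merge_yaml({"org/coder-1b": 0.6, "org/reasoner-1b": 0.4}))
```

Building the text by hand keeps the sketch dependency-free; a real backend would more likely serialise a dict with a YAML library.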
backend/.env
MONGO_URL=mongodb://127.0.0.1:27017
DB_NAME=mergeforge
BACKEND_PORT=8001
# Point these at a disk with plenty of space β models are large
WORKSPACE_DIR=/var/lib/mergeforge/workspace
HF_CACHE_DIR=/var/lib/mergeforge/workspace/hf_cache
# Public-facing frontend URL (used for CORS and share links)
PUBLIC_BASE_URL=http://localhost:7070
# Change this to a random string before exposing the server to a network
ADMIN_SECRET=please-generate-a-random-string
# Optional: HuggingFace token for gated models
# HF_TOKEN=hf_xxx
frontend/.env
VITE_BACKEND_URL=http://localhost:8001

| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/api/auth/signup` | None | Create user, receive 30-word token |
| `POST` | `/api/auth/login` | None | Login with token |
| `GET` | `/api/auth/me` | Bearer | Current user including tier |
| `GET` | `/api/usage/today` | Bearer | Daily merge usage and limit |
| `GET` | `/api/models` | None | Catalog filtered by host hardware |
| `POST` | `/api/merge/validate` | Bearer | Dry-run validation + ETA |
| `POST` | `/api/merge/create` | Bearer | Enqueue a merge (rate-limited) |
| `GET` | `/api/merge/jobs` | Bearer | List your jobs |
| `GET` | `/api/merge/jobs/{id}` | Bearer | Job detail including quality score and GGUF status |
| `POST` | `/api/merge/jobs/{id}/cancel` | Bearer | Kill a running merge |
| `PATCH` | `/api/merge/jobs/{id}/visibility` | Bearer | Toggle public / private |
| `GET` | `/api/merge/jobs/{id}/download` | Token in query | Stream SafeTensors tar |
| `GET` | `/api/merge/jobs/{id}/download/gguf` | Token in query | Stream GGUF Q4_K_M |
| `GET` | `/api/leaderboard` | None | Top 10 public merges (no auth required) |
| `POST` | `/api/admin/tier` | `X-Admin-Secret` | Set a user's tier |
| `GET` | `/api/hardware/profile` | None | Static hardware tier |
| `GET` | `/api/hardware/live` | None | Live CPU / RAM / disk |
| `GET` | `/api/dashboard/stats` | Bearer | All-in-one dashboard payload |
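For scripting against these endpoints, a small standard-library client is enough. The sketch below only covers how a request is assembled; `build_request` and `call` are hypothetical helpers, the base URL assumes the default backend port, and the request and response body shapes are not documented in this table, so none are assumed.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8001"  # assumption: default BACKEND_PORT


def build_request(method: str, path: str, token: Optional[str] = None,
                  body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Assemble a JSON request, adding a Bearer header when a token is given."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response (raises on HTTP errors)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # No auth needed for the leaderboard; requires a running backend.
    print(call(build_request("GET", "/api/leaderboard")))
```

With a token in hand, `call(build_request("GET", "/api/usage/today", token=my_token))` would be the equivalent authenticated call, where `my_token` is whatever value the signup or login step returned.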
The repo ships a self-contained pipeline test that exercises the full happy path against two tiny (<200 MB) models:
cd MergeForge
venv/bin/python backend/test_pipeline.py
Expected output:
[PASS] signup creates token
[PASS] create merge accepts request
[PASS] merge job reaches terminal state in time :: final=completed
[PASS] merge job completed successfully
[PASS] output directory exists
[PASS] download endpoint returns >1MB file :: bytes=271656960
[PASS] quality_score is computed :: score=79.5 summary=Good - perplexity 33.86, 3/3 inference tests passed
=== 7 passed, 0 failed ===
Exit code 0 on full pass, so you can drop it straight into CI.
| Symptom | Likely cause | Fix |
|---|---|---|
| `ModuleNotFoundError` on backend start | Venv not activated | Run `source venv/bin/activate` first |
| Frontend shows a blank page | `VITE_BACKEND_URL` wrong or CORS mismatch | Check `frontend/.env`, rebuild with `yarn build` |
| Merge stuck at 30% on 7B models | Disk pressure or OOM | Free disk space, increase swap, retry; the watchdog logs the exact cause |
| GGUF column shows "converting…" indefinitely | `llama.cpp` build failed | Install `cmake` + `build-essential`, then re-run the merge |
| Quality score never appears | Eval subprocess ran out of memory | Use a machine with 4 GB+ free RAM; `null` defaults to "Eval failed" |
| `401 Invalid token` on every request | Token expired or cleared | Re-login at `/auth` |
| Daily limit error on your first merge | UTC midnight rollover | The counter is UTC-based, not local time |

Still stuck? Check `logs/backend.err.log`; every subprocess line is mirrored there.
- WebSocket-based live log streaming (replace polling)
- Additional merge methods: `dare_linear`, `model_stock`, `breadcrumbs`
- Direct push of merged model to HuggingFace Hub
- Multi-host distributed merging (split layers across nodes)
- In-browser chat playground to test merged models without downloading
- Stripe-backed paid tiers
- Community ratings and comments on the leaderboard
PRs welcome. Before submitting:
- Run `backend/test_pipeline.py`: all 7 tests must pass
- Keep components small and readable
- Do not break the CPU-only Tier 1 path; it's the whole point
MIT: do whatever you want with it.
Built with 🔥 for the open-weight model community.
If MergeForge helped you ship a banger, drop a star ⭐