vikrant-project/MergeForge

🔥 MergeForge v2

Self-hosted LLM model merging, entirely from your browser.

Combine open-weight language models without writing a single line of code.
Profile your hardware, pick your models, hit merge, download the result.
No GPU? CPU-only mode is a first-class citizen.

License: MIT · Python 3.11+ · React 18 · FastAPI · MongoDB

What is MergeForge? · Features · Installation · Quick Start · API Reference · Roadmap


What is MergeForge?

The top open-weight models on the HuggingFace leaderboards are merges of merges of merges, but the tooling that makes that possible has always lived at the Python power-user level: notebooks, YAML files, command-line incantations. MergeForge tears down that wall.

It's a production-grade, self-hosted web application that wraps the mergekit library and adds everything it's missing: a real UI, hardware awareness, real-time progress, automatic quality evaluation, GGUF compression, and multi-user rate limiting. You open a browser, pick models, click merge, and walk away.

Blend a coding model + a reasoning model → download a 3× smaller GGUF, ready for llama.cpp

Why MergeForge?

Every other option asks you to discover problems the hard way. MergeForge tells you before you start.

| The pain | What other tools do | What MergeForge does |
| --- | --- | --- |
| "Do I have enough RAM?" | You find out 30 min into a crash. | Hardware profiled at boot: impossible merges are hidden before you start. |
| "Will this take 5 min or 5 hours?" | Vague README guesses. | Honest ETA per tier based on your actual CPU/GPU/RAM. |
| A 7B merge silently hangs | The process freezes; you SSH in to investigate. | Stall watchdog + auto-retry + cache cleanup: fails loud with a real error, never hangs. |
| "Is the merged model any good?" | You manually prompt-test it and eyeball the results. | Automatic perplexity-based quality score (0–100) + 3 inference probes on every completed merge. |
| A 14 GB merged model to share | You upload raw safetensors and hope. | Built-in GGUF Q4_K_M export via llama.cpp: typically 3× smaller, works with llama.cpp / ollama / LM Studio. |
| One person abusing a shared server | Single-user notebook. | Tier-based daily rate limits (free / pro / enterprise) with admin override. |

Features

Merging Engine

  • Wraps mergekit, the de facto open-source LLM merging library
  • Merge methods: linear, slerp, ties, dare_ties, passthrough
  • Arbitrary per-source weight and density configuration
  • Hard timeout per attempt (stall watchdog) + 2-hour absolute cap
  • Auto-retry on transient failures (lazy_unpickle bug, OOM, partial downloads)
  • HF cache hygiene: orphaned .incomplete files swept before and after every job
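
The stall watchdog and auto-retry bullets describe a common pattern. The sketch below shows one way it could work, not MergeForge's actual implementation: `run_with_watchdog`, `run_with_retry`, and the default timeouts are hypothetical names and values (only the 2-hour cap comes from the list above). It assumes a POSIX host, because it kills the child's process group.

```python
import os
import signal
import subprocess
import threading
import time


def run_with_watchdog(cmd, stall_timeout=300.0, hard_cap=7200.0, poll=0.1):
    """Run cmd; kill it if it is silent for stall_timeout seconds or runs past hard_cap."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child becomes leader of its own process group
    )
    lines = []
    last_output = [time.monotonic()]

    def pump():
        # Every line of output counts as a sign of life.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    started = time.monotonic()
    reason = None
    while proc.poll() is None:
        now = time.monotonic()
        if now - last_output[0] > stall_timeout:
            reason = "stalled"
        elif now - started > hard_cap:
            reason = "hard cap exceeded"
        if reason:
            try:
                os.killpg(proc.pid, signal.SIGKILL)  # kill the whole group
            except ProcessLookupError:
                pass  # the process exited on its own in the meantime
            break
        time.sleep(poll)
    proc.wait()
    reader.join(timeout=1.0)
    return proc.returncode, reason, lines


def run_with_retry(cmd, attempts=2, **watchdog_kwargs):
    """Re-run cmd until it exits with code 0 or the attempts are used up."""
    result = None
    for _ in range(attempts):
        result = run_with_watchdog(cmd, **watchdog_kwargs)
        if result[0] == 0:
            break
    return result
```

Because the child leads its own process group, the same `os.killpg` call could also back a cancel button: any helper processes the merge spawns are terminated with it.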

Post-Merge Quality Evaluation

  • Perplexity score computed on a built-in validation corpus
  • 3 inference probes for coherence checks
  • quality_score (0–100) with human-readable summary stored on every job
  • Colour-coded in the UI: 🟢 > 80 · 🟡 > 60 · 🔴 < 60
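
For readers unfamiliar with perplexity, here is a minimal sketch of the standard definition (the exponential of the mean negative log-likelihood per token) together with the colour bands listed above. How MergeForge converts perplexity into `quality_score` is not documented here, so that step is deliberately left out. The function names are hypothetical, and treating a score of exactly 60 as red is an assumption, since the list does not say which band it falls in.

```python
import math


def perplexity(token_logprobs):
    """Standard perplexity: exp(-mean(log p(token))), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def colour_band(quality_score):
    """Map a 0-100 quality score to the UI colour bands.

    Assumption: a score of exactly 60 (or exactly 80) falls into the
    lower band, because the documented thresholds are strict ('> 80', '> 60').
    """
    if quality_score > 80:
        return "green"
    if quality_score > 60:
        return "yellow"
    return "red"
```

A model that assigns probability 0.5 to every token has a perplexity of 2; lower is better.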

GGUF Compression

  • Automatic GGUF Q4_K_M export via llama.cpp convert + quantize after every successful merge
  • Both formats (SafeTensors + GGUF) downloadable from one page
  • Failure-safe: a broken GGUF export never blocks or invalidates the merge itself
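
As a rough illustration of the convert + quantize step, the sketch below builds the two llama.cpp commands and runs them in a failure-safe wrapper. It assumes a recent llama.cpp checkout, where the converter script is `convert_hf_to_gguf.py` and the quantizer binary is `llama-quantize` under `build/bin` (older releases used different names). The function names and output file names are hypothetical, and this is not necessarily how MergeForge invokes llama.cpp.

```python
import subprocess
import sys
from pathlib import Path


def gguf_commands(model_dir, llama_cpp_dir, out_dir):
    """Return the two command lines plus the final Q4_K_M path (hypothetical layout)."""
    model_dir, llama_cpp_dir, out_dir = Path(model_dir), Path(llama_cpp_dir), Path(out_dir)
    f16_path = out_dir / "model-f16.gguf"
    q4_path = out_dir / "model-Q4_K_M.gguf"
    convert = [
        sys.executable, str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(model_dir), "--outfile", str(f16_path), "--outtype", "f16",
    ]
    quantize = [
        str(llama_cpp_dir / "build" / "bin" / "llama-quantize"),
        str(f16_path), str(q4_path), "Q4_K_M",
    ]
    return [convert, quantize], q4_path


def export_gguf(model_dir, llama_cpp_dir, out_dir, runner=subprocess.run):
    """Run both steps; return the GGUF path, or None if either step fails.

    Returning None instead of raising keeps a broken export from
    affecting the merge result itself.
    """
    commands, q4_path = gguf_commands(model_dir, llama_cpp_dir, out_dir)
    try:
        for cmd in commands:
            runner(cmd, check=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return q4_path
```

The `runner` parameter exists only so the wrapper can be exercised without a llama.cpp build.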

Tier-Based Access Control

| Tier | Daily merges | Intended for |
| --- | --- | --- |
| free | 3 / day | Hobbyist, evaluation |
| pro | 20 / day | Power user, small team |
| enterprise | Unlimited | Organisation-wide deployment |

  • Token-based auth via a 30-word mnemonic: no passwords, no email required
  • Admin endpoint to change any user's tier via X-Admin-Secret header
  • Daily counters reset at UTC midnight
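
The daily limits and the UTC-midnight reset can be sketched with a counter keyed by user and UTC date. This is an in-memory illustration only. MergeForge stores its data in MongoDB, and the `DailyLimiter` class and its method names are hypothetical.

```python
from datetime import datetime, timezone

# Limits from the tier table; None means unlimited.
DAILY_LIMITS = {"free": 3, "pro": 20, "enterprise": None}


def utc_day(now=None):
    """Return the current UTC date as YYYY-MM-DD (expects a timezone-aware datetime)."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")


class DailyLimiter:
    """Counts merges per (user, UTC day); a new UTC day starts a fresh counter."""

    def __init__(self):
        self._counts = {}

    def try_consume(self, user_id, tier, now=None):
        limit = DAILY_LIMITS[tier]
        key = (user_id, utc_day(now))
        used = self._counts.get(key, 0)
        if limit is not None and used >= limit:
            return False  # over today's quota
        self._counts[key] = used + 1
        return True
```

Keying on the UTC date means no explicit reset job is needed: the first request after 00:00 UTC simply lands on a new key.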

Hardware Awareness

  • Auto-detects CPU cores, RAM, swap, VRAM (via nvidia-smi), and free disk
  • Maps the machine to one of four tiers (CPU-only → Ultra-scale)
  • Incompatible models are hidden in the catalog with a clear explanation, not surfaced as runtime errors
  • Live resource monitor on the dashboard
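
A minimal, standard-library-only sketch of this kind of detection follows. It is an illustration under assumptions, not the project's profiler: the `detect_hardware` name and the returned keys are hypothetical, swap is omitted because the standard library has no portable call for it, and the thresholds that map a machine to a tier are not documented here, so no tier is computed.

```python
import os
import shutil
import subprocess


def detect_hardware(path="."):
    """Best-effort hardware summary; values that cannot be read stay None."""
    info = {
        "cpu_cores": os.cpu_count() or 1,
        "free_disk_gb": shutil.disk_usage(path).free / 1e9,
        "ram_gb": None,
        "vram_mb": None,
    }
    try:
        # Works on Linux and most Unix-likes; not available on Windows.
        info["ram_gb"] = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9
    except (AttributeError, ValueError, OSError):
        pass
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=memory.total",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True, timeout=10,
            ).stdout
            # One line per GPU, in MiB; sum across all GPUs.
            info["vram_mb"] = sum(int(value) for value in out.split())
        except (subprocess.SubprocessError, OSError, ValueError):
            pass
    return info
```

On a machine without an NVIDIA driver, `vram_mb` stays `None`, which a caller could treat as the CPU-only case.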

Public Leaderboard

  • Top 10 merges ranked by automated quality score
  • Per-job is_public toggle (private by default)
  • No auth required to read, ideal for community discovery

Everything Else

  • All subprocesses run in their own process group → cancellable from the UI
  • Background async worker keeps the API responsive while a merge runs
  • Restart-resilient: queued jobs re-enqueued on boot; stale "running" states automatically marked failed
  • CORS, env-variable-driven config, MongoDB indexes on all hot paths
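
The restart-resilience rule (queued jobs are re-enqueued; jobs left in "running" are marked failed) can be sketched over plain dictionaries. The field names `id`, `status`, and `error`, and the error message, are hypothetical. The real job documents live in MongoDB and their schema is not shown in this README.

```python
def recover_jobs(jobs):
    """Apply boot-time recovery to a list of job dicts.

    Returns the ids that should go back on the queue and marks any job
    still flagged 'running' as failed, since no worker survived the restart.
    """
    requeue = []
    for job in jobs:
        if job["status"] == "queued":
            requeue.append(job["id"])
        elif job["status"] == "running":
            job["status"] = "failed"
            job["error"] = "interrupted by server restart"
    return requeue
```

Jobs that already reached a terminal state are left untouched, so running the recovery twice has no further effect.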

Comparison

| Capability | MergeForge | mergekit CLI | LM Studio | Axolotl | HF AutoTrain |
| --- | --- | --- | --- | --- | --- |
| Browser UI for merging | ✅ | ❌ | ❌ | ❌ | ⚠️ paid |
| Hardware-aware model filtering | ✅ | ❌ | ⚠️ | ❌ | ❌ |
| Real-time progress + stall watchdog | ✅ | ❌ | n/a | ❌ | ⚠️ |
| Auto perplexity scoring | ✅ | ❌ | ❌ | ❌ | ❌ |
| Auto GGUF Q4_K_M export | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-user with rate limits | ✅ | ❌ | ❌ | ❌ | ✅ cloud |
| Cancellable jobs from UI | ✅ | ❌ | n/a | ❌ | ✅ |
| Self-hosted, MIT licensed | ✅ | ✅ | ❌ | ✅ | ❌ |
| CPU-only first-class support | ✅ | ✅ | ✅ | ❌ | ❌ |

The short version: mergekit is the engine. MergeForge is the UI, multi-tenancy, quality scoring, compression, and deployment layer on top.


Architecture

┌──────────────────────────────────────────────────────────────┐
│                     BROWSER  (React 18)                      │
│  Landing · Auth · Dashboard · Models · Create · Jobs ·       │
│  Job Detail · Hardware · Leaderboard                         │
└─────────────────────────────┬────────────────────────────────┘
                              │  REST  (Bearer token)
┌─────────────────────────────▼────────────────────────────────┐
│                 FASTAPI BACKEND  (port 8001)                 │
│  Auth · Catalog · Validation · Job Queue · Rate Limiting ·   │
│  Worker · Quality Eval · GGUF Export · Admin · Leaderboard   │
└──────┬───────────────────────────┬──────────────────┬────────┘
       │                           │                  │
       ▼                           ▼                  ▼
  ┌─────────┐              ┌──────────────┐   ┌─────────────┐
  │ MongoDB │              │   mergekit   │   │  llama.cpp  │
  │ users,  │              │ (subprocess) │   │ convert +   │
  │  jobs   │              └──────┬───────┘   │ quantize    │
  └─────────┘                     │           └─────────────┘
                            ┌─────▼───────┐
                            │  HF Cache   │
                            │ (workspace) │
                            └─────────────┘

How a merge flows through the system:

  1. POST /api/merge/create: validate inputs, check rate limit, enqueue job
  2. Async worker picks the job and writes a mergekit YAML config
  3. Each HF model is downloaded sequentially under the stall watchdog
  4. mergekit-yaml subprocess runs; stdout is streamed in real-time to MongoDB logs
  5. On success: perplexity eval → GGUF convert + quantize run as background tasks
  6. HF cache is wiped, output is packaged as a tar, download links become available
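
To make step 2 concrete, here is a hedged sketch of what a generated config for a `linear` merge might contain. The keys (`merge_method`, `models`, `parameters.weight`, `dtype`) follow mergekit's documented config format. The helper names and the model ids `org/model-a` and `org/model-b` are placeholders, and the config MergeForge actually writes may differ. The text is emitted as JSON, which YAML parsers accept as flow-style YAML, so the sketch needs no third-party YAML library.

```python
import json


def build_linear_config(sources, dtype="float16"):
    """Build a mergekit-style config for a weighted linear merge.

    sources: list of (hf_model_id, weight) pairs.
    """
    return {
        "merge_method": "linear",
        "models": [
            {"model": model_id, "parameters": {"weight": weight}}
            for model_id, weight in sources
        ],
        "dtype": dtype,
    }


def to_config_text(config):
    """Serialise as JSON, which is also valid flow-style YAML."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder model ids, for illustration only.
    cfg = build_linear_config([("org/model-a", 0.6), ("org/model-b", 0.4)])
    print(to_config_text(cfg))
```

Saved to a file, such a config is what `mergekit-yaml <config> <output_dir>` consumes in step 4.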

Installation

Ubuntu / Debian 20.04+

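# System packages
# Note: mongodb-org is not in the default Ubuntu/Debian repositories; add
# MongoDB's official apt repo first (the Kali Linux section shows the commands).
sudo apt-get update
sudo apt-get install -y python3.11 python3.11-venv python3-pip git \
                        build-essential cmake nodejs mongodb-org curl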

# Yarn (frontend package manager)
sudo npm install -g yarn

# Clone
git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge

# Backend
python3.11 -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt

# Frontend
cd frontend && yarn install && yarn build && cd ..

# MongoDB
sudo systemctl enable --now mongod

# Config
cp backend/.env.example backend/.env
# Edit backend/.env: set your paths and change ADMIN_SECRET

# Start (two terminals)
cd backend && ../venv/bin/python -m uvicorn server:app --host 0.0.0.0 --port 8001
cd frontend && yarn preview --host 0.0.0.0 --port 7070

Open http://localhost:7070 🎉


macOS (Ventura / Sonoma)

brew install python@3.11 node yarn mongodb-community cmake git
brew services start mongodb-community

git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge

python3.11 -m venv venv && source venv/bin/activate
pip install -r backend/requirements.txt

cd frontend && yarn install && yarn build && cd ..
cp backend/.env.example backend/.env

cd backend && ../venv/bin/python -m uvicorn server:app --host 0.0.0.0 --port 8001 &
cd frontend && yarn preview --host 0.0.0.0 --port 7070

Apple Silicon: mergekit uses PyTorch, which supports Apple's MPS backend, although operator coverage is not complete. On M-series hardware the profiler will classify you as Tier 2 and unlock 30B merges.


Kali Linux

Kali is Debian-based, so the Ubuntu steps apply. The only difference is that you need to add MongoDB's official apt repo manually first:

curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
  sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor

echo "deb [signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg] \
  https://repo.mongodb.org/apt/debian bookworm/mongodb-org/7.0 main" | \
  sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

sudo apt-get update && sudo apt-get install -y mongodb-org python3.11 \
  python3.11-venv nodejs yarn build-essential cmake git

sudo systemctl enable --now mongod
# Then follow Ubuntu steps from "Clone" onward.

Arch / Manjaro

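sudo pacman -Syu --needed python python-pip nodejs yarn \
                          cmake base-devel git
# mongodb-bin is an AUR package, so pacman alone cannot install it;
# use an AUR helper (yay is shown here as one example)
yay -S mongodb-bin
sudo systemctl enable --now mongodb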

git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge

python -m venv venv && source venv/bin/activate
pip install -r backend/requirements.txt

cd frontend && yarn install && yarn build && cd ..
cp backend/.env.example backend/.env
# Then start backend + frontend (see the "Start" step in the Ubuntu section).

Docker

git clone https://github.com/vikrant-project/MergeForge.git
cd MergeForge
docker compose up -d

Spins up three containers: mongo, mergeforge-backend (port 8001), and mergeforge-frontend (port 7070). Persistent volumes for the HF cache and merge output are configured by default.

Give the backend container as much CPU and RAM as you can. 8 GB minimum for 1B-scale merges; 24 GB+ recommended for 7B-scale.


Quick Start

  1. Visit http://localhost:7070
  2. Click Generate signup token → save the 30-word phrase
  3. Open Model Catalog → pick two models marked compatible with your hardware
  4. Open New Merge → choose a method (start with linear), set weights, click Create
  5. Watch real-time logs in Merge Jobs → [your job]
  6. On completion you'll see the quality score, the SafeTensors download, and (after ~30 s) the GGUF download
  7. Toggle Public to enter your merge on the leaderboard at /leaderboard

Configuration

backend/.env

MONGO_URL=mongodb://127.0.0.1:27017
DB_NAME=mergeforge
BACKEND_PORT=8001

# Point these at a disk with plenty of space; models are large
WORKSPACE_DIR=/var/lib/mergeforge/workspace
HF_CACHE_DIR=/var/lib/mergeforge/workspace/hf_cache

# Public-facing frontend URL (used for CORS and share links)
PUBLIC_BASE_URL=http://localhost:7070

# Change this to a random string before exposing the server to a network
ADMIN_SECRET=please-generate-a-random-string

# Optional: HuggingFace token for gated models
# HF_TOKEN=hf_xxx
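
One straightforward way to produce a random ADMIN_SECRET is Python's standard `secrets` module. The helper name `new_admin_secret` is hypothetical, and any sufficiently long random string works just as well.

```python
import secrets


def new_admin_secret(nbytes=32):
    """Return a URL-safe random string built from nbytes of OS randomness."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed line into backend/.env
    print(f"ADMIN_SECRET={new_admin_secret()}")
```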

frontend/.env

VITE_BACKEND_URL=http://localhost:8001

API Reference

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/auth/signup | None | Create user, receive 30-word token |
| POST | /api/auth/login | None | Login with token |
| GET | /api/auth/me | Bearer | Current user including tier |
| GET | /api/usage/today | Bearer | Daily merge usage and limit |
| GET | /api/models | None | Catalog filtered by host hardware |
| POST | /api/merge/validate | Bearer | Dry-run validation + ETA |
| POST | /api/merge/create | Bearer | Enqueue a merge (rate-limited) |
| GET | /api/merge/jobs | Bearer | List your jobs |
| GET | /api/merge/jobs/{id} | Bearer | Job detail including quality score and GGUF status |
| POST | /api/merge/jobs/{id}/cancel | Bearer | Kill a running merge |
| PATCH | /api/merge/jobs/{id}/visibility | Bearer | Toggle public / private |
| GET | /api/merge/jobs/{id}/download | Token in query | Stream SafeTensors tar |
| GET | /api/merge/jobs/{id}/download/gguf | Token in query | Stream GGUF Q4_K_M |
| GET | /api/leaderboard | None | Top 10 public merges (no auth required) |
| POST | /api/admin/tier | X-Admin-Secret | Set a user's tier |
| GET | /api/hardware/profile | None | Static hardware tier |
| GET | /api/hardware/live | None | Live CPU / RAM / disk |
| GET | /api/dashboard/stats | Bearer | All-in-one dashboard payload |
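
The endpoints can be called from any HTTP client. The sketch below builds requests with only the standard library. The Bearer scheme and the `X-Admin-Secret` header come from the table; `BASE_URL`, `api_request`, and `send` are hypothetical helper names, and request and response body shapes are not documented here, so none are assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # backend address from the default config


def api_request(method, path, token=None, admin_secret=None, body=None):
    """Build (but do not send) a request for one of the endpoints above."""
    headers = {"Accept": "application/json"}
    data = None
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if admin_secret:
        headers["X-Admin-Secret"] = admin_secret
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(request, timeout=30):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(api_request("GET", "/api/leaderboard"))` needs no token, while `api_request("GET", "/api/auth/me", token=...)` attaches the Bearer header.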

Smoke Tests

The repo ships a self-contained pipeline test that exercises the full happy path against two tiny (<200 MB) models:

cd MergeForge
venv/bin/python backend/test_pipeline.py

Expected output:

[PASS] signup creates token
[PASS] create merge accepts request
[PASS] merge job reaches terminal state in time :: final=completed
[PASS] merge job completed successfully
[PASS] output directory exists
[PASS] download endpoint returns >1MB file :: bytes=271656960
[PASS] quality_score is computed :: score=79.5 summary=Good — perplexity 33.86, 3/3 inference tests passed

=== 7 passed, 0 failed ===

Exit code 0 on a full pass, so it can be dropped straight into CI.


Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| ModuleNotFoundError on backend start | Venv not activated | Run source venv/bin/activate first |
| Frontend shows a blank page | VITE_BACKEND_URL wrong or CORS mismatch | Check frontend/.env, rebuild with yarn build |
| Merge stuck at 30% on 7B models | Disk pressure or OOM | Free disk space, increase swap, retry; the watchdog logs the exact cause |
| GGUF column shows "converting…" indefinitely | llama.cpp build failed | Install cmake + build-essential, then re-run the merge |
| Quality score never appears | Eval subprocess ran out of memory | Use a machine with 4 GB+ free RAM; a null score is displayed as "Eval failed" |
| 401 Invalid token on every request | Token expired or cleared | Re-login at /auth |
| Daily limit error on your first merge | UTC midnight rollover | The counter is UTC-based, not local time |

Still stuck? Check logs/backend.err.log, where every subprocess line is mirrored.


Roadmap

  • WebSocket-based live log streaming (replace polling)
  • Additional merge methods: dare_linear, model_stock, breadcrumbs
  • Direct push of merged model to HuggingFace Hub
  • Multi-host distributed merging (split layers across nodes)
  • In-browser chat playground to test merged models without downloading
  • Stripe-backed paid tiers
  • Community ratings and comments on the leaderboard

Contributing

PRs welcome. Before submitting:

  1. Run backend/test_pipeline.py; all 7 tests must pass
  2. Keep components small and readable
  3. Do not break the CPU-only Tier 1 path; it's the whole point

License

MIT. Do whatever you want with it.


Built with 🔥 for the open-weight model community.

If MergeForge helped you ship a banger, drop a star ⭐
