Zero-config entity resolution & record linkage. The zero-tuning Fellegi-Sunter path beats hand-tuned Splink head-to-head and scales from a CSV to a verified 100M-row dedupe in 9.2 min. Fuzzy/exact/probabilistic + PPRL + LLM + identity graph. Python + edge-safe TypeScript (WASM), SQL-native in Postgres & DuckDB, MCP/REST + dbt/Airflow.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
This repository contains a Python codebase for cleaning and standardizing CSV data, with a specific focus on preparing the dataset for use with Splink.
Entify is an early open-source workspace for entity resolution, record linkage, and data deduplication. It helps teams profile messy datasets, configure Splink-powered matching workflows, and review explainable match clusters.
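Several of the projects above build on Fellegi-Sunter style probabilistic linkage, the model Splink is based on. As a rough illustration of the idea only (not code from any of these repositories), the sketch below turns per-field agreement into a match probability. The field names, the m and u probabilities, and the prior are all hypothetical values chosen for the example; real tools estimate them from data.

```python
import math

# Hypothetical parameters, for illustration only:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "first_name": (0.9, 0.01),
    "dob": (0.95, 0.001),
    "city": (0.8, 0.1),
}


def match_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 Bayes factor contributed by one compared field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_probability(prior: float, weights: list[float]) -> float:
    """Combine a prior match probability with the summed field weights."""
    prior_weight = math.log2(prior / (1 - prior))
    odds = 2 ** (prior_weight + sum(weights))
    return odds / (1 + odds)


def score(agreement: dict[str, bool], prior: float = 0.001) -> float:
    """Score one record pair given which fields agree (assumed prior)."""
    weights = [
        match_weight(*FIELDS[field], agrees)
        for field, agrees in agreement.items()
    ]
    return match_probability(prior, weights)


if __name__ == "__main__":
    pair = {"first_name": True, "dob": True, "city": False}
    print(f"match probability: {score(pair):.4f}")
```

Agreement on a field that rarely agrees by chance (low u) adds a large positive weight, while disagreement on a usually reliable field (high m) subtracts weight. This sketch assumes fields are conditionally independent, which is the standard simplifying assumption of the model.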