TalkToMyExcel is a self-hostable Flask app for asking questions over Excel, CSV, TSV, Parquet, and other tabular business exports. It uses a hybrid engine:
- DuckDB for exact lookups, filters, counts, and aggregations.
- Chroma for semantic search on the text columns selected during import.
- Regolo.ai as the recommended OpenAI-compatible LLM and embedding provider.
- Docker sandboxing for file profiling, extraction, and generated Python analysis.
The project is MIT licensed.
The static public website lives in public-site and is configured for:
https://talktomyexcel.ercoliconsulting.eu/
It follows the same bilingual structure used by the RAGuardian public site:
- public-site/index.html redirects visitors to Italian or English.
- public-site/it/ contains the Italian pages.
- public-site/en/ contains the English pages.
- public-site/css/ and public-site/assets/ contain shared styling and media.
Requirements:
- Python 3.11+
- Docker
- A Regolo.ai API key, unless you switch to local embeddings and another chat provider
git clone <your-fork-url> TalkToMyExcel
cd TalkToMyExcel
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
cp .env.example .env
# edit .env and set REGOLO_API_KEY
docker build -f Dockerfile.sandbox -t talktomyexcel-sandbox:latest .
python -m app.app

The sandbox image is required by the running app. It is used to profile uploaded files and to run generated Python analysis in a short-lived Docker container, separate from the Flask server process.
The standard build above starts from python:3.11-slim and installs the sandbox
dependencies declared in Dockerfile.sandbox (pandas, openpyxl, xlrd, and
pyarrow). Use this command unless you have a specific reason to reuse an
already prepared base image:
docker build -f Dockerfile.sandbox -t talktomyexcel-sandbox:latest .

Advanced variant:
docker build -f Dockerfile.sandbox \
--build-arg SANDBOX_BASE=code-interpreter:latest \
--build-arg INSTALL_SANDBOX_DEPS=0 \
-t talktomyexcel-sandbox:latest .

This variant exists for deployments that already maintain a compatible sandbox
base image, such as code-interpreter:latest, with the required Python data
libraries preinstalled. SANDBOX_BASE swaps the base image, and
INSTALL_SANDBOX_DEPS=0 skips reinstalling those dependencies during this
project's image build. If that base image is not available, or if you are setting
up TalkToMyExcel normally, use the standard build.
Open http://127.0.0.1:5001.
The default first admin account comes from .env:
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=change-me-now

Advanced calculation questions can use a short-lived Docker sandbox. The app exports the active DuckDB workspace tables to read-only CSV inputs, runs LLM-generated Python with networking disabled, reads the JSON result, and removes the container.
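The sandbox flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's actual implementation: the helper names `build_sandbox_command` and `run_in_sandbox`, the `/data` mount point, the timeout, and the convention that the script prints one JSON object on stdout are all hypothetical. Only the image tag comes from the build step, and the Docker flags used (`--rm`, `--network none`, a `:ro` volume) are standard `docker run` options.

```python
import json
import subprocess
from pathlib import Path

SANDBOX_IMAGE = "talktomyexcel-sandbox:latest"  # tag used in the build step


def build_sandbox_command(input_dir: Path, script: str,
                          image: str = SANDBOX_IMAGE) -> list[str]:
    """Build a `docker run` argv for one short-lived, offline analysis run."""
    return [
        "docker", "run",
        "--rm",                                    # remove the container on exit
        "--network", "none",                       # no network inside the container
        "-v", f"{input_dir.resolve()}:/data:ro",   # CSV inputs, mounted read-only
        image,
        "python", "-c", script,                    # assumes the image has no custom entrypoint
    ]


def run_in_sandbox(input_dir: Path, script: str, timeout: int = 60) -> dict:
    """Run the script in the sandbox and parse the JSON it prints on stdout."""
    completed = subprocess.run(
        build_sandbox_command(input_dir, script),
        capture_output=True,
        text=True,
        timeout=timeout,   # assumed limit; raises TimeoutExpired when exceeded
        check=True,        # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Keeping the command construction separate from the `subprocess.run` call makes the isolation flags easy to inspect and test without a Docker daemon.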
- Upload an Excel, CSV, TSV, or Parquet file.
- The file is profiled inside Docker, not in the main server process.
- Pick sheets/tables and semantic columns.
- Confirm import.
- Add more files to the same workspace when needed.
- Ask questions about the active workspace data in the English UI.
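As a rough illustration of the "semantic columns" step above, the sketch below suggests which CSV columns look like free text and could be candidates for semantic search. It is a hypothetical heuristic, not the profiler TalkToMyExcel runs inside Docker: the function name `suggest_semantic_columns` and the `min_avg_len` and `sample_rows` thresholds are assumptions made for this example.

```python
import csv
import io


def suggest_semantic_columns(csv_text: str, min_avg_len: int = 20,
                             sample_rows: int = 100) -> list[str]:
    """Return the columns whose sampled values look like free text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lengths: dict[str, list[int]] = {name: [] for name in (reader.fieldnames or [])}

    for i, row in enumerate(reader):
        if i >= sample_rows:          # only sample the first rows
            break
        for name in lengths:
            value = (row.get(name) or "").strip()
            if value:                 # ignore empty cells
                lengths[name].append(len(value))

    # Keep columns whose average non-empty value is reasonably long.
    return [
        name for name, sizes in lengths.items()
        if sizes and sum(sizes) / len(sizes) >= min_avg_len
    ]


sample = (
    "id,amount,notes\n"
    "1,10.5,Customer asked for a refund after late delivery\n"
    "2,99.0,Invoice sent twice by mistake to the client\n"
)
print(suggest_semantic_columns(sample))  # ['notes']
```

Short identifiers and numeric columns fall below the length threshold, so they stay available for exact DuckDB queries only, while longer descriptive fields are proposed for embedding.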
Each imported file stays available as a dataset in the workspace. You can remove individual datasets from the UI, or use the API replace option when you intentionally want to clear the workspace and import a fresh file set.
gunicorn -c gunicorn.conf.py wsgi:application

See docs/DETAILED_GUIDE.md for architecture, provider settings, and troubleshooting.
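The Gunicorn command above loads its settings from gunicorn.conf.py, which is an ordinary Python module. The sketch below shows what such a file can look like. The values are illustrative assumptions rather than the configuration shipped with the project; `bind`, `workers`, `timeout`, and `accesslog` are standard Gunicorn settings.

```python
# gunicorn.conf.py (illustrative values, not the project's shipped configuration)
import multiprocessing

# Assumed: listen on the same port as the development server.
bind = "127.0.0.1:5001"

# Assumed cap: at most 4 worker processes, fewer on small machines.
workers = min(4, multiprocessing.cpu_count())

# Seconds before a silent worker is restarted; assumed allowance for
# slow LLM calls and sandbox runs.
timeout = 120

# "-" sends the access log to stdout.
accesslog = "-"
```

Binding to 127.0.0.1 assumes a reverse proxy in front of Gunicorn; a different bind address would be needed to accept connections directly.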