A Text-to-SQL conversational assistant for an FMCG (beverages) business. Business users ask questions in plain English — promo performance, inventory health, regional comparisons, campaign impact — and get back a validated, auditable answer backed by a real SQL query against a real database, instead of waiting on the Data & AI team for ad-hoc analysis.
Built as an AI Engineering Assessment (Text-to-SQL Agent for FMCG analytics). Full design rationale, dataset reasoning, and reflection are in
docs/; the pitch deck is inslides/.
User question (plain English)
│
▼
LangChain orchestration ──► LLM generates a READ-ONLY SQL query
│ (schema + join rules injected into the prompt)
▼
Supabase / Postgres executes the query
│
▼
Validator ("agentic reflection loop")
• blocks anything that isn't SELECT/WITH
• flags risky INNER JOINs between sales_promotions and inventory
• checks for negative stock/units/revenue
• flags empty results and null-heavy aggregations
│
▼
LLM synthesizes the validated result into a plain-English answer
│
▼
Streamlit UI — shows the answer, the exact SQL used, the data, and a chart
Why Text-to-SQL instead of standard RAG / vector search: Promo uplift,
inventory drawdown, and regional sales comparisons require exact arithmetic
(SUM, AVG, GROUP BY) and reliable multi-table joins. Vector similarity
search is good at finding relevant text, not at guaranteeing a correct sum.
Routing every question to a relational database via generated SQL keeps the
math deterministic — full reasoning in docs/Problem_Definition.md.
.
├── app.py # Streamlit chat UI
├── src/
│ ├── agent.py # Orchestration: question -> SQL -> execution -> answer
│ ├── db.py # Postgres connection + read-only safety gate + schema context
│ ├── prompts.py # System prompts for SQL generation & answer synthesis
│ └── validator.py # Deterministic post-query sanity checks (the "reflection loop")
├── data/
│ ├── generate_fmcg_data.py # Synthetic data generator (pandas/numpy, seeded)
│ ├── product_master.csv # 15 SKUs across 5 categories
│ ├── store_master.csv # 30 stores across 4 regions
│ ├── sales_promotions.csv # 9,000 weekly rows
│ └── inventory.csv # 9,000 weekly rows
├── sql/
│ ├── schema.sql # Table definitions + the join-risk warning as a comment
│ └── load_data.sql # \copy commands to load the CSVs into Postgres
├── docs/
│ ├── Problem_Definition.md # Design brief: problem, objective, scope, constraints
│ ├── AI_Engineering_Brief.md # Original assessment brief
│ └── chat_export.md # AI-assisted thinking transcript (dataset generation)
└── slides/
└── Conversational-AI-for-Beverage-Insights.pptx # 6-slide pitch deck
1. Clone and install dependencies
git clone <this-repo-url>
cd fmcg-text-to-sql-agent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt2. Set up the database (Supabase)
- Create a free project at supabase.com.
- In the SQL editor, run
sql/schema.sql. - Load the data — either use the Table Editor's CSV import per table, or
run
sql/load_data.sqlwithpsqlif you have a direct connection string.
3. Configure environment variables
cp .env.example .env
# then fill in DATABASE_URL and your Gemini API key (from Google AI Studio)4. (Re)generate the dataset (optional — CSVs are already included)
cd data && python generate_fmcg_data.pyThis re-runs the generator with its built-in integrity checks (asserts that
units_sold matches exactly between sales_promotions and inventory for
every key, that closing_stock is never negative, and that promo-week
units_sold is at least 1.5x non-promo weeks before writing any CSV).
5. Run the app
streamlit run app.pyGenerated synthetically (seeded, reproducible) to model realistic FMCG dynamics rather than flat random noise:
- Promotional stockouts — a successful promo spikes
units_sold2x–4x, which can depleteclosing_stockto zero, settingstockout_flag. - Strict mathematical integrity —
inventory.units_soldis copied from the matchingsales_promotionsrow for the same(product_id, store_id, week_start_date), never independently resampled, andclosing_stock = opening_stock + units_received - units_soldholds exactly, every row. - Varying campaign efficacy — not every promo lifts sales meaningfully, so the assistant can identify underperforming campaigns rather than assuming all promotions work.
Full rationale in docs/Problem_Definition.md.
During development, the SQL-generation step would sometimes produce a
syntactically valid query that used an INNER JOIN between
sales_promotions and inventory — which silently dropped fully
stocked-out products from results about promotional impact on stockouts.
No error was thrown; the answer was just quietly wrong.
Mitigations in this repo:
- The SQL-generation system prompt (
src/prompts.py) explicitly instructs the model to default to aLEFT JOINbetween these two tables. src/validator.pyindependently scans the generated SQL text for this exact join pattern and raises a warning even if the prompt is ignored — deliberately rule-based rather than another LLM call, so it doesn't inherit the same failure mode it's meant to catch.- The UI surfaces validator warnings next to the answer rather than hiding them, so the failure mode is visible to the end user, not just logged.
- No clarification step before execution — the agent only asks for
clarification when it can detect ambiguity up front (see the
CLARIFY:path insrc/prompts.py); a genuinely multi-intent question that spans beyond the four mapped tables can still produce a confidently incomplete answer. - Raw-table SQL generation — v2 would introduce a semantic layer (e.g. dbt metrics or Cube.js) so the agent queries pre-validated business metrics instead of re-deriving joins from scratch each time.
- No few-shot example retrieval — v2 would add a small vector store of validated NL→SQL example pairs purely for few-shot prompting (notably, not for querying the FMCG data itself — that distinction is the whole reason Text-to-SQL was chosen over RAG in the first place).
- Static charts only — v2 would let the LLM propose chart specs (e.g. Plotly) instead of a fixed bar chart in the UI.
Full reflection in the project write-up (see chat history / assessment answers).
Built for educational/assessment purposes.