Public project overview: https://dacameragirl.github.io/compass-ultra-web-intel/
A live company-intelligence workbench for Compass Ultra's release-readiness market.
Compass Ultra Web Intel turns a company name or website into a live research run:
- resolve the company or source website
- discover related public pages with Tavily
- crawl useful website content
- load raw pages into Snowflake
- rebuild dbt marts and tests
- show prospect, competitor, and market signals in Streamlit
- optionally generate sourced AI summaries with Anthropic, OpenAI, OpenRouter, or DeepSeek
The product is built for the same Compass Ultra world as the main app: feature flags, release gates, stale flag debt, approval workflows, rollback evidence, CAB handoffs, and safe production change.
Open the local app:
.\Start-CompassUltraWebIntel.ps1
Or run Streamlit directly:
streamlit run app\streamlit_app.py
In the app:
- Type a company name or website into Analyze company or website.
- Pick how many pages to crawl per discovered site.
- Click the full-width Run Analysis button.
- Watch the live discovery, crawl, Snowflake load, and dbt build logs.
- Review the refreshed domain table and ranked website signals.
The app uses one main input. That input drives both the live run and the focused results below it.
Compass Ultra Web Intel is a public data-engineering portfolio project that turns company web presence into structured market-intelligence signals using Python crawling, Snowflake loading, dbt modeling, and a Streamlit analytics layer. Public demo mode uses seeded data, while approved users can unlock live workflows through guarded access controls.
The active signal engine scores websites for Compass Ultra fit using signals like:
- feature flag mentions
- release, deploy, rollback, canary, and CAB language
- audit, compliance, SOC 2, change management, and review terms
- workflow terms like Slack, Jira, GitHub, CI/CD, and runbooks
- stale flag, flag debt, ownership, approval, and cleanup language
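As a rough illustration of how keyword-based signal scoring can work, here is a minimal Python sketch. The category names, the term lists (loosely drawn from the bullets above), the plain match counting, and the `score_page` helper are all assumptions for illustration; the project's actual scoring logic may differ.

```python
import re

# Hypothetical term lists, loosely based on the signal categories above.
SIGNAL_TERMS = {
    "feature_flags": ["feature flag", "feature toggle"],
    "release": ["release", "deploy", "rollback", "canary", "cab"],
    "governance": ["audit", "compliance", "soc 2", "change management"],
    "workflow": ["slack", "jira", "github", "ci/cd", "runbook"],
    "flag_debt": ["stale flag", "flag debt", "cleanup"],
}


def score_page(text: str) -> dict[str, int]:
    """Count term matches per category in one page of text."""
    lowered = text.lower()
    counts = {}
    for category, terms in SIGNAL_TERMS.items():
        total = 0
        for term in terms:
            # Lookarounds keep "cab" from matching inside "cabinet";
            # the optional "s" accepts simple plurals such as "deploys".
            pattern = r"(?<!\w)" + re.escape(term) + r"s?(?!\w)"
            total += len(re.findall(pattern, lowered))
        counts[category] = total
    return counts


if __name__ == "__main__":
    sample = "We gate every canary deploy behind feature flags."
    print(score_page(sample))
```

Counting raw matches keeps the sketch easy to follow. A real scorer would likely weight categories and normalize by page length.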
Output tables:
- ANALYTICS.MART_WEBSITE_QUERY_INDEX
- ANALYTICS.MART_PROSPECT_ACCOUNTS
- ANALYTICS.FCT_WEBSITE_SIGNALS
| Layer | Tool | Job |
|---|---|---|
| App | Streamlit | Live company runner and query UI |
| Ingestion | Python | Discovery, crawling, parsing, Snowflake loading |
| Warehouse | Snowflake | Raw public pages and analytics outputs |
| Modeling | dbt Core + dbt Snowflake | Staging, marts, tests |
| Discovery | Tavily | Company lookup and related-site search |
| Optional AI | Anthropic, OpenAI, OpenRouter, DeepSeek | Sourced summaries over retrieved pages |
| Optional ops | Fivetran, Stripe, Vercel, Compass backend | Future operational intelligence sources |
Company / website
-> Tavily discovery
-> Python crawler
-> Snowflake RAW_WEBSITE_INTEL.PAGES
-> dbt staging + marts
-> Streamlit Web Intel app
GitHub's language bar is tuned with .gitattributes so it emphasizes the real product code:
- Python for the crawler, loaders, Streamlit app, and validation scripts
- SQL for dbt staging and mart models
- PowerShell for Windows launchers
- YAML/TOML for dbt, Streamlit, and config
- TypeScript/HTML only for lightweight store-wrapper scaffolding
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Copy-Item .env.example .env
Fill .env with your real values. Never commit .env.
Validate configuration:
python scripts\validate_environment.py
Create the raw website table:
python scripts\crawl_websites_to_snowflake.py --bootstrap-only
Run the app:
streamlit run app\streamlit_app.py
Required for the live workflow:
TAVILY_API_KEY
SNOWFLAKE_ACCOUNT
SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ROLE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_DATABASE
SNOWFLAKE_SCHEMA
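To show what a pre-flight check over these variables might look like, here is a minimal Python sketch. It is not the contents of scripts\validate_environment.py; the `missing_vars` helper and the rule that blank values count as missing are assumptions for illustration.

```python
import os

# Variable names taken from the required list above.
REQUIRED_VARS = [
    "TAVILY_API_KEY",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]


def missing_vars(env=None):
    """Return required names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    # Report names only; values are never printed.
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required variables are set.")
```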
Optional AI answer providers:
ANTHROPIC_API_KEY
OPENROUTER_API_KEY
OPENAI_API_KEY
DEEPSEEK_API_KEY
Public deployment guardrails:
COMPASS_PUBLIC_MODE=true
COMPASS_ACCESS_CODE=choose-a-private-access-code
COMPASS_LIVE_RUNS_ENABLED=true
COMPASS_MAX_PAGES_PER_RUN=5
COMPASS_RUN_COOLDOWN_SECONDS=900
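The sketch below shows one way settings like these could be enforced in Python. The interpretation of each variable (a page cap per run, a cooldown between runs), the defaults, and the `Guardrails`, `load_guardrails`, `clamp_pages`, and `seconds_until_next_run` names are assumptions for illustration, not the app's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    public_mode: bool
    live_runs_enabled: bool
    max_pages_per_run: int
    cooldown_seconds: int


def load_guardrails(env) -> Guardrails:
    """Build guardrail settings from a mapping such as os.environ."""
    def flag(name, default):
        return env.get(name, default).strip().lower() == "true"

    # Defaults here are assumed, chosen to fail closed.
    return Guardrails(
        public_mode=flag("COMPASS_PUBLIC_MODE", "true"),
        live_runs_enabled=flag("COMPASS_LIVE_RUNS_ENABLED", "false"),
        max_pages_per_run=int(env.get("COMPASS_MAX_PAGES_PER_RUN", "5")),
        cooldown_seconds=int(env.get("COMPASS_RUN_COOLDOWN_SECONDS", "900")),
    )


def clamp_pages(requested: int, g: Guardrails) -> int:
    """Keep a requested page count between 1 and the configured cap."""
    return max(1, min(requested, g.max_pages_per_run))


def seconds_until_next_run(last_run_at, now: float, g: Guardrails) -> float:
    """Return the remaining cooldown in seconds, or 0.0 if a run is allowed."""
    if last_run_at is None:
        return 0.0
    return max(0.0, g.cooldown_seconds - (now - last_run_at))
```

Passing timestamps in as arguments, rather than reading the clock inside the function, keeps the cooldown logic easy to test.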
Local trusted runs can use:
COMPASS_ACCESS_MODE=local
Optional later sources:
FIVETRAN_API_KEY
FIVETRAN_API_SECRET
DATABASE_URL
STRIPE_SECRET_KEY
VERCEL_TOKEN
See GET_KEYS.md for account/key guidance.
Run the default Compass Ultra seeded discovery:
.\Run-WebsiteDiscovery.ps1
Run another source website:
.\Run-WebsiteDiscovery.ps1 -SourceUrl https://www.example.com/ -MaxPages 5
That command discovers related websites, crawls them, loads Snowflake, rebuilds dbt, and opens the local app.
Discover related websites:
python scripts\discover_websites.py --source-url https://www.example.com/
Crawl discovered websites into Snowflake:
python scripts\crawl_websites_to_snowflake.py --urls-file targets\discovered_websites.txt --max-pages 25
Load a crawler-safe JSON feed:
python scripts\crawl_websites_to_snowflake.py --feed-file C:\path\to\crawler-feed.json --skip-urls-file
Build dbt marts:
$env:DBT_PROFILES_DIR = (Get-Location).Path
dbt build --select stg_web_pages fct_website_signals mart_prospect_accounts mart_website_query_index
GitHub Pages is not enough for this app because it only serves static files. Compass Ultra Web Intel needs a Python/Streamlit runtime plus secrets for Snowflake and Tavily.
Best free path:
- Push this repo to GitHub.
- Deploy it on Streamlit Community Cloud.
- Choose repo DaCameraGirl/compass-ultra-web-intel, branch main, and main file path app/streamlit_app.py.
- Add secrets in Streamlit's secrets manager, not in GitHub.
- Share the Streamlit app URL.
For a polished public version, use Streamlit secrets for the keys above plus the public guardrail settings. Unauthenticated visitors see the seeded intelligence workspace and full product flow; approved users can unlock the live Snowflake, Tavily, crawler, dbt, and AI workflow with the access code.
These are wired for later, but not required for the active website-intelligence workflow:
- scripts\compass_to_snowflake.py - Compass backend, Stripe, and Vercel data
- scripts\fivetran_to_snowflake.py - Fivetran connector and destination metadata
If another local Compass backend .env already has useful values, point this repo to it:
COMPASS_BACKEND_ENV_FILE=C:\Users\enter\Compass-Ultra-Backend\.env
The scripts read values locally and never print secret values.
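A minimal Python sketch of that pattern follows: parse a .env file locally and report only whether each key is set. The `parse_env_text`, `read_env_file`, and `describe` helpers are hypothetical, and the simple KEY=VALUE parsing is an assumption; the project's scripts may rely on a dedicated library instead.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path) -> dict[str, str]:
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def describe(values: dict[str, str]) -> dict[str, str]:
    """Map each key to 'set' or 'missing' so secret values are never shown."""
    return {key: ("set" if value else "missing") for key, value in values.items()}
```

Printing the output of `describe` rather than the raw dictionary keeps secret values out of logs and terminals.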
Store-wrapper scaffolding lives in store/.
- Android and iOS: store/capacitor
- PWA manifest: store/pwa/manifest.webmanifest
- Microsoft Store notes: store/microsoft
Before a real store submission, host the app at a production HTTPS URL and set COMPASS_WEB_INTEL_URL for the Capacitor wrapper. Store submissions also require developer accounts, signing certificates, screenshots, privacy labels, and production icons.
- GET_KEYS.md - accounts and API key setup
- docs/TECH_STACK.md - stack notes and GitHub language detection