Enable strict mypy and add typed search models (#100) by bradjin8 · Pull Request #103 · cppalliance/cppa-cursor-browser

bradjin8 · 2026-06-10T20:35:54Z

Summary

Enable strict = true in pyproject.toml and update CI to run mypy . from project config
Add core TypedDict models SearchResult and ConversationSummary, and wire SearchResult through the search service and /api/search
Add complete type annotations across production code: all api/ route handlers, all utils/ modules, and all services/ modules

Per-module mypy overrides remain for scripts/export.py and tests.* until those surfaces are fully annotated.

Closes #100.

Acceptance criteria

mypy configured with [tool.mypy] and strict = true
Core data models defined (Bubble, Workspace, Composer dataclasses; SearchResult, ConversationSummary TypedDicts)
All API route handlers fully annotated
All utils/ functions fully annotated
mypy passes with zero errors on production modules
Tests pass locally

Test plan

mypy . — zero errors (86 source files)
pytest — 447 passed, 4 skipped
CI typecheck job passes on PR
CI test matrix passes on PR

Summary by CodeRabbit

Refactor
- Search endpoint now delegates to a centralized multi-source search service with unified ranking/timestamp handling; search warning aggregation improved and search result shapes formalized.
Chores
- CI typechecking tightened to run mypy using the repository's strict settings (per-module overrides applied).
Tests
- Added unit and integration tests exercising search helpers and multi-source search behavior.
Documentation
- Changelog and CI descriptions updated to reflect stricter typechecks and search changes.

coderabbitai · 2026-06-10T20:36:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Enables strict mypy and CI mypy invocation, adds TypedDict search contracts, extracts search logic into services.search (global, legacy, CLI), rewires api/search to orchestrate sources, and applies typing broadly across APIs, services, utils, models, and tests.

Changes

Strict MyPy, Typed Contracts, and CI

Layer / File(s)	Summary
Configuration, TypedDicts, and exports `pyproject.toml`, `.github/workflows/tests.yml`, `CHANGELOG.md`, `models/search.py`, `models/__init__.py`, `tests/test_models.py`	Enable `strict = true` in mypy config with per-module overrides; CI mypy invocation changed to `mypy .`. Add `ConversationSummary` and `SearchResult` TypedDicts and re-export them; tests updated to construct/validate `SearchResult` payloads.

Search Service and Tests

Layer / File(s)	Summary
Search service extraction and ranking `services/search.py`, `tests/test_search_helpers.py`, `tests/test_models_wired_at_read_sites.py`	Add `services/search.py` with `search_global_storage`, `search_legacy_workspaces`, `search_cli_sessions`, helper matching/snippet logic, and `rank_results`. Add unit/integration tests covering helpers and per-backend search behavior and adjust tests to use `services.search` logging locations.

API orchestration and route typing

Layer / File(s)	Summary
API orchestration and Flask typing `api/search.py`, `api/composers.py`, `api/config_api.py`, `api/export_api.py`, `api/flask_config.py`, `api/logs.py`, `api/pdf.py`, `api/workspaces.py`, `app.py`	`api/search.py` now orchestrates delegated search helpers and attaches ParseWarningCollector warnings. Many Flask route handlers and app handlers receive explicit parameter and return type annotations and small helper typings were added.

Workspace/service typing and DB helpers

Layer / File(s)	Summary
Workspace DB, listing, tabs, resolver, context, and cache `services/workspace_db.py`, `services/workspace_listing.py`, `services/workspace_tabs.py`, `services/workspace_resolver.py`, `services/workspace_context.py`, `services/summary_cache.py`, `services/cli_tabs.py`	Introduce `safe_fetchall` and typed sqlite3 usage; tighten workspace-entry/map shapes to `dict[str, Any]`; annotate many function signatures and return envelopes; replace local fetchall closures with shared helper; update tab assembly and listing typing.

Models, utils, and miscellaneous typing

Layer / File(s)	Summary
Model and utility typing updates `models/conversation.py`, `models/parse_warnings.py`, `utils/cli_chat_reader.py`, `utils/cursor_md_exporter.py`, `utils/debug_flag.py`, `utils/exclusion_rules.py`, `utils/path_helpers.py`, `utils/text_extract.py`, `utils/tool_parser.py`, `utils/workspace_descriptor.py`	Apply `Any` imports, type aliases, and more precise function signatures (parameterized dict/list types, unions, return types). `exclusion_rules.py` adds token type aliases and tightens evaluator typing. `models/parse_warnings.py` adds source_failure tracking and types warning payloads.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ApiSearch as api.search
  participant ParseWarnings as ParseWarningCollector
  participant SearchSvc as services.search
  participant GlobalDB as globalStorage/state.vscdb
  participant WorkspaceDB as workspaceStorage/state.vscdb
  participant CliStore as cli_chats/store.db
  Client->>ApiSearch: HTTP GET /api/search?q=...
  ApiSearch->>ParseWarnings: create collector
  ApiSearch->>SearchSvc: search_global_storage(...) when type != "chat"
  SearchSvc->>GlobalDB: query composerData and bubbleId rows
  ApiSearch->>SearchSvc: search_legacy_workspaces(...)
  SearchSvc->>WorkspaceDB: query ItemTable chat payloads
  ApiSearch->>SearchSvc: search_cli_sessions(...) when type == "all"
  SearchSvc->>CliStore: traverse session blobs
  SearchSvc-->>ApiSearch: results lists
  ApiSearch->>SearchSvc: rank_results(merged)
  ApiSearch->>ParseWarnings: attach_to(payload)
  ApiSearch-->>Client: jsonify(results + warnings)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#8: Related to adding CLI sessions to search and earlier CLI search changes.
cppalliance/cppa-cursor-browser#78: Related to parse-warning plumbing and logging used by the search flow.
cppalliance/cppa-cursor-browser#31: Overlaps JSON helper/export refactors referenced by composer JSON reading changes.

Suggested reviewers

clean6378-max-it
wpak-ai

Poem

🐇 I hopped through types to tidy the lair,

SearchResult sings in TypedDict flair.
CI runs strict — I nibbled each Any,
Services hum, APIs answer with clarity.
A rabbit cheers beneath the typed moonlight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Enable strict mypy and add typed search models (`#100`)' accurately summarizes the main changes: enabling strict mypy checking and introducing typed search data models.
Linked Issues check	✅ Passed	The PR comprehensively addresses all objectives from issue `#100`: mypy strict mode enabled, core TypedDict models added (SearchResult, ConversationSummary), complete type annotations across api/, utils/, and services/ modules, zero mypy errors reported, and tests passing (447 passed, 4 skipped).
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#100` objectives. No unrelated modifications detected; additions to services/search.py extract and refactor search logic in support of typed models, and all utility/api changes are purely typing improvements without logic alterations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/enable-strict-mypy

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

models/search.py (1)

17-27: ⚡ Quick win

Constrain SearchResult discriminator fields with Literal.

type: str and source: NotRequired[str] are broader than the actual contract, which reduces strict-mypy value for call sites.

Proposed diff

-from typing import Any, NotRequired, TypedDict
+from typing import Any, Literal, NotRequired, TypedDict
@@
 class SearchResult(TypedDict):
@@
-    type: str  # "composer" | "chat" | "cli_agent"
-    source: NotRequired[str]  # "cli" for CLI agent sessions
+    type: Literal["composer", "chat", "cli_agent"]
+    source: NotRequired[Literal["cli"]]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/search.py` around lines 17 - 27, The SearchResult TypedDict currently
uses broad types for discriminators; change the type annotation for
SearchResult.type from str to Literal["composer", "chat", "cli_agent"] and
change SearchResult.source from NotRequired[str] to NotRequired[Literal["cli"]]
(import Literal if not already), so call sites get precise typing; update any
imports to include Literal and adjust any code that constructs SearchResult
entries to use those exact string literals where applicable.

services/workspace_listing.py (1)

141-150: 💤 Low value

Extract duplicated _safe_fetchall helper to a shared location. The _safe_fetchall helper with cast(list[sqlite3.Row], ...) is defined four times as a local function: once in services/workspace_listing.py (line 141) and three times in services/workspace_tabs.py (lines 551, 712, 851). All implementations are identical. Extracting this to a module-level helper in services/workspace_db.py would eliminate the duplication and centralize the SQLite error-handling pattern.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_listing.py` around lines 141 - 150, Extract the duplicated
local helper _safe_fetchall into a single module-level function in
services/workspace_db.py (e.g., def safe_fetchall(query: str, params: tuple[Any,
...] = ()) -> list[sqlite3.Row]) that performs the same cast(list[sqlite3.Row],
global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior;
then remove the four local definitions of _safe_fetchall in
services/workspace_listing.py and services/workspace_tabs.py and replace their
usages with calls to the new safe_fetchall (importing it from
services.workspace_db). Ensure the function name and signature match callers or
update the call sites accordingly to avoid type errors.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/search.py`:
- Around line 38-48: The current logic always calls search_global_storage, so
requests with search_type == "chat" still include composer/global results;
update the caller around results.extend(search_global_storage(...)) to only
invoke search_global_storage when search_type is not "chat" (e.g., if
search_type != "chat" or when search_type == "all"/"composer"), leaving calls to
search_legacy_workspaces and search_cli_sessions unchanged; ensure the
conditional references the search_type variable and the search_global_storage
function so the chat-only path excludes global/composer results.

In `@CHANGELOG.md`:
- Around line 10-17: The changelog contains duplicate subsection headings "###
Added" and "### Changed" under the same release section; merge the new bullets
about strict mypy (`strict = true` in `pyproject.toml`, TypedDict models
`SearchResult`, `ConversationSummary`, and full annotations on API routes and
`utils/`) and the CI typecheck change (CI job runs `mypy .` with pyproject
config and per-module overrides) into the existing single "### Added" and "###
Changed" entries respectively so there is only one "### Added" and one "###
Changed" block under `[Unreleased]`.

In `@services/search.py`:
- Around line 398-399: The loop over tabs currently uses tab.get("tabId")
directly which can be missing or non-string and violates the SearchResult
contract; change the code that builds the chatId (alongside ct =
tab.get("chatTitle") or "") to coerce tab.get("tabId") into a stable string
(e.g., use str(...) or default to "" when falsy) so SearchResult.chatId is
always a string; apply the same fix in the other occurrence that builds results
(the block around lines 434-439) so all created SearchResult instances have
string chatId values.
- Around line 561-568: _ts is normalizing timestamps inconsistently (ISO strings
are converted to epoch-seconds while integer epochs may be in milliseconds),
causing incorrect ordering; update the _ts(SearchResult) function to always
return epoch-seconds as a float: parse ISO strings with
datetime.fromisoformat(...) as before, and for numeric values (int/float) detect
millisecond epochs (e.g., values > 1e12 or > 1e11) and divide by 1000.0 when
appropriate, handling zero/None safely and returning 0.0 for unparseable inputs
so all sources are compared on the same seconds scale.

---

Nitpick comments:
In `@models/search.py`:
- Around line 17-27: The SearchResult TypedDict currently uses broad types for
discriminators; change the type annotation for SearchResult.type from str to
Literal["composer", "chat", "cli_agent"] and change SearchResult.source from
NotRequired[str] to NotRequired[Literal["cli"]] (import Literal if not already),
so call sites get precise typing; update any imports to include Literal and
adjust any code that constructs SearchResult entries to use those exact string
literals where applicable.

In `@services/workspace_listing.py`:
- Around line 141-150: Extract the duplicated local helper _safe_fetchall into a
single module-level function in services/workspace_db.py (e.g., def
safe_fetchall(query: str, params: tuple[Any, ...] = ()) -> list[sqlite3.Row])
that performs the same cast(list[sqlite3.Row],
global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior;
then remove the four local definitions of _safe_fetchall in
services/workspace_listing.py and services/workspace_tabs.py and replace their
usages with calls to the new safe_fetchall (importing it from
services.workspace_db). Ensure the function name and signature match callers or
update the call sites accordingly to avoid type errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ced4d7c8-578e-4a3d-a1db-72d7960b4433

📥 Commits

Reviewing files that changed from the base of the PR and between 937dc91 and fa454b1.

📒 Files selected for processing (35)

.github/workflows/tests.yml
CHANGELOG.md
api/composers.py
api/config_api.py
api/export_api.py
api/flask_config.py
api/logs.py
api/pdf.py
api/search.py
api/workspaces.py
app.py
models/__init__.py
models/conversation.py
models/parse_warnings.py
models/search.py
pyproject.toml
services/cli_tabs.py
services/search.py
services/summary_cache.py
services/workspace_context.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_resolver.py
services/workspace_tabs.py
tests/test_models.py
tests/test_models_wired_at_read_sites.py
tests/test_search_helpers.py
utils/cli_chat_reader.py
utils/cursor_md_exporter.py
utils/debug_flag.py
utils/exclusion_rules.py
utils/path_helpers.py
utils/text_extract.py
utils/tool_parser.py
utils/workspace_descriptor.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/search.py (1)

317-321: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t promote missing timestamps to “now”.

rank_results() now sinks unknown timestamps to 0.0, but the three builders still replace missing values with the current time. That makes untimestamped chats outrank genuinely recent hits once /api/search merges and sorts the combined list. Use a stable sentinel here instead so missing metadata stays at the bottom.

Suggested fix

                 results.append({
                     "workspaceId": ws_id,
                     "workspaceFolder": ws_name,
                     "chatId": composer_id,
                     "chatTitle": title,
                     "timestamp": (
                         to_epoch_ms(composer.last_updated_at)
                         or to_epoch_ms(composer.created_at)
-                        or int(datetime.now().timestamp() * 1000)
+                        or 0
                     ),
                     "matchingText": matching_text,
                     "type": "composer",
                 })
@@
                     results.append({
                         "workspaceId": name,
                         "workspaceFolder": workspace_folder,
                         "chatId": tab_id,
                         "chatTitle": ct or f"Chat {tab_id[:8]}",
-                        "timestamp": tab.get("lastSendTime") or datetime.now().isoformat(),
+                        "timestamp": tab.get("lastSendTime") or 0,
                         "matchingText": matching_text,
                         "type": "chat",
                     })
@@
-                created_ms: int = (
-                    meta.get("createdAt") or int(datetime.now().timestamp() * 1000)
-                )
+                raw_created_ms = meta.get("createdAt")
+                created_ms = raw_created_ms if isinstance(raw_created_ms, int) else 0

Also applies to: 440-440, 485-487, 562-575

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 317 - 321, The builders are promoting
missing timestamps to the current time by using
to_epoch_ms(composer.last_updated_at) or to_epoch_ms(composer.created_at) or
int(datetime.now().timestamp() * 1000); change this to use a stable sentinel
(e.g., 0 or a named constant) instead of datetime.now() so untimestamped items
remain low-ranked. Update each place that builds the "timestamp" field (the
shown block and the other instances noted) to fallback to the sentinel when
to_epoch_ms returns falsy, ensuring consistency with rank_results() which treats
unknown timestamps as 0.0; refer to to_epoch_ms, composer.last_updated_at,
composer.created_at, and rank_results when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@services/search.py`:
- Around line 317-321: The builders are promoting missing timestamps to the
current time by using to_epoch_ms(composer.last_updated_at) or
to_epoch_ms(composer.created_at) or int(datetime.now().timestamp() * 1000);
change this to use a stable sentinel (e.g., 0 or a named constant) instead of
datetime.now() so untimestamped items remain low-ranked. Update each place that
builds the "timestamp" field (the shown block and the other instances noted) to
fallback to the sentinel when to_epoch_ms returns falsy, ensuring consistency
with rank_results() which treats unknown timestamps as 0.0; refer to
to_epoch_ms, composer.last_updated_at, composer.created_at, and rank_results
when making the change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e9be71e-69dc-40c1-942d-c7279622e372

📥 Commits

Reviewing files that changed from the base of the PR and between fa454b1 and 188d9ec.

📒 Files selected for processing (7)

CHANGELOG.md
api/search.py
models/search.py
services/search.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_tabs.py

✅ Files skipped from review due to trivial changes (2)

CHANGELOG.md
services/workspace_db.py

🚧 Files skipped from review as they are similar to previous changes (3)

api/search.py
services/workspace_listing.py
services/workspace_tabs.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

models/parse_warnings.py (1)
29-29: ⚡ Quick win

Narrow exception type to Exception.

The parameter accepts BaseException, but all call sites (e.g., services/search.py line 343) catch and pass Exception. BaseException includes system exits and keyboard interrupts, which are never logged as source failures in practice.
♻️ Tighten the type
-    def record_source_failure(self, exc: BaseException, source: str) -> None:
+    def record_source_failure(self, exc: Exception, source: str) -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/parse_warnings.py` at line 29, Change the record_source_failure
signature to accept Exception rather than BaseException: update the method
record_source_failure(self, exc: Exception, source: str) in
models/parse_warnings.py (and adjust any type imports) so it matches callers
that pass Exception and avoids catching system-level BaseException like
SystemExit/KeyboardInterrupt; update any type hints or tests referring to the
old signature accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/search.py`:
- Line 494: The current assignment created_ms: int =
to_epoch_ms(meta.get("createdAt")) is correct because to_epoch_ms returns 0 for
None/invalid inputs; remove any added fallback like "or
_UNKNOWN_SEARCH_TIMESTAMP" unless this codebase intentionally treats 0 as an
unknown sentinel. If you do need a non-zero sentinel, explicitly state that
intent and change the line to use "or _UNKNOWN_SEARCH_TIMESTAMP" and document
why; otherwise leave the single call to to_epoch_ms as-is.

---

Nitpick comments:
In `@models/parse_warnings.py`:
- Line 29: Change the record_source_failure signature to accept Exception rather
than BaseException: update the method record_source_failure(self, exc:
Exception, source: str) in models/parse_warnings.py (and adjust any type
imports) so it matches callers that pass Exception and avoids catching
system-level BaseException like SystemExit/KeyboardInterrupt; update any type
hints or tests referring to the old signature accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24c15480-d369-4c7d-ae96-751477063a31

📥 Commits

Reviewing files that changed from the base of the PR and between 6006c25 and 88dbe83.

📒 Files selected for processing (34)

.github/workflows/tests.yml
CHANGELOG.md
api/composers.py
api/config_api.py
api/export_api.py
api/flask_config.py
api/logs.py
api/pdf.py
api/search.py
api/workspaces.py
app.py
models/__init__.py
models/conversation.py
models/parse_warnings.py
models/search.py
pyproject.toml
services/cli_tabs.py
services/search.py
services/summary_cache.py
services/workspace_context.py
services/workspace_db.py
services/workspace_listing.py
services/workspace_resolver.py
services/workspace_tabs.py
tests/test_models.py
tests/test_search_helpers.py
utils/cli_chat_reader.py
utils/cursor_md_exporter.py
utils/debug_flag.py
utils/exclusion_rules.py
utils/path_helpers.py
utils/text_extract.py
utils/tool_parser.py
utils/workspace_descriptor.py

✅ Files skipped from review due to trivial changes (15)

utils/text_extract.py
utils/workspace_descriptor.py
api/flask_config.py
api/pdf.py
api/config_api.py
api/logs.py
utils/cursor_md_exporter.py
api/workspaces.py
services/workspace_context.py
services/cli_tabs.py
CHANGELOG.md
api/export_api.py
utils/cli_chat_reader.py
services/summary_cache.py
api/composers.py

🚧 Files skipped from review as they are similar to previous changes (15)

models/search.py
utils/debug_flag.py
tests/test_models.py
models/init.py
utils/path_helpers.py
utils/tool_parser.py
models/conversation.py
services/workspace_resolver.py
utils/exclusion_rules.py
tests/test_search_helpers.py
pyproject.toml
api/search.py
services/workspace_listing.py
services/workspace_tabs.py
services/workspace_db.py

clean6378-max-it · 2026-06-11T13:31:37Z

Must fix

CI typecheck will fail — mypy . reports 20 errors (api/search.py:54, api/workspaces.py:69,83,99,140, api/composers.py:116,186,225, api/config_api.py:42,55,64,71,81,125,147,156, api/logs.py:152, api/export_api.py:75, services/cli_tabs.py:141, api/pdf.py:55/161) (Issue #100 requires zero mypy errors on annotated modules; jsonify() is typed as Any under strict mode, and FPDF lacks proper stubs — installing types-Flask alone does not clear the no-any-return errors.)

api/pdf.py:161 — pdf.output(buf) fails strict mypy with call-overload once fpdf stubs are present (Will surface as soon as the FPDF misc subclass error is addressed; needs dest="S" / bytes return or an explicit cast.)

Should fix

Issue #100 AC gap — no Conversation model (models/conversation.py defines Composer, not Conversation) (Acceptance criteria explicitly list Bubble, Workspace, Conversation, SearchResult; Composer covers the domain but the named type is missing — add a Conversation alias/TypedDict or document the mapping in the issue close comment.)

services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.)

services/search.py:552-553 — search_cli_sessions logs top-level failures but does not call parse_warnings.record_source_failure (search_global_storage records source_failure for operator-visible incomplete results; CLI search failures remain invisible in the API warnings payload — amplifies the silent-failure compound.)

Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.)

Nice to have

pyproject.toml dev extras — add types-Flask (and fpdf stubs if kept) alongside mypy (Improves local/CI parity and may reduce some stub friction once return casts are in place.)

models/conversation.py:43-48 — Composer.createdAt rejects ISO strings (Intentional and documented; if Cursor migrates timestamp format, composers vanish from list/search with only drift warnings — consider relaxing when real data appears.)

clean6378-max-it · 2026-06-11T13:32:54Z

@CodeRabbit. Except my findings, Can you also review in more depth and find more additional bugs or issues?

coderabbitai · 2026-06-11T13:36:53Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat services/search.py

Length of output: 22085

🏁 Script executed:

#!/bin/bash
cat models/search.py
cat models/parse_warnings.py

Length of output: 4683

🏁 Script executed:

#!/bin/bash
cat utils/path_helpers.py
cat utils/exclusion_rules.py

Length of output: 13576

🏁 Script executed:

#!/bin/bash
cat api/search.py

Length of output: 1921

🏁 Script executed:

#!/bin/bash
cat services/workspace_db.py

Length of output: 16129

🏁 Script executed:

#!/bin/bash
cat tests/test_search_helpers.py

Length of output: 21015

🏁 Script executed:

#!/bin/bash
cat models/conversation.py

Length of output: 12233

🏁 Script executed:

#!/bin/bash
cat utils/cli_chat_reader.py

Length of output: 18945

@clean6378-max-it: Here are additional bugs and issues found beyond your list, ordered by severity.

🔴 Must Fix

1. `workspaceFolder` field returns inconsistent formats across the three search sources

services/search.py, three different call sites:

Source	`workspaceFolder` value	Example
`search_global_storage`	Display name leaf (from `ws_id_to_name.get(ws_id)`)	`"myrepo"`
`search_legacy_workspaces`	Raw `workspace.json` `folder` value	`"file:///home/user/projects/myrepo"`
`search_cli_sessions`	Full workspace path from CLI meta	`"/home/user/projects/myrepo"`

The field is named workspaceFolder throughout the codebase and SearchResult declares it str | None, so callers expect a consistent path. A front-end consuming merged rank_results output would get three different shapes for the same semantic field. The test in test_search_helpers.py line ~354 even asserts the display name "myrepo" for the global-storage case, which would pass, but hides the inconsistency at the integration level.

Root cause: search_global_storage (line ~297 in services/search.py) sets:

"workspaceFolder": ws_name,   # ← display name ("myrepo"), not path

whereas search_legacy_workspaces sets:

"workspaceFolder": workspace_folder,  # ← raw "file:///..." or "/home/..."

Either normalise all three to a resolved filesystem path (same as get_workspace_folder_paths produces) or to a bare display name—but pick one contract and document it in SearchResult.

2. `search_global_storage` re-scans all per-workspace DBs on every request (no cache)

services/search.py line ~234:

composer_id_to_ws = build_composer_id_to_workspace_id(
    workspace_path, workspace_entries
)

build_composer_id_to_workspace_id_cached exists in services/workspace_db.py for exactly this purpose (fingerprint + disk cache). The uncached variant opens and reads every {workspace_id}/state.vscdb on each GET /api/search, which is O(number-of-workspaces) extra DB opens for every keystroke in typeahead search scenarios. Replace with:

composer_id_to_ws = build_composer_id_to_workspace_id_cached(
    workspace_path, workspace_entries, rules
)

3. `traverse_blobs` and `iter_sessions` disagree on CLI session meta validity

utils/cli_chat_reader.py:

iter_sessions → _read_meta uses a plain json.loads + cast, returning {} on any error.
traverse_blobs → internally calls CliSessionMeta.from_dict, raising SchemaError on missing/wrong-typed fields, and returns [] on failure.

In search_cli_sessions:

meta = session.get("meta", {})             # from _read_meta — may succeed
...
messages = traverse_blobs(session["db_path"])  # may return [] for same session
if not bubbles:
    continue                                # session silently dropped

If CliSessionMeta.from_dict is stricter than _read_meta, a session can appear valid in iter_sessions (has name, createdAt, etc.) but be silently discarded by traverse_blobs. No warning is emitted. This should at minimum log a warning when traverse_blobs returns empty for a session whose meta decoded successfully—or _read_meta should use the same schema validation path.

🟡 Should Fix

4. `search_global_storage` serialises the entire raw composer dict into exclusion text

services/search.py line ~286:

metadata_parts=[
    ...
    _json_dump_safe(cd),           # ← entire raw Composer.raw dict
    "\n".join(bubble_meta),        # ← raw JSON of every bubble
],

cd is composer.raw which already contains conversationSummary, usage, requestMetadata, etc. Those are also serialised individually just above it. For a 1000-bubble conversation, this double-serialises every bubble's raw JSON into a single exclusion-rule string that can easily exceed several megabytes. This runs on every composer row on every search request. The _json_dump_safe(cd) entry should be removed since the individual parts already cover all the relevant metadata fields.

5. `search_legacy_workspaces` per-workspace exceptions lose the stack trace

services/search.py line ~357:

except Exception as exc:
    _logger.warning("Failed to search legacy workspace %s: %s", name, exc)

Missing exc_info=True (or _logger.exception(...)). Contrast with search_global_storage which correctly uses _logger.exception("Error searching global storage"). Without the stack trace, diagnosing workspace-specific parse failures is very difficult in production.

6. `_extract_snippet` produces a phantom snippet when `query` is the empty string

services/search.py:

idx = text.lower().find(query_lower)   # find("") always returns 0

str.find("") returns 0 for any non-empty string. With query = "", _extract_snippet would return the first ~200 chars of every bubble text as a "match". The /api/search handler already rejects blank queries with a 400, so the current blast radius is zero—but _extract_snippet is a public-ish helper (imported by tests) and this edge case is silent. Add a guard at the top:

if not query_lower:
    return ""

7. `rank_results` `_ts` silently accepts `bool` as a valid timestamp

services/search.py:

if isinstance(t, bool) or not isinstance(t, (int, float)):
    return 0.0

This correctly rejects booleans. But the upstream to_epoch_ms in utils/path_helpers.py does not guard booleans before the isinstance(value, (int, float)) check:

if isinstance(value, (int, float)):   # bool is a subclass of int
    if value > 1e12:
        return int(value)
    if value > 0:
        return int(value * 1000)   # True → 1000 ms

to_epoch_ms(True) silently returns 1000 (1 second since epoch). The guard that exists in Composer._optional_counter and Bubble.thinking_duration_ms (check isinstance(value, bool) first) should be replicated in to_epoch_ms.

8. `search_legacy_workspaces` `timestamp` falsy-coalescing drops `0` values correctly but the type contract is not enforced

"timestamp": tab.get("lastSendTime") or _UNKNOWN_SEARCH_TIMESTAMP,

If lastSendTime is a non-zero float (e.g. 1715000000.5), rank_results._ts receives a float but SearchResult.timestamp is typed int | str. Mypy won't catch this at the dict-literal level because the value comes from tab.get(...) which returns Any. The timestamp field in SearchResult should be widened to int | float | str, or to_epoch_ms should be applied here just as in the other two search functions.

🔵 Nice to Have

9. `test_search_helpers.py` imports private helpers (`_extract_snippet`, `_find_match`)

These are single-underscore-prefixed functions. Testing them via direct import creates tight coupling to implementation details. They should either be made public (renamed without leading underscore) if they are part of the module's tested surface, or the tests should be updated to verify observable behaviour through the public functions (search_global_storage, etc.).

10. `ParseWarningCollector` has no coverage path for legacy or CLI per-item parse failures

search_global_storage records record_bubble_skipped / record_composer_skipped / record_composer_processing_failure. search_legacy_workspaces and search_cli_sessions do not accept a ParseWarningCollector parameter at all. Malformed legacy chat JSON or CLI blob parse errors are logged server-side but are never surfaced to the API caller via the warnings field. This is a smaller version of the "CLI source failure" gap already in your list.

(ᐧωᐧ)🐇

bradjin8 · 2026-06-11T15:29:50Z

Verified each CodeRabbit finding against current code. Summary:

🔴 Must fix

#	Finding	Verdict	Action
1	`workspaceFolder` inconsistent across sources	Valid	Normalised to display name (matches global path + existing test): legacy uses `workspace_name`, CLI uses `ws_name`; documented on `SearchResult.workspaceFolder`
2	Uncached `build_composer_id_to_workspace_id` on every search	Valid	Switched to `build_composer_id_to_workspace_id_cached(..., rules)`
3	`traverse_blobs` vs `_read_meta` strictness gap	Valid (partial)	Log warning when `meta` is present but `traverse_blobs` returns no messages; full schema unification in `_read_meta` deferred (larger change)

🟡 Should fix

#	Finding	Verdict	Action
4	Redundant `_json_dump_safe(cd)` in exclusion text	Valid	Removed duplicate full-composer serialisation
5	Legacy workspace exceptions missing stack trace	Valid	Added `exc_info=True`
6	`_extract_snippet` empty-query phantom match	Valid	Guard: `if not query_lower: return ""` + test
7	`to_epoch_ms(True)` returns 1000	Valid	Reject `bool` before `int`/`float` check + test
8	Legacy `timestamp` not normalised via `to_epoch_ms`	Valid	`to_epoch_ms(tab.get("lastSendTime")) or _UNKNOWN_SEARCH_TIMESTAMP`

🔵 Nice to have — skipped

#	Finding	Reason
9	Private helper imports in tests	Test-coupling preference only; no runtime bug
10	`ParseWarningCollector` for legacy/CLI per-item failures	Larger API surface change; top-level `record_source_failure` for CLI already added

Validation

mypy .     → Success (86 source files)
pytest     → 450 passed, 4 skipped

…: Exception

clean6378-max-it · 2026-06-11T17:49:51Z

services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.)

Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.)

not addressed yet

Route all bubbleId:* loads through workspace_db._parse_bubble_kv_row and load_bubble_map (including workspace tabs). Introduce DisplayBubble/BubbleMetadata and utils/display_bubble builders shared by tabs, CLI, and IDE markdown export. Co-authored-by: Cursor <cursoragent@cursor.com>

bradjin8 · 2026-06-11T18:59:59Z

@coderabbitai check @clean6378-max-it 's findings are fully addressed now.

coderabbitai · 2026-06-11T19:02:08Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== services/workspace_db.py: load_bubble_map ==="
rg -n "load_bubble_map" services/workspace_db.py -A 20

echo ""
echo "=== services/search.py: _build_search_bubble_map or bubble loading ==="
rg -n "_build_search_bubble_map\|bubble_map\|load_bubble_map" services/search.py -B2 -A5

echo ""
echo "=== Bubble shapes: models/conversation.py Bubble class ==="
rg -n "class Bubble" models/conversation.py -A 30

echo ""
echo "=== utils/cli_chat_reader.py messages_to_bubbles ==="
rg -n "def messages_to_bubbles" utils/cli_chat_reader.py -A 30

echo ""
echo "=== Conversation model/alias in models/ ==="
rg -rn "Conversation" models/ -B1 -A2

echo ""
echo "=== search_cli_sessions: record_source_failure ==="
rg -n "record_source_failure\|parse_warnings" services/search.py -B2 -A3

echo ""
echo "=== mypy strict errors: check api/search.py, api/workspaces.py, etc. ==="
rg -n "def search\|def list_workspaces\|def get_workspace\|def list_composers\|def detect_environment\|def get_logs\|def export_chats\|def generate_pdf" api/search.py api/workspaces.py api/composers.py api/config_api.py api/logs.py api/export_api.py api/pdf.py 2>/dev/null

echo ""
echo "=== api/pdf.py: output call and return type ==="
rg -n "output\|return\|cast" api/pdf.py -B1 -A2

echo ""
echo "=== ParseWarningCollector: record_source_failure usage in search.py ==="
grep -n "record_source_failure" services/search.py

Length of output: 9230

🏁 Script executed:

#!/bin/bash
echo "=== api/pdf.py full content ==="
cat api/pdf.py

echo ""
echo "=== models/__init__.py full content ==="
cat models/__init__.py

echo ""
echo "=== models/search.py full content ==="
cat models/search.py

echo ""
echo "=== models/conversation.py: Conversation alias lines ==="
grep -n "^n\b\|^Conversation\b\|# is the\|^n =\|ConversationSummary\|ConvSummary" models/conversation.py

echo ""
echo "=== services/search.py: bubble loading section (lines 200-250) ==="
sed -n '200,260p' services/search.py

echo ""
echo "=== services/search.py: CLI section around line 537 ==="
sed -n '520,560p' services/search.py

Length of output: 14132

@bradjin8: Confirmed — all of @clean6378-max-it's findings are fully addressed. Here's the breakdown:

✅ Must Fix

Finding	Status	Evidence
mypy strict errors (20 errors across api/ and services/)	✅ Addressed	`api/pdf.py` now uses `bytes(pdf.output())` resolving the FPDF call-overload; bradjin8's run reports `mypy . → Success (86 source files)`
`pdf.output(buf)` call-overload	✅ Addressed	Fixed: `buf.write(bytes(pdf.output()))` — always returns bytes, no overload ambiguity

✅ Should Fix

Finding	Status	Evidence
No `Conversation` model (Issue `#100` AC gap)	✅ Addressed	`models/conversation.py` line 154: `Conversation = Composer`; exported from `models/__init__.py` with `ConversationSummary` from `models/search.py` — both in `__all__`
`load_bubble_map` vs `_build_search_bubble_map` duplicate loaders	✅ Addressed	`load_bubble_map` now returns `dict[str, Bubble]` via `Bubble.from_dict` validation with an optional `ParseWarningCollector` parameter; `_build_search_bubble_map` no longer exists in `services/search.py`
`search_cli_sessions` missing `record_source_failure`	✅ Addressed	`services/search.py` line ~537: `parse_warnings.record_source_failure(exc, source="cli_sessions")` is present
Three bubble shapes not converged	✅ Addressed	`messages_to_bubbles` now returns `list[DisplayBubble]`; `load_bubble_map` returns `dict[str, Bubble]` — both CLI and IDE paths are now typed model objects rather than raw dicts

Additionally, the workspaceFolder inconsistency (CodeRabbit finding #1) was resolved: SearchResult.workspaceFolder is documented as # display-name leaf (e.g. "myrepo"), not a file path, and all three search sources are normalised to match.

bradjin8 self-assigned this Jun 10, 2026

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread api/search.py

Comment thread CHANGELOG.md Outdated

Comment thread services/search.py

Comment thread services/search.py

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

bradjin8 force-pushed the fix/enable-strict-mypy branch from 6006c25 to 88dbe83 Compare June 11, 2026 03:43

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread services/search.py

bradjin8 requested a review from clean6378-max-it June 11, 2026 04:19

bradjin8 added 10 commits June 11, 2026 12:17

feat: initial implementation

02905c4

fix: test failing ssues

5f33fdd

fix: review findings

394175f

fix: outside diff finding

b0f3213

fix: after rebasing

0ab5f1f

fix: Changed record_source_failure(self, exc: BaseException, …) → exc…

4789c7d

…: Exception

fix: reviewer's findings

e2a513c

fix: coderabbitai's full review findings

a0594e3

fix: typecheck failure.

c8fab21

update gitignore file

cc24f11

bradjin8 force-pushed the fix/enable-strict-mypy branch from 2ad9d31 to cc24f11 Compare June 11, 2026 16:19

clean6378-max-it force-pushed the fix/enable-strict-mypy branch 2 times, most recently from c661ce2 to cc24f11 Compare June 11, 2026 17:44

clean6378-max-it approved these changes Jun 11, 2026

View reviewed changes

clean6378-max-it requested a review from wpak-ai June 11, 2026 19:26

wpak-ai approved these changes Jun 11, 2026

View reviewed changes

wpak-ai merged commit f194541 into master Jun 11, 2026
16 checks passed

wpak-ai deleted the fix/enable-strict-mypy branch June 11, 2026 20:14

Conversation

bradjin8 commented Jun 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Acceptance criteria

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

clean6378-max-it commented Jun 11, 2026

Must fix

Should fix

Nice to have

Uh oh!

clean6378-max-it commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026

🔴 Must Fix

1. workspaceFolder field returns inconsistent formats across the three search sources

2. search_global_storage re-scans all per-workspace DBs on every request (no cache)

3. traverse_blobs and iter_sessions disagree on CLI session meta validity

🟡 Should Fix

4. search_global_storage serialises the entire raw composer dict into exclusion text

5. search_legacy_workspaces per-workspace exceptions lose the stack trace

6. _extract_snippet produces a phantom snippet when query is the empty string

7. rank_results _ts silently accepts bool as a valid timestamp

8. search_legacy_workspaces timestamp falsy-coalescing drops 0 values correctly but the type contract is not enforced

🔵 Nice to Have

9. test_search_helpers.py imports private helpers (_extract_snippet, _find_match)

10. ParseWarningCollector has no coverage path for legacy or CLI per-item parse failures

Uh oh!

bradjin8 commented Jun 11, 2026

🔴 Must fix

🟡 Should fix

🔵 Nice to have — skipped

Validation

Uh oh!

clean6378-max-it commented Jun 11, 2026

Uh oh!

bradjin8 commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026

✅ Must Fix

✅ Should Fix

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bradjin8 commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

1. `workspaceFolder` field returns inconsistent formats across the three search sources

2. `search_global_storage` re-scans all per-workspace DBs on every request (no cache)

3. `traverse_blobs` and `iter_sessions` disagree on CLI session meta validity

4. `search_global_storage` serialises the entire raw composer dict into exclusion text

5. `search_legacy_workspaces` per-workspace exceptions lose the stack trace

6. `_extract_snippet` produces a phantom snippet when `query` is the empty string

7. `rank_results` `_ts` silently accepts `bool` as a valid timestamp

8. `search_legacy_workspaces` `timestamp` falsy-coalescing drops `0` values correctly but the type contract is not enforced

9. `test_search_helpers.py` imports private helpers (`_extract_snippet`, `_find_match`)

10. `ParseWarningCollector` has no coverage path for legacy or CLI per-item parse failures