Skip to content

Enable strict mypy and add typed search models (#100)#103

Merged
wpak-ai merged 11 commits into
masterfrom
fix/enable-strict-mypy
Jun 11, 2026
Merged

Enable strict mypy and add typed search models (#100)#103
wpak-ai merged 11 commits into
masterfrom
fix/enable-strict-mypy

Conversation

@bradjin8

@bradjin8 bradjin8 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Enable strict = true in pyproject.toml and update CI to run mypy . from project config
  • Add core TypedDict models SearchResult and ConversationSummary, and wire SearchResult through the search service and /api/search
  • Add complete type annotations across production code: all api/ route handlers, all utils/ modules, and all services/ modules

Per-module mypy overrides remain for scripts/export.py and tests.* until those surfaces are fully annotated.

Closes #100.

Acceptance criteria

  • mypy configured with [tool.mypy] and strict = true
  • Core data models defined (Bubble, Workspace, Composer dataclasses; SearchResult, ConversationSummary TypedDicts)
  • All API route handlers fully annotated
  • All utils/ functions fully annotated
  • mypy passes with zero errors on production modules
  • Tests pass locally

Test plan

  • mypy . — zero errors (86 source files)
  • pytest — 447 passed, 4 skipped
  • CI typecheck job passes on PR
  • CI test matrix passes on PR

Summary by CodeRabbit

  • Refactor

    • Search endpoint now delegates to a centralized multi-source search service with unified ranking/timestamp handling; search warning aggregation improved and search result shapes formalized.
  • Chores

    • CI typechecking tightened to run mypy using the repository's strict settings (per-module overrides applied).
  • Tests

    • Added unit and integration tests exercising search helpers and multi-source search behavior.
  • Documentation

    • Changelog and CI descriptions updated to reflect stricter typechecks and search changes.

@bradjin8 bradjin8 self-assigned this Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Enables strict mypy and CI mypy invocation, adds TypedDict search contracts, extracts search logic into services.search (global, legacy, CLI), rewires api/search to orchestrate sources, and applies typing broadly across APIs, services, utils, models, and tests.

Changes

Strict MyPy, Typed Contracts, and CI

Layer / File(s) Summary
Configuration, TypedDicts, and exports
pyproject.toml, .github/workflows/tests.yml, CHANGELOG.md, models/search.py, models/__init__.py, tests/test_models.py
Enable strict = true in mypy config with per-module overrides; CI mypy invocation changed to mypy .. Add ConversationSummary and SearchResult TypedDicts and re-export them; tests updated to construct/validate SearchResult payloads.

Search Service and Tests

Layer / File(s) Summary
Search service extraction and ranking
services/search.py, tests/test_search_helpers.py, tests/test_models_wired_at_read_sites.py
Add services/search.py with search_global_storage, search_legacy_workspaces, search_cli_sessions, helper matching/snippet logic, and rank_results. Add unit/integration tests covering helpers and per-backend search behavior and adjust tests to use services.search logging locations.

API orchestration and route typing

Layer / File(s) Summary
API orchestration and Flask typing
api/search.py, api/composers.py, api/config_api.py, api/export_api.py, api/flask_config.py, api/logs.py, api/pdf.py, api/workspaces.py, app.py
api/search.py now orchestrates delegated search helpers and attaches ParseWarningCollector warnings. Many Flask route handlers and app handlers receive explicit parameter and return type annotations and small helper typings were added.

Workspace/service typing and DB helpers

Layer / File(s) Summary
Workspace DB, listing, tabs, resolver, context, and cache
services/workspace_db.py, services/workspace_listing.py, services/workspace_tabs.py, services/workspace_resolver.py, services/workspace_context.py, services/summary_cache.py, services/cli_tabs.py
Introduce safe_fetchall and typed sqlite3 usage; tighten workspace-entry/map shapes to dict[str, Any]; annotate many function signatures and return envelopes; replace local fetchall closures with shared helper; update tab assembly and listing typing.

Models, utils, and miscellaneous typing

Layer / File(s) Summary
Model and utility typing updates
models/conversation.py, models/parse_warnings.py, utils/cli_chat_reader.py, utils/cursor_md_exporter.py, utils/debug_flag.py, utils/exclusion_rules.py, utils/path_helpers.py, utils/text_extract.py, utils/tool_parser.py, utils/workspace_descriptor.py
Apply Any imports, type aliases, and more precise function signatures (parameterized dict/list types, unions, return types). exclusion_rules.py adds token type aliases and tightens evaluator typing. models/parse_warnings.py adds source_failure tracking and types warning payloads.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ApiSearch as api.search
  participant ParseWarnings as ParseWarningCollector
  participant SearchSvc as services.search
  participant GlobalDB as globalStorage/state.vscdb
  participant WorkspaceDB as workspaceStorage/state.vscdb
  participant CliStore as cli_chats/store.db
  Client->>ApiSearch: HTTP GET /api/search?q=...
  ApiSearch->>ParseWarnings: create collector
  ApiSearch->>SearchSvc: search_global_storage(...) when type != "chat"
  SearchSvc->>GlobalDB: query composerData and bubbleId rows
  ApiSearch->>SearchSvc: search_legacy_workspaces(...)
  SearchSvc->>WorkspaceDB: query ItemTable chat payloads
  ApiSearch->>SearchSvc: search_cli_sessions(...) when type == "all"
  SearchSvc->>CliStore: traverse session blobs
  SearchSvc-->>ApiSearch: results lists
  ApiSearch->>SearchSvc: rank_results(merged)
  ApiSearch->>ParseWarnings: attach_to(payload)
  ApiSearch-->>Client: jsonify(results + warnings)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • clean6378-max-it
  • wpak-ai

Poem

🐇 I hopped through types to tidy the lair,

SearchResult sings in TypedDict flair.
CI runs strict — I nibbled each Any,
Services hum, APIs answer with clarity.
A rabbit cheers beneath the typed moonlight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Enable strict mypy and add typed search models (#100)' accurately summarizes the main changes: enabling strict mypy checking and introducing typed search data models.
Linked Issues check ✅ Passed The PR comprehensively addresses all objectives from issue #100: mypy strict mode enabled, core TypedDict models added (SearchResult, ConversationSummary), complete type annotations across api/, utils/, and services/ modules, zero mypy errors reported, and tests passing (447 passed, 4 skipped).
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #100 objectives. No unrelated modifications detected; additions to services/search.py extract and refactor search logic in support of typed models, and all utility/api changes are purely typing improvements without logic alterations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/enable-strict-mypy

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
models/search.py (1)

17-27: ⚡ Quick win

Constrain SearchResult discriminator fields with Literal.

type: str and source: NotRequired[str] are broader than the actual contract, which reduces strict-mypy value for call sites.

Proposed diff
-from typing import Any, NotRequired, TypedDict
+from typing import Any, Literal, NotRequired, TypedDict
@@
 class SearchResult(TypedDict):
@@
-    type: str  # "composer" | "chat" | "cli_agent"
-    source: NotRequired[str]  # "cli" for CLI agent sessions
+    type: Literal["composer", "chat", "cli_agent"]
+    source: NotRequired[Literal["cli"]]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/search.py` around lines 17 - 27, The SearchResult TypedDict currently
uses broad types for discriminators; change the type annotation for
SearchResult.type from str to Literal["composer", "chat", "cli_agent"] and
change SearchResult.source from NotRequired[str] to NotRequired[Literal["cli"]]
(import Literal if not already), so call sites get precise typing; update any
imports to include Literal and adjust any code that constructs SearchResult
entries to use those exact string literals where applicable.
services/workspace_listing.py (1)

141-150: 💤 Low value

Extract duplicated _safe_fetchall helper to a shared location. The _safe_fetchall helper with cast(list[sqlite3.Row], ...) is defined four times as a local function: once in services/workspace_listing.py (line 141) and three times in services/workspace_tabs.py (lines 551, 712, 851). All implementations are identical. Extracting this to a module-level helper in services/workspace_db.py would eliminate the duplication and centralize the SQLite error-handling pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_listing.py` around lines 141 - 150, Extract the duplicated
local helper _safe_fetchall into a single module-level function in
services/workspace_db.py (e.g., def safe_fetchall(query: str, params: tuple[Any,
...] = ()) -> list[sqlite3.Row]) that performs the same cast(list[sqlite3.Row],
global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior;
then remove the four local definitions of _safe_fetchall in
services/workspace_listing.py and services/workspace_tabs.py and replace their
usages with calls to the new safe_fetchall (importing it from
services.workspace_db). Ensure the function name and signature match callers or
update the call sites accordingly to avoid type errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/search.py`:
- Around line 38-48: The current logic always calls search_global_storage, so
requests with search_type == "chat" still include composer/global results;
update the caller around results.extend(search_global_storage(...)) to only
invoke search_global_storage when search_type is not "chat" (e.g., if
search_type != "chat" or when search_type == "all"/"composer"), leaving calls to
search_legacy_workspaces and search_cli_sessions unchanged; ensure the
conditional references the search_type variable and the search_global_storage
function so the chat-only path excludes global/composer results.

In `@CHANGELOG.md`:
- Around line 10-17: The changelog contains duplicate subsection headings "###
Added" and "### Changed" under the same release section; merge the new bullets
about strict mypy (`strict = true` in `pyproject.toml`, TypedDict models
`SearchResult`, `ConversationSummary`, and full annotations on API routes and
`utils/`) and the CI typecheck change (CI job runs `mypy .` with pyproject
config and per-module overrides) into the existing single "### Added" and "###
Changed" entries respectively so there is only one "### Added" and one "###
Changed" block under `[Unreleased]`.

In `@services/search.py`:
- Around line 398-399: The loop over tabs currently uses tab.get("tabId")
directly which can be missing or non-string and violates the SearchResult
contract; change the code that builds the chatId (alongside ct =
tab.get("chatTitle") or "") to coerce tab.get("tabId") into a stable string
(e.g., use str(...) or default to "" when falsy) so SearchResult.chatId is
always a string; apply the same fix in the other occurrence that builds results
(the block around lines 434-439) so all created SearchResult instances have
string chatId values.
- Around line 561-568: _ts is normalizing timestamps inconsistently (ISO strings
are converted to epoch-seconds while integer epochs may be in milliseconds),
causing incorrect ordering; update the _ts(SearchResult) function to always
return epoch-seconds as a float: parse ISO strings with
datetime.fromisoformat(...) as before, and for numeric values (int/float) detect
millisecond epochs (e.g., values > 1e12 or > 1e11) and divide by 1000.0 when
appropriate, handling zero/None safely and returning 0.0 for unparseable inputs
so all sources are compared on the same seconds scale.

---

Nitpick comments:
In `@models/search.py`:
- Around line 17-27: The SearchResult TypedDict currently uses broad types for
discriminators; change the type annotation for SearchResult.type from str to
Literal["composer", "chat", "cli_agent"] and change SearchResult.source from
NotRequired[str] to NotRequired[Literal["cli"]] (import Literal if not already),
so call sites get precise typing; update any imports to include Literal and
adjust any code that constructs SearchResult entries to use those exact string
literals where applicable.

In `@services/workspace_listing.py`:
- Around line 141-150: Extract the duplicated local helper _safe_fetchall into a
single module-level function in services/workspace_db.py (e.g., def
safe_fetchall(query: str, params: tuple[Any, ...] = ()) -> list[sqlite3.Row])
that performs the same cast(list[sqlite3.Row],
global_db.execute(...).fetchall()) and except sqlite3.Error: return [] behavior;
then remove the four local definitions of _safe_fetchall in
services/workspace_listing.py and services/workspace_tabs.py and replace their
usages with calls to the new safe_fetchall (importing it from
services.workspace_db). Ensure the function name and signature match callers or
update the call sites accordingly to avoid type errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ced4d7c8-578e-4a3d-a1db-72d7960b4433

📥 Commits

Reviewing files that changed from the base of the PR and between 937dc91 and fa454b1.

📒 Files selected for processing (35)
  • .github/workflows/tests.yml
  • CHANGELOG.md
  • api/composers.py
  • api/config_api.py
  • api/export_api.py
  • api/flask_config.py
  • api/logs.py
  • api/pdf.py
  • api/search.py
  • api/workspaces.py
  • app.py
  • models/__init__.py
  • models/conversation.py
  • models/parse_warnings.py
  • models/search.py
  • pyproject.toml
  • services/cli_tabs.py
  • services/search.py
  • services/summary_cache.py
  • services/workspace_context.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_resolver.py
  • services/workspace_tabs.py
  • tests/test_models.py
  • tests/test_models_wired_at_read_sites.py
  • tests/test_search_helpers.py
  • utils/cli_chat_reader.py
  • utils/cursor_md_exporter.py
  • utils/debug_flag.py
  • utils/exclusion_rules.py
  • utils/path_helpers.py
  • utils/text_extract.py
  • utils/tool_parser.py
  • utils/workspace_descriptor.py

Comment thread api/search.py
Comment thread CHANGELOG.md Outdated
Comment thread services/search.py
Comment thread services/search.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/search.py (1)

317-321: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t promote missing timestamps to “now”.

rank_results() now sinks unknown timestamps to 0.0, but the three builders still replace missing values with the current time. That makes untimestamped chats outrank genuinely recent hits once /api/search merges and sorts the combined list. Use a stable sentinel here instead so missing metadata stays at the bottom.

Suggested fix
                 results.append({
                     "workspaceId": ws_id,
                     "workspaceFolder": ws_name,
                     "chatId": composer_id,
                     "chatTitle": title,
                     "timestamp": (
                         to_epoch_ms(composer.last_updated_at)
                         or to_epoch_ms(composer.created_at)
-                        or int(datetime.now().timestamp() * 1000)
+                        or 0
                     ),
                     "matchingText": matching_text,
                     "type": "composer",
                 })
@@
                     results.append({
                         "workspaceId": name,
                         "workspaceFolder": workspace_folder,
                         "chatId": tab_id,
                         "chatTitle": ct or f"Chat {tab_id[:8]}",
-                        "timestamp": tab.get("lastSendTime") or datetime.now().isoformat(),
+                        "timestamp": tab.get("lastSendTime") or 0,
                         "matchingText": matching_text,
                         "type": "chat",
                     })
@@
-                created_ms: int = (
-                    meta.get("createdAt") or int(datetime.now().timestamp() * 1000)
-                )
+                raw_created_ms = meta.get("createdAt")
+                created_ms = raw_created_ms if isinstance(raw_created_ms, int) else 0

Also applies to: 440-440, 485-487, 562-575

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/search.py` around lines 317 - 321, The builders are promoting
missing timestamps to the current time by using
to_epoch_ms(composer.last_updated_at) or to_epoch_ms(composer.created_at) or
int(datetime.now().timestamp() * 1000); change this to use a stable sentinel
(e.g., 0 or a named constant) instead of datetime.now() so untimestamped items
remain low-ranked. Update each place that builds the "timestamp" field (the
shown block and the other instances noted) to fallback to the sentinel when
to_epoch_ms returns falsy, ensuring consistency with rank_results() which treats
unknown timestamps as 0.0; refer to to_epoch_ms, composer.last_updated_at,
composer.created_at, and rank_results when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@services/search.py`:
- Around line 317-321: The builders are promoting missing timestamps to the
current time by using to_epoch_ms(composer.last_updated_at) or
to_epoch_ms(composer.created_at) or int(datetime.now().timestamp() * 1000);
change this to use a stable sentinel (e.g., 0 or a named constant) instead of
datetime.now() so untimestamped items remain low-ranked. Update each place that
builds the "timestamp" field (the shown block and the other instances noted) to
fallback to the sentinel when to_epoch_ms returns falsy, ensuring consistency
with rank_results() which treats unknown timestamps as 0.0; refer to
to_epoch_ms, composer.last_updated_at, composer.created_at, and rank_results
when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e9be71e-69dc-40c1-942d-c7279622e372

📥 Commits

Reviewing files that changed from the base of the PR and between fa454b1 and 188d9ec.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • api/search.py
  • models/search.py
  • services/search.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_tabs.py
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.md
  • services/workspace_db.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • api/search.py
  • services/workspace_listing.py
  • services/workspace_tabs.py

@bradjin8 bradjin8 force-pushed the fix/enable-strict-mypy branch from 6006c25 to 88dbe83 Compare June 11, 2026 03:43

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
models/parse_warnings.py (1)

29-29: ⚡ Quick win

Narrow exception type to Exception.

The parameter accepts BaseException, but all call sites (e.g., services/search.py line 343) catch and pass Exception. BaseException includes system exits and keyboard interrupts, which are never logged as source failures in practice.

♻️ Tighten the type
-    def record_source_failure(self, exc: BaseException, source: str) -> None:
+    def record_source_failure(self, exc: Exception, source: str) -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/parse_warnings.py` at line 29, Change the record_source_failure
signature to accept Exception rather than BaseException: update the method
record_source_failure(self, exc: Exception, source: str) in
models/parse_warnings.py (and adjust any type imports) so it matches callers
that pass Exception and avoids catching system-level BaseException like
SystemExit/KeyboardInterrupt; update any type hints or tests referring to the
old signature accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/search.py`:
- Line 494: The current assignment created_ms: int =
to_epoch_ms(meta.get("createdAt")) is correct because to_epoch_ms returns 0 for
None/invalid inputs; remove any added fallback like "or
_UNKNOWN_SEARCH_TIMESTAMP" unless this codebase intentionally treats 0 as an
unknown sentinel. If you do need a non-zero sentinel, explicitly state that
intent and change the line to use "or _UNKNOWN_SEARCH_TIMESTAMP" and document
why; otherwise leave the single call to to_epoch_ms as-is.

---

Nitpick comments:
In `@models/parse_warnings.py`:
- Line 29: Change the record_source_failure signature to accept Exception rather
than BaseException: update the method record_source_failure(self, exc:
Exception, source: str) in models/parse_warnings.py (and adjust any type
imports) so it matches callers that pass Exception and avoids catching
system-level BaseException like SystemExit/KeyboardInterrupt; update any type
hints or tests referring to the old signature accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24c15480-d369-4c7d-ae96-751477063a31

📥 Commits

Reviewing files that changed from the base of the PR and between 6006c25 and 88dbe83.

📒 Files selected for processing (34)
  • .github/workflows/tests.yml
  • CHANGELOG.md
  • api/composers.py
  • api/config_api.py
  • api/export_api.py
  • api/flask_config.py
  • api/logs.py
  • api/pdf.py
  • api/search.py
  • api/workspaces.py
  • app.py
  • models/__init__.py
  • models/conversation.py
  • models/parse_warnings.py
  • models/search.py
  • pyproject.toml
  • services/cli_tabs.py
  • services/search.py
  • services/summary_cache.py
  • services/workspace_context.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_resolver.py
  • services/workspace_tabs.py
  • tests/test_models.py
  • tests/test_search_helpers.py
  • utils/cli_chat_reader.py
  • utils/cursor_md_exporter.py
  • utils/debug_flag.py
  • utils/exclusion_rules.py
  • utils/path_helpers.py
  • utils/text_extract.py
  • utils/tool_parser.py
  • utils/workspace_descriptor.py
✅ Files skipped from review due to trivial changes (15)
  • utils/text_extract.py
  • utils/workspace_descriptor.py
  • api/flask_config.py
  • api/pdf.py
  • api/config_api.py
  • api/logs.py
  • utils/cursor_md_exporter.py
  • api/workspaces.py
  • services/workspace_context.py
  • services/cli_tabs.py
  • CHANGELOG.md
  • api/export_api.py
  • utils/cli_chat_reader.py
  • services/summary_cache.py
  • api/composers.py
🚧 Files skipped from review as they are similar to previous changes (15)
  • models/search.py
  • utils/debug_flag.py
  • tests/test_models.py
  • models/init.py
  • utils/path_helpers.py
  • utils/tool_parser.py
  • models/conversation.py
  • services/workspace_resolver.py
  • utils/exclusion_rules.py
  • tests/test_search_helpers.py
  • pyproject.toml
  • api/search.py
  • services/workspace_listing.py
  • services/workspace_tabs.py
  • services/workspace_db.py

Comment thread services/search.py
@clean6378-max-it

Copy link
Copy Markdown
Collaborator

Must fix

CI typecheck will fail — mypy . reports 20 errors (api/search.py:54, api/workspaces.py:69,83,99,140, api/composers.py:116,186,225, api/config_api.py:42,55,64,71,81,125,147,156, api/logs.py:152, api/export_api.py:75, services/cli_tabs.py:141, api/pdf.py:55/161) (Issue #100 requires zero mypy errors on annotated modules; jsonify() is typed as Any under strict mode, and FPDF lacks proper stubs — installing types-Flask alone does not clear the no-any-return errors.)

api/pdf.py:161 — pdf.output(buf) fails strict mypy with call-overload once fpdf stubs are present (Will surface as soon as the FPDF misc subclass error is addressed; needs dest="S" / bytes return or an explicit cast.)

Should fix

Issue #100 AC gap — no Conversation model (models/conversation.py defines Composer, not Conversation) (Acceptance criteria explicitly list Bubble, Workspace, Conversation, SearchResult; Composer covers the domain but the named type is missing — add a Conversation alias/TypedDict or document the mapping in the issue close comment.)

services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.)

services/search.py:552-553 — search_cli_sessions logs top-level failures but does not call parse_warnings.record_source_failure (search_global_storage records source_failure for operator-visible incomplete results; CLI search failures remain invisible in the API warnings payload — amplifies the silent-failure compound.)

Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.)

Nice to have

pyproject.toml dev extras — add types-Flask (and fpdf stubs if kept) alongside mypy (Improves local/CI parity and may reduce some stub friction once return casts are in place.)

models/conversation.py:43-48 — Composer.createdAt rejects ISO strings (Intentional and documented; if Cursor migrates timestamp format, composers vanish from list/search with only drift warnings — consider relaxing when real data appears.)

@clean6378-max-it

Copy link
Copy Markdown
Collaborator

@CodeRabbit. Except my findings, Can you also review in more depth and find more additional bugs or issues?

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat services/search.py

Length of output: 22085


🏁 Script executed:

#!/bin/bash
cat models/search.py
cat models/parse_warnings.py

Length of output: 4683


🏁 Script executed:

#!/bin/bash
cat utils/path_helpers.py
cat utils/exclusion_rules.py

Length of output: 13576


🏁 Script executed:

#!/bin/bash
cat api/search.py

Length of output: 1921


🏁 Script executed:

#!/bin/bash
cat services/workspace_db.py

Length of output: 16129


🏁 Script executed:

#!/bin/bash
cat tests/test_search_helpers.py

Length of output: 21015


🏁 Script executed:

#!/bin/bash
cat models/conversation.py

Length of output: 12233


🏁 Script executed:

#!/bin/bash
cat utils/cli_chat_reader.py

Length of output: 18945



@clean6378-max-it: Here are additional bugs and issues found beyond your list, ordered by severity.


🔴 Must Fix

1. workspaceFolder field returns inconsistent formats across the three search sources

services/search.py, three different call sites:

Source workspaceFolder value Example
search_global_storage Display name leaf (from ws_id_to_name.get(ws_id)) "myrepo"
search_legacy_workspaces Raw workspace.json folder value "file:///home/user/projects/myrepo"
search_cli_sessions Full workspace path from CLI meta "/home/user/projects/myrepo"

The field is named workspaceFolder throughout the codebase and SearchResult declares it str | None, so callers expect a consistent path. A front-end consuming merged rank_results output would get three different shapes for the same semantic field. The test in test_search_helpers.py line ~354 even asserts the display name "myrepo" for the global-storage case, which would pass, but hides the inconsistency at the integration level.

Root cause: search_global_storage (line ~297 in services/search.py) sets:

"workspaceFolder": ws_name,   # ← display name ("myrepo"), not path

whereas search_legacy_workspaces sets:

"workspaceFolder": workspace_folder,  # ← raw "file:///..." or "/home/..."

Either normalise all three to a resolved filesystem path (same as get_workspace_folder_paths produces) or to a bare display name—but pick one contract and document it in SearchResult.


2. search_global_storage re-scans all per-workspace DBs on every request (no cache)

services/search.py line ~234:

composer_id_to_ws = build_composer_id_to_workspace_id(
    workspace_path, workspace_entries
)

build_composer_id_to_workspace_id_cached exists in services/workspace_db.py for exactly this purpose (fingerprint + disk cache). The uncached variant opens and reads every {workspace_id}/state.vscdb on each GET /api/search, which is O(number-of-workspaces) extra DB opens for every keystroke in typeahead search scenarios. Replace with:

composer_id_to_ws = build_composer_id_to_workspace_id_cached(
    workspace_path, workspace_entries, rules
)

3. traverse_blobs and iter_sessions disagree on CLI session meta validity

utils/cli_chat_reader.py:

  • iter_sessions_read_meta uses a plain json.loads + cast, returning {} on any error.
  • traverse_blobs → internally calls CliSessionMeta.from_dict, raising SchemaError on missing/wrong-typed fields, and returns [] on failure.

In search_cli_sessions:

meta = session.get("meta", {})             # from _read_meta — may succeed
...
messages = traverse_blobs(session["db_path"])  # may return [] for same session
if not bubbles:
    continue                                # session silently dropped

If CliSessionMeta.from_dict is stricter than _read_meta, a session can appear valid in iter_sessions (has name, createdAt, etc.) but be silently discarded by traverse_blobs. No warning is emitted. This should at minimum log a warning when traverse_blobs returns empty for a session whose meta decoded successfully—or _read_meta should use the same schema validation path.


🟡 Should Fix

4. search_global_storage serialises the entire raw composer dict into exclusion text

services/search.py line ~286:

metadata_parts=[
    ...
    _json_dump_safe(cd),           # ← entire raw Composer.raw dict
    "\n".join(bubble_meta),        # ← raw JSON of every bubble
],

cd is composer.raw which already contains conversationSummary, usage, requestMetadata, etc. Those are also serialised individually just above it. For a 1000-bubble conversation, this double-serialises every bubble's raw JSON into a single exclusion-rule string that can easily exceed several megabytes. This runs on every composer row on every search request. The _json_dump_safe(cd) entry should be removed since the individual parts already cover all the relevant metadata fields.


5. search_legacy_workspaces per-workspace exceptions lose the stack trace

services/search.py line ~357:

except Exception as exc:
    _logger.warning("Failed to search legacy workspace %s: %s", name, exc)

Missing exc_info=True (or _logger.exception(...)). Contrast with search_global_storage which correctly uses _logger.exception("Error searching global storage"). Without the stack trace, diagnosing workspace-specific parse failures is very difficult in production.


6. _extract_snippet produces a phantom snippet when query is the empty string

services/search.py:

idx = text.lower().find(query_lower)   # find("") always returns 0

str.find("") returns 0 for any non-empty string. With query = "", _extract_snippet would return the first ~200 chars of every bubble text as a "match". The /api/search handler already rejects blank queries with a 400, so the current blast radius is zero—but _extract_snippet is a public-ish helper (imported by tests) and this edge case is silent. Add a guard at the top:

if not query_lower:
    return ""

7. rank_results _ts silently accepts bool as a valid timestamp

services/search.py:

if isinstance(t, bool) or not isinstance(t, (int, float)):
    return 0.0

This correctly rejects booleans. But the upstream to_epoch_ms in utils/path_helpers.py does not guard booleans before the isinstance(value, (int, float)) check:

if isinstance(value, (int, float)):   # bool is a subclass of int
    if value > 1e12:
        return int(value)
    if value > 0:
        return int(value * 1000)   # True → 1000 ms

to_epoch_ms(True) silently returns 1000 (1 second since epoch). The guard that exists in Composer._optional_counter and Bubble.thinking_duration_ms (check isinstance(value, bool) first) should be replicated in to_epoch_ms.


8. search_legacy_workspaces timestamp falsy-coalescing drops 0 values correctly but the type contract is not enforced

"timestamp": tab.get("lastSendTime") or _UNKNOWN_SEARCH_TIMESTAMP,

If lastSendTime is a non-zero float (e.g. 1715000000.5), rank_results._ts receives a float but SearchResult.timestamp is typed int | str. Mypy won't catch this at the dict-literal level because the value comes from tab.get(...) which returns Any. The timestamp field in SearchResult should be widened to int | float | str, or to_epoch_ms should be applied here just as in the other two search functions.


🔵 Nice to Have

9. test_search_helpers.py imports private helpers (_extract_snippet, _find_match)

These are single-underscore-prefixed functions. Testing them via direct import creates tight coupling to implementation details. They should either be made public (renamed without leading underscore) if they are part of the module's tested surface, or the tests should be updated to verify observable behaviour through the public functions (search_global_storage, etc.).

10. ParseWarningCollector has no coverage path for legacy or CLI per-item parse failures

search_global_storage records record_bubble_skipped / record_composer_skipped / record_composer_processing_failure. search_legacy_workspaces and search_cli_sessions do not accept a ParseWarningCollector parameter at all. Malformed legacy chat JSON or CLI blob parse errors are logged server-side but are never surfaced to the API caller via the warnings field. This is a smaller version of the "CLI source failure" gap already in your list.

(ᐧωᐧ)🐇

@bradjin8

Copy link
Copy Markdown
Collaborator Author

Verified each CodeRabbit finding against current code. Summary:

🔴 Must fix

# Finding Verdict Action
1 workspaceFolder inconsistent across sources Valid Normalised to display name (matches global path + existing test): legacy uses workspace_name, CLI uses ws_name; documented on SearchResult.workspaceFolder
2 Uncached build_composer_id_to_workspace_id on every search Valid Switched to build_composer_id_to_workspace_id_cached(..., rules)
3 traverse_blobs vs _read_meta strictness gap Valid (partial) Log warning when meta is present but traverse_blobs returns no messages; full schema unification in _read_meta deferred (larger change)

🟡 Should fix

# Finding Verdict Action
4 Redundant _json_dump_safe(cd) in exclusion text Valid Removed duplicate full-composer serialisation
5 Legacy workspace exceptions missing stack trace Valid Added exc_info=True
6 _extract_snippet empty-query phantom match Valid Guard: if not query_lower: return "" + test
7 to_epoch_ms(True) returns 1000 Valid Reject bool before int/float check + test
8 Legacy timestamp not normalised via to_epoch_ms Valid to_epoch_ms(tab.get("lastSendTime")) or _UNKNOWN_SEARCH_TIMESTAMP

🔵 Nice to have — skipped

# Finding Reason
9 Private helper imports in tests Test-coupling preference only; no runtime bug
10 ParseWarningCollector for legacy/CLI per-item failures Larger API surface change; top-level record_source_failure for CLI already added

Validation

mypy .     → Success (86 source files)
pytest     → 450 passed, 4 skipped

@bradjin8 bradjin8 force-pushed the fix/enable-strict-mypy branch from 2ad9d31 to cc24f11 Compare June 11, 2026 16:19
@clean6378-max-it clean6378-max-it force-pushed the fix/enable-strict-mypy branch 2 times, most recently from c661ce2 to cc24f11 Compare June 11, 2026 17:44
@clean6378-max-it

Copy link
Copy Markdown
Collaborator

services/workspace_db.py:load_bubble_map vs services/search.py:_build_search_bubble_map — duplicate bubble KV loaders with different validation (G1/G3) (Search path validates via Bubble.from_dict and records ParseWarningCollector skips; load_bubble_map still stores raw dicts without schema checks — a drifted bubble can pass tabs/export but fail search, or vice versa.)

Three bubble shapes still not converged (models/conversation.py:Bubble raw-dict accessor vs utils/cli_chat_reader.py:236 messages_to_bubbles dict shape vs export flat keys) (Issue #100 implementation notes call for a single TypedDict or discriminated union; deferred work leaves shape drift uncaught at compile time for CLI/export paths.)

not addressed yet

Route all bubbleId:* loads through workspace_db._parse_bubble_kv_row and load_bubble_map (including workspace tabs). Introduce DisplayBubble/BubbleMetadata and utils/display_bubble builders shared by tabs, CLI, and IDE markdown export.

Co-authored-by: Cursor <cursoragent@cursor.com>
@bradjin8

Copy link
Copy Markdown
Collaborator Author

@coderabbitai check @clean6378-max-it 's findings are fully addressed now.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== services/workspace_db.py: load_bubble_map ==="
rg -n "load_bubble_map" services/workspace_db.py -A 20

echo ""
echo "=== services/search.py: _build_search_bubble_map or bubble loading ==="
rg -n "_build_search_bubble_map\|bubble_map\|load_bubble_map" services/search.py -B2 -A5

echo ""
echo "=== Bubble shapes: models/conversation.py Bubble class ==="
rg -n "class Bubble" models/conversation.py -A 30

echo ""
echo "=== utils/cli_chat_reader.py messages_to_bubbles ==="
rg -n "def messages_to_bubbles" utils/cli_chat_reader.py -A 30

echo ""
echo "=== Conversation model/alias in models/ ==="
rg -rn "Conversation" models/ -B1 -A2

echo ""
echo "=== search_cli_sessions: record_source_failure ==="
rg -n "record_source_failure\|parse_warnings" services/search.py -B2 -A3

echo ""
echo "=== mypy strict errors: check api/search.py, api/workspaces.py, etc. ==="
rg -n "def search\|def list_workspaces\|def get_workspace\|def list_composers\|def detect_environment\|def get_logs\|def export_chats\|def generate_pdf" api/search.py api/workspaces.py api/composers.py api/config_api.py api/logs.py api/export_api.py api/pdf.py 2>/dev/null

echo ""
echo "=== api/pdf.py: output call and return type ==="
rg -n "output\|return\|cast" api/pdf.py -B1 -A2

echo ""
echo "=== ParseWarningCollector: record_source_failure usage in search.py ==="
grep -n "record_source_failure" services/search.py

Length of output: 9230


🏁 Script executed:

#!/bin/bash
echo "=== api/pdf.py full content ==="
cat api/pdf.py

echo ""
echo "=== models/__init__.py full content ==="
cat models/__init__.py

echo ""
echo "=== models/search.py full content ==="
cat models/search.py

echo ""
echo "=== models/conversation.py: Conversation alias lines ==="
grep -n "^n\b\|^Conversation\b\|# is the\|^n =\|ConversationSummary\|ConvSummary" models/conversation.py

echo ""
echo "=== services/search.py: bubble loading section (lines 200-250) ==="
sed -n '200,260p' services/search.py

echo ""
echo "=== services/search.py: CLI section around line 537 ==="
sed -n '520,560p' services/search.py

Length of output: 14132



@bradjin8: Confirmed — all of @clean6378-max-it's findings are fully addressed. Here's the breakdown:


✅ Must Fix

Finding Status Evidence
mypy strict errors (20 errors across api/ and services/) ✅ Addressed api/pdf.py now uses bytes(pdf.output()) resolving the FPDF call-overload; bradjin8's run reports mypy . → Success (86 source files)
pdf.output(buf) call-overload ✅ Addressed Fixed: buf.write(bytes(pdf.output())) — always returns bytes, no overload ambiguity

✅ Should Fix

Finding Status Evidence
No Conversation model (Issue #100 AC gap) ✅ Addressed models/conversation.py line 154: Conversation = Composer; exported from models/__init__.py with ConversationSummary from models/search.py — both in __all__
load_bubble_map vs _build_search_bubble_map duplicate loaders ✅ Addressed load_bubble_map now returns dict[str, Bubble] via Bubble.from_dict validation with an optional ParseWarningCollector parameter; _build_search_bubble_map no longer exists in services/search.py
search_cli_sessions missing record_source_failure ✅ Addressed services/search.py line ~537: parse_warnings.record_source_failure(exc, source="cli_sessions") is present
Three bubble shapes not converged ✅ Addressed messages_to_bubbles now returns list[DisplayBubble]; load_bubble_map returns dict[str, Bubble] — both CLI and IDE paths are now typed model objects rather than raw dicts

Additionally, the workspaceFolder inconsistency (CodeRabbit finding #1) was resolved: SearchResult.workspaceFolder is documented as # display-name leaf (e.g. "myrepo"), not a file path, and all three search sources are normalised to match.

@clean6378-max-it clean6378-max-it requested a review from wpak-ai June 11, 2026 19:26
@wpak-ai wpak-ai merged commit f194541 into master Jun 11, 2026
16 checks passed
@wpak-ai wpak-ai deleted the fix/enable-strict-mypy branch June 11, 2026 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable --strict mypy: eliminate Any-typed fields and implicit returns

3 participants