feat(discord): internal user token extraction and per-channel incremental export #283
leostar0412 wants to merge 5 commits into
…incremental exports and improved staging management
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough
Adds workspace-configured internal Discord user-token extraction (Chrome profile LevelDB), token JSON persistence and a re-extract flow; refactors the exporter to per-channel-per-day outputs; merges exporter JSON into per-day archives by message id; and updates orchestration, Docker/Make tooling, scripts, docs, and comprehensive tests.
Changes
Discord Internal Token Extraction and Per-Channel Daily Export
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/service_api/discord_activity_tracker.md (1)
85-97: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Document the per-channel incremental lower bound.
This still reads like a guild-wide “latest DB message” resume path, but `run_discord_activity_tracker` now resumes each channel from its own latest stored message (or today UTC when empty). The current wording can send operators back to the quiet-channel skip behavior this PR is fixing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/service_api/discord_activity_tracker.md` around lines 85 - 97, Update the docs text around run_discord_activity_tracker/DiscordChatExporter to clarify that the resume lower bound is per-channel, not guild-wide: replace the phrase "the lower bound is the latest stored message time for this guild (and channel allowlist)" with wording that each channel resumes from its own latest stored message time (and if a channel has no stored rows, only today UTC for that channel is exported), and also ensure the note about merging raw archives (`YYYY-MM-DD.json`) and the behavior when both --since and --until are set still applies per-channel; reference run_discord_activity_tracker and DiscordChatExporter in the doc so readers know where the behavior is implemented.
🧹 Nitpick comments (1)
discord_activity_tracker/sync/raw_archive.py (1)
92-98: ⚡ Quick win
Fix docstring indentation.
The closing line of the docstring (line 98) is indented with 2 spaces instead of 4, inconsistent with Python docstring conventions.
📝 Proposed fix
     Returns the number of messages written to the merged file.
-  """
+    """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@discord_activity_tracker/sync/raw_archive.py` around lines 92 - 98, The docstring block in discord_activity_tracker/sync/raw_archive.py that documents merging exporter JSON has its closing triple-quote indented by 2 spaces; adjust the closing triple-quote to use 4-space indentation so it lines up with the other lines of the docstring (making the entire docstring consistently indented and PEP-style), i.e., move the closing """ to the same indentation level as the opening docstring in that function/module.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@discord_activity_tracker/sync/chat_exporter.py`:
- Around line 546-553: The current logic sets explicit_after = after_date if not
per_channel_incremental else None which drops a caller-supplied --since when
per_channel_incremental is True; change this so explicit_after always carries
the provided after_date (i.e., do not null it based on per_channel_incremental)
and pass that explicit_after through to resolve_channel_export_after(guild_id,
ch_id, explicit_after=explicit_after) so per-channel incremental uses each
channel's checkpoint only when no explicit --since was supplied. Ensure
resolve_channel_export_after still prefers explicit_after when present.
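The precedence this comment asks for can be sketched as follows. This is a minimal illustration, not the project's `resolve_channel_export_after`: the helper name `resolve_after_sketch` and its parameters are hypothetical, and the real function takes `guild_id` and `ch_id` and looks up the checkpoint itself. The "start of today UTC" fallback follows the wording in the docs comment above.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_after_sketch(
    latest_stored: Optional[datetime],
    explicit_after: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> datetime:
    """Pick the export lower bound for one channel (hypothetical helper)."""
    # 1. A caller-supplied --since always wins, even in per-channel mode.
    if explicit_after is not None:
        return explicit_after
    # 2. Otherwise resume from this channel's own latest stored message.
    if latest_stored is not None:
        return latest_stored
    # 3. Channel has no stored rows: export only today (UTC).
    now = now or datetime.now(timezone.utc)
    return now.replace(hour=0, minute=0, second=0, microsecond=0)
```

With this ordering, the per-channel checkpoint is consulted only when no explicit lower bound was passed, which is the behavior the comment describes.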
In `@discord_activity_tracker/sync/exporter_window.py`:
- Around line 104-108: Normalize naive `before` to UTC the same way `after` and
`now` are handled: when computing `upper` in exporter_window.py, treat a naive
`before` as UTC instead of local time by checking `before.tzinfo` and using
`before.replace(tzinfo=timezone.utc)` for naive datetimes, otherwise call
`before.astimezone(timezone.utc)`; keep the existing fallback to `now`. This
ensures `upper`, `before`, `after`, and `now` are consistently UTC-aware.
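A small sketch of the normalization suggested for `before`, assuming nothing about the surrounding code in `exporter_window.py`. The helper names `to_utc` and `upper_bound` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(value: datetime) -> datetime:
    """Return a UTC-aware datetime, treating naive input as UTC."""
    if value.tzinfo is None:
        # Naive: attach UTC without shifting the wall-clock value.
        return value.replace(tzinfo=timezone.utc)
    # Aware: convert to UTC, shifting the wall-clock value as needed.
    return value.astimezone(timezone.utc)


def upper_bound(before: Optional[datetime], now: datetime) -> datetime:
    """Upper bound of the export window, falling back to `now`."""
    return to_utc(before) if before is not None else to_utc(now)
```

The explicit `tzinfo is None` check matters because calling `astimezone()` directly on a naive datetime makes Python assume the value is in the system's local time zone, so the result would vary with the host's configuration.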
---
Outside diff comments:
In `@docs/service_api/discord_activity_tracker.md`:
- Around line 85-97: Update the docs text around
run_discord_activity_tracker/DiscordChatExporter to clarify that the resume
lower bound is per-channel, not guild-wide: replace the phrase "the lower bound
is the latest stored message time for this guild (and channel allowlist)" with
wording that each channel resumes from its own latest stored message time (and
if a channel has no stored rows, only today UTC for that channel is exported),
and also ensure the note about merging raw archives (`YYYY-MM-DD.json`) and the
behavior when both --since and --until are set still applies per-channel;
reference run_discord_activity_tracker and DiscordChatExporter in the doc so
readers know where the behavior is implemented.
---
Nitpick comments:
In `@discord_activity_tracker/sync/raw_archive.py`:
- Around line 92-98: The docstring block in
discord_activity_tracker/sync/raw_archive.py that documents merging exporter
JSON has its closing triple-quote indented by 2 spaces; adjust the closing
triple-quote to use 4-space indentation so it lines up with the other lines of
the docstring (making the entire docstring consistently indented and PEP-style),
i.e., move the closing """ to the same indentation level as the opening
docstring in that function/module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3abf2f92-aaff-4575-b53a-9d81ca93cdf9
📒 Files selected for processing (32)
- .env.example
- Makefile
- SECURITY.md
- config/settings.py
- discord_activity_tracker/README.md
- discord_activity_tracker/management/commands/extract_discord_tokens.py
- discord_activity_tracker/management/commands/run_discord_activity_tracker.py
- discord_activity_tracker/sync/chat_exporter.py
- discord_activity_tracker/sync/exporter_window.py
- discord_activity_tracker/sync/raw_archive.py
- discord_activity_tracker/tests/test_chat_exporter_branch_coverage.py
- discord_activity_tracker/tests/test_discord_internal_tokens_store.py
- discord_activity_tracker/tests/test_discord_tokens.py
- discord_activity_tracker/tests/test_exporter_window.py
- discord_activity_tracker/tests/test_extract_discord_tokens_command.py
- discord_activity_tracker/tests/test_raw_archive.py
- discord_activity_tracker/tests/test_run_command_coverage.py
- discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py
- discord_activity_tracker/tests/test_sync_chat_exporter.py
- discord_activity_tracker/tests/test_task_discord_sync_coverage.py
- discord_activity_tracker/tests/test_workspace.py
- discord_activity_tracker/utils/__init__.py
- discord_activity_tracker/utils/discord_internal_tokens_store.py
- discord_activity_tracker/utils/discord_tokens.py
- discord_activity_tracker/workspace.py
- docker-compose.yml
- docs/Docker.md
- docs/Workspace.md
- docs/operations/discord_chat_exporter.md
- docs/service_api/discord_activity_tracker.md
- scripts/clean-macos.sh
- scripts/wait_discord_chrome_profile.sh
…t and enhance date handling in exports
…for empty databases
Summary
Adds a compliance-gated workflow to extract and persist Discord user tokens from a dedicated Chrome profile (workspace JSON + `extract_discord_tokens`), with Docker/noVNC login targets and Makefile helpers mirroring the Slack session flow. `run_discord_activity_tracker` can load tokens from that JSON at runtime and re-extract on DiscordChatExporter auth failures when `ALLOW_INTERNAL_DISCORD_TOKENS` is enabled.
Also improves scheduled export reliability:
- `raw/discord_activity_tracker/<server_id>/<channel_id>/YYYY-MM-DD.json` merges by message id.
Docs, `.env.example`, `SECURITY.md`, and Docker/Makefile ops are updated for the new session profile and token paths.
Apps touched
Test plan
- `python -m pytest discord_activity_tracker/tests/ -v`
- `uv run pyright` (if typed code changed)
- `lint-imports` (if imports or cross-app coupling changed)
Docs / coupling
- `python scripts/generate_service_docs.py` run (if `services.py` or `core/protocols.py` changed)
- `docs/` updated (if behavior or ops changed)
Closes #278
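The "merges by message id" behavior mentioned in the summary can be illustrated with a short sketch. It is not the code in `raw_archive.py`: the function name `merge_by_message_id` is hypothetical, each message is assumed to be a dict carrying an `id` field, and letting the newer export win on a duplicate id is an assumption made here for illustration.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def merge_by_message_id(
    existing: List[Message], incoming: List[Message]
) -> List[Message]:
    """Merge two message lists, keeping one entry per message id."""
    merged: Dict[str, Message] = {str(m["id"]): m for m in existing}
    for message in incoming:
        # Assumption: the newer export replaces the stored copy,
        # so edited messages pick up their latest content.
        merged[str(message["id"])] = message
    # Discord ids are snowflakes, so numeric order is chronological.
    return sorted(merged.values(), key=lambda m: int(m["id"]))
```

Keying on the id makes re-running an export for the same day idempotent: overlapping windows add no duplicates to the per-day file.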
Summary by CodeRabbit
New Features
Configuration
Documentation
Tests