… in parallel rather than one after another
Pull request overview
This PR updates the GitHub ETL entrypoint to process multiple configured repositories concurrently (rather than sequentially), with a bounded worker pool and added tests to validate worker-count resolution and concurrency behavior.
Changes:
- Add a thread pool (ThreadPoolExecutor) to process repositories in parallel with a configurable worker cap.
- Extract per-repository ETL logic into a dedicated process_repo() function with a per-thread requests.Session.
- Add unit tests for _resolve_max_workers() and a concurrency test to ensure repositories overlap in execution.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| main.py | Introduces bounded parallel per-repo processing via ThreadPoolExecutor, plus _resolve_max_workers() and process_repo() to isolate per-repo ETL work. |
| tests/test_main.py | Adds tests for worker resolution logic and verifies repositories are processed concurrently. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
shtrom left a comment
Some notes for later, but looking good!
- **Secondary** (abuse detection): signaled by a ``Retry-After`` header,
  and frequently *without* ``X-RateLimit-Remaining: 0``.

A 403/429 carrying neither signal (e.g. a genuine permission error) is
A 429 should probably still be treated as rate-limited, with a default or exponential backoff.
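One way to read this suggestion, sketched under assumptions: a 429 always yields a retry delay, using ``Retry-After`` when it is present as a number of seconds and a capped exponential backoff otherwise, while a 403 with neither signal is still treated as a real error. The function name `retry_delay`, its parameters, and the `base`/`cap` defaults are hypothetical and not taken from the PR.

```python
def retry_delay(status_code, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not rate-limited.

    `headers` is a plain dict here, so lookups are case-sensitive; a
    requests response exposes a case-insensitive mapping instead.
    `attempt` is the zero-based retry count.
    """
    if status_code not in (403, 429):
        return None

    # Secondary limit: honour an explicit Retry-After given in seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)

    # Primary limit exhausted, or any 429 without a usable header:
    # fall back to capped exponential backoff (simplification: a real
    # client might wait until X-RateLimit-Reset for the primary limit).
    exhausted = headers.get("X-RateLimit-Remaining") == "0"
    if status_code == 429 or exhausted:
        return min(cap, base * (2 ** attempt))

    # A 403 with neither signal is most likely a genuine permission error.
    return None
```

The delay is deterministic to keep the sketch easy to check; adding random jitter would be a reasonable refinement when many workers retry at once.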
)

if resp.status_code == 200:
if resp.status_code == expected_status:
It may be worth making expected_statuses a list and doing
if resp.status_code in expected_statuses:
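A minimal sketch of what that could look like, assuming callers may still pass a single integer. The helper names `normalize_expected` and `is_expected` are hypothetical and only illustrate the membership check.

```python
def normalize_expected(expected):
    """Accept a single status code or any iterable of codes."""
    if isinstance(expected, int):
        return frozenset({expected})
    return frozenset(expected)


def is_expected(status_code, expected_statuses=(200,)):
    """Return True when the response status is one of the accepted codes."""
    return status_code in normalize_expected(expected_statuses)
```

Accepting both forms would let existing call sites that pass one status keep working while new ones pass several, for example `is_expected(resp.status_code, [200, 204])`.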
No description provided.