Skip to content

Bug 2044654 - Update the Github ETL process to run repository exports in parallel rather one one after another#18

Merged
dklawren merged 7 commits into
mainfrom
2044654
Jun 17, 2026
Merged

Bug 2044654 - Update the Github ETL process to run repository exports in parallel rather one one after another#18
dklawren merged 7 commits into
mainfrom
2044654

Conversation

@dklawren

@dklawren dklawren commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings June 3, 2026 18:00

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the GitHub ETL entrypoint to process multiple configured repositories concurrently (rather than sequentially), with a bounded worker pool and added tests to validate worker-count resolution and concurrency behavior.

Changes:

  • Add a thread pool (ThreadPoolExecutor) to process repositories in parallel with a configurable worker cap.
  • Extract per-repository ETL logic into a dedicated process_repo() function with a per-thread requests.Session.
  • Add unit tests for _resolve_max_workers() and a concurrency test to ensure repositories overlap in execution.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
main.py Introduces bounded parallel per-repo processing via ThreadPoolExecutor, plus _resolve_max_workers() and process_repo() to isolate per-repo ETL work.
tests/test_main.py Adds tests for worker resolution logic and verifies repositories are processed concurrently.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread main.py Outdated
dklawren and others added 2 commits June 3, 2026 16:15
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py
Comment thread main.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
Comment thread main.py Outdated
@dklawren dklawren merged commit 17ff767 into main Jun 17, 2026
5 checks passed
@dklawren dklawren deleted the 2044654 branch June 17, 2026 18:36

@shtrom shtrom left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes for later, but looking good!

Comment thread main.py
- **Secondary** (abuse detection): signaled by a ``Retry-After`` header,
and frequently *without* ``X-RateLimit-Remaining: 0``.

A 403/429 carrying neither signal (e.g. a genuine permission error) is

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 429 should probably still be treated as rate-limited, with a default or exponential backoff.

Comment thread main.py
)

if resp.status_code == 200:
if resp.status_code == expected_status:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth making expected_statuses a list and do

if resp.status_code in expected_statuses:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants