Skip to content

Issue 4800: Reuse metadata driver for over-replicated ledger GC#4801

Open
gaozhangmin wants to merge 3 commits into
apache:masterfrom
gaozhangmin:codex/reuse-gc-zk-client-4172
Open

Issue 4800: Reuse metadata driver for over-replicated ledger GC#4801
gaozhangmin wants to merge 3 commits into
apache:masterfrom
gaozhangmin:codex/reuse-gc-zk-client-4172

Conversation

@gaozhangmin
Copy link
Copy Markdown
Contributor

Descriptions of the changes in this PR:

Main Issue: #xyz

Motivation

Over-replicated ledger cleanup currently instantiates a metadata bookie driver each time the scan-and-compare garbage collector checks for over-replicated ledgers. This repeatedly creates metadata client resources during GC cycles, which is unnecessary and can add overhead to the cleanup path.

Changes

  • Reuse a metadata bookie driver and ledger manager factory inside ScanAndCompareGarbageCollector for over-replicated ledger cleanup.
  • Make ScanAndCompareGarbageCollector closeable and release the cached metadata driver when GarbageCollectorThread shuts down.
  • Keep the existing GC metadata operation rate limiter behavior while reusing the metadata driver.
  • Add a focused unit test to verify the metadata driver is reused across calls and closed correctly.

In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks.


Be sure to do all the following to help us incorporate your contribution
quickly and easily:

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

Copilot AI review requested due to automatic review settings May 27, 2026 07:06
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds lifecycle management for metadata resources used during over-replicated ledger GC by reusing a single MetadataBookieDriver/LedgerManagerFactory and explicitly closing them during shutdown.

Changes:

  • Cache and reuse MetadataBookieDriver/LedgerManagerFactory in ScanAndCompareGarbageCollector instead of recreating them per GC run.
  • Add close() to ScanAndCompareGarbageCollector and invoke it from GarbageCollectorThread.shutdown().
  • Add a unit test validating driver reuse and cleanup behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
bookkeeper-server/src/test/java/org/apache/bookkeeper/bookie/GarbageCollectorThreadTest.java Adds a test to ensure the metadata driver is reused and then closed/reset.
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ScanAndCompareGarbageCollector.java Introduces cached metadata driver/factory, lazy initialization, and explicit close logic.
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java Ensures the garbage collector’s metadata resources are closed during thread shutdown.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@gaozhangmin
Copy link
Copy Markdown
Contributor Author

@hangc0276 PTAL

@void-ptr974
Copy link
Copy Markdown
Contributor

void-ptr974 commented May 27, 2026

Thanks for improving this path and avoiding repeated metadata driver initialization.

One thing that might be worth considering is whether we can wire in the bookie server’s existing MetadataBookieDriver or LedgerManagerFactory here. Since the server already creates and owns those resources through its lifecycle, GC could potentially borrow the factory and only create/close the per-run LedgerUnderreplicationManager. That would keep the ownership/close semantics centralized while still achieving the reuse goal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants