Refactor DICOM tests and enhance hybrid mode functionality #17

Merged
NicholasLeotta99 merged 13 commits into main from develop
Jun 17, 2026
@NicholasLeotta99

This pull request enhances the DICOM preprocessing pipeline by improving how removed scans are tracked, checkpointed, and aggregated, as well as increasing robustness and memory efficiency. The main changes include more consistent handling of removed scan records, improved checkpointing for better resumability, and several bug fixes and code cleanups.

Improvements to removed scan tracking and aggregation:

  • Changed removed records to always store lists of DataFrames, so that every removed scan is tracked consistently through the filtering and splitting steps. This changes how removed scans are appended and aggregated in `_filter_worker`, `_split_worker`, and `_aggregate_removed`.
  • Updated the split worker to always return a DataFrame of removed scans when every scan in a session is discarded, so that no discarded scan is missing from the removal log.
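
The list-of-DataFrames convention above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the function name `aggregate_removed`, the `scan_id` and `reason` columns, and the shape of the worker results are all assumptions made for the example.

```python
import pandas as pd


def aggregate_removed(worker_results):
    """Concatenate per-worker lists of removed-scan DataFrames into one table.

    Hypothetical stand-in for `_aggregate_removed`; the real signature
    and column names in the repository may differ.
    """
    frames = []
    for removed_list in worker_results:
        # Each worker contributes a list (possibly empty) of DataFrames,
        # so nothing is overwritten when a worker removes scans twice.
        frames.extend(
            df for df in removed_list if df is not None and not df.empty
        )
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Illustrative worker outputs (assumed columns).
worker_a = [pd.DataFrame({"scan_id": ["s1"], "reason": ["missing tag"]})]
worker_b = []  # this worker removed nothing
worker_c = [
    pd.DataFrame({"scan_id": ["s2"], "reason": ["duplicate"]}),
    pd.DataFrame({"scan_id": ["s3"], "reason": ["too few slices"]}),
]

removed = aggregate_removed([worker_a, worker_b, worker_c])
```

Because every worker returns a list, appending never replaces an earlier record, and the aggregation step reduces to a single `pd.concat` over the flattened list.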

Checkpointing and resumability enhancements:

  • Added new helper functions to load checkpoint data for filter, split, and order steps, and updated checkpoint save/load logic to include removed scans and redirections. This allows for more robust resumption and memory management during long preprocessing runs.
  • Modified batch processing in the main pipeline to reload existing checkpoint data before saving, ensuring that all results and removed scans are preserved and memory usage is capped.
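
The reload-before-save pattern described above might look roughly like this. It is a sketch under stated assumptions: the helper names `load_checkpoint` and `save_batch`, the pickle format, and the `results`/`removed` keys are hypothetical and are not taken from the repository.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def load_checkpoint(path):
    """Return (results, removed) from a checkpoint, or empty lists if absent."""
    path = Path(path)
    if not path.exists():
        return [], []
    with path.open("rb") as fh:
        state = pickle.load(fh)
    return state["results"], state["removed"]


def save_batch(path, batch_results, batch_removed):
    """Merge one batch into the checkpoint on disk and write it back."""
    path = Path(path)
    # Reload what is already on disk so earlier batches are not overwritten.
    results, removed = load_checkpoint(path)
    results.extend(batch_results)
    removed.extend(batch_removed)
    # Write to a temporary file first, then rename over the old checkpoint,
    # so an interrupted run does not leave a half-written file behind.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump({"results": results, "removed": removed}, fh)
    tmp.replace(path)


with tempfile.TemporaryDirectory() as tmpdir:
    ckpt = Path(tmpdir) / "filter_checkpoint.pkl"  # hypothetical file name
    save_batch(ckpt, [pd.DataFrame({"scan_id": ["s1"]})], [])
    save_batch(
        ckpt,
        [pd.DataFrame({"scan_id": ["s2"]})],
        [pd.DataFrame({"scan_id": ["s3"]})],
    )
    results, removed = load_checkpoint(ckpt)
    n_results, n_removed = len(results), len(removed)
```

Since each batch is merged into the file on disk, the caller only needs to hold the current batch in memory between saves.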

Robustness and bug fixes:

  • Improved handling of malformed relocation commands in `_relocate_worker`: invalid entries are now skipped with a warning instead of causing a failure.
  • Fixed cases where removed scans could be dropped or overwritten, by always appending to and aggregating lists of DataFrames rather than single DataFrames.
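
The skip-and-warn behaviour for malformed relocation commands can be illustrated as below. The command format is not given in this description, so the sketch assumes each command is a `(source, destination)` pair of non-empty strings; the function name `valid_relocations` and the logger name are likewise hypothetical.

```python
import logging

logger = logging.getLogger("relocate_sketch")  # hypothetical logger name


def valid_relocations(commands):
    """Yield (src, dst) pairs, skipping and warning about malformed entries.

    Assumes a command is a 2-item tuple or list of non-empty strings;
    the real `_relocate_worker` may use a different representation.
    """
    for index, cmd in enumerate(commands):
        is_pair = isinstance(cmd, (tuple, list)) and len(cmd) == 2
        if not is_pair or not all(isinstance(p, str) and p for p in cmd):
            logger.warning(
                "Skipping malformed relocation command #%d: %r", index, cmd
            )
            continue
        yield cmd[0], cmd[1]


commands = [
    ("a/1.dcm", "b/1.dcm"),  # well formed
    ("only_source",),        # missing destination
    None,                    # not a sequence at all
    ("a/2.dcm", ""),         # empty destination
]
pairs = list(valid_relocations(commands))
```

Only the first command survives; the other three are logged and skipped, so one bad entry no longer aborts the whole relocation step.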

Interface and argument consistency:

  • Updated CLI argument names in `02_parseDicom.py` from `--batch-size` and `--min-free-gb` to `--batch_size` and `--min_free_gb` for consistency.
  • Updated shell script `00_preprocess.sh` to pass `--save_dir` and `--load_table` arguments to the parsing script if `SAVE_DIR` is set.
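
For reference, the renamed options would be declared along these lines with `argparse`. This is a sketch only: the types, defaults, and help strings are assumptions, and the actual parser in `02_parseDicom.py` is likely to define more options.

```python
import argparse


def build_parser():
    """Build a parser with the two renamed options (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the renamed options")
    parser.add_argument(
        "--batch_size", type=int, default=100,
        help="Number of items per batch (assumed meaning and default)",
    )
    parser.add_argument(
        "--min_free_gb", type=float, default=10.0,
        help="Minimum free disk space in GB (assumed meaning and default)",
    )
    return parser


args = build_parser().parse_args(["--batch_size", "50", "--min_free_gb", "2.5"])
```

With the underscore spelling, the flag matches the attribute name (`args.batch_size`), and callers still passing the old hyphenated flag get an argparse error rather than a silent default.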

General code cleanup and documentation:

  • Added and improved docstrings for worker functions, clarified return values, and cleaned up unused or redundant code paths.

These changes collectively make the preprocessing pipeline more reliable, easier to resume, and better at tracking all data modifications throughout the workflow.

- Updated `test_scanDicom_integration.py` to remove unnecessary logger parameters in function calls.
- Enhanced `test_synthetic_known_result.py` to improve clarity and structure, focusing on the DICOMfilter pipeline and its output validation.
- Introduced new tests for hybrid mode in `test_toolbox_hybrid.py`, validating the correct execution and result ordering of the toolbox's hybrid processing functionality.

NicholasLeotta99 merged commit 54da3ec into main on Jun 17, 2026
6 checks passed