Merged
89 commits
1e88eb8
Optimize Removed_fully.csv logging loop in 02_parseDicom.py
google-labs-jules[bot] Mar 16, 2026
1655cbc
Merge pull request #12 from TheParraLab/fix-parsedicom-removal-perf-1…
NicholasLeotta99 Mar 16, 2026
ef06dff
Remove commented-out SessionID concatenation in Data_table
NicholasLeotta99 Mar 17, 2026
10cfd85
Remove unused MOVE argument and related code in parseDicom
NicholasLeotta99 Mar 17, 2026
118f2f5
Enhance laterality separation logic in DICOMfilter class
NicholasLeotta99 Mar 17, 2026
98051db
Add filtering process for fully removed sessions and logging enhancem…
NicholasLeotta99 Mar 17, 2026
ea18d43
Refactor DICOM processing logic to enhance scan handling and improve …
NicholasLeotta99 Mar 18, 2026
6a7e6eb
Refactor DICOM processing to improve directory creation and data load…
NicholasLeotta99 Mar 23, 2026
4516e67
fix(DICOMextract): enhance modality extraction and fix glob pattern u…
NicholasLeotta99 Apr 21, 2026
abc01fc
Merge branch 'develop' of https://github.com/TheParraLab/MRI_preproce…
NicholasLeotta99 Apr 21, 2026
768c8d8
Enhance testing framework for DICOM processing
NicholasLeotta99 Apr 22, 2026
3fb9a0a
Removed deprecated web interface
NicholasLeotta99 Apr 23, 2026
3b2770d
Simplifying base container deployment
NicholasLeotta99 Apr 23, 2026
f05fba5
fix(docs): update README for WSL compose file and clarify DATA_DIRECT…
NicholasLeotta99 Apr 23, 2026
8eff3e1
refactor(docs): update README to enhance clarity and structure for MR…
NicholasLeotta99 Apr 24, 2026
35f0792
refactor: remove disk space check and related logic from DICOM parsin…
NicholasLeotta99 Apr 24, 2026
82bc8f1
Faster DCM detection, removing unused variables and imports
NicholasLeotta99 Apr 30, 2026
b44579b
Toggle BASE_PATH for environment flexibility and restore run_with_pro…
NicholasLeotta99 Apr 30, 2026
d507118
fixed profile and checkpoint directory defaults
NicholasLeotta99 Apr 30, 2026
4af85e0
Add coregistration overlay functionality for NIfTI files with error h…
NicholasLeotta99 Apr 30, 2026
6ffcd3b
Refactor DICOM scanning script to use temporary directory for interme…
NicholasLeotta99 May 1, 2026
f291452
Remove unused imports and functions to streamline DICOM scanning script
NicholasLeotta99 May 1, 2026
4d1196f
Enhance scan alignment script with NiftyReg version logging and resto…
NicholasLeotta99 May 1, 2026
3ba5382
Refactor DICOM scanning script to use ScanConfig dataclass for argume…
NicholasLeotta99 May 1, 2026
bbdaa4f
Merge branch 'develop' of https://github.com/TheParraLab/MRI_preproce…
NicholasLeotta99 May 1, 2026
a1379e1
Refactor tests to utilize ScanConfig and improve logging; streamline …
NicholasLeotta99 May 2, 2026
e5462d0
Refactor 01_scanDicom.py to improve logging and streamline DICOM extr…
NicholasLeotta99 May 2, 2026
facf885
Add normalization script for MRI data processing; implement directory…
NicholasLeotta99 May 4, 2026
574cb74
Add functionality to remove checkpoint files after successful DICOM e…
NicholasLeotta99 May 5, 2026
10d4894
Fix default CPU count in multiprocessing argument to ensure at least …
NicholasLeotta99 May 5, 2026
9ab1a13
script overhaul for consistency and performance
NicholasLeotta99 May 5, 2026
ffb51fe
Enhance DICOMfilter to manage Pre_scan and Post_scan flags; adjust fi…
NicholasLeotta99 May 5, 2026
208bf22
Merge branch 'develop' of https://github.com/TheParraLab/MRI_preproce…
NicholasLeotta99 May 5, 2026
0b3b2e5
Add GitHub Actions workflows for Docker build and testing; refactor s…
NicholasLeotta99 May 5, 2026
7eeffd2
Update .gitignore to include JSON files in ignored patterns
NicholasLeotta99 May 5, 2026
2d632fd
Add scripts for comparing and scanning checksum data; implement JSON …
NicholasLeotta99 May 6, 2026
69fe48b
Refactor temporary save directory handling and improve user prompts f…
NicholasLeotta99 May 7, 2026
da178e2
Optimize data aggregation and compilation in DICOM parsing script
NicholasLeotta99 May 11, 2026
601e153
Enhance scan comparison script with detailed loading messages and ref…
NicholasLeotta99 May 11, 2026
100d229
Add checkpointing and batch processing to DICOM filtering script
NicholasLeotta99 May 12, 2026
bf2e572
Refactor checkpoint helpers section and update imports in DICOM parsi…
NicholasLeotta99 May 12, 2026
26ec55c
Refactor main function to improve data filtering logic and enhance lo…
NicholasLeotta99 May 12, 2026
b27467c
Add split checkpointing functionality to DICOM parsing script
NicholasLeotta99 May 12, 2026
46a607a
Add order checkpointing functionality and update DICOM processing logic
NicholasLeotta99 May 12, 2026
3afdba3
Refactor logging in DICOM processing workers to use dedicated worker …
NicholasLeotta99 May 19, 2026
602d2e8
Add error handling for corrupt DICOM files in DICOMsplit class
NicholasLeotta99 May 19, 2026
b274142
Add disk space check in main function to prevent processing with insu…
NicholasLeotta99 May 19, 2026
a3d5421
Add minimum free disk space requirement to configuration
NicholasLeotta99 May 19, 2026
9616f68
Add fallback for symlink creation in _relocate_worker function
NicholasLeotta99 May 19, 2026
e09d979
Add extraction of body part examined in DICOM files
NicholasLeotta99 May 26, 2026
459dbf9
Add disk space check to prevent processing with insufficient storage
NicholasLeotta99 May 26, 2026
537de3d
Add 'BodyPartExamined' and 'Part' fields for DICOM metadata consistency
NicholasLeotta99 May 27, 2026
23aff33
Add functions to save and load split relocations for persistent symli…
NicholasLeotta99 May 28, 2026
1368998
Add support for tracking and logging removed scans during ordering pr…
NicholasLeotta99 Jun 1, 2026
4264334
Clear existing logger handlers before initializing logger
NicholasLeotta99 Jun 1, 2026
439e03e
Add option to export fully removed sessions in build_config
NicholasLeotta99 Jun 1, 2026
0877f46
Refactor DICOMfilter to ensure boolean conversion and use of copy for…
NicholasLeotta99 Jun 1, 2026
81a2b42
Change Pre_scan and Post_scan initialization from integers to boolean…
NicholasLeotta99 Jun 1, 2026
b344076
improvements to memory management and hpc deployment
NicholasLeotta99 Jun 1, 2026
b3d4eea
Enhance DICOMextract to accept pre-computed number of slices and opti…
NicholasLeotta99 Jun 2, 2026
6e4b7ef
Implement single-pass directory discovery and representative selectio…
NicholasLeotta99 Jun 3, 2026
8920b48
Add n_cpus parameter to ScanConfig for configurable parallel processing
NicholasLeotta99 Jun 3, 2026
14a1000
Refactor logging system to support high concurrency with QueueHandler…
NicholasLeotta99 Jun 3, 2026
f0e368b
Improve logger listener management by replacing flush and enqueue wit…
NicholasLeotta99 Jun 3, 2026
ca32894
Add comprehensive tests for logger, parallel runner, and progress bar…
NicholasLeotta99 Jun 3, 2026
4919c15
Update parallel processing method to use 'process' type and include n…
NicholasLeotta99 Jun 3, 2026
83f082e
Refactor tests to handle returned file lists from _find_dicom_worker
NicholasLeotta99 Jun 3, 2026
c6673fe
Refactor DICOM scanning logic to rely on pyd.dcmread() for non-DICOM …
NicholasLeotta99 Jun 3, 2026
b0eb185
Enhance DICOM metadata extraction by utilizing specific tags in pyd.d…
NicholasLeotta99 Jun 3, 2026
4e8568d
Add hybrid parallel processing support in run_function with configura…
NicholasLeotta99 Jun 4, 2026
45566f5
Update InjectionTime tag to use hex format for compatibility with new…
NicholasLeotta99 Jun 4, 2026
3230d34
Update parallel processing type to hybrid in DICOM scanning and extra…
NicholasLeotta99 Jun 4, 2026
06760d3
Add parallel processing support for subdirectory scanning in DICOM ex…
NicholasLeotta99 Jun 4, 2026
05b69f3
Fix multiprocessing deadlock and parallelize directory walk in 01_sca…
google-labs-jules[bot] Jun 4, 2026
25bcad2
Fix test references to multiprocessing logger arguments
google-labs-jules[bot] Jun 4, 2026
dc0bb8b
Merge pull request #13 from TheParraLab/fix-01-scan-dicom-parallel-de…
NicholasLeotta99 Jun 4, 2026
ab129fd
Enhance logging functionality in toolbox.py and 01_scanDicom.py to en…
NicholasLeotta99 Jun 4, 2026
429c720
Fix logging propagation and improve file handler usage in toolbox.py
NicholasLeotta99 Jun 4, 2026
4ec9edb
Update 01_scanDicom.py review: refine summary, enhance metadata extra…
NicholasLeotta99 Jun 5, 2026
7b1e8f9
Implement hybrid chunk worker for improved parallel processing in too…
NicholasLeotta99 Jun 5, 2026
9c0079f
Add Singularity/Apptainer support for HPC deployments
NicholasLeotta99 Jun 11, 2026
1195db7
Add conda/mamba native HPC deployment as third runtime option
NicholasLeotta99 Jun 11, 2026
a345c49
Add --scan-dir/--save-dir passthrough to 00_preprocess.sh for native …
NicholasLeotta99 Jun 11, 2026
eb7a294
Add --start-step, --stop-step, --steps flags to 00_preprocess.sh for …
NicholasLeotta99 Jun 11, 2026
5985580
Restore --scan-dir/--save-dir passthrough to step 01 in 00_preprocess.sh
NicholasLeotta99 Jun 11, 2026
1559f2b
Fix conda runtime: pass --scan-dir/--save-dir through to pipeline and…
NicholasLeotta99 Jun 11, 2026
c79fdaf
fix: name Docker base stage for CI target build
Copilot Jun 11, 2026
18fafa6
Implement test synthetic dataset
NicholasLeotta99 Jun 11, 2026
54c981d
Merge branch 'feature/hpc-singularity' of https://github.com/TheParra…
NicholasLeotta99 Jun 11, 2026
24 changes: 24 additions & 0 deletions .github/workflows/docker-test.yml
@@ -0,0 +1,24 @@
name: Docker Build

on:
  push:
    branches: [main, develop, "**"]
    paths:
      - "control_system/dockerfile"
      - "control_system/docker-compose*.yml"
  pull_request:
    branches: [main, develop]
    paths:
      - "control_system/dockerfile"
      - "control_system/docker-compose*.yml"

jobs:
  docker-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: |
          cd control_system
          docker build -t mri_preprocessing_test --target base .
41 changes: 41 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,41 @@
name: Tests

on:
  push:
    branches: [main, develop, "**"]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r requirements-dev.txt

      - name: Run unit tests (01_scanDicom)
        run: |
          python -m pytest test/test_scanDicom_unit.py -v

      - name: Run full tests (01_scanDicom + 02_parseDicom)
        run: |
          python -m pytest test/test_scanDicom_full.py -v

      - name: Run synthetic known-result tests (01 + 02 deterministic verification)
        run: |
          python -m pytest test/test_synthetic_known_result.py -v
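
The three pytest steps of this workflow can be reproduced locally with a short Python sketch like the one below. The helper names `ci_test_commands` and `run_ci_tests` are hypothetical and not part of the repository. The sketch assumes it is run from the repository root with the packages from `requirements.txt` and `requirements-dev.txt` already installed.

```python
import subprocess
import sys

# Test files taken from the three pytest steps in the workflow above.
CI_TEST_FILES = [
    "test/test_scanDicom_unit.py",
    "test/test_scanDicom_full.py",
    "test/test_synthetic_known_result.py",
]


def ci_test_commands(python=sys.executable):
    """Return one pytest argv list per workflow test step, in workflow order."""
    return [[python, "-m", "pytest", path, "-v"] for path in CI_TEST_FILES]


def run_ci_tests(runner=subprocess.run):
    """Run each step in order and stop at the first failing step.

    `runner` is injectable (hypothetical design choice) so the function can be
    exercised without invoking pytest. Returns the exit code of the first
    failing step, or 0 if every step passes.
    """
    for argv in ci_test_commands():
        result = runner(argv)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_ci_tests()` with the default runner executes the three test files with the current interpreter. Unlike the workflow's matrix, it covers only one Python version per run.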
7 changes: 6 additions & 1 deletion .gitignore
@@ -19,4 +19,9 @@ tmp/*
import os.py

reset_02.sh
*.log
*.log
*.json
.aider*

# Singularity images (large, site-specific)
*.sif
166 changes: 84 additions & 82 deletions README.md
@@ -1,63 +1,66 @@
# MRI Preprocessing Pipeline

A generalized implementation of MRI preprocessing for various ML/AI tasks within the Parra Lab. This project is designed to automate the ingestion, analysis, and processing of raw DICOM MRI data into model-ready inputs.
A modular pipeline for automated MRI DICOM preprocessing. Converts raw DICOM MRI data into model-ready inputs through a series of numbered processing steps.

## Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Project Structure](#project-structure)
- [Installation](#installation)
- [Usage](#usage)
- [Starting the System](#starting-the-system)
- [Web Control Interface](#web-control-interface)
- [Command Line Interface (CLI)](#command-line-interface-cli)
- [Starting the Container](#starting-the-container)
- [Direct Container Access](#direct-container-access)
- [Running Preprocessing Steps](#running-preprocessing-steps)
- [Preprocessing Workflow](#preprocessing-workflow)
- [Testing](#testing)
- [Contributing](#contributing)
- [Acknowledgements](#acknowledgements)

## Overview

The MRI Preprocessing Pipeline is a modular system built to handle large datasets of MRI scans. It runs within a Docker container to ensure a consistent environment and supports both an interactive web-based control system and a scriptable command-line interface.

The core functionality resides in `code/preprocessing/`, where a series of Python scripts handle everything from DICOM extraction to NIfTI conversion and spatial alignment.

## Key Features

- **Automated Scanning**: Recursively scans directories for MRI DICOM files.
- **Metadata Extraction**: Extracts and standardizes DICOM header information into CSV tables.
- **Intelligent Parsing**: Identifies scan types (T1, T2, etc.) and orders sequences based on acquisition times.
- **Modular Design**: Each step of the pipeline is a standalone script, allowing for flexible execution and debugging.
- **Containerized Environment**: Fully Dockerized setup for easy deployment on Linux and WSL systems.
- **Web Interface**: (In Development) A Flask-based dashboard to monitor and control the processing status.
- **Automated DICOM Scanning**: Recursively scans directories for MRI DICOM files and extracts metadata.
- **Intelligent Parsing**: Identifies scan types, filters artifacts, and orders sequences by acquisition time.
- **NIfTI Conversion**: Converts DICOM series to NIfTI format using dcm2niix.
- **Spatial Alignment**: Coregisters scans to a reference volume.
- **Modular Design**: Each pipeline step is an independent script that can be run manually or in sequence.
- **Containerized**: Docker image with all dependencies pre-installed (Python, pydicom, nibabel, niftyreg, dcm2niix).

## Project Structure

```
MRI_preprocessing/
├── code/
│ └── preprocessing/ # Core python scripts for data processing
│ ├── 01_scanDicom.py # Scans and extracts DICOM metadata
│ ├── 02_parseDicom.py # Filters and orders scans
│ ├── ... # Subsequent processing steps
│ └── preprocessing/ # Core Python preprocessing scripts
│ ├── 01_scanDicom.py # Scan DICOM files and extract metadata
│ ├── 02_parseDicom.py # Filter and order scans
│ ├── 03_saveNifti.py # Convert DICOM to NIfTI
│ ├── 04_saveRAS.py # Reorient to RAS
│ ├── 05_alignScans.py # Coregister scans
│ ├── 06_genInputs.py # Generate model inputs
│ ├── DICOM.py # DICOM handling utilities
│ └── toolbox.py # General helper functions
├── control_system/ # Docker and Web App configuration
│ ├── app/ # Flask web application
│ └── docker* # Docker Compose files
├── data/ # Data storage (mounted volumes)
│ ├── toolbox.py # Shared helper functions
│ └── 00_preprocess.sh # Run full pipeline
├── control_system/ # Docker image and compose files
│ ├── dockerfile # Container image definition
│ ├── docker-compose.yml # Linux compose file
│ ├── docker-compose-wsl.yml # WSL compose file
│ ├── startup.sh # Container entrypoint
│ └── README.md # Container documentation
├── test/ # Unit and integration tests
├── start_control.sh # Main entry point script
└── install.py # Dependency installation script
├── docs/ # Code reviews and improvement recommendations
├── start_control.sh # Container startup script
├── access_preprocessing.sh # Direct CLI access to container
├── install.py # Docker + NVIDIA toolkit installer (Linux)
├── mount_kirbyPro.sh # Machine-specific mount script
├── requirements.txt # Python runtime dependencies
└── requirements-dev.txt # Development/testing dependencies
```

## Installation

### Prerequisites
- Linux or Windows Subsystem for Linux (WSL2)
- Python 3.x
- Docker & Docker Compose (installed automatically via `install.py` if not present)

- Linux or WSL2
- Python 3.10+
- NVIDIA GPU (for preprocessing acceleration)

### Steps

@@ -67,85 +70,84 @@ MRI_preprocessing/
cd MRI_preprocessing
```

2. **Install dependencies and setup Docker:**
2. **Install Docker and NVIDIA Container Toolkit:**
```bash
python3 install.py
sudo python3 install.py
```
*Note: This script attempts to install Docker and configure GPU access. If you prefer, you can install Docker manually.*
*This installs Docker, configures GPU access, and verifies the setup.*

## Usage

### Starting the System

The primary way to interact with the pipeline is through the `start_control.sh` script.
### Starting the Container

```bash
bash start_control.sh
```

You will be prompted to:
1. Enable the webserver component (y/n).
2. Provide the path to your raw DICOM data on the host machine.
You will be prompted for:
1. The path to your raw DICOM data directory
2. The path for NIfTI output

The system maps your local data directory to `/FL_system/data/raw/` inside the Docker container.
The container mounts your host directories into `/FL_system/data/raw/` and `/FL_system/data/nifti/` inside the container.

### Web Control Interface
If enabled, the web interface is accessible at `http://localhost:5000`. It provides a dashboard to view the status of the preprocessing steps.
*(Note: The web interface is currently under active development).*
### Direct Container Access

### Command Line Interface (CLI)
For batch processing or direct control, you can access the container's shell:
While the container is running:

**Option 1: Convenience Script**
```bash
bash access_preprocessing.sh
```

**Option 2: Direct Docker Exec**
This opens an interactive shell inside the container. Navigate to `/FL_system/code/preprocessing/` to run preprocessing scripts.

### Running Preprocessing Steps

Each step can be run manually:

```bash
docker exec -it control bash
cd /FL_system/code/preprocessing/
# Step 1: Scan DICOM files
python 01_scanDicom.py --scan_dir /FL_system/data/raw --save_dir /FL_system/data

# Step 2: Parse and filter
python 02_parseDicom.py --save_dir /FL_system/data

# Full pipeline:
bash /FL_system/code/preprocessing/00_preprocess.sh
```

## Preprocessing Workflow

The pipeline consists of numbered scripts in `code/preprocessing/` that should generally be run in order:
The pipeline consists of numbered scripts that should generally be run in order:

1. **01_scanDicom.py**: Scans raw data and builds a `Data_table.csv` of all found DICOM files.
* *Documentation*: See `code/preprocessing/01_scanDicom.py` for detailed usage and arguments.
2. **02_parseDicom.py**: Filters relevant scans (e.g., T1) and orders them by time.
3. **03_saveNifti.py**: Converts selected DICOM series to NIfTI format.
4. **04_saveRAS.py**: Reorients NIfTI files to RAS orientation.
5. **05_alignScans.py**: Aligns scans to a reference volume.
6. **06_genInputs.py**: Generates final model inputs.
1. **01_scanDicom.py** — Scans raw DICOM data, extracts metadata, produces `Data_table.csv`
2. **02_parseDicom.py** — Filters scans (removes T2, DWI, computed images), orders by trigger time, produces `Data_table_timing.csv`
3. **03_saveNifti.py** — Converts selected DICOM series to NIfTI format using dcm2niix
4. **04_saveRAS.py** — Reorients NIfTI files to RAS orientation
5. **05_alignScans.py** — Coregisters all scans to a reference volume
6. **06_genInputs.py** — Generates numpy inputs for model training

To run a specific step manually inside the container:
```bash
python 01_scanDicom.py --scan_dir /FL_system/data/raw --save_dir /FL_system/data
```
Intermediate outputs:
- `/FL_system/data/Data_table.csv` — DICOM metadata table (step 01 output)
- `/FL_system/data/Data_table_timing.csv` — Filtered and ordered table (step 02 output)
- `/FL_system/data/nifti/` — NIfTI files (step 03 output)
- `/FL_system/data/RAS/` — RAS-oriented NIfTI files (step 04 output)
- `/FL_system/data/coreg/` — Coregistered scans (step 05 output)
- `/FL_system/data/inputs/` — Final model inputs (step 06 output)
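
As a rough illustration of how the intermediate outputs listed above could be used to see how far a run has progressed, here is a minimal Python sketch. The function `completed_steps` and the `STEP_OUTPUTS` mapping are hypothetical helpers, not part of the repository. The sketch only checks that an output exists, which does not prove the step finished successfully.

```python
from pathlib import Path

# Output location per step, relative to the data directory.
# Copied from the intermediate-outputs list; the mapping itself is illustrative.
STEP_OUTPUTS = {
    "01": "Data_table.csv",
    "02": "Data_table_timing.csv",
    "03": "nifti",
    "04": "RAS",
    "05": "coreg",
    "06": "inputs",
}


def completed_steps(data_dir="/FL_system/data"):
    """Return the step numbers whose output is present under data_dir.

    A CSV counts as present if the file exists. A directory counts only if
    it exists and has at least one entry, since a mounted but empty
    directory says nothing about whether the step ran.
    """
    root = Path(data_dir)
    done = []
    for step, rel in STEP_OUTPUTS.items():
        target = root / rel
        if target.is_file() or (target.is_dir() and any(target.iterdir())):
            done.append(step)
    return done
```

Inside the container, `completed_steps()` would return a list such as the step numbers whose outputs were found, in pipeline order.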

## Testing

Unit and integration tests are located in the `test/` directory.

To run tests (ensure you have `pytest` installed):
```bash
pytest test/
```
# Run all tests
pytest test/ -v

## Contributing
# Run unit tests only (fastest)
pytest test/test_scanDicom_unit.py -v

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/NewFeature`).
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.
# Run comprehensive tests
pytest test/test_scanDicom_full.py -v

Please ensure all new code is well-documented and passes existing tests.

## Acknowledgements
- [Parra Lab](https://www.ccny.cuny.edu/bme/people/lucas-parra)
- Contributors: [Add names here]
# Run deterministic known-result tests
pytest test/test_synthetic_known_result.py -v
```

---
*For questions or support, please contact nleotta000@citymail.cuny.edu*
Test coverage for `01_scanDicom.py` is comprehensive (89 tests). See `test/TESTS.md` for the full test suite documentation.
3 changes: 1 addition & 2 deletions access_preprocessing.sh
@@ -1,7 +1,7 @@
#!/bin/bash

echo "MRI Preprocessing - Direct CLI Access"
echo "====================================="
echo "======================================"
echo ""
echo "This script provides direct access to the preprocessing container"
echo "without starting the webserver component."
@@ -11,7 +11,6 @@ echo ""
if ! docker ps --format "table {{.Names}}" | grep -q "^control$"; then
echo "Error: The control container is not running."
echo "Please start the system first with: bash start_control.sh"
echo "And choose 'n' when asked about the webserver component."
exit 1
fi
