Pr1 part1 by enlorenz · Pull Request #49 · FairRootGroup/powersched

enlorenz · 2026-06-02T12:31:49Z

Part one of three consecutive PRs.

This PR handles the larger part with the newest version of job-age penalty calculation, plus drop logic, both by age and flushing when too many consecutive drops occur during training.

coderabbitai · 2026-06-02T12:32:02Z

📝 Walkthrough

Walkthrough

This PR introduces drop-streak-driven episode termination with reworked pending-work snapshots, launch-based job accounting, expanded episode metrics (including proportional power/cost and lost-total), and a reward refactor using price-phase strengths and intrinsic starvation. CLI, training orchestration, plotting, and analyzers are updated to match the new metrics and accept both "Dropped" and "Lost" labels.

Changes

Drop-streak termination, metrics rework, and reward refactoring

Layer / File(s)	Summary
Log parsing compatibility for Dropped/Lost labels `analyze_arrivalscale_occupancy.py`, `analyze_lambda_occupancy.py`, `analyze_seed_occupancy.py`	Episode and run-level parsing regexes are broadened to accept either `Dropped` or `Lost` terminology when extracting dropped/lost job counts, and arrivalscale parser error text was updated.
Environment drop-streak termination and pending-work refactoring `src/environment.py`	Adds `flush_after_drop_streak` and `DROP_STREAK_TERMINATION_PENALTY`; introduces `_pending_work_summary()`, soft/hard reset behavior, decision-time and end-of-step pending snapshots, workload-flush helpers, and flush-triggered termination that applies a terminal penalty and records flush metadata.
Metrics tracking with launch-based accounting and proportional power/cost `src/metrics_tracker.py`, `src/job_management.py`	Records `jobs_launched` and launch wait times, computes avg wait from launch-based totals, aggregates `jobs_lost_total = dropped + flushed`, expands episode payload with pending/overdue end snapshots, and computes proportional per-core power/cost and savings fields.
Reward calculation rework: price phases, cheap-hour pressure, starvation `src/reward_calculation.py`	Adds `_price_phase_strengths()`, replaces queue-age penalty with a cheap-hour service-pressure `_penalty_job_age()` using decision-time pending demand, adds `_reward_intrinsic_starvation()`, tightens blackout eligibility, routes lost-job penalty through `loss_penalty()`, and extends `calculate()` to accept pending/overdue inputs.
Episode outputs and visualization with new metrics `src/callbacks.py`, `src/evaluation_summary.py`, `src/plotter.py`, `src/workloadgen.py`	Callback logging now emits launched/submitted/completed, completion/loss rates, pending/backlog end snapshots, lost-total fields, and proportional power/cost; evaluation lines include `PendingEnd`/`OverdueEnd`; plotter uses `_episode_launch_wait()` and relabels job-age penalty to "Backlog Pressure"; workload small-burst defaults adjusted.
Training infrastructure: CLI options and orchestration `train.py`, `train_iter.py`	Adds `--jobs-exact-replay-aggregate`, `--dashboard-interval`, and `--flush-after-drop-streak` flags; forwards options into `ComputeClusterEnv`; updates evaluation console to use launched and proportional metrics; gates dashboard plotting; threads `flush_after_drop_streak` through `build_command()` and `run_all_parallel()`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

FairRootGroup/powersched#31: Updates src/reward_calculation.py's price-reward computation including price saturation and negative-price overdrive as part of revised reward shaping.
FairRootGroup/powersched#44: Introduces jobs_lost_total and loss-rate accounting with "Dropped|Lost" label terminology that directly aligns with this PR's metrics refactoring.
FairRootGroup/powersched#24: Expands src/workloadgen.py burst-small configuration, directly related to this PR's burst_small parameter adjustments.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 68.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Pr1 part1' is vague and generic, using non-descriptive terms that do not convey meaningful information about the changeset's primary modifications.	Consider using a more descriptive title that reflects the main changes, such as 'Implement job-age penalty calculation and drop-streak termination logic' or similar.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly relates to the changeset by explaining the implementation of job-age penalty calculation and drop/flush behavior during training.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

train_iter.py (1)

369-377: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Restore output_dir in run_all_parallel() and forward it to train.py.

This path is currently broken: Line 521 passes output_dir=..., but run_all_parallel() no longer accepts that parameter, and Line 376 still reads it anyway. train_iter.py will fail before launching any run.

Suggested fix

 def run_all_parallel(combinations, max_parallel, iter_limit_per_step, session, prices,
                      job_durations, jobs, hourly_jobs, job_arrival_scale, jobs_exact_replay,
                      plot_dashboard, dashboard_hours,
                      seeds, seed_sweep, evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args,
-                     no_tui=False):
+                     no_tui=False, output_dir=None):
     multi_seed = len(seeds) > 1
     current_env = os.environ.copy()
     log_dir = make_log_dir(session, output_dir or "sessions")
...
         command = build_command(
             efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
             iter_limit_per_step, session, prices, job_durations, jobs, hourly_jobs,
             job_arrival_scale, jobs_exact_replay,
             plot_dashboard, dashboard_hours, seed, seed_sweep,
-            evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args,
+            evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args,
+            output_dir=output_dir,
         )

Also applies to: 384-390, 501-522

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@train_iter.py` around lines 369 - 377, The function run_all_parallel
currently reads output_dir (used in make_log_dir) but does not accept it as a
parameter and therefore fails; update the run_all_parallel signature to include
an output_dir parameter (with same default behavior as before), use that
output_dir when calling make_log_dir(current session log_dir creation), and
ensure every place inside run_all_parallel that spawns or calls train.py (the
subprocess/command builder that passes output_dir=...) forwards this output_dir
through the command/environment; also update any helper call sites that call
run_all_parallel (where applicable) to pass the output_dir argument so the value
flows into make_log_dir and the train.py invocation.

🧹 Nitpick comments (2)

src/reward_calculation.py (1)

472-485: ⚡ Quick win

Duplicate computation of intrinsic_starvation_reward.

intrinsic_starvation_reward is computed twice identically (lines 473-476 and 482-485). The first computation is used in job_age_penalty_weighted, and the second appears intended only for env_print but overwrites the same variable. This is likely a merge artifact.

♻️ Proposed fix: Remove duplicate calculation

         job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm + intrinsic_starvation_reward

-        # 4. Intrinsic anti-starvation signal: once work is overdue, staying near
-        # zero throughput should be structurally worse than doing at least some work.
-        intrinsic_starvation_reward = self._reward_intrinsic_starvation(
-            remaining_overdue_age_core_hours,
-            total_used_cores,
-        )
-
         # 4. penalty for idling nodes

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/reward_calculation.py` around lines 472 - 485, The duplicate call to
self._reward_intrinsic_starvation(...) assigns intrinsic_starvation_reward
twice; remove the second identical computation and reuse the value computed
before job_age_penalty_weighted so it isn't overwritten (or if the second was
intended for logging/printing, use a separate variable name like
intrinsic_starvation_reward_for_print and call _reward_intrinsic_starvation only
once). Update usages around job_age_penalty_weighted and any env_print
references to read the retained variable (or the new distinct name) so the value
isn't lost by the merge artifact.

src/metrics_tracker.py (1)

236-262: 💤 Low value

Consider adding strict=True to zip() calls for proportional power/cost calculations.

These zip() calls combine episode time-series lists (episode_on_nodes, episode_used_cores, episode_price_stats, etc.). If lengths ever mismatch due to a bug elsewhere, data would be silently truncated. Adding strict=True would fail fast instead.

Given the lists are managed together in step(), the risk is low, but strict=True provides a safety net.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/metrics_tracker.py` around lines 236 - 262, Update the zip() calls in the
proportional power/cost calculations so they fail fast on length mismatches: for
the expressions that compute agent_prop_power_mwh, baseline_prop_power_mwh,
baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and
baseline_off_prop_cost, add strict=True to each zip() invocation that pairs
time-series lists (e.g., self.episode_on_nodes, self.episode_used_cores,
self.episode_price_stats, self.episode_baseline_used_cores,
self.episode_baseline_used_nodes) so a ValueError is raised if the lists differ
in length instead of silently truncating.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/callbacks.py`:
- Around line 25-27: The current _on_step implementation only logs when
env.metrics.current_hour == EPISODE_HOURS-1, which misses episodes terminated
early by the drop-streak flush path; change the condition in _on_step to log
when the episode actually terminates as well (e.g., when env indicates
done/terminated or when env.metrics.has a termination flag) in addition to the
final simulated hour. Update the check in _on_step to something like "if
env.metrics.current_hour == EPISODE_HOURS-1 or env.metrics.terminated (or
env.done/is_done)" so both normal ends and early drop-streak terminations cause
the episode-end metrics to be recorded and sent to TensorBoard.

In `@src/reward_calculation.py`:
- Around line 162-165: The method _price_phase_strengths in RewardCalculator
uses undefined attributes self.PRICE_QUANTILE_LOW and self.PRICE_QUANTILE_HIGH
which will raise AttributeError; add these two class-level constants to
RewardCalculator (e.g., PRICE_QUANTILE_LOW = 0.05 and PRICE_QUANTILE_HIGH = 0.95
or other domain-appropriate quantiles) so the np.quantile call has valid values,
and ensure any tests or usages reference these constants if needed; update
RewardCalculator's class definition to include PRICE_QUANTILE_LOW and
PRICE_QUANTILE_HIGH.

In `@train.py`:
- Line 41: The code currently ties checkpoint cadence to STEPS_PER_ITERATION
(set to 10000), so change it to decouple checkpointing by introducing a
CHECKPOINT_FREQUENCY = 100000 constant and use that when deciding to save
checkpoints instead of STEPS_PER_ITERATION; update the checkpoint decision in
the function responsible for saving (e.g., save_checkpoint or checkpointing
logic in train.py/train_iter.py) to check step % CHECKPOINT_FREQUENCY == 0, and
ensure tensorboard logging is initialized/used (e.g., SummaryWriter or existing
tb_logger) per the repo policy so both train.py and train_iter.py keep 100K
checkpoint cadence and tensorboard logging.
- Line 450: The current guard still allows plotting on the very first loop
because (iters - 1) == 0; update the conditional that checks plotting to
explicitly skip iteration 1 by requiring iters > 1 (or iters != 1) in addition
to the existing modulus check so that args.plot_dashboard and the
STEPS_PER_ITERATION*(iters - 1) % args.dashboard_interval condition only run
starting from iteration 2; modify the line containing the conditional that
references args.plot_dashboard, STEPS_PER_ITERATION, iters, and
args.dashboard_interval accordingly.

---

Outside diff comments:
In `@train_iter.py`:
- Around line 369-377: The function run_all_parallel currently reads output_dir
(used in make_log_dir) but does not accept it as a parameter and therefore
fails; update the run_all_parallel signature to include an output_dir parameter
(with same default behavior as before), use that output_dir when calling
make_log_dir(current session log_dir creation), and ensure every place inside
run_all_parallel that spawns or calls train.py (the subprocess/command builder
that passes output_dir=...) forwards this output_dir through the
command/environment; also update any helper call sites that call
run_all_parallel (where applicable) to pass the output_dir argument so the value
flows into make_log_dir and the train.py invocation.

---

Nitpick comments:
In `@src/metrics_tracker.py`:
- Around line 236-262: Update the zip() calls in the proportional power/cost
calculations so they fail fast on length mismatches: for the expressions that
compute agent_prop_power_mwh, baseline_prop_power_mwh,
baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and
baseline_off_prop_cost, add strict=True to each zip() invocation that pairs
time-series lists (e.g., self.episode_on_nodes, self.episode_used_cores,
self.episode_price_stats, self.episode_baseline_used_cores,
self.episode_baseline_used_nodes) so a ValueError is raised if the lists differ
in length instead of silently truncating.

In `@src/reward_calculation.py`:
- Around line 472-485: The duplicate call to
self._reward_intrinsic_starvation(...) assigns intrinsic_starvation_reward
twice; remove the second identical computation and reuse the value computed
before job_age_penalty_weighted so it isn't overwritten (or if the second was
intended for logging/printing, use a separate variable name like
intrinsic_starvation_reward_for_print and call _reward_intrinsic_starvation only
once). Update usages around job_age_penalty_weighted and any env_print
references to read the retained variable (or the new distinct name) so the value
isn't lost by the merge artifact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17ba0ab4-b947-40b1-afdb-9b8df3dfcd94

📥 Commits

Reviewing files that changed from the base of the PR and between 9821c32 and afd2369.

📒 Files selected for processing (13)

analyze_arrivalscale_occupancy.py
analyze_lambda_occupancy.py
analyze_seed_occupancy.py
src/callbacks.py
src/environment.py
src/evaluation_summary.py
src/job_management.py
src/metrics_tracker.py
src/plotter.py
src/reward_calculation.py
src/workloadgen.py
train.py
train_iter.py

coderabbitai · 2026-06-02T12:38:52Z



-STEPS_PER_ITERATION = 100000
+STEPS_PER_ITERATION = 10000


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Keep checkpoint cadence at 100K even if iterations run at 10K.

This change makes train.py save a checkpoint every 10,000 steps, which breaks the repository’s 100K checkpoint contract. Decouple save frequency from STEPS_PER_ITERATION instead of tying them together.

Suggested fix

STEPS_PER_ITERATION = 10000 +CHECKPOINT_EVERY_STEPS = 100000 ... - model.save(f"{models_dir}/{STEPS_PER_ITERATION * iters}.zip") + total_steps = STEPS_PER_ITERATION * iters + if total_steps % CHECKPOINT_EVERY_STEPS == 0: + model.save(f"{models_dir}/{total_steps}.zip")

As per coding guidelines, {train.py,train_iter.py}: Training scripts (train.py, train_iter.py) must use tensorboard logging and model checkpointing every 100K steps.

Also applies to: 463-465

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@train.py` at line 41, The code currently ties checkpoint cadence to STEPS_PER_ITERATION (set to 10000), so change it to decouple checkpointing by introducing a CHECKPOINT_FREQUENCY = 100000 constant and use that when deciding to save checkpoints instead of STEPS_PER_ITERATION; update the checkpoint decision in the function responsible for saving (e.g., save_checkpoint or checkpointing logic in train.py/train_iter.py) to check step % CHECKPOINT_FREQUENCY == 0, and ensure tensorboard logging is initialized/used (e.g., SummaryWriter or existing tb_logger) per the repo policy so both train.py and train_iter.py keep 100K checkpoint cadence and tensorboard logging.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/reward_calculation.py (1)

475-510: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Logging breakdown double-counts intrinsic_starvation_reward, misrepresenting the reward sum.

intrinsic_starvation_reward is added to job_age_penalty_weighted at line 480, but the log at lines 505-510 shows both job_age_penalty_weighted and intrinsic_starvation_reward as separate addends. This makes the displayed sum not equal to reward, which will mislead debugging and analysis.

Additionally, intrinsic_starvation_reward is computed twice (lines 475-478 and 484-487) with identical logic.

🐛 Proposed fix: compute once and fix log breakdown

-        job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm + intrinsic_starvation_reward
-
-        # 4. Intrinsic anti-starvation signal: once work is overdue, staying near
-        # zero throughput should be structurally worse than doing at least some work.
-        intrinsic_starvation_reward = self._reward_intrinsic_starvation(
-            remaining_overdue_age_core_hours,
-            total_used_cores,
-        )
+        job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm
+        intrinsic_starvation_weighted = intrinsic_starvation_reward  # unweighted per design

-        # 4. penalty for idling nodes
+        # 4. Penalty for idling nodes
         idle_penalty_norm = self._penalty_idle_normalized(num_idle_nodes)
         idle_penalty_weighted = weights.idle_weight * idle_penalty_norm

@@ -496,6 +493,7 @@
             efficiency_reward_weighted
             + price_reward_weighted
             + job_age_penalty_weighted
+            + intrinsic_starvation_weighted
             + idle_penalty_weighted
             + drop_penalty_weighted
         )
@@ -503,8 +501,8 @@
         env_print(
             f"    > $$$TOTAL: {reward:.4f} = "
             f"{efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + "
-            f"{idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} + "
-            f"{intrinsic_starvation_reward:.4f} + {drop_penalty_weighted:.4f}"
+            f"{job_age_penalty_weighted:.4f} + {intrinsic_starvation_weighted:.4f} + "
+            f"{idle_penalty_weighted:.4f} + {drop_penalty_weighted:.4f}"
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/reward_calculation.py` around lines 475 - 510, The code computes
intrinsic_starvation_reward twice via _reward_intrinsic_starvation and then adds
it into job_age_penalty_weighted, but the env_print logs
intrinsic_starvation_reward as a separate term causing double-counting and
mismatch with reward; fix by calling _reward_intrinsic_starvation only once
(remove the duplicate computation), ensure intrinsic_starvation_reward is only
included where job_age_penalty_weighted is formed (or split job_age_penalty into
base + intrinsic explicitly), and update the env_print breakdown to reflect the
exact terms summed into reward (use job_age_penalty_weighted in the breakdown
and do not list intrinsic_starvation_reward separately unless you subtract it
from job_age_penalty_weighted and display both consistently). Ensure you update
references to intrinsic_starvation_reward, job_age_penalty_weighted, reward,
_reward_intrinsic_starvation, and env_print accordingly.

train.py (1)

80-80: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the help text and adjust the default to align with iteration size.

The help text incorrectly states "Hours between dashboard plots," but the code uses this value as steps (line 450 modulo check). Additionally, with STEPS_PER_ITERATION = 100000, the default dashboard_interval = 10000 causes the condition (STEPS_PER_ITERATION * (iters - 1)) % args.dashboard_interval == 0 to always evaluate to true, generating a dashboard on every iteration after the first rather than at the intended interval.
📊 Suggested fix
-    parser.add_argument("--dashboard-interval", type=int, default=10000, help="Hours between dashboard plots (default: 10000).")
+    parser.add_argument("--dashboard-interval", type=int, default=100000, help="Steps between dashboard plots (default: 100000).")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@train.py` at line 80, Update the parser argument so the help and default
reflect that the value is in steps, not hours: change the help text for
dashboard_interval to "Steps between dashboard plots" and set its default to the
STEPS_PER_ITERATION constant (instead of the hardcoded 10000) so it aligns with
the modulo check that uses STEPS_PER_ITERATION (the condition
"(STEPS_PER_ITERATION * (iters - 1)) % args.dashboard_interval == 0").

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/reward_calculation.py`:
- Around line 475-510: The code computes intrinsic_starvation_reward twice via
_reward_intrinsic_starvation and then adds it into job_age_penalty_weighted, but
the env_print logs intrinsic_starvation_reward as a separate term causing
double-counting and mismatch with reward; fix by calling
_reward_intrinsic_starvation only once (remove the duplicate computation),
ensure intrinsic_starvation_reward is only included where
job_age_penalty_weighted is formed (or split job_age_penalty into base +
intrinsic explicitly), and update the env_print breakdown to reflect the exact
terms summed into reward (use job_age_penalty_weighted in the breakdown and do
not list intrinsic_starvation_reward separately unless you subtract it from
job_age_penalty_weighted and display both consistently). Ensure you update
references to intrinsic_starvation_reward, job_age_penalty_weighted, reward,
_reward_intrinsic_starvation, and env_print accordingly.

In `@train.py`:
- Line 80: Update the parser argument so the help and default reflect that the
value is in steps, not hours: change the help text for dashboard_interval to
"Steps between dashboard plots" and set its default to the STEPS_PER_ITERATION
constant (instead of the hardcoded 10000) so it aligns with the modulo check
that uses STEPS_PER_ITERATION (the condition "(STEPS_PER_ITERATION * (iters -
1)) % args.dashboard_interval == 0").

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a14621d-18b1-42e3-97f4-4ad9ccf26aa8

📥 Commits

Reviewing files that changed from the base of the PR and between afd2369 and 2d47d4c.

📒 Files selected for processing (6)

analyze_arrivalscale_occupancy.py
analyze_lambda_occupancy.py
analyze_seed_occupancy.py
src/callbacks.py
src/reward_calculation.py
train.py

🚧 Files skipped from review as they are similar to previous changes (4)

analyze_arrivalscale_occupancy.py
analyze_seed_occupancy.py
analyze_lambda_occupancy.py
src/callbacks.py

Apply a direct lost-job penalty of -1.0 for the first dropped job in a step, plus -0.25 for each additional dropped job, capped at 1000 extra drops. This makes repeated job loss increasingly costly while avoiding an unbounded reward hit. Also add proportional power/cost totals, mean prices, and savings percentages to the training summary.

… this allows free shifting within 24 rollout window.

Add an ALLOW_DROP_PENALTY switch to RewardCalculator and gate the dropped-job reward penalty behind it. The flag defaults to true, so the current behavior is preserved: the first lost job costs -1.0, each additional same-step loss adds -0.25, and the extra-drop penalty is capped after 1000 drops.

…ime wait metrics The reward now explicitly teaches “defer into cheap, then serve hard in cheap”: the old blackout penalty no longer fights deferral, cheap hours penalize under-service when backlog exists, overdue backlog after the 24h grace gets punished, and drop_weight now actually scales the drop penalty. I also reset queue/backlog state cleanly between episodes, switched AvgWait to launch-time wait instead of completion-time wait, and added end-of-episode pending/overdue metrics

Add an optional recovery path that flushes outstanding work only when the agent has been dropping jobs for a configured number of consecutive steps. - add --flush-after-drop-streak CLI option - track consecutive dropped-job steps in the environment - flush queue, backlog, and running jobs at episode end once the streak threshold is reached - apply a single flush penalty to the terminal step reward - record flushed jobs separately in metrics and include them in total lost jobs - add regression coverage for both triggered and non-triggered flush cases

…ached.

… pressure`

Log training metrics from the recorded episode summary instead of reading live env counters at the terminal step, so TensorBoard output matches the finalized episode data. - switch callback logging to `metrics.episode_costs[-1]` on actual episode end - add `on_nodes_end` and `used_nodes_end` snapshot fields to episode data - track and export `max_drop_streak` for each episode - include `MaxDropStreak` in the printed episode summary

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/metrics_tracker.py`:
- Around line 234-263: Add length validation before the proportional power and
cost calculations to ensure all metric series have matching lengths. Before the
section that computes agent_prop_power_mwh, baseline_prop_power_mwh,
baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and
baseline_off_prop_cost, add assertions or validation checks that verify all the
series (episode_on_nodes, episode_used_cores, episode_baseline_used_cores,
episode_baseline_used_nodes, episode_price_stats) have consistent lengths. This
will catch divergence in metric series immediately rather than allowing zip() to
silently truncate to the shortest list and produce incorrect totals.

In `@train.py`:
- Line 227: Several print statements use f-strings without placeholders (e.g.,
print(f"Starting a new model training...") and the other occurrences at the
indicated lines) which triggers Ruff F541; remove the unnecessary "f" prefix on
each plain literal print call so they become normal string literals (e.g.,
change print(f"...") to print("...")) across all flagged occurrences (lines
shown: the "Starting a new model training..." print and the other prints at 262,
314-315, 320, 379-380, 390, 398, 403, 410, 414, 463) ensuring no other f-strings
with actual placeholders are altered.
- Line 80: The CLI allows zero/negative values for --dashboard-interval which
will cause ZeroDivisionError where you use expressions like "... %
args.dashboard_interval"; validate and clamp this value right after parsing (or
in the add_argument call) by checking the parsed args.dashboard_interval and
calling parser.error(...) or resetting to a minimum of 1 if <= 0. Update the
parser.add_argument("--dashboard-interval", ...) handling to enforce a positive
integer (or add a custom type/validator) and ensure all places referencing
args.dashboard_interval (the modulo expressions) can assume a value >= 1.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8c46b22-4404-4317-bd68-29d47a9a53db

📥 Commits

Reviewing files that changed from the base of the PR and between 2d47d4c and 8b484b1.

📒 Files selected for processing (13)

analyze_arrivalscale_occupancy.py
analyze_lambda_occupancy.py
analyze_seed_occupancy.py
src/callbacks.py
src/environment.py
src/evaluation_summary.py
src/job_management.py
src/metrics_tracker.py
src/plotter.py
src/reward_calculation.py
src/workloadgen.py
train.py
train_iter.py

🚧 Files skipped from review as they are similar to previous changes (10)

src/workloadgen.py
analyze_lambda_occupancy.py
src/job_management.py
analyze_seed_occupancy.py
analyze_arrivalscale_occupancy.py
src/evaluation_summary.py
src/plotter.py
src/environment.py
train_iter.py
src/reward_calculation.py

coderabbitai · 2026-06-05T14:24:07Z

+        # Proportional (per-core) power: idle_base for all on-nodes + compute delta scaled by core utilization.
+        # Formula per step: COST_IDLE_MW * num_on + (COST_USED_MW - COST_IDLE_MW) * (cores_used / CORES_PER_NODE)
+        _compute_delta_mw = COST_USED_MW - COST_IDLE_MW
+        agent_prop_power_mwh: float = sum(
+            COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE)
+            for on, cores in zip(self.episode_on_nodes, self.episode_used_cores)
+        )
+        # Baseline always has all MAX_NODES on
+        baseline_prop_power_mwh: float = sum(
+            COST_IDLE_MW * MAX_NODES + _compute_delta_mw * (cores / CORES_PER_NODE)
+            for cores in self.episode_baseline_used_cores
+        )
+        # Baseline_off: only used nodes are on (no idle nodes)
+        baseline_off_prop_power_mwh: float = sum(
+            COST_IDLE_MW * used + _compute_delta_mw * (cores / CORES_PER_NODE)
+            for used, cores in zip(self.episode_baseline_used_nodes, self.episode_baseline_used_cores)
+        )
+        # Proportional cost: same as prop power but multiplied by price at each step
+        agent_prop_cost: float = sum(
+            (COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE)) * price
+            for on, cores, price in zip(self.episode_on_nodes, self.episode_used_cores, self.episode_price_stats)
+        )
+        baseline_prop_cost: float = sum(
+            (COST_IDLE_MW * MAX_NODES + _compute_delta_mw * (cores / CORES_PER_NODE)) * price
+            for cores, price in zip(self.episode_baseline_used_cores, self.episode_price_stats)
+        )
+        baseline_off_prop_cost: float = sum(
+            (COST_IDLE_MW * used + _compute_delta_mw * (cores / CORES_PER_NODE)) * price
+            for used, cores, price in zip(self.episode_baseline_used_nodes, self.episode_baseline_used_cores, self.episode_price_stats)
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if per-step metric series lengths diverge before proportional aggregates.

These calculations use zip(...), which silently truncates to the shortest list. If any append path drifts, power/cost totals are undercounted without visibility, corrupting episode reporting.

Suggested fix

+ n_steps = len(self.episode_price_stats) + if len(self.episode_on_nodes) != n_steps or len(self.episode_used_cores) != n_steps: + raise ValueError( + "Agent proportional metrics length mismatch: " + f"on_nodes={len(self.episode_on_nodes)}, " + f"used_cores={len(self.episode_used_cores)}, prices={n_steps}" + ) + if len(self.episode_baseline_used_cores) != n_steps or len(self.episode_baseline_used_nodes) != n_steps: + raise ValueError( + "Baseline proportional metrics length mismatch: " + f"baseline_used_nodes={len(self.episode_baseline_used_nodes)}, " + f"baseline_used_cores={len(self.episode_baseline_used_cores)}, prices={n_steps}" + )

As per coding guidelines, src/metrics_tracker.py must track and report performance metrics.

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 239-239: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

[warning] 249-249: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

[warning] 254-254: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

[warning] 258-258: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

[warning] 262-262: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/metrics_tracker.py` around lines 234 - 263, Add length validation before the proportional power and cost calculations to ensure all metric series have matching lengths. Before the section that computes agent_prop_power_mwh, baseline_prop_power_mwh, baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and baseline_off_prop_cost, add assertions or validation checks that verify all the series (episode_on_nodes, episode_used_cores, episode_baseline_used_cores, episode_baseline_used_nodes, episode_price_stats) have consistent lengths. This will catch divergence in metric series immediately rather than allowing zip() to silently truncate to the shortest list and produce incorrect totals.

coderabbitai · 2026-06-05T14:24:07Z

    add_workloadgen_args(parser)
    parser.add_argument("--plot-dashboard", action="store_true", help="Generate dashboard plot (per-hour panels + cumulative savings).")
    parser.add_argument("--dashboard-hours", type=int, default=24*14, help="Hours to show in dashboard time-series panels (default: 336).")
+    parser.add_argument("--dashboard-interval", type=int, default=10000, help="Hours between dashboard plots (default: 10000).")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard --dashboard-interval against zero/negative values before modulo.

( ... ) % args.dashboard_interval will crash with ZeroDivisionError when the CLI passes 0 (or invalid negative settings).

Suggested fix

args = parser.parse_args() + if args.dashboard_interval <= 0: + parser.error("--dashboard-interval must be a positive integer")

Also applies to: 450-450

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@train.py` at line 80, The CLI allows zero/negative values for --dashboard-interval which will cause ZeroDivisionError where you use expressions like "... % args.dashboard_interval"; validate and clamp this value right after parsing (or in the add_argument call) by checking the parsed args.dashboard_interval and calling parser.error(...) or resetting to a minimum of 1 if <= 0. Update the parser.add_argument("--dashboard-interval", ...) handling to enforce a positive integer (or add a custom type/validator) and ensure all places referencing args.dashboard_interval (the modulo expressions) can assume a value >= 1.

coderabbitai · 2026-06-05T14:24:07Z

        model = PPO.load(latest_model_file, env=env, tensorboard_log=log_dir, n_steps=64, batch_size=64, device=args.device)
    else:
-        print("Starting a new model training...")
+        print(f"Starting a new model training...")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove placeholder-free f-strings to satisfy Ruff F541.

These are plain literals with f prefixes and are flagged as errors.

Suggested fix

- print(f"Starting a new model training...") + print("Starting a new model training...") ... - print(f"=== EVALUATION MODE ===") + print("=== EVALUATION MODE ===") ... - print(f"\n=== CUMULATIVE SAVINGS ANALYSIS ===") + print("\n=== CUMULATIVE SAVINGS ANALYSIS ===") ... - model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback()) + model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())

Also applies to: 262-262, 314-315, 320-320, 379-380, 390-390, 398-398, 403-403, 410-410, 414-414, 463-463

🧰 Tools

🪛 Ruff (0.15.15)

[error] 227-227: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@train.py` at line 227, Several print statements use f-strings without placeholders (e.g., print(f"Starting a new model training...") and the other occurrences at the indicated lines) which triggers Ruff F541; remove the unnecessary "f" prefix on each plain literal print call so they become normal string literals (e.g., change print(f"...") to print("...")) across all flagged occurrences (lines shown: the "Starting a new model training..." print and the other prints at 262, 314-315, 320, 379-380, 390, 398, 403, 410, 414, 463) ensuring no other f-strings with actual placeholders are altered.

rbx · 2026-06-05T17:04:03Z

+
+        # 4. Intrinsic anti-starvation signal: once work is overdue, staying near
+        # zero throughput should be structurally worse than doing at least some work.
+        intrinsic_starvation_reward = self._reward_intrinsic_starvation(


This is computed second time here (previous on line 475)?

rbx · 2026-06-05T17:05:44Z

+            f"    > $$$TOTAL: {reward:.4f} = "
+            f"{efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + "
+            f"{idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} + "
+            f"{intrinsic_starvation_reward:.4f} + {drop_penalty_weighted:.4f}"


This prints both {job_age_penalty_weighted:.4f} + {intrinsic_starvation_reward:.4f}, but job_age_penalty_weighted already contains intrinsic_starvation_reward, so the printed expression doesn't equal reward - it overstates it by one starvation term.

Fix: compute it once, store it, and either (a) include it in job_age_penalty_weighted and don't print it separately, or (b) keep it as a truly separate reward addend and add it to the reward formula directly.

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

enlorenz force-pushed the pr1-part1 branch from afd2369 to 2d47d4c Compare June 5, 2026 13:09

enlorenz marked this pull request as draft June 5, 2026 13:11

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

enlorenz marked this pull request as ready for review June 5, 2026 13:25

enlorenz and others added 9 commits June 5, 2026 16:14

Test job age: Only start penalizing age, when wait time > 0. Ideally,…

025a669

… this allows free shifting within 24 rollout window.

Change that flush happens during the step where the drop streak is re…

19d38b8

…ached.

Terminate episode on drop-streak flush with fixed penalty

8579f59

`Separate intrinsic overdue-starvation reward from cheap-hour backlog…

10374a2

… pressure`

enlorenz force-pushed the pr1-part1 branch from bfdb786 to 8b484b1 Compare June 5, 2026 14:16

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

enlorenz marked this pull request as draft June 5, 2026 14:38

rbx reviewed Jun 5, 2026

View reviewed changes

Conversation

enlorenz commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

rbx Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

rbx Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

rbx Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading