Pr1 part1#49
Conversation
📝 WalkthroughWalkthroughThis PR introduces drop-streak-driven episode termination with reworked pending-work snapshots, launch-based job accounting, expanded episode metrics (including proportional power/cost and lost-total), and a reward refactor using price-phase strengths and intrinsic starvation. CLI, training orchestration, plotting, and analyzers are updated to match the new metrics and accept both "Dropped" and "Lost" labels. ChangesDrop-streak termination, metrics rework, and reward refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
train_iter.py (1)
369-377:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRestore
output_dirinrun_all_parallel()and forward it totrain.py.This path is currently broken: Line 521 passes
output_dir=..., butrun_all_parallel()no longer accepts that parameter, and Line 376 still reads it anyway.train_iter.pywill fail before launching any run.Suggested fix
def run_all_parallel(combinations, max_parallel, iter_limit_per_step, session, prices, job_durations, jobs, hourly_jobs, job_arrival_scale, jobs_exact_replay, plot_dashboard, dashboard_hours, seeds, seed_sweep, evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args, - no_tui=False): + no_tui=False, output_dir=None): multi_seed = len(seeds) > 1 current_env = os.environ.copy() log_dir = make_log_dir(session, output_dir or "sessions") ... command = build_command( efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, iter_limit_per_step, session, prices, job_durations, jobs, hourly_jobs, job_arrival_scale, jobs_exact_replay, plot_dashboard, dashboard_hours, seed, seed_sweep, - evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args, + evaluate_savings, eval_months, flush_after_drop_streak, workloadgen_args, + output_dir=output_dir, )Also applies to: 384-390, 501-522
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@train_iter.py` around lines 369 - 377, The function run_all_parallel currently reads output_dir (used in make_log_dir) but does not accept it as a parameter and therefore fails; update the run_all_parallel signature to include an output_dir parameter (with same default behavior as before), use that output_dir when calling make_log_dir(current session log_dir creation), and ensure every place inside run_all_parallel that spawns or calls train.py (the subprocess/command builder that passes output_dir=...) forwards this output_dir through the command/environment; also update any helper call sites that call run_all_parallel (where applicable) to pass the output_dir argument so the value flows into make_log_dir and the train.py invocation.
🧹 Nitpick comments (2)
src/reward_calculation.py (1)
472-485: ⚡ Quick winDuplicate computation of
intrinsic_starvation_reward.
intrinsic_starvation_rewardis computed twice identically (lines 473-476 and 482-485). The first computation is used injob_age_penalty_weighted, and the second appears intended only forenv_printbut overwrites the same variable. This is likely a merge artifact.♻️ Proposed fix: Remove duplicate calculation
job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm + intrinsic_starvation_reward - # 4. Intrinsic anti-starvation signal: once work is overdue, staying near - # zero throughput should be structurally worse than doing at least some work. - intrinsic_starvation_reward = self._reward_intrinsic_starvation( - remaining_overdue_age_core_hours, - total_used_cores, - ) - # 4. penalty for idling nodes🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reward_calculation.py` around lines 472 - 485, The duplicate call to self._reward_intrinsic_starvation(...) assigns intrinsic_starvation_reward twice; remove the second identical computation and reuse the value computed before job_age_penalty_weighted so it isn't overwritten (or if the second was intended for logging/printing, use a separate variable name like intrinsic_starvation_reward_for_print and call _reward_intrinsic_starvation only once). Update usages around job_age_penalty_weighted and any env_print references to read the retained variable (or the new distinct name) so the value isn't lost by the merge artifact.src/metrics_tracker.py (1)
236-262: 💤 Low valueConsider adding
strict=Truetozip()calls for proportional power/cost calculations.These
zip()calls combine episode time-series lists (episode_on_nodes,episode_used_cores,episode_price_stats, etc.). If lengths ever mismatch due to a bug elsewhere, data would be silently truncated. Addingstrict=Truewould fail fast instead.Given the lists are managed together in
step(), the risk is low, butstrict=Trueprovides a safety net.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/metrics_tracker.py` around lines 236 - 262, Update the zip() calls in the proportional power/cost calculations so they fail fast on length mismatches: for the expressions that compute agent_prop_power_mwh, baseline_prop_power_mwh, baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and baseline_off_prop_cost, add strict=True to each zip() invocation that pairs time-series lists (e.g., self.episode_on_nodes, self.episode_used_cores, self.episode_price_stats, self.episode_baseline_used_cores, self.episode_baseline_used_nodes) so a ValueError is raised if the lists differ in length instead of silently truncating.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/callbacks.py`:
- Around line 25-27: The current _on_step implementation only logs when
env.metrics.current_hour == EPISODE_HOURS-1, which misses episodes terminated
early by the drop-streak flush path; change the condition in _on_step to log
when the episode actually terminates as well (e.g., when env indicates
done/terminated or when env.metrics.has a termination flag) in addition to the
final simulated hour. Update the check in _on_step to something like "if
env.metrics.current_hour == EPISODE_HOURS-1 or env.metrics.terminated (or
env.done/is_done)" so both normal ends and early drop-streak terminations cause
the episode-end metrics to be recorded and sent to TensorBoard.
In `@src/reward_calculation.py`:
- Around line 162-165: The method _price_phase_strengths in RewardCalculator
uses undefined attributes self.PRICE_QUANTILE_LOW and self.PRICE_QUANTILE_HIGH
which will raise AttributeError; add these two class-level constants to
RewardCalculator (e.g., PRICE_QUANTILE_LOW = 0.05 and PRICE_QUANTILE_HIGH = 0.95
or other domain-appropriate quantiles) so the np.quantile call has valid values,
and ensure any tests or usages reference these constants if needed; update
RewardCalculator's class definition to include PRICE_QUANTILE_LOW and
PRICE_QUANTILE_HIGH.
In `@train.py`:
- Line 41: The code currently ties checkpoint cadence to STEPS_PER_ITERATION
(set to 10000), so change it to decouple checkpointing by introducing a
CHECKPOINT_FREQUENCY = 100000 constant and use that when deciding to save
checkpoints instead of STEPS_PER_ITERATION; update the checkpoint decision in
the function responsible for saving (e.g., save_checkpoint or checkpointing
logic in train.py/train_iter.py) to check step % CHECKPOINT_FREQUENCY == 0, and
ensure tensorboard logging is initialized/used (e.g., SummaryWriter or existing
tb_logger) per the repo policy so both train.py and train_iter.py keep 100K
checkpoint cadence and tensorboard logging.
- Line 450: The current guard still allows plotting on the very first loop
because (iters - 1) == 0; update the conditional that checks plotting to
explicitly skip iteration 1 by requiring iters > 1 (or iters != 1) in addition
to the existing modulus check so that args.plot_dashboard and the
STEPS_PER_ITERATION*(iters - 1) % args.dashboard_interval condition only run
starting from iteration 2; modify the line containing the conditional that
references args.plot_dashboard, STEPS_PER_ITERATION, iters, and
args.dashboard_interval accordingly.
---
Outside diff comments:
In `@train_iter.py`:
- Around line 369-377: The function run_all_parallel currently reads output_dir
(used in make_log_dir) but does not accept it as a parameter and therefore
fails; update the run_all_parallel signature to include an output_dir parameter
(with same default behavior as before), use that output_dir when calling
make_log_dir(current session log_dir creation), and ensure every place inside
run_all_parallel that spawns or calls train.py (the subprocess/command builder
that passes output_dir=...) forwards this output_dir through the
command/environment; also update any helper call sites that call
run_all_parallel (where applicable) to pass the output_dir argument so the value
flows into make_log_dir and the train.py invocation.
---
Nitpick comments:
In `@src/metrics_tracker.py`:
- Around line 236-262: Update the zip() calls in the proportional power/cost
calculations so they fail fast on length mismatches: for the expressions that
compute agent_prop_power_mwh, baseline_prop_power_mwh,
baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and
baseline_off_prop_cost, add strict=True to each zip() invocation that pairs
time-series lists (e.g., self.episode_on_nodes, self.episode_used_cores,
self.episode_price_stats, self.episode_baseline_used_cores,
self.episode_baseline_used_nodes) so a ValueError is raised if the lists differ
in length instead of silently truncating.
In `@src/reward_calculation.py`:
- Around line 472-485: The duplicate call to
self._reward_intrinsic_starvation(...) assigns intrinsic_starvation_reward
twice; remove the second identical computation and reuse the value computed
before job_age_penalty_weighted so it isn't overwritten (or if the second was
intended for logging/printing, use a separate variable name like
intrinsic_starvation_reward_for_print and call _reward_intrinsic_starvation only
once). Update usages around job_age_penalty_weighted and any env_print
references to read the retained variable (or the new distinct name) so the value
isn't lost by the merge artifact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 17ba0ab4-b947-40b1-afdb-9b8df3dfcd94
📒 Files selected for processing (13)
analyze_arrivalscale_occupancy.pyanalyze_lambda_occupancy.pyanalyze_seed_occupancy.pysrc/callbacks.pysrc/environment.pysrc/evaluation_summary.pysrc/job_management.pysrc/metrics_tracker.pysrc/plotter.pysrc/reward_calculation.pysrc/workloadgen.pytrain.pytrain_iter.py
|
|
||
|
|
||
| STEPS_PER_ITERATION = 100000 | ||
| STEPS_PER_ITERATION = 10000 |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Keep checkpoint cadence at 100K even if iterations run at 10K.
This change makes train.py save a checkpoint every 10,000 steps, which breaks the repository’s 100K checkpoint contract. Decouple save frequency from STEPS_PER_ITERATION instead of tying them together.
Suggested fix
STEPS_PER_ITERATION = 10000
+CHECKPOINT_EVERY_STEPS = 100000
...
- model.save(f"{models_dir}/{STEPS_PER_ITERATION * iters}.zip")
+ total_steps = STEPS_PER_ITERATION * iters
+ if total_steps % CHECKPOINT_EVERY_STEPS == 0:
+ model.save(f"{models_dir}/{total_steps}.zip")As per coding guidelines, {train.py,train_iter.py}: Training scripts (train.py, train_iter.py) must use tensorboard logging and model checkpointing every 100K steps.
Also applies to: 463-465
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@train.py` at line 41, The code currently ties checkpoint cadence to
STEPS_PER_ITERATION (set to 10000), so change it to decouple checkpointing by
introducing a CHECKPOINT_FREQUENCY = 100000 constant and use that when deciding
to save checkpoints instead of STEPS_PER_ITERATION; update the checkpoint
decision in the function responsible for saving (e.g., save_checkpoint or
checkpointing logic in train.py/train_iter.py) to check step %
CHECKPOINT_FREQUENCY == 0, and ensure tensorboard logging is initialized/used
(e.g., SummaryWriter or existing tb_logger) per the repo policy so both train.py
and train_iter.py keep 100K checkpoint cadence and tensorboard logging.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/reward_calculation.py (1)
475-510:⚠️ Potential issue | 🟠 Major | ⚡ Quick winLogging breakdown double-counts
intrinsic_starvation_reward, misrepresenting the reward sum.
intrinsic_starvation_rewardis added tojob_age_penalty_weightedat line 480, but the log at lines 505-510 shows bothjob_age_penalty_weightedandintrinsic_starvation_rewardas separate addends. This makes the displayed sum not equal toreward, which will mislead debugging and analysis.Additionally,
intrinsic_starvation_rewardis computed twice (lines 475-478 and 484-487) with identical logic.🐛 Proposed fix: compute once and fix log breakdown
- job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm + intrinsic_starvation_reward - - # 4. Intrinsic anti-starvation signal: once work is overdue, staying near - # zero throughput should be structurally worse than doing at least some work. - intrinsic_starvation_reward = self._reward_intrinsic_starvation( - remaining_overdue_age_core_hours, - total_used_cores, - ) + job_age_penalty_weighted = weights.job_age_weight * job_age_penalty_norm + intrinsic_starvation_weighted = intrinsic_starvation_reward # unweighted per design - # 4. penalty for idling nodes + # 4. Penalty for idling nodes idle_penalty_norm = self._penalty_idle_normalized(num_idle_nodes) idle_penalty_weighted = weights.idle_weight * idle_penalty_norm @@ -496,6 +493,7 @@ efficiency_reward_weighted + price_reward_weighted + job_age_penalty_weighted + + intrinsic_starvation_weighted + idle_penalty_weighted + drop_penalty_weighted ) @@ -503,8 +501,8 @@ env_print( f" > $$$TOTAL: {reward:.4f} = " f"{efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + " - f"{idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} + " - f"{intrinsic_starvation_reward:.4f} + {drop_penalty_weighted:.4f}" + f"{job_age_penalty_weighted:.4f} + {intrinsic_starvation_weighted:.4f} + " + f"{idle_penalty_weighted:.4f} + {drop_penalty_weighted:.4f}" )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reward_calculation.py` around lines 475 - 510, The code computes intrinsic_starvation_reward twice via _reward_intrinsic_starvation and then adds it into job_age_penalty_weighted, but the env_print logs intrinsic_starvation_reward as a separate term causing double-counting and mismatch with reward; fix by calling _reward_intrinsic_starvation only once (remove the duplicate computation), ensure intrinsic_starvation_reward is only included where job_age_penalty_weighted is formed (or split job_age_penalty into base + intrinsic explicitly), and update the env_print breakdown to reflect the exact terms summed into reward (use job_age_penalty_weighted in the breakdown and do not list intrinsic_starvation_reward separately unless you subtract it from job_age_penalty_weighted and display both consistently). Ensure you update references to intrinsic_starvation_reward, job_age_penalty_weighted, reward, _reward_intrinsic_starvation, and env_print accordingly.train.py (1)
80-80:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCorrect the help text and adjust the default to align with iteration size.
The help text incorrectly states "Hours between dashboard plots," but the code uses this value as steps (line 450 modulo check). Additionally, with
STEPS_PER_ITERATION = 100000, the defaultdashboard_interval = 10000causes the condition(STEPS_PER_ITERATION * (iters - 1)) % args.dashboard_interval == 0to always evaluate to true, generating a dashboard on every iteration after the first rather than at the intended interval.📊 Suggested fix
- parser.add_argument("--dashboard-interval", type=int, default=10000, help="Hours between dashboard plots (default: 10000).") + parser.add_argument("--dashboard-interval", type=int, default=100000, help="Steps between dashboard plots (default: 100000).")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@train.py` at line 80, Update the parser argument so the help and default reflect that the value is in steps, not hours: change the help text for dashboard_interval to "Steps between dashboard plots" and set its default to the STEPS_PER_ITERATION constant (instead of the hardcoded 10000) so it aligns with the modulo check that uses STEPS_PER_ITERATION (the condition "(STEPS_PER_ITERATION * (iters - 1)) % args.dashboard_interval == 0").
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/reward_calculation.py`:
- Around line 475-510: The code computes intrinsic_starvation_reward twice via
_reward_intrinsic_starvation and then adds it into job_age_penalty_weighted, but
the env_print logs intrinsic_starvation_reward as a separate term causing
double-counting and mismatch with reward; fix by calling
_reward_intrinsic_starvation only once (remove the duplicate computation),
ensure intrinsic_starvation_reward is only included where
job_age_penalty_weighted is formed (or split job_age_penalty into base +
intrinsic explicitly), and update the env_print breakdown to reflect the exact
terms summed into reward (use job_age_penalty_weighted in the breakdown and do
not list intrinsic_starvation_reward separately unless you subtract it from
job_age_penalty_weighted and display both consistently). Ensure you update
references to intrinsic_starvation_reward, job_age_penalty_weighted, reward,
_reward_intrinsic_starvation, and env_print accordingly.
In `@train.py`:
- Line 80: Update the parser argument so the help and default reflect that the
value is in steps, not hours: change the help text for dashboard_interval to
"Steps between dashboard plots" and set its default to the STEPS_PER_ITERATION
constant (instead of the hardcoded 10000) so it aligns with the modulo check
that uses STEPS_PER_ITERATION (the condition "(STEPS_PER_ITERATION * (iters -
1)) % args.dashboard_interval == 0").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3a14621d-18b1-42e3-97f4-4ad9ccf26aa8
📒 Files selected for processing (6)
analyze_arrivalscale_occupancy.pyanalyze_lambda_occupancy.pyanalyze_seed_occupancy.pysrc/callbacks.pysrc/reward_calculation.pytrain.py
🚧 Files skipped from review as they are similar to previous changes (4)
- analyze_arrivalscale_occupancy.py
- analyze_seed_occupancy.py
- analyze_lambda_occupancy.py
- src/callbacks.py
Apply a direct lost-job penalty of -1.0 for the first dropped job in a step, plus -0.25 for each additional dropped job, capped at 1000 extra drops. This makes repeated job loss increasingly costly while avoiding an unbounded reward hit. Also add proportional power/cost totals, mean prices, and savings percentages to the training summary.
… this allows free shifting within 24 rollout window.
Add an ALLOW_DROP_PENALTY switch to RewardCalculator and gate the dropped-job reward penalty behind it. The flag defaults to true, so the current behavior is preserved: the first lost job costs -1.0, each additional same-step loss adds -0.25, and the extra-drop penalty is capped after 1000 drops.
…ime wait metrics The reward now explicitly teaches “defer into cheap, then serve hard in cheap”: the old blackout penalty no longer fights deferral, cheap hours penalize under-service when backlog exists, overdue backlog after the 24h grace gets punished, and drop_weight now actually scales the drop penalty. I also reset queue/backlog state cleanly between episodes, switched AvgWait to launch-time wait instead of completion-time wait, and added end-of-episode pending/overdue metrics
Add an optional recovery path that flushes outstanding work only when the agent has been dropping jobs for a configured number of consecutive steps. - add --flush-after-drop-streak CLI option - track consecutive dropped-job steps in the environment - flush queue, backlog, and running jobs at episode end once the streak threshold is reached - apply a single flush penalty to the terminal step reward - record flushed jobs separately in metrics and include them in total lost jobs - add regression coverage for both triggered and non-triggered flush cases
Log training metrics from the recorded episode summary instead of reading live env counters at the terminal step, so TensorBoard output matches the finalized episode data. - switch callback logging to `metrics.episode_costs[-1]` on actual episode end - add `on_nodes_end` and `used_nodes_end` snapshot fields to episode data - track and export `max_drop_streak` for each episode - include `MaxDropStreak` in the printed episode summary
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/metrics_tracker.py`:
- Around line 234-263: Add length validation before the proportional power and
cost calculations to ensure all metric series have matching lengths. Before the
section that computes agent_prop_power_mwh, baseline_prop_power_mwh,
baseline_off_prop_power_mwh, agent_prop_cost, baseline_prop_cost, and
baseline_off_prop_cost, add assertions or validation checks that verify all the
series (episode_on_nodes, episode_used_cores, episode_baseline_used_cores,
episode_baseline_used_nodes, episode_price_stats) have consistent lengths. This
will catch divergence in metric series immediately rather than allowing zip() to
silently truncate to the shortest list and produce incorrect totals.
In `@train.py`:
- Line 227: Several print statements use f-strings without placeholders (e.g.,
print(f"Starting a new model training...") and the other occurrences at the
indicated lines) which triggers Ruff F541; remove the unnecessary "f" prefix on
each plain literal print call so they become normal string literals (e.g.,
change print(f"...") to print("...")) across all flagged occurrences (lines
shown: the "Starting a new model training..." print and the other prints at 262,
314-315, 320, 379-380, 390, 398, 403, 410, 414, 463) ensuring no other f-strings
with actual placeholders are altered.
- Line 80: The CLI allows zero/negative values for --dashboard-interval which
will cause ZeroDivisionError where you use expressions like "... %
args.dashboard_interval"; validate and clamp this value right after parsing (or
in the add_argument call) by checking the parsed args.dashboard_interval and
calling parser.error(...) or resetting to a minimum of 1 if <= 0. Update the
parser.add_argument("--dashboard-interval", ...) handling to enforce a positive
integer (or add a custom type/validator) and ensure all places referencing
args.dashboard_interval (the modulo expressions) can assume a value >= 1.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a8c46b22-4404-4317-bd68-29d47a9a53db
📒 Files selected for processing (13)
analyze_arrivalscale_occupancy.pyanalyze_lambda_occupancy.pyanalyze_seed_occupancy.pysrc/callbacks.pysrc/environment.pysrc/evaluation_summary.pysrc/job_management.pysrc/metrics_tracker.pysrc/plotter.pysrc/reward_calculation.pysrc/workloadgen.pytrain.pytrain_iter.py
🚧 Files skipped from review as they are similar to previous changes (10)
- src/workloadgen.py
- analyze_lambda_occupancy.py
- src/job_management.py
- analyze_seed_occupancy.py
- analyze_arrivalscale_occupancy.py
- src/evaluation_summary.py
- src/plotter.py
- src/environment.py
- train_iter.py
- src/reward_calculation.py
| # Proportional (per-core) power: idle_base for all on-nodes + compute delta scaled by core utilization. | ||
| # Formula per step: COST_IDLE_MW * num_on + (COST_USED_MW - COST_IDLE_MW) * (cores_used / CORES_PER_NODE) | ||
| _compute_delta_mw = COST_USED_MW - COST_IDLE_MW | ||
| agent_prop_power_mwh: float = sum( | ||
| COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE) | ||
| for on, cores in zip(self.episode_on_nodes, self.episode_used_cores) | ||
| ) | ||
| # Baseline always has all MAX_NODES on | ||
| baseline_prop_power_mwh: float = sum( | ||
| COST_IDLE_MW * MAX_NODES + _compute_delta_mw * (cores / CORES_PER_NODE) | ||
| for cores in self.episode_baseline_used_cores | ||
| ) | ||
| # Baseline_off: only used nodes are on (no idle nodes) | ||
| baseline_off_prop_power_mwh: float = sum( | ||
| COST_IDLE_MW * used + _compute_delta_mw * (cores / CORES_PER_NODE) | ||
| for used, cores in zip(self.episode_baseline_used_nodes, self.episode_baseline_used_cores) | ||
| ) | ||
| # Proportional cost: same as prop power but multiplied by price at each step | ||
| agent_prop_cost: float = sum( | ||
| (COST_IDLE_MW * on + _compute_delta_mw * (cores / CORES_PER_NODE)) * price | ||
| for on, cores, price in zip(self.episode_on_nodes, self.episode_used_cores, self.episode_price_stats) | ||
| ) | ||
| baseline_prop_cost: float = sum( | ||
| (COST_IDLE_MW * MAX_NODES + _compute_delta_mw * (cores / CORES_PER_NODE)) * price | ||
| for cores, price in zip(self.episode_baseline_used_cores, self.episode_price_stats) | ||
| ) | ||
| baseline_off_prop_cost: float = sum( | ||
| (COST_IDLE_MW * used + _compute_delta_mw * (cores / CORES_PER_NODE)) * price | ||
| for used, cores, price in zip(self.episode_baseline_used_nodes, self.episode_baseline_used_cores, self.episode_price_stats) | ||
| ) |
There was a problem hiding this comment.
Fail fast if per-step metric series lengths diverge before proportional aggregates.
These calculations use zip(...), which silently truncates to the shortest list. If any append path drifts, power/cost totals are undercounted without visibility, corrupting episode reporting.
Suggested fix
+ n_steps = len(self.episode_price_stats)
+ if len(self.episode_on_nodes) != n_steps or len(self.episode_used_cores) != n_steps:
+ raise ValueError(
+ "Agent proportional metrics length mismatch: "
+ f"on_nodes={len(self.episode_on_nodes)}, "
+ f"used_cores={len(self.episode_used_cores)}, prices={n_steps}"
+ )
+ if len(self.episode_baseline_used_cores) != n_steps or len(self.episode_baseline_used_nodes) != n_steps:
+ raise ValueError(
+ "Baseline proportional metrics length mismatch: "
+ f"baseline_used_nodes={len(self.episode_baseline_used_nodes)}, "
+ f"baseline_used_cores={len(self.episode_baseline_used_cores)}, prices={n_steps}"
+ )As per coding guidelines, src/metrics_tracker.py must track and report performance metrics.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 239-239: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 249-249: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 254-254: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 258-258: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 262-262: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/metrics_tracker.py` around lines 234 - 263, Add length validation before
the proportional power and cost calculations to ensure all metric series have
matching lengths. Before the section that computes agent_prop_power_mwh,
baseline_prop_power_mwh, baseline_off_prop_power_mwh, agent_prop_cost,
baseline_prop_cost, and baseline_off_prop_cost, add assertions or validation
checks that verify all the series (episode_on_nodes, episode_used_cores,
episode_baseline_used_cores, episode_baseline_used_nodes, episode_price_stats)
have consistent lengths. This will catch divergence in metric series immediately
rather than allowing zip() to silently truncate to the shortest list and produce
incorrect totals.
| add_workloadgen_args(parser) | ||
| parser.add_argument("--plot-dashboard", action="store_true", help="Generate dashboard plot (per-hour panels + cumulative savings).") | ||
| parser.add_argument("--dashboard-hours", type=int, default=24*14, help="Hours to show in dashboard time-series panels (default: 336).") | ||
| parser.add_argument("--dashboard-interval", type=int, default=10000, help="Hours between dashboard plots (default: 10000).") |
There was a problem hiding this comment.
Guard --dashboard-interval against zero/negative values before modulo.
( ... ) % args.dashboard_interval will crash with ZeroDivisionError when the CLI passes 0 (or invalid negative settings).
Suggested fix
args = parser.parse_args()
+ if args.dashboard_interval <= 0:
+ parser.error("--dashboard-interval must be a positive integer")Also applies to: 450-450
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@train.py` at line 80, The CLI allows zero/negative values for
--dashboard-interval which will cause ZeroDivisionError where you use
expressions like "... % args.dashboard_interval"; validate and clamp this value
right after parsing (or in the add_argument call) by checking the parsed
args.dashboard_interval and calling parser.error(...) or resetting to a minimum
of 1 if <= 0. Update the parser.add_argument("--dashboard-interval", ...)
handling to enforce a positive integer (or add a custom type/validator) and
ensure all places referencing args.dashboard_interval (the modulo expressions)
can assume a value >= 1.
| model = PPO.load(latest_model_file, env=env, tensorboard_log=log_dir, n_steps=64, batch_size=64, device=args.device) | ||
| else: | ||
| print("Starting a new model training...") | ||
| print(f"Starting a new model training...") |
There was a problem hiding this comment.
Remove placeholder-free f-strings to satisfy Ruff F541.
These are plain literals with f prefixes and are flagged as errors.
Suggested fix
- print(f"Starting a new model training...")
+ print("Starting a new model training...")
...
- print(f"=== EVALUATION MODE ===")
+ print("=== EVALUATION MODE ===")
...
- print(f"\n=== CUMULATIVE SAVINGS ANALYSIS ===")
+ print("\n=== CUMULATIVE SAVINGS ANALYSIS ===")
...
- model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+ model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())Also applies to: 262-262, 314-315, 320-320, 379-380, 390-390, 398-398, 403-403, 410-410, 414-414, 463-463
🧰 Tools
🪛 Ruff (0.15.15)
[error] 227-227: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@train.py` at line 227, Several print statements use f-strings without
placeholders (e.g., print(f"Starting a new model training...") and the other
occurrences at the indicated lines) which triggers Ruff F541; remove the
unnecessary "f" prefix on each plain literal print call so they become normal
string literals (e.g., change print(f"...") to print("...")) across all flagged
occurrences (lines shown: the "Starting a new model training..." print and the
other prints at 262, 314-315, 320, 379-380, 390, 398, 403, 410, 414, 463)
ensuring no other f-strings with actual placeholders are altered.
|
|
||
| # 4. Intrinsic anti-starvation signal: once work is overdue, staying near | ||
| # zero throughput should be structurally worse than doing at least some work. | ||
| intrinsic_starvation_reward = self._reward_intrinsic_starvation( |
There was a problem hiding this comment.
This is computed second time here (previous on line 475)?
| f" > $$$TOTAL: {reward:.4f} = " | ||
| f"{efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + " | ||
| f"{idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} + " | ||
| f"{intrinsic_starvation_reward:.4f} + {drop_penalty_weighted:.4f}" |
There was a problem hiding this comment.
This prints both {job_age_penalty_weighted:.4f} + {intrinsic_starvation_reward:.4f}, but job_age_penalty_weighted already contains intrinsic_starvation_reward, so the printed expression doesn't equal reward - it overstates it by one starvation term.
There was a problem hiding this comment.
Fix: compute it once, store it, and either (a) include it in job_age_penalty_weighted and don't print it separately, or (b) keep it as a truly separate reward addend and add it to the reward formula directly.
Part one of three consecutive PRs.
This PR handles the larger part with the newest version of job-age penalty calculation, plus drop logic, both by age and flushing when too many consecutive drops occur during training.