fix(filter-wheel): recover from CMD_EXECUTION_ERROR on post-home offset move#559
fix(filter-wheel): recover from CMD_EXECUTION_ERROR on post-home offset move#559Alpaca233 wants to merge 3 commits into
Conversation
…et move The post-home offset move in _home_wheel had no recovery for a CMD_EXECUTION_ERROR, while _move_to_position (normal slot changes) does (acknowledge + resend). Because home() runs uncaught in MicroscopeAddons.prepare_for_use during startup, a single recoverable firmware MOVETO rejection propagated all the way to main_hcs.py and crashed the app before the GUI opened. Firmware (../firmware/controller) only emits CMD_EXECUTION_ERROR on a filter-wheel MOVETO when the motor never moved (wheel not enabled, or target out of [xmin,xmax]) — so a plain resend is the documented-safe recovery, exactly as _move_to_position already does. - cephla.py: _home_wheel's offset move now mirrors _move_to_position — on CommandAborted, acknowledge + resend once. A failed resend or any other error (e.g. TimeoutError) still propagates; re-homing is not attempted from inside the home path. - microscope.py: prepare_for_use now contains a filter-wheel homing failure — log with traceback and continue with position unknown so a recoverable hiccup doesn't brick the whole GUI. Added a module logger. Tests (TDD): resend-on-abort completes homing (W + W2); failed resend re-raises and leaves tracked position untouched; TimeoutError is not resent; prepare_for_use no longer propagates a homing failure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The post-home offset move's resend recovery added in the previous commit hand-copied the "issue MOVETO, ack + resend once on CommandAborted" step from _move_to_position. Extract it into _move_to_usteps_with_resend so the recovery contract — including the acknowledge_aborted_command() bookkeeping — lives in one place. Both _home_wheel's offset move and _move_to_position's first attempt now call the helper; the divergent fallback stays at each caller (_move_to_position re-homes on failure, _home_wheel propagates since it is itself the home path). This also collapses _home_wheel's two near-identical error-log blocks into one. Behavior preserved — all 46 filter-wheel + microscope tests pass; only log wording consolidates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens filter-wheel startup behavior by making post-home offset moves recover from firmware-reported CMD_EXECUTION_ERROR, and by preventing a filter-wheel homing failure from aborting microscope startup (so the GUI can still open).
Changes:
- Add a shared “ack + resend once” helper for absolute MOVETO commands and use it for both normal slot moves and the post-home offset move.
- Contain filter-wheel homing errors in
MicroscopeAddons.prepare_for_use()by logging the exception (with traceback) and continuing startup. - Add tests covering resend behavior during
_home_wheeloffset move and the new startup containment behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| software/squid/filter_wheel_controller/cephla.py | Factor resend logic into a helper and apply it to _move_to_position and _home_wheel offset move. |
| software/control/microscope.py | Catch and log filter-wheel homing failures during startup so the app can continue launching. |
| software/tests/squid/test_filter_wheel.py | Add tests validating resend-on-abort for the post-home offset MOVETO and non-resend on timeout. |
| software/tests/control/test_microscope.py | Add test ensuring filter-wheel homing failures are contained during prepare_for_use(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except CommandAborted as e: | ||
| _log.warning(f"Filter wheel {wheel_id} move aborted ({e}); resending in software...") | ||
| self.microcontroller.acknowledge_aborted_command() | ||
| self._move_to_usteps(wheel_id, usteps) |
There was a problem hiding this comment.
[Claude Code] Fixed in commit 2c23378 — confirmed CommandAborted is raised for three conditions (firmware CMD_EXECUTION_ERROR + ack-timeout/checksum-after-retries). The existing recoverable flag (set at abort_current_command for CMD_EXECUTION_ERROR) is now surfaced on the CommandAborted exception, and _move_to_usteps_with_resend resends only when e.recoverable; non-recoverable aborts re-raise so the caller re-homes (uncertain motor state), matching the direct-TimeoutError path. Added test_non_recoverable_abort_skips_resend_and_rehomes.
| # A filter-wheel homing failure must not brick the whole | ||
| # microscope: the wheel controller already leaves its tracked | ||
| # position unset on failure, so come up with the position | ||
| # unknown and let the operator re-home from the GUI rather | ||
| # than aborting startup before the window even opens. |
There was a problem hiding this comment.
[Claude Code] Fixed in commit 2c23378 — reworded; the comment now says the controller leaves its tracked position unchanged (possibly stale) on failure, not unset, matching _home_wheel's docstring.
…RROR) aborts Addresses Copilot review on PR Cephla-Lab#559. CommandAborted is raised for three distinct conditions: firmware-reported CMD_EXECUTION_ERROR (motor never moved — recoverable), and ack-timeout / checksum-failure after retries (motor state uncertain). The resend helper previously resent on *any* CommandAborted, contradicting its own contract that only the "rejected before the motor moved" case is resend-safe. - microcontroller.py: surface the existing `recoverable` flag on the CommandAborted exception (set from abort_current_command). - cephla.py: _move_to_usteps_with_resend resends only when e.recoverable; non-recoverable aborts re-raise so the caller re-homes (motor state uncertain), matching how a direct TimeoutError is already handled. - microscope.py: reword the containment comment — _home_wheel leaves the tracked position *unchanged* (possibly stale) on failure, not unset. Tests: existing resend tests now construct recoverable=True; added test_non_recoverable_abort_skips_resend_and_rehomes (W + W2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
A microscope crashed at startup with an uncaught exception:
The home succeeded, but the post-home offset move to slot 1 was rejected by firmware with
CMD_EXECUTION_ERROR. BecauseMicroscopeAddons.prepare_for_usecallsemission_filter_wheel.home()with no error handling, the exception propagated throughMicroscope.build_from_global_configtomain_hcs.pyand killed the app before the GUI opened.Root cause
An asymmetry introduced by #540: that PR taught
_move_to_position(normal slot changes) to recover fromCMD_EXECUTION_ERROR— acknowledge the abort and resend the MOVETO. But it left the identical MOVETO command inside_home_wheelwith no recovery, so the same recoverable rejection is gracefully absorbed during operation yet fatal at startup.Verified against firmware (
firmware/controller): a filter-wheelMOVETO_WreturnsCMD_EXECUTION_ERRORfrom exactly two sites (mark_move_failed/report_move_error), both meaning the motor never moved (wheel not enabled, or target outside[xmin,xmax]intmc4361A_moveTo). So a plain resend is the documented-safe recovery — exactly what_move_to_positionalready does.Fix (host-side)
squid/filter_wheel_controller/cephla.py—_home_wheel's offset move now mirrors_move_to_position: onCommandAborted, acknowledge + resend once. A failed resend or any other error type (e.g.TimeoutError) still propagates; re-homing is not attempted from inside the home path.control/microscope.py—prepare_for_usenow contains a filter-wheel homing failure: it logs the error with traceback and continues with the wheel position unknown, so a recoverable hiccup no longer bricks the whole GUI (the operator can re-home). Added a module logger.Tests (TDD — written first, confirmed failing, then green)
_home_wheelresends onCommandAbortedand completes homing (W + W2)TimeoutErroron the offset move is not resent (guard against over-reach)prepare_for_useno longer propagates a filter-wheel homing failure46 passedacross the filter-wheel and microscope suites; black-clean at 120 chars.Out of scope (flagged for later)
A latent firmware robustness gap, not the proven trigger: the W/W2 axes are never range-calibrated (
xmin/xmaxstay full int32) andfinalize_homing_w/w2ignores theERR_OUT_OF_RANGEreturn fromsetCurrentPosition. Worth hardening separately if filter-wheelCMD_EXECUTION_ERRORs recur on a specific machine.🤖 Generated with Claude Code