Skip to content

fix(filter-wheel): recover from CMD_EXECUTION_ERROR on post-home offset move#559

Open
Alpaca233 wants to merge 3 commits into
Cephla-Lab:masterfrom
Alpaca233:fix/filter-wheel-home-offset-resend
Open

fix(filter-wheel): recover from CMD_EXECUTION_ERROR on post-home offset move#559
Alpaca233 wants to merge 3 commits into
Cephla-Lab:masterfrom
Alpaca233:fix/filter-wheel-home-offset-resend

Conversation

@Alpaca233

Copy link
Copy Markdown
Collaborator

Problem

A microscope crashed at startup with an uncaught exception:

squid.filter_wheel_controller.cephla - ERROR - Filter wheel 1 home succeeded but offset move failed; wheel is at the home reference, not slot 1.
...
control.microcontroller.CommandAborted: firmware reported CMD_EXECUTION_ERROR

The home succeeded, but the post-home offset move to slot 1 was rejected by firmware with CMD_EXECUTION_ERROR. Because MicroscopeAddons.prepare_for_use calls emission_filter_wheel.home() with no error handling, the exception propagated through Microscope.build_from_global_config to main_hcs.py and killed the app before the GUI opened.

Root cause

An asymmetry introduced by #540: that PR taught _move_to_position (normal slot changes) to recover from CMD_EXECUTION_ERROR — acknowledge the abort and resend the MOVETO. But it left the identical MOVETO command inside _home_wheel with no recovery, so the same recoverable rejection is gracefully absorbed during operation yet fatal at startup.

Verified against firmware (firmware/controller): a filter-wheel MOVETO_W returns CMD_EXECUTION_ERROR from exactly two sites (mark_move_failed / report_move_error), both meaning the motor never moved (wheel not enabled, or target outside [xmin,xmax] in tmc4361A_moveTo). So a plain resend is the documented-safe recovery — exactly what _move_to_position already does.

Fix (host-side)

  • squid/filter_wheel_controller/cephla.py_home_wheel's offset move now mirrors _move_to_position: on CommandAborted, acknowledge + resend once. A failed resend or any other error type (e.g. TimeoutError) still propagates; re-homing is not attempted from inside the home path.
  • control/microscope.pyprepare_for_use now contains a filter-wheel homing failure: it logs the error with traceback and continues with the wheel position unknown, so a recoverable hiccup no longer bricks the whole GUI (the operator can re-home). Added a module logger.

Tests (TDD — written first, confirmed failing, then green)

  • _home_wheel resends on CommandAborted and completes homing (W + W2)
  • a failed resend re-raises and leaves the tracked position untouched (W + W2)
  • a TimeoutError on the offset move is not resent (guard against over-reach)
  • prepare_for_use no longer propagates a filter-wheel homing failure

46 passed across the filter-wheel and microscope suites; black-clean at 120 chars.

Out of scope (flagged for later)

A latent firmware robustness gap, not the proven trigger: the W/W2 axes are never range-calibrated (xmin/xmax stay full int32) and finalize_homing_w/w2 ignores the ERR_OUT_OF_RANGE return from setCurrentPosition. Worth hardening separately if filter-wheel CMD_EXECUTION_ERRORs recur on a specific machine.

🤖 Generated with Claude Code

…et move

The post-home offset move in _home_wheel had no recovery for a
CMD_EXECUTION_ERROR, while _move_to_position (normal slot changes) does
(acknowledge + resend). Because home() runs uncaught in
MicroscopeAddons.prepare_for_use during startup, a single recoverable
firmware MOVETO rejection propagated all the way to main_hcs.py and
crashed the app before the GUI opened.

Firmware (../firmware/controller) only emits CMD_EXECUTION_ERROR on a
filter-wheel MOVETO when the motor never moved (wheel not enabled, or
target out of [xmin,xmax]) — so a plain resend is the documented-safe
recovery, exactly as _move_to_position already does.

- cephla.py: _home_wheel's offset move now mirrors _move_to_position —
  on CommandAborted, acknowledge + resend once. A failed resend or any
  other error (e.g. TimeoutError) still propagates; re-homing is not
  attempted from inside the home path.
- microscope.py: prepare_for_use now contains a filter-wheel homing
  failure — log with traceback and continue with position unknown so a
  recoverable hiccup doesn't brick the whole GUI. Added a module logger.

Tests (TDD): resend-on-abort completes homing (W + W2); failed resend
re-raises and leaves tracked position untouched; TimeoutError is not
resent; prepare_for_use no longer propagates a homing failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The post-home offset move's resend recovery added in the previous commit
hand-copied the "issue MOVETO, ack + resend once on CommandAborted" step
from _move_to_position. Extract it into _move_to_usteps_with_resend so
the recovery contract — including the acknowledge_aborted_command()
bookkeeping — lives in one place.

Both _home_wheel's offset move and _move_to_position's first attempt now
call the helper; the divergent fallback stays at each caller
(_move_to_position re-homes on failure, _home_wheel propagates since it
is itself the home path). This also collapses _home_wheel's two
near-identical error-log blocks into one.

Behavior preserved — all 46 filter-wheel + microscope tests pass; only
log wording consolidates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens filter-wheel startup behavior by making post-home offset moves recover from firmware-reported CMD_EXECUTION_ERROR, and by preventing a filter-wheel homing failure from aborting microscope startup (so the GUI can still open).

Changes:

  • Add a shared “ack + resend once” helper for absolute MOVETO commands and use it for both normal slot moves and the post-home offset move.
  • Contain filter-wheel homing errors in MicroscopeAddons.prepare_for_use() by logging the exception (with traceback) and continuing startup.
  • Add tests covering resend behavior during _home_wheel offset move and the new startup containment behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
software/squid/filter_wheel_controller/cephla.py Factor resend logic into a helper and apply it to _move_to_position and _home_wheel offset move.
software/control/microscope.py Catch and log filter-wheel homing failures during startup so the app can continue launching.
software/tests/squid/test_filter_wheel.py Add tests validating resend-on-abort for the post-home offset MOVETO and non-resend on timeout.
software/tests/control/test_microscope.py Add test ensuring filter-wheel homing failures are contained during prepare_for_use().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +187 to +190
except CommandAborted as e:
_log.warning(f"Filter wheel {wheel_id} move aborted ({e}); resending in software...")
self.microcontroller.acknowledge_aborted_command()
self._move_to_usteps(wheel_id, usteps)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in commit 2c23378 — confirmed CommandAborted is raised for three conditions (firmware CMD_EXECUTION_ERROR + ack-timeout/checksum-after-retries). The existing recoverable flag (set at abort_current_command for CMD_EXECUTION_ERROR) is now surfaced on the CommandAborted exception, and _move_to_usteps_with_resend resends only when e.recoverable; non-recoverable aborts re-raise so the caller re-homes (uncertain motor state), matching the direct-TimeoutError path. Added test_non_recoverable_abort_skips_resend_and_rehomes.

Comment thread software/control/microscope.py Outdated
Comment on lines +263 to +267
# A filter-wheel homing failure must not brick the whole
# microscope: the wheel controller already leaves its tracked
# position unset on failure, so come up with the position
# unknown and let the operator re-home from the GUI rather
# than aborting startup before the window even opens.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in commit 2c23378 — reworded; the comment now says the controller leaves its tracked position unchanged (possibly stale) on failure, not unset, matching _home_wheel's docstring.

…RROR) aborts

Addresses Copilot review on PR Cephla-Lab#559.

CommandAborted is raised for three distinct conditions: firmware-reported
CMD_EXECUTION_ERROR (motor never moved — recoverable), and ack-timeout /
checksum-failure after retries (motor state uncertain). The resend helper
previously resent on *any* CommandAborted, contradicting its own contract
that only the "rejected before the motor moved" case is resend-safe.

- microcontroller.py: surface the existing `recoverable` flag on the
  CommandAborted exception (set from abort_current_command).
- cephla.py: _move_to_usteps_with_resend resends only when e.recoverable;
  non-recoverable aborts re-raise so the caller re-homes (motor state
  uncertain), matching how a direct TimeoutError is already handled.
- microscope.py: reword the containment comment — _home_wheel leaves the
  tracked position *unchanged* (possibly stale) on failure, not unset.

Tests: existing resend tests now construct recoverable=True; added
test_non_recoverable_abort_skips_resend_and_rehomes (W + W2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants