Surface the real inference error; don't report a gripper-overload shutdown as failed #48

Open
nobullryder wants to merge 1 commit into huggingface:main from nobullryder:inference-outcome

@nobullryder

Problem

When a rollout (inference) exits non-zero, the UI just said "failed — check logs", which (a) buries the real error in a log file inside the HF cache, and (b) calls a working run a failure when only shutdown tripped — e.g. disabling torque on a gripper that's still holding an object trips an overload during cleanup.

What this does (all in rollout.py)

  • _extract_error_from_log — pulls the actual exception out of the rollout log so the UI can show it directly.
  • _friendly_hint — a plain-language, actionable headline for the common SO-101 failures (gripper overload, unresponsive motor id 6, can't-connect, camera-too-slow, unsupported resolution, busy serial port).
  • _classify_outcome — a non-zero exit after the rollout main loop started, where the error is a torque-disable/overload on shutdown, is reported as ran_with_warning rather than failed. The skill ran; only cleanup complained.
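
For readers who want a concrete picture of the extraction and classification steps, here is a minimal sketch. It is not the code in rollout.py: the function names mirror the PR's helpers without the leading underscore, and the main-loop marker string, the regular expressions, and the `succeeded` label are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed log line that marks the start of the rollout main loop (hypothetical).
MAIN_LOOP_MARKER = "rollout main loop started"
# Assumed wording of a shutdown-time torque-disable/overload error (hypothetical).
SHUTDOWN_OVERLOAD = re.compile(r"overload|torque", re.IGNORECASE)
# Heuristic: a line that starts with a Python exception class name.
EXCEPTION_LINE = re.compile(r"^[\w.]*(?:Error|Exception)\b")


def extract_error_from_log(log_text: str) -> Optional[str]:
    """Return the last line that looks like a raised exception, or None."""
    for line in reversed(log_text.splitlines()):
        stripped = line.strip()
        if EXCEPTION_LINE.match(stripped):
            return stripped
    return None


def classify_outcome(exit_code: int, log_text: str) -> str:
    """Map an exit code plus log contents to an outcome label."""
    if exit_code == 0:
        return "succeeded"  # assumed label; the PR only names the two below
    error = extract_error_from_log(log_text) or ""
    loop_started = MAIN_LOOP_MARKER in log_text
    # The skill ran and only cleanup complained: downgrade to a warning.
    if loop_started and SHUTDOWN_OVERLOAD.search(error):
        return "ran_with_warning"
    return "failed"
```

Under these assumptions, a log that contains the main-loop marker and ends in an overload error while disabling torque is classified as `ran_with_warning`, while the same error before the loop started, or any other error, stays `failed`.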

Tests

tests/test_rollout.py adds coverage for the outcome classification, the hint mapping, and the log-error extraction.
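
As a rough illustration of what a hint mapping and its checks might look like, the sketch below uses a small lookup table. The substrings, the hint wording, and the name `friendly_hint` are hypothetical and are not taken from rollout.py or tests/test_rollout.py; only the failure categories come from the description above.

```python
from typing import Optional

# (substring to look for, headline shown to the user).
# Every substring and headline here is an illustrative assumption.
HINTS = [
    ("overload", "Gripper overload: release the object before stopping."),
    ("motor id 6", "Motor 6 did not respond: check its cable and power."),
    ("could not connect", "Can't connect to the arm: check USB and power."),
    ("resource busy", "Serial port is busy: close other programs using it."),
    ("resolution", "The camera does not support the requested resolution."),
    ("fps", "The camera is too slow for the requested frame rate."),
]


def friendly_hint(error: str) -> Optional[str]:
    """Return the first matching plain-language headline, or None."""
    lowered = error.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return None


def test_overload_gets_gripper_hint() -> None:
    hint = friendly_hint("RuntimeError: Overload error while disabling torque")
    assert hint is not None and hint.startswith("Gripper overload")


def test_unknown_error_has_no_hint() -> None:
    assert friendly_hint("ValueError: bad shape") is None
```

A table keeps each failure and its headline on one line, so covering a new failure would mean adding a row rather than another branch.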

🤖 Generated with Claude Code

…tdown as failed

When a rollout exits non-zero the UI just said "failed — check logs". Three small
improvements in rollout.py:

- _extract_error_from_log: pull the actual exception out of the rollout log so the
  UI can show it directly instead of sending the user digging through the HF cache.
- _friendly_hint: a plain-language, actionable headline for the common SO-101
  failures (gripper overload, unresponsive motor, can't-connect, camera too slow,
  unsupported resolution, busy serial port).
- _classify_outcome: a non-zero exit *after* the rollout main loop started, where
  the error is a torque-disable/overload on shutdown (e.g. disabling torque on a
  gripper still holding an object), is reported as `ran_with_warning` rather than
  `failed` — the skill actually ran; only cleanup tripped.

Tests cover the classification, the hint mapping, and the log-error extraction.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>