fix(main): keep node spin resilient to unrecognised message types by dakejahl · Pull Request #133 · dronecan/gui_tool

dakejahl · 2026-06-09T22:14:23Z

Summary

The GUI tool's local node is terminated a few seconds after it starts receiving a DroneCAN message whose data type ID isn't in the loaded DSDL set, and in the lead-up the node list flaps and the UI stalls. That makes it impossible to prototype a new message on a live bus without first building its DSDL into the tool. This PR makes the local node tolerate unrecognised transfers instead of falling over on them.

Problem

An unrecognised data type ID makes Transfer.from_frames() raise TransferError, and two things go wrong. First, _spin_node counts every such error toward the 1000-strike successive-error guard (and logs a full traceback on each 10 ms spin), so a continuously broadcast unknown message terminates the local node within seconds. Second, and more subtly, Node.spin() drains the RX queue and then runs its scheduler, but the exception aborts the drain and skips the scheduler poll. Node-monitor liveness is scheduler-driven (the periodic stale-sweep and the outstanding-request timeouts). At high message rates nearly every drain hits an unknown transfer, so the scheduler is starved: nodes flap in and out of the monitor and the UI stalls.
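
To make both failures concrete, here is a minimal, self-contained model of the spin loop as described above. It is not gui_tool or pydronecan code: `ToyNode`, `run_spin_loop`, the stand-in `TransferError`, and the data type IDs are hypothetical, and the strike limit is lowered from 1000 to 10 so the effect shows quickly. One unknown frame per spin is enough to abort every drain, so the scheduler is never polled and the successive-error guard trips.

```python
from collections import deque


class TransferError(Exception):
    """Hypothetical stand-in for the error raised on an unknown data type ID."""


class ToyNode:
    """Hypothetical model of a node whose spin() drains RX, then polls its scheduler."""

    def __init__(self, known_type_ids):
        self.known_type_ids = set(known_type_ids)
        self.rx_queue = deque()
        self.delivered = []
        self.scheduler_polls = 0

    def _recv_frame(self, type_id):
        # Models Transfer.from_frames() raising on an unrecognised data type ID.
        if type_id not in self.known_type_ids:
            raise TransferError("unknown data type ID %d" % type_id)
        self.delivered.append(type_id)

    def spin(self):
        # Drain everything queued, then run scheduled work (stale-sweeps, timeouts).
        # An exception from _recv_frame skips both the rest of the drain and the poll.
        while self.rx_queue:
            self._recv_frame(self.rx_queue.popleft())
        self.scheduler_polls += 1


def run_spin_loop(node, frames_per_spin, spins, max_successive_errors):
    """Models the spin loop's successive-error guard; returns spins completed."""
    successive_errors = 0
    for i in range(spins):
        node.rx_queue.extend(frames_per_spin)
        try:
            node.spin()
            successive_errors = 0
        except TransferError:
            # Every undecodable transfer counts as a strike (the real loop also
            # logs a full traceback at this point).
            successive_errors += 1
            if successive_errors >= max_successive_errors:
                return i + 1  # guard trips: the node would be terminated here
    return spins


node = ToyNode(known_type_ids={341})
print(run_spin_loop(node, [341, 9999, 341], spins=50, max_successive_errors=10))  # prints 10
print(node.scheduler_polls)  # prints 0
```

With clean traffic only (no unknown ID in `frames_per_spin`), the same loop completes all 50 spins and polls the scheduler on each one.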

Solution

This PR wraps Node._recv_frame so that an undecodable transfer is reported and dropped in place, letting spin() finish draining the queue and run its scheduler exactly as it does for clean traffic. The catch in _spin_node is kept as a backstop but no longer counts these benign per-transfer errors toward the fatal threshold. Logging is throttled to one message per 10 s for each distinct error, with a count of the repeats suppressed in between, so a high-rate unknown message can't flood the log. Raw frames remain visible in the bus monitor, so unknown traffic is still observable while prototyping.
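
A minimal sketch of the three parts of the fix, assuming a toy node model rather than the real code: `ToyNode`, `_decode`, `ThrottledErrorLog`, `run_spin_loop`, and the stand-in `TransferError` are hypothetical names, not the code in this PR or the pydronecan API. `_recv_frame` catches the decode error and drops the transfer, so `spin()` always reaches its scheduler poll; the loop's `except TransferError` is only a backstop and never counts as a strike; and identical errors are logged at most once per 10 s together with a count of suppressed repeats.

```python
import logging
import time
from collections import Counter, deque

logger = logging.getLogger("toy_node")


class TransferError(Exception):
    """Hypothetical stand-in for the error raised on an unknown data type ID."""


class ThrottledErrorLog:
    """Emits each distinct message at most once per `interval` seconds and
    reports how many repeats were suppressed since the last emission."""

    def __init__(self, interval=10.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_emitted = {}       # message -> time it was last logged
        self._suppressed = Counter()  # message -> repeats swallowed since then

    def report(self, message):
        now = self.clock()
        last = self._last_emitted.get(message)
        if last is not None and now - last < self.interval:
            self._suppressed[message] += 1
            return False
        count = self._suppressed.pop(message, 0)
        suffix = " (%d repeats suppressed)" % count if count else ""
        logger.warning("Dropped undecodable transfer: %s%s", message, suffix)
        self._last_emitted[message] = now
        return True


class ToyNode:
    """Hypothetical node whose _recv_frame drops undecodable transfers in place."""

    def __init__(self, known_type_ids, error_log):
        self.known_type_ids = set(known_type_ids)
        self.error_log = error_log
        self.rx_queue = deque()
        self.delivered = []
        self.scheduler_polls = 0

    def _decode(self, type_id):
        # Models Transfer.from_frames() raising on an unrecognised data type ID.
        if type_id not in self.known_type_ids:
            raise TransferError("unknown data type ID %d" % type_id)
        self.delivered.append(type_id)

    def _recv_frame(self, type_id):
        # The wrap: report and drop the bad transfer so spin() keeps draining.
        try:
            self._decode(type_id)
        except TransferError as ex:
            self.error_log.report(str(ex))

    def spin(self):
        while self.rx_queue:
            self._recv_frame(self.rx_queue.popleft())
        self.scheduler_polls += 1  # now reached on every spin


def run_spin_loop(node, frames_per_spin, spins, max_successive_errors):
    """Spin loop whose backstop does not count TransferError as a strike."""
    successive_errors = 0
    for _ in range(spins):
        node.rx_queue.extend(frames_per_spin)
        try:
            node.spin()
            successive_errors = 0
        except TransferError as ex:
            # Backstop only: benign per-transfer errors are reported (throttled)
            # but neither counted toward nor reset the fatal threshold.
            node.error_log.report(str(ex))
        except Exception:
            successive_errors += 1
            if successive_errors >= max_successive_errors:
                raise  # other faults still hit the fatal threshold
    return spins


log = ThrottledErrorLog(interval=10.0)
node = ToyNode({341}, log)
print(run_spin_loop(node, [341, 9999, 341], spins=50, max_successive_errors=10))  # prints 50
print(node.scheduler_polls)  # prints 50
```

Keying the throttle on the message text means two different unknown IDs are each logged once per interval, matching "per distinct error"; keying on the exception type plus data type ID would be an equally reasonable choice.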

dakejahl wants to merge 1 commit into dronecan:master from dakejahl:fix/graceful-unknown-message-handling
