diff --git a/README.md b/README.md index 0d340ee..c714227 100644 --- a/README.md +++ b/README.md @@ -5,27 +5,42 @@ [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE) ![Core dependencies](https://img.shields.io/badge/core-numpy%20%2B%20matplotlib-orange) -**Robots observe, act, fail, retry, update beliefs, and replan.** -This repo shows that loop in small, readable Python — no ROS, no GPU, no -simulator. Just `numpy + matplotlib`. - -[Open the example gallery](https://rsasaki0109.github.io/PythonInteractiveRobotics/), -[try the live playground](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html), or open a -[shareable live trace](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html?scenario=household&answer=red&compare=1&autoplay=1), -or jump straight into the first runnable loop below. You can also run the -flagship loops directly in Colab: -[pick and retry](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/pick_and_retry.ipynb), -[safety filter](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/safety_filter_cbf.ipynb), and -[human correction replanning](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/human_correction_replanning.ipynb). -For language ambiguity, try -[clarifying question](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/clarifying_question.ipynb), or run the integrated -[household task agent](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/household_task_agent.ipynb). -If the project helps you teach, prototype, or explain robotics loops, a GitHub -star helps others find it. 
+### A tiny robot failure lab -| Avoiding | Reaching under occlusion | Mapping while uncertain | -| --- | --- | --- | -| ![A point robot's naive go-to-goal velocity is projected onto a CBF safe set at every step. The policy itself never knows the obstacles exist - a separate runtime safety filter slides it around them.](docs/assets/gifs/safety_filter_cbf.gif) | ![A 2-link arm predicts a briefly occluded moving target, keeps servoing through the occlusion, and reaches the intercept point when the target reappears.](docs/assets/gifs/moving_target_reaching.gif) | ![A toy active-SLAM agent shrinks pose belief and occupancy belief at the same time, by picking moves that maximize expected entropy drop.](docs/assets/gifs/active_slam_toy.gif) | +**Robotics tutorials often assume actions succeed. This repo teaches what +happens when they do not.** + +Watch a robot miss a grasp, update its belief, and recover — in pure Python. +No ROS. No GPU. No simulator. Just `numpy + matplotlib`. + +![Side by side on the same tabletop task. Left, a naive picker locks onto its first guess and keeps grabbing the same empty spot until it gives up after eight misses. Right, a failure-aware agent looks from a better viewpoint, updates its belief about where the object is, and recovers the grasp in three tries.](docs/assets/gifs/naive_vs_failure_aware.gif) + +*Same task, same seed. Left: a naive picker that never updates its guess keeps +missing and gives up. Right: the failure-aware agent looks, updates its belief, +and recovers. 
That gap is the whole repo.* + +[▶ Run in your browser](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html?scenario=household&answer=red&compare=1&autoplay=1) + · [Start with `01_pick_and_retry.py`](#try-it) + · [Take the 10-lesson tour](lessons/README.md) + · [Browse all loops](https://rsasaki0109.github.io/PythonInteractiveRobotics/) + +The whole repo is one loop, written out small enough to read: + +```python +obs = env.reset(seed=0) +agent.reset() + +for t in range(max_steps): + action = agent.act(obs) # think + obs, reward, done, info = env.step(action) # act + agent.update(obs, reward, info) # observe failure, update belief + if done: + break # ...else replan and retry +``` + +`info["failure"]` is a first-class part of that loop — grasp misses, occlusion, +localization drift, blocked paths. The interesting behaviour is what the robot +does *after* it fails. ## Try it @@ -39,23 +54,26 @@ python3 examples/manipulation/01_pick_and_retry.py A tiny tabletop robot misses a grasp, updates its belief, and retries — in under 5 seconds. Core dependencies are `numpy` and `matplotlib` only. 
-For an even smaller first loop: - -```bash -python3 examples/runtime/01_sense_act_loop.py -``` - -## Start Here +## Three loops to start with | If you want to see | Run | What it teaches | | --- | --- | --- | | Failure recovery | `python3 examples/manipulation/01_pick_and_retry.py` | grasp miss -> belief update -> retry | -| Runtime safety | `python3 examples/navigation/29_safety_filter_cbf.py` | nominal controller -> CBF projection -> safe motion | -| Active perception | `python3 examples/navigation/07_active_slam_toy.py` | map and pose uncertainty -> information-seeking action | -| Shareable live trace | [Try live](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html?scenario=household&answer=red&compare=1&autoplay=1) | belief entropy, compare mode, and failure timeline | -| Human correction | [Open in Colab](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/human_correction_replanning.ipynb) | shortcut -> human correction -> cost update -> replan | -| Language ambiguity | [Open in Colab](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/clarifying_question.ipynb) | ambiguous command -> ask question -> answer -> act | -| Integrated household task | [Open in Colab](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/household_task_agent.ipynb) | clarify -> plan -> safety check -> retry -> human replan | +| Online replanning | `python3 examples/navigation/04_online_replanning_astar.py` | plan -> hit a hidden wall -> replan | +| Asking for help | `python3 examples/embodied_ai/35_clarifying_question.py "pick the block" --answer red` | ambiguous command -> ask -> act | + +Prefer the browser? 
Try the [live playground](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html) +(belief entropy, compare mode, failure timeline) — every run is a +[shareable live trace](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html?scenario=household&answer=red&compare=1&autoplay=1) +you can link to — or open the flagship loops in Colab: +[pick and retry](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/pick_and_retry.ipynb), +[safety filter](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/safety_filter_cbf.ipynb), +[human correction replanning](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/human_correction_replanning.ipynb), +[clarifying question](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/clarifying_question.ipynb), or the integrated +[household task agent](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/household_task_agent.ipynb). + +If the project helps you teach, prototype, or explain robotics loops, a GitHub +star helps others find it. ## Status @@ -202,24 +220,10 @@ python scripts/run_all_smoke_tests.py --gifs --check-gifs CI runs the same smoke suite and GIF checks on Python 3.10, 3.11, and 3.12. -## Core idea - -```python -obs = env.reset(seed=0) -agent.reset() - -for t in range(max_steps): - action = agent.act(obs) - obs, reward, done, info = env.step(action) - agent.update(obs, reward, info) - env.render() - - if done: - break -``` +## Inspecting a run -The goal is not photorealism. -The goal is to understand the perception-action loop. +The goal is not photorealism. It is to understand the perception-action loop +shown at the top of this README — and to see the internal state that drives it. 
Every example returns a `Trace`, so headless runs can be inspected without rendering. See `docs/trace.md` for the full trace contract. diff --git a/docs/assets/gifs/naive_vs_failure_aware.gif b/docs/assets/gifs/naive_vs_failure_aware.gif new file mode 100644 index 0000000..e7cb423 Binary files /dev/null and b/docs/assets/gifs/naive_vs_failure_aware.gif differ diff --git a/docs/pyodide_playground_strategy.md b/docs/pyodide_playground_strategy.md new file mode 100644 index 0000000..6cda93e --- /dev/null +++ b/docs/pyodide_playground_strategy.md @@ -0,0 +1,125 @@ +# Design memo: running the real Python loops in the browser (Pyodide) + +Status: **proposal / not yet implemented.** This memo scopes follow-up item ② +from the launch plan — the highest-leverage growth hook GPT Pro flagged +("install-free / readable / no heavy stack" is what lands on Hacker News). + +## Why now + +The current [`docs/playground.js`](playground.js) is a ~1200-line **JavaScript +reimplementation** of just two scenarios (`clarifying`, `household`). It does not +run any of the repo's Python. Two consequences: + +1. **Drift risk.** The JS dynamics can silently disagree with the tested Python + examples; nothing keeps them in sync. +2. **Weak headline.** "An interactive web demo" is ordinary. "**The actual + `numpy` example you'd run locally, running in your browser, no install**" is + the share-worthy version — and it's true to the repo's whole premise. + +Pyodide ([pyodide.org](https://pyodide.org)) runs CPython + NumPy in WebAssembly +in the browser, so we can execute the real example code with zero install. + +## The key architecture decision: who renders? + +The examples use `numpy` for the loop and `matplotlib` for rendering. Matplotlib +*is* available in Pyodide, but it is a large download and slow to first paint. +We do not need it in the browser, because **we already have a renderer**: the +existing playground draws scenes and belief panels from plain data. 
+ +**Recommended split — Python computes, JS draws:** + +``` +Pyodide (Python, numpy only) Browser (existing JS) + run(seed, render=False) -> Trace ──▶ trace JSON ──▶ current scene / + trace.summary() / per-step records (postMessage) belief / timeline + renderer +``` + +- The Python side runs the **real** `examples/.../*.py` loop headless + (`render=False`) and returns the `Trace` (already a first-class, tested object + — see [`docs/trace.md`](trace.md)). +- A thin serializer turns the `Trace` (obs/action/info/reward per step) into the + JSON shape the current `playground.js` renderer already consumes. +- The JS reimplementation of dynamics gets **deleted**; JS keeps only drawing. + +This keeps the browser bundle small (no matplotlib), makes the playground a true +mirror of the tested code, and removes the drift problem instead of adding to it. + +Fallback option (heavier): import `matplotlib` in Pyodide and blit Agg PNG frames +to a ``. Simpler to wire (reuses `env.render`), but a multi-MB download +and janky first paint. Keep this only as a stopgap for an example whose JSON +renderer is not ready yet. + +## Packaging + +The loops only need `pir` + `numpy`. Options, simplest first: + +1. **Load `pir` as source over the network.** `micropip` or Pyodide's + `loadPackage("numpy")` for numpy, then fetch the handful of `pir/**.py` files + (or a generated single-file bundle) and write them into Pyodide's virtual FS. + No build step; works from GitHub Pages. +2. **Ship a wheel.** `python -m build`, host `pir-0.1.0-py3-none-any.whl` under + `docs/`, `micropip.install("./pir-...whl")`. Cleaner import story; adds a + release artifact to keep current. + +Start with (1) for the first flagship, move to (2) if import wiring gets noisy. +`numpy` has a prebuilt Pyodide package, so no compilation is needed. 
+ +## The 5 flagship loops (and their render shape) + +| Example | Renderer needed | Notes | +| --- | --- | --- | +| `manipulation/01_pick_and_retry` | tabletop (continuous) | best first target; the hero story, small state | +| `navigation/04_online_replanning_astar` | grid + path | reuses grid renderer; shows replanning | +| `navigation/29_safety_filter_cbf` | continuous + obstacles | needs vector overlay (nominal vs safe u) | +| `navigation/07_active_slam_toy` | grid + belief heatmap | reuses belief panel | +| `embodied_ai/35_clarifying_question` | already in JS today | swap JS dynamics for real Python first | + +Two render families cover all five: **grid** (already drawn today) and +**tabletop/continuous** (small addition). Build those two renderers once. + +## Phased plan + +**Phase 0 — proof of concept (½ day).** A standalone `docs/pyodide_poc.html` +that loads Pyodide, installs numpy, fetches `pir` + `pick_and_retry.py`, runs +`run(seed=0, render=False)`, and `console.log`s `trace.summary()`. Goal: confirm +load time and that the real loop runs unmodified. Decide packaging (1) vs (2). + +**Phase 1 — one real loop on the page (1–2 days).** Add a "Run real Python" +toggle to the existing playground for `clarifying_question` (its renderer already +exists). Python produces the trace; JS draws it; delete the JS dynamics for that +scenario. This is the first honest "real Python in your browser" claim. + +**Phase 2 — tabletop renderer + hero loop (1–2 days).** Add the continuous +tabletop renderer and wire `pick_and_retry`. Now the README hero GIF has a +"run it yourself" twin. + +**Phase 3 — editable code cell (1–2 days).** Expose the agent's `act()` in a +small editor so visitors can tweak the retry/belief logic and re-run. This is +the "wow, I can edit the robot's brain in the browser" moment that converts to +stars. + +Ship Phase 0–1 behind the existing playground before any Hacker News launch; +Phases 2–3 can follow the launch. 
+ +## Risks / watch-list + +- **First-load latency.** Pyodide core is a few MB. Lazy-load it only when the + user clicks "Run real Python"; keep the instant JS-rendered preview as the + default first paint. Cache aggressively. +- **No silent matplotlib import.** If any example imports matplotlib at module + top level, the headless path drags it in. Keep example imports of matplotlib + lazy (inside `render`/`main`), as `01_pick_and_retry.py` already does. +- **Trace serialization is the contract.** Add a `tests/` check that the JSON + serializer covers every field the JS renderer reads, so Python and browser + cannot drift — this is the guard that makes Pyodide *reduce* drift rather than + add a new surface. +- **Keep it optional.** Pyodide is a `docs/` concern only. It must never become + a core dependency or touch the 5-second local first-run. + +## Definition of done (item ②) + +- One flagship loop runs its **unmodified** Python `run(...)` in the browser. +- The JS reimplementation of that scenario's dynamics is deleted. +- First paint stays instant (Pyodide lazy-loaded on demand). +- A test pins the trace-JSON contract shared by Python and JS. diff --git a/docs/release_notes_v0.1.0.md b/docs/release_notes_v0.1.0.md new file mode 100644 index 0000000..6b712f1 --- /dev/null +++ b/docs/release_notes_v0.1.0.md @@ -0,0 +1,106 @@ +# Release draft — v0.1.0 "Tiny Robot Failure Lab" + +> Draft for the first public GitHub Release. Paste the body below into the +> release on the `v0.1.0` tag. Keep it a story, not a changelog. Cut anything +> that reads like an internal status report. + +**Tag:** `v0.1.0`  ·  **Title:** `v0.1.0 — Tiny Robot Failure Lab` + +--- + +## Release body (copy from here) + +### Robotics tutorials usually assume actions succeed. This one is about what happens when they don't. + +Real robots miss grasps, drive into walls they couldn't see, lose track of where +they are, and misread ambiguous commands. 
PythonInteractiveRobotics is a tiny +lab for exactly that part of robotics — **observe, act, fail, update your +belief, replan, retry** — in readable Python with no ROS, no GPU, and no +simulator. Just `numpy + matplotlib`. + +**What's in this first release** + +A failure-first course of **10 short, runnable loops**, in order, with 39 +examples in total to branch into. Start here: + +```bash +git clone https://github.com/rsasaki0109/PythonInteractiveRobotics.git +cd PythonInteractiveRobotics +python3 -m pip install -e . +python3 examples/manipulation/01_pick_and_retry.py +``` + +A tabletop robot misses a grasp, updates its belief, and retries — in under 5 +seconds. + +**Try it without installing** + +- ▶ **Run in your browser:** the [live playground](https://rsasaki0109.github.io/PythonInteractiveRobotics/playground.html) + with belief entropy, compare mode, and a failure timeline. +- 📓 **Open in Colab:** [pick and retry](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/pick_and_retry.ipynb), + [safety filter](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/safety_filter_cbf.ipynb), + [household task agent](https://colab.research.google.com/github/rsasaki0109/PythonInteractiveRobotics/blob/main/notebooks/household_task_agent.ipynb). +- 🎓 **Take the tour:** the [10-lesson failure-first course](https://github.com/rsasaki0109/PythonInteractiveRobotics/blob/main/lessons/README.md). 
+ +**Highlights** + +- **Fail and retry** — grasp miss → belief update → retry (`manipulation/01`) +- **Replan around a hidden wall** — plan → see new obstacle → replan (`navigation/04`) +- **Act to learn** — toy active SLAM that moves to shrink uncertainty (`navigation/07`) +- **Stay safe at runtime** — a CBF safety filter the policy never knows about (`navigation/29`) +- **Ask before acting** — ambiguous command → clarify → act (`embodied_ai/35`) +- **Put it together** — a household agent that clarifies, plans, stays safe, + retries, and replans in one run (`embodied_ai/36`) + +Every example exposes failure through `info["failure"]` and returns an +inspectable `Trace`, so you can study a run headless without rendering. + +**Under the hood** + +- 39 runnable examples · 38 generated GIFs · 5 Colab notebooks +- 111 smoke / regression tests · CI green on Python 3.10, 3.11, and 3.12 +- Core deps: `numpy` + `matplotlib` only; optional Gymnasium-style adapters and + ROS2 / simulator bridge docs for when you outgrow the toy worlds + +**Where this sits** + +Not a replacement for ROS2, MoveIt, MuJoCo, Isaac Sim, or LeRobot — and not a +benchmark. Think of it as the missing closed-loop chapter you read *after* +algorithm textbooks like PythonRobotics and *before* a heavy stack: a small, +debuggable model of failure, belief, recovery, and replanning. + +**Contribute** + +Good first contributions are deliberately small: + +- add a new **failure mode** to an existing world +- add a **one-file lesson** in the failure-first style +- improve a **trace story / GIF** + +See [`CONTRIBUTING.md`](https://github.com/rsasaki0109/PythonInteractiveRobotics/blob/main/CONTRIBUTING.md). +If this helped you learn, teach, or prototype, a ⭐ helps others find it. 
+ +--- + +## Pre-publish checklist + +Before tagging `v0.1.0`, confirm the launch surface is ready (see also +`docs/public_launch.md`): + +- [ ] README top renders the hero GIF, the 8-line loop, and the browser link in + the first screen +- [ ] `lessons/README.md` links resolve on GitHub and all 10 commands run +- [ ] CI green on `main` for 3.10 / 3.11 / 3.12 +- [ ] GitHub repo "social preview" image set (Settings → General → Social preview) +- [ ] Playground page loads from GitHub Pages +- [ ] Tag the release: `git tag -a v0.1.0 -m "Tiny Robot Failure Lab" && git push origin v0.1.0` + +## How to cut the release + +```bash +# from a clean main with everything pushed +git tag -a v0.1.0 -m "Tiny Robot Failure Lab" +git push origin v0.1.0 +# then create the GitHub Release on that tag and paste the body above, +# or: gh release create v0.1.0 --title "v0.1.0 — Tiny Robot Failure Lab" --notes-file <(...) +``` diff --git a/lessons/README.md b/lessons/README.md new file mode 100644 index 0000000..d9c9fde --- /dev/null +++ b/lessons/README.md @@ -0,0 +1,211 @@ +# Lessons: a failure-first tour of robot loops + +Most robotics tutorials stop at `action = plan(observation)` and assume the +action works. Real robots miss grasps, hit walls they could not see, lose track +of where they are, and misread ambiguous commands. **This course is ten short +loops, in order, about what a robot does _after_ something goes wrong.** + +Each lesson is one runnable example you already have in this repo — no new +setup. Read the file top to bottom (every one fits on a screen or two), run it, +and watch the internal state change. Do them in order: each lesson assumes the +one before it. + +```bash +python3 -m pip install -e . 
+``` + +| # | Lesson | Run | The loop it teaches | +| --: | --- | --- | --- | +| 1 | The bare loop | `python3 examples/runtime/01_sense_act_loop.py` | observe → act → observe | +| 2 | Fail and retry | `python3 examples/manipulation/01_pick_and_retry.py` | grasp miss → belief update → retry | +| 3 | Replan around a hidden wall | `python3 examples/navigation/04_online_replanning_astar.py` | plan → see new obstacle → replan | +| 4 | Recover from a blocked path | `python3 examples/navigation/09_blocked_path_recovery.py` | step into a dead end → back off → replan | +| 5 | Act on a belief, not the truth | `python3 examples/navigation/06_belief_based_navigation.py` | estimate pose → decide under uncertainty | +| 6 | Act to learn (active perception) | `python3 examples/navigation/07_active_slam_toy.py` | pick moves that shrink uncertainty | +| 7 | Stay safe at runtime | `python3 examples/navigation/29_safety_filter_cbf.py` | unsafe command → safety filter → safe motion | +| 8 | Know when you don't know | `python3 examples/manipulation/30_conformal_ask_for_help.py` | ambiguous → ask for help instead of guessing | +| 9 | Ask before acting | `python3 examples/embodied_ai/35_clarifying_question.py "pick the block" --answer red` | ambiguous command → clarify → act | +| 10 | Put it all together | `python3 examples/embodied_ai/36_household_task_agent.py "put the block away" --answer red` | clarify → plan → stay safe → retry → replan | + +Most examples accept `--no-render` for a fast headless run, and every example +returns a `Trace` you can inspect (see [`docs/trace.md`](../docs/trace.md)). + +--- + +## Lesson 1 — The bare loop + +**Run:** `python3 examples/runtime/01_sense_act_loop.py` +([source](../examples/runtime/01_sense_act_loop.py)) + +The smallest closed-loop example: a robot observes a noisy state, acts, and +observes again. Nothing fails yet — this is the skeleton every later lesson +hangs on. + +- **What it observes:** a noisy estimate of its own state, every step. 
+- **What changes:** nothing surprising — that's the point. Establish the + `observe → act → observe` rhythm before failure enters. +- **Concept:** the perception–action loop is the unit of robotics, not a single + planning call. + +→ Next: a loop where the action can fail. + +## Lesson 2 — Fail and retry + +**Run:** `python3 examples/manipulation/01_pick_and_retry.py` +([source](../examples/manipulation/01_pick_and_retry.py)) + +A tabletop robot picks an object, **misses**, updates its belief about where the +object really is, and retries with a different grasp. This is the thesis of the +whole repo in one file. + +- **What fails:** the grasp, because the perceived object pose is biased. +- **What it observes:** `info["failure"]` reports the miss; each attempt yields a + fresh, noisy detection. +- **How belief changes:** a running estimate of the object pose is updated after + every miss, so the next attempt aims somewhere new. +- **Concept:** failure is data. The interesting behaviour is the belief update + between attempts. + +→ Next: the same idea, but the failure is a wall instead of a grasp. + +## Lesson 3 — Replan around a hidden wall + +**Run:** `python3 examples/navigation/04_online_replanning_astar.py` +([source](../examples/navigation/04_online_replanning_astar.py)) + +The robot plans a path with A* through a map it only partially knows, drives +into a previously unknown wall, and **replans** from what it just learned. + +- **What fails:** the current plan, when an obstacle is observed that the map did + not contain. +- **What it observes:** new occupied cells as it moves. +- **How belief changes:** the occupancy map is updated, then A* is re-run on the + new map. +- **Concept:** a plan is a hypothesis. Replanning is how a robot survives an + imperfect map. + +→ Next: what to do when replanning alone isn't enough and you're stuck. 
+ +## Lesson 4 — Recover from a blocked path + +**Run:** `python3 examples/navigation/09_blocked_path_recovery.py` +([source](../examples/navigation/09_blocked_path_recovery.py)) + +The robot detects that the path ahead is blocked, **backs off**, marks the +blocked cell, and replans around it — recovery, not just replanning in place. + +- **What fails:** forward progress; the robot would otherwise be wedged. +- **What it observes:** a blocked cell directly ahead. +- **How belief changes:** the blocked cell is committed to the map so the + replan does not re-suggest it. +- **Concept:** recovery is an explicit behaviour — detect, undo, record, retry. + +→ Next: stop assuming you even know where you are. + +## Lesson 5 — Act on a belief, not the truth + +**Run:** `python3 examples/navigation/06_belief_based_navigation.py` +([source](../examples/navigation/06_belief_based_navigation.py)) + +The robot no longer has direct access to its true pose. It maintains a belief +(a heatmap), estimates where it probably is, and navigates from that estimate. + +- **What it observes:** noisy, partial cues about its position. +- **How belief changes:** a belief distribution over poses is updated each step; + the chosen action depends on the estimate, not the ground truth. +- **Concept:** under partial observability you act on a belief. The true state is + shown only so you can see how wrong the belief sometimes is. + +→ Next: choose actions specifically to make that belief sharper. + +## Lesson 6 — Act to learn (active perception) + +**Run:** `python3 examples/navigation/07_active_slam_toy.py` +([source](../examples/navigation/07_active_slam_toy.py)) + +A toy active-SLAM agent picks moves that **reduce** both pose and map +uncertainty — it acts in order to perceive better, not just to reach a goal. + +- **What it observes:** measurements whose value depends on where it chooses to go. 
+- **How belief changes:** moves are scored by expected uncertainty (entropy) + reduction, so the agent seeks informative actions. +- **Concept:** perception is something you can plan for. Information is a reward. + +→ Next: a hard runtime guarantee layered on top of any policy. + +## Lesson 7 — Stay safe at runtime + +**Run:** `python3 examples/navigation/29_safety_filter_cbf.py` +([source](../examples/navigation/29_safety_filter_cbf.py)) + +A naive go-to-goal controller produces velocities that would hit obstacles. A +**control-barrier-function** safety filter projects each command onto a safe +set — the policy never even knows the obstacles exist. + +- **What fails (and is prevented):** collisions the nominal controller would + cause. +- **What it observes:** obstacle geometry, used by the filter, not the policy. +- **How behaviour changes:** every command is minimally adjusted to stay safe. +- **Concept:** safety can be a separate runtime layer, independent of how smart + the policy is. + +→ Next: instead of guarding a confident policy, handle an unconfident one. + +## Lesson 8 — Know when you don't know + +**Run:** `python3 examples/manipulation/30_conformal_ask_for_help.py` +([source](../examples/manipulation/30_conformal_ask_for_help.py)) + +A sorter calibrates a **conformal prediction set** offline. At runtime it acts +only when the set is a single confident label, and **asks a toy oracle for help** +when the set is ambiguous. + +- **What fails (and is avoided):** confident-but-wrong actions on ambiguous input. +- **What it observes:** a prediction set whose size reflects uncertainty. +- **How behaviour changes:** singleton → act; ambiguous → ask. +- **Concept:** calibrated uncertainty turns "I'm not sure" into a concrete + request for help. + +→ Next: the help request becomes a natural-language question. 
+ +## Lesson 9 — Ask before acting + +**Run:** `python3 examples/embodied_ai/35_clarifying_question.py "pick the block" --answer red` +([source](../examples/embodied_ai/35_clarifying_question.py)) + +Given the ambiguous command *"pick the block"* with several blocks present, the +robot **asks which one**, takes the answer, resolves the goal, and acts. + +- **What fails (and is avoided):** acting on an under-specified command. +- **What it observes:** multiple candidate objects matching the command. +- **How behaviour changes:** detect ambiguity → ask → bind the answer → act. +- **Concept:** language is just another noisy observation; clarification is a + belief update driven by a human. + +→ Next: combine clarify, plan, safety, retry, and replan in one agent. + +## Lesson 10 — Put it all together + +**Run:** `python3 examples/embodied_ai/36_household_task_agent.py "put the block away" --answer red` +([source](../examples/embodied_ai/36_household_task_agent.py)) + +The capstone. A household robot **clarifies** an ambiguous command, **plans** +through a room, **rejects an unsafe step**, **retries** a missed grasp, accepts a +**human correction** and **replans**, then stores the block. + +- **What fails:** ambiguity, an unsafe floor step, and a grasp miss — all in one + run. +- **How behaviour changes:** every earlier lesson shows up as one stage of this + pipeline. +- **Concept:** "embodied intelligence" here is not one big model — it is these + small failure-aware loops, composed. + +→ Done. From here, branch into the full +[example index](../examples/README.md) or the deeper +[learning paths](../docs/learning_paths.md). + +--- + +Found a failure mode these lessons don't cover yet? That's a great +contribution — see [`CONTRIBUTING.md`](../CONTRIBUTING.md). If this course helped +you learn or teach, a GitHub star helps other people find it. 
diff --git a/scripts/make_hero_compare_gif.py b/scripts/make_hero_compare_gif.py new file mode 100644 index 0000000..af15f93 --- /dev/null +++ b/scripts/make_hero_compare_gif.py @@ -0,0 +1,266 @@ +"""Generate the README hero GIF: a naive picker vs a failure-aware picker. + +Left panel: a naive agent that locks onto the first detection it sees, never + moves the camera (so it stays behind the occluder), never keeps a + belief, and never adapts its retry. It keeps missing. +Right panel: the repo's PickAndRetryAgent, which looks from better viewpoints, + averages noisy detections into a belief, and retries differently + after each miss. It recovers and succeeds. + +Both panels run the same Tabletop2D world with the same seed, so the contrast is +the policy, not luck. This is a curated marketing asset, kept separate from the +per-example pipeline in scripts/make_gifs.py. + +Usage: + python3 scripts/make_hero_compare_gif.py + python3 scripts/make_hero_compare_gif.py --search # scan seeds for a clean story +""" + +from __future__ import annotations + +import argparse +import importlib.util +import sys +from pathlib import Path +from types import ModuleType +from typing import Any + +import numpy as np + +ROOT = Path(__file__).resolve().parents[1] +if str(ROOT) not in sys.path: + sys.path.insert(0, str(ROOT)) + +import imageio.v2 as imageio +import matplotlib + +matplotlib.use("Agg") + +import matplotlib.pyplot as plt +from matplotlib.patches import Circle, Rectangle + +from pir.core.types import Failure +from pir.worlds.tabletop_2d import Tabletop2D + +OUT_DIR = ROOT / "docs" / "assets" / "gifs" +GIF_NAME = "naive_vs_failure_aware.gif" + + +def load_example(relative_path: str) -> ModuleType: + path = ROOT / relative_path + spec = importlib.util.spec_from_file_location(path.stem, path) + if spec is None or spec.loader is None: + raise RuntimeError(f"could not load {path}") + module = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = module + 
spec.loader.exec_module(module) + return module + + +class NaiveAgent: + """The "no belief update" baseline. + + It locks onto the very first detection it ever sees and keeps grabbing at + that same stale point forever. It never re-observes, never averages, never + moves the camera, and never adapts after a miss. The contrast with the + failure-aware agent is therefore structural (update vs. don't update), not + a matter of luck on a particular seed. + """ + + def __init__(self) -> None: + self.reset() + + def reset(self) -> None: + # Keep a belief_mean attribute so the renderer can stay uniform, but the + # naive agent never maintains a real belief. + self.belief_mean: np.ndarray | None = None + self.belief_radius = 0.0 + self._locked_target: np.ndarray | None = None + + def act(self, obs: dict[str, Any]) -> dict[str, Any]: + if self._locked_target is None: + detections = obs.get("detections", []) + if detections: + self._locked_target = np.asarray(detections[0]["position"], dtype=float) + else: + # Nothing seen yet: commit to a blind guess and never revise it. + self._locked_target = np.array([0.5, 0.5], dtype=float) + return {"type": "pick", "position": np.clip(self._locked_target, 0.0, 1.0)} + + def update(self, obs: dict[str, Any], reward: float, info: dict[str, Any]) -> None: + # The whole point: the naive agent learns nothing from a miss. 
+        return None
+
+
+def fig_to_frame(fig: plt.Figure) -> np.ndarray:
+    fig.canvas.draw()
+    width, height = fig.canvas.get_width_height()
+    buffer = np.frombuffer(fig.canvas.buffer_rgba(), dtype=np.uint8)
+    return buffer.reshape((height, width, 4))[:, :, :3].copy()
+
+
+def run_episode(agent_factory, seed: int, max_steps: int) -> list[dict[str, Any]]:
+    """Run one episode and return a per-frame record of (env snapshot, info)."""
+    env = Tabletop2D(seed=seed)
+    agent = agent_factory()
+    obs = env.reset(seed=seed)
+    agent.reset()
+
+    records: list[dict[str, Any]] = [{"env": env, "agent": agent, "info": {}}]
+    for _ in range(max_steps):
+        action = agent.act(obs)
+        result = env.step(action)
+        obs, reward, done, info = result.as_tuple()
+        agent.update(obs, reward, info)
+        # Snapshot the mutable state we need for rendering this frame.
+        records.append(
+            {
+                "camera": env.camera_pos.copy(),
+                "last_detection": None if env.last_detection is None else env.last_detection.copy(),
+                "picked": env.obj.picked,
+                "attempts": env.attempts,
+                "belief_mean": None if getattr(agent, "belief_mean", None) is None else np.asarray(agent.belief_mean).copy(),
+                "belief_radius": float(getattr(agent, "belief_radius", 0.0)),
+                "info": info,
+            }
+        )
+        if done:
+            break
+    return records
+
+
+def episode_outcome(records: list[dict[str, Any]]) -> tuple[bool, int]:
+    picked = any(r.get("picked") for r in records[1:])
+    attempts = max((r.get("attempts", 0) for r in records[1:]), default=0)
+    return picked, attempts
+
+
+def build_frames(seed: int, max_steps: int) -> list[np.ndarray]:
+    naive_module_agent = NaiveAgent
+    pick_module = load_example("examples/manipulation/01_pick_and_retry.py")
+
+    naive = run_episode(lambda: naive_module_agent(), seed=seed, max_steps=max_steps)
+    smart = run_episode(lambda: pick_module.PickAndRetryAgent(), seed=seed, max_steps=max_steps)
+
+    n = max(len(naive), len(smart))
+
+    frames: list[np.ndarray] = []
+    for i in range(n):
+        ln = naive[min(i, len(naive) - 1)]
+        ls = smart[min(i, len(smart) - 1)]
+
+        fig, (axl, axr) = plt.subplots(1, 2, figsize=(8.6, 4.5), dpi=90)
+        fig.suptitle(
+            "Most robotics tutorials assume the grasp works. Real robots miss — and recover.",
+            fontsize=11,
+        )
+
+        for ax, rec, title, color in (
+            (axl, ln, "Naive: grab at what you see", "tab:red"),
+            (axr, ls, "Failure-aware: look → update belief → retry", "tab:green"),
+        ):
+            _render_record(ax, rec, title, color)
+
+        fig.tight_layout(rect=(0, 0, 1, 0.94))
+        frames.append(fig_to_frame(fig))
+        plt.close(fig)
+
+    # Hold the final frame so the outcome reads clearly.
+    frames.extend([frames[-1]] * 4)
+    return frames
+
+
+# A lightweight stand-in Tabletop2D geometry shared by both panels.
+_OCCLUDER = np.array([0.43, 0.42, 0.57, 0.68], dtype=float)
+_OBJ_POS = np.array([0.64, 0.54], dtype=float)
+_OBJ_RADIUS = 0.045
+
+
+def _render_record(ax: plt.Axes, rec: dict[str, Any], title: str, color: str) -> None:
+    ax.set_title(title, fontsize=10.5, color=color, fontweight="bold")
+    ax.set_xlim(0.0, 1.0)
+    ax.set_ylim(0.0, 1.0)
+    ax.set_aspect("equal", adjustable="box")
+    ax.grid(True, alpha=0.25)
+    ax.tick_params(labelsize=7)
+
+    xmin, ymin, xmax, ymax = _OCCLUDER
+    ax.add_patch(Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, color="0.2", alpha=0.18))
+
+    camera = rec.get("camera", np.array([0.16, 0.50]))
+    ax.plot(*camera, marker="s", color="tab:blue", markersize=9)
+
+    picked = rec.get("picked", False)
+    if not picked:
+        ax.add_patch(Circle(_OBJ_POS, _OBJ_RADIUS, color="tab:red", alpha=0.85))
+
+    last_detection = rec.get("last_detection")
+    if last_detection is not None and not picked:
+        ax.plot(*last_detection, marker="x", markersize=9, color="tab:orange")
+
+    belief_mean = rec.get("belief_mean")
+    if belief_mean is not None:
+        ax.add_patch(
+            Circle(
+                belief_mean,
+                rec.get("belief_radius", 0.08),
+                fill=False,
+                linestyle="--",
+                color="tab:green",
+                linewidth=2,
+            )
+        )
+
+    info = rec.get("info", {})
+    if "pick_position" in info:
+        ax.plot(*info["pick_position"], marker="+", markersize=14, color="black")
+
+    status = f"attempts={rec.get('attempts', 0)}"
+    if picked:
+        status += " PICKED ✓"
+    elif isinstance(info.get("failure"), Failure):
+        status += f" {info['failure'].kind}"
+    ax.text(
+        0.02,
+        0.97,
+        status,
+        transform=ax.transAxes,
+        va="top",
+        fontsize=9,
+        bbox=dict(boxstyle="round", facecolor="white", edgecolor="0.7", alpha=0.85),
+    )
+
+
+def search_seeds(max_steps: int, limit: int = 60) -> None:
+    pick_module = load_example("examples/manipulation/01_pick_and_retry.py")
+    print("seed naive(picked,attempts) smart(picked,attempts)")
+    for seed in range(limit):
+        naive = run_episode(lambda: NaiveAgent(), seed=seed, max_steps=max_steps)
+        smart = run_episode(lambda: pick_module.PickAndRetryAgent(), seed=seed, max_steps=max_steps)
+        np_ = episode_outcome(naive)
+        sp_ = episode_outcome(smart)
+        flag = " <== clean story" if (not np_[0] and sp_[0]) else ""
+        print(f"{seed:>4} {np_} {sp_}{flag}")
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--seed", type=int, default=3)
+    parser.add_argument("--max-steps", type=int, default=10)
+    parser.add_argument("--fps", type=int, default=2)
+    parser.add_argument("--search", action="store_true", help="scan seeds and exit")
+    args = parser.parse_args()
+
+    if args.search:
+        search_seeds(args.max_steps)
+        return
+
+    frames = build_frames(args.seed, args.max_steps)
+    OUT_DIR.mkdir(parents=True, exist_ok=True)
+    out = OUT_DIR / GIF_NAME
+    imageio.mimsave(out, frames, duration=1.0 / args.fps, loop=0)
+    print(f"wrote {out} ({len(frames)} frames)")
+
+
+if __name__ == "__main__":
+    main()
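The structural contrast this script animates (lock onto the first detection vs. average detections into a belief) can be tried without the repo. The following is a minimal sketch under stated assumptions: detections are modelled as the true position plus Gaussian noise, and `NOISE_STD`, `noisy_detections`, `locked_targets`, `running_mean_targets`, and `pick_errors` are hypothetical names that do not exist in `pir` or `Tabletop2D`. It covers only the averaging part of `PickAndRetryAgent`, not its viewpoint changes or retry strategy.

```python
import numpy as np

# Object position, borrowed from _OBJ_POS in the script above.
TRUE_POS = np.array([0.64, 0.54])
# Assumed detection noise; the real world model may differ.
NOISE_STD = 0.08


def noisy_detections(n: int, seed: int) -> np.ndarray:
    """Return n noisy 2D detections of TRUE_POS, shape (n, 2)."""
    rng = np.random.default_rng(seed)
    return TRUE_POS + rng.normal(0.0, NOISE_STD, size=(n, 2))


def locked_targets(detections: np.ndarray) -> np.ndarray:
    """Naive policy: every pick aims at the very first detection."""
    return np.repeat(detections[:1], len(detections), axis=0)


def running_mean_targets(detections: np.ndarray) -> np.ndarray:
    """Belief-updating policy: pick k aims at the mean of the first k detections."""
    counts = np.arange(1, len(detections) + 1)[:, None]
    return np.cumsum(detections, axis=0) / counts


def pick_errors(targets: np.ndarray) -> np.ndarray:
    """Distance from each pick target to the true object position."""
    return np.linalg.norm(targets - TRUE_POS, axis=1)


if __name__ == "__main__":
    detections = noisy_detections(8, seed=3)
    print("locked  :", np.round(pick_errors(locked_targets(detections)), 3))
    print("averaged:", np.round(pick_errors(running_mean_targets(detections)), 3))
```

The locked policy repeats the error of its first detection on every attempt, while the averaged target tends toward the true position as more detections arrive. On any single seed the first detection can still happen to be the better one, which is why the script offers `--search` to scan seeds.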