From 78e8ba71dd671a70b5dad4f365de07c00f06499f Mon Sep 17 00:00:00 2001
From: Hyungtae Lim <shapelim@mit.edu>
Date: Thu, 21 May 2026 15:07:43 +0900
Subject: [PATCH 1/2] docs(usage): add Official Benchmarks section to USAGE.md

Quantitative results for both methods (`--method patchwork` and
`--method patchworkpp`) under both evaluation protocols, all on KITTI
00-10 (23,201 frames, macro average). Includes the paper Table I rows
and the original-repo reference numbers side-by-side so users can
spot protocol-mismatch issues at a glance.

Docs-only change; no version bump.
---
 USAGE.md | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
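
Note on reading the F1 column: the tables this patch adds are macro averages, and the F1 figure appears to be averaged on its own rather than derived from the averaged Precision and Recall. A minimal sketch of that gap, using only the `--method patchwork` row of the `--eval_protocol patchworkpp` table; the helper name `harmonic_f1` is illustrative and not part of the repo:

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of one precision/recall pair (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Macro-averaged values copied from the `--method patchwork` row
# of the `--eval_protocol patchworkpp` table.
macro_precision, macro_recall, reported_f1 = 94.64, 97.58, 96.02

# Recompute F1 from the already-averaged precision and recall.
f1_from_averages = harmonic_f1(macro_precision, macro_recall)
gap = f1_from_averages - reported_f1

print(f"F1 recomputed from averaged P/R: {f1_from_averages:.2f}")
print(f"F1 reported in the table:        {reported_f1:.2f}")
print(f"gap:                             {gap:+.2f}")
```

The recomputed value comes out slightly above the reported one (by less than 0.1 F1), so a difference of that size between a hand-computed F1 and the table is probably an averaging-order effect rather than a regression.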

diff --git a/USAGE.md b/USAGE.md
index 33ee785..b3e8774 100644
--- a/USAGE.md
+++ b/USAGE.md
@@ -143,7 +143,59 @@ python python/examples/evaluate_semantickitti.py \
     --output_csv summary_patchwork.csv
 ```
 
-`--method patchwork` will be paper-faithful after the fixes on this branch (#89) land — until then it is ~2.3 F1 below the original Patchwork on the same protocol.
+`--method patchwork` has been paper-faithful since v1.3.0 (see #89 / #90 for the fixes).
+
+______________________________________________________________________
+
+## 4. Official benchmarks
+
+KITTI 00-10 full sweep, **23,201 frames**, macro-average across the eleven sequences. All numbers are produced by `python/examples/evaluate_semantickitti.py` on current `master` (v1.3.1) with paper-matched parameters (the script already sets `uprightness_thr=0.707` and `using_global_thr=false` for `--method patchwork`; `--method patchworkpp` uses library defaults).
+
+### `--eval_protocol patchworkpp`  (Patchwork++ paper Sec. IV.A — VEGETATION excluded)
+
+| Method                                                                | Precision | Recall    | F1        |
+| --------------------------------------------------------------------- | --------- | --------- | --------- |
+| **`--method patchwork`** (this repo, classic Patchwork)               | 94.64     | 97.58     | 96.02     |
+| **`--method patchworkpp`** (this repo, Patchwork++)                   | **95.55** | **97.17** | **96.29** |
+| Patchwork \[1\] — as reported in Patchwork++ paper Table I            | 94.23     | 97.62     | 95.88     |
+| Patchwork++ — as reported in Patchwork++ paper Table I                | 94.92     | 98.18     | 96.51     |
+| `url-kaist/patchwork` (original ROS 2) — independent reference number | 94.38     | 97.90     | 96.05     |
+
+This is the protocol you want for **reproducing the Patchwork++ paper**.
+
+### `--eval_protocol patchwork`  (original Patchwork repo — VEGETATION-low-z counts as ground)
+
+| Method                                                                | Precision | Recall    | F1        |
+| --------------------------------------------------------------------- | --------- | --------- | --------- |
+| **`--method patchwork`** (this repo, classic Patchwork)               | 92.77     | 93.66     | 93.08     |
+| **`--method patchworkpp`** (this repo, Patchwork++)                   | 93.72     | 92.34     | 92.88     |
+| Patchwork \[1\] — as reported in original Patchwork paper Table I     | 92.47     | 93.43     | 93.00     |
+| `url-kaist/patchwork` (original ROS 2) — independent reference number | 91.94     | 94.22     | 92.94     |
+
+This is the protocol you want for **apples-to-apples comparisons against the original Patchwork paper / `url-kaist/patchwork` repo**.
+
+### Reading the tables
+
+- Under the Patchwork++ paper protocol, both methods land within about 0.2 F1 of their respective paper rows (96.02 vs. 95.88 for Patchwork, 96.29 vs. 96.51 for Patchwork++), i.e. within run-to-run variance (about ±0.2 F1).
+- Under the Patchwork++ paper protocol, Patchwork++ beats Patchwork on precision and F1 (the headline claim of the paper), while Patchwork has marginally higher recall. Under the original-Patchwork protocol, Patchwork++ still leads on precision but trails on recall and, slightly, on F1.
+- Switching from the Patchwork++ paper protocol to the original-Patchwork protocol lowers both methods by ~3 F1; **never compare numbers across protocols**.
+- The numbers in these tables are what you should see on your machine. If your F1 is more than ~0.5 off, the most common cause is an evaluation-protocol mismatch (see [§1](#1-evaluation-protocols)), followed by `sensor_height` being wrong for your sensor (see [§2](#2-parameter-tuning) Step 1).
+
+### Reproducing any row
+
+```bash
+# Patchwork++, paper protocol — top-line headline number
+python python/examples/evaluate_semantickitti.py \
+    --method patchworkpp --eval_protocol patchworkpp \
+    --dataset_path /path/to/SemanticKITTI/sequences
+
+# Classic Patchwork, paper protocol — apples-to-apples vs. Patchwork++
+python python/examples/evaluate_semantickitti.py \
+    --method patchwork --eval_protocol patchworkpp \
+    --dataset_path /path/to/SemanticKITTI/sequences
+
+# Either method under the original Patchwork protocol: pass `--eval_protocol patchwork` instead of `--eval_protocol patchworkpp`
+```
 
 ______________________________________________________________________
 

From 57ebc28dd1129d67dada925cd2065586e56e1a45 Mon Sep 17 00:00:00 2001
From: Hyungtae Lim <shapelim@mit.edu>
Date: Thu, 21 May 2026 15:19:34 +0900
Subject: [PATCH 2/2] =?UTF-8?q?docs(usage):=20add=20per-sequence=20perform?=
 =?UTF-8?q?ance=20section=20(=C2=A75)=20to=20USAGE.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four tables (method × protocol) with the v1.3.1 KITTI 00-10
per-sequence Precision / Recall / F1, plus per-seq tips on which
sequences are hard and why. Lets users localise regressions to
specific sequences instead of only comparing the macro average.

Docs-only.
---
 USAGE.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)
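
Note for reviewers: the Avg rows can be checked against the per-sequence rows, and the same data can flag which sequences deviate in a local run. A minimal sketch under stated assumptions: the reference values are copied from the `--method patchwork --eval_protocol patchworkpp` table in this patch, while `macro_average`, `flag_regressions`, and the 1.0 F1 per-sequence tolerance are illustrative choices, not part of `evaluate_semantickitti.py`:

```python
from statistics import mean

# seq -> (frames, precision, recall, f1), copied from the
# `--method patchwork --eval_protocol patchworkpp` table.
REFERENCE = {
    "00": (4541, 93.61, 98.97, 96.19),
    "01": (1101, 97.47, 96.80, 97.09),
    "02": (4661, 95.26, 97.11, 96.11),
    "03": (801, 96.31, 98.24, 97.23),
    "04": (271, 98.15, 97.96, 98.04),
    "05": (2761, 90.32, 98.53, 94.19),
    "06": (1101, 97.32, 98.45, 97.88),
    "07": (1101, 91.19, 98.71, 94.76),
    "08": (4071, 95.52, 98.16, 96.79),
    "09": (1591, 95.29, 96.63, 95.87),
    "10": (1201, 90.65, 93.86, 92.04),
}


def macro_average(table):
    """Unweighted mean over sequences: every sequence counts equally."""
    columns = zip(*[row[1:] for row in table.values()])
    return tuple(round(mean(col), 2) for col in columns)


def flag_regressions(measured_f1, reference=REFERENCE, tol=1.0):
    """Return {seq: delta} where F1 is more than `tol` below the reference.

    `tol` is a hypothetical threshold chosen for this sketch.
    """
    flagged = {}
    for seq, f1 in measured_f1.items():
        delta = f1 - reference[seq][3]
        if delta < -tol:
            flagged[seq] = round(delta, 2)
    return flagged


total_frames = sum(row[0] for row in REFERENCE.values())
print(total_frames)              # 23201
print(macro_average(REFERENCE))  # (94.64, 97.58, 96.02)

# Hypothetical local run: seq 05 is close to the table, seq 10 is 3 F1 low.
print(flag_regressions({"05": 94.10, "10": 89.04}))  # {'10': -3.0}
```

Because the average is unweighted, seq 04 (271 frames) counts as much as seq 00 (4541 frames), so a per-sequence check like this is likely more informative than the Avg row alone.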

diff --git a/USAGE.md b/USAGE.md
index b3e8774..da092c0 100644
--- a/USAGE.md
+++ b/USAGE.md
@@ -199,6 +199,86 @@ python python/examples/evaluate_semantickitti.py \
 
 ______________________________________________________________________
 
+## 5. Per-sequence performance
+
+All numbers below are produced by `python/examples/evaluate_semantickitti.py` on v1.3.1 (current `master`), KITTI 00-10, paper-matched parameters. Use them to debug per-sequence regressions: if seq 05 looks fine but seq 10 is 3 F1 below the table, you have a parameter problem, not a code problem.
+
+### `--method patchworkpp --eval_protocol patchworkpp`  (headline configuration, matches Patchwork++ paper)
+
+| seq     | frames    | Precision | Recall    | F1        |
+| ------- | --------- | --------- | --------- | --------- |
+| 00      | 4541      | 94.88     | 98.47     | 96.62     |
+| 01      | 1101      | 98.43     | 96.36     | 97.34     |
+| 02      | 4661      | 95.63     | 97.18     | 96.35     |
+| 03      | 801       | 96.72     | 97.73     | 97.21     |
+| 04      | 271       | 98.20     | 96.40     | 97.25     |
+| 05      | 2761      | 92.06     | 97.87     | 94.84     |
+| 06      | 1101      | 98.01     | 97.24     | 97.61     |
+| 07      | 1101      | 92.89     | 98.45     | 95.56     |
+| 08      | 4071      | 96.29     | 97.26     | 96.74     |
+| 09      | 1591      | 96.01     | 96.25     | 96.06     |
+| 10      | 1201      | 91.93     | 95.63     | 93.63     |
+| **Avg** | **23201** | **95.55** | **97.17** | **96.29** |
+
+### `--method patchwork --eval_protocol patchworkpp`  (classic Patchwork, paper protocol)
+
+| seq     | frames    | Precision | Recall    | F1        |
+| ------- | --------- | --------- | --------- | --------- |
+| 00      | 4541      | 93.61     | 98.97     | 96.19     |
+| 01      | 1101      | 97.47     | 96.80     | 97.09     |
+| 02      | 4661      | 95.26     | 97.11     | 96.11     |
+| 03      | 801       | 96.31     | 98.24     | 97.23     |
+| 04      | 271       | 98.15     | 97.96     | 98.04     |
+| 05      | 2761      | 90.32     | 98.53     | 94.19     |
+| 06      | 1101      | 97.32     | 98.45     | 97.88     |
+| 07      | 1101      | 91.19     | 98.71     | 94.76     |
+| 08      | 4071      | 95.52     | 98.16     | 96.79     |
+| 09      | 1591      | 95.29     | 96.63     | 95.87     |
+| 10      | 1201      | 90.65     | 93.86     | 92.04     |
+| **Avg** | **23201** | **94.64** | **97.58** | **96.02** |
+
+### `--method patchworkpp --eval_protocol patchwork`  (Patchwork++, original-Patchwork protocol)
+
+| seq     | frames    | Precision | Recall    | F1        |
+| ------- | --------- | --------- | --------- | --------- |
+| 00      | 4541      | 93.93     | 93.29     | 93.53     |
+| 01      | 1101      | 97.03     | 87.33     | 91.80     |
+| 02      | 4661      | 93.40     | 93.36     | 93.29     |
+| 03      | 801       | 90.74     | 93.21     | 91.83     |
+| 04      | 271       | 97.77     | 88.93     | 93.10     |
+| 05      | 2761      | 91.38     | 94.24     | 92.76     |
+| 06      | 1101      | 97.59     | 95.73     | 96.64     |
+| 07      | 1101      | 92.12     | 96.03     | 93.99     |
+| 08      | 4071      | 94.81     | 92.21     | 93.43     |
+| 09      | 1591      | 93.56     | 91.00     | 92.13     |
+| 10      | 1201      | 88.53     | 90.36     | 89.14     |
+| **Avg** | **23201** | **93.72** | **92.34** | **92.88** |
+
+### `--method patchwork --eval_protocol patchwork`  (classic Patchwork, original-Patchwork protocol)
+
+| seq     | frames    | Precision | Recall    | F1        |
+| ------- | --------- | --------- | --------- | --------- |
+| 00      | 4541      | 92.34     | 94.64     | 93.41     |
+| 01      | 1101      | 95.84     | 89.16     | 92.27     |
+| 02      | 4661      | 93.13     | 93.87     | 93.42     |
+| 03      | 801       | 90.26     | 95.74     | 92.77     |
+| 04      | 271       | 97.44     | 91.40     | 94.29     |
+| 05      | 2761      | 89.18     | 95.54     | 92.20     |
+| 06      | 1101      | 96.72     | 97.06     | 96.88     |
+| 07      | 1101      | 90.02     | 96.80     | 93.24     |
+| 08      | 4071      | 93.71     | 93.79     | 93.69     |
+| 09      | 1591      | 92.69     | 92.46     | 92.46     |
+| 10      | 1201      | 89.10     | 89.80     | 89.25     |
+| **Avg** | **23201** | **92.77** | **93.66** | **93.08** |
+
+### Per-sequence tips
+
+- **seq 10 is the hardest** under both protocols, with seq 05 next under the Patchwork++ paper protocol (undulating roads with steep cuts and rough shoulders). On seq 05 recall stays high, but precision on both sequences sits roughly 2-5 points below the macro average. Expected.
+- **seq 01 is a highway** with very planar ground and few non-ground structures. Precision is among the highest (~96-98), and recall under the original-Patchwork protocol is the lowest of any sequence (~87-89) because the high-z VEGETATION on highway shoulders gets rejected as non-ground.
+- **seq 04** has very few frames (271), so a small absolute number of errors moves its percentages noticeably. Expect ±1 F1 noise on seq 04 alone across re-runs.
+
+______________________________________________________________________
+
 ## See also
 
 - [`python/examples/demo_visualize.py`](python/examples/demo_visualize.py) — single-frame visualisation.