
Add a script to run PCA and clustering of our benchmarks #317

Open

fitzgen wants to merge 1 commit into bytecodealliance:main from fitzgen:pca


@fitzgen commented Jun 22, 2026


The methodology is based on ["A Workload Characterization of the SPEC CPU2017 Benchmark Suite" by Limaye and Adegbija](https://tosiron.com/papers/2018/SPEC2017_ISPASS18.pdf).

Each metric is standardized (centered to mean 0, scaled to unit variance) and PCA is run on the resulting correlation matrix so that metrics measured on different scales contribute comparably. Benchmarks are then clustered by the Euclidean distance between their principal-component scores, as in the paper.
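The PR text does not include the script itself, so here is a minimal, stdlib-only Python sketch of the standardize → PCA → distance steps, offered as an illustration rather than the script's actual code. The function names (`standardize`, `correlation_matrix`, `jacobi_eigh`, `pc_scores`, `distance_matrix`), the `num_components` cutoff, and the example data are all hypothetical. The hand-written Jacobi eigensolver stands in for a library routine such as `numpy.linalg.eigh`.

```python
import math


def standardize(rows):
    """Center each metric (column) to mean 0 and scale it to unit variance.

    Zero-variance metrics carry no information and are mapped to 0.
    """
    n, m = len(rows), len(rows[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            z[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return z


def transpose(x):
    return [list(col) for col in zip(*x)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def correlation_matrix(z):
    """For standardized data, Z^T Z / n is exactly the correlation matrix."""
    n = len(z)
    return [[v / n for v in row] for row in matmul(transpose(z), z)]


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), sorted by descending eigenvalue;
    eigenvectors[k] is the unit-length principal axis for eigenvalues[k].
    """
    m = len(a)
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation (t = tan of its angle) that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                rot = [[float(r == k) for k in range(m)] for r in range(m)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = matmul(transpose(rot), matmul(a, rot))  # a <- R^T a R
                v = matmul(v, rot)  # accumulate eigenvectors as columns of v
    order = sorted(range(m), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for r in range(m)] for i in order]


def pc_scores(z, eigvecs, num_components):
    """Project each standardized benchmark onto the leading principal components."""
    return [[sum(zj * vj for zj, vj in zip(row, vec)) for vec in eigvecs[:num_components]]
            for row in z]


def distance_matrix(points):
    """Pairwise Euclidean distances between benchmarks' PC scores."""
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical data: 5 benchmarks x 3 metrics on very different scales.
rows = [[1, 2000, 0.10], [2, 4000, 0.09], [3, 7000, 0.07], [4, 8000, 0.03], [5, 11000, 0.01]]
z = standardize(rows)
eigvals, eigvecs = jacobi_eigh(correlation_matrix(z))
scores = pc_scores(z, eigvecs, num_components=2)
dists = distance_matrix(scores)
```

If every component is kept, these distances equal plain Euclidean distances on the standardized metrics, because the principal axes are just an orthogonal rotation. The distances change only when the scores are truncated to the leading components.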

Finally, we recommend a subset of the suite. Each cluster is represented by its cheapest member (the benchmark that executes the fewest dynamic wasm instructions). Sweeping the number of clusters traces a Pareto trade-off between clustering error (SSE) and the cost of running the subset (its total dynamic instructions); the knee of that curve gives the recommended number of clusters.
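The subset-selection sweep could look roughly like the following sketch. The PR does not say which clustering algorithm or knee-finding rule the script uses, so this example makes two assumptions: Ward-linkage agglomerative clustering, which makes SSE non-increasing as the cluster count grows, and a common chord-distance knee heuristic over normalized (cost, SSE) points. The names `ward_hierarchy`, `cheapest_subset`, `sweep`, and `knee` and all of the data are hypothetical.

```python
import math


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def sse(clusters, scores):
    """Sum of squared Euclidean distances from each benchmark to its cluster centroid."""
    total = 0.0
    for members in clusters:
        c = centroid([scores[i] for i in members])
        total += sum(math.dist(scores[i], c) ** 2 for i in members)
    return total


def ward_hierarchy(scores):
    """Agglomerative clustering with Ward's criterion (an assumed choice).

    Returns {k: clusters} for k = n down to 1; each clustering is a list of
    lists of benchmark indices.
    """
    clusters = [[i] for i in range(len(scores))]
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca = centroid([scores[i] for i in clusters[a]])
                cb = centroid([scores[i] for i in clusters[b]])
                # Increase in total SSE if clusters a and b were merged.
                growth = na * nb / (na + nb) * math.dist(ca, cb) ** 2
                if best is None or growth < best[0]:
                    best = (growth, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels


def cheapest_subset(clusters, dyn_instrs):
    """Represent each cluster by its member with the fewest dynamic wasm instructions."""
    return sorted(min(members, key=lambda i: dyn_instrs[i]) for members in clusters)


def sweep(scores, dyn_instrs):
    """Return (k, sse, subset_cost, subset) for every cluster count k = 1..n."""
    results = []
    for k, clusters in sorted(ward_hierarchy(scores).items()):
        subset = cheapest_subset(clusters, dyn_instrs)
        results.append((k, sse(clusters, scores), sum(dyn_instrs[i] for i in subset), subset))
    return results


def knee(points):
    """Index of the (cost, sse) point farthest from the chord joining the first
    and last points, after normalizing both axes to [0, 1]."""
    def normalize(vals):
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]

    xs = normalize([p[0] for p in points])
    ys = normalize([p[1] for p in points])
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0) or 1.0
    dists = [abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
             for x, y in zip(xs, ys)]
    return max(range(len(points)), key=dists.__getitem__)


# Hypothetical PC scores (three well-separated groups) and instruction counts.
scores = [(0, 0), (0.1, 0), (0, 0.1), (10, 0), (10.1, 0), (10, 0.1),
          (0, 20), (0.1, 20), (0, 20.1)]
instrs = [50, 30, 40, 70, 20, 80, 60, 10, 90]
results = sweep(scores, instrs)
k, err, cost, subset = results[knee([(c, e) for _, e, c, _ in results])]
# k == 3, cost == 60, subset == [1, 4, 7]
```

Merging two clusters always drops the pricier of their two representatives. Subset cost therefore strictly grows with k when instruction counts are positive. Ward merges can only add error, so SSE never grows with k. As a result, every k on this sweep trades one axis against the other.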

Fixes #98

@fitzgen requested a review from cfallin June 22, 2026 22:57

Development

Successfully merging this pull request may close these issues.

sightglass-next: add principal component analysis (PCA)
