Add a script to run PCA and clustering of our benchmarks by fitzgen · Pull Request #317 · bytecodealliance/sightglass

fitzgen · 2026-06-22T22:53:54Z

The methodology is based on ["A Workload Characterization of the SPEC CPU2017 Benchmark Suite" by Limaye and Adegbija](https://tosiron.com/papers/2018/SPEC2017_ISPASS18.pdf).

Each metric is standardized (centered to mean 0, scaled to unit variance) and PCA is run on the resulting correlation matrix so that metrics measured on different scales contribute comparably. Benchmarks are then clustered by the Euclidean distance between their principal-component scores, as in the paper.
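To make the standardize-then-PCA step concrete, here is a minimal standard-library sketch. It is not the script added by this PR: the metric values, the function names, and the use of power iteration with deflation to get the eigenvectors are all assumptions made for illustration.

```python
import math

def standardize(rows):
    """Center each metric (column) to mean 0 and scale it to unit population variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    # A constant metric carries no information; map it to 0 rather than divide by 0.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]

def correlation_matrix(z):
    """Covariance of standardized data, which equals the correlation matrix."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(corr, k, iters=1000):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    d = len(corr)
    m = [row[:] for row in corr]
    pairs = []
    for _component in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract in this direction
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next one can emerge.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

def project(z, pairs):
    """Principal-component scores: one row per benchmark, one column per component."""
    return [[sum(a * b for a, b in zip(row, v)) for _lam, v in pairs] for row in z]

# Hypothetical metrics on very different scales (rows: benchmarks, columns: metrics).
metrics = [
    [100, 10, 0.9],
    [200, 22, 0.1],
    [300, 29, 0.5],
    [400, 41, 0.8],
    [500, 48, 0.2],
]
z = standardize(metrics)
corr = correlation_matrix(z)
pairs = principal_components(corr, k=2)
scores = project(z, pairs)
# Benchmarks are compared by Euclidean distance between their scores.
distance_ab = math.dist(scores[0], scores[1])
```

Each eigenvalue divided by the number of metrics is the fraction of total variance that component explains, which is one way to decide how many components to keep before clustering.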

Finally, we recommend a subset of the suite. Each cluster is represented by its cheapest member (the benchmark that executes the fewest dynamic wasm instructions). Sweeping the number of clusters traces a trade-off curve between clustering error (SSE, the sum of squared errors) and the cost of running the subset (its total dynamic instructions); the knee of that curve gives the recommended number of clusters.
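The subset selection can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the clustering algorithm (agglomerative with complete linkage), the knee heuristic (the point closest to the ideal zero-error, zero-cost corner after min-max normalization), and all names and data are hypothetical.

```python
import math

def agglomerate(points, k):
    """Repeatedly merge the two closest clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is still valid
    return clusters

def sse(points, clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    dim = len(points[0])
    total = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum(math.dist(points[i], centroid) ** 2 for i in members)
    return total

def sweep(names, points, costs):
    """For every cluster count k, pick the cheapest member of each cluster."""
    results = []
    for k in range(1, len(points) + 1):
        clusters = agglomerate(points, k)
        reps = [min(members, key=lambda i: costs[i]) for members in clusters]
        results.append({
            "k": k,
            "sse": sse(points, clusters),
            "cost": sum(costs[i] for i in reps),
            "subset": sorted(names[i] for i in reps),
        })
    return results

def knee(results):
    """Assumed heuristic: the point nearest (0, 0) after min-max normalizing both axes."""
    def normalize(vals):
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]
    err = normalize([r["sse"] for r in results])
    cost = normalize([r["cost"] for r in results])
    best = min(range(len(results)), key=lambda i: math.hypot(err[i], cost[i]))
    return results[best]

# Hypothetical principal-component scores and dynamic instruction counts.
names = ["a", "b", "c", "d"]
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
costs = [10, 20, 30, 5]
results = sweep(names, points, costs)
recommended = knee(results)
```

With this toy data the two tight pairs collapse into two clusters, so the sketch recommends one cheap benchmark from each pair. Other knee definitions, such as the maximum distance from the chord joining the curve's endpoints, can pick a different point on less clear-cut curves.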

Fixes #98


fitzgen requested a review from cfallin June 22, 2026 22:57

fitzgen wants to merge 1 commit into bytecodealliance:main from fitzgen:pca
