From 7d5251398a281469937d558c3eed854df2e38f2a Mon Sep 17 00:00:00 2001
From: Joana Maia <jmaia@contractor.linuxfoundation.org>
Date: Fri, 29 May 2026 17:02:49 +0100
Subject: [PATCH 1/5] docs: add criticality score proposal

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
---
 .../adr/0001-oss-packages-design-decisions.md | 162 ++++++++++++++++--
 1 file changed, 145 insertions(+), 17 deletions(-)

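The §Criticality scoring methodology section in the diff below depends on a per-ecosystem `pct_rank`. The TypeScript that follows is a minimal in-memory illustration, not the SQL inside `rank_packages_universe()`. It assumes `pct_rank` follows PostgreSQL's `PERCENT_RANK()` window semantics, and every function name in it is hypothetical. It reproduces the section's min-max vs percentile-rank downloads example and shows that the `LN(1 + x)` step leaves percentiles unchanged.

```typescript
// Minimal sketch. Assumes `pct_rank` follows PostgreSQL PERCENT_RANK():
// (rank - 1) / (rows in partition - 1), ties share the lowest rank, and a
// one-row partition yields 0. All names are hypothetical.

// Index of the first element of `sorted` that is >= target.
function lowerBound(sorted: number[], target: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function percentRank(values: number[]): number[] {
  const n = values.length;
  if (n <= 1) return values.map(() => 0);
  const sorted = values.slice().sort((a, b) => a - b);
  // rank - 1 is the number of values strictly smaller than v.
  return values.map((v) => lowerBound(sorted, v) / (n - 1));
}

// Min-max scaling, included only to compare against percentile-rank.
function minMaxScale(values: number[]): number[] {
  const min = values.reduce((a, b) => Math.min(a, b), Infinity);
  const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
  return values.map((v) => (max === min ? 0 : (v - min) / (max - min)));
}

interface SignalRow {
  ecosystem: string;
  value: number;
}

// In-memory analogue of PERCENT_RANK() OVER (PARTITION BY ecosystem ORDER BY value).
function percentRankByEcosystem(rows: SignalRow[]): number[] {
  const groups = new Map<string, number[]>(); // ecosystem -> row indexes
  rows.forEach((row, i) => {
    const indexes = groups.get(row.ecosystem);
    if (indexes) indexes.push(i);
    else groups.set(row.ecosystem, [i]);
  });
  const result: number[] = new Array(rows.length).fill(0);
  groups.forEach((indexes) => {
    const ranks = percentRank(indexes.map((i) => rows[i].value));
    indexes.forEach((rowIndex, k) => {
      result[rowIndex] = ranks[k];
    });
  });
  return result;
}

// Downloads example from the scoring section.
const downloads = [10, 100, 1000, 10000, 1e9];
const logged = downloads.map((d) => Math.log(1 + d)); // LN(1 + x): ~[2.4, 4.6, 6.9, 9.2, 20.7]
const round2 = (xs: number[]): number[] => xs.map((x) => Math.round(x * 100) / 100);
const minMax = round2(minMaxScale(logged)); // [0, 0.12, 0.25, 0.37, 1]
const pct = percentRank(logged); // [0, 0.25, 0.5, 0.75, 1]
const pctRaw = percentRank(downloads); // same as pct: LN(1 + x) preserves order
```

`percentRankByEcosystem` mirrors `PARTITION BY ecosystem`: rows are ranked only against others in the same ecosystem, so the top package of a small ecosystem still reaches 1.0.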
diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md
index 46daeddf89..32b0d96ff6 100644
--- a/docs/adr/0001-oss-packages-design-decisions.md
+++ b/docs/adr/0001-oss-packages-design-decisions.md
@@ -10,20 +10,21 @@ The oss-packages domain is being built inside CDP as a new, independent capabili
 
 ## Scope and current status
 
-| Decision area                                  | Status                                |
-| ---------------------------------------------- | ------------------------------------- |
-| Database placement                             | decided                               |
-| Worker architecture                            | decided                               |
-| Universe source and critical-package selection | decided (formula is a placeholder)    |
-| Write semantics across sub-workers             | decided                               |
-| Package → repository provenance                | decided                               |
-| OSV as canonical security source               | decided                               |
-| CVSS scoring strategy                          | decided (v4 numeric scoring deferred) |
-| `has_critical_vulnerability` semantics         | decided                               |
-| `advisory_affected_ranges` uniqueness scope    | decided                               |
-| Per-source ingestion strategies                | decided (Sonatype API access pending) |
-| deps.dev coverage and gaps                     | decided                               |
-| Downloads timeline by tier                     | decided                               |
+| Decision area                                  | Status                                              |
+| ---------------------------------------------- | --------------------------------------------------- |
+| Database placement                             | decided                                             |
+| Worker architecture                            | decided                                             |
+| Universe source and critical-package selection | decided                                             |
+| Criticality scoring methodology                | proposed (weights tunable; deps.dev ingestion vs in-memory pending) |
+| Write semantics across sub-workers             | decided                                             |
+| Package → repository provenance                | decided                                             |
+| OSV as canonical security source               | decided                                             |
+| CVSS scoring strategy                          | decided (v4 numeric scoring deferred)               |
+| `has_critical_vulnerability` semantics         | decided                                             |
+| `advisory_affected_ranges` uniqueness scope    | decided                                             |
+| Per-source ingestion strategies                | decided (Sonatype API access pending)               |
+| deps.dev coverage and gaps                     | decided                                             |
+| Downloads timeline by tier                     | decided                                             |
 
 ---
 
@@ -81,7 +82,7 @@ Tier 2 enriches a critical slice of the npm and Maven ecosystems — not the ful
 
 We use the [deps.dev BigQuery public datasets](https://deps.dev) — specifically `PackageVersionsLatest`, `DependentsLatest`, `PackageVersionToProjectLatest`, and `ProjectsLatest` — filtered to `System IN ('NPM', 'MAVEN')` as the universe input. The BigQuery data is exported to Parquet files and imported into `packages_universe` on a weekly cadence. A scoring + ranking job then promotes the top-N per ecosystem by setting `is_critical = true` and copying `criticality_score` onto the full `packages` table.
 
-The current scoring formula and per-ecosystem critical-package quotas are **not yet finalized** — both are still under discussion. The ranking function takes `critical_top_n_by_ecosystem` as a JSONB parameter and weights as numeric inputs, so thresholds and formula coefficients can be tuned at call time without a schema change.
+The scoring formula, per-ecosystem critical-package quotas, graph-signal inputs, and spotlight-override mechanism are defined in §Criticality scoring methodology below. The ranking function takes `critical_top_n_by_ecosystem` as a JSONB parameter and weights as numeric inputs, so thresholds and formula coefficients can be tuned at call time without a schema change.
 
 The BigQuery free tier is approximately 1 TiB/month. Column projection and `System` filtering are mandatory on every query; full-table scans will exhaust the quota.
 
@@ -89,6 +90,132 @@ The BigQuery free tier is approximately 1 TiB/month. Column projection and `Syst
 
 ---
 
+### Criticality scoring methodology
+
+The §Universe source section above establishes that `packages_universe` is the Tier 3 ranking workspace and that `rank_packages_universe()` produces the criticality scores. This section locks in **what goes into the score** — replacing the placeholder formula (`X * downloadsCount + Y * dependentCount`) with a defensible methodology that captures load-bearing upstream packages (the left-pad / XZ pattern), normalizes across ecosystems, and supports manual overrides for known-critical primitives.
+
+This is treated as a brand-new workstream: no reuse or extension of any existing in-flight criticality code. All code lives inside `services/apps/packages_worker/src/criticality/` following the §Worker architecture pattern (`activities.ts`, `workflows.ts`, `schedule.ts`, queries co-located in the same directory). No additions to `services/libs/data-access-layer` — consistent with how other sub-workers like `osv` and `enricher` keep DB access local to the worker.
+
+#### Inputs
+
+Five signals, all stored on `packages_universe`:
+
+| Signal                       | Existing? | Source                                                                  |
+| ---------------------------- | --------- | ----------------------------------------------------------------------- |
+| `downloads_last_30d`         | yes       | weekly downloads ingestion (registry APIs)                              |
+| `dependent_packages_count`   | yes       | deps.dev `DependentsLatest`                                             |
+| `dependent_repos_count`      | yes       | derived in Postgres from `package_repos`                                |
+| `transitive_dependent_count` | **new**   | computed in the criticality sub-worker (see Implementation note below) |
+| `centrality_score`           | **new**   | computed in the criticality sub-worker (PageRank, see below)            |
+
+Direct dependent counts capture popularity. Transitive dependent count and centrality capture **blast radius** — load-bearing upstream packages with few direct dependents but massive indirect reach (the left-pad / XZ class that direct counts alone miss).
+
+PageRank centrality is the primary blast-radius signal; transitive dependent count is stored as a sanity check / floor, not as an equal-weight input. The two are correlated: both reward packages that many others reach through the graph, but PageRank also weights each dependent by its own importance (a package's score depends recursively on the scores of the packages that depend on it). Blending them as independent signals would double-count blast radius. Both columns are stored so weights can be tuned without rerunning the graph job.
+
+#### Scoring formula
+
+Per-ecosystem percentile-rank of each log-transformed signal, then weighted blend:
+
+```
+score =  w_downloads   * pct_rank( LN(1 + downloads_last_30d)         )   within ecosystem
+       + w_dep_pkgs    * pct_rank( LN(1 + dependent_packages_count)   )   within ecosystem
+       + w_dep_repos   * pct_rank( LN(1 + dependent_repos_count)      )   within ecosystem
+       + w_transitive  * pct_rank( LN(1 + transitive_dependent_count) )   within ecosystem
+       + w_centrality  * pct_rank( centrality_score                   )   within ecosystem
+```
+
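+Weights sum to 1.0 → score ∈ `[0, 1]`. Centrality skips the `LN()` (PageRank values already sit in a small bounded range) but still passes through `pct_rank` so every signal lands on the same percentile scale. Because `pct_rank` depends only on the ordering of values within an ecosystem and `LN(1 + x)` is strictly increasing for `x ≥ 0`, the log transform does not change any signal's percentile; it would only matter if the normalization were switched to a value-based scheme such as min-max. Starting weight bias: centrality dominant (PageRank is the primary blast-radius signal), transitive count low (kept as a sanity floor — see Inputs note on double-counting), direct dependents and downloads balanced as secondary popularity signals. All weights are call-time numeric parameters to `rank_packages_universe()` — tunable without schema or code changes.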
+
+**Suggested starting weights** (use as the first call, then iterate):
+
+| Weight          | Value | Signal                    | Rationale                                                |
+| --------------- | ----- | ------------------------- | -------------------------------------------------------- |
+| `w_centrality`  | 0.40  | PageRank                  | Primary blast-radius signal                              |
+| `w_transitive`  | 0.10  | Transitive dependents     | Sanity floor; low to avoid double-counting centrality    |
+| `w_dep_pkgs`    | 0.20  | Direct dependent packages | Popularity within the package graph                      |
+| `w_dep_repos`   | 0.15  | Direct dependent repos    | Popularity across consumer codebases                     |
+| `w_downloads`   | 0.15  | 30-day downloads          | Adoption signal, lighter weight (noisy for new packages) |
+
+These are a starting point, not a recommendation we've validated. They will be revised once the first ranked list is observable and stakeholders review which packages land in / near Tier 1 — particularly for smaller ecosystems where the percentile distribution is less stable.
+
+**Why percentile-rank, not min-max:** even after log-transform, heavy-tailed signals retain extreme outliers that bend the min-max scale. Example — downloads `[10, 100, 1000, 10000, 1B]` log-transformed are `[2.4, 4.6, 6.9, 9.2, 20.7]`. Min-max on those gives `[0.00, 0.12, 0.25, 0.37, 1.00]` (four out of five squeezed below 0.4); percentile-rank gives uniform `[0.00, 0.25, 0.50, 0.75, 1.00]`, stable to outliers, and `0.5` means "median within ecosystem" regardless of which ecosystem.
+
+**Why per-ecosystem:** the percentile uses `PARTITION BY ecosystem` so ecosystems are never compared on the same absolute scale. A top-percentile crates package is strategically important; without per-ecosystem partitioning it would be buried by npm's volume.
+
+#### Per-ecosystem tier budgets
+
+`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter that ranks within each ecosystem and cuts at top N. Tier 2 reuses the same shape with a second JSONB parameter and a separate `is_tier2` column on `packages_universe`. Tier 1 ⊆ Tier 2 falls out naturally from the same ranking.
+
+Allocation policy is **floor + ceiling + judgment**: every onboarded ecosystem gets a minimum (the floor — guarantees representation regardless of size), no single ecosystem exceeds a percentage of the total (the ceiling — prevents npm from swallowing the list). Illustrative values for a 700k Tier 2 budget:
+
+| Ecosystem  | Tier 2 budget | Tier 1 budget |
+| ---------- | ------------- | ------------- |
+| npm        | 300k          | 50k           |
+| Maven      | 150k          | 25k           |
+| PyPI       | 100k          | 15k           |
+| crates     | 75k           | 5k            |
+| Go modules | 75k           | 5k            |
+| **Total**  | **700k**      | **100k**      |
+
+Specific numbers are a stakeholder decision; the rationale per ecosystem must live alongside the JSONB config so the "why these values?" question is answerable later. Avoid allocating budgets in proportion to ecosystem size: that amplifies npm dominance, the opposite of what we want.
+
+#### Spotlight overrides
+
+A new `package_criticality_spotlight` table keyed on `(ecosystem, namespace, name)` carries required `rationale`, `added_by`, and `added_at` columns. Every package listed in this table is flagged `is_critical = TRUE` regardless of its computed score. The override is applied **after** ranking inside the criticality workflow, so spotlighted packages are not overwritten on the next pass. Requiring a rationale per row is deliberate: the safety net stays auditable as it grows.
+
+The spotlight exists because the methodology has a known structural blind spot — packages that are critical but rarely depended on in the observable graph (vendored code, build-time-only tools, dependencies pulled outside the registry). No combination of graph signals will surface these; manual curation is the only path.
+
+#### Implementation note: in-memory graph computation vs deps.dev ingestion
+
+The in-memory build of `transitive_dependent_count` and `centrality_score` is a **direct consequence of the §Database placement and §Worker architecture decisions to store only direct dependencies on `package_dependencies`**. Materializing the full transitive closure would be ~1.5B rows, which is likely not viable to store at this point, so transitive signals must be computed at scoring time. The chosen approach: stream direct edges into memory per ecosystem (~10M nodes / ~100M edges for npm fits in ~2 GB RAM on a single worker box), compute transitive counts via reverse-BFS and PageRank centrality iteratively (damping 0.85, up to ~100 iterations, stopping early once the change between iterations falls below `1e-6`), and bulk-merge the results into `packages_universe`. No graph DB, no distributed framework. Memory is not the only constraint: one reverse-BFS per package visits every (package, transitive dependent) pair, so the BFS work is at least the size of the ~1.5B-row closure, and its runtime should be measured on the real npm graph.
+
+**Before committing to this implementation, confirm whether deps.dev already provides these signals, so we can ingest them instead of computing them:**
+
+- **Transitive dependent count** — `DependentsLatest` is the table we already source `dependent_packages_count` from. Verify whether its dependent counts are direct-only or include indirect dependents. If indirect counts are included, the column can be sourced in the existing universe-import job (consistent with how the other dependent counts are already populated) and the in-memory transitive computation is unnecessary.
+- **Centrality / importance score** — deps.dev does not appear to expose a PageRank-style score in its current schema.
+
+If both signals are ingestible, the criticality sub-worker reduces to "call `rank_packages_universe()` with the right weights" — much simpler. If only one is ingestible, the in-memory job still runs but does less work. Either way the §deps.dev coverage and gaps table below must be updated to record what's sourced from where.
+
+#### Worker layout
+
+A new directory `services/apps/packages_worker/src/criticality/` with the standard sub-worker layout (`activities.ts`, `workflows.ts`, `schedule.ts`, queries co-located), and `src/bin/criticality-worker.ts` as its entrypoint. Weekly cadence, one workflow per ecosystem, `ScheduleOverlapPolicy.SKIP`. Workflow steps: load graph snapshot → compute transitive counts and PageRank (or skip if ingested from deps.dev) → merge results into `packages_universe` → call `rank_packages_universe()` → apply spotlight overrides → propagate `criticality_score` onto `packages`.
+
+#### High-level flow
+
+```mermaid
+flowchart TD
+    A[package_dependencies<br/>direct edges, partitioned by depends_on_id] -->|stream filtered to<br/>direct kind + latest version| B[Load graph snapshot<br/>per ecosystem]
+    B --> C[Build in-memory<br/>reverse adjacency]
+    C --> D[Transitive dependent count<br/>reverse-BFS per node<br/>OR ingest from deps.dev]
+    C --> E[PageRank centrality<br/>iterative, damping 0.85]
+    D --> F[Merge into<br/>packages_universe]
+    E --> F
+    G[Downloads, direct dependent<br/>packages/repos<br/>existing inputs] --> F
+    F --> H[rank_packages_universe<br/>1. Percentile-rank per ecosystem<br/>2. Weighted blend<br/>3. Per-ecosystem top-N cut]
+    H --> I[Apply spotlight overrides<br/>force is_critical = TRUE]
+    I --> J[Propagate criticality_score<br/>to packages table]
+
+    style D fill:#e1f5ff
+    style E fill:#e1f5ff
+    style I fill:#fff4e1
+```
+
+Inputs in blue are new graph-derived signals; the spotlight step in orange is the deliberate safety net for the methodology's structural blind spot.
+
+#### Additional Decisions
+
+- **Edge filter**: `dependency_kind = 'direct'` only — exclude `dev` and `peer` (they don't represent runtime blast radius).
+- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked`).
+- **Graph scope**: per-ecosystem; don't merge ecosystems into a single graph. Cross-ecosystem edges are rare and noisy.
+- **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold.
+- **Cadence**: weekly, aligned with the existing universe refresh.
+
+**Weights are expected to change.** The starting weight vector (centrality + transitive heaviest, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. Expect multiple iterations before weights are locked in.
+
+**Status**: proposed — 2026-05-29. Formula shape, inputs, tier-budget policy, and spotlight table are agreed. Open: (1) whether transitive counts can be sourced from deps.dev before in-memory PageRank work begins; (2) final weight values, which will be tuned against an observable ranked list.
+
+---
+
 ### Write semantics across sub-workers
 
 Five sub-workers run concurrently (npm, Maven, OSV, GitHub, Docker Hub), all writing to the same `packages-db` schema. We define per-table write rules that allow concurrent writes without distributed locking:
@@ -211,7 +338,7 @@ A range `(introduced, fixed, last_affected)` matches `latest_version` when:
 - `fixed IS NULL OR latest_version < fixed`, AND
 - `last_affected IS NULL OR latest_version <= last_affected`.
 
-This is **option (b)** (latest_version inside an active range), plus a **MAL- override** so malicious-package reports flip the flag regardless of CVSS — the XZ-style maintainer-compromise case from the Osprey memo. ~213k of 220k npm OSV records are `MAL-*` with `cvss = NULL`, so option (b) on its own would miss the dominant security signal.
+This is **option (b)** (latest_version inside an active range), plus a **MAL- override** so malicious-package reports flip the flag regardless of CVSS — the XZ-style maintainer-compromise case. ~213k of 220k npm OSV records are `MAL-*` with `cvss = NULL`, so option (b) on its own would miss the dominant security signal.
 
 **Why not option (a)** (any critical advisory exists for the package name, regardless of version): option (a) over-reports — a CVE patched in v1.0 flags a package now on v9.0 — and under-reports when an advisory has multiple `affected[]` ranges where only some are patched. The actionable consumer question is "is the version I'd install today vulnerable?", and that's option (b).
 
@@ -396,7 +523,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi
 ## Open questions / in-flight
 
 - **Sonatype Central Stats API access** — not confirmed as of 2026-05-27. If unavailable by day 5, Maven download counts will be absent from the week-2 demo (`downloads_last_month` NULL for Maven rows; disclose to stakeholders).
-- **criticality_score formula** — the placeholder formula (`X * downloadsCount + Y * dependentCount`) has not been validated against known critical packages. Final formula is yet to be defined.
+- **deps.dev coverage for transitive dependents and centrality** — see §Criticality scoring methodology. Verify whether `DependentsLatest` includes indirect dependents before building the in-memory BFS job; if it does, ingesting the count is cheaper than computing it. PageRank centrality needs the in-memory job either way, since deps.dev does not appear to expose a comparable score.
 - **pg_partman + pg_cron setup** — must be confirmed active in the OCI environment before download workers start; `downloads_daily` and `downloads_last_30d` inserts will fail if monthly partitions are not pre-created.
 
 ---
@@ -405,6 +532,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi
 
 - 2026-05-27 — initial record
 - 2026-05-28 — folded standalone ADR-0003 (`has_critical_vulnerability` semantics), ADR-0005 (CVSS scoring strategy), and ADR-0006 (`advisory_affected_ranges` uniqueness scope) into this living record; standalone files removed. Resolved the prior open question on `has_critical_vulnerability` (option b + MAL- override). ADR-0004 (standalone-bin vs Temporal) was removed before merging — the worker architecture decision in this ADR supersedes it.
+- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table). Resolves the prior open question on the placeholder formula. Standalone ADR-0002 was folded into this living record before publication and removed. New open questions added on deps.dev coverage for graph signals and on the `packages_universe` write-semantics row.
 
 ---
 
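The implementation note in §Criticality scoring methodology computes `transitive_dependent_count` with a reverse-BFS per package and `centrality_score` with iterative PageRank (damping 0.85, ~100 iterations, `1e-6` convergence). A minimal TypeScript sketch of both follows. It assumes packages are already mapped to dense integer ids and that edges are `[dependent, dependency]` pairs taken from the filtered graph snapshot. The function names, the L1 convergence test, and the uniform redistribution of score from packages with no dependencies are assumptions for illustration, not decisions recorded in the ADR.

```typescript
// Hypothetical sketch: packages are dense ids 0..nodeCount-1 and each edge is
// [dependent, dependency], meaning `dependent` depends directly on `dependency`.
type DependencyEdge = [number, number];

// transitive_dependent_count: one reverse-BFS per package, counting every
// package that reaches it through one or more dependency edges. Total work
// grows with the size of the transitive closure.
function transitiveDependentCounts(nodeCount: number, edges: DependencyEdge[]): number[] {
  const dependentsOf: number[][] = [];
  for (let v = 0; v < nodeCount; v++) dependentsOf.push([]);
  for (const [dependent, dependency] of edges) dependentsOf[dependency].push(dependent);

  const counts: number[] = new Array(nodeCount).fill(0);
  const visitedBy = new Int32Array(nodeCount).fill(-1); // BFS start that last visited each node
  for (let start = 0; start < nodeCount; start++) {
    visitedBy[start] = start;
    const queue = [start];
    for (let head = 0; head < queue.length; head++) {
      for (const next of dependentsOf[queue[head]]) {
        if (visitedBy[next] !== start) {
          visitedBy[next] = start;
          queue.push(next);
        }
      }
    }
    counts[start] = queue.length - 1; // exclude the package itself
  }
  return counts;
}

// centrality_score: PageRank in which each dependent passes its score to the
// packages it depends on. Score held by packages with no dependencies is
// spread uniformly (one common convention, assumed here).
function pageRank(
  nodeCount: number,
  edges: DependencyEdge[],
  damping = 0.85,
  maxIterations = 100,
  tolerance = 1e-6,
): number[] {
  const outDegree: number[] = new Array(nodeCount).fill(0);
  for (const [dependent] of edges) outDegree[dependent] += 1;

  let rank: number[] = new Array(nodeCount).fill(1 / nodeCount);
  for (let iter = 0; iter < maxIterations; iter++) {
    let danglingMass = 0;
    for (let v = 0; v < nodeCount; v++) if (outDegree[v] === 0) danglingMass += rank[v];
    const base = (1 - damping + damping * danglingMass) / nodeCount;
    const next: number[] = new Array(nodeCount).fill(base);
    for (const [dependent, dependency] of edges) {
      next[dependency] += (damping * rank[dependent]) / outDegree[dependent];
    }
    // Stop once the L1 change between iterations drops below the tolerance.
    let delta = 0;
    for (let v = 0; v < nodeCount; v++) delta += Math.abs(next[v] - rank[v]);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}

// Toy graph: lib (1) depends on core (0); app1 (2) and app2 (3) depend on lib.
const edges: DependencyEdge[] = [[1, 0], [2, 1], [3, 1]];
const transitive = transitiveDependentCounts(4, edges); // [3, 2, 0, 0]
const centrality = pageRank(4, edges);
```

In the toy graph `core` has a single direct dependent yet ends up with the highest centrality, which is the load-bearing-upstream pattern the section is designed to surface.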

From babd89e68b988bd54635c9d55066343bda20c4a2 Mon Sep 17 00:00:00 2001
From: Joana Maia <jmaia@contractor.linuxfoundation.org>
Date: Fri, 29 May 2026 17:53:32 +0100
Subject: [PATCH 2/5] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
---
 docs/adr/0001-oss-packages-design-decisions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

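This patch adds `is_prerelease` to the version-resolution rule. A hypothetical TypeScript sketch of how the graph snapshot could apply that rule together with the `dependency_kind = 'direct'` edge filter. It assumes `is_latest` marks at most one version row per package and treats `is_yanked` / `is_prerelease` as exclusions, so a package whose latest version is yanked or a prerelease contributes no outgoing edges. The ADR does not say whether such a package should fall back to an older version. Row shapes and field names are illustrative.

```typescript
// Hypothetical row shapes; field names only mirror the `versions` flags and
// `package_dependencies` columns mentioned in the ADR.
type DependencyKind = "direct" | "dev" | "peer";

interface VersionRow {
  packageId: number;
  versionId: number;
  isLatest: boolean;
  isYanked: boolean;
  isPrerelease: boolean;
}

interface DependencyRow {
  versionId: number; // version that declares the dependency
  dependsOnId: number; // package it depends on
  kind: DependencyKind;
}

// Returns [dependent package, dependency package] edges for the graph snapshot:
// only each package's latest, non-yanked, non-prerelease version contributes,
// and only its direct (runtime) dependencies are kept.
function snapshotEdges(versions: VersionRow[], deps: DependencyRow[]): Array<[number, number]> {
  const packageOfResolvedVersion = new Map<number, number>();
  for (const v of versions) {
    if (v.isLatest && !v.isYanked && !v.isPrerelease) {
      packageOfResolvedVersion.set(v.versionId, v.packageId);
    }
  }
  const edges: Array<[number, number]> = [];
  for (const d of deps) {
    const dependent = packageOfResolvedVersion.get(d.versionId);
    if (dependent !== undefined && d.kind === "direct") edges.push([dependent, d.dependsOnId]);
  }
  return edges;
}

const versions: VersionRow[] = [
  { packageId: 1, versionId: 10, isLatest: true, isYanked: false, isPrerelease: false },
  { packageId: 1, versionId: 11, isLatest: false, isYanked: false, isPrerelease: false },
  { packageId: 2, versionId: 20, isLatest: true, isYanked: true, isPrerelease: false },
  { packageId: 3, versionId: 30, isLatest: true, isYanked: false, isPrerelease: true },
];
const deps: DependencyRow[] = [
  { versionId: 10, dependsOnId: 5, kind: "direct" },
  { versionId: 10, dependsOnId: 6, kind: "dev" },
  { versionId: 11, dependsOnId: 7, kind: "direct" },
  { versionId: 20, dependsOnId: 8, kind: "direct" },
  { versionId: 30, dependsOnId: 9, kind: "direct" },
];
const edges = snapshotEdges(versions, deps); // [[1, 5]]
```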
diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md
index 33e3e68eb8..c97403a86f 100644
--- a/docs/adr/0001-oss-packages-design-decisions.md
+++ b/docs/adr/0001-oss-packages-design-decisions.md
@@ -206,7 +206,7 @@ Inputs in blue are new graph-derived signals; the spotlight step in orange is th
 #### Additional Decisions
 
 - **Edge filter**: `dependency_kind = 'direct'` only — exclude `dev` and `peer` (they don't represent runtime blast radius).
-- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked`).
+- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked` / `is_prerelease`).
 - **Graph scope**: per-ecosystem; don't merge ecosystems into a single graph. Cross-ecosystem edges are rare and noisy.
 - **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold.
 - **Cadence**: weekly, aligned with the existing universe refresh.

From 8dca2f7b4a00675a9fde8822711e14a8934215ba Mon Sep 17 00:00:00 2001
From: Joana Maia <jmaia@contractor.linuxfoundation.org>
Date: Fri, 29 May 2026 17:55:58 +0100
Subject: [PATCH 3/5] docs: fix changelog

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
---
 docs/adr/0001-oss-packages-design-decisions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md
index c97403a86f..4020416122 100644
--- a/docs/adr/0001-oss-packages-design-decisions.md
+++ b/docs/adr/0001-oss-packages-design-decisions.md
@@ -577,7 +577,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi
 - 2026-05-27 — initial record
 - 2026-05-28 — folded standalone ADR-0003 (`has_critical_vulnerability` semantics), ADR-0005 (CVSS scoring strategy), and ADR-0006 (`advisory_affected_ranges` uniqueness scope) into this living record; standalone files removed. Resolved the prior open question on `has_critical_vulnerability` (option b + MAL- override). ADR-0004 (standalone-bin vs Temporal) was removed before merging — the worker architecture decision in this ADR supersedes it.
 - 2026-05-29 — clarified `packages_universe` import semantics (one-time backfill + weekly snapshot-diff incrementals; the ranking job updates score/flag columns in place). Added §Source of truth: deps.dev backfill vs registries / OSV with lifecycle ownership rules and the agreed `package_source_log` provenance table (`(package_id, source)` PK; `columns` array tracks `table.column` paths each source writes).
-- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table). Resolves the prior open question on the placeholder formula. Standalone ADR-0002 was folded into this living record before publication and removed. New open questions added on deps.dev coverage for graph signals and on the `packages_universe` write-semantics row.
+- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table).
 ---
 
 ## Note on promotion to production

From e361d06c1b1a89dc938e7edd2adbb3473f42d2db Mon Sep 17 00:00:00 2001
From: Joana Maia <jmaia@contractor.linuxfoundation.org>
Date: Fri, 29 May 2026 17:57:55 +0100
Subject: [PATCH 4/5] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
---
 docs/adr/0001-oss-packages-design-decisions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

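This patch rewords how the starting weight vector is described. A hypothetical TypeScript sketch of the blend step those weights feed follows. It uses the suggested starting weights (centrality 0.40, transitive 0.10, direct dependent packages 0.20, direct dependent repos 0.15, downloads 0.15) and assumes per-ecosystem percentiles are already computed. `CriticalityWeights`, `blendScore`, and the weight-sum guard are illustrative, not existing code. The guard is one way to keep the score in `[0, 1]` when weights are retuned at call time.

```typescript
// Hypothetical names; the keys mirror the w_* weights in the ADR.
interface CriticalityWeights {
  downloads: number;
  depPkgs: number;
  depRepos: number;
  transitive: number;
  centrality: number;
}

// Per-ecosystem percentiles in [0, 1], one per signal (same keys as the weights).
type SignalPercentiles = CriticalityWeights;

const SIGNALS: Array<keyof CriticalityWeights> = ["downloads", "depPkgs", "depRepos", "transitive", "centrality"];

// Suggested starting weights from the ADR table.
const STARTING_WEIGHTS: CriticalityWeights = {
  centrality: 0.4,
  transitive: 0.1,
  depPkgs: 0.2,
  depRepos: 0.15,
  downloads: 0.15,
};

function blendScore(p: SignalPercentiles, w: CriticalityWeights): number {
  const total = SIGNALS.reduce((sum, k) => sum + w[k], 0);
  // Assumed guard: non-negative weights summing to 1.0 keep the score in [0, 1].
  if (SIGNALS.some((k) => w[k] < 0) || Math.abs(total - 1) > 1e-9) {
    throw new Error(`weights must be non-negative and sum to 1.0, got ${total}`);
  }
  return SIGNALS.reduce((sum, k) => sum + w[k] * p[k], 0);
}

const score = blendScore(
  { centrality: 0.9, transitive: 0.8, depPkgs: 0.5, depRepos: 0.5, downloads: 0.2 },
  STARTING_WEIGHTS,
); // ≈ 0.645
```

Because the score is a weighted average of percentiles, retuning only changes the relative pull of each signal. Tier membership still comes from rank, not from a score threshold.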
diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md
index 4020416122..966ba4b0ca 100644
--- a/docs/adr/0001-oss-packages-design-decisions.md
+++ b/docs/adr/0001-oss-packages-design-decisions.md
@@ -211,7 +211,7 @@ Inputs in blue are new graph-derived signals; the spotlight step in orange is th
 - **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold.
 - **Cadence**: weekly, aligned with the existing universe refresh.
 
-**Weights are expected to change.** The starting weight vector (centrality + transitive heaviest, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. Expect multiple iterations before weights are locked in.
+**Weights are expected to change.** The starting weight vector (centrality heaviest, transitive kept low as a sanity floor, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. Expect multiple iterations before weights are locked in.
 
 **Status**: proposed — 2026-05-29. Formula shape, inputs, tier-budget policy, and spotlight table are agreed. Open: (1) whether transitive counts can be sourced from deps.dev before in-memory PageRank work begins; (2) final weight values, which will be tuned against an observable ranked list.
 

From b678dc3af9cb576ead773f87d1fa3940221fc962 Mon Sep 17 00:00:00 2001
From: Joana Maia <jmaia@contractor.linuxfoundation.org>
Date: Fri, 29 May 2026 18:10:22 +0100
Subject: [PATCH 5/5] docs: remove mention of is_tier2 column

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
---
 docs/adr/0001-oss-packages-design-decisions.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

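After this patch, `critical_top_n_by_ecosystem` is the only cut described for `rank_packages_universe()`. A hypothetical TypeScript sketch of that per-ecosystem top-N cut follows, together with the spotlight override from §Spotlight overrides. The override runs after ranking, so a spotlighted package is always flagged. `topNByEcosystem` stands in for the parsed JSONB argument, and its keys are illustrative. Treating a missing ecosystem key as 0 and breaking score ties by name are assumptions.

```typescript
// Hypothetical shapes for an in-memory illustration of the ranking step.
interface PackageKey {
  ecosystem: string;
  namespace: string | null;
  name: string;
}

interface ScoredPackage extends PackageKey {
  score: number;
  isCritical: boolean;
}

const keyOf = (p: PackageKey): string => JSON.stringify([p.ecosystem, p.namespace, p.name]);

function markCritical(
  packages: ScoredPackage[],
  topNByEcosystem: Record<string, number>,
  spotlight: PackageKey[],
): void {
  // 1. Rank within each ecosystem and keep its top N (missing key => 0).
  const byEcosystem = new Map<string, ScoredPackage[]>();
  for (const p of packages) {
    p.isCritical = false;
    const group = byEcosystem.get(p.ecosystem);
    if (group) group.push(p);
    else byEcosystem.set(p.ecosystem, [p]);
  }
  byEcosystem.forEach((group, ecosystem) => {
    const n = topNByEcosystem[ecosystem] ?? 0;
    group
      .slice()
      .sort((a, b) => b.score - a.score || a.name.localeCompare(b.name)) // name breaks ties
      .slice(0, n)
      .forEach((p) => {
        p.isCritical = true;
      });
  });
  // 2. Spotlight overrides run after ranking, so they always win.
  const forced = new Set(spotlight.map(keyOf));
  for (const p of packages) {
    if (forced.has(keyOf(p))) p.isCritical = true;
  }
}

const pkgs: ScoredPackage[] = [
  { ecosystem: "npm", namespace: null, name: "a", score: 0.9, isCritical: false },
  { ecosystem: "npm", namespace: null, name: "b", score: 0.8, isCritical: false },
  { ecosystem: "npm", namespace: null, name: "c", score: 0.1, isCritical: false },
  { ecosystem: "maven", namespace: "org.example", name: "x", score: 0.7, isCritical: false },
  { ecosystem: "maven", namespace: "org.example", name: "y", score: 0.6, isCritical: false },
];
markCritical(pkgs, { npm: 2, maven: 1 }, [{ ecosystem: "npm", namespace: null, name: "c" }]);
const critical = pkgs.filter((p) => p.isCritical).map((p) => p.name); // ["a", "b", "c", "x"]
```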
diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md
index 966ba4b0ca..71e075bc9c 100644
--- a/docs/adr/0001-oss-packages-design-decisions.md
+++ b/docs/adr/0001-oss-packages-design-decisions.md
@@ -15,7 +15,7 @@ The oss-packages domain is being built inside CDP as a new, independent capabili
 | Database placement                             | decided                                             |
 | Worker architecture                            | decided                                             |
 | Universe source and critical-package selection | decided                                             |
-| Criticality scoring methodology                | proposed (weights tunable; deps.dev ingestion vs in-memory pending) |
+| Criticality scoring methodology                | proposed (weights tunable)                          |
 | Write semantics across sub-workers             | decided                                             |
 | Package → repository provenance                | decided                                             |
 | OSV as canonical security source               | decided                                             |
@@ -145,7 +145,7 @@ These are a starting point, not a recommendation we've validated. They will be r
 
 #### Per-ecosystem tier budgets
 
-`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter that ranks within each ecosystem and cuts at top N. Tier 2 reuses the same shape with a second JSONB parameter and a separate `is_tier2` column on `packages_universe`. Tier 1 ⊆ Tier 2 falls out naturally from the same ranking.
+`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter; the function ranks packages within each ecosystem and cuts each ecosystem's list at its top N.
 
 Allocation policy is **floor + ceiling + judgment**: every onboarded ecosystem gets a minimum (the floor — guarantees representation regardless of size), no single ecosystem exceeds a percentage of the total (the ceiling — prevents npm from swallowing the list). Illustrative values for a 700k Tier 2 budget: