From 7d5251398a281469937d558c3eed854df2e38f2a Mon Sep 17 00:00:00 2001 From: Joana Maia Date: Fri, 29 May 2026 17:02:49 +0100 Subject: [PATCH 1/5] docs: add criticality score proposal Signed-off-by: Joana Maia --- .../adr/0001-oss-packages-design-decisions.md | 162 ++++++++++++++++-- 1 file changed, 145 insertions(+), 17 deletions(-) diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md index 46daeddf89..32b0d96ff6 100644 --- a/docs/adr/0001-oss-packages-design-decisions.md +++ b/docs/adr/0001-oss-packages-design-decisions.md @@ -10,20 +10,21 @@ The oss-packages domain is being built inside CDP as a new, independent capabili ## Scope and current status -| Decision area | Status | -| ---------------------------------------------- | ------------------------------------- | -| Database placement | decided | -| Worker architecture | decided | -| Universe source and critical-package selection | decided (formula is a placeholder) | -| Write semantics across sub-workers | decided | -| Package → repository provenance | decided | -| OSV as canonical security source | decided | -| CVSS scoring strategy | decided (v4 numeric scoring deferred) | -| `has_critical_vulnerability` semantics | decided | -| `advisory_affected_ranges` uniqueness scope | decided | -| Per-source ingestion strategies | decided (Sonatype API access pending) | -| deps.dev coverage and gaps | decided | -| Downloads timeline by tier | decided | +| Decision area | Status | +| ---------------------------------------------- | --------------------------------------------------- | +| Database placement | decided | +| Worker architecture | decided | +| Universe source and critical-package selection | decided | +| Criticality scoring methodology | proposed (weights tunable; deps.dev ingestion vs in-memory pending) | +| Write semantics across sub-workers | decided | +| Package → repository provenance | decided | +| OSV as canonical security source | decided | +| 
CVSS scoring strategy | decided (v4 numeric scoring deferred) | +| `has_critical_vulnerability` semantics | decided | +| `advisory_affected_ranges` uniqueness scope | decided | +| Per-source ingestion strategies | decided (Sonatype API access pending) | +| deps.dev coverage and gaps | decided | +| Downloads timeline by tier | decided | --- @@ -81,7 +82,7 @@ Tier 2 enriches a critical slice of the npm and Maven ecosystems — not the ful We use the [deps.dev BigQuery public datasets](https://deps.dev) — specifically `PackageVersionsLatest`, `DependentsLatest`, `PackageVersionToProjectLatest`, and `ProjectsLatest` — filtered to `System IN ('NPM', 'MAVEN')` as the universe input. The BigQuery data is exported to Parquet files and imported into `packages_universe` on a weekly cadence. A scoring + ranking job then promotes the top-N per ecosystem by setting `is_critical = true` and copying `criticality_score` onto the full `packages` table. -The current scoring formula and per-ecosystem critical-package quotas are **not yet finalized** — both are still under discussion. The ranking function takes `critical_top_n_by_ecosystem` as a JSONB parameter and weights as numeric inputs, so thresholds and formula coefficients can be tuned at call time without a schema change. +The scoring formula, per-ecosystem critical-package quotas, graph-signal inputs, and spotlight-override mechanism are defined in §Criticality scoring methodology below. The ranking function takes `critical_top_n_by_ecosystem` as a JSONB parameter and weights as numeric inputs, so thresholds and formula coefficients can be tuned at call time without a schema change. The BigQuery free tier is approximately 1 TiB/month. Column projection and `System` filtering are mandatory on every query; full-table scans will exhaust the quota. @@ -89,6 +90,132 @@ The BigQuery free tier is approximately 1 TiB/month. 
Column projection and `Syst --- +### Criticality scoring methodology + +The §Universe source section above establishes that `packages_universe` is the Tier 3 ranking workspace and that `rank_packages_universe()` produces the criticality scores. This section locks in **what goes into the score** — replacing the placeholder formula (`X * downloadsCount + Y * dependentCount`) with a defensible methodology that captures load-bearing upstream packages (the left-pad / XZ pattern), normalizes across ecosystems, and supports manual overrides for known-critical primitives. + +This is treated as a brand-new workstream: no reuse or extension of any existing in-flight criticality code. All code lives inside `services/apps/packages_worker/src/criticality/` following the §Worker architecture pattern (`activities.ts`, `workflows.ts`, `schedule.ts`, queries co-located in the same directory). No additions to `services/libs/data-access-layer` — consistent with how other sub-workers like `osv` and `enricher` keep DB access local to the worker. + +#### Inputs + +Five signals, all stored on `packages_universe`: + +| Signal | Existing? | Source | +| ---------------------------- | --------- | ----------------------------------------------------------------------- | +| `downloads_last_30d` | yes | weekly downloads ingestion (registry APIs) | +| `dependent_packages_count` | yes | deps.dev `DependentsLatest` | +| `dependent_repos_count` | yes | derived in Postgres from `package_repos` | +| `transitive_dependent_count` | **new** | computed in the criticality sub-worker (see Implementation note below) | +| `centrality_score` | **new** | computed in the criticality sub-worker (PageRank, see below) | + +Direct dependent counts capture popularity. Transitive dependent count and centrality capture **blast radius** — load-bearing upstream packages with few direct dependents but massive indirect reach (the left-pad / XZ class that direct counts alone miss). 
+ +PageRank centrality is the primary blast-radius signal; transitive dependent count is stored as a sanity check / floor, not as an equal-weight input. The two are correlated — PageRank is a weighted refinement of transitive count, where a package's score depends recursively on the importance of who depends on it. Blending them as independent signals would double-count blast radius. Both columns are stored so weights can be tuned without rerunning the graph job. + +#### Scoring formula + +Per-ecosystem percentile-rank of each log-transformed signal, then weighted blend: + +``` +score = w_downloads * pct_rank( LN(1 + downloads_last_30d) ) within ecosystem + + w_dep_pkgs * pct_rank( LN(1 + dependent_packages_count) ) within ecosystem + + w_dep_repos * pct_rank( LN(1 + dependent_repos_count) ) within ecosystem + + w_transitive * pct_rank( LN(1 + transitive_dependent_count) ) within ecosystem + + w_centrality * pct_rank( centrality_score ) within ecosystem +``` + +Weights sum to 1.0 → score ∈ `[0, 1]`. Centrality skips the `LN()` (PageRank is already in a small bounded range) but still passes through `pct_rank` so every signal lands on the same percentile scale. Starting weight bias: centrality dominant (PageRank is the primary blast-radius signal), transitive count low (kept as a sanity floor — see Inputs note on double-counting), direct dependents and downloads balanced as secondary popularity signals. All weights are call-time numeric parameters to `rank_packages_universe()` — tunable without schema or code changes. 
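The formula above can be illustrated with a minimal Python sketch. It is not the body of `rank_packages_universe()`. It assumes `pct_rank` behaves like SQL `PERCENT_RANK()` (ties share the lowest rank, result in `[0, 1]`), which the source does not spell out. The helper names (`pct_rank`, `score_packages`) and the sample rows are hypothetical.

```python
import math
from collections import defaultdict

# Suggested starting weights from this section (they sum to 1.0).
WEIGHTS = {
    "downloads_last_30d": 0.15,
    "dependent_packages_count": 0.20,
    "dependent_repos_count": 0.15,
    "transitive_dependent_count": 0.10,
    "centrality_score": 0.40,
}


def pct_rank(values):
    """Assumed PERCENT_RANK semantics: (rank - 1) / (n - 1), ties share the lowest rank."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    first_pos = {}
    for i, v in enumerate(sorted(values)):
        first_pos.setdefault(v, i)  # position of the first occurrence = rank - 1
    return [first_pos[v] / (n - 1) for v in values]


def score_packages(rows, weights=WEIGHTS):
    """Blend per-ecosystem percentile ranks into one score per package."""
    by_ecosystem = defaultdict(list)
    for row in rows:
        by_ecosystem[row["ecosystem"]].append(row)  # PARTITION BY ecosystem

    scores = {}
    for ecosystem, members in by_ecosystem.items():
        totals = [0.0] * len(members)
        for signal, weight in weights.items():
            raw = [m[signal] for m in members]
            if signal != "centrality_score":
                raw = [math.log1p(x) for x in raw]  # LN(1 + x); centrality skips it
            for i, p in enumerate(pct_rank(raw)):
                totals[i] += weight * p
        for m, total in zip(members, totals):
            scores[(ecosystem, m["name"])] = total
    return scores


# Hypothetical sample data: two ecosystems with very different absolute volumes.
rows = [
    {"ecosystem": "npm", "name": "a", "downloads_last_30d": 1_000_000,
     "dependent_packages_count": 500, "dependent_repos_count": 300,
     "transitive_dependent_count": 9000, "centrality_score": 0.02},
    {"ecosystem": "npm", "name": "b", "downloads_last_30d": 10,
     "dependent_packages_count": 1, "dependent_repos_count": 0,
     "transitive_dependent_count": 2, "centrality_score": 0.0001},
    {"ecosystem": "crates", "name": "c", "downloads_last_30d": 5000,
     "dependent_packages_count": 40, "dependent_repos_count": 20,
     "transitive_dependent_count": 100, "centrality_score": 0.01},
    {"ecosystem": "crates", "name": "d", "downloads_last_30d": 50,
     "dependent_packages_count": 2, "dependent_repos_count": 1,
     "transitive_dependent_count": 3, "centrality_score": 0.001},
]
scores = score_packages(rows)
```

In this toy data the top package of each ecosystem scores approximately 1.0 even though the crates package has far smaller absolute numbers than the npm one, which is the effect per-ecosystem partitioning is meant to produce.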
+ +**Suggested starting weights** (use as the first call, then iterate): + +| Weight | Value | Signal | Rationale | +| --------------- | ----- | -------------------- | ------------------------------------------------------ | +| `w_centrality` | 0.40 | PageRank | Primary blast-radius signal | +| `w_transitive` | 0.10 | Transitive dependents | Sanity floor; low to avoid double-counting centrality | +| `w_dep_pkgs` | 0.20 | Direct dependent packages | Popularity within the package graph | +| `w_dep_repos` | 0.15 | Direct dependent repos | Popularity across consumer codebases | +| `w_downloads` | 0.15 | 30-day downloads | Adoption signal, lighter weight (noisy for new packages) | + +These are a starting point, not a recommendation we've validated. They will be revised once the first ranked list is observable and stakeholders review which packages land in / near Tier 1 — particularly for smaller ecosystems where the percentile distribution is less stable. + +**Why percentile-rank, not min-max:** even after log-transform, heavy-tailed signals retain extreme outliers that bend the min-max scale. Example — downloads `[10, 100, 1000, 10000, 1B]` log-transformed are `[2.4, 4.6, 6.9, 9.2, 20.7]`. Min-max on those gives `[0.00, 0.12, 0.25, 0.37, 1.00]` (four out of five squeezed below 0.4); percentile-rank gives uniform `[0.00, 0.25, 0.50, 0.75, 1.00]`, stable to outliers, and `0.5` means "median within ecosystem" regardless of which ecosystem. + +**Why per-ecosystem:** the percentile uses `PARTITION BY ecosystem` so ecosystems are never compared on the same absolute scale. A top-percentile crates package is strategically important; without per-ecosystem partitioning it would be buried by npm's volume. + +#### Per-ecosystem tier budgets + +`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter that ranks within each ecosystem and cuts at top N. 
Tier 2 reuses the same shape with a second JSONB parameter and a separate `is_tier2` column on `packages_universe`. Tier 1 ⊆ Tier 2 falls out naturally from the same ranking. + +Allocation policy is **floor + ceiling + judgment**: every onboarded ecosystem gets a minimum (the floor — guarantees representation regardless of size), no single ecosystem exceeds a percentage of the total (the ceiling — prevents npm from swallowing the list). Illustrative values for a 700k Tier 2 budget: + +| Ecosystem | Tier 2 budget | Tier 1 budget | +| ---------- | ------------- | ------------- | +| npm | 300k | 50k | +| Maven | 150k | 25k | +| PyPI | 100k | 15k | +| crates | 75k | 5k | +| Go modules | 75k | 5k | +| **Total** | **700k** | **100k** | + +Specific numbers are a stakeholder decision; the rationale per ecosystem must live alongside the JSONB config so the "why these values?" question is answerable later. Avoid proportional-to-ecosystem-size — it amplifies npm dominance, the opposite of what we want. + +#### Spotlight overrides + +A new `package_criticality_spotlight` table keyed on `(ecosystem, namespace, name)` carries required `rationale`, `added_by`, `added_at` columns. Rows in this table are flagged `is_critical = TRUE` regardless of computed score. Applied **after** ranking inside the criticality workflow so spotlights are not overwritten on the next pass. Rationale-per-row is deliberate: the safety net stays auditable as it grows. + +The spotlight exists because the methodology has a known structural blind spot — packages that are critical but rarely depended on in the observable graph (vendored code, build-time-only tools, dependencies pulled outside the registry). No combination of graph signals will surface these; manual curation is the only path. 
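As a rough illustration of the override step, the Python sketch below forces `is_critical` for spotlighted keys after ranking has already produced its flags. It assumes the spotlight rows have been loaded into memory. The `SpotlightRow` class, the `apply_spotlight` helper, and the package names are hypothetical; only the key `(ecosystem, namespace, name)` and the required `rationale`, `added_by`, `added_at` columns come from this ADR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SpotlightRow:
    """One row of package_criticality_spotlight (in-memory stand-in)."""
    ecosystem: str
    namespace: str
    name: str
    rationale: str
    added_by: str
    added_at: datetime

    def __post_init__(self):
        # rationale and added_by are required so every override stays auditable.
        if not self.rationale.strip() or not self.added_by.strip():
            raise ValueError("rationale and added_by are required")


def apply_spotlight(ranked_packages, spotlight_rows):
    """Return copies of the ranked rows with spotlighted keys forced to is_critical=True.

    Must run after ranking: ranking recomputes is_critical from scores,
    so applying the override earlier would be overwritten.
    """
    keys = {(s.ecosystem, s.namespace, s.name) for s in spotlight_rows}
    result = []
    for pkg in ranked_packages:
        key = (pkg["ecosystem"], pkg["namespace"], pkg["name"])
        result.append({**pkg, "is_critical": pkg["is_critical"] or key in keys})
    return result


# Hypothetical data: a build-time-only tool that the graph signals rank low.
ranked = [
    {"ecosystem": "npm", "namespace": "", "name": "example-build-tool",
     "criticality_score": 0.12, "is_critical": False},
    {"ecosystem": "npm", "namespace": "", "name": "example-popular-lib",
     "criticality_score": 0.97, "is_critical": True},
    {"ecosystem": "npm", "namespace": "", "name": "example-minor-lib",
     "criticality_score": 0.05, "is_critical": False},
]
spotlight = [
    SpotlightRow("npm", "", "example-build-tool",
                 rationale="build-time-only tool, invisible to the dependency graph",
                 added_by="hypothetical-reviewer",
                 added_at=datetime(2026, 5, 29, tzinfo=timezone.utc)),
]
final = apply_spotlight(ranked, spotlight)
```

The computed `criticality_score` is left untouched in this sketch, so a spotlighted package remains distinguishable from one that earned its flag through ranking.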
+ +#### Implementation note: in-memory graph computation vs deps.dev ingestion + +The in-memory build of `transitive_dependent_count` and `centrality_score` is a **direct consequence of the §Database placement and §Worker architecture decisions to store only direct dependencies on `package_dependencies`**. Materializing the full transitive closure would be ~1.5B rows; storing might not be viable at this point, so transitive signals must be computed at scoring time. The chosen approach: stream direct edges into memory per ecosystem (~10M nodes / ~100M edges for npm fits in ~2 GB RAM on a single worker box), compute transitive counts via reverse-BFS and PageRank centrality iteratively (damping 0.85, ~100 iterations, converges on `1e-6`), bulk-merge results into `packages_universe`. No graph DB, no distributed framework. + +**Before committing to this implementation, confirm whether deps.dev already provides these signals so we can ingest instead of compute:** + +- **Transitive dependent count** — `DependentsLatest` is the table we already source `dependent_packages_count` from. Verify whether its dependent counts are direct-only or include indirect dependents. If indirect counts are included, the column can be sourced in the existing universe-import job (consistent with how the other dependent counts are already populated) and the in-memory transitive computation is unnecessary. +- **Centrality / importance score** — deps.dev does not appear to expose a PageRank-style score in its current schema. + +If both signals are ingestible, the criticality sub-worker reduces to "call `rank_packages_universe()` with the right weights" — much simpler. If only one is ingestible, the in-memory job still runs but does less work. Either way the §deps.dev coverage and gaps table below must be updated to record what's sourced from where. 
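The in-memory computation described in this note can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the worker implementation. Edges are `(dependent, dependency)` pairs, rank flows from a dependent to its dependencies, and rank held by packages with no dependencies is redistributed uniformly. The ADR only fixes the damping factor (0.85), the iteration budget (~100), and the tolerance (`1e-6`); the function names and the sample graph are hypothetical.

```python
from collections import defaultdict, deque


def build_reverse_adjacency(edges):
    """Map each dependency to the set of packages that depend on it directly."""
    reverse = defaultdict(set)
    for dependent, dependency in edges:
        reverse[dependency].add(dependent)
    return reverse


def transitive_dependent_count(node, reverse):
    """Reverse-BFS from `node`: count every package that reaches it via dependencies."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return len(seen) - 1  # exclude the start node itself


def pagerank(nodes, edges, damping=0.85, max_iter=100, tol=1e-6):
    """Power iteration; stops early once the L1 change drops below `tol`."""
    nodes = list(nodes)
    n = len(nodes)
    outgoing = defaultdict(set)
    for dependent, dependency in edges:
        outgoing[dependent].add(dependency)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: packages with no dependencies spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not outgoing.get(v))
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = outgoing.get(v)
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
        delta = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Hypothetical graph: two apps depend on `lib`, which depends on `core`.
edges = [("app1", "lib"), ("app2", "lib"), ("lib", "core")]
nodes = ["app1", "app2", "lib", "core"]

reverse = build_reverse_adjacency(edges)
transitive = {v: transitive_dependent_count(v, reverse) for v in nodes}
ranks = pagerank(nodes, edges)
```

Here `core` has a single direct dependent but three transitive ones, and it ends up with the highest PageRank, which is the load-bearing-upstream pattern the graph signals are meant to expose. Running a BFS from every node is simple but can be expensive on large graphs; the sketch makes no attempt to optimize it.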
+ +#### Worker layout + +A new directory `services/apps/packages_worker/src/criticality/` with the standard sub-worker layout (`activities.ts`, `workflows.ts`, `schedule.ts`, queries co-located), and `src/bin/criticality-worker.ts` as its entrypoint. Weekly cadence, one workflow per ecosystem, `ScheduleOverlapPolicy.SKIP`. Workflow steps: load graph snapshot → compute transitive counts and PageRank (or skip if ingested from deps.dev) → merge results into `packages_universe` → call `rank_packages_universe()` → apply spotlight overrides → propagate `criticality_score` onto `packages`. + +#### High-level flow + +```mermaid +flowchart TD + A[package_dependencies
<br/>direct edges, partitioned by depends_on_id] -->|stream filtered to<br/>direct kind + latest version| B[Load graph snapshot<br/>per ecosystem]
+    B --> C[Build in-memory<br/>reverse adjacency]
+    C --> D[Transitive dependent count<br/>reverse-BFS per node<br/>OR ingest from deps.dev]
+    C --> E[PageRank centrality<br/>iterative, damping 0.85]
+    D --> F[Merge into<br/>packages_universe]
+    E --> F
+    G[Downloads, direct dependent<br/>packages/repos<br/>existing inputs] --> F
+    F --> H[rank_packages_universe<br/>1. Percentile-rank per ecosystem<br/>2. Weighted blend<br/>3. Per-ecosystem top-N cut]
+    H --> I[Apply spotlight overrides<br/>force is_critical = TRUE]
+    I --> J[Propagate criticality_score<br/>
to packages table] + + style D fill:#e1f5ff + style E fill:#e1f5ff + style I fill:#fff4e1 +``` + +Inputs in blue are new graph-derived signals; the spotlight step in orange is the deliberate safety net for the methodology's structural blind spot. + +#### Additional Decisions + +- **Edge filter**: `dependency_kind = 'direct'` only — exclude `dev` and `peer` (they don't represent runtime blast radius). +- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked`). +- **Graph scope**: per-ecosystem; don't merge ecosystems into a single graph. Cross-ecosystem edges are rare and noisy. +- **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold. +- **Cadence**: weekly, aligned with the existing universe refresh. + +**Weights are expected to change.** The starting weight vector (centrality + transitive heaviest, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. Expect multiple iterations before weights are locked in. + +**Status**: proposed — 2026-05-29. Formula shape, inputs, tier-budget policy, and spotlight table are agreed. Open: (1) whether transitive counts can be sourced from deps.dev before in-memory PageRank work begins; (2) final weight values, which will be tuned against an observable ranked list. + +--- + ### Write semantics across sub-workers Five sub-workers run concurrently (npm, Maven, OSV, GitHub, Docker Hub), all writing to the same `packages-db` schema. 
We define per-table write rules that allow concurrent writes without distributed locking: @@ -211,7 +338,7 @@ A range `(introduced, fixed, last_affected)` matches `latest_version` when: - `fixed IS NULL OR latest_version < fixed`, AND - `last_affected IS NULL OR latest_version <= last_affected`. -This is **option (b)** (latest_version inside an active range), plus a **MAL- override** so malicious-package reports flip the flag regardless of CVSS — the XZ-style maintainer-compromise case from the Osprey memo. ~213k of 220k npm OSV records are `MAL-*` with `cvss = NULL`, so option (b) on its own would miss the dominant security signal. +This is **option (b)** (latest_version inside an active range), plus a **MAL- override** so malicious-package reports flip the flag regardless of CVSS — the XZ-style maintainer-compromise case. ~213k of 220k npm OSV records are `MAL-*` with `cvss = NULL`, so option (b) on its own would miss the dominant security signal. **Why not option (a)** (any critical advisory exists for the package name, regardless of version): option (a) over-reports — a CVE patched in v1.0 flags a package now on v9.0 — and under-reports when an advisory has multiple `affected[]` ranges where only some are patched. The actionable consumer question is "is the version I'd install today vulnerable?", and that's option (b). @@ -396,7 +523,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi ## Open questions / in-flight - **Sonatype Central Stats API access** — not confirmed as of 2026-05-27. If unavailable by day 5, Maven download counts will be absent from the week-2 demo (`downloads_last_month` NULL for Maven rows; disclose to stakeholders). -- **criticality_score formula** — the placeholder formula (`X * downloadsCount + Y * dependentCount`) has not been validated against known critical packages. Final formula is yet to be defined. 
+- **deps.dev coverage for transitive dependents and centrality** — see §Criticality scoring methodology. Verify whether `DependentsLatest` includes indirect dependents before building the in-memory PageRank/BFS job; cheaper to ingest than to compute if it's already there. - **pg_partman + pg_cron setup** — must be confirmed active in the OCI environment before download workers start; `downloads_daily` and `downloads_last_30d` inserts will fail if monthly partitions are not pre-created. --- @@ -405,6 +532,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi - 2026-05-27 — initial record - 2026-05-28 — folded standalone ADR-0003 (`has_critical_vulnerability` semantics), ADR-0005 (CVSS scoring strategy), and ADR-0006 (`advisory_affected_ranges` uniqueness scope) into this living record; standalone files removed. Resolved the prior open question on `has_critical_vulnerability` (option b + MAL- override). ADR-0004 (standalone-bin vs Temporal) was removed before merging — the worker architecture decision in this ADR supersedes it. +- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table). Resolves the prior open question on the placeholder formula. Standalone ADR-0002 was folded into this living record before publication and removed. New open questions added on deps.dev coverage for graph signals and on the `packages_universe` write-semantics row. 
--- From babd89e68b988bd54635c9d55066343bda20c4a2 Mon Sep 17 00:00:00 2001 From: Joana Maia Date: Fri, 29 May 2026 17:53:32 +0100 Subject: [PATCH 2/5] Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Joana Maia --- docs/adr/0001-oss-packages-design-decisions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md index 33e3e68eb8..c97403a86f 100644 --- a/docs/adr/0001-oss-packages-design-decisions.md +++ b/docs/adr/0001-oss-packages-design-decisions.md @@ -206,7 +206,7 @@ Inputs in blue are new graph-derived signals; the spotlight step in orange is th #### Additional Decisions - **Edge filter**: `dependency_kind = 'direct'` only — exclude `dev` and `peer` (they don't represent runtime blast radius). -- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked`). +- **Version resolution**: each package's latest non-yanked, non-prerelease version (uses existing `versions.is_latest` / `is_yanked` / `is_prerelease`). - **Graph scope**: per-ecosystem; don't merge ecosystems into a single graph. Cross-ecosystem edges are rare and noisy. - **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold. - **Cadence**: weekly, aligned with the existing universe refresh. 
From 8dca2f7b4a00675a9fde8822711e14a8934215ba Mon Sep 17 00:00:00 2001 From: Joana Maia Date: Fri, 29 May 2026 17:55:58 +0100 Subject: [PATCH 3/5] docs: fix changelog Signed-off-by: Joana Maia --- docs/adr/0001-oss-packages-design-decisions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md index c97403a86f..4020416122 100644 --- a/docs/adr/0001-oss-packages-design-decisions.md +++ b/docs/adr/0001-oss-packages-design-decisions.md @@ -577,7 +577,7 @@ A package promoted from Tier 3 to Tier 2 (becomes critical) will have rolling-wi - 2026-05-27 — initial record - 2026-05-28 — folded standalone ADR-0003 (`has_critical_vulnerability` semantics), ADR-0005 (CVSS scoring strategy), and ADR-0006 (`advisory_affected_ranges` uniqueness scope) into this living record; standalone files removed. Resolved the prior open question on `has_critical_vulnerability` (option b + MAL- override). ADR-0004 (standalone-bin vs Temporal) was removed before merging — the worker architecture decision in this ADR supersedes it. - 2026-05-29 — clarified `packages_universe` import semantics (one-time backfill + weekly snapshot-diff incrementals; the ranking job updates score/flag columns in place). Added §Source of truth: deps.dev backfill vs registries / OSV with lifecycle ownership rules and the agreed `package_source_log` provenance table (`(package_id, source)` PK; `columns` array tracks `table.column` paths each source writes). -- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table). Resolves the prior open question on the placeholder formula. Standalone ADR-0002 was folded into this living record before publication and removed. 
New open questions added on deps.dev coverage for graph signals and on the `packages_universe` write-semantics row. +- 2026-05-29 — added §Criticality scoring methodology (graph signals — transitive dependent count and PageRank centrality; per-ecosystem percentile-rank formula in `[0, 1]`; floor + ceiling tier budget policy; `package_criticality_spotlight` table). --- ## Note on promotion to production From e361d06c1b1a89dc938e7edd2adbb3473f42d2db Mon Sep 17 00:00:00 2001 From: Joana Maia Date: Fri, 29 May 2026 17:57:55 +0100 Subject: [PATCH 4/5] Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Joana Maia --- docs/adr/0001-oss-packages-design-decisions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md index 4020416122..966ba4b0ca 100644 --- a/docs/adr/0001-oss-packages-design-decisions.md +++ b/docs/adr/0001-oss-packages-design-decisions.md @@ -211,7 +211,7 @@ Inputs in blue are new graph-derived signals; the spotlight step in orange is th - **Score range**: `[0, 1]` (weights sum to 1.0). Score interpretation: weighted average percentile across signals within ecosystem. Tier membership is determined by rank, not by score threshold. - **Cadence**: weekly, aligned with the existing universe refresh. -**Weights are expected to change.** The starting weight vector (centrality + transitive heaviest, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. 
Expect multiple iterations before weights are locked in. +**Weights are expected to change.** The starting weight vector (centrality heaviest, transitive kept low as a sanity floor, downloads and direct dependents lighter) is a judgment-based initial bias, not a validated configuration. Once the ranked list is observable, weights will be tuned based on stakeholder review of which packages land where — particularly at the Tier 1 boundary and for smaller ecosystems. Because weights are call-time numeric parameters to `rank_packages_universe()`, retuning does not require a schema change, code change, or redeploy. Expect multiple iterations before weights are locked in. **Status**: proposed — 2026-05-29. Formula shape, inputs, tier-budget policy, and spotlight table are agreed. Open: (1) whether transitive counts can be sourced from deps.dev before in-memory PageRank work begins; (2) final weight values, which will be tuned against an observable ranked list. From b678dc3af9cb576ead773f87d1fa3940221fc962 Mon Sep 17 00:00:00 2001 From: Joana Maia Date: Fri, 29 May 2026 18:10:22 +0100 Subject: [PATCH 5/5] docs: remove mention to is_tier2 column Signed-off-by: Joana Maia --- docs/adr/0001-oss-packages-design-decisions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/adr/0001-oss-packages-design-decisions.md b/docs/adr/0001-oss-packages-design-decisions.md index 966ba4b0ca..71e075bc9c 100644 --- a/docs/adr/0001-oss-packages-design-decisions.md +++ b/docs/adr/0001-oss-packages-design-decisions.md @@ -15,7 +15,7 @@ The oss-packages domain is being built inside CDP as a new, independent capabili | Database placement | decided | | Worker architecture | decided | | Universe source and critical-package selection | decided | -| Criticality scoring methodology | proposed (weights tunable; deps.dev ingestion vs in-memory pending) | +| Criticality scoring methodology | proposed (weights tunable) | | Write semantics across sub-workers | decided | | Package 
→ repository provenance | decided | | OSV as canonical security source | decided | @@ -145,7 +145,7 @@ These are a starting point, not a recommendation we've validated. They will be r #### Per-ecosystem tier budgets -`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter that ranks within each ecosystem and cuts at top N. Tier 2 reuses the same shape with a second JSONB parameter and a separate `is_tier2` column on `packages_universe`. Tier 1 ⊆ Tier 2 falls out naturally from the same ranking. +`rank_packages_universe()` already takes `critical_top_n_by_ecosystem` as a JSONB parameter that ranks within each ecosystem and cuts at top N. Allocation policy is **floor + ceiling + judgment**: every onboarded ecosystem gets a minimum (the floor — guarantees representation regardless of size), no single ecosystem exceeds a percentage of the total (the ceiling — prevents npm from swallowing the list). Illustrative values for a 700k Tier 2 budget: