Utilities for downloading OSV data, enriching vulnerabilities with a recidivism metric, and cloning referenced source repositories locally.
Copy the default config and edit your local paths:

    cp recidivism.default.ini recidivism.ini

Both scripts read settings from `recidivism.ini`. If that file is missing, the scripts print guidance and fall back to `recidivism.default.ini`.
Configuration options in the `.ini` file can be overridden at runtime from the command line (see the examples below).
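The fallback and override behavior described above can be sketched with the standard library's `configparser` and `argparse`. This is a minimal illustration, not the scripts' actual code: the `[osv]` section and `osv_dump_url` option are hypothetical names (only the `--osv-dump-url` flag appears in this README), and the sketch assumes each flag maps to an option of the same name with dashes in place of underscores.

```python
import argparse
import configparser
import sys
from pathlib import Path


def load_config(primary="recidivism.ini", default="recidivism.default.ini"):
    """Read the local config, falling back to the shipped default if it is missing."""
    primary, default = Path(primary), Path(default)
    # interpolation=None so URLs containing '%' are read verbatim.
    config = configparser.ConfigParser(interpolation=None)
    if primary.exists():
        config.read(primary)
    else:
        # Mirror the documented behavior: print guidance, then use the default file.
        print(
            f"{primary} not found; copy {default} to {primary} and edit your paths. "
            f"Falling back to {default}.",
            file=sys.stderr,
        )
        config.read(default)
    return config


def apply_overrides(config, section, argv=None):
    """Let --option-name=value flags override options in one config section."""
    parser = argparse.ArgumentParser()
    # One flag per option already in the section (hypothetical mapping:
    # osv_dump_url -> --osv-dump-url).
    for option in config[section]:
        parser.add_argument("--" + option.replace("_", "-"), dest=option)
    args = parser.parse_args(argv)
    for option, value in vars(args).items():
        if value is not None:  # only flags actually given on the command line
            config[section][option] = value
    return config
```

A script would call `apply_overrides(load_config(), "osv")` at startup, so command-line values win over the file and the file wins over the defaults.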

    python scripts/dump_osv.py

This script:
- downloads the OSV dump (`OSV-all.zip` by default),
- extracts all vulnerabilities,
- computes a recidivism metric using CWE recurrence and repository/fix history,
- appends recidivism details to each vulnerability and writes JSONL output.
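A minimal sketch of the extract-and-append flow, assuming the dump is a zip holding one OSV JSON record per file. The CWE lookup reads `database_specific.cwe_ids`, which GitHub-sourced advisories carry but other databases may not. The `recidivism` field and its `cwe_recurrence` contents are hypothetical placeholders that only count how often each CWE appears across the dump; the real metric also uses repository and fix history, which this sketch omits.

```python
import json
import zipfile
from collections import Counter


def iter_vulnerabilities(zip_path):
    """Yield each OSV record stored as a .json file inside the dump."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    yield json.load(handle)


def cwe_ids(vuln):
    """CWE ids, where the source database records them (e.g. GHSA entries)."""
    return (vuln.get("database_specific") or {}).get("cwe_ids", [])


def write_enriched_jsonl(zip_path, out_path):
    """Attach a placeholder recidivism block to every record and write JSONL."""
    vulns = list(iter_vulnerabilities(zip_path))
    # Hypothetical stand-in metric: how often each CWE recurs across the dump.
    cwe_counts = Counter(cwe for v in vulns for cwe in cwe_ids(v))
    with open(out_path, "w", encoding="utf-8") as out:
        for vuln in vulns:
            vuln["recidivism"] = {
                "cwe_recurrence": {cwe: cwe_counts[cwe] for cwe in cwe_ids(vuln)},
            }
            out.write(json.dumps(vuln) + "\n")  # one record per line
    return len(vulns)
```

Loading the whole dump into memory keeps the counting step simple; a streaming version would need two passes over the zip.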
To override these settings at runtime, pass them on the command line. For example, to use the RubyGems-only OSV dump:

    python scripts/dump_osv.py --osv-dump-url=https://storage.googleapis.com/osv-vulnerabilities/RubyGems/all.zip

(Leaving this documentation in for now until we add this step back in)

    python scripts/clone_osv_repositories.py \
      --osv-dir data/osv_dump \
      --target-dir data/repos \
      --update-existing

This script scans OSV vulnerabilities for GitHub source references and clones or updates local copies for research workflows, organized as `<target-dir>/<owner>/<repo>`.
This command runs the script and removes empty directories without user prompts.
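Extracting GitHub repositories from OSV `references` and choosing clone targets might look like the sketch below. The regex, the skipped first path segments, the `git` invocations used for `--update-existing`, and the empty-directory pruning are all assumptions for illustration, not taken from `clone_osv_repositories.py`.

```python
import os
import re
from pathlib import Path

# Matches https://github.com/<owner>/<repo>; anything after the repo segment
# (e.g. /commit/<sha>, /pull/<n>) is ignored.
GITHUB_RE = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")
# First path segments that are GitHub pages, not repository owners (illustrative list).
NON_OWNERS = {"advisories", "orgs", "sponsors"}


def github_repos(vuln):
    """Return the (owner, repo) pairs named in an OSV record's references."""
    repos = set()
    for ref in vuln.get("references", []):
        match = GITHUB_RE.match(ref.get("url", ""))
        if not match or match.group(1) in NON_OWNERS:
            continue
        owner, repo = match.groups()
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        repos.add((owner, repo))
    return repos


def git_command(target_dir, owner, repo, update_existing):
    """Build the git command for one repo laid out as <target-dir>/<owner>/<repo>."""
    path = Path(target_dir) / owner / repo
    if path.exists():
        # Existing clone: fast-forward it only when updates were requested.
        return ["git", "-C", str(path), "pull", "--ff-only"] if update_existing else None
    return ["git", "clone", f"https://github.com/{owner}/{repo}.git", str(path)]


def prune_empty_dirs(root):
    """Delete empty directories under root, deepest first, without prompting."""
    removed = []
    # topdown=False visits children before parents, so a parent emptied by
    # removing its children is itself removed in the same pass.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != os.fspath(root) and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

A caller would pass each non-`None` command to `subprocess.run(cmd, check=True)` and then call `prune_empty_dirs` on the target directory.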

    python scripts/collect_recidivism.py

This script:
- scans `OSV-all.zip` for all vulnerabilities in its JSON files,
- calculates a recidivism score for each vulnerability where data is available,
- updates the JSONL file set by the `scores.output` config option with severity scores,
- (re)generates individual JSON files at `data/scores/<vulnerability_id>.json` containing:
  - recidivism string (TBD)
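The last two steps, merging scores into the JSONL file and writing per-vulnerability files, could be sketched as follows. The `recidivism` field name and the payload layout are hypothetical, since the file contents are still marked TBD. The JSONL is rewritten to a temporary file and swapped in with `os.replace`, so an interrupted run does not leave a half-written output.

```python
import json
import os
from pathlib import Path


def update_jsonl(jsonl_path, scores):
    """Merge {vulnerability_id: score} into a JSONL file, replacing it atomically."""
    path = Path(jsonl_path)
    tmp = path.with_name(path.name + ".tmp")
    with path.open(encoding="utf-8") as src, tmp.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if record.get("id") in scores:
                record["recidivism"] = scores[record["id"]]  # hypothetical field name
            dst.write(json.dumps(record) + "\n")
    os.replace(tmp, path)  # swap in the new file only after it is fully written


def write_score_files(scores, scores_dir="data/scores"):
    """Write one <scores_dir>/<vulnerability_id>.json file per scored vulnerability."""
    out_dir = Path(scores_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for vuln_id, score in scores.items():
        path = out_dir / f"{vuln_id}.json"
        payload = {"id": vuln_id, "recidivism": score}  # hypothetical layout
        path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
        written.append(path)
    return written
```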