Improved SNP Annotation Parser for SnpEff-Annotated VCF Files

Overview

This Python script extracts and summarizes variant annotations from a SnpEff-annotated VCF file. It was developed as an extension of the variant annotation practical to provide a more flexible and informative method for investigating variants within a genomic region of interest.

Unlike the original practical script, which only reports missense variants and prints results directly to the terminal, this implementation:

Produces a structured tab-delimited output file.
Extracts additional annotation information including impact level, gene name, transcript information and HGVS nomenclature.
Allows filtering by variant effect (e.g. missense variants only).
Allows filtering by impact category (HIGH, MODERATE, LOW).
Can restrict output to known dbSNP variants (those with an rs identifier).
Generates output suitable for downstream analysis in Excel, R or Python.

Requirements

Python 3.7 or later.

No external Python packages are required.

Input Files

The script expects a VCF file that has already been annotated using SnpEff.

Example:

ldlr_filtered_snps_final_ann_dbsnp.vcf

Usage

Basic usage:

python3 improved_parse_vcf_annotations.py \
ldlr_filtered_snps_final_ann_dbsnp.vcf \
11089462 \
11133820

Arguments:

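Argument	Description
vcf_file	Annotated VCF file
gene_start	Start coordinate of target gene
gene_end	End coordinate of target gene
--effect	Keep only variants with the given effect (e.g. missense_variant)
--impact	Keep only variants in the given impact category (HIGH, MODERATE or LOW)
--only-rs	Keep only known dbSNP variants
-o	Output file name (default: improved_snp_annotations.tsv)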
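The region check implied by gene_start and gene_end can be sketched as below. This is a minimal illustration rather than the script's actual code: the function name in_region is hypothetical, the bounds are assumed to be inclusive, and the two records are made up purely to exercise the check.

```python
def in_region(vcf_line: str, gene_start: int, gene_end: int) -> bool:
    """Return True if a VCF data line lies within [gene_start, gene_end].

    Assumption: both bounds are inclusive; the real script may differ.
    """
    if vcf_line.startswith("#"):
        # Meta-information and header lines do not describe a variant.
        return False
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])  # the second VCF column is the 1-based POS
    return gene_start <= pos <= gene_end


# Two invented records (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
inside = "19\t11100000\t.\tC\tT\t50\tPASS\tDP=30"
outside = "19\t11200000\t.\tG\tA\t50\tPASS\tDP=30"

print(in_region(inside, 11089462, 11133820))
print(in_region(outside, 11089462, 11133820))
```

Comparing positions as integers matters here: comparing them as strings would give wrong results for coordinates with different numbers of digits.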

Output

By default the script creates the following file (a different name can be given with -o):

improved_snp_annotations.tsv

The output table contains the following columns:

Chromosome
Position
Variant ID
Reference allele
Alternate allele
Quality score
Filter status
Read depth
Allele frequency
Variant effect
Impact level
Gene name
Transcript ID
Transcript biotype
HGVS cDNA annotation
HGVS protein annotation
Protein position
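Most of the annotation columns above come from the ANN entry that SnpEff writes into the INFO column, where sub-fields are separated by "|" and multiple annotations by ",". The sketch below shows one way to unpack it. The sub-field order follows the SnpEff ANN format; the key names, the helper names parse_info and first_annotation, the choice of reading only the first annotation, and the example record (including its DP and AF keys and placeholder IDs) are all assumptions made for illustration, not a description of the script itself.

```python
# Key names are illustrative; the order follows the SnpEff ANN sub-fields.
ANN_KEYS = [
    "allele", "effect", "impact", "gene_name", "gene_id", "feature_type",
    "transcript_id", "transcript_biotype", "rank", "hgvs_c", "hgvs_p",
    "cdna_pos", "cds_pos", "protein_pos", "distance", "messages",
]


def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def first_annotation(info: str) -> dict:
    """Return the first ANN entry as a dict, or {} if there is none."""
    ann = parse_info(info).get("ANN")
    if not isinstance(ann, str) or not ann:
        return {}
    first = ann.split(",")[0]  # annotations for a variant are comma-separated
    return dict(zip(ANN_KEYS, first.split("|")))


# Invented INFO string with placeholder gene and transcript identifiers.
info = (
    "DP=42;AF=0.5;ANN=T|missense_variant|MODERATE|GENE1|GENE1_ID|transcript|"
    "TX1|protein_coding|4/18|c.100C>T|p.Arg34Cys|187/5173|100/2583|34/860||"
)
ann = first_annotation(info)
print(ann["effect"], ann["impact"], ann["hgvs_c"], ann["hgvs_p"])
```

A variant can carry several annotations (one per affected transcript), so a fuller version might loop over every comma-separated entry instead of keeping only the first.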

Examples

Extract all variants within LDLR

python3 improved_parse_vcf_annotations.py \
ldlr_filtered_snps_final_ann_dbsnp.vcf \
11089462 \
11133820

Extract missense variants only

python3 improved_parse_vcf_annotations.py \
ldlr_filtered_snps_final_ann_dbsnp.vcf \
11089462 \
11133820 \
--effect missense_variant \
-o LDLR_missense.tsv

Extract HIGH impact variants only

python3 improved_parse_vcf_annotations.py \
ldlr_filtered_snps_final_ann_dbsnp.vcf \
11089462 \
11133820 \
--impact HIGH \
-o LDLR_high_impact.tsv

Extract known dbSNP variants only

python3 improved_parse_vcf_annotations.py \
ldlr_filtered_snps_final_ann_dbsnp.vcf \
11089462 \
11133820 \
--only-rs \
-o LDLR_known_rs.tsv
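The options used in the examples above could be declared and applied roughly as follows. This is a sketch under stated assumptions, not the script's source: the long form --output, the help texts and the helper names build_parser and keep_variant are hypothetical, and treating an "rs" prefix in the ID column as the test for a known dbSNP variant is inferred from the flag name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional arguments and options shown in the examples."""
    parser = argparse.ArgumentParser(
        description="Summarise SnpEff annotations within a region")
    parser.add_argument("vcf_file", help="SnpEff-annotated VCF file")
    parser.add_argument("gene_start", type=int, help="region start")
    parser.add_argument("gene_end", type=int, help="region end")
    parser.add_argument("--effect", default=None,
                        help="keep only this effect, e.g. missense_variant")
    parser.add_argument("--impact", default=None,
                        help="keep only this impact category, e.g. HIGH")
    parser.add_argument("--only-rs", action="store_true",
                        help="keep only variants with an rs identifier")
    # The long option name --output is an assumption; the README shows only -o.
    parser.add_argument("-o", "--output",
                        default="improved_snp_annotations.tsv")
    return parser


def keep_variant(variant_id: str, effect: str, impact: str,
                 args: argparse.Namespace) -> bool:
    """Apply the optional filters to one annotated variant."""
    if args.only_rs and not variant_id.startswith("rs"):
        return False
    # SnpEff joins multiple effects for one annotation with '&'.
    if args.effect and args.effect not in effect.split("&"):
        return False
    if args.impact and impact != args.impact:
        return False
    return True


args = build_parser().parse_args(
    ["in.vcf", "11089462", "11133820", "--impact", "HIGH", "--only-rs"])
print(keep_variant("rs123", "stop_gained", "HIGH", args))
print(keep_variant(".", "stop_gained", "HIGH", args))
```

Keeping the filter logic in a separate function makes each option easy to test without reading a VCF file.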

Comparison with Original Practical Script

Feature	Original Script	Improved Script
Region filtering	✓	✓
Missense variant detection	✓	✓
Transcript extraction	✓	✓
HGVS extraction	✓	✓
Terminal output	✓	✓
Structured output file	✗	✓
Impact filtering	✗	✓
dbSNP filtering	✗	✓
Additional annotations	✗	✓
Downstream analysis ready	✗	✓

Potential Future Improvements

Possible extensions include:

Export to CSV and Excel formats.
Functional consequence prioritisation.
ClinVar annotation support.
Protein domain annotation.
PolyPhen and SIFT score integration.
Automatic generation of summary statistics and plots.
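As an indication of how summary statistics might be built on the tab-delimited output, the sketch below counts rows per value of one column using only the standard library. The column header "Impact" and the demo rows are hypothetical, since the exact header names of the output file are not listed here, and count_by_column is an illustrative helper rather than part of the script.

```python
import csv
import io
from collections import Counter


def count_by_column(tsv_handle, column: str) -> Counter:
    """Count how often each value occurs in one column of a TSV file."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# In-memory stand-in for an output file; the header names are assumptions.
demo = io.StringIO(
    "Position\tImpact\n"
    "11100000\tMODERATE\n"
    "11100500\tHIGH\n"
    "11101000\tMODERATE\n"
)
counts = count_by_column(demo, "Impact")
for value, n in counts.most_common():
    print(value, n)
```

With a real output file, the StringIO object would be replaced by an open file handle, for example one opened with open("improved_snp_annotations.tsv", newline="").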

Author

Edward Ying | Imperial College London, Biology

Developed as an extension of a variant annotation practical to improve exploration and interpretation of SnpEff-annotated VCF files.
