NanoMEI

Build sample-specific diploid genomes with non-reference mobile element insertions from targeted long-read TE-capture data.

Overview

NanoMEI is a post-processing tool for TEnCATS and Fiber-TEnCATS datasets analyzed with NanoPal.

Starting from NanoPal mobile element insertion calls, NanoMEI extracts insertion-supporting sequence from read softclips or BLAST-defined read segments, groups reads that support the same insertion event, and builds a consensus sequence for each MEI. It then writes an MEI VCF (optionally phased) and can use vcf2diploid to augment the reference genome with the reconstructed non-reference insertions.
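
The grouping step described above can be illustrated with a minimal Python sketch. It is not NanoMEI's implementation: the `(read_name, event_id)` record shape, the `event_id` strings, and the function name `group_reads_by_event` are hypothetical and chosen only for illustration. The threshold mirrors the documented `--min-reads-support` option (default 10).

```python
from collections import defaultdict


def group_reads_by_event(records, min_reads_support=10):
    """Group read names by insertion event and drop weakly supported events.

    records: iterable of (read_name, event_id) pairs. The event_id key is a
    hypothetical stand-in for however insertion events are identified.
    """
    groups = defaultdict(list)
    for read_name, event_id in records:
        groups[event_id].append(read_name)
    # Keep only events with enough supporting reads to attempt a consensus.
    return {
        event: reads
        for event, reads in groups.items()
        if len(reads) >= min_reads_support
    }


# Toy input: two reads support one event, a single read supports another.
records = [
    ("read1", "chr1:1000"),
    ("read2", "chr1:1000"),
    ("read3", "chr7:5000"),
]
supported = group_reads_by_event(records, min_reads_support=2)
print(supported)
```

With the threshold lowered to 2 for this toy input, only the event supported by two reads is kept; with the default of 10, neither event would pass.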

Installation

All dependencies can be installed with the provided conda environment:

conda env create -f environment.yml
conda activate NanoMEI
pip install -e .

Running tests

To check that the installation works, run the test suite from the repository root:

pytest -v
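
As a complementary check, the following Python sketch looks for the `reconstruct-mei` command on the current `PATH`. It assumes that `pip install -e .` registers `reconstruct-mei` as a console command in the active environment; the helper name `find_cli` is hypothetical.

```python
import shutil


def find_cli(name="reconstruct-mei"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


path = find_cli()
if path:
    print(f"reconstruct-mei found at {path}")
else:
    print("reconstruct-mei not found on PATH; is the NanoMEI environment active?")
```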

Usage

Please first run NanoPal on your (Fiber-)TEnCATS data to identify captured non-reference TEs. Once you have the NanoPal output files summary.final.PALMER.TE.read.txt and blastn_refine.all.txt, you can use NanoMEI to reconstruct MEI sequences and augment your reference genome.

reconstruct-mei \
  --final-summary-palmer summary.final.PALMER.TE.read.txt \
  --blastn-refine blastn_refine.all.txt \
  --bam-file reads.hg38.bam \
  --output-vcf sample.MEI.vcf \
  --reference-genome-fasta hg38.fa \
  --sample-id sample 
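
When NanoMEI is run for many samples, it can help to assemble this command programmatically. The sketch below builds the argument list in Python using only flags documented in this README; the function name `build_reconstruct_mei_command` and its keyword names are hypothetical, and the file names are the placeholders from the example above.

```python
import shlex


def build_reconstruct_mei_command(final_summary_palmer, blastn_refine, bam_file,
                                  output_vcf, reference_genome_fasta, sample_id,
                                  min_reads_support=None, output_dir=None,
                                  use_blast_defined_only=False):
    """Return the reconstruct-mei invocation as a list of arguments."""
    cmd = [
        "reconstruct-mei",
        "--final-summary-palmer", final_summary_palmer,
        "--blastn-refine", blastn_refine,
        "--bam-file", bam_file,
        "--output-vcf", output_vcf,
        "--reference-genome-fasta", reference_genome_fasta,
        "--sample-id", sample_id,
    ]
    # Optional flags are added only when the caller sets them,
    # so the tool's own defaults apply otherwise.
    if min_reads_support is not None:
        cmd += ["--min-reads-support", str(min_reads_support)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if use_blast_defined_only:
        cmd.append("--use-blast-defined-only")
    return cmd


cmd = build_reconstruct_mei_command(
    "summary.final.PALMER.TE.read.txt",
    "blastn_refine.all.txt",
    "reads.hg38.bam",
    "sample.MEI.vcf",
    "hg38.fa",
    "sample",
    min_reads_support=5,
    use_blast_defined_only=True,
)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.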

| Argument | Required | Description |
| --- | --- | --- |
| --final-summary-palmer | Yes | Path to the NanoPal/PALMER summary file, usually named summary.final.PALMER.TE.read.txt. This file provides read-level evidence for captured non-reference TE insertions. |
| --blastn-refine | Yes | Path to the NanoPal BLAST refinement output, usually named blastn_refine.all.txt. This file is used to determine which portion of each read corresponds to the mobile element insertion. |
| --bam-file | Yes | BAM file containing nanopore reads aligned to the reference genome. NanoMEI uses this file to extract insertion-supporting read sequence from softclips or BLAST-defined read segments. |
| --output-vcf | Yes | Path/name for the final MEI VCF produced by NanoMEI. |
| --reference-genome-fasta | Yes | Reference genome FASTA used by vcf2diploid when creating the sample-specific diploid genome. |
| --sample-id | Yes | Sample identifier to use in the final VCF and vcf2diploid output. |
| --vcf2diploid-jar | No | Path to the vcf2diploid.jar file. A copy is bundled in the resources subdirectory; supply a different path to use another version. |
| --min-reads-support | No | Minimum number of reads required to build a consensus sequence for an insertion event. Default: 10. |
| --phased-reads | No | Optional file with read-level haplotype assignments. The first column must contain the read name matching the FASTA header, and the second column must contain the phase (`1 |
| --output-dir | No | Directory for intermediate files and vcf2diploid output. Default: TE_vcf. |
| --vcf-header | No | Path to the VCF header file. The provided default is a minimal header that can be used with any reference genome. An example of a more detailed header is also provided for hg38; use a different header if working with another reference. |
| --use-blast-defined-only | No | If set, NanoMEI extracts only the BLAST-defined read segment instead of the full trimmed softclip. |
| -v, --version | No | Print the NanoMEI version and exit. |
| -h, --help | No | Print the help message and exit. |
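
The two-column layout expected by `--phased-reads` can be sketched in Python as follows. This is an illustration under stated assumptions: the tab delimiter, the helper names `write_phased_reads` and `read_phased_reads`, and the read names are hypothetical, and the set of accepted phase values should be taken from `reconstruct-mei --help` rather than from this example.

```python
import csv
import os
import tempfile


def write_phased_reads(assignments, path):
    """Write (read_name, phase) pairs as two columns, one read per line.

    Assumption: columns are tab-separated; check the tool's help output.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for read_name, phase in assignments:
            writer.writerow([read_name, phase])


def read_phased_reads(path):
    """Load the file back into a {read_name: phase} dictionary."""
    phases = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            phases[row[0]] = row[1]
    return phases


# Hypothetical read names; phases are kept as strings exactly as written.
assignments = [("read_0001", "1"), ("read_0002", "1")]
with tempfile.TemporaryDirectory() as tmp_dir:
    phased_path = os.path.join(tmp_dir, "phased_reads.tsv")
    write_phased_reads(assignments, phased_path)
    loaded = read_phased_reads(phased_path)
print(loaded)
```

Reading the file back before passing it to NanoMEI is a cheap way to confirm that every read name appears once and carries a phase.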

Citation

If you use this repository, please cite:

Fiber-TEnCATS reveals haplotype-specific chromatin accessibility and DNA methylation at human L1HS loci
