virusdetect
This function detects viral sequences in sequencing data.
Note
This function calls Centrifuge and is intended only for WGS data.
Kim, Daehwan, et al. “Centrifuge: rapid and sensitive classification of metagenomic sequences.” Genome research 26.12 (2016): 1721-1729.
Parameters
virusdetect(seqInput1=None, seqInput2=None,
ref=None, outputdir=None,
threads=1, paired=True,
other_params={"-q": True, "-N": 1, "--time": True},
stepNum=None, upstream=None,)
seqInput1: list, input _1 fastq files.
seqInput2: list, input _2 fastq files, None for single end.
ref: str, Centrifuge reference (index) path.
outputdir: str, output result folder, None means the same folder as input files.
threads: int, number of threads to use.
paired: bool, True for paired-end data, False for single-end data.
other_params: dict, other parameters passed to Centrifuge.
A value of True emits the flag alone ("-parameter": True becomes "-parameter" on the command line); any other value is emitted after the flag ("-parameter": 1 becomes "-parameter 1").
stepNum: int or str, step flag for folder name.
upstream: output of an upstream step, used when chaining steps in a pipeline. This parameter can also be True, which starts a new pipeline.
verbose: bool, True prints all stdout but is slower; False suppresses stdout and is much faster.
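The other_params convention described above can be illustrated with a minimal sketch. The helper name params_to_args is hypothetical and is not part of cfDNApipe; it only mirrors the documented rule (True emits the flag alone, any other value follows the flag). Skipping False and None values is an assumption made for this sketch, since the documentation does not say how they are handled.

```python
def params_to_args(other_params):
    """Flatten an other_params-style dict into command-line tokens.

    Hypothetical helper for illustration only; not cfDNApipe's own code.
    """
    args = []
    for flag, value in other_params.items():
        if value is True:
            # "-parameter": True  ->  "-parameter"
            args.append(flag)
        elif value is False or value is None:
            # Assumption: a disabled flag is simply left out.
            continue
        else:
            # "-parameter": 1  ->  "-parameter 1"
            args.extend([flag, str(value)])
    return args


default_params = {"-q": True, "-N": 1, "--time": True}
print(" ".join(params_to_args(default_params)))  # prints: -q -N 1 --time
```

The identity check `value is True` is used so that the integer 1 is treated as a value rather than as a bare flag.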
Example usage:
from cfDNApipe import *
import glob
pipeConfigure(
threads=20,
genome="hg19",
refdir=r"path_to_reference/hg19",
outdir=r"path_to_output/virus_output",
data="WGS",
type="paired",
build=True,
JavaMem="10g",
)
# Download and Build Virus Genome
Configure.virusGenomeCheck(folder="path_to_reference/virus_database", build=True)
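# paired-end data; glob returns files in arbitrary order,
# so sort both lists to keep mates aligned by index
fq1 = sorted(glob.glob("path_to_unmapped/*.fq.1.gz"))
fq2 = sorted(glob.glob("path_to_unmapped/*.fq.2.gz"))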
virusdetect(seqInput1=fq1, seqInput2=fq2, upstream=True)
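Because seqInput1 and seqInput2 are parallel lists, it may be worth checking that every _1 file has a matching _2 file before calling virusdetect. The following is a minimal sketch of such a check. The helper pair_fastq is hypothetical (not part of cfDNApipe), and the ".fq.1.gz" / ".fq.2.gz" suffixes are taken from the example above as an assumption about file naming.

```python
import os


def pair_fastq(fq1_files, fq2_files, tag1=".fq.1.gz", tag2=".fq.2.gz"):
    """Return (mate1_list, mate2_list) sorted by sample name.

    Hypothetical helper for illustration only. Raises ValueError if a
    file lacks the expected suffix or if any sample is missing a mate.
    """

    def sample_key(path, tag):
        name = os.path.basename(path)
        if not name.endswith(tag):
            raise ValueError(f"{path} does not end with {tag}")
        return name[: -len(tag)]

    mates1 = {sample_key(p, tag1): p for p in fq1_files}
    mates2 = {sample_key(p, tag2): p for p in fq2_files}

    # Samples present in only one of the two lists have no mate.
    unpaired = sorted(set(mates1) ^ set(mates2))
    if unpaired:
        raise ValueError(f"samples without a mate: {unpaired}")

    keys = sorted(mates1)
    return [mates1[k] for k in keys], [mates2[k] for k in keys]


# Demo with in-memory file names (no files are read).
fq1, fq2 = pair_fastq(
    ["unmapped/b.fq.1.gz", "unmapped/a.fq.1.gz"],
    ["unmapped/a.fq.2.gz", "unmapped/b.fq.2.gz"],
)
print(list(zip(fq1, fq2)))
```

The two returned lists could then be passed as seqInput1 and seqInput2.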