rmduplicate
This function removes duplicate reads from WGS (whole-genome sequencing) BAM files.
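To show what "duplicate" means here, the sketch below is a simplified, single-end model of the idea behind duplicate removal. It is not rmduplicate's or GATK's actual implementation. Reads that share the same chromosome, 5' position, and strand are treated as copies of one original fragment, and only the copy with the highest summed base quality is kept. The `Read` record and `remove_duplicates` helper are hypothetical. GATK's MarkDuplicates additionally considers mate positions, clipping, and library information.

```python
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record, for illustration only."""
    name: str
    chrom: str
    pos: int                # 5' alignment position
    strand: str             # "+" or "-"
    quals: Tuple[int, ...]  # per-base Phred qualities


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand) key.

    Among reads that share a key, keep the one with the highest summed
    base quality; on a tie, the first read seen is kept.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        score = sum(read.quals)
        if key not in best or score > best[key][0]:
            best[key] = (score, read)
    return [read for _, read in best.values()]
```

For example, two reads at chr1:100 on the "+" strand collapse to the higher-quality one. A read at the same position on the "-" strand is kept, because strand is part of the key.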
Note
This function calls GATK, so please install GATK before using it.
McKenna, Aaron, et al. “The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.” Genome Research 20.9 (2010): 1297-1303.
Parameters
rmduplicate(bamInput=None, outputdir=None,
Xmx="4G", threads=1, stepNum=None,
upstream=None, verbose=True)
bamInput: list, input BAM files.
outputdir: str, output folder; None means results are written to the same folder as the input files.
Xmx: memory to allocate for each thread, default: 4G.
threads: int, number of threads to use.
stepNum: int or str, step flag for folder name.
upstream: upstream output results, used for pipeline.
verbose: bool, True prints all stdout output but runs slower; False suppresses stdout output and runs much faster.
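The sketch below suggests how bamInput, outputdir, and Xmx could map onto a single GATK call. It uses GATK4's documented MarkDuplicates arguments (`-I`, `-O`, `-M`, `--REMOVE_DUPLICATES`) and the `--java-options` wrapper flag. The helper name `build_markdup_command` and the `-rmdup` output naming are hypothetical assumptions and are not taken from rmduplicate's source. The code only builds the argument list and never runs GATK.

```python
import os


def build_markdup_command(bam, outputdir=None, xmx="4G"):
    """Hypothetical helper: build a GATK MarkDuplicates command for one BAM.

    outputdir=None writes results next to the input file, mirroring the
    documented behaviour of the outputdir parameter.
    """
    if outputdir is None:
        outputdir = os.path.dirname(os.path.abspath(bam))
    prefix = os.path.splitext(os.path.basename(bam))[0]
    # Assumed file naming, for illustration only.
    out_bam = os.path.join(outputdir, prefix + "-rmdup.bam")
    metrics = os.path.join(outputdir, prefix + "-rmdup.txt")
    return [
        "gatk", "--java-options", "-Xmx" + xmx,  # JVM heap limit (Xmx)
        "MarkDuplicates",
        "-I", bam,
        "-O", out_bam,
        "-M", metrics,                            # duplication metrics report
        "--REMOVE_DUPLICATES", "true",            # drop duplicates instead of flagging them
    ]
```

The resulting list could be passed to `subprocess.run`. Building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.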
Example usage:
bams = ["test1.bam", "test2.bam"]
rmduplicate(bamInput=bams)