rmduplicate
===========

This function removes duplicate reads from WGS data.

.. note::

    This function calls GATK, so please install GATK before using it. Refer to the official GATK documentation for details.

    McKenna, Aaron, et al. "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data." *Genome Research* 20.9 (2010): 1297-1303.

Parameters
~~~~~~~~~~

.. code:: python

    rmduplicate(bamInput=None, outputdir=None, threads=1, stepNum=None, upstream=None, verbose=True)

- bamInput: list, input BAM files.
- outputdir: str, output folder for results; None means the same folder as the input files.
- Xmx: amount of memory used by each thread, default: 4G.
- threads: int, number of threads to use.
- stepNum: int or str, step flag used in the output folder name.
- upstream: output of an upstream step, used when chaining steps into a pipeline.
- verbose: bool, True prints all stdout output but is slow; False suppresses stdout output and is much faster.

Example usage:

.. code:: python

    bams = ["test1.bam", "test2.bam"]
    rmduplicate(bamInput=bams)
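
When many samples live in one folder, the ``bamInput`` list can be built programmatically instead of typed by hand. The sketch below assumes all input BAM files sit directly inside a single directory; ``collect_bams`` and the folder names are hypothetical and are not part of the package documented here.

.. code:: python

    from pathlib import Path


    def collect_bams(folder):
        """Return the sorted paths of all .bam files directly inside ``folder``.

        Hypothetical helper for building the ``bamInput`` list; index files
        such as ``sample.bam.bai`` are not matched by the ``*.bam`` pattern.
        """
        return sorted(str(p) for p in Path(folder).glob("*.bam"))


    # Usage (assumes rmduplicate has already been imported from the package):
    # bams = collect_bams("./bam_files")
    # rmduplicate(bamInput=bams, outputdir="./rmdup_output", threads=4)

Sorting the paths keeps the input order stable across runs, which makes output folders easier to compare between repeated analyses.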