A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data
Brett Trost, Susan Walker, Zhuozhi Wang, Bhooma Thiruvahindrapuram, Jeffrey R. MacDonald, Wilson W.L. Sung, Sergio L. Pereira, Joe Whitney, Ada J.S. Chan, Giovanna Pellecchia, Miriam S. Reuter, Si Lok, Ryan K.C. Yuen, Christian R. Marshall, Daniele Merico, Stephen W. Scherer
The American Journal of Human Genetics, Volume 102, Issue 1, Pages 142-155 (January 2018)
DOI: 10.1016/j.ajhg.2017.12.007
Copyright © 2017 American Society of Human Genetics
Figure 1 Overview of the Three Stages of This Study In stage 1 (“algorithm selection”), three WGS datasets and corresponding CNV benchmarks (HuRef,3,4,28 NA12878,34 and AK135) were used to assess the accuracy of six read depth-based CNV-detection algorithms—Canvas, cn.MOPS, CNVnator, ERDS, Genome STRiP, and RDXplorer. In stage 2 (“workflow development”), other factors influencing CNV detection were evaluated in the context of the most accurate algorithms identified in stage 1. Based on results from the first two stages, we propose a comprehensive workflow for detecting CNVs from short-read WGS data. In stage 3 (“workflow evaluation”), we show that our workflow can accurately identify clinically relevant CNVs. Green parallelograms represent data, and gray rectangles represent actions. The blue shape represents the CNV detection workflow developed from the results of the first two stages. The American Journal of Human Genetics 2018 102, 142-155DOI: (10.1016/j.ajhg.2017.12.007) Copyright © 2017 American Society of Human Genetics Terms and Conditions
Figure 2. Overlap in the CNVs Detected by the Six Algorithms
The bottom-left bar chart shows the number of CNVs identified by each algorithm. The remainder shows the number of CNVs detected by various intersections of the algorithms; for instance, the far-left bar for deletions represents the number of CNVs detected by RDXplorer only, while the far-right bar represents deletions detected by Canvas, cn.MOPS, CNVnator, and RDXplorer but not ERDS or Genome STRiP. Due to the log scale, zero-height bars represent a count of 1.
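The intersection counts described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that calls from different algorithms have already been reconciled to shared CNV identifiers (the matching criterion is not stated in this caption), and the function name `exclusive_intersection_counts` is hypothetical. Each CNV is counted once, under the exact set of algorithms that detected it, which is why a bar such as "RDXplorer only" excludes CNVs also found by any other algorithm.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersection_counts(calls: Dict[str, Set[str]]) -> Counter:
    """Count CNVs by the exact set of algorithms that detected them.

    `calls` maps an algorithm name to the set of CNV identifiers it reported.
    The identifiers are assumed to be already harmonized across algorithms.
    """
    # Build the reverse mapping: CNV identifier -> algorithms that called it.
    membership: Dict[str, Set[str]] = {}
    for algorithm, cnv_ids in calls.items():
        for cnv_id in cnv_ids:
            membership.setdefault(cnv_id, set()).add(algorithm)

    # Each CNV contributes to exactly one (exclusive) intersection.
    counts: Counter = Counter()
    for algorithms in membership.values():
        key: FrozenSet[str] = frozenset(algorithms)
        counts[key] += 1
    return counts


# Toy input, not data from the study.
toy_calls = {
    "ERDS": {"cnv_a", "cnv_b"},
    "CNVnator": {"cnv_b", "cnv_c"},
    "RDXplorer": {"cnv_d"},
}
toy_counts = exclusive_intersection_counts(toy_calls)
```

In this toy input, `cnv_b` is counted only under the ERDS-and-CNVnator intersection, not under either algorithm alone.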
Figure 3. Recommended Workflow for Use of Read Depth-Based Algorithms for Detecting Germline CNVs from Short-Read WGS Data
The green and blue shapes represent the beginning and end of the workflow, respectively. Red rectangles represent quality-control steps, and other actions are colored in gray. Yellow diamonds represent decision points. For maximum stringency, the action “Remove CNVs with ≥70% overlap with RLCRs” may be performed using the full RLCR definition, including RepeatMasker (as in the algorithm selection and workflow development sections). For increased sensitivity, such as when examining rare, genic CNVs, it may be performed using the RLCR definition that omits RepeatMasker, as was done in the workflow evaluation section.
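The filtering action “Remove CNVs with ≥70% overlap with RLCRs” can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: overlap is taken to mean the fraction of the CNV's own length covered by RLCR intervals, all intervals are half-open and lie on a single chromosome, and the function names are hypothetical. Switching between the full RLCR definition and the one that omits RepeatMasker would simply mean passing a different `rlcrs` list.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(cnv: Interval, regions: List[Interval]) -> float:
    """Fraction of the CNV's length that lies inside any of the regions."""
    start, end = cnv
    length = end - start
    if length <= 0:
        return 0.0
    covered = sum(
        max(0, min(end, r_end) - max(start, r_start))
        for r_start, r_end in merge_intervals(regions)
    )
    return covered / length


def filter_cnvs(
    cnvs: List[Interval], rlcrs: List[Interval], threshold: float = 0.70
) -> List[Interval]:
    """Keep only CNVs whose RLCR-covered fraction is below the threshold."""
    return [cnv for cnv in cnvs if covered_fraction(cnv, rlcrs) < threshold]


# Toy coordinates, not data from the study.
toy_cnvs = [(0, 100), (200, 300)]
toy_rlcrs = [(0, 50), (40, 70), (250, 260)]
kept = filter_cnvs(toy_cnvs, toy_rlcrs)
```

Here the first toy CNV has 70 of its 100 bases covered once the two overlapping RLCR intervals are merged, so it meets the ≥70% criterion and is removed; the second has 10% coverage and is kept.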