Detection of FLT3 Internal Tandem Duplication in Targeted, Short-Read-Length, Next-Generation Sequencing Data
David H. Spencer, Haley J. Abel, Christina M. Lockwood, Jacqueline E. Payton, Philippe Szankasi, Todd W. Kelley, Shashikant Kulkarni, John D. Pfeifer, Eric J. Duncavage
The Journal of Molecular Diagnostics, Volume 15, Issue 1, Pages 81-93 (January 2013)
DOI: 10.1016/j.jmoldx.2012.08.001
Copyright © 2013 American Society for Investigative Pathology and the Association for Molecular Pathology
Figure 1 Experimental overview. 1: Genomic DNA was extracted from bone marrow aspirates, fresh-frozen tumor tissue, or formalin-fixed paraffin-embedded tissue. 2: The extracted genomic DNA was fragmented on a Covaris E210 instrument, producing a mean insert size of 270 bp. 3: Indexed sequencing libraries were then prepared by linker ligation. FLT3 genomic DNA sequences are indicated in red, blue (corresponding to the ITD sequence), yellow, and brown; non-FLT3 genomic DNA is indicated in black and green; the sequencing library adaptors are indicated in pink and purple. 4: Sequencing libraries were enriched for genes of interest using a custom SureSelect panel (Agilent Technologies) consisting of 27 genes, including all exons of FLT3, and covering 406 kb in total (WUCaMP27). In this example, genomic DNA containing FLT3 sequence (red, blue, and yellow) is captured by specific biotinylated probes (orange). The captured sequence is then separated from nontargeted DNA by the addition of streptavidin-coated paramagnetic beads. 5: Enriched DNA was then eluted from the capture beads and subjected to low-cycle amplification using primers targeting the adaptor linkers (linkers not shown). 6: The resulting DNA was then sequenced in multiplex, with 20 to 30 cases per lane, on a HiSeq 2000 sequencing system (Illumina). 7: Sequence data were then mapped to the reference genome (hg19, NCBI build GRCh37) using both BWA and Novoalign. In this example, aligned paired-end sequencing reads are shown in black. 8: FLT3 ITDs were detected from the mapped data using a variety of publicly available tools, including GATK, SAMtools, Pindel, Dindel, SLOPE, and BreakDancer. The results were then compared with conventional PCR-based findings for FLT3 ITD detection.
Figure 2 Coverage data and read alignments at the FLT3 locus from NGS data from samples with and without ITD insertion. A: Median and interquartile range of total raw coverage depth for FLT3 ITD-positive and FLT3 ITD-negative cases (read depth 0–4000), along with the depth of one-end-anchored reads (read depth 0–150), in which one read of a pair did not map. The presence of an insertion results in an excess of one-end-anchored reads in the FLT3 ITD-positive cases. B: A subset of aligned reads in case 1-8, which has an FLT3 ITD insertion. Aligned reads are shown in gray; multicolored bars in the reads indicate discrepancies with the reference sequence. Blocks of discrepancies in a subset of the reads are the result of the ITD insertion.
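The one-end-anchored reads described in panel A can be recognized from the FLAG field of a SAM record. The following is a minimal sketch, not the authors' pipeline: the FLAG bit values come from the SAM specification, while the function names, the toy records, and the interval on chr13 are hypothetical and chosen only for illustration. For simplicity it tests only the leftmost mapped position of each read against the interval rather than full read overlap.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read did not map
MATE_UNMAPPED = 0x8  # the mate of this read did not map


def is_one_end_anchored(flag: int) -> bool:
    """True for a mapped read of a pair whose mate did not map."""
    return bool(flag & PAIRED) and not (flag & UNMAPPED) and bool(flag & MATE_UNMAPPED)


def count_one_end_anchored(sam_lines, chrom, start, end):
    """Count anchored reads whose leftmost position lies in [start, end] (1-based)."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if rname == chrom and start <= pos <= end and is_one_end_anchored(flag):
            count += 1
    return count


# Toy records (hypothetical coordinates, not real FLT3 positions).
toy_sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t73\tchr13\t1500\t60\t50M\t=\t1500\t0\t*\t*",    # mapped, mate unmapped
    "r1\t133\tchr13\t1500\t0\t*\t=\t1500\t0\t*\t*",     # the unmapped mate itself
    "r2\t99\tchr13\t1600\t60\t50M\t=\t1800\t250\t*\t*",  # both ends mapped
    "r3\t137\tchr13\t5000\t60\t50M\t=\t5000\t0\t*\t*",   # anchored, outside interval
]
n_anchored = count_one_end_anchored(toy_sam, "chr13", 1000, 2000)
```

Comparing such a count across cases, normalized to total depth, is one way to look for the excess of one-end-anchored reads that the figure reports in ITD-positive samples.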
Figure 3 Informatics of insertion detection. A: The FLT3 ITD is an insertion of duplicated and nonduplicated sequence that occurs between exons 13 and 14. These insertions (shown in gray) range in size from 15 to ∼300 bp. B: Standard methods for finding insertions, including the Genome Analysis Toolkit (GATK), SAMtools, and Dindel, apply probabilistic models to make insertion calls based on data obtained during the initial read mapping and alignment process (aligned reads are shown in green; unaligned reads, in purple). Because of the difficulty associated with aligning short reads, only small insertion events (generally <15% of the total read length) can be identified by this approach; reads spanning such small insertions generally retain sufficient homology in the flanking regions to permit accurate alignment. Large insertions (>16 bp), including the FLT3 ITD, are too long to be detected by this method. C: Using a paired-end approach, software such as Pindel and de novo alignment can reliably detect larger insertions, including the FLT3 ITD. In this approach, mate pairs are identified in which one end is mapped but the other is not. The unmapped mates are then assembled to form contigs with partial homology to the reference sequence, using either a pattern-growth algorithm (Pindel) or de novo assembly with a custom script (unpublished data) that executes the Phrap assembly software. This method allows for the detection of much larger insertions.
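Once unmapped mates have been assembled into a contig, the final step of panel C amounts to comparing that contig with the reference to locate the inserted bases. The sketch below illustrates that comparison on exact strings only. It is neither the Pindel pattern-growth algorithm nor the authors' unpublished Phrap-based script; the function names and the toy sequences are hypothetical, and it handles only a single, error-free insertion that is a perfect tandem duplication (real FLT3 ITDs may also contain nonduplicated bases, as noted in panel A).

```python
def find_insertion(reference: str, contig: str):
    """Return (position, inserted_sequence) if the contig equals the reference
    plus one contiguous insertion, else None. Position is a 0-based offset
    into the reference, placed as far right as the sequence allows."""
    extra = len(contig) - len(reference)
    if extra <= 0:
        return None
    # Extend the shared prefix as far as possible.
    p = 0
    while p < len(reference) and reference[p] == contig[p]:
        p += 1
    # Everything after the inserted bases must match the rest of the reference.
    if reference[p:] != contig[p + extra:]:
        return None
    return p, contig[p:p + extra]


def is_tandem_duplication(reference: str, position: int, inserted: str) -> bool:
    """True if the inserted bases exactly repeat the reference bases
    immediately upstream of the insertion point."""
    n = len(inserted)
    return position >= n and reference[position - n:position] == inserted


# Toy sequences (hypothetical, not FLT3).
reference = "ACGTTGCAGGATCCTTAGC"
itd_contig = reference[:14] + "GGATCC" + reference[14:]      # duplicates ref[8:14]
other_contig = reference[:14] + "AAAAAA" + reference[14:]    # non-templated bases

itd_call = find_insertion(reference, itd_contig)
other_call = find_insertion(reference, other_contig)
```

A practical caller would have to tolerate sequencing errors and partial duplications, which is why the figure describes assembly-based and pattern-growth approaches rather than exact matching.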
Figure 4 Example of FLT3 ITD called by NGS but not PCR. In case 1-6, both Pindel and de novo alignment identified a third FLT3 ITD (27 bp) from NGS data (red asterisk), corresponding to an insertion of 5′-TCTCTGAAATCAACGTAGAAGTACTCA-3′. Retrospective analysis of the capillary traces (performed at ARUP Laboratories) confirmed the presence of a small peak (∼27 bp) in replicate PCR-based testing. ITDs called by both PCR and NGS are marked with a blue asterisk.
Figure 5 Comparison of ITD size and allele fraction between PCR with capillary electrophoresis and NGS methods for a subset of ITD-positive cases. A: Size of the ITD insertion detected from NGS data using Pindel and de novo assembly, compared with the size from PCR and capillary electrophoresis. B: The ITD allele fraction calculated from the areas under the mutant and wild-type peaks from PCR and capillary electrophoresis, compared with the allele fraction from Pindel (Pearson's correlation coefficient = 0.37; 95% CI = −0.1 to 0.71; P = 0.11) and from assembly (Pearson's correlation coefficient = 0.65; 95% CI = 0.23 to 0.87; P = 0.006). The dashed line marks y = x, which corresponds to complete agreement between PCR and NGS methods.
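The quantities in panel B can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' analysis: the allele fraction is assumed here to be the mutant peak area divided by the sum of the mutant and wild-type areas, and the confidence interval uses the Fisher z-transformation, which is one common method and may not be the one used for the figure. All function names and input values are hypothetical.

```python
import math


def allele_fraction(mutant_area: float, wild_type_area: float) -> float:
    """Assumed definition: mutant / (mutant + wild type)."""
    return mutant_area / (mutant_area + wild_type_area)


def pearson_r(x, y) -> float:
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.959964):
    """Approximate 95% CI for r from n pairs via the Fisher z-transformation.
    Requires n > 3 and |r| < 1."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Toy allele fractions (made-up values, not data from the study).
pcr_fraction = [0.10, 0.20, 0.30, 0.40, 0.50]
ngs_fraction = [0.12, 0.18, 0.33, 0.35, 0.52]

r = pearson_r(pcr_fraction, ngs_fraction)
ci_low, ci_high = fisher_ci(r, len(pcr_fraction))
```

With only a handful of points the interval is very wide, which is consistent with the broad intervals reported in the caption for a subset of cases.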