The Right Tool for the Job: Two Platforms for Targeted DNA Sequencing
AGBT 2018: #811

Cassie Schumacher, Ashley Wood, Sukhinder Sandhu, Justin Lenhart, Laurie Kurihara, Vladimir Makarov
Swift Biosciences Inc, 58 Parkland Plaza, Suite 100, Ann Arbor, MI 48103

Abstract

Next-generation sequencing of pharmacogenetic and cancer-associated genes is valuable for advancing our understanding of disease progression and treatment. However, obstacles such as homology to pseudogenes and difficult-to-sequence motifs make genes like CYP2D6, BRCA1, and BRCA2 challenging to sequence with short-read (SR) chemistry. Long-read (LR) assays overcome these issues, but their use is restricted by the limited technologies available. Both techniques are suitable for high-throughput sample processing. To leverage the benefits of each platform, we developed targeted, single-tube, multiplexed amplicon assays that allow for overlapping, contiguous coverage. DNA (Coriell Institute) with known CYP2D6 variants was used in the 364-amplicon (~40 kb) SR assay, which targets a variety of pharmacogenetic variants, and in the LR assay (~6.5 kb), which covers 100% of the CYP2D6 gene and upstream regulatory sequence. SR and LR libraries were sequenced on an Illumina MiniSeq® and a PacBio RS II, respectively. Libraries from both assays had >90% on-target reads and coverage uniformity. 92% of known CYP2D6 variants were detected using the SR assay, while all variants were detected with the LR technique. DNA from BC01 (Coriell Institute) was used in the 246-amplicon (~23 kb) BRCA1/2 SR assay and the 35-amplicon BRCA1/2 LR assay (~86 kb). While both comprehensively cover all exons, the LR assay additionally covers 20 complete introns. Greater than 95% on-target reads and coverage uniformity were seen in libraries from both SR and LR assays, and the same number of variants was detected with each.
Use of the correct tool is essential to evaluate genetic variation in disease and drug response. SR assays allow screening of a wide range of targets for immediate and broad use. LR assays enable comprehensive sequencing of entire gene coding regions and neighboring introns, giving additional insight into variants in difficult-to-sequence regions, large insertions/deletions, and structural variations.

Panel Specifications

Panel             # of Amplicons   Amplicon Size (kb)   Panel Size (kb)   Tiling?
ADME - Short      364              0.15                 40                Yes
CYP2D6 - Long     1                6.5                  6.5               No
BRCA1/2 - Short   246                                   23
BRCA1/2 - Long    36               2.5                  90

Figure 3. Integrative Genomics Viewer (IGV, Broad Institute) genome-wide view of the 364 pharmacogenetic short-read panel targets (A). IGV view of short- and long-read panel target coverage of the CYP2D6 (B), BRCA1 (C), and BRCA2 (D) genes. Short- and long-read panel specifications (E). Note that the entire exonic regions of both BRCA1 and BRCA2 are covered by both panels, while the entire exonic region of CYP2D6 is covered only by the long-read panel due to high pseudogene homology.

Sequencing Metrics

CYP2D6
Columns: DNA; # Aligned Reads (Short, Long); % On Target (Short, Long); Coverage Uniformity (Short, Long)
NA17084  1604933  715  94.1  99.8  91.5  100
NA17221  1786760  535  94.3  99.7
NA17205  1361074  632  93.6  92.2
NA17293  1372737  507  91.9
NA17230  1752193  631  93.3  92.3
NA12244  1728736  538  93.9  92.0
NA17272  1829503  663  93.0  91.8
NA17039  2421000  334  93.1
NA17269  2154029  739  92.9
NA17276  2024712  254
NA17281  1854698  285  93.8  91.2
NA17204  1925657  593

BRCA1/2
Columns: DNA; # Aligned Reads (Short, Long); % On Target (Short, Long); Coverage Uniformity (Short, Long)
NA14623  2552437  6203  99%  98%  100%
NA14624  2763174  6061
NA14626  2495232  6497  97%
NA13705  2031846  2416
NA13715  1777741  2070
NA14090  1488832  2449  96%
NA14094  1582618  2356
NA14638  1484737  2526  86%
NA14634  2748198  2686
NA14636  2683304  3128
NA14637  2389220  3010
NA14170  1970282  3381

Figure 4. Tables list the number of aligned reads, percent of reads on target, and the coverage uniformity in both assay types.
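As a rough illustration of how the two summary metrics reported in Figure 4 might be computed, the Python sketch below works from read counts and per-amplicon depths. The poster does not define coverage uniformity; the sketch assumes it is the percentage of amplicons whose depth is at least 20% of the mean amplicon depth. The function names, the 0.2 cutoff, and the example inputs are all hypothetical and are not taken from the poster.

```python
from typing import Sequence


def percent_on_target(on_target_reads: int, aligned_reads: int) -> float:
    """Percentage of aligned reads that fall within the panel targets."""
    if aligned_reads <= 0:
        raise ValueError("aligned_reads must be positive")
    return 100.0 * on_target_reads / aligned_reads


def coverage_uniformity(amplicon_depths: Sequence[float],
                        fraction_of_mean: float = 0.2) -> float:
    """Percentage of amplicons with depth >= fraction_of_mean * mean depth.

    ASSUMPTION: this definition and the 0.2 default are illustrative;
    the poster does not state how uniformity was calculated.
    """
    if not amplicon_depths:
        raise ValueError("amplicon_depths must not be empty")
    mean_depth = sum(amplicon_depths) / len(amplicon_depths)
    if mean_depth == 0:
        # No coverage anywhere: report zero rather than a vacuous 100%.
        return 0.0
    threshold = fraction_of_mean * mean_depth
    passing = sum(1 for depth in amplicon_depths if depth >= threshold)
    return 100.0 * passing / len(amplicon_depths)


# Hypothetical inputs, not data from the poster.
example_depths = [100, 100, 100, 10]
print(percent_on_target(941, 1000))
print(coverage_uniformity(example_depths))
```

Under this assumed definition, a single under-covered amplicon in a four-amplicon panel lowers uniformity by a quarter, which is why a one-amplicon long-read panel would trivially report 100%.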
All DNA samples were obtained from the Coriell Institute. All short-read samples were sequenced on an Illumina MiniSeq. CYP2D6 long-read samples were sequenced on a PacBio RS II, while BRCA1/2 long-read samples were sequenced on a PacBio Sequel™. Aligned reads for all long-read samples are derived from circular consensus sequencing (CCS) read numbers.

Dual Platform Approach

Figure 1. The short-read workflow consists of a multiplexed PCR followed by an indexing step to barcode and adapt the PCR products for Illumina sequencing (left). The long-read workflow includes two PCR steps to amplify the target sequences and a ligation step to barcode and adapt the PCR products for PacBio sequencing (right).

Variant Analysis Using Short- and Long-Read Technology

CYP2D6
Columns: DNA; Haplotype; Variant; Expected AF; Observed AF (Short-Read, Long-Read)
NA17084  *1/*10   P34S 0.5 0.68 0.51; S486T 0.52 0.57; R296C 0.69 0.47
NA17221  *1XN/*2  0.65 0.34 0.61 0.39
NA17205  *1/*41   0.45 0.53 0.49 0.55
NA17293  *2/*9    K281del 0.56
NA17230  *4/*41   0.54; L91M 0.48; H94R 0.23 0.4 1
NA12244  *35/*41  V11M 0.50
NA17272  *4/*10   1.00 0.96
NA17039  *2/*17   0.99; T107I
NA17269  *2/*41
NA17276  *2/*5
NA17281  *5/*9    0.98
NA17204  *1/*35   0.46

BRCA1/2
Columns: DNA; Gene; Variant; Expected AF; Observed AF (Short-Read, Long-Read)
NA14623  BRCA2  TYR42CYS  0.50  0.53  0.47
NA14624  5946delCT  0.52  0.49
NA14626  LYS3326TER  0.54
NA13705  BRCA1  4-BP DEL, FS1252TER  0.31  0.45
NA13715  1-BP INS, 5382C  0.43
NA14090  2-BP DEL, 185AG  0.46
NA14094  40-BP DEL, FS397TER  0.44  0.56
NA14638  IVS5-11T>G
NA14634  4-BP DEL, FS1364TER  0.51
NA14636  5677insA
NA14637  ARG1443TER  0.48
NA14170  1-BP DEL, 6174T, FS  0.55

Figure 5. Known variants for all 12 samples in both the CYP2D6 and BRCA1/2 genes are detailed. The expected allele frequency (AF) is given, along with the observed AF in both the short-read and long-read assays. All known variants were identified using both techniques.
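One way to read the Figure 5 comparison is as a per-variant deviation of observed AF from expected AF on each platform. The Python sketch below illustrates that idea using only the first fully populated row of each table (NA17084 P34S and NA14623 TYR42CYS), as read from the extracted values. The `VariantCall` class, the helper names, and the 0.10 tolerance are hypothetical choices for illustration and are not part of the poster's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    sample: str
    variant: str
    expected_af: float
    short_read_af: float
    long_read_af: float


def af_deviation(observed_af: float, expected_af: float) -> float:
    """Absolute difference between observed and expected allele frequency."""
    return abs(observed_af - expected_af)


def compare_platforms(call: VariantCall, tolerance: float = 0.10) -> dict:
    """Summarize how far each platform's AF is from the expected AF.

    ASSUMPTION: the 0.10 tolerance is an arbitrary illustrative cutoff.
    """
    short_dev = af_deviation(call.short_read_af, call.expected_af)
    long_dev = af_deviation(call.long_read_af, call.expected_af)
    return {
        "sample": call.sample,
        "variant": call.variant,
        "short_read_deviation": round(short_dev, 2),
        "long_read_deviation": round(long_dev, 2),
        "short_read_within_tolerance": short_dev <= tolerance,
        "long_read_within_tolerance": long_dev <= tolerance,
    }


# Values read from the first complete row of each Figure 5 table.
calls = [
    VariantCall("NA17084", "P34S", 0.50, 0.68, 0.51),
    VariantCall("NA14623", "TYR42CYS", 0.50, 0.53, 0.47),
]

for call in calls:
    print(compare_platforms(call))
```

With the hypothetical 0.10 tolerance, the short-read AF for P34S in NA17084 (0.68 against an expected 0.5) falls outside the cutoff while the long-read AF (0.51) falls inside it; both platforms fall inside it for TYR42CYS in NA14623.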
Sequencing Challenges with Pseudogenes

Figure 2. Short-read sequencing poses a significant challenge for genes with known pseudogenes. Even when care has been taken to design primers to specific, unique regions, reads can still align to the pseudogene. Long-read amplicons include much more unique content and additional bases; therefore, all of the reads align on target. The figures show short-read and long-read targets and aligned reads for CYP2D6 (top) and the CYP2D7 pseudogene (bottom) for NA02016.

Conclusion

Accel-Amplicon™ panels from Swift Biosciences can be used on short- and long-read sequencing technologies for variant detection. Short-read amplicon panels are useful tools to interrogate variants of known significance in coding regions and intron/exon boundaries. Long-read amplicon panels are useful for full gene coverage, not only to analyze variants in the coding region but also to probe neighboring introns, which can be difficult to target with short-read amplicons because of repetitive regions and low-complexity motifs. Long-read technology provides more accurate alignment to identify structural variants and can overcome pseudogene alignment artifacts. Long-read technology enables phasing of the CYP2D6 gene within the 6.5 kb single amplicon. Both short- and long-read amplicon panel workflows can be completed in a single day and can process samples in a high-throughput manner.

© 2018, Swift Biosciences, Inc. The Swift logo and Accel-Amplicon are trademarks and Accel-NGS is a registered trademark of Swift Biosciences. This product is for Research Use Only. Not for use in diagnostic procedures. Illumina and MiniSeq are registered trademarks of Illumina, Inc. Sequel is a trademark of Pacific Biosciences. 18-1947, 02/18
www.swiftbiosci.com