Considerations for Analyzing Targeted NGS Data BRCA Tim Hague,CTO.


1 Considerations for Analyzing Targeted NGS Data: BRCA. Tim Hague, CTO

2 Introduction
• BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes
• The proteins they encode actually repair damage in DNA
• Large number of known deleterious mutations
• Disproportionate number of indels

3 History
• Mary-Claire King discovered BRCA1 and BRCA2, published the function
• Myriad Genetics won the patent

4 Distribution of Known BRCA1 Deletions >3 bp (figure; x-axis: indel size in nt)

5 Dominique Stoppa-Lyonnet at the Curie Institute: "Large scale deletions could account for as many as one-third of all BRCA1 mutations in some populations"

6
• BRCA1 and BRCA2 are tumor suppressor genes
• 82% lifetime chance of developing breast/ovarian cancer
• Science 2004, 306:2187-2191
• >1,500 deleterious BRCA mutations
• 17 kbp coding region with mutation rate of 1/2000
• NGS-based BRCA screening: Leeds UK, Newgene UK, Ghent Belgium
• DIY genetic test published by Salzberg

7
• 82% chance of cancer
• >90% chance of being false positive/negative

8

9 What kind of NGS data?
• False negatives must be avoided
• Precision of both the sequencing data and the data analysis is key
• Looking for indels: indel detection ability is a key criterion
• Repeats are also an issue in the BRCA region

10 BRCA Repeats

11 Homopolymer Errors
• Homopolymer errors look like small indels and can cause noise
• Problem for: Roche 454, Ion Torrent
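One way to act on the homopolymer point is to flag indel calls that sit in or next to a run of identical bases, so they can be reviewed separately. The sketch below is illustrative only: the function names, the example sequence and the `min_len=4` threshold are assumptions, not something taken from the presentation.

```python
import re

def homopolymer_runs(seq, min_len=4):
    """Return (start, end, base) for each run of one base at least min_len long.

    Coordinates are 0-based, end-exclusive. min_len=4 is an arbitrary choice.
    """
    pattern = re.compile(r"A+|C+|G+|T+")
    return [(m.start(), m.end(), m.group()[0])
            for m in pattern.finditer(seq.upper())
            if m.end() - m.start() >= min_len]

def in_homopolymer(pos, seq, min_len=4):
    """True if 0-based pos lies inside, or directly adjacent to, a homopolymer run."""
    return any(start - 1 <= pos <= end
               for start, end, _ in homopolymer_runs(seq, min_len))

# Made-up example sequence with a run of six T and a run of four A.
seq = "ACGTTTTTTGCAAAACG"
runs = homopolymer_runs(seq)
suspicious = in_homopolymer(5, seq)   # position inside the T run
clean = in_homopolymer(0, seq)        # position far from any run
```

An indel call flagged this way is not necessarily wrong; the flag only marks it as being in a context where 454 or Ion Torrent data is known to be noisy.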

12 Long Reads
• Read length is a limiting factor for insertion detection
• When searching for indels, long reads can help; long reads can also help with repeats
• Roche 454 has the longest reads

13 Real Examples With Roche 454 Data

14

15 Paired Reads
• Paired reads can also help to increase effective 'read length'
• Illumina MiSeq now has a 2x250 bp protocol

16

17
• Comparison of 9 open source and commercial NGS analysis tools
• In silico test with a mutated reference BRCA gene
• 2211 known BRCA variants
• 1341 SNPs, 320 insertions and 551 deletions
• Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
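The slides do not say how the mutated reference was built. As a minimal sketch of the general idea, assuming variants are given as (position, reference allele, alternate allele) tuples on a plain string, known SNPs, insertions and deletions could be written into a reference like this. The function name, the toy sequence and the variants are all hypothetical.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping variants to a reference string.

    Each variant is (pos, ref_allele, alt_allele) with a 0-based pos, e.g.
    SNP (3, "T", "C"), deletion (5, "GAT", "G"), insertion (9, "A", "ACC").
    """
    mutated = reference
    # Work from right to left so that earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        if mutated[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        mutated = mutated[:pos] + alt + mutated[pos + len(ref):]
    return mutated

# Toy data, not real BRCA sequence.
reference = "ACGTAGATCACGT"
variants = [(3, "T", "C"), (5, "GAT", "G"), (9, "A", "ACC")]
mutated = apply_variants(reference, variants)
```

Reads simulated from such a mutated sequence and aligned back to the unmodified reference give a test set in which every expected variant is known in advance.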

18

19

20

21 BWA
• Overall Sensitivity: 99.2% Paired End, 94.5% Single End
• SNPs Found: 99.5% PE, 99.5% SE
• Deletions Found: 98.5% PE, 85.5% SE
• Insertions Found: 99.4% PE, 89.4% SE

22 BWA
• False Negatives: 17 Paired End, 121 Single End
• False Positives: 23 PE, 168 SE
• The longest (60bp+) deletions were not found, either with PE or SE data
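The BWA false negative counts can be checked against the reported overall sensitivity. The sketch below assumes sensitivity is defined as (total variants - false negatives) / total variants with the 2211 known variants as the total; that definition is an assumption, but it reproduces the 99.2% and 94.5% figures. The precision value is derived here and is not stated in the presentation.

```python
TOTAL_VARIANTS = 2211  # known BRCA variants in the test set, as stated in the slides

def sensitivity(false_negatives, total=TOTAL_VARIANTS):
    """Fraction of the known variants that were found."""
    return (total - false_negatives) / total

def precision(false_negatives, false_positives, total=TOTAL_VARIANTS):
    """Fraction of reported variants that are real (derived, not from the slides)."""
    true_positives = total - false_negatives
    return true_positives / (true_positives + false_positives)

pe_sens = sensitivity(17)    # paired end: 17 false negatives
se_sens = sensitivity(121)   # single end: 121 false negatives
pe_prec = precision(17, 23)
se_prec = precision(121, 168)

print(f"paired end sensitivity: {pe_sens:.1%}")
print(f"single end sensitivity: {se_sens:.1%}")
print(f"paired end precision:   {pe_prec:.1%}")
print(f"single end precision:   {se_prec:.1%}")
```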

23 Indel Sizes – BWA Single End

24 Indel Sizes – BWA Paired End

25 Other Tools
• Most other alignment tools showed a similar trend: much better results overall with Paired data
• Only two of the tools tested found the longest deletions, even with Paired data

26 Paired Reads - Conclusions
• Much better for reliable variant detection than equivalent length single reads
• Provided much better coverage in the BRCA region (spanning small repeats)
• If available, paired reads should be preferred

27 Indel Detection - Conclusions
• Not all tools are good at finding indels
• Burrows Wheeler based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data, if indel realignment is also used
• They still can't detect the longest indels (there is just a gap in coverage)
• If indel detection is required, an indel sensitive tool should be used

28 Overall - Conclusions
• None of the alignment tools found all the variants
• The same data will almost certainly need to be analyzed with more than one tool to get sufficiently accurate results
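The presentation does not describe how results from several tools should be merged. One simple possibility, sketched below under the assumption that each tool's calls are available as a set of (chromosome, position, ref, alt) tuples, is to count how many tools support each variant. The tool names and coordinates are placeholders, not real data.

```python
from collections import Counter

def combine_calls(calls_by_tool, min_support=1):
    """Return the variants reported by at least min_support tools.

    calls_by_tool maps a tool name to a set of (chrom, pos, ref, alt) tuples.
    """
    support = Counter(v for calls in calls_by_tool.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}

# Placeholder calls from two hypothetical aligner pipelines.
calls_by_tool = {
    "aligner_a": {("chr17", 100, "A", "G"), ("chr17", 250, "CTT", "C")},
    "aligner_b": {("chr17", 100, "A", "G"), ("chr13", 400, "G", "GAA")},
}

union = combine_calls(calls_by_tool, min_support=1)      # favors sensitivity
consensus = combine_calls(calls_by_tool, min_support=2)  # favors precision
```

Since the slides stress that false negatives must be avoided, the union is the more natural starting point here, with calls supported by only one tool sent for closer review.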

29 Contact Tim Hague, CEO Omixon Biocomputing Solutions Tim.Hague@omixon.com +36 70 318 4878

30

