Considerations for Analyzing Targeted NGS Data: BRCA. Tim Hague, CTO.

Presentation transcript:

Considerations for Analyzing Targeted NGS Data: BRCA. Tim Hague, CTO

Introduction
• BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes
• Actually, the proteins repair damage in DNA
• Large number of known deleterious mutations
• Disproportionate number of indels

History
• Mary-Claire King discovered BRCA1 and BRCA2, published the function
• Myriad Genetics won the patent

Figure: Distribution of known BRCA1 deletions >3 bp (x-axis: indel size, nt)

Dominique Stoppa-Lyonnet at the Curie Institute: "Large-scale deletions could account for as many as one-third of all BRCA1 mutations in some populations."

BRCA1 and BRCA2 are tumor suppressor genes.
• 82% lifetime chance of developing breast/ovarian cancer in carriers of a deleterious mutation. Science 2004, 306:
• >1,500 deleterious BRCA mutations
• 17 kbp coding region with mutation rate of 1/2000
• NGS-based BRCA screening: Leeds UK, Newgene UK, Ghent Belgium
• DIY genetic test published by Salzberg

82% chance of cancer
>90% chance of being false positive/negative

What kind of NGS data?
• False negatives must be avoided
• Precision of both the sequencing data and the data analysis is key
• Looking for indels: indel detection ability is a key criterion
• Repeats are also an issue in the BRCA region

BRCA Repeats

Homopolymer Errors
Homopolymer errors look like small indels and can cause noise.
Problem for: Roche 454, Ion Torrent
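One common way to handle this noise is to flag candidate indels that fall inside or right next to a homopolymer run, so they can be reviewed more strictly. The following is a minimal sketch of that idea, not the method used in the talk; the function names, the minimum run length of 4 and the 1 bp padding are all illustrative assumptions.

```python
from typing import List, Tuple

Run = Tuple[int, int, str]  # (start, end_exclusive, base), 0-based coordinates


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Run]:
    """Return every run of at least min_len identical bases in seq.

    min_len=4 is an arbitrary illustrative threshold, not a platform spec.
    """
    seq = seq.upper()
    runs: List[Run] = []
    i = 0
    while i < len(seq):
        j = i + 1
        # Extend the run while the base stays the same.
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if j - i >= min_len:
            runs.append((i, j, seq[i]))
        i = j
    return runs


def indel_in_homopolymer(pos: int, runs: List[Run], pad: int = 1) -> bool:
    """True if an indel at 0-based position pos lies in or within pad bp of a run."""
    return any(start - pad <= pos < end + pad for start, end, _ in runs)


if __name__ == "__main__":
    reference = "ACGTTTTTTGCAAAAC"  # toy sequence, not real BRCA sequence
    runs = homopolymer_runs(reference)
    print(runs)
    print(indel_in_homopolymer(5, runs))  # inside the T run
    print(indel_in_homopolymer(0, runs))  # far from any run
```

Flagged calls are not necessarily wrong; the flag only marks where 454 or Ion Torrent style errors are most likely to imitate a real small indel.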

Long Reads
Read length is a limiting factor for insertion detection.
When searching for indels, long reads can help.
Long reads can also help with repeats.
Roche 454 has the longest reads.

Real examples with Roche 454 data

Paired Reads
• Paired reads can also help to increase effective 'read length'
• Illumina MiSeq now has a 2x250 bp protocol

• Comparison of 9 open source and commercial NGS analysis software packages
• In silico test with a mutated reference BRCA gene
• 2211 known BRCA variants: 1341 SNPs, 320 insertions and 551 deletions
• Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
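An in silico test of this kind starts by writing known variants into a reference sequence, so that reads simulated from the mutated sequence have a known truth set. The sketch below shows only that first step, under the assumption that variants are given as a 0-based position plus reference and alternate alleles; the `Variant` class and function names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based position of the first reference base
    ref: str  # reference allele
    alt: str  # alternate allele


def classify(v: Variant) -> str:
    """Label a variant by comparing allele lengths."""
    if len(v.ref) == len(v.alt):
        return "SNP" if len(v.ref) == 1 else "MNP"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


def apply_variant(reference: str, v: Variant) -> str:
    """Return a copy of reference with the variant written in."""
    end = v.pos + len(v.ref)
    if reference[v.pos:end] != v.ref:
        raise ValueError(f"reference allele mismatch at position {v.pos}")
    return reference[:v.pos] + v.alt + reference[end:]


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy sequence, not real BRCA sequence
    for v in (Variant(2, "G", "T"), Variant(3, "T", "TAA"), Variant(3, "TAC", "T")):
        print(classify(v), apply_variant(reference, v))
```

Applying one variant per mutated copy, as here, keeps coordinates simple; applying several to the same copy would require tracking the offset introduced by each earlier indel.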

BWA
Overall sensitivity: 99.2% paired end (PE), 94.5% single end (SE)
SNPs found: 99.5% PE, 99.5% SE
Deletions found: 98.5% PE, 85.5% SE
Insertions found: 99.4% PE, 89.4% SE

BWA
False negatives: 17 PE, 121 SE
False positives: 23 PE, 168 SE
The longest (60 bp+) deletions were not found with either PE or SE data.
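The BWA sensitivity figures follow directly from these false negative counts if they are measured against the 2211 known variants of the test set. The sketch below reproduces that arithmetic. The precision values it also prints are not reported in the talk; they are derived here on the assumption that every variant that was not a false negative counts as a true positive.

```python
def sensitivity(total_true: int, false_negatives: int) -> float:
    """Fraction of known variants that were found."""
    return (total_true - false_negatives) / total_true


def precision(total_true: int, false_negatives: int, false_positives: int) -> float:
    """Fraction of reported variants that are real (derived, not from the slides)."""
    true_positives = total_true - false_negatives
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    TOTAL = 2211  # known BRCA variants in the in silico test
    for label, fn, fp in (("paired end", 17, 23), ("single end", 121, 168)):
        print(
            f"{label}: sensitivity {sensitivity(TOTAL, fn):.1%}, "
            f"precision {precision(TOTAL, fn, fp):.1%}"
        )
```

The sensitivities come out at 99.2% for paired end and 94.5% for single end, matching the reported overall figures.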

Indel sizes - BWA Single End

Indel sizes - BWA Paired End

Other Tools
• Most other alignment tools showed a similar trend: much better results overall with paired data
• Only two of the tools tested found the longest deletions, even with paired data

Paired Reads - Conclusions
• Much better for reliable variant detection than single reads of equivalent length
• Provided much better coverage in the BRCA region (spanning small repeats)
If available, paired reads should be preferred.

Indel Detection - Conclusions
• Not all tools are good at finding indels.
• Burrows-Wheeler-based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data, provided indel realignment is also used.
• They still can't detect the longest indels (there is just a gap in coverage).
If indel detection is required, an indel-sensitive tool should be used.

Overall - Conclusions
• None of the alignment tools found all the variants
• Getting sufficiently accurate results will almost certainly require analyzing the same data with more than one tool
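One simple way to combine several tools is to pool their calls and keep every variant reported by at least a chosen number of them. The sketch below assumes each tool's calls have already been normalized to comparable (position, ref, alt) tuples; the tool names and variants are made up for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Call = Tuple[int, str, str]  # (position, ref allele, alt allele)


def combine_calls(calls_by_tool: Dict[str, Set[Call]], min_support: int = 1) -> Set[Call]:
    """Keep calls reported by at least min_support tools.

    min_support=1 is the union of all tools (fewest false negatives);
    higher values trade sensitivity for fewer false positives.
    """
    support = Counter(call for calls in calls_by_tool.values() for call in calls)
    return {call for call, n in support.items() if n >= min_support}


if __name__ == "__main__":
    calls = {  # hypothetical tools and calls
        "tool_a": {(100, "A", "G"), (250, "T", "TAA")},
        "tool_b": {(100, "A", "G"), (400, "GCT", "G")},
    }
    print(sorted(combine_calls(calls)))                 # union
    print(sorted(combine_calls(calls, min_support=2)))  # agreed by both tools
```

Since false negatives must be avoided in this setting, the union is the natural starting point, with calls supported by a single tool passed on for closer inspection.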