Genomics Method Seminar - BreakDancer January 21, 2015 Sora Kim Researcher Yonsei Biomedical Science Institute Yonsei University College.

Presentation transcript:

Genomics Method Seminar - BreakDancer January 21, 2015 Sora Kim Researcher Yonsei Biomedical Science Institute Yonsei University College of Medicine

2/12 Today’s paper
Ken Chen, PhD
– Assistant Professor, Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX
– Dr. Chen has designed, developed, and co-developed a set of computational tools, including BreakDancer, TIGRA, CREST, BreakTrans, BreakFusion, PolyScan, SomaticSniper, and VarScan

3/12 Conceptual Overview

4/12 Structural Variation Hurles ME, Trends Genet(2008) 24: 238–245

5/12 Structural Variation Hurles ME, Trends Genet(2008) 24: 238–245

6/12 Structural variation sequence signatures Can Alkan, Nature Reviews Genetics (2011) 12,

7/12 Structural variation sequence signatures Can Alkan, Nature Reviews Genetics (2011) 12,

8/12 BreakDancer Overview

9/12 BreakDancer
BreakDancer consists of two complementary algorithms:
– BreakDancerMax provides genome-wide detection of five types of structural variants: deletions, insertions, inversions, and intra- and inter-chromosomal translocations
– BreakDancerMini focuses on detecting small indels (typically bp) that are not routinely detected by BreakDancerMax
Pooling:
– In a family- or population-based study, pooling enhanced the detection of common variants
– In a paired tumor and normal sample study, it improved the specificity of somatic variant prediction through effective elimination of inherited variants

10/12 BreakDancerMax
1. BreakDancerMax starts with the map files produced by MAQ.
2. Read pairs mapped to a reference genome with sufficient mapping quality are independently classified into six types: normal, deletion, insertion, inversion, intrachromosomal translocation and interchromosomal translocation. This classification is based on
   a. the separation distance and alignment orientation between the paired reads
   b. the user-specified threshold
   c. the empirical insert size distribution estimated from the alignment of each library contributing genome coverage
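The classification in step 2 can be sketched in a few lines of Python. This is a simplified illustration, not BreakDancer's code: the `ReadPair` record, the `classify_pair` function, and the orientation rules (a forward-reverse Illumina library, with `lower` and `upper` taken from the insert size distribution) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical record for one mapped read pair."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str
    mapq: int


def classify_pair(pair: ReadPair, lower: float, upper: float,
                  min_mapq: int = 35) -> Optional[str]:
    """Label a read pair, or return None if it fails the quality filter.

    Assumes a forward-reverse library: a normal pair has its leftmost
    read on '+' and its rightmost read on '-'.
    """
    if pair.mapq < min_mapq:
        return None
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal_translocation"
    # Order the two ends by position so orientation reads left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    separation = right_pos - left_pos
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # Everted pair: treated here as an intrachromosomal translocation.
        return "intrachromosomal_translocation"
    if separation > upper:
        return "deletion"
    if separation < lower:
        return "insertion"
    return "normal"
```

Each pair is classified independently; clustering the anomalous pairs into candidate variants is a separate step.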

11/12 BreakDancerMax
3. The algorithm then searches for genomic regions that anchor substantially more anomalous read pairs (ARPs) than expected on average.
4. A putative structural variant is derived from the identification of one or more regions that are interconnected by at least two ARPs.
5. A confidence score is estimated for each variant based on a Poisson model that takes into consideration the number of supporting ARPs, the size of the anchoring regions and the coverage of the genome.
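To make step 5 concrete, here is a minimal Python sketch of a Poisson-based score. It is an illustration under stated assumptions rather than BreakDancer's exact formula: the expected ARP count is taken as `region_size * total_arps / genome_size`, the score is a Phred-style `-10 * log10(p)` of the Poisson tail probability, and the cap of 99 and all function names are choices made for the example.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Sum the lower tail P(X < k) downward from k - 1, then complement.
        term = math.exp(-lam + (k - 1) * math.log(lam) - math.lgamma(k))
        lower = 0.0
        for i in range(k - 1, -1, -1):
            lower += term
            term *= i / lam  # P(X = i - 1) = P(X = i) * i / lam
        return max(1.0 - lower, 0.0)
    # Sum the upper tail directly so very small probabilities keep precision.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
    return min(total, 1.0)


def confidence_score(supporting_arps: int, region_size: float,
                     total_arps: int, genome_size: float, cap: int = 99) -> int:
    """Phred-style score for seeing this many ARPs in one region by chance."""
    # Expected ARPs in the region if ARPs were spread uniformly (assumption).
    lam = region_size * total_arps / genome_size
    p = poisson_tail(supporting_arps, lam)
    if p <= 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p)))
```

With 1,000 ARPs genome-wide, a 500 bp region and a 3 Gb genome, the expected count per region is about 0.00017, so even two supporting ARPs are very unlikely by chance and the score is high.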

12/12 Confidence score estimation
The accuracy of the score depends on many factors:
– whether the set of reads is an unbiased sampling of the genome and all alleles
– whether the reads are mapped to the correct locations
– whether the amount of observed evidence is sufficient
One of the primary signals for the presence of a structural variant is the clustering of ARPs.
– It was therefore important to measure the degree of clustering in terms of both depth and breadth

13/12 Confidence score estimation
The authors assumed that, under the null hypothesis of no variant, the genomic location of any one particular type of insert is uniformly distributed. For studies that define more than one insert type, the number of inserts at a particular location follows a mixture Poisson distribution, with each mixture component representing one of the insert types.
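The model described above can be written out as a short worked derivation. The notation is an assumption introduced for illustration and may differ from the paper's: $G$ is the genome length, $s$ the size of the region, $N_i$ the genome-wide number of inserts of type $i$, and $n_i$ the number of type-$i$ inserts that fall in the region.

```latex
% Uniform placement under the null: each of the N_i inserts of type i
% lands in a region of size s with probability s/G, so for s << G
\lambda_i = \frac{s\,N_i}{G}, \qquad n_i \sim \mathrm{Poisson}(\lambda_i).

% Chance of seeing at least k_i inserts of type i in the region
P(n_i \ge k_i) = 1 - \sum_{j=0}^{k_i-1} \frac{e^{-\lambda_i}\,\lambda_i^{\,j}}{j!}.

% With several independent insert types, the total count is again Poisson,
% and a given insert is of type i with weight \lambda_i / \sum_m \lambda_m
n = \sum_i n_i \sim \mathrm{Poisson}\!\Big(\sum_i \lambda_i\Big), \qquad
P(\text{type } i \mid \text{insert}) = \frac{\lambda_i}{\sum_m \lambda_m}.
```

The smaller the tail probability for the observed counts, the stronger the evidence that the cluster of anomalous inserts is not a chance event.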

14/12 Confidence score estimation

15/12 Confidence score estimation

16/12 BreakDancerMini
1. BreakDancerMini analyzes the normally mapped read pairs that were ignored by BreakDancerMax.
2. A genomic region of size equivalent to the mean insert size is classified as either normal or anomalous based on a sliding window test that examines the difference between the separation distances of read pairs mapped within the window and those of read pairs across the entire genome.
3. A confidence score is assigned based on the significance value of the sliding window test.
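A sliding window test of this kind can be sketched as follows. The slide does not name the statistic, so this example assumes a two-sample Kolmogorov-Smirnov test with the standard asymptotic p-value; the function names, the `(position, separation)` input format and the thresholds are hypothetical.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series below converges slowly here; p is ~1
    total = 0.0
    for j in range(1, 101):
        total += 2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
    return min(max(total, 0.0), 1.0)


def scan_windows(pairs, window, step, alpha=0.001, min_pairs=5):
    """Flag windows whose separation distances differ from the genome's.

    pairs: list of (position, separation) for normally mapped read pairs.
    Returns a list of (window_start, window_end, p_value).
    """
    genome = [sep for _, sep in pairs]
    start = min(pos for pos, _ in pairs)
    end = max(pos for pos, _ in pairs)
    calls = []
    for w in range(start, end + 1, step):
        local = [sep for pos, sep in pairs if w <= pos < w + window]
        if len(local) < min_pairs:
            continue  # too few pairs for a meaningful comparison
        p = ks_pvalue(ks_statistic(local, genome), len(local), len(genome))
        if p < alpha:
            calls.append((w, w + window, p))
    return calls
```

A window in which pairs map consistently closer together or further apart than the genome-wide distribution yields a small p-value, which can then be converted into a confidence score.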

17/12 The sliding window test

18/12 Variant calling based on local assembly
– A local assembly of the breakpoints within a suspected variant region can confirm the existence of the structural variant, precisely define the breakpoint locations and determine any inserted sequences that may be present (tools: MAQ, Velvet, Phrap)
– If the derived contig sequences cumulatively covered over 75% of the region from which the reads were extracted, the contigs were aligned with cross_match to a region of the human reference sequence containing the structural variant and 1,000 bp of flanking sequence on either side
– A variant was called if the alignment contained a gap, or if the tumor and normal contigs contained consistent breakpoints

19/12 SV Detection: Breakdancer
bam2cfg
– Computes the insert size distribution and generates the Breakdancer configuration file
– Command:
  /BIO/app/breakdancer-1.1.2/perl/bam2cfg.pl -c 4 -q 35 -h /BIO/ewha/SAMPLES/NA12878.chrom22.bam > NA12878.chrom22.cfg
– -c : cutoff in units of standard deviation
– -q : minimum mapping quality
– -h : plot an insert size histogram for each BAM library

20/12 SV Detection: Breakdancer
bam2cfg output
  readgroup:ERR platform:ILLUMINA map:/BIO/ewha/SAMPLES/NA12878.chrom22.bam readlen:36.00 lib:g1k-sc-NA12878-CEU-1 num:10001 lower: upper: mean: std:10.52 SWnormality: exe:samtools view
– upper = mean + std * c
– The insert size histogram should not be bimodal
– std / mean should be below 0.3
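The two numeric checks on the configuration file can be expressed as a small Python helper. This is only a sketch: the function names are invented, the symmetric lower bound (mean minus c standard deviations, floored at zero) is an assumption because the slide gives only the upper bound, and the mean of 200 in the usage note is a made-up value since the real one did not survive in the transcript.

```python
def insert_size_bounds(mean: float, std: float, cutoff: float):
    """Return (lower, upper) insert size thresholds.

    upper = mean + std * cutoff follows the slide; the symmetric
    lower bound is an assumption made for this sketch.
    """
    lower = max(mean - std * cutoff, 0.0)
    upper = mean + std * cutoff
    return lower, upper


def library_looks_usable(mean: float, std: float, max_cv: float = 0.3) -> bool:
    """Check the rule of thumb that std / mean should stay below 0.3."""
    if mean <= 0:
        return False
    return std / mean < max_cv
```

For example, with a hypothetical mean of 200, the reported std of 10.52 and `-c 4`, `insert_size_bounds(200, 10.52, 4)` gives thresholds of roughly 157.9 and 242.1, and `library_looks_usable(200, 10.52)` passes because 10.52 / 200 is about 0.05. Bimodality of the histogram still has to be checked by eye from the `-h` plot.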

21/12 SV Detection: Breakdancer
breakdancer-max
– Calls SVs by detecting clusters of read pairs that show an abnormal insert size or orientation
– Command:
  /BIO/app/breakdancer-1.1.2/cpp/breakdancer-max -c 4 -q 35 -r 2 NA12878.chrom22.cfg > NA12878.chrom22.out
– -c : cutoff in units of standard deviation
– -q : minimum mapping quality
– -r : minimum number of read pairs required to establish a connection

22/12 SV Detection: Breakdancer
breakdancer-max output
  DEL NA12878.chrom22.bam|
Columns:
1. Chromosome 1
2. Position 1
3. Orientation 1
4. Chromosome 2
5. Position 2
6. Orientation 2
7. Type of SV
8. Size of SV
9. Confidence score
10. Total number of supporting read pairs
11. Total number of supporting read pairs from each BAM/library
12. Estimated allele frequency
SV types:
– DEL (deletion)
– INS (insertion)
– INV (inversion)
– ITX (intra-chromosomal translocation)
– CTX (inter-chromosomal translocation)
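The column layout above maps naturally onto a small parser. The Python sketch below assumes the output is tab-separated, that header lines start with `#`, and that the columns appear in the order listed; the `SVCall` type, the function names and the example row in the usage note are invented for illustration and are not real output.

```python
from typing import List, NamedTuple, Optional


class SVCall(NamedTuple):
    chrom1: str
    pos1: int
    orient1: str
    chrom2: str
    pos2: int
    orient2: str
    sv_type: str          # DEL, INS, INV, ITX or CTX
    size: int
    score: float
    num_reads: int
    reads_per_lib: str    # e.g. "file.bam|10"
    allele_freq: Optional[str]  # kept as text; may be missing


def parse_sv_line(line: str) -> Optional[SVCall]:
    """Parse one output line; return None for blank or header lines."""
    if not line.strip() or line.startswith("#"):
        return None
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(f)}")
    return SVCall(
        f[0], int(f[1]), f[2],
        f[3], int(f[4]), f[5],
        f[6], int(f[7]), float(f[8]), int(f[9]),
        f[10], f[11] if len(f) > 11 else None,
    )


def filter_calls(lines, sv_type: str, min_score: float) -> List[SVCall]:
    """Keep calls of one SV type with at least the given confidence score."""
    calls = (parse_sv_line(line) for line in lines)
    return [c for c in calls
            if c is not None and c.sv_type == sv_type and c.score >= min_score]
```

For instance, a made-up row such as `22<TAB>16000000<TAB>10+0-<TAB>22<TAB>16000500<TAB>0+10-<TAB>DEL<TAB>480<TAB>99<TAB>10<TAB>NA12878.chrom22.bam|10<TAB>0.5` would be returned by `filter_calls(lines, "DEL", 90)`.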

23/12 Discussion
– It may be beneficial to incorporate the mapping quality into the model rather than applying a fixed threshold.
– There is evidence suggesting that integrating read depth may help improve segmentation and genotyping, although an effective integration method is yet to be discovered.
– Some types of structural variants, such as inversions and translocations, appeared to be more difficult to detect and validate.
– Many putative predictions overlapped with regions of tandem or inverted repeats and required further sequence analysis and filtering, or the use of additional longer reads and longer inserts.
– The BreakDancerMini code will not be included in the coming releases. The authors recommend using Pindel to detect intermediate-size indels (10-80 bp).