Genomics Method Seminar - BreakDancer January 21, 2015 Sora Kim Researcher Yonsei Biomedical Science Institute Yonsei University College.


1 Genomics Method Seminar - BreakDancer January 21, 2015 Sora Kim Researcher sora15@yuhs.ac Yonsei Biomedical Science Institute Yonsei University College of Medicine

2 2/12 Today’s paper
Ken Chen, PhD
– Assistant Professor, Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX
– Dr. Chen has designed, developed, and co-developed a set of computational tools, including BreakDancer, TIGRA, CREST, BreakTrans, BreakFusion, PolyScan, SomaticSniper, and VarScan

3 3/12 Conceptual Overview

4 4/12 Structural Variation Hurles ME, Trends Genet(2008) 24: 238–245

5 5/12 Structural Variation Hurles ME, Trends Genet(2008) 24: 238–245

6 6/12 Structural variation sequence signatures Can Alkan, Nature Reviews Genetics (2011) 12, 363-376

7 7/12 Structural variation sequence signatures Can Alkan, Nature Reviews Genetics (2011) 12, 363-376

8 8/12 BreakDancer Overview

9 9/12 BreakDancer
BreakDancer consists of two complementary algorithms:
– BreakDancerMax provides genome-wide detection of five types of structural variants: deletions, insertions, inversions, and intra- and inter-chromosomal translocations
– BreakDancerMini focuses on detecting small indels (typically 10-100 bp) that are not routinely detected by BreakDancerMax
In a family- or population-based study, pooling enhanced the detection of common variants.
In a paired tumor and normal sample study, pooling improved the specificity of somatic variant prediction by effectively eliminating inherited variants.

10 10/12 BreakDancerMax
1. BreakDancerMax starts with the map files produced by MAQ.
2. Read pairs mapped to a reference genome with sufficient mapping quality are independently classified into six types: normal, deletion, insertion, inversion, intrachromosomal translocation and interchromosomal translocation. This classification is based on:
a. the separation distance and alignment orientation between the paired reads
b. the user-specified threshold
c. the empirical insert size distribution estimated from the alignment of each library contributing genome coverage
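The classification in step 2 can be sketched roughly as follows. This is a simplified illustration, not BreakDancer's actual code: the `ReadPair` type, the function name, the use of start-to-start distance as the separation, and the mapping of everted (reverse/forward) pairs to intrachromosomal translocation are all assumptions made for this example, and it assumes Illumina-style forward/reverse pairs.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal representation of one mapped read pair."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'
    mapq: int


def classify_pair(pair, lower, upper, min_mapq=35):
    """Return a coarse label for a read pair, or None if it fails the
    mapping-quality filter. `lower` and `upper` are the insert-size
    cutoffs for the pair's library."""
    if pair.mapq < min_mapq:
        return None
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal translocation"

    # Order the two ends by position so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # Everted pair; labeling it this way is an assumption of the sketch.
        return "intrachromosomal translocation"

    # Start-to-start distance used as a stand-in for the separation distance.
    span = right_pos - left_pos
    if span > upper:
        return "deletion"
    if span < lower:
        return "insertion"
    return "normal"
```

For example, with the cutoffs from the configuration shown later in the deck (lower 110.35, upper 173.67), a forward/reverse pair spanning 1,627 bp on chromosome 22 would be labeled a deletion.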

11 11/12 BreakDancerMax
3. The algorithm then searches for genomic regions that anchor substantially more anomalous read pairs (ARPs) than expected on average.
4. A putative structural variant is derived from the identification of one or more regions that are interconnected by at least two ARPs.
5. The confidence score is estimated for each variant based on a Poisson model that takes into consideration the number of supporting ARPs, the size of the anchoring regions and the coverage of the genome.
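As a rough illustration of step 4, the sketch below groups ARPs whose two ends fall into the same pair of regions and keeps groups supported by at least two pairs. Fixed-width bins are used here as a crude stand-in for anchoring regions; the bin size, the tuple layout, and the function name are assumptions of this example and do not reflect how BreakDancerMax delimits regions.

```python
from collections import Counter


def find_connections(arps, bin_size=500, min_pairs=2):
    """Group anomalous read pairs by the bins their two ends fall into.

    arps: iterable of (chrom1, pos1, chrom2, pos2, sv_type) tuples.
    Returns a dict mapping (chrom1, bin1, chrom2, bin2, sv_type) to the
    number of supporting pairs, keeping only groups with >= min_pairs.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2, sv_type in arps:
        key = (chrom1, pos1 // bin_size, chrom2, pos2 // bin_size, sv_type)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_pairs}


arps = [
    ("22", 51119695, "22", 51121322, "DEL"),
    ("22", 51119700, "22", 51121330, "DEL"),
    ("22", 51119750, "22", 51121400, "DEL"),
    ("22", 20000000, "22", 20004000, "DEL"),  # isolated pair, not reported
]
connections = find_connections(arps)
```

A pair of ends that straddles a bin boundary would be split across groups in this sketch, which is one reason a real implementation would need adaptive regions rather than fixed bins.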

12 12/12 Confidence score estimation
The accuracy of the score depends on many factors:
– whether the set of reads is an unbiased sampling of the genome and all alleles
– whether the reads are mapped to the correct locations
– whether the amount of observed evidence is sufficient
One of the primary signals for the presence of a structural variant is the clustering of ARPs.
– it was important to measure the degree of clustering from the perspective of both depth and breadth

13 13/12 Confidence score estimation
The authors assumed that, under the null hypothesis of no variant, the genomic location of one particular type of insert is uniformly distributed. For studies that define more than one insert type, the number of inserts at a particular location forms a mixture Poisson distribution, with each mixture component representing one of the insert types.
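A minimal sketch of the single-insert-type case: if inserts of one type are uniformly distributed, the count landing in an anchoring region is approximately Poisson with rate proportional to the region size. The rate formula, the Phred-style `-10 * log10(p)` scaling, and all function names below are assumptions of this example; the transcript does not give the actual formula, and the mixture over several insert types is not modeled.

```python
import math


def _poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Tail is large here, so the complement of the lower sum is accurate.
        return max(0.0, 1.0 - sum(_poisson_pmf(i, lam) for i in range(k)))
    # For k > lam the terms shrink monotonically, so sum the tail directly
    # to keep precision when the probability is tiny.
    total, i, term = 0.0, k, _poisson_pmf(k, lam)
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def confidence_score(n_support, region_size, n_type_inserts, genome_size):
    """Phred-style score for seeing >= n_support ARPs of one type in a
    region, under the uniform-location null (assumed formula)."""
    lam = region_size * n_type_inserts / genome_size
    p = poisson_tail(n_support, lam)
    return float("inf") if p == 0.0 else -10.0 * math.log10(p)
```

Under these assumptions, three supporting ARPs in a 1,000 bp region, given 3,000 such inserts across a 3 Gb genome, would be very unlikely by chance, and the score grows with each additional supporting pair.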

14 14/12 Confidence score estimation

15 15/12 Confidence score estimation

16 16/12 BreakDancerMini
1. BreakDancerMini analyzes the normally mapped read pairs that were ignored by BreakDancerMax.
2. A genomic region of size equivalent to the mean insert size is classified as either normal or anomalous based on a sliding window test that examines the difference between the separation distances of read pairs mapped within the window and those in the entire genome.
3. A confidence score is assigned based on the significance value of the sliding window test.
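The sliding window idea can be sketched as below. The transcript does not name the statistical test on this slide, so a two-sample Kolmogorov-Smirnov statistic is used here as an assumed stand-in, and a fixed cutoff on the statistic replaces a proper significance value. The function names, the input layout, and the thresholds are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def scan_windows(pairs, window, step, d_threshold=0.5, min_pairs=5):
    """Flag windows whose separation distances differ from the global set.

    pairs: list of (leftmost_position, separation_distance) on one
    chromosome. Returns a list of (start, end, statistic) tuples.
    """
    if not pairs:
        return []
    all_sizes = [size for _, size in pairs]
    positions = [pos for pos, _ in pairs]
    flagged = []
    start, last = min(positions), max(positions)
    while start <= last:
        local = [size for pos, size in pairs if start <= pos < start + window]
        if len(local) >= min_pairs:
            d = ks_statistic(local, all_sizes)
            if d > d_threshold:
                flagged.append((start, start + window, d))
        start += step
    return flagged
```

In practice the window would be set to roughly the mean insert size, as the slide describes, and the statistic would be converted to a p-value before scoring.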

17 17/12 The sliding window test

18 18/12 Variant calling based on local assembly
– A local assembly of the breakpoints within a suspected variant region can confirm the existence of the structural variant, precisely define the breakpoint locations and determine any inserted sequences that may be present. Tools used: MAQ, Velvet, Phrap.
– If the derived contig sequences cumulatively covered over 75% of the region from which the reads were extracted, the contigs were aligned, using cross_match, to a region of the human reference sequence containing the structural variant and 1,000 bp of flanking sequence on either side.
– A variant was called if there was a gap or if the tumor and the normal contigs contained consistent breakpoints.
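The 75% coverage criterion could be checked along these lines. This sketch interprets "cumulatively covered" as the union of contig intervals over the region, which is an assumption; the function name and the interval representation (half-open, in reference coordinates) are also hypothetical.

```python
def covered_fraction(region_start, region_end, contig_intervals):
    """Fraction of [region_start, region_end) covered by the union of the
    given (start, end) contig intervals."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Clip each interval to the region and drop those that miss it entirely.
    clipped = sorted(
        (max(start, region_start), min(end, region_end))
        for start, end in contig_intervals
        if start < end and start < region_end and end > region_start
    )

    covered = 0
    cursor = region_start  # rightmost position already counted
    for start, end in clipped:
        if end <= cursor:
            continue  # fully inside what was already counted
        covered += end - max(start, cursor)
        cursor = end
    return covered / length


contigs = [(0, 400), (300, 700), (900, 1200)]
fraction = covered_fraction(0, 1000, contigs)
proceed_to_alignment = fraction > 0.75
```

Overlapping contigs are counted once, so two contigs covering the same 400 bp do not inflate the fraction.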

19 19/12 SV Detection Breakdancer Bam2cfg
– Computes the insert size distribution and generates the Breakdancer configuration file
– Command
/BIO/app/breakdancer-1.1.2/perl/bam2cfg.pl -c 4 -q 35 -h /BIO/ewha/SAMPLES/NA12878.chrom22.bam > NA12878.chrom22.cfg
-c : Cutoff in units of standard deviation
-q : Minimum mapping quality
-h : Plot insert size histogram for each BAM library

20 20/12 SV Detection Breakdancer Bam2cfg
– Output
readgroup:ERR001719 platform:ILLUMINA map:/BIO/ewha/SAMPLES/NA12878.chrom22.bam readlen:36.00 lib:g1k-sc-NA12878-CEU-1 num:10001 lower:110.35 upper:173.67 mean:140.02 std:10.52 SWnormality:-17.69 exe:samtools view
– Upper = mean + std * c
– The histogram should not be bimodal
– Std / mean should be below 0.3
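A small sketch for reading one configuration line and applying the std/mean check from this slide. It assumes the fields are tab-separated `key:value` pairs (the `exe` value contains a space, so splitting on whitespace would break it); the function names are hypothetical and this is not part of BreakDancer itself.

```python
def parse_cfg_line(line):
    """Split one tab-separated cfg line into a dict of key -> value strings."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, _, value = field.partition(":")  # split on the first colon only
        record[key] = value
    return record


def library_looks_usable(record, max_ratio=0.3):
    """Apply the slide's rule of thumb: std / mean should be below 0.3."""
    return float(record["std"]) / float(record["mean"]) < max_ratio


cfg_line = "\t".join([
    "readgroup:ERR001719", "platform:ILLUMINA",
    "map:/BIO/ewha/SAMPLES/NA12878.chrom22.bam", "readlen:36.00",
    "lib:g1k-sc-NA12878-CEU-1", "num:10001", "lower:110.35",
    "upper:173.67", "mean:140.02", "std:10.52",
    "SWnormality:-17.69", "exe:samtools view",
])
record = parse_cfg_line(cfg_line)
usable = library_looks_usable(record)
```

For the library shown above the ratio is about 0.075, well under the 0.3 guideline. The bimodality check still needs a look at the histogram produced by `-h`.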

21 21/12 SV Detection Breakdancer-max
– Calls SVs by detecting clusters of read pairs that show an abnormal insert size or orientation
– Command
/BIO/app/breakdancer-1.1.2/cpp/breakdancer-max -c 4 -q 35 -r 2 NA12878.chrom22.cfg > NA12878.chrom22.out
-c : Cutoff in units of standard deviation
-q : Minimum mapping quality
-r : Minimum number of read pairs required to establish a connection

22 22/12 SV Detection Breakdancer-max
– Output
22 51119695 3+0- 22 51121322 0+3- DEL 1481 74 3 NA12878.chrom22.bam|3 0.16
1. Chromosome 1
2. Position 1
3. Orientation 1
4. Chromosome 2
5. Position 2
6. Orientation 2
7. Type of SV
8. Size of SV
9. Confidence score
10. Total number of supporting read pairs
11. Number of supporting read pairs from each BAM/library
12. Estimated allele frequency
SV types: DEL (deletion), INS (insertion), INV (inversion), ITX (intra-chromosomal translocation), CTX (inter-chromosomal translocation)
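The twelve columns listed above can be read into named fields with a short parser like the following. The `SVCall` type and function name are made up for this example; skipping lines that start with `#` and splitting multiple libraries in column 11 on `:` are assumptions about the output format rather than something shown in the slide.

```python
from typing import Dict, NamedTuple, Optional


class SVCall(NamedTuple):
    chrom1: str
    pos1: int
    orientation1: str
    chrom2: str
    pos2: int
    orientation2: str
    sv_type: str
    size: int
    score: float
    num_reads: int
    reads_per_library: Dict[str, int]
    allele_frequency: float


def parse_sv_line(line: str) -> Optional[SVCall]:
    """Parse one breakdancer-max output row into an SVCall."""
    if not line.strip() or line.startswith("#"):
        return None  # assumed: header/comment lines begin with '#'
    f = line.split()
    per_lib = {}
    for entry in f[10].split(":"):  # assumed separator between libraries
        name, _, count = entry.rpartition("|")
        per_lib[name] = int(count)
    return SVCall(f[0], int(f[1]), f[2], f[3], int(f[4]), f[5], f[6],
                  int(f[7]), float(f[8]), int(f[9]), per_lib, float(f[11]))


row = ("22 51119695 3+0- 22 51121322 0+3- DEL 1481 74 3 "
       "NA12878.chrom22.bam|3 0.16")
call = parse_sv_line(row)
```

With the fields named, filtering by type or score becomes a one-line comprehension, for example keeping only `DEL` calls above a chosen score.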

23 23/12 Discussion
– It may be beneficial to incorporate the mapping quality into the model rather than applying a fixed threshold.
– There is evidence suggesting that integrating read depth may help improve segmentation and genotyping, although an effective integration method is yet to be discovered.
– Some types of structural variants, such as inversions and translocations, appeared to be more difficult to detect and validate.
– Many putative predictions overlapped with regions of tandem or inverted repeats and required further sequence analysis and filtering, or the use of additional longer reads and longer inserts.
– The BreakDancerMini code will not be included in the coming releases. Pindel is recommended for detecting intermediate-size indels (10-80 bp).

