The BreakSeq Project: Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Mark Gerstein.


1 The BreakSeq Project: Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Mark Gerstein

2 Overview
Introduction: SVs, event types, and formation mechanisms. The BreakSeq Analysis: analysis of SVs using a breakpoint library. The BreakSeq Pipeline: the SV Annotation and Identification Pipeline. [Lam et al. Nat. Biotech. ('10)]

3 SV Event Type
[Figure: a deletion event and an insertion event, each shown as a query sequence aligned against the reference, with the breakpoints marked.]

4 SV formation mechanism
Non-allelic homologous recombination (NAHR); non-homologous recombination (NHR); non-homologous end joining (NHEJ); fork stalling and template switching (FoSTeS); transposable element insertion (TEI); variable number of tandem repeats (VNTR)

5 Some Issues
The limited resolution of recent SV surveys (e.g., microarray-based ones) has prevented intersecting SVs with gene exons or analyzing gene fusion events, systematically deducing the SV formation process, inferring the ancestral states of SV events, and estimating the physical properties of SVs.

6 THE BREAKSEQ ANALYSIS Analysis of SVs using a breakpoint library
Lam HY, Mu XJ, Stütz AM, Tanzer A, Cayting PD, Snyder M, Kim PM, Korbel JO, Gerstein MB. “Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library”. Nature Biotechnology 2010 Jan;28(1):47-55.

7 SV Breakpoint Library [Lam et al. Nat. Biotech. ('10)]

8 SV Junction and Identification
[Lam et al. Nat. Biotech. ('10)]
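To make the junction idea concrete, here is a minimal Python sketch of how reference-allele and alternative-allele junction sequences could be cut from a reference sequence for a simple deletion or insertion. The `SV` record, the `junctions` function, and the `flank` length are hypothetical names chosen for illustration; the actual BreakSeq library format and flank length are not given in these slides.

```python
from dataclasses import dataclass


@dataclass
class SV:
    """A simplified SV record (hypothetical structure, for illustration only)."""
    chrom: str
    start: int          # 0-based breakpoint position on the reference
    end: int            # 0-based, exclusive; equal to start for an insertion
    kind: str           # "deletion" or "insertion"
    inserted: str = ""  # inserted sequence (insertions only)


def junctions(ref: str, sv: SV, flank: int) -> dict[str, list[str]]:
    """Return reference-allele and alternative-allele junction sequences.

    Each junction spans up to `flank` bases on either side of a breakpoint.
    """
    left = ref[max(0, sv.start - flank):sv.start]
    if sv.kind == "deletion":
        right = ref[sv.end:sv.end + flank]
        return {
            # Reference allele: the two edges of the segment that gets deleted.
            "ref": [left + ref[sv.start:sv.start + flank],
                    ref[max(0, sv.end - flank):sv.end] + right],
            # Alternative allele: the two flanks fused, deleted segment removed.
            "alt": [left + right],
        }
    if sv.kind == "insertion":
        right = ref[sv.start:sv.start + flank]
        return {
            # Reference allele: the uninterrupted insertion site.
            "ref": [left + right],
            # Alternative allele: both edges of the inserted sequence.
            "alt": [left + sv.inserted[:flank], sv.inserted[-flank:] + right],
        }
    raise ValueError(f"unsupported SV kind: {sv.kind}")


ref = "AAAACCCCGGGGTTTT"
deletion = junctions(ref, SV("chr1", 4, 8, "deletion"), flank=4)
insertion = junctions(ref, SV("chr1", 8, 8, "insertion", "TTAA"), flank=4)
```

For the deletion, a read spanning the fused flanks (`AAAAGGGG`) supports the alternative allele, while reads across either edge of the deleted `CCCC` support the reference allele; insertions mirror this.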

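9 Mechanism Classification
[Figure: NAHR, a deletion whose flanking sequences are highly similar, with a minor offset; single RETRO, a deletion consisting of a single repeat element; multiple RETRO, a deletion spanning multiple repeat elements (RE1, RE2).] [Lam et al. Nat. Biotech. ('10)]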
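The classification idea can be sketched as a fixed-order decision rule over precomputed evidence. Everything here is an assumption for illustration: the `SVEvidence` fields, the priority order, and the threshold values in `DEFAULTS` are hypothetical, not the published BreakSeq parameters. The labels STEI, MTEI, NAHR, and NHR match those in the pipeline output on the Active L1 slide, VNTR comes from the mechanism list, and the pipeline's UNSURE label is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SVEvidence:
    """Pre-computed evidence for one SV (hypothetical fields)."""
    te_fractions: list[float]      # fraction of the SV sequence covered by each TE annotation
    tandem_repeat_fraction: float  # fraction covered by tandem repeats
    flank_identity: float          # identity between the sequences flanking the two breakpoints
    flank_homology_len: int        # length of that flanking homology, in bp


# Hypothetical thresholds, NOT the published BreakSeq defaults.
DEFAULTS = {"te_cov": 0.5, "vntr_cov": 0.5, "nahr_identity": 0.85, "nahr_min_len": 50}


def classify(ev: SVEvidence, p: dict = DEFAULTS) -> str:
    """Assign a formation-mechanism label, checking mechanisms in a fixed order."""
    single = max(ev.te_fractions, default=0.0)
    total = sum(ev.te_fractions)
    if single >= p["te_cov"]:
        return "STEI"  # one retroelement explains most of the SV
    if len(ev.te_fractions) > 1 and total >= p["te_cov"]:
        return "MTEI"  # several retroelements together explain it
    if ev.tandem_repeat_fraction >= p["vntr_cov"]:
        return "VNTR"
    if ev.flank_identity >= p["nahr_identity"] and ev.flank_homology_len >= p["nahr_min_len"]:
        return "NAHR"  # long, highly similar flanking sequences
    return "NHR"       # little or no flanking homology
```

Checking transposon content before flank homology is one reasonable ordering, since a retroelement insertion can also leave short flanking similarity; the real pipeline's ordering may differ.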

10 SV Mechanism Classification
[Lam et al. Nat. Biotech. ('10)]

11 Sensitivity analysis of the classification pipeline
[Lam et al. Nat. Biotech. ('10)] The x-axis spans the values of the parameter being varied. The y-axis is the number of SVs of each formation mechanism classified by the pipeline when that parameter takes the corresponding value and all other parameters keep their default values. Dotted vertical lines mark the default parameter values.
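A one-at-a-time sensitivity sweep like the one plotted here can be written generically: copy the defaults, override a single parameter, and tally the mechanism labels. The function name `one_at_a_time` and the toy classifier are hypothetical; the toy rule only stands in for the real pipeline.

```python
from collections import Counter
from typing import Callable, Iterable


def one_at_a_time(records: Iterable, classify: Callable[[object, dict], str],
                  defaults: dict, name: str, values: Iterable) -> dict:
    """Vary one parameter over `values`, holding all others at their defaults.

    Returns {value: Counter(mechanism -> number of SVs)}, i.e. one y-axis
    profile for each x-axis point of a sensitivity plot.
    """
    records = list(records)              # allow several passes over the input
    profile = {}
    for v in values:
        params = {**defaults, name: v}   # copy the defaults, override one parameter
        profile[v] = Counter(classify(r, params) for r in records)
    return profile


# Toy stand-in: each record is a flank-identity score, called NAHR above a threshold.
def toy_classify(identity: float, params: dict) -> str:
    return "NAHR" if identity >= params["nahr_identity"] else "NHR"


scores = [0.70, 0.82, 0.88, 0.95, 0.99]
result = one_at_a_time(scores, toy_classify, {"nahr_identity": 0.85},
                       "nahr_identity", [0.80, 0.90])
```

Raising the identity threshold from 0.80 to 0.90 moves two of the toy SVs from NAHR to NHR, which is the kind of shift the sensitivity plot is meant to expose.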

12 SV Formation Analysis [Lam et al. Nat. Biotech. ('10)]

13 Formation mechanisms of SVs identified in the 1000 Genomes Project: split reads
[Figure: formation-mechanism breakdown, with TEI shown as MTEI + STEI, for two Yale split-read (SR) call sets from Zhengdong Zhang (NA12878, Aug 2009 version): 16128 for events ≥200 bp and 4285 for events ≥1 kb.]

14 Active L1 Transposition
431 fully rectifiable SVs overlapped with 147 active L1s compiled by Mills et al. (consolidated from Brouha et al. and Mills et al. 2006).
Columns: Chr, Source, Event, Start, End, Size, Mech, Active L1, Supported.
chr1 Korbel Insertion 6703 Mech "MTEI"; Rectified "2:2:2" chr1: ['L1HS', 'Ta-1d'] 2 6052 Mech "STEI"; Rectified "2:2:2" chr1: ['L1HS', 'Ta-0'] 3 chr10 6048 Mech "UNSURE"; Rectified "2:2:2" chr10: ['L1HS', 'Ta-1dn(g)'] 1 chr11 6065 chr11: ['L1HS', 'Ta-1d'] Deletion 9443 Mech "NAHR"; Rectified "1:1:1" chr11: ['L1HS', 'Ta-1d'] Venter 6060 Watson 6051 chr11: ['L1HS', 'Ta-1d'] chr15 6208 chr15: ['L1HS', 'Ta-0'] Kim 5654 chr15: ['L1HS', 'L1HS'] chr18 6045 chr18: ['L1HS', 'Pre-Ta (ACG/G)'] chr2 chr2: ['L1HS', 'Ta-1d'] chr20 6064 chr20: ['L1HS', 'Ta-0'] 4 chr4 6042 chr4: ['L1HS', 'L1HS'] chr5 6108 chr5: ['L1HS', 'Ta-0'] chr5: ['L1HS', 'Ta-1d'] 6047 chr5: ['L1HS', 'Ta-1d'] chr6 chr6: ['L1HS', 'Ta-1d'] chr7 chr7: ['L1HS', 'Ta-1d'] chr8 6057 chr8: ['L1HS', 'Ta-1d'] 6012 chr8: ['L1HS', 'Ta-1d'] 5 6101 chr8: ['L1HS', 'L1HS'] chrX 6249 chrX: ['L1HS', 'Ta-1d'] 6083 chrX: ['L1HS', 'Ta-0'] [Lam et al. Nat. Biotech. ('10)]
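An overlap analysis like this reduces to intersecting two sets of genomic intervals. The sketch below is a minimal Python version, assuming half-open, 0-based coordinates; `Interval` and `overlapping_pairs` are hypothetical names. For genome-scale inputs, an interval tree or a dedicated tool such as BEDTools `intersect` is the usual choice.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlapping_pairs(svs: list[Interval],
                      elements: list[Interval]) -> list[tuple[Interval, Interval]]:
    """Report every (SV, element) pair whose half-open intervals overlap."""
    by_chrom: dict[str, list[Interval]] = {}
    for e in elements:
        by_chrom.setdefault(e.chrom, []).append(e)
    for lst in by_chrom.values():
        lst.sort(key=lambda iv: iv.start)

    hits = []
    for sv in svs:
        for e in by_chrom.get(sv.chrom, []):
            if e.start >= sv.end:
                break  # sorted by start, so no later element can overlap this SV
            if e.end > sv.start:
                hits.append((sv, e))
    return hits


svs = [Interval("chr1", 100, 200), Interval("chr2", 0, 50)]
l1s = [Interval("chr1", 150, 6150), Interval("chr1", 300, 400), Interval("chr3", 0, 10)]
hits = overlapping_pairs(svs, l1s)
```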

15 Active L1 Transposition Example

16 Pseudogene Number Variation
431 fully rectifiable SVs overlapped with 13,453 duplicated and processed pseudogenes identified by PseudoPipe (based on Ensembl 48).
Columns: Chr, Source, Event, Start, End, Size, Mech, Pgene Type.
chr10 Kidd Deletion 14241 Mech "NAHR"; Rectified "1:1:1" PSSD chr12 Venter 6639 chr17 255880 Mech "NHR"; Rectified "1:1:1" chr20 33027 chr3 Korbel 7207 chr5 Watson 9927 DUP Insertion 272672 Mech "NAHR"; Rectified "2:2:2" DUP/PSSD chrX 122868

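17 Aligning to Ancestral Genome
[Figure: deletion rectification. Segments A, C, and B are aligned to the chimpanzee genome through syntenic (syn) regions; two cases are shown: chimp in a different state and chimp in the same state.] [Lam et al. Nat. Biotech. ('10)]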

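18 Aligning to Ancestral Genome
[Figure: insertion rectification. Segments A, C, and B are aligned to the chimpanzee genome through syntenic (syn) regions; two cases are shown: chimp in the same state and chimp in a different state.] [Lam et al. Nat. Biotech. ('10)]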
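The rectification logic implied by these two slides can be sketched as a small lookup, assuming the chimpanzee genome serves as the ancestral proxy and that a separate alignment step (not shown) has already decided which allele the chimp carries at a syntenic locus. The function name and the allele labels are hypothetical.

```python
def rectify(reported: str, chimp_matches: str) -> str:
    """Re-label an SV using the chimpanzee allele as a proxy for the ancestral state.

    reported:      "deletion" or "insertion", called relative to the human reference.
    chimp_matches: "reference" if the chimp carries the reference allele,
                   "query" if it carries the query (non-reference) allele,
                   anything else if the locus could not be aligned to chimp.
    """
    opposite = {"deletion": "insertion", "insertion": "deletion"}
    if reported not in opposite:
        raise ValueError(f"unsupported event: {reported}")
    if chimp_matches == "reference":
        return reported             # ancestor looked like the reference: call stands
    if chimp_matches == "query":
        return opposite[reported]   # reference allele is derived: flip the event type
    return "unrectified"            # ancestral state unknown
```

If the chimp carries the query allele, the reference allele is the derived one, so an apparent deletion in the query is rectified to an insertion on the reference lineage, and vice versa.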

19 SV Ancestral State Analysis
[Lam et al. Nat. Biotech. ('10)]

20 Ancestral state analysis reveals a balance of insertions and deletions, and biases in formation mechanisms
[Figure: insertions vs. deletions after ancestral-state analysis; data labels 208, 1409, 212, 419.] [Lam et al. Nat. Biotech. ('10)]

21 Tracing the origin of recent human insertions
NHR- and RT-based insertions are mostly inter-chromosomal, whereas NAHR-based insertions involve nearby sequences. [Lam et al. Nat. Biotech. ('10)]

22 Relative location of Inserted Sequence
[Lam et al. Nat. Biotech. ('10)]
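Classifying the origin of an inserted sequence relative to its insertion site can be sketched as follows, assuming the source locus has already been found (for example, by aligning the inserted sequence back to the reference). The `Locus` type, the function name, and the 1 Mb `nearby_bp` cutoff are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def source_location(site: Locus, source: Locus, nearby_bp: int = 1_000_000) -> str:
    """Classify where an inserted sequence came from, relative to its insertion site."""
    if site.chrom != source.chrom:
        return "inter-chromosomal"
    # Gap between the two intervals; 0 if they touch or overlap.
    gap = max(0, max(site.start, source.start) - min(site.end, source.end))
    return "nearby" if gap <= nearby_bp else "intra-chromosomal, distant"


site = Locus("chr1", 1000, 1000)
```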

23 Breakpoint Features Analysis
[Lam et al. Nat. Biotech. ('10)]
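The slides do not list which physical features are computed at breakpoints, so the sketch below uses local GC content on each side of a breakpoint purely as an illustrative per-breakpoint feature. The function names and the window size are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (N and other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # no informative bases in the window
    return (seq.count("G") + seq.count("C")) / acgt


def breakpoint_gc(ref: str, breakpoint: int, window: int) -> tuple[float, float]:
    """GC fraction in the `window` bases immediately left and right of a breakpoint."""
    left = ref[max(0, breakpoint - window):breakpoint]
    right = ref[breakpoint:breakpoint + window]
    return gc_fraction(left), gc_fraction(right)
```

Comparing such values against randomly placed control positions would show whether breakpoints are enriched or depleted for a feature.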

24 The SV Annotation and Identification Pipeline
THE BREAKSEQ PIPELINE

25 The Pipeline Workflow
[Figure: BreakSeq workflow. Inputs are an SV dataset and sequence reads. Data conversion turns the SV dataset into standardized SVs; the Annotation Pipeline annotates SVs with different features (output: annotated and standardized SVs); the Identification Pipeline uses a junction library for rapid SV identification in short-read genomes (output: SV calls).] [Lam et al. Nat. Biotech. ('10)]

26 The Pipeline Modules
SV Annotation:
Library Standardization: remove duplicated and out-of-range SVs.
Mechanism Classification: classify SVs by their formation mechanisms.
Ancestral State Analysis: rectify SVs' events based on their ancestral states.
Features Analysis: calculate physical features; intersect with gene annotation.
SV Identification:
Junction Library Generation: generate an SV junction library.
Junction Alignment: align junctions to short sequencing reads.
Alignment Filtering: filter out SVs with alignments mapped to their ref alleles.
SV Calling: score the SVs with alignments only to their alt alleles.
[Lam et al. Nat. Biotech. ('10)]
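The Library Standardization step can be sketched as a filter that drops calls falling outside the assembly and collapses duplicates. The record fields, the use of exact (chrom, start, end, kind) matching to define a duplicate, and the `chrom_sizes` input are assumptions; a real pipeline might merge near-identical calls instead.

```python
from typing import NamedTuple


class SVRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    kind: str    # e.g. "deletion" or "insertion"
    source: str  # dataset the call came from


def standardize(records: list[SVRecord], chrom_sizes: dict[str, int]) -> list[SVRecord]:
    """Drop out-of-range calls and keep the first call per (chrom, start, end, kind)."""
    seen = set()
    kept = []
    for r in records:
        size = chrom_sizes.get(r.chrom)
        if size is None or not (0 <= r.start <= r.end <= size):
            continue  # unknown chromosome, or coordinates off the assembly
        key = (r.chrom, r.start, r.end, r.kind)
        if key in seen:
            continue  # same event already reported by another source
        seen.add(key)
        kept.append(r)
    return kept


recs = [
    SVRecord("chr1", 10, 20, "deletion", "A"),
    SVRecord("chr1", 10, 20, "deletion", "B"),    # duplicate of the first call
    SVRecord("chr1", 900, 1200, "deletion", "A"), # runs past the end of chr1
    SVRecord("chrUn", 1, 2, "deletion", "A"),     # chromosome not in the assembly
    SVRecord("chr1", 50, 50, "insertion", "B"),
]
out = standardize(recs, {"chr1": 1000})
```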

27 BreakSeq enables detecting SVs in Next-Gen Sequencing data based on breakpoint junctions
Leveraging read data to identify previously known SVs ("Break-Seq"): reads are mapped onto a library of SV breakpoint junctions, enabling detection of insertions and of deletions. [Lam et al. Nat. Biotech. ('10)]
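The Alignment Filtering and SV Calling steps can be sketched as counting, for each SV, the reads that hit its reference-allele and alternative-allele junctions. This assumes reads were already aligned to the junction library by some short-read aligner; the input format and the function name are hypothetical. It applies the module description literally (any reference-allele hit filters the SV) and uses the ">4 supporting hits" cutoff for high support from the personal-genome results.

```python
from collections import defaultdict
from typing import Iterable


def call_svs(alignments: Iterable[tuple[str, str]],
             high_support: int = 4) -> dict[str, tuple[int, str]]:
    """Score SVs from reads aligned to a junction library.

    alignments: one (sv_id, allele) pair per aligned read, where allele is
    "ref" or "alt" depending on which junction of that SV the read hit.
    Returns {sv_id: (alt_hits, tier)} for SVs supported only by alt-allele reads.
    """
    counts = defaultdict(lambda: {"ref": 0, "alt": 0})
    for sv_id, allele in alignments:
        if allele not in ("ref", "alt"):
            raise ValueError(f"unexpected allele label: {allele}")
        counts[sv_id][allele] += 1

    calls = {}
    for sv_id, c in counts.items():
        if c["ref"] > 0:
            continue  # filtered: some reads support the reference allele
        tier = "high" if c["alt"] > high_support else "low"
        calls[sv_id] = (c["alt"], tier)
    return calls


aln = [("sv1", "alt")] * 5 + [("sv2", "alt")] * 2 + [("sv3", "alt"), ("sv3", "ref")]
calls = call_svs(aln)
```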

28 Applying BreakSeq to short-read based personal genomes boosts numbers of bp-level SVs by ~50-fold
Personal genome (ID) | Ancestry | High-support hits (>4 supporting hits) | Total hits (incl. low support)
NA18507* | Yoruba | 105 | 179
YH* | East Asian | 81 | 158
NA [1000 Genomes Project, CEU trio] | European | 113 | 219
*Under the operational definition used in our analysis (>1 kb events), fewer than 5 SVs had previously been reported in these genomes … [Lam et al. Nat. Biotech. ('10)]

29 PCR validations in NA12891 demonstrate high accuracy of BreakSeq and add 48 validated calls to the CEU trio
48 positive outcomes out of 49 PCRs scored in NA12891: a 98% PCR validation rate (for both low- and high-support events). Adrian Stütz
12 amplicons sequenced in NA12891: all breakpoints confirmed. [Lam et al. Nat. Biotech. ('10)]

30 Acknowledgement
Yale University: Jasmine Mu, Hugo Lam
Stanford U.: M Snyder
University of Toronto: Philip Kim
EMBL: Jan Korbel, Adrian Stuetz
University of Vienna: Andrea Tanzer

31

32 More Information on this Talk
SUBJECT: Assembly
DESCRIPTION: Computational Biology Center, IBM T J Watson Research Center, Yorktown Heights, NY, , 11:00-12:00; [I:IBM] (Takes 25' with many questions.)
MORE DESCRIPTION: The talk works equally well on Mac or PC. Paper references in the talk were mostly from Papers.GersteinLab.org. The above topic list can be easily cross-referenced against this website. Each starred topic abbreviation is actually a paper's "ID" on the site. For instance, the topic pubnet* can be looked up at )
PERMISSIONS: This presentation is copyright Mark Gerstein, Yale University. Please read the permissions statement at . Feel free to use images in the talk with PROPER acknowledgement (via citation to relevant papers or a link to gersteinlab.org).
PHOTOS & IMAGES: For thoughts on the source and permissions of many of the photos and clipped images in this presentation, see . In particular, many of the images have particular EXIF tags, such as kwpotppt, that can be easily queried from flickr, viz: .

