
Improving the Accuracy of Genome Assemblies
July 17th, 2012
Roy Ronen*,1, Christina Boucher*,1, Hamidreza Chitsaz2 and Pavel Pevzner1
1. University of California, San Diego
2. Wayne State University, Michigan
* Contributed equally to this work

≈ $ billions, ≈ several years, ≈ hundreds of people
≈ $ thousands, ≈ several weeks, ≈ two people

High Throughput Sequencing Assemblies

Draft Genome from HTS
Sample Preparation → Sequencing → Assembly → Analysis
Fragments → Reads → Contigs

HTS assemblies (contigs) still contain an abundance of errors:
- Substitution errors per 100 kbp with SOAPdenovo and with Velvet.
- Small (<50 bp) indel errors.
- Misassemblies, large indels, etc.
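The per-100 kbp figures on this slide are normalized error counts. A minimal sketch of that normalization follows, assuming the contig has already been aligned to the reference without gaps. The function name and the toy sequences are illustrative and do not come from the talk.

```python
def substitutions_per_100kbp(contig: str, reference: str) -> float:
    """Count mismatching bases and scale the count to 100,000 bases.

    Assumes both sequences are already aligned and have equal length;
    indels are out of scope for this sketch.
    """
    if len(contig) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    if not contig:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(contig, reference) if a != b)
    return mismatches * 100_000 / len(contig)


# Toy example: one substitution in ten aligned bases.
rate = substitutions_per_100kbp("ACGTACGTAC", "ACGTACGTAA")
```

On real assemblies the mismatch count would come from a whole-genome alignment rather than from a direct string comparison.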

Errors in the assembled contigs will profoundly affect any downstream analysis.

Sample Preparation → Sequencing → Assembly → SEQuel → Analysis
Fragments → Reads → Contigs → Refined Contigs

De Bruijn Graph for Fragment Assembly

De Bruijn Graph (Pevzner, Tang, Waterman 2001)

k-mer paths (k = 3) before gluing:
GCC CCA CAT ATT TTA
GCC CCT CTT TTT TTA
CCT CTA TAT ATT

[Figure sequence: identical k-mers (e.g. GCC, CCT) are glued step by step into single vertices, yielding the de Bruijn graph.]
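The gluing shown on these slides can be sketched in a few lines. This is a minimal illustration, not the speakers' implementation: vertices are k-mers, and an edge joins k-mers that are consecutive in some read. The three reads are an assumption, spelled out from the k-mer paths listed on the slide.

```python
def kmers(read, k):
    """Return the k-mers of a read in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Map each k-mer to the sorted list of k-mers that follow it in a read.

    Using the k-mer string itself as the dictionary key is what glues
    identical k-mers from different reads into a single vertex.
    """
    successors = {}
    for read in reads:
        path = kmers(read, k)
        for kmer in path:
            successors.setdefault(kmer, set())
        for a, b in zip(path, path[1:]):
            successors[a].add(b)
    return {kmer: sorted(nxt) for kmer, nxt in successors.items()}


# Reads inferred from the slide's k-mer paths (an assumption).
reads = ["GCCATTA", "GCCTTTA", "CCTATT"]
graph = de_bruijn(reads, 3)
```

After gluing, GCC and CCT each have two successors. These are the branch points where the three paths diverge.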

Challenges

Sequencing errors cause bulges in the de Bruijn graph

Error-free reads: GCCTAGGAC, CACTTGGCA
k-mer paths (k = 3):
GCC CCT CTA TAG AGG GGA GAC
CAC ACT CTT TTG TGG GGC GCA

Read with a sequencing error: GCCTTGGAC (A→T at the fifth base of GCCTAGGAC)
k-mer path of the erroneous read:
GCC CCT CTT TTG TGG GGA GAC

The erroneous read adds the edges CCT → CTT and TGG → GGA, which join the two otherwise separate paths.
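One common way to spot edges introduced by sequencing errors is to count how many reads support each edge. The sketch below uses that coverage heuristic on the reads from the slide. The read multiplicities (two copies of each correct read, one erroneous read) and the threshold of 2 are assumptions for illustration, and the talk does not state that SEQuel works this way.

```python
from collections import Counter


def edge_coverage(reads, k):
    """Count how many times each (k-mer, next k-mer) edge occurs in the reads."""
    coverage = Counter()
    for read in reads:
        path = [read[i:i + k] for i in range(len(read) - k + 1)]
        coverage.update(zip(path, path[1:]))
    return coverage


# Two copies of each error-free read plus one read with an A->T error.
reads = ["GCCTAGGAC"] * 2 + ["CACTTGGCA"] * 2 + ["GCCTTGGAC"]
coverage = edge_coverage(reads, 3)

# Edges supported by fewer than 2 reads are candidate error edges.
weak_edges = sorted(edge for edge, count in coverage.items() if count < 2)
```

With these inputs the only weakly supported edges are the two that the erroneous read introduces.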

The SEQuel Algorithm

Permissively aligned read-pair: a read-pair for which at least one read aligned uniquely.
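The definition above reduces to a simple predicate. In this sketch each mate is represented only by the number of locations it aligned to; that representation and the function name are assumptions made for illustration.

```python
def is_permissively_aligned(hits_mate1: int, hits_mate2: int) -> bool:
    """True if at least one mate of the pair aligned to exactly one location."""
    return hits_mate1 == 1 or hits_mate2 == 1


# (alignment hits of mate 1, alignment hits of mate 2) for four toy read-pairs.
pairs = [(1, 3), (0, 1), (2, 2), (0, 0)]
kept = [pair for pair in pairs if is_permissively_aligned(*pair)]
```

A pair is kept even when its second mate is unaligned or aligns ambiguously, as long as the other mate is placed uniquely.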

Positional De Bruijn Graph

Positional k-mer: a pair (k-mer, position), e.g. (GCCA, 111).

Positional k-mers of the aligned reads:
GCC,111 CCA,112 CAT,113 ATT,114 TTA,115
CCT,112 CTT,113 TTT,114 TTA,115
GCC,975 CCT,976 CTA,977 TAT,978 ATT,979

[Figure: positional k-mers are glued only when both the k-mer and the position match, so ATT,114 and ATT,979 remain separate vertices.]
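A positional de Bruijn graph can be sketched by using (k-mer, position) pairs as vertices. This is a simplified illustration with stated assumptions: it glues only on exact position matches, and the three reads with their alignment offsets are inferred from the positional k-mers on the slide (the second read is taken to start with GCC at 111, as in the earlier de Bruijn graph slides).

```python
def positional_de_bruijn(aligned_reads, k):
    """Build a graph whose vertices are (k-mer, position) pairs.

    aligned_reads is a list of (read, start) pairs, where start is the
    position on the contig at which the read aligns. Two k-mers are glued
    only if both the k-mer and its position are identical.
    """
    successors = {}
    for read, start in aligned_reads:
        path = [(read[i:i + k], start + i) for i in range(len(read) - k + 1)]
        for node in path:
            successors.setdefault(node, set())
        for a, b in zip(path, path[1:]):
            successors[a].add(b)
    return {node: sorted(nxt) for node, nxt in successors.items()}


# Reads and offsets inferred from the slide (an assumption).
aligned_reads = [("GCCATTA", 111), ("GCCTTTA", 111), ("GCCTATT", 975)]
graph = positional_de_bruijn(aligned_reads, 3)
```

The k-mer GCC occurs at positions 111 and 975 and stays split into two vertices, so the read aligned at 975 no longer tangles with the reads aligned at 111.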

Positional De Bruijn Graph

partial contig #1: GCCATTA
partial contig #2: GCCTATT

The SEQuel Algorithm

Original contig:
GTATTCCGAGGACCACTGGATTATGA

Partial contigs aligned to the original contig:
GTATTCCGAGGACCAC---TGGATTATGA
GCGGGCCGAGGA
               CAAATGGATTACGA

After replacing the region covered by the first partial contig:
GCGGGCCGAGGACCAC---TGGATTATGA

After replacing the region covered by the second partial contig (refined contig):
GCGGGCCGAGGACCACAAATGGATTACGA

Repeat for all contigs.
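The replacement step on these slides can be sketched as splicing partial contigs into the original contig. The half-open coordinate intervals below are assumptions read off the alignment shown on the slide, and `refine_contig` is a hypothetical helper, not part of SEQuel's interface.

```python
def refine_contig(contig, patches):
    """Replace regions of a contig with partial contigs.

    patches is a list of (start, end, replacement) tuples in the
    coordinates of the original contig, with non-overlapping half-open
    intervals [start, end). Patches are applied from right to left so
    that earlier coordinates stay valid when a replacement changes the
    contig length (for example, when it contains an insertion).
    """
    refined = contig
    for start, end, replacement in sorted(patches, reverse=True):
        refined = refined[:start] + replacement + refined[end:]
    return refined


original = "GTATTCCGAGGACCACTGGATTATGA"
patches = [
    (0, 12, "GCGGGCCGAGGA"),     # replaces GTATTCCGAGGA (substitutions only)
    (15, 26, "CAAATGGATTACGA"),  # replaces CTGGATTATGA (inserts AAA, fixes T->C)
]
refined = refine_contig(original, patches)
```

With these intervals the result matches the refined contig on the slide, which is three bases longer than the original because of the AAA insertion.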

Results
- Standard and single-cell E. coli.
- 100 bp paired-end Illumina (GAII) reads.
- Mean coverage ≈ 600x.
- Assemblies compared to the reference with and without SEQuel.

Standard E. coli [results figures]

Single Cell Sequencing (Chitsaz et al., 2011) [figure panels: Standard, Single Cell]

Single Cell E. coli [results figures]

Summary
- Removed 35% to 96% of small-scale assembly errors.
- Introduced the positional de Bruijn graph for contig refinement.
- Demonstrated utility in hard (single-cell) assembly.
- SEQuel can be used in combination with any assembler.
- Freely available at:

Acknowledgments
3P41RR S1
CCF