Pamela Ferretti Laboratory of Computational Metagenomics Centre for Integrative Biology University of Trento Italy Microbial Genome Assembly 1.

Presentation transcript:

1 Microbial Genome Assembly
Pamela Ferretti
Laboratory of Computational Metagenomics, Centre for Integrative Biology, University of Trento, Italy

2 Outline-summary
1. QUICK INTRODUCTION
2. GENOME ASSEMBLY
3. ASSEMBLY STRATEGIES
4. CASE STUDY

3 DNA packaging

6 Next Generation Sequencing
[Figure: example short reads: TCTTATTGTGACC, TAGGCTAGCTTAG, GCAATGCAGTAAC, TCCAGCTAGGTTC, ACGTAGGCTAGCGTTAGCGA, CTGCAT, C]

7 Genome Assembly
1. GENOME SEQUENCING
2. PRELIMINARY ANALYSIS
3. ASSEMBLY
4. ADVANCED BIOINFORMATIC ANALYSIS
OVERLAPPING SEQUENCE ALIGNMENT

8 On the feasibility of sequence assembly
"Sequencing the human genome with shotgun sequencing + assembly is the only feasible strategy"
Weber, James L., and Eugene W. Myers. "Human whole-genome shotgun sequencing." Genome Research 7.5 (1997):
"Computational assembly of shotgun sequencing data is simply unfeasible, and a bad idea anyway"
Green, Philip. "Against a whole-genome shotgun." Genome Research 7.5 (1997):
They were both right! (…well, Weber and Myers were a bit more right from the practical viewpoint…)

10 Genome assembly strategies
- Greedy approach → SSAKE
- De Bruijn graph (DBG) → Velvet, SOAPdenovo
- Overlap Layout Consensus (OLC) → MIRA
- Mixed approaches → MaSuRCA

11 Genome assembly strategies
DE BRUIJN GRAPH APPROACH (DBG) → Velvet, SOAPdenovo2
Nodes = overlapping subsequences of reads of uniform length ((k-1)-mers)
Edges = k-mers (distinct subsequences of length k within reads)
EULERIAN PATH
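
The De Bruijn graph idea can be sketched in a few lines of Python. This is a toy illustration, not how Velvet or SOAPdenovo2 are implemented: it assumes error-free reads, a single strand, and that an Eulerian path exists. The function names (`de_bruijn_graph`, `eulerian_path`, `spell`) are made up for this example.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:          # store each k-mer only once
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in graph if len(graph[n]) - in_deg.get(n, 0) == 1),
        next(iter(graph)),
    )
    edges = {n: list(successors) for n, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())   # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit node
    return path[::-1]

def spell(path):
    """Turn a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ATGGCGT", "GGCGTGC", "GTGCA"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))
```

Every edge (k-mer) is visited exactly once, which is why the reconstruction is an Eulerian rather than a Hamiltonian path problem.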

12 Genome assembly strategies
OVERLAP LAYOUT CONSENSUS (OLC) → MIRA
Nodes = reads
Edges = overlaps between reads
1. OVERLAP
2. LAYOUT
3. CONSENSUS
HAMILTONIAN PATH
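
The three OLC steps can be illustrated with a toy Python sketch. It is not MIRA's algorithm: it assumes a handful of error-free reads, finds the layout by brute force over all read orderings (a Hamiltonian path search, feasible only for tiny inputs), and replaces a real consensus step with simple merging. All function names and the `min_len` threshold are assumptions for this example.

```python
from itertools import permutations

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def overlap_graph(reads, min_len=3):
    """1. OVERLAP: nodes are reads, edges are suffix-prefix overlaps."""
    graph = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n > 0:
            graph[(a, b)] = n
    return graph

def layout(reads, min_len=3):
    """2. LAYOUT: ordering that visits every read once with maximal total overlap."""
    best_order, best_score = None, -1
    for order in permutations(reads):
        score = sum(overlap(a, b, min_len) for a, b in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = order, score
    return best_order

def consensus(order, min_len=3):
    """3. CONSENSUS: merge consecutive reads (stand-in for a real consensus call)."""
    seq = order[0]
    for a, b in zip(order, order[1:]):
        seq += b[overlap(a, b, min_len):]
    return seq

reads = ["GTGCA", "ATGGCGT", "GGCGTGC"]
print(overlap_graph(reads))
print(consensus(layout(reads)))
```

The all-against-all comparison in `overlap_graph` is the reason the overlap stage dominates running time for large read sets.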

13 Genome assembly strategies

14 Genome assembly strategies: DBG vs OLC
ADVANTAGES
DBG: very sensitive to repeats; each k-mer stored just once; Eulerian cycle; never explicitly computes pairwise overlaps
OLC: modular algorithmic design; flexibility and robustness
DISADVANTAGES
DBG: sensitive to sequencing errors (new k-mers); large memory requirements
OLC: Hamiltonian cycle; overlap stage is time-consuming; genome-size limitations

15 Genome assembly strategies
- Greedy approach → SSAKE
- De Bruijn graph (DBG) → Velvet, SOAPdenovo
- Overlap Layout Consensus (OLC) → MIRA
- Mixed approaches → MaSuRCA

16 Genome Assemblers
Evaluation metrics: average coverage, number of contigs, number of contigs > 1 kb, N50 contig size, fraction of reads assembled, total consensus (in nt), number of scaffolds, N50 scaffold size
Ion Torrent PGM → MIRA 3.9
Illumina → MaSuRCA
MIRA 3.9 also produced good-quality results, but it has a longer execution time and becomes unstable with large numbers of short reads
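
Several of these metrics are easy to compute from a list of contig lengths. The sketch below uses the common definition of N50 (the length L such that contigs of length at least L contain half or more of the assembled bases); the function names and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def count_longer_than(lengths, threshold=1000):
    """Number of contigs strictly longer than the threshold (e.g. > 1 kb)."""
    return sum(1 for length in lengths if length > threshold)

contig_lengths = [8000, 7000, 5000, 4000, 3000, 900, 500]  # hypothetical assembly
print("contigs:", len(contig_lengths))
print("contigs > 1 kb:", count_longer_than(contig_lengths))
print("total consensus (nt):", sum(contig_lengths))
print("N50:", n50(contig_lengths))
```

The same `n50` function applies to scaffold lengths to obtain the N50 scaffold size.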

18 Mycobacteria Assembly: Case Study
Responsible for many animal and human diseases
- M. tuberculosis and M. leprae (TM)
- M. fortuitum (NTM) outbreak (nail salon, 2002)
- M. chelonae (NTM) outbreak (face lifts, 2004)
Illumina HiSeq sequencing (NGS Facility – CIBIO/UNITN)
Twenty mycobacterial strains from 20 different Mycobacteria species → MaSuRCA
Novel mycobacteria detection clinical tests

19 Raw data quality assessment and pre-processing
Fastq-mcf tool removes:
- poor quality ends of reads
- Ns, duplicates and sequencing adapters
- reads that are too short
Reduction up to 73%
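
To make the filtering steps concrete, here is a minimal Python sketch of this kind of pre-processing. It is not fastq-mcf and does not reproduce its algorithm or options: adapter removal is left out, qualities are assumed to be Phred+33 encoded, and the thresholds `min_q` and `min_len` are arbitrary illustrative values.

```python
def trim_read(seq, qual, min_q=20, offset=33):
    """Trim bases below min_q from both ends of a read (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]

def preprocess(reads, min_q=20, min_len=5):
    """Trim ends, then drop reads that are too short, contain Ns, or are duplicates."""
    kept, seen = [], set()
    for seq, qual in reads:
        seq, qual = trim_read(seq, qual, min_q)
        if len(seq) < min_len or "N" in seq or seq in seen:
            continue
        seen.add(seq)
        kept.append((seq, qual))
    return kept

raw = [
    ("ACGTACGT", "#IIIII##"),  # low-quality ends get trimmed
    ("CGTAC", "IIIII"),        # duplicate of the trimmed read above
    ("ACNTA", "IIIII"),        # contains an N
    ("ACG", "III"),            # too short
]
clean = preprocess(raw)
print(clean)
print(f"reduction: {1 - len(clean) / len(raw):.0%}")
```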

20 Assembly parameters setting
K-mers: strings of a particular length k, which are shorter than entire reads
Best empirical k-mer length: 91 bases
High coverage
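
A short Python sketch of what a k-mer is and how k relates to read length. A read of length L yields L - k + 1 k-mers, so a large k such as 91 leaves only a few k-mers per read. The 100-base read length used below is an assumption for illustration, not a figure from the slides.

```python
from collections import Counter

def kmers(read, k):
    """All substrings of length k, in order; empty if the read is shorter than k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def kmer_counts(reads, k):
    """How often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

print(kmers("ACGTAC", 4))
print(kmer_counts(["ACGTAC", "CGTACG"], 4).most_common(2))
print(len(kmers("A" * 100, 91)), "k-mers from a hypothetical 100-base read at k=91")
```

With high coverage, genuine k-mers are seen many times while k-mers created by sequencing errors stay rare, which is one reason coverage matters when choosing k.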

21 MaSuRCA results of Mycobacteria
Abnormal GC content
Genome size too large

22 Examples of environmental contamination
GC-content-based quality analysis
Staphylococcus epidermidis
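
A GC-content check of this kind can be sketched as follows. Mycobacterial genomes are GC-rich (roughly 65% for M. tuberculosis) while Staphylococcus epidermidis is GC-poor (roughly a third), so contigs far from the expected value are candidates for contamination. The `expected_gc` and `tolerance` values and the contig names below are illustrative assumptions, not the thresholds used in the case study.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

def flag_contigs(contigs, expected_gc, tolerance=0.10):
    """Names of contigs whose GC content deviates from expected_gc by more than tolerance."""
    return [
        name for name, seq in contigs.items()
        if abs(gc_content(seq) - expected_gc) > tolerance
    ]

contigs = {
    "contig_1": "GCGCATGCGA",  # 70% GC, plausible for a mycobacterium
    "contig_2": "ATATATATGC",  # 20% GC, suspiciously low
}
print(flag_contigs(contigs, expected_gc=0.65))
```

Flagged contigs would still need confirmation, for example by aligning them against reference genomes, since GC content alone cannot identify the contaminating species.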

23 Thanks