Metafast High-throughput tool for metagenome comparison

Slides:



Advertisements
Similar presentations
RNA-Seq based discovery and reconstruction of unannotated transcripts
Advertisements

Algorithms for Multisample Read Binning
ILP-BASED MAXIMUM LIKELIHOOD GENOME SCAFFOLDING James Lindsay Ion Mandoiu University of Connecticut Hamed Salooti Alex ZelikovskyGeorgia State University.
RNA Assembly Using extending method. Wei Xueliang
Next Generation Sequencing, Assembly, and Alignment Methods
Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities By Kevin Chen, Lior Pachter PLoS Computational Biology, 2005 David Kelley.
Greg Phillips Veterinary Microbiology
Workshop in Bioinformatics 2010 Class # Class 8 March 2010.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Parallel K-Means Clustering Based on MapReduce The Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences Weizhong Zhao, Huifang.
Cluster Analysis Hierarchical and k-means. Expression data Expression data are typically analyzed in matrix form with each row representing a gene and.
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
De-novo Assembly Day 4.
Mon C222 lecture by Veli Mäkinen Thu C222 study group by VM  Mon C222 exercises by Anna Kuosmanen Algorithms in Molecular Biology, 5.
CS 394C March 19, 2012 Tandy Warnow.
ARC Biotechnology Platform: Sequencing for Game Genomics Dr Jasper Rees
PE-Assembler: De novo assembler using short paired-end reads Pramila Nuwantha Ariyaratne.
1 Velvet: Algorithms for De Novo Short Assembly Using De Bruijn Graphs March 12, 2008 Daniel R. Zerbino and Ewan Birney Presenter: Seunghak Lee.
Gao Song 2010/07/14. Outline Overview of Metagenomices Current Assemblers Genovo Assembly.
Independent Component Analysis (ICA) A parallel approach.
Vladyslav Kolbasin Stable Clustering. Clustering data Clustering is part of exploratory process Standard definition:  Clustering - grouping a set of.
Sequence assembly using paired- end short tags Pramila Ariyaratne Genome Institute of Singapore SOC-FOS-SICS Joint Workshop on Computational Analysis of.
Metagenomics Assembly Hubert DENISE
Biodiversity initiative: Integrating Taxonomy, Genomics and Biodiversity ++ = ????? Speaker: Benjamin Linard Alfried Vogler Team.
RNA-Seq Assembly 转录组拼接 唐海宝 基因组与生物技术研究中心 2013 年 11 月 23 日.
Current Challenges in Metagenomics: an Overview Chandan Pal 17 th December, GoBiG Meeting.
Serghei Mangul Department of Computer Science Georgia State University Joint work with Irina Astrovskaya, Marius Nicolae, Bassam Tork, Ion Mandoiu and.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly To Do.
Spliced Transcripts Alignment & Reconstruction
YOU ARE WHAT YOU EAT (AND DRINK): IDENTIFYING CULTURAL BOUNDARIES BY ANALYZING FOOD AND DRINK HABITS IN FOURSQUARE Presenter: LEUNG Pak Him.
Large Scale Data Representation Erik Goodman Daniel Kapellusch Brennen Meland Hyunjae Park Michael Rogers.
Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.
Teaching The Principles Of System Design, Platform Development and Hardware Acceleration Tim Kranich
Genomics Quick Start Mikhail Dvorkin Vladislav Isenbaev Eugene Kapun Scientific advisors Acad. Konstantin Skryabin, Bioengineering RAS Prof. Anatoly Shalyto,
Fast test for multiple locus mapping By Yi Wen Nisha Rajagopal.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
__________________________________________________________________________________________________ Fall 2015GCBA 815 __________________________________________________________________________________________________.
BLAST Sequences queried against the nr or grass databases. GO ANALYSIS Contigs classified based on homology to known plant or fungal genes Next.
CS 173, Lecture B Introduction to Genome Assembly (using Eulerian Graphs) Tandy Warnow.
….. The cloud The cluster…... What is “the cloud”? 1.Many computers “in the sky” 2.A service “in the sky” 3.Sometimes #1 and #2.
Metagenomic dataset preprocessing – data reduction
MERmaid: Distributed de novo Assembler Richard Xia, Albert Kim, Jarrod Chapman, Dan Rokhsar.
RNA Sequencing and transcriptome reconstruction Manfred G. Grabherr.
Clustering [Idea only, Chapter 10.1, 10.2, 10.4].
June 2013 Metagenomics of Marine Microbes Tutorial overview Daniel Vaulot Station Biologique de Roscoff.
Assembly algorithms for next-generation sequencing data
Detection of genome regulation sequences
Denovo genome assembly of Moniliophthora roreri
Transcriptomics II De novo assembly
Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert
Comparative metagenomics quantifying similarities between environments
Metagenomic assembly Cedric Notredame
Research in Computational Molecular Biology , Vol (2008)
Genomic Data Clustering on FPGAs for Compression
Kallisto: near-optimal RNA seq quantification tool
Introduction to Genome Assembly
Transcriptome Assembly
Removing Erroneous Connections
Distributed Memory Partitioning of High-Throughput Sequencing Datasets for Enabling Parallel Genomics Analyses Nagakishore Jammula, Sriram P. Chockalingam,
CS 598AGB Genome Assembly Tandy Warnow.
Presented by: Yang Yu Spatiotemporal GMM for Background Subtraction with Superpixel Hierarchy Mingliang Chen, Xing Wei, Qingxiong.
The ability of the SOP to sequence and identify unknown samples.
Schematic representation of a transcriptomic evaluation approach.
Comparison of species and function profiles with ultradeep sequencing data. Comparison of species and function profiles with ultradeep sequencing data.
Topic Cloning and analyzing oxalate degrading enzymes to see if they dissolve kidney stones with Dr. VanWert.
Overview of Shotgun Sequence Analysis
Toward Accurate and Quantitative Comparative Metagenomics
Batch variation of formulations from two products by two different genomic-scale techniques. Batch variation of formulations from two products by two different.
Presentation transcript:

Metafast High-throughput tool for metagenome comparison V. Ulyantsev, S. Kazakov V. Dubinkina, Tyakht A., Alexeev D.

Comparative metagenomics Interrelationships between metagenomes from different samples different biomes different time points High level of unknown sequences Mapping can limit the amount of data that can be analyzed

crAss De novo cross-assembly of all sequence reads Number of cross-contigs with reads from both metagenomes

MaryGold Detect and explore genomic variation between metagenomic sequencing samples Detect bubble structures in contig graphs using graph decomposition 454 and Illumina data

Challenges Reference-based (mapping sequences) High level of unknown sequences Assembly-based Slow on large datasets Solution: fast but low quality “semi-assembly” for every library

New algorithm – MetaFast Method, based on simple assembly: For each library: Construct de Bruijn graph Extract simple paths, not contigs For all libraries: Construct de Bruijn graph from found paths Extract components Calculate characteristic vectors for libraries.

1. Construct de Bruijn graph Library de Bruijn graph

2. Extract simple paths de Bruijn graph Simple paths

3. Merge paths Paths 1 Paths 2 Paths combined in single de Bruijn graph

4. Extract components Paths de Bruijn graph Found components

Component K-mers set B1 <= size <= B2 Big component? Iterative algorithm for decomposition

5. Construct characteristic vectors Components Library 1 (15, 0, 6) Characteristic vectors Library 2 (0, 7, 8)

Implementation On Java Open source project http://github.com/ulyantsev/metafast

Experiments 157 Chinese gut metagenomes (600 Gb) 93 % – correlation between distance matrices based on mapping to knows references and our vectors About 10 hours cluster time (not full-loaded)

80 libraries is enough

Components-genes correlation

Results & future work MetaFast – new approach and cross-platform tool for comparative metagenomics Promising initial experiments Experiments with simulated data New information about existing metagenomes Algorithm modifications

Thank you for attention! http://github.com/ulyantsev/metafast V. Dubinkina A. Tyakht D. Alexeev V. Ulyantsev S. Kazakov