Advancing Science with DNA Sequence Metagenome definitions: a refresher course Natalia Ivanova MGM Workshop September 12, 2012.

Metagenome definitions
- A metagenome is the collective genome of a microbial community, AKA microbiome (native, enriched, sorted, etc.).
- A metagenomic library (or libraries) is constructed from isolated DNA (native, enriched, etc.).
- A metagenomic library can be single-end (AKA standard) or paired-end.

Metagenome definitions
- A single-end (standard) metagenomic library will produce contigs upon assembly (i.e. longer sequences based on overlap between reads).
- Any Ns found in contigs correspond to low-quality bases.
- A paired-end metagenomic library will produce scaffolds upon assembly (non-contiguous joining of reads based on read-pair information).
- Ns found in scaffolds correspond either to low-quality bases or to gaps of unknown size.

ATGCAAAGGCCGCATCCAGCAGGTT
TACGTTTCCGGCGTAGGTCGTCCAA

ATGCAAAGGCCGCATCC NNNNNN AGCAGGTT
TACGTTTCCGGCGTAGG        TCGTCCAA
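The relationship between scaffolds, contigs and Ns can be illustrated with a minimal Python sketch that cuts a scaffold back into contigs at runs of N. The function name `split_scaffold` and the `min_gap` threshold are assumptions made for this illustration; they do not come from any assembler. Note that sequence alone cannot tell a low-quality base from a gap of unknown size, so the threshold is only a heuristic.

```python
import re

def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of at least `min_gap` Ns.

    Returns a list of (start_offset, contig_sequence) tuples.
    Shorter runs of N are kept inside the contig, on the assumption
    that they are low-quality bases rather than gaps.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    contigs = []
    pos = 0
    for match in gap.finditer(scaffold):
        if match.start() > pos:
            contigs.append((pos, scaffold[pos:match.start()]))
        pos = match.end()
    if pos < len(scaffold):
        contigs.append((pos, scaffold[pos:]))
    return contigs

# The two pieces and the NNNNNN gap from the slide's example.
scaffold = "ATGCAAAGGCCGCATCC" + "NNNNNN" + "AGCAGGTT"
for start, seq in split_scaffold(scaffold):
    print(start, seq)
```

Raising `min_gap` (for example to 2) keeps isolated Ns inside a contig and splits only at longer runs.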

Amplified and Unamplified Libraries
[Figure: workflow diagram comparing amplified and unamplified library preparation. Steps named in the diagram include fragmentation (1 µg), double SPRI, DNA chip, end repair / phosphorylation, A-tailing with Klenow exo-, heat inactivation, adaptor ligation, SPRI clean, 10-cycle PCR amplification, and qPCR quantification.]

Metagenome definitions (contd): overlap = alignment of reads at x% identity
- Unless the community has very low complexity (i.e. it is dominated by one or a few clonal populations), assembly at 100% nucleotide identity will be very fragmented.
- What to do with k-mer based assemblies? Use multiple k-mer settings, then combine the assemblies with an overlap-layout-consensus assembler such as minimus2, using a minimum identity of 95%.
- There is a tradeoff between overlap length and % identity.
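To make "overlap at x% identity" concrete, here is a small Python sketch that scores an ungapped suffix-prefix overlap between two sequences and searches for the longest overlap that meets an identity threshold. It is an illustration only: the function names and default parameters are assumptions, and real overlap-layout-consensus assemblers use gapped alignments and far more efficient search.

```python
def overlap_identity(a, b, length):
    """Percent identity between the last `length` bases of `a`
    and the first `length` bases of `b` (ungapped comparison)."""
    suffix, prefix = a[-length:], b[:length]
    matches = sum(x == y for x, y in zip(suffix, prefix))
    return 100.0 * matches / length

def best_overlap(a, b, min_len=10, min_identity=95.0):
    """Return (length, identity) of the longest suffix-prefix overlap
    with identity >= min_identity, or None if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        identity = overlap_identity(a, b, length)
        if identity >= min_identity:
            return length, identity
    return None

read_a = "ATGCAAAGGCCGCATCCAGC"
read_b = "GCCGCATCCAGCAGGTT"
print(best_overlap(read_a, read_b, min_len=8, min_identity=100.0))
```

Lowering `min_identity` lets more reads join, while raising `min_len` demands stronger evidence for each join; this is the tradeoff between overlap length and % identity.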

Reasoning behind combining multiple assemblies

A snapshot of an older (454-Illumina) metagenome assembly pipeline
Assembly Pipeline v.0.9
- Trimming does not appear to be ideal for this process.
- Picking the best k-mer is a manual, CPU-time-intensive process; there is no known metagenomic k-mer prediction algorithm.

Metagenome definitions (contd): overlap = alignment of reads at x% identity
- Assembly of sequences at less than 100% identity yields population contigs and scaffolds, which represent a consensus sequence of a species population.
[Figure labels: isolate contig; species population contigs]
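The idea of a population contig as a consensus can be sketched in Python with a simple per-column majority vote over reads that are already aligned to each other. This is a deliberately simplified assumption: the function name `consensus`, the use of "-" for positions a read does not cover, and the majority-vote rule are illustrative choices, not the method of any particular assembler.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus of equal-length, pre-aligned reads.

    '-' marks positions a read does not cover; columns with no
    coverage at all are reported as 'N'.
    """
    length = len(aligned_reads[0])
    result = []
    for i in range(length):
        column = [read[i] for read in aligned_reads if read[i] != "-"]
        if not column:
            result.append("N")
        else:
            result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)

# Three reads from closely related strains; the second differs at one position.
reads = [
    "ATGCA-",
    "ATGTAG",
    "-TGCAG",
]
print(consensus(reads))  # ATGCAG
```

The minority base at the fourth position is absent from the consensus, which is why a population contig need not match the genome of any single strain exactly.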

2 more important definitions
1. Sequence coverage (AKA read depth)
- How many times each base has been sequenced; this needs to be considered when calculating protein family abundance.
- Per-contig average coverage.
- Per-base coverage, from which per-gene coverage is derived.
2. Bins
- Scaffolds, contigs and unassembled reads can be binned into sets of sequences (bins) that likely originated from the same species population or from a population of a broader taxonomic lineage.
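The coverage definitions above can be sketched in a few lines of Python. The sketch assumes read alignments are given as half-open (start, end) intervals on a contig and that genes are given as (family, start, end) tuples; the function names and the family labels "famA" and "famB" are hypothetical and used only for illustration.

```python
def per_base_coverage(contig_length, alignments):
    """Count how many aligned reads cover each base of a contig.

    `alignments` is a list of half-open (start, end) intervals.
    """
    depth = [0] * contig_length
    for start, end in alignments:
        for i in range(max(start, 0), min(end, contig_length)):
            depth[i] += 1
    return depth

def mean_coverage(depth, start=0, end=None):
    """Average depth over a region: the whole contig by default,
    or a single gene when start and end are given."""
    region = depth[start:end]
    return sum(region) / len(region)

def family_abundance(genes, depth):
    """Sum per-gene coverage for each protein family, so that genes on
    deeply sequenced contigs count for more than a plain gene count."""
    totals = {}
    for family, start, end in genes:
        totals[family] = totals.get(family, 0.0) + mean_coverage(depth, start, end)
    return totals

depth = per_base_coverage(10, [(0, 6), (2, 8), (4, 10)])
genes = [("famA", 2, 6), ("famA", 6, 10), ("famB", 0, 2)]
print(depth)
print(mean_coverage(depth))
print(family_abundance(genes, depth))
```

Counting genes alone would give famA twice the abundance of famB; weighting by coverage gives a ratio of four to one in this toy example.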

What IMG does and doesn't do
- Scaffolds and contigs are generated by assembly (not provided in IMG/M).
- Sequence coverage can be computed by the assembler based on the alignments it generates (preferable), or can be added later by aligning reads to contigs (the latter can be provided in IMG/M).
- Bins are generated by binning software (not provided in IMG/M).
- Scaffolds, contigs and unassembled reads are annotated with non-coding RNAs, repeats (CRISPRs), and protein-coding genes (CDSs); the latter are assigned to protein families (COGs, Pfams, TIGRfams, KEGG Orthology, EC numbers, internal clusters). This is provided in IMG/M.

What's the difference between IMG and MG-RAST, IMG and CAMERA?
- We prefer to assemble the data.
  - longer sequences -> better quality of gene prediction and functional annotation
  - longer sequences -> chromosomal context and binning -> population-level analysis
- But we don't provide assembly services, except for metagenomes sequenced at the JGI.
  - We may be able to help with assembly of 454 data.
  - We're not equipped to assemble massive amounts of Illumina data.
  - Contact person: Ed Kirton
- IMG does not provide tools for analysis of 16S data from the metagenome itself.
  - We do assembly -> assembled 16S sequences are generally not very reliable.
  - BLASTn of reads matching conserved regions is misleading.
  - We do pyrotags or i-tags for every metagenome sequenced at the JGI.