Discussion on Metagenomic Data for ANGUS Course Adina Howe.

Slides:



Advertisements
Similar presentations
Wavelets Fast Multiresolution Image Querying Jacobs et.al. SIGGRAPH95.
Advertisements

Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Approaches for our growing metagenomes Kostas Konstantinidis Carlton S. Wilder Associate Professor School of Civil and Environmental Engineering & School.
Use of the genomic data o Reconstruction of metabolic properties o Nature’s Microbiome o NGS in Population Genetics.
BLAST Sequence alignment, E-value & Extreme value distribution.
Molecular Evolution Revised 29/12/06
Comparative ab initio prediction of gene structures using pair HMMs
We are developing a web database for plant comparative genomics, named Phytome, that, when complete, will integrate organismal phylogenies, genetic maps.
Sequence alignment, E-value & Extreme value distribution
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Metagenomics Binning and Machine Learning
Biostatistics-Lecture 15 High-throughput sequencing and sequence alignment Ruibin Xi Peking University School of Mathematical Sciences.
LECTURE 2 Splicing graphs / Annoteted transcript expression estimation.
Metagenomic Analysis Using MEGAN4
Analysis of Microbial Community Structure
Title: GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes By Peter F. Hallin, Hans-Henrik Stærfeldt, Eva Rotenberg, Tim T. Binnewies,
CS 394C March 19, 2012 Tandy Warnow.
Discovery of new biomarkers as indicators of watershed health and water quality Anamaria Crisan & Mike Peabody.
From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/ Anna Shcherbina Bioinformatics Challenge Day 02/02/2013 From Metagenomic Sample to.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
 16S rRNA gene marker  intra-gene variability  primer selection  size & information content Primer selection, information content, alignment and length.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Advancing Science with DNA Sequence Metagenome definitions: a refresher course Natalia Ivanova MGM Workshop September 12, 2012.
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
Analysis of the RNAseq Genome Annotation Assessment Project by Subhajyoti De.
Comp. Genomics Recitation 3 The statistics of database searching.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Condor: BLAST Monday, July 19 th, 3:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Current Challenges in Metagenomics: an Overview Chandan Pal 17 th December, GoBiG Meeting.
Metagenomic Analysis Using MEGAN4 Peter R. Hoyt Director, OSU Bioinformatics Graduate Certificate Program Matthew Vaughn iPlant, University of Texas Super.
A Tutorial of Sequence Matching in Oracle Haifeng Ji* and Gang Qian** * Oklahoma City Community College ** University of Central Oklahoma.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Bioinformatics how to … use publicly available free tools to predict protein structure by comparative modeling.
Condor: BLAST Rob Quick Open Science Grid Indiana University.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
Analyzing Time Course Data: How can we pick the disappearing needle across multiple haystacks? IEEE-HPEC Bioinformatics Challenge Day Dr. C. Nicole Rosenzweig.
Analysis and comparison of very large metagenomes with fast clustering and functional annotation Weizhong Li, BMC Bioinformatics 2009 Present by Chuan-Yih.
Introduction to RNAseq
Do not reproduce without permission 1 Gerstein.info/talks (c) (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Gerstein Lab Aims in ModENCODE.
Condor: BLAST Monday, 3:30pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Finding new nirK genes in metagenomic data
De novo assembly validation
Drinking from a fire hose: analysis of metagenomic data Rachel Mackelprang, Ph.D. Assistant Professor of Biology California State University Northridge.
No reference available
Accessing and visualizing genomics data
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
Canadian Bioinformatics Workshops
Date of download: 6/23/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A)
Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe.
Computational Characterization of Short Environmental DNA Fragments Jens Stoye 1, Lutz Krause 1, Robert A. Edwards 2, Forest Rohwer 2, Naryttza N. Diaz.
Metagenomic Species Diversity.
Research Paper on BioInformatics
Phylogeny - based on whole genome data
Gene expression from RNA-Seq
Metagenomic assembly Cedric Notredame
Sequence based searches:
Research in Computational Molecular Biology , Vol (2008)
Toward Next Generation Biodiversity Research
Metabarcoding Presentation
Maximize read usage through mapping strategies
Independent scientist
Example usage of mockrobiota MC resource for marker gene and metagenome sequencing pipelines. Example usage of mockrobiota MC resource for marker gene.
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
The power of metagenomic read recruitment
Condor: BLAST Tuesday, Dec 7th, 10:45am
Toward Accurate and Quantitative Comparative Metagenomics
Presentation transcript:

Discussion on Metagenomic Data for ANGUS Course Adina Howe

What I do with my sequencing data Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors Assembly (site specific reference) – Dependent on sequencing depth – Largest soil datasets, 1 billion reads, 10% assembled Co-occurrence / binning – Abundance or nucleotide based – Sequencing coverage dependent, need “unique” bins – Particularly problematic for short reads

Sentinels in metagenomic data? Presence/Absence? Quantification? Phyla? Strain? Functional level? Strong signals of similarity or differences in both “annotatable” and unknown sequences Information about targets for which we need more references (by other methods) Indexes (alpha diversity, copiotroph/oligotroph, % polymorphism, recA/16S gene count)

What I do with my sequencing data Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors Assembly (site specific reference) – Well developed pipelines and tutorials – Almost too many choices Co-occurrence / binning – Some tools available, still in development largely – Eventually need to link to knowns for validation (or experimental validation)

What has been done – MG-RAST Free Easy Growing in usage by many Not used by the “leading wedge” (at least as more than a blast engine) Free data / annotation storage and maintenance NEON – do you want to build your own? Does MG-RAST suffice?

MG-RAST

Fine tuning is not possible What happens when your sequence matches two known genes equally? What happens if there are multiple taxa related to one function? What if you want more than ordination (e.g., primer evaluation?) What if you want to change your distance matrix for ordination? How do (not) you normalize / rarify your data? What if you want more sensitivity than a BLAT analysis (e.g., amplicons) Computationally-optimized, not biologically – Denitrification, Nitrate reductase, nitrate ammonification all distinct categories in Level 3 – If you want to merge, not trivial – especially in dealing with abundance estimations

Looking forward Development of all-pleasing computational, statistical, visualization tool will be impossible Community demands will more than likely be easy to use and aimed at reducing (perhaps overly simplified) complex data into comparable metrics (TBD) Data management and associated “metadata” may be the larger challenge (in the face of rapidly changing datasets)