Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe.

Slides:



Advertisements
Similar presentations
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Advertisements

Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Approaches for our growing metagenomes Kostas Konstantinidis Carlton S. Wilder Associate Professor School of Civil and Environmental Engineering & School.
Use of the genomic data o Reconstruction of metabolic properties o Nature’s Microbiome o NGS in Population Genetics.
BLAST Sequence alignment, E-value & Extreme value distribution.
We are developing a web database for plant comparative genomics, named Phytome, that, when complete, will integrate organismal phylogenies, genetic maps.
Sequence alignment, E-value & Extreme value distribution
The Sorcerer II Global ocean sampling expedition Katrine Lekang Global Ocean Sampling project (GOS) Global Ocean Sampling project (GOS) CAMERA CAMERA METAREP.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Metagenomics Binning and Machine Learning
LECTURE 2 Splicing graphs / Annoteted transcript expression estimation.
Metagenomic Analysis Using MEGAN4
Analysis of Microbial Community Structure
Discussion on Metagenomic Data for ANGUS Course Adina Howe.
Title: GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes By Peter F. Hallin, Hans-Henrik Stærfeldt, Eva Rotenberg, Tim T. Binnewies,
CS 394C March 19, 2012 Tandy Warnow.
From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/ Anna Shcherbina Bioinformatics Challenge Day 02/02/2013 From Metagenomic Sample to.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
H = -Σp i log 2 p i. SCOPI Each one of the many microbial communities has its own structure and ecosystem, depending on the body environment it exists.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
Lab 3 – BLAST – Directed It’s a BLAST! (too easy?)
Advancing Science with DNA Sequence Metagenome definitions: a refresher course Natalia Ivanova MGM Workshop September 12, 2012.
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
Analysis of the RNAseq Genome Annotation Assessment Project by Subhajyoti De.
Pathway Interaction Database (PID) Market Research BioPortals Tiger Team Meeting Mervi Heiskanen January 31, 2013.
Comp. Genomics Recitation 3 The statistics of database searching.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Condor: BLAST Monday, July 19 th, 3:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Current Challenges in Metagenomics: an Overview Chandan Pal 17 th December, GoBiG Meeting.
Metagenomic Analysis Using MEGAN4 Peter R. Hoyt Director, OSU Bioinformatics Graduate Certificate Program Matthew Vaughn iPlant, University of Texas Super.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Bioinformatics how to … use publicly available free tools to predict protein structure by comparative modeling.
Condor: BLAST Rob Quick Open Science Grid Indiana University.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
BIOLOGICAL DATABASES. BIOLOGICAL DATA Bioinformatics is the science of Storing, Extracting, Organizing, Analyzing, and Interpreting information in biological.
Analyzing Time Course Data: How can we pick the disappearing needle across multiple haystacks? IEEE-HPEC Bioinformatics Challenge Day Dr. C. Nicole Rosenzweig.
Analysis and comparison of very large metagenomes with fast clustering and functional annotation Weizhong Li, BMC Bioinformatics 2009 Present by Chuan-Yih.
Introduction to RNAseq
Bioinformatics Lecture to accompany BLAST/ORF finder activity
Condor: BLAST Monday, 3:30pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
De novo assembly validation
Drinking from a fire hose: analysis of metagenomic data Rachel Mackelprang, Ph.D. Assistant Professor of Biology California State University Northridge.
No reference available
Accessing and visualizing genomics data
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
Canadian Bioinformatics Workshops
Date of download: 6/23/2016 Copyright © 2016 McGraw-Hill Education. All rights reserved. Pipeline for culture-independent studies of a microbiota. (A)
Basics of Genome Annotation Daniel Standage Biology Department Indiana University.
Metagenomic Species Diversity.
Research Paper on BioInformatics
Phylogeny - based on whole genome data
Gene expression from RNA-Seq
Basics of BLAST Basic BLAST Search - What is BLAST?
Metagenomic assembly Cedric Notredame
Sequence based searches:
Research in Computational Molecular Biology , Vol (2008)
Toward Next Generation Biodiversity Research
LSM3241: Bioinformatics and Biocomputing Lecture 4: Sequence analysis methods revisited Prof. Chen Yu Zong Tel:
GEP Annotation Workflow
H = -Σpi log2 pi.
Ensembl Genome Repository.
Maximize read usage through mapping strategies
Example usage of mockrobiota MC resource for marker gene and metagenome sequencing pipelines. Example usage of mockrobiota MC resource for marker gene.
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
Genome resolved metagenomics
The power of metagenomic read recruitment
Condor: BLAST Tuesday, Dec 7th, 10:45am
Toward Accurate and Quantitative Comparative Metagenomics
Presentation transcript:

Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe

Hi! About me Your own data  all the data  Big Data! Two databases: – NCBI (genomic) – MG-RAST (metagenomic)

What I’ll teach you Recommended ways of getting sequencing data Recommended ways of getting specific sequencing data Methods to download annotations (MG-RAST) A Tour Through MG-RAST – Happy Things – Hard Things

Let’s think… If you were building a database of sequences, what would you want in it? How would you want it organized? How would you make that information available?

What I do with my sequencing data Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors Assembly (site specific reference) – Dependent on sequencing depth – Largest soil datasets, 1 billion reads, 10% assembled Co-occurrence / binning – Abundance or nucleotide based – Sequencing coverage dependent, need “unique” bins – Particularly problematic for short reads

Sentinels in metagenomic data? Presence/Absence? Quantification? Phyla? Strain? Functional level? Strong signals of similarity or differences in both “annotatable” and unknown sequences Information about targets for which we need more references (by other methods) Indexes (alpha diversity, copiotroph/oligotroph, % polymorphism, recA/16S gene count)

What I do with my sequencing data Compare it to known references (BLAST, alignment mapping, HMMs) – Database gaps, 50%+ in soil metagenomes unknown – Annotation errors Assembly (site specific reference) – Well developed pipelines and tutorials – Almost too many choices Co-occurrence / binning – Some tools available, still in development largely – Eventually need to link to knowns for validation (or experimental validation)

What has been done – MG-RAST Free Easy Growing in usage by many Not used by the “leading wedge” (at least as more than a blast engine) Free data / annotation storage and maintenance NEON – do you want to build your own? Does MG-RAST suffice?

MG-RAST

Fine tuning is not possible What happens when your sequence matches two known genes equally? What happens if there are multiple taxa related to one function? What if you want more than ordination (e.g., primer evaluation?) What if you want to change your distance matrix for ordination? How do (not) you normalize / rarify your data? What if you want more sensitivity than a BLAT analysis (e.g., amplicons) Computationally-optimized, not biologically – Denitrification, Nitrate reductase, nitrate ammonification all distinct categories in Level 3 – If you want to merge, not trivial – especially in dealing with abundance estimations

Looking forward Development of all-pleasing computational, statistical, visualization tool will be impossible Community demands will more than likely be easy to use and aimed at reducing (perhaps overly simplified) complex data into comparable metrics (TBD) Data management and associated “metadata” may be the larger challenge (in the face of rapidly changing datasets)