Metagenomics.

Presentation transcript:

Metagenomics

What is metagenomics
Cloning genes from the environment, screening for function
16S sequencing
Random community genomics
Eukaryotic metagenomics

Screening from the environment
Random fragments of DNA
Clone into a vector
Low copy vectors: BACs, YACs

BACs Science Creative Quarterly

Screening from the environment
Random fragments of DNA
Clone into a vector
Low copy vectors: BACs, YACs
Screen for a phenotype
e.g. Diversa patents > 1,000 amylase genes
Why did Diversa sequence whale-falls?

Screening from the environment
Expression host?
Pathway or single gene?
Get what you select
But remember … A selection is worth a thousand screens

16S sequencing
Catalogs the bacteria that are present
PCR amplify the 16S gene with standard primers
Sequence the PCR products
Compare to known databases

Ribosomes
Ribosomes are made of proteins and RNA
Prokaryotic ribosome:
Large subunit: 50S (5S and 23S rRNA)
Small subunit: 30S (16S rRNA)

30S Thermus aquaticus subunit Blue: protein Orange: rRNA

16S rRNA secondary structure
E. coli 16S rRNA secondary structure
Highly conserved
Base pairs = stems
No pairing = loops

16S rRNA secondary structure
E. coli 16S rRNA secondary structure
[Figure: variable regions V1 to V9 marked on the structure, with forward/reverse primers]
Variable regions in the 16S rRNA: Vn, 9 regions

16S Primers
27F – 1492R: full length, 1,465 base pairs
967F – 1046R: V6 region, 79 base pairs
1380F – 1510R: V9 region, 130 base pairs
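The lengths on this slide match a simple subtraction of the primer positions. The sketch below reproduces that arithmetic, assuming each primer name is a position followed by F or R and that the quoted length is the reverse position minus the forward position. The helper names `primer_position` and `amplicon_length` are made up for illustration.

```python
import re

def primer_position(name: str) -> int:
    """Extract the numeric position from a primer name such as '27F' or '1492R'."""
    match = re.fullmatch(r"(\d+)([FR])", name)
    if match is None:
        raise ValueError(f"unrecognised primer name: {name!r}")
    return int(match.group(1))

def amplicon_length(forward: str, reverse: str) -> int:
    """Approximate amplicon length as reverse position minus forward position.

    This mirrors the arithmetic behind the slide's numbers; it is an
    assumption, not a statement about exact product sizes in the lab.
    """
    return primer_position(reverse) - primer_position(forward)

# Primer pairs as listed on the slide.
PRIMER_PAIRS = {
    "full length": ("27F", "1492R"),
    "V6 region": ("967F", "1046R"),
    "V9 region": ("1380F", "1510R"),
}

for region, (fwd, rev) in PRIMER_PAIRS.items():
    print(f"{region}: {fwd}-{rev} = {amplicon_length(fwd, rev):,} base pairs")
```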

Variable regions = Variable results! V1-V3 V3-V5 V6-V9

16S databases
Greengenes: http://greengenes.lbl.gov/ (Gary Andersen, Lawrence Berkeley National Laboratory)
SILVA – ARB: http://www.arb-silva.de/ (Frank Oliver Glöckner, MPI, Bremen, Germany)
VAMPS: http://vamps.mbl.edu/ (Mitch Sogin, Woods Hole, USA)
Ribosomal Database Project (RDP): http://rdp.cme.msu.edu/ (James Cole, Michigan State University, USA)

16S sequencing
Cheap
Easy
Portable
PCR bias
Variable regions give variable answers
Only tells you which organisms are present and their abundance
Does not explain much of the variance of the data
What does 16S sequencing actually tell you?

What does 16S sequencing tell you?


What is metagenomics
Cloning genes from the environment, screening for function
16S sequencing
Random community genomics
Eukaryotic metagenomics

16S sequencing is not good for functions

How much of the data?
Topography of [fungi and] bacteria on the skin
Study = 5,000 taxa
14 skin sites
10 people
3 skin types
5,000 variables
The remainder of the variance (85.1%) is explained by a few taxa each
Each dimension only adds marginal information
They don't explain the meaning of j-q
Findley et al., Nature 2013, doi: 10.1038/nature12171

How much of the data?
Nine biomes paper
Variance: 1,040,665 reads total (from 45 samples)
30 subsystems
9 biomes
30 variables
Fewer of the variables explain more of the data
The variables are distinctive for each environment
Dinsdale et al., Nature 2008, doi: 10.1038/nature06810
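The two "How much of the data?" slides contrast data sets where each dimension adds only marginal information with data sets where a few variables explain most of the variance. One way to make that concrete is to count how many leading dimensions are needed to reach a given cumulative fraction of variance. The sketch below does this for two invented variance profiles. The numbers are illustrative assumptions only and are not taken from Findley et al. or Dinsdale et al., and the function name `components_needed` is hypothetical.

```python
from itertools import accumulate

def components_needed(explained, target):
    """Return how many leading components reach `target` cumulative variance.

    `explained` holds per-component variance fractions, largest first.
    Returns None if the target is never reached.
    """
    for count, total in enumerate(accumulate(explained), start=1):
        # Small tolerance so floating-point rounding does not add a component.
        if total >= target - 1e-12:
            return count
    return None

# Invented illustrative profiles, NOT data from either paper.
many_weak = [0.05] * 20                     # variance spread thinly over many dimensions
few_strong = [0.5, 0.3, 0.1, 0.05, 0.05]    # a few dimensions carry most of the variance

print("many weak dimensions:", components_needed(many_weak, 0.8))
print("few strong dimensions:", components_needed(few_strong, 0.8))
```

With real data the variance fractions would come from an ordination such as a PCA; that step is left out here to keep the sketch free of external libraries.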

Shotgun sequencing (HiSeq) Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

16S sequencing (MiSeq) Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

Shotgun + 16S (HiSeq) Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

There is no 16S for viruses Rohwer and Edwards, 2002. The phage proteomic tree. doi: 10.1128/JB.184.16.4529-4535.2002

Random community genomics
200 liters water
5-500 g fresh fecal matter
Concentrate and purify viruses or bacteria
Extract nucleic acids (DNA/RNA)
LASL
Epifluorescent microscopy
Sequence

How do you sequence the environment?
Extract DNA: soil extraction kit, water extraction kit
Create library: LASLs, fosmids
Sequence fragments

Linker-Amplified Shotgun Libraries (LASLs)
Soil Extraction Kit
This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA
David Mead - Breitbart (2002) PNAS

Early Attempts at a Metagenomics Platform
http://phage.sdsu.edu/~rob/cgi-bin/remoteblast.cgi
Submit BLAST to local and remote databases
Local (as fast as possible)
NCBI (one search every 3 seconds)
Many concurrent searches
One search versus 1,000 searches
Parse data into tables for Excel
Access to taxonomy etc
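The "one search every 3 seconds" limit for remote searches is a simple throttling problem. The sketch below shows one way such a limit could be enforced. It is not the code behind remoteblast.cgi: the `Throttle` class, `run_searches`, and the `submit` callable are hypothetical names, and no real BLAST or NCBI call is made.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous permitted call

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

def run_searches(queries, submit, throttle):
    """Submit each query in turn, pausing as the throttle requires.

    `submit` is a placeholder for whatever sends a query to a remote
    service; here it is any callable taking one query.
    """
    results = []
    for query in queries:
        throttle.wait()
        results.append(submit(query))
    return results
```

A remote queue would use `Throttle(min_interval=3.0)`, while a local queue ("as fast as possible") could use `Throttle(min_interval=0.0)`. At one search every 3 seconds, 1,000 searches need roughly 50 minutes of waiting alone, which is why the slide contrasts one search with 1,000.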

Human-associated viruses
More bacteria than somatic cells by at least an order of magnitude
More phages than bacteria by an order of magnitude
Sample the bacteria in the intestine by sampling their phage

Most Viral DNA Sequences in Adult Human Feces are Unknown Phages
Known 40%, Unknown 60%
Phages 94%, Eukaryotic Viruses 6%
Breitbart (2003) J. Bacteriol.

Abundance of viruses in twins Reyes et al, Nature 2010

Abundance of viruses in twins Microbial samples in guts don't change very much Reyes et al, Nature 2010

Abundance of viruses in twins Phage samples in guts change a lot Reyes et al, Nature 2010

Abundance of viruses in twins Microbial Phage Reyes et al, Nature 2010

Most Human RNA Viruses are Known
Known 92%, Unknown 8%
Pepper Mild Mottle Virus 65%, Other 26%, Other Plant Viruses 9%
Zhang (2006) PLoS Biology

Pepper Mild Mottle Virus (PMMV)
ssRNA virus; ≈6 kb genome
Related to Tobacco Mosaic Virus
Infects members of the genus Capsicum
Widely distributed, spread through seeds
Fruits are small, malformed, mottled
Rod-shaped virions
[Images: viral particles in fecal sample; TOBACCO MOSAIC VIRUS, http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/]

PMMV is common in Human Feces
Fecal samples; extract total RNA; RT-PCR for PMMV
[Gel image: samples S1 to S9, PMMV band]
Add San Diego samples
San Diego: 78% of people are positive
Singapore: 67% of people are positive
10-50 fold increase in feces compared to food
10^6-10^9 PMMV copies per gram dry weight of feces

Which Foods Contain PMMV?
Chili powder
Chili sauces
NOT FOUND IN FRESH PEPPERS
Hong Kong chili sauce, Hong Kong green chili, Pork noodle red chili, Vegetarian chili, Indian curry, Chicken rice, Chinese food

PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater
Concentrations of PMMoV are >1 million copies per milliliter of raw sewage
PMMoV was detected in 100% of raw sewage samples collected among the different states and in most of the treated effluent, except for FLK
Note the different efficiency in removing PMMoV below the detection level of the assay (100)
Except for FLK, all effluent values were > 10^4
According to these results PMMoV cannot be used as an indicator of untreated […]ated wastewater or raw sewage…
Rosario et al. AEM (2009)

Different PMMV families
[Figure: phylogenetic tree of PMMoV coat protein (CoatProtein) sequences, grouped into clades I to V; scale bar 0.1. Leaves are colored by source: Library 1, Library 2, Library 3; no color: GenBank. One annotation marks sequences from the same person 6 months apart.]
Diverse populations
Differences between individuals and over time
It’s a diverse group of PMMoV strains within individuals and between individuals

Human-fecal borne PMMV can infect plants
Fecal sample; viral concentrate; plant leaf inoculation; total RNA; PMMV RT-PCR
Spread of infection to Hungarian wax pepper evident within 1 week
Infected leaf was positive by RT-PCR for PMMV
Still infectious
Animals may serve as vectors for plant viruses
[Image: infected leaf and control] Leave pic!!!

Koch’s Postulates
Thesunmachine.net
http://www.sweatnspice.com

Random community genomics

Eukaryotic metagenomics
ITS sequences: internal transcribed spacer regions
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4113289/
Individual genes: Cox1
Exome sequencing: pull out ESTs and sequence

Why Metagenomics?
What is there?
How many are there?
What are they doing?
Experimental manipulations
Diagnostics

Sequencing costs decreasing http://genome.gov/sequencingcosts

How much has been sequenced?
[Figure: number of known sequences by year, with labels: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]

How much will be sequenced?
[Figure: projected sequencing by year, with labels: 100 people, everybody in San Diego, everybody in USA, one genome from every species, all cultured Bacteria, most major microbial environments]
Training people for the future
X-Prize competition: sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 per genome
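To get a feel for the X-Prize thresholds, the sketch below turns them into absolute numbers. The thresholds (one error per 100,000 bases, 98% coverage, $10,000 per genome, 100 genomes) come from the slide. The genome size of roughly 3.2 billion bases is an assumption added here for illustration, and `xprize_limits` is a hypothetical helper name.

```python
def xprize_limits(genome_bases, genomes=100):
    """Translate the slide's X-Prize thresholds into absolute numbers.

    genome_bases: assumed genome size in bases (not stated on the slide).
    genomes: number of genomes to sequence (100 on the slide).
    """
    return {
        # No more than one error in every 100,000 bases sequenced.
        "max_errors_per_genome": genome_bases // 100_000,
        # Sequences must accurately cover at least 98% of the genome.
        "min_bases_covered": genome_bases * 98 // 100,
        # Recurring cost of no more than $10,000 per genome.
        "max_total_cost_usd": genomes * 10_000,
    }

# Assumption: approximate size of a haploid human genome.
HUMAN_GENOME_BASES = 3_200_000_000

for name, value in xprize_limits(HUMAN_GENOME_BASES).items():
    print(f"{name}: {value:,}")
```

Under that assumed genome size, the accuracy threshold still allows tens of thousands of errors per genome, and the cost ceiling for all 100 genomes is one million dollars.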

Most pipelines work the same way!

Metagenomics Processing
Merge paired-end reads
Preprocessing
Functional assignments
Taxonomic assignments
Contamination removal
Gene prediction
Contig clustering
Binning reads

Metagenomics
Quality control: Prinseq, Deconseq
Annotation: FOCUS, Real time metagenomics, mg-rast, Super FOCUS
Statistics: STAMP
Population genomes: crAss, metabat, ContigClustering

Metagenomics Processing
Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra
Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
Gene prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
Taxonomic assignment: CARMA, FOCUS, KRAKEN, LMAT, MEGAN, Metaplan, myTaxa, PhylopythiaS, phymmbl, RAIphy, TACOA, Taxy
Functional assignment: CLAMS, DiScRIBinATE, genometa, GSMer, PPLACER, RTMg, Sequedex, SORT-ITEMS, SPANNER, SPHINX, TaxSOM, Treephyler