Published by Merry Phillips. Modified over 9 years ago.
1
Advancing Science with DNA Sequence
Metagenome analysis: use case
Natalia Ivanova, MGM Workshop, September 29, 2011
2
Minoan eruption and metagenomics
…it seemed as though the sea was being sucked backwards, as if it were being pushed back by the shaking of the land… Behind us were frightening dark clouds, rent by lightning twisted and hurled, opening to reveal huge figures of flame. These were like lightning, but bigger.
From Pliny the Younger’s Letter
3
Apart from Minoan eruption…
From Chernicoff & Stanley, Geology, 2007
Diagram by Gary Massoth/PMEL
4
Sampling sites: white mat, red mat
Key gradients, white vs red:
Temperature: 60 vs 18 °C
CO2 tension: >99% vs <1%
5
This is what it looks like
6
Chimney material may be of biological origin
7
Standard JGI metagenome pipeline
DNA sample=>DNA QC=>SSU pyrotags and shotgun libraries
SSU pyrotags (http://pyrotagger.jgi-psf.org): community composition, semi-quantitative – OTU abundance
Shotgun libraries: Illumina standard, Illumina long mate pair, 454 standard, 454 long mate pair
Assembly=>Metagenome (contigs + unassembled reads)
Analysis (IMG/M-ER): community composition, functional analysis
8
Pyrotag results – BLASTn against Greengenes database
9
PhyloDistribution results – BLASTp of metagenome CDSs against isolates in IMG
10
Pyrotags vs PhyloDistribution – white mat
Big differences in abundance (an order of magnitude or more) of Bacteroidetes and Thermotogae
11
Possible explanations
Primer bias in pyrotags (against Proteobacteria)?
Amplification artifacts in pyrotags – well known for metagenome data
Sequencing GC bias in the metagenome – low-GC and high-GC (>65%) sequences are underrepresented in Illumina data
K-mer assembler problems: abundant populations may be underrepresented in the assembly if incorrect k-mer/coverage parameters are selected
12
PCR artifacts in metagenome data
454 technology includes an emulsion PCR step, which may lead to artificial overrepresentation of certain sequences
Reason: presence of free beads during the library prep step; escaped emPCR products bind to free beads and are disproportionately amplified
13
What about GC bias?
Panels: Low GC (Brachyspira), Medium GC (Arcanobacterium), High GC (Cellulomonas)
Question: how do you find average/max/min GC content for a clade?
Answer: IMG=>Genome Browser=>View Phylogenetically=>click on green + to select the clade, then “Add selected to Genome Cart”=>Compare Genomes=>Genome Statistics
Result:
Thermotogae GC percent: 41 average/47 max/31 min
Bacteroidetes GC percent: 42.5 average/66 max/31 min
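The same average/max/min summary can be reproduced outside IMG when the genome sequences of a clade are at hand. The following is a minimal Python sketch, assuming each genome is available as a plain DNA string; the function names and the toy sequences are made up for illustration and are not IMG output.

```python
def gc_percent(sequence):
    """Return GC content as a percentage of the A/C/G/T bases in a DNA string."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def clade_gc_summary(genomes):
    """Summarize GC percent (average, max, min) over a dict of name -> sequence."""
    if not genomes:
        raise ValueError("no genomes given")
    values = [gc_percent(seq) for seq in genomes.values()]
    return {
        "average": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }


# Toy sequences, invented for illustration only.
toy_clade = {
    "genome_a": "ATATATGC",  # 2 of 8 bases are G or C
    "genome_b": "GGCCATAT",  # 4 of 8 bases are G or C
    "genome_c": "GGGCCCAT",  # 6 of 8 bases are G or C
}
summary = clade_gc_summary(toy_clade)
print(summary)
```

Ambiguous bases such as N are ignored in the denominator here; that is a design choice of this sketch, not necessarily how IMG computes its statistics.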
14
Are there any abundant populations that could be filtered out in assembly?
Typical Pyrotagger output
There are 2 highly abundant populations – just 2 clusters account for nearly all Bacteroidetes and Thermotogae in the sample
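Whether a handful of clusters dominate a phylum can be checked directly from a cluster table. A minimal sketch follows, assuming the table has been reduced to (cluster id, phylum, sequence count) tuples; this layout is hypothetical and is not Pyrotagger's actual output format, and the counts are invented.

```python
from collections import defaultdict


def top_cluster_share(clusters, top_n=1):
    """For each phylum, return the fraction of its sequences held by its top_n largest clusters."""
    by_phylum = defaultdict(list)
    for _cluster_id, phylum, count in clusters:
        by_phylum[phylum].append(count)
    shares = {}
    for phylum, counts in by_phylum.items():
        total = sum(counts)
        largest = sum(sorted(counts, reverse=True)[:top_n])
        shares[phylum] = largest / total if total else 0.0
    return shares


# Invented counts, for illustration only.
toy_clusters = [
    ("c1", "Bacteroidetes", 900),
    ("c2", "Bacteroidetes", 60),
    ("c3", "Bacteroidetes", 40),
    ("c4", "Thermotogae", 450),
    ("c5", "Thermotogae", 50),
]
shares = top_cluster_share(toy_clusters)
print(shares)
```

A share close to 1 for a phylum means one population carries nearly all of its pyrotags, which is the situation in which an assembler with poorly chosen k-mer/coverage settings may handle that population badly.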
15
Let’s take a closer look at the assemblies and unassembled reads
                                              White mat     Red mat
454 reads total                                 299,975   1,429,091
Illumina reads total                         49,227,146  45,337,178
Assembled contigs                               195,590      88,776
N50, bp                                             659         869
Longest contig, bp                               28,145      75,483
Illumina reads mapped to assembly, % total         42.3        12.5
454 reads mapped to assembly, % total              62.1        15.3
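For reference, N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal Python sketch of the calculation, using invented contig lengths rather than the assemblies from the table:

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Invented contig lengths, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

In the toy case the total is 1,500 bp, so the two longest contigs (500 + 400 = 900 bp) already pass the halfway point and N50 is 400.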
16
Functional analysis: metagenome as a bag of functions
Red mat is taxonomically more diverse. Is it more diverse functionally?
                 White mat  Red mat
COG clusters          3631     3402
Pfam clusters         3847     3505
Question: where do you find this information?
Answer: IMG=>Taxon Details=>Metagenome Statistics; Genes with Pfam=>Display as a list=>Export
Rarefaction curves: white mat is expected to have ~4000 different Pfams; red mat ~3600
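A rarefaction curve of this kind can be approximated from an exported gene list by shuffling the genes repeatedly and counting how many distinct families have been seen at each sampling depth. The sketch below assumes the export has been reduced to one family label per annotated gene; the function name, parameters, and toy labels are illustrative, and the sketch only draws the observed curve, it does not extrapolate to an expected total.

```python
import random


def rarefaction_curve(family_hits, step, permutations=20, seed=0):
    """Average number of distinct families seen after sampling k genes, for k = step, 2*step, ...

    family_hits holds one family label per annotated gene.
    """
    rng = random.Random(seed)
    depths = list(range(step, len(family_hits) + 1, step))
    if not depths or depths[-1] != len(family_hits):
        depths.append(len(family_hits))  # always include the full sample
    totals = [0] * len(depths)
    for _ in range(permutations):
        order = family_hits[:]
        rng.shuffle(order)
        seen = set()
        next_depth = 0
        for i, family in enumerate(order, start=1):
            seen.add(family)
            if i == depths[next_depth]:
                totals[next_depth] += len(seen)
                next_depth += 1
                if next_depth == len(depths):
                    break
    return depths, [t / permutations for t in totals]


# Invented family labels, for illustration only.
toy_hits = ["PF1"] * 50 + ["PF2"] * 30 + ["PF3"] * 15 + ["PF4"] * 5
depths, curve = rarefaction_curve(toy_hits, step=10)
print(list(zip(depths, curve)))
```

A curve that has flattened by the final depth suggests that most families present were sampled; a curve that is still rising suggests more would appear with deeper sequencing.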
17
Abundance Comparisons
Motility and chemotaxis genes are overrepresented in white mat (detected by both Pfams and COG Categories)
Panels: white mat, red mat
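One simple way to ask whether a functional category is overrepresented in one sample is a two-proportion z-test on gene counts. The sketch below is an assumption for illustration and is not necessarily the statistic that IMG's Abundance Comparisons tool reports; the counts are invented.

```python
import math


def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: genes in the category vs all annotated genes in each sample.
z, p = two_proportion_z(300, 10000, 150, 10000)
print(z, p)
```

When many categories are tested at once, the p-values would also need a multiple-testing correction, which this sketch leaves out.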
18
Is motility/chemotaxis common to all organisms in white mat?
Scenario 1: the function/pathway is overrepresented because it is present in all members of the community, possibly at higher copy number
Scenario 2: the function/pathway is overrepresented because it is present in one clade, which is absent from the second sample
Question: can we distinguish between the two scenarios?
Answer: click on the gene count for protein family/functional category, add all genes to Gene Cart=>add scaffolds to Scaffold Cart=>PhyloDistribution of all scaffolds in the Scaffold Cart
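Once the scaffolds carrying these genes have been assigned to clades, the two scenarios differ in how the genes are spread across clades. A rough first look can be sketched as follows, assuming the assignments have been reduced to one clade label per gene; the input layout, the clade mix, and the 0.8 cutoff are all assumptions made for illustration.

```python
from collections import Counter


def clade_shares(gene_clades):
    """Return clade -> fraction of genes, ordered from largest to smallest share."""
    if not gene_clades:
        raise ValueError("no clade assignments given")
    counts = Counter(gene_clades)
    total = sum(counts.values())
    return {clade: n / total for clade, n in counts.most_common()}


def dominated_by_one_clade(gene_clades, threshold=0.8):
    """True if a single clade carries at least `threshold` of the genes (arbitrary cutoff)."""
    return max(clade_shares(gene_clades).values()) >= threshold


# Invented assignments, for illustration only.
one_clade = ["Epsilonproteobacteria"] * 90 + ["Bacteroidetes"] * 6 + ["Thermotogae"] * 4
spread_out = ["Epsilonproteobacteria"] * 40 + ["Bacteroidetes"] * 35 + ["Thermotogae"] * 25
print(dominated_by_one_clade(one_clade), dominated_by_one_clade(spread_out))
```

A dominant clade points towards Scenario 2, but the shares should still be compared with each clade's overall abundance in the community before drawing that conclusion.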
19
Are Sulfurimonas-like bacteria present in both samples?
The total number of sequences in all clusters assigned to Epsilonproteobacteria is 50 in white mat and 66 in red mat
Largest cluster in white mat includes 125K+ sequences
Largest cluster in red mat includes 14K+ sequences
Question: what about the presence of Sulfurimonas-like bacteria in the metagenomes?
Answer: go to Compare Genomes=>PhyloDistribution=>Genome vs Metagenomes, select the genome; the histogram shows the number of BLASTp hits from CDSs in all metagenomes to this genome
20
Are there any methylotrophs in the white mat?
21
Conclusions
The two communities have different composition; the white mat, sampled next to the hydrothermal vent, has lower complexity
Community composition as sampled by pyrotags and by the metagenome may be quite different because of a number of biases
Some protein families/functional categories are more abundant in one sample than in the other because of different community composition, and not necessarily because they are more important in this environment