Rob Edwards San Diego State University Population Genomics Rob Edwards San Diego State University Image: Lisa Brown for National Public Radio
Phages in the Worlds Oceans GOM 41 samples 13 sites 5 years SAR 1 sample 1 site 1 year BBC 85 samples 38 sites 8 years ARC 56 samples 16 sites LI 4 sites
Most Marine Phage Sequences are Novel
Marine Single-Stranded DNA Viruses 6% of SAR sequences ssDNA phage (Chlamydia-like Microviridae) 40% viral particles in SAR are ssDNA phage Several full-genome sequences were recovered via de novo assembly of these fragments Confirmed by PCR and sequencing
SAR metagenome and Chlamydia φ4 Individual sequence reads Coverage Concatenated hits Chlamydia phi 4 genome Chl4 ORF calls 12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
The phage proteomic tree
Signature sequences HECTOR and PARIS Degenerate primers that amplify T7 DNA polymerase Tested samples from around world Tested by different investigators in different laboratories Mya Breitbart
T7 phages are globally distributed ~1026 copies of each sequence on the planet = 60 metric tons of this DNA sequence Breitbart et al, FEMS Micro Lett
Some phages are everywhere ssDNA λ-like Phage Proteomic Tree v. 5 (Edwards, Rohwer) T4-like T7-like
Compare viruses to all metagenomes Phage P4 – 11kb, 10 ORFs Azul – individual sequence reads in a metagenome Verde – coverage across genome
Parts of viruses are everywhere # metagenome hits P4 phage genome
Viruses have lots of unknown genes Microbial Viral
Bas Dutilh
cross Assembly metagenome 1 metagenome 2 metagenome 3 metagenome 4
cross Assembly metagenome 1 metagenome 2 metagenome 3 metagenome 4
HMP viruses Reyes et al. Nature 2010
Phages are more variable than microbes Reyes et al. Nature 2010 Functions present in samples
Complete crAssphage genome
Complete crAssphage genome
https://twitter.com/Venustwofacecat What are chimeras? Chimerization is more frequent between closely related strains Similar sequences Venus the chimeric cat https://www.facebook.com/VenusTheAmazingChimeraCat https://twitter.com/Venustwofacecat What are intra-phyla chimeras??? Aziz et al. NAR 2010
What is the host? Requires correct host strain Requires phage makes visible plaques Often requires correct concentrations of Mg++, Ca++, etc No PCR hits in at least 100 plaques isolated from 10 pooled viral preparations on Bacteroides fragilis and B. thetaiotaomicron lawns.
crAssphage is abundant! Abundance-ubiquity plot
Potential caveats Phage or contamination? Amplification skews? Highly abundant in viral metagenomes size- and density-filtered for VLPs ORFs show similarity to bacteriophage and bacterial proteins (no conserved bacterial or archaeal metabolic genes) Phage-like modularity among functions Coding structure of the ORFs is typical of a phage genome Putative prokaryotic promoter patterns Genome detected in many metagenomes around the world Amplification skews?
Summary crAssphage is everywhere everyone has it (rounding up) we don't know what it does
metagenomics metagenomics 1.0: profiling metagenomics 2.0: population genomics
Tools for population genomics AbundanceBin CompostBin CONCOCT crAss GroopM Metabat mmgenome
Discussion points How many genomes would you expect in a population? More coverage versus more samples? Cutoffs for inclusion (e.g. GC, closeness, etc)
Discussion points How do you know the contigs are from the same organism (genotype) http://edwards.sdsu.edu/GenomePeek BLAST hits GC content or k-mer composition Single copy genes abundance profiles across metagenomes Paired ends/mate pairs PCR PFGE and size comparisons SIP and metabolically active fraction Single cell genomics Culturing / genome sequencing