Metagenomics in Phage Ecology Peter Salamon SDSU REU 2007
Mathematical Modeling
Discovering new mathematics in phenomena. Biology is a good place to look.
– 30 years ago, quantitative data in biology was scarce, expensive, and noisy.
– Today it is abundant, cheap, and clean.
– Biology is ripe for mathematization.
The two-hat model of a modeler
Mathematician. Biologist. Change hats often. Anthropomorphise.
Our Modeling Problem
What can we learn about phage from their metagenomes? A metagenome is the genome of a community in an ecosystem. Why would we want to?
Bacteriophage are important
There are phage on earth.
25% of oceanic carbon cycles through phage every day.
Promise of next generation magic bullets against infections.
Self assembling -- dramatic implications for nanotechnology.
There are phage on earth
– That is a billion times the number of stars in the universe.
– That is about 10 million moles of phage.
– Phage are the most numerous biological entities.
– They are the closest to chemistry/physics.
– They are a good place to look for new biomathematics.
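The mole figure on this slide can be turned into a particle count with one multiplication. The sketch below only converts the slide's "about 10 million moles" using Avogadro's constant; the variable names are illustrative and the result is an order-of-magnitude estimate, not a measured count.

```python
# Convert "about 10 million moles of phage" into a number of particles.
AVOGADRO = 6.02214076e23      # particles per mole (exact SI value)
moles_of_phage = 10e6         # "about 10 million moles", taken from the slide

phage_count = moles_of_phage * AVOGADRO
print(f"{phage_count:.1e}")   # prints 6.0e+30
```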
25% of oceanic carbon cycles through phage every day.
– Possibly a huge effect on global warming.
– Phage are easily bioengineered.
The big question:
– Can we safely bioengineer a phage to remove CO2 from the atmosphere?
– Safety requires control of its global population and of the spreading of its genes.
– The first step toward control is understanding.
Promise of next generation magic bullets against infections.
– Current antibiotics are less and less effective due to bacterial adaptation and horizontal gene transfer; phage are the primary players in horizontal gene transfer.
– It is easy to select phage evolved to a specific host.
– Institute in Tbilisi (dates back to the 1920s).
– Kent SeaTech Corporation, San Diego, uses phages to control bacterial infection on fish farms (Feb. 2007).
– FDA approval of a Listeria-specific bacteriophage preparation for ready-to-eat meat and poultry products (Aug. 2006).
Self assembling -- dramatic implications for nanotechnology.
– In a solution of ingredients, phage self assemble.
– Virus-Based Toolkit for the Directed Synthesis of Magnetic and Semiconducting Nanowires (Mao et al., Science, 2004).
– Assembly and functionalization of phage onto substrates patterned by dip-pen nanolithography.
Metagenomics
Shotgun sequence an entire ecosystem.
200 liters of ocean water; 10^8 strands of M different genotypes (represented here as colors).
Each genotype has a certain abundance and length.
Chop it up. Sample and read: ACCATGGT… TTACGAT… GGCACGT… TATAGGC… …
Our source of assembled metagenomes
Forest (the phage) Rohwer: never met a phage metagenome he didn't want to sequence.
Rob (the stringologist) Edwards: never met a metagenome he couldn't assemble.
Assemble
Look for contigs.
– Perfect overlaps (>20bp with 98% identity)
Assume contigs occur only between identical genotypes.
Estimate abundances.
…CCATGATAGGCTAACGTGCATTCGGTA
AGGCTAACGTGCATTCGGTACCTTACGA…
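The overlap test on this slide can be sketched as a suffix-prefix comparison. This is a minimal illustration, not the assembler actually used: the function name `best_overlap` and its parameters are hypothetical, gaps are not handled, and the length threshold is treated as inclusive (at least 20 bp) because the two example reads on the slide share exactly 20 bases.

```python
def best_overlap(left, right, min_overlap=20, min_identity=0.98):
    """Length of the longest suffix of `left` that matches a prefix of
    `right` with at least `min_identity` identity, or 0 if none reaches
    `min_overlap` bases. Ungapped comparison only (an assumption)."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        suffix, prefix = left[-k:], right[:k]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if matches / k >= min_identity:
            return k
    return 0

# The two example reads shown on the slide (ellipses dropped).
read_a = "CCATGATAGGCTAACGTGCATTCGGTA"
read_b = "AGGCTAACGTGCATTCGGTACCTTACGA"
print(best_overlap(read_a, read_b))  # prints 20
```

Scanning from the longest candidate downward returns the largest acceptable overlap first; a real assembler would use an index rather than this quadratic scan.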
The First Contig Spectrum: contigs in water off Scripps Pier
Overlaps were observed: ~4% of the sequences overlapped with at least one other sequence.
[Figure: contig counts by number of sequences in overlap; bar labels recovered: 17, 2]
Submodel -- One genome
Randomly chosen positions lead to exponentially distributed spacing.
n = number of samples from this genome; mean spacing = L/n.
Length of all genomes taken to be L = 50 kbp. Read lengths of x = 600.
Pdf of distance to nearest neighbor: [equation]
Probability of overlap: [equation]
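The quantities named on this slide follow from the stated assumptions, and can be sketched numerically. With exponentially distributed spacing of mean L/n, the distance d to the next read start has pdf (n/L) exp(-n d / L). If two reads are counted as overlapping when the next read starts within x - o bases, the overlap probability is p = 1 - exp(-n (x - o) / L). The minimum overlap o = 20 is an assumption borrowed from the assembly criterion, and the function names are illustrative; the slide's own formula may differ in this detail.

```python
import math
import random

GENOME_LEN = 50_000   # L, from the slide
READ_LEN = 600        # x, from the slide
MIN_OVERLAP = 20      # o, assumption: taken from the assembly criterion

def overlap_probability(n_reads, genome_len=GENOME_LEN,
                        read_len=READ_LEN, min_overlap=MIN_OVERLAP):
    """P(next read starts within read_len - min_overlap bases)."""
    rate = n_reads / genome_len              # 1 / mean spacing
    return 1.0 - math.exp(-rate * (read_len - min_overlap))

def simulated_overlap_fraction(n_reads, trials=20_000, seed=0):
    """Monte Carlo check: draw exponential spacings and count overlaps."""
    rng = random.Random(seed)
    rate = n_reads / GENOME_LEN
    window = READ_LEN - MIN_OVERLAP
    hits = sum(rng.expovariate(rate) < window for _ in range(trials))
    return hits / trials

# Analytic value next to its simulated estimate for n = 100 reads.
print(overlap_probability(100), simulated_overlap_fraction(100))
```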
One genome (cont.)
A q-contig is a non-overlap followed by q-1 overlaps followed by a non-overlap.
Its probability: [equation]
The probability that a randomly selected fragment is part of a q-contig: [equation]
The expected number of q-contig members: [equation]
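The three quantities described on this slide can be written out under one extra assumption: successive overlaps are independent, each with probability p (this follows from the memoryless exponential spacing). Then a q-contig has probability (1-p) p^(q-1) (1-p), a randomly chosen fragment sits in a q-contig with probability q p^(q-1) (1-p)^2, and n fragments give n q p^(q-1) (1-p)^2 expected members. The sketch below is a reconstruction from the slide's wording, with illustrative function names, not a copy of the original equations.

```python
def q_contig_probability(p, q):
    """No overlap, then q-1 overlaps, then no overlap."""
    return (1 - p) * p ** (q - 1) * (1 - p)

def membership_probability(p, q):
    """A fragment can occupy any of the q positions inside a q-contig."""
    return q * q_contig_probability(p, q)

def expected_members(n_reads, p, q):
    """Expected number of fragments that belong to q-contigs."""
    return n_reads * membership_probability(p, q)

# Sanity check: every fragment belongs to a contig of some size,
# so the membership probabilities should sum to (nearly) 1.
total = sum(membership_probability(0.3, q) for q in range(1, 2000))
print(total)
```

Dividing the expected number of members by q gives the expected number of q-contigs, which is the quantity a contig spectrum counts.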
Population Model
Fit the observed contig spectrum by combining contigs from different genomes.
Quasi-likelihood from squared deviation to the observed contig spectrum.
log L: [equation]
Population Model
Fit parametric models by maximizing likelihood.
Nonparametric models?
Compare Conventional Models
Power law fits the contig spectrum best.
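A minimal sketch of the population-level fit, under several stated assumptions: each genotype i receives n_i = N * (relative abundance) reads, its contigs follow the one-genome submodel, the community spectrum is the sum over genotypes, and a candidate model is scored by its squared deviation from the observed spectrum. The power-law rank-abundance form, the function names, and the observed vector `[960, 17, 2]` (about 1,000 reads with 17 two-contigs and 2 three-contigs) are illustrative choices, not the original implementation or data.

```python
import math

def power_law_abundances(n_types, exponent):
    """Relative abundance of rank r proportional to r**(-exponent)."""
    weights = [r ** -exponent for r in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def predicted_spectrum(abundances, n_reads, max_q=3,
                       genome_len=50_000, read_len=600, min_overlap=20):
    """Expected number of q-contigs (q = 1..max_q), summed over genotypes."""
    spectrum = [0.0] * max_q
    for frac in abundances:
        n_i = n_reads * frac
        p = 1.0 - math.exp(-n_i * (read_len - min_overlap) / genome_len)
        for q in range(1, max_q + 1):
            spectrum[q - 1] += n_i * p ** (q - 1) * (1 - p) ** 2
    return spectrum

def squared_deviation(predicted, observed):
    """Smaller deviation corresponds to a higher quasi-likelihood."""
    return sum((a - b) ** 2 for a, b in zip(predicted, observed))

observed = [960, 17, 2]   # assumption: illustrative spectrum, see lead-in
for n_types, exponent in [(1000, 0.0), (1000, 1.0)]:
    pred = predicted_spectrum(power_law_abundances(n_types, exponent), 1000)
    print(n_types, exponent, [round(v, 1) for v in pred],
          round(squared_deviation(pred, observed), 1))
```

Fitting a parametric model then amounts to searching over the number of types and the exponent for the smallest deviation; a coarse grid is enough to see how the two parameters trade off.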
2001
The modeling took ~3 integers from experiment and produced estimates of phage population structure (rank-abundance), including the number of types and the frequency of the most populous type.
Our original mandate was to add some improvements to PHACCS, which has many users.
2007
In 2001, technology and cost allowed sequencing only about 1,000 fragments of length 500 bp.
Today, for a comparable cost, we can sequence 400,000 fragments of length 100 bp. In a few years …
We are still using the 2001 model. So are a lot of other people.
With today's sample sizes, the coverage of the more abundant phage is high, and completely different modeling may be a better idea. But …
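The jump in coverage can be made concrete with a little arithmetic. The sketch below compares total sequenced bases for the two fragment counts quoted on this slide, and the expected coverage of a single genotype. The 50 kbp genome length is the value used in the one-genome submodel; the 1% relative abundance is a hypothetical value chosen only for illustration.

```python
def total_bases(n_fragments, read_len):
    """Total number of bases sequenced."""
    return n_fragments * read_len

def coverage(n_fragments, read_len, abundance, genome_len=50_000):
    """Expected fold-coverage of one genotype with the given abundance."""
    return n_fragments * abundance * read_len / genome_len

bases_2001 = total_bases(1_000, 500)      # 500,000 bases
bases_2007 = total_bases(400_000, 100)    # 40,000,000 bases
print(bases_2007 // bases_2001)           # prints 80

# Hypothetical genotype making up 1% of the community.
print(coverage(1_000, 500, 0.01), coverage(400_000, 100, 0.01))
```

At this hypothetical abundance the 2001 sample covers only a fraction of the genome, while the 2007 sample covers it several times over, which is the regime where a model built for sparse overlaps stops being a natural fit.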
If we knew what we were doing, it wouldn't be called research. -- A. Einstein
A model should be as simple as possible and yet no simpler.