Metagenomics in Phage Ecology Peter Salamon SDSU REU 2007.


Mathematical Modeling: discovering new mathematics in natural phenomena. Biology is a good place to look. – 30 years ago, quantitative data in biology were scarce, expensive, and noisy. – Today they are abundant, cheap, and clean. – Biology is ripe for mathematization.

The two-hat model of a modeler: change hats often. Mathematician / Biologist. Anthropomorphize.

Our Modeling Problem: What can we learn about phage from their metagenomes? (A metagenome is the genome of a community in an ecosystem.) Why would we want to?

Bacteriophage are important: There are 10^31 phage on earth. 25% of oceanic carbon cycles through phage every day. Promise of next-generation magic bullets against infections. Self-assembling: dramatic implications for nanotechnology.

There are 10^31 phage on earth: – that's a billion times the number of stars in the universe. – that's about 10 million moles of phage. – most numerous biological entities. – closest to chemistry/physics. – a good place to look for new biomathematics.
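As a quick consistency check of the two comparisons, assume the commonly cited estimate of $10^{31}$ phage and Avogadro's number $N_A \approx 6.022 \times 10^{23}\,\mathrm{mol}^{-1}$. The mole figure and the implied number of stars then work out as follows (order-of-magnitude arithmetic only):

```latex
\frac{10^{31}}{6.022\times 10^{23}\ \mathrm{mol^{-1}}} \approx 1.7\times 10^{7}\ \mathrm{mol}
\qquad
N_{\text{stars}} \approx \frac{10^{31}}{10^{9}} = 10^{22}
```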

25% of oceanic carbon cycles through phage every day: – possibly a huge effect on global warming. – phage are easily bioengineered. The big question: can we safely bioengineer a phage to remove CO2 from the atmosphere? Safety requires control of its global population and of the spread of its genes. The first step toward control is understanding.

Promise of next-generation magic bullets against infections: – Current antibiotics are less and less effective due to bacterial adaptation and horizontal gene transfer, and phage are the primary players in horizontal gene transfer. – It is easy to select phage evolved to a specific host. – Institute in Tbilisi (dates back to the 1920s). – Ken SeaTech Corporation, San Diego, uses phages to control bacterial infections on fish farms (Feb. 2007). – FDA approval of a Listeria-specific bacteriophage preparation for ready-to-eat meat and poultry products (Aug. 2006).

Self-assembling: – in a solution of ingredients, phage self-assemble. – Virus-Based Toolkit for the Directed Synthesis of Magnetic and Semiconducting Nanowires (Mao et al., Science, 2004). – Assembly and functionalization of phage onto substrates patterned by dip-pen nanolithography.

Metagenomics: shotgun sequence an entire ecosystem. 200 liters of ocean water; 10^8 strands of M different genotypes (represented here as colors). Each has a certain abundance and length. Chop it up, sample, and read: ACCATGGT… TTACGAT… GGCACGT… TATAGGC… …
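The sampling step can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' pipeline. Every base in the community is assumed equally likely to start a read, so a genotype is picked with probability proportional to its abundance times its usable length. The start position is then uniform, and reads have a fixed length. The function name `shotgun_sample` and its inputs are hypothetical.

```python
import random

def shotgun_sample(genomes, abundances, n_reads, read_len, seed=0):
    """Draw fixed-length reads from a simulated community (hypothetical helper).

    genomes    -- list of DNA strings, one per genotype
    abundances -- number of copies of each genotype in the sample
    Assumes every base is equally likely to start a read, so a genotype is
    chosen with weight abundance * (number of valid start positions).
    Returns (genotype index, start position, read sequence) triples.
    """
    rng = random.Random(seed)
    weights = [a * max(len(g) - read_len + 1, 0) for g, a in zip(genomes, abundances)]
    reads = []
    for _ in range(n_reads):
        i = rng.choices(range(len(genomes)), weights=weights)[0]
        start = rng.randrange(len(genomes[i]) - read_len + 1)
        reads.append((i, start, genomes[i][start:start + read_len]))
    return reads

# Usage: three random 1 kbp genotypes with abundances 10, 5 and 1.
rng = random.Random(1)
community = ["".join(rng.choice("ACGT") for _ in range(1000)) for _ in range(3)]
sample = shotgun_sample(community, [10, 5, 1], n_reads=200, read_len=50)
```

With these weights, roughly 10/16 of the reads should come from the first genotype and about 1/16 from the rarest one.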

Our source of assembled metagenomes: Forest ("the phage") Rohwer never met a phage metagenome he didn't want to sequence. Rob ("the stringologist") Edwards never met a metagenome he couldn't assemble.

Assemble: – Look for contigs: perfect overlaps (>20 bp with 98% identity). – Assume contigs occur only between identical genotypes. – Estimate abundances. Example overlap: …CCATGATAGGCTAACGTGCATTCGGTA and AGGCTAACGTGCATTCGGTACCTTACGA…
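The overlap criterion can be sketched as a suffix-prefix test between two reads. Several assumptions apply here. The comparison is ungapped, with no indels. Identity is the number of matching positions divided by the overlap length. The slide's ">20 bp" is treated as a minimum of 20 bp, configurable through the hypothetical `min_overlap` parameter. Real assemblers use more elaborate alignment, so treat `find_overlap` as an illustrative helper only. The two reads below are the visible parts of the slide's example.

```python
def find_overlap(a, b, min_overlap=20, min_identity_pct=98):
    """Length of the longest suffix of read `a` that matches a prefix of read `b`.

    Ungapped comparison: identity = matching positions / overlap length.
    Returns 0 if no overlap of at least `min_overlap` bp meets the identity
    threshold. Assumes min_overlap >= 1.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        matches = sum(x == y for x, y in zip(a[-k:], b[:k]))
        # Integer comparison avoids floating-point rounding at the threshold.
        if matches * 100 >= min_identity_pct * k:
            return k
    return 0

r1 = "CCATGATAGGCTAACGTGCATTCGGTA"
r2 = "AGGCTAACGTGCATTCGGTACCTTACGA"
print(find_overlap(r1, r2))  # 20
```

For overlaps shorter than 50 bp, a 98% threshold effectively requires an exact match, because a single mismatch already drops identity below 98%.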

The First Contig Spectrum: contigs in water off Scripps Pier. Overlaps were observed: ~4% of the sequences overlapped with at least one other sequence. [Figure: contig spectrum, number of contigs by # of sequences in overlap]
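A contig spectrum like this one can be tallied once pairwise overlaps are known. The sketch below assumes the overlapping read pairs have already been detected. It treats contigs as connected components of the overlap graph, found with union-find, and counts unassembled reads as 1-contigs. The function name `contig_spectrum` is hypothetical.

```python
from collections import Counter

def contig_spectrum(n_reads, overlapping_pairs):
    """Map q -> number of contigs containing exactly q reads.

    Contigs are the connected components of the overlap graph; a read that
    overlaps nothing counts as a 1-contig.
    """
    parent = list(range(n_reads))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in overlapping_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    component_sizes = Counter(find(i) for i in range(n_reads))
    return dict(sorted(Counter(component_sizes.values()).items()))

print(contig_spectrum(6, [(0, 1), (2, 3), (3, 4)]))  # {1: 1, 2: 1, 3: 1}
```

Under this convention, the fraction of reads that overlap at least one other read (the ~4% above) is `1 - spectrum[1] / n_reads`.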

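Submodel: one genome. Randomly chosen read positions lead to exponentially distributed spacing between neighboring reads. Let L be the genome length and n the number of samples (reads) from this genome; the mean spacing is L/n. The length of all genomes is taken to be L = 50 kbp, and the read length is x = 600 bp. The pdf of the distance s to the nearest neighbor is $f(s) = \frac{n}{L} e^{-ns/L}$, so the probability of overlap is $p = P(s < x) = \int_0^x f(s)\,ds = 1 - e^{-nx/L}$.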
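The exponential-spacing argument can be checked numerically. If spacings are exponential with mean L/n, a read overlaps the next one with probability 1 - exp(-n x / L). The sketch compares that value with a simulation that drops n read starts uniformly on a genome. To avoid edge effects, the genome is assumed to be circular. The simulation also ignores any minimum-overlap requirement. Both function names are hypothetical.

```python
import math
import random

def overlap_probability(n, L=50_000, x=600):
    """P(spacing < read length) when spacings are exponential with mean L/n."""
    return 1.0 - math.exp(-n * x / L)

def simulated_overlap_fraction(n, L=50_000, x=600, trials=2_000, seed=0):
    """Place n read starts uniformly on a circular genome of length L and
    return the fraction of gaps to the next start that are shorter than x."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, L) for _ in range(n))
        # Gap from each start to the next one, wrapping around the circle.
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        gaps.append(starts[0] + L - starts[-1])
        hits += sum(g < x for g in gaps)
        total += len(gaps)
    return hits / total

print(overlap_probability(100), simulated_overlap_fraction(100))
```

With n = 100 reads on a 50 kbp genome, both numbers come out close to 0.70. The small remaining difference is the finite-n correction: on a circle the exact gap distribution is a scaled Beta(1, n-1) rather than an exponential.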

One genome (cont.): A q-contig consists of a no-overlap, followed by q-1 overlaps, followed by a no-overlap. Its probability is $(1-p)^2 p^{q-1}$. The probability that a randomly selected fragment is part of a q-contig is $q\,(1-p)^2 p^{q-1}$, and the expected number of q-contig members is $n\,q\,(1-p)^2 p^{q-1}$.
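The expected single-genome contig spectrum can be sketched in a few lines, under stated assumptions. Links between consecutive reads are treated as independent, and edge effects are ignored. The overlap probability is p = 1 - exp(-n x / L). Each read is then the first read of a q-contig with probability (1-p)^2 p^(q-1), so the expected number of q-contigs is n (1-p)^2 p^(q-1). The name `single_genome_spectrum` and the truncation `q_max` are hypothetical.

```python
import math

def single_genome_spectrum(n, L=50_000, x=600, q_max=50):
    """Expected number of q-contigs, for q = 1..q_max, from n reads of one genome.

    A q-contig is a break, q-1 overlaps, then a break; each read starts one
    with probability (1 - p)**2 * p**(q - 1). Edge effects are ignored.
    """
    p = 1.0 - math.exp(-n * x / L)
    return [n * (1 - p) ** 2 * p ** (q - 1) for q in range(1, q_max + 1)]

spectrum = single_genome_spectrum(20)
# Reads accounted for: sum over q of q * (number of q-contigs), close to n.
reads_in_contigs = sum(q * c for q, c in enumerate(spectrum, start=1))
```

Summing q times the expected number of q-contigs recovers n, and the total number of contigs is n(1-p). Both are quick sanity checks on the formulas.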

Population Model: fit the observed contig spectrum by combining contigs from different genomes. A quasi-likelihood, log L, is computed from the squared deviation between the predicted and observed contig spectra.

Population Model: fit parametric models by maximizing the likelihood. Nonparametric models?
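The population-model fit can be sketched as follows; every modeling choice here is an assumption. Abundances follow a power-law rank-abundance curve. All genotypes share one genome length, and each genotype receives exactly its expected share N·a_i of the N reads. The quasi-log-likelihood is taken to be the negative squared deviation between spectra, and the fit is a plain grid search. Because contigs form only within a genotype, expected spectra simply add across genotypes. The exact quasi-likelihood and optimizer of the original work are not given here, and all function names are hypothetical.

```python
import math

def power_law_abundances(M, alpha):
    """Relative abundance of the rank-i genotype, proportional to i**(-alpha)."""
    w = [i ** -alpha for i in range(1, M + 1)]
    total = sum(w)
    return [v / total for v in w]

def community_spectrum(N, M, alpha, L=50_000, x=600, q_max=10):
    """Expected contig spectrum of N reads from M genotypes of equal length L.

    Contigs form only within a genotype, so expected counts add across
    genotypes; genotype i is assumed to get exactly N * a_i reads.
    """
    spectrum = [0.0] * q_max
    for a in power_law_abundances(M, alpha):
        n = N * a
        p = 1.0 - math.exp(-n * x / L)
        for q in range(1, q_max + 1):
            spectrum[q - 1] += n * (1 - p) ** 2 * p ** (q - 1)
    return spectrum

def quasi_log_likelihood(predicted, observed):
    """Negative squared deviation between spectra (higher is better)."""
    return -sum((p - o) ** 2 for p, o in zip(predicted, observed))

def fit_power_law(observed, N, M_grid, alpha_grid):
    """Grid search for the (M, alpha) pair that maximizes the quasi-likelihood."""
    return max(
        ((M, alpha) for M in M_grid for alpha in alpha_grid),
        key=lambda ma: quasi_log_likelihood(
            community_spectrum(N, ma[0], ma[1], q_max=len(observed)), observed
        ),
    )

# Usage: recover known parameters from a synthetic, noise-free spectrum.
observed = community_spectrum(1_000, 1_000, 1.0)
best = fit_power_law(observed, 1_000, [100, 1_000, 10_000], [0.5, 1.0, 2.0])
```

A grid search keeps the sketch transparent. In practice, a continuous optimizer over alpha and M, or a proper likelihood for count data, would be natural replacements.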

Compare conventional models: the power law fits the contig spectrum best.

2001: The modeling took ~3 integers from experiment and produced estimates of phage population structure (rank-abundance), including the number of types and the frequency of the most populous type. Our original mandate was to add some improvements to PHACCS (many users).
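For a power-law rank-abundance curve, the frequency of the most populous type follows directly from the fitted parameters. Assume abundances a_i ∝ i^(-alpha) for i = 1..M. Then the rank-1 share is one over the sum of i^(-alpha). The function name is hypothetical, and the power-law form is an assumption for illustration.

```python
def most_abundant_fraction(M, alpha):
    """Share of the rank-1 genotype when a_i is proportional to i**(-alpha), i = 1..M."""
    return 1.0 / sum(i ** -alpha for i in range(1, M + 1))
```

For example, with M = 1,000 types and alpha = 1, the denominator is the harmonic number H_1000 ≈ 7.49, so the most abundant type makes up about 13% of the community.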

2007: In 2001, technology and cost allowed sequencing only about 1,000 fragments of length 500 bp. Today, for a comparable cost, we can sequence 400,000 fragments of length 100 bp. We are still using the 2001 model, and so are a lot of other people. In a few years … With today's sample sizes, the coverage of the more abundant phage is high, and completely different modeling may be a better idea. But …
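The change in scale can be made concrete with simple arithmetic. The total sequence grows 80-fold, from 0.5 Mbp to 40 Mbp. The sketch below also estimates the expected fold coverage of a single genotype, assuming the 50 kbp genome length from the model. A relative abundance of 0.1 for the most abundant phage is purely hypothetical and chosen for illustration. The helper names are hypothetical too.

```python
def sequenced_bases(n_fragments, fragment_len):
    """Total number of sequenced bases."""
    return n_fragments * fragment_len

def expected_coverage(n_fragments, fragment_len, abundance, genome_len=50_000):
    """Expected fold coverage of one genotype with the given relative abundance."""
    return n_fragments * abundance * fragment_len / genome_len

bases_2001 = sequenced_bases(1_000, 500)      # 500,000 bp
bases_2007 = sequenced_bases(400_000, 100)    # 40,000,000 bp
print(bases_2007 // bases_2001)               # 80
```

Under the hypothetical 10% abundance, the most abundant genotype goes from about 1x expected coverage in 2001 to about 80x in 2007. At that depth, the sparse-overlap assumptions behind the contig-spectrum model no longer describe the data well.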

"If we knew what we were doing, it wouldn't be called research." (A. Einstein) "A model should be as simple as possible, and yet no simpler."