A computational phylogenetic approach to interaction analysis Cynthia Sims Parr University of Maryland College Park Ecological Society of America Montreal,


A computational phylogenetic approach to interaction analysis Cynthia Sims Parr University of Maryland College Park Ecological Society of America Montreal, Canada August 9, 2005

Predicting Ecological Interactions ?

Terminology & Outline
Terminology: web, node, link
Describe computational framework for predicting links
Propose general algorithms and discuss implications
Preliminary results: a simple model using a large database and evolutionary trees does a surprisingly good job.

Evolutionary trees Family Genus Species

Computational framework
[Diagram labels: Database Interaction Web Database ADW DB and Graph Vis tools Algorithms Field Test Predictions Explore for patterns Phylogenies Classifications]
Note: More than one way to do it!

Predicting Links: parameterized functions
Step 1. Select functions that might predict links using characteristics of taxa, for example size or stoichiometry.
Step 2. Determine parameters using known links among all taxa across the whole or a partial database. For taxon A and taxon B with known link status:
$$\mathrm{LinkStatus}_{AB} = f(\alpha, \mathrm{size}_A, \mathrm{size}_B) + f(\beta, \mathrm{stoich}_A, \mathrm{stoich}_B)$$
Step 3. Use the parameterized equation to estimate LinkStatus between target taxa C and D.
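The three steps can be sketched in code. This is a minimal illustration, not the method used in the talk: the two feature functions (log size ratio, absolute stoichiometric difference), the linear form, and the least-squares fit are all assumptions chosen to keep the example short, and every name and number below is hypothetical.

```python
import math

def size_term(size_a, size_b):
    # Hypothetical size feature: log ratio of the two body sizes.
    return math.log(size_a / size_b)

def stoich_term(stoich_a, stoich_b):
    # Hypothetical stoichiometry feature: absolute difference of a ratio such as C:N.
    return abs(stoich_a - stoich_b)

def fit_parameters(known_pairs):
    """Step 2: least-squares estimate of (alpha, beta).

    known_pairs holds tuples (size_a, size_b, stoich_a, stoich_b, link_status)
    for taxon pairs whose link status is already known.
    """
    sxx = sxy = syy = sxl = syl = 0.0
    for size_a, size_b, st_a, st_b, link in known_pairs:
        x = size_term(size_a, size_b)
        y = stoich_term(st_a, st_b)
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxl += x * link
        syl += y * link
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; parameters are not identifiable")
    alpha = (sxl * syy - syl * sxy) / det
    beta = (syl * sxx - sxl * sxy) / det
    return alpha, beta

def predict_link(alpha, beta, size_c, size_d, stoich_c, stoich_d):
    """Step 3: estimated link score for target taxa C and D."""
    return alpha * size_term(size_c, size_d) + beta * stoich_term(stoich_c, stoich_d)

# Invented training data: (size_a, size_b, stoich_a, stoich_b, link_status).
known = [
    (10.0, 1.0, 6.0, 5.0, 1),
    (8.0, 2.0, 7.0, 5.5, 1),
    (1.0, 10.0, 5.0, 9.0, 0),
    (2.0, 2.0, 4.0, 8.0, 0),
]
alpha, beta = fit_parameters(known)
score_cd = predict_link(alpha, beta, 9.0, 1.5, 6.5, 5.0)
```

The result is a continuous score. Turning it into presence or absence of a link would need a cutoff, which is a further assumption not specified in the slides.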

Implications: parameterized functions
Requires good data for target species
Can incrementally add natural history functions to get a better estimate; try different functions from the literature or use genetic algorithms
Parameterizing functions: multivariate statistics, machine learning, fuzzy inference
Could use evolutionary info if you localize parameter estimates to clades or taxonomic subsets
$$\mathrm{LinkPredicted}_{CD} = f(\alpha, \mathrm{size}_C, \mathrm{size}_D) + f(\beta, \mathrm{stoich}_C, \mathrm{stoich}_D)$$

Predicting Links: neighbor distance weighting
Step 1. Provide a distance threshold or a number of neighbors N to use.
Step 2. Find the nearest neighbors to your target nodes, in evolutionary or trait space, that have known link status.
Step 3. Combine LinkStatus weighted by distances. E.g. for taxa X and Y, where X has nearest neighbor A and Y has nearest neighbor B, and LinkStatus between A and B is known, sum over the N neighbor pairs:
$$\mathrm{LinkPredicted}_{XY} = \sum^{N} \frac{1}{1 + \mathrm{distance}_{XA} + \mathrm{distance}_{YB}} \left(\mathrm{LinkStatus}_{AB}\right)$$
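The weighting in Step 3 can be sketched as follows: each neighbor pair (A, B) with known link status contributes LinkStatus divided by 1 + distance(X, A) + distance(Y, B). The function name, the input layout, and the choice to apply the distance threshold to the summed distance are assumptions made for illustration.

```python
def predict_link_by_neighbors(neighbor_pairs, n_neighbors=None, max_distance=None):
    """Distance-weighted link score for a target pair (X, Y).

    neighbor_pairs: list of (distance_xa, distance_yb, link_status_ab), one entry
    per neighbor pair (A, B) whose link status is known.
    n_neighbors: optional cap on how many of the closest pairs to use.
    max_distance: optional threshold on distance_xa + distance_yb (an assumption;
    the slides do not say how the threshold is applied).
    """
    # Closest pairs first, measured by total distance to the target pair.
    candidates = sorted(neighbor_pairs, key=lambda p: p[0] + p[1])
    if max_distance is not None:
        candidates = [p for p in candidates if p[0] + p[1] <= max_distance]
    if n_neighbors is not None:
        candidates = candidates[:n_neighbors]
    return sum(status / (1.0 + d_xa + d_yb) for d_xa, d_yb, status in candidates)

# Invented example: three neighbor pairs, two of which are known to interact.
pairs = [(0.0, 0.0, 1), (1.0, 1.0, 1), (2.0, 1.0, 0)]
score = predict_link_by_neighbors(pairs)
```

Because the score is an unnormalized sum, it grows with the number of linked neighbors; dividing by the sum of weights would give a value between 0 and 1, but that normalization is not part of the formula on the slide.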

Implications: Neighbor distance weighting
Evolutionary
  Uses phylogeny or classification or a combination of these
  Distance could be branch length or number of steps
  Does not explicitly take advantage of natural history
Trait space
  e.g. Euclidean distance in N-space
  Uses richest possible natural history data
  Could include evolutionary distance as a term
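Both kinds of distance are easy to compute. The sketch below assumes the classification is stored as a child-to-parent dictionary (a hypothetical layout) and counts steps through the nearest common ancestor; trait-space distance is plain Euclidean distance between equal-length trait vectors.

```python
import math

def ancestors(taxon, parent):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def step_distance(a, b, parent):
    """Number of tree edges between a and b via their nearest common ancestor."""
    depth_from_a = {node: i for i, node in enumerate(ancestors(a, parent))}
    for steps_b, node in enumerate(ancestors(b, parent)):
        if node in depth_from_a:
            return depth_from_a[node] + steps_b
    raise ValueError("taxa share no common ancestor in this classification")

def trait_distance(traits_a, traits_b):
    """Euclidean distance between two trait vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(traits_a, traits_b)))

# Small classification fragment stored as child -> parent.
parent = {
    "Canis lupus": "Canis",
    "Canis latrans": "Canis",
    "Canis": "Canidae",
    "Vulpes vulpes": "Vulpes",
    "Vulpes": "Canidae",
}
same_genus = step_distance("Canis lupus", "Canis latrans", parent)   # 2 steps
same_family = step_distance("Canis lupus", "Vulpes vulpes", parent)  # 4 steps
```

Step counts depend on how many ranks the classification records, so distances from differently resolved parts of the tree are not directly comparable; branch lengths from a phylogeny avoid this when they are available.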

Problems and suggested solutions
Missing data
  Avoid it: avoid comparisons with nodes without complete data
  Substitute the value of a relative, or otherwise of the node closest in trait space
  "Ancestral" node reconstruction, e.g. Phylogenetic Mixed Model (Houseworth et al. 2001)
Nodes that do not map to taxa, e.g. detritus, suspended organic matter
  Treat as if they are a phylogenetic unit, all in one polytomy
  Can create a "phylogeny" of neighbors. For example, "detritus" may be part of a reasonable hierarchy of organic material.
Nodes that are not resolved to species
  Doesn't matter for these algorithms
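The substitution option for missing data can be sketched as below. The function name and the data layout (a dictionary of trait values with None for missing entries, plus a caller-supplied distance function) are hypothetical; any evolutionary or trait-space distance could be plugged in.

```python
def fill_missing_trait(taxon, traits, distance):
    """Return the taxon's own trait value, or borrow it from the nearest donor.

    traits: dict mapping taxon name -> trait value, with None marking missing data.
    distance: callable(taxon_a, taxon_b) -> non-negative number.
    """
    own_value = traits.get(taxon)
    if own_value is not None:
        return own_value
    # Only taxa with an observed value can act as donors.
    donors = [t for t, value in traits.items() if value is not None and t != taxon]
    if not donors:
        return None  # nothing to substitute; the caller should skip this comparison
    nearest = min(donors, key=lambda t: distance(taxon, t))
    return traits[nearest]

# Invented example: taxon "B" has no body-size value; "A" is its closest relative.
traits = {"A": 2.0, "B": None, "C": 5.0}
toy_distances = {("B", "A"): 1, ("B", "C"): 3}
filled = fill_missing_trait("B", traits, lambda a, b: toy_distances[(a, b)])
```

Returning None when no donor exists corresponds to the first option on the slide: avoid comparisons with nodes that lack complete data.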

[Figure: picture of tree from TaxonTree overview]
Take advantage of all information as needed

Whole web solutions
Some links affect others
  Use a priori prediction of strongest links to run first; allow the status of these links to enter link predictions.
Webs should be realistic
  Vary parameters (e.g. scale of parameterization, thresholds) and rerun analyses until a criterion is met for the whole web
  Criteria: "natural" values for connectedness, stability, chain length, trophic level ratios, etc.
  Methodology: parsimony or likelihood analysis
Computational demands will be high
  $S^2$ possible links, simultaneous multivariate equations, by all variants of runs. May need heuristics.
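The "vary parameters and rerun until a criterion is met" loop can be illustrated with one parameter and one criterion. This sketch assumes per-pair link scores are already available, varies only the score threshold, and uses connectance (realized links divided by the S^2 possible links) as the single whole-web criterion. The target range in the example is arbitrary and not a claim about natural food webs.

```python
def connectance(links, n_taxa):
    """Realized links divided by the S^2 possible directed links."""
    return len(links) / float(n_taxa ** 2)

def choose_threshold(scores, n_taxa, thresholds, target_range):
    """Return the first threshold whose predicted web meets the criterion.

    scores: dict mapping (consumer, resource) -> predicted link score.
    thresholds: candidate cutoffs, tried in the order given.
    target_range: (low, high) bounds on acceptable connectance.
    Returns (threshold, links), or (None, empty set) if no candidate qualifies.
    """
    low, high = target_range
    for threshold in thresholds:
        links = {pair for pair, score in scores.items() if score >= threshold}
        if low <= connectance(links, n_taxa) <= high:
            return threshold, links
    return None, set()

# Invented scores for a three-taxon web.
scores = {("a", "b"): 0.9, ("b", "c"): 0.6, ("a", "c"): 0.3}
best, web = choose_threshold(scores, 3, [0.2, 0.5, 0.8], (0.10, 0.15))
```

A fuller version would test several criteria at once (stability, chain length, trophic level ratios) and search over more than one parameter, which is where the computational cost noted on the slide comes from.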

Summary of approaches
Link prediction
  Parameterized functions
  Weighted distances
    Evolutionary
    Trait space
Total community solution
  Parsimony or likelihood solution
  Include other links as terms and run prioritized, stepwise analysis

Data needed
Wide range of well-identified taxa
Cross section of habitats
Natural history data

Database status
4214 unique taxa
Evolutionary tree as in Parr et al Bioinformatics.

LinkPredictor preliminary results
Data
  43% of nodes mapped to species level
  16% of nodes have no evolutionary information at all
  Using only presence or absence of links
Procedure
  Pulling out one food web at a time and predicting its links based on the rest of the data
  Up to 4 steps up and down the evolutionary tree, no weighting yet for distance
Results
  On average, 49% of actual links are correctly predicted
  38% of predicted links are false positives
Take home: Our DB and evolutionary approach does surprisingly well at predicting food links
…With SPIRE at UMBC
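The procedure and the two reported measures can be sketched as a leave-one-web-out loop. This is an illustration of the evaluation only: `predict` stands in for whatever link predictor is being tested, and all names and data here are hypothetical.

```python
def evaluate_predictions(actual_links, predicted_links):
    """Return (fraction of actual links predicted, fraction of predictions that are false)."""
    actual = set(actual_links)
    predicted = set(predicted_links)
    hits = actual & predicted
    recovered = len(hits) / float(len(actual)) if actual else 0.0
    false_share = len(predicted - actual) / float(len(predicted)) if predicted else 0.0
    return recovered, false_share

def leave_one_web_out(webs, predict):
    """Hold out each food web in turn and score predictions made from the rest.

    webs: dict mapping web name -> set of (consumer, resource) links.
    predict: callable(training_webs, held_out_name) -> set of predicted links.
    """
    results = {}
    for name, actual in webs.items():
        training = {other: links for other, links in webs.items() if other != name}
        results[name] = evaluate_predictions(actual, predict(training, name))
    return results

def pooled_links(training_webs, held_out_name):
    # Placeholder predictor: predict every link seen in any training web.
    pooled = set()
    for links in training_webs.values():
        pooled |= links
    return pooled

# Invented two-web database.
webs = {
    "w1": {("a", "b"), ("b", "c")},
    "w2": {("a", "b")},
}
results = leave_one_web_out(webs, pooled_links)
```

Averaging the first value over all held-out webs gives a figure comparable to "actual links correctly predicted", and the second corresponds to the share of predicted links that are false positives.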

More questions
What about predicting links among taxa from big studies outside the current database?
How much improvement comes from adding links to the DB?
How robust are results to differing degrees of phylogenetic resolution or taxon sampling?
How robust are results to missing data?
How to handle data quality issues? Error estimates?

Future work with SPIRE
Role in ELVIS – LinkEP (Evidence Provider)
Integrate into a platform that
  takes location as input
  generates list of taxa
  gives evidence for interaction among taxa
  models change due to invasive species
Pull data from semantic web rather than local database

Acknowledgements
NSF IDM/ITR (PI Bederson)
Bongshin Lee
NBII
Joel Sachs and Andrey Parafiynyk
Bill Fagan and lab members
Michael Kantor
EcoWeb (Joel Cohen)
NCEAS Interaction Web Database (Diego Vázquez)
WoW (J. Dunne and N. Martinez)