Slide 1: Thank you Elizabeth for the introduction, and hello everybody. So, I have been a PhD student with Charles Semple and Mike Steel at the UoC since February.

Slide 2: A central question in conservation biology is how to measure, predict and counter the loss of biodiversity as species face extinction. I am working on problems related to so-called phylogenetic diversity, which is a measure of how much of an evolutionary tree is spanned by a subset of species. I am particularly interested in questions that require tools from combinatorics, complexity theory, algorithms and probability theory.

Slide 3: My talk presents joint work with Mike Steel and Fabio Pardi. It discusses the distribution of phylogenetic diversity under random extinction.

Slide 4: Before presenting the results, I’m going to describe the probabilistic model under consideration. First I would like to define the notion of phylogenetic diversity, or PD for short. We have a set X of present species and a rooted phylogenetic X-tree, which represents the evolutionary development of these taxa from their common ancestor. In my example in the figure, present species are illustrated by coloured circles. We also have lengths on the edges; more precisely, there is a map lambda which assigns a non-negative length to each edge. We denote these lengths by lambdas. For example, edge e has length lambda_e, and so on. Such a length does not necessarily refer to the temporal duration of the development on the edge; rather, it may represent the amount of genetic change on that edge, or perhaps other features such as morphological diversity. For a subset X’ of X, the phylogenetic diversity of that species set is the sum of the lengths of the edges of the subtree that connects this subset and the root vertex. So phylogenetic diversity is a quantitative tool for measuring how genetically diverse a species set is. Now look at the figure again; the PD score of the subset containing the blue and the green species is the sum of the lengths lambda_a, lambda_b, lambda_d, and so on, because these are the lengths of the edges of the subtree connecting the blue and the green species and the root. The edges of that subtree are indicated by coloured lines in the figure.
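The PD definition above can be illustrated with a minimal Python sketch. The toy tree, its vertex names and its edge lengths are hypothetical and are not the tree from the figure; each edge is identified by its lower endpoint, and the PD score is obtained by walking from every selected species up to the root and summing each visited edge once.

```python
# Hypothetical toy tree (not the tree from the slide):
# each non-root vertex maps to (parent, length of the edge above it).
PARENT = {
    "u": ("root", 1.0),
    "red": ("root", 3.0),
    "blue": ("u", 0.5),
    "green": ("u", 1.5),
}


def phylogenetic_diversity(parent, subset):
    """Sum the lengths of the edges connecting `subset` and the root."""
    used = set()  # edges already counted, named by their lower endpoint
    for leaf in subset:
        node = leaf
        # Walk towards the root; stop early once we hit an edge that is
        # already counted, since everything above it is counted as well.
        while node in parent and node not in used:
            used.add(node)
            node = parent[node][0]
    return sum(parent[node][1] for node in used)


pd_blue_green = phylogenetic_diversity(PARENT, {"blue", "green"})
```

In this toy tree the subset {blue, green} uses the two pendant edges above blue and green plus the shared edge above u, and the edge above red is not counted.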

Slide 5: In the ‘field of bullets’ (FOB) model, we assume that each species is given a so-called survival probability; that is, we are given a map p that assigns to each taxon i a survival probability p_i. We construct a random set X’ by including each element of X in X’ independently, with its survival probability. For example, taxon i will be in X’ with probability p_i. We regard X’ as the set of taxa that will still exist at some time in the future. In the simple FOB model, each taxon has the same probability of surviving, whereas the more realistic general FOB model allows each species to have its own survival probability. In both models, extinction events are independent.
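Drawing the random survivor set X’ described above can be sketched as follows. The function name and the survival probabilities are illustrative assumptions; the only ingredient taken from the talk is that each taxon is kept independently with its own probability p_i.

```python
import random


def sample_survivors(survival_prob, rng):
    """Draw X' under the general FOB model: keep taxon i with probability p_i."""
    # rng.random() is uniform on [0, 1), so the comparison succeeds
    # with probability exactly p for each taxon, independently.
    return {taxon for taxon, p in survival_prob.items() if rng.random() < p}


# Hypothetical survival probabilities for three taxa.
SURVIVAL = {"blue": 0.9, "green": 0.5, "red": 0.2}
survivors = sample_survivors(SURVIVAL, random.Random(2024))
```

The simple FOB model is the special case in which all values in the dictionary are equal.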

Slide 6: Under the FOB model, we define the future phylogenetic diversity as the random variable which is the PD score of the random future taxon set X’. Let Phi denote this random variable. The figure shows the situation where the random survival set consists of the blue and the green species. As we have seen, the PD of this set is the sum of the lengths of the coloured edges. So there are edges that we count and other edges that we don’t count. It is easy to see that Phi can be written as the sum over all edges e of the terms lambda_e times Y_e, where Y_e is the random variable which takes the value 1 if e lies on at least one path between an element of X’ and the root, and which is 0 otherwise. In our example, the future PD would be 0 × lambda_g + 1 × lambda_h, and so on. So this is the FOB model. Now imagine that we have a sequence of such models, a sequence of trees of increasing size, each tree having its own edge length function and its own survival probability function, and consider the sequence of the corresponding future phylogenetic diversities. Our goal was to determine the distribution of Phi along these sequences, that is, as the size of the trees goes to infinity. Let’s forget about the sequences and the asymptotic behaviour for a while and start with the mean and variance of Phi. In order to do that, please have a second look at the formula for Phi. Got it?
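Written out, the decomposition of Phi described above reads as follows. The symbol C_e, for the set of species below edge e, is notation assumed here for illustration; the transcript does not show the slide's own notation. The block uses `cases` and `\text`, so it assumes the amsmath package.

```latex
\varphi \;=\; \sum_{e} \lambda_e \, Y_e ,
\qquad
Y_e \;=\;
\begin{cases}
1, & \text{if } C_e \cap X' \neq \emptyset, \\
0, & \text{otherwise.}
\end{cases}
```

Here Y_e = 1 exactly when at least one species below e survives, which is the same as e lying on a path between an element of X’ and the root.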

Slide 7: Since Y_e is a binary random variable with values 0 and 1, its mean is just the probability that it takes the value 1. Let us denote this probability by P_e. With this notation, we get this nice and simple formula for the expectation. The variance is also quite easy to compute. It includes a sum over all edges in the tree, and another one, which is a sum over edge pairs (e,f), such that e and f are distinct edges and the path from the root to f includes edge e, or equivalently, the species set below f is a proper subset of the species set below e. So, we know how to determine the main parameters of the distribution we set out to study.
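The formulas on the slide are not reproduced in the transcript, so the following Python sketch is a reconstruction from the definitions above rather than the slide's own expressions. Under independence, P_e is one minus the product of (1 - p_i) over the species i below e; the mean is the sum of lambda_e P_e; and for a nested pair (e, f) with f below e, Y_f = 1 forces Y_e = 1, which gives the covariance P_f (1 - P_e). The toy tree and probabilities are hypothetical, and the proper-subset test assumes every internal vertex has at least two children.

```python
# Hypothetical toy tree: child -> (parent, length of the edge above child).
PARENT = {
    "u": ("root", 1.0),
    "red": ("root", 3.0),
    "blue": ("u", 0.5),
    "green": ("u", 1.5),
}
# Hypothetical survival probabilities p_i.
SURVIVAL = {"blue": 0.9, "green": 0.5, "red": 0.2}


def leaves_below(parent, leaves):
    """Map each edge (named by its lower endpoint) to the species below it."""
    below = {node: set() for node in parent}
    for leaf in leaves:
        node = leaf
        while node in parent:
            below[node].add(leaf)
            node = parent[node][0]
    return below


def mean_and_variance(parent, survival):
    """Exact mean and variance of the future PD under the general FOB model."""
    below = leaves_below(parent, survival)
    lam = {e: parent[e][1] for e in parent}
    # P_e = probability that at least one species below e survives.
    P = {}
    for e, cluster in below.items():
        all_die = 1.0
        for i in cluster:
            all_die *= 1.0 - survival[i]
        P[e] = 1.0 - all_die
    mean = sum(lam[e] * P[e] for e in parent)
    # Sum over single edges: Var(Y_e) = P_e (1 - P_e).
    var = sum(lam[e] ** 2 * P[e] * (1.0 - P[e]) for e in parent)
    # Sum over nested pairs (e, f), f below e: Cov(Y_e, Y_f) = P_f (1 - P_e).
    for e in parent:
        for f in parent:
            if below[f] < below[e]:  # proper subset, so f lies below e
                var += 2.0 * lam[e] * lam[f] * P[f] * (1.0 - P[e])
    return mean, var


mean_phi, var_phi = mean_and_variance(PARENT, SURVIVAL)
```

Pairs of edges that are not nested have disjoint species sets below them, so their indicators are independent and contribute nothing to the variance; this is why only nested pairs appear in the second sum.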

Slide 8: We have seen that Phi is a sum of many random variables. This suggests that for large trees, Phi might be normally distributed. It turns out that, under two mild conditions, Phi is asymptotically normally distributed, even under the general model. The first condition is that most of the survival probabilities are not too extreme, so most of them are neither arbitrarily close to 0 nor arbitrarily close to 1. The second is that the pendant edge lengths are, on average, not too small in relation to the largest edge length in the tree, where pendant edges are edges that are incident with a leaf of the tree. Of course, these conditions can be formulated in a mathematically precise way.

Slide 9: The first question one could ask is: does the result still hold if we drop one or both of the conditions? The answer to this question is no. To see why, consider the situation where all species have survival probability 0 or 1. In this case the first condition fails. Since this leads to a degenerate distribution, it is clear that condition 1 can’t be dropped completely. For the second condition, consider the tree in the figure with n leaves. It has n-1 pendant edges that have a length of 1 over (n-1) squared, and two more edges that have length 1. Furthermore, assume that all species have the same survival probability, which is strictly between 0 and 1. Consider now the sequence of such trees as n gets larger and larger. In particular, consider the sequence of the corresponding future phylogenetic diversities. It can be seen that this sequence of random variables does not converge to a normal distribution. I didn’t tell you the precise form of condition 2, but I can tell you that in this example condition 1 is satisfied but condition 2 fails, and this implies that condition 2 can’t be removed.