Towards Identifying Lateral Gene Transfer Events

1 Towards Identifying Lateral Gene Transfer Events
L. Addario-Berry, M. Hallett, J. Lagergren. Presented by: Jeff Mathew

2 Roadmap: key terms; the τ-transfer problem; the H-move and I-move algorithm;
tree generation for simulation; experimental results; conclusions and future work.

3 Lateral transfer scenario
Lateral gene transfer (LGT) is the same thing as horizontal gene transfer (HGT). The root of the scenario tree must correspond to the root of the gene tree. The scenario tree is connected and respects the direction of evolution implied by the arcs of T and S. Clarifications: S’ is still a tree; together with the lateral transfer edges it forms a mixed graph. This mixed graph cannot contain directed cycles, and no vertex can have two outgoing lateral transfers.
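These constraints can be checked mechanically. The following is a minimal sketch that assumes a hypothetical representation: S’ is a list of (parent, child) arcs, and each lateral transfer is a (donor, recipient) vertex pair. A transfer edge joins two contemporaneous points, so the sketch merges its endpoints and then tests whether the tree arcs still form an acyclic graph. Under this representation, that test is equivalent to the mixed graph having no directed cycle. The name `is_valid_scenario` is illustrative and does not come from the paper. The cost of a valid scenario is simply the number of transfer pairs.

```python
from collections import defaultdict


def _find(rep, x):
    # Union-find lookup with path halving.
    while rep[x] != x:
        rep[x] = rep[rep[x]]
        x = rep[x]
    return x


def is_valid_scenario(vertices, tree_arcs, transfers):
    """vertices: vertex ids of S'; tree_arcs: (parent, child) arcs of S';
    transfers: (donor, recipient) lateral transfer edges (hypothetical format)."""
    donors = [d for d, _ in transfers]
    if len(donors) != len(set(donors)):
        return False  # some vertex has two outgoing lateral transfers
    # Merge the two (contemporaneous) endpoints of every transfer edge.
    rep = {v: v for v in vertices}
    for d, r in transfers:
        a, b = _find(rep, d), _find(rep, r)
        if a != b:
            rep[a] = b
    # Quotient digraph on the merged classes, built from the tree arcs.
    succ = defaultdict(set)
    indeg = {_find(rep, v): 0 for v in vertices}
    for u, v in tree_arcs:
        cu, cv = _find(rep, u), _find(rep, v)
        if cu == cv:
            return False  # arc u -> v is closed into a cycle by transfer edges
        if cv not in succ[cu]:
            succ[cu].add(cv)
            indeg[cv] += 1
    # Kahn's algorithm: the graph is acyclic iff every class gets removed.
    queue = [c for c, k in indeg.items() if k == 0]
    removed = 0
    while queue:
        c = queue.pop()
        removed += 1
        for nxt in succ[c]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return removed == len(indeg)
```

For example, with two root-to-leaf lineages r→x1→x2→A and r→y1→y2→B, the transfers (x1, y1) and (x2, y2) are consistent. The crossing pair (x2, y1) and (y2, x1) forces x1 to be both before and after x2 in time, so it is rejected.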

4 α-activity An α-active scenario for a gene tree and species tree allows at most α copies of a gene to exist simultaneously in the genome of an ancestral taxon. The authors focus on 1-active scenarios, though intractability results have been proved earlier for α ≥ 1. Figures 1 (ii) and (iii) give an intuitive graphical explanation of activity level. At any point during the evolution represented by the shaded region in these diagrams, two copies of the gene exist in the genome of these ancestral organisms, so this scenario is said to be 2-active. We justify our focus on 1-activity by at least two observations: it is computationally the most feasible level, and earlier experimental papers assume that higher activity levels are caused only by gene duplication events.
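The activity level can be computed as a maximum-overlap count. This sketch assumes, hypothetically, that each ancestral lineage is given as a list of (birth, death) time intervals, one per gene copy. The activity of the scenario is then the largest number of intervals alive at any instant on any lineage.

```python
def activity_level(copies):
    """copies: (birth, death) times of gene copies on one lineage.
    Returns the maximum number of copies alive at the same instant."""
    events = []
    for birth, death in copies:
        events.append((birth, 1))
        events.append((death, -1))
    # Sorting on (time, delta) puts deaths (-1) before births (+1) at equal
    # times, so a copy replaced at time t is not counted with its replacement.
    events.sort()
    alive = best = 0
    for _, delta in events:
        alive += delta
        best = max(best, alive)
    return best


def is_alpha_active(lineages, alpha):
    """lineages: hypothetical mapping from taxon name to its gene-copy intervals."""
    return all(activity_level(copies) <= alpha for copies in lineages.values())
```

In the shaded region of Figure 1 (ii), two intervals overlap on one lineage, so `activity_level` would return 2 for that lineage.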

5 τ-transfer problem Input: species tree S, gene tree T, integer τ.
Output: a lateral transfer scenario for S and T of cost τ* ≤ τ, where the cost is the number of lateral transfer events needed to explain the difference between S and T. Intractability result: the decision version of the α-Active, τ-Transfer Problem (does there exist an α-active scenario with cost ≤ τ?) is NP-complete.

6 Algorithm: a two-phase approach
Phase 1: while H-fat or I-fat vertices remain, perform an H-fat move or an I-fat move. At the end of phase 1, the scenario is guaranteed to be 1-active. What about cycles? Phase 2: remove the minimum number of LGT events from each candidate to make it acyclic. Running time: O(2^{4τ} n^2).
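At the core of phase 2 is a minimum-removal problem: find the fewest lateral transfers whose deletion leaves the mixed graph free of directed cycles. The sketch below solves only that subproblem, by exhaustive search over subsets of transfers. Its cost is therefore exponential in the number of transfers but not in the size of the trees. It uses a hypothetical representation ((parent, child) arcs plus (donor, recipient) pairs). It is not the authors' procedure, and it does not re-check that the reduced scenario still explains T.

```python
from itertools import combinations


def mixed_graph_is_acyclic(vertices, tree_arcs, transfers):
    """True if S' plus the transfer edges has no directed cycle.
    Transfer endpoints are contemporaneous, so they are merged first."""
    rep = {v: v for v in vertices}

    def find(x):
        while rep[x] != x:
            rep[x] = rep[rep[x]]  # path halving
            x = rep[x]
        return x

    for a, b in transfers:
        rep[find(a)] = find(b)

    succ = {}
    for u, v in tree_arcs:
        cu, cv = find(u), find(v)
        if cu == cv:
            return False  # arc u -> v is closed into a cycle by transfer edges
        succ.setdefault(cu, set()).add(cv)

    state = {find(v): 0 for v in vertices}  # 0 unvisited, 1 on stack, 2 done

    def dfs(c):
        state[c] = 1
        for nxt in succ.get(c, ()):
            if state[nxt] == 1 or (state[nxt] == 0 and not dfs(nxt)):
                return False  # back edge found: directed cycle
        state[c] = 2
        return True

    return all(state[c] == 2 or dfs(c) for c in list(state))


def min_transfers_to_drop(vertices, tree_arcs, transfers):
    """Smallest list of transfers whose removal makes the mixed graph acyclic.
    Subsets are tried in order of size, so the first hit is a minimum."""
    for k in range(len(transfers) + 1):
        for dropped in combinations(range(len(transfers)), k):
            kept = [t for i, t in enumerate(transfers) if i not in dropped]
            if mixed_graph_is_acyclic(vertices, tree_arcs, kept):
                return [transfers[i] for i in dropped]
    raise ValueError("the tree arcs alone contain a cycle")
```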

7 Simulating species trees
Create a random species tree S on n leaves with Θ(log n) expected depth. S is supposed to reflect the actual evolutionary relationships between taxa. S is ultrametric, so edge weights correspond to time: weights are assigned randomly to every edge such that every root-to-leaf path has weighted sum 1. We experimented with several alternative approaches for generating ultrametric species trees and found that this one produces trees that minimize the difference between arc weights under the L∞ norm. This is important during the gene tree creation phase, as it tends to distribute the lateral transfer events more evenly throughout the species tree.
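Below is a minimal sketch of such a generator. It makes two assumptions. First, the topology is drawn by recursively splitting a shuffled leaf set at a uniformly random point; this is the Yule model, whose expected depth is Θ(log n). Second, each internal node is placed at a random time between its parent's time and 1, with leaves at time 1, so every root-to-leaf path sums to exactly 1 by construction. The placement rule `uniform(0.2, 0.8)` is an arbitrary choice. It does not reproduce the L∞-minimizing scheme the authors settled on.

```python
import random


def random_topology(n, rng):
    """Random binary tree on n >= 2 leaves, as nested 2-tuples of labels."""
    if n < 2:
        raise ValueError("need at least two leaves")
    leaves = [f"t{i}" for i in range(n)]
    rng.shuffle(leaves)

    def split(labels):
        if len(labels) == 1:
            return labels[0]
        # Uniform split size, as in the Yule model.
        k = rng.randint(1, len(labels) - 1)
        return (split(labels[:k]), split(labels[k:]))

    return split(leaves)


def ultrametric_arcs(tree, rng, start=0.0):
    """Yield (start_time, end_time, child) for every arc below `tree`.
    The root is at time 0 and every leaf at time 1, so the arc weights
    end_time - start_time along any root-to-leaf path sum to 1."""
    for child in tree:
        if isinstance(child, tuple):
            # Assumed rule: put the internal node somewhere in (start, 1).
            end = start + (1.0 - start) * rng.uniform(0.2, 0.8)
            yield (start, end, child)
            yield from ultrametric_arcs(child, rng, end)
        else:
            yield (start, 1.0, child)
```

Usage: `S = random_topology(8, random.Random(42))` followed by `list(ultrametric_arcs(S, random.Random(7)))` gives the 2n − 2 = 14 weighted arcs of a rooted binary species tree.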

8 Simulating gene trees Begin with the generated ultrametric species tree.
Lateral transfer events occur according to a Poisson process with mean rate λ. Moving from the root to the leaves, for each vertex x0 with children x1 and x2, examine both edges. If the Poisson process produces a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time; otherwise, add a speciation event for x1. Repeat the analysis for (x0, x2). The exponential distribution describes the inter-arrival times of a homogeneous Poisson process. λ varies between 0 and 1, and the expected number of LGT events in the tree is proportional to it. Suppose that a lateral transfer occurs at time t, with arc <x, y> as its tail and <x’, y’> as its head. If events are already scheduled along <x’, y’> for times t’ ≥ t, all such events are aborted, and the gene lineages associated with them are lost. This corresponds to the foreign gene “knocking out” the resident gene in the genome of the organism. Such a protocol is necessary to guarantee the 1-activity constraint of the scenario.
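The three mechanical pieces of this procedure are sketched below. The sketch assumes that λ is a rate per unit of arc length and that arcs are stored in a hypothetical dictionary mapping an arc name to its (start, end) times. The pieces are: sampling transfer times with exponential gaps, choosing a recipient arc alive at the transfer time, and the knock-out rule that aborts events at times t’ ≥ t on the recipient arc.

```python
import random


def transfer_times(start, end, lam, rng):
    """Event times of a homogeneous Poisson process with rate lam on [start, end):
    successive gaps are exponential with mean 1/lam."""
    times = []
    if lam <= 0:
        return times
    t = start + rng.expovariate(lam)
    while t < end:
        times.append(t)
        t += rng.expovariate(lam)
    return times


def pick_recipient(arcs, donor, t, rng):
    """Uniform choice among arcs (other than the donor) alive at time t."""
    alive = [a for a, (s, e) in arcs.items() if a != donor and s <= t < e]
    return rng.choice(alive) if alive else None


def knock_out(schedule, recipient, t):
    """The incoming gene replaces the resident copy: abort every event scheduled
    on the recipient arc at a time >= t and return the aborted times."""
    aborted = [x for x in schedule[recipient] if x >= t]
    schedule[recipient] = [x for x in schedule[recipient] if x < t]
    return aborted
```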

9 Degenerate Cases Simulation can produce biologically plausible events that the algorithm cannot detect. Useless transfers: LGTs that don’t change the gene tree. Transfer-loss events: one child of a node is an LGT event and the other child is a loss event. Useless transfer: at the point of evolution marked X, there is a lateral transfer between two arcs in the species tree that share a common parent. Clearly, such transfers do not change the gene tree: the root of the species tree has children ABC and DEF, and the root of the gene tree also has children ABC and DEF, even though a transfer has occurred at X. In the subtree labeled Y of the species tree, we show an example of two useless transfers that together do not cause the gene tree to disagree with the species tree. In this subtree of the species tree, the ancestor AB and C are siblings. In the gene tree, A remains closer to B due to the “later” lateral transfer, and C remains a sibling of the ancestor AB via the “earlier” lateral transfer. Transfer-loss event: at the point marked Z in the diagram, a lateral transfer occurs from taxon n to taxon 2, and this lineage is subsequently lost. Note that one child of the gene tree vertex at point Z is a transfer event and the other is a loss event; we term this a transfer-loss event. Let T be a gene tree, S a species tree, and τ the true number of lateral transfer events that occurred during the period of evolution (the number generated by our simulation). Let τ' be the cost of the minimum cost scenario found by our algorithm. When a transfer-loss event occurs, it may be the case that τ' < τ, τ' = τ, or τ' > τ; we term such events helpful, harmless, and harmful, respectively. The example in the accompanying figure shows that a single harmful transfer-loss event can cause the algorithm to require Ω(n) lateral transfers to explain the disagreement between the gene and species trees.
It is easy to verify that the minimum cost scenario for this particular example requires n−2 transfer events. It is equally easy to create examples of helpful and harmless transfer-loss events.
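Two of these definitions are simple enough to state as code. This sketch assumes that arcs are identified by their child vertex and that `parent_of` is a hypothetical mapping giving each arc's parent vertex. A transfer between sibling arcs is flagged as useless. A transfer-loss event is labeled by comparing the algorithm's τ' with the simulated τ, exactly as defined above.

```python
def is_useless_transfer(parent_of, donor_arc, recipient_arc):
    """A transfer between two arcs that share a parent vertex leaves the
    gene-tree topology unchanged (the situation at point X)."""
    return parent_of[donor_arc] == parent_of[recipient_arc]


def classify_transfer_loss(tau_true, tau_found):
    """Label a transfer-loss event by comparing tau' with the true tau."""
    if tau_found < tau_true:
        return "helpful"
    if tau_found == tau_true:
        return "harmless"
    return "harmful"
```

For the transfer at X, `parent_of` would map both "ABC" and "DEF" to the root, so the transfer is flagged as useless.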

10 Results Ω = number of repetitions, τ = true number of LGT events,
τ' = cost of the minimum cost LGT scenario found by the algorithm, λ = mean rate of LGTs in the Poisson process. In Figure (a), the number of transfers in a simulation rises linearly as a function of λ for a fixed n. This is consistent with a Poisson process and our species tree generation routine. As the mean rate λ grows, the number of transfers τ eventually becomes so large that the algorithm detects no further transfers. This is trivially the case once τ exceeds n−2 for a species tree with n leaves. Figure (b) plots the estimated number of transfers τ' against λ. Because τ' is consistently less than τ, we may conclude that the majority of transfers are not harmful transfer-loss events. When λ = 0.6, the average value of τ is approximately 11 and τ' is 8.6, with variance …. Note that for λ > 0.6, slight saturation is visible.
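The growth is linear for a simple reason. Ignoring knock-outs, the number of events of a homogeneous Poisson process with rate λ on arcs of total length L is Poisson distributed with mean λL, and L is fixed by the species tree. The Monte Carlo sketch below checks this for a hypothetical tree with ten arcs of length 0.5 (L = 5). Knock-outs, which the sketch omits, make the realized count in the actual simulation differ from λL.

```python
import random


def count_transfers(arc_lengths, lam, rng):
    """Total number of Poisson(lam) events over all arcs (no knock-outs)."""
    if lam <= 0:
        return 0
    total = 0
    for length in arc_lengths:
        t = rng.expovariate(lam)
        while t < length:
            total += 1
            t += rng.expovariate(lam)
    return total


def mean_transfers(arc_lengths, lam, trials, seed=0):
    """Average transfer count over `trials` independent simulations."""
    rng = random.Random(seed)
    return sum(count_transfers(arc_lengths, lam, rng) for _ in range(trials)) / trials


arcs = [0.5] * 10  # hypothetical tree, total arc length L = 5
for lam in (0.2, 0.4, 0.6, 0.8):
    print(lam, mean_transfers(arcs, lam, 20000), "expected", lam * sum(arcs))
```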

11 Finding the saturation point
The saturation point is the point at which the average τ' stops increasing. Random trees from a large pool were chosen as gene trees and species trees. Trials suggest that the saturation point is slightly above n/2; that is, when τ > n/2, the algorithm stops detecting new LGT events. Thus, if τ' > n/2, the correspondence between T and S via LGT events is not very meaningful.
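The saturation point can be read off a curve of average τ' against τ. This sketch assumes that the averages have already been computed and are supplied as parallel lists sorted by τ. The tolerance `tol` for "stops increasing" is a hypothetical parameter, since the slide does not say how the point was determined. The helper `is_meaningful` encodes the slide's n/2 rule of thumb.

```python
def saturation_point(taus, mean_found, tol=0.0):
    """Return the first tau after which the mean tau' rises by at most `tol`.
    taus: true transfer counts in ascending order; mean_found: average tau'
    for each tau. Returns None if the curve keeps rising over the range."""
    for i in range(1, len(taus)):
        if mean_found[i] - mean_found[i - 1] <= tol:
            return taus[i - 1]
    return None


def is_meaningful(tau_found, n):
    """Slide's rule of thumb: beyond n/2 transfers, the LGT correspondence
    between T and S says little."""
    return tau_found <= n / 2
```

On toy input (not data from the slides), `saturation_point([1, 2, 3, 4, 5], [1, 2, 2.9, 3.0, 3.0], tol=0.2)` returns 3, because the curve rises by only 0.1 after τ = 3.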

12 Results

13 Results
The last two figures together reaffirm the intuition that harmful transfer-loss events are very rare and that, when they do occur, useless transfers and saturation negate their effect. These experiments allow us to conclude that harmful transfer-loss events are rare. If they do occur, then alternative scenarios with approximately the same overall cost exist, and/or helpful transfer-loss events and useless transfers are frequent enough to compensate. We also note that over some 10,000 trials, we did not find a scenario that required cycle elimination (the second phase of the algorithm). Although it is possible to construct an example where the algorithm requires this phase (see Figure 12 in the appendix), such scenarios appear to be extremely rare, or useless transfers and helpful transfer-loss events are frequent enough that a scenario with cost τ' ≤ τ is created.

14 Results
This slide shows the number of scenarios with cost τ'' = τ' + k, where k = 0…3. The number of scenarios with cost τ' is 2.09; with cost τ' + 1, ~13; with cost τ' + 2, ~32; and with cost τ' + 3, 51.32. This is an interesting finding, since one would expect the number of scenarios to increase exponentially with k.
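A quick arithmetic check of this remark, using the values reported above (the ~13 and ~32 are approximate). Under exponential growth, the ratio between successive counts would stay roughly constant. Here it falls from about 6.2 to about 1.6.

```python
# Reported numbers of scenarios with cost tau' + k, k = 0..3 (k = 1, 2 approximate).
counts = [2.09, 13.0, 32.0, 51.32]

# Ratio between successive counts; exponential growth would keep it roughly constant.
ratios = [b / a for a, b in zip(counts, counts[1:])]
for k, r in enumerate(ratios):
    print(f"count(k={k + 1}) / count(k={k}) = {r:.2f}")
```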

15 Conclusions We empirically verified the feasibility of the τ-transfer algorithm. Degenerate events, such as transfer-loss events that lead to overestimates of the number of transfers, occur with low probability. The algorithm achieved near-optimal scenarios when λ is low enough not to cause saturation. The cycle elimination phase of the algorithm is rarely needed in practice, implying an effective running time of O(2^{2τ} n^2).

16 Future work and open problems
Use weighted gene trees and species trees: species trees are nearly ultrametric, while gene trees are not. Do fast algorithms exist when the input is a set of gene trees with no species tree? Is the problem tractable on larger phylogenies? Can we consider gene duplication, lateral gene transfers, and other events simultaneously? Can we use probabilistic models that assign likelihoods to various events and optimize over such models in a tractable manner?

