Binary Encoding and Gene Rearrangement Analysis Jijun Tang Tianjin University University of South Carolina (803) 777-8923.


1 Binary Encoding and Gene Rearrangement Analysis Jijun Tang Tianjin University University of South Carolina jtang@cse.sc.edu (803) 777-8923

2 Outline: Background; Maximum Likelihood Methods for Phylogenetic Reconstruction; Maximum Likelihood Methods for Ancestral Genome Inference; Conclusions

3 Phylogenetic Reconstruction

4 Data Type. Sequence data: DNA/RNA/protein sequences, i.e. strings over an alphabet of 4 or 20 characters. Gene-order data.

5

6 Simple Rearrangements

7 Rearrangement Phylogeny

8

9

10 Median Problem. Goal: find M such that D_AM + D_BM + D_CM is minimized. NP-hard for most metric distances.
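To make the median objective concrete, here is a minimal sketch that scores a candidate M against three genomes and finds the best M by exhaustive search. The breakpoint distance on linear signed gene orders (telomeres ignored) is chosen here only for illustration; the slide does not say which distance is meant, and the function names are hypothetical.

```python
from itertools import permutations, product

def adjacencies(order):
    """Canonical adjacencies of a linear signed gene order (telomeres ignored)."""
    adj = set()
    for a, b in zip(order, order[1:]):
        # (a, b) on one strand is the same adjacency as (-b, -a) on the other
        adj.add(min((a, b), (-b, -a)))
    return adj

def breakpoint_distance(x, y):
    """Number of adjacencies of x that are absent from y."""
    return len(adjacencies(x) - adjacencies(y))

def median_score(m, genomes):
    """The objective from the slide: D_AM + D_BM + D_CM."""
    return sum(breakpoint_distance(m, g) for g in genomes)

def brute_force_median(genomes):
    """Try every signed permutation; feasible only for a handful of genes."""
    genes = sorted(abs(g) for g in genomes[0])
    best = None
    for perm in permutations(genes):
        for signs in product((1, -1), repeat=len(genes)):
            candidate = tuple(s * g for s, g in zip(signs, perm))
            score = median_score(candidate, genomes)
            if best is None or score < best[0]:
                best = (score, candidate)
    return best

A, B, C = (1, 2, 3, 4), (1, 2, -3, 4), (1, 2, 3, 4)
score, median = brute_force_median([A, B, C])
```

The search space has n! * 2^n candidates, so this only illustrates the objective; it is not a practical median solver.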

11 Binary Encoding
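As a minimal sketch of what such an encoding can look like: each gene is given a tail and a head extremity, every adjacency seen in any genome becomes one column, and a genome gets 1 where it has that adjacency and 0 otherwise. The extremity labels ("2t", "2h") and the function names are assumptions made for this example, and only adjacencies of linear gene orders are encoded here (no gene-content characters).

```python
def extremities(gene):
    """(left, right) ends of a signed gene: +g reads tail->head, -g head->tail."""
    g = abs(gene)
    return (f"{g}t", f"{g}h") if gene > 0 else (f"{g}h", f"{g}t")

def adjacencies(order):
    """Set of adjacencies, each an unordered pair of gene extremities."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add(tuple(sorted((extremities(a)[1], extremities(b)[0]))))
    return adj

def binary_encode(genomes):
    """Map each genome to a 0/1 string over all adjacencies observed anywhere."""
    per_genome = {name: adjacencies(order) for name, order in genomes.items()}
    columns = sorted(set().union(*per_genome.values()))
    rows = {
        name: "".join("1" if col in adj else "0" for col in columns)
        for name, adj in per_genome.items()
    }
    return columns, rows

columns, rows = binary_encode({"G1": (1, 2, 3), "G2": (1, -2, 3)})
```

Here G1 and G2 differ by an inversion of gene 2, so they share no adjacency and their binary strings are complementary.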

12 Biased Model. Model of evolution: duplications, insertions and deletions of syntenic blocks. Rearrangements: inversions, translocations, fusions, fissions. Binary sequences: 1 (presence) vs. 0 (absence). Adjacency: Pr(1 -> 0) vs. Pr(0 -> 1). Gene content: Pr(1 -> 0) vs. Pr(0 -> 1). Strong bias for adjacencies: Pr(1 -> 0) >> Pr(0 -> 1). Losing an existing adjacency: Pr(1 -> 0) ≈ 1/O(n). Gaining a new adjacency: Pr(0 -> 1) ≈ 1/O(n^2).
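One way to realize such a bias is a two-state continuous-time Markov chain whose loss rate (1 -> 0) is n times its gain rate (0 -> 1). This is a sketch under that assumption: the slide gives only the orders of magnitude, n is taken to be the number of genes, and the exact rates used by the authors are not stated here.

```python
import math

def biased_rates(n_genes):
    """Hypothetical rates: loss is n times more likely than gain."""
    loss_rate = 1.0
    gain_rate = 1.0 / n_genes
    return loss_rate, gain_rate

def transition_matrix(t, loss_rate, gain_rate):
    """P[i][j] = Pr(state j after time t | state i now) for states 0 and 1."""
    total = loss_rate + gain_rate
    decay = math.exp(-total * t)
    pi0 = loss_rate / total   # stationary probability of absence
    pi1 = gain_rate / total   # stationary probability of presence
    p01 = pi1 * (1.0 - decay)  # gain an adjacency
    p10 = pi0 * (1.0 - decay)  # lose an adjacency
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

P = transition_matrix(0.1, *biased_rates(100))
```

With these rates, P[1][0] / P[0][1] equals n for every branch length t, which mirrors the 1/O(n) versus 1/O(n^2) ratio on the slide.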

13 ML Phylogenetic Reconstruction

14 Simulated Results

15 Ancestral Inference. Step 1. Encode the gene orders as binary sequences. Step 2. Set up the biased transition model. Step 3. Reroot the tree at the target ancestor and calculate the probability of each character state for every character at the root. Step 4. Build the adjacency graph and use a greedy heuristic to assemble the adjacencies into a valid gene order for the target ancestor.

16 Probabilities are calculated in a bottom-up, recursive manner, so the target ancestor is placed at the root to prevent information loss. Step 3 – Root Tree

17 The likelihood of a tree given sequence data at the leaves can be computed (Felsenstein 1981). Pick one tree. Pick one site. [Figure: tree with leaves X, Y, Z, W and binary states at one site] Step 3 – Probabilities of Adjacencies
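The per-site computation can be sketched with the pruning recursion for a binary character. Everything specific below is an assumption for illustration: the rates, the branch lengths, the leaf states, the tree shape ((X,Y),(Z,W)), and the use of the stationary distribution as the root prior.

```python
import math

LOSS, GAIN = 1.0, 0.1  # hypothetical rates with loss >> gain
PI = (LOSS / (LOSS + GAIN), GAIN / (LOSS + GAIN))  # stationary Pr(0), Pr(1)

def transition(t):
    """Two-state transition probabilities over a branch of length t."""
    decay = math.exp(-(LOSS + GAIN) * t)
    p01 = PI[1] * (1.0 - decay)
    p10 = PI[0] * (1.0 - decay)
    return ((1.0 - p01, p01), (p10, 1.0 - p10))

def conditional(node, states):
    """cond[s] = Pr(leaf states below node | node is in state s)."""
    if isinstance(node, str):  # a leaf is identified by its name
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    cond = [1.0, 1.0]
    for child, branch_length in node:  # internal node: list of (child, length)
        p = transition(branch_length)
        below = conditional(child, states)
        for s in (0, 1):
            cond[s] *= sum(p[s][c] * below[c] for c in (0, 1))
    return cond

def site_likelihood(tree, states):
    """Likelihood of one site: root prior times conditional likelihoods."""
    cond = conditional(tree, states)
    return PI[0] * cond[0] + PI[1] * cond[1]

TREE = [([("X", 0.1), ("Y", 0.1)], 0.05), ([("Z", 0.1), ("W", 0.2)], 0.05)]
example = site_likelihood(TREE, {"X": 1, "Y": 1, "Z": 0, "W": 0})
```

The likelihood of a whole binary alignment is then the product of the site likelihoods, assuming sites evolve independently.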

18 Posterior probabilities of the character states (0 and 1) can be calculated following Yang (Yang 1995), by summing over the states of all ancestral nodes other than the root. [Figure: example tree with binary states; 8 histories; 4 histories + 4 histories] Step 3 – Probabilities of Adjacencies
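For a four-leaf tree ((X,Y),(Z,W)) there are three ancestral nodes and therefore 2^3 = 8 histories, 4 with the root in state 0 and 4 with the root in state 1, which may be what the counts on the slide refer to. The sketch below enumerates them directly. The rates, the single shared branch length, and the stationary root prior are assumptions for illustration; a real implementation would use the pruning recursion rather than enumeration.

```python
import math
from itertools import product

LOSS, GAIN = 1.0, 0.1  # hypothetical rates with loss >> gain
PI = (LOSS / (LOSS + GAIN), GAIN / (LOSS + GAIN))  # stationary Pr(0), Pr(1)

def transition(t):
    """Two-state transition probabilities over a branch of length t."""
    decay = math.exp(-(LOSS + GAIN) * t)
    p01 = PI[1] * (1.0 - decay)
    p10 = PI[0] * (1.0 - decay)
    return ((1.0 - p01, p01), (p10, 1.0 - p10))

def root_posterior(states, t=0.1):
    """Posterior Pr(root = 0), Pr(root = 1) on the tree ((X,Y),(Z,W))."""
    p = transition(t)
    joint = [0.0, 0.0]
    # u is the ancestor of X and Y, v is the ancestor of Z and W
    for root, u, v in product((0, 1), repeat=3):  # all 8 histories
        prob = PI[root] * p[root][u] * p[root][v]
        prob *= p[u][states["X"]] * p[u][states["Y"]]
        prob *= p[v][states["Z"]] * p[v][states["W"]]
        joint[root] += prob  # 4 histories contribute to each root state
    total = joint[0] + joint[1]
    return [joint[0] / total, joint[1] / total]

posterior = root_posterior({"X": 1, "Y": 1, "Z": 0, "W": 0})
```

The entry posterior[1] is the kind of value that could then serve as the weight of that adjacency when the ancestor is assembled.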

19 The independently scored adjacencies are assembled into a valid gene-order permutation by a greedy heuristic proposed by Jian Ma (Ma 2007). Sort the edges by weight. Add the current heaviest edge to the path until a cycle is formed, and repeat until all vertices have been traversed. Remove the lightest edge in each cycle. Example result: (1 -4 -3 5 2). Step 4 – Assemble Adjacencies
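A simplified variant of this greedy assembly can be sketched as follows. It is not Ma's exact procedure: instead of adding an edge and later deleting the lightest edge of the cycle, it rejects any edge that would close a cycle, which has the same effect when edges arrive in decreasing weight (ties aside) because the closing edge is then the lightest in its cycle. The data layout, the weights, and the function name are assumptions for this example.

```python
def greedy_assemble(weights):
    """weights maps ((gene, end), (gene, end)) -> probability, end in 't'/'h'.
    Returns a list of contigs, each a list of signed genes."""
    parent = {}

    def find(gene):  # union-find over genes, used to detect cycles
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    partner = {}  # extremity -> the extremity it is joined to
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        root_a, root_b = find(a[0]), find(b[0])
        if a in partner or b in partner:
            continue  # an extremity can take part in only one adjacency
        if root_a == root_b:
            continue  # this edge would close a cycle
        parent[root_a] = root_b
        partner[a], partner[b] = b, a

    contigs, seen = [], set()
    for gene in sorted(parent):
        for end in ("t", "h"):
            if gene in seen or (gene, end) in partner:
                continue  # start walking only from a free end
            contig, current = [], (gene, end)
            while current is not None:
                g, entry = current
                seen.add(g)
                contig.append(g if entry == "t" else -g)  # entered at tail: +g
                exit_end = "h" if entry == "t" else "t"
                current = partner.get((g, exit_end))
            contigs.append(contig)
    return contigs

weights = {
    ((1, "h"), (4, "h")): 0.9, ((4, "t"), (3, "h")): 0.8,
    ((3, "t"), (5, "t")): 0.7, ((5, "h"), (2, "t")): 0.6,
    ((1, "h"), (2, "t")): 0.3,  # conflicts with heavier adjacencies
    ((2, "h"), (1, "t")): 0.2,  # would close a cycle
}
order = greedy_assemble(weights)
```

With these made-up weights the result is the single contig from the slide's example, 1 -4 -3 5 2.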

20 Both the transition model and the rerooting procedure are necessary. Simulation Result

21 PMAG was compared with InferCarsPro (Ma 2011) and GRAPPA_DCJ (Xu 2008). Results-2

22 [Table residue; columns: Genome #, Gene #, Tree Diameter (1n, 2n, 3n, 4n); row: PMAG 2010000] Tests on Large Scale Dataset

23 ML on binary encoding is more accurate and thousands of times faster than the other methods. Binary encoding reduces the complexity and allows us to use existing methods for sequence data. The biased transition model and the rerooting procedure are very useful. Future work: extend PMAG to handle a more general model of evolution, including gene indels and duplications. Missing adjacencies? Conclusions

24 Thank You!

