Volume 7, Issue 2, Pages 208-218.e11 (August 2018)
Continuous-Trait Probabilistic Model for Comparing Multi-species Functional Genomic Data
Yang Yang, Quanquan Gu, Yang Zhang, Takayo Sasaki, Julianna Crivello, Rachel J. O'Neill, David M. Gilbert, Jian Ma
Cell Systems 2018, 7, 208-218.e11. DOI: 10.1016/j.cels.2018.05.022
Copyright © 2018 The Author(s)
Figure 1 Overview of the Phylo-HMGP Model
(A) Example of the state space and state-transition probabilities of the Phylo-HMGP model for the continuous genomic data in (C). S_i denotes a hidden state. Each hidden state is defined by a phylogenetic model ψ_i, parameterized by the selection strengths α_i, the Brownian motion intensities σ_i, and the optimal values θ_i of the ancestral and observed species on the corresponding phylogenetic tree; α_i, σ_i, and θ_i are all vectors. (B) Illustration of the Ornstein-Uhlenbeck (OU) processes along the species tree shown in (C). X(t) denotes the continuous trait at time t. Trajectories of different colors show the evolution of the trait in the lineages drawn in the same colors in (C). The time points t_1, t_2, t_3, and t_4 are the speciation times and correspond to the speciation events shown in (C). The observations of the five species also give an example of state S_2 in (C). (C) Simplified representation of the input and output of the Phylo-HMGP model. The five tracks of continuous signals are the observations from five species, and S_i denotes the underlying hidden states. In this example the data are replication timing (RT) data, where "early" and "late" denote the early and late stages of replication timing, respectively. The species tree beside the data tracks shows the evolutionary relationships among the five species in this study. See also Figures S2, S8, and S9.
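Panel (B) describes the trait evolving along each lineage as an OU process. The sketch below is illustrative only and is not the authors' code: it assumes the standard OU stochastic differential equation dX = α(θ − X)dt + σdW with its own (α, θ, σ) on each branch, and draws each branch endpoint from the exact Gaussian transition density. The tree, node names (`AB`, `A`, `B`, `C`), and parameter values are hypothetical. Brownian motion (the Phylo-HMGP-BM variant) is the limit α → 0, which this function does not handle because it divides by α.

```python
import numpy as np


def ou_step(x0, alpha, theta, sigma, dt, rng):
    """Draw X(t + dt) given X(t) = x0 from the exact OU transition density.

    For dX = alpha * (theta - X) dt + sigma dW with alpha > 0, the transition
    is Gaussian with mean theta + (x0 - theta) * exp(-alpha * dt) and variance
    sigma**2 / (2 * alpha) * (1 - exp(-2 * alpha * dt)).
    """
    decay = np.exp(-alpha * dt)
    mean = theta + (x0 - theta) * decay
    var = sigma ** 2 / (2.0 * alpha) * (1.0 - decay ** 2)
    return mean + np.sqrt(var) * rng.standard_normal()


def simulate_tree(tree, root, x_root, params, rng):
    """Simulate the trait from the root down to every node of a tree.

    tree:   dict mapping a node to a list of (child, branch_length) pairs.
    params: dict mapping each non-root node to the (alpha, theta, sigma) of the
            branch leading into it, so each lineage can have its own values.
    Returns a dict of trait values at all nodes (ancestors and leaves).
    """
    values = {root: x_root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree.get(node, []):
            alpha, theta, sigma = params[child]
            values[child] = ou_step(values[node], alpha, theta, sigma, length, rng)
            stack.append(child)
    return values


# Hypothetical three-species tree: leaves A and B share the ancestor "AB".
example_tree = {"root": [("AB", 0.3), ("C", 1.0)], "AB": [("A", 0.7), ("B", 0.7)]}
example_params = {node: (2.0, 1.0, 0.5) for node in ["AB", "A", "B", "C"]}
leaf_values = simulate_tree(example_tree, "root", 0.0, example_params,
                            np.random.default_rng(0))
print({leaf: round(float(leaf_values[leaf]), 3) for leaf in ["A", "B", "C"]})
```

Setting σ = 0 on every branch removes the noise, so each trait simply decays toward θ, which is a convenient sanity check on the transition mean.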
Figure 2 Evaluation Using Simulated Datasets
(A) Evaluation of Gaussian-HMM, GMM, K-means clustering, Phylo-HMGP-BM, and Phylo-HMGP-OU on six simulated datasets from simulation study I, in terms of AMI (Adjusted Mutual Information), ARI (Adjusted Rand Index), and F1 score. (B) The same evaluation of Gaussian-HMM, GMM, K-means clustering, Phylo-HMGP-BM, and Phylo-HMGP-OU on six simulated datasets from simulation study II, in terms of AMI, ARI, and F1 score. In both (A) and (B), error bars show the SE over ten repeated runs of each method. See also Tables S1 and S2 and Figure S1.
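Each score in Figure 2 measures how well a method's predicted state labels agree with the simulated ground-truth states. As a hedged illustration, the sketch below computes ARI from the pair counts of the contingency table and a pairwise F1 score. The caption does not define its F1 score, so the pairwise version here is an assumption. AMI is omitted; scikit-learn provides it as `sklearn.metrics.adjusted_mutual_info_score` (and ARI as `sklearn.metrics.adjusted_rand_score`). Both functions need at least two labeled points.

```python
from collections import Counter
from math import comb


def _pair_counts(truth, pred):
    """Return (pairs together in both, pairs together in truth,
    pairs together in pred, total number of pairs)."""
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    return both, in_truth, in_pred, comb(len(truth), 2)


def adjusted_rand_index(truth, pred):
    """Rand index corrected for chance (Hubert and Arabie form)."""
    both, in_truth, in_pred, total = _pair_counts(truth, pred)
    expected = in_truth * in_pred / total
    max_index = (in_truth + in_pred) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (both - expected) / (max_index - expected)


def pairwise_f1(truth, pred):
    """F1 over point pairs: a pair counts as positive if it shares a cluster."""
    both, in_truth, in_pred, _ = _pair_counts(truth, pred)
    if both == 0:
        return 0.0
    precision = both / in_pred
    recall = both / in_truth
    return 2 * precision * recall / (precision + recall)


truth = [0, 0, 0, 1]  # hypothetical simulated states
pred = [0, 0, 1, 1]   # hypothetical predicted states
print(adjusted_rand_index(truth, pred), pairwise_f1(truth, pred))
```

Both scores ignore the actual label values, so a prediction that swaps state names but keeps the same grouping still scores perfectly.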
Figure 3 RT Evolution Patterns Identified by Phylo-HMGP
(A) Panel 1 (leftmost): proportions of the 30 RT states across the entire genome. The RT states are grouped into five categories: conserved early (E), weakly conserved early (WE), weakly conserved late (WL), conserved late (L), and other stages (NC). Panel 2: patterns of the 30 states. Each row of the matrix corresponds to the state in the same row of panel 1, and each column corresponds to a species; each entry is the median RT signal of that species in that state. Panel 3: enrichment of different types of histone marks and of CTCF binding sites (a higher fold change indicates stronger enrichment). Panel 4: enrichment of subcompartments A1, A2, B1, B2, and B3. (B) Four examples of RT signal distributions in states with different patterns (state 1: E; state 5: L; state 22: WE; state 9: NC, with human-chimpanzee-specific early RT). (C) Comparison of the predicted RT patterns with the constitutively early and late RT regions identified across cell types. (D) Examples of different RT states and RT groups in the five species, as predicted by Phylo-HMGP. TADs called by the Directionality Index method are shown at the top. See also Figures S2–S7.
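Panels 2 and 3 of (A) summarize each state by per-species medians and by feature enrichment. A minimal sketch, assuming the genome is split into equal-sized bins that each carry one predicted state and one RT value per species. The fold-change definition used here (share of feature-overlapping bins that fall in a state, divided by the share of all bins in that state) is a common convention and an assumption; the caption does not give the paper's exact formula.

```python
import numpy as np


def state_median_matrix(signals, states, n_states):
    """Median RT signal per (state, species).

    signals: array of shape (n_bins, n_species); states: int array of shape (n_bins,).
    Rows for states with no bins are left as NaN.
    """
    signals = np.asarray(signals, dtype=float)
    states = np.asarray(states)
    out = np.full((n_states, signals.shape[1]), np.nan)
    for s in range(n_states):
        mask = states == s
        if mask.any():
            out[s] = np.median(signals[mask], axis=0)
    return out


def fold_enrichment(states, feature, n_states):
    """Fold change of a binary per-bin feature (e.g. a histone mark) in each state.

    Assumed definition: (share of feature-overlapping bins in the state) divided by
    (share of all bins in the state); values above 1 indicate enrichment.
    Requires at least one feature-overlapping bin; states with no bins give NaN.
    """
    states = np.asarray(states)
    feature = np.asarray(feature, dtype=bool)
    genome_share = np.bincount(states, minlength=n_states) / states.size
    feature_share = np.bincount(states[feature], minlength=n_states) / feature.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        return feature_share / genome_share


example_signals = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 bins x 2 species
print(state_median_matrix(example_signals, [0, 0, 1], n_states=2))
print(fold_enrichment([0, 0, 1, 1], [True, True, True, False], n_states=2))
```

Using the median rather than the mean keeps each entry of the pattern matrix robust to a few bins with extreme RT values.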
Figure 4 Comparisons between the RT Evolution Patterns and Other Genomic Features
(A) Example gene ontology (GO) analysis results for states 9, 11, and 14. (B) Percentages of the distances between TAD boundaries and the boundaries of predicted states that fall into different distance intervals. The expected distances are calculated from randomly shuffled TADs. Two sets of TADs called by different methods are used: TADs called by the Directionality Index method and TADs called by Arrowhead. (C) Transposable element enrichment in different RT states. (D) Motif enrichment in different lineage-specific RT states. State 9: human-chimpanzee-specific early RT. State 11: human-chimpanzee-orangutan-specific early RT. State 14: orangutan-specific early RT. State 18: green monkey-specific early RT. See also Figure S2 and Tables S3 and S4. GO analysis results for the other lineage-specific RT states are included in Table S4.
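Panel (B) compares observed TAD-to-state-boundary distances against a shuffled-TAD expectation. The sketch below is a hedged illustration on one chromosome: state boundaries are taken wherever the predicted state changes between consecutive bins, each TAD boundary is matched to its nearest state boundary, and distances are binned into intervals. The shuffling scheme (each TAD moved to a uniformly random start while keeping its length) is an assumption, since the caption does not say how TADs were shuffled; all example coordinates are hypothetical.

```python
import numpy as np


def state_boundaries(states, bin_size):
    """Coordinates where the predicted state changes between consecutive bins."""
    states = np.asarray(states)
    change = np.nonzero(states[1:] != states[:-1])[0] + 1
    return change * bin_size


def nearest_distances(query, targets):
    """Distance from each query position to its nearest target (targets non-empty)."""
    targets = np.sort(np.asarray(targets))
    query = np.asarray(query)
    idx = np.searchsorted(targets, query)
    left = targets[np.clip(idx - 1, 0, len(targets) - 1)]
    right = targets[np.clip(idx, 0, len(targets) - 1)]
    return np.minimum(np.abs(query - left), np.abs(query - right))


def interval_percentages(distances, edges):
    """Percent of distances in [edges[k], edges[k+1]); the last interval is open-ended.

    edges must be increasing and start at 0.
    """
    idx = np.searchsorted(edges, distances, side="right") - 1
    counts = np.bincount(idx, minlength=len(edges))
    return 100.0 * counts / len(distances)


def shuffled_tad_boundaries(tads, chrom_length, rng):
    """Hypothetical null model: move each TAD to a uniform random start, keeping its length."""
    tads = np.asarray(tads)
    lengths = tads[:, 1] - tads[:, 0]
    starts = rng.integers(0, chrom_length - lengths + 1)
    return np.concatenate([starts, starts + lengths])


def expected_percentages(tads, state_bounds, chrom_length, edges, n_shuffles=100, seed=0):
    """Average interval percentages over repeated TAD shuffles."""
    rng = np.random.default_rng(seed)
    runs = [
        interval_percentages(
            nearest_distances(shuffled_tad_boundaries(tads, chrom_length, rng), state_bounds),
            edges,
        )
        for _ in range(n_shuffles)
    ]
    return np.mean(runs, axis=0)


example_states = [0, 0, 1, 1, 1, 2]         # hypothetical state per 10-kb bin
bounds = state_boundaries(example_states, 10_000)
example_tads = [(0, 20_000), (20_000, 60_000)]  # hypothetical TAD intervals
tad_bounds = np.unique(np.asarray(example_tads).ravel())
edges = [0, 5_000, 15_000]
observed = interval_percentages(nearest_distances(tad_bounds, bounds), edges)
expected = expected_percentages(example_tads, bounds, 60_000, edges)
print(observed, expected)
```

If predicted state boundaries track TAD boundaries, the observed percentages in the shortest intervals should exceed the shuffled expectation.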