1
Extraction of Organism Groups from Phylogenetic Profiles Using Independent Component Analysis
Yamanishi, Y., Itoh, M., Kanehisa, M. Genome Informatics 13: 61-70, 2002
Summarized by Jeong-Ho Chang
2
Goal: extract organism groups and their hierarchy from phylogenetic profiles using ICA.
Find independent components that characterize major organism groups.
Identify genes that are characteristic of each organism group.
Workflow: phylogenetic profiles; extraction of independent components; grouping of organisms; hierarchical clustering of organisms; gene identification.
3
Phylogenetic profiles
Definition: a bit pattern that encodes the presence or absence of conserved (orthologous) genes in a set of organisms.
Applications:
Functional prediction of genes: when two genes share similar phylogenetic profiles, they are assumed to be functionally correlated.
Construction of genome trees: based on the assumption that gene losses and acquisitions are major evolutionary phenomena.
[Figure: example bit patterns for genes G1 and G2 across organisms O1, O2, O3, O4, ..., ON]
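The profile comparison described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the gene names, the toy bit patterns, and the helper names `hamming` and `most_similar` are all hypothetical, and Hamming distance is used here as one simple way to measure profile similarity.

```python
# Hypothetical toy profiles: 1 = ortholog present, 0 = absent,
# one position per organism (O1..O5). Not data from the paper.
profiles = {
    "G1": [1, 0, 1, 1, 0],
    "G2": [1, 0, 1, 0, 0],
    "G3": [0, 1, 0, 0, 1],
}


def hamming(a, b):
    """Count the organisms in which two genes differ in presence/absence."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same organisms")
    return sum(x != y for x, y in zip(a, b))


def most_similar(gene, profiles):
    """Return the other gene whose profile is closest to `gene`."""
    others = (g for g in profiles if g != gene)
    return min(others, key=lambda g: hamming(profiles[gene], profiles[g]))


print(hamming(profiles["G1"], profiles["G2"]))
print(most_similar("G1", profiles))
```

Under the functional-prediction assumption above, G1 and G2 would be candidates for a functional link because their profiles differ in only one organism.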
4
Independent Component Analysis
ICA: a linear transformation method from statistics and signal processing.
It represents a set of variables as a linear combination of latent variables that are statistically independent of each other.
[Figure label: IC score]
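The linear model behind this description is commonly written as follows. The symbols are assumptions chosen for illustration and do not come from the slides: x is the vector of n observed variables, s the vector of m latent independent components, A the mixing matrix, and W the estimated unmixing matrix.

```latex
x_j = \sum_{k=1}^{m} a_{jk}\, s_k , \qquad j = 1, \dots, n
\quad\Longleftrightarrow\quad
\mathbf{x} = A\,\mathbf{s},
\qquad
\hat{\mathbf{s}} = W\,\mathbf{x}
```

In this notation, the estimated components in \hat{\mathbf{s}} play the role of the IC scores.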
5
Experiments
Data set: phylogenetic profiles constructed from 2875 orthologous genes in 77 organisms (6 eukaryotes, 13 archaea, and 58 bacteria), taken from the KEGG/GENES database as of May 2002.
Grouping of organisms: ICA reduced the 2875 x 77 profile matrix to a 2875 x 18 matrix of IC scores. To interpret the biological meaning of each IC, correlation coefficients were computed for all combinations of the 77 organisms and 18 ICs.
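The organism-by-IC correlation step can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the tiny matrices stand in for the 2875 x 77 profiles and 2875 x 18 IC scores, Pearson correlation is assumed as the correlation coefficient, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def column(matrix, j):
    """Extract column j from a list-of-rows matrix."""
    return [row[j] for row in matrix]


def organism_ic_correlations(profiles, scores):
    """Return corr[o][k]: correlation of organism column o with IC column k."""
    n_org, n_ic = len(profiles[0]), len(scores[0])
    return [
        [pearson(column(profiles, o), column(scores, k)) for k in range(n_ic)]
        for o in range(n_org)
    ]


# Toy stand-ins: 4 genes x 3 organisms, and 4 genes x 2 ICs (made-up values).
profiles = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
scores = [[2.0, -1.0], [1.5, -0.5], [-1.0, 2.0], [-0.5, 1.0]]

corr = organism_ic_correlations(profiles, scores)
for o, row in enumerate(corr):
    print(o, [round(r, 3) for r in row])
```

An organism would then be associated with the IC whose correlation has the largest absolute value, which is one plausible reading of how components are matched to organism groups.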
6
74 organisms were well represented by the 9 ICs.
9 out of 18 components were well correlated with specific organism groups.
The three exceptions that were not well represented: Deinococcus radiodurans, Aquifex aeolicus, Thermotoga maritima.
7
Hierarchy of organism groups
Comparison of the original data set with the result of ICA.
Distance in the original data set: Hamming distance.
Distance in the reduced set: correlation coefficient; only the 9 ICs are used.
Clustering method: complete-linkage hierarchical clustering.
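Complete-linkage clustering with Hamming distance can be sketched as follows. This is a naive illustration, not the procedure used in the paper: the organism names and bit vectors are made up, and `complete_linkage` is a hypothetical helper that accepts any distance function, so a correlation-based distance could be substituted for the reduced set.

```python
def hamming(a, b):
    """Number of positions (genes) where two organism vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def complete_linkage(items, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    The distance between two clusters is the largest pairwise distance
    between their members (complete linkage).
    """
    clusters = [(name,) for name in items]
    merges = []

    def cluster_dist(c1, c2):
        return max(dist(items[a], items[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical organisms described by presence/absence of 6 genes.
organisms = {
    "O1": [1, 1, 0, 0, 1, 0],
    "O2": [1, 1, 0, 0, 0, 0],
    "O3": [0, 0, 1, 1, 0, 1],
    "O4": [0, 0, 1, 1, 1, 1],
}

for a, b, d in complete_linkage(organisms, hamming):
    print(a, b, d)
```

The sequence of merges is the hierarchy: pairs joined at small distances form the tight groups, and the last merge joins the most dissimilar groups.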
9
Identification of Genes
The result of ICA can be used to identify genes that are clustered at high and low scores along each independent component.
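Picking out the genes at the two ends of one independent component can be sketched as below. The gene names, scores, and the choice of taking the top and bottom k genes are assumptions made for illustration; the slides do not state how the cutoff was chosen.

```python
def extreme_genes(scores, k=2):
    """Return (k highest-scoring genes, k lowest-scoring genes) for one IC.

    `scores` maps a gene name to its score along a single component.
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    highest = ranked[-k:][::-1]              # largest first
    lowest = ranked[:k]                      # smallest first
    return highest, lowest


# Made-up IC scores for six hypothetical genes.
ic_scores = {
    "geneA": 3.1,
    "geneB": -2.7,
    "geneC": 0.2,
    "geneD": 2.4,
    "geneE": -0.1,
    "geneF": -3.0,
}

high, low = extreme_genes(ic_scores, k=2)
print("high:", high)
print("low:", low)
```

Genes near zero, such as geneC and geneE here, contribute little to the component and would not be reported as characteristic of the corresponding organism group.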
11
Discussion
The authors propose using ICA to extract organism groups from phylogenetic profiles.
ICA is an appropriate method for detecting biological features: ICA attempts to maximize non-Gaussianity, whereas PCA attempts to maximize variance, which can interfere with detecting biologically meaningful features.
Future work
Developing "independent" components into "tree" components.
Incorporating phylogenetic tree structure into the similarity of two phylogenetic profiles.
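To make "non-Gaussianity" concrete, excess kurtosis is one commonly used proxy: it is zero for a Gaussian distribution and far from zero for strongly peaked or bimodal score distributions. The sketch below is an assumption for illustration only; the slides do not say which non-Gaussianity measure the authors used, and the function name and sample values are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 ** 2) - 3.0


# Scores split evenly between two values: flat, bimodal shape.
two_point = [-1.0, 1.0, -1.0, 1.0]
# Mostly zero with one large score: a sparse, heavy-tailed shape.
sparse = [0.0] * 9 + [10.0]

print(excess_kurtosis(two_point))
print(excess_kurtosis(sparse))
```

A component whose scores are mostly near zero with a few genes at extreme values has high kurtosis, which is consistent with the idea of identifying genes clustered at high and low scores.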