Yamanishi, M., Itoh, M., Kanehisa, M.

Presentation transcript:

Extraction of Organism Groups from Phylogenetic Profiles Using Independent Component Analysis
Yamanishi, M., Itoh, M., Kanehisa, M. Genome Informatics 13: 61-70, 2002
Summarized by Jeong-Ho Chang

Goal
Extract organism groups and their hierarchy from phylogenetic profiles using ICA.
Find independent components that characterize major organism groups.
Identify genes that are characteristic of each organism group.
Workflow diagram (labels only): phylogenetic profiles, extraction of independent components, grouping of organisms, hierarchical clustering of organisms, gene identification.

Phylogenetic profiles
Definition: a bit pattern that encodes the presence or absence of conserved (orthologous) genes in a set of organisms.
Applications:
Functional prediction of genes: when two genes share similar phylogenetic profiles, they are assumed to be functionally correlated.
Construction of genome trees: based on the assumption that gene losses and acquisitions are major evolutionary phenomena.
[Figure: example profiles for genes G1 and G2 across organisms O1, O2, O3, O4, ..., ON; the bit values are not preserved in the transcript.]
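As an illustration of the definition above, here is a minimal sketch of phylogenetic profiles stored as a binary matrix. The gene and organism patterns are invented toy data, not values from the paper, and the helper name hamming is hypothetical. Comparing rows compares genes (the functional-prediction use), and comparing columns compares organisms (the genome-tree use).

```python
import numpy as np

# Hypothetical toy data: rows are genes, columns are organisms O1..O5.
# 1 = an ortholog of the gene is present in the organism, 0 = absent.
profiles = np.array([
    [1, 1, 0, 0, 1],   # G1
    [1, 1, 0, 0, 1],   # G2 (same pattern as G1)
    [0, 0, 1, 1, 0],   # G3 (complementary pattern)
    [1, 0, 0, 0, 1],   # G4
])

def hamming(a, b):
    """Number of positions in which two presence/absence patterns differ."""
    return int(np.sum(a != b))

# Gene-to-gene distances: compare rows of the matrix.
gene_dist = [[hamming(g, h) for h in profiles] for g in profiles]

# Organism-to-organism distances: compare columns of the matrix.
org_dist = [[hamming(o, p) for p in profiles.T] for o in profiles.T]
```

In this toy matrix G1 and G2 have identical profiles, so under the assumption stated on the slide they would be candidates for a functional link.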

Independent Component Analysis (ICA)
A linear transformation method from statistics and signal processing.
It represents a set of variables as a linear combination of latent variables that are statistically independent of each other.
[Figure label: IC score]
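To make the idea of recovering independent latent variables concrete, the following is a rough NumPy sketch of a symmetric fixed-point ICA iteration with a tanh nonlinearity. The transcript does not say which ICA algorithm or settings the authors used, so the algorithm choice, the function names (whiten, sym_decorrelate, fast_ica) and the synthetic two-source data are all assumptions made for illustration.

```python
import numpy as np

def whiten(X):
    """Center the columns of X (samples x variables) and decorrelate them."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = eigvecs / np.sqrt(eigvals)          # whitening matrix E D^(-1/2)
    return Xc @ K

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Return estimated independent components (samples x variables)."""
    Z = whiten(X)
    n_samples, n = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        Y = Z @ W.T                          # current component estimates
        G = np.tanh(Y)                       # nonlinearity g
        G_prime = 1.0 - G ** 2               # its derivative g'
        # Fixed-point update per row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / n_samples - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T

# Synthetic demo (invented data): two non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(1)
S = np.column_stack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])      # hypothetical mixing matrix
X = S @ A.T
S_hat = fast_ica(X)

# Absolute correlation between true sources (rows) and estimates (columns).
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
```

ICA recovers components only up to order, sign and scale, which is why the demo compares absolute correlations rather than the raw values.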

Experiments
Data set: phylogenetic profiles constructed from 2875 orthologous genes in 77 organisms (6 eukaryotes, 13 archaea, and 58 bacteria), taken from the KEGG/GENES database as of May 2002.
Grouping of organisms: the 2875 x 77 profile matrix is reduced to 2875 x 18 (18 ICs).
To interpret the biological meaning of each IC, correlation coefficients were computed for all combinations of the 77 organisms and 18 ICs.
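The organism-by-IC correlation step could be sketched as follows. This assumes Pearson correlation, with genes as rows in both matrices; the function name organism_ic_correlations and the tiny arrays are hypothetical and far smaller than the 2875 x 77 and 2875 x 18 matrices described above.

```python
import numpy as np

def organism_ic_correlations(profiles, ic_scores):
    """Pearson correlation between each organism column and each IC column.

    profiles : (genes x organisms) array
    ic_scores: (genes x components) array
    returns  : (organisms x components) array
    Assumes no column is constant (its standard deviation would be zero).
    """
    P = np.asarray(profiles, dtype=float)
    S = np.asarray(ic_scores, dtype=float)
    Pz = (P - P.mean(axis=0)) / P.std(axis=0)
    Sz = (S - S.mean(axis=0)) / S.std(axis=0)
    return Pz.T @ Sz / P.shape[0]

# Invented toy data: 6 genes, 3 organisms, 2 components.
toy_profiles = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
])
toy_scores = np.array([
    [2.0, -1.0],
    [2.0, -1.0],
    [2.0, -1.0],
    [-2.0, 1.0],
    [-2.0, 1.0],
    [-2.0, 1.0],
])

R = organism_ic_correlations(toy_profiles, toy_scores)
# For each organism, the component with the largest absolute correlation.
best_ic = np.abs(R).argmax(axis=1)
```

Taking the largest absolute correlation per organism is one simple way to read off which component an organism aligns with; the transcript does not state the criterion the authors used for "well correlated".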

74 organisms were well represented by the 9 ICs
9 out of 18 components were well correlated with specific organism groups.
Exceptions: Deinococcus radiodurans, Aquifex aeolicus, Thermotoga maritima.

Hierarchy of organism groups
Comparison: original data set vs. result of ICA.
Distance in the original data set: Hamming distance.
Distance in the reduced set: correlation coefficient; only the 9 ICs are used.
Clustering method: complete-linkage hierarchical clustering.
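The clustering setup on this slide can be sketched with a naive complete-linkage routine. Two points here are assumptions rather than details from the transcript: the correlation coefficient is turned into a distance as 1 - r, and the function names and toy organism profiles are invented for the example.

```python
import numpy as np

def hamming_distances(profiles):
    """Pairwise Hamming distance between rows (count of differing bits)."""
    P = np.asarray(profiles)
    return (P[:, None, :] != P[None, :, :]).sum(axis=2)

def correlation_distances(features):
    """Pairwise 1 - Pearson r between rows (assumed conversion to a distance)."""
    return 1.0 - np.corrcoef(np.asarray(features, dtype=float))

def complete_linkage(D):
    """Naive agglomerative clustering on a square distance matrix.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = [(i,) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Invented toy data: rows are organisms, columns are genes.
organisms = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
])
merges = complete_linkage(hamming_distances(organisms))
```

For the reduced set, the same complete_linkage call would take correlation_distances applied to each organism's values over the 9 retained ICs.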

Identification of Genes The result of ICA can be used to identify genes that are clustered at high and low scores along each independent component.
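One way to pick out genes at the high and low ends of a component is to standardize the scores and apply a cutoff. The z-score cutoff of 2.0, the function name extreme_genes and the gene identifiers below are assumptions for illustration; the transcript does not state how the authors chose their thresholds.

```python
import numpy as np

def extreme_genes(ic_scores, component, gene_ids, z_cutoff=2.0):
    """Return (high, low) gene lists for one independent component.

    ic_scores: (genes x components) array of IC scores
    component: column index of the component to inspect
    z_cutoff : hypothetical threshold on the standardized score
    """
    s = np.asarray(ic_scores, dtype=float)[:, component]
    z = (s - s.mean()) / s.std()
    high = [g for g, v in zip(gene_ids, z) if v >= z_cutoff]
    low = [g for g, v in zip(gene_ids, z) if v <= -z_cutoff]
    return high, low

# Invented toy data: 10 genes, 1 component, two clear outliers.
toy_ids = [f"g{i}" for i in range(10)]
toy_scores = np.array([[0.0]] * 8 + [[5.0], [-5.0]])
high, low = extreme_genes(toy_scores, 0, toy_ids)
```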

Discussion
The authors propose using ICA to extract organism groups from phylogenetic profiles.
ICA is an appropriate method for detecting biological features: ICA attempts to maximize nongaussianity, whereas PCA attempts to maximize variance, which can interfere with detecting biologically meaningful features.
Future work
Developing “independent” components into “tree” components.
Incorporating phylogenetic tree structure into the similarity measure between two phylogenetic profiles.