ICA-based Clustering of Genes from Microarray Expression Data

Su-In Lee 1, Serafim Batzoglou 2
1 Department of Electrical Engineering, 2 Department of Computer Science, Stanford University

1. ABSTRACT

We propose an unsupervised methodology that uses independent component analysis (ICA) to cluster genes from DNA microarray data. Based on an ICA mixture model of genomic expression patterns, linear and nonlinear ICA find components that are specific to certain biological processes. Genes that exhibit significant up-regulation or down-regulation within each component are grouped into clusters. We test the statistical significance of enrichment of gene annotations within each cluster. ICA-based clustering outperformed other leading methods in constructing functionally coherent clusters on various datasets. This result supports our model of genomic expression data as a composite effect of independent biological processes. A comparison of clustering performance among various ICA algorithms, including a kernel-based nonlinear ICA algorithm, shows that nonlinear ICA performed best for small datasets and that the natural-gradient maximum-likelihood algorithm worked well for all the datasets.

2. GENE EXPRESSION MODEL

The expression pattern of genes in a given condition is a composite effect of the independent biological processes that are active in that condition. For example, suppose that there are 9 genes and 3 biological processes (ribosome biosynthesis, oxidative phosphorylation, and cell cycle regulation) taking place inside a cell.

[Figure: Genes 1 to 9 shown under each of the three processes (Ribosome Biosynthesis, Oxidative Phosphorylation, Cell Cycle Regulation) and under their combination, labeled "In an Experimental Condition".]

The observed genomic expression pattern can be seen as a combined effect of the genomic expression programs of the biological processes that are active in that condition.
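As a toy illustration of this model, the sketch below builds the observed expression pattern of 9 genes in one condition as a weighted sum of three expression programs. All numbers, the activity weights, and the names `programs`, `activity`, and `observed_pattern` are made up for illustration and are not taken from the poster.

```python
# Hypothetical expression programs: one value per gene (Gene 1 .. Gene 9).
# Positive = induced by the process, negative = repressed, 0 = unaffected.
programs = {
    "ribosome_biosynthesis":     [2.0, 1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5],
    "oxidative_phosphorylation": [0.0, 0.0, 0.5, 2.0, 1.5, 1.0, 0.0, 0.0, 0.0],
    "cell_cycle_regulation":     [0.0, -1.0, 0.0, 0.0, 0.0, 0.5, 2.0, 1.5, 1.0],
}

# Hypothetical activity of each process in one experimental condition.
activity = {
    "ribosome_biosynthesis": 1.0,
    "oxidative_phosphorylation": 0.5,
    "cell_cycle_regulation": 0.0,   # inactive in this condition
}


def observed_pattern(programs, activity):
    """Combine the programs linearly, weighted by how active each process is."""
    n_genes = len(next(iter(programs.values())))
    pattern = [0.0] * n_genes
    for name, program in programs.items():
        weight = activity[name]
        for g, level in enumerate(program):
            pattern[g] += weight * level
    return pattern


x = observed_pattern(programs, activity)
```

Each experimental condition would have its own activity weights, so a set of conditions gives a set of different mixtures of the same underlying programs.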
[Figure: Genome, messenger RNA.]

Each biological process becomes active by turning on the genes associated with it. We can measure the expression levels of genes using microarrays.

3. Microarray Data

Microarray data display the expression levels of a set of genes measured in various experimental conditions.

[Figure: expression matrix with rows G 1, G 2, …, G N-1, G N (genes) and columns Exp 1, Exp 2, Exp 3, …, Exp i, …, Exp M (experiments). A row gives the expression levels of a gene G i across experimental conditions; a column gives the expression patterns of genes under an experimental condition Exp i.]

Examples of conditions: heat shock, G phase in cell cycle, etc.
Examples of samples: liver cancer patient, normal person, etc.

4. Mathematical Modeling

The expression measurements of K genes observed in three conditions, denoted by x 1, x 2 and x 3, can be expressed as linear combinations of the genomic expression programs of three biological processes, denoted by s 1, s 2 and s 3.

[Figure: the genomic expression programs of biological processes (Ribosome Biogenesis, Oxidative Phosphorylation, Cell Cycle Regulation) pass through an unknown mixing system to give the genomic expression patterns in certain experimental conditions (Heat Shock, Starvation, Hyper-Osmotic Shock).]

Given a microarray dataset, can we recover the genomic expression programs of biological processes? In other words, can we decompose a matrix X into A and S so that each row of S represents a genomic expression program of a biological process?

5. ICA Algorithm

Using the log-likelihood maximization approach, we can find W that maximizes the log-likelihood L(y,W).

The y i's are assumed to be statistically independent.
Prior information on y: super-Gaussian or sub-Gaussian?

6. ICA-based Clustering

Step 1: Apply ICA to microarray data X to obtain Y.
Step 2: Cluster genes based on the independent components, the rows of Y.

Based on our gene expression model, the independent components y 1,…, y n are assumed to be expression programs of biological processes. For each y i, genes are ordered by their activity levels on y i, and the C% (C=7.5) of genes showing significantly high or low levels are grouped into clusters.

7. Measuring significance of ICA-based clusters

The statistical significance of the biological coherence of clusters was measured using gene annotation databases such as Gene Ontology (GO).

[Figure: clusters from ICA (Cluster 1, Cluster 2, Cluster 3, …, Cluster n) matched against GO categories (GO 1, GO 2, …, GO i, …, GO m); Cluster i and GO j share k genes.]

For every combination of one of our clusters and a GO category, we calculated the p-value, the chance probability that the two gene sets share the observed number of genes, based on the hypergeometric distribution.

g: # of genes in all clusters and GOs
f: # of genes in the GO j
n: # of genes in the Cluster i
k: # of genes GO j and Cluster i share

8. Microarray Datasets

For testing, five microarray datasets were used, and for each dataset the clustering performance of our approach was compared with another approach applied to the same dataset.

ID | Description | Genes | Exps | Compared with
D1 | Yeast during cell cycle | | | PCA
D2 | Yeast during cell cycle | | | k-means clustering
D3 | Yeast under stressful conditions | | | Bayesian approach, Plaid model
D4 | C. elegans in various conditions | | | Topomap approach
D5 | 19 kinds of normal human tissue | | | PCA

9. Results

For each method, the minimum p-values (< 10^-7) corresponding to each GO functional class were collected and compared.
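Step 2 of the ICA-based clustering and the hypergeometric significance test can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Step 1 (ICA) has already produced one independent component as a list of per-gene loads, that each component yields one "high" and one "low" cluster of C% of the genes, and that the p-value is the upper tail of the hypergeometric distribution (the probability of sharing at least k genes). The function names `component_clusters` and `enrichment_p_value` are hypothetical.

```python
from math import comb


def component_clusters(component, gene_names, c_percent=7.5):
    """Return (high, low): the C% of genes with the largest and the
    smallest loads on one independent component (assumed two clusters)."""
    n_select = max(1, round(len(component) * c_percent / 100))
    # Gene indices ordered from lowest to highest load on this component.
    order = sorted(range(len(component)), key=lambda g: component[g])
    low = [gene_names[g] for g in order[:n_select]]
    high = [gene_names[g] for g in order[-n_select:]]
    return high, low


def enrichment_p_value(g, f, n, k):
    """Chance probability that a cluster of n genes and a GO category of
    f genes, both drawn from g genes, share at least k genes."""
    upper = min(f, n)  # the overlap cannot exceed either set size
    favorable = sum(comb(f, i) * comb(g - f, n - i) for i in range(k, upper + 1))
    return favorable / comb(g, n)


# Toy usage with made-up loads for 10 genes and C = 20%.
loads = [0.1, -2.0, 0.3, 3.0, -0.2, 0.0, 2.5, -1.5, 0.4, 0.2]
names = ["g%d" % i for i in range(1, 11)]
high, low = component_clusters(loads, names, c_percent=20)
p = enrichment_p_value(g=10, f=4, n=3, k=2)
```

Exact integer binomials from `math.comb` keep the tail sum free of rounding error until the final division, which matters when p-values as small as 10^-7 are compared across methods.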