Bi-correlation clustering algorithm for determining a set of co-regulated genes. Bioinformatics, vol. 25, no. 21, 2009. Anindya Bhattacharya and Rajat K. De.



Outline
- Introduction
- Bi-correlation clustering algorithm (BCCA)
- Results
- Conclusion

Introduction
- Biclustering
  - Performs simultaneous grouping of the genes and conditions of a dataset to determine subgroups of genes that exhibit similar behavior over a subset of experimental conditions.
- A new correlation-based biclustering algorithm, called the bi-correlation clustering algorithm (BCCA)
  - Produces a diverse set of biclusters of co-regulated genes.
  - All the genes in a bicluster have a similar change of expression pattern over the subset of samples.

Introduction
- Cluster analysis
  - Most cluster analysis methods try to find groups of genes that remain co-expressed through all experimental conditions.
  - In reality, genes tend to be co-regulated, and thus co-expressed, under only a few experimental conditions.

Bi-correlation clustering algorithm
- Notation
  - A set of n genes.
  - Each gene has m expression values.
  - For each gene g_i there is an m-dimensional vector whose j-th entry is the j-th expression value of g_i.
  - A set of m microarray experiments (measurements).
  - The n genes are to be grouped into K overlapping biclusters.

Bi-correlation clustering algorithm
- Bicluster
  - A bicluster can be defined as a subset of genes possessing similar behavior over a subset of experiments.
  - It is represented as a pair: a subset of genes and a subset of experiments.
  - Each gene in the bicluster's gene subset is correlated, with a correlation value greater than or equal to a specified threshold, with all other genes in that subset over the measurements in the bicluster's experiment subset.

Bi-correlation clustering algorithm
- BCCA
  - Uses the Pearson correlation coefficient to measure the similarity between the expression patterns of two genes.
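The similarity measure named on this slide can be sketched in a few lines. This is a minimal illustration and not the authors' code; the function name `pearson` and the choice to return 0.0 for a constant profile are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)
```

Two genes whose profiles rise and fall together score close to +1, and two genes with opposite profiles score close to -1, regardless of the absolute expression levels.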

Bi-correlation clustering algorithm
- Step 1
  - The set of biclusters S is initialized to NULL and the number of biclusters, Bicount, is initialized to 0.
- Step 2A
  - BCCA generates a bicluster C for each pair of genes in a dataset under a set of conditions.
  - For each pair of genes, BCCA creates a bicluster C whose gene set is that pair and whose condition set is the set of conditions under consideration.
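Steps 1 and 2A can be sketched as follows. The names `S`, `bicount` and `seed_biclusters` are hypothetical, and starting every seed bicluster with all conditions is an assumption, since the exact initial condition set is not legible in this transcript.

```python
from itertools import combinations

S = []       # Step 1: the set of biclusters starts empty
bicount = 0  # Step 1: number of biclusters found so far

def seed_biclusters(gene_names, n_conditions):
    """Step 2A: yield one seed bicluster (gene pair, condition set) per pair of genes."""
    for g, h in combinations(gene_names, 2):
        # assumption: each seed starts with every condition index 0..n_conditions-1
        yield {g, h}, set(range(n_conditions))
```

For n genes this produces n(n-1)/2 seeds, so the number of candidate biclusters grows quadratically with the number of genes.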

Bi-correlation clustering algorithm
- In step 2C
  - For a pair of genes in C, if their correlation is below the threshold, a sample is identified in C whose deletion causes the maximum increase in the correlation value between the two genes.
  - If the deletion criterion, which is defined in terms of a threshold, is satisfied, the sample is deleted from C; otherwise, C is discarded.
  - Deleting the measurement for which the two genes differ most in expression value results in the highest increase in correlation value.
  - BCCA deletes one measurement at a time from C.
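The one-at-a-time deletion in step 2C can be illustrated with a small greedy loop. This is a sketch under stated assumptions and not the published procedure. The function `refine_pair` is hypothetical. The stopping rule `min_conditions` (which should be at least 2) stands in for the threshold-based criterion that is not legible in this transcript. The best sample to delete is found by brute force rather than by the "largest expression difference" shortcut mentioned on the slide.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def refine_pair(x, y, threshold, min_conditions=3):
    """Drop one condition at a time until corr(x, y) >= threshold.

    Returns the kept condition indices, or None if the pair is discarded.
    """
    kept = list(range(len(x)))
    while True:
        r = pearson([x[j] for j in kept], [y[j] for j in kept])
        if r >= threshold:
            return kept
        if len(kept) <= min_conditions:
            return None  # assumption: discard when too few conditions remain
        # Find the condition whose deletion raises the correlation the most
        best_j, best_r = None, r
        for j in kept:
            rest = [k for k in kept if k != j]
            r_j = pearson([x[k] for k in rest], [y[k] for k in rest])
            if r_j > best_r:
                best_j, best_r = j, r_j
        if best_j is None:
            return None  # no single deletion improves the correlation
        kept.remove(best_j)
```

For example, with profiles `[1, 2, 3, 4, 10]` and `[2, 4, 6, 8, 0]`, the last condition is the only one on which the two genes disagree, so it is the one removed.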

Bi-correlation clustering algorithm
- In step 2D(a)
  - Other genes from the dataset that satisfy the definition of a bicluster are included in C for its augmentation.
- In step 2D(b)
  - BCCA checks whether the present bicluster C has already been found. If so, C is not included; otherwise, C is considered a new bicluster.
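A possible reading of steps 2D(a) and 2D(b) is sketched below. The helpers `augment` and `add_if_new` are hypothetical names, and `data` is assumed to be a dictionary mapping each gene name to its list of expression values. Adding genes greedily in dictionary order is also an assumption; the result can depend on that order.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def augment(data, genes, conditions, threshold):
    """Step 2D(a): add each other gene that correlates >= threshold with all members."""
    members = list(genes)
    for g in data:
        if g in members:
            continue
        profile = [data[g][j] for j in conditions]
        if all(pearson(profile, [data[h][j] for j in conditions]) >= threshold
               for h in members):
            members.append(g)
    return frozenset(members), frozenset(conditions)

def add_if_new(found, bicluster):
    """Step 2D(b): record the bicluster only if it has not been found before."""
    if bicluster in found:
        return False
    found.add(bicluster)
    return True
```

Storing each bicluster as a pair of frozensets makes it hashable, so the duplicate check in step 2D(b) becomes a set membership test.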


Results
- Datasets
  - We demonstrate the effectiveness of BCCA in determining a set of co-regulated genes (i.e. genes having common transcription factors) and functionally enriched clusters (and attributes) on five datasets.

Results
- Variation with respect to threshold
  - Plot for the YCCD dataset: average number of functionally enriched attributes (computed using P-values) versus correlation threshold value.

Results
- The threshold value follows a guideline from a previous study by Allocco et al. (2004), which concluded that if two genes have a correlation between their expression profiles >0.84, then there is a >50% chance of their being bound by a common transcription factor.

Results
- By locating common transcription factors
  - At first, we consider only those biclusters that have 50 genes or fewer.
  - The software TOUCAN 2 (Aerts et al., 2005) is used for performance comparison, by extracting information on the number of transcription factors present in the proximal promoters of all the genes in a single bicluster.
  - The presence of common transcription factors in the promoter regions of a set of genes is good evidence of co-regulation.

Results

Sequences of all five genes found in a bicluster generated by BCCA from the SPTD dataset. Any transcription factor may be present at more than one location in the upstream region.

Results
- Functional enrichment: P-value
  - The functional enrichment of each GO category in each of the biclusters was evaluated using the software FuncAssociate (Berriz et al., 2003).
  - The P-value represents the probability of observing the number of genes from a specific GO functional category within each cluster.
  - A low P-value indicates that the genes belonging to the enriched functional categories are biologically significant in the corresponding clusters.

Results
- P-value of a functional category
  - Suppose we have a total population of N genes, of which M have a particular annotation.
  - If we observe x genes with that annotation in a sample of n genes, then we can calculate the probability of that observation.
  - The P-value is the probability of seeing x or more genes with an annotation, out of n, given that M genes in the population of N have that annotation.
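The probability described on this slide is the upper tail of a hypergeometric distribution, and it can be computed directly with binomial coefficients. The sketch below is illustrative only. The function name `enrichment_p_value` is hypothetical, and the slides do not say how FuncAssociate computes or adjusts its P-values.

```python
from math import comb

def enrichment_p_value(N, M, n, x):
    """P(X >= x) when n genes are drawn from N, of which M carry the annotation."""
    upper = min(n, M)    # cannot observe more annotated genes than n or M
    total = comb(N, n)   # number of ways to choose the sample
    tail = sum(comb(M, i) * comb(N - M, n - i)
               for i in range(max(x, 0), upper + 1))
    return tail / total
```

As a sanity check, `enrichment_p_value(10, 5, 5, 5)` equals 1/252, because only one of the 252 possible samples of five genes consists entirely of annotated genes.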

Results
- Only functional categories whose P-value meets the reporting cutoff are reported.
- In the analysis of the 10 biclusters obtained for the YCCD dataset, the most highly enriched category in Bicluster 1 is 'ribosome', with P-value of


Conclusion
- BCCA is able to find a group of genes that show a similar pattern of variation in their expression profiles over a subset of measurements.
- Compared with other biclustering algorithms, BCCA:
  - finds a higher number of common transcription factors for the set of genes in a bicluster;
  - produces more functionally enriched biclusters.