An Association Analysis Approach to Biclustering

Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar
Department of Computer Science and Engineering, University of Minnesota
e-mail: gaurav@cs.umn.edu
website: http://www.cs.umn.edu/~kumar/dmbio

MOTIVATION

Bicluster: a group of objects showing similarity over only a subset of the features in a data set.
The problem has been studied extensively for microarray data, with the aim of finding various types of biclusters:
  Constant value biclusters
  Constant row (column) biclusters
  Constant addition biclusters
Biclustering finds more functionally enriched groups of genes than hierarchical clustering [Prelic et al, 2006].

CURRENT BICLUSTERING APPROACHES

Define an objective function/measure for the coherence of a bicluster, then search for biclusters that optimize it:
  Coclustering: reorder rows and columns for global minimum
  Cheng & Church (CC): eliminate rows and columns for local minimum
  ISA: eliminate rows and columns from random seed

Common Issues
  Non-exhaustive: the heuristic search scheme doesn't enumerate all biclusters satisfying the specified condition
  Bias towards larger biclusters: the objective function/measure is satisfied early
  Non-overlapping biclusters (some)
[Madeira & Oliveira, 2004]

APPROACH

Association patterns are biclusters!

Advantages
  Exhaustive (and efficient) discovery of biclusters.
  Can discover small biclusters owing to the bottom-up search procedure.
Disadvantage
  Need to binarize or discretize the original real-valued data set, which causes a loss of information [Becquet et al, 2002; Creighton et al, 2003; McIntosh et al, 2007].

Range Support: an anti-monotonic support measure for real-valued data!
Constraints imposed:
  Consistency of expression values
  Same direction of expression
  These conditions satisfied over a substantial number of conditions
Can be used within an Apriori-like framework [Agrawal et al. 1994]
[Figure labels: Found to be Infrequent; Pruned supersets]
Implementation at http://www.cs.umn.edu/vk/gaurav/rap

RESULTS

Functional enrichment for large classes (31-500 members)
Functional enrichment for small classes (1-30 members)
Fraction of patterns (biclusters) enriched by several groups of small classes at p-value 1x10^-5
Fraction of class covered by patterns (biclusters) among several groups of small classes at p-value 1x10^-5

ACKNOWLEDGEMENTS

This work has been supported by NSF grants #CRI-0551551, #IIS-0308264 and #ITR-0325949. Computational resources for this work were provided by MSI.

REFERENCES

G. Pandey, V. Kumar, and M. Steinbach. Computational Approaches for Protein Function Prediction: A Survey. Technical Report 06-028, Department of Computer Science, University of Minnesota, October 2006.
G. Pandey, M. Steinbach, R. Gupta, T. Garg, and V. Kumar. Association Analysis-based Transformations for Protein Interaction Networks: A Function Prediction Case Study. In Proceedings of ACM KDD, pages 540-549, 2007.
G. Pandey, G. Atluri, M. Steinbach, and V. Kumar. Association Analysis for Real-valued Data: Definitions and Application to Microarray Data. Technical Report 08-007, Department of Computer Science, University of Minnesota, 2008.
H. Xiong, G. Pandey, M. Steinbach, and V. Kumar. Enhancing data analysis with noise removal. IEEE Transactions on Knowledge and Data Engineering, 18(3):304–319, 2006.
H. Xiong, X. He, C. Ding, Y. Zhang, V. Kumar, and S. R. Holbrook. Identification of functional modules in protein complexes via hyperclique pattern discovery. In Proc. Pacific Symposium on Biocomputing (PSB), pages 221–232, 2005.
H. Xiong, P.-N. Tan, and V. Kumar. Hyperclique pattern discovery. Data Min. Knowl. Discov., 13(2):219–242, 2006.
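
The Range Support idea under APPROACH can be illustrated with a small Python sketch. The poster does not state the formula, so the definition below is an assumption chosen only to satisfy the three listed constraints: a condition contributes to a gene set's support when all values share a sign (same direction) and their spread is at most alpha times the smallest absolute value (consistency), and the contribution is that smallest absolute value. Under this assumed definition, adding a gene can never increase the support, which is what allows Apriori-style pruning. The names range_support, mine_patterns, alpha, min_support and the toy data are hypothetical; the authors' implementation at the URL above may differ.

```python
from itertools import combinations


def range_support(data, genes, alpha):
    """Assumed range-support measure for a set of genes.

    data maps a gene name to its list of expression values, one per condition.
    """
    n_conditions = len(next(iter(data.values())))
    total = 0.0
    for c in range(n_conditions):
        values = [data[g][c] for g in genes]
        # Same direction of expression: all positive or all negative.
        if not (all(v > 0 for v in values) or all(v < 0 for v in values)):
            continue
        smallest = min(abs(v) for v in values)
        # Consistency: the spread is small relative to the smallest magnitude.
        if max(values) - min(values) <= alpha * smallest:
            total += smallest
    return total


def mine_patterns(data, alpha, min_support):
    """Level-wise (Apriori-like) search for gene sets with enough support."""
    found = {}
    level = {}
    for g in data:
        s = range_support(data, [g], alpha)
        if s >= min_support:
            level[frozenset([g])] = s
    while level:
        found.update(level)
        k = len(next(iter(level))) + 1
        # Join step: merge pairs of surviving sets that differ in one gene.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself have survived.
            if any(frozenset(sub) not in level
                   for sub in combinations(cand, k - 1)):
                continue
            s = range_support(data, cand, alpha)
            if s >= min_support:
                next_level[cand] = s
        level = next_level
    return found


# Toy data: three genes measured over four conditions (made-up values).
data = {
    "g1": [2.0, 2.1, -1.0, 0.5],
    "g2": [2.2, 2.0, -1.1, -3.0],
    "g3": [-2.0, 0.1, 4.0, 1.0],
}
patterns = mine_patterns(data, alpha=0.5, min_support=4.0)
for genes, support in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(genes), round(support, 2))
```

Each returned gene set, together with the conditions that contributed to its support, corresponds to one candidate bicluster. The search is exhaustive with respect to the chosen thresholds, and small gene sets are found first because the search proceeds bottom-up.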