Unsupervised clustering in mRNA expression profiles D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis


Unsupervised clustering in mRNA expression profiles
D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis
Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of Patras, GR Patras, Greece
University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR Patras, Greece
Computers in Biology and Medicine, In Press, Corrected Proof, Available online 24 October 2005

K-Windows Clustering
An adaptation of k-means, originally proposed in 2002 by Vrahatis et al. The windowing technique improves both speed and accuracy. The algorithm tries to place a d-dimensional window (box) containing all patterns that belong to a single cluster.

K-Windows – Basic Concepts
Move windows to find cluster centers (fig. a):
1. Select k points as centers of d-windows of size a.
2. Set each window's mean as its new center.
3. Repeat until the stopping criterion (movement of the center) is met.
Enlarge windows to determine cluster edges (fig. b):
1. Enlarge one dimension by a specified percentage.
2. Relocate the window as above.
3. Keep the enlargement only if the increase in instances inside the window exceeds a threshold.
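The window-movement step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, toy data, and window size are illustrative assumptions.

```python
import numpy as np

def points_in_window(X, center, a):
    # Boolean mask of points inside the d-dimensional box of side a
    # centered at `center`.
    return np.all(np.abs(X - center) <= a / 2.0, axis=1)

def move_window(X, center, a, tol=1e-6, max_iter=100):
    # Repeatedly recenter the window on the mean of the points it
    # contains, until the center stops moving (stopping criterion).
    for _ in range(max_iter):
        mask = points_in_window(X, center, a)
        if not mask.any():
            break
        new_center = X[mask].mean(axis=0)
        if np.linalg.norm(new_center - center) < tol:
            center = new_center
            break
        center = new_center
    return center

# Toy 2-d data: a tight cluster around (5, 5); the window slides onto it.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
c = move_window(X, np.array([4.0, 4.0]), a=3.0)
```

The enlargement step would wrap `move_window` in a loop that grows one dimension at a time and keeps the growth only if the count of captured points rises by more than the threshold.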

Unsupervised K-Windows (UKW)
Start with a sufficiently large number of windows, then merge windows to determine the number of clusters automatically. For each pair of overlapping windows, calculate the proportion of overlap for each window:
a) Large overlap: considered the same cluster; W1 is deleted.
b) Many points in common: considered the same cluster.
c) Low overlap: considered two different clusters.
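The pairwise merge test can be sketched as below. This is a simplified illustration under assumed semantics (windows represented by the index sets of the points they contain; the threshold name follows the defaults listed at the end of the deck), not the paper's exact rule.

```python
def overlap_fraction(points_a, points_b):
    # Fraction of window A's points that also fall inside window B.
    if not points_a:
        return 0.0
    shared = len(set(points_a) & set(points_b))
    return shared / len(points_a)

def same_cluster(points_a, points_b, theta_m=0.1):
    # If either window shares a large enough proportion of its points
    # with the other, the two windows are treated as one cluster.
    fa = overlap_fraction(points_a, points_b)
    fb = overlap_fraction(points_b, points_a)
    return max(fa, fb) >= theta_m
```

In the full algorithm a very large overlap additionally causes one of the two windows to be deleted outright (case a above).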

Experimental Setup
Leukemia dataset – well characterized
Default UKW parameters used
Supervised dimension reduction:
– two previously published gene subsets and their union
Unsupervised dimension reduction:
– biclustering with UKW
– PCA
– PCA and UKW hybrid

Supervised Feature Selection
Uses two gene subsets selected in previously published papers via supervised techniques. All algorithms performed best on the combined set (results in the summary table below).

Unsupervised Feature Selection (Biclustering Technique)
Apply UKW to cluster the genes, then select one gene per cluster, the one closest to the cluster center, as that cluster's representative. Apply UKW to the samples using those representative genes (239 of them).
UKW accuracy: 93.6% (ALL) and 76% (AML). No results were reported for the other algorithms.
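The representative-gene step can be sketched as follows. This is a hedged illustration: the cluster labels are assumed to come from a prior UKW run over the genes, and the toy matrix is invented for demonstration.

```python
import numpy as np

def representative_genes(G, labels):
    # G: (n_genes, n_samples) expression matrix; labels: cluster id per gene.
    # Returns, for each cluster, the index of the gene nearest its centroid.
    reps = []
    for c in sorted(set(labels)):
        idx = np.flatnonzero(labels == c)
        centroid = G[idx].mean(axis=0)
        dists = np.linalg.norm(G[idx] - centroid, axis=1)
        reps.append(idx[np.argmin(dists)])
    return reps

# Toy example: 6 genes in 2 samples, pre-clustered into two groups.
G = np.array([[0.0, 0.0], [0.4, 0.0], [0.1, 0.0],
              [10.0, 10.0], [12.0, 12.0], [10.5, 10.5]])
labels = np.array([0, 0, 0, 1, 1, 1])
reps = representative_genes(G, labels)
```

The selected indices would then define the reduced feature set on which UKW clusters the samples.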

Unsupervised Feature Selection (PCA Techniques)
PCA with a scree plot to reduce features:
– poor performance
Hybrid PCA and UKW method:
– partition genes using UKW
– transform each partition using PCA
– select representative factors from each cluster
– UKW accuracy: 97.87% (ALL) and 88% (AML)
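The per-partition PCA step of the hybrid method can be sketched as below. This is an illustrative assumption of the mechanics (gene-partition labels taken as given, first principal component per partition), not the authors' code.

```python
import numpy as np

def leading_factors(X, gene_labels):
    # X: (n_samples, n_genes). For each gene partition, project the samples
    # onto that partition's first principal component, yielding one
    # representative factor per partition.
    factors = []
    for c in sorted(set(gene_labels)):
        Xc = X[:, np.flatnonzero(gene_labels == c)]
        Xc = Xc - Xc.mean(axis=0)            # center within the partition
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        factors.append(Xc @ Vt[0])           # sample scores on PC 1
    return np.column_stack(factors)

# Toy data: 5 samples, 6 genes split into two partitions.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 6))
labels = np.array([0, 0, 0, 1, 1, 1])
F = leading_factors(X, labels)
```

The resulting factor matrix F (one column per partition) would then be fed to UKW in place of the raw genes.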

UKW Results Summary

  Dataset                                ALL Accuracy   AML Accuracy
  Published Gene Subsets (Supervised)    90%            100%
  UKW Biclustering (Unsupervised)        93.6%          76%
  PCA (Unsupervised)                     N/A            N/A
  PCA-UKW Hybrid (Unsupervised)          97.87%         88%

Default parameters:
– initial window size a = 5
– enlargement threshold θe = 0.8
– merging threshold θm = 0.1
– coverage threshold θc = 0.2
– variability threshold θv = 0.02