Gene Clustering Using Self-Organizing Maps and Particle Swarm Optimization
R. Earl Lewis, Jr.
CMSC 838 Presentation


CMSC 838T – Presentation: Talk Overview
- Paper: Gene Clustering Using Self-Organizing Maps and Particle Swarm Optimization
- Authors: Xiang Xiao, Ernst Dow, Russell Eberhart, Zina Miled, Robert Oppelt
- Overview of talk:
  - Motivation
  - Techniques
  - Evaluation
  - Related Work
  - Observations

CMSC 838T – Presentation: Motivation
- Problem the paper is trying to solve
  - Produce better clustering of gene expression datasets
  - Determine whether unsupervised learning with the self-organizing map (SOM) neural network algorithm produces better results when used with particle swarm optimization (PSO)
  - Verify the value of using a conscience factor with SOM
  - Confirm the benefit of a parallel implementation of resampling

CMSC 838T – Presentation: Motivation
- Why do we care?
  - Computational intelligence methods for gene clustering are essential to the analysis of gene expression data
  - Use of a conscience factor could reduce the number of training epochs and produce a more robust solution
  - A parallel implementation of resampling may improve execution times and allow robustness to be evaluated for larger datasets and an increased number of patterns

CMSC 838T – Presentation: Techniques
- Approach
  - Techniques to be studied:
    - SOM: projects high-dimensional datasets onto a one- or two-dimensional space; an unsupervised learning process.
    - Particle swarm optimization (PSO): an evolutionary computation method that updates each current solution using information obtained from the entire population of solutions.
    - Conscience: aims for a better approximation of the pattern distribution in the dataset by assigning each output neuron a bias, so that every component has the same chance to win.
    - Resampling: measures the robustness of a clustering result by re-clustering subsets containing 60% of the original data; the mean MERIT (lower is better) is measured after resampling 20 to 100 times.
  - Main intuition behind the approach: particle swarm optimization had not been used to cluster gene expression data in the past. How will its results stack up against other clustering algorithms such as hierarchical clustering, principal component analysis, genetic algorithms, and artificial neural networks?

CMSC 838T – Presentation: Techniques
- Algorithm: Self-Organizing Map (SOM)
  - Neural networks are computer programs designed to recognize patterns and learn in a manner loosely modeled on the human brain; they are used for prediction and classification, iteratively determining the best weights across input, hidden, and output layers.
  - SOMs were developed by Teuvo Kohonen in the early 1980s.
  - Colors are used to indicate clusters on the map.
  - Software: Viscovery, SOM_PAK (public domain).
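
For readers who want to see the mechanics, below is a minimal, self-contained sketch of online SOM training in Python. The grid size, decay schedules, Gaussian neighborhood, and all parameter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def train_som(data, grid_rows=8, grid_cols=8, epochs=50,
              lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online SOM: each input pulls its best-matching unit
    (and that unit's grid neighbors) toward itself, with the learning
    rate and neighborhood width shrinking over epochs."""
    rng = np.random.default_rng(seed)
    n_units, dim = grid_rows * grid_cols, data.shape[1]
    weights = rng.normal(size=(n_units, dim))
    # Grid coordinates of each output neuron, used for neighborhood distance.
    coords = np.array([(r, c) for r in range(grid_rows) for c in range(grid_cols)])
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        sigma = sigma0 * (1.0 - epoch / epochs) + 1e-3
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))          # neighborhood kernel
            weights += lr * h[:, None] * (x - weights)
    return weights

# Toy usage: 100 synthetic "profiles" with 6 time points, normalized to
# unit length (mirroring the paper's preprocessing), on an 8 x 8 map.
profiles = np.random.default_rng(1).normal(size=(100, 6))
profiles /= np.linalg.norm(profiles, axis=1, keepdims=True)
codebook = train_som(profiles)
```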

CMSC 838T – Presentation: Techniques
- Algorithm: Particle Swarm Optimization (PSO)
  - PSO is an evolutionary computation method for finding an optimal or near-optimal solution. Each particle has a set of attributes: its current velocity and position, the best position it has discovered, and the best position found by its neighbors. Velocities and positions are randomly initialized, then updated using:

    V_{i,n}(t+1) = w * V_{i,n}(t) + c1 * (G_i(t) - X_{i,n}(t)) + c2 * (l_{i,n}(t) - X_{i,n}(t))
    X_{i,n}(t+1) = X_{i,n}(t) + V_{i,n}(t+1)

    where w is the inertia weight, c1 and c2 are random numbers, G_i is the best position found so far among the particle's neighbors, and l_{i,n} is the best position discovered so far by the particle itself.
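
The update rules transcribe directly into code. The sketch below assumes c1 and c2 are drawn uniformly in [0, 2) at every step (the slide only says they are random numbers) and uses the whole swarm as each particle's neighborhood, so G_i becomes the global best; both are assumptions.

```python
import numpy as np

def pso_step(X, V, nbest, pbest, w=0.7, c_max=2.0, rng=None):
    """One PSO iteration per the update rules above:
    V <- w*V + c1*(G - X) + c2*(l - X), then X <- X + V.
    nbest plays the role of G_i, pbest the role of l_{i,n}."""
    if rng is None:
        rng = np.random.default_rng()
    c1 = rng.uniform(0.0, c_max, size=X.shape)
    c2 = rng.uniform(0.0, c_max, size=X.shape)
    V = w * V + c1 * (nbest - X) + c2 * (pbest - X)
    return X + V, V

# Toy usage: minimize sum-of-squares in 2-D with 20 particles.
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(20, 2))
V = np.zeros_like(X)
pbest, pbest_f = X.copy(), (X ** 2).sum(axis=1)
for _ in range(100):
    nbest = pbest[np.argmin(pbest_f)]          # global best as neighborhood best
    X, V = pso_step(X, V, nbest, pbest, rng=rng)
    f = (X ** 2).sum(axis=1)
    better = f < pbest_f
    pbest[better], pbest_f[better] = X[better], f[better]
```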

CMSC 838T – Presentation: Techniques
- Algorithm: Conscience
  - Conscience directs each component taking part in competitive learning toward having the same probability of winning. It is added to the SOM by assigning each output neuron a bias; an output neuron must overcome its own bias to win. The objective is a better approximation of the pattern distribution. An intermediary value Y_i is calculated for the i-th output neuron:

    Y_i = 1 if the i-th output neuron is the winner, 0 otherwise

    The bias factor P_i and the final bias B_i are then calculated as:

    P_i(new) = P_i(old) + B * (Y_i - P_i(old))
    B_i = C * (1/N - P_i)

    where N is the number of output neurons, and B and C are two user-selected parameters.
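
A short sketch of the conscience bookkeeping follows. The values of B and C are illustrative, and the choice to subtract the bias from the distance before picking the winner (DeSieno-style) is an assumption about how the bias enters the competition.

```python
import numpy as np

def conscience_winner(dists, P, B=1e-4, C=10.0):
    """Pick the winning output neuron after each neuron overcomes its
    bias, then update the win-frequency estimates P.
    dists: distance from the input to each neuron's weight vector.
    P: running estimate of each neuron's probability of winning."""
    N = len(P)
    bias = C * (1.0 / N - P)               # B_i = C * (1/N - P_i)
    winner = int(np.argmin(dists - bias))  # frequent winners are penalized
    Y = np.zeros(N)
    Y[winner] = 1.0                        # Y_i = 1 only for the winner
    P += B * (Y - P)                       # P_i(new) = P_i(old) + B*(Y_i - P_i(old))
    return winner, P
```

Neurons that win often accumulate P_i above 1/N, so their bias goes negative and their effective distance grows, handing wins to under-used neurons until all win probabilities approach 1/N.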

CMSC 838T – Presentation: Techniques
- Algorithm: Resampling
  - Patterns that are in the same cluster in the original clustering should also be in the same cluster when a subset of the data is re-clustered. This is measured by the MERIT function:

    MERIT = sqrt( Σ_i Σ_j (T_ij^(u) - T_ij)^2 ) / (number of patterns in the selected subset)

    where T_ij^(u) is an element of the original similarity matrix and T_ij is an element of the resampled similarity matrix, with T_ij = 1 if patterns i and j are in the same cluster and 0 otherwise. The smaller the MERIT value, the more robust the algorithm.
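
The MERIT computation is a few lines once the similarity matrices are built from cluster labels. In this sketch the placement of the square root and the division by the subset size follows the slide's formula as written, and the labels in the usage example are made up for illustration.

```python
import numpy as np

def similarity_matrix(labels):
    """T[i, j] = 1 if patterns i and j are in the same cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def merit(orig_labels, subset_idx, resampled_labels):
    """Disagreement between the original co-clustering of the subset's
    patterns and their co-clustering after re-clustering the subset."""
    T_orig = similarity_matrix(np.asarray(orig_labels)[subset_idx])
    T_res = similarity_matrix(resampled_labels)
    return np.sqrt(np.sum((T_orig - T_res) ** 2)) / len(subset_idx)

# Toy usage: original 3-cluster labeling of 10 patterns, re-clustered
# on a 6-pattern (60%) subset; lower MERIT means a more robust clustering.
orig = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
subset = [0, 2, 3, 5, 7, 9]
re_labels = [0, 1, 1, 2, 1, 0]
print(merit(orig, subset, re_labels))
```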

CMSC 838T – Presentation: Techniques
- Algorithm, applied to the yeast dataset (6554 gene expression profiles) and the rat dataset (4116 gene expression profiles)
- Steps of the algorithm used:
  - Stage 1: weights are trained using SOM
  - Stage 2: weights are optimized using PSO to refine the clustering
  - Stage 3: weights are trained using standalone PSO
  - Stage 4: for each of the yeast and rat datasets, the resampling process is repeated 20 times, and the average MERIT is calculated for each cluster size
  - Stage 5: MERIT analysis is used to select an appropriate cluster size; 8 x 8 was the best choice, offering the most robustness for its size
  - Stage 6: the results of the three methods are compared
  - Stage 7: weights are recalculated using SOM with conscience, and the PSO runs are repeated
  - Stage 8: resampling is repeated to recalculate MERIT for the conscience method
  - Stage 9: results with and without conscience are compared for the three methods
  - Stage 10: SOM is compared with other clustering methods

CMSC 838T – Presentation: Evaluation
- Experimental environment
  - Input datasets:
    - Yeast: 6554 gene expression profiles, each normalized to unit length so that comparisons are based on basic shape and relative heights
    - Rat: 4116 gene expression profiles, using the same methodology as the yeast dataset
  - Hardware platform: Linux cluster with 1 master node and 2 slave nodes
    - Master node: one Pentium III 1.2 GHz with 1024 MB RAM
    - Slave nodes: two Pentium III 1.2 GHz processors and 1024 MB RAM each
    - Resampling ran in a Parallel Virtual Machine (PVM) environment
  - Software environment: SOM and PSO implementations

CMSC 838T – Presentation: Evaluation
- Evaluation results: performance of techniques
  - For the rat dataset, the SOM and SOM/PSO clustering results were essentially the same.
  - For the yeast dataset, SOM/PSO produced better clustering results.
  - (Table comparing SOM, PSO, and SOM/PSO by cluster number, cluster size, and number of matches; the numeric values were not preserved in the transcript.)

CMSC 838T – Presentation: Evaluation
- Evaluation results: performance of techniques
  - For the rat and yeast datasets, the SOM and SOM-with-conscience algorithms were compared.
  - For both datasets, conscience reduced the number of epochs.
  - (Table of epoch counts, with columns for SOM without conscience and SOM with conscience and rows for the rat and yeast datasets; the numeric values were not preserved in the transcript.)

CMSC 838T – Presentation: Evaluation
- Evaluation results: performance of techniques
  - For the rat and yeast datasets, SOM and SOM/PSO with conscience showed improved MERIT.
  - For both datasets, conscience and the parallel implementation reduced execution time and improved robustness, as measured by MERIT during resampling.
  - (Table of MERIT values, with SOM and SOM/PSO columns grouped under "without conscience" and "with conscience" and rows for the rat and yeast datasets; the numeric values were not preserved in the transcript.)

CMSC 838T – Presentation: Related Work
- Similar / previous approaches
  - The authors compared the SOM approach with other techniques based on a referenced study covering 252 datasets.
  - SOM outperformed hierarchical clustering on 191 of those datasets, showing higher accuracy and greater robustness.
  - Hierarchical clustering algorithms produce a hierarchy of nested clusterings, starting with one cluster containing all items and then splitting.
  - The authors used a second referenced study to compare SOM with k-means, partitioning around medoids, and other methods; these produced similar results.

CMSC 838T – Presentation: Observations
- Observations
  - SOM is useful, but the value of combining it with PSO is questionable based on the results of this analysis: the MERIT for SOM/PSO was not better than for SOM alone.
  - Conscience is valuable as a competitive learning technique that reduces the number of epochs needed to produce a robust solution, and it allows larger datasets to be analyzed.
  - The authors did not do a good job of comparing the results documented in the paper with other techniques; they merely referenced other papers, which conducted more generic comparisons.
  - One statement made about the SOM comparisons in a referenced article ("Since the number of outputs was limited to the number of known clusters, and linear topology was chosen, the conscience probably would not have been useful.") is an example of weak analysis.
  - The use of SOM with PSO produced no significant improvement over the previous work.
  - The technique could be improved, and the analysis would be more convincing, if the authors significantly increased the number of datasets used for each comparison: the referenced article ran comparisons and formed hypotheses based on 252 datasets, versus 2 in this article's analysis.