Adaptive Clustering Of Incomplete Data Using Neuro-fuzzy Kohonen Network

Outline
- Data with gaps clustering on the basis of a neuro-fuzzy Kohonen network
- Adaptive algorithm for probabilistic fuzzy clustering
- Adaptive probabilistic fuzzy clustering algorithm for data with missing values
- Adaptive algorithm for possibilistic fuzzy clustering
- Adaptive algorithm for possibilistic fuzzy clustering of data with missing values

Introduction

The clustering problem for multivariate observations is often encountered in applications connected with Data Mining and Exploratory Data Analysis. The conventional approach to solving these problems requires that each observation belong to only one cluster, but there are many situations in which a feature vector can belong to several classes with different levels of probability or possibility. This situation is the subject of fuzzy cluster analysis, which is intensively developing today. In many practical Data Mining tasks, including clustering, data sets may contain gaps: values that, for whatever reason, are missing. More effective in this situation are approaches based on the mathematical apparatus of Computational Intelligence, first of all artificial neural networks and various modifications of the classical fuzzy c-means (FCM) method.

Processing of data with gaps

Data Set With Gaps → Filling in missing values using a specialized algorithm → Data Set → Clustering

Algorithms for filling in missing values

Data with gaps clustering on the basis of a neuro-fuzzy Kohonen network

The problem of clustering multivariate observations often occurs in applications associated with Data Mining. The traditional approach requires that each observation relate to only one cluster, although a more realistic situation is that the processed feature vector may belong to more than one class with different levels of probability or possibility [Bezdek, 1981; Hoeppner, 1999; Xu, 2009]. The known approaches and solutions are efficient only when the original data set has batch form and does not change during the analysis. However, there is a fairly wide class of problems in which the data are fed for processing sequentially, in on-line mode, as happens when training Kohonen self-organizing maps [Kohonen, 1995]. In this case it is not known beforehand which of the processed vectors contain missing values. This work is devoted to the on-line clustering of such data on the basis of the Kohonen neural network, adapted for operation in the presence of overlapping classes.

Adaptive algorithm for probabilistic fuzzy clustering

The input is the "object-property" table containing N observations described by n features:

$$X = \begin{pmatrix}
x_{11} & \cdots & x_{1j} & \cdots & x_{1n} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{ij} & \cdots & x_{in} \\
\vdots &        & \vdots &        & \vdots \\
x_{k1} & \cdots & x_{kj} & \cdots & x_{kn} \\
\vdots &        & \vdots &        & \vdots \\
x_{N1} & \cdots & x_{Nj} & \cdots & x_{Nn}
\end{pmatrix} \qquad (1)$$

The steps of the algorithm: introducing the objective function of clustering and deriving from it the update relations for the membership levels and the prototypes $w_q$ (3)-(5).
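Since equations (3)-(5) are not reproduced in the transcript, the following is a minimal sketch of one on-line step of the procedure, assuming the standard probabilistic FCM objective $\sum_k \sum_q u_q^\beta(k)\,\lVert x(k)-w_q\rVert^2$ with the sum-to-one constraint; the function name, the learning rate eta and the fuzzifier beta are illustrative choices, not the authors' notation.

```python
import numpy as np

def adaptive_fcm_step(x, W, beta=2.0, eta=0.1):
    """One on-line step of adaptive probabilistic fuzzy clustering (sketch).

    x : (n,) new observation; W : (m, n) current cluster prototypes.
    Returns the updated prototypes and the membership levels u (summing to 1).
    """
    d2 = np.sum((W - x) ** 2, axis=1)      # squared distances to all prototypes
    d2 = np.maximum(d2, 1e-12)             # guard against division by zero
    u = d2 ** (1.0 / (1.0 - beta))         # FCM-style memberships ...
    u /= u.sum()                           # ... normalized to satisfy sum-to-one
    W = W + eta * (u[:, None] ** beta) * (x - W)  # membership-weighted step toward x
    return W, u
```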

Adaptive probabilistic fuzzy clustering algorithm for data with gaps

If the data array contains gaps, the approach discussed above must be modified accordingly. For example, [Hathaway, 2001] proposed a modification of the FCM procedure based on the partial distance strategy (PDS FCM).
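A minimal sketch of the partial distance itself, assuming missing components are marked with NaN: only the observed components enter the sum, and the result is rescaled by n over the number of observed components so that it stays comparable with the full Euclidean metric, as in the PDS of [Hathaway, 2001].

```python
import numpy as np

def partial_distance2(x, w, delta):
    """Squared partial distance (PD) between observation x and prototype w.

    x : (n,) observation that may contain gaps (NaN where missing),
    w : (n,) prototype, delta : (n,) boolean mask of observed components.
    """
    n = x.shape[0]
    obs = np.asarray(delta, dtype=bool)
    if not obs.any():
        raise ValueError("observation has no observed components")
    d2 = np.sum((x[obs] - w[obs]) ** 2)    # sum only over observed components
    return (n / obs.sum()) * d2            # rescale to full-dimension scale
```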


1. Partition of the original data set into classes
2. Centered data set
3. Standardized data set
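A minimal sketch of the centering and standardization steps; treating the gaps as NaN and skipping them when computing the statistics is an assumption here, since the slide does not specify how missing entries are handled at this stage.

```python
import numpy as np

def center_standardize(X):
    """Center and standardize each feature, ignoring missing (NaN) entries."""
    mu = np.nanmean(X, axis=0)       # per-feature mean over observed values
    sigma = np.nanstd(X, axis=0)     # per-feature standard deviation
    sigma[sigma == 0] = 1.0          # keep constant features from dividing by zero
    return (X - mu) / sigma
```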

The objective function of clustering (6) is modified for data with gaps by replacing the Euclidean metric with the partial distance, which gives the objective function of clustering (7).

Adaptive probabilistic fuzzy clustering algorithm for data with gaps

Minimization of (7) leads to the recurrent relations (8) … (9).

Adaptive probabilistic fuzzy clustering algorithm for data with gaps

(10), (11): the prototype update has the form of Kohonen's "winner takes more" rule.
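Equations (8)-(11) are not reproduced in the transcript, so the sketch below only illustrates the general structure of such a step: memberships computed from the partial distance, and a prototype update in the spirit of Kohonen's "winner takes more" rule, where every neuron learns with a step proportional to its membership level and only the observed components are adjusted. It reuses partial_distance2 from the sketch above.

```python
import numpy as np

def adaptive_fcm_gaps_step(x, W, beta=2.0, eta=0.1):
    """One on-line step of adaptive probabilistic fuzzy clustering with gaps.

    x : (n,) observation with NaN marking the missing values,
    W : (m, n) current prototypes; uses partial_distance2 defined earlier.
    """
    delta = ~np.isnan(x)                             # observed-component mask
    d2 = np.array([partial_distance2(x, w, delta) for w in W])
    d2 = np.maximum(d2, 1e-12)
    u = d2 ** (1.0 / (1.0 - beta))
    u /= u.sum()                                     # probabilistic constraint
    # "Winner takes more": all neurons move toward x, weighted by membership,
    # but only in the observed coordinates.
    W[:, delta] += eta * (u[:, None] ** beta) * (x[delta] - W[:, delta])
    return W, u
```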

Adaptive algorithm for possibilistic fuzzy clustering

The main disadvantage of probabilistic algorithms is connected with the constraints on the membership levels, whose sum has to equal unity. This has led to the creation of possibilistic fuzzy clustering algorithms [Krishnapuram, 1993]. In possibilistic clustering algorithms the objective function has the form (12), where the scalar parameter determines the distance at which the membership level equals 0.5 (13).
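A sketch of the membership update that follows from the possibilistic objective of [Krishnapuram, 1993]; mu here stands for the scalar distance parameter mentioned above, so the membership equals 0.5 exactly when the squared distance to the prototype equals mu.

```python
def possibilistic_membership(d2, mu, beta=2.0):
    """Possibilistic membership level for squared distance d2 and scale mu.

    Unlike the probabilistic case, memberships are computed per cluster
    and no longer sum to one across clusters.
    """
    return 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))
```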

Adaptive algorithm for possibilistic fuzzy clustering of data with gaps

Adopting the partial distance (PD) instead of the Euclidean metric, we can write the objective function as (14) and then solve the resulting system of equations (15), (16). Thus, the process of possibilistic fuzzy clustering of data with gaps can also be realized by a neuro-fuzzy Kohonen network.
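Equations (14)-(16) are likewise not reproduced, so the sketch below only shows the overall structure of one on-line possibilistic step for data with gaps, reusing partial_distance2 and possibilistic_membership from the sketches above.

```python
import numpy as np

def adaptive_possibilistic_gaps_step(x, W, mu, beta=2.0, eta=0.1):
    """One on-line possibilistic clustering step for an observation with gaps.

    x : (n,) observation with NaN gaps, W : (m, n) prototypes,
    mu : (m,) per-cluster scale parameters.
    """
    delta = ~np.isnan(x)
    d2 = np.array([partial_distance2(x, w, delta) for w in W])
    u = possibilistic_membership(d2, mu, beta)       # no sum-to-one constraint
    W[:, delta] += eta * (u[:, None] ** beta) * (x[delta] - W[:, delta])
    return W, u
```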

Results

Iris data set: a benchmark data set in pattern recognition analysis, freely available at the UCI Machine Learning Repository. It contains three clusters (types of Iris plants: Iris Setosa, Iris Versicolor and Iris Virginica) of 50 data points each, described by 4 features: sepal length, sepal width, petal length and petal width. The class Iris Setosa is linearly separable from the other two, which are not linearly separable from each other.

Validation: cluster validity refers to the question of whether a given fuzzy partition fits the data at all. The clustering algorithm always finds the best fit for a fixed number of clusters and the parameterized cluster shapes, but this does not mean that even the best fit is meaningful: the number of clusters might be wrong, or the cluster shapes might not correspond to the actual groups in the data, if the data can be grouped in a meaningful way at all. Two main approaches to determining the appropriate number of clusters in data can be distinguished:

Results

Validity measures:
- Partition Coefficient (PC): measures the amount of "overlapping" between clusters. The optimal number of clusters is at the maximum value.
- Classification Entropy (CE): measures only the fuzziness of the cluster partition, similarly to the Partition Coefficient.
- Partition Index (SC): the ratio of the sum of compactness and separation of the clusters; a sum of individual cluster validity measures normalized through division by the fuzzy cardinality of each cluster. A lower value of SC indicates a better partition.
- Separation Index (S): in contrast to the Partition Index (SC), the Separation Index uses a minimum-distance separation for partition validity.
- Xie and Beni's Index (XB): quantifies the ratio of the total variation within clusters to the separation of clusters. The optimal number of clusters should minimize the value of the index.
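Two of these indices are easy to compute directly from the membership matrix U and the prototypes W; the sketch below uses their standard definitions (the Partition Index SC is omitted because its exact normalization varies between authors).

```python
import numpy as np

def partition_coefficient(U):
    """Partition Coefficient: mean squared membership; higher means crisper."""
    return np.sum(U ** 2) / U.shape[0]

def xie_beni(X, W, U, beta=2.0):
    """Xie-Beni index: total within-cluster variation over the minimum
    squared distance between prototypes; lower is better."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # (N, m) distances
    compactness = np.sum((U ** beta) * d2)
    separation = min(np.sum((wi - wj) ** 2)
                     for i, wi in enumerate(W)
                     for j, wj in enumerate(W) if i != j)
    return compactness / (X.shape[0] * separation)
```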

Results

Clustering algorithm                                                      PC       SC           XB
Adaptive algorithm for probabilistic fuzzy clustering                     0.2547   2.7039e-004  0.0169
Adaptive probabilistic fuzzy clustering algorithm for data with gaps      0.3755   6.4e-003     0.1842
Adaptive algorithm for possibilistic fuzzy clustering                     0.2691   4.4064e-005  0.0122
Adaptive algorithm for possibilistic fuzzy clustering of data with gaps   0.2854   5.103e-003   0.4571
Fuzzy c-means                                                             0.5036   1.6255       0.191
Gustafson-Kessel                                                          0.2755   4.7481e-002  0.5717
Gath-Geva                                                                 0.2542   4.941e-005   3.3548