Recent Trends in Fuzzy Clustering: From Data to Knowledge Shenyang, August 2009


Agenda
Introduction: clustering, information granulation and paradigm shift
Key challenges in clustering
Fuzzy objective-based clustering
Knowledge-based augmentation of fuzzy clustering
Collaborative fuzzy clustering
Concluding comments

Clustering
Areas of research and applications: data analysis, modeling, structure determination
Google Scholar: 2,190,000 hits for “clustering” (as of August 6, 2009)

Clustering as a conceptual and algorithmic framework of information granulation
Data → information granules (clusters): abstraction of data
Formalism of: set theory (K-Means), fuzzy sets (FCM), rough sets, shadowed sets

Main categories of clustering
Graph-oriented and hierarchical (single linkage, complete linkage, average linkage, ...)
Objective function-based clustering
Diversity of formalisms and optimization tools (e.g., methods of Evolutionary Computing)

Key challenges of clustering
Data-driven methods
Selection of distance function (geometry of clusters)
Number of clusters
Quality of clustering results

The dichotomy and the shift of paradigm

Fuzzy Clustering: Fuzzy C-Means (FCM)
Given data x_1, x_2, …, x_N, determine its structure by forming a collection of information granules (fuzzy sets)
Objective function: minimize Q
Result: structure in data (partition matrix and prototypes)

Fuzzy Clustering: Fuzzy C-Means (FCM)
v_i: prototypes
U: partition matrix

FCM: optimization
Minimize Q subject to the constraints on the partition matrix, with respect to (a) the prototypes and (b) the partition matrix

Optimization: details
Partition matrix: the use of Lagrange multipliers
d_ik = ||x_k - v_i||^2
λ: Lagrange multiplier

Optimization: partition matrix (1)

Optimization: prototypes (2)
Gradient of Q with respect to v_s (Euclidean distance)

Fuzzy C-Means (FCM): An overview
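
The update formulas on the optimization slides did not survive in the transcript. As a minimal sketch, assuming the standard FCM formulation with squared Euclidean distance (Q is the sum over i, k of u_ik^m ||x_k - v_i||^2, with the columns of U summing to 1), the alternating optimization can be written as below. The function name `fcm`, the random initialization, the stopping rule, and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM sketch. X: (N, n) data, c: number of clusters, m > 1."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random partition matrix U (c x N) whose columns sum to 1.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # (a) prototypes: means of the data weighted by u_ik^m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # squared Euclidean distances d_ik = ||x_k - v_i||^2, shape (c, N)
        D = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D = np.maximum(D, 1e-12)  # guard against division by zero
        # (b) partition matrix: u_ik = 1 / sum_j (d_ik / d_jk)^(1/(m-1))
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Toy data (assumption): two compact groups around 0 and around 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V = fcm(X, c=2)
print(np.round(V, 2))
```

Each column of `U` holds the membership grades of one data point, so the columns sum to 1 by construction.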

Geometry of information granules (n = 1)
Figure panels: m = 1.2, m = 2.0, m = 3.5
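
The plots for this slide are not in the transcript. A small sketch, assuming the standard FCM membership formula and three hypothetical one-dimensional prototypes, shows how the fuzzification coefficient m changes the shape of the membership functions: values of m close to 1 give almost Boolean (set-like) clusters, while larger m flattens memberships toward 1/c away from the prototypes.

```python
import numpy as np

def memberships_1d(x, prototypes, m):
    """FCM membership grades of scalar points x with respect to 1-D prototypes."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(prototypes, dtype=float)
    # squared distances, shape (c, len(x)); the floor avoids division by zero
    d = np.maximum((x[None, :] - v[:, None]) ** 2, 1e-12)
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

prototypes = [1.0, 3.5, 6.0]      # hypothetical prototype positions
x = np.array([2.0])               # a point between the first two prototypes
grades = {m: memberships_1d(x, prototypes, m)[:, 0] for m in (1.2, 2.0, 3.5)}
for m, u in grades.items():
    print(m, np.round(u, 3))
```

For the point x = 2.0, the grade in the nearest cluster is close to 1 for m = 1.2 and drops below 0.6 for m = 3.5.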

Domain Knowledge: Category of knowledge-oriented guidance
Partially labeled data: some data are provided with labels (classes)
Proximity knowledge: some pairs of data are quantified in terms of their proximity (closeness)
Viewpoints: some structural information is provided
Context-based guidance: clustering realized in a certain context specified with regard to some attribute

Clustering with domain knowledge (Knowledge-based clustering)

Context-based clustering
To align the agenda of fuzzy clustering with the principles of fuzzy modeling, the following features are considered:
Active role of the designer [customization of the model]
The structural backbone of the model is fully reflective of relationships between information granules in the input and output space
Clustering: construct clusters in input space X
Context-based clustering: construct clusters in input space X given some context expressed in output space Y

Context-based clustering: Computing considerations
Computationally more efficient, well-focused, designer-guided clustering process
Figure: data → structure, compared with data and context → structure

Context-based clustering
Context-based clustering: construct clusters in input space X given some context expressed in output space Y
Context: a hint (piece of domain knowledge) provided by the designer, who actively impacts the development of the model

Context-based clustering: Context design
Context: a hint (piece of domain knowledge) provided by the designer, who actively impacts the development of the model. As such, context is imposed by the designer at the beginning
Realization of context: designer → focus → information granule (fuzzy set)
Sources of context: (a) the designer, and (b) clustering of scalar data in the output space
Context: a fuzzy set (set) formed in the output space

Context-based clustering: Modeling
Determine structure in input space given the output is high
Determine structure in input space given the output is medium
Determine structure in input space given the output is low
Figure label: input space (data)

Context-based clustering: examples
No context: find a structure of customer data [clustering]
Context: find a structure of customer data considering customers making weekly purchases in the range [$1,000, $3,000]
Context: find a structure of customer data considering customers making weekly purchases at the level of around $2,500
Context (compound): find a structure of customer data considering customers who are young and make significant weekly purchases
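
The contexts in these examples can be expressed as membership functions over the output variable. The sketch below is only an illustration: the Gaussian shape and its spread for "around $2,500", the ramp functions for "significant" and "young", and the use of the minimum to combine them into a compound context are all assumptions, not taken from the slides.

```python
import numpy as np

def interval_context(y, lo, hi):
    """Set-based context: 1 inside [lo, hi], 0 outside."""
    y = np.asarray(y, dtype=float)
    return ((y >= lo) & (y <= hi)).astype(float)

def around_context(y, center, spread):
    """Fuzzy context 'around center' modeled as a Gaussian (assumed shape)."""
    y = np.asarray(y, dtype=float)
    return np.exp(-0.5 * ((y - center) / spread) ** 2)

def ramp(y, start, end):
    """Linear ramp from 0 at `start` to 1 at `end` (works for start > end too)."""
    y = np.asarray(y, dtype=float)
    return np.clip((y - start) / (end - start), 0.0, 1.0)

purchases = np.array([500.0, 1000.0, 2500.0, 2900.0, 4000.0])  # hypothetical data
ages = np.array([22.0, 45.0, 30.0, 60.0, 25.0])                # hypothetical data

in_range = interval_context(purchases, 1000.0, 3000.0)
around_2500 = around_context(purchases, 2500.0, 300.0)
significant = ramp(purchases, 1000.0, 3000.0)   # assumed meaning of "significant"
young = ramp(ages, 50.0, 25.0)                  # 1 at age <= 25, 0 at age >= 50
compound = np.minimum(significant, young)       # AND modeled by the minimum
print(in_range, np.round(around_2500, 3), np.round(compound, 3))
```

The resulting grades play the role of the context memberships that restrict which data take part in clustering.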

Context-oriented FCM
Data: (x_k, target_k), k = 1, 2, …, N
Contexts: fuzzy sets W_1, W_2, …, W_p
w_jk = W_j(target_k): membership of the k-th data point in the j-th context
Context-driven partition matrix

Context-oriented FCM: Optimization flow
Objective function
Iterative adjustment of partition matrix and prototypes
Subject to constraint: U in U(W_j)
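
The formulas for the context-driven partition matrix are missing from the transcript. A minimal sketch, assuming the usual formulation of context-based FCM in which each column of U sums to the context membership w_jk instead of 1, differs from plain FCM only in how memberships are scaled. The function name, initialization, and fixed iteration count are illustrative assumptions.

```python
import numpy as np

def context_fcm(X, w, c, m=2.0, n_iter=100, seed=0):
    """Context-based FCM sketch for one context.

    X: (N, n) input data; w: (N,) context memberships w_jk of the data;
    columns of the returned U sum to w rather than to 1 (assumed constraint).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    U = rng.random((c, X.shape[0]))
    U = w[None, :] * U / U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # prototypes: weighted means; data with w = 0 have no influence
        V = (Um @ X) / np.maximum(Um.sum(axis=1, keepdims=True), 1e-12)
        D = np.maximum(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = D ** (-1.0 / (m - 1.0))
        # FCM memberships scaled by the context membership of each data point
        U = w[None, :] * inv / inv.sum(axis=0, keepdims=True)
    return U, V

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))          # hypothetical input data
w = rng.random(30)                    # hypothetical context memberships
U, V = context_fcm(X, w, c=3)
```

Running the same routine once per context W_1, …, W_p gives one set of prototypes per context.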

Viewpoints: definition
Description of an entity (concept) which is deemed essential in describing a phenomenon (system) and helpful in casting an overall analysis in a required setting
“External”, “reinforced” clusters

Viewpoints: definition
Figure labels: viewpoint (a, b); viewpoint (a, ?)


Viewpoints in fuzzy clustering
B: Boolean matrix characterizing structure: viewpoints and prototypes (induced by data)
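
One way to read the Boolean matrix B is as a mask over prototype coordinates: an entry of 1 means the coordinate is fixed by a viewpoint, and 0 means it is induced by the data. The sketch below assumes this reading; the matrix F of viewpoint values and the function name are hypothetical. A viewpoint (a, b) fixes both coordinates of a prototype, while a viewpoint (a, ?) fixes only the first.

```python
import numpy as np

def merge_viewpoints(V, B, F):
    """Take F[i, j] where B[i, j] == 1 (viewpoint), else the data-driven V[i, j]."""
    return np.where(np.asarray(B, dtype=bool), F, V)

V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # prototypes induced by data
B = np.array([[1, 1], [1, 0], [0, 0]])               # (a, b), (a, ?), no viewpoint
F = np.array([[10.0, 20.0], [10.0, 0.0], [0.0, 0.0]])
G = merge_viewpoints(V, B, F)
print(G)
```

In an FCM loop, such a merge would be applied after every prototype update so that the viewpoint coordinates never move.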


Labelled data and their description
Characterization in terms of membership degrees: F = [f_ik], i = 1, 2, …, c, k = 1, 2, …, N
Supervision indicator: b = [b_k], k = 1, 2, …, N

Augmented objective function (weighting coefficient > 0)
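
The formula itself is not in the transcript. One formulation found in the literature on partially supervised FCM adds to the usual objective a second term that penalizes memberships departing from the labels of the labelled data. The sketch below assumes that form with m = 2 and calls the weighting coefficient `alpha`; both the form and the name are assumptions rather than the slide's content.

```python
import numpy as np

def augmented_objective(X, V, U, F, b, alpha=1.0):
    """Assumed form: sum u_ik^2 d_ik + alpha * sum (u_ik - f_ik b_k)^2 d_ik.

    X: (N, n) data, V: (c, n) prototypes, U and F: (c, N) memberships,
    b: (N,) supervision indicator (1 = labelled, 0 = unlabelled).
    """
    b = np.asarray(b, dtype=float)
    D = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # squared distances
    unsupervised = (U ** 2 * D).sum()
    supervised = ((U - F * b[None, :]) ** 2 * D).sum()
    return float(unsupervised + alpha * supervised)

X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
V = np.array([[0.0, 0.0], [4.0, 0.0]])
U = np.array([[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]])
D = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
base = float((U ** 2 * D).sum())
q_match = augmented_objective(X, V, U, F=U, b=np.ones(3))        # labels agree with U
q_unlabelled = augmented_objective(X, V, U, F=U, b=np.zeros(3))  # no labelled data
```

When the memberships agree with the labels, the penalty vanishes and the value reduces to the plain FCM objective for m = 2.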

Proximity hints
Characterization in terms of proximity degrees: Prox(k, l), k, l = 1, 2, …, N
Supervision indicator matrix: B = [b_kl], k, l = 1, 2, …, N
Figure labels: Prox(k, l), Prox(s, t)

Proximity measure
Properties of proximity: (a) Prox(k, k) = 1; (b) Prox(k, l) = Prox(l, k)
Proximity induced by partition matrix U:
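
The induced-proximity formula is missing from the transcript. A definition commonly used in proximity-based fuzzy clustering, assumed here, is Prox(k, l) = sum over i of min(u_ik, u_il). It satisfies both listed properties whenever the columns of U sum to 1.

```python
import numpy as np

def induced_proximity(U):
    """Prox(k, l) = sum_i min(u_ik, u_il) for a (c, N) partition matrix U."""
    U = np.asarray(U, dtype=float)
    # pairwise minimum over all column pairs, shape (c, N, N), summed over clusters
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

U = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
P = induced_proximity(U)
print(P)
```

Data points with identical membership columns get proximity 1, and points belonging fully to different clusters get proximity 0.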

Augmented objective function (weighting coefficient > 0)

Two general development strategies
SELECTION OF A “MEANINGFUL” SUBSET OF INFORMATION GRANULES

Two general development strategies (1)
HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES (INFORMATION GRANULES OF HIGHER TYPE)
Figure labels: information granules Type-1; information granules Type-2

Two general development strategies (2)
HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES AND THE USE OF VIEWPOINTS
Figure labels: information granules Type-1; information granules Type-2; viewpoints

Two general development strategies (3)
HIERARCHICAL DEVELOPMENT OF INFORMATION GRANULES: A MODE OF SUCCESSIVE CONSTRUCTION

Information granules and their representatives
Represent v_k[ii] with the use of z_1, z_2, …, z_c
Figure labels: F_ii, F

Representation of fuzzy sets: two performance measures
Entropy measure
Reconstruction criterion (error)

Expressing performance through entropy measure

Reconstruction error where Requirement of “coverage” condition
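
The expression for the reconstruction error is not in the transcript. A sketch under a common definition, assumed here: each vector is encoded by its FCM memberships with respect to the prototypes z_1, …, z_c, decoded as the u^m-weighted mean of the prototypes, and the error is the sum of squared differences between the vectors and their reconstructions. The "coverage" condition mentioned on the slide is not modeled.

```python
import numpy as np

def reconstruction_error(X, Z, m=2.0):
    """Encode X (N, n) with prototypes Z (c, n), decode, and return the error."""
    D = np.maximum(((X[None, :, :] - Z[:, None, :]) ** 2).sum(axis=2), 1e-12)
    inv = D ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=0, keepdims=True)          # encoding: memberships (c, N)
    Um = U ** m
    X_hat = (Um.T @ Z) / Um.sum(axis=0)[:, None]      # decoding: weighted prototypes
    return float(((X - X_hat) ** 2).sum()), X_hat

Z = np.array([[0.0, 0.0], [4.0, 0.0]])                # hypothetical prototypes
err_exact, _ = reconstruction_error(Z.copy(), Z)      # vectors equal to prototypes
err_mid, X_hat = reconstruction_error(np.array([[1.0, 0.0]]), Z)
print(err_exact, err_mid, X_hat)
```

Vectors that coincide with a prototype are reconstructed almost exactly; other vectors generally are not, which is what the criterion measures.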

Optimization problem
Form a collection of prototypes Z = {z_1, z_2, …, z_c} such that entropy (or reconstruction error) is minimized while satisfying the coverage criterion: min_Z Q subject to the coverage criterion
Optimization of fuzzification coefficient (m): min_Z Q subject to m > 1 and …

Collaborative structure development (2)
Figure: phenomenon, process, system, … → data-1, data-2, …, data-P → information granules → information granules of higher type

Collaborative structure determination: Information granules of higher order
Figure: D[1], D[2], …, D[P] → prototypes → clustering → prototypes (higher order)

Determining correspondence between clusters (3)
Figure: clustering → prototypes (higher order), z_j
Select prototypes in D[1], D[2], …, D[p] associated with z_j with the highest degree of membership
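
The selection step can be sketched as follows, assuming membership is computed with the FCM formula relative to the higher-order prototypes. For each higher-order prototype z_j and each data site D[ii], the local prototype with the highest membership in the j-th higher-order cluster is picked. The function name and the toy prototypes are illustrative assumptions.

```python
import numpy as np

def associated_prototypes(local_sets, Z, m=2.0):
    """Return an index array A of shape (c, P).

    A[j, ii] is the index of the prototype in local_sets[ii] that has the
    highest membership in the cluster of higher-order prototype Z[j].
    """
    columns = []
    for V in local_sets:
        V = np.asarray(V, dtype=float)
        D = np.maximum(((V[None, :, :] - Z[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = D ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)   # (c, number of local prototypes)
        columns.append(U.argmax(axis=1))           # best local prototype per z_j
    return np.stack(columns, axis=1)

Z = np.array([[0.0, 0.0], [5.0, 5.0]])             # higher-order prototypes
D1 = np.array([[0.0, 0.0], [5.0, 5.0]])            # prototypes found in D[1]
D2 = np.array([[5.2, 4.9], [0.1, -0.1]])           # prototypes found in D[2]
A = associated_prototypes([D1, D2], Z)
print(A)
```

Row j of `A` lists, site by site, the family of local prototypes associated with z_j.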

Determining correspondence between clusters (4)
Figure labels: v_i[ii], z_j, D[ii]
Prototype i_0 associated with prototype z_j

Family of associated prototypes
Prototype i_1 in D[1] associated with prototype z_j
Prototype i_2 in D[2] associated with prototype z_j
…
Prototype i_p in D[p] associated with prototype z_j

From numeric prototypes to granular prototypes
Individual coordinate of the associated prototypes: a_1, a_2, …, a_p, each with an associated degree
Information granule: R → [0, 1]

The principle of justifiable granularity: Interval representation
Figure: values a_1, a_2, …, a_p with their associated degrees on a [0, 1] axis; labels b, d and a_0


The principle of justifiable granularity: optimization criterion
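
The optimization criterion is not in the transcript. One common way to state the principle, assumed here, is to pick each bound of the interval so that the product of coverage (the total weight of the values falling between the numeric representative and the bound) and specificity (a decreasing function of the interval length) is as large as possible. The median as representative, the exponential specificity, and the parameter `alpha` are assumptions.

```python
import numpy as np

def justifiable_interval(a, mu, alpha=1.0):
    """Return (lower, a0, upper) for values a with weights mu."""
    a = np.asarray(a, dtype=float)
    mu = np.asarray(mu, dtype=float)
    a0 = float(np.median(a))                 # numeric representative (assumption)

    def best_bound(candidates):
        scores = []
        for bound in candidates:
            lo, hi = min(a0, bound), max(a0, bound)
            coverage = mu[(a >= lo) & (a <= hi)].sum()
            specificity = np.exp(-alpha * abs(bound - a0))
            scores.append(coverage * specificity)
        return float(candidates[int(np.argmax(scores))])

    # each bound is searched among the data values on its side of a0
    lower = best_bound(a[a <= a0])
    upper = best_bound(a[a >= a0])
    return lower, a0, upper

lower, a0, upper = justifiable_interval([1.0, 2.0, 2.1, 2.2, 9.0], np.ones(5))
print(lower, a0, upper)
```

In this toy case the interval keeps the three values clustered near 2 and leaves out 1.0 and 9.0, because including them would cost more specificity than they add in coverage.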

Hyperbox prototypes
Figure labels: H_i, H_j

Interval-valued fuzzy sets and granular prototypes
Figure labels: H_i, H_j, x

Interval-valued fuzzy sets and granular prototypes
Figure labels: v_i, x
Bounds of distances determined coordinate-wise

Interval-valued fuzzy sets: membership function
Upper bound, lower bound
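
The bound formulas are missing from the transcript. A sketch of one way to obtain them, assuming hyperbox prototypes given by coordinate-wise lower and upper corners and the FCM membership formula: the smallest and largest squared distances from x to each box are computed coordinate-wise, and the membership in box i is bounded from above by using its smallest distance against the largest distances to the other boxes, and from below the other way round. All names are illustrative.

```python
import numpy as np

def box_distance_bounds(x, lo, hi):
    """Smallest and largest squared Euclidean distance from x to the box [lo, hi]."""
    nearest = np.clip(x, lo, hi)                         # closest point of the box
    d_min = ((x - nearest) ** 2).sum()
    farthest = np.maximum(np.abs(x - lo), np.abs(x - hi))
    d_max = (farthest ** 2).sum()
    return d_min, d_max

def interval_membership(x, boxes, m=2.0):
    """Lower and upper membership bounds of x for each hyperbox prototype."""
    x = np.asarray(x, dtype=float)
    bounds = np.array([box_distance_bounds(x, np.asarray(lo, dtype=float),
                                           np.asarray(hi, dtype=float))
                       for lo, hi in boxes])
    d_min = np.maximum(bounds[:, 0], 1e-12)
    d_max = np.maximum(bounds[:, 1], 1e-12)
    p = 1.0 / (m - 1.0)
    c = len(boxes)
    lower, upper = np.empty(c), np.empty(c)
    for i in range(c):
        others = np.arange(c) != i
        upper[i] = 1.0 / (1.0 + ((d_min[i] / d_max[others]) ** p).sum())
        lower[i] = 1.0 / (1.0 + ((d_max[i] / d_min[others]) ** p).sum())
    return lower, upper

x = [1.0, 0.0]
points = [([0.0, 0.0], [0.0, 0.0]), ([4.0, 0.0], [4.0, 0.0])]     # degenerate boxes
boxes = [([-0.5, -0.5], [0.5, 0.5]), ([3.0, -1.0], [5.0, 1.0])]   # proper boxes
lo_pt, up_pt = interval_membership(x, points)
lo_box, up_box = interval_membership(x, boxes)
print(lo_box, up_box)
```

When every box shrinks to a point, the two bounds coincide with the ordinary FCM membership grades.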

Collaborative structure determination: Structure refinement
Feedback and structure refinement

Collaborative structure determination: Structure refinement
Iterate:
Clustering at the local level
Sharing findings and clustering at the higher (global) level
Assessment of quality of clusters, per cluster i as a function of U[ii], in light of the global structure formed at the higher level
Refinement of clustering
Until termination criterion satisfied

Concluding comments
Paradigm shift from data-based clustering to knowledge-based clustering
Accommodation of knowledge in augmented objective functions
Emergence of type-2 (higher type) information granules when working with collaborative clustering