
Clustering II

2 Finite Mixtures
Model data using a mixture of distributions
–Each distribution represents one cluster
–Each distribution gives the probabilities of attribute values in that cluster
Finite mixtures: a finite number of clusters
The individual distributions are usually normal
The distributions are combined using cluster weights
Each normal distribution is described by μ (its mean) and σ (its standard deviation)
For a single attribute with two clusters
–μA, σA for cluster A and μB, σB for cluster B
–An attribute value is drawn from cluster A with probability PA and from cluster B with probability PB
–Five parameters, μA, σA, μB, σB and PA (PB is not needed because PA + PB = 1), describe the attribute value distribution
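
A minimal sketch of this two-cluster, single-attribute mixture (plain Python; the parameter values below are made up for illustration, not taken from the slides):

import math
import random

def normal_pdf(x, mu, sigma):
    # Density of a normal distribution with mean mu and standard deviation sigma
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, mu_a, sigma_a, mu_b, sigma_b, p_a):
    # Five-parameter mixture density; P_B is implied by P_A + P_B = 1
    return p_a * normal_pdf(x, mu_a, sigma_a) + (1 - p_a) * normal_pdf(x, mu_b, sigma_b)

def sample(mu_a, sigma_a, mu_b, sigma_b, p_a):
    # Draw one attribute value: pick a cluster by its weight, then sample from it
    if random.random() < p_a:
        return random.gauss(mu_a, sigma_a)
    return random.gauss(mu_b, sigma_b)

# Example: cluster A around 50, cluster B around 65, with A slightly more likely
print(mixture_pdf(55.0, 50, 5, 65, 2, 0.6))
print(sample(50, 5, 65, 2, 0.6))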

3 EM Algorithm
EM = Expectation–Maximization
–Generalizes k-means to a probabilistic setting
Input: a collection of instances and the number of clusters, k
Output: the probabilities with which each instance belongs to each of the k clusters
Method:
–Start by guessing values for all the parameters of the k clusters (similar to guessing the centroids in k-means)
–Repeat
  E ('Expectation') step: calculate the cluster probabilities for each instance
  M ('Maximization') step: re-estimate the distribution parameters from those cluster probabilities
   Store the cluster probabilities as instance weights
   Estimate the parameters from the weighted instances
–Until the parameter estimates converge, i.e. they no longer improve how well the distributions fit the input data
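
A hedged sketch of EM for the two-cluster, one-attribute case from the previous slide (plain Python, made-up data; a real implementation would also monitor the log-likelihood to decide when to stop):

import math

def pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_clusters(xs, iterations=50):
    # Initial guesses for mu_A, sigma_A, mu_B, sigma_B and P_A
    # (analogous to guessing the centroids in k-means)
    mu_a, mu_b = min(xs), max(xs)
    sigma_a = sigma_b = (max(xs) - min(xs)) / 4 or 1.0
    p_a = 0.5
    for _ in range(iterations):
        # E step: cluster-A membership probability for each instance
        w = []
        for x in xs:
            a = p_a * pdf(x, mu_a, sigma_a)
            b = (1 - p_a) * pdf(x, mu_b, sigma_b)
            w.append(a / (a + b))
        # M step: re-estimate the parameters from the weighted instances
        n_a = sum(w)
        n_b = len(xs) - n_a
        mu_a = sum(wi * x for wi, x in zip(w, xs)) / n_a
        mu_b = sum((1 - wi) * x for wi, x in zip(w, xs)) / n_b
        sigma_a = math.sqrt(sum(wi * (x - mu_a) ** 2 for wi, x in zip(w, xs)) / n_a)
        sigma_b = math.sqrt(sum((1 - wi) * (x - mu_b) ** 2 for wi, x in zip(w, xs)) / n_b)
        p_a = n_a / len(xs)
    return mu_a, sigma_a, mu_b, sigma_b, p_a

data = [42, 43, 45, 45, 46, 46, 47, 48, 49, 51, 51, 52, 62, 62, 64, 64, 64, 65]
print(em_two_clusters(data))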

4 Incremental Clustering (Cobweb/Classit)
Input: a collection of instances
Output: a hierarchy of clusters
Method:
–Start with an empty root node of the tree
–Add the instances one by one
–If any of the existing leaves is a good 'host' for the incoming instance, form a cluster with it
  A good host has high category utility (next slide)
–If required, restructure the tree
Cobweb – for nominal attributes
Classit – for numerical attributes

5 Category Utility
Category utility:
  CU(C_1, C_2, …, C_k) = { Σ_l P[C_l] Σ_i Σ_j ( P[a_i = v_ij | C_l]^2 − P[a_i = v_ij]^2 ) } / k
It measures the advantage a clustering gives in predicting the attribute values of the instances in its clusters
–If knowing which cluster an instance belongs to does not help predict the values of its attributes, the cluster is not worth forming
The inner difference of squared probabilities, P[a_i = v_ij | C_l]^2 − P[a_i = v_ij]^2, measures this gain for attribute a_i taking value v_ij in cluster C_l
The division by k expresses the gain per cluster
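
A hedged Python sketch (not from the slides) of this computation for nominal attributes; a clustering is represented as a list of clusters, each cluster being a list of instances given as attribute → value dictionaries:

from collections import Counter

def category_utility(clusters):
    instances = [inst for cluster in clusters for inst in cluster]
    n = len(instances)
    attributes = instances[0].keys()

    # Baseline term: sum over attributes and values of P[a_i = v_ij]^2
    overall = 0.0
    for a in attributes:
        counts = Counter(inst[a] for inst in instances)
        overall += sum((c / n) ** 2 for c in counts.values())

    # Weighted within-cluster term minus the baseline, averaged per cluster
    total = 0.0
    for cluster in clusters:
        p_cluster = len(cluster) / n
        within = 0.0
        for a in attributes:
            counts = Counter(inst[a] for inst in cluster)
            within += sum((c / len(cluster)) ** 2 for c in counts.values())
        total += p_cluster * (within - overall)
    return total / len(clusters)

# Tiny example with two made-up clusters of weather-like instances
c1 = [{"outlook": "sunny", "windy": "false"}, {"outlook": "sunny", "windy": "true"}]
c2 = [{"outlook": "rainy", "windy": "false"}, {"outlook": "rainy", "windy": "false"}]
print(category_utility([c1, c2]))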

6 Weather Data with ID

ID  Outlook   Temperature  Humidity  Windy  Play
a   sunny     hot          high      false  no
b   sunny     hot          high      true   no
c   overcast  hot          high      false  yes
d   rainy     mild         high      false  yes
e   rainy     cool         normal    false  yes
f   rainy     cool         normal    true   no
g   overcast  cool         normal    true   yes
h   sunny     mild         high      false  no
i   sunny     cool         normal    false  yes
j   rainy     mild         normal    false  yes
k   sunny     mild         normal    true   yes
l   overcast  mild         high      true   yes
m   overcast  hot          normal    false  yes
n   rainy     mild         high      true   no

This is artificial data, so no natural clusters can be found (in particular, the instances do not separate into a cluster of yeses and a cluster of nos).

7 Trace of Cobweb
[Figure: snapshots 1–3 of the tree as instances a–f are added one at a time]
–There is no good host among the first five instances (a–e), so each is added as a separate leaf under the root
–When f arrives, e is the best host: the category utility of an e&f cluster is high because e and f are similar, so f is placed in a new cluster together with e

8 Trace of Cobweb (Contd)
[Figure: snapshots 4 and 5 of the tree as instances g and h are added]
Snapshot 4 (adding g):
–At the root, the e&f cluster is the best host (f and g are similar)
–At e&f there is no good host, so no new cluster is formed; g is simply added to the e&f cluster
Snapshot 5 (adding h):
–At the root, a is the best host and d is the runner-up
–Before h is inserted, the runner-up d is evaluated: the category utility of an a&d cluster is high, so d is merged with a to form a new cluster
–At a&d there is no good host, so no new cluster is formed; h is added to the a&d cluster

9 Trace of Cobweb (Contd)
[Figure: the final hierarchy after all fourteen instances a–n have been added]
For large data sets, growing the tree down to individual instances may lead to overfitting
A similarity threshold called the cutoff is used to suppress growth

10 Hierarchical Agglomerative Clustering
Input: a collection of instances
Output: a hierarchy of clusters
Method:
–Start with each individual instance as a cluster of its own
–Repeat: merge the 'closest' two clusters
–Until only one cluster remains
Ward's method: the closeness (proximity) of two clusters is defined as the increase in squared error that results when the two clusters are merged
The squared-error measure is used only for the local decision of which clusters to merge
–There is no global optimization
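
A hedged sketch of this procedure using SciPy's agglomerative clustering with Ward linkage (SciPy and NumPy are assumed here; they are not part of the slides, and the data are made up):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D data: two loose groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(10, 2)),
               rng.normal(5, 1, size=(10, 2))])

# Each instance starts as its own cluster; the pair whose merge gives the
# smallest increase in squared error is merged, repeatedly, until one cluster remains
Z = linkage(X, method="ward")

# Cut the resulting hierarchy into a chosen number of flat clusters (here 2)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)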

11 HCE
HCE (the Hierarchical Clustering Explorer) is a visual knowledge discovery tool for analysing and understanding multi-dimensional (>3D) data
It offers multiple, coordinated views of
–the input data
–the clustered input data
Many other, similar tools offer only a patchwork of statistics and graphics
HCE follows two fundamental statistical principles of exploratory data analysis:
–examine each dimension first, then find relationships among dimensions
–try graphical displays first, then find numerical summaries

12 GRID Principles
GRID – graphics, ranking and interaction for discovery
Two principles:
–Study 1D, then study 2D, then find features
–Ranking guides insight, statistics confirm
These principles help users organize their knowledge discovery process
Because of GRID, HCE is more than 'R + visualization'
GRID can also be used to derive scripts that organize exploratory data analysis in R (or a similar statistics package)

13 Rank-by-Feature Framework
A user-interface framework based on the GRID principles
The framework
–uses interactive information visualization techniques combined with statistical methods and data mining algorithms
–enables users to examine the input data in an orderly way
HCE implements the rank-by-feature framework
–That is, HCE uses existing statistical and data mining methods to analyse the input data and communicates the results using interactive information visualization techniques

14 Multiple Views in HCE
–Dendrogram
–Colour mosaic
–1D histograms
–2D scatterplots
–And more

15 Dendrogram Display
The results of HAC are shown visually using a dendrogram
A dendrogram is a tree
–with the data items at the terminal (leaf) nodes
–in which the distance from the root represents the similarity among leaf nodes
Two visual controls:
–the minimum similarity bar lets users adjust the number of clusters
–the detail cut-off bar lets users reduce clutter
[Figure: an example dendrogram over leaves A, B, C and D]
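
A hedged sketch (SciPy and Matplotlib assumed, not named in the slides) of drawing a dendrogram for a Ward-linkage clustering; the colour threshold here plays a role loosely analogous to HCE's minimum similarity bar:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))            # made-up 12 x 4 data matrix
Z = linkage(X, method="ward")

# Links above the threshold are drawn in a neutral colour, visually
# separating the clusters below the cut from the structure above it
dendrogram(Z, labels=[chr(ord("a") + i) for i in range(12)],
           color_threshold=0.5 * Z[:, 2].max())
plt.ylabel("merge distance")
plt.show()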

16 Colour Mosaic
The input data are shown using this view
A colour mosaic is a colour-coded visual display of tabular data
–Each cell of the table is painted in a colour that reflects the cell's value
–A colour mapping control adjusts how values are mapped to colours
Two variations:
–a layout similar to the original table
–a transpose of the original layout
HCE uses the transposed layout because data sets usually have more rows than columns
[Figure: the same table shown in the original layout and in the transposed layout]
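
A hedged sketch (Matplotlib assumed; an imitation of the idea, not HCE itself) of a colour mosaic in the transposed layout:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
table = rng.normal(size=(50, 8))        # made-up table: 50 rows, 8 columns

# Transposed layout, as in HCE: the table's columns become rows of the mosaic
plt.imshow(table.T, aspect="auto", cmap="RdYlBu_r")
plt.colorbar(label="cell value")        # a simple colour mapping legend
plt.xlabel("rows of the table")
plt.ylabel("columns of the table")
plt.show()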

17 1D Histogram Ordering
This data view is part of the rank-by-feature framework
The data belonging to one column (variable) are displayed as a histogram plus a box plot
–the histogram shows the scale and the skewness
–the box plot shows the centre and spread of the distribution
One such view is available for each variable in the data set
By studying individual variables in detail, users can select the variables to use in other visualizations
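
A hedged sketch (Matplotlib assumed) of this 1D view for a single made-up variable:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=10.0, size=300)   # made-up, skewed variable

fig, (ax_hist, ax_box) = plt.subplots(2, 1, sharex=True)
ax_hist.hist(x, bins=30)                         # scale and skewness
ax_box.boxplot(x, vert=False)                    # centre and spread
ax_hist.set_title("1D view of one variable")
plt.show()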

18 2D Scatter Plot Ordering
This data view is again part of the rank-by-feature framework
Three categories of 2D presentation are possible:
–axes obtained from Principal Component Analysis (linear or non-linear combinations of the original variables)
–axes taken directly from the original variables
–parallel coordinates
HCE uses the second option: plotting pairs of the original variables
Both the 1D and the 2D plots can be ranked according to user-selected criteria, such as the number of outliers
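
A hedged sketch (NumPy assumed) of ranking scatter plots of variable pairs; absolute correlation is used here as a stand-in ranking criterion, since the slides mention criteria such as the number of outliers:

from itertools import combinations
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(200, 5))                 # made-up data, 5 variables
data[:, 3] = data[:, 1] * 0.8 + rng.normal(scale=0.3, size=200)

# Score every pair of original variables
scores = []
for i, j in combinations(range(data.shape[1]), 2):
    r = np.corrcoef(data[:, i], data[:, j])[0, 1]
    scores.append((abs(r), i, j))

# Highest-scoring pairs first: the scatter plots most worth examining
for score, i, j in sorted(scores, reverse=True)[:3]:
    print(f"variables {i} and {j}: |r| = {score:.2f}")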