COMP3740 CR32: Knowledge Management and Adaptive Systems


COMP3740 CR32: Knowledge Management and Adaptive Systems Unsupervised ML: Association Rules, Clustering Eric Atwell, School of Computing, University of Leeds (including re-use of teaching resources from other sources, esp. Knowledge Management by Stuart Roberts, School of Computing, University of Leeds)

Today's Objectives (I showed how to build Decision Trees and Classification Rules last lecture.) To compare classification rules with association rules. To describe briefly the algorithm for mining association rules. To describe briefly algorithms for clustering. To understand the difference between Supervised and Unsupervised Machine Learning.

Association Rules The RHS of classification rules (from decision trees) always involves the same attribute (the class). More generally, we may wish to look for rule-based patterns involving any attributes on either side of the rule. These are called association rules. For example, “Of the people who do not share files, whether or not they use a scanner depends on whether they have been infected before or not”

Learning Association Rules The search space for association rules is much larger than for decision trees. To reduce the search space we consider only rules with large 'coverage' (many instances match the LHS). The basic algorithm is: generate all rules with coverage greater than some agreed minimum coverage; then select from these only the rules with accuracy greater than some agreed minimum accuracy (eg 100%!).

Rule generation First, find all combinations of attribute-value pairs with at least a pre-specified minimum coverage. These are called item-sets. Next, generate all possible rules from the item sets; compute the coverage and accuracy of each rule; prune away rules with accuracy below the pre-defined minimum.

Generating item sets (minimum coverage = 3)
"1-item" item sets: F = yes; S = yes; S = no; I = yes; I = no; Risk = High
"2-item" item sets: F = yes, S = yes; F = yes, I = no; F = yes, Risk = High; I = no, Risk = High
"3-item" item sets: F = yes, I = no, Risk = High
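
As a concrete illustration of this item-set step, here is a minimal Python sketch. The lecture's actual table of instances (file-sharing F, virus scanner S, infected-before I, Risk) is not reproduced in this transcript, so the seven rows below are hypothetical and their coverage counts will not exactly match the slides; the procedure is the point.

```python
from itertools import combinations

# Hypothetical instances using the lecture's attributes (F = shares files,
# S = uses a virus scanner, I = infected before, Risk).  The real table is not
# in this transcript, so the coverage counts will not match the slides exactly.
data = [
    {"F": "yes", "S": "yes", "I": "no",  "Risk": "High"},
    {"F": "yes", "S": "yes", "I": "no",  "Risk": "High"},
    {"F": "yes", "S": "no",  "I": "no",  "Risk": "High"},
    {"F": "yes", "S": "yes", "I": "yes", "Risk": "High"},
    {"F": "yes", "S": "no",  "I": "yes", "Risk": "Low"},
    {"F": "no",  "S": "yes", "I": "yes", "Risk": "Low"},
    {"F": "no",  "S": "no",  "I": "no",  "Risk": "Low"},
]
MIN_COVERAGE = 3

def coverage(itemset, data):
    """Number of instances matching every attribute=value pair in the item-set."""
    return sum(all(row[a] == v for a, v in itemset) for row in data)

def frequent_itemsets(data, min_cov):
    """Brute-force version of step 1: every combination of attribute=value
    pairs whose coverage reaches the agreed minimum."""
    items = sorted({(a, v) for row in data for a, v in row.items()})
    frequent, k = [], 1
    while True:
        level = [c for c in combinations(items, k)
                 if len({a for a, _ in c}) == k          # one value per attribute
                 and coverage(c, data) >= min_cov]
        if not level:        # supersets of infrequent sets cannot be frequent
            break
        frequent.extend(level)
        k += 1
    return frequent

for iset in frequent_itemsets(data, MIN_COVERAGE):
    print(iset, "coverage", coverage(iset, data))
```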

Example rules generated (minimum coverage = 3). Rules from F = yes: IF _ THEN F = yes (coverage 5, accuracy 5/7)

Example rules generated (minimum coverage = 3). Rules from F = yes, S = yes:
IF S = yes THEN F = yes (coverage 3, accuracy 3/4)
IF F = yes THEN S = yes (coverage 3, accuracy 3/5)
IF _ THEN F = yes and S = yes (coverage 3, accuracy 3/7)

Example rules generated (minimum coverage = 3). Rules from F = yes, I = no, Risk = High:
IF F = yes and I = no THEN Risk = High (3/3)
IF F = yes and Risk = High THEN I = no (3/4)
IF I = no and Risk = High THEN F = yes (3/3)
IF F = yes THEN I = no and Risk = High (3/5)
IF I = no THEN Risk = High and F = yes (3/4)
IF Risk = High THEN I = no and F = yes (3/4)
IF _ THEN Risk = High and I = no and F = yes (3/7)

If we require 100% accuracy… Only two rules qualify: IF I = no and Risk = High THEN F = yes; IF F = yes and I = no THEN Risk = High. (Note: the second happens to be a rule with the classificatory attribute on the RHS; in general this need not be the case.)
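
Continuing the earlier sketch (it reuses the hypothetical data, coverage() and frequent_itemsets() helpers defined there): every way of splitting a frequent item-set into an IF-part and a THEN-part gives a candidate rule, and we keep only the rules whose accuracy reaches the agreed minimum, here 100%.

```python
from itertools import combinations   # data, coverage() and frequent_itemsets()
                                     # are reused from the item-set sketch above

def rules_from_itemset(itemset, data, min_accuracy=1.0):
    """Split an item-set into LHS and RHS in every possible way.  A rule's coverage
    is the count of the whole item-set; its accuracy is that count divided by the
    count of instances matching the LHS alone (an empty LHS means 'IF _ THEN ...')."""
    whole = coverage(itemset, data)
    rules = []
    for r in range(len(itemset)):                    # r = number of items on the LHS
        for lhs in combinations(itemset, r):
            rhs = tuple(item for item in itemset if item not in lhs)
            accuracy = whole / coverage(lhs, data)   # coverage(()) == len(data)
            if accuracy >= min_accuracy:
                rules.append((lhs, rhs, whole, accuracy))
    return rules

for iset in frequent_itemsets(data, MIN_COVERAGE):
    for lhs, rhs, cov, acc in rules_from_itemset(iset, data):
        print(f"IF {dict(lhs) or '_'} THEN {dict(rhs)}  (coverage {cov}, accuracy {acc:.0%})")
```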

Clustering v Classification Decision trees and Classification Rules assign instances to pre-defined classes. Association rules don't group instances into classes, but find links between features / attributes. Clustering is for discovering 'natural' groups (classes) which arise from the raw (unclassified) data. Analysis of clusters may lead to knowledge regarding the underlying mechanism for their formation.

Example: what clusters can you see? Here is an example from SQL Server documentation. The table doesn’t immediately tell us much.

Example: 3 clusters, and an interesting gap.

You can try to "explain" the clusters. Young folk are looking for excitement perhaps, somewhere their parents haven't visited? Older folk visit Canada more. Why? Particularly interesting is the gap: probably the age where they can't afford expensive holidays while educating the children. The client (domain expert, eg travel agent) may "explain" the clusters better, once shown them.

Hierarchical clustering: dendrogram

N-dimensional data Consider point-of-sale data: item purchased, price, profit margin, promotion, store, shelf-length, position in store, date/time, customer postcode. Some of these are numeric attributes (price, profit margin, shelf-length, date/time); some are nominal (item purchased, store, position in store, customer postcode).

To cluster, we need a Distance function. For some clustering methods (eg K-means) we need to define the distance between two facts, using their vectors. Euclidean distance is usually fine: d(x, y) = √( Σ_i (x_i − y_i)² ), although we usually have to normalise the vector components to get good results.
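
A minimal sketch of such a distance function, assuming min-max normalisation of each component (the slide shows the Euclidean formula as an image, so the exact normalisation used in the lecture is not visible in this transcript):

```python
import numpy as np

def normalise(X):
    """Scale each vector component (column) into [0, 1] so that no attribute
    dominates the distance just because of its units."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero for constant columns
    return (X - lo) / span

def euclidean(a, b):
    """d(a, b) = sqrt( sum_i (a_i - b_i)^2 )"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# e.g. (price, profit margin %) for three sales
X = normalise([[4.65, 15.0], [1.20, 40.0], [3.10, 22.0]])
print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))
```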

Vector representation Represent each instance (fact) as a vector: one dimension for each numeric attribute; some nominal attributes may be replaced by numeric attributes (eg postcode by 2 grid coordinates); other nominal attributes are replaced by N binary dimensions, one for each value that the attribute can take (eg 'female' becomes <1, 0>, 'male' becomes <0, 1>). Treatment of nominal features is just like a line in an ARFF file, or the keyword weights that index documents in IR (e.g. Google). Cluster analysis relies on building vectors. These are similar to the vectors we built for describing documents, but now they have numeric and nominal attributes mixed up (we mainly thought of the IR vector model as being homogeneous: each element represented a particular word from the term dictionary). Example vector: (0,0,0,0,1,0,0,4.65,15,0,0,1,0,0,0,0,1,…

Vector representation (annotated): price is £4.65; promotion is no. 3 of 6; there are 7 different products, and this sale is for product no. 5; profit margin is 15%; store is no. 2 of many; ... The vectors may be very long - we won't store them like this! Example vector: (0,0,0,0,1,0,0,4.65,15,0,0,1,0,0,0,0,1,…
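
A sketch of how such a mixed vector might be assembled: numeric attributes are copied as they are, and each nominal attribute becomes one binary dimension per possible value (one-hot). The attribute schema below (7 products, 6 promotion codes, 3 stores) is hypothetical, chosen only to echo the slide's example vector.

```python
def one_hot(value, values):
    """One binary dimension per possible value of a nominal attribute."""
    return [1.0 if value == v else 0.0 for v in values]

# Hypothetical schema echoing the slide's example vector
# (0,0,0,0,1,0,0, 4.65, 15, 0,0,1, ...): 7 products one-hot, then price,
# then profit margin %, then a 6-value promotion code one-hot, then store, ...
PRODUCTS   = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
PROMOTIONS = ["promo1", "promo2", "promo3", "promo4", "promo5", "promo6"]
STORES     = ["s1", "s2", "s3"]            # "... of many" in reality

def encode(fact):
    """Flatten one point-of-sale fact into a single numeric vector."""
    return (one_hot(fact["product"], PRODUCTS)
            + [float(fact["price"]), float(fact["profit_margin"])]
            + one_hot(fact["promotion"], PROMOTIONS)
            + one_hot(fact["store"], STORES))

fact = {"product": "p5", "price": 4.65, "profit_margin": 15,
        "promotion": "promo3", "store": "s2"}
print(encode(fact))   # [0,0,0,0,1,0,0, 4.65, 15.0, 0,0,1,0,0,0, 0,1,0]
```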

Cluster Algorithms Now we run an algorithm to identify clusters: n-dimensional regions where facts are dense. There are very many clustering algorithms, each suitable for different circumstances. We briefly describe k-means iterative optimisation, which yields k clusters, and then an alternative incremental method which yields a dendrogram or hierarchy of clusters.

Algorithm 1: K-means
1. Decide on the number, k, of clusters you want.
2. Select k vectors at random.
3. Using the distance function, form groups by assigning each remaining vector to the nearest of the k vectors from step 2.
4. Compute the centroid (mean) of each of the k groups from step 3.
5. Re-form the groups by assigning each vector to the nearest centroid from step 4.
6. Repeat steps 4 and 5 until the groups no longer change. The k groups so formed are the clusters.
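
A minimal sketch of these six steps in Python/NumPy, assuming numeric (already normalised) vectors and Euclidean distance; it follows the textbook procedure rather than any production implementation.

```python
import numpy as np

def k_means(X, k, rng=None):
    """Steps 1-6 from the slide: pick k random seed vectors, assign every vector
    to its nearest seed, recompute centroids, repeat until the groups stop changing."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 2: k random seeds
    labels = None
    while True:
        # steps 3/5: assign each vector to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centroids                           # step 6: converged
        labels = new_labels
        # step 4: recompute the centroid (mean) of each group (keep old centroid if a group is empty)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])

# Toy usage: three visible groups in 2-D
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(loc=c, scale=0.2, size=(20, 2))
                    for c in ([0, 0], [3, 3], [0, 3])])
labels, centroids = k_means(X, k=3)
print(centroids)
```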

Worked example (figures):
Pick three points at random and partition the data set.
Find partition centroids.
Re-partition.
Re-adjust centroids.
Repartition.
Re-adjust centroids.
Repartition: the clusters have not changed, so k-means has converged.

Algorithm 2: Incremental Clustering This method builds a dendrogram ("tree of clusters") by adding one instance at a time. The decision as to which cluster each new instance should join (or whether it should form a new cluster by itself) is based on a category utility. The category utility is a measure of how good a particular partition is; it does not require attributes to be numeric. Algorithm: for each instance, add it to the tree so far, where it "best fits" according to category utility.

Incremental clustering To add a new instance to the existing cluster hierarchy: compute the CU for the new instance (a) combined with each existing top-level cluster, and (b) placed in a cluster of its own. Choose the option with the greatest CU. If the instance is added to an existing cluster, try to increase CU by merging with subclusters. The method needs modifying by introducing a merging and a splitting procedure.
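
The transcript does not define category utility, so as a working reference here is a sketch using the standard formula for nominal attributes, CU = (1/k) Σ_c P(c) [ Σ_{a,v} P(a=v|c)² − Σ_{a,v} P(a=v)² ]; the instance data below are purely illustrative.

```python
from collections import Counter

def category_utility(partition):
    """Category utility of a partition (a list of clusters, each a list of dicts of
    nominal attribute values): roughly, how much better we can guess attribute values
    once we know the cluster.
    CU = (1/k) * sum_c P(c) * [ sum_{a,v} P(a=v | c)^2 - sum_{a,v} P(a=v)^2 ]"""
    all_rows = [row for cluster in partition for row in cluster]
    n, k = len(all_rows), len(partition)

    def sum_sq_probs(rows):
        counts = Counter((attr, val) for row in rows for attr, val in row.items())
        return sum((c / len(rows)) ** 2 for c in counts.values())

    baseline = sum_sq_probs(all_rows)
    return sum(len(cl) / n * (sum_sq_probs(cl) - baseline) for cl in partition) / k

# Illustrative instances: splitting by colour keeps attribute values predictable...
reds  = [{"colour": "red",  "shape": "round"}, {"colour": "red",  "shape": "square"}]
blues = [{"colour": "blue", "shape": "round"}, {"colour": "blue", "shape": "square"}]
print(category_utility([reds, blues]))                                # 0.25
# ...whereas a partition that mixes everything has no predictive value
print(category_utility([[reds[0], blues[1]], [reds[1], blues[0]]]))   # 0.0
```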

Incremental Clustering (figure): the cluster tree grows as instances a, b, c and d are added one at a time.

Incremental Clustering (figure): the tree after adding instances e and f.

Incremental clustering Merging procedure: when considering placing instance I at some level, if the best cluster to add I to is Cl (ie it maximises CU) and the next best at that level is Cm, then compute the CU for Cl merged with Cm, and merge if that CU is larger than with the clusters kept separate.

Incremental Clustering Splitting procedure: whenever the best cluster for the new instance to join has been found and merging is not found to be beneficial, try splitting that node: recompute CU and replace the node with its children if this leads to a higher CU value.

Incremental clustering v k-means Neither method guarantees a globally optimised partition. K-means depends on the number of clusters as well as initial seeds (K first guesses). Incremental clustering generates a hierarchical structure that can be examined and reasoned about. Incremental clustering depends on the order in which instances are added.

Self Check Describe the advantages classification rules have over decision trees. Explain the difference between classification and association rules. Given a set of instances, generate decision rules and association rules which are 100% accurate (on the training set). Explain what is meant by cluster centroid, k-means, and unsupervised machine learning.