Data Mining II: Association Rule Mining & Classification
Jagdish Gangolly
State University of New York at Albany
Acc 522 Fall 2001 Jagdish S. Gangolly 11/15/2018

Data Mining II
  - Attribute-Oriented Induction
  - Mining association rules
  - Mining single-dimensional boolean association rules
  - Classification

Attribute-Oriented Induction I
Steps:
  - Original query (in DMQL):
      - specify the database to be mined
      - specify the relevant attributes
      - specify the relation to be mined
      - specify the concept in the hierarchy
  - Transformation of the DMQL query into a relational query whose execution yields the initial working relation.

Attribute-Oriented Induction II
Attribute removal/generalisation:
  - Removal rule: remove an attribute that has a large set of values if either
      - there is no generalisation operator on the attribute, or
      - its higher-level concepts in the hierarchy are expressed in terms of other attributes (address example)
  - Generalisation rule: if there are many attribute values and there are generalisation operators, use them
  - Attribute generalisation threshold control

Basic Algorithm for Attribute-Oriented Induction
Input: relational database, DMQL query, a list of attributes, a set of concept hierarchies, attribute generalisation thresholds
Output: a prime generalised relation
Method:
  - Collect the task-relevant data into a working relation W
  - Collect statistics on the working relation
  - Derive the prime relation P

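The method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm as given in the course text: the working relation W is taken to be a list of dictionaries, each concept hierarchy is a hypothetical child-to-parent mapping, and the function name, the sample rows, and the thresholds are all made up for the example.

```python
from collections import Counter

def attribute_oriented_induction(rows, hierarchies, thresholds):
    """Sketch of attribute removal/generalisation on a working relation.

    rows:        list of dicts (the working relation W)
    hierarchies: attribute -> {value: parent concept} (assumed format)
    thresholds:  attribute -> maximum number of distinct values allowed
    Returns the kept attributes and a count for each generalised tuple.
    """
    if not rows:
        return [], Counter()
    rows = [dict(r) for r in rows]          # work on a copy of W
    kept = []
    for attr in list(rows[0].keys()):
        distinct = {r[attr] for r in rows}  # statistics on W
        limit = thresholds.get(attr, len(distinct))
        if len(distinct) <= limit:
            kept.append(attr)               # few values: keep as is
            continue
        if attr not in hierarchies:
            continue                        # removal rule: no generalisation operator
        parent = hierarchies[attr]
        # Generalisation rule: climb the hierarchy until under the threshold
        while len({r[attr] for r in rows}) > limit:
            lifted = [parent.get(r[attr], r[attr]) for r in rows]
            if lifted == [r[attr] for r in rows]:
                break                       # top of the hierarchy reached
            for r, value in zip(rows, lifted):
                r[attr] = value
        kept.append(attr)
    # Merge identical generalised tuples and accumulate their counts
    prime = Counter(tuple(r[a] for a in kept) for r in rows)
    return kept, prime

# Made-up working relation
W = [
    {"name": "Ann", "major": "physics",   "city": "Albany"},
    {"name": "Bob", "major": "chemistry", "city": "Troy"},
    {"name": "Cy",  "major": "history",   "city": "Albany"},
    {"name": "Di",  "major": "physics",   "city": "Troy"},
]
hierarchies = {"major": {"physics": "science", "chemistry": "science", "history": "arts"}}
thresholds = {"name": 2, "major": 2, "city": 2}

kept, prime = attribute_oriented_induction(W, hierarchies, thresholds)
for row, count in prime.items():
    print(dict(zip(kept, row)), "count =", count)
```

In this sample, name has many values and no hierarchy, so it is removed; major is generalised one level; city is already under its threshold and is kept unchanged.
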
Mining association rules I
Some examples:
  - Market basket analysis: analysing customer buying habits
  - Intrusion detection by analysing user habits

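Mining association rules II
Basic concepts:
  - Set of items $I$
  - Task-relevant data $D$, consisting of database transactions $T \subseteq I$
  - An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$
  - $\mathrm{support}(A \Rightarrow B) = P(A \cup B)$, the probability that a transaction contains both $A$ and $B$
  - $\mathrm{confidence}(A \Rightarrow B) = P(B \mid A)$
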
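As a small worked illustration of the two measures above, the following Python sketch estimates support and confidence as relative frequencies over a list of transactions. The function names and the four sample transactions are invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(B | A) as support(A and B together) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Made-up transactions
D = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

s = support(D, {"computer", "software"})        # 2 of 4 transactions
c = confidence(D, {"computer"}, {"software"})   # 2 of the 3 with a computer
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

Here the rule computer $\Rightarrow$ software has support 2/4 and confidence 2/3.
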
Mining association rules II
Classification of association rules:
  - Based on types of values
      - Boolean association rule: computer $\Rightarrow$ financial-management-software
      - Quantitative association rule: age(X, “30..39”) $\wedge$ income(X, “42K..48K”) $\Rightarrow$ buys(X, “financial-management-software”)
  - Based on dimensions of data involved in the rule
      - buys(X, “computer”) $\Rightarrow$ buys(X, “financial-management-software”) (single-dimensional: only the predicate buys appears)

Mining association rules III
  - Based on levels of abstraction
      - age(X, “30..39”) $\Rightarrow$ buys(X, “laptop”)
      - age(X, “30..39”) $\Rightarrow$ buys(X, “computer”)

Mining single-dimensional boolean association rules I
Apriori algorithm for finding frequent itemsets
  - Apriori property: all nonempty subsets of a frequent itemset must also be frequent. If $P(I) < \mathit{min\_sup}$, then for any item $A$, $P(I \cup A) < \mathit{min\_sup}$.
  - Steps:
      - Join step: a set of candidate $k$-itemsets, denoted $C_k$, is generated by joining $L_{k-1}$ (the frequent $(k-1)$-itemsets) with itself.
      - Prune step: prune $C_k$ by removing any candidate that has a $(k-1)$-subset not in $L_{k-1}$.
  - Example 6-1 (p.232)

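The join and prune steps can be sketched in Python as follows. This is a minimal illustration under two assumptions that are not on the slide: min_sup is taken as an absolute transaction count rather than a probability, and the join is simplified to the union of any two frequent (k-1)-itemsets that yields a k-itemset, rather than a prefix-based join. After pruning, both joins leave the same candidates. The sample transactions are made up and are not those of Example 6-1.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: count} for every itemset with count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_among(candidates):
        # Scan the data once and keep candidates meeting min_sup
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = frequent_among(singletons)        # L_1
    frequent = dict(current)
    k = 2
    while current:
        # Join step: build C_k from pairs of itemsets in L_{k-1}
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = frequent_among(candidates)    # L_k
        frequent.update(current)
        k += 1
    return frequent

# Made-up transactions
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

result = apriori(D, min_sup=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum count of 3, all four single items are frequent, but only the pair bread and milk survives, so no 3-itemset candidates are generated.
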
Classification I
  - Supervised learning
      - Training data
      - Test data
      - The training data are analysed to derive classification rules; the test data are used to estimate the accuracy of the classification rules
  - Unsupervised learning or clustering

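To make the training/test distinction concrete, here is a minimal Python sketch. It assumes a deliberately simple classifier that is not taken from the slides: one rule per value of a single attribute, each predicting the majority class seen in training. The function names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def train_one_rule(training, attribute):
    """Derive one rule per attribute value: predict its majority class."""
    votes = defaultdict(Counter)
    for features, label in training:
        votes[features[attribute]][label] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    # Fallback class for attribute values never seen in training
    default = Counter(label for _, label in training).most_common(1)[0][0]
    return rules, default

def accuracy(rules, default, attribute, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for features, label in test
                  if rules.get(features[attribute], default) == label)
    return correct / len(test)

# Made-up labelled records: (features, class)
training = [
    ({"income": "high"}, "buys"),
    ({"income": "high"}, "buys"),
    ({"income": "low"},  "no"),
    ({"income": "low"},  "no"),
    ({"income": "low"},  "buys"),
]
test = [
    ({"income": "high"},   "buys"),
    ({"income": "low"},    "no"),
    ({"income": "low"},    "buys"),
    ({"income": "medium"}, "no"),
]

rules, default = train_one_rule(training, "income")
acc = accuracy(rules, default, "income", test)
print(rules, "estimated accuracy =", acc)
```

The rules are derived from the training records only; the held-out test records are then used solely to estimate how often those rules are right.
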
Classification II
Preliminary steps:
  - Data cleaning (reduction of noise, handling of missing values, etc.)
  - Relevance analysis (feature selection)
  - Data transformation (generalisation, normalisation)
Comparison/evaluation of methods:
  - Predictive accuracy
  - Speed
  - Robustness
  - Scalability
  - Interpretability