1
Data Mining II: Association Rule Mining & Classification
Jagdish S. Gangolly
State University of New York at Albany
Acc 522, Fall 2001
2
Data Mining II
- Attribute-Oriented Induction
- Mining association rules
- Mining single-dimensional Boolean association rules
- Classification
3
Attribute-Oriented Induction I
Steps:
- Original query (in DMQL):
  - specify the database to be mined
  - specify the relevant attributes
  - specify the relation to be mined
  - specify the concept hierarchy to be used
- Transformation of the DMQL query into a relational query whose execution yields the initial working relation.
4
Attribute-Oriented Induction II
Attribute removal/generalisation:
- Removal rule: remove an attribute if
  - there is no generalisation operator on the attribute (a large set of attribute values but no generalisation operator), or
  - its higher-level concepts in the hierarchy are expressed in terms of other attributes (the address example)
- Generalisation rule: if an attribute has many distinct values and generalisation operators exist for it, apply them
- Attribute generalisation threshold control
5
Basic Algorithm for Attribute-Oriented Induction
Input: a relational database, a DMQL query, a list of attributes, a set of concept hierarchies, and attribute generalisation thresholds
Output: a prime generalised relation
Method:
- Collect the task-relevant data into a working relation W
- Collect statistics on the working relation
- Derive the prime relation P
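A minimal Python sketch of this procedure, using a made-up working relation W, toy concept hierarchies, and an assumed generalisation threshold of 2 (none of these come from the lecture):

```python
# Illustrative attribute-oriented induction: climb each attribute's concept
# hierarchy until the attribute has few enough distinct values, then merge
# identical generalised tuples into a prime relation with counts.

# Assumed toy hierarchies: specific value -> more general (parent) concept.
hierarchies = {
    "city":  {"Albany": "New York", "Buffalo": "New York",
              "Boston": "Massachusetts"},
    "major": {"Accounting": "Business", "Finance": "Business",
              "Physics": "Science"},
}

def generalise(relation, attribute, hierarchy):
    """Replace each value of `attribute` by its parent concept (if any)."""
    return [{**row, attribute: hierarchy.get(row[attribute], row[attribute])}
            for row in relation]

def attribute_oriented_induction(working_relation, hierarchies, threshold=2):
    # Generalise each attribute until it has at most `threshold` distinct values.
    for attribute, hierarchy in hierarchies.items():
        while len({row[attribute] for row in working_relation}) > threshold:
            generalised = generalise(working_relation, attribute, hierarchy)
            if generalised == working_relation:   # no higher level available
                break
            working_relation = generalised
    # Merge identical generalised tuples and attach a count: the prime relation P.
    counts = {}
    for row in working_relation:
        key = tuple(sorted(row.items()))
        counts[key] = counts.get(key, 0) + 1
    return [dict(key, count=n) for key, n in counts.items()]

W = [{"city": "Albany",  "major": "Accounting"},   # initial working relation
     {"city": "Buffalo", "major": "Finance"},
     {"city": "Boston",  "major": "Physics"}]
print(attribute_oriented_induction(W, hierarchies))
```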
6
Mining association rules I
Some examples:
- Market basket analysis: analysing customer buying habits
- Intrusion detection by analysing user habits
7
Mining association rules II
Basic concepts:
- A set of items I
- Task-relevant data D consisting of database transactions T, where each T ⊆ I
- An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅
- support(A ⇒ B) = P(A ∪ B)
- confidence(A ⇒ B) = P(B | A)
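A short Python illustration of these two measures, computed directly from their definitions over a made-up transaction database (the items and transactions are invented for the example):

```python
# Support and confidence for a candidate rule A => B, computed directly
# from their definitions over a toy transaction database.
transactions = [
    {"computer", "financial_software"},
    {"computer", "printer"},
    {"computer", "financial_software", "printer"},
    {"printer"},
]

A, B = {"computer"}, {"financial_software"}

n = len(transactions)
count_A  = sum(1 for t in transactions if A <= t)        # transactions containing A
count_AB = sum(1 for t in transactions if (A | B) <= t)  # containing both A and B

support = count_AB / n           # P(A ∪ B): fraction of transactions with A and B
confidence = count_AB / count_A  # P(B | A): of those containing A, fraction with B

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# support = 0.50, confidence = 0.67
```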
8
Mining association rules II
Classification of association rules:
- Based on the types of values handled:
  - Boolean association rule: computer ⇒ financial-management-software
  - Quantitative association rule: age(X, “30..39”) ∧ income(X, “42K..48K”) ⇒ buys(X, “financial-management-software”)
- Based on the dimensions of data involved in the rule:
  - buys(X, “computer”) ⇒ buys(X, “financial-management-software”)
9
Mining association rules III
- Based on the levels of abstraction involved:
  - age(X, “30..39”) ⇒ buys(X, “laptop”)
  - age(X, “30..39”) ⇒ buys(X, “computer”)
10
Mining single-dimensional Boolean association rules I
Apriori algorithm for finding frequent itemsets
- Apriori property: all nonempty subsets of a frequent itemset must also be frequent. If P(I) < min_sup, then for any item A, P(I ∪ A) < min_sup.
- Steps:
  - Join step: a set of candidate k-itemsets, denoted C_k, is generated by joining L_{k-1} with itself.
  - Prune step: prune C_k using the Apriori property (any candidate with an infrequent (k-1)-subset cannot be frequent).
- Example 6-1 (p. 232)
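A compact Python sketch of the join and prune steps over a toy transaction set (the transactions and the min_sup value are invented; the code follows the algorithm in outline only, not the textbook's exact pseudocode):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return all frequent itemsets (as frozensets) with their support counts."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    L = {c: n for c, n in count(items).items() if n >= min_sup}
    frequent = dict(L)
    k = 2
    while L:
        # Join step: candidate k-itemsets C_k from pairs of frequent (k-1)-itemsets
        prev = list(L)
        Ck = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates having an infrequent (k-1)-subset
        Ck = {c for c in Ck
              if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = {c: n for c, n in count(Ck).items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent

transactions = [frozenset(t) for t in
                [{"beer", "chips"}, {"beer", "chips", "soda"},
                 {"chips", "soda"}, {"beer", "soda"}]]
print(apriori(transactions, min_sup=2))
```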
11
Classification I
- Supervised learning:
  - Training data, test data
  - The training data are analysed to derive classification rules; the test data are used to estimate the accuracy of the classification rules
- Unsupervised learning, or clustering
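A tiny Python sketch of the supervised workflow: a one-attribute threshold rule is derived from the training data, and its accuracy is then estimated on held-out test data (the records and the form of the rule are hypothetical):

```python
import random

# Hypothetical labelled records: (income_in_thousands, class_label).
rng = random.Random(0)
records = [(x, "buys" if x > 40 else "does_not_buy")
           for x in [rng.randint(20, 80) for _ in range(100)]]

rng.shuffle(records)
train, test = records[:70], records[70:]      # training data vs. test data

def accuracy(classify, data):
    """Fraction of records whose predicted label matches the true label."""
    return sum(1 for x, y in data if classify(x) == y) / len(data)

# Derive a classification rule from the training data only: pick the
# income cut-off that classifies the training records best.
best_cut = max(range(20, 81),
               key=lambda c: accuracy(lambda x: "buys" if x > c else "does_not_buy",
                                      train))

def rule(x):
    return "buys" if x > best_cut else "does_not_buy"

print("cut-off learned from training data:", best_cut)
print("estimated accuracy on test data:", accuracy(rule, test))
```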
12
Classification II
- Preliminary steps:
  - data cleaning (reduction of noise, missing values, etc.)
  - relevance analysis (feature selection)
  - data transformation (generalisation, normalisation)
- Comparison/evaluation of methods:
  - predictive accuracy
  - speed
  - robustness
  - scalability
  - interpretability