Published by Constance Ward. Modified over 10 years ago.
Association Rules: Algorithms, variations, extensions, and applications
Fernando Berzal (fberzal@decsai.ugr.es)
Intelligent Databases and Information Systems research group
Department of Computer Science and Artificial Intelligence
E.T.S Ingeniería Informática – Universidad de Granada (Spain)
CEDI’2005, Taller de Minería de Datos
Motivation
Association mining searches for interesting relationships among items in a given data set.
EXAMPLES
- Diapers and six-packs are bought together, especially on Thursday evenings (a myth?)
- A sequence such as buying first a digital camera and then a memory card is a frequent (sequential) pattern
- …
Outline: Motivation, Definition, Discovery, Variations, Visualization, Extensions, Applications, ART, ATBAR
Motivation: MARKET BASKET ANALYSIS
The earliest form of association rule mining.
Applications: catalog design, store layout, cross-marketing…
Definition
Item
- In transactional databases: any of the items included in a transaction.
- In relational databases: an (attribute, value) pair.
k-itemset: a set of k items.
Itemset support: support(I) = P(I), i.e. the fraction of transactions that contain every item in I.
Definition
Association rule: X ⇒ Y
- Support: support(X ⇒ Y) = support(X ∪ Y) = P(X ∪ Y)
- Confidence: confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y|X)
NOTE: Both support and confidence are relative values.
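The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the `baskets` data are invented here and do not come from the talk.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs U rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


# Hypothetical market baskets, invented for illustration only.
baskets = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]

rule_support = support({"diapers", "beer"}, baskets)          # 2 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)  # 2 of the 3 diaper baskets
```

Both values are fractions in [0, 1], which is the sense in which support and confidence are relative.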
Discovery
Association rule mining
1. Find all frequent itemsets
2. Generate strong association rules from the frequent itemsets
Strong association rules are those that satisfy both a minimum support threshold and a minimum confidence threshold.
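Step 2 can be sketched as follows, assuming step 1 has already produced a dictionary that maps every frequent itemset to its relative support. The function name `strong_rules` and the sample supports are hypothetical.

```python
from itertools import combinations


def strong_rules(supports, min_conf):
    """Return (lhs, rhs, support, confidence) for every rule meeting min_conf.

    `supports` maps each frequent itemset (a frozenset) to its relative
    support and must contain every non-empty subset of the itemsets it lists.
    """
    rules = []
    for itemset, supp in supports.items():
        # Every non-empty proper subset of the itemset is a possible antecedent.
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = supp / supports[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical supports, all above some minimum support threshold.
supports = {
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.5,
    frozenset({"A", "B"}): 0.4,
}
rules = strong_rules(supports, min_conf=0.75)
```

With these numbers only B ⇒ A survives the confidence threshold, because A ⇒ B has confidence 0.4 / 0.6, which is below 0.75.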
Discovery: Apriori
Observation: All non-empty subsets of a frequent itemset must also be frequent.
Algorithm: Frequent k-itemsets are used to explore potentially frequent (k+1)-itemsets (i.e. candidates).
Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94
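The level-wise search can be sketched as below. This is a compact illustration, not the optimized algorithm from the VLDB'94 paper: it counts candidates with a full pass over an in-memory list and uses no hash tree. The sample transactions are invented.

```python
from itertools import combinations


def apriori(transactions, min_supp):
    """Return {frequent itemset: relative support}, found level by level."""
    n = len(transactions)

    def frequent(candidates):
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_supp}

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
    return result


# Hypothetical transactions, invented for illustration only.
transactions = [
    {"A", "B", "D"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "D"},
    {"C", "D"},
]
frequent_itemsets = apriori(transactions, min_supp=0.4)
```

The prune step is where the observation pays off: a candidate such as {A, B, C} is discarded without counting because its subset {A, C} is not frequent.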
Discovery: Apriori improvements (I)
- Reducing the number of candidates
  Park, Chen & Yu: "An Effective Hash-Based Algorithm for Mining Association Rules", SIGMOD'95
- Sampling
  Toivonen: "Sampling Large Databases for Association Rules", VLDB'96
  Park, Yu & Chen: "Mining Association Rules with Adjustable Accuracy", CIKM'97
- Partitioning
  Savasere, Omiecinski & Navathe: "An Efficient Algorithm for Mining Association Rules in Large Databases", VLDB'95
Discovery: Apriori improvements (II)
- Transaction reduction
  Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94 (AprioriTID)
- Dynamic itemset counting
  Brin, Motwani, Ullman & Tsur: "Dynamic Itemset Counting and Implication Rules for Market Basket Data", SIGMOD'97 (DIC)
  Hidber: "Online Association Rule Mining", SIGMOD'99 (CARMA)
Discovery
Apriori-like algorithm: TBAR (Tree-based association rule mining)
Berzal, Cubero, Sánchez & Serrano: “TBAR: An efficient method for association rule mining in relational databases”, Data & Knowledge Engineering, 2001
Discovery: TBAR
Itemset tree (figure):
L1: A #7, B #9, C #7, D #8
L2: under A: B #6, D #5; under B: C #6, D #7
L3: under AB: D #5
Reading the tree: 7 instances with A, 6 instances with AB, 5 instances with AD, 5 instances with ABD, 6 instances with BC.
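The tree on this slide can be read as a prefix tree in which each node stores the support count of the itemset spelled out by the path from the root. The sketch below is a generic prefix tree, not the authors' TBAR implementation; the class and function names are hypothetical, single letters stand in for (attribute, value) items, and the count for BD is read off the flattened figure.

```python
class Node:
    """One itemset in the tree: the items on the path from the root to here."""

    def __init__(self):
        self.count = 0
        self.children = {}


def insert(root, itemset, count):
    """Store the support count of `itemset` (items are kept in sorted order)."""
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, Node())
    node.count = count


def lookup(root, itemset):
    """Return the stored count, or None if the itemset is not in the tree."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None
    return node.count


# Counts as read from the slide's figure.
root = Node()
for itemset, count in [("A", 7), ("B", 9), ("C", 7), ("D", 8),
                       ("AB", 6), ("AD", 5), ("BC", 6), ("BD", 7),
                       ("ABD", 5)]:
    insert(root, itemset, count)

# Confidence of AB => D is count(ABD) / count(AB).
conf_ab_d = lookup(root, "ABD") / lookup(root, "AB")
```

Because a rule's antecedent is always a subset of the full itemset, both counts needed for its confidence are available from the same tree.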
Discovery
An alternative to Apriori: Compress the database representing frequent items into a frequent-pattern tree (FP-tree)…
Han, Pei & Yin: "Mining Frequent Patterns without Candidate Generation", SIGMOD'2000
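A minimal sketch of the compression step follows, assuming the whole database fits in a Python list. It covers tree construction only: the header table, the node links and the FP-growth mining phase of the original paper are left out, and all names and data are hypothetical.

```python
from collections import Counter


class FPNode:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree over the items that occur at least `min_count` times."""
    freq = Counter(i for t in transactions for i in set(t))
    root = FPNode()
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name),
        # so that transactions sharing frequent items share a path prefix.
        path = sorted((i for i in set(t) if freq[i] >= min_count),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root


# Hypothetical transactions, invented for illustration only.
transactions = [
    {"A", "B", "D"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "D"},
    {"C", "D"},
]
tree = build_fp_tree(transactions, min_count=2)
```

Here four of the five transactions start with B, so they share a single B node with count 4 instead of being stored four times.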
Discovery: A challenge
When an itemset is frequent, all its subsets are also frequent.
- Closed itemset C: There exists no proper super-itemset S such that support(S) = support(C).
- Maximal (frequent) itemset M: M is frequent and there exists no super-itemset Y such that M ⊂ Y and Y is frequent.
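Both definitions can be checked directly on a table of frequent itemsets. The sketch below assumes such a table is already available; the function name and the counts are hypothetical. Looking only at frequent supersets is enough for closedness, since a superset with the same support as a frequent itemset is itself frequent.

```python
def closed_and_maximal(supports):
    """Return (closed, maximal) among the frequent itemsets in `supports`.

    `supports` maps each frequent itemset (a frozenset) to its support.
    """
    closed, maximal = set(), set()
    for itemset, supp in supports.items():
        supersets = [s for s in supports if itemset < s]  # proper supersets
        if not supersets:
            maximal.add(itemset)
        if all(supports[s] < supp for s in supersets):
            closed.add(itemset)
    return closed, maximal


# Hypothetical absolute support counts over five transactions (threshold: 2).
supports = {
    frozenset("A"): 3, frozenset("B"): 4, frozenset("C"): 2, frozenset("D"): 3,
    frozenset("AB"): 3, frozenset("AD"): 2, frozenset("BD"): 2,
    frozenset("ABD"): 2,
}
closed, maximal = closed_and_maximal(supports)
```

In this example eight frequent itemsets reduce to five closed ones and two maximal ones, and every maximal itemset is also closed.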
Variations
Based on the kinds of patterns to be mined:
- Frequent itemset mining (transactional and relational data)
- Sequential pattern mining (sequence data sets, e.g. bioinformatics)
- Structured pattern mining (structured data, e.g. graphs)
Based on the types of values handled:
- Boolean association rules
- Quantitative association rules
- Fuzzy association rules
  Delgado, Marín, Sánchez & Vila: “Fuzzy association rules: General model and applications”, IEEE Transactions on Fuzzy Systems, 2003
More options:
- Generalized association rules (a.k.a. multilevel association rules)
- Constraint-based association rule mining
- Incremental algorithms
- Top-k algorithms
- …
ICDM FIMI Workshop on Frequent Itemset Mining Implementations
Visualization
Integrated into data mining tools to help users understand data mining results:
- Table-based approach, e.g. SAS Enterprise Miner, DBMiner…
- 2D matrix-based approach, e.g. SGI MineSet, DBMiner…
- Graph-based techniques, e.g. DBMiner ball graphs
Visualization: Tables (figure)
Visualization: Visual aids (figure)
Visualization: 2D Matrix (figure)
Visualization: Graphs (figure)
Visualization: VisAR (figure), based on parallel coordinates (Techapichetvanich & Datta, ADMA’2005)
Extensions
Confidence is not the best possible interestingness measure for rules.
e.g. A very frequent item will always appear in rule consequents, regardless of its true relationship with the rule antecedent:
X went to war ⇒ X did not serve in Vietnam (from the US Census)
Extensions
Desirable properties for interestingness measures (Piatetsky-Shapiro, 1991)
P1: ACC(A ⇒ C) = 0 when supp(A ⇒ C) = supp(A) supp(C)
P2: ACC(A ⇒ C) monotonically increases with supp(A ⇒ C)
P3: ACC(A ⇒ C) monotonically decreases with supp(A) (or supp(C))
Extensions
Certainty factors…
- … satisfy Piatetsky-Shapiro’s properties
- … are widely used in expert systems
- … are not symmetric (unlike interest/lift)
- … can substitute conviction when CF>0
Berzal, Blanco, Sánchez & Vila: “Measuring the accuracy and interest of association rules: A new framework”, Intelligent Data Analysis, 2002
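The sketch below computes lift, conviction and the certainty factor from three relative supports, using the usual textbook definitions of each measure. The function names and the sample numbers are hypothetical; the first example mimics a rule whose consequent is very frequent.

```python
def lift(supp_a, supp_c, supp_ac):
    """Interest/lift: symmetric in A and C, equal to 1 under independence."""
    return supp_ac / (supp_a * supp_c)


def conviction(supp_a, supp_c, supp_ac):
    conf = supp_ac / supp_a
    return float("inf") if conf == 1 else (1 - supp_c) / (1 - conf)


def certainty_factor(supp_a, supp_c, supp_ac):
    """Change in belief in C once A is known, scaled to [-1, 1]."""
    conf = supp_ac / supp_a
    if conf > supp_c:
        return (conf - supp_c) / (1 - supp_c)
    if conf < supp_c:
        return (conf - supp_c) / supp_c
    return 0.0


# Hypothetical rule with a very frequent consequent: confidence is 0.85,
# yet the consequent alone already holds in 90% of the data.
frequent_consequent = dict(supp_a=0.2, supp_c=0.9, supp_ac=0.17)
cf_misleading = certainty_factor(**frequent_consequent)
lift_misleading = lift(**frequent_consequent)

# Hypothetical rule with positive dependence.
positive = dict(supp_a=0.4, supp_c=0.5, supp_ac=0.3)
cf_positive = certainty_factor(**positive)
conv_positive = conviction(**positive)
```

In the first case the certainty factor is negative and the lift is below 1 despite the high confidence. In the second, CF equals 1 − 1/conviction, which is the sense in which certainty factors can stand in for conviction when CF > 0.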
Extensions
References:
- Hilderman & Hamilton: “Evaluation of interestingness measures for ranking discovered knowledge”, PAKDD, 2001
- Tan, Kumar & Srivastava: “Selecting the right objective measure for association analysis”, Information Systems, vol. 29, pp. 293-313, 2004
- Berzal, Cubero, Marín, Sánchez, Serrano & Vila: “Association rule evaluation for classification purposes”, TAMIDA’2005
Applications
Two sample applications where association rules have been successful:
- Classification (ART)
  Berzal, Cubero, Sánchez & Serrano: “ART: A hybrid classification model”, Machine Learning Journal, 2004
- Anomaly detection (ATBAR)
  Balderas, Berzal, Cubero, Eisman & Marín: “Discovering Hidden Association Rules”, KDD’2005, Chicago, Illinois, USA
Classification
Classification models based on association rules
- Partial classification models, e.g. Bayardo
- “Associative” classification models, e.g. CBA (Liu et al.)
- Bayesian classifiers, e.g. LB (Meretakis et al.)
- Emerging patterns, e.g. CAEP (Dong et al.)
- Rule trees, e.g. Wang et al.
- Rules with exceptions, e.g. Liu et al.
Classification
GOAL: Simple, intelligible, and robust classification models obtained in an efficient and scalable way
MEANS: Decision Tree Induction + Association Rule Mining = ART [Association Rule Trees]
ART Classification Model
IDEA: Make use of efficient association rule mining algorithms to build a decision-tree-shaped classification model.
ART = Association Rule Tree
KEY: Association rules + “else” branches
Hybrid between decision trees and decision lists
ART Classification Model: SPLICE (figure)
ART classification model: Construction
1. K = 1
2. Rule mining (rules with K items in their LHS)
3. Suitable rules?
   - Yes: Branch the tree using the selected rules and recursively process the “else” branch
   - No: K = K + 1
4. K <= MaxSize?
   - Yes: Go back to rule mining
   - No: Create a leaf node labelled with the most frequent class
ART classification model: Construction
Steps: K=1, Rule mining, Selection, Tree level, K++, Go on?, Tree leaf
Rule mining: Candidate hypotheses
- MinSupp: Minimum support threshold
- MinConf: Minimum confidence threshold (fixed threshold or automatic selection)
Rule selection:
- Rules grouped by sets of attributes.
- Preference criterion.
ART classification model: Example dataset (figure)
ART classification model: Example, Level 1, K = 1
LEVEL 1 – Association rule mining
Minimum support threshold = 20%
Automatic confidence threshold selection
S1: if (Y=0) then C=0 with confidence 75%
    if (Y=1) then C=1 with confidence 75%
S2: if (Z=0) then C=0 with confidence 75%
    if (Z=1) then C=1 with confidence 75%
ART classification model: Example, Level 1, K = 2
LEVEL 1 – Association rule mining
Minimum support threshold = 20%
Automatic confidence threshold selection
S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
S2: if (X=1 and Z=0) then C=0 (100%)
    if (X=1 and Z=1) then C=1 (100%)
S3: if (Y=0 and Z=0) then C=0 (100%)
    if (Y=1 and Z=1) then C=1 (100%)
ART classification model: Example, Level 1
LEVEL 1 – Best rule set selection, e.g. S1
S1: if (X=0 and Y=0) then C=0 (100%)
    if (X=0 and Y=1) then C=1 (100%)
Resulting tree level:
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else...
ART classification model: Example, Level 1 to Level 2 (figure)
ART classification model: Example, Level 2
LEVEL 2 – Rule mining
S1: if (Z=0) then C=0 with confidence 100%
    if (Z=1) then C=1 with confidence 100%
RESULT
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else
    Z=0: C=0 (2)
    Z=1: C=1 (2)
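The construction loop and the worked example can be tied together with a short sketch. It simplifies ART in several ways that are assumptions made here, not details from the talk: the confidence threshold is fixed at 100% instead of being selected automatically, the preference criterion is "the attribute set whose rules cover the most records", and the eight-record dataset is a hypothetical reconstruction chosen to be consistent with the rules shown in the example (the original dataset is only available as a figure).

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, k, min_supp, min_conf):
    """Return {attribute set: {LHS values: class}} for rules with k LHS items."""
    attrs = sorted(a for a in records[0] if a != "C")
    groups = {}
    for attr_set in combinations(attrs, k):
        by_lhs = {}
        for r in records:
            key = tuple(r[a] for a in attr_set)
            by_lhs.setdefault(key, Counter())[r["C"]] += 1
        rules = {}
        for lhs, classes in by_lhs.items():
            label, hits = classes.most_common(1)[0]
            supp = hits / len(records)
            conf = hits / sum(classes.values())
            if supp >= min_supp and conf >= min_conf:
                rules[lhs] = label
        if rules:
            groups[attr_set] = rules
    return groups


def build_art(records, max_size=3, min_supp=0.2, min_conf=1.0):
    """Build a nested dict: rules at this level plus an 'else' subtree."""
    if not records:
        return None
    for k in range(1, max_size + 1):
        groups = mine_rules(records, k, min_supp, min_conf)
        if groups:
            def covered(group):
                attr_set, rules = group
                return sum(tuple(r[a] for a in attr_set) in rules for r in records)

            # Assumed preference criterion: cover as many records as possible.
            attr_set, rules = max(groups.items(), key=covered)
            rest = [r for r in records
                    if tuple(r[a] for a in attr_set) not in rules]
            return {"attrs": attr_set, "rules": rules,
                    "else": build_art(rest, max_size, min_supp, min_conf)}
    # No suitable rules up to max_size: leaf with the most frequent class.
    return {"leaf": Counter(r["C"] for r in records).most_common(1)[0][0]}


# Hypothetical dataset: all combinations of X, Y, Z, with C = Y when X = 0
# and C = Z when X = 1.
records = [{"X": x, "Y": y, "Z": z, "C": y if x == 0 else z}
           for x in (0, 1) for y in (0, 1) for z in (0, 1)]
tree = build_art(records)
```

On this data no single-attribute rule reaches 100% confidence, so the first level uses two-attribute rules on X and Y, and the “else” branch is then split on Z alone, as in the RESULT above.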
ART classification model: Example, ART vs. TDIDT (figure)
ART classification model > Experimental results (figures):
- Classifier accuracy
- Classifier complexity
- Training time
- I/O Operations - Scans
- I/O Operations - Records
- I/O Operations - Pages
ART classification model: Final comments
Classification models
- Acceptable accuracy
- Reduced complexity
- Attribute interactions
- Robustness (noise & primary keys)
Classifier building method
- Efficient algorithm
- Good scalability properties
- Automatic parameter selection
Anomaly detection
It is often more interesting to find surprising non-frequent events than frequent ones.
EXAMPLES
- Abnormal network activity patterns in intrusion detection systems.
- Exceptions to “common” rules in Medicine (useful for diagnosis, drug evaluation, detection of conflicting therapies…)
- …
Anomaly detection
Anomalous association rule: Confident rule representing homogeneous deviations from common behavior.
Anomaly detection
Anomalous association rule
- X ⇒ Y is frequent and confident
- X ¬Y ⇒ A is confident
- X Y ⇒ ¬A is confident
X usually implies Y (dominant rule). When X does not imply Y, then it usually implies A (the anomaly).
Anomaly detection
X  Y   A1  Z1  …
X  Y   A1  Z2  …
X  Y   A2  Z3  …
X  Y   A2  Z1  …
X  Y   A3  Z2  …
X  Y   A3  Z3  …
X  Y   A   Z   …
X  Y3  A   Z3  …
X  Y3  A   Z   …
X  Y4  A   Z   …
X ⇒ Y is the dominant rule
X ⇒ A when ¬Y is the anomalous rule
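The three conditions that define an anomalous rule can be checked directly on a set of transactions. The sketch below is a brute-force test for one given triple (X, Y, A), not the ATBAR algorithm; the function names, thresholds and data are hypothetical.

```python
def confidence(rows, condition, outcome):
    """Fraction of the rows satisfying `condition` that also satisfy `outcome`."""
    selected = [r for r in rows if condition(r)]
    return sum(outcome(r) for r in selected) / len(selected) if selected else 0.0


def is_anomalous(rows, x, y, a, min_supp, min_conf):
    """True if 'x => a when not y' is anomalous with respect to 'x => y'."""
    has_x = lambda r: x <= r
    has_y = lambda r: y <= r
    has_a = lambda r: a <= r
    supp_xy = sum(has_x(r) and has_y(r) for r in rows) / len(rows)
    return (
        supp_xy >= min_supp                                   # X => Y frequent
        and confidence(rows, has_x, has_y) >= min_conf        # X => Y confident
        and confidence(rows, lambda r: has_x(r) and not has_y(r),
                       has_a) >= min_conf                     # X not-Y => A
        and confidence(rows, lambda r: has_x(r) and has_y(r),
                       lambda r: not has_a(r)) >= min_conf    # X Y => not-A
    )


# Hypothetical data: X usually comes with Y; when it does not, it comes with A.
rows = [frozenset({"X", "Y"})] * 8 + [frozenset({"X", "A"})] * 2

found = is_anomalous(rows, {"X"}, {"Y"}, {"A"}, min_supp=0.5, min_conf=0.8)
not_found = is_anomalous(rows, {"X"}, {"Y"}, {"B"}, min_supp=0.5, min_conf=0.8)
```

The third condition is what makes the deviation homogeneous: the records that break the dominant rule must agree on a common alternative A, not just fail to contain Y.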
Anomaly detection
Suzuki et al.’s “Exception Rules”
- X ⇒ Y is an association rule
- X I is the reference rule
- X I ⇒ ¬Y is the exception rule
- I is the “interacting” itemset
Drawbacks: too many exceptions; the “cause” needs to be present.
Anomaly detection: ATBAR
Anomalous association rules (figures: ATBAR itemset tree)
First scan: A #7, B #9, C #7, D #8
Second scan, node A #7: AB #6, AC #4, AD #5, AE #3, AF #3 are counted; B #6 and D #5 remain under A, and the non-frequent ones go to the extended node A*
Complete tree: each first-level node (A #7, B #9, C #7, D #8) has an extended node (A*, B*, C*, D*); B #6 and D #5 under A; C #6 and D #7 under B; D #5 under AB
Anomaly detection: ATBAR
Anomalous association rules
Rule generation is immediate from the frequent and extended itemsets obtained by ATBAR.
Anomaly detection: Results
Experiments on health-related datasets from the UCI Machine Learning Repository
- Relatively small set of anomalous rules (typically, >90% reduction with respect to standard association rules)
- Reasonable overhead needed to obtain anomalous association rules (about 20% in ATBAR w.r.t. TBAR)
Anomaly detection: Results
An example from the Census dataset:
if WORKCLASS: Local-gov
then CAPGAIN: [99999.0, 99999.0] (7 out of 7)   (the “anomaly”)
when not CAPGAIN: [0.0, 20051.0]   (the usual consequent)
Anomaly detection: Results
- Anomalous association rules (novel characterization of potentially interesting knowledge)
- An efficient algorithm for discovering anomalous association rules: ATBAR
- Some heuristics for filtering the discovered anomalous association rules
Anomaly detection: Future…
- Additional heuristics for focusing on interesting anomalies (maybe domain- or even application-specific).
- Alternative measures for the evaluation and ranking of anomalous association rules: Certainty factors / Conviction…
Questions, comments, and suggestions…
Fernando Berzal (fberzal@decsai.ugr.es)