
Slide 1: Intelligent Databases and Information Systems research group, Department of Computer Science and Artificial Intelligence, E.T.S Ingeniería Informática, Universidad de Granada (Spain). CEDI’2005, Taller de Minería de Datos (Data Mining Workshop). Association Rules: Algorithms, variations, extensions, and applications. Fernando Berzal, fberzal@decsai.ugr.es

Slide 2: Motivation
Association mining searches for interesting relationships among items in a given data set. Examples:
- Diapers and six-packs are bought together, especially on Thursday evenings (a myth?).
- A sequence such as first buying a digital camera and then a memory card is a frequent (sequential) pattern.
- …
Outline: Motivation, Definition, Discovery, Variations, Visualization, Extensions, Applications (ART, ATBAR)

Slide 3: Market basket analysis
The earliest form of association rule mining. Applications: catalog design, store layout, cross-marketing…

Slide 4: Definition
Item:
- In transactional databases: any of the items included in a transaction.
- In relational databases: an (attribute, value) pair.
k-itemset: a set of k items.
Itemset support: support(I) = P(I), the fraction of transactions that contain I.

Slide 5: Definition
Association rule X ⇒ Y
- Support: support(X ⇒ Y) = support(X ∪ Y) = P(X ∪ Y)
- Confidence: confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = P(Y | X)
Note: both support and confidence are relative values (fractions), not absolute counts.
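
A minimal Python sketch of the two definitions above, assuming each transaction is a set of item names. The helpers `support` and `confidence` and the basket data are hypothetical, for illustration only; `Fraction` keeps the relative values exact.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Relative support: fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """confidence(X => Y) = support(X U Y) / support(X), an estimate of P(Y | X)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical baskets, for illustration only.
baskets = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "bread"}),
    frozenset({"beer", "bread"}),
]

print(support({"diapers", "beer"}, baskets))       # 1/2
print(confidence({"diapers"}, {"beer"}, baskets))  # 2/3
```

`confidence` raises `ZeroDivisionError` when the antecedent never occurs. A real miner would only evaluate rules whose antecedent is frequent, so this case does not arise there.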

Slide 6: Discovery
Association rule mining:
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
Strong association rules are those that satisfy both a minimum support threshold and a minimum confidence threshold.
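
Step 2 can be sketched as follows, assuming step 1 has already produced a dictionary that maps each frequent itemset to its relative support. The function name and the example supports are hypothetical. Because the dictionary holds only itemsets that passed the support threshold, every rule it yields is frequent, and only the confidence test remains.

```python
from itertools import combinations


def strong_rules(frequent, min_conf):
    """Step 2: derive every rule X => Y with confidence >= min_conf.

    `frequent` maps frozenset itemsets to their relative support and is assumed
    to contain only itemsets that already passed the minimum support threshold.
    Because every subset of a frequent itemset is frequent, support(X) is always
    found in the same dictionary.
    """
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                x = frozenset(lhs)
                conf = supp / frequent[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, supp, conf))
    return rules


# Hypothetical output of step 1 (supports as fractions of transactions).
frequent = {
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.8,
    frozenset({"A", "B"}): 0.5,
}
for x, y, supp, conf in strong_rules(frequent, min_conf=0.7):
    print(sorted(x), "=>", sorted(y), f"support={supp}", f"confidence={conf:.2f}")
# ['A'] => ['B'] support=0.5 confidence=0.83
```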

Slide 7: Apriori
Observation: all non-empty subsets of a frequent itemset must also be frequent.
Algorithm: frequent k-itemsets are used to explore potentially frequent (k+1)-itemsets (i.e., candidates).
Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94
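
A compact Python sketch of the level-wise idea, assuming transactions are sets of items and a minimum support given as an absolute count. For brevity it joins every pair of frequent k-itemsets instead of the lexicographic prefix join from the paper. After the subset-pruning step, this produces the same candidate set.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for all itemsets in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # First pass: count single items to obtain the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Join: combine frequent k-itemsets whose union has k+1 items.
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every k-subset of a candidate must itself be frequent.
                if len(union) == k + 1 and all(
                    frozenset(sub) in level for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # One more pass over the data counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(data, min_count=3)
# {a, b, c} becomes a candidate (all its 2-subsets are frequent) but occurs only twice.
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```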

Slide 8: Apriori improvements (I)
- Reducing the number of candidates: Park, Chen & Yu: "An Effective Hash-Based Algorithm for Mining Association Rules", SIGMOD'95
- Sampling: Toivonen: "Sampling Large Databases for Association Rules", VLDB'96; Park, Yu & Chen: "Mining Association Rules with Adjustable Accuracy", CIKM'97
- Partitioning: Savasere, Omiecinski & Navathe: "An Efficient Algorithm for Mining Association Rules in Large Databases", VLDB'95
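
The candidate-reduction idea behind the hash-based approach can be sketched for 2-itemsets as follows. This is a simplified sketch, not the published DHP algorithm, which also trims the transactions. `bucket_of` uses `zlib.crc32` only as an arbitrary deterministic hash, and both function names are hypothetical.

```python
import zlib
from itertools import combinations


def bucket_of(pair, n_buckets):
    """Deterministic hash of a 2-itemset into one of n_buckets buckets."""
    return zlib.crc32(" ".join(pair).encode()) % n_buckets


def hash_filtered_pairs(transactions, min_count, n_buckets=5):
    """Candidate 2-itemsets after a DHP-style hash filter (simplified sketch)."""
    item_counts = {}
    buckets = [0] * n_buckets
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        # While counting items, also hash every 2-subset of the transaction.
        for pair in combinations(sorted(t), 2):
            buckets[bucket_of(pair, n_buckets)] += 1
    frequent_items = sorted(i for i, n in item_counts.items() if n >= min_count)
    # A bucket total is an upper bound on the count of every pair hashed there,
    # so pairs in a bucket below min_count cannot be frequent.
    return [
        pair
        for pair in combinations(frequent_items, 2)
        if buckets[bucket_of(pair, n_buckets)] >= min_count
    ]


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"a", "d"}]
print(hash_filtered_pairs(data, min_count=3))
```

Collisions can only let infrequent pairs through, never remove frequent ones. More buckets therefore mean fewer false candidates at the cost of more counters.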

Slide 9: Apriori improvements (II)
- Transaction reduction: Agrawal & Srikant: "Fast Algorithms for Mining Association Rules", VLDB'94 (AprioriTID)
- Dynamic itemset counting: Brin, Motwani, Ullman & Tsur: "Dynamic Itemset Counting and Implication Rules for Market Basket Data", SIGMOD'97 (DIC); Hidber: "Online Association Rule Mining", SIGMOD'99 (CARMA)
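
A sketch of the basic transaction-reduction property: a transaction that contains no frequent k-itemset cannot contain any frequent itemset of larger size, so later passes can skip it. AprioriTID itself goes further and replaces each transaction with the candidate itemsets it contains, which this hypothetical helper does not attempt.

```python
def reduce_transactions(transactions, frequent_k):
    """Drop transactions that contain no frequent k-itemset.

    Every (k+1)-candidate has only frequent k-subsets, so a transaction without
    any frequent k-itemset cannot contain a candidate and need not be scanned
    again in later passes.
    """
    return [t for t in transactions if any(s <= t for s in frequent_k)]


data = [frozenset("ab"), frozenset("cd"), frozenset("abc"), frozenset("e")]
frequent_2 = {frozenset("ab"), frozenset("bc")}
print(reduce_transactions(data, frequent_2))
```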

Slide 10: Discovery
Apriori-like algorithm: TBAR (Tree-based association rule mining)
Berzal, Cubero, Sánchez & Serrano: “TBAR: An efficient method for association rule mining in relational databases”, Data & Knowledge Engineering, 2001

Slide 11: Discovery, TBAR
Itemset tree (figure), with the number of instances stored at each node:
- L1: A #7, B #9, C #7, D #8
- L2: B #6 and D #5 under A (AB, AD); C #6 and D #7 under B (BC, BD)
- L3: D #5 under AB (ABD)
Annotations: 7 instances with A, 6 instances with AB, 5 instances with AD, 5 instances with ABD, 6 instances with BC.
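
A generic itemset prefix tree (trie) in the spirit of the figure, loaded with the counts shown on the slide. It is not TBAR's actual data structure. `ItemsetTree` and `next_candidates` are hypothetical names. The sketch shows how (k+1)-candidates arise from pairs of sibling nodes that share a (k-1)-prefix, followed by Apriori pruning.

```python
class ItemsetTree:
    """Prefix tree of itemsets: each root-to-node path is an itemset in sorted order."""

    def __init__(self):
        self.children = {}  # item -> ItemsetTree
        self.count = None   # None for nodes that only act as a prefix

    def add(self, itemset, count):
        node = self
        for item in sorted(itemset):
            node = node.children.setdefault(item, ItemsetTree())
        node.count = count

    def get(self, itemset):
        """Stored count of `itemset`, or None if it is not in the tree."""
        node = self
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return None
        return node.count


def next_candidates(root, k):
    """(k+1)-itemset candidates from pairs of sibling k-itemsets, Apriori-pruned."""
    out = []

    def walk(node, prefix):
        if len(prefix) == k - 1:
            items = sorted(node.children)
            for i, a in enumerate(items):
                for b in items[i + 1:]:
                    cand = prefix + (a, b)
                    # Keep the candidate only if all its k-subsets are stored.
                    if all(root.get(cand[:j] + cand[j + 1:]) is not None
                           for j in range(len(cand))):
                        out.append(cand)
            return
        for item in sorted(node.children):
            walk(node.children[item], prefix + (item,))

    walk(root, ())
    return out


# Counts from the TBAR example slide.
tree = ItemsetTree()
for itemset, n in [("A", 7), ("B", 9), ("C", 7), ("D", 8), ("AB", 6),
                   ("AD", 5), ("BC", 6), ("BD", 7), ("ABD", 5)]:
    tree.add(itemset, n)
print(tree.get("AD"))            # 5
print(next_candidates(tree, 2))  # [('A', 'B', 'D')]
```

BCD is rejected because CD is not in the tree. A shared prefix is stored only once, which is the main appeal of a tree-shaped representation.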

Slide 12: Discovery
An alternative to Apriori: compress the database, representing frequent items, into a frequent-pattern tree (FP-tree)…
Han, Pei & Yin: "Mining Frequent Patterns without Candidate Generation", SIGMOD'2000
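
A sketch of FP-tree construction only; the recursive mining step (FP-growth) is omitted. It assumes items are ordered by descending frequency with ties broken by name, so that transactions share prefixes. The header table keeps, for each item, the list of nodes that hold it. `FPNode` and `build_fp_tree` are hypothetical names, and the transactions are made up.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    """Build an FP-tree and its header table (item -> list of nodes for that item)."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {item: n for item, n in freq.items() if n >= min_count}
    root = FPNode(None, None)
    header = {item: [] for item in freq}
    for t in transactions:
        # Keep frequent items only, ordered by descending frequency (ties by name),
        # so that transactions share prefixes as often as possible.
        path = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


data = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
root, header = build_fp_tree(data, min_count=2)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```

Summing the counts along an item's node list recovers that item's support, which is the property the mining phase builds on.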

Slide 13: A challenge
When an itemset is frequent, all its subsets are also frequent, so a single frequent itemset with k items implies 2^k − 1 frequent non-empty subsets.
- Closed itemset C: there exists no proper super-itemset S such that support(S) = support(C).
- Maximal (frequent) itemset M: M is frequent and there exists no super-itemset Y such that M ⊂ Y and Y is frequent.
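
Both definitions can be checked directly on a table of frequent itemsets and their support counts, as in this sketch with a hypothetical helper. The counts correspond to the made-up transactions {A,B}, {A,B}, {A,B,C}, {A,C} at a minimum count of 2. A is closed but not maximal.

```python
def closed_and_maximal(frequent):
    """Split frequent itemsets ({frozenset: support count}) into closed and maximal ones."""
    closed, maximal = set(), set()
    for s, n in frequent.items():
        supersets = [t for t in frequent if s < t]  # proper frequent supersets
        # A superset with the same support would itself be frequent, so checking
        # only frequent supersets is enough to decide closedness.
        if all(frequent[t] != n for t in supersets):
            closed.add(s)
        if not supersets:
            maximal.add(s)
    return closed, maximal


frequent = {frozenset("A"): 4, frozenset("B"): 3, frozenset("C"): 2,
            frozenset("AB"): 3, frozenset("AC"): 2}
closed, maximal = closed_and_maximal(frequent)
print(sorted("".join(sorted(s)) for s in closed))   # ['A', 'AB', 'AC']
print(sorted("".join(sorted(s)) for s in maximal))  # ['AB', 'AC']
```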

Slide 14: Variations
Based on the kinds of patterns to be mined:
- Frequent itemset mining (transactional and relational data)
- Sequential pattern mining (sequence data sets, e.g., bioinformatics)
- Structured pattern mining (structured data, e.g., graphs)

Slide 15: Variations
Based on the types of values handled:
- Boolean association rules
- Quantitative association rules
- Fuzzy association rules: Delgado, Marín, Sánchez & Vila: “Fuzzy association rules: General model and applications”, IEEE Transactions on Fuzzy Systems, 2003

Slide 16: Variations
More options:
- Generalized association rules (a.k.a. multilevel association rules)
- Constraint-based association rule mining
- Incremental algorithms
- Top-k algorithms
- …
ICDM FIMI: Workshop on Frequent Itemset Mining Implementations

Slide 17: Visualization
Integrated into data mining tools to help users understand data mining results:
- Table-based approach, e.g., SAS Enterprise Miner, DBMiner…
- 2D matrix-based approach, e.g., SGI MineSet, DBMiner…
- Graph-based techniques, e.g., DBMiner ball graphs

Slide 18: Visualization, tables [figure]

Slide 19: Visualization, visual aids [figure]

Slide 20: Visualization, 2D matrix [figure]

Slide 21: Visualization, graphs [figure]

Slide 22: Visualization, VisAR
Based on parallel coordinates (Techapichetvanich & Datta, ADMA’2005)

Slide 23: Extensions
Confidence is not the best possible interestingness measure for rules. For example, a very frequent item will appear as the consequent of many high-confidence rules, regardless of its true relationship with the rule antecedent:
X went to war ⇒ X did not serve in Vietnam (from the US Census)

Slide 24: Extensions
Desirable properties for interestingness measures (Piatetsky-Shapiro, 1991):
- P1: ACC(A ⇒ C) = 0 when supp(A ⇒ C) = supp(A)·supp(C)
- P2: ACC(A ⇒ C) monotonically increases with supp(A ⇒ C), other parameters fixed
- P3: ACC(A ⇒ C) monotonically decreases with supp(A) (or supp(C)), other parameters fixed
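
One measure known to satisfy P1 to P3 is Piatetsky-Shapiro's own rule-interest measure, often called leverage. It is shown here only as an illustration, since ACC on the slide stands for a generic measure. The function name and the supports are hypothetical.

```python
from fractions import Fraction as F


def leverage(supp_ac, supp_a, supp_c):
    """Piatetsky-Shapiro's rule-interest measure: supp(A => C) - supp(A) * supp(C)."""
    return supp_ac - supp_a * supp_c


base = leverage(F(1, 4), F(1, 2), F(1, 2))
print(base)                                         # 0  (P1: A and C independent)
print(leverage(F(3, 10), F(1, 2), F(1, 2)) > base)  # True (P2: higher supp(A => C))
print(leverage(F(1, 4), F(3, 5), F(1, 2)) < base)   # True (P3: higher supp(A))
```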

Slide 25: Extensions
Certainty factors…
- … satisfy Piatetsky-Shapiro’s properties
- … are widely used in expert systems
- … are not symmetric (unlike interest/lift)
- … can substitute conviction when CF > 0
Berzal, Blanco, Sánchez & Vila: “Measuring the accuracy and interest of association rules: A new framework”, Intelligent Data Analysis, 2002
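
A sketch comparing certainty factors with lift and conviction. It assumes the usual piecewise CF definition based on confidence and consequent support. The function names and the supports are hypothetical. The first rule shows the frequent-consequent problem: a confidence of 0.85 looks high, yet CF is negative and lift is below 1. The second rule checks a consequence of the definitions: when CF > 0, CF = 1 − 1/conviction.

```python
from fractions import Fraction as F


def certainty_factor(conf, supp_y):
    """CF(X => Y) from the rule confidence and the consequent support."""
    if conf > supp_y:
        return (conf - supp_y) / (1 - supp_y)
    if conf < supp_y:
        return (conf - supp_y) / supp_y
    return F(0)


def lift(conf, supp_y):
    """Interest/lift: conf / supp(Y) = supp(XY) / (supp(X) supp(Y)), symmetric in X, Y."""
    return conf / supp_y


def conviction(conf, supp_y):
    """(1 - supp(Y)) / (1 - conf); undefined (division by zero) when conf == 1."""
    return (1 - supp_y) / (1 - conf)


# Hypothetical rule with a very frequent consequent: supp(X)=0.2, supp(Y)=0.9, supp(XY)=0.17.
conf = F(17, 100) / F(1, 5)                # 17/20 = 0.85, which looks high...
print(certainty_factor(conf, F(9, 10)))    # -1/18 (negative dependence)
print(lift(conf, F(9, 10)))                # 17/18 (< 1)

# Hypothetical positive rule: confidence 1/2, supp(Y) = 3/10.
pos = F(1, 2)
print(certainty_factor(pos, F(3, 10)), 1 - 1 / conviction(pos, F(3, 10)))  # 2/7 2/7
```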

Slide 26: Extensions
References:
- Hilderman & Hamilton: “Evaluation of interestingness measures for ranking discovered knowledge”, PAKDD, 2001
- Tan, Kumar & Srivastava: “Selecting the right objective measure for association analysis”, Information Systems, vol. 29, pp. 293-313, 2004
- Berzal, Cubero, Marín, Sánchez, Serrano & Vila: “Association rule evaluation for classification purposes”, TAMIDA’2005

Slide 27: Applications
Two sample applications where association rules have been successful:
- Classification (ART): Berzal, Cubero, Sánchez & Serrano: “ART: A hybrid classification model”, Machine Learning Journal, 2004
- Anomaly detection (ATBAR): Balderas, Berzal, Cubero, Eisman & Marín: “Discovering Hidden Association Rules”, KDD’2005, Chicago, Illinois, USA

Slide 28: Classification
Classification models based on association rules:
- Partial classification models, e.g., Bayardo
- “Associative” classification models, e.g., CBA (Liu et al.)
- Bayesian classifiers, e.g., LB (Meretakis et al.)
- Emerging patterns, e.g., CAEP (Dong et al.)
- Rule trees, e.g., Wang et al.
- Rules with exceptions, e.g., Liu et al.

Slide 29: Classification
GOAL: simple, intelligible, and robust classification models obtained in an efficient and scalable way.
MEANS: decision tree induction + association rule mining = ART (Association Rule Trees)

Slide 30: ART classification model
IDEA: make use of efficient association rule mining algorithms to build a decision-tree-shaped classification model. ART = Association Rule Tree.
KEY: association rules + “else” branches, a hybrid between decision trees and decision lists.

Slide 31: ART classification model, SPLICE [figure]

Slide 32: Construction of the ART classification model
1. Start with K = 1.
2. Rule mining: mine rules with K items in their LHS.
3. Suitable rules? If yes, branch the tree using the selected rules and recursively process the “else” branch.
4. If not, set K = K + 1; while K <= MaxSize, go back to step 2.
5. Otherwise, create a leaf node labelled with the most frequent class.

Slide 33: Construction of the ART classification model, rule mining
Candidate hypotheses are mined using:
- MinSupp, the minimum support threshold (fixed threshold)
- MinConf, the minimum confidence threshold (automatic selection)

Slide 34: Construction of the ART classification model, rule selection
- Rules are grouped by sets of attributes.
- A preference criterion selects among the groups.

Slide 35: ART classification model, example dataset [figure]

Slide 36: ART classification model, example, level 1, K = 1
LEVEL 1, association rule mining (minimum support threshold = 20%, automatic confidence threshold selection):
S1: if (Y=0) then C=0 with confidence 75%; if (Y=1) then C=1 with confidence 75%
S2: if (Z=0) then C=0 with confidence 75%; if (Z=1) then C=1 with confidence 75%

Slide 37: ART classification model, example, level 1, K = 2
LEVEL 1, association rule mining (minimum support threshold = 20%, automatic confidence threshold selection):
S1: if (X=0 and Y=0) then C=0 (100%); if (X=0 and Y=1) then C=1 (100%)
S2: if (X=1 and Z=0) then C=0 (100%); if (X=1 and Z=1) then C=1 (100%)
S3: if (Y=0 and Z=0) then C=0 (100%); if (Y=1 and Z=1) then C=1 (100%)

Slide 38: ART classification model, example, level 1
LEVEL 1, best rule set selection, e.g., S1 (if (X=0 and Y=0) then C=0 (100%); if (X=0 and Y=1) then C=1 (100%)):
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else ...

Slide 39: ART classification model, example, level 1 → level 2 [figure]

Slide 40: ART classification model, example, level 2
LEVEL 2, rule mining:
S1: if (Z=0) then C=0 with confidence 100%; if (Z=1) then C=1 with confidence 100%
RESULT:
X=0 and Y=0: C=0 (2)
X=0 and Y=1: C=1 (2)
else
  Z=0: C=0 (2)
  Z=1: C=1 (2)
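
A simplified Python sketch of the construction loop from slides 32 to 34, run on a dataset assumed to resemble the example. The slide 35 dataset is not in the transcript. The 8-row table used here is hypothetical, chosen to agree with the rule confidences on slides 36 to 40 (C equals Y when X = 0 and Z when X = 1). Further assumptions:
- MinSupp = 20% as in the example, but a fixed MinConf = 0.9 instead of ART's automatic selection.
- The preference criterion picks the rule set that classifies the most rows, with ties going to the first set found.
- Rows covered by the selected rules simply become leaves.
All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rule_sets(rows, attrs, k, min_supp, min_conf):
    """Rules with k attributes in the LHS, grouped by attribute set.

    `rows` are (values_dict, class_label) pairs; support and confidence are
    measured on the rows that reach the current node of the tree.
    """
    groups = {}
    for attr_set in combinations(attrs, k):
        stats = defaultdict(Counter)  # LHS values -> class counts
        for values, label in rows:
            stats[tuple(values[a] for a in attr_set)][label] += 1
        rules = []
        for lhs, classes in stats.items():
            label, hits = classes.most_common(1)[0]
            if hits / len(rows) >= min_supp and hits / sum(classes.values()) >= min_conf:
                rules.append((lhs, label, hits))
        if rules:
            groups[attr_set] = rules
    return groups


def build_art(rows, attrs, max_size=3, min_supp=0.2, min_conf=0.9):
    """Return ("node", attr_set, rules, else_subtree), ("leaf", label) or None."""
    if not rows:
        return None
    for k in range(1, max_size + 1):
        groups = mine_rule_sets(rows, attrs, k, min_supp, min_conf)
        if groups:
            # Assumed preference criterion: the rule set that classifies most rows.
            attr_set, rules = max(groups.items(), key=lambda g: sum(r[2] for r in g[1]))
            covered = {lhs for lhs, _, _ in rules}
            # Rows not covered by the selected rules go down the "else" branch.
            rest = [(v, c) for v, c in rows
                    if tuple(v[a] for a in attr_set) not in covered]
            return ("node", attr_set, rules,
                    build_art(rest, attrs, max_size, min_supp, min_conf))
    # No suitable rules for any K <= MaxSize: leaf with the most frequent class.
    return ("leaf", Counter(c for _, c in rows).most_common(1)[0][0])


def classify(tree, values):
    """Follow rule sets and "else" branches until a rule or a leaf applies."""
    while tree is not None:
        if tree[0] == "leaf":
            return tree[1]
        _, attr_set, rules, else_branch = tree
        key = tuple(values[a] for a in attr_set)
        for lhs, label, _ in rules:
            if lhs == key:
                return label
        tree = else_branch
    return None


# Hypothetical 8-row dataset: C equals Y when X = 0 and equals Z when X = 1.
data = [({"X": x, "Y": y, "Z": z}, y if x == 0 else z)
        for x in (0, 1) for y in (0, 1) for z in (0, 1)]
tree = build_art(data, ["X", "Y", "Z"])
print(tree[1], tree[2])        # ('X', 'Y') [((0, 0), 0, 2), ((0, 1), 1, 2)]
print(tree[3][1], tree[3][2])  # ('Z',) [((0,), 0, 2), ((1,), 1, 2)]
```

Under these assumptions the sketch reproduces the RESULT above. At K = 1 the 75% rules fail the 0.9 threshold. At K = 2, S1, S2 and S3 tie, and S1 is kept. The "else" branch (X = 1) is then covered by the two Z rules.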

Slide 41: ART classification model, example, ART vs. TDIDT [figure: ART tree and TDIDT tree]

Slide 42: ART classification model, experimental results, classifier accuracy [figure]

Slide 43: ART classification model, experimental results, classifier complexity [figure]

Slide 44: ART classification model, experimental results, training time [figure]

Slide 45: ART classification model, experimental results, I/O operations (scans) [figure]

Slide 46: ART classification model, experimental results, I/O operations (records) [figure]

Slide 47: ART classification model, experimental results, I/O operations (pages) [figure]

Slide 48: ART classification model, final comments
Classification models:
- Acceptable accuracy
- Reduced complexity
- Attribute interactions
- Robustness (noise & primary keys)
Classifier building method:
- Efficient algorithm
- Good scalability properties
- Automatic parameter selection

Slide 49: Anomaly detection
It is often more interesting to find surprising non-frequent events than frequent ones. Examples:
- Abnormal network activity patterns in intrusion detection systems.
- Exceptions to “common” rules in medicine (useful for diagnosis, drug evaluation, detection of conflicting therapies…).
- …

Slide 50: Anomaly detection
Anomalous association rule: a confident rule representing homogeneous deviations from common behavior.

Slide 51: Anomaly detection
Anomalous association rule X ⇒ A when ¬Y:
- X ⇒ Y is frequent and confident: X usually implies Y (dominant rule).
- X¬Y ⇒ A is confident: when X does not imply Y, it usually implies A (the anomaly).
- XY ⇒ ¬A is confident.
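
The three conditions can be checked directly on a list of transactions, as in this sketch. It is not the ATBAR algorithm, which finds such rules from its itemset tree instead of testing candidates one by one. `is_anomalous` is a hypothetical helper. ¬Y is read as "not every item of Y is present". The transactions are made up.

```python
def is_anomalous(x, y, a, transactions, min_supp, min_conf):
    """Check the three conditions for the anomalous rule "X => A when not Y"."""
    with_x = [t for t in transactions if x <= t]
    with_xy = [t for t in with_x if y <= t]
    without_y = [t for t in with_x if not y <= t]
    if not with_xy or not without_y:
        return False
    # 1. Dominant rule X => Y: frequent and confident.
    dominant = (len(with_xy) / len(transactions) >= min_supp
                and len(with_xy) / len(with_x) >= min_conf)
    # 2. X and not Y => A: confident.
    anomaly = sum(a <= t for t in without_y) / len(without_y) >= min_conf
    # 3. X and Y => not A: confident.
    exclusive = sum(not a <= t for t in with_xy) / len(with_xy) >= min_conf
    return dominant and anomaly and exclusive


# Hypothetical transactions: A appears exactly when Y does not.
data = [{"X", "Y", "A1"}, {"X", "Y", "A1"}, {"X", "Y", "A2"},
        {"X", "Y", "A2"}, {"X", "Y", "A3"}, {"X", "Y", "A3"},
        {"X", "Y3", "A"}, {"X", "Y3", "A"}, {"X", "Y4", "A"}]
print(is_anomalous({"X"}, {"Y"}, {"A"}, data, min_supp=0.3, min_conf=0.6))   # True
print(is_anomalous({"X"}, {"Y"}, {"A1"}, data, min_supp=0.3, min_conf=0.6))  # False
```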

Slide 52: Anomaly detection
Example transactions:
X Y A1 Z1 …
X Y A1 Z2 …
X Y A2 Z3 …
X Y A2 Z1 …
X Y A3 Z2 …
X Y A3 Z3 …
X Y A Z …
X Y3 A Z3 …
X Y3 A Z …
X Y4 A Z …
X ⇒ Y is the dominant rule; X ⇒ A when ¬Y is the anomalous rule.

Slide 53: Anomaly detection, Suzuki et al.’s “exception rules”
- X ⇒ Y is an association rule.
- X ∧ I ⇒ ¬Y is the exception rule.
- I ⇒ ¬Y is the reference rule.
- I is the “interacting” itemset.
Drawbacks: too many exceptions; the “cause” (I) needs to be present.

Slide 54: Anomaly detection, ATBAR
Anomalous association rules (tree figure):
- First scan: A #7, B #9, C #7, D #8.
- Second scan, extensions of A: AB #6, AC #4, AD #5, AE #3, AF #3.
- The frequent extensions are kept as children of A #7 (B #6, D #5); the non-frequent ones are grouped into an extra node, A *.

Slide 55: Anomaly detection, ATBAR
Anomalous association rules (tree figure): first-level nodes A #7, B #9, C #7, D #8, each with an extra node for its non-frequent extensions (A *, B *, C *, D *); A #7 has children B #6 and D #5, B #9 has children C #6 and D #7, and AB has child D #5.

Slide 56: Anomaly detection, ATBAR
Anomalous association rules: rule generation is immediate from the frequent and extended itemsets obtained by ATBAR.

Slide 57: Anomaly detection, results
Experiments on health-related datasets from the UCI Machine Learning Repository:
- Relatively small set of anomalous rules (typically, a reduction of more than 90% with respect to standard association rules)
- Reasonable overhead needed to obtain anomalous association rules (about 20% in ATBAR w.r.t. TBAR)

Slide 58: Anomaly detection, results
An example from the Census dataset:
if WORKCLASS: Local-gov then CAPGAIN: [99999.0, 99999.0] (7 out of 7) when not CAPGAIN: [0.0, 20051.0]
Here CAPGAIN: [0.0, 20051.0] is the usual consequent and CAPGAIN: [99999.0, 99999.0] is the “anomaly”.

Slide 59: Anomaly detection, results
- Anomalous association rules (novel characterization of potentially interesting knowledge)
- An efficient algorithm for discovering anomalous association rules: ATBAR
- Some heuristics for filtering the discovered anomalous association rules

Slide 60: Anomaly detection, future work
- Additional heuristics for focusing on interesting anomalies (maybe domain- or even application-specific)
- Alternative measures for the evaluation and ranking of anomalous association rules: certainty factors / conviction …

Slide 61: Questions, comments, and suggestions…

