CISC 4631 Data Mining Lecture 09: Association Rule Mining These slides are based on the slides by Tan, Steinbach and Kumar (textbook authors), Prof. F. Provost (Stern, NYU), and Prof. B. Liu (UIC).

What Is Association Mining? Association rule mining: – Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. Applications: – Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

Association Mining? Examples. – Rule form: “Body → Head [support, confidence]”. – buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%] – buys(x, "bread") → buys(x, "milk") [0.6%, 65%] – major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%] – age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUV car) – age=“30-45”, income=“50K-75K” → car=“SUV”

Market-basket analysis and finding associations Do items occur together (more than I might expect)? Proposed by Agrawal et al. in 1993; it is an important data mining model studied extensively by the database and data mining community. It assumes all data are categorical; there is no good algorithm for numeric data. Initially used for market basket analysis to find how items purchased by customers are related. Bread → Milk [sup = 5%, conf = 100%]

Association Rule: Basic Concepts Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit). Find: all rules that correlate the presence of one set of items with that of another set of items – E.g., 98% of people who purchase tires and auto accessories also get automotive services done. Applications – * → Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?) – Home Electronics → * (What other products should the store stock up on?) – Detecting “ping-pong”ing of patients, faulty “collisions”

Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction. Market-basket transactions (the five transactions of the Tan, Steinbach and Kumar running example):
TID 1: Bread, Milk
TID 2: Bread, Diaper, Beer, Eggs
TID 3: Milk, Diaper, Beer, Coke
TID 4: Bread, Milk, Diaper, Beer
TID 5: Bread, Milk, Diaper, Coke
Example of association rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}. Implication means co-occurrence, not causality! An itemset is simply a set of items.

Association Rule Mining – We are interested in rules that are non-trivial (and possibly unexpected), actionable, and easily explainable.

Examples from a Supermarket Can you think of association rules from a supermarket? Let’s say you identify association rules from a supermarket; how might you exploit them? – That is, if you are the store manager, how might you make money? Assume you have a rule of the form X → Y.

Supermarket examples If you have a rule X → Y, you could: – Run a sale on X if you want to increase sales of Y – Locate the two items near each other – Locate the two items far from each other to make the shopper walk through the store – Print out a coupon at checkout for Y if the shopper bought X but not Y

Association “rules” – standard format Rule format: If {set of items} Then {set of items} (a set can consist of just a single item). The Condition implies the Results. Example: If {Diapers, Baby Food} (Condition) Then {Beer, Chips} (Results).

What is an interesting association? Requires domain-knowledge validation – actionable vs. trivial vs. inexplicable. Algorithms provide a first pass based on statistics on how “unexpected” an association is. Some standard statistics used, for a rule C → R: – support ≈ p(R & C): percent of “baskets” where the rule holds – confidence ≈ p(R | C): percent of times R holds when C holds

Support and Confidence Find all the rules X → Y with minimum confidence and support. – Support = probability that a transaction contains {X, Y}, i.e., the ratio of transactions in which X and Y occur together to all transactions in the database. – Confidence = conditional probability that a transaction having X also contains Y, i.e., the ratio of transactions in which X and Y occur together to those in which X occurs. In general, the confidence of a rule LHS ⇒ RHS can be computed as the support of the whole itemset divided by the support of the LHS: Confidence(LHS ⇒ RHS) = Support(LHS ∪ RHS) / Support(LHS)
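To make the formulas concrete, here is a minimal Python sketch (not from the slides; the function names are mine) that computes support and confidence over transactions stored as sets:

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support(LHS ∪ RHS) / Support(LHS), per the formula above."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# The five market-basket transactions from the running example:
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...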

Definition: Frequent Itemset Itemset – A collection of one or more items. Example: {Milk, Bread, Diaper} – k-itemset: an itemset that contains k items. Support count (σ) – Frequency of occurrence of an itemset – E.g., σ({Milk, Bread, Diaper}) = 2. Support – Fraction of transactions that contain an itemset – E.g., s({Milk, Bread, Diaper}) = 2/5. Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold.

Definition: Association Rule Association Rule – An implication expression of the form X → Y, where X and Y are itemsets – Example: {Milk, Diaper} → {Beer}. Rule Evaluation Metrics – Support (s): fraction of transactions that contain both X and Y – Confidence (c): measures how often items in Y appear in transactions that contain X. For the example rule, s = σ({Milk, Diaper, Beer}) / 5 = 2/5 = 0.4 and c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67.

Support and Confidence - Example Itemset {A, C} has a support of 2/5 = 40%. Rule {A} ⇒ {C} has confidence of 50%. Rule {C} ⇒ {A} has confidence of 100%. Exercises: Support for {A, C, E}? Support for {A, D, F}? Confidence for {A, D} ⇒ {F}? Confidence for {A} ⇒ {D, F}? Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

Example Transaction data:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
Assume minsup = 30% and minconf = 80%. An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]. Association rules from the itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3] … Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]

Mining Association Rules Example of rules: {Milk, Diaper} → {Beer} (s=0.4, c=0.67) {Milk, Beer} → {Diaper} (s=0.4, c=1.0) {Diaper, Beer} → {Milk} (s=0.4, c=0.67) {Beer} → {Milk, Diaper} (s=0.4, c=0.67) {Diaper} → {Milk, Beer} (s=0.4, c=0.5) {Milk} → {Diaper, Beer} (s=0.4, c=0.5) Observations: All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}. Rules originating from the same itemset have identical support but can have different confidence. Thus, we may decouple the support and confidence requirements.

Drawback of Confidence
          Coffee   not Coffee   Total
Tea         15         5          20
not Tea     75         5          80
Total       90        10         100
Association rule: Tea → Coffee. Confidence = P(Coffee | Tea) = 15/20 = 0.75, but P(Coffee) = 90/100 = 0.9. Although the confidence is high, the rule is misleading: P(Coffee | not Tea) = 75/80 = 0.9375, so tea buyers are, if anything, less likely to buy coffee.
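A quick numeric check (again a sketch, not from the slides); the ratio confidence / P(Coffee) is the lift measure that appears later in this lecture:

# Counts from the contingency table above.
n_total, n_coffee = 100, 90
n_tea, n_tea_and_coffee = 20, 15

conf = n_tea_and_coffee / n_tea   # P(Coffee | Tea) = 0.75
prior = n_coffee / n_total        # P(Coffee)       = 0.90
print(conf, prior, conf / prior)  # 0.75 0.9 0.833... (lift < 1: negative association)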

Mining Association Rules Two-step approach: 1. Frequent Itemset Generation – Generate all itemsets whose support ≥ minsup. 2. Rule Generation – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset. Frequent itemset generation is still computationally expensive.

Transaction data representation A simplistic view of shopping baskets; some important information is not considered, e.g., – the quantity of each item purchased, and – the price paid.

Many mining algorithms There are a large number of them! They use different strategies and data structures, but their resulting sets of rules are all the same: – Given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined. Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ. We study only one: the Apriori algorithm.

The Apriori algorithm The best-known algorithm. Two steps: – Find all itemsets that have minimum support (frequent itemsets, also called large itemsets). – Use the frequent itemsets to generate rules. E.g., from the frequent itemset {Chicken, Clothes, Milk} [sup = 3/7], one rule is Clothes → Milk, Chicken [sup = 3/7, conf = 3/3].

Step 1: Mining all frequent itemsets A frequent itemset is an itemset whose support is ≥ minsup. Key idea: the Apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset. (Figure: the itemset lattice over items A, B, C, D, from the 1-itemsets A, B, C, D through the pairs AB, AC, AD, BC, BD, CD and the triples ABC, ABD, ACD, BCD.)

Steps in Association Rule Discovery Find the frequent itemsets – Frequent itemsets are the sets of items that have minimum support – Support is “downward closed”: a subset of a frequent itemset must also be a frequent itemset. If {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets. This also means that if an itemset doesn’t satisfy minimum support, none of its supersets will either (this is essential for pruning the search space). – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets). Then use the frequent itemsets to generate association rules.

Frequent Itemset Generation Given d items, there are 2^d possible candidate itemsets.

Mining Association Rules — An Example (min. support 50%, min. confidence 50%; the user specifies these) For rule A → C: support = support({A, C}) = 50% confidence = support({A, C}) / support({A}) = 66.6% The Apriori principle: any subset of a frequent itemset must be frequent.

Mining Frequent Itemsets: the Key Step Find the frequent itemsets: the sets of items that have minimum support. – A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Why? Make sure you can explain this. – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets). Then use the frequent itemsets to generate association rules. – This second step is more straightforward and requires less computation, so we focus on the first step.

Illustrating the Apriori Principle (Figure: itemset lattice; once an itemset is found to be infrequent, all of its supersets are pruned.)

The Apriori Algorithm Terminology: – C_k is the set of candidate k-itemsets – L_k is the set of frequent k-itemsets Join step: C_k is generated by joining the set L_{k-1} with itself. Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset. – This is a bit confusing since we want to use it the other way: we prune a candidate k-itemset if any of its (k-1)-subsets is not in our list of frequent (k-1)-itemsets. To utilize this you simply start with k = 1, which gives the single-item itemsets, and then you work your way up from there!

The Algorithm An iterative algorithm (also called level-wise search): find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on. – In each iteration k, only consider itemsets that contain a frequent (k-1)-itemset. Find frequent itemsets of size 1: F_1. For k ≥ 2: – C_k = candidates of size k: those itemsets of size k that could be frequent, given F_{k-1} – F_k = those candidates that are actually frequent, F_k ⊆ C_k (this needs one scan of the database).

Apriori candidate generation The candidate-gen function takes L_{k-1} and returns a superset (called the candidates) of the set of all frequent k-itemsets. It has two steps: – Join step: generate all possible candidate itemsets C_k of length k. – Prune step: remove those candidates in C_k that cannot be frequent.

How to Generate Candidates? Suppose the items in L_{k-1} are listed in an order. Step 1: self-joining L_{k-1} – the description below is a bit confusing; all we do is splice two sets together so that only one new item is added (see the example):
insert into C_k
select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
from L_{k-1} p, L_{k-1} q
where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
Step 2: pruning:
forall itemsets c in C_k do
    forall (k-1)-subsets s of c do
        if (s is not in L_{k-1}) then delete c from C_k

Example of Generating Candidates L_3 = {abc, abd, acd, ace, bcd} Self-joining: L_3 * L_3 – abcd from abc and abd – acde from acd and ace Pruning: – acde is removed because ade is not in L_3 C_4 = {abcd}
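The join and prune steps are compact in code. Here is a minimal Python sketch (the function and variable names are mine; itemsets are assumed to be stored as sorted tuples) that reproduces the example:

from itertools import combinations

def candidate_gen(L_prev, k):
    """Join L_{k-1} with itself, then prune any candidate that has an
    infrequent (k-1)-subset (the Apriori property)."""
    prev = set(L_prev)
    C = []
    for p in L_prev:
        for q in L_prev:
            # Join: first k-2 items equal, last item of p < last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune: every (k-1)-subset of c must be in L_{k-1}.
                if all(s in prev for s in combinations(c, k - 1)):
                    C.append(c)
    return C

L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
print(candidate_gen(L3, 4))  # [('a', 'b', 'c', 'd')] — acde is pruned since ade is not in L3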

Example – Finding frequent itemsets (minsup = 0.5) Dataset T:
TID    Items
T100   1, 3, 4
T200   2, 3, 5
T300   1, 2, 3, 5
T400   2, 5
(itemset : count)
1. Scan T → C_1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3 → F_1: {1}:2, {2}:3, {3}:3, {5}:3 → C_2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. Scan T → C_2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2 → F_2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2 → C_3: {2,3,5}
3. Scan T → C_3: {2,3,5}:2 → F_3: {2,3,5}
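The level-wise loop itself is short. A sketch (names are mine) that reuses candidate_gen from the previous example and reproduces this dataset's result:

def apriori(transactions, minsup):
    """Level-wise search: F_1, then F_2 from C_2, and so on (one pass per level)."""
    n = len(transactions)
    sup = lambda c: sum(1 for t in transactions if set(c) <= t) / n
    F = [(i,) for i in sorted({i for t in transactions for i in t}) if sup((i,)) >= minsup]
    frequent, k = list(F), 2
    while F:
        F = [c for c in candidate_gen(F, k) if sup(c) >= minsup]
        frequent += F
        k += 1
    return frequent

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori(T, 0.5))
# [(1,), (2,), (3,), (5,), (1, 3), (2, 3), (2, 5), (3, 5), (2, 3, 5)]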

The Apriori Algorithm — Example (minsup = 30%) (Figure: database D is scanned repeatedly, producing C_1 → L_1 → C_2 → L_2 → C_3 → L_3.)

Step 2: Generating rules from frequent itemsets Frequent itemsets ≠ association rules; one more step is needed to generate association rules. For each frequent itemset X, for each proper nonempty subset A of X: – Let B = X − A – A → B is an association rule if Confidence(A → B) ≥ minconf, where support(A → B) = support(A ∪ B) = support(X) confidence(A → B) = support(A ∪ B) / support(A)

Generating rules: an example Suppose {2, 3, 4} is frequent, with sup = 50%. – Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively. – These generate the following association rules: 2,3 → 4, confidence = 100% 2,4 → 3, confidence = 100% 3,4 → 2, confidence = 67% 2 → 3,4, confidence = 67% 3 → 2,4, confidence = 67% 4 → 2,3, confidence = 67% All rules have support = 50%.
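In code, rule generation is a loop over the proper nonempty subsets. A minimal sketch (names are mine), with the supports hard-coded to mirror the {2, 3, 4} example:

from itertools import combinations

def rules_from_itemset(X, sup, minconf):
    """Emit A -> B for every proper nonempty subset A of X with
    confidence(A -> B) = support(X) / support(A) >= minconf."""
    X = frozenset(X)
    for r in range(1, len(X)):
        for A in map(frozenset, combinations(sorted(X), r)):
            conf = sup[X] / sup[A]
            if conf >= minconf:
                print(sorted(A), "->", sorted(X - A), f"conf = {conf:.0%}")

sup = {frozenset(s): v for s, v in [
    ({2, 3, 4}, 0.50), ({2, 3}, 0.50), ({2, 4}, 0.50), ({3, 4}, 0.75),
    ({2}, 0.75), ({3}, 0.75), ({4}, 0.75)]}
rules_from_itemset({2, 3, 4}, sup, minconf=0.6)  # prints all six rules above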

Generating rules: summary To recap, in order to obtain A → B, we need support(A ∪ B) and support(A). All the required information for the confidence computation has already been recorded during itemset generation; there is no need to see the data T any more. This step is not as time-consuming as frequent itemset generation.

On the Apriori Algorithm It seems very expensive. Level-wise search: with K = the size of the largest itemset, it makes at most K passes over the data. In practice, K is bounded (often around 10), and the algorithm is very fast. Under some conditions, all rules can be found in linear time. It scales up to large data sets.

Granularity of items One exception to the “ease” of applying association rules is selecting the granularity of the items. Should you choose: – diet coke? – coke product? – soft drink? – beverage? Should you include more than one level of granularity? Be careful. (Some association-finding techniques allow you to represent hierarchies explicitly.)

Multiple-Level Association Rules Items often form a hierarchy – Items at the lower level are expected to have lower support – Rules regarding itemsets at appropriate levels could be quite useful – A transaction database can be encoded based on dimensions and levels. (Example hierarchy: Food → {Milk, Bread}; Milk → {Skim, 2%}; Bread → {Wheat, White}.)

Mining Multi-Level Associations A top-down, progressive deepening approach: – First find high-level strong rules: milk → bread [20%, 60%]. – Then find their lower-level “weaker” rules: 2% milk → wheat bread [6%, 50%]. – When one threshold is set for all levels: if support is too high, it is possible to miss meaningful associations at the low levels; if support is too low, uninteresting rules may be generated. Different minimum support thresholds across levels lead to different algorithms (e.g., decrease min-support at lower levels). Variations in mining multiple-level association rules: – Level-crossed association rules: milk → wonder wheat bread – Association rules with multiple, alternative hierarchies: 2% milk → wonder bread
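As a small illustration of encoding transactions at a higher level of the hierarchy, a sketch with a hypothetical parent map (the item names are illustrative, not from the slides):

# Roll each item up one level of the hierarchy before mining, so that
# higher-level rules like milk -> bread can reach the support threshold.
parent = {"skim milk": "milk", "2% milk": "milk",
          "wheat bread": "bread", "white bread": "bread"}

def roll_up(transaction):
    return {parent.get(item, item) for item in transaction}

print(roll_up({"2% milk", "wheat bread"}))  # {'milk', 'bread'}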

Rule Generation Now that you have the frequent itemsets, you can generate association rules – Split the frequent itemsets in all possible ways and prune if the confidence is below the min_confidence threshold. Rules that are left are called strong rules. – You may be given a rule template that constrains the rules: rules with only one item on the right side, or rules with two items on the left and one on the right – i.e., rules of the form {X, Y} → Z.

Rules from Previous Example What are the strong rules of the form {X, Y} → Z if the confidence threshold is 75%? – We start with {2, 3, 5}: {2,3} → 5 (confidence = 2/2 = 100%): STRONG {3,5} → 2 (confidence = 2/2 = 100%): STRONG {2,5} → 3 (confidence = 2/3 = 66%): PRUNE! Note that in general you don’t just look at the frequent itemsets of maximum length. If we wanted strong rules of the form X → Y we would look at the frequent 2-itemsets F_2.

Interestingness Measurements Objective measures – Two popular measurements: support and confidence. Subjective measures (Silberschatz & Tuzhilin, KDD ’95): a rule (pattern) is interesting if – it is unexpected (surprising to the user), and/or – actionable (the user can do something with it).

Criticism of Support and Confidence Example 1: – Among 5000 students: 3000 play basketball, 3750 eat cereal, 2000 both play basketball and eat cereal. – play basketball → eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75%, which is higher than 66.7%. – play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence. Lift of A → B = P(B|A) / P(B), and a rule is interesting if its lift is not near 1.0. What is the lift of the second rule? (1/3) / (1250/5000) = 1.33.

Customer Number vs. Transaction ID In the homework you may have a problem where there is a customer id for each transaction – You may be asked to do association analysis based on the customer id. If so, you need to aggregate the transactions to the customer level.
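For instance, a minimal sketch of that aggregation, assuming a hypothetical layout of one (customer_id, item) row per purchase:

from collections import defaultdict

rows = [("c1", "milk"), ("c1", "bread"), ("c2", "beer"), ("c1", "milk")]
baskets = defaultdict(set)
for cust, item in rows:
    baskets[cust].add(item)   # one basket (itemset) per customer
print(dict(baskets))          # {'c1': {'milk', 'bread'}, 'c2': {'beer'}}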

Market-basket analysis and finding associations Do items occur together (more than I might expect)? Why might I care? – Merchandising: e.g., placing products in a retail space (physical or electronic), catalog design, packaging, optional services – Recommendations: cross-selling and up-selling opportunities, mining credit-card data, developing customer loyalty and self-investment – Fraud detection: e.g., in insurance data, a doctor very often works on cases of a particular lawyer – Simply understanding my business: are there “investment profiles” of my clients? customer segmentation based on buying behavior; is anything strange going on?

Virtual items If you’re interested in including other possible variables, you can create “virtual items”: gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …

Associations: Pros and Cons Pros – can quickly mine patterns describing business/customers/etc. without major effort in problem formulation – virtual items allow much flexibility – unparalleled tool for hypothesis generation Cons – unfocused: not clear exactly how to apply mined “knowledge”; only hypothesis generation – can produce many, many rules! There may be only a few nuggets among them (or none).

Association Rules Association rule types: – Actionable Rules – contain high-quality, actionable information – Trivial Rules – information already well-known by those familiar with the business – Inexplicable Rules – no explanation and do not suggest action Trivial and Inexplicable Rules occur most often