Download presentation
1
SAS Homework 3 Review Association rules mining
MIS2502 Data Analytics
2
SAS Homework 3 Review Association Rules
Using Transactions Data Set Reject Store and Quantity – don’t need them Assign ID to Transaction (Nominal) – this is our ‘basket’ Target to Product (Nominal) - this is what we’re trying to determine but now its not a Y/N(binary) Step 8 = Transaction ! Add an Associations node (Model) In Properties Export Rule by ID = Yes Answer some questions regarding the Association Rules Evaluate Support, Confidence and Lift
3
Set Up Retail – associations between items purchased from Health/Beauty and Stationary. 400K + transactions collected from POS Products bar soap bows candy bars deodorant greeting cards magazines markers pain relievers pencils pens perfume photo processing prescription medications shampoo toothbrushes toothpaste wrapping paper We are using 2
4
Association Rules - Diagram
Right Click and Run . Then view results…..
5
Process Set rule thresholds Define Item Sets
Read through Item Sets, create list of all possible association rules (X => Y) for the Item Sets Compute Support, Confidence and Lift for each Rule Support, frequency count of occurrence/ all transactions for both the individual items (X and, Y) and for the ItemSet (X,Y) Confidence , strength of the association. How often Y appears in baskets that contain X count (X=>Y)/count(X) Expected Confidence X=>Y is the probability that one of the baskets has Y Lift = s (X->Y)/s(X)*s(Y) Or, in SAS, (confidence/expected confidence ) Drop those that don’t meet thresholds
6
Evaluating the Statistics
Support – frequency: % occurrence of ItemSet in data Confidence – strength: % right hand occurs in left Lift – dependence: prob of dependent occurrence /prob of random occurrence (>1) Support v Confidence Blue – 2 variable , - Red 3 variable Confidence Plot Left v Right (red = high) range at bottom Confidence v Expected Confidence Diff is Lift <=Ordered by lift on x axis
7
Evaluating the Rules Table view>rules>rule table
8
In Class
9
In Class 1) Which rule(s) have the highest confidence? MUSICSTREAM ==> WEBSITE 2) Which rule(s) have the highest support? WEBSITE ==> PODCAST and PODCAST ==> WEBSITE 3) Which rule(s) have the highest lift? ARCHIVE ==> WEBSITE and WEBSITE ==> ARCHIVE 4) What are the two rule “pairs” in the list above? ARCHIVE ==> WEBSITE/WEBSITE ==> ARCHIVE and WEBSITE ==> PODCAST/PODCAST ==> WEBSITE 5) What other service “goes the most” with visiting the website for general information (WEBSITE)? In other words, what other service are WEBSITE visitors most likely to seek out? What statistic did you use to figure this out? ARCHIVE – LIFT is greater than 1. This implies that this isn’t just random chance – people are actively seeking out the WEBSITE if they’ve used the ARCHIVE.
10
In Class 6) What other service seems to “go the least” with visiting the website for general information (WEBSITE)? In other words, what other service are WEBSITE visitors least likely to seek out? What statistic did you use to figure this out? PODCAST – LIFT is less than 1. This also implies that this isn’t just random chance – but this time, people who visit the web site are particularly unlikely to also download a podcast. 7) The rule MUSICSTREAM ==> WEBSITE has poor lift (i.e., less than 1), but the rule has the highest confidence. Explain how this is possible. It could be that many people use both MUSICSTREAM and WEBSITE so it appears in visitors’ set of services a lot. However, there can still be a negative effect of one on the other. For example, I use the website a lot, and I use music streaming a lot, but I’m still less likely to do one if I’ve done the other – possibly they are substitutes.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.