SAS Homework 3 Review Association rules mining

Presentation transcript:

SAS Homework 3 Review Association rules mining MIS2502 Data Analytics

SAS Homework 3 Review: Association Rules
Using the Transactions data set:
  Reject Store and Quantity (we don't need them).
  Assign the role ID to Transaction (Nominal); this is our 'basket'.
  Assign the role Target to Product (Nominal); this is what we're trying to determine, but now it's not a Y/N (binary) value.
  Step 8 = Transaction!
Add an Associations node (Model).
In Properties, set Export Rule by ID = Yes.
Answer some questions regarding the association rules.
Evaluate Support, Confidence and Lift.

Set Up
Retail: associations between items purchased from Health/Beauty and Stationery.
400K+ transactions collected from POS.
Products: bar soap, bows, candy bars, deodorant, greeting cards, magazines, markers, pain relievers, pencils, pens, perfume, photo processing, prescription medications, shampoo, toothbrushes, toothpaste, wrapping paper
We are using 2

Association Rules - Diagram
Right-click and Run, then view the results.

Process
Set rule thresholds.
Define Item Sets.
Read through the Item Sets and create a list of all possible association rules (X => Y) for them.
Compute Support, Confidence and Lift for each rule:
  Support: frequency, i.e. count of occurrences / all transactions, for the individual items (X and Y) and for the ItemSet (X, Y).
  Confidence: strength of the association, i.e. how often Y appears in baskets that contain X: count(X => Y) / count(X).
  Expected Confidence of X => Y: the probability that a basket contains Y, i.e. s(Y).
  Lift = s(X => Y) / (s(X) * s(Y)), or, in SAS, confidence / expected confidence.
Drop the rules that don't meet the thresholds.
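The process above can be sketched in a few lines of Python. This is only an illustration, not what the SAS Associations node does internally: the baskets, the two threshold values, and the function names are all made up for the example, and only two-item rules are considered.

```python
from itertools import permutations

# Hypothetical baskets: transaction ID -> set of products (not the course data).
baskets = {
    1: {"pens", "pencils", "markers"},
    2: {"pens", "pencils"},
    3: {"toothpaste", "toothbrushes"},
    4: {"pens", "toothpaste"},
    5: {"pencils", "markers"},
}

MIN_SUPPORT = 0.2      # assumed threshold, for illustration only
MIN_CONFIDENCE = 0.5   # assumed threshold, for illustration only

def support(items, baskets):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets.values()) / len(baskets)

def rule_stats(x, y, baskets):
    """Support, confidence, expected confidence and lift for the rule x => y."""
    s_xy = support({x, y}, baskets)
    confidence = s_xy / support({x}, baskets)
    expected_confidence = support({y}, baskets)
    return {
        "support": s_xy,
        "confidence": confidence,
        "expected_confidence": expected_confidence,
        "lift": confidence / expected_confidence,
    }

# List every possible two-item rule, then drop those below the thresholds.
items = sorted(set().union(*baskets.values()))
rules = {}
for x, y in permutations(items, 2):
    stats = rule_stats(x, y, baskets)
    if stats["support"] >= MIN_SUPPORT and stats["confidence"] >= MIN_CONFIDENCE:
        rules[(x, y)] = stats

# Print the surviving rules, highest lift first.
for (x, y), stats in sorted(rules.items(), key=lambda kv: -kv[1]["lift"]):
    print(f"{x} => {y}: " + ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Note that confidence / expected confidence gives the same number as s(X => Y) / (s(X) * s(Y)), since confidence is s(X => Y) / s(X) and expected confidence is s(Y).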

Evaluating the Statistics
Support (frequency): % occurrence of the ItemSet in the data.
Confidence (strength): % of baskets containing the left-hand side that also contain the right-hand side.
Lift (dependence): probability of dependent occurrence / probability of random occurrence (a value > 1 indicates a positive association).
Plots:
  Support v Confidence: blue = 2-variable rules, red = 3-variable rules.
  Confidence Plot: Left v Right (red = high); range shown at bottom.
  Confidence v Expected Confidence: the gap between the two reflects Lift; rules are ordered by lift on the x axis.

Evaluating the Rules
Table view > Rules > Rule table

In Class

In Class
1) Which rule(s) have the highest confidence?
   MUSICSTREAM ==> WEBSITE
2) Which rule(s) have the highest support?
   WEBSITE ==> PODCAST and PODCAST ==> WEBSITE
3) Which rule(s) have the highest lift?
   ARCHIVE ==> WEBSITE and WEBSITE ==> ARCHIVE
4) What are the two rule "pairs" in the list above?
   ARCHIVE ==> WEBSITE / WEBSITE ==> ARCHIVE and WEBSITE ==> PODCAST / PODCAST ==> WEBSITE
5) What other service "goes the most" with visiting the website for general information (WEBSITE)? In other words, what other service are WEBSITE visitors most likely to seek out? What statistic did you use to figure this out?
   ARCHIVE. LIFT is greater than 1, which implies that this isn't just random chance: people are actively seeking out the WEBSITE if they've used the ARCHIVE.
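One way to see why rules come in "pairs" is that support and lift are the same in both directions, while confidence usually is not. The sketch below uses invented counts (they are not the homework results) and a hypothetical helper named `rule_stats`.

```python
import math

def rule_stats(n_total, n_x, n_y, n_both):
    """Return (support, confidence, lift) for X => Y from raw basket counts."""
    support = n_both / n_total
    confidence = n_both / n_x
    lift = confidence / (n_y / n_total)  # confidence / expected confidence
    return support, confidence, lift

# Hypothetical counts, not taken from the homework output.
N, N_WEBSITE, N_ARCHIVE, N_BOTH = 1000, 800, 100, 90

archive_to_website = rule_stats(N, N_ARCHIVE, N_WEBSITE, N_BOTH)
website_to_archive = rule_stats(N, N_WEBSITE, N_ARCHIVE, N_BOTH)

print("ARCHIVE ==> WEBSITE (support, confidence, lift):", archive_to_website)
print("WEBSITE ==> ARCHIVE (support, confidence, lift):", website_to_archive)

# Support and lift match across the pair; confidence depends on the direction.
assert math.isclose(archive_to_website[0], website_to_archive[0])
assert math.isclose(archive_to_website[2], website_to_archive[2])
```

This is why the highest-support and highest-lift answers each list both directions of a rule, while the highest-confidence answer names a single direction.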

In Class
6) What other service seems to "go the least" with visiting the website for general information (WEBSITE)? In other words, what other service are WEBSITE visitors least likely to seek out? What statistic did you use to figure this out?
   PODCAST. LIFT is less than 1. This also implies that this isn't just random chance, but this time people who visit the website are particularly unlikely to also download a podcast.
7) The rule MUSICSTREAM ==> WEBSITE has poor lift (i.e., less than 1), but the rule has the highest confidence. Explain how this is possible.
   It could be that many people use both MUSICSTREAM and WEBSITE, so the combination appears in visitors' sets of services a lot. However, there can still be a negative effect of one on the other. For example, I use the website a lot, and I use music streaming a lot, but I'm still less likely to do one if I've done the other; possibly they are substitutes.
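The answer to question 7 can be checked with a little arithmetic. The counts below are invented for illustration (they are not the homework data): confidence is high because almost everyone uses WEBSITE, yet lift stays below 1 because MUSICSTREAM users visit WEBSITE slightly less often than visitors overall.

```python
# Hypothetical counts chosen for illustration only.
n_total = 1000        # all visitors
n_website = 950       # visitors who used WEBSITE
n_musicstream = 200   # visitors who used MUSICSTREAM
n_both = 180          # visitors who used both

confidence = n_both / n_musicstream          # 0.90: share of MUSICSTREAM users who also use WEBSITE
expected_confidence = n_website / n_total    # 0.95: share of all visitors who use WEBSITE
lift = confidence / expected_confidence      # below 1, since 0.90 < 0.95

print(f"confidence={confidence:.2f}, "
      f"expected confidence={expected_confidence:.2f}, lift={lift:.3f}")
```

So a rule can top the confidence ranking simply because its right-hand side is very common, and lift is the statistic that corrects for that.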