Relationship Mining Association Rule Mining Week 5 Video 3.

Slides:



Advertisements
Similar presentations
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.
Advertisements

Association Rule Mining
Relationship Mining Sequential Pattern Mining Week 5 Video 4.
DATA MINING Association Rule Discovery. AR Definition aka Affinity Grouping Common example: Discovery of which items are frequently sold together at a.
Rule Generation from Decision Tree Decision tree classifiers are popular method of classification due to it is easy understanding However, decision tree.
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Mining Sequential Patterns Authors: Rakesh Agrawal and Ramakrishnan Srikant. Presenter: Jeremy Dalmer.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
Association Rule Mining (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)‏
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
CS 349: Market Basket Data Mining All about beer and diapers.
Bayesian Decision Theory Making Decisions Under uncertainty 1.
1 Associative Classification of Imbalanced Datasets Sanjay Chawla School of IT University of Sydney.
CS130 – Software Tools Fall 2010 Statistics and PASW Wrap-up 1.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Association Rule Mining March 5, 2009.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
Outline Knowledge discovery in databases. Data warehousing. Data mining. Different types of data mining. The Apriori algorithm for generating association.
Instructor : Prof. Marina Gavrilova. Goal Goal of this presentation is to discuss in detail how data mining methods are used in market analysis.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
Evaluation of Recommender Algorithms for an Internet Information Broker based on Simple Association Rules and on the Repeat-Buying Theory WEBKDD 2002 Edmonton,
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Association Rule Mining
Measuring Association Rules Shan “Maggie” Duanmu Project for CSCI 765 Dec 9 th 2002.
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Associations and Frequent Item Analysis. 2 Outline  Transactions  Frequent itemsets  Subset Property  Association rules  Applications.
Charles Tappert Seidenberg School of CSIS, Pace University
Special Topics in Educational Data Mining HUDK5199 Spring, 2013 April 17, 2012.
Discovering Interesting Patterns for Investment Decision Making with GLOWER-A Genetic Learner Overlaid With Entropy Reduction Advisor : Dr. Hsu Graduate.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
MIS2502: Data Analytics Association Rule Mining David Schuff
Introduction to Data Mining Mining Association Rules Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel
Data Mining Association Analysis: Basic Concepts and Algorithms
Core Methods in Educational Data Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent Pattern Mining
William Norris Professor and Head, Department of Computer Science
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
MIS2502: Data Analytics Association Rule Mining
Market Basket Analysis and Association Rules
MIS2502: Data Analytics Association Rule Mining
Core Methods in Educational Data Mining
MIS2502: Data Analytics Association Rule Learning
Association Analysis: Basic Concepts
Presentation transcript:

Relationship Mining Association Rule Mining Week 5 Video 3

Association Rule Mining  Try to automatically find simple if-then rules within the data set

Example  Famous (and fake) example:  People who buy more diapers buy more beer  If person X buys diapers,  Person X buys beer  Conclusion: put expensive beer next to the diapers

Interpretation #1  Guys are sent to the grocery store to buy diapers, they want to have a drink down at the pub, but they buy beer to get drunk at home instead

Interpretation #2  There’s just no time to go to the bathroom during a major drinking bout

Serious Issue  Association rules imply causality by their if-then nature  But causality can go either direction

If-conditions can be more complex  If person X buys diapers, and person X is male, and it is after 7pm, then person Y buys beer

Then-conditions can also be more complex  If person X buys diapers, and person X is male, and it is after 7pm, then person Y buys beer and tortilla chips and salsa  Can be harder to use, sometimes eliminated from consideration

Useful for…  Generating hypotheses to study further  Finding unexpected connections  Is there a surprisingly ineffective instructor or math problem?  Are there e-learning resources that tend to be selected together?

Association Rule Mining  Find rules  Evaluate rules

Association Rule Mining  Find rules  Evaluate rules

Rule Evaluation  What would make a rule “good”?

Rule Evaluation  Support/Coverage  Confidence  “Interestingness”

Support/Coverage  Number of data points that fit the rule, divided by the total number of data points  (Variant: just the number of data points that fit the rule)

Example Took Adv. DMTook Intro Stat Rule: If a student took Advanced Data Mining, the student took Intro Statistics Support/coverage?

Example Took Adv. DMTook Intro Stat Rule: If a student took Advanced Data Mining, the student took Intro Statistics Support/coverage? 2/11=

Confidence  Number of data points that fit the rule, divided by the number of data points that fit the rule’s IF condition  Equivalent to precision in classification  Also referred to as accuracy, just to make things confusing  NOT equivalent to accuracy in classification

Example Took Adv. DMTook Intro Stat Rule: If a student took Advanced Data Mining, the student took Intro Statistics Confidence?

Example Took Adv. DMTook Intro Stat Rule: If a student took Advanced Data Mining, the student took Intro Statistics Confidence? 2/6 = 0.33

Arbitrary Cut-offs  The association rule mining community differs from most other methodological communities by acknowledging that cut-offs for support and confidence are arbitrary  Researchers typically adjust them to find a desirable number of rules to investigate, ordering from best-to- worst…  Rather than arbitrarily saying that all rules over a certain cut-off are “good”

Other Metrics  Support and confidence aren’t enough  Why not?

Why not?  Possible to generate large numbers of trivial associations  Students who took a course took its prerequisites (AUTHORS REDACTED, 2009)  Students who do poorly on the exams fail the course (AUTHOR REDACTED, 2009)

Interestingness

 Not quite what it sounds like  Typically defined as measures other than support and confidence  Rather than an actual measure of the novelty or usefulness of the discovery  Which, to be fair, would be very difficult to create

Potential Interestingness Measures  Cosine P(A^B) sqrt(P(A)*P(B))  Measures co-occurrence  Merceron & Yacef (2008) note that it is easy to interpret (numbers closer to 1 than 0 are better; over 0.65 is desirable)

Quiz Took Adv. DMTook Intro Stat If a student took Advanced Data Mining, the student took Intro Statistics Cosine? A)0.160 B)0.309 C)0.519 D)0.720

Potential Interestingness Measures  Lift Confidence(A->B) P(B)  Measures whether data points that have both A and B are more common than data points only containing B  Merceron & Yacef (2008) note that it is easy to interpret (lift over 1 indicates stronger association)

Quiz Took Adv. DMTook Intro Stat If a student took Advanced Data Mining, the student took Intro Statistics Lift? A)0.333 B)0.429 C)0.500 D)0.643

Merceron & Yacef recommendation  Rules with high cosine or high lift should be considered interesting

Other Interestingness measures (Tan, Kumar, & Srivastava, 2002)

Other idea for selection  Select rules based both on interestingness and based on being different from other rules already selected (e.g., involve different operators)

Open debate in the field…

Association Rule Mining  Find rules  Evaluate rules

The Apriori algorithm (Agrawal et al., 1996) 1. Generate frequent itemset 2. Generate rules from frequent itemset

Generate Frequent Itemset  Generate all single items, take those with support over threshold – {i1}  Generate all pairs of items from items in {i1}, take those with support over threshold – {i2}  Generate all triplets of items from items in {i2}, take those with support over threshold – {i3}  And so on…  Then form joint itemset of all itemsets

Generate Rules From Frequent Itemset  Given a frequent itemset, take all items with at least two components  Generate rules from these items  E.g. {A,B,C,D} leads to {A,B,C}->D, {A,B,D}->C, {A,B}- >{C,D}, etc. etc.  Eliminate rules with confidence below threshold

Finally  Rank the resulting rules using your interest measures

Other Algorithms  Typically differ primarily in terms of style of search for rules

Variant on association rules  Negative association rules (Brin et al., 1997)  What doesn’t go together? (especially if probability suggests that two things should go together)  People who buy diapers don’t buy car wax, even though 30-year old males buy both?  People who take advanced data mining don’t take hierarchical linear models, even though everyone who takes either has advanced math?  Students who game the system don’t go off-task?

Next lecture  Sequential Pattern Mining