Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.


Today’s Class Association Rule Mining

Today’s Class The Land of Inconsistent Terminology

Association Rule Mining Try to automatically find simple if-then rules within the data set. Another method that can be applied when you don’t know what structure there is in your data. Unlike clustering, association rules are often obviously actionable.

Example Famous (and fake) example: people who buy more diapers buy more beer. If person X buys diapers, person X buys beer. Conclusion: put expensive beer next to the diapers.

Interpretation #1 Guys are sent to the grocery store to buy diapers, they want to have a drink down at the pub, but they buy beer to get drunk at home instead

Interpretation #2 There’s just no time to go to the bathroom during a major drinking bout

Serious Issue Association rules imply causality by their if-then nature. But causality can go either direction.

Intervention Put expensive beer next to the diapers in your store

If-conditions can be more complex If person X buys diapers, and person X is male, and it is after 7pm, then person X buys beer

Then-conditions can also be more complex If person X buys diapers, and person X is male, and it is after 7pm, then person X buys beer and Cornish pasties. Can be harder to use, sometimes eliminated from consideration.

Association Rule Mining Find rules Evaluate rules

Rule Evaluation What would make a rule “good”?

Rule Evaluation Support/Coverage Confidence “Interestingness”

Support/Coverage Number of data points that fit the rule, divided by the total number of data points. (Variant: just the number of data points that fit the rule.)

Example (Table with columns “Took PSY503” and “Took PSY505”, one row per student; cell values not preserved.) Rule: If a student took PSY503, the student took PSY505. Support/coverage?

Example (Table with columns “Took PSY503” and “Took PSY505”, one row per student; cell values not preserved.) Rule: If a student took PSY503, the student took PSY505. Support/coverage? 2/11 = 0.18

Confidence Number of data points that fit the rule, divided by the number of data points that fit the rule’s IF condition. Equivalent to precision in classification. Also referred to as accuracy, just to make things confusing; NOT equivalent to accuracy in classification.

Example (Table with columns “Took PSY503” and “Took PSY505”, one row per student; cell values not preserved.) Rule: If a student took PSY503, the student took PSY505. Confidence?

Example (Table with columns “Took PSY503” and “Took PSY505”, one row per student; cell values not preserved.) Rule: If a student took PSY503, the student took PSY505. Confidence? 2/6 = 0.33
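
A minimal Python sketch of the two calculations. The per-student values from the slide's table are not available here, so the `records` list is hypothetical, built only to match the counts used in the examples (11 students, 6 who took PSY503, 2 who took both). The function names are illustrative too.

```python
# Hypothetical data: one (took_psy503, took_psy505) pair per student.
# Constructed to match the slide's counts only; not the original table.
records = [(1, 1), (1, 1), (1, 0), (1, 0), (1, 0), (1, 0),
           (0, 1), (0, 1), (0, 0), (0, 0), (0, 0)]


def support(records):
    """Data points that fit the whole rule, divided by all data points."""
    fits_rule = sum(1 for took_503, took_505 in records if took_503 and took_505)
    return fits_rule / len(records)


def confidence(records):
    """Data points that fit the whole rule, divided by those that fit the IF part."""
    fits_if = sum(1 for took_503, _ in records if took_503)
    fits_rule = sum(1 for took_503, took_505 in records if took_503 and took_505)
    return fits_rule / fits_if


print(f"support    = {support(records):.2f}")
print(f"confidence = {confidence(records):.2f}")
```

The only difference between the two measures is the denominator: all data points for support, only the data points matching the IF condition for confidence.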

Shockingly… The association rule mining community differs from most other methodological communities by acknowledging that cut-offs for support and confidence are arbitrary. Researchers typically adjust them to find a desirable number of rules to investigate, ordering from best to worst, rather than arbitrarily saying that all rules over a certain cut-off are “good”.

Why? Why aren’t support and confidence enough? Possible to generate large numbers of trivial associations: students who took a course took its prerequisites (Vialardi et al., 2009); students who do poorly on the exams fail the course (El-Halees, 2009); students who game the system don’t learn as much (umpteen papers by some bozo named Baker).

Interestingness Not quite what it sounds like. Typically defined as other measures of the degree of statistical support, computed in other fashions, rather than an actual measure of the novelty or usefulness of the discovery. Would be great if researchers would pay more attention to this. A hard problem.

Potential Interestingness Measures Cosine = $\frac{P(A \wedge B)}{\sqrt{P(A) \cdot P(B)}}$. Measures co-occurrence. Merceron & Yacef approved for being easy to interpret (numbers closer to 1 than 0 are better; over 0.65 is desirable).

Potential Interestingness Measures Lift = $\frac{\text{Confidence}(A \to B)}{P(B)}$. Measures whether B is more common among data points that have A than among data points overall. Merceron & Yacef approved for being easy to interpret (lift over 1 indicates stronger association).
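
A small Python sketch of both measures, reading the slide formulas as cosine = P(A and B) / sqrt(P(A) * P(B)) and lift = Confidence(A->B) / P(B). The probabilities plugged in below are hypothetical (the value of P(B) in particular is made up), and the function names are illustrative.

```python
import math


def cosine(p_a, p_b, p_ab):
    """P(A and B) divided by sqrt(P(A) * P(B))."""
    return p_ab / math.sqrt(p_a * p_b)


def lift(p_a, p_b, p_ab):
    """Confidence(A -> B) divided by P(B), where confidence = P(A and B) / P(A)."""
    confidence = p_ab / p_a
    return confidence / p_b


# Hypothetical values: P(A) = 6/11, P(B) = 4/11, P(A and B) = 2/11.
p_a, p_b, p_ab = 6 / 11, 4 / 11, 2 / 11

print(f"cosine = {cosine(p_a, p_b, p_ab):.3f}")
print(f"lift   = {lift(p_a, p_b, p_ab):.3f}")
```

With these made-up numbers both measures come out low (cosine well under 0.65, lift under 1), so this rule would not be flagged as interesting by either criterion.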

Merceron & Yacef recommendation Rules with high cosine or high lift are interesting

Other Interestingness Measures (Tan, Kumar, & Srivastava, 2002)

Other idea for selection Select rules based both on interestingness and on being different from rules already selected (e.g. involving different operators)

Open debate in the field…

How could we get at “Real Interestingness”?

Association Rule Mining Find rules Evaluate rules

The Apriori algorithm (Agrawal et al., 1996) 1. Generate frequent itemset 2. Generate rules from frequent itemset

Generate Frequent Itemset Generate all single items, take those with support over threshold: {i1}. Generate all pairs of items from items in {i1}, take those with support over threshold: {i2}. Generate all triplets of items from items in {i2}, take those with support over threshold: {i3}. And so on… Then form joint itemset of all itemsets.
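
A minimal Python sketch of this level-wise search, assuming transactions are given as sets of items and support is the fraction of transactions containing an itemset. The function name, the `>=` threshold comparison and the toy baskets are assumptions for illustration. The pruning step uses the standard Apriori property that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset at or above min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that meet the threshold.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"diapers", "beer", "wipes"},
]
result = frequent_itemsets(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The returned dictionary plays the role of the "joint itemset of all itemsets": it holds the frequent singles, pairs, triplets and so on together, each with its support.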

Generate Rules From Frequent Itemset Given a frequent itemset, take all itemsets with at least two components. Generate rules from these itemsets, e.g. {A,B,C,D} leads to {A,B,C}->D, {A,B,D}->C, {A,B}->{C,D}, etc. Eliminate rules with confidence below threshold.
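
A minimal Python sketch of rule generation for one itemset, assuming the supports of the itemset and of each of its non-empty subsets are already known. The function name, the `support_of` dictionary and its values are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(itemset, support_of, min_confidence):
    """Split an itemset into every IF -> THEN pair and keep the confident ones.

    support_of maps frozenset -> support and must contain the itemset
    and each of its non-empty proper subsets.
    """
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for if_part in combinations(sorted(itemset), size):
            if_part = frozenset(if_part)
            then_part = itemset - if_part
            # Confidence = support of the whole rule / support of the IF part.
            conf = support_of[itemset] / support_of[if_part]
            if conf >= min_confidence:
                rules.append((if_part, then_part, conf))
    return rules


# Hypothetical supports for a two-item itemset.
support_of = {
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.5,
    frozenset({"A", "B"}): 0.4,
}
rules = rules_from_itemset({"A", "B"}, support_of, min_confidence=0.7)
for if_part, then_part, conf in rules:
    print(sorted(if_part), "->", sorted(then_part), round(conf, 2))
```

Every non-empty proper subset can serve as the IF part, so a four-item itemset such as {A,B,C,D} yields 14 candidate rules before the confidence cut-off is applied.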

Finally Rank the resulting rules using your interest measures

Other Algorithms Typically differ primarily in terms of style of search for rules

Questions? Comments?

Variant on association rules Negative association rules (Brin et al., 1997): What doesn’t go together? (Especially if probability suggests that two things should go together.) People who buy diapers don’t buy car wax, even though 30-year-old males buy both? People who take PSY505 don’t take PSY503? Students who don’t game the system don’t go off-task?
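
One simple way to screen for things that "don't go together" is to compare how often two items actually co-occur with how often they would co-occur if they were independent. The sketch below does only that; it is a rough illustration of the idea, not the procedure from Brin et al., and the function name and baskets are hypothetical.

```python
def cooccurrence_ratio(transactions, a, b):
    """Observed P(a and b) divided by P(a) * P(b), the rate expected under independence.

    Assumes both items occur at least once, so the denominator is non-zero.
    """
    n = len(transactions)
    p_a = sum(1 for t in transactions if a in t) / n
    p_b = sum(1 for t in transactions if b in t) / n
    p_ab = sum(1 for t in transactions if a in t and b in t) / n
    return p_ab / (p_a * p_b)


# Hypothetical baskets.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer"},
    {"diapers"},
    {"car wax"},
    {"car wax", "beer"},
    {"car wax"},
]

print("diapers / car wax:", cooccurrence_ratio(baskets, "diapers", "car wax"))
print("diapers / beer:   ", round(cooccurrence_ratio(baskets, "diapers", "beer"), 2))
```

A ratio well below 1 marks a pair that co-occurs less often than chance would suggest, which makes it a candidate negative association; a ratio above 1 is the ordinary positive case. The ratio is the same quantity as lift.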

Rules in Education What might be some reasonable applications for Association Rule Mining in education?

Asgn. 8 Questions? Comments?

Reminder Wednesday’s class cancelled. I will be at an NSF meeting. In fact, I’m leaving for the airport… now

Next Class Monday, March 26, 3pm-5pm, AK232. Sequential Pattern Mining. Readings: Srikant, R., Agrawal, R. (1996) Mining Sequential Patterns: Generalizations and Performance Improvements. Research Report: IBM Research Division. San Jose, CA: IBM. Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaiane, O. (2009) Clustering and Sequential Pattern Mining of Online Collaborative Learning Data. IEEE Transactions on Knowledge and Data Engineering, 21. Assignments Due: 8. SEQUENTIAL PATTERN MINING

The End