Core Methods in Educational Data Mining

Presentation transcript:

Core Methods in Educational Data Mining EDUC 691 Spring 2019

Assignment BA4: Questions? Comments? Concerns?

Association Rule Mining

Today’s Class: The Land of Inconsistent Terminology

Association Rule Mining: Try to automatically find simple if-then rules within the data set. Another method that can be applied when you don’t know what structure there is in your data. Unlike clustering, association rules are often obviously actionable.

Association Rule Metrics: Support and Confidence. What do they mean? Why are they useful? Both measure occurrence in the data. They are useful because they weed out 1) rules that do not apply to large amounts of data, and 2) data that does not occur enough to be analyzed.

Exercise: If a student took Advanced Data Mining, the student took Intro Statistics. Support? Confidence? [Table with columns: Took Advanced DM | Took Intro Stat] Support = 2/11 = 0.182. Confidence = 2/6 = 0.333.
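The arithmetic in this exercise can be checked with a minimal sketch. The counts (11 students, 6 who took Advanced Data Mining, 2 who took both courses) are read off the fractions on the slide; the function and variable names are illustrative assumptions, not part of the lecture.

```python
# Counts read off the slide's fractions (the original table is not available):
# 11 students in total, 6 took Advanced DM (A), 2 took both A and Intro Stat (B).
n_total = 11
n_a = 6
n_both = 2

def support(n_both, n_total):
    """Share of all records in which A and B occur together: P(A^B)."""
    return n_both / n_total

def confidence(n_both, n_a):
    """Share of records containing A that also contain B: P(A^B) / P(A)."""
    return n_both / n_a

print(round(support(n_both, n_total), 3))  # 0.182
print(round(confidence(n_both, n_a), 3))   # 0.333
```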

Association Rule Metrics Interestingness What are some interestingness metrics? Why are they needed?

Why is interestingness needed? Possible to generate large numbers of trivial associations Students who took a course took its prerequisites (Vialardi et al., 2009) Students who do poorly on the exams fail the course (El-Halees, 2009)

Example: Cosine. Measures co-occurrence. Cosine = P(A^B) / sqrt(P(A)*P(B)). Easy to interpret (numbers closer to 1 than 0 are better; over 0.65 is desirable).

Exercise: If a student took Advanced Data Mining, the student took Intro Statistics. Cosine? [Table with columns: Took Advanced DM | Took Intro Stat] P(A) = 6/11 = 0.545. P(B) = 7/11 = 0.636. P(A^B) = 2/11 = 0.182. Cosine = 0.182/sqrt(0.545*0.636) = 0.182/0.589 = 0.309 (answer B).
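As a quick check of the cosine calculation, here is a small sketch that uses the probabilities stated in the exercise. The function name is a hypothetical choice for illustration.

```python
import math

def cosine(p_a, p_b, p_ab):
    """Cosine interestingness: P(A^B) / sqrt(P(A) * P(B))."""
    return p_ab / math.sqrt(p_a * p_b)

# Probabilities from the exercise: P(A) = 6/11, P(B) = 7/11, P(A^B) = 2/11.
value = cosine(6 / 11, 7 / 11, 2 / 11)
print(round(value, 3))  # 0.309
```

The result is well below the 0.65 level the slides describe as desirable.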

Example: Lift. Measures whether data points that have both A and B are more common than data points only containing B. Lift = Confidence(A->B) / P(B). Easy to interpret (lift over 1 indicates stronger association).

Exercise: If a student took Advanced Data Mining, the student took Intro Statistics. Lift? [Table with columns: Took Advanced DM | Took Intro Stat] Confidence(A->B) = 2/6 = 0.333. P(B) = 7/11 = 0.636. Lift = 0.333/0.636 = 0.524 (answer C).
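The lift calculation can be reproduced the same way. This is a sketch using the slide's numbers, with an illustrative function name.

```python
def lift(confidence_a_to_b, p_b):
    """Lift: Confidence(A->B) / P(B)."""
    return confidence_a_to_b / p_b

# From the exercise: Confidence(A->B) = 2/6 and P(B) = 7/11.
value = lift(2 / 6, 7 / 11)
print(round(value, 3))  # 0.524
```

A lift below 1 means that taking Advanced Data Mining makes Intro Statistics less likely than its base rate in this data.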

Example: Jaccard. Jaccard = P(A^B) / (P(A)+P(B)-P(A^B)). Measures the relative degree to which having A and B together is more likely than having either A or B but not both.

Exercise: If a student took Advanced Data Mining, the student took Intro Statistics. Jaccard? [Table with columns: Took Advanced DM | Took Intro Stat] P(A) = 6/11 = 0.545. P(B) = 7/11 = 0.636. P(A^B) = 2/11 = 0.182. Jaccard = 0.182/(0.545+0.636-0.182) = 0.182/0.999 = 0.182.
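A short sketch of the Jaccard calculation, again using the exercise's probabilities. The function name is an assumption made for illustration.

```python
def jaccard(p_a, p_b, p_ab):
    """Jaccard: P(A^B) / (P(A) + P(B) - P(A^B))."""
    return p_ab / (p_a + p_b - p_ab)

# Probabilities from the exercise: P(A) = 6/11, P(B) = 7/11, P(A^B) = 2/11.
value = jaccard(6 / 11, 7 / 11, 2 / 11)
print(round(value, 3))  # 0.182
```

With unrounded fractions the denominator is 6/11 + 7/11 - 2/11 = 1, so the 0.999 on the slide is a rounding effect and the result matches the support.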

Association Rule Metrics: What do Merceron & Yacef argue? "We argue in this paper that cosine and added value (or equivalently lift) are well suited to educational data, and that teachers can interpret their results easily. We argue that interestingness should be checked with cosine first, and then with lift if cosine rates the rule as noninteresting. If both measures disagree, teachers should use the intuition behind the measures to decide whether or not to dismiss the association rule."

Association Rule Metrics: What do Merceron & Yacef argue? Cosine and lift are well suited to educational data, and results can be easily interpreted. Check cosine first; if non-interesting, then check lift. If the measures disagree, teachers should use the intuition behind the measures to decide whether or not to dismiss the association rule.
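One possible reading of this cosine-first procedure is sketched below. The cutoffs are assumptions borrowed from the earlier slides (cosine over 0.65 is desirable, lift over 1 indicates stronger association), and the function name and return labels are hypothetical, not taken from Merceron & Yacef.

```python
def rate_rule(cosine, lift, cosine_cutoff=0.65, lift_cutoff=1.0):
    """Check cosine first; consult lift only if cosine rates the rule non-interesting.

    The cutoffs are assumptions taken from the earlier slides, not from the paper.
    """
    if cosine > cosine_cutoff:
        return "interesting"
    if lift > lift_cutoff:
        # Cosine says non-interesting but lift says interesting: the measures disagree.
        return "teacher judgement"
    return "not interesting"

# The exercise rule (Advanced DM -> Intro Stat) has cosine 0.309 and lift 0.524.
print(rate_rule(0.309, 0.524))  # not interesting
```

For the exercise rule both measures agree, so no teacher judgement is needed.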

Association Rule Metrics: What do Luna-Bazaldua and colleagues argue? Lift and cosine are good indicators of interestingness. In addition, the Phi Coefficient, Conviction, and Jaccard also turn out to be good indicators of interestingness.

Association Rule Metrics: What do Luna-Bazaldua and colleagues argue? Interestingness as evaluated by experts: Lift and cosine are good indicators of interestingness. In addition, the Phi Coefficient, Conviction, and Jaccard also turn out to be good indicators of interestingness.

Any questions on the Apriori algorithm?

Let’s do an example. Volunteer, please?

Someone pick Support

Generate Frequent Itemsets: ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH
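The in-class exercise can be reproduced with a small level-wise Apriori sketch over the 16 transactions on the slide. The support level was picked live in class and is not recorded in the transcript, so the minimum count of 4 (25% of 16) used here is purely an assumption, as are the function and variable names.

```python
from itertools import combinations

# The 16 transactions from the slide; each letter is one item.
transactions = [set(t) for t in (
    "ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD "
    "DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH"
).split()]

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    # Level 1: count single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        n = sum(1 for t in transactions if item in t)
        if n >= min_count:
            current[frozenset([item])] = n
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates against the data.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

frequent = apriori(transactions, min_count=4)  # assumed support level: 4/16
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

Raising or lowering `min_count` is one way to explore whether a chosen support level leaves too few or too many itemsets.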

Was the choice of support level appropriate? Data: ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Re-try with lower support: ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Generate Rules From Frequent Itemsets: ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH
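Rule generation from a single frequent itemset can be sketched as follows. Both the itemset {A, B, C} and the minimum confidence of 0.6 are assumptions chosen for illustration; the values used in class are not in the transcript.

```python
from itertools import combinations

# The 16 transactions from the slide; each letter is one item.
transactions = [frozenset(t) for t in (
    "ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD "
    "DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH"
).split()]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(itemset, min_confidence):
    """List (antecedent, consequent, confidence) for each split of itemset that is confident enough."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # Confidence(X -> Y) = count(X and Y together) / count(X)
            conf = count(itemset) / count(antecedent)
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))
    return rules

# Assumed example: itemset {A, B, C} with an assumed minimum confidence of 0.6.
for antecedent, consequent, conf in rules_from_itemset("ABC", 0.6):
    print("".join(sorted(antecedent)), "->", "".join(sorted(consequent)), round(conf, 3))
```

Each surviving rule could then be screened with an interestingness measure such as cosine or lift.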

Questions? Comments?

Differential Sequence Mining: What is the difference between differential sequence mining and regular sequential pattern mining?

Rules in Education: What might be some reasonable applications for Association Rule Mining, Sequential Pattern Mining, and Differential Sequence Mining in education?

If there’s time: Get into groups of 3 and brainstorm on what ARM/SPM/DSM could be used for in education.

The End