Core Methods in Educational Data Mining

Slides:



Advertisements
Similar presentations
Pre and Post Assessments A quick and easy way to assess your Student Learning Outcomes.
Advertisements

Name: Tatiana “Tania” Harrison
Relationship Mining Association Rule Mining Week 5 Video 3.
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
State coverage: an empirical analysis based on a user study Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens.
Feature Engineering Studio February 23, Let’s start by discussing the HW.
Bayesian Decision Theory Making Decisions Under uncertainty 1.
1 COMPUTER SKILLS 1 2 Objectives: 1. To learn how to use a tool that God has put into our world and daily lives. Learn how to both use it and appreciate.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Association Rule Mining March 5, 2009.
Probability Rules!! Chapter 15.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
Evaluation of Recommender Algorithms for an Internet Information Broker based on Simple Association Rules and on the Repeat-Buying Theory WEBKDD 2002 Edmonton,
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 16, 2012.
Fractional Factorial Designs Andy Wang CIS 5930 Computer Systems Performance Analysis.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 February 27, 2013.
Special Topics in Educational Data Mining HUDK5199 Spring, 2013 April 17, 2012.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 (4) Introduction to Data Mining by Tan, Steinbach, Kumar ©
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Microsoft produces a New operating system on a disk. There is 0
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
DATA MINING: ASSOCIATION ANALYSIS (2) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
PeerWise Student Instructions
Advanced Methods and Analysis for the Learning and Social Sciences
Chapter 3 Probability.
Data Mining Association Analysis: Basic Concepts and Algorithms
Testing Hypotheses about Proportions
Applied Biostatistics: Lecture 2
PHI 208 RANK Life of the Mind/phi208rank.com
Association Rules Repoussis Panagiotis.
Tutorial 8: Probability Distribution
The Need for Programming Languages
Frequent Pattern Mining
Adding Assignments and Learning Units to Your TSS Course
Fast Action Links extension A love letter to CiviCRM
Exam #3 Review Zuyin (Alvin) Zheng.
Association Rule Mining
Chapter 14 Probability Rules!.
Core Methods in Educational Data Mining
Section 10.2: Tests of Significance
Regression.
Big Data, Education, and Society
INFORMATION TECHNOLOGY PRESENTATION
CS240: Advanced Programming Concepts
Frequent patterns and Association Rules
Lecture 2: Probability.
e-PLUS Lab5 Language Lab System
Chapter 15 Probability Rules! Copyright © 2010 Pearson Education, Inc.
LESSON 5: PROBABILITY Outline Probability Events
Chapter 15 Probability Rules!.
Spreadsheets, Modelling & Databases
Nines and threes I am learning to decide if a number is divisible by nine or three. Can the number be shared evenly between 9 or 3.
Data Science in Industry
Welcome to the Validation Wizard Tutorial - Part 1 -
Flowcharts and Pseudo Code
Educational Data Mining Success Stories
Validity and Reliability II: The Basics
Chapter 15 Probability Rules! Copyright © 2010 Pearson Education, Inc.
How To Outline And Why It’s Awesome.
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Evaluation of Clustering Techniques on DMOZ Data
Jones and Davis’s Correspondent Inference Theory
Presentation transcript:

Core Methods in Educational Data Mining EDUC 545 Spring 2017

Assignment C2

What tools did you use? Packages (e.g., Excel) Features of Packages (e.g., Pivot Tables)

Let’s… Form groups of 4 What features did you generate? How did you generate them? Did it end up in your final model? In what direction? Choose 2 features per group that are the coolest/most interesting/most novel Be ready to share with rest of class Does this match the class’s overall intuition?

Let’s… Go through how you created some features Actually do it… Re-create it in real-time, or show us your code… We’ll have multiple volunteers One feature per customer, please… Does this match the class’s overall intuition?

Was feature engineering beneficial?

Association Rule Mining

The Land of Inconsistent Terminology Today’s Class The Land of Inconsistent Terminology

Association Rule Mining Try to automatically find simple if-then rules within the data set Another method that can be applied when you don’t know what structure there is in your data Unlike clustering, association rules are often obviously actionable

Association Rule Metrics Support Confidence What do they mean? Why are they useful? Occurrence in the data Useful because they weed out 1) rules that do not apply to large amounts of data, and 2) data that does not occur enough to be analyzed

Quiz If a student took Advanced Data Mining, the student took Intro Statistics Support? Confidence? Took Advanced DM Took Intro Stat 1 Support = 2/11 = 0.182 Confidence = 2/6 = 0.333

Association Rule Metrics Interestingness What are some interestingness metrics? Why are they needed?

Why is interestingness needed? Possible to generate large numbers of trivial associations Students who took a course took its prerequisites (Vialardi et al., 2009) Students who do poorly on the exams fail the course (El-Halees, 2009)

Example: Cosine Measures co-occurrence P(A^B) sqrt(P(A)*P(B)) Easy to interpret (numbers closer to 1 than 0 are better; over 0.65 is desirable)

Quiz If a student took Advanced Data Mining, the student took Intro Statistics Cosine? 0.160 0.309 0.519 0.720 Took Advanced DM Took Intro Stat 1 P(A) = 6/11 = 0.545 P(B) = 7/11 = 0.636 P(A^B) = 2/11 = 0.182 Cosine = 0.182/ SQRT(0.545*0.636) Cosine = 0.182/0.589 = 0.309 = B

Example: Lift Measures whether data points that have both A and B are more common than data points only containing B Confidence(A->B) P(B) Easy to interpret (lift over 1 indicates stronger association)

Quiz If a student took Advanced Data Mining, the student took Intro Statistics Lift? 0.333 0.429 0.524 0.643 Took Advanced DM Took Intro Stat 1 Confidence (A->B) = 2/6 = 0.333 P(B) = 7/11 = 0.636 Lift = 0.333/0.636 = 0.524 = C

Example: Jaccard Measures whether data points that have both A and B are more common than data points only containing B P(A^B) P(A)+P(B)-P(A^B) Measures the relative degree to which having A and B together is more likely than having either A or B but not both

Quiz If a student took Advanced Data Mining, the student took Intro Statistics Jaccard? 0.182 0.429 0.333 0.643 Took Advanced DM Took Intro Stat 1 P(A) = 6/11 = 0.545 P(B) = 7/11 = 0.636 P(A^B) = 2/11 = 0.182 Jaccard = 0.182/(0.545+0.636-0.182) Jaccard = 0.182/0.999 = 0.182

Association Rule Metrics What do Merceron & Yacef argue? We argue in this paper that cosine and added value (or equivalently lift) are well suited to educational data, and that teachers can interpret their results easily. We argue that interestingness should be checked with cosine first, and then with lift if cosine rates the rule as noninteresting. If both measures disagree, teachers should use the intuition behind the measures to decide whether or not to dismiss the association rule.

Association Rule Metrics What do Merceron & Yacef argue? Cosine and lift are well suited to educational data, results can be easily interpreted Cosine first. If non-interesting, then lift. If measures disagree, teachers should use the intuition behind the measures to decide whether or not to dismiss the association rule. We argue in this paper that cosine and added value (or equivalently lift) are well suited to educational data, and that teachers can interpret their results easily. We argue that interestingness should be checked with cosine first, and then with lift if cosine rates the rule as noninteresting. If both measures disagree, teachers should use the intuition behind the measures to decide whether or not to dismiss the association rule.

Association Rule Metrics What do Luna-Bazaldua and colleagues argue? Lift and cosine are good indicators of interestingness. In addition, the Phi Coefficient, Convinction, and Jaccard also turn out to be good indicators of interestingness.

Association Rule Metrics What do Luna-Bazaldua and colleagues argue? Interestingness as evaluated by experts Lift and cosine are good indicators of interestingness. In addition, the Phi Coefficient, Conviction, and Jaccard also turn out to be good indicators of interestingness. Lift and cosine are good indicators of interestingness. In addition, the Phi Coefficient, Convinction, and Jaccard also turn out to be good indicators of interestingness.

Any questions on apriori algorithm?

Let’s do an example Volunteer please?

Someone pick Support

Generate Frequent Itemset ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Was the choice of support level appropriate? ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Re-try with lower support ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Generate Rules From Frequent Itemset ABCF ABDG ABEF BEGH BDIJ BCDJ DEFJ ABCD DEGJ DEGJ ABCE ABCF BCDJ BCDE DEFK DEGH

Questions? Comments?

Rules in Education What might be some reasonable applications for Association Rule Mining in education?

The End