Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 27, 2012.

Today’s Class: Regression and Regressors

Two Key Types of Prediction: classification and regression (this slide adapted from a slide by Andrew W. Moore, Google)

Regression
There is something you want to predict (“the label”). The thing you want to predict is numerical:
– Number of hints the student requests
– How long the student takes to answer
– What the student’s test score will be

Regression
Associated with each label is a set of “features”, which you may be able to use to predict the label.
Example data columns: skill, pknow, time, totalactions, numhints; example skill values include ENTERINGGIVEN, USEDIFFNUM, REMOVECOEFF, …

Regression
The basic idea of regression is to determine which features, in which combination, can predict the label’s value.
Example data columns: skill, pknow, time, totalactions, numhints; example skill values include ENTERINGGIVEN, USEDIFFNUM, REMOVECOEFF, …
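
As a concrete illustration of this idea, here is a minimal sketch (not from the original slides; the numbers are made up) of fitting a linear model that predicts numhints from the other feature columns named above, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical rows using the feature columns from the slide (values invented)
pknow        = np.array([0.2, 0.5, 0.9, 0.3, 0.7, 0.1])
time         = np.array([12.0, 4.0, 2.5, 9.0, 3.0, 15.0])
totalactions = np.array([3, 1, 1, 2, 1, 4])
numhints     = np.array([2, 0, 0, 1, 0, 3])   # the label we want to predict

# Learn which combination of features predicts the label's value
X = np.column_stack([pknow, time, totalactions])
model = LinearRegression().fit(X, numhints)

print("coefficients (pknow, time, totalactions):", model.coef_)
print("intercept:", model.intercept_)
print("predicted numhints for a new action:", model.predict([[0.4, 6.0, 2]]))
```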

Linear Regression The most classic form of regression is linear regression

Linear Regression
The most classic form of regression is linear regression, for example:
Numhints = 0.12*Pknow + …*Time – 0.11*Totalactions
Example row to predict: skill COMPUTESLOPE, with its pknow, time, and totalactions values; numhints = ?

Linear Regression Linear regression only fits linear functions (except when you apply transforms to the input variables, which most statistics and data mining packages can do for you…)

Non-linear Inputs
What kind of functions could you fit with:
Y = X^2
Y = X^3
Y = sqrt(X)
Y = 1/X
Y = sin(X)
Y = ln(X)
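
To make the point above concrete, here is a minimal sketch (not from the slides; data simulated) showing that ordinary linear regression can fit a quadratic relationship once you add a transformed copy of the input as an extra column. The model stays linear in its coefficients even though X^2 is a non-linear transform of X:

```python
import numpy as np

# Simulated data where Y depends quadratically on X
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=200)
Y = 1.5 * X**2 - 0.5 * X + rng.normal(scale=0.3, size=200)

# Design matrix with the raw input, its square, and an intercept column
design = np.column_stack([X, X**2, np.ones_like(X)])
coefs, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)

print("coefficients for X, X^2, intercept:", coefs)  # roughly -0.5, 1.5, 0.0
```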

Linear Regression
However…
– It is blazing fast
– It is often more accurate than more complex models, particularly once you cross-validate (data mining’s “dirty little secret”; Caruana & Niculescu-Mizil, 2006)
– It is feasible to understand your model (with the caveat that the second feature in your model is interpreted in the context of the first feature, and so on)

Example of Caveat Let’s study a classic example

Example of Caveat Let’s study a classic example Drinking too much prune nog at a party, and having to make an emergency trip to the Little Researcher’s Room

Data

Some people are resistant to the deleterious effects of prunes and can safely enjoy high quantities of prune nog!

Learned Function
Probability of “emergency” = 0.25 * (# drinks of nog in last 3 hours) – … * (# drinks of nog in last 3 hours)^2
But does that actually mean that (# drinks of nog in last 3 hours)^2 is associated with fewer “emergencies”?

Learned Function
Probability of “emergency” = 0.25 * (# drinks of nog in last 3 hours) – … * (# drinks of nog in last 3 hours)^2
But does that actually mean that (# drinks of nog in last 3 hours)^2 is associated with fewer “emergencies”? No!

Example of Caveat
(Drinks of nog last 3 hours)^2 is actually positively correlated with emergencies! – r = 0.59

Example of Caveat The relationship is only in the negative direction when (Drinks of nog last 3 hours) is already in the model…
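
A small simulated sketch (not from the slides) of the phenomenon described above: a squared term that is positively correlated with the outcome on its own can still receive a negative coefficient once the linear term is already in the model:

```python
import numpy as np

rng = np.random.default_rng(1)
drinks = rng.uniform(0, 10, size=500)
drinks_sq = drinks**2
# Simulated outcome: rises with drinks, but with a diminishing (negative squared) effect
emergency = 0.25 * drinks - 0.015 * drinks_sq + rng.normal(scale=0.2, size=500)

# On its own, the squared term is positively correlated with the outcome...
print("r(drinks^2, outcome):", np.corrcoef(drinks_sq, emergency)[0, 1])

# ...but once the linear term is in the model, its coefficient comes out negative
design = np.column_stack([drinks, drinks_sq, np.ones_like(drinks)])
coefs, *_ = np.linalg.lstsq(design, emergency, rcond=None)
print("coefficient on drinks^2 in the full model:", coefs[1])
```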

Example of Caveat So be careful when interpreting linear regression models (or almost any other type of model)

Comments? Questions?

Neural Networks Another popular form of regression is neural networks (called Multilayer Perceptron in Weka) This image courtesy of Andrew W. Moore, Google

Neural Networks Neural networks can fit more complex functions than linear regression It is usually near-to-impossible to understand what the heck is going on inside one

Soller & Stevens (2007)

In fact…
The difficulty of interpreting non-linear models is so well known that New York City put up a road sign about it

Regression Trees

Regression Trees (non-linear)
If X > 3: Y = 2
Else if X < -7: Y = 4
Else: Y = 3

Linear Regression Trees (Model Trees, RepTree)
If X > 3: Y = 2A + 3B
Else if X < -7: Y = 2A – 3B
Else: Y = 2A + 0.5B + C
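
For concreteness, a minimal sketch (not from the slides) of the prediction function that the linear regression tree above encodes; each leaf applies its own linear model:

```python
def model_tree_predict(x: float, a: float, b: float, c: float) -> float:
    """Prediction for the model tree on the slide: the split on X chooses
    which leaf's linear model over A, B (and C) to apply."""
    if x > 3:
        return 2 * a + 3 * b
    elif x < -7:
        return 2 * a - 3 * b
    else:
        return 2 * a + 0.5 * b + c

print(model_tree_predict(x=5, a=1.0, b=2.0, c=0.5))    # uses the X > 3 leaf
print(model_tree_predict(x=-10, a=1.0, b=2.0, c=0.5))  # uses the X < -7 leaf
print(model_tree_predict(x=0, a=1.0, b=2.0, c=0.5))    # uses the default leaf
```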

Create a Linear Regression Tree to Predict Emergencies

And of course…
There are lots of fancy regressors in any Data Mining package:
– SMOReg (support vector machine)
– Poisson regression
– LOESS regression
For more, see

Assignment 6 Let’s discuss your solutions to assignment 6

How can you tell if a regression model is any good?

Correlation is a classic method (or its cousin r^2)
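
A minimal sketch (with made-up arrays, not from the slides) of computing the correlation and r^2 between a regressor’s predictions and the true labels:

```python
import numpy as np

# Hypothetical predictions and true labels for some held-out data
predictions = np.array([1.2, 0.4, 3.1, 2.2, 0.0, 1.8])
labels      = np.array([1.0, 0.5, 2.8, 2.5, 0.3, 1.5])

r = np.corrcoef(predictions, labels)[0, 1]  # Pearson correlation
print("r   =", r)
print("r^2 =", r**2)
```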

What data set should you generally test on?
– The data set you trained your classifier on
– A data set from a different tutor
– Split your data set in half; train on one half, test on the other half
– Split your data set in ten; train on each set of 9 folds, test on the tenth, and do this ten times
Any differences from classifiers?
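
As an illustration of the last option in the list above, a minimal sketch (assuming scikit-learn and simulated data) of 10-fold cross-validation for a regressor, scored by the correlation between predictions and labels on each held-out fold:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Simulated data: 300 rows, 3 features, a linear label plus noise
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = X @ np.array([0.5, 1.0, -0.3]) + rng.normal(scale=0.5, size=300)

correlations = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    correlations.append(np.corrcoef(preds, y[test_idx])[0, 1])

print("mean cross-validated r:", np.mean(correlations))
```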

What are some stat tests you could use?

What about?
Take the correlation between your prediction and your label
Run an F test
So F(1,9998) = 50.00, p<

What about?
Take the correlation between your prediction and your label
Run an F test
So F(1,9998) = 50.00, p<
All cool, right?
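
For reference, a small sketch (not from the slides) of how an F statistic like the one above can be computed from the correlation between prediction and label: for a single predictor, F(1, n-2) = r^2 * (n-2) / (1 - r^2). The r and n below are invented to roughly match the slide’s degrees of freedom:

```python
from scipy import stats

def f_test_from_correlation(r: float, n: int):
    """F test for a single-predictor regression, derived from Pearson r."""
    f = (r**2) * (n - 2) / (1 - r**2)
    p = stats.f.sf(f, 1, n - 2)  # upper-tail p value
    return f, p

# With n = 10,000 observations, r around 0.07 already gives F(1, 9998) near 50
print(f_test_from_correlation(r=0.0705, n=10000))
```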

As before…
You want to make sure to account for the non-independence between students when you test significance.
An F test is fine; just include a student term.

As before…
You want to make sure to account for the non-independence between students when you test significance.
An F test is fine; just include a student term (but note that your regressor itself should not predict using student as a variable… unless you want it to only work in your original population).
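
A hedged sketch (not from the slides; data simulated, and the column names are assumptions) of one way to include a student term when testing significance, using statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: 20 students, 10 observations each (non-independent within student)
rng = np.random.default_rng(3)
students = np.repeat([f"s{i}" for i in range(20)], 10)
prediction = rng.normal(size=students.size)
label = 0.8 * prediction + rng.normal(scale=0.5, size=students.size)
df = pd.DataFrame({"student": students, "prediction": prediction, "label": label})

# Include student as a categorical term so the F test on the prediction
# accounts for non-independence between students
model = smf.ols("label ~ prediction + C(student)", data=df).fit()
print(model.f_test("prediction = 0"))
```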

Alternatives
Bayesian Information Criterion (Raftery, 1995)
– Makes a trade-off between goodness of fit and flexibility of fit (number of parameters)
– i.e., you can control for the number of parameters you used and thus adjust for overfitting
– Said to be statistically equivalent to k-fold cross-validation
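
For illustration, a minimal sketch (not from the slides) of one common form of BIC for a least-squares regression, BIC = n*ln(RSS/n) + k*ln(n), where lower values indicate a better trade-off between fit and number of parameters:

```python
import numpy as np

def bic_linear_regression(y_true, y_pred, num_params):
    """BIC for a least-squares fit: n*ln(RSS/n) + k*ln(n), constants dropped."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    rss = np.sum((y_true - y_pred) ** 2)
    return n * np.log(rss / n) + num_params * np.log(n)

# Example: a 2-parameter and a 5-parameter model fit to the same data
y = np.array([1.0, 2.0, 3.0, 4.2, 5.1, 5.9])
pred_simple  = np.array([1.1, 2.0, 3.0, 4.0, 5.0, 6.0])
pred_complex = np.array([1.0, 2.0, 3.05, 4.15, 5.1, 5.95])
print("BIC, simple model: ", bic_linear_regression(y, pred_simple, num_params=2))
print("BIC, complex model:", bic_linear_regression(y, pred_complex, num_params=5))
```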

Asgn. 7

Next Class
Wednesday, February 29, 3pm-5pm, AK232
Topic: Learnograms
Readings: None
Assignments Due: None

The End

Bonus Slides (if there’s time)

BKT with Multiple Skills

Conjunctive Model (Pardos et al., 2008)
The probability a student can answer an item requiring skills A and B is:
P(CORR | A ^ B) = P(CORR | A) * P(CORR | B)
But how should credit or blame be assigned to the various skills?
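
A minimal sketch (not from the slides) of the conjunctive prediction step above; the per-skill probabilities are assumed to come from separately fit BKT models:

```python
from math import prod

def conjunctive_p_correct(per_skill_p_correct):
    """P(CORR | skill1 ^ skill2 ^ ...) as the product of the per-skill P(CORR) values."""
    return prod(per_skill_p_correct)

# Example: an item requiring two skills, with P(CORR|A) = 0.8 and P(CORR|B) = 0.6
print(conjunctive_p_correct([0.8, 0.6]))  # 0.48
```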

Koedinger et al.’s (2011) Conjunctive Model Equations for 2 skills

Koedinger et al.’s (2011) Conjunctive Model Generalized equations

Koedinger et al.’s (2011) Conjunctive Model Handles case where multiple skills apply to an item better than classical BKT

Other BKT Extensions? Additional parameters? Additional states?

Many others
– Compensatory Multiple Skills (Pardos et al., 2008)
– Clustered Skills (Ritter et al., 2009)