Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other.

Slides:



Advertisements
Similar presentations
Bivariate &/vs. Multivariate
Advertisements

LOGO Regression Analysis Lecturer: Dr. Bo Yuan
Bayesian Knowledge Tracing Prediction Models
UNIT-2 Data Preprocessing LectureTopic ********************************************** Lecture-13Why preprocess the data? Lecture-14Data cleaning Lecture-15Data.
Feature Engineering Studio January 21, Welcome to Feature Engineering Studio Design studio-style course teaching how to distill and engineer features.
Brief introduction on Logistic Regression
Feature Engineering Studio Special Session October 23, 2013.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 February 18, 2013.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Statistics 100 Lecture Set 7. Chapters 13 and 14 in this lecture set Please read these, you are responsible for all material Will be doing chapters
Indian Statistical Institute Kolkata
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 27, 2012.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 7, 2013.
Educational Data Mining March 3, Today’s Class EDM Assignment#5 Mega-Survey.
Week 2 Video 4 Metrics for Regressors.
Copyright (c) Bani K. Mallick1 STAT 651 Lecture #18.
Lecture 6: Multiple Regression
Data Mining CS 341, Spring 2007 Lecture 4: Data Mining Techniques (I)
Elaboration Elaboration extends our knowledge about an association to see if it continues or changes under different situations, that is, when you introduce.
Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and.
1 Chapter 17: Introduction to Regression. 2 Introduction to Linear Regression The Pearson correlation measures the degree to which a set of data points.
SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Basic Relationships Purpose of multiple regression Different types of multiple regression.
Correlation Question 1 This question asks you to use the Pearson correlation coefficient to measure the association between [educ4] and [empstat]. However,
Educational Data Mining Ryan S.J.d. Baker PSLC/HCII Carnegie Mellon University Richard Scheines Professor of Statistics, Machine Learning, and Human-Computer.
Aron, Aron, & Coups, Statistics for the Behavioral and Social Sciences: A Brief Course (3e), © 2005 Prentice Hall Chapter 3 Correlation and Prediction.
Classifiers, Part 3 Week 1, Video 5 Classification  There is something you want to predict (“the label”)  The thing you want to predict is categorical.
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 11 Regression.
Copyright © Cengage Learning. All rights reserved. 1 Functions and Their Graphs.
Correlations 11/7/2013. Readings Chapter 8 Correlation and Linear Regression (Pollock) (pp ) Chapter 8 Correlation and Regression (Pollock Workbook)
LEARNING PROGRAMME Hypothesis testing Intermediate Training in Quantitative Analysis Bangkok November 2007.
Classifiers, Part 1 Week 1, video 3:. Prediction  Develop a model which can infer a single aspect of the data (predicted variable) from some combination.
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
Case Study – San Pedro Week 1, Video 6. Case Study of Classification  San Pedro, M.O.Z., Baker, R.S.J.d., Bowers, A.J., Heffernan, N.T. (2013) Predicting.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
by B. Zadrozny and C. Elkan
 Create a scatter plot for these data and draw a line of best fit. xy
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 13, 2012.
New Seats – Block 1. New Seats – Block 2 Warm-up with Scatterplot Notes 1) 2) 3) 4) 5)
Psych 230 Psychological Measurement and Statistics Pedro Wolf September 30, 2009.
Statistics for the Social Sciences Psychology 340 Fall 2013 Correlation and Regression.
Objectives (IPS Chapter 2.1)
The Practice of Statistics
April 1 st, Bellringer-April 1 st, 2015 Video Link Worksheet Link
2-7 Curve Fitting with Linear Models Warm Up Lesson Presentation
2.5 Using Linear Models A scatter plot is a graph that relates two sets of data by plotting the data as ordered pairs. You can use a scatter plot to determine.
Chapter 14: Inference for Regression. A brief review of chapter 4... (Regression Analysis: Exploring Association BetweenVariables )  Bi-variate data.
Regression Analysis: Part 2 Inference Dummies / Interactions Multicollinearity / Heteroscedasticity Residual Analysis / Outliers.
Feature Engineering Studio October 7, Welcome to Bring Me a Rock Day 2.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.
26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.
©2005, Pearson Education/Prentice Hall CHAPTER 6 Nonexperimental Strategies.
LECTURE 15: PARTIAL LEAST SQUARES AND DEALING WITH HIGH DIMENSIONS March 23, 2016 SDS 293 Machine Learning.
Warm-up Get a sheet of computer paper/construction paper from the front of the room, and create your very own paper airplane. Try to create planes with.
Core Methods in Educational Data Mining
Chapter 7. Classification and Prediction
The Normal Distribution
Erasmus University Rotterdam
Multiple Regression.
Prediction (Classification, Regression)
Core Methods in Educational Data Mining
Regression Analysis Week 4.
Introduction to Predictive Modeling
Functions and Their Graphs
Core Methods in Educational Data Mining
Multiple Regression – Split Sample Validation
What’s the plan? First, we are going to look at the correlation between two variables: studying for calculus and the final percentage grade a student gets.
Chapter 3 Correlation and Prediction
Memory-Based Learning Instance-Based Learning K-Nearest Neighbor
Correlations and practicals
Presentation transcript:

Week 1, video 2: Regressors

Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables) Sometimes used to predict the future Sometimes used to make inferences about the present

Prediction: Examples A student is watching a video in a MOOC right now. Is he bored or frustrated? A student has used educational software for the last half hour. How likely is it that she knows the skill in the next problem? A student has completed three years of high school. What will be her score on the college entrance exam?

What can we use this for? Improved educational design If we know when students get bored, we can improve that content Automated decisions by software If we know that a student is frustrated, lets offer the student some online help Informing teachers, instructors, and other stakeholders If we know that a student is frustrated, lets tell their teacher

Regression in Prediction There is something you want to predict (the label) The thing you want to predict is numerical Number of hints student requests How long student takes to answer How much of the video the student will watch What will the students test score be

Regression in Prediction A model that predicts a number is called a regressor in data mining The overall task is called regression

Regression To build a regression model, you obtain a data set where you already know the answer – called the training label For example, if you want to predict the number of hints the student requests, each value of numhints is a training label Skillpknowtimetotalactions numhints ENTERINGGIVEN ENTERINGGIVEN USEDIFFNUM ENTERINGGIVEN REMOVECOEFF REMOVECOEFF USEDIFFNUM ….

Regression Associated with each label are a set of features, other variables, which you will try to use to predict the label Skillpknowtimetotalactions numhints ENTERINGGIVEN ENTERINGGIVEN USEDIFFNUM ENTERINGGIVEN REMOVECOEFF REMOVECOEFF USEDIFFNUM ….

Regression The basic idea of regression is to determine which features, in which combination, can predict the labels value Skillpknowtimetotalactions numhints ENTERINGGIVEN ENTERINGGIVEN USEDIFFNUM ENTERINGGIVEN REMOVECOEFF REMOVECOEFF USEDIFFNUM ….

Linear Regression The most classic form of regression is linear regression Numhints = 0.12*Pknow *Time – 0.11*Totalactions Skillpknowtime totalactionsnumhints COMPUTESLOPE ?

Quiz Numhints = 0.12*Pknow *Time – 0.11*Totalactions What is the value of numhints? A) 8.34 B) C) 3.67 D) 9.21 E) FNORD Skillpknowtime totalactionsnumhints COMPUTESLOPE ?

Quiz Numhints = 0.12*Pknow *Time – 0.11*Totalactions Which of the variables has the largest impact on numhints? (Assume they are scaled the same) A) Pknow B) Time C) Totalactions D) Numhints E) They are equal

However… These variables are unlikely to be scaled the same! If Pknow is a probability From 0 to 1 Well discuss this variable later in the class And time is a number of seconds to respond From 0 to infinity Then you cant interpret the weights in a straightforward fashion You need to transform them first

Transform When you make a new variable by applying some mathematical function to the previous variable Xt = X 2

Transform: Unitization Increases interpretability of relative strength of features Reduces interpretability of individual features Xt = X – M(X) SD(X)

Linear Regression Linear regression only fits linear functions… Except when you apply transforms to the input variables Which most statistics and data mining packages can do for you

Ln(X)

Sqrt(X)

X2X2

X3X3

1/X

Sin(X)

Linear Regression Surprisingly flexible… But even without that It is blazing fast It is often more accurate than more complex models, particularly once you cross-validate Caruana & Niculescu-Mizil (2006) It is feasible to understand your model (with the caveat that the second feature in your model is in the context of the first feature, and so on)

Example of Caveat Lets graph the relationship between number of graduate students and number of papers per year

Data

Too much time spent filling out personnel action forms?

Model Number of papers = * # of grad students * (# of grad students) 2 But does that actually mean that (# of grad students) 2 is associated with less publication? No!

Example of Caveat (# of grad students) 2 is actually positively correlated with publications! r=0.46

Example of Caveat The relationship is only in the negative direction when the number of graduate students is already in the model…

Example of Caveat So be careful when interpreting linear regression models (or almost any other type of model)

Regression Trees

Regression Trees (non-linear; RepTree) If X>3 Y = 2 else If X<-7 Y = 4 Else Y = 3

Linear Regression Trees (linear; M5) If X>3 Y = 2A + 3B else If X< -7 Y = 2A – 3B Else Y = 2A + 0.5B + C

Linear Regression Tree

Later Lectures Other regressors Goodness metrics for comparing regressors Validating regressors

Next Lecture Classifiers – another type of prediction model