Core Methods in Educational Data Mining HUDK4050 Fall 2014.

Slides:



Advertisements
Similar presentations
Week 1, video 2: Regressors. Prediction Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other.
Advertisements

Feature Engineering Studio January 21, Welcome to Feature Engineering Studio Design studio-style course teaching how to distill and engineer features.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 12, 2012.
R Squared. r = r = -.79 y = x y = x if x = 15, y = ? y = (15) y = if x = 6, y = ? y = (6)
STAT E-100 Section 2—Monday, September 23, :30-6:30PM, SC113.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 27, 2012.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 7, 2013.
Copyright (c) Bani K. Mallick1 STAT 651 Lecture #18.
Evaluation of Results (classifiers, and beyond) Biplav Srivastava Sources: [Witten&Frank00] Witten, I.H. and Frank, E. Data Mining - Practical Machine.
Putting the Learning in Distance Learning Mary Beth Orrange AMATYC Math on the Web Themed Session Thursday, November 20, Washington D.C.
Feature Engineering Studio February 23, Let’s start by discussing the HW.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
LEARNING PROGRAMME Hypothesis testing Intermediate Training in Quantitative Analysis Bangkok November 2007.
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
COMP 111 Programming Languages 1 First Day. Course COMP111 Dr. Abdul-Hameed Assawadi Office: Room AS15 – No. 2 Tel: Ext. ??
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 13, 2012.
Feedback – Lab 2 9 Sept Your learning experience in this course.
Feature Engineering Studio September 23, Welcome to Mucking Around Day.
Lecture 6 Correlation and Regression STAT 3120 Statistical Methods I.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Feature Engineering Studio September 9, Welcome to Problem Proposal Day Rules for Presenters Rules for the Rest of the Class.
Lecture 5 Announcements. HW2 – Regrading Thinking like a programmer –Part b: (5 points) Test for input in the range it is specified (otherwise what is.
Feature Engineering Studio September 23, Let’s start by discussing the HW.
Credibility: Evaluating what’s been learned This Lecture based on Ch 5 of Witten & Frank Plan for this week 3 classes before Midterm Paper and Survey discussion.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Regression Lines. Today’s Aim: To learn the method for calculating the most accurate Line of Best Fit for a set of data.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Learning Analytics: Process & Theory March 24, 2014.
Feature Engineering Studio March 1, Let’s start by discussing the HW.
Feature Engineering Studio September 30, Quick Note Please me for appointments rather than just showing up at my office – I’m always glad.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
LECTURE 9 Tuesday, 24 FEBRUARY STA291 Fall Administrative 4.2 Measures of Variation (Empirical Rule) 4.4 Measures of Linear Relationship Suggested.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 February 22, 2012.
CHAPTER 5 CORRELATION & LINEAR REGRESSION. GOAL : Understand and interpret the terms dependent variable and independent variable. Draw a scatter diagram.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Statistics: Analyzing 2 Quantitative Variables MIDDLE SCHOOL LEVEL  Session #2  Presented by: Dr. Del Ferster.
Feature Engineering Studio October 7, Welcome to Bring Me a Rock Day 2.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Feature Engineering Studio September 9, Welcome to Feature Engineering Studio Design studio-style course teaching how to distill and engineer features.
Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 6, 2013.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Feature Engineering Studio Special Session September 25, 2013.
Feature Engineering Studio February 2, Welcome to Problem Proposal Day Rules for Presenters Rules for the Rest of the Class.
LECTURE 07: CLASSIFICATION PT. 3 February 15, 2016 SDS 293 Machine Learning.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Special Topics in Educational Data Mining HUDK5199 Spring, 2013 April 3, 2013.
Regression Chapter 5 January 24 – Part II.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
1 1.Log in to the computer in front of you –Temp account: 210class / 2.Update your in Cascadia's system –If I need to you I'll use.
Christa Marsh Southern Arkansas University Biology Professor.
LECTURE 15: PARTIAL LEAST SQUARES AND DEALING WITH HIGH DIMENSIONS March 23, 2016 SDS 293 Machine Learning.
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
R Squared.
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Core Methods in Educational Data Mining
Presentation transcript:

Core Methods in Educational Data Mining HUDK4050 Fall 2014

Welcome Welcome back to the 2 nd class session

Administrative Stuff Is everyone signed up for class? If not, and you want to receive credit, please talk to me after class

Other administrative questions?

Today’s Readings First, a no-penalty-or-punishment survey question

Today’s Readings Who read the Witten & Frank? Who watched the BDE video?

Questions? Comments? Concerns?

What is a prediction model?

What is a regressor?

What are some things you might use a regressor for? Bonus points for examples other than those in the BDE video

Let’s do an example Numhints = 0.12*Pknow *Time – 0.11*Totalactions Skillpknowtimetotalactionsnumhints COMPUTESLOPE0.273?

Which of the variables has the largest impact on numhints? (Assume they are scaled the same)

However… These variables are unlikely to be scaled the same! If Pknow is a probability – From 0 to 1 And time is a number of seconds to respond – From 0 to infinity Then you can’t interpret the weights in a straightforward fashion What could you do?

Let’s do another example Numhints = 0.12*Pknow *Time – 0.11*Totalactions Skillpknowtimetotalactionsnumhints COMPUTESLOPE0.2235?

Is this plausible?

What might you want to do if you got this result in a real system?

Transforms In the video, we talked about variable transforms Who here has transformed a variable (for an actual analysis)? What did you transform and why did you do it?

Variable Transformation: EDM versus statistics Statistics: fit data better AND avoid violating assumptions EDM: fit data better

Why don’t violations of assumptions matter in EDM?

Interpreting Regression Models Example from the video

Example of Caveat Let’s graph the relationship between number of graduate students and number of papers per year

Data

Model Number of papers = * # of grad students * (# of grad students) 2 But does that actually mean that (# of grad students) 2 is associated with less publication? No!

Example of Caveat (# of grad students) 2 is actually positively correlated with publications! – r=0.46

Example of Caveat The relationship is only in the negative direction when the number of graduate students is already in the model…

How would you deal with this? How can we interpret individual features in a comprehensive model?

Other questions, comments, concerns about lecture?

RapidMiner 5.3 exercise Go to the course website and download Sep10dataset.csv Data on the probability that a student error is careless Calculated as in (Baker, Corbett, & Aleven, 2008) Try to predict from other variables

RapidMiner tasks Build regressor to predict P(SLIP|TRIO) Look at model goodness Look at model Look at actual data and refine model Look at model goodness Build flat cross-validation Look at model goodness Build student-level cross-validation Look at model goodness

Class Code will be posted later today

Questions? Comments? Concerns?

Questions about Basic HW 1?

Reminders You don’t have to do it perfectly, you just have to do it If you run into trouble, feel free to me or, better yet, use the moodle discussion forum

Questions? Concerns?

Other questions or comments?

Next Class Monday, September 15 Classification Algorithms Baker, R.S. (2014) Big Data and Education. Ch. 1, V3, V4, V5. Witten, I.H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques. Ch. 4.6, 6.1, 6.2, 6.4 Basic HW 1 due

The End