Analytics in Higher Education: Methods Overview

Slides:



Advertisements
Similar presentations
Stephen C. Court Presented at
Advertisements

CHAPTER TWELVE ANALYSING DATA I: QUANTITATIVE DATA ANALYSIS.
David Fairris Tarek Azzam
EBI Statistics 101.
FINAL REVIEW BIOST/EPI 536 December 14, Outline Before the midterm: Interpretation of model parameters (Cohort vs case-control studies) Hypothesis.
Examination of Holland’s Predictive Pattern Order Hypothesis for Academic Achievement William D. Beverly and Robert A. Horn Northern Arizona University,
ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.
Introduction to Probability and Statistics Linear Regression and Correlation.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Categorical Data Prof. Andy Field.
Chapter 8 Introduction to Hypothesis Testing
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Understanding Statistics
Chapter 8 Introduction to Hypothesis Testing
Copyright © Allyn & Bacon 2008 This multimedia product and its contents are protected under copyright law. The following are prohibited by law: any public.
McMillan Educational Research: Fundamentals for the Consumer, 6e © 2012 Pearson Education, Inc. All rights reserved. Educational Research: Fundamentals.
Lecture 22 Dustin Lueker.  The sample mean of the difference scores is an estimator for the difference between the population means  We can now use.
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
Slides to accompany Weathington, Cunningham & Pittenger (2010), Chapter 3: The Foundations of Research 1.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Introduction to Regression Analysis. Dependent variable (response variable) Measures an outcome of a study  Income  GRE scores Dependent variable =
Identifying At-Risk Students Gary R. Pike Information Management & Institutional Research Indiana University Purdue University Indianapolis.
+ Chapter Scientific Method variable is the factor that changes in an experiment in order to test a hypothesis. To test for one variable, scientists.
Classification Ensemble Methods 1
Introduction to Validity True Experiment – searching for causality What effect does the I.V. have on the D.V. Correlation Design – searching for an association.
Developing an evaluation of professional development Webinar #2: Going deeper into planning the design 1.
PART 2 SPSS (the Statistical Package for the Social Sciences)
· IUPUI · Conceptualizing and Understanding Studies of Student Persistence University Planning, Institutional Research, & Accountability April 19, 2007.
MULTIVARIATE ANALYSIS. Multivariate analysis  It refers to all statistical techniques that simultaneously analyze multiple measurements on objects under.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Looking for statistical twins
GS/PPAL Section N Research Methods and Information Systems
A Statistical Analysis Utilizing Detailed Institutional Data
Approaches to social research Lerum
BINARY LOGISTIC REGRESSION
Data Mining in Higher Education
Developing an early warning system combined with dynamic LMS data
Lurking inferential monsters
KAIR 2013 Nov 7, 2013 A Data Driven Analytic Strategy for Increasing Yield and Retention at Western Kentucky University Matt Bogard Office of Institutional.
EXPERIMENTAL RESEARCH
Age at First Measles-Mumps-Rubella Vaccination in Children with Autism and School-Matched Control Subjects William W. Thompson, PhD Presented at the.
Logistic Regression APKC – STATS AFAC (2016).
How High Schools Explain Students’ Initial Colleges and Majors
Inference and Tests of Hypotheses
Who Needs Credit and Who Gets Credit?
Educational Research: Fundamentals for the Consumer
Eastern Michigan University
University of Michigan
Single-Variable, Independent-Groups Designs
Employee Turnover: Data Analysis and Exploration
Internal Validity – Control through
2 independent Groups Graziano & Raulin (1997).
Propensity Score Matching Makes Program Evaluation Easy
Teaching Analytics with Case Studies: Finding Love in a Classification Tree Ruth Hummel, PhD JMP Academic Ambassador.
Correlations: testing linear relationships between two metric variables Lecture 18:
STA 291 Summer 2008 Lecture 23 Dustin Lueker.
ERRORS, CONFOUNDING, and INTERACTION
Introduction to Experimental Design
Quasi-Experimental Designs
An Introduction to Correlational Research
Ordinal Classification of Heart Disease Severity
Model generalization Brief summary of methods
STA 291 Spring 2008 Lecture 23 Dustin Lueker.
Introduction to Regression
What Really Works? An Impact Case Study on Learning and Acting At Scale Brandon Moore, Tina Donahoo.
MGS 3100 Business Analysis Regression Feb 18, 2016
STA 291 Spring 2008 Lecture 17 Dustin Lueker.
The Research Process & Surveys, Samples, and Populations
Types of Statistical Studies and Producing Data
Presentation transcript:

Analytics in Higher Education: Methods Overview Scott James Data Scientist, Hobsons

Overview Introduction Predictive Modeling Explanatory Modeling/Causal Inference Example

Introduction

The First Task: Who is at Risk?

A More Difficult Task:

Analytics Drive Student Success Identify Understand Analytics Cycle Act Strategic Goals Analyze Measurable Strategic Outcomes Measure Drives Institutional Momentum

Predictive Modeling

Predictive Modeling Process for Student Retention Retention to the next year (12 to 18 months) Predictive Model: Random Forest Logistic Regression Training: Build the model Examine variable distributions Exploratory Analysis Define Problem/ Outcome Data Partition Data Modeling and Validation Predictions and Scoring Validation: Test and compare model performance Clustering, Factor Analysis, Feature Selection Brier Score, AUC, Model Calibration Score students for the next academic year

Common Predictive Models Regression Models Random Forests Neural Networks Ensemble

Classification vs Probabilities Strong models are probabilistic rather than simply deterministic Relying on classification rates (percent correctly predicted) can sometimes lead to overconfidence in the model In addition to providing less information than probabilities, using classification rates (or percent correctly predicted) to measure model performance can also be problematic For example, for an institution that has a 90% retention rate one can produce a ‘model’ that is 90% accurate by classifying every student as being retained. Cut-offs for classifications in a model can be changed in order to maximize overall accuracy, but this doesn’t necessarily indicate the model is a good one

The Prediction is In…. Kansas to the Final Four

Probability of Winning the Midwest Region Source: kenpom.com via Sports Illustrated

Explanatory Modeling and Causal Inference

Common Data Challenges Confounding Variables An extraneous variable that can mask the true relationship between a predictor variable and an outcome variable Students participating in an intervention have lower than average SAT scores Selection Bias Occurs when intervention participants self-select in to or are selected in such a way that that they are not representative of the population of interest Students participating in an intervention have lower than average SAT scores

Popular Causal Inference Methods Propensity Score Matching (PSM) Coarsened Exact Matching (CEM) Well specified regression models Multilevel Modeling Structural Equation Modeling (SEM)

Example

Example: Evaluating Pre-Midterm Advising Meetings Background: 1,234 first year, first time freshmen four year public university There are two possible outcomes: If the result confirms the hypothesis, then you've made a measurement. If the result is contrary to the hypothesis, then you've made a discovery.  Enrico Fermi Physicst Enrico Fermi (1901-1954) Various sources including: Assessing Toxic Risk: Teacher Edition (2001)

Evaluating Pre-Midterm Advising Meetings Hypothesis Pre-midterm advising improves the likelihood of students earning a first term GPA of 2.0 or greater Student Background Variables Age Gender High School GPA Race SAT Scores Institution wanted to check their ‘gut’ that this intervention was effective, and is currently reviewing interventions available for first year, first time freshmen

Propensity Score Analysis Predicting Intervention Participation Of the five variables analyzed, these were the significant predictors: Younger students more likely to meet with advisors before the midterm Students with lower high school GPAs more likely to meet Students with lower SAT scores more likely to meet

80% v 66% Results Success! Hypothesis confirmed! 538 students in the resulting matched group (239 who went to advising and 239 similar students who did not) 80% of the students who went to advising achieved a higher GPA compared to 66% who did not (statistically significant at alpha= .001) 80% v 66% Weak Average Strong Little to No Very Strong Evidence Meter

Summary Ask the Right Questions Identify -> Predictive Modeling Focus on Probabilities Measure -> Explanatory Modeling/Causal Inference Methods Ask Another Question!

Questions?