KAIR 2013, Nov 7, 2013: A Data Driven Analytic Strategy for Increasing Yield and Retention at Western Kentucky University. Matt Bogard, Office of Institutional Research.

Presentation transcript:

KAIR 2013 Nov 7, 2013 A Data Driven Analytic Strategy for Increasing Yield and Retention at Western Kentucky University Matt Bogard Office of Institutional Research Western Kentucky University

Purpose
- Are there opportunities at the applicant stage to improve our yield, implement cost savings, and shape our freshman class to maximize retention?
- Is there a way of knowing which applicants are most likely to enroll and be retained?

Methodology
- Machine learning vs. statistical inference: emphasis on accurate predictions vs. inferences about the particular roles of specific variables
- Decision trees
- Ensemble methods
- Gradient boosting
- Neural networks

Decision Tree Basics: Algorithm
- Chooses variables and split values that partition the data into groups that differ on the outcome of interest (retention)
- Evaluates all possible splits using an adjusted χ2 p-value
- Prunes the tree, based on validation data, to obtain the most accurate predictions with the fewest possible splits
- The final model is characterized by the split values for each explanatory variable, which form a set of rules for classifying new cases
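A minimal sketch of the split search described above, for one numeric input and a binary outcome (1 = retained). It assumes a Pearson χ2 test on each candidate 2x2 table and a Bonferroni-style adjustment for the number of candidate thresholds; the exact adjustment and the validation-based pruning used by SAS Enterprise Miner are not reproduced, and all function names are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def best_split(x, y):
    """Return (threshold, adjusted_p) for the most significant binary split of x."""
    values = sorted(set(x))
    thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best = None
    for t in thresholds:
        # Rows: left/right of the threshold; columns: outcome 1 / outcome 0.
        a = sum(1 for xi, yi in zip(x, y) if xi <= t and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi <= t and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi > t and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi > t and yi == 0)
        p = chi2_pvalue_df1(chi2_2x2(a, b, c, d))
        # Hypothetical Bonferroni adjustment for testing many thresholds.
        adjusted = min(1.0, p * len(thresholds))
        if best is None or adjusted < best[1]:
            best = (t, adjusted)
    return best


# Hypothetical data: x could be a test score, y = 1 if the student was retained.
print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```

On this toy data the chosen threshold is 3.5, the cut that separates the two outcome groups perfectly. A full tree would repeat this search within each resulting partition and across all candidate variables.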

Basic Decision Tree Visualization

Benefits of Decision Trees
"Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems." (Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, 2001)
- Non-parametric and non-linear
- No distributional assumptions
- Treat the data generation process as unknown
- No required functional form for predictors
- Identify complex interactions

Ensemble Methods
- Generalization error: how well a model predicts on data it was not trained on (e.g., validation and test data sets)
- Ensemble: the combined predictions of several learners or models
- "The generalization error of a weighted combination of predictors in an ensemble is equal to the average error of the individual predictors minus the 'disagreement' among them" (Krogh & Sollich, 1997, Statistical Mechanics of Ensemble Learning, Physical Review E)
- Because the disagreement term cannot be negative, the ensemble error (for squared-error loss) is no larger than the weighted average of the individual predictors' errors
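The "average error minus disagreement" statement can be checked numerically. A minimal sketch, assuming squared-error loss and non-negative weights that sum to 1 (the setting in which the decomposition holds exactly); the function name and the example numbers are hypothetical.

```python
def ambiguity_decomposition(predictions, weights, target):
    """Split the squared error of a weighted ensemble into its two parts.

    Returns (ensemble_error, weighted_avg_error, ambiguity); for squared error,
    ensemble_error == weighted_avg_error - ambiguity holds exactly.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ensemble = sum(w * f for w, f in zip(weights, predictions))
    ensemble_error = (ensemble - target) ** 2
    weighted_avg_error = sum(w * (f - target) ** 2 for w, f in zip(weights, predictions))
    # "Disagreement": weighted spread of the members around the ensemble prediction.
    ambiguity = sum(w * (f - ensemble) ** 2 for w, f in zip(weights, predictions))
    return ensemble_error, weighted_avg_error, ambiguity


# Hypothetical predictions from three models for one retained student (target = 1).
e, avg, amb = ambiguity_decomposition([0.2, 0.6, 0.9], [0.5, 0.3, 0.2], 1.0)
print(e, avg, amb)
```

Here the ensemble's squared error (about 0.29) is below the weighted average of the individual errors (0.37) by exactly the ambiguity term (about 0.078). The more the members disagree, the larger that gap.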

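Gradient Boosting
- Boosting algorithms build an ensemble from a series of weak learners
- Each tree in the series is fit to (resampled) training data with emphasis on the cases the previous trees predicted poorly; in gradient boosting (Friedman, 2001) each new tree is fit to the residuals, i.e., the negative gradient of the loss, of the current model
- The combined series of trees forms a single model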
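A minimal sketch of gradient boosting for a binary outcome, assuming log loss, one numeric input, and one-split trees (stumps) as the weak learners. It is simplified relative to Friedman (2001) and to SAS Enterprise Miner: leaf values are mean residuals (a plain gradient step rather than a Newton-style estimate) and there is no row subsampling. All names and settings are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals by least squares.

    Assumes x has at least two distinct values.
    """
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda v: left_mean if v <= t else right_mean


def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Boost stumps on the log-odds scale; returns a function giving P(y = 1)."""
    base_rate = sum(y) / len(y)  # assumes both classes are present
    f0 = math.log(base_rate / (1 - base_rate))
    scores = [f0] * len(x)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to the log-odds: y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump(xi) for s, xi in zip(scores, x)]
    return lambda v: sigmoid(f0 + learning_rate * sum(h(v) for h in stumps))


predict = gradient_boost([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1])
print(predict(1.5), predict(7.5))
```

Each stump corrects what the current model still gets wrong, and the small learning rate keeps any single tree from dominating the combined model.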

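Neural Networks
- A nonlinear model of complex relationships with 'hidden' layers
- Using logistic activation functions, neural networks (NNETs) can be visualized as an ensemble of logits:

$$Y = W_0 + W_1 H_1 + W_2 H_2 + W_3 H_3 + W_4 H_4$$

$$H_1 = \sigma(w_{10} + w_{11} x_1 + w_{12} x_2), \qquad H_2 = \sigma(w_{20} + w_{21} x_1 + w_{22} x_2)$$

$$H_3 = \sigma(w_{30} + w_{31} x_1 + w_{32} x_2), \qquad H_4 = \sigma(w_{40} + w_{41} x_1 + w_{42} x_2)$$

where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic (inverse-logit) function.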
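A minimal sketch of the forward pass for the slide's network with two inputs (x1, x2), four logistic hidden units, and a linear output. The weights below are hypothetical and only illustrate the computation; training them (the harder part noted on the next slide) is not shown.

```python
import math


def logistic(z):
    """Logistic (inverse-logit) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def nnet_output(x1, x2, hidden_weights, output_weights):
    """Forward pass of a 2-input, 4-hidden-unit network with a linear output.

    hidden_weights: four (w_j0, w_j1, w_j2) tuples, one per hidden unit H_j.
    output_weights: (W0, W1, W2, W3, W4).
    """
    # Each hidden unit is a logistic regression on the inputs.
    hidden = [logistic(w0 + w1 * x1 + w2 * x2) for (w0, w1, w2) in hidden_weights]
    bias, *weights = output_weights
    # The output is a weighted combination of those "logits".
    return bias + sum(W * h for W, h in zip(weights, hidden))


# Hypothetical weights, chosen only to illustrate the computation.
hidden_weights = [(0.1, 0.5, -0.3), (-0.2, 0.8, 0.4), (0.0, -0.6, 0.9), (0.3, 0.2, 0.2)]
output_weights = (0.05, 0.7, -0.4, 0.5, 0.3)
print(nnet_output(1.0, 2.0, hidden_weights, output_weights))
```

For a binary outcome such as retention, the output is often passed through one more logistic function so that Y can be read as a probability.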

Gradient Boosting vs. Decision Trees vs. NNETs vs. Logistic Regression
- Decision trees and gradient boosting are both robust to the data generation process
- Decision trees have a more transparent model structure, which is lost in ensemble methods like gradient boosting and in neural networks
- Neural networks have issues with input selection and are more complex to train
- The decision tree posterior probability distribution may not be very smooth

Gradient Boosting vs. Decision Trees vs. Logistic Regression
- Logistic regression provides a smooth posterior probability distribution
- Its model structure is less transparent than that of decision trees but more transparent than that of gradient boosting
- It could be used either for inference or as an agnostic learning algorithm based on a specified functional form (*some may refuse to make this distinction and make inferences where inappropriate)

Machine Learning vs. Inference
Trees can guide and direct further inferential work, but they can be misleading about causal relationships if you are not careful.

Fitting the Models

Results
- Focus: how well the model predicts behavior vs. inferences about the roles of specific variables
- Tradeoff between discrimination (measured by the area under the ROC curve) and model calibration (Cook, 2007)
- Gradient boosting outperformed the other models based on calibration
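Discrimination and calibration measure different things, which is why a model can win on one and not the other. A minimal sketch with hypothetical scores: two models rank students identically (so their ROC AUC is identical), but one systematically understates the probability of retention. Calibration-in-the-large and the Brier score are used here as simple calibration checks; they are illustrative choices, not necessarily the measures used in this study.

```python
def auc(scores, labels):
    """Area under the ROC curve: P(random positive scores above random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(scores, labels):
    """Mean predicted probability minus observed event rate (0 = calibrated on average)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


def brier_score(scores, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


# Hypothetical retention outcomes and two models that rank students identically.
labels = [0, 0, 1, 0, 1, 1]
model_a = [0.1, 0.2, 0.6, 0.3, 0.8, 0.9]
model_b = [s / 2 for s in model_a]  # same ordering, probabilities halved
for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, auc(scores, labels), calibration_in_the_large(scores, labels), brier_score(scores, labels))
```

Both models have an AUC of 1.0 on this toy data, yet model B's predicted probabilities are far from the observed retention rate, which an ROC-based comparison alone would not reveal.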

Scorecard
Using our models, we can sort applicants into 4 categories for enrollment propensity and predicted retention.
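The slide does not say how the four categories are defined. A minimal sketch, assuming they are the cells of a 2x2 grid formed by cutting predicted enrollment propensity and predicted retention at hypothetical cutoffs (0.5 here); the function name, cutoffs, and applicants are all illustrative.

```python
def scorecard_category(p_enroll, p_retain, enroll_cutoff=0.5, retain_cutoff=0.5):
    """Place an applicant in one of four cells using two hypothetical cutoffs."""
    enroll = "high" if p_enroll >= enroll_cutoff else "low"
    retain = "high" if p_retain >= retain_cutoff else "low"
    return f"{enroll} enroll / {retain} retain"


# Hypothetical applicants: (id, predicted enrollment probability, predicted retention probability).
applicants = [("A1", 0.82, 0.91), ("A2", 0.35, 0.88), ("A3", 0.71, 0.40), ("A4", 0.12, 0.22)]
for applicant_id, p_enroll, p_retain in applicants:
    print(applicant_id, scorecard_category(p_enroll, p_retain))
```

In practice the cutoffs would be chosen to match recruitment capacity or budget rather than fixed at 0.5.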

Implementation
Use advanced analytics to develop a recruitment and retention strategy.

Ad Hoc Reports
- Report by counselor/region/territory
- Report by county/school
- Report by student demographics
- Report by prospect source
- …other?

IR-DSS

Detail Reporting

Additional Reading
- Bogard, M.T. (2013). A Data Driven Analytic Strategy for Increasing Yield and Retention at Western Kentucky University Using SAS Enterprise BI and SAS Enterprise Miner. Paper 044-2013. Proceedings of the SAS® Global Forum 2013 Conference. Cary, NC: SAS Institute Inc.
- de Ville, Barry. (2006). Decision Trees for Business Intelligence and Data Mining Using SAS® Enterprise Miner. SAS® Institute.
- de Ville, Barry & Neville, Padraic. (2013). SAS® Institute.
- Friedman, Jerome H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232. Available at http://stat.stanford.
- Hastie, Tibshirani and Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer-Verlag.
- Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–231.
- Cook, Nancy R. (2007). Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction. Circulation, 115(7), 928–935.
- Krogh, A. & Sollich, P. (1997, January). Statistical mechanics of ensemble learning. Physical Review E (Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics), 55(1), 811–825.