
Predicting Real-Time Percent Enrollment Increase
Mark Hamner, Texas Woman’s University, Department of Mathematics and Computer Science
Preet Ahluwalia, Credit Risk Analyst, AmeriCredit

Texas Woman’s University: Denton, Dallas, Houston
Year 2005 Facts
- Total enrollment: 11,344
  - Undergraduate: 6,266
  - Graduate (master’s): 4,369
  - Doctoral: …
- Campus enrollment: Denton 9,157; Dallas 921; Houston 1,…
- … academic programs (19 doctoral)
- Female: 10,368; Male: 976

Outline
Problem definition: predicting student enrollment at time t using historical data
1. Enrollment process for newly enrolled students
2. The predictive problem
3. Logistic prediction model
   a. Data issues and programming solutions
4. Quadratic prediction model
   a. Exploratory analysis to identify patterns
5. Combining both models for the overall prediction: results

Enrollment
Enrollment predictions can be broken into two fundamental pieces:
- Newly enrolled students
- Re-enrolling (continuing) students
The focus of this paper is the prediction of newly enrolled students.

New Students: Enrollment Process
All prospective students → applicants (first-time-in-college (FTIC), transfer, graduate, others) → admitted to TWU → new students enrolled as of the fall 12th day

Idea Behind Enrollment Prediction at Time = t

Enrollment Prediction at Time t
- Let t denote the prediction date.
- For applicants who applied before t, we have data.
- For applicants who apply after t (denoted by t’), we do not yet have data.
Total Enrollment = Enroll_t + Enroll_t’
Timeline: begin prediction → predict at time t → fall 12th day

Weekly Partition of Prediction Interval
- The prediction interval is broken up into weekly intervals, from Week = 0 to Week = 17.
- The accompanying diagram (a timeline marking Weeks 0, 5, and 17) illustrates prediction at Week = 5.
- At Week = 5 we have 35 more days of applicant data than at Week = 0.
Total Enroll = Enroll_t + Enroll_t’
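As a rough illustration of the weekly partition, the sketch below maps a week index to the last application date whose data are available, assuming week 0 begins at a hypothetical start date and each week adds 7 days of applicant data. The function names and the start date are illustrative assumptions, not part of the original analysis.

```python
from datetime import date, timedelta
from typing import List


def week_cutoff(start: date, week: int) -> date:
    """Last application date covered at a given week (week 0 = start date)."""
    return start + timedelta(weeks=week)


def applicants_known_at(application_dates: List[date], start: date, week: int) -> List[date]:
    """Applicants with data at week t; later applicants fall in t' (no data yet)."""
    cutoff = week_cutoff(start, week)
    return [d for d in application_dates if d <= cutoff]


# Hypothetical start of the prediction interval.
start = date(2004, 1, 1)
extra = week_cutoff(start, 5) - week_cutoff(start, 0)
print(extra.days)  # 35
```

The 35-day difference reproduces the slide's statement that Week = 5 carries 35 more days of applicant data than Week = 0.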

Enroll_t
- $P_t = \{1, 2, \ldots, N_t\}$ is the finite set of applicants at week = t, indexed by $k \in P_t$.
- Enrollment is a dichotomous response variable $y_k$: $y_k = 1$ if the student enrolled, $y_k = 0$ if the student did not enroll.
- Enrollment of all applicants at week = t is
$$\text{Enroll}_t = \sum_{k \in P_t} y_k$$

Model Dichotomous Variable
- For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$.
- Applicant information exists for each individual: $\mathbf{x}_k = (x_{1k}, x_{2k}, \ldots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \ldots, \text{Major\_Ratio}_k)$
- Use logistic regression to model $\theta_k$.

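Logistic Regression Model
The probability of student k enrolling is
$$\theta_k = \frac{e^{L_k}}{1 + e^{L_k}}, \qquad L_k = \beta_0 + \beta_1\,\text{Distance}_k + \beta_2\,\text{SAT}_k + \cdots + \beta_p\,\text{Major\_Ratio}_k,$$
where Distance, SAT, …, Major_Ratio are the predictor variables.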
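A minimal sketch of the logistic link, assuming the coefficients have already been estimated. The coefficient and predictor values below are hypothetical placeholders; the probability is computed in a numerically stable form of $e^{L_k}/(1+e^{L_k})$.

```python
import math
from typing import Sequence


def linear_predictor(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """L_k = beta_0 + beta_1 x_1k + ... + beta_p x_pk."""
    return beta0 + sum(b * xj for b, xj in zip(betas, x))


def enroll_probability(L: float) -> float:
    """theta_k = exp(L_k) / (1 + exp(L_k)), evaluated without overflow."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    z = math.exp(L)
    return z / (1.0 + z)


# Hypothetical coefficients and (scaled) predictor values for one applicant.
L = linear_predictor(-1.0, [0.5, 1.0], [2.0, 1.0])
print(L, enroll_probability(L))
```

Splitting the linear predictor from the link keeps the same $L_k$ notation as the slide.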

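Predict Enroll_t
Let $\mathbf{Y} = (y_1, y_2, \ldots, y_{N_t})^\top$ be the random vector of responses, so that $\text{Enroll}_t = \mathbf{1}^\top \mathbf{Y}$ and $E(\mathbf{Y}) = \boldsymbol{\theta} = (\theta_1, \ldots, \theta_{N_t})^\top$.
Thus, the estimated Enroll_t is
$$\widehat{\text{Enroll}}_t = \mathbf{1}^\top \hat{\boldsymbol{\theta}} = \sum_{k \in P_t} \hat{\theta}_k.$$
Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones.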
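The estimate $\mathbf{1}^\top \hat{\boldsymbol{\theta}}$ is just the sum of the fitted enrollment probabilities of the applicants known at week t. A sketch with NumPy, assuming a predictor matrix `X` (one row per applicant) and fitted coefficients; all values here are hypothetical.

```python
import numpy as np


def estimated_enroll_t(X: np.ndarray, beta0: float, beta: np.ndarray) -> float:
    """Estimated Enroll_t = 1^T theta_hat for the N_t applicants known at week t.

    X is an N_t x p matrix of applicant predictors; beta0 and beta are
    fitted logistic coefficients (hypothetical here).
    """
    L = beta0 + X @ beta                     # linear predictors L_k
    theta_hat = 1.0 / (1.0 + np.exp(-L))     # fitted enrollment probabilities
    ones = np.ones(X.shape[0])               # the N_t x 1 vector of ones
    return float(ones @ theta_hat)


X = np.zeros((4, 2))                         # four applicants, two predictors
print(estimated_enroll_t(X, 0.0, np.array([0.3, -0.2])))  # 2.0
```

With all linear predictors equal to 0, each applicant has probability 0.5, so four applicants give an expected enrollment of 2.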

Logistic Model
- Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.
- Which variables will be selected during model building?
- The model is fit to the prior year's applicant data and then applied to the current year's prediction.

Programming and Variable Selection
- SAS programming (exploratory analysis and variable creation): use SAS to create potentially significant variables (e.g., Major_Ratio) and to dummy code categorical variables (e.g., Ethnic).
- Backward selection: start from the saturated model and drop a predictor at each step while one qualifies for removal; when none does, stop with the fitted model.
- Slightly different variables are selected for FTIC, transfer, and graduate applicants.
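The backward-selection flowchart can be sketched as a loop. In SAS this is typically requested through the `SELECTION=BACKWARD` option of `PROC LOGISTIC` (with `SLSTAY=` as the stay threshold); the Python below only mirrors the control flow. The `fit` callable, the 0.05 threshold, and the p-values are assumptions for illustration, not the authors' settings.

```python
from typing import Callable, Dict, List


def backward_selection(
    predictors: List[str],
    fit: Callable[[List[str]], Dict[str, float]],
    alpha_stay: float = 0.05,
) -> List[str]:
    """Backward elimination starting from the saturated model.

    `fit` refits the model on the given predictors and returns a p-value
    for each one. The least significant predictor is dropped while its
    p-value exceeds `alpha_stay`.
    """
    current = list(predictors)                   # saturated model
    while current:
        p_values = fit(current)                  # refit on current predictors
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] <= alpha_stay:        # "No": nothing left to drop
            break
        current.remove(worst)                    # "Yes": drop predictor, refit
    return current                               # predictors of the fitted model


# Stand-in for a real logistic fit: made-up, fixed p-values for illustration only.
fake_p = {"Distance": 0.001, "SAT_M": 0.02, "DOB": 0.40, "Gender": 0.09}
selected = backward_selection(list(fake_p), lambda cols: {c: fake_p[c] for c in cols})
print(selected)  # ['Distance', 'SAT_M']
```

In a real refit the p-values change after each removal, which is why the loop calls `fit` again on every pass instead of ranking once.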

FTIC Variable Selection

| Variable Name | Variable Type | Variable Description |
|---|---|---|
| Twelve | Response | 1 if enrolled; 0 otherwise |
| Distance ♦ | Explanatory | Continuous variable |
| SAT_M, SAT_V, ACT | Explanatory | Continuous variables; SAT Math score, SAT Verbal score, ACT score |
| Give ACT ♦ | Explanatory | 1 if score provided; 0 otherwise |
| Program Ratio ♦ | Explanatory | Continuous variable |
| Major Ratio ♦ | Explanatory | Continuous variable |
| Date of Birth | Explanatory | Continuous variable |
| Gender ♦ | Explanatory | 1 if female; 0 if male |
| Apply Early ♦ | Explanatory | 1 if applied before January 1; 0 otherwise |
| E1, E2, E3, E4, E5, E6, E7 | Explanatory | Dummy variables for ethnicity |
| Personal ♦ | Explanatory | Discrete variable; number of key information items available for a student |
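A sketch of how the binary and dummy-coded variables in the table could be derived. The field names, the "F" gender code, and the reading of "before January 1" as January 1 of the fall entry year are all assumptions. The dummy coding uses a reference level, so k categories yield k − 1 indicators; the slides do not say how many ethnicity categories there were or which one served as reference.

```python
from datetime import date
from typing import Dict, List, Optional


def derive_indicators(act_score: Optional[float], gender: str,
                      applied_on: date, entry_year: int) -> Dict[str, int]:
    """Binary explanatory variables from the FTIC table (hypothetical field names)."""
    return {
        "Give_ACT": 1 if act_score is not None else 0,  # 1 if ACT score provided
        "Gender": 1 if gender == "F" else 0,             # 1 if female, 0 if male
        # Assumption: "before January 1" means January 1 of the fall entry year.
        "Apply_Early": 1 if applied_on < date(entry_year, 1, 1) else 0,
    }


def dummy_code(value: str, levels: List[str], prefix: str = "E") -> Dict[str, int]:
    """Reference-cell coding: one indicator per level except the first (reference)."""
    return {f"{prefix}{i}": int(value == level)
            for i, level in enumerate(levels[1:], start=1)}


print(derive_indicators(None, "F", date(2003, 11, 15), 2004))
# {'Give_ACT': 0, 'Gender': 1, 'Apply_Early': 1}
print(dummy_code("B", ["A", "B", "C"]))  # {'E1': 1, 'E2': 0}
```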

Case Study: Logistic Model Prediction
- Applicant data from 2003 are used to predict 2004 FTIC enrollment at weekly time intervals.
- The logistic model does not predict enrollment of students who apply after week = t.

Enrollment after Week = t
Total Enrollment = Enroll_t + Enroll_t’
- At any week = t, we also need to predict Enroll_t’.
- Identify historical relationships that may help.

Applicants Versus Enrolled by Year
- Both applications and enrollment have been increasing.
- Notice that enrollment yield (enrolled ÷ applicants) is decreasing.
- Is the percent increase in enrollment matching the percent increase in applications?
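A small arithmetic sketch of why yield can fall while both counts rise: if applications grow faster than enrollment, the ratio drops. The counts below are illustrative only, not TWU figures.

```python
def enrollment_yield(enrolled: int, applicants: int) -> float:
    """Share of applicants who enroll."""
    return enrolled / applicants


def percent_increase(previous: float, current: float) -> float:
    """Year-over-year percent increase."""
    return 100.0 * (current - previous) / previous


# Illustrative counts only (not the TWU figures).
apps_prev, apps_curr = 1000, 1200
enr_prev, enr_curr = 400, 440
print(percent_increase(apps_prev, apps_curr))  # 20.0
print(percent_increase(enr_prev, enr_curr))    # 10.0
print(enrollment_yield(enr_prev, apps_prev), enrollment_yield(enr_curr, apps_curr))
```

Here applications rise 20% but enrollment only 10%, so the yield drops from 0.40 to about 0.37.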

Applicant Yield by Stratum
- Enrollment yield from applicants is decreasing for each stratum.
- How does this affect the yearly increase in enrollment?
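To check the "decreasing for each stratum" pattern, one can compute yield per (stratum, year) and test whether it falls year over year. The stratum labels and counts below are hypothetical; the slides do not list the per-stratum numbers.

```python
from typing import Dict, Tuple


def yield_by_stratum(
    counts: Dict[Tuple[str, int], Tuple[int, int]]
) -> Dict[str, Dict[int, float]]:
    """Map (stratum, year) -> (applicants, enrolled) to stratum -> {year: yield}."""
    out: Dict[str, Dict[int, float]] = {}
    for (stratum, year), (applicants, enrolled) in counts.items():
        out.setdefault(stratum, {})[year] = enrolled / applicants
    return out


def is_decreasing(by_year: Dict[int, float]) -> bool:
    """True if the yield falls strictly from each year to the next."""
    years = sorted(by_year)
    return all(by_year[a] > by_year[b] for a, b in zip(years, years[1:]))


# Illustrative counts only.
counts = {
    ("FTIC", 2003): (1000, 400), ("FTIC", 2004): (1200, 440),
    ("Transfer", 2003): (800, 480), ("Transfer", 2004): (900, 500),
}
yields = yield_by_stratum(counts)
print({s: is_decreasing(y) for s, y in yields.items()})  # {'FTIC': True, 'Transfer': True}
```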

Percent Increase: Applicants vs. Enrolled
- The percent increase in applicants is not a viable indicator of the percent increase in enrollment.
- What patterns are reliable enough to model?

Cumulative FTIC Enrollment by Week
- Notice the parallel lines across years, which imply equal slopes.
- At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17).
- Thus, (Total Enroll – Enroll_t) should be very similar from year to year.

Relationship Between Enroll_t and Total Enrollment
- By definition, (Total Enroll – Enroll_t) = Enroll_t’.
- Model Enroll_t’ and smooth out the consistent patterns by week.
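Given a prior year's cumulative enrollment by week, Enroll_t’ for each week follows directly from the definition. A sketch, assuming `cumulative[t]` counts enrolled students whose applications were in by week t and the last entry is Total Enroll; the counts are illustrative, not real data.

```python
from typing import List


def enroll_after_t(cumulative: List[int]) -> List[int]:
    """Enroll_t' = Total Enroll - Enroll_t for each week t.

    cumulative[t] is the number of enrolled students whose applications were
    in by week t; the last entry (week 17 in the slides) is Total Enroll.
    """
    total = cumulative[-1]
    return [total - enrolled_by_t for enrolled_by_t in cumulative]


# Illustrative cumulative counts for weeks 0-4 (not real data).
print(enroll_after_t([300, 340, 370, 390, 400]))  # [100, 60, 30, 10, 0]
```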

Enroll_t’ Model
- Use the 2003 Enroll_t’ model to predict Enroll_t’ for 2004.
- Estimate of Enroll_t’: (R² = …)
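The outline calls this the quadratic prediction model. Assuming it is a least-squares quadratic in the week index (the fitted equation and its R² are not recoverable from the slide), a sketch with `numpy.polyfit` fits the prior year's Enroll_t’ by week and applies it to the current year. The data below are synthetic and exactly quadratic.

```python
import numpy as np


def fit_quadratic(weeks, enroll_after):
    """Least-squares fit Enroll_t'(w) ~ a + b*w + c*w**2; returns coefficients and R^2."""
    w = np.asarray(weeks, dtype=float)
    y = np.asarray(enroll_after, dtype=float)
    coefs = np.polyfit(w, y, deg=2)        # ordered [c, b, a] (highest power first)
    fitted = np.polyval(coefs, w)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coefs, 1.0 - ss_res / ss_tot


# Synthetic prior-year Enroll_t' values for weeks 0-17 (exactly quadratic here).
weeks = np.arange(18)
prior_year = 0.5 * (17 - weeks) ** 2
coefs, r2 = fit_quadratic(weeks, prior_year)
# Apply the prior-year fit to predict the current year's Enroll_t' at week 5.
print(round(float(np.polyval(coefs, 5)), 6), round(float(r2), 6))  # 72.0 1.0
```

Fitting on the prior year and evaluating at the current week matches the slide's "use 2003 to predict 2004" setup; on real data R² will be below 1.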

Predict 2004 Enroll_t’

Predict 2004 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: the actual 2004 FTIC total is 687.
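Combining the two pieces: the logistic model supplies Enroll_t as a sum of probabilities, and the prior-year quadratic supplies Enroll_t’ at the current week. A sketch under those assumptions; the inputs below are toy values, and the percent-error helper is only for comparing a prediction against a reported actual such as 687.

```python
from typing import Sequence


def predict_total(theta_hat: Sequence[float], week: int,
                  quad_coefs: Sequence[float]) -> float:
    """Total Enrollment = Enroll_t + Enroll_t'.

    Enroll_t: sum of logistic probabilities for applicants known at week t.
    Enroll_t': prior-year quadratic in week (coefficients highest power first).
    """
    enroll_t = sum(theta_hat)
    enroll_t_prime = 0.0
    for c in quad_coefs:                  # Horner evaluation of the polynomial
        enroll_t_prime = enroll_t_prime * week + c
    return enroll_t + enroll_t_prime


def percent_error(predicted: float, actual: float) -> float:
    """Signed percent error of a prediction relative to the actual total."""
    return 100.0 * (predicted - actual) / actual


print(predict_total([0.5, 0.25, 0.25], 2, [1.0, 0.0, 0.0]))  # 5.0
```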

Predict 2005 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: the actual 2005 FTIC total is 765.

- END - Thank you! Any Questions?