Predicting Real-Time Percent Enrollment Increase
Mark Hamner, Texas Woman’s University, Department of Mathematics and Computer Science
Preet Ahluwalia, Credit Risk Analyst, AmeriCredit
Texas Woman’s University: Denton, Dallas, Houston
Year 2005 Facts
Total Enrollment – 11,344
Undergrad – 6,266
Graduate (Masters) – 4,369
Doctoral
Campus Enrollment: Denton – 9,157; Dallas – 921; Houston – 1,
academic programs (19 doctoral)
Female – 10,368; Male – 976
Outline
Problem Definition: Predicting Student Enrollment at Time ‘t’ Using Historical Data
1. Enrollment Process - For Newly Enrolled
2. The Predictive Problem
3. Logistic Prediction Model
   a. Data Issues and Programming Solutions
4. Quadratic Prediction Model
   a. Exploratory Analysis to Identify Patterns
5. Combine for Overall Prediction: Results
Enrollment
Enrollment predictions can be broken into two fundamental pieces:
Newly Enrolled Students
Re-Enrolling / Continuing Students
The focus of this paper is the prediction of Newly Enrolled students.
New Students: Enrollment Process
[Flow diagram: All Prospective Students; Applicants (FTIC, Transfer, Graduate, Others); Admitted to TWU; New 12th Day Enrolled]
Idea Behind Enrollment Prediction at Time = t
Enrollment Prediction at Time ‘t’
Let Time = t denote the prediction date.
For applicants before t, we will have data.
For applicants after time t (denoted by t’), we will not have data.
Total Enrollment = Enroll_t + Enroll_t’
[Timeline: Begin Prediction; Predict (Time = t); Fall 12th Day]
Weekly Partition of Prediction Interval
The prediction interval is broken up into weekly intervals.
The timeline illustrates prediction at Week = 5.
At Week = 5 we have 35 more days of applicant data than at Week = 0.
Total Enroll = Enroll_t + Enroll_t’
[Timeline: Week 0; Week 5 (Predict); Week 17]
Enroll_t
$P_t = \{1, 2, \ldots, N_t\}$: finite set of applicants at week = t, with $k \in P_t$.
Enrollment is a dichotomous response variable $y_k$: $y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll).
Enrollment of all applicants at week = t: $\text{Enroll}_t = \sum_{k \in P_t} y_k$
Model Dichotomous Variable
For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$.
There exists applicant information for each individual: $x_k = (x_{1k}, x_{2k}, \ldots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \ldots, \text{Major\_Ratio}_k)$
Use logistic regression to model $\theta_k$.
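Logistic Regression Model
The probability of student k enrolling is $\theta_k = \frac{e^{L_k}}{1 + e^{L_k}}$, where
$L_k = \beta_0 + \beta_1 \text{Distance}_k + \beta_2 \text{SAT}_k + \cdots + \beta_p \text{Major\_Ratio}_k$
Distance, SAT, ..., Major_Ratio are the predictor variables.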
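A minimal Python sketch of how a linear predictor L_k maps to an enrollment probability, assuming the standard logistic link. The function name, coefficients, and applicant values are hypothetical and are not taken from the study.

```python
import math

def enroll_probability(beta, x):
    """Return theta_k = exp(L_k) / (1 + exp(L_k)) for one applicant.

    beta: [beta_0, beta_1, ..., beta_p]; x: [x_1, ..., x_p].
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry than x (the intercept)")
    # Linear predictor L_k = beta_0 + sum_j beta_j * x_jk
    linear = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Numerically stable evaluation of the logistic function
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)

# Hypothetical coefficients and applicant values (illustration only)
beta = [0.5, -0.01, 0.002]    # intercept, Distance, SAT
applicant = [50.0, 1000.0]    # Distance = 50, SAT = 1000
theta = enroll_probability(beta, applicant)
print(round(theta, 4))
```

Here L_k = 0.5 - 0.5 + 2.0 = 2.0, so the probability is the logistic function evaluated at 2.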
Predict Enroll_t
Estimated Enroll_t is …
Let Y be the random vector of responses:
Thus,
Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones.
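The equations on this slide did not survive extraction. One plausible reading, given the vector of ones in the note, is that the estimate is the sum of the fitted probabilities over all applicants at week t (that is, 1' times the vector of fitted probabilities). The sketch below illustrates that reading only; the function name and the probabilities are hypothetical.

```python
def expected_enrollment(probabilities):
    """Sum fitted enrollment probabilities (assumed reading: 1' * theta_hat)."""
    probabilities = list(probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return sum(probabilities)

# Hypothetical fitted probabilities for five applicants (not study data)
theta_hat = [0.9, 0.25, 0.5, 0.75, 0.1]
estimate = expected_enrollment(theta_hat)
print(estimate)
```

Under this reading, five applicants with these probabilities contribute about 2.5 expected enrollees, even though each individual outcome is 0 or 1.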
Logistic Model
Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.
What variables will get picked for model building?
[Diagram: Year Prior Applicant Data; Current Year Prediction]
Programming and Variable Selection
SAS Programming: Exploratory and Variable Creation
Use SAS to create possibly significant variables and dummy code categorical variables. Example: Major_Ratio, Ethnic, etc.
Backward Selection
Slightly different variables are selected for FTIC, Transfer, and Graduate.
[Flowchart: Start; Saturated Model; Drop Predictor (Yes / No); Stop; Fitted Model]
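The backward selection loop in the flowchart can be sketched generically in Python. This is not the SAS procedure used in the study: the function names, the 0.05 stay level, and the fixed p-values are all assumptions for illustration. A real `fit_p_values` would refit the logistic model on each call.

```python
def backward_select(predictors, fit_p_values, stay_level=0.05):
    """Backward elimination.

    fit_p_values(current) must refit the model on `current` and return
    a dict mapping each predictor name to its p-value.
    """
    current = list(predictors)           # start from the saturated model
    while current:
        p_values = fit_p_values(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] <= stay_level:
            break                        # nothing left to drop: fitted model
        current.remove(worst)            # drop the weakest predictor, then refit
    return current

# Hypothetical stand-in for a real model fit: fixed p-values, not study results.
TOY_P = {"Distance": 0.001, "SAT_M": 0.40, "SAT_V": 0.22, "Gender": 0.03}

def toy_fit(current):
    return {name: TOY_P[name] for name in current}

selected = backward_select(list(TOY_P), toy_fit)
print(selected)
```

With these toy p-values the loop drops SAT_M, then SAT_V, and stops once every remaining predictor is at or below the stay level.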
FTIC Variable Selection
Variable Name | Variable Type | Variable Description
Twelve | Response | 1 if enrolled; 0 otherwise
Distance♦ | Explanatory | Continuous variable
SAT_M, SAT_V, ACT | Explanatory | Continuous variables: SAT Math score, SAT Verbal score, ACT score
Give ACT♦ | Explanatory | 1 if score provided; 0 otherwise
Program Ratio♦ | Explanatory | Continuous variable
Major Ratio♦ | Explanatory | Continuous variable
Date of Birth | Explanatory | Continuous variable
Gender♦ | Explanatory | 1 if female; 0 for male
Apply Early♦ | Explanatory | 1 if applied before January 1; 0 otherwise
E1, E2, E3, E4, E5, E6, E7 | Explanatory | Dummy variables for Ethnicity
Personal♦ | Explanatory | Discrete variable; number of key information items available for a student
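As an illustration of the indicator and dummy coding described in this table, here is a small Python sketch. The record layout, field names, ethnicity level codes, and the choice of January 1 of the fall enrollment year as the Apply Early cutoff are all assumptions; the study did this step in SAS.

```python
from datetime import date

def code_applicant(record, fall_year, ethnic_levels):
    """Turn one raw applicant record into 0/1 coded predictors."""
    coded = {
        "Gender": 1 if record["gender"] == "F" else 0,
        "Give_ACT": 1 if record.get("act") is not None else 0,
        # Assumption: "before January 1" means January 1 of the fall year
        "Apply_Early": 1 if record["applied"] < date(fall_year, 1, 1) else 0,
    }
    # One dummy variable per ethnicity level: E1, E2, ...
    for i, level in enumerate(ethnic_levels, start=1):
        coded[f"E{i}"] = 1 if record["ethnic"] == level else 0
    return coded

# Hypothetical applicant and level codes (illustration only)
levels = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
record = {"gender": "F", "act": None,
          "applied": date(2003, 11, 15), "ethnic": "c3"}
coded = code_applicant(record, 2004, levels)
print(coded)
```

Coding every level with its own dummy mirrors the seven E variables in the table; a fitted model would typically omit one level as the reference category.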
Case Study: Logistic Model Prediction
Applicant data for 2003 are used to predict 2004 FTIC enrollment by weekly time intervals.
The logistic model does not predict after week = t.
Enrollment after Week = t
Total Enrollment = Enroll_t + Enroll_t’
At any week = t, we need to predict Enroll_t’.
Identify historical relationships that may be helpful.
Applicant Versus Enrolled by Year
Both applications and enrollment have been increasing.
Notice that enrollment yield is decreasing.
Is the % increase in enrollment matching the % increase in applications?
Applicant Yield by Strata
Enrollment yield from applicant data is decreasing for each stratum.
How does this affect the yearly increase in enrollment?
Percent Increase: Applicant vs. Enrolled
Applicant increase is not a viable indicator of enrollment increase.
What patterns are reliable to model?
Cumulative FTIC Enrollment by Week
Notice the parallel lines, which imply equal slopes!
At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17).
Thus, (Total Enroll – Enroll_t) should be very similar from year to year.
[Figure labels: Enroll_t; Total Enrollment]
Relationship Between Enrollment & Total Enrollment
By definition, (Total Enroll – Enroll_t) = Enroll_t’.
Model Enroll_t’ and smooth out the consistent patterns by week.
Enroll_t’ Model
Use the 2003 Enroll_t’ model to predict Enroll_t’ for 2004.
Estimate of Enroll_t’: ($R^2$ = )
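The fitted equation is missing from this slide. The outline calls this a quadratic prediction model, so the sketch below assumes Enroll_t’ is modeled as a quadratic in week and fits it by ordinary least squares, then adds it to Enroll_t to get a total. The function names and all numbers are hypothetical and are not the study's estimates.

```python
def fit_quadratic(weeks, values):
    """Least-squares fit of values ~ a + b*week + c*week**2; returns [a, b, c]."""
    # Normal equations (X'X) coef = X'y for the columns 1, w, w^2
    rows = [[1.0, w, w * w] for w in weeks]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (aug[i][3] - tail) / aug[i][i]
    return coef

def predict_total(enroll_t, week, coef):
    """Total Enrollment = Enroll_t + predicted Enroll_t' at the given week."""
    a, b, c = coef
    return enroll_t + (a + b * week + c * week * week)

# Hypothetical prior-year Enroll_t' by week: falls to 0 at week 17
weeks = list(range(18))
prior_year = [2.0 * (17 - w) ** 2 for w in weeks]
coef = fit_quadratic(weeks, prior_year)

# Hypothetical current-year Enroll_t of 300 at week 5
total = predict_total(300.0, 5, coef)
print([round(v, 3) for v in coef], round(total, 1))
```

Because the toy data are exactly quadratic, the fit recovers the generating coefficients; real weekly counts would leave residuals, which is what an R² summarizes.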
Predict 2004 Enroll_t’
Predict 2004 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: 2004 FTIC Actual Total is 687
Predict 2005 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: 2005 FTIC Actual Total is 765
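For reference, the year-over-year percent increase implied by the two actual FTIC totals quoted on these slides (687 in 2004, 765 in 2005) can be computed as below. The helper name is hypothetical; the same formula would apply to a predicted total in place of the 2005 actual.

```python
def percent_increase(previous, current):
    """Percent change from previous to current, relative to previous."""
    if previous <= 0:
        raise ValueError("previous total must be positive")
    return 100.0 * (current - previous) / previous

# Actual FTIC totals quoted on the slides: 2004 = 687, 2005 = 765
increase = percent_increase(687, 765)
print(round(increase, 1))
```

The difference of 78 students on a base of 687 works out to an increase of roughly 11.4 percent.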
- END - Thank you! Any Questions?