EDUCAUSE Annual Conference, October 16th, 2013. Who Are Your At-Risk Students? Using Data Mining to Target Intervention Efforts. Lalitha Agnihotri, Ph.D., Senior Systems Analyst, DWH; Alex Ott, Ed.D., Associate Dean, Academic & Enrollment Services; Niyazi Bodur, Ph.D., VP, Information Technology & Infrastructure. New York Institute of Technology.
Presentation Description and Goals Learn how to improve targeted intervention by building a model to identify and classify at-risk students using data at your institution. Gain an understanding of the complete life cycle of the At-Risk Student Identification Model.
Targeted Intervention for At-Risk Students. The Goal: early, targeted intervention based on the risk factors of each at-risk student, to improve retention. Rationale for Key Elements: Early; Targeted intervention; Risk factors for each student.
Before the Model, All We Had Was…
Students At Risk (STAR) Model Version 1.0. Data sources: admissions data; registration/placement test data; survey data. Method: combine all risk variables into an aggregated measure.
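The aggregated measure described for Version 1.0 can be sketched as a simple count of risk flags. This is a minimal illustration only: the flag names and student records below are hypothetical, since the slides do not list the actual STAR 1.0 variables or how they were combined in the spreadsheet.

```python
# Hypothetical risk flags; the slides do not name the real STAR 1.0 variables.
RISK_FLAGS = ("low_placement_score", "late_registration", "survey_low_commitment")


def aggregate_risk(student: dict) -> int:
    """Count how many risk flags are set; every flag carries the same weight."""
    return sum(1 for flag in RISK_FLAGS if student.get(flag, False))


# Made-up example records, one dict per student.
students = [
    {"id": "S1", "low_placement_score": True, "late_registration": True,
     "survey_low_commitment": False},
    {"id": "S2", "low_placement_score": False, "late_registration": False,
     "survey_low_commitment": False},
    {"id": "S3", "low_placement_score": True, "late_registration": True,
     "survey_low_commitment": True},
]

# Rank students from highest to lowest aggregated risk.
ranked = sorted(students, key=aggregate_risk, reverse=True)
for s in ranked:
    print(s["id"], aggregate_risk(s))
```

Because every flag adds exactly one point, a weak indicator moves the score as much as a strong one.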
Version 1.0 Report Output:
Major Challenges with STAR 1.0: Limited attributes. Attributes of unknown strength, relevance, or even direction. Attributes equally weighted. Static Excel document: gathering all the attributes in one place took a big effort. Simple logic calculation applied through a manual process; no data mining applied.
Data Mining
Data Mining Classification. Given a collection of records (the training set), each record contains a set of attributes, one of which is the class. Find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. Select the model that performs best. [Table columns: Student ID, Attributes, Class]
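The classification workflow above (train on labeled records, score unseen records, keep the best model) can be sketched in plain Python. Everything here is an assumption for illustration: the attribute names, the toy records, and the two candidate models (a majority-class baseline and a small categorical Naive Bayes) are not taken from the presenters' data or tooling.

```python
from collections import Counter, defaultdict

# Hypothetical records: (attributes, class). The class is "left" or "retained".
TRAIN = [
    ({"hs_gpa": "low", "aid": "no"}, "left"),
    ({"hs_gpa": "low", "aid": "no"}, "left"),
    ({"hs_gpa": "low", "aid": "yes"}, "retained"),
    ({"hs_gpa": "high", "aid": "no"}, "retained"),
    ({"hs_gpa": "high", "aid": "yes"}, "retained"),
    ({"hs_gpa": "high", "aid": "yes"}, "retained"),
]
# Previously unseen records, held out to measure accuracy.
HOLDOUT = [
    ({"hs_gpa": "low", "aid": "no"}, "left"),
    ({"hs_gpa": "high", "aid": "yes"}, "retained"),
    ({"hs_gpa": "high", "aid": "no"}, "retained"),
]


def train_majority(records):
    """Baseline: always predict the most common class in the training set."""
    majority = Counter(label for _, label in records).most_common(1)[0][0]
    return lambda attrs: majority


def train_naive_bayes(records):
    """Categorical Naive Bayes with add-one smoothing."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for attrs, label in records:
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
            values_seen[name].add(value)

    def predict(attrs):
        def score(label):
            p = class_counts[label] / len(records)
            for name, value in attrs.items():
                counts = value_counts[(label, name)]
                p *= (counts[value] + 1) / (class_counts[label] + len(values_seen[name]))
            return p
        return max(class_counts, key=score)

    return predict


def accuracy(model, records):
    """Fraction of records whose predicted class matches the true class."""
    return sum(model(attrs) == label for attrs, label in records) / len(records)


models = {"majority": train_majority(TRAIN), "naive_bayes": train_naive_bayes(TRAIN)}
scores = {name: accuracy(model, HOLDOUT) for name, model in models.items()}
best = max(scores, key=scores.get)  # select the model that performs best
print(scores, "best:", best)
```

Scoring on held-out records rather than the training set is what makes the comparison meaningful for "previously unseen" students.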
STAR Model: Version 2.0 with Data Mining and Automated Tools. Built and automated the full dataset in our Data Warehouse. Used data mining tools (SQL Server Analysis Services) to train multiple dynamic statistical models. Enterprise solution. Pipeline: SQL Build Data -> SSAS Modeling -> DMX Prediction Query -> SSRS Report.
Models Trained: Logistic Regression, Naïve Bayes, Neural Network, Decision Trees, and an Ensemble.
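The slides name an ensemble but do not say how the individual models were combined. One common approach is majority voting, sketched below as an assumption rather than the presenters' method; the class labels, model names, and tie-breaking rule are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions: dict) -> str:
    """Return the class predicted by most models; ties go to 'at_risk'."""
    counts = Counter(predictions.values())
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    # Assumed tie-break: prefer flagging a student over missing one.
    return "at_risk" if "at_risk" in winners else winners[0]


# Hypothetical per-model outputs for a single student.
votes = {
    "logistic_regression": "at_risk",
    "naive_bayes": "at_risk",
    "neural_network": "not_at_risk",
    "decision_tree": "not_at_risk",
}
print(majority_vote(votes))
```

Breaking ties toward "at_risk" trades some false alarms for fewer missed students. Whether that trade fits depends on the cost of an unnecessary intervention at a given institution.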
Data Mining Knowledge Discovery: BIG Picture
Data Mining Knowledge Discovery: Detailed Picture
Model Significance and Results
So How Did the Model Actually Perform?
Key Takeaways: Success depends on a productive partnership between IT and the business departments. Data is the key: without data, there is nothing to discuss. Data mining is a process: set a high-level goal (the blueprint), define departmental goals (the detailed drawing), then implement the physical build and the data mining tools to make it a reality. Select attributes based on (retention) research and the particulars of your school.
Questions? Lalitha Agnihotri, lagnihot@nyit.edu Alexander Ott, aott@nyit.edu Niyazi Bodur, nbodur@nyit.edu