Predicting Students’ Course Success Using Machine Learning Approach


Predicting Students’ Course Success Using Machine Learning Approach. Akira Kanatsu, Research Analyst; Tanner Carollo, Assistant Director; Amanda Bain, Graduate Student Research Assistant. CAIR 2018 Annual Conference, November 14, 2018.

Outline: At a Glance; Why Course Success; Machine Learning Approach; Data and Model Training; Results; Next Steps.

At a Glance. Undergraduate student population: 17,854 (89%). Fall 2018 lecture/seminar undergraduate courses (< 500 level): 656 courses; 1,380 sections; 58,370 total enrollments; 2,800 (5%) of enrollments are repeats; 8,023 attempted enrollments (courses full).

Why Course Success

Impact on Retention, Graduation, and Y2D. 73% of non-retained F’17 FTF had at least two DFWIs (vs. 31% of retained). Only 3% of F’14 FTF with 2 or more 1st-year DFWIs graduated within 4 years.

Impact at the Course Level. 2017-18 lower division LEC/SEM courses: 55 (16%) had a DFWI rate of 20% or higher; 10,418 of the course enrollments were repeats; 8,012 attempted enrollments received a “course is full” message. Example: a lower division (major) course with a DFWI rate of 21%. DFWIs, repeats, and course-is-full messages: 184, 175, 138.
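The DFWI rate used throughout the slides can be sketched as a simple share of enrollments. This is a minimal illustration, not the presenters' code: it assumes DFWI covers grades of D, F, W (withdrawal), and I (incomplete), and the grade set, function name, and sample grades are all hypothetical.

```python
# Assumption: these letter grades count as DFWI outcomes; the slides do not list them.
DFWI_GRADES = {"D+", "D", "D-", "F", "W", "I"}


def dfwi_rate(grades):
    """Return the share of enrollments that ended in a DFWI grade (0.0 to 1.0)."""
    if not grades:
        raise ValueError("no grades supplied")
    dfwi_count = sum(1 for grade in grades if grade in DFWI_GRADES)
    return dfwi_count / len(grades)


# Hypothetical section of 10 enrollments, invented for illustration.
section_grades = ["A", "B", "C", "D", "F", "W", "B", "A", "C", "B"]
rate = dfwi_rate(section_grades)

# The slide treats a DFWI rate of 20% or higher as notable.
is_high_dfwi = rate >= 0.20
print(f"DFWI rate: {rate:.0%}, high-DFWI course: {is_high_dfwi}")
```

The same function could be applied per course or per section, depending on how the enrollments are grouped.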

Machine Learning Approach

Machine Learning. A branch of artificial intelligence: learn from data, identify patterns, make decisions. Application examples: online recommendations, self-driving cars, fraud detection. ML uses statistical techniques to give computers the ability to "learn" from data without being explicitly programmed. It can be supervised (known output/DV) or unsupervised. The purpose of supervised ML is to learn a function that, given a sample of data and outputs of interest, best approximates the relationship between input and output observable in the data. Compared to traditional statistical modeling: Pros: no need to define specific models; models automatically improve/adjust when new data is added. Cons: identified patterns may not make sense or may be difficult to explain.
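As a rough illustration of supervised learning, the sketch below learns a single cutoff on cumulative GPA from labelled examples and then uses it to predict DFWI for new students. It is a toy one-rule learner, not any of the techniques used in the study; the data, function names, and the choice of GPA as the only input are assumptions made for illustration.

```python
def fit_threshold(examples):
    """Learn a GPA cutoff: predict DFWI (1) when gpa < cutoff.

    examples: list of (gpa, dfwi) pairs with dfwi in {0, 1}.
    Returns the candidate cutoff with the highest training accuracy.
    """
    candidates = sorted({gpa for gpa, _ in examples})
    best_cutoff, best_correct = None, -1
    for cutoff in candidates:
        correct = sum(1 for gpa, label in examples if int(gpa < cutoff) == label)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


# Hypothetical labelled history: (cumulative GPA, received DFWI?).
training = [(1.8, 1), (2.1, 1), (2.4, 1), (2.6, 0), (2.9, 0), (3.2, 0), (3.6, 0)]
cutoff = fit_threshold(training)


def predict(gpa):
    """Apply the learned rule to a new student."""
    return int(gpa < cutoff)


print(cutoff, predict(2.0), predict(3.0))
```

The rule is never written by hand: the cutoff comes from the known outputs in the training data, which is the sense in which the model "learns" the input-output relationship.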

Application: IBM SPSS Modeler (2018 CSU Innovation Minigrant). Powerful data mining software. Accepts various data sources. Visual interface without a programming language. Automated modeling/model training. Ability to apply trained models to a separate data set. Separate program from SPSS (annual license $3,000 per account).

Data

Courses Examined: ADMN, BIOL, MATH, PHIL, SSCI. Course selection: lower division, high enrollment, high failure rate.

Course Success Predictors. Categories: Student Information (General, Term Specific), Course Information, Instructor Information. Student Information, General: Demographics (e.g., Sex & URM); Student Level; Basis of Admission; Degree Type (e.g., BA); College of Major; Incoming GPA (i.e., HS & TR GPA); SAT & ACT Scores; Developmental Math/Placement.

Course Success Predictors. Student Information, Term Specific: Cumulative GPA; Term Attempted Units; Course Repeat/Highest Previous Grade; # of High DFWI Courses (>25%); Average DFWI Rates; # of Concurrent Courses with 10% Higher DFWI Rates When Taken Together.
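Two of the term-specific predictors, the number of high-DFWI courses (>25%) and the average DFWI rate, could be derived from a student's schedule roughly as follows. This is a sketch under stated assumptions: the course numbers, rates, and function name are hypothetical, and the slides do not say how the features were actually computed.

```python
# Hypothetical historical DFWI rates by course (invented values).
historical_dfwi = {
    "MATH 110": 0.31,
    "BIOL 100": 0.28,
    "PHIL 105": 0.18,
    "ADMN 210": 0.22,
}


def term_features(schedule, dfwi_rates, high_cutoff=0.25):
    """Summarize how risky a student's term schedule is.

    schedule: list of course codes taken in the term.
    dfwi_rates: mapping of course code to historical DFWI rate.
    high_cutoff: rates strictly above this count as "high DFWI" (>25% on the slide).
    """
    rates = [dfwi_rates[course] for course in schedule]
    return {
        "n_high_dfwi_courses": sum(1 for rate in rates if rate > high_cutoff),
        "avg_dfwi_rate": sum(rates) / len(rates),
    }


features = term_features(["MATH 110", "BIOL 100", "PHIL 105"], historical_dfwi)
print(features)
```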

DFWI Rates by Math Course Sections

Course Success Predictors. Course Information: Term (e.g., Fall); Campus (SB & Palm Desert); Meeting Days (e.g., MWF); Class Begin Time; Instruction Mode (e.g., Online); Class Size.

Course Success Predictors. Instructor Information: Tenure Status; Previous Course Teaching Experience; Average GPA; Average DFWI Rates.

Model Training

In Modeler, data nodes are found under “Sources” and the Type node is found under “Field Ops”.

The Auto Classifier node trains and tests machine learning models using up to 16 statistical techniques and lists the most accurate models.
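The general idea of training several candidate models and ranking them by accuracy on held-out data can be sketched in a few lines. This is not how Modeler's Auto Classifier is implemented: the three toy rules below stand in for real techniques, and the synthetic data, field names, and function names are all assumptions.

```python
import random


def majority_model(train):
    """Always predict the most common label in the training data."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label


def gpa_stump_model(train):
    """Predict DFWI when GPA is below the cutoff that best fits the training data."""
    cutoffs = sorted({x["gpa"] for x, _ in train})
    best = max(cutoffs, key=lambda c: sum(int(x["gpa"] < c) == y for x, y in train))
    return lambda x: int(x["gpa"] < best)


def repeat_model(train):
    """Predict DFWI for students who are repeating the course (ignores training data)."""
    return lambda x: int(x["repeat"])


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the known label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def rank_models(builders, train, test):
    """Train every candidate on `train`, score on `test`, sort best first."""
    scored = [(name, accuracy(build(train), test)) for name, build in builders.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic students: low GPA usually, but not always, leads to DFWI.
rng = random.Random(0)
rows = []
for _ in range(200):
    gpa = round(rng.uniform(1.5, 4.0), 2)
    repeat = rng.random() < 0.1
    dfwi = int(gpa < 2.5) if rng.random() < 0.9 else int(gpa >= 2.5)
    rows.append(({"gpa": gpa, "repeat": repeat}, dfwi))

train, test = rows[:150], rows[150:]
builders = {"majority": majority_model, "gpa_stump": gpa_stump_model, "repeat": repeat_model}
ranking = rank_models(builders, train, test)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring on rows the models did not see during training is what makes the ranking a fair comparison rather than a measure of memorization.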

Random Tree and CHAID models

Output can be exported as an Excel file; reports can also be created in Modeler or with Python and R syntax.

Results

Model Output. Prediction: whether students are likely to receive DFWI (1=Yes, 0=No). Confidence: how likely the prediction is to be correct (range: 0 to 1). In Modeler, a machine learning model produces predictions at the student level.
Prediction Confidence Student 155 1 0.777777778 Student 156 0.666666667 Student 157 Student 158 0.888888889 Student 159 Student 160 Student 161 Student 162 Student 163 Student 164
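One simple way a tree-style model can attach a confidence to a prediction is to report the share of the majority class among the training cases in the same leaf. The slides do not say how Modeler computed the values shown, so the sketch below is only an assumption; it is worth noting that 0.777777778 equals 7/9 rounded to nine decimals, which is consistent with, but does not confirm, a ratio of this kind. The function name and counts are hypothetical.

```python
def leaf_prediction(n_dfwi, n_total):
    """Return (prediction, confidence) for a leaf holding n_total training cases.

    prediction is 1 (DFWI) when at least half of the cases in the leaf were DFWI;
    confidence is the share of cases that agree with the prediction.
    """
    if n_total <= 0 or not 0 <= n_dfwi <= n_total:
        raise ValueError("counts must satisfy 0 <= n_dfwi <= n_total and n_total > 0")
    share_dfwi = n_dfwi / n_total
    prediction = 1 if share_dfwi >= 0.5 else 0
    confidence = max(share_dfwi, 1 - share_dfwi)
    return prediction, confidence


# Hypothetical leaf: 7 of 9 similar past students received a DFWI.
pred, conf = leaf_prediction(7, 9)
print(pred, round(conf, 9))
```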

Model Prediction & Accuracy
Historical (Summer 2015 to Summer 2018)
Course  Enrollment  Predicted DFWI (n)  Predicted DFWI (%)  Actual DFWI (n)  Actual DFWI (%)  Overall Model Accuracy on Historical Data
ADMN    2709        598                 22%                 641              24%              80%
BIOL    1276        629                 49%                 645              51%              77%
MATH    8363        2287                27%                 2136             26%              81%
PHIL    653         267                 41%                 241              37%              79%
SSCI    3557        1004                28%                 1231             35%              74%
Fall 2018
Course  Enrollment  Predicted DFWI (n)  Predicted DFWI (%)
ADMN    133         39                  29%
BIOL    408         175                 43%
MATH    681         102                 15%
PHIL    103         34                  33%
SSCI    407         84                  21%
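The percentages reported for the historical data can be re-derived from the enrollment and DFWI counts on the slide. The sketch below copies those counts and recomputes the predicted and actual DFWI rates; the variable and function names are illustrative. Note that agreement between aggregate predicted and actual rates is a different measure from the student-level accuracy column, which cannot be recomputed from these counts alone.

```python
# Counts from the slide: course -> (historical enrollment, predicted DFWI n, actual DFWI n).
historical = {
    "ADMN": (2709, 598, 641),
    "BIOL": (1276, 629, 645),
    "MATH": (8363, 2287, 2136),
    "PHIL": (653, 267, 241),
    "SSCI": (3557, 1004, 1231),
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


rates = {}
for course, (enrolled, predicted, actual) in historical.items():
    rates[course] = (pct(predicted, enrolled), pct(actual, enrolled))
    print(f"{course}: predicted {rates[course][0]}%, actual {rates[course][1]}%")
```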

Top 5 Predictors (CHAID Model)
Rank ADMN BIOL MATH PHIL SSCI
1 Cumulative GPA Instructor DFWI Rate
2 Student Level Instructor Average GPA Tenure Status
3 Dev. Math Tenure Status Dev. Math Grade Days College of Major
4 Class Size SAT Score Incoming GPA Concurrent CRS Term
5 URM Status Sex
Color key: Blue = Student Academic Characteristics; Red = Instructor Characteristics; Yellow/Orange = Student Demographic Characteristics; Green = Course Characteristics; Purple = Concurrent Courses.

MATH Course Prediction by Section

Other Possible Report: MATH Student with Concurrent Course   Identified Concurrent Course Number of Concurrent Courses Student 234 Yes 1 Student 241 2 Student 267 Student 277 Student 278 Student 279 Student 280 Student 284 Student 287 Student 299
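A report like this one could be produced by checking each MATH student's schedule against a list of course pairs flagged as having higher DFWI rates when taken together. The sketch below is a hypothetical illustration: the flagged pairs, course numbers, student labels, and function name are invented, and the slides do not describe how the actual report was generated.

```python
# Hypothetical course pairs flagged as riskier when taken in the same term.
flagged_pairs = {
    frozenset({"MATH 110", "BIOL 100"}),
    frozenset({"MATH 110", "ADMN 210"}),
}

# Hypothetical term schedules.
schedules = {
    "Student A": ["MATH 110", "BIOL 100", "PHIL 105"],
    "Student B": ["MATH 110", "BIOL 100", "ADMN 210"],
    "Student C": ["MATH 110", "PHIL 105"],
}


def concurrent_report(schedules, target, flagged_pairs):
    """For each student taking `target`, count flagged courses taken alongside it."""
    report = {}
    for student, courses in schedules.items():
        if target not in courses:
            continue
        n_flagged = sum(
            1
            for course in courses
            if course != target and frozenset({target, course}) in flagged_pairs
        )
        report[student] = {"identified": n_flagged > 0, "n_concurrent": n_flagged}
    return report


report = concurrent_report(schedules, "MATH 110", flagged_pairs)
for student, row in report.items():
    print(student, "Yes" if row["identified"] else "No", row["n_concurrent"])
```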

Next Steps

Targeted Supplemental Instruction Advising Process Improvement Exploration and Automation Targeted Supplemental Instruction Advising Notify students where/when appropriate (e.g., concurrent courses) Reporting outputs and structures