Published by Jayden Score. Modified over 9 years ago.
1
Predicting Students Drop Out: A Case Study
Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers
2
The Case Study
Educational Data Mining in a practical setting
Directed at a student advice procedure
Eindhoven University of Technology, Electrical Engineering department
3
The Case Study: advice procedure
[Timeline figure, September to January. Inputs: pre-university student information, exam results, talks with students, etc. Events: EXAMS, HOLIDAY, EXAMS, ADVICE, DEADLINE. Labels: STUDENTS 30% / 70%.]
Page 3, July 2009
4
Outline
CRISP-DM Framework
− Understanding of context
− Data understanding
− Data preparation
− Modeling
− Evaluation
− Deployment
Conclusions and further work
5
CRISP-DM Framework
− Understanding of context
− Data understanding
− Data preparation
− Modeling
− Evaluation
− Deployment
6
Understanding of context
Situation at Electrical Engineering, Eindhoven University of Technology:
− 40% dropout rate, small inflow
− Decision to drop out preferably made before the end of January
− Study advice given by a student counselor
Objective for the department:
− More robust and objective advice
7
Understanding of context
In data mining terms:
− Build a model for the academic success of a student
− Based on the currently available information
− Only information available until December of the year of enrollment
Objective for research:
Try out the applicability of EDM in this context:
− Enough data (amount)?
− Enough data (type)?
8
Data understanding
Data source: institution's database
− Pre-university data
− University data
Resulting data: data from 648 students, from 2001-2009
9
Data preparation (pre-university data)
Standard preparatory education:
− Number of courses
− Type of courses taken
− Average grades for total, science, and math
Non-standard previous education:
− Type
− Grade
10
Data preparation (university data)
Courses, grades, number of attempts
Many transformations needed:
− Reorganizations
− Partial exams
Example: Calculus
− 2000-2001: 1 examination
− 2001-2006: 2 partial examinations
− 2007-2008: 5 partial examinations, or 1 examination
11
Modeling (general)
Classification task:
− Two-class classification
− Criterion: finish all first-year courses within three years
Several mining techniques applied:
− Decision trees (plus ensembles), Bayesian classifiers, association rules
University and pre-university data modeled separately first
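The two-class criterion on this slide can be sketched as a labeling function. This is a minimal illustration, not the authors' code: the course names, the record layout (course name mapped to the study year in which it was passed), and the function name `meets_criterion` are all assumptions made for the example.

```python
def meets_criterion(passed_in_year, first_year_courses, max_years=3):
    """Return True if every first-year course was passed within max_years.

    passed_in_year maps course name -> study year (1, 2, 3, ...) in which
    the student passed it; courses never passed are simply absent.
    """
    return all(
        course in passed_in_year and passed_in_year[course] <= max_years
        for course in first_year_courses
    )


# Hypothetical course list and student records, for illustration only.
FIRST_YEAR = ["Calculus", "Linear Algebra", "Circuits"]
on_time = {"Calculus": 1, "Linear Algebra": 2, "Circuits": 3}
too_late = {"Calculus": 1, "Linear Algebra": 1, "Circuits": 4}
incomplete = {"Calculus": 1}

for name, record in [("on_time", on_time), ("too_late", too_late), ("incomplete", incomplete)]:
    print(name, meets_criterion(record, FIRST_YEAR))
```

A boolean label keeps the sketch independent of which class the presentation encodes as 1 or 0.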
12
Modeling (pre-university data)
Baseline model: One Rule classifier
− 68% accuracy using Science_mean
No significant improvement using other classification techniques
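For readers unfamiliar with the baseline, a One Rule (OneR) classifier picks the single attribute whose value-to-majority-class rule makes the fewest training errors. The sketch below is a generic pure-Python version for nominal attributes, not the tool used in the study. Only the attribute name Science_mean comes from the slide; the second attribute, all values, and all labels are hypothetical toy data. Numeric attributes would first need to be discretized, which is not shown.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Fit a One Rule classifier on nominal attributes.

    rows is a list of dicts (attribute -> value); labels is a parallel list.
    """
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    best = None
    for attr in rows[0]:
        # Count class labels per attribute value.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # Each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    attr, rule, correct = best
    return {"attribute": attr, "rule": rule, "default": default,
            "accuracy": correct / len(rows)}


def predict(model, row):
    """Apply the single rule; unseen values fall back to the default class."""
    return model["rule"].get(row[model["attribute"]], model["default"])


# Hypothetical toy data, for illustration only.
ROWS = [
    {"Science_mean": "good", "extra_math": "yes"},
    {"Science_mean": "good", "extra_math": "no"},
    {"Science_mean": "poor", "extra_math": "yes"},
    {"Science_mean": "poor", "extra_math": "no"},
    {"Science_mean": "avg", "extra_math": "yes"},
    {"Science_mean": "avg", "extra_math": "no"},
    {"Science_mean": "avg", "extra_math": "yes"},
]
LABELS = ["success", "success", "dropout", "dropout", "dropout", "success", "dropout"]

model = one_r(ROWS, LABELS)
print(model["attribute"], model["rule"])
```

The reported accuracy here is training accuracy on the toy data; the 68% on the slide is the study's own figure and is not reproduced by this example.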
13
Modeling (university data)
Baseline model: One Rule classifier
− 75% accuracy using Linear Algebra AB
Significant improvements using other models (80%)
Decision trees slightly better than other models
14
Modeling (total set)
Accuracies of 80%, using attributes from both subsets
Improvements using cost matrices:
− Shape the misclassification
− Small trade-offs between accuracy and misclassification:
  Accuracy 79%, 52% of errors are false positives (FP)
  Accuracy 76%, 41% of errors are false positives (FP)
Similarities between models:
− Linear Algebra AB always the root node
− Science Mean always high in the tree
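The trade-off on this slide relates two quantities, accuracy and the share of errors that are false positives, and a cost matrix is one way to shift that share. The sketch below shows both ideas under stated assumptions: the confusion counts and cost values are hypothetical, and the minimum-expected-cost rule is one common way to apply a cost matrix, not necessarily the one the authors used.

```python
def summarize(tp, fp, fn, tn):
    """Return (accuracy, share of errors that are false positives)."""
    total = tp + fp + fn + tn
    errors = fp + fn
    accuracy = (tp + tn) / total
    fp_share = fp / errors if errors else 0.0
    return accuracy, fp_share


def min_cost_class(p_positive, cost_fp, cost_fn):
    """Pick the class (1 or 0) with the lower expected misclassification cost.

    p_positive is the model's estimated probability of the positive class.
    """
    cost_if_predict_pos = (1 - p_positive) * cost_fp  # wrong when actually negative
    cost_if_predict_neg = p_positive * cost_fn        # wrong when actually positive
    return 1 if cost_if_predict_pos <= cost_if_predict_neg else 0


# Hypothetical confusion counts, for illustration only.
accuracy, fp_share = summarize(tp=30, fp=6, fn=4, tn=60)
print(accuracy, fp_share)

# With equal costs a 0.4 probability yields class 0; making false
# negatives three times as costly flips the decision to class 1.
print(min_cost_class(0.4, cost_fp=1, cost_fn=1))
print(min_cost_class(0.4, cost_fp=1, cost_fn=3))
```

Raising the cost of one error type reduces how often it occurs, typically at a small loss in overall accuracy, which is the pattern the two accuracy figures on the slide describe.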
15
Modeling (decision tree)
Root: LinAlgAB. If < 5.5: class 1. If > 5.5: test CalcA.
CalcA: if < 5.15: class 1. If > 5.15: test VWO_Sc_mean.
VWO_Sc_mean: branches {good, excellent} and {n/a, poor, avg, above avg}; leaves 1 and 0.
79% accuracy
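Written as code, the tree on this slide is three nested tests. The sketch below rests on two assumptions that the slide text does not confirm: that the {good, excellent} branch of VWO_Sc_mean leads to class 0 and the other branch to class 1, and that a grade exactly on a threshold follows the "greater than" branch. The function name `classify` is also hypothetical.

```python
def classify(lin_alg_ab, calc_a, vwo_sc_mean):
    """Apply the three-level tree; returns class 1 or 0."""
    if lin_alg_ab < 5.5:
        return 1
    if calc_a < 5.15:
        return 1
    # ASSUMPTION: the leaf assignment below is a guess; the flattened slide
    # lists the branches and the leaves but not which leaf belongs to which.
    return 0 if vwo_sc_mean in {"good", "excellent"} else 1


print(classify(7.0, 7.0, "good"))
print(classify(5.0, 8.0, "excellent"))
```

Because Linear Algebra AB is tested first, a low grade there decides the outcome regardless of the other two attributes.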
16
Evaluation
Detailed manual analysis by the student counselor:
Review the classification measure:
− 25% of false negatives should be true negatives
− How to classify skilled people who leave?
Improve data transformations
17
Deployment
Objectives
More robust and objective advice:
− 80% accuracy is possible, with clear directions for improvement
Try out the applicability of EDM in this context:
− Enough data (amount)? Yes, and more is not easily obtainable
− Enough data (type)? Additional types would probably be very useful, but costly
Deployment possible after improvements
18
Conclusions and further work
EDM can help in a study advice process:
− 80% accuracy is possible, with clear directions for improvement
− EDM can work with small datasets and a limited number of data categories
Further work:
− Improve data transformations
− Improve the classification measure: better two-class, move to three-class
− Review the use of additional data
19
Questions?