Predicting Student Risks Through Longitudinal Analysis
Date: 2015/04/23
Source: KDD’14
Authors: A. Tamhan, S. Ikbal, B. Sengupta, M. Duggirala, …
Advisor: Dr. Jia-Ling Koh
Speaker: Sheng-Chih Chu
Outline: Introduction; Data Description & Defining Risk; Data Processing; Experiments; Conclusion
Introduction Motivation: K-12 is the most critical phase of a person's lifelong learning, during which the opportunities for a successful future need to be created and nurtured. Poor academic performance in K-12 is often a precursor to unsatisfactory educational outcomes, which are associated with significant personal and social costs.
Introduction Goal: The first goal is to build a predictive model that identifies students at risk of poor performance. In addition, early prediction allows teachers to take remedial action along a student's learning path.
Data Description GCPS is one of the largest school systems in the US, consisting of 132 schools and serving more than … students at present.
Defining Risk
CRCTs (mathematics, science): scores range from 650 to 900. 850 and above: exceeding standards; 800 and above: meeting standards; below 800: at risk.
ITBS (reading, written expression, mathematics, science, …): provides a percentile rank (PR). A PR of 25% is used as the at-risk threshold in grade 8.
CogAt: measures reasoning ability.
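The thresholds on this slide can be written as a small labeling rule. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the boundary handling (a score of exactly 800 or 850 falls in the higher band, a PR strictly below 25 counts as at risk) is an assumption.

```python
# Thresholds taken from the slide; names and boundary handling are assumptions.
CRCT_MEETS = 800         # scores at or above this meet the standard
CRCT_EXCEEDS = 850       # scores at or above this exceed the standard
ITBS_PR_THRESHOLD = 25   # percentile-rank cutoff used for grade 8


def crct_level(score):
    """Map a CRCT scale score to a performance band."""
    if score >= CRCT_EXCEEDS:
        return "exceeds standards"
    if score >= CRCT_MEETS:
        return "meets standards"
    return "at risk"


def itbs_at_risk(percentile_rank):
    """Assumption: a PR strictly below the cutoff counts as at risk."""
    return percentile_rank < ITBS_PR_THRESHOLD


print(crct_level(750))   # at risk
print(crct_level(821))   # meets standards
print(itbs_at_risk(64))  # False
```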
Data Processing and Feature
Data warehouse: 19 million
Tool: SPSS Modeler
Tests considered: CRCT, ITBS, CogAt
Longitudinal Feature Data
Per-test source tables are combined into one row per student.
Grade: Mike 7, Jasmine 6, Thomas 5, Alice 8, Peter 4, Jenny 7
CRCTs for grade 8: Alice 832
CRCTs for grade 7: Mike 750, Alice 821, Jenny 812
Columns of the combined table: Grade, CRCTs for grade 8, CRCTs for grade 7, CRCTs for grade 6, CRCTs for grade 5, ITBS for grade 8, ITBS for grade 5, ITBS for grade 3
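Building the longitudinal table amounts to pivoting per-test records into one wide row per student. The sketch below assumes a hypothetical long format of (student, test, grade taken, score) tuples and hypothetical column names; only the scores shown on the slide are used, and tests a student has not taken stay `None`.

```python
# Hypothetical long-format records: (student, test, grade_taken, score).
records = [
    ("Mike", "CRCT", 7, 750),
    ("Alice", "CRCT", 7, 821),
    ("Alice", "CRCT", 8, 832),
    ("Jenny", "CRCT", 7, 812),
]

# The seven score columns listed on the slide.
FEATURES = [("CRCT", 8), ("CRCT", 7), ("CRCT", 6), ("CRCT", 5),
            ("ITBS", 8), ("ITBS", 5), ("ITBS", 3)]


def to_longitudinal(records, students):
    """Pivot long records into one row per student; missing scores stay None."""
    rows = {s: {f"{t}_grade{g}": None for t, g in FEATURES} for s in students}
    for student, test, grade, score in records:
        key = f"{test}_grade{grade}"
        if student in rows and key in rows[student]:
            rows[student][key] = score
    return rows


students = ["Mike", "Jasmine", "Thomas", "Alice", "Peter", "Jenny"]
wide = to_longitudinal(records, students)
print(wide["Alice"]["CRCT_grade8"])  # 832
```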
Student Profile
Student  Gender  Ethnicity  Free meal  Gifted  Special education  Absent days  Suspensions  Discipline
Mike     M       B          Y          N       Y                  0            X            85
Jasmine  F       W          N          Y       N                  10           X            87
Thomas   M       W          N          N       N                  5            X            85
Alice    F       W          Y          Y       N                  0            X            92
Peter    M       W          N          N       N                  20           O            65
Jenny    F       B          N          Y       N                  0            X            90
Per-attribute source tables shown on the slide: gender, ethnicity, Discipline.
Merged Data Set
Student  Gender  Ethnicity  Free meal  Gifted  Special education  Absent days  Suspensions  Discipline
Mike     M       B          Y          N       Y                  0            X            85
Jasmine  F       W          N          Y       N                  10           X            87
Thomas   M       W          N          N       N                  5            X            85
Alice    F       W          Y          Y       N                  0            X            92
Peter    M       W          N          N       N                  20           O            65
Jenny    F       B          N          Y       N                  0            X            90
Each profile row is joined with the student's longitudinal columns: Grade, CRCTs for grade 8, CRCTs for grade 7, CRCTs for grade 6, CRCTs for grade 5, ITBS for grade 8, ITBS for grade 5, ITBS for grade 3.
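The merge step can be sketched as a join of the two tables on the student. This is an illustration only: joining on the student's name and the column names used here are assumptions (a real data warehouse would more likely join on a student ID), and only a few values from the slides are included.

```python
# Hypothetical excerpts of the two tables, keyed by student name.
profiles = {
    "Mike": {"gender": "M", "ethnicity": "B", "absent_days": 0},
    "Alice": {"gender": "F", "ethnicity": "W", "absent_days": 0},
}
scores = {
    "Mike": {"CRCT_grade7": 750},
    "Alice": {"CRCT_grade7": 821, "CRCT_grade8": 832},
}


def merge_by_student(profiles, scores):
    """Join profile and score rows; students without scores keep profile columns only."""
    merged = {}
    for student, profile in profiles.items():
        row = dict(profile)                    # copy so the inputs stay untouched
        row.update(scores.get(student, {}))    # add score columns when present
        merged[student] = row
    return merged


merged = merge_by_student(profiles, scores)
print(merged["Alice"])
```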
Creation of Target Variable
Target variable: CRCT grade 8
Dependent data: Grade, CRCTs for grade 8, CRCTs for grade 7, CRCTs for grade 6, CRCTs for grade 5, ITBS for grade 8, ITBS for grade 5, ITBS for grade 3 (one row per student: Mike, Jasmine, Thomas, Alice, Peter, Jenny)
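One way to derive the target is to separate the grade-8 CRCT score from the remaining columns and turn it into a binary at-risk label. The sketch below makes two assumptions that the slide does not confirm: the label uses the below-800 cutoff from the Defining Risk slide, and students without a grade-8 score are left out. The function and column names are hypothetical.

```python
def split_target(rows, target="CRCT_grade8", cutoff=800):
    """Split rows into features X and binary labels y (1 = at risk)."""
    X, y = {}, {}
    for student, row in rows.items():
        score = row.get(target)
        if score is None:
            continue  # assumption: without a target score there is no training example
        X[student] = {k: v for k, v in row.items() if k != target}
        y[student] = 1 if score < cutoff else 0
    return X, y


rows = {
    "Alice": {"CRCT_grade8": 832, "CRCT_grade7": 821},
    "Mike": {"CRCT_grade8": None, "CRCT_grade7": 750},
    "StudentA": {"CRCT_grade8": 780, "CRCT_grade7": 790},  # invented example row
}
X, y = split_target(rows)
print(y)  # {'Alice': 0, 'StudentA': 1}
```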
Imputation of Missing Features
Columns: Grade, CRCTs for grade 8, CRCTs for grade 7, CRCTs for grade 6, CRCTs for grade 5, ITBS for grade 8, ITBS for grade 5, ITBS for grade 3
Students: Mike, Jasmine, Thomas, Alice, Peter, Jenny, Jason, Mao, Marry, Cube, Bill, Gary, Han
Mean: ( )/4 = 769
Mean: ( … )/8 = 788
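Mean imputation replaces a missing score with the average of the observed scores. The slide does not say which students enter each mean, so this sketch simply averages over every student with an observed value in the column. The function name and the scores are invented for illustration and are not the values behind the slide's 769 and 788.

```python
def impute_column_mean(rows, column):
    """Return a copy of rows with missing values in `column` set to the column mean."""
    observed = [r[column] for r in rows.values() if r.get(column) is not None]
    if not observed:
        return {s: dict(r) for s, r in rows.items()}  # nothing to average over
    mean = sum(observed) / len(observed)
    filled = {}
    for student, row in rows.items():
        new_row = dict(row)
        if new_row.get(column) is None:
            new_row[column] = mean
        filled[student] = new_row
    return filled


# Invented scores for four hypothetical students.
rows = {
    "s1": {"CRCT_grade7": 760},
    "s2": {"CRCT_grade7": None},
    "s3": {"CRCT_grade7": 780},
    "s4": {"CRCT_grade7": 800},
}
filled = impute_column_mean(rows, "CRCT_grade7")
print(filled["s2"]["CRCT_grade7"])  # 780.0
```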
Risk Prediction
Dataset
ITBS data set: 15.3% positive (at-risk) and 84.7% negative (not at-risk) samples.
CRCT data set: 10.7% positive and 89.3% negative samples.
Evaluation: 5-fold cross-validation.
Tools: SPSS or Weka.
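For readers unfamiliar with 5-fold cross-validation, the sketch below shows how samples can be split so that each one is used for testing exactly once. It is a plain shuffled split written from scratch, not the SPSS or Weka procedure, and the function name and seed are assumptions. With class balances like those above, a stratified split that preserves the at-risk share in every fold may be preferable.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # fixed seed for repeatable folds
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10):
    print(len(train), len(test))  # 8 2 on every fold
```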
Performance
Early Prediction of the Risk
Conclusion The results showed that a student’s risk of poor performance can be predicted with reasonable accuracy.