Educational Data Mining to Inspect Low-Performance Academic Areas of Students Using Ensemble Classification. Presented by Khawar Shakeel. Khawar Shakeel, Department of Computer Science, University of Gujrat, Pakistan, Email: khawarshakeel@gmail.com. Naveed Anwer Butt, Department of Computer Science, University of Gujrat, Pakistan, Email: naveed@uog.edu.pk
Outline Introduction (What is Educational Data Mining (EDM)? What problems can we solve using EDM? Stakeholders) Design Goal Related Work Design Approach Experimental Results Suggestions & Future Work
Educational Data Mining (EDM) - Introduction Data Mining (DM)? Data mining is a method for uncovering hidden details in huge volumes of raw data; such methods are applied when the data is very large and little is known about it. Educational Data Mining is a growing research area of Data Mining (DM) that applies statistical methods to education-related data in order to improve the systems and quality of higher education institutions.
Possible Questions to be solved by EDM How to predict students' learning behavior? How to group students according to their interests? What are the strong and weak areas of study of the students? How to identify the students needing more help? Which group(s) of students are likely to be dropped or promoted? What kind of educational resources need to be allocated, and why?
EDM - Stakeholders Administration: Administrators use EDM to ensure that useful resources are allocated for the betterment of institutional education. Faculty and advisors are becoming more proactive in identifying and addressing at-risk students. Educators: Educators attempt to understand the learning process and the methods they can use to improve their teaching. Researchers: Researchers focus on the development and evaluation of data mining techniques for effectiveness.
Our study - Design Goal Design a predictive model capable of exploring the reason(s) for the poor performance of the majority of students in specific course(s) or domains, so that the administration can be informed and take the necessary actions. Main tasks are: Extraction of predictable attributes from the data source. Identification of the attributes that may determine the learning behavior of the student. Construction of a prediction model based on the selected predictable variables using different existing ensemble classification algorithms. Reporting the findings to the administration.
Previous Work Although data mining in education is not yet a mature field, a lot of work has been done in this area because of its potential for educational establishments.
Design Approach
- Data Collection: Student Information System of the university
- Data Pre-processing: Selection, Cleaning, Transformation
- Development of a model based on ensemble classification algorithms: Bagging and Boosting (J48 Decision Tree algorithm as base classifier)
- Useful patterns leading to better decision making
Proposed Design Overview
Design Approach – Data Collection Secondary data is initially collected under the semester system using the University Information System. The targeted students are from Master’s and BS (HONS) degree programs, registered in different departments across all faculties of a public sector university.
Design Approach – Data Introduction The attributes to be examined are the students' marks in each category for each course: assignment, quiz, presentation, midterm, final subjective, and final objective. The final data for the model included 3130 instances and 7 variables.
Design Approach – Dataset
Design Approach – Data Preprocessing Data Selection: Two departments from each faculty are selected. The extracted data is from the 2008 and 2009 batches of BS (HONS) and the 2010 and 2011 batches of master's degree programs. Only academic activity values are recorded as variables; other student information, such as demographic and financial data, is ignored. Data Cleaning: Records of students with missing marks in any exam category of any course are removed, because they can lead to biased decisions.
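The cleaning rule above (drop any record with a missing mark in any exam category) can be sketched as follows. This is a minimal illustration only: the field names and the two sample records are hypothetical and are not taken from the university data.

```python
# Hypothetical field names for the six mark categories named in the slides.
MARK_FIELDS = [
    "assignment", "quiz", "presentation",
    "midterm", "final_subjective", "final_objective",
]

def drop_incomplete(records):
    """Keep only records in which every mark category has a value."""
    return [
        r for r in records
        if all(r.get(field) is not None for field in MARK_FIELDS)
    ]

# Made-up sample records, for illustration only.
records = [
    {"student": "s1", "course": "c1", "assignment": 8, "quiz": 7,
     "presentation": 9, "midterm": 20, "final_subjective": 25,
     "final_objective": 12},
    {"student": "s2", "course": "c1", "assignment": 6, "quiz": 5,
     "presentation": 7, "midterm": None, "final_subjective": 18,
     "final_objective": 10},
]

cleaned = drop_incomplete(records)  # the record with the missing midterm is dropped
```

Dropping whole records is the rule the slides describe; imputing the missing marks would be a different design choice and is not shown here.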
Design Approach – Data Preprocessing Many kinds of courses are taught in the university, i.e. general, elective, compulsory, and core courses. The grade for a course is determined by where the obtained marks fall in the grading ranges. Here we consider C, D, and F as low grades, where the marks are below 60. These grades are considered low because they tend to affect the student's GPA negatively. Courses with a higher percentage of low grades are selected for analysis. For the selected courses, the data of all students is collected.
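A rough sketch of the course-selection step might look like the following. The cutoff of 60 comes from the slide; the share threshold of 0.5, the "Low"/"High" labels, the assumption that marks are course totals, and the sample data are all assumptions made for illustration, since the slides do not state what counts as a "higher percentage".

```python
from collections import defaultdict

LOW_GRADE_CUTOFF = 60      # from the slides: marks below 60 map to C, D, or F
LOW_SHARE_THRESHOLD = 0.5  # ASSUMPTION: the slides give no numeric threshold

def grade_class(total_marks):
    """Label a course total as 'Low' (below the cutoff) or 'High'."""
    return "Low" if total_marks < LOW_GRADE_CUTOFF else "High"

def courses_with_many_low_grades(totals, threshold=LOW_SHARE_THRESHOLD):
    """totals: iterable of (course, total_marks) pairs.

    Returns the sorted list of courses whose share of low grades
    is strictly greater than the threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # course -> [low count, total count]
    for course, marks in totals:
        counts[course][1] += 1
        if grade_class(marks) == "Low":
            counts[course][0] += 1
    return sorted(c for c, (low, n) in counts.items() if low / n > threshold)

# Made-up totals, for illustration only.
totals = [("c1", 45), ("c1", 58), ("c1", 72),
          ("c2", 80), ("c2", 65), ("c2", 55)]
selected = courses_with_many_low_grades(totals)
```

In this made-up sample, two of the three totals for `c1` are below 60, so only `c1` would be selected for analysis.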
Our Approach –Ensemble Classification
Our Approach – Ensemble Classification Ensemble classification techniques combine classifiers in order to obtain more reliable results. The most common model-combining approaches in data mining are bagging and boosting. Bagging has a voting structure in which n models, generally of the same type, are built. For an unseen instance, each model's prediction is collected, and the class that receives the majority of the votes is assigned. Boosting is almost the same as bagging; only the model-building stage changes. Here the instances that are repeatedly misclassified are allowed to contribute to training more often. There are normally n classifiers, each weighted according to its accuracy. Finally, the class with the maximum total weight is assigned.
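The difference between the two combination schemes can be sketched in a few lines of plain Python. This is not the study's setup: the slides use the J48 decision tree as the base classifier, whereas this sketch swaps in a one-feature threshold rule (a decision stump) to stay short, uses an AdaBoost-style weight update as one common form of boosting, and runs on a made-up one-dimensional toy dataset rather than student marks.

```python
import math
import random
from collections import Counter

def fit_stump(X, y, w):
    """Fit a weighted one-feature threshold rule; labels are -1 or +1."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Weighted error of "predict sign if x[j] >= t, else -sign".
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] >= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best  # (weighted error, feature index, threshold, sign)

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] >= t else -sign

def bagging_fit(X, y, n_models, rng):
    """Bagging: each model is trained on a bootstrap sample of the data."""
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx],
                                [y[i] for i in idx],
                                [1.0 / n] * n))
    return models

def bagging_predict(models, row):
    """Unweighted majority vote over the models' predictions."""
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

def boosting_fit(X, y, n_models):
    """AdaBoost-style boosting: misclassified instances gain weight each round."""
    n = len(X)
    w = [1.0 / n] * n
    models = []
    for _ in range(n_models):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this classifier
        models.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return models

def boosting_predict(models, row):
    """Weighted vote: the class with the larger total classifier weight wins."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in models)
    return 1 if score >= 0 else -1

# Toy data (illustrative only): one feature, labels no single threshold can separate.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

bagged = bagging_fit(X, y, n_models=11, rng=random.Random(0))
boosted = boosting_fit(X, y, n_models=3)
bagged_pred = [bagging_predict(bagged, row) for row in X]
boosted_pred = [boosting_predict(boosted, row) for row in X]
```

On this toy data a single stump misclassifies three of the ten points, while three boosting rounds classify all ten correctly, because each round concentrates weight on the points the previous stumps got wrong. Bagging, by contrast, gives every stump an equal vote and relies only on the variation between bootstrap samples.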
Results – Basic statistical Facts
Academic Facts
Results – Technical Facts
Results – Statistical Summary
Technical Bottom Line The results indicate that the boosted tree outperforms the bagged tree when the standard deviations are in a higher range and the data size is small at the same time.
Suggestions Summarizing these facts, it is concluded that students need to improve their performance in subjective assessments such as “Mid Term” and “Final Subjective” in order to obtain “High Grades” and be promoted to the next semester. There should also be balance in the evaluation of students at different levels.
Future Work As future work, other factors related to our research questions, such as financial and behavioral factors, will be included; these may lead to better classification and help answer new real-time questions from the educational environment. More facts can be captured by enlarging the dataset and including other variables that open new directions in decision making. Other mining techniques can also be applied to discover findings beyond classification.
Thank You. khawarshakeel@gmail.com My presentation is over. Thank you very much. Do you have any questions?