1 Improving quality of graduate students by data mining Asst. Prof. Kitsana Waiyamai, Ph.D. Dept. of Computer Engineering Faculty of Engineering, Kasetsart.

1 Improving Quality of Graduate Students by Data Mining
Asst. Prof. Kitsana Waiyamai, Ph.D.
Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand

2 Contents
PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification
PART II
- Improving quality of graduate students by data mining
Conclusion

3 What Is Data Mining?
Knowledge Discovery from Data (KDD), or data mining, is the process of nontrivial extraction of patterns from data. The patterns should be:
- implicit, previously unknown, and potentially useful;
- comprehensible to human users.

4 Knowledge Discovery Process: An Iterative and Interactive Process
- Mining objective
- Data sources: databases, flat files, complex data, data warehouses
- Preprocessing: gathering, cleaning, and selecting data
- Search for patterns (data mining): neural nets, machine learning, statistics, and others
- Analyst reviews the output and interprets the results
- Report findings
- Take actions based on findings
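
As a minimal sketch of the preprocessing step (gathering, cleaning, and selecting data), the Python below merges records from two sources, drops incomplete and duplicate records, and keeps only the attributes of interest. The function names, field names, sample records, and the cleaning rule (a record is incomplete if a required field is None) are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch of KDD preprocessing: gather, clean, select.
# Function names, field names, sample records, and cleaning rules are hypothetical.


def gather(*sources):
    """Concatenate the records (dicts) coming from several data sources."""
    records = []
    for source in sources:
        records.extend(source)
    return records


def clean(records, required):
    """Drop records missing a required field, and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(field) is None for field in required):
            continue  # incomplete record
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def select(records, attributes):
    """Keep only the attributes relevant to the mining objective."""
    return [{a: rec[a] for a in attributes} for rec in records]


database = [{"id": 1, "sex": "male", "gpa": 2.3, "note": "x"},
            {"id": 2, "sex": "female", "gpa": None, "note": "y"}]
flat_file = [{"id": 1, "sex": "male", "gpa": 2.3, "note": "x"},
             {"id": 3, "sex": "male", "gpa": 3.2, "note": "z"}]

data = select(clean(gather(database, flat_file), ["id", "gpa"]), ["id", "sex", "gpa"])
print(data)  # [{'id': 1, 'sex': 'male', 'gpa': 2.3}, {'id': 3, 'sex': 'male', 'gpa': 3.2}]
```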

5 What Kinds of Data Can Be Mined?
- Relational databases
- Data warehouses
- Transactional databases and flat files
- Advanced database systems and information repositories
- Object-oriented and object-relational databases
- Spatial databases
- Time-series and temporal data
- Text databases and multimedia databases
- Heterogeneous and legacy databases
- World Wide Web
- Bioinformatics data

6 Two Modes of Data Mining
- Predictive data mining: predict behavior based on historical data. Data with known outcomes is used to build a model that can later predict values for new data. Methods: classification, prediction, etc.
- Descriptive data mining: describe patterns in existing data that may be used to guide decisions. Methods: association rule discovery, sequential pattern discovery, clustering, etc.

7 Data Mining Techniques
- Data clustering
- Association rule discovery
- Data classification
- Outlier detection
- Data regression
- Etc.

9 Data Classification
Classification is the process of assigning new objects to predefined categories or classes:
- given a set of labeled records,
- build a model,
- and predict labels for future unlabeled records.
Example: a credit decision is made from attributes such as age, educational background, annual income, current debts, and housing location, e.g.
Degree = "Master" and Income = 7500 => Credit = "Excellent"

10 Three-Step Process of Classification
- Model construction: build a classifier model from the training data.
- Model evaluation: test the classifier model on the testing data.
- Classification: apply the classifier model to unseen data.
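
To make the three steps concrete, here is a minimal sketch that builds a classifier from training data, measures its accuracy on testing data, and then labels an unseen record. The slides do not name a classification algorithm for this step; OneR (pick the single attribute whose value-to-majority-class table makes the fewest training errors) is swapped in only because it is short. The function names, the credit-scoring records, and the attribute names are hypothetical.

```python
from collections import Counter, defaultdict

# Sketch of the three-step process, using OneR as a stand-in classifier.
# Function names, records, and attribute names are hypothetical.


def train_one_r(records, attributes, label):
    """Step 1, model construction: choose the attribute whose
    value -> majority-class table makes the fewest training errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[attr]][rec[label]] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[table[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, table, errors)
    # Overall majority class, used for attribute values never seen in training.
    default = Counter(rec[label] for rec in records).most_common(1)[0][0]
    return best[0], best[1], default


def classify(model, rec):
    """Step 3, classification: look up the record's value of the chosen attribute."""
    attr, table, default = model
    return table.get(rec[attr], default)


def accuracy(model, records, label):
    """Step 2, model evaluation: fraction of test records classified correctly."""
    hits = sum(classify(model, rec) == rec[label] for rec in records)
    return hits / len(records)


training = [
    {"degree": "Master", "income": "high", "credit": "Excellent"},
    {"degree": "Master", "income": "low", "credit": "Fair"},
    {"degree": "Bachelor", "income": "high", "credit": "Excellent"},
    {"degree": "Bachelor", "income": "low", "credit": "Fair"},
    {"degree": "PhD", "income": "high", "credit": "Excellent"},
]
testing = [
    {"degree": "Master", "income": "high", "credit": "Excellent"},
    {"degree": "PhD", "income": "low", "credit": "Good"},
]

model = train_one_r(training, ["degree", "income"], "credit")
print(model[0])                            # income
print(accuracy(model, testing, "credit"))  # 0.5
print(classify(model, {"degree": "Bachelor", "income": "high"}))  # Excellent
```

Keeping training and testing data separate is what lets step 2 report a meaningful accuracy (precision) figure, such as the ones quoted for the models later in the talk.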

11 Data Mining Tools
- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others; more at http://www.kdnuggets.com/software

12 Data Mining Projects
Checklist:
- Start with well-defined questions.
- Define measures of success and failure.
Main difficulty: the following steps are not automated:
- understanding the problem,
- data preparation,
- selection of the right mining methods,
- interpretation.

13 Using Data Mining for Improving Quality of Engineering Graduates
Objective: discover knowledge from large databases of engineering student records. The discovered knowledge is useful for:
- assisting in the development of new curricula,
- improving existing curricula,
- helping students select an appropriate major.

14 Using a Data Mining Technique to Help Students Select Their Majors
Motivation:
- The choice of major is a very important factor in a student's success.
- Students lack experience and information about each major.
Solution:
- Find the profiles of good students in each major, using the student profile database and the course enrollment databases (10 years of records).
- Determine the most appropriate major for each student.

15 A Data Mining-Based Approach for Improving Quality of Engineering Graduates
Architecture (diagram): student profile database and course enrollment student databases (DB2, SQL Server) => data mining tool => Java Servlet => user

16 Data for Data Mining
Student profile database:
Stu_code  Sex   Address  Sch_GPA  ...  GPA
37058063  male  Bangkok  2.5      ...  2.3
37058167  male  Songkla  3.4      ...  3.2
...

Course enrollment student databases:
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+

17 Data Preparation for a Classification Model
The student profile table and the course enrollment table are combined (+) into a single table with one row per student, in which each course grade is converted into a level (for example, C+ in 204111 becomes Medium and D in 403111 becomes Low):
Stu_code  Sex   204111  403111  ...  GPA
37058063  male  Medium  Low     ...  2.3
37058167  male  High    ...     ...  3.2
...
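
The join above can be sketched as a pivot: each enrollment row becomes a column, named after its course code, on the student's profile row. Only the C+ => Medium and D => Low conversions are visible in the slide; the rest of the `GRADE_LEVEL` mapping below (including B+ => High) is a hypothetical assumption, as is the `prepare` function name.

```python
# Sketch of the data-preparation join (pivot) into one row per student.
# Only C+ -> Medium and D -> Low are visible in the slide; the rest of this
# mapping is a hypothetical assumption.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}


def prepare(profiles, enrollments):
    """Join profile and enrollment records: one row per student, with one
    column per course code holding the discretized grade level."""
    rows = {p["Stu_code"]: {"Stu_code": p["Stu_code"], "Sex": p["Sex"], "GPA": p["GPA"]}
            for p in profiles}
    for e in enrollments:
        row = rows.get(e["Stu_code"])
        if row is not None:  # skip enrollments with no matching profile
            row[e["Sub_code"]] = GRADE_LEVEL[e["Grade"]]
    return list(rows.values())


profiles = [
    {"Stu_code": "37058063", "Sex": "male", "Address": "Bangkok", "Sch_GPA": 2.5, "GPA": 2.3},
    {"Stu_code": "37058167", "Sex": "male", "Address": "Songkla", "Sch_GPA": 3.4, "GPA": 3.2},
]
enrollments = [
    {"Stu_code": "37058063", "Sub_code": "204111", "Term": 1, "Year": 2537, "Grade": "C+"},
    {"Stu_code": "37058063", "Sub_code": "403111", "Term": 1, "Year": 2537, "Grade": "D"},
    {"Stu_code": "37058063", "Sub_code": "208111", "Term": 1, "Year": 2537, "Grade": "B+"},
]

for row in prepare(profiles, enrollments):
    print(row)
```

In this sketch a student who retakes a course keeps only the last grade listed; a real pipeline would need an explicit rule for repeated enrollments.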

18 Global Classification Model
A global decision tree determines which major is appropriate for which student.
- Each internal node represents a test on the student's profile.
- Each leaf node represents the appropriate major to select.
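
A decision tree of this kind can be represented and applied as in the sketch below. The tree is built by hand, with hypothetical tests and major names; the slides do not say which tree-induction algorithm produced the real model or which profile attributes it tests.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Sketch of a global decision tree for major recommendation.
# The tests and major names are hypothetical; the tree is built by hand.


@dataclass
class Leaf:
    major: str  # the major recommended when a student reaches this leaf


@dataclass
class Node:
    test: Callable[[dict], bool]  # a test on the student's profile
    yes: "Tree"  # subtree followed when the test is true
    no: "Tree"   # subtree followed when the test is false


Tree = Union[Leaf, Node]


def recommend(tree: Tree, student: dict) -> str:
    """Follow the tests from the root down to a leaf and return its major."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(student) else tree.no
    return tree.major


tree = Node(
    test=lambda s: s["204111"] == "High",
    yes=Node(test=lambda s: s["Sch_GPA"] >= 3.0,
             yes=Leaf("Computer Engineering"),
             no=Leaf("Electrical Engineering")),
    no=Leaf("Civil Engineering"),
)

print(recommend(tree, {"204111": "High", "Sch_GPA": 3.4}))    # Computer Engineering
print(recommend(tree, {"204111": "Medium", "Sch_GPA": 2.5}))  # Civil Engineering
```

Note that a single global tree always ends in exactly one leaf, so it can only ever return one major per student.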

19 Drawbacks of the Global Classification Model
- Low precision (about 50%) due to the large number of majors.
- The number of students differs from department to department, so the model cannot correctly predict the best major to select.
- The model proposes a single major; a set of possible majors ordered by an appropriateness score would be preferable.

20 Classification Model for Each Major
- For each major, a decision tree predicts whether a student is likely to be a good student in that major.
- Good students are those who graduate within 4 years and rank in the top 40% of the major.
- Leaf nodes represent two classes: Good and Bad.
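
The Good/Bad class label for one major could be derived as in this sketch. It assumes the ranking within a major is by final GPA and that a hypothetical `years` field holds the time to graduation (None for students who have not graduated); the slides state only the 4-year and top-40% criteria, so the ranking key, field names, and `label_students` function are assumptions.

```python
# Sketch of the Good/Bad class label for the students of one major.
# Assumptions: ranking is by final GPA; "years" is time to graduation (None if not graduated).


def label_students(students, top_fraction=0.40, max_years=4):
    """Good = graduated within max_years AND in the top `top_fraction` of the major."""
    ranked = sorted(students, key=lambda s: s["GPA"], reverse=True)
    cutoff = int(len(ranked) * top_fraction)  # size of the top-40% group
    top = {s["Stu_code"] for s in ranked[:cutoff]}
    labels = {}
    for s in students:
        on_time = s["years"] is not None and s["years"] <= max_years
        labels[s["Stu_code"]] = "Good" if on_time and s["Stu_code"] in top else "Bad"
    return labels


major_students = [
    {"Stu_code": "s1", "GPA": 3.6, "years": 4},
    {"Stu_code": "s2", "GPA": 3.4, "years": 5},     # top 40% but graduated late
    {"Stu_code": "s3", "GPA": 3.0, "years": 4},     # on time but not in the top 40%
    {"Stu_code": "s4", "GPA": 2.8, "years": 4},
    {"Stu_code": "s5", "GPA": 2.5, "years": None},  # not yet graduated
]
print(label_students(major_students))
```

These labels would then be the class attribute that each per-major decision tree learns to predict from the prepared student rows.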

21 Advantages of the Per-Major Classification Models
- Good precision (80%).
- The models predict the best major to select even though the number of students differs between majors.
- They propose a set of possible majors, ordered by appropriateness score.
Encountered problems:
- Database size.
- Other factors that could affect a student's decision, such as teacher preference.
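
Ranking majors by appropriateness score might look like the sketch below. It assumes each per-major model exposes a function returning a score that the student would be Good in that major (for a decision tree, one option is the fraction of Good training students at the leaf the student reaches); the slides do not say how the score is computed, and the scoring functions, major names, and `rank_majors` helper here are hypothetical stand-ins for trained trees.

```python
# Sketch of ranking majors by appropriateness score.
# The per-major scoring functions and major names are hypothetical stand-ins
# for trained per-major decision trees.


def rank_majors(models, student, top_k=None):
    """Score the student with every per-major model; return (major, score) pairs, best first."""
    scores = [(major, model(student)) for major, model in models.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores if top_k is None else scores[:top_k]


models = {
    "Computer Engineering": lambda s: 0.8 if s["204111"] == "High" else 0.3,
    "Civil Engineering": lambda s: 0.6,
    "Electrical Engineering": lambda s: 0.7 if s["GPA"] >= 3.0 else 0.4,
}

student = {"204111": "High", "GPA": 3.2}
for major, score in rank_majors(models, student):
    print(f"{major}: {score:.2f}")
```

Returning the whole sorted list, rather than a single major, matches the slide's point that a ranked set of possible majors is preferable to one recommendation.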

22 Presentation of Discovered Knowledge

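23 Applying Association Rule Discovery for Grade Prediction
The diagram draws an analogy between market basket analysis and education: a student's course results play the role of the items in a basket, e.g.
204111: Medium, 403111: High, 417167: Medium, 417168: Medium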
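
As a sketch of the basket idea, the code below treats each student's (course, level) pairs as a basket, mines single-item rules {x} => {y} with their support and confidence, and uses the highest-confidence applicable rule to predict a level for a course. This is a simplified pair-only search, not the full Apriori algorithm; the function names and thresholds are hypothetical, and the baskets are hypothetical apart from the first, which copies the slide's example.

```python
from collections import Counter
from itertools import combinations

# Pair-only association rule mining over (course, level) baskets, plus a
# simple rule-based grade prediction. Not full Apriori; data is hypothetical
# except the first basket, copied from the slide.


def mine_pair_rules(baskets, min_support, min_confidence):
    """Return rules (x, y, support, confidence) for single items x => y, where
    support = count(x and y) / len(baskets), confidence = count(x and y) / count(x)."""
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in set(basket))
    pair_count = Counter(pair for basket in baskets
                         for pair in combinations(sorted(set(basket)), 2))
    rules = []
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # try both directions of the pair
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, count / n, confidence))
    return rules


def predict_level(rules, history, course):
    """Predict a level for `course` from the highest-confidence rule whose
    left-hand side appears in the student's history; None if no rule applies."""
    candidates = [(conf, y[1]) for x, y, _, conf in rules if x in history and y[0] == course]
    return max(candidates)[1] if candidates else None


baskets = [
    {("204111", "Medium"), ("403111", "High"), ("417167", "Medium"), ("417168", "Medium")},
    {("204111", "Medium"), ("417167", "Medium")},
    {("204111", "High"), ("417167", "High")},
    {("204111", "Medium"), ("417167", "Medium"), ("417168", "Low")},
]

rules = mine_pair_rules(baskets, min_support=0.5, min_confidence=0.8)
for x, y, support, confidence in rules:
    print(f"{x} => {y}  support={support:.2f} confidence={confidence:.2f}")

print(predict_level(rules, {("204111", "Medium")}, "417167"))  # Medium
```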

24 Grade Prediction for the Coming Term

25 Presentation of Discovered Knowledge

26 Conclusion & Future Work
- Application of data mining in education
- Using data mining techniques to improve the quality of engineering students
- Applying data mining techniques to several other educational domains

