Published by Marshall Garrison. Modified over 9 years ago.
1 Improving quality of graduate students by data mining
Asst. Prof. Kitsana Waiyamai, Ph.D.
Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
2 Content
PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification
PART II
- Improving quality of graduate students by data mining
- Conclusion
3 What Is Data Mining?
Knowledge Discovery from Data (KDD, or data mining): the process of nontrivial extraction of patterns from data.
- Patterns are implicit, previously unknown, and potentially useful.
- Patterns must be comprehensible to human users.
4 Knowledge Discovery Process: Iterative & Interactive Process
- Mining objective
- Data sources: databases, flat files, complex data, data warehouses
- Preprocessing data: gathering, cleaning and selecting data
- Search for patterns (data mining): neural nets, machine learning, statistics and others
- Analyst reviews output
- Interpret results
- Report findings
- Take actions based on findings
5 What kind of data can be mined?
- Relational databases
- Data warehouses
- Transactional databases and flat files
- Advanced DB systems and information repositories:
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases, multimedia databases
  - Heterogeneous and legacy databases
  - World Wide Web
  - Bioinformatic data
6 Two modes of data mining
Predictive data mining
- Predicts behavior based on historic data
- Uses data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, etc.
Descriptive data mining
- Describes patterns in existing data that may be used to guide decisions
- Methods: association rule discovery, sequence pattern discovery, clustering, etc.
7 Data Mining Techniques
- Data clustering
- Association rule discovery
- Data classification
- Outlier detection
- Data regression
- Etc.
8
9 Data Classification
Classification is the process of assigning new objects to predefined categories or classes:
- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records
Example: Age, educational background, annual income, current debts, housing location => making a decision
Degree = "Master" and Income = 7500 => Credit = "Excellent"
10 Three-Step Process of Classification
- Model construction: training data => classifier model
- Model evaluation: testing data => classifier model
- Classification: unseen data => classifier model
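The three steps can be sketched in a few lines of plain Python. This is only an illustration, not the method used in the presentation: the "model" is a one-attribute majority rule, the records are invented toy data, and the names build_model, evaluate and classify are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(records, attribute, label):
    """Step 1, model construction: one majority-label rule per attribute value."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r[attribute]][r[label]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback for attribute values never seen in training.
    default = Counter(r[label] for r in records).most_common(1)[0][0]
    return rules, default

def classify(model, record, attribute):
    """Step 3, classification: predict a label for an unlabeled record."""
    rules, default = model
    return rules.get(record[attribute], default)

def evaluate(model, records, attribute, label):
    """Step 2, model evaluation: accuracy on held-out labeled records."""
    correct = sum(classify(model, r, attribute) == r[label] for r in records)
    return correct / len(records)

# Invented toy data (assumption), loosely following the credit example.
training = [
    {"degree": "Master", "credit": "Excellent"},
    {"degree": "Master", "credit": "Excellent"},
    {"degree": "Bachelor", "credit": "Fair"},
    {"degree": "Bachelor", "credit": "Fair"},
    {"degree": "Bachelor", "credit": "Excellent"},
]
testing = [
    {"degree": "Master", "credit": "Excellent"},
    {"degree": "Bachelor", "credit": "Excellent"},
]

model = build_model(training, "degree", "credit")
accuracy = evaluate(model, testing, "degree", "credit")
prediction = classify(model, {"degree": "Master"}, "degree")
print(accuracy, prediction)
```

The testing data is kept separate from the training data so that the accuracy reflects behavior on records the model has not seen.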
11 Data Mining Tools
- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others
More at http://www.kdnuggets.com/software
12 Data Mining Projects
Checklist:
- Start with well-defined questions
- Define measures of success and failure
Main difficulty: no automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation
13 Using Data Mining for Improving Quality of Engineering Graduates
Objective: Discover knowledge from large databases of engineering student records.
The discovered knowledge is useful for:
- Assisting in the development of new curricula
- Improving existing curricula
- Helping students select the appropriate major
14 Using a data mining technique to help students in selecting their majors
Motivation:
- A student's choice of major is a very important factor in his/her success.
- Students lack experience with and information on each major.
Solution:
- Find the profiles of good students for each major, using the student profile database and the course enrollment student databases (10 years of data)
- Determine the most appropriate major for each student
15 A Data Mining based Approach for Improving Quality of Engineering Graduates
[Architecture diagram. Labels: DB2, SQL Server, course enrollment student databases, student profile database, Data Mining Tool, Java Servlet, User]
16 Data for Data Mining
Student profile database:
Stu_code  Sex   Address  Sch_GPA  ...  GPA
37058063  male  Bangkok  2.5      ...  2.3
37058167  male  Songkla  3.4      ...  3.2
...
Course enrollment student databases:
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+
17 Data preparation for a classification model
Student profile database:
Stu_code  Sex   Address  Sch_GPA  ...  GPA
37058063  male  Bangkok  2.5      ...  2.3
37058167  male  Songkla  3.4      ...  3.2
...
+
Course enrollment student databases:
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C+
37058063  403111    1     2537  D
37058063  208111    1     2537  B+
=>
Combined table, one row per student, with course grades discretized into levels:
Stu_code  Sex   204111  403111  ...  GPA
37058063  male  Medium  Low     ...  2.3
37058167  male  High    ...     ...  3.2
...
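A minimal sketch of this preparation step in Python: join the two sources on Stu_code, turn each course into a column, and replace letter grades with levels. The slide only shows C+ mapping to Medium and D mapping to Low; the rest of the GRADE_LEVEL table and the function name prepare are assumptions made for illustration.

```python
# Letter grade -> level. Only "C+" -> Medium and "D" -> Low appear on the
# slide; the other entries are assumed for illustration.
GRADE_LEVEL = {
    "A": "High", "B+": "High", "B": "High",
    "C+": "Medium", "C": "Medium",
    "D+": "Low", "D": "Low", "F": "Low",
}

profiles = [
    {"stu_code": "37058063", "sex": "male", "gpa": 2.3},
    {"stu_code": "37058167", "sex": "male", "gpa": 3.2},
]

# (stu_code, sub_code, grade) rows from the course enrollment data.
enrollments = [
    ("37058063", "204111", "C+"),
    ("37058063", "403111", "D"),
    ("37058063", "208111", "B+"),
]

def prepare(profiles, enrollments):
    """Build one row per student with one discretized column per course."""
    courses = sorted({sub for _, sub, _ in enrollments})
    levels = {(stu, sub): GRADE_LEVEL[grade] for stu, sub, grade in enrollments}
    rows = []
    for p in profiles:
        row = {"stu_code": p["stu_code"], "sex": p["sex"]}
        for sub in courses:
            # None marks a course with no enrollment record for this student.
            row[sub] = levels.get((p["stu_code"], sub))
        row["gpa"] = p["gpa"]
        rows.append(row)
    return rows

table = prepare(profiles, enrollments)
for row in table:
    print(row)
```

The second student has no enrollment rows in the excerpt shown on the slide, so all of that student's course columns come out as None here.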
18 Global Classification Model
A global decision tree determines which major is appropriate for which student.
- Each internal node represents a test on the student's profile.
- Each leaf node represents an appropriate major to be selected.
19 Drawbacks of Global Classification Model
- Low precision (~50%) due to the large number of majors
- The number of students differs between departments => the model cannot correctly predict the best major to be selected.
- The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferred.
20 Classification Model for Each Major
- A decision tree predicts whether a student is likely to be a good student in a given major.
- Good students are those who graduate within 4 years and rank in the top 40% of a given major.
- Leaf nodes represent two classes: Good and Bad
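The Good/Bad definition can be sketched as a labeling function in Python. The slide does not say how students are ranked or how the 40% cutoff is rounded; ranking by GPA and rounding the cutoff up are assumptions here, and the student records and the name label_students are invented for illustration.

```python
def label_students(students, max_years=4, top_percent=40):
    """Label each student Good or Bad within his/her own major."""
    by_major = {}
    for s in students:
        by_major.setdefault(s["major"], []).append(s)

    labels = {}
    for group in by_major.values():
        # Assumption: rank by GPA, highest first.
        ranked = sorted(group, key=lambda s: s["gpa"], reverse=True)
        # Assumption: round the cutoff up (integer ceiling of top_percent %).
        cutoff = (top_percent * len(ranked) + 99) // 100
        top_ids = {s["id"] for s in ranked[:cutoff]}
        for s in group:
            good = s["years"] <= max_years and s["id"] in top_ids
            labels[s["id"]] = "Good" if good else "Bad"
    return labels

# Invented records (assumption); "X" is a placeholder major.
students = [
    {"id": "s1", "major": "X", "gpa": 3.8, "years": 4},
    {"id": "s2", "major": "X", "gpa": 3.5, "years": 5},
    {"id": "s3", "major": "X", "gpa": 3.0, "years": 4},
    {"id": "s4", "major": "X", "gpa": 2.5, "years": 4},
    {"id": "s5", "major": "X", "gpa": 2.0, "years": 6},
]

labels = label_students(students)
print(labels)
```

Student s2 ranks in the top 40% but took 5 years, so both conditions are needed for the Good label.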
21 Advantages of the Per-Major Classification Model
- Good precision: 80%
- The model predicts the best major to be selected even if the number of students differs between majors
- It proposes a set of possible majors, ordered by appropriateness score.
Problems encountered:
- Database size
- Other factors that could affect a student's decision: teacher preference, etc.
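One way to produce such an ordered set is to ask every per-major model for a score and sort the results. The sketch below assumes each model returns a number between 0 and 1 that can be read as an appropriateness score; the slides do not define how the score is computed. The stand-in models, the major names and rank_majors are hypothetical.

```python
def rank_majors(student, models):
    """Score the student with every per-major model, best major first."""
    scores = {major: model(student) for major, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Stand-in models (assumption): simple functions in place of the real
# per-major decision trees.
models = {
    "Major A": lambda s: 0.9 if s.get("204111") == "High" else 0.3,
    "Major B": lambda s: 0.8 if s.get("403111") == "High" else 0.4,
    "Major C": lambda s: 0.5,
}

student = {"204111": "Medium", "403111": "High"}
ranking = rank_majors(student, models)
for major, score in ranking:
    print(major, score)
```

Because each major is scored independently, a major with few students is not crowded out by larger ones, which is the advantage named on the slide.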
22 Presentation of Discovered Knowledge
23 Applying Association rule discovery for Grade prediction
Basket analysis, applied to education
Example basket (course: grade level): 204111: Medium, 403111: High, 417167: Medium, 417168: Medium
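A minimal Python sketch of the two standard measures behind association rules, support and confidence, applied to baskets of course grades. The first basket is the one from the slide; the other three are invented for illustration, and the rule being tested is only an example.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = support(baskets, set(antecedent) | set(consequent))
    ante = support(baskets, antecedent)
    return both / ante if ante else 0.0

# One basket per student. Only the first basket comes from the slide;
# the others are invented (assumption).
baskets = [
    {"204111=Medium", "403111=High", "417167=Medium", "417168=Medium"},
    {"204111=Medium", "403111=High", "417167=High"},
    {"204111=Medium", "403111=Low"},
    {"204111=High", "403111=High"},
]

# Example rule: 204111=Medium => 403111=High
rule_support = support(baskets, {"204111=Medium", "403111=High"})
rule_confidence = confidence(baskets, {"204111=Medium"}, {"403111=High"})
print(rule_support, rule_confidence)
```

A rule with high enough support and confidence can then be read as a prediction: a student with a Medium grade in 204111 is likely to get a High grade in 403111.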
24 Grade Prediction for the Coming Term
25 Presentation of Discovered Knowledge
26 Conclusion & Future Work
- Application of data mining in education
- Use of data mining techniques for improving quality of engineering students
- Future work: apply data mining techniques to several other educational domains