Predicting Student Performance: An Application of Data Mining Methods with an Educational Web-based System
FIE 2003, Boulder, November 2003
Behrouz Minaei-Bidgoli, Deborah A. Kashy, Gerd Kortemeyer, William F. Punch
Michigan State University

Topics
- Problem Overview
- Classification Methods
- Combination of Classifiers
- Weighting the Features
- Using a GA to Choose the Best Set of Weights
- Experimental Results
- Conclusion
- Contribution
- Next Steps

Problem Overview
This research is part of the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA), an online educational system developed at Michigan State University (MSU). In LON-CAPA we work with two kinds of large data sets:
- Educational resources: web pages, demonstrations, simulations, individualized problems, quizzes, and examinations.
- Information about the users who create, modify, assess, or use these resources.
Goals:
- Find classes of students, i.e., groups of students who use these online resources in a similar way.
- Predict, for any individual student, the class to which he or she belongs.

Data Set: PHY183 SS02
- 227 students
- 12 homework sets
- 184 problems

Class Labels (three labeling schemes)
- 2 classes
- 3 classes
- 9 classes

Extracted Features
- Total number of correct answers (success rate)
- Success on the first try
- Number of attempts needed to reach the correct answer
- Time spent until the answer was correct
- Total time spent on the problem
- Participation in the communication mechanisms
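These features can be computed from per-problem activity records. A minimal Python sketch, assuming a hypothetical record layout (`extract_features` and the field names are illustrative; the actual LON-CAPA log format is not shown on the slide):

```python
def extract_features(attempts):
    """Summarize one student's activity across a set of problems.

    attempts: one record per problem, e.g.
    {"tries": 3, "correct": True, "first_try": False, "time": 120.0}
    (hypothetical layout, for illustration only).
    """
    n = len(attempts)
    correct = [a for a in attempts if a["correct"]]
    return {
        "success_rate": len(correct) / n,
        "first_try_rate": sum(a["first_try"] for a in attempts) / n,
        "avg_tries": sum(a["tries"] for a in attempts) / n,
        # average time spent until a correct answer, over solved problems
        "avg_time_to_correct": (sum(a["time"] for a in correct) / len(correct)
                                if correct else 0.0),
    }
```

Each student then becomes one feature vector, which the classifiers below consume.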

Classifiers
Non-tree classifiers (using MATLAB):
- Bayesian classifier
- 1NN
- kNN
- Multi-Layer Perceptron
- Parzen window
- Combination of Multiple Classifiers (CMC)
- Genetic Algorithm (GA)
Decision-tree-based software:
- C5.0 (RuleQuest; successor to C4.5, itself a successor to ID3)
- CART (Salford Systems)
- QUEST (University of Wisconsin)
- CRUISE (uses an unbiased variable-selection technique)
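A CMC can combine the base classifiers' outputs in several ways; plain majority voting is one common scheme. A Python sketch (the slide does not state the paper's exact combination rule, so this is illustrative):

```python
from collections import Counter

def cmc_predict(predictions):
    """Majority-vote combination of multiple classifiers' outputs.

    predictions: the label each base classifier assigns to one sample,
    e.g. the votes of Bayes, 1NN, kNN, Parzen, and MLP.
    """
    return Counter(predictions).most_common(1)[0][0]

# three of five classifiers vote "pass" for this student
votes = ["pass", "fail", "pass", "pass", "fail"]
print(cmc_predict(votes))  # -> pass
```

The combined vote is often more accurate than any single classifier, which is what the results table below shows for CMC.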

GA Optimizer vs. Classifier
Two ways to use a GA in classification:
- Apply the GA directly as a classifier.
- Use the GA as an optimization tool to tune parameters of other classifiers.
Most applications of GAs in pattern recognition use the GA as an optimizer for some parameters of the classification process. Here, the GA is applied to find an optimal set of feature weights that improves classification accuracy.
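Feature weighting reshapes the distance metric that the nearest-neighbour classifiers use, and the GA searches over the weight vector. A sketch of a weighted Euclidean distance (an illustrative formulation, not necessarily the paper's exact one):

```python
def weighted_distance(x, y, w):
    # squared per-feature differences, scaled by the weights the GA evolves
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)) ** 0.5

# with all weights equal to 1 this is ordinary Euclidean distance
print(weighted_distance([1, 2], [4, 6], [1, 1]))  # -> 5.0
```

Setting a weight to zero removes a feature entirely, which is why feature weighting subsumes feature selection.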

Fitness/Evaluation Function
Five classifiers:
- Multi-Layer Perceptron (about 2 minutes per fitness evaluation)
- Bayesian classifier
- 1NN
- kNN
- Parzen window
CMC (about 3 seconds per fitness evaluation)
- Divide the data into training and test sets (10-fold cross-validation).
- Fitness function: the classification performance achieved by the classifier.
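The 10-fold split used inside the fitness evaluation can be sketched generically (this is a standard k-fold partition, not the toolbox's own routine):

```python
def kfold_indices(n, k=10):
    # assign sample j to fold j % k, then use each fold once as the test set
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test

# the fitness of a weight vector would be the classifier's mean
# accuracy over the k train/test splits
```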

Individual Representation
The GA Toolbox supports binary, integer, and floating-point chromosome representations. Chrom = crtrp(NIND, FieldDR) creates a random real-valued NIND x d matrix, where NIND is the number of individuals in the population and FieldDR gives the per-gene bounds:

FieldDR = [ 0 0 0 0 0 0;   % lower bound
            1 1 1 1 1 1 ]; % upper bound

Chrom =
    0.23  0.17  0.95  0.38  0.06  0.26
    0.35  0.09  0.43  0.64  0.20  0.54
    0.50  0.10  0.09  0.65  0.68  0.46
    0.21  0.29  0.89  0.48  0.63  0.89
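In Python terms, crtrp amounts to sampling a uniform random matrix within per-gene bounds. A sketch of what the call does (not the toolbox's actual source):

```python
import random

def crtrp(nind, field_dr):
    # field_dr rows: [lower bounds, upper bounds], one entry per gene
    lower, upper = field_dr
    return [[random.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(nind)]

# 4 individuals, 6 feature weights each, all in [0, 1]
chrom = crtrp(4, [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]])
```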

Simple GA
% Create initial population
Chrom = crtrp(NIND, FieldD);
% Evaluate initial population
ObjV = ObjFn2(data, Chrom);
gen = 0;
while gen < MAXGEN,
    % Assign fitness values to entire population
    FitnV = ranking(ObjV);
    % Select individuals for breeding
    SelCh = select(SEL_F, Chrom, FitnV, GGAP);
    % Recombine selected individuals (crossover)
    SelCh = recombin(XOV_F, SelCh, XOVR);
    % Perform mutation on offspring
    SelCh = mutate(MUT_F, SelCh, FieldD, MUTR);
    % Evaluate offspring, call objective function
    ObjVSel = ObjFn2(data, SelCh);
    % Reinsert offspring into current population
    [Chrom, ObjV] = reins(Chrom, SelCh, 1, 1, ObjV, ObjVSel);
    gen = gen + 1;
    MUTR = MUTR + 0.001;   % mutation rate grows slowly each generation
end
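For readers without the MATLAB GA Toolbox, the same loop can be sketched in plain Python on a toy objective. Truncation selection and one-point crossover stand in for the toolbox's ranking/select/recombin routines; all names and parameter values here are illustrative:

```python
import random

def simple_ga(fitness, n_genes, pop_size=24, gens=50, mut_rate=0.05):
    random.seed(0)  # fixed seed so the sketch is reproducible
    pop = [[random.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(gens):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]              # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_genes)        # one-point crossover
            child = [g + random.gauss(0, 0.1) if random.random() < mut_rate
                     else g for g in a[:cut] + b[cut:]]
            # clamp mutated genes back into the [0, 1] bounds
            children.append([min(1.0, max(0.0, g)) for g in child])
        pop = parents + children                      # elitist reinsertion
    return max(pop, key=fitness)

# toy objective: weight vectors whose sum is close to 3 score best
best = simple_ga(lambda w: -abs(sum(w) - 3.0), n_genes=6)
```

In the paper's setting the objective would instead be the cross-validated classification accuracy obtained with the candidate weight vector.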

Results without GA (performance %)

Classifier               2-Classes   3-Classes   9-Classes
Tree classifiers
  C5.0                      80.3        56.8        25.6
  CART                      81.5        59.9        33.1
  QUEST                     80.5        57.1        20.0
  CRUISE                    81.0        54.9        22.9
Non-tree classifiers
  Bayes                     76.4        48.6        23.0
  1NN                       76.8        50.5        29.0
  kNN                       82.3        50.4        28.5
  Parzen                    75.0        48.1        21.5
  MLP                       79.5        50.9         -
  CMC                       86.8        70.9        51.0

Results of Using GA (figure)

GA Optimization Results (figures)

GA Optimization Results: Feature Extraction (figure)

Summary
- Four classifiers were used to segregate the students.
- A combination of multiple classifiers led to a significant accuracy improvement in all three cases (2-, 3-, and 9-class).
- Weighting the features and using a genetic algorithm to minimize the error rate improves the prediction accuracy by at least 10% in all cases.

Conclusion
When the number of features is low, feature weighting works better than feature selection. The successful use of CMC to classify the students, together with the successful optimization through the genetic algorithm, demonstrates the merit of using LON-CAPA data for pattern recognition in order to predict students' final grades from the features extracted from the homework data.

Contribution
- A new approach to evaluating student use of web-based instruction, easily adaptable to different types of courses, population sizes, and attributes to be analyzed.
- The method may be of considerable use in identifying at-risk students early, especially in very large classes, allowing the instructor to provide appropriate advising in a timely manner.

Next Steps
- Apply data mining methods to find the frequency, dependency, and association rules among groups of problems (mathematical, optional response, numerical, Java applet, etc.).
- Use data mining to help instructors, problem authors, and course coordinators better design online materials:
  - find the sequences of online problems that students work through to solve homework problems
  - help instructors develop their homework more effectively and efficiently
  - detect anomalies in homework problems designed by instructors