
Enhancing Online Learning Performance: An Application of Data Mining Methods. CATE 2004, Kauai, August 2004. Behrouz Minaei, Gerd Kortemeyer, William F. Punch

Outline: LON-CAPA overview; problem statement; classification methods; combination of multiple classifiers; weighting the features, using a GA to choose the best set of weights; experimental results; contribution; conclusion. 11/19/2018 CATE 2004

LON-CAPA: This research is part of the latest online educational system developed at Michigan State University (MSU), the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA). Learning content management system: used by 9 high schools, 2 community colleges, and 17 universities nationwide. Assessment system: online assessment with immediate feedback and multiple tries; different students get different versions of the same problem (different options, graphs, images, numbers, or formulas). Open source and free (GPL, runs on Linux).

LON-CAPA Data: Three kinds of growing data sets: (1) educational resources: web pages, demonstrations, simulations, individualized problems, quizzes, and examinations; (2) information about users who create, modify, assess, or use these resources; (3) data about how students use and access the educational materials. Resource counts: 12,500 images; 500 movies; 23,000 content pages; 18,600 homework and exam problems; 1,100 simulations and animations.

MSU, Fall 2003: 50 courses used LON-CAPA at MSU. Total student enrollment was approximately 3,067 (out of 13,400 student-users globally). Disciplines included Advertising, Biochemistry, Biology, Chemistry, Finance, Geology, Math, Physics, Plant Biology, and Statistics for Psychology.

Data Distribution: LON-CAPA collects data for every single access to the resources, in both the activity log and the student database. The logs are not only huge but also distributed and specific to a web-based educational system (LON-CAPA). Intelligent automated tools are needed to discover relevant, useful, and interesting patterns. The discovered rules can then be applied to produce a more intelligent system.

Knowledge Discovery Process: data integration (removing inconsistency, …); data cleansing (correcting errors, handling missing values); discretization (transforming continuous values to categorical ones); feature selection (keeping the more relevant features); mining process (rule discovery); post-processing (simplifying a large rule set so that it is 1) more comprehensible and 2) more interesting). Use a combination of objective and subjective approaches.
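The discretization step can be illustrated with a minimal sketch. The slides do not say which binning method was used; equal-width binning is assumed here, and the grade values and class labels are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical course grades on a 0.5-4.0 scale, mapped to three classes.
grades = [0.5, 1.0, 2.0, 2.5, 3.0, 3.5, 4.0]
labels = ["low", "middle", "high"]
classes = [labels[i] for i in equal_width_bins(grades, 3)]
print(classes)
```

Equal-frequency bins or hand-picked cut points would be equally plausible choices; the point is only that a continuous target becomes a small set of categories a classifier can predict.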

Data Mining Tasks: Classification: the goal is to predict the class variable based on the feature values of samples, while avoiding overfitting. Clustering: unsupervised learning. Association analysis: find the binary relationships among the data items; any feature variable can occur both in the antecedent and in the consequent of a rule.
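For association analysis, a rule "antecedent implies consequent" is usually scored by its support and confidence. The sketch below computes both over a handful of hypothetical student records; the attribute names are illustrative and do not come from the LON-CAPA data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical per-student attribute sets.
records = [
    {"first_try_success", "high_grade"},
    {"first_try_success", "high_grade", "uses_forum"},
    {"first_try_success"},
    {"uses_forum"},
]
rule_support = support(records, {"first_try_success", "high_grade"})
rule_confidence = confidence(records, {"first_try_success"}, {"high_grade"})
print(rule_support, rule_confidence)
```

Because any attribute may appear on either side of a rule, the same two functions can score the reverse rule simply by swapping the arguments.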

Statement of Problem (1): Our claim is that data mining can help to design better and more intelligent web-based educational environments. It can help instructors design courses more effectively and detect anomalies. It can help students use the resources more efficiently.

Statement of Problem (2): Data mining can find classes of students, that is, groups of students who use these online resources in a similar way. It can help the instructor provide appropriate advising in a timely manner. The task is to predict, for any individual student, which class he or she belongs to.

Data Sets: MSU online courses

Extracted Features: (1) total number of attempts; (2) total number of correct answers (success rate); (3) success on the first try; (4) success on the second try; (5) success after 3 to 9 attempts; (6) success after 10 or more attempts; (7) total time until the correct answer; (8) total time spent, regardless of success; (9) participation in online communication.
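As a rough sketch of how such features might be derived, the function below aggregates one student's submission log into the first eight features. The log format, a list of `(problem_id, correct, seconds)` tuples in time order, is an assumption made for illustration and is not LON-CAPA's actual log schema; participation in online communication is omitted because it would come from a separate source.

```python
from collections import defaultdict

def extract_features(submissions):
    """Aggregate (problem_id, correct, seconds) tuples into per-student features."""
    tries = defaultdict(int)       # submissions made on each problem
    time_on = defaultdict(float)   # seconds spent on each problem
    solved_at = {}                 # problem -> attempt number of first correct answer
    time_to_solve = {}             # problem -> seconds spent up to first correct answer
    for problem, correct, seconds in submissions:
        tries[problem] += 1
        time_on[problem] += seconds
        if correct and problem not in solved_at:
            solved_at[problem] = tries[problem]
            time_to_solve[problem] = time_on[problem]
    counts = list(solved_at.values())
    return {
        "total_attempts": sum(tries.values()),
        "total_correct": len(solved_at),
        "first_try": sum(1 for n in counts if n == 1),
        "second_try": sum(1 for n in counts if n == 2),
        "tries_3_to_9": sum(1 for n in counts if 3 <= n <= 9),
        "tries_10_plus": sum(1 for n in counts if n >= 10),
        "time_to_correct": sum(time_to_solve.values()),
        "total_time": sum(time_on.values()),
    }

# Hypothetical log for one student.
log = [
    ("p1", True, 30.0),
    ("p2", False, 20.0), ("p2", True, 40.0),
    ("p3", False, 15.0), ("p3", False, 25.0), ("p3", True, 10.0),
    ("p4", False, 50.0),
]
features = extract_features(log)
print(features)
```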

Classifiers: Non-tree classifiers (using MATLAB): Bayesian classifier, 1NN, kNN, multi-layer perceptron, Parzen window, and a combination of multiple classifiers (CMC), with a genetic algorithm (GA) as the optimizer. Decision tree-based software: C5.0 (RuleQuest; successor to C4.5 and ID3), CART (Salford Systems), QUEST (Univ. of Wisconsin), and CRUISE (uses an unbiased variable selection technique).
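The slides do not state how the CMC merges the individual classifiers' outputs. One common option is plain majority voting, sketched below as an assumption rather than as the authors' method; the classifier names and labels are placeholders.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among one sample's per-classifier predictions."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Ties go to the classifier listed first.
    for label in predictions:
        if counts[label] == best:
            return label

def combine(per_classifier):
    """per_classifier: one list of predicted labels per classifier, aligned by sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier)]

# Hypothetical predictions from three classifiers on four students.
mlp = ["pass", "fail", "pass", "fail"]
knn = ["pass", "pass", "fail", "fail"]
bayes = ["fail", "pass", "pass", "fail"]
combined = combine([mlp, knn, bayes])
print(combined)
```

Weighted voting or combining posterior estimates are alternatives when the base classifiers differ a lot in reliability.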

Fitness/Evaluation Function: 5 classifiers: multi-layer perceptron (2 minutes), Bayesian classifier, 1NN, kNN, Parzen window; CMC (3 seconds). Divide the data into training and test sets (10-fold cross-validation). Fitness function: the performance achieved by the classifier.
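The idea of using cross-validated classifier performance as a GA fitness function can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: it uses only a feature-weighted 1NN classifier, and the GA operators (truncation selection, one-point crossover, Gaussian mutation, elitism), the population settings, and the synthetic data are all assumptions.

```python
import random

def weighted_1nn_accuracy(weights, X, y, k_folds=10):
    """Fitness: k-fold cross-validated accuracy of 1NN under a weighted squared distance."""
    n = len(X)
    correct = 0
    for fold in range(k_folds):
        test_idx = [i for i in range(n) if i % k_folds == fold]
        train_idx = [i for i in range(n) if i % k_folds != fold]
        for i in test_idx:
            nearest = min(train_idx, key=lambda j: sum(
                w * (a - b) ** 2 for w, a, b in zip(weights, X[i], X[j])))
            correct += y[nearest] == y[i]
    return correct / n

def evolve_weights(X, y, pop_size=20, generations=15, seed=0):
    """Search for feature weights in [0, 1] that maximize the fitness above."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(w):
        return weighted_1nn_accuracy(w, X, y)

    # Include the unweighted baseline (all weights 1) in the initial population.
    population = [[1.0] * n_features] + [
        [rng.random() for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # truncation selection
        children = [ranked[0]]                  # elitism: keep the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]           # one-point crossover
            g = rng.randrange(n_features)       # mutate a single weight
            child[g] = min(1.0, max(0.0, child[g] + rng.gauss(0, 0.2)))
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)

# Synthetic data: feature 0 tracks the class, feature 1 is pure noise.
data_rng = random.Random(1)
y = [0, 1] * 20
X = [[label + data_rng.gauss(0, 0.3), data_rng.uniform(-5, 5)] for label in y]

baseline = weighted_1nn_accuracy([1.0, 1.0], X, y)
best_weights, best_score = evolve_weights(X, y)
print(baseline, best_weights, best_score)
```

Because the baseline weighting is in the initial population and the best individual is always carried forward, the evolved score can never fall below the unweighted score on the same folds. It is still a cross-validation estimate on the data used for the search, so a held-out set would be needed for an unbiased figure.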

Results without GA

Results of using GA

GA Optimization Results

Feature importance

Conclusion: Four classifiers were used to segregate the students. CMC improves accuracy significantly. Weighting the features and using a genetic algorithm to minimize the error rate improves the prediction accuracy by at least 10% in all cases. When the number of features is low, feature weighting works better than feature selection.

Contribution: A new approach to evaluating student usage of web-based instruction. An approach that is easily adaptable to different types of courses, different population sizes, and different attributes to be analyzed. Rigorous application of known classifiers as a means of analyzing and comparing the use and performance of students who have taken a technical course that was partially or completely administered via the web.

Future work: Find association rules among students' educational activities. Help instructors predict or describe the approaches that students will take for some types of problems. Identify students who are at risk, especially in very large classes.

Questions: http://www.lon-capa.org, http://garage.cse.msu.edu, minaeibi@cse.msu.edu