CS539: Project 3 Zach Pardos.

Assistments Online Dataset. Math question response data from 592 students. 1,143 math question attributes, each valued {correct, incorrect}. An average of 200 questions answered per student (lots of missing values). Class attribute: MCAS score {0-29}.

Assistments Online Dataset. Skill models: 1, 5, 39, and 106 skills.

Assistments Online Dataset. How well can ANNs fit the dataset with only 1, 5, 39, or 106 hidden nodes? Default Weka values were used for ANN training: epochs 500, learning rate 0.3, momentum 0.2, no validation set. The training set was also used for testing.
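For reference, a sketch of the corresponding Weka command-line invocation (the `weka.jar` path and the `assistments.arff` filename are placeholders, not from the slides; passing the same file as both `-t` and `-T` evaluates on the training set):

```shell
# Weka MultilayerPerceptron flags: -H hidden nodes, -N training epochs,
# -L learning rate, -M momentum; -t training file, -T test file.
java -cp weka.jar weka.classifiers.functions.MultilayerPerceptron \
  -t assistments.arff -T assistments.arff -H 39 -N 500 -L 0.3 -M 0.2
```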

Assistments Online Dataset. Results for training-set testing. With 1 hidden node: correctly classified instances 77, incorrectly classified instances 515, relative absolute error 95.5309%. With 5 hidden nodes: correctly classified instances 220, incorrectly classified instances 372, relative absolute error 77.8246%.

Assistments Online Dataset. Results for training-set testing. With 39 hidden nodes: correctly classified instances 590, incorrectly classified instances 2, relative absolute error 3.2983%. With 106 hidden nodes: correctly classified instances 587, incorrectly classified instances 5, relative absolute error 2.8975%.
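The instance counts above are out of 592 students, so the training-set accuracies follow directly; a quick stdlib-only check:

```python
# Correctly classified instances per hidden-node count, out of 592 students.
correct = {1: 77, 5: 220, 39: 590, 106: 587}
total = 592

for hidden, n_correct in correct.items():
    accuracy = 100.0 * n_correct / total
    print(f"{hidden:>3} hidden nodes: {accuracy:.1f}% training accuracy")
```

This works out to roughly 13.0%, 37.2%, 99.7%, and 99.2% for 1, 5, 39, and 106 hidden nodes respectively, matching the qualitative jump the slides describe between 5 and 39 nodes.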

Assistments Online Dataset. Conclusion: the 39- and 106-node models fit the training data very well. How well can ANNs generalize and predict instances they haven't been trained on? Next up: 10-fold cross-validation.
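A minimal sketch of how 10-fold cross-validation partitions the data, using only the standard library (the 592-student dataset is represented here just by its indices; the seed and helper name are illustrative):

```python
import random

def ten_fold_splits(n_instances, n_folds=10, seed=0):
    """Shuffle instance indices and yield (train, test) index lists,
    so each instance appears in exactly one test fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Every one of the 592 students is tested exactly once across the 10 folds.
tested = [idx for _, test in ten_fold_splits(592) for idx in test]
print(len(tested), len(set(tested)))  # 592 592
```

Because every instance is held out exactly once, the pooled predictions estimate performance on unseen cases, which is what the training-set numbers above cannot show.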

Assistments Online Dataset. (10-fold cross-validation results; chart not reproduced in the transcript.)

Assistments Online Dataset. Conclusions: ANNs are very good at fitting data, but not as good at predicting unseen cases. It is possible that more hidden nodes are required to generalize properly (at greater CPU cost!).
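The relative absolute error quoted throughout the results is Weka's standard metric: the model's total absolute error divided by the total absolute error of a baseline that always predicts the training mean, expressed as a percentage. A stdlib-only sketch of the computation (the small example values are illustrative, not from the dataset):

```python
def relative_absolute_error(actual, predicted):
    """Weka-style relative absolute error, as a percentage:
    sum |p_i - a_i| over sum |mean(a) - a_i|, times 100."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    baseline_error = sum(abs(mean_actual - a) for a in actual)
    return 100.0 * model_error / baseline_error

# Illustrative MCAS-style scores: a perfect predictor scores 0%,
# and predicting the mean every time scores exactly 100%.
actual = [10, 20, 15, 25]
print(relative_absolute_error(actual, actual))      # 0.0
print(relative_absolute_error(actual, [17.5] * 4))  # 100.0
```

On this scale, the 95.5% figure for the 1-node network means it barely beat always guessing the mean score, while 2.9% for 106 nodes means a near-perfect fit to the training data.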