CS539 Project Report -- Evaluating Hypotheses

Mingyu Feng
Feb 24th, 2005

Learning a J48 Decision Tree

Dataset: S1, 45 nominal and 10 real attributes. No preprocessing.

Result:
  Correctly Classified Instances     633      63.3 %
  Incorrectly Classified Instances   367      36.7 %
  Kappa statistic                      0.403
  Mean absolute error                  0.1177
  Root mean squared error              0.2946
  Relative absolute error             65.7543 %
  Root relative squared error         98.6013 %

error_S1(t) = 0.367
95% confidence interval: 0.367 +/- 0.0299
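
This interval is the standard normal-approximation interval for a binomial error rate, error +/- z_N * sqrt(error * (1 - error) / n), with n = 1000 test instances and z = 1.96 for 95% confidence. A minimal sketch of the computation in Python (the function name is illustrative, not part of the report):

```python
import math

def error_interval(error, n, z=1.96):
    """Two-sided confidence interval for a sample error rate.

    error: fraction of the n test instances that were misclassified
    z:     z-value for the confidence level (1.96 -> 95%)
    """
    half = z * math.sqrt(error * (1 - error) / n)
    return error - half, error + half

# J48 on S1: 367 of 1000 test instances misclassified
lo, hi = error_interval(0.367, 1000)
print(f"0.367 +/- {(hi - lo) / 2:.4f}")  # 0.367 +/- 0.0299
print(f"[{lo:.4f}, {hi:.4f}]")           # [0.3371, 0.3969]
```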

Learning a Neural Network

Dataset: S1, 45 nominal and 10 real attributes. No preprocessing.

Result:
  Correctly Classified Instances     648      64.8 %
  Incorrectly Classified Instances   352      35.2 %
  Kappa statistic                      0.4266
  Mean absolute error                  0.1075
  Root mean squared error              0.2888
  Relative absolute error             60.1082 %
  Root relative squared error         96.6877 %

error_S2(nn) = 0.352

Difference Between True Errors

error_S1(t) = 0.367, error_S2(nn) = 0.352
d = error_S1(t) - error_S2(nn) = 0.015, σ_d ≈ 0.0215
95% two-sided confidence interval for the difference between the two true errors: 0.015 +/- 0.042
Since this interval contains zero, the observed difference between the two classifiers is not statistically significant at the 95% level.
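
σ_d here is the usual combined estimate for two error rates measured on independent test samples, σ_d ≈ sqrt(e1(1 - e1)/n1 + e2(1 - e2)/n2): the variances of the two sample errors add. A minimal sketch, assuming both test sets contain 1000 instances (the function name is illustrative):

```python
import math

def diff_error_interval(e1, n1, e2, n2, z=1.96):
    """CI for the difference between two true errors, estimated from
    independent test samples; the two sampling variances add."""
    d = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d, z * sigma

d, half = diff_error_interval(0.367, 1000, 0.352, 1000)
print(f"d = {d:.3f}, 95% CI: {d:.3f} +/- {half:.3f}")  # d = 0.015 +/- 0.042
```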

Compare Learning Algorithms

k = 11: the 1100 instances in D0 are divided into 11 partitions (T1 … T11) of 100 instances each.
For i = 1 to 11: use Ti as the test set, and train the decision tree and the neural network on the remaining data.

  i            :  1      2      3      4      5      6      7     ...    11
  error_Ti(t)  :  0.37   0.28   0.39   0.29   0.32   0.33   ...
  error_Ti(nn) :  0.40   0.34   0.31   0.38   0.30   0.36   0.35   ...
  δ_i          :  0.04  -0.03   0.03   0.01  -0.01  -0.04  -0.02   ...

Mean of the differences δ̄ = -0.0082, estimated standard deviation of δ̄: s = 0.00784.
95% confidence interval for the difference in error between J48 decision trees and neural networks: -0.0082 +/- 0.0175
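
The half-width 0.0175 follows the paired-t construction δ̄ ± t(N, k-1) · s, with k - 1 = 10 degrees of freedom and t ≈ 2.228 for 95% confidence. A minimal sketch in Python (the function name is illustrative; because the full list of per-fold differences is not given, the final check reuses the summary statistics above):

```python
import math

def paired_t_interval(deltas, t_value=2.228):
    """Paired-difference confidence interval over k folds.

    deltas:  per-fold error differences delta_i (one per test partition Ti)
    t_value: t statistic for the confidence level and k-1 dof
             (2.228 assumes 95% confidence with 10 dof)
    """
    k = len(deltas)
    mean = sum(deltas) / k
    # estimated standard deviation of the mean difference
    s_mean = math.sqrt(sum((d - mean) ** 2 for d in deltas) / (k * (k - 1)))
    return mean - t_value * s_mean, mean + t_value * s_mean

# Cross-check against the summary statistics reported above (k = 11):
mean, s_mean = -0.0082, 0.00784
print(f"{mean} +/- {2.228 * s_mean:.4f}")  # -0.0082 +/- 0.0175
```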