CS539 Project Report -- Evaluating Hypotheses

Mingyu Feng
Feb 24th, 2005

Learning a J48 Decision Tree

Dataset: S1, 45 nominal and 10 real attributes. No preprocessing.

Result:
  Correctly Classified Instances     633      63.3 %
  Incorrectly Classified Instances   367      36.7 %
  Kappa statistic                      0.403
  Mean absolute error                  0.1177
  Root mean squared error              0.2946
  Relative absolute error             65.7543 %
  Root relative squared error         98.6013 %

error_S1(t) = 0.367
95% confidence interval: 0.367 +/- 0.0299
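
This interval is the standard normal-approximation interval for a binomial error rate, error +/- z_N * sqrt(error * (1 - error) / n), with n = 1000 test instances and z = 1.96 for 95% confidence. A minimal sketch of the computation in Python (the function name is illustrative, not part of the report):

```python
import math

def error_interval(error, n, z=1.96):
    """Two-sided confidence interval for a sample error rate.

    error: fraction of the n test instances that were misclassified
    z:     z-value for the confidence level (1.96 -> 95%)
    """
    half = z * math.sqrt(error * (1 - error) / n)
    return error - half, error + half

# J48 on S1: 367 of 1000 test instances misclassified
lo, hi = error_interval(0.367, 1000)
print(f"0.367 +/- {(hi - lo) / 2:.4f}")  # 0.367 +/- 0.0299
print(f"[{lo:.4f}, {hi:.4f}]")           # [0.3371, 0.3969]
```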

Learning a Neural Network

Dataset: S1, 45 nominal and 10 real attributes. No preprocessing.

Result:
  Correctly Classified Instances     648      64.8 %
  Incorrectly Classified Instances   352      35.2 %
  Kappa statistic                      0.4266
  Mean absolute error                  0.1075
  Root mean squared error              0.2888
  Relative absolute error             60.1082 %
  Root relative squared error         96.6877 %

error_S2(nn) = 0.352

Difference Between True Errors

error_S1(t) = 0.367, error_S2(nn) = 0.352
d = error_S1(t) - error_S2(nn) = 0.015, σ_d ≈ 0.0215
95% two-sided confidence interval for the difference between the two true errors: 0.015 +/- 0.042
Since this interval contains zero, the observed difference between the two classifiers is not statistically significant at the 95% level.
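
σ_d here is the usual combined estimate for two error rates measured on independent test samples, σ_d ≈ sqrt(e1(1 - e1)/n1 + e2(1 - e2)/n2): the variances of the two sample errors add. A minimal sketch, assuming both test sets contain 1000 instances (the function name is illustrative):

```python
import math

def diff_error_interval(e1, n1, e2, n2, z=1.96):
    """CI for the difference between two true errors, estimated from
    independent test samples; the two sampling variances add."""
    d = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d, z * sigma

d, half = diff_error_interval(0.367, 1000, 0.352, 1000)
print(f"d = {d:.3f}, 95% CI: {d:.3f} +/- {half:.3f}")  # d = 0.015 +/- 0.042
```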

Compare Learning Algorithms

k = 11: the 1100 instances in D0 are divided into 11 partitions (T1 … T11) of 100 instances each.
For i = 1 to 11: use Ti as the test set, and train the decision tree and the neural network on the remaining data.

  i            :  1      2      3      4      5      6      7     ...    11
  error_Ti(t)  :  0.37   0.28   0.39   0.29   0.32   0.33   ...
  error_Ti(nn) :  0.40   0.34   0.31   0.38   0.30   0.36   0.35   ...
  δ_i          :  0.04  -0.03   0.03   0.01  -0.01  -0.04  -0.02   ...

Mean of the differences δ̄ = -0.0082, estimated standard deviation of δ̄: s = 0.00784.
95% confidence interval for the difference in error between J48 decision trees and neural networks: -0.0082 +/- 0.0175
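
The half-width 0.0175 follows the paired-t construction δ̄ ± t(N, k-1) · s, with k - 1 = 10 degrees of freedom and t ≈ 2.228 for 95% confidence. A minimal sketch in Python (the function name is illustrative; because the full list of per-fold differences is not given, the final check reuses the summary statistics above):

```python
import math

def paired_t_interval(deltas, t_value=2.228):
    """Paired-difference confidence interval over k folds.

    deltas:  per-fold error differences delta_i (one per test partition Ti)
    t_value: t statistic for the confidence level and k-1 dof
             (2.228 assumes 95% confidence with 10 dof)
    """
    k = len(deltas)
    mean = sum(deltas) / k
    # estimated standard deviation of the mean difference
    s_mean = math.sqrt(sum((d - mean) ** 2 for d in deltas) / (k * (k - 1)))
    return mean - t_value * s_mean, mean + t_value * s_mean

# Cross-check against the summary statistics reported above (k = 11):
mean, s_mean = -0.0082, 0.00784
print(f"{mean} +/- {2.228 * s_mean:.4f}")  # -0.0082 +/- 0.0175
```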