General Mining Issues. a.j.m.m. (ton) weijters. Topics: Overfitting; Noise and Overfitting; Quality of mined models. (Some figures are based on the ML introduction of Gregory Piatetsky-Shapiro.)

Overfitting: good performance on the learning material, weak performance on new material. Examples of flexible vs. restricted models: linear regression vs. an artificial neural network; a decision tree with many leaves vs. a decision tree with few leaves.
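
A minimal sketch of the many-leaves vs. few-leaves contrast, using scikit-learn (the data set, noise level, and depths are illustrative assumptions, not from the slides). On noisy data the unrestricted tree typically scores near-perfectly on the learning material but worse than the shallow tree on the test material:

```python
# Sketch (assumed setup): a deep tree fits the training data almost
# perfectly but typically generalizes worse than a shallow one on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2,
                           random_state=0)          # flip_y adds label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):                             # many leaves vs. few leaves
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f} "
          f"test={tree.score(X_te, y_te):.2f}")
```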

Classification: Linear Regression. Classify as positive when w_0 + w_1·x + w_2·y ≥ 0. Regression computes the weights w_i from the data by minimizing the squared error, i.e. it 'fits' the data. Such a linear boundary is often not flexible enough.
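
A small NumPy sketch of this idea (the function names and the +/-1 target encoding are assumptions for illustration): fit the weights by least squares, then classify by the sign of the decision rule.

```python
# Sketch: least-squares fit of w0 + w1*x + w2*y to +/-1 class targets,
# classification by the sign of the result.
import numpy as np

def fit_linear_classifier(X, labels):
    """X: (n, 2) array of points; labels: array of 0/1. Returns (w0, w1, w2)."""
    A = np.hstack([np.ones((len(X), 1)), X])   # prepend a column of 1s for w0
    t = np.where(labels == 1, 1.0, -1.0)       # encode classes as +1 / -1
    w, *_ = np.linalg.lstsq(A, t, rcond=None)  # minimize the squared error
    return w

def predict(w, X):
    A = np.hstack([np.ones((len(X), 1)), X])
    return (A @ w >= 0).astype(int)            # decision rule w0 + w1*x + w2*y >= 0
```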

Classification: Decision Trees. [Figure: scatter plot over axes X and Y, partitioned at the thresholds X = 5, X = 2, and Y = 3.] The resulting rule: if X > 5 then blue, else if Y > 3 then blue, else if X > 2 then green, else blue.
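
The slide's tree written out directly as code, with the thresholds as given on the slide:

```python
# The decision tree from the slide as a plain function.
def classify(x, y):
    if x > 5:
        return "blue"
    elif y > 3:
        return "blue"
    elif x > 2:
        return "green"
    else:
        return "blue"
```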

Classification: Neural Nets. Can select more complex regions and can therefore be more accurate, but can also overfit the data: finding patterns in random noise.

Overfitting and Noise. Especially the combination of noise (errors) in the learning material and a mined model that attempts to fit all the learning material can result in weak models (strong overfitting).

Reliability of a classification rule. A rule is reliable when it is based on many observations (high coverage) and the classification of all covered cases is correct: compare a 220/222 rule with a 2/2 rule. Example of a simple quality measure for classification rules: OK/(N+1), giving 220/(222+1) = 0.987 vs. 2/(2+1) = 0.667.
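
The measure from the slide as code (the function name is hypothetical); the N+1 denominator penalizes rules supported by few observations:

```python
# Slide's quality measure OK/(N+1): OK = correctly classified covered cases,
# N = number of covered cases.
def rule_quality(ok, n):
    return ok / (n + 1)

print(rule_quality(220, 222))  # 0.987 -> reliable rule
print(rule_quality(2, 2))      # 0.667 -> too few observations to trust
```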

Performance of a mined model (always measured on test material). Classification tasks: classification error, classification matrix, weighted classification error. Estimation tasks: MSE. Process Mining: ...
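
Minimal sketches of the listed measures (function names and the integer label encoding 0..n_classes-1 are assumptions for illustration):

```python
# Performance measures from the slide, computed on held-out test material.
import numpy as np

def classification_error(y_true, y_pred):
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def classification_matrix(y_true, y_pred, n_classes):
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1                       # rows: true class, columns: predicted
    return m

def weighted_classification_error(y_true, y_pred, cost):
    # cost[t][p] is the penalty for predicting class p when the truth is t
    return np.mean([cost[t][p] for t, p in zip(y_true, y_pred)])

def mse(y_true, y_pred):                   # for estimation tasks
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```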

K-fold-CV (cross validation) I. Within the ML community there is a relatively simple experimental framework called k-fold cross validation. Starting from a ML technique and a data set, the framework is used to build, for instance, an optimal classification model (i.e. one with optimal parameter settings), to report the performance of the ML technique on this data set, to estimate the performance of the definitive learned model, and to compare the performance of the ML technique with that of other learning techniques.

K-fold-CV (cross validation) II. In the first step a series of experiments is performed to determine an optimal parameter setting for the learning problem at hand. The available data is divided into k subsets of roughly equal size, and the ML algorithm is trained k times: in run n, subset n is used as test material and the remaining material as learning material. The performance of the ML algorithm with a specific parameter setting is the average classification error over the k test sets. The parameter setting with the best average performance in Step 1 is selected as optimal.
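
A sketch of Step 1 with scikit-learn (the choice of decision trees, the tuned parameter, and the function name are illustrative assumptions; X and y are NumPy arrays):

```python
# Step 1 sketch: pick the parameter setting with the lowest average
# classification error over the k test sets.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def select_parameter(X, y, candidate_depths, k=10):
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    best_depth, best_err = None, np.inf
    for depth in candidate_depths:
        errs = []
        for train_idx, test_idx in kf.split(X):     # subset n is the test set
            model = DecisionTreeClassifier(max_depth=depth)
            model.fit(X[train_idx], y[train_idx])   # the rest is learning material
            errs.append(1 - model.score(X[test_idx], y[test_idx]))
        avg = np.mean(errs)                         # average over the k test sets
        if avg < best_err:
            best_depth, best_err = depth, avg
    return best_depth, best_err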

K-fold-CV (cross validation) III. The goal of the second series of experiments is to estimate the expected classification performance of the ML technique. The available data is again divided into k subsets and the ML algorithm is again trained k times, now in combination with the parameter setting selected in Step 1. The average classification performance on the k test sets is used to estimate the expected classification performance of the ML technique on the current data set, and a t-test is used to calculate a confidence interval. If useful, a definitive model is built: all the available material is used, in combination with the parameter setting selected in Step 1, and the performance results of Step 2 are used to predict the performance of the definitive model.
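
A sketch of Steps 2 and 3 under the same assumptions as above (SciPy's t distribution supplies the confidence interval; function names are hypothetical):

```python
# Steps 2-3 sketch: re-run k-fold CV with the chosen setting, report the mean
# error with a t-based confidence interval, then build the definitive model.
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def estimate_performance(X, y, depth, k=10, alpha=0.05):
    kf = KFold(n_splits=k, shuffle=True, random_state=1)
    errs = []
    for train_idx, test_idx in kf.split(X):
        model = DecisionTreeClassifier(max_depth=depth)
        model.fit(X[train_idx], y[train_idx])
        errs.append(1 - model.score(X[test_idx], y[test_idx]))
    mean = np.mean(errs)
    half = stats.t.ppf(1 - alpha / 2, df=k - 1) * stats.sem(errs)
    return mean, (mean - half, mean + half)         # estimate + confidence interval

def build_definitive_model(X, y, depth):
    return DecisionTreeClassifier(max_depth=depth).fit(X, y)  # all material
```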