Data Mining: Practical Machine Learning Tools and Techniques. Chapter 6.5: Instance-Based Learning. Rodney Nielsen. Many / most of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.

Rodney Nielsen, Human Intelligence & Language Technologies Lab

Implementation: Real Machine Learning Schemes
- Decision trees: from ID3 to C4.5 (pruning, numeric attributes, ...)
- Classification rules: from PRISM to RIPPER and PART (pruning, numeric data, ...)
- Association rules: frequent-pattern trees
- Extending linear models: support vector machines and neural networks
- Instance-based learning: pruning examples, generalized exemplars, distance functions

Implementation: Real Machine Learning Schemes (continued)
- Numeric prediction: regression/model trees, locally weighted regression
- Bayesian networks: learning and prediction, fast data structures for learning
- Clustering: hierarchical, incremental, probabilistic, Bayesian
- Semi-supervised learning: clustering for classification, co-training

Instance-Based Learning
Practical problems of the 1-NN scheme:
- Slow (but fast tree-based approaches exist). Remedy: remove irrelevant data.
- Sensitive to noise (but k-NN copes quite well with noise). Remedy: remove noisy instances.
- All attributes are deemed equally important. Remedy: weight attributes (or simply select a subset).
- Doesn't perform explicit generalization. Remedy: a rule-based NN approach.
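The baseline scheme these remedies improve on can be sketched in a few lines. This is a minimal k-NN classifier (1-NN when k=1) over numeric attributes; the data below is made up for illustration:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Plain (unweighted) Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=1):
    """Classify `query` by majority vote among the k nearest training
    instances. `train` is a list of (feature_vector, class_label) pairs."""
    neighbors = sorted(train, key=lambda inst: euclidean(inst[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"),
         ((5.0, 5.0), "b"), ((4.8, 5.1), "b")]
print(knn_classify(train, (1.1, 1.0), k=1))  # "a"
print(knn_classify(train, (4.9, 5.0), k=3))  # "b"
```

Note the "slow" problem: every query scans all of `train`, which is why tree-based indexes and instance pruning (IB2/IB3 below) matter in practice.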

Filtering Noise
- Only those instances involved in a decision need to be stored
- Noisy instances should be filtered out
- But how do you detect noise?

Speed-Up and Combating Noise
IB2: save memory, speed up classification
- Works incrementally; only incorporates misclassified instances
- Problem: noisy data gets incorporated

IB3: deal with noise
- Discard instances that don't perform well
- Compute confidence intervals for:
  1. Each instance's success rate
  2. The default accuracy of its class
- Accept an instance if the lower limit of (1) exceeds the upper limit of (2)
- Reject an instance if the upper limit of (1) is below the lower limit of (2)
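The IB3 accept/reject test can be sketched as follows. The Wilson score interval is used here as one reasonable choice of binomial confidence interval; this is an assumption for illustration, not necessarily the exact interval formula of the original IB3 algorithm:

```python
import math

def wilson_interval(successes, n, z=1.645):
    """Wilson score interval for a binomial proportion (z=1.645 ~ 90%).
    One common interval choice for IB3-style tests (an assumption here)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - margin, center + margin

def ib3_status(inst_successes, inst_trials, class_successes, class_trials):
    """Return 'accept', 'reject', or 'undecided' for one stored instance,
    comparing its success-rate interval against its class's default accuracy."""
    inst_lo, inst_hi = wilson_interval(inst_successes, inst_trials)
    cls_lo, cls_hi = wilson_interval(class_successes, class_trials)
    if inst_lo > cls_hi:
        return "accept"   # instance clearly better than the class baseline
    if inst_hi < cls_lo:
        return "reject"   # instance clearly worse: likely noise
    return "undecided"

print(ib3_status(18, 20, 50, 100))  # "accept": 90% success vs ~50% baseline
print(ib3_status(2, 20, 50, 100))   # "reject": 10% success, likely noise
```

Undecided instances are kept for now and re-tested as more classification attempts accumulate.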

Weighting Attributes
IB4: weights each attribute (weights can be class-specific)
- Weighted Euclidean distance: sqrt(w1^2 (x1 - y1)^2 + ... + wk^2 (xk - yk)^2)
- Weights are updated based on the nearest neighbor:
  - Class correct: increase weight
  - Class incorrect: decrease weight
- The amount of change for the jth attribute depends on |xj(i) - xj(l)|
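A minimal sketch of the weighted distance and an IB4-style weight update. The exact update rule below (scaling the change by attribute similarity, assuming attributes normalized to [0, 1]) is an illustrative assumption, not the precise IB4 formula:

```python
import math

def weighted_euclidean(a, b, w):
    # Weighted Euclidean distance with squared per-attribute weights,
    # as in the formula on the slide above
    return math.sqrt(sum((wj * (x - y)) ** 2 for wj, x, y in zip(w, a, b)))

def update_weights(w, query, neighbor, correct, eta=0.1):
    """Sketch of an IB4-style update (assumed rule, for illustration):
    after a correct prediction, attributes on which query and neighbor
    agree gain weight; after an incorrect one they lose weight. The size
    of the change depends on |x_j(query) - x_j(neighbor)|."""
    new_w = []
    for wj, xq, xn in zip(w, query, neighbor):
        similarity = 1.0 - abs(xq - xn)   # assumes attributes scaled to [0, 1]
        delta = eta * similarity
        new_w.append(wj + delta if correct else max(0.0, wj - delta))
    return new_w

w = [1.0, 1.0]
print(weighted_euclidean((0.0, 0.0), (0.3, 0.4), w))  # 0.5
w = update_weights(w, (0.1, 0.9), (0.1, 0.2), correct=True)
print(w)  # first attribute (values agree) gains more weight than the second
```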

Rectangular Generalizations
- The nearest-neighbor rule is used outside the rectangles
- Rectangles are rules! (But they can be more conservative than "normal" rules.)
- Nested rectangles are rules with exceptions

Generalized Exemplars
- Generalize instances into hyper-rectangles
  - Online: incrementally modify rectangles
  - Offline: seek a small set of rectangles that cover the instances
- Important design decisions:
  - Allow overlapping rectangles? (Requires conflict resolution.)
  - Allow nested rectangles?
  - How to deal with uncovered instances?
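A hyper-rectangle exemplar can be sketched as per-attribute [lo, hi] bounds plus a class label. This is a minimal illustration (class and method names are mine, not from any particular implementation): containment gives the "rectangles are rules" behavior, distance-to-rectangle supports the nearest-neighbor rule outside rectangles, and the grow step is the online modification:

```python
import math

class HyperRectangle:
    """Axis-parallel hyper-rectangle exemplar: bounds plus a class label."""
    def __init__(self, lows, highs, label):
        self.lows, self.highs, self.label = list(lows), list(highs), label

    def contains(self, x):
        # Instance covered by the rectangle => the "rule" fires
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, x, self.highs))

    def distance(self, x):
        """Euclidean distance from x to the rectangle (0 if inside):
        clip x to the bounds, then measure to the clipped point."""
        clipped = [min(max(v, lo), hi)
                   for lo, v, hi in zip(self.lows, x, self.highs)]
        return math.sqrt(sum((v - c) ** 2 for v, c in zip(x, clipped)))

    def grow_to_include(self, x):
        """Online step: extend the bounds to cover a new instance."""
        self.lows = [min(lo, v) for lo, v in zip(self.lows, x)]
        self.highs = [max(hi, v) for hi, v in zip(self.highs, x)]

rect = HyperRectangle([0.0, 0.0], [1.0, 1.0], "a")
print(rect.contains((0.5, 0.5)))   # True
print(rect.distance((2.0, 1.0)))   # 1.0
rect.grow_to_include((1.5, 0.5))
print(rect.contains((1.2, 0.5)))   # True (bounds were extended)
```

With several rectangles, a query not contained in any of them is classified by the label of the rectangle with the smallest `distance`.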

Separating Generalized Exemplars
[Figure: two generalized exemplars divided by a separation line]

Generalized Distance Functions
Given some transformation operations on attributes:
- K*: similarity = the probability of transforming instance A into B "by chance"
- Averages over all transformation paths
- Weights paths according to their probability (needs a way of measuring this)
- Provides a uniform way of dealing with different attribute types
- Easily generalized to give the distance between sets of instances