Data Mining CSCI 307, Spring 2019 Lecture 11
Output: Rules

Instance-based Representation
- Simplest form of learning: rote learning.
- The training instances are searched for the one that most closely resembles the new instance.
- The instances themselves represent the knowledge; hence "instance-based learning".
- Difference from rules (trees, etc.): just store the instances and defer the work ("lazy" learning). There is no need to create and store rules; work from the existing instances to find the one closest to the new instance.

Instance-based Representation
- The similarity function defines what is "learned".
- Methods:
  - Nearest-neighbor: use a distance function to find the stored instance most like the one needing to be classified.
  - k-nearest-neighbor: use more than one neighbor and classify by the majority among the k neighbors (see the sketch below).
- Criticism of the method: no structure is "learned"; the instances themselves don't describe the patterns.
- Proponents reply: the instances combined with the distance function are the structure.
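To make the k-nearest-neighbor idea concrete, here is a minimal Python sketch. The function names, toy data, and choice of k are illustrative assumptions, not from the course materials:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two numeric attribute vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(instances, labels, query, k=3):
    # "Lazy" learning: no model is built up front. Rank the stored
    # instances by distance to the query and take a majority vote
    # among the k nearest ones.
    ranked = sorted(zip(instances, labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy usage: two numeric attributes, two classes.
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.3)]
y = ["solid", "solid", "open", "open"]
print(knn_classify(X, y, (1.1, 0.9), k=3))  # -> solid
```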

The Distance Function
- Simplest case: only one numeric attribute. Distance is the difference between the two attribute values involved (or a function thereof).
- Several numeric attributes: normally Euclidean distance is used, with the attributes normalized.
- Nominal attributes: distance is set to 1 if the values are different, 0 if they are equal.
- Are all attributes equally important? Weighting the attributes might be necessary.
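A sketch of a distance function covering the points above: min-max normalization for numeric attributes, 0/1 distance for nominal ones, and optional attribute weights. The helper names and the particular normalization are assumptions for illustration:

```python
def min_max_normalize(column):
    # Rescale a numeric attribute to [0, 1] so that attributes with
    # large ranges do not dominate the Euclidean distance.
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

def mixed_distance(a, b, is_numeric, weights=None):
    # is_numeric[i]: True for numeric attributes (assumed already
    # normalized), False for nominal ones (distance 1 if the values
    # differ, 0 if they are equal). Optional weights address the
    # "equally important?" question above.
    weights = weights or [1.0] * len(a)
    total = 0.0
    for w, x, y, num in zip(weights, a, b, is_numeric):
        d = abs(x - y) if num else (0.0 if x == y else 1.0)
        total += w * d * d
    return total ** 0.5

# Attributes: (normalized temperature, outlook)
print(mixed_distance((0.2, "sunny"), (0.7, "rainy"),
                     is_numeric=[True, False]))  # ~1.118
```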

Learning Prototypes
- Two classes: solid circles and open circles.
- There are different ways of partitioning the instance space, e.g. save only critical examples of each class.
- In the figure: the nearest-neighbor split; the grey instances can be discarded.
- Only those instances involved in a decision need to be stored (we don't want to store all the instances).
- Noisy instances should be filtered out.
- Idea: only use prototypical examples (see the selection sketch below).
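The slide does not name a particular selection algorithm; the sketch below is one simple possibility in the spirit of Hart's condensed nearest neighbor: keep an instance only if the prototypes stored so far would misclassify it. It reuses knn_classify and the toy data from the earlier sketch:

```python
def condense(instances, labels):
    # Keep an instance only if the prototypes stored so far would
    # misclassify it, so mostly boundary ("critical") examples survive.
    store_X, store_y = [instances[0]], [labels[0]]
    for x, y in zip(instances[1:], labels[1:]):
        if knn_classify(store_X, store_y, x, k=1) != y:
            store_X.append(x)
            store_y.append(y)
    return store_X, store_y

# With the toy data above, one prototype per class survives.
print(condense(X, y))  # ([(1.0, 1.0), (5.0, 5.1)], ['solid', 'open'])
```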

Go Farther: Generalize with Rectangles
- Rectangles enclose instances of the same class.
- Nesting allows an inner region to have a different class.
- Instances that fall in the same rectangle get the same class, giving a different decision boundary than the nearest-neighbor one on the previous slide.
- The nearest-neighbor rule is used outside the rectangles.
- Rectangles are rules! (But they can be more conservative than "normal" rules.)
- Nested rectangles are rules with exceptions (see the sketch below).
- Note: regions over nominal attributes are hard to visualize; they need multiple dimensions.
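A sketch of how nested rectangles act as rules with exceptions: test the innermost rectangle first, and fall back to the nearest-neighbor rule outside all rectangles. The representation and names are illustrative assumptions:

```python
def classify_with_rectangles(rectangles, point, fallback):
    # rectangles: (xmin, xmax, ymin, ymax, cls), ordered innermost
    # (exception) first, outermost (general rule) last. The first
    # enclosing rectangle wins, which is how nesting encodes
    # exceptions; outside all rectangles the fallback rule applies.
    px, py = point
    for xmin, xmax, ymin, ymax, cls in rectangles:
        if xmin <= px <= xmax and ymin <= py <= ymax:
            return cls
    return fallback(point)

rects = [(2, 3, 2, 3, "open"),    # inner rectangle: the exception
         (1, 5, 1, 5, "solid")]   # outer rectangle: the general rule
nn_rule = lambda p: knn_classify(X, y, p, k=1)  # from the earlier sketch
print(classify_with_rectangles(rects, (2.5, 2.5), fallback=nn_rule))  # -> open
```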

Representing Clusters I
- When clusters are learned, the output takes the form of a diagram.
- The simplest case: assign a cluster number to each instance, lay out the instances, and partition them.
[Figure: simple 2-D representation]

Representing Clusters II
- Some clustering algorithms allow clusters to overlap.
[Figure: Venn diagram of overlapping clusters]

Representing Clusters III: Probabilistic Assignment
- Some algorithms use probabilities: for each instance there is a degree of membership in each of the clusters 1, 2, and 3.

       1    2    3
  a   0.4  0.1  0.5
  b   0.1  0.8  0.1
  c   0.3  0.3  0.4
  d   0.1  0.1  0.8
  e   0.4  0.2  0.4
  f   0.1  0.4  0.5
  g   0.7  0.2  0.1
  h   0.5  0.4  0.1
  ...
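If a single "hard" cluster per instance is needed, the usual choice is the cluster with the highest membership degree. A tiny sketch using a few rows of the table above:

```python
memberships = {  # a few rows of the table above
    "a": (0.4, 0.1, 0.5),
    "b": (0.1, 0.8, 0.1),
    "d": (0.1, 0.1, 0.8),
}
for inst, degrees in memberships.items():
    # argmax over the membership degrees gives the hard assignment
    best = max(range(len(degrees)), key=lambda i: degrees[i])
    print(inst, "-> cluster", best + 1, "with degree", degrees[best])
```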

Representing Clusters IV: Dendrogram
- Here, the clusters at the "leaves" of the diagram are more tightly clustered than those at the higher levels.
- NB: "dendron" is the Greek word for tree.
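A sketch of drawing a dendrogram with SciPy's standard hierarchical-clustering helpers; the data points are made up, and single-linkage is just one possible merge criterion, not one prescribed by the slide:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Five made-up 2-D instances; a and b are close, as are c and d.
points = np.array([[1.0, 1.0], [1.1, 0.9],
                   [5.0, 5.2], [5.1, 5.0],
                   [9.0, 1.0]])
Z = linkage(points, method="single")  # repeatedly merge the two closest clusters
dendrogram(Z, labels=["a", "b", "c", "d", "e"])
plt.ylabel("merge distance")  # tight clusters join near the leaves, loose ones higher up
plt.show()
```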