1 An Adaptive Nearest Neighbor Classification Algorithm for Data Streams. Yan-Nei Law & Carlo Zaniolo, University of California, Los Angeles. PKDD, Porto, 2005.

2 Outline
- Related Work
- ANNCAD
- Properties of ANNCAD
- Conclusion

3 Classifying Data Streams
Problem Statement: We seek an algorithm for classifying data streams with numerical attributes; it will also work for totally ordered domains.
Desiderata:
- Fast update speed for newly arriving records.
- Only a single pass over the data.
- Incremental algorithms are needed.
- Coping with concept changes.
Classical mining algorithms were not designed for data streams and need to be replaced or modified.

4 Classifying Data Streams: Related Work
Hoeffding trees:
- VFDT and CVFDT: build a decision tree incrementally.
- Require a large number of examples to obtain a classifier with fair performance.
- Unsatisfactory performance when the training set is small.
Ensembles:
- Combine base models by a voting technique.
- Suitable for coping with concept drift.
- Fail to provide a simple model and understanding of the problem.

5 State of the Art: Nearest-Neighbor Classifiers
Pros and cons:
- +: Strong intuitive appeal and simple to implement.
- -: Fail to provide simple models/rules.
- -: Expensive computations.
ANN: Approximate Nearest Neighbor with error guarantee 1+ε:
- Idea: pre-process the data into a data structure (e.g. a ring-cover tree) to speed up the search.
- Designed for stored data only.
- Time to update the pre-processed structure depends on the size of the data set, which may be unbounded for a stream.

6 Our Algorithm: ANNCAD
Adaptive NN Classification Algorithm for Data Streams.
Model building:
- Pre-assign classes to blocks to obtain an approximate result and provide simple models/rules.
- Decompose the feature space to make classification decisions.
- Akin to wavelets.
Classification:
- Find the NN for classification adaptively.
- Progressively expand the search to areas near the test point.

7 Quantize Feature Space and Compute Multi-resolution Coefficients
Quantize the feature space and record class counts into data arrays; each coarser cell averages its four children, (c1 + c2 + c3 + c4)/4.
[Figures: a set of 100 two-class training points, and the multi-resolution representation of the two-class data set.]
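
To make this step concrete, here is a minimal sketch in Python/NumPy, assuming 2-D features scaled into [0, 1) and a power-of-two grid size g; `quantize` and `build_levels` are illustrative names, not from the paper.

```python
import numpy as np

def quantize(points, g):
    """Map points in [0,1)^2 to integer cell coordinates of a g x g grid."""
    cells = np.floor(np.asarray(points) * g).astype(int)
    return np.clip(cells, 0, g - 1)

def build_levels(points_by_class, g):
    """Per-class count arrays, finest level first; each coarser level
    averages every 2x2 block of cells, as on the slide."""
    counts = np.zeros((len(points_by_class), g, g))
    for c, pts in enumerate(points_by_class):
        for i, j in quantize(pts, g):
            counts[c, i, j] += 1
    levels = [counts]
    while levels[-1].shape[-1] > 1:          # coarsen down to a single cell
        f = levels[-1]
        levels.append((f[:, 0::2, 0::2] + f[:, 1::2, 0::2] +
                       f[:, 0::2, 1::2] + f[:, 1::2, 1::2]) / 4.0)
    return levels
```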

8 Building a Classifier
Hierarchical structure of the ANNCAD classifier. Label each block with its majority class, but only if the majority is decisive: |C_1st| - |C_2nd| > 80%; otherwise tag the block "Mix".
Examples from the slide: B=6.75, R=0.6 → Blue; B=2, R=4.25 → Mix; B=3, R=3.25 → Mix.
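
The slide's decisiveness test leaves the base of the percentage implicit; the sketch below, continuing the arrays from the previous block, assumes it is a fraction of the cell's total count.

```python
def label_blocks(level_counts, threshold=0.8):
    """Label each cell with its majority class; tag it 'MIX' when the lead
    of the top class over the runner-up is not decisive enough, and None
    when the cell is empty."""
    order = np.sort(level_counts, axis=0)      # per-cell counts, ascending
    first, second = order[-1], order[-2]
    total = level_counts.sum(axis=0)
    labels = np.argmax(level_counts, axis=0).astype(object)
    mixed = (total > 0) & (first - second <= threshold * total)
    labels[mixed] = "MIX"
    labels[total == 0] = None
    return labels
```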

9 Decision Algorithm on the ANNCAD Hierarchy
The combined classifier works over multiple levels:
- Unclassified block: go to the next level.
- Block with tag "Mix": go back to the previous level, then compute the distance between the test point and the center of every nonempty neighboring block.
- Classified block: label the test point with the block's class (class I or class II in the figure).
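
A sketch of one plausible reading of this walk, reusing the arrays and labels from the previous blocks; the paper's exact traversal order and neighbor handling may differ in detail.

```python
def classify(x, labeled_levels, counts_levels):
    """Finest level first: an empty cell sends us to the next (coarser)
    level; a 'MIX' cell is resolved by the nearest decisively labeled,
    nonempty neighboring block."""
    for labels, counts in zip(labeled_levels, counts_levels):
        size = labels.shape[0]
        i, j = np.clip((np.asarray(x) * size).astype(int), 0, size - 1)
        lab = labels[i, j]
        if lab is None:
            continue                           # unclassified: coarser level
        if lab != "MIX":
            return lab
        best, best_d = None, np.inf
        for di in (-1, 0, 1):                  # scan neighboring blocks
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if not (0 <= ni < size and 0 <= nj < size):
                    continue
                if counts[:, ni, nj].sum() == 0 or labels[ni, nj] in (None, "MIX"):
                    continue
                center = (np.array([ni, nj]) + 0.5) / size
                d = float(np.linalg.norm(np.asarray(x) - center))
                if d < best_d:
                    best, best_d = labels[ni, nj], d
        if best is not None:
            return best
    return None
```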

10 Incremental Update
[Figure: effect of inserting a new training point into the hierarchy.]

11 Concept Drift: Adaptation by Exponential Forgetting
Data array Λ, forgetting factor 0 ≤ λ ≤ 1: Λ_new ← λ · Λ_old.
- No effect if no concept changes.
- Adapts quickly (exponentially) if the concept changes.
- No extra memory needed (no sliding window required).
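
In code, the forgetting step is a single scaling pass before each insert. A sketch continuing the 2-D arrays from above; lam = 0.98 is the value quoted for the Adult experiment later.

```python
def update(counts_levels, x, y, lam=0.98):
    """Exponential forgetting plus incremental insert: scale every count
    array (Lambda_new = lam * Lambda_old), then add the new labeled point.
    Each coarser cell averages its four children, so the point contributes
    (1/4)^k at level k; lam = 1 disables forgetting."""
    g = counts_levels[0].shape[-1]
    ci, cj = np.clip((np.asarray(x) * g).astype(int), 0, g - 1)
    for k, level in enumerate(counts_levels):
        level *= lam                           # decay old evidence
        s = level.shape[-1]                    # cells per side at this level
        level[y, ci * s // g, cj * s // g] += 0.25 ** k
```

Note that only one cell per level gains count, which is the locality the properties slide calls compact support; a real implementation would apply the decay lazily rather than eagerly rescaling every array.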

12 Grid Position and Resolution
Problem: the neighborhood decision strongly depends on the grid position.
Solution: build several classifiers by shifting the grid position by 1/n of a cell, then combine the results by voting.
Theorem: let x be a test point, with n^d classifiers, and let b(x) be the set of blocks containing x. Then ∀z ∈ ∪b(x), ∃y ∈ ∪b(x): dist(x, y) < (1 + 1/(n-1)) · dist(x, z).
In practice, only 2-3 classifiers are needed to achieve a good result.
[Figure: 4 different grids for building 4 classifiers.]
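
A sketch of the voting wrapper, assuming the grid shift is emulated by translating both the training and test points; `build_classifier` is a hypothetical wrapper around the earlier sketches, not an API from the paper.

```python
from collections import Counter

def vote(x, classifiers, shifts):
    """Majority vote over grid-shifted classifiers; each classifier sees the
    query translated by its own shift (equivalent to shifting the grid)."""
    preds = [clf(np.asarray(x) + s) for clf, s in zip(classifiers, shifts)]
    preds = [p for p in preds if p is not None]
    return Counter(preds).most_common(1)[0][0] if preds else None

# Hypothetical usage with n = 2 (the slides report 2-3 classifiers suffice):
# g, n = 16, 2
# shifts = [np.zeros(2), np.full(2, 1.0 / (n * g))]  # shift by 1/n of a cell
# classifiers = [build_classifier(shift_points(train, s), g) for s in shifts]
# y_hat = vote(x_test, classifiers, shifts)
```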

13 Properties of ANNCAD
- Compact support: the locality property allows fast updates.
- Dealing with noise: a threshold can be set for the classification decision.
- Multi-resolution: controls the fineness of the result, or optimizes system resources.
- Low complexity (g^d = total number of cells):
  - Building the classifier: O(min(N, g^d)).
  - Testing: O(log2(g) + 2^d).
  - Updating: O(log2(g) + 1).

14 Experiments
Synthetic data:
- 3-d unit cube.
- Class distribution: class 0 inside a sphere with radius 0.5, class 1 outside.
- 3000 training examples, 1000 test examples.
Exact ANN baseline:
- Expand the search area by doubling the radius until reaching some training point.
- Classify the test point with the majority class.
[Figures: accuracy for (a) different initial resolutions; (b) different numbers of ensembles.]
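
A sketch of this baseline, assuming NumPy arrays X_train (N x d) and y_train, and an assumed initial radius r0:

```python
def exact_ann_baseline(x, X_train, y_train, r0=0.05):
    """Grow a ball around the test point, doubling the radius until it
    contains at least one training point; return the majority class inside."""
    d = np.linalg.norm(X_train - np.asarray(x), axis=1)
    r = r0
    while not np.any(d <= r):
        r *= 2                                 # double the search radius
    classes, counts = np.unique(y_train[d <= r], return_counts=True)
    return classes[np.argmax(counts)]
```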

15 Experiments (Cont'd)
Real data 1: Letter Recognition
- Objective: identify each pixel display as one of the 26 letters.
- 16 numerical attributes describe each pixel display.
- 15,000 training examples, 5,000 test examples.
- 5% noise added by randomly reassigning classes.
- Grid size: 16 units; #classifiers: 2.

16 ANNCAD vs. VFDT
Real data 2: Forest Cover Type
- Objective: predict the forest cover type.
- 10 numerical attributes.
- 12,000 training examples, 9,000 test examples.
- Grid size: 32 units; #classifiers: 2.

17 Concept Shift: ANNCAD vs. CVFDT
Real data 3: Adult
- Objective: determine whether a person's salary exceeds 50K.
- Concept-shift simulation: records grouped by race.
- λ = 0.98; grid size: 64; #classifiers: 2.

18 Conclusion and Future Work
ANNCAD:
- An incremental classification algorithm that finds nearest neighbors adaptively.
- Suitable for mining data streams: fast update speed.
- Exponential forgetting to handle concept shift/drift.
Future work: detect concept shift/drift via changes in the class labels of blocks.

19 THANK YOU!