Rare Category Detection in Machine Learning Prafulla Dawadi Topics in Machine Learning.

Presentation transcript:

Rare Category Detection in Machine Learning Prafulla Dawadi Topics in Machine Learning

Outline
Part I: Examples; Rare Class, Imbalanced Class, Outliers
Part II: (Rare) Category Detection
Part III: Kernel Density Estimation; Mean Shift and Hierarchical Mean Shift; Hierarchical Mean Shift for Category Detection; Experimental Results; Discussion

Examples In an astronomical dataset, unusual galaxies make up 0.001% of the data. Fraudulent credit card transactions are very few. Network intrusions, spam images, diagnoses of rare medical conditions, oil spills in satellite images, etc. all involve rare classes.

Rare Class The number of instances of one class is far larger than that of the others. Minority classes are INTERESTING [Vatturi & Wong, 2009], [Pelleg & Moore 2005]. Challenges: noisy classes look similar to the rare class; the classifier is overwhelmed by the majority class. Example: number of instances of fraudulent transactions vs. normal transactions.

Rare Class and Separability

Rare Class vs. Imbalanced Class Classifier The rare class problem is an extreme case of the imbalanced classification problem [Han et al. 2009]. A classifier for an imbalanced-class dataset focuses on the overall accuracy across all classes. Metrics: G-mean, ROC curve. A classifier for a rare-class dataset puts heavy emphasis on learning the minority class. Metrics: precision, recall, and F-measure for the rare class [Han et al. 2009].
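To make the difference between these metrics concrete, here is a minimal sketch that computes them from a list of true and predicted labels. The function name `rare_class_metrics` and the toy labels are illustrative assumptions, not taken from the slides. G-mean is computed in its common form as the geometric mean of recall and specificity.

```python
import math

def rare_class_metrics(y_true, y_pred, rare_label=1):
    """Confusion-matrix metrics with the rare class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare_label and p == rare_label)
    fp = sum(1 for t, p in pairs if t != rare_label and p == rare_label)
    fn = sum(1 for t, p in pairs if t == rare_label and p != rare_label)
    tn = sum(1 for t, p in pairs if t != rare_label and p != rare_label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
    }

# Hypothetical data: 10 rare instances among 1000, and a classifier
# that always predicts the majority class.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
print(rare_class_metrics(y_true, y_pred))
```

In this toy case the always-majority classifier reaches 99% accuracy while recall, F-measure, and G-mean are all zero, which is why plain accuracy is a poor guide when one class is rare.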

Rare Class vs. Outliers “Most of the objects (99.9%) are well explained by current theories and …. remainder are anomalies, but 99% of these anomalies are uninteresting, and only 1% of them (0.001% of the full dataset) are useful … rest type of anomalies, called “boring anomalies”, are records which are strange for uninteresting reasons……The useful anomalies are extraordinary objects which are worthy of further research” [Pelleg & Moore 2005] Outliers are typically single points, separable from normal examples and scattered over the space [He & Carbonell 2008]. Rare class detection assumes that minority classes are compact in the feature space and may overlap with the majority class. Which is the tougher problem: imbalanced class, rare class, or outliers?

Rare Class vs. Outliers [Figure panels: Rare Class; Outliers] [He & Carbonell 2008]

Rare Class Learning Common techniques:
1. Sampling techniques: oversampling, undersampling, SMOTE, etc.
2. Cost-sensitive learning
3. Cost-sensitive boosting: AdaCost, cost-sensitive boosting, SMOTEBoost, etc.
See [Han et al. 2009] for a good introduction to these techniques.
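As an illustration of the first family, the following sketch shows plain random oversampling and undersampling. The function names and the toy data are assumptions made for this example. SMOTE differs in that it synthesizes new minority points by interpolating between neighbors rather than duplicating existing ones, and it is not shown here.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority examples until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Hypothetical data: 95 majority and 5 minority examples.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5
print(Counter(random_oversample(X, y)[1]))
print(Counter(random_undersample(X, y)[1]))
```

Oversampling keeps all majority information at the cost of repeated minority points, while undersampling discards most of the majority class; which trade-off is acceptable depends on the data set.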

Part II Category Detection

Category Detection Problem: given a set of unlabeled examples $x_1, \dots, x_n$, where each $x_i \in \mathbb{R}^d$ comes from one of $m$ distinct categories with label $y_i \in \{1, 2, \dots, m\}$. Objective: bring to the user's attention at least one instance from each category in as few queries as possible [Vatturi & Wong, 2009]. Challenge: discovering rare categories/classes. Stopping criteria: labeling cost or prior information.

Category Detection [Figure: category detection loop] [Vatturi & Wong, 2009]
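The loop can be sketched generically as: select an example, ask the oracle for its label, record the category, and stop on a labeling budget or when a known number of categories has been found. Everything named below (`category_detection_loop`, `random_select`, the `oracle` callback) is a hypothetical illustration, and random selection is only a baseline stand-in for a real selection strategy.

```python
import random

def category_detection_loop(points, oracle, select, budget, num_categories=None):
    """Query the oracle until the budget is spent or all known categories are seen."""
    labeled = {}        # index -> label returned by the oracle
    discovered = set()  # categories seen so far
    while len(labeled) < min(budget, len(points)):
        idx = select(points, labeled)
        label = oracle(idx)
        labeled[idx] = label
        discovered.add(label)
        # Optional stopping rule based on prior information.
        if num_categories is not None and len(discovered) >= num_categories:
            break
    return discovered, labeled

def random_select(points, labeled, _rng=random.Random(0)):
    """Baseline strategy (an assumption): pick any unlabeled index at random."""
    candidates = [i for i in range(len(points)) if i not in labeled]
    return _rng.choice(candidates)

# Hypothetical data: 18 majority points and 2 rare points.
true_labels = [0] * 18 + [1] * 2
points = list(range(20))
found, queries = category_detection_loop(
    points, lambda i: true_labels[i], random_select, budget=20, num_categories=2
)
print(found, len(queries))
```

A category detection method is judged by how few oracle queries this loop needs before every category has been seen, so the interesting design work is in the `select` function.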

Category Detection and Active Learning Active learning: aims to improve classifier performance using prior information about the classes and as few label requests as possible. Category detection: starting with no labeled examples, discover the minority classes with as few label requests as possible [He & Carbonell 2008].

Why Category Detection Theoretical importance: “Furthermore, rare category detection is a bottleneck in reducing the overall sampling complexity of active learning … Learning can not improve the label complexity of passive learning if different classes are not balanced in the data set…” [Dasgupta 2005] [He 2010] Practical importance: category detection can be used in many real applications. For example, a domain expert can analyze trends in fraudulent transactions.

Assumptions Smoothness: the underlying distribution of each majority class is sufficiently smooth. Compactness: examples from the same minority class form a compact cluster [He 2010].

Assumptions Synthetic example: the rare class has lower variance than the majority class [He 2010].

Issues
How to detect rare categories in an unbalanced, unlabeled data set with the help of an oracle?
How to detect rare categories with different data types, such as graph data, stream data, etc.?
How to do rare category detection with the least information about the data set?
How to select relevant features for the rare categories?
How to design effective classification algorithms that fully exploit the properties of the minority classes (rare category classification)?
[He 2010] [Vatturi & Wong, 2009]

Part III Category Detection Using Hierarchical Mean Shift Pavan Vatturi, Weng-Keen Wong, Oregon State University

Question Given an arbitrary distribution of data, how would you determine which density it belongs to?

Kernel Density Estimation [Figure panels: Histogram; Kernel Density Estimation]
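Where a histogram counts samples in fixed bins, a kernel density estimate places a smooth bump on every sample and averages the bumps. The sketch below is a one-dimensional estimator with a Gaussian kernel. The kernel choice, the bandwidth value, and the toy samples are assumptions for illustration only.

```python
import math

def gaussian_kde(samples, h):
    """Return f(x) = 1/(n*h) * sum_i K((x - x_i)/h) with a standard Gaussian K."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density

# Hypothetical samples: three points near 0 and one isolated point at 5.
samples = [0.0, 0.1, -0.1, 5.0]
f = gaussian_kde(samples, h=0.5)
for x in (0.0, 2.5, 5.0):
    print(x, f(x))
```

The bandwidth `h` plays the role of the histogram bin width: small values give a spiky estimate with many modes, large values smooth nearby groups into a single mode.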

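Density Gradient Estimation The density gradient estimate, in the form given by [Comaniciu & Meer 2002] with kernel profile $g$, bandwidth $h$, and normalization constant $c_{k,d}$, is $\hat{\nabla} f_{h,K}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \right] m_h(x)$, where $m_h(x) = \frac{\sum_{i=1}^{n} x_i \, g\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x$ is the mean shift. The mean shift vector always points in the direction of the maximum increase in density.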

Mean Shift Algorithm

Mean Shift Algorithm
Mean shift:
1. Compute the mean shift vector $m_h(x_t)$.
2. Translate the window: $x_{t+1} = x_t + m_h(x_t)$.
Mean shift clustering:
1. Run the mean shift procedure to find the stationary points of the density function.
2. Prune these points, retaining only the local maxima.
The set of all locations that converge to the same mode defines the basin of attraction of that mode. Points in the same basin of attraction are associated with the same cluster [Cheng 1995].
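The two-step procedure and the basin-of-attraction clustering can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel. The function names, tolerances, and toy data are assumptions, and modes are merged with a simple distance threshold rather than an explicit local-maximum test.

```python
import math

def mean_shift_vector(x, data, h):
    """m_h(x): the Gaussian-weighted mean of the data minus x."""
    weights = [math.exp(-0.5 * math.dist(x, p) ** 2 / h ** 2) for p in data]
    total = sum(weights)
    mean = [sum(w * p[d] for w, p in zip(weights, data)) / total
            for d in range(len(x))]
    return [m - a for m, a in zip(mean, x)]

def seek_mode(x, data, h, tol=1e-6, max_iter=500):
    """Iterate x_{t+1} = x_t + m_h(x_t) until the shift is negligible."""
    x = list(x)
    for _ in range(max_iter):
        shift = mean_shift_vector(x, data, h)
        x = [a + s for a, s in zip(x, shift)]
        if math.hypot(*shift) < tol:
            break
    return x

def mean_shift_cluster(data, h, merge_tol=1e-2):
    """Assign each point to the mode its trajectory converges to."""
    modes, labels = [], []
    for p in data:
        m = seek_mode(p, data, h)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < merge_tol:
                labels.append(k)   # same basin of attraction
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Hypothetical data: two tight groups of three points each.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
modes, labels = mean_shift_cluster(data, h=0.5)
print(modes, labels)
```

No number of clusters is specified anywhere: it falls out of how many distinct modes the density estimate has at the chosen bandwidth.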

Hierarchical Mean Shift [Figure axis: Bandwidth] Maintain:
1. The total distance moved by mean shift
2. The previous cluster centers and the original query data points
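One plausible reading of this slide is sketched below: at each larger bandwidth, the cluster centers from the previous level are used as query points and shifted over the density of the original data, while the members of each center and the distance it has travelled are carried along. This is an assumption-laden illustration, not the authors' implementation. In particular the Gaussian kernel, the merge threshold, and the rule for combining distances when clusters merge (taking the maximum) are choices made for this example.

```python
import math

def seek_mode(x, reference, h, tol=1e-6, max_iter=500):
    """Shift x to a mode of the reference density; also return the distance moved."""
    x, moved = list(x), 0.0
    for _ in range(max_iter):
        w = [math.exp(-0.5 * math.dist(x, p) ** 2 / h ** 2) for p in reference]
        total = sum(w)
        new_x = [sum(wi * p[d] for wi, p in zip(w, reference)) / total
                 for d in range(len(x))]
        step = math.dist(x, new_x)
        moved += step
        x = new_x
        if step < tol:
            break
    return x, moved

def hierarchical_mean_shift(data, bandwidths, merge_tol=1e-2):
    """Build one clustering per bandwidth, reusing the previous level's centers."""
    centers = [list(p) for p in data]
    members = [[i] for i in range(len(data))]  # original point indices per center
    distance = [0.0] * len(data)               # distance moved so far per center
    levels = []
    for h in sorted(bandwidths):
        new_centers, new_members, new_distance = [], [], []
        for c, mem, so_far in zip(centers, members, distance):
            mode, moved = seek_mode(c, data, h)
            for k, existing in enumerate(new_centers):
                if math.dist(mode, existing) < merge_tol:
                    new_members[k].extend(mem)
                    # Assumption: a merged cluster keeps the largest distance.
                    new_distance[k] = max(new_distance[k], so_far + moved)
                    break
            else:
                new_centers.append(mode)
                new_members.append(list(mem))
                new_distance.append(so_far + moved)
        centers, members, distance = new_centers, new_members, new_distance
        levels.append({"bandwidth": h, "centers": centers,
                       "members": members, "distance": distance})
    return levels

# Hypothetical data: two tight groups that merge at a large bandwidth.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]
for level in hierarchical_mean_shift(data, bandwidths=[0.5, 10.0]):
    print(level["bandwidth"], len(level["centers"]), level["members"])
```

Reusing the previous centers means each level shifts only a handful of query points instead of the whole data set, although every shift still sums over all reference points.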

Methodology
Data standardization
Building the cluster hierarchy
Querying the user
Tiebreaker
Computational considerations

Data Standardization Sphere the data.
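Sphering (whitening) transforms the data so that it has zero mean and identity covariance, which lets a single scalar bandwidth be meaningful in every direction. The slide does not say which whitening transform is used; the sketch below assumes a Cholesky-based one, with PCA or ZCA whitening being equally valid alternatives. The function name and toy data are illustrative.

```python
import math

def sphere(data):
    """Whiten data: subtract the mean, then apply L^{-1} where cov = L L^T."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]

    # Sample covariance matrix.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Cholesky factorization (requires a non-singular covariance).
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]

    # Forward substitution solves L z = x for every centered row x.
    sphered = []
    for x in centered:
        z = [0.0] * d
        for i in range(d):
            z[i] = (x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        sphered.append(z)
    return sphered

# Hypothetical data with strongly correlated features on different scales.
data = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.0], [4.0, 9.0], [5.0, 9.5]]
for row in sphere(data):
    print(row)
```

After this step, Euclidean distance in the transformed space no longer favors whichever original feature happened to have the largest scale.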

Cluster Hierarchy and Labeling [Figure axis: Bandwidth] Three steps: cluster the data set using hierarchical mean shift with increasing bandwidth; present the clusters that score highly on the validity criteria for labeling; at each height, maintain the cluster validity list for each cluster $C_i$.

Query the User: Active Learning Evaluate each cluster using cluster goodness criteria. Outlierness: how long can the cluster survive? Compactness-isolation. Notation: $P_i$ = cluster centers.

Algorithm

Methodology Tiebreaker: ties can occur at low bandwidth values when scanning a highly compact region. HAD: Highest Average Distance. Computational consideration: the method is expensive because the distance to all other points must be calculated; use a KD-tree.

Experimental Results Datasets: Abalone, Shuttle, Optical Digits, Optical Letters, Statlog, and Yeast


Strength Uses the non-parametric mean shift clustering technique and hence does not require prior knowledge of the properties of the data set. Reduces the number of user queries needed to discover all the categories in the data.

Weakness
Reference vs. query dataset
Stopping criteria
Subsampled dataset
Determining the increasing bandwidth sizes
Scalability
High dimensionality and kernel density estimation
Supervised approach

Discussion Comparison with conventional clustering algorithms (k-means, etc.). Applications and uses of category detection.

References
[Han et al. 2009] Rare Class Mining: Progress and Prospect.
[Pelleg & Moore 2004] Dan Pelleg and Andrew Moore. Active learning for anomaly and rare-category detection. In Advances in Neural Information Processing Systems 18, December 2004.
[He & Carbonell 2008] Jingrui He and Jaime Carbonell. Nearest-neighbor-based active learning for rare category detection. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 633–640. MIT Press, Cambridge, MA, 2008.
[Vatturi & Wong, 2009] P. Vatturi and W.-K. Wong. Category detection using hierarchical mean shift. In KDD, 2009.
[Cheng 1995] Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell., 17(8):790–799, 1995.
[Comaniciu & Meer 2002] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24:603–619, 2002.
[He 2010] J. He. Rare Category Analysis. PhD thesis, CMU.
