Efficient Distribution Mining and Classification
Yasushi Sakurai (NTT Communication Science Labs), Rosalynn Chong (University of British Columbia), Lei Li (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University)

2 Classification for Distribution Data Sets
- Given n distributions (n multi-dimensional vector sets), with a portion of them labeled and the others unlabeled
- Classify the unlabeled distributions into the right group
- Example: Distribution #1 and Distribution #2 fall into the same group
[Figure: three example point distributions; Distribution #1 (unknown), Distribution #2 (walking), Distribution #3 (jumping)]

3 Scenario 1: Marketing research for e-commerce
- Vectors: orders by each customer (time the customer spent browsing, number of pages the customer browsed, number of items the customer bought, sales price, number of visits by the customer)
- Distributions: customers
- Classification: identify customer groups that share similar traits
- Find distribution groups for market segmentation, rule discovery, and spotting anomalies, e.g., "Design an advertisement for each customer category"

4 Scenario 2: User analysis for SNS systems (e.g., a blog hosting service)
- Vectors: internet habits of each participant (number of blog entries for every topic, length of entries for every topic, number of links in entries for every topic, number of hours spent online)
- Distributions: SNS participants
- Classification: identify participant groups who have similar internet habits
- Find distribution groups to facilitate community creation, e.g., "Create communities according to users' interests"

5 Representing Distributions
- Histograms: easy to update incrementally; used in this work
- Another option: probability density functions
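Histograms are easy to update incrementally because new points only add counts to existing buckets. Below is a minimal numpy sketch of such a representation; the class name, grid of bucket edges, and batch sizes are illustrative assumptions, not from the paper.

```python
# Illustrative sketch: a d-dimensional histogram that can absorb new points
# incrementally. Bucket edges and names are assumptions for the example.
import numpy as np

class IncrementalHistogram:
    def __init__(self, edges):
        # edges: one array of bin boundaries per dimension
        self.edges = edges
        self.counts = np.zeros(tuple(len(e) - 1 for e in edges))

    def update(self, points):
        # Add a batch of new points (an n x d array) to the running counts.
        batch, _ = np.histogramdd(points, bins=self.edges)
        self.counts += batch

    def pdf(self, eps=1e-12):
        # Normalized histogram p_i; eps keeps buckets nonzero for the log later.
        p = self.counts + eps
        return p / p.sum()

# Usage: accumulate 3-d points (e.g., motion-capture positions) in two batches.
h = IncrementalHistogram([np.linspace(-1, 1, 9)] * 3)
h.update(np.random.randn(1000, 3) * 0.3)
h.update(np.random.randn(500, 3) * 0.3)
p = h.pdf()
```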

6 Background: Kullback-Leibler divergence
- Measures how much one probability distribution P differs from another probability distribution Q: KL(P||Q) = Σ_i p_i log(p_i / q_i)
- One undesirable property: it is asymmetric (KL(P||Q) ≠ KL(Q||P)), so we use the symmetric KL divergence, d(P, Q) = (1/2)[KL(P||Q) + KL(Q||P)] = (1/2) Σ_i (p_i - q_i)(log p_i - log q_i)
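A direct bucket-by-bucket implementation of this symmetric divergence might look like the sketch below (my own code, not the authors'; the eps smoothing that keeps empty buckets out of the logarithm is an assumption).

```python
# Symmetric KL divergence between two histograms, computed directly.
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """0.5 * sum_i (p_i - q_i) * (log p_i - log q_i) over flattened buckets."""
    p = np.asarray(p, dtype=float).ravel() + eps
    q = np.asarray(q, dtype=float).ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * np.sum((p - q) * (np.log(p) - np.log(q)))

# Example: two 8-bucket histograms; the value is 0 when p == q and grows
# as the distributions drift apart.
p = np.array([5, 9, 12, 8, 4, 2, 1, 1])
q = np.array([1, 2, 4, 8, 12, 9, 4, 2])
print(symmetric_kl(p, q))
```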

7 Proposed Solution: naïve approach
- Create a histogram for each distribution of data
- Compute the KL divergence directly from the histograms p_i and q_i
- Use any data mining method, e.g., classification, clustering, outlier detection
[Figure: pipeline from distribution data to histograms to groups (Group 1, Group 2, Group 3)]
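A minimal sketch of this naive pipeline is 1-nearest-neighbor classification over raw histograms, scanning all m buckets for every comparison (the tiny example data and function names are illustrative assumptions).

```python
# Naive pipeline sketch: classify each unlabeled histogram by the label of the
# nearest labeled histogram under the symmetric KL divergence.
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    p = np.asarray(p, float).ravel() + eps; p = p / p.sum()
    q = np.asarray(q, float).ravel() + eps; q = q / q.sum()
    return 0.5 * np.sum((p - q) * (np.log(p) - np.log(q)))

def nn_classify(unlabeled, labeled, labels):
    out = []
    for p in unlabeled:
        dists = [symmetric_kl(p, q) for q in labeled]
        out.append(labels[int(np.argmin(dists))])
    return out

# Example: three labeled histograms and one query.
train = [np.array([8, 1, 1]), np.array([1, 8, 1]), np.array([1, 1, 8])]
labels = ["walking", "jumping", "running"]
print(nn_classify([np.array([7, 2, 1])], train, labels))   # ['walking']
```

This direct scan over all buckets is what makes the naive method cost O(mnt), as discussed on the complexity slides below.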

8 Proposed Solution: DualWavelet (wavelet-based approach)
- Create a histogram for each distribution of data
- Represent each histogram p_i by two sets of wavelet coefficients: the wavelet transform of p_i and the wavelet transform of log(p_i)
- Reduce the number of wavelet coefficients by selecting the c coefficients with the highest energy (c << m)
- Compute the KL divergence from the wavelet coefficients
- Use any data mining method, e.g., classification, clustering, outlier detection
[Figure: pipeline from distribution data to histograms to wavelets (highest c coefficients) to groups (Group 1, Group 2, Group 3)]
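A minimal sketch of the wavelet step follows. It is my own illustration, assuming the PyWavelets package, an orthonormal Haar transform with periodization, and flattened power-of-2 histograms; the paper does not prescribe this particular toolkit or wavelet family.

```python
# DualWavelet-style representation sketch: wavelet-transform both p and log(p),
# then keep only the c highest-energy coefficients of each.
import numpy as np
import pywt

def haar_full(x):
    # Full orthonormal Haar decomposition, flattened into one coefficient vector.
    return np.concatenate(pywt.wavedec(x, 'haar', mode='periodization'))

def top_c(coeffs, c):
    # Zero out everything except the c coefficients with the highest energy.
    sparse = np.zeros_like(coeffs)
    keep = np.argsort(np.abs(coeffs))[-c:]
    sparse[keep] = coeffs[keep]
    return sparse

def dual_wavelet(hist, c, eps=1e-12):
    p = np.asarray(hist, dtype=float).ravel() + eps
    p = p / p.sum()
    return top_c(haar_full(p), c), top_c(haar_full(np.log(p)), c)

# Example: a 16-bucket histogram reduced to c = 4 coefficients per transform.
p_hat, p_log_hat = dual_wavelet(np.random.rand(16), c=4)
```

In practice only the c kept coefficients (and their positions) would be stored per histogram, which is what gives the space and time savings described later.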

9 DualWavelet: Theorem 1
Let W[p]_i, W[q]_i be the wavelet coefficients of p_i and q_i, and let W[log p]_i, W[log q]_i be the wavelet coefficients of log(p_i) and log(q_i), respectively (m: # of bins of a histogram, c: # of wavelet coefficients). Because the wavelet transform is orthonormal, the symmetric KL divergence can be computed from the wavelets:
d(P, Q) = (1/2) Σ_i (p_i - q_i)(log p_i - log q_i) = (1/2) Σ_i (W[p]_i - W[q]_i)(W[log p]_i - W[log q]_i)
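The identity can be checked numerically. The sketch below (same assumptions as the previous block: PyWavelets, orthonormal Haar with periodization, power-of-2 histograms) compares the direct computation with the one done entirely in wavelet space when all coefficients are kept.

```python
# Numeric check of Theorem 1: the symmetric KL divergence computed from
# wavelet coefficients equals the direct bucket-wise computation, because an
# orthonormal transform preserves inner products.
import numpy as np
import pywt

def haar_full(x):
    return np.concatenate(pywt.wavedec(x, 'haar', mode='periodization'))

p = np.random.rand(16); p = p / p.sum()
q = np.random.rand(16); q = q / q.sum()

direct = 0.5 * np.sum((p - q) * (np.log(p) - np.log(q)))
in_wavelet_space = 0.5 * np.sum(
    (haar_full(p) - haar_full(q)) * (haar_full(np.log(p)) - haar_full(np.log(q))))
print(np.isclose(direct, in_wavelet_space))   # True
```

Keeping only the top-c coefficients turns the equality into an approximation, which is the source of the quality/cost trade-off shown in the experiments.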

10 Time Complexity
- Naïve method for nearest-neighbor classification: O(mnt) time (n: # of input distributions, m: # of grid cells, t: # of distributions in the training data set)
- DualWavelet: O(mn) for the wavelet transform and O(nt) for classification, since c (the # of wavelet coefficients we use) is a small constant

11 Space Complexity
- Naïve method for nearest-neighbor classification: O(mt) space (m: # of grid cells, t: # of distributions in the training data set)
- DualWavelet: O(m) for the wavelet transform and O(t) for classification, since c (the # of wavelet coefficients we use) is a small constant

12 GEM: Optimal Grid-Size Selection
- Optimal granularity of the histogram: the right number of segments provides good accuracy at reasonable computation cost
- Proposed a normalized KL divergence (the GEM criterion)
- Choose the grid size that maximizes the pairwise criterion: obtain it for every sampled pair of distributions, then choose the maximum
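The exact form of the normalized KL divergence used by GEM is not reproduced on this slide, so the sketch below only illustrates the outer search: try several candidate grid sizes, score each with a pairwise criterion over sampled distribution pairs, and keep the size with the maximum score. The criterion() stub (plain symmetric KL) is a placeholder for the paper's normalized divergence, and all names, candidate sizes, and the sampling scheme are assumptions.

```python
# Outer loop of a grid-size search (illustrative only; substitute the paper's
# GEM criterion for the placeholder below).
import numpy as np

def criterion(p, q, eps=1e-12):
    # Placeholder pairwise score: plain symmetric KL divergence.
    p = p + eps; p = p / p.sum()
    q = q + eps; q = q / q.sum()
    return 0.5 * np.sum((p - q) * (np.log(p) - np.log(q)))

def select_grid_size(point_sets, candidate_sizes, pairs, lo=0.0, hi=1.0):
    # point_sets: list of 1-d samples; pairs: index pairs of sampled distributions.
    best_size, best_score = None, -np.inf
    for m in candidate_sizes:
        edges = np.linspace(lo, hi, m + 1)
        hists = [np.histogram(s, bins=edges)[0].astype(float) for s in point_sets]
        score = max(criterion(hists[i], hists[j]) for i, j in pairs)
        if score > best_score:
            best_size, best_score = m, score
    return best_size

# Example with three synthetic point sets and two sampled pairs.
data = [np.random.beta(a, 2, 2000) for a in (1, 2, 5)]
print(select_grid_size(data, candidate_sizes=[4, 8, 16, 32], pairs=[(0, 1), (1, 2)]))
```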

13 Experiments
- Gaussians: n = 4,000 distributions, each with 10,000 points (dimension d = 3); mixtures of Gaussians (1, 2, 2^d, and 2^d + 1 components) with the same means but different variances for each class
- MoCap: n = 58 real running, jumping, and walking motions (d = 93); each dimension corresponds to the x, y, or z position of a body joint; dimensionality reduced to d = 2 using SVD

14 Classification (MoCap)
Confusion matrix for classification (rows: correct class, columns: recovered class):

            Jumping  Walking  Running
  Jumping      3        0        0
  Walking      0       22        1
  Running      0        1       19

15 Computation Cost (Gaussians)
- Baselines: NaïveDense, which uses all histogram buckets, and NaïveSparse, which uses only selected buckets (those with the largest values)
- DualWavelet achieves a dramatic reduction in computation time

16 Approximation Quality
- Scatter plot of computation cost vs. approximation quality shows the trade-off between quality and cost
- DualWavelet gives significantly lower approximation error for the same computation time

17 Conclusions
- Addressed the problem of distribution classification and, more generally, distribution mining
- Proposed a fast and effective method to solve it: use wavelets on both the histograms and their logarithms
- The solution can be applied to large datasets with multi-dimensional distributions
- Experiments show that DualWavelet is significantly faster than the naïve implementation (up to 400 times)