Random feature for sparse signal classification

Presentation transcript:

Random feature for sparse signal classification. Jen-Hao Rick Chang, Aswin C. Sankaranarayanan, B. V. K. Vijaya Kumar. Hi, I am Jen-Hao Rick Chang from Carnegie Mellon University. Our work is called random feature for sparse signal classification. We provide a tighter bound for the random feature method on sparse signals and propose compressive random feature, which exploits the sparseness of input signals to make kernel methods scalable.

Kernel method does not scale well. [Slide: N training samples, kernel SVM; cost table: data storage, kernel matrix computation, kernel matrix storage, testing storage.] Kernel methods do not scale well in terms of dataset size. They use the so-called kernel trick to avoid constructing infinitely high-dimensional lifted data, but the kernel trick induces high storage and computation costs during both the training and test phases. For example, with one million training samples you need at least 1 terabyte of memory to store the kernel matrix, and you need to go through almost all of the one million samples to evaluate a single test point.
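As a quick back-of-the-envelope check of that terabyte figure (my own illustration, assuming single-precision storage; not from the slides):

```python
# Back-of-envelope storage for an N x N kernel matrix.
# Assumes single-precision (4-byte) entries; exploiting symmetry halves this.
N = 1_000_000                  # one million training samples
bytes_full = N * N * 4         # full float32 kernel matrix
bytes_sym = bytes_full // 2    # store only the upper triangle
print(f"full matrix : {bytes_full / 1e12:.1f} TB")   # ~4.0 TB
print(f"symmetric   : {bytes_sym  / 1e12:.1f} TB")   # ~2.0 TB
```

Either way the matrix alone exceeds the "at least 1 terabyte" quoted on the slide.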

Kernel method does not scale well. [Slide: N training samples, kernel SVM; same cost table (data storage, kernel matrix computation, kernel matrix storage, testing storage), with an added row for sparse signals: this work.] Our work exploits a method called random features and the sparsity of the input signals to greatly reduce the storage, computation, and acquisition costs.

Make kernel method scale gracefully. [Slide: N training samples, linear SVM; cost table: data storage, random feature computation, random feature storage, testing storage.] Random features were first developed by Rahimi and Recht to make kernel methods scale gracefully with the size of the dataset. By constructing M-dimensional features from the data, whose inner products approximate the original kernel function, the method alleviates the quadratic dependency on the size of the dataset during the training phase and makes the test phase independent of the training dataset. Their result shows that M needs to be proportional to the original dimensionality of the signals to achieve good approximation. Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. NIPS, 2007.
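As an illustration, here is a minimal sketch of Rahimi and Recht's random Fourier feature construction for the Gaussian (RBF) kernel; the dimensions, bandwidth, and toy data below are placeholder choices, not taken from the paper:

```python
import numpy as np

def random_fourier_features(X, M, gamma, rng):
    """Map X (n x d) to M-dimensional features z(X) such that
    z(x) . z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the Fourier spectrum of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))            # 5 toy signals of dimension d = 100
Z = random_fourier_features(X, M=2000, gamma=0.01, rng=rng)

# Compare the feature inner products against the exact RBF kernel.
K_approx = Z @ Z.T
K_exact = np.exp(-0.01 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.max(np.abs(K_approx - K_exact)))   # small approximation error
```

A linear SVM trained on Z then replaces the kernel SVM, so the training cost grows linearly rather than quadratically with the number of samples.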

Our contributions. [Slide: (1) provide an enhanced bound for sparse signals like images and videos, for k-sparse, d-dimensional signals; (2) propose compressive random feature, which exploits signal sparsity and improves data storage, computation, and acquisition costs.] Our work has two contributions. First, we analyze random feature's performance on sparse signals. For sparse signals like images and videos, we provide a tighter bound on the dimension of the random feature. Specifically, for k-sparse, d-dimensional signals, M only needs to be proportional to k log(d/k) to achieve good kernel approximation. When the sparsity of the signal is high, our result greatly tightens the bound. Second, we propose a new scheme for random features: compressive random feature. It exploits the sparsity of the signals to further reduce storage, computation, and data acquisition costs.
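Schematically, the scaling being compared is the following (only the proportionality claims are taken from the talk; constants, failure probabilities, and the dependence on the approximation error are omitted):

```latex
% Feature dimension M needed for z(x)^T z(y) to approximate k(x,y) well,
% uniformly over the signal class (constants and error terms omitted).
\begin{align*}
  \text{general $d$-dimensional signals (Rahimi and Recht):} \quad & M \propto d \\
  \text{$k$-sparse, $d$-dimensional signals (this work):} \quad & M \propto k \log(d/k)
\end{align*}
```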

Compressive random feature. [Slide: N k-sparse training samples, compress, random feature; cost table comparing the proposed method, random feature, and kernel method on data storage, feature computation, feature storage, and testing storage, for sparse signals (e.g., images).] The proposed compressive random feature is an effective combination of compressive sensing and random features. Specifically, for signals that are k-sparse canonically or after a transformation, we first perform a random projection to reduce the dimensionality of the signals. Then we compute the usual random features on the compressed signals. We prove that compressive random feature has similar kernel approximation ability even with the additional dimensionality reduction. With the dimensionality reduction, we effectively reduce data storage and computation, and with compressive sensing techniques, we also reduce the data acquisition cost.
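Below is a minimal sketch of this pipeline under my own assumptions: a Gaussian random projection as the compressive measurement, random Fourier features for the RBF kernel, and scikit-learn's LinearSVC as the linear classifier; the paper's exact operators and parameters may differ.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d, k = 400, 1000, 10   # toy sizes: n signals, ambient dimension d, sparsity k
m, M = 100, 2000          # compressive dimension (order k log(d/k)) and feature dimension
gamma = 0.1               # RBF kernel bandwidth (placeholder choice)

# Synthetic k-sparse signals with a simple linearly generated label.
X = np.zeros((n, d))
for i in range(n):
    support = rng.choice(d, size=k, replace=False)
    X[i, support] = rng.normal(size=k)
y = np.sign(X @ rng.normal(size=d))

# Step 1: compressive measurement -- a random Gaussian projection to m dimensions.
Phi = rng.normal(size=(d, m)) / np.sqrt(m)
X_c = X @ Phi

# Step 2: random Fourier features of the compressed signals (RBF kernel).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(m, M))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)
Z = np.sqrt(2.0 / M) * np.cos(X_c @ W + b)

# Step 3: a linear SVM on Z stands in for the kernel SVM.
clf = LinearSVC().fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```

The point of the construction is that feature computation and storage now depend on the compressive dimension m and the feature dimension M rather than on the ambient dimension d or on the number of training samples.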

Classification result on MNIST. Since the proposed compressive random feature has lower computational cost while retaining the ability to approximate the kernel function, on the MNIST dataset it achieves classification accuracy similar to the original random feature in a shorter time. Similar results can also be seen across many datasets, including CIFAR-10 and the Street View House Numbers dataset. We welcome you to visit our poster for more details.