Bilinear Classifiers for Visual Recognition. Computational Vision Lab, University of California, Irvine. To be presented at NIPS 2009. Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes.


1 Bilinear Classifiers for Visual Recognition
Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes
Computational Vision Lab, University of California, Irvine
To be presented at NIPS 2009

2 Introduction
Linear model for visual recognition:
A window on the input image yields an n_y x n_x x n_f feature volume (n_y x n_x spatial cells, n_f features per cell).
Reshape (:) the features into a single feature vector x of size n_y n_x n_f x 1.

3 Introduction
Linear classifier: learn a template (model) w, apply it to all possible windows, and classify each window's feature vector x by testing w^T x > 0.

4 Introduction
Reshape the n_y x n_x x n_f feature volume into a matrix X of size n_s x n_f, where n_s := n_x n_y.
Factor the template the same way: W = W_s W_f^T, with W_s of size n_s x d, W_f of size n_f x d, and d <= min(n_s, n_f).
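The reshaping and factorization on this slide can be written in a few lines of NumPy. This is a minimal sketch, not code from the paper; the sizes are the typical values quoted later in the talk:

```python
import numpy as np

# Typical sizes from the talk: a 14x6 grid of cells, 84 features per cell
ny, nx, nf = 14, 6, 84
ns = ny * nx                        # spatial dimensions collapsed: n_s := n_x * n_y

feat = np.random.rand(ny, nx, nf)   # feature volume for one window
X = feat.reshape(ns, nf)            # matrix form used by the bilinear model

# Low-rank template: W (ns x nf) factored as W_s (ns x d) times W_f^T (d x nf)
d = 5                               # rank, d <= min(ns, nf)
W_s = np.random.randn(ns, d)
W_f = np.random.randn(nf, d)
W = W_s @ W_f.T                     # full template has ns*nf entries,
                                    # but only (ns + nf)*d free parameters
```

The factored form stores (ns + nf) * d numbers instead of ns * nf, which is the parameter reduction the next slide motivates.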

5 Introduction
Motivation for bilinear models:
Reduced rank: fewer parameters.
Better generalization: reduced over-fitting.
Run-time efficiency.
Transfer learning: share a subset of parameters between different but related tasks.

6 Outline
Introduction
Sliding window classifiers
Bilinear model and its motivation
Extensions
Related work
Experiments: pedestrian detection, human action classification
Conclusion

7 Sliding window classifiers
Extract visual features from a spatio-temporal window, e.g., histograms of oriented gradients (HOG) as in Dalal and Triggs' method.
Train a linear SVM using annotated positive and negative instances:
min_w L(w) = (1/2) w^T w + C Sum_n max(0, 1 - y_n w^T x_n)
Detection: evaluate the model w^T x > 0 on all possible windows in the space-scale domain; use convolution, since the model is linear.
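Because the model is linear, scoring every window amounts to correlating the template with the feature image. A small illustrative sketch (the function name and sizes are my own, not from the paper):

```python
import numpy as np

def score_windows(feat_img, template):
    """Slide a linear template over a feature image, returning w^T x per window.
    feat_img: (H, W, nf) feature image; template: (hy, hx, nf) model."""
    H, Wd, _ = feat_img.shape
    hy, hx, _ = template.shape
    scores = np.empty((H - hy + 1, Wd - hx + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = feat_img[i:i + hy, j:j + hx, :]
            scores[i, j] = np.sum(window * template)   # dot product w^T x
    return scores

feat_img = np.random.rand(20, 12, 8)
template = np.random.randn(5, 4, 8)
detections = score_windows(feat_img, template) > 0     # the w^T x > 0 rule
```

In practice the double loop would be replaced by per-channel convolutions, which is exactly the efficiency the slide alludes to.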

8 Sliding window classifiers
(Figure: a sample image, its extracted features, and a sample template W.)

9 Bilinear model (Definition)
Visual data are better modeled as matrices/tensors than as vectors. Why not use the matrix structure?

10 Bilinear model (Definition)
Reshape the feature volume into a matrix X of size n_s x n_f (n_s := n_x n_y), and the model w into a matrix W of the same size.
The linear score can then be written with a trace: Tr(W^T X) = w^T x, so the rule Tr(W^T X) > 0 is equivalent to w^T x > 0.

11 Bilinear model (Definition)
Bilinear model: restrict W to be low rank, W = W_s W_f^T, with W_s of size n_s x d, W_f of size n_f x d, and d <= min(n_s, n_f).
Scoring function: f(X) = Tr(W_f W_s^T X) > 0.

12 Bilinear model (Definition)
With W = W_s W_f^T as above, f(X) = Tr(W_f W_s^T X) = Tr(W_s^T X W_f).
The scoring function is bilinear in W_f and W_s: fixing either factor makes f linear in the other.

13 Bilinear model (Definition)
Summary: as linear, f(X) = Tr(W^T X); as bilinear in the factors, f(X) = Tr(W_f W_s^T X) = Tr(W_s^T X W_f), with W = W_s W_f^T and d <= min(n_s, n_f).
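The equivalences on this slide are easy to check numerically. A quick sketch with arbitrary sizes of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
ns, nf, d = 20, 7, 3
W_s = rng.standard_normal((ns, d))
W_f = rng.standard_normal((nf, d))
X = rng.standard_normal((ns, nf))

W = W_s @ W_f.T                         # rank-d template, W^T = W_f W_s^T
s_linear = np.trace(W.T @ X)            # f(X) = Tr(W^T X), i.e. w^T x
s_bilinear = np.trace(W_f @ W_s.T @ X)  # f(X) = Tr(W_f W_s^T X)
s_cyclic = np.trace(W_s.T @ X @ W_f)    # f(X) = Tr(W_s^T X W_f)
```

All three scores agree, by the cyclic property of the trace.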

14 Bilinear model (Learning)
Linear SVM for a given set of training pairs {X_n, y_n}:
min_W L(W) = (1/2) Tr(W^T W) + C Sum_n max(0, 1 - y_n Tr(W^T X_n))
The first term is the regularizer, the second encodes the (hinge) constraints, and C is the trade-off parameter.

15 Bilinear model (Learning)
Same objective, now with the rank restriction W := W_s W_f^T:
min_W L(W) = (1/2) Tr(W^T W) + C Sum_n max(0, 1 - y_n Tr(W^T X_n))

16 Bilinear model (Learning)
Bilinear formulation:
min L(W_s, W_f) = (1/2) Tr(W_f W_s^T W_s W_f^T) + C Sum_n max(0, 1 - y_n Tr(W_f W_s^T X_n))
This is biconvex, so solve by coordinate descent: fixing one set of parameters leaves a typical SVM problem (with a change of basis), so an off-the-shelf SVM solver can be used in the loop.
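The coordinate descent above can be sketched as follows. This is a simplified illustration, not the paper's implementation: a crude subgradient step stands in for the off-the-shelf SVM solver, and the free factor is given plain l2 regularization rather than the exact change-of-basis regularizer from the learning-details slides:

```python
import numpy as np

def svm_subgradient(Z, y, C, steps=300, lr=1e-2):
    """Crude stand-in for an off-the-shelf linear SVM solver:
    subgradient descent on 0.5*||v||^2 + C * sum_n max(0, 1 - y_n v.z_n)."""
    v = np.zeros(Z.shape[1])
    for _ in range(steps):
        viol = y * (Z @ v) < 1                       # margin violations
        grad = v - C * (y[viol, None] * Z[viol]).sum(axis=0)
        v -= lr * grad
    return v

def bilinear_svm(Xs, y, d, C=1.0, iters=4, seed=0):
    """Coordinate descent for the biconvex objective: with W_s fixed,
    f(X) = Tr(W_f W_s^T X) is linear in W_f (and vice versa), so each
    half-step is an ordinary linear SVM on transformed features.
    Xs: (n, ns, nf) training matrices; y: labels in {-1, +1}."""
    n, ns, nf = Xs.shape
    rng = np.random.default_rng(seed)
    W_s = rng.standard_normal((ns, d))
    W_f = rng.standard_normal((nf, d))
    for _ in range(iters):
        # Fix W_s: f(X) = sum((W_s^T X) * W_f^T), linear in W_f
        Zf = np.stack([(W_s.T @ X).ravel() for X in Xs])
        W_f = svm_subgradient(Zf, y, C).reshape(d, nf).T
        # Fix W_f: f(X) = sum(W_s * (X W_f)), linear in W_s
        Zs = np.stack([(X @ W_f).ravel() for X in Xs])
        W_s = svm_subgradient(Zs, y, C).reshape(ns, d)
    return W_s, W_f
```

After training, a window scores as np.trace(W_f @ W_s.T @ X).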

17 Motivation
Regularization: W_f acts as a subspace and W_s as a reduced-dimensional template. Similar to PCA, but not orthogonal, and learned discriminatively and jointly with the template.
Run-time efficiency: d convolutions instead of n_f, with d <= min(n_s, n_f).

18 Motivation
Transfer learning: share the subspace W_f between different but related problems, e.g., a human detector and a cat detector. Optimize the sum of all the objective functions, learning a good subspace from all the data, with W = W_s W_f^T per task.

19 Extensions
Multi-linear: higher-order tensors instead of just L(W_s, W_f), e.g., L(W_x, W_y, W_f).
For 1-D features: L(W_x, W_y); the rank-1 case gives a separable filter.
Spatio-temporal templates: L(W_x, W_y, W_t, W_f).
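The rank-1 separable-filter case mentioned here means a 2-D filter W = outer(w_y, w_x) can be applied as two cheap 1-D passes. A small sketch verifying the equivalence (function names are illustrative, not from the paper):

```python
import numpy as np

def conv2d_valid(img, W):
    """Direct 2-D 'valid' cross-correlation of img with filter W."""
    ky, kx = W.shape
    H, Wd = img.shape
    return np.array([[np.sum(img[i:i + ky, j:j + kx] * W)
                      for j in range(Wd - kx + 1)]
                     for i in range(H - ky + 1)])

def separable_valid(img, wy, wx):
    """Same result via two 1-D passes when W = outer(wy, wx) has rank 1.
    Kernels are reversed because np.convolve flips its second argument."""
    rows = np.array([np.convolve(r, wx[::-1], mode='valid') for r in img])
    return np.array([np.convolve(c, wy[::-1], mode='valid')
                     for c in rows.T]).T
```

The separable version costs O(ky + kx) per output pixel instead of O(ky * kx), which is why rank restrictions buy run-time efficiency.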

20 Related work (Rank restriction)
Bilinear models: often used to increase flexibility; we instead use them to reduce the number of parameters. Mostly used in generative models such as density estimation; we use them for classification.
Soft rank restriction: use Tr(W) rather than Tr(W^T W) in the SVM to regularize the rank. Convex, but not easy to solve; it decreases the sum of the eigenvalues rather than the number of non-zero eigenvalues (the rank).
Wolf et al. (CVPR'07): used a formulation similar to ours with a hard rank restriction, but showed results only for the soft restriction, and applied it only to a single task (no multi-task learning).

21 Related work (Transfer learning)
Dates back to at least Caruana's work (1997): we were inspired by their multi-task learning, which used back-propagation nets and k-nearest neighbors.
Ando and Zhang (2005): a linear model in which all tasks share a component in a low-dimensional subspace (transfer), using the same number of parameters.

22 Experiments: Pedestrian detection
Baseline: Dalal and Triggs' spatio-temporal classifier (ECCV'06), a linear SVM on 84 features per 8 x 8 cell: histograms of oriented gradients (HOG) and histograms of optical flow.
We modified the learning method to make sure the spatio-temporal classifier outperforms the static one.

23 Experiments: Pedestrian detection
Datasets: INRIA motion (3400 video frame pairs) and INRIA static (3500 static images).
Typical values: n_s = 14 x 6, n_f = 84, d = 5 or 10.
Evaluation: average precision. We initialize with PCA in feature space. Ours is 10 times faster.

24 Experiments: Pedestrian detection

25 Experiments: Pedestrian detection

26 Experiments: Pedestrian detection

27 Experiments: Human action classification 1
1-vs-all action templates; voting via a second SVM on the confidence values.
Dataset: UCF Sports Action (CVPR 2008). They obtained 69.2%; we got 64.8%, but with:
More classes: 12 rather than 9.
A smaller dataset: 150 videos rather than 200.
A harder evaluation protocol: 2-fold cross-validation vs. LOOCV (87 training examples rather than 199 in their case).

28 UCF action Results: PCA (0.444)

29 UCF action Results: Linear (0.518)

30 UCF action Results: Bilinear (0.648)

31 UCF action Results PCA on features (0.444) Linear (0.518) (Not always feasible) Bilinear (0.648)

32 UCF action Results

33 Experiments: Human action classification 2 (Transfer)
We used only two examples for each of the 12 action classes. Models were first trained independently, then trained jointly with a shared subspace; the C parameter was adjusted for the best result.

34 Transfer
Use PCA to initialize, then train independent per-task models W^i = W_s^i (W_f^i)^T for i = 1..m, each with its own spatial factor W_s^i and feature factor W_f^i.

35 Transfer
Use PCA to initialize, then train with a single feature factor W_f shared across all m tasks (each task keeps its own W_s^i):
min_{W_f} Sum_{i=1..m} L(W_f, W_s^i)
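The joint objective being minimized here can be written down directly. A minimal sketch (the alternation itself follows the coordinate-descent scheme of the learning slides; names and sizes are illustrative):

```python
import numpy as np

def joint_objective(W_f, W_s_list, tasks, C=1.0):
    """sum_i L(W_f, W_s^i): per-task bilinear SVM losses with one shared W_f.
    tasks: list of (Xs, y) pairs, Xs of shape (n_i, ns, nf), y in {-1, +1}."""
    total = 0.0
    for W_s, (Xs, y) in zip(W_s_list, tasks):
        # scores f(X) = Tr(W_f W_s^T X) for every example in this task
        scores = np.array([np.trace(W_f @ W_s.T @ X) for X in Xs])
        reg = 0.5 * np.trace(W_f @ W_s.T @ W_s @ W_f.T)
        hinge = C * np.maximum(0.0, 1.0 - y * scores).sum()
        total += reg + hinge
    return total
```

Minimizing this sum over W_f with each W_s^i fixed pools the transformed features of all tasks into one SVM problem, which is how the shared subspace gets learned from all the data.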

36 Results: Transfer
Average classification rate at coordinate descent iterations 1 and 2: independent bilinear (C = .01) vs. joint bilinear (C = .1).

37 Results: Transfer (for "walking"): iteration 1 vs. refined at iteration 2.

38 Conclusion
Introduced multi-linear classifiers that exploit the natural matrix/tensor representation of spatio-temporal data and can be trained with existing efficient linear solvers.
Sharing a subspace across different problems gives a novel form of transfer learning.
Better performance and about a 10X run-time speed-up compared to the linear classifier.
Easy to apply to most high-dimensional features (in place of dimensionality-reduction methods like PCA).
Simple: ~20 lines of Matlab code.

39 Thanks!

40 Bilinear model (Learning details)
Linear SVM for a given set of training pairs {x_n, y_n}:
min_W L(W) = (1/2) Tr(W^T W) + C Sum_n max(0, 1 - y_n Tr(W^T X_n))
Bilinear formulation (biconvex, so solve by coordinate descent):
min L(W_f, W_s) = (1/2) Tr(W_f W_s^T W_s W_f^T) + C Sum_n max(0, 1 - y_n Tr(W_f W_s^T X_n))

41 Bilinear model (Learning details)
Each coordinate descent iteration: freeze W_s and solve
min_{W̃_f} L(W̃_f, W_s) = (1/2) Tr(W̃_f^T W̃_f) + C Sum_n max(0, 1 - y_n Tr(W̃_f^T X̃_n))
where W̃_f = A_s^{1/2} W_f^T, X̃_n = A_s^{-1/2} W_s^T X_n, and A_s = W_s^T W_s;
then freeze W_f and solve
min_{W̃_s} L(W_f, W̃_s) = (1/2) Tr(W̃_s^T W̃_s) + C Sum_n max(0, 1 - y_n Tr(W̃_s^T X̃_n))
where W̃_s = W_s A_f^{1/2}, X̃_n = X_n W_f A_f^{-1/2}, and A_f = W_f^T W_f.
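The change of basis on this slide can be verified numerically. A sketch for the W̃_f half-step, computing the matrix square roots by eigendecomposition of the (PSD) Gram matrix A_s (sizes are my own):

```python
import numpy as np

rng = np.random.default_rng(4)
ns, nf, d = 10, 7, 3
W_s = rng.standard_normal((ns, d))
W_f = rng.standard_normal((nf, d))
X = rng.standard_normal((ns, nf))

# A_s = W_s^T W_s is PSD; build A_s^{1/2} and A_s^{-1/2} via eigendecomposition
A_s = W_s.T @ W_s
vals, vecs = np.linalg.eigh(A_s)
A_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
A_inv_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

Wt_f = A_half @ W_f.T            # the transformed model, tilde W_f
Xt = A_inv_half @ W_s.T @ X      # the transformed features, tilde X_n

# hinge term is preserved: Tr(tilde W_f^T tilde X) == Tr(W_f W_s^T X)
hinge_ok = np.isclose(np.trace(Wt_f.T @ Xt), np.trace(W_f @ W_s.T @ X))
# regularizer is preserved: Tr(tilde W_f^T tilde W_f) == Tr(W_f W_s^T W_s W_f^T)
reg_ok = np.isclose(np.trace(Wt_f.T @ Wt_f),
                    np.trace(W_f @ W_s.T @ W_s @ W_f.T))
```

Both checks pass, confirming that each half-step really is an ordinary linear SVM in the transformed variables.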