Ranking with High-Order and Missing Information. M. Pawan Kumar (Ecole Centrale Paris), Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar
PASCAL VOC “Jumping” Classification. Pipeline: Features → Processing → Training → Classifier
PASCAL VOC “Jumping” Classification. Features → Processing → Training → Classifier. Think of a classifier !!! ✗
PASCAL VOC “Jumping” Ranking. Features → Processing → Training → Classifier. Think of a classifier !!! ✗
Ranking vs. Classification. [figure: a perfect ranking of six images at ranks 1–6, all positives above all negatives: Average Precision = 1]
Ranking vs. Classification. [figure: three rankings of six images with Average Precision 1, 0.92 and 0.81; the perfect ranking has accuracy 1, while the two imperfect rankings share accuracy 0.67]
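The gap between the two metrics is easy to verify with a short computation. A minimal sketch (the six-image rankings below are an illustrative choice that reproduces the values on the slide: AP ≈ 0.92 and AP ≈ 0.81 for two rankings that both have accuracy ≈ 0.67 when the top three items are labelled positive):

```python
def average_precision(labels_in_rank_order):
    """AP of a ranked list; labels are 1 (positive) or 0 (negative)."""
    hits, precisions = 0, []
    for rank, label in enumerate(labels_in_rank_order, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at each positive
    return sum(precisions) / max(hits, 1)

def accuracy(labels_in_rank_order, num_predicted_positive):
    """Accuracy when the top `num_predicted_positive` items are called positive."""
    n = len(labels_in_rank_order)
    correct = sum(1 for i, y in enumerate(labels_in_rank_order)
                  if (y == 1) == (i < num_predicted_positive))
    return correct / n

perfect   = [1, 1, 1, 0, 0, 0]  # AP = 1, accuracy = 1
ranking_a = [1, 1, 0, 1, 0, 0]  # AP ≈ 0.92, accuracy ≈ 0.67
ranking_b = [1, 0, 1, 1, 0, 0]  # AP ≈ 0.81, accuracy ≈ 0.67
```

The two imperfect rankings are indistinguishable to a 0-1 loss (same accuracy) yet clearly different under AP, which is the point of the slide.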
Ranking vs. Classification. Ranking is not the same as classification, and average precision is not the same as accuracy. Should we use 0-1 loss based classifiers? No: optimize the measure you will be evaluated on (a basic machine learning principle)!
Outline. Structured Output SVM · Optimizing Average Precision · High-Order Information · Missing Information · Related Work. [Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004]
Structured Output SVM. Input x, output y, joint feature vector Ψ(x,y). Scoring function: s(x,y;w) = w^T Ψ(x,y). Prediction: y(w) = argmax_y s(x,y;w).
Parameter Estimation. Training data {(x_i, y_i), i = 1,2,…,m}; loss function for the i-th sample: Δ(y_i, y_i(w)). Minimize the regularized sum of losses over the training data. This objective is highly non-convex in w, and regularization plays no role (overfitting may occur).
Parameter Estimation. Training data {(x_i, y_i), i = 1,2,…,m}. Upper-bound the loss: Δ(y_i, y_i(w)) + w^T Ψ(x, y_i(w)) - w^T Ψ(x, y_i(w)) ≤ Δ(y_i, y_i(w)) + w^T Ψ(x, y_i(w)) - w^T Ψ(x, y_i) ≤ max_y { w^T Ψ(x,y) + Δ(y_i, y) } - w^T Ψ(x, y_i), where the first inequality holds since y_i(w) maximizes the score, so w^T Ψ(x, y_i(w)) ≥ w^T Ψ(x, y_i). This bound is convex in w and sensitive to the regularization of w.
Parameter Estimation. Training data {(x_i, y_i), i = 1,2,…,m}. min_w ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x,y) + Δ(y_i, y) - w^T Ψ(x, y_i) ≤ ξ_i for all y. This is a quadratic program whose cutting planes only require computing max_y { w^T Ψ(x,y) + Δ(y_i, y) }.
Parameter Estimation. Training data {(x_i, y_i), i = 1,2,…,m}. min_w ||w||² + C Σ_i ξ_i, s.t. s(x,y;w) + Δ(y_i, y) - s(x,y_i;w) ≤ ξ_i for all y. This is a quadratic program whose cutting planes only require computing max_y { s(x,y;w) + Δ(y_i, y) }.
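The learning problem above can be sketched in code. This is not the solver used in the talk: for brevity the cutting-plane QP is replaced by plain subgradient descent on the same regularized hinge upper bound, and loss-augmented inference searches the label set exhaustively, which is only feasible for tiny problems.

```python
import numpy as np

def loss_augmented_inference(w, x, y_true, labels, psi, delta):
    """argmax_y { w·Ψ(x,y) + Δ(y_true, y) }, by exhaustive search over a small label set."""
    return max(labels, key=lambda y: w @ psi(x, y) + delta(y_true, y))

def train_ssvm(samples, labels, psi, delta, dim, C=1.0, lr=0.01, epochs=200):
    """Minimize ||w||² + C Σ_i max_y { w·Ψ(x_i,y) + Δ(y_i,y) - w·Ψ(x_i,y_i) }.
    Subgradient descent here stands in for the cutting-plane QP of the slides."""
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = 2.0 * w  # gradient of the ||w||² term
        for x, y_i in samples:
            y_hat = loss_augmented_inference(w, x, y_i, labels, psi, delta)
            # Subgradient of the hinge term is Ψ(x,ŷ) - Ψ(x,y_i) when it is positive.
            if w @ psi(x, y_hat) + delta(y_i, y_hat) - w @ psi(x, y_i) > 0:
                grad += C * (psi(x, y_hat) - psi(x, y_i))
        w -= lr * grad
    return w

# Toy binary problem: Ψ(x,y) = y·x, Δ = 0-1 loss (reduces to a linear SVM).
psi = lambda x, y: y * np.asarray(x, dtype=float)
delta = lambda y_true, y: float(y != y_true)
samples = [([1.0, 0.0], +1), ([-1.0, 0.0], -1), ([2.0, 1.0], +1), ([-2.0, -1.0], -1)]
w = train_ssvm(samples, labels=[-1, +1], psi=psi, delta=delta, dim=2)
```

With this choice of Ψ and Δ, prediction argmax_y w·Ψ(x,y) is simply sign(w·x), so the learned w should separate the toy data.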
Recap. Problem formulation: input, output, joint feature vector or scoring function. Learning formulation: loss function (the ‘test’ evaluation criterion). Optimization for learning: cutting plane (loss-augmented inference). Prediction: inference.
Outline. Structured Output SVM · Optimizing Average Precision (AP-SVM) · High-Order Information · Missing Information · Related Work. [Yue, Finley, Radlinski and Joachims, SIGIR 2007]
Problem Formulation. Single input X: Φ(x_i) for all i ∈ P (positives) and Φ(x_k) for all k ∈ N (negatives).
Problem Formulation. Single output R: R_ik = +1 if i is ranked better than k, and -1 if k is ranked better than i.
Problem Formulation. Scoring function: s_i(w) = w^T Φ(x_i) for all i ∈ P, s_k(w) = w^T Φ(x_k) for all k ∈ N, and S(X,R;w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w)).
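The joint score S(X,R;w) is a simple double sum, which is worth seeing concretely. A minimal sketch (helper names are mine; the individual scores s_i(w) = w^T Φ(x_i) are assumed to be precomputed):

```python
def joint_score(scores_pos, scores_neg, R):
    """S(X,R;w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w)),
    with R[i][k] = +1 if positive i is ranked above negative k, else -1."""
    total = 0.0
    for i, s_i in enumerate(scores_pos):
        for k, s_k in enumerate(scores_neg):
            total += R[i][k] * (s_i - s_k)
    return total

def ranking_matrix(scores_pos, scores_neg):
    """The R induced by sorting all samples by score (ties broken toward negatives)."""
    return [[+1 if s_i > s_k else -1 for s_k in scores_neg] for s_i in scores_pos]
```

Note that S rewards every correctly ordered positive/negative pair by its score gap, so the score-sorted R maximizes S over all rankings.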
Learning Formulation. Loss function: Δ(R*, R) = 1 - AP of the ranking R.
Optimization for Learning. Cutting plane computation: an optimal greedy algorithm runs in O(|P||N|) time. [Yue, Finley, Radlinski and Joachims, SIGIR 2007]
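The greedy algorithm of Yue et al. is more involved; as a sketch of what this cutting-plane step must produce, the most-violated ranking argmax_R { Δ(R*,R) + S(X,R;w) } can be found by brute force on tiny problems. This uses the standard observation that within each class the optimal ordering follows the scores, so only the interleaving of positives among negatives needs to be searched (function names and the exhaustive search are mine, not the paper's method):

```python
from itertools import combinations

def ap_from_positions(pos_positions):
    """AP of a ranking given the sorted 1-indexed ranks occupied by positives."""
    precisions = [(h + 1) / rank for h, rank in enumerate(sorted(pos_positions))]
    return sum(precisions) / len(pos_positions)

def most_violated_ranking(scores_pos, scores_neg):
    """argmax_R { Δ(R*,R) + S(X,R;w) } by exhaustive search over interleavings."""
    P = sorted(scores_pos, reverse=True)   # positives stay in score order
    N = sorted(scores_neg, reverse=True)   # negatives stay in score order
    n = len(P) + len(N)
    best, best_val = None, -float('inf')
    for pos_slots in combinations(range(1, n + 1), len(P)):
        neg_slots = [r for r in range(1, n + 1) if r not in pos_slots]
        # R_ik = +1 iff positive i sits above negative k in this interleaving.
        S = sum((+1 if pi < nk else -1) * (p - q)
                for pi, p in zip(pos_slots, P)
                for nk, q in zip(neg_slots, N))
        val = (1.0 - ap_from_positions(pos_slots)) + S
        if val > best_val:
            best_val, best = val, pos_slots
    return best, best_val
```

For one positive (score 0.2) and one negative (score 0.1), the most violated ranking puts the positive last: the small score gap is outweighed by the AP loss of 0.5.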
Ranking. Sort in decreasing order of the individual scores s_i(w). [Yue, Finley, Radlinski and Joachims, SIGIR 2007]
Experiments. PASCAL VOC 2011 images, 10 action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. 10 ranking tasks, cross-validation, Poselets features.
AP-SVM vs. SVM. PASCAL VOC ‘test’ dataset, difference in AP: AP-SVM better in 8 classes, tied in 2 classes.
AP-SVM vs. SVM. Folds of PASCAL VOC ‘trainval’ dataset, difference in AP: AP-SVM is statistically better in 3 classes; SVM is statistically better in 0 classes.
Outline. Structured Output SVM · Optimizing Average Precision · High-Order Information (M4-AP-SVM) · Missing Information · Related Work. [Kumar, Behl, Jawahar and Kumar, Submitted]
High-Order Information. People perform similar actions, people strike similar poses, objects are of the same or similar sizes, “friends” have similar habits. Such cues are commonly exploited for classification; how can we use them for ranking?
Problem Formulation. Input x = {x_1, x_2, x_3}, output y ∈ {-1,+1}³. Joint features Ψ(x,y) = [Ψ₁(x,y); Ψ₂(x,y)], where Ψ₁ are unary features and Ψ₂ are pairwise features.
Learning Formulation. Input x = {x_1, x_2, x_3}, output y ∈ {-1,+1}³. Δ(y*, y) = fraction of incorrectly classified persons.
Optimization for Learning. Input x = {x_1, x_2, x_3}, output y ∈ {-1,+1}³. Compute max_y { w^T Ψ(x,y) + Δ(y*, y) } via graph cuts (if supermodular), LP relaxation, or exhaustive search.
Classification. Input x = {x_1, x_2, x_3}, output y ∈ {-1,+1}³. Compute max_y w^T Ψ(x,y) via graph cuts (if supermodular), LP relaxation, or exhaustive search.
Ranking? Input x = {x_1, x_2, x_3}, output y ∈ {-1,+1}³. Use the difference of max-marginals.
Max-Marginal for Positive Class. mm+(i;w) = max_{y: y_i = +1} w^T Ψ(x,y): the best possible score when person i is positive. Convex in w.
Max-Marginal for Negative Class. mm-(i;w) = max_{y: y_i = -1} w^T Ψ(x,y): the best possible score when person i is negative. Convex in w.
Ranking. s_i(w) = mm+(i;w) - mm-(i;w), a difference-of-convex function of w. Ranking by this difference of max-marginals gives HOB-SVM.
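The max-marginal scores can be computed by exhaustive enumeration for a toy pairwise model. The model form below (a unary term u_i·y_i per person plus a fixed reward whenever two labels agree) is an illustrative assumption, not the paper's exact joint feature vector:

```python
from itertools import product

def score(y, unary, pairwise):
    """w·Ψ(x,y) for a toy pairwise model:
    Σ_i unary[i]·y_i + Σ_{i<j} pairwise[i][j]·[y_i == y_j]."""
    s = sum(u * yi for u, yi in zip(unary, y))
    n = len(y)
    s += sum(pairwise[i][j] for i in range(n) for j in range(i + 1, n)
             if y[i] == y[j])
    return s

def max_marginal(i, label, unary, pairwise):
    """mm(i;w): best score over all y with y_i fixed (exhaustive, fine for small n)."""
    n = len(unary)
    return max(score(y, unary, pairwise)
               for y in product([-1, +1], repeat=n) if y[i] == label)

def ranking_score(i, unary, pairwise):
    """s_i(w) = mm+(i;w) - mm-(i;w): how strongly the model prefers person i positive."""
    return (max_marginal(i, +1, unary, pairwise)
            - max_marginal(i, -1, unary, pairwise))

# Toy instance: three persons, agreement reward 0.3 between every pair.
unary = [1.0, 0.5, -0.2]
pairwise = [[0.0, 0.3, 0.3], [0.0, 0.0, 0.3], [0.0, 0.0, 0.0]]
scores = [ranking_score(i, unary, pairwise) for i in range(3)]
```

In this instance the pairwise terms shift the individual scores without reordering them, but the resulting s_i(w) already reflect how the neighbours' labels interact with each person's evidence.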
Ranking. s_i(w) = mm+(i;w) - mm-(i;w). Why not optimize AP directly? Combining max-marginals with AP optimization yields the Max-Margin Max-Marginal AP-SVM (M4-AP-SVM).
Problem Formulation. Single input X: Φ(x_i) for all i ∈ P (positives) and Φ(x_k) for all k ∈ N (negatives).
Problem Formulation. Single output R: R_ik = +1 if i is ranked better than k, and -1 if k is ranked better than i.
Problem Formulation. Scoring function: s_i(w) = mm+(i;w) - mm-(i;w) for all i ∈ P, s_k(w) = mm+(k;w) - mm-(k;w) for all k ∈ N, and S(X,R;w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w)).
Learning Formulation. Loss function: Δ(R*, R) = 1 - AP of the ranking R.
Optimization for Learning. A difference-of-convex program, solved with a very efficient CCCP: the linearization step uses dynamic graph cuts [Kohli and Torr, ECCV 2006], and the update step is equivalent to AP-SVM. [Kumar, Behl, Jawahar and Kumar, Submitted]
Ranking. Sort in decreasing order of the individual scores s_i(w).
Experiments. PASCAL VOC 2011 images, 10 action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. 10 ranking tasks, cross-validation, Poselets features.
HOB-SVM vs. AP-SVM. PASCAL VOC ‘test’ dataset, difference in AP: HOB-SVM better in 4 classes, worse in 3, tied in 3.
HOB-SVM vs. AP-SVM. Folds of PASCAL VOC ‘trainval’ dataset, difference in AP: HOB-SVM is statistically better in 0 classes; AP-SVM is statistically better in 0 classes.
M4-AP-SVM vs. AP-SVM. PASCAL VOC ‘test’ dataset, difference in AP: M4-AP-SVM better in 7 classes, worse in 2, tied in 1 class.
M4-AP-SVM vs. AP-SVM. Folds of PASCAL VOC ‘trainval’ dataset, difference in AP: M4-AP-SVM is statistically better in 4 classes; AP-SVM is statistically better in 0 classes.
Outline. Structured Output SVM · Optimizing Average Precision · High-Order Information · Missing Information (Latent-AP-SVM) · Related Work. [Behl, Jawahar and Kumar, CVPR 2014]
Fully Supervised Learning
Weakly Supervised Learning Rank images by relevance to ‘jumping’
Two Approaches. (1) Use the Latent Structured SVM with the AP loss: unintuitive prediction, loose upper bound on the loss, NP-hard optimization for cutting planes. (2) Carefully design Latent-AP-SVM: intuitive prediction, tight upper bound on the loss, optimal and efficient cutting plane computation.
Results
Outline. Structured Output SVM · Optimizing Average Precision · High-Order Information · Missing Information (Latent-AP-SVM) · Related Work. [Mohapatra, Jawahar and Kumar, In Preparation]
AP-CNN. [network diagram: conv1–conv5, fc6, fc7, then fc8 with a softmax + cross-entropy loss, versus fcA, fcB with the AP loss] Small but statistically significant improvements.
Questions? Code + Data Available