Optimizing Average Precision using Weakly Supervised Data

Aseem Behl (1), C.V. Jawahar (1) and M. Pawan Kumar (2)
(1) IIIT Hyderabad, India; (2) Ecole Centrale Paris & INRIA Saclay, France

Aim

To estimate accurate model parameters by optimizing average precision (AP) with weakly supervised data.

Notation

- X: input {x_i, i = 1,...,n}
- Y: ranking matrix, such that Y_ij = 1 if x_i is ranked higher than x_j, 0 if x_i and x_j are ranked the same, and -1 if x_i is ranked lower than x_j
- H_P: additional, unknown information for the positives, {h_i, i in P}
- H_N: additional information for the negatives, {h_j, j in N}
- AP(Y, Y^*): the AP of ranking Y with respect to the ground-truth ranking Y^*
- \Delta(Y, Y^*): the AP-loss, defined as 1 - AP(Y, Y^*)

AP-SVM

AP is the most commonly used accuracy measure for binary classification. AP-SVM optimizes the correct AP-loss function as opposed to the 0/1 loss: the 0/1 loss depends only on the number of incorrectly classified samples, whereas the AP-loss depends on the ranking of the samples (see the AP-loss sketch at the end). [Figure: two example rankings with the same 0/1 loss of 0.40 but different AP-loss.]

Prediction: Y_{opt} = \arg\max_Y w^T \Psi(X, Y)

Learning:
\min_w \tfrac{1}{2}\|w\|^2 + C\xi
s.t. \forall Y: w^T \Psi(X, Y^*) - w^T \Psi(X, Y) \ge \Delta(Y, Y^*) - \xi

Latent Structural SVM (LSSVM)

Prediction: (Y_{opt}, H_{opt}) = \arg\max_{Y,H} w^T \Psi(X, Y, H)

Learning:
\min_w \tfrac{1}{2}\|w\|^2 + C\xi
s.t. \forall Y, H: \max_{\hat{H}} \{w^T \Psi(X, Y^*, \hat{H})\} - w^T \Psi(X, Y, H) \ge \Delta(Y, Y^*) - \xi

Disadvantages:
- Prediction: LSSVM uses an unintuitive prediction.
- Learning: LSSVM optimizes a loose upper bound on the AP-loss, and it compares scores between two different sets of additional annotation.
- Optimization: exact loss-augmented inference is computationally inefficient.

Latent AP-SVM

Prediction (see the sketch at the end):
- Step 1: find the best h_i for each sample: H_{opt} = \arg\max_H w^T \Psi(X, Y, H)
- Step 2: sort the samples according to their best scores: Y_{opt} = \arg\max_Y w^T \Psi(X, Y, H_{opt})

Learning (compares scores between the same sets of additional annotation):
\min_w \tfrac{1}{2}\|w\|^2 + C\xi
s.t. \forall Y, H_N: \max_{H_P} \{w^T \Psi(X, Y^*, \{H_P, H_N\}) - w^T \Psi(X, Y, \{H_P, H_N\})\} \ge \Delta(Y, Y^*) - \xi

The constraints of latent AP-SVM are a subset of the LSSVM constraints, so the optimal solution of latent AP-SVM has a lower objective value than the LSSVM solution. Latent AP-SVM therefore provides a valid, and tighter, upper bound on the AP-loss.

Optimization

Loss-augmented inference is efficient:
- Independently choose the additional annotation H_P: complexity O(n_P |H|).
- Maximize over H_N and Y independently: complexity O(n_P n_N).

Learning uses the CCCP algorithm, which guarantees a locally optimal solution (see the sketch at the end):
1. Initialize the parameters w_0.
2. Repeat until convergence:
3. Impute the additional annotations for the positives.
4. Update the parameters using the cutting-plane algorithm.

Results: Action Classification

Dataset: the PASCAL VOC 2011 action classification dataset, with 4846 images depicting 10 action classes; 2424 'trainval' images and 2422 'test' images.

Problem formulation: x is an image of a person performing an action, h is the bounding box of the person, and y is the action class (e.g., y = "Using Computer"). Features: activation scores of action-specific poselets and 4 object activation scores.

5-fold cross-validation (t-test performed): increased performance in 6/10 classes over LSVM and 7/10 classes over LSSVM; overall improvement of 5% over LSVM and 4% over LSSVM.

Performance on the test set: increased performance in all classes over LSVM and in 8/10 classes over LSSVM; overall improvement of 5.1% over LSVM and 3.7% over LSSVM.

We also get improved results on the IIIT 5K-WORD dataset and the PASCAL VOC 2007 object detection dataset. Optimizing the correct loss function is important for weakly supervised learning.

Code and data available at:

Travel grant provided by Microsoft Research India.
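To make the AP-loss concrete, here is a minimal NumPy sketch (not part of the poster) that computes the AP-loss and the 0/1 loss of a binary ranking. The two score vectors are hypothetical, chosen so that both give a 0/1 loss of 0.40 while their AP-losses differ, mirroring the figure described above:

import numpy as np

def average_precision(scores, labels):
    """AP of the ranking induced by scores (labels: 1 = positive, 0 = negative)."""
    order = np.argsort(-scores)               # rank samples by decreasing score
    ranked = labels[order]
    hits = np.cumsum(ranked)                  # positives retrieved up to each rank
    ranks = np.arange(1, len(ranked) + 1)
    pos = ranked == 1
    return float(np.mean(hits[pos] / ranks[pos]))

def ap_loss(scores, labels):
    return 1.0 - average_precision(scores, labels)

def zero_one_loss(scores, labels):
    return float(np.mean((scores > 0).astype(int) != labels))  # threshold at 0

labels   = np.array([1, 1, 0, 0, 0])
scores_a = np.array([0.9, -0.1, 0.8, -0.2, -0.3])  # misranked positive near the top
scores_b = np.array([0.9, -0.3, 0.8, -0.1, -0.2])  # misranked positive at the bottom

for s in (scores_a, scores_b):
    print(zero_one_loss(s, labels), round(ap_loss(s, labels), 3))
# Both rankings have 0/1 loss 0.4, but AP-loss 0.167 vs 0.3:
# the AP-loss depends on where in the ranking the mistakes occur.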
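The two-step latent AP-SVM prediction can be sketched as follows. This is a minimal illustration, not the authors' released code; candidate_features is a hypothetical representation in which each sample carries one joint feature vector per latent choice (e.g., one row per candidate bounding box):

import numpy as np

def latent_ap_svm_predict(w, candidate_features):
    """Two-step latent AP-SVM prediction.

    candidate_features: list of (|H_i|, d) arrays; row k of the i-th array is
    the joint feature vector of sample i with latent choice k.
    Returns the best latent choice per sample and the induced ranking.
    """
    best_h, best_score = [], []
    # Step 1: independently find the best latent variable h_i for each sample.
    for feats in candidate_features:
        scores = feats @ w
        k = int(np.argmax(scores))
        best_h.append(k)
        best_score.append(float(scores[k]))
    # Step 2: sort the samples by their best scores to obtain Y_opt.
    ranking = np.argsort(-np.asarray(best_score))
    return best_h, ranking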
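Finally, a sketch of the CCCP outer loop. The poster's inner step (step 4) solves the latent AP-SVM learning problem with a cutting-plane algorithm; reproducing that solver is beyond a sketch, so a plain hinge-loss SGD update stands in for it here. Only the alternation (impute H_P, then update w) follows the poster, and all names and hyperparameters are illustrative:

import numpy as np

def impute_positives(w, pos_candidates):
    """Step 3: impute h_i = argmax_h w . Psi(x_i, h) for each positive sample."""
    return np.stack([feats[int(np.argmax(feats @ w))] for feats in pos_candidates])

def cccp_train(pos_candidates, neg_features,
               n_outer=10, n_inner=200, lr=0.01, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dim = neg_features.shape[1]
    w = rng.normal(scale=0.01, size=dim)                 # step 1: initialize w_0
    for _ in range(n_outer):                             # step 2: repeat until convergence
        pos_feats = impute_positives(w, pos_candidates)  # step 3: impute annotations
        X = np.vstack([pos_feats, neg_features])
        y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_features))])
        for _ in range(n_inner):                         # step 4: parameter update
            i = int(rng.integers(len(y)))                # (hinge-loss SGD stand-in for
            margin = y[i] * float(X[i] @ w)              #  the cutting-plane solver)
            grad = w / (C * len(y))
            if margin < 1:
                grad = grad - y[i] * X[i]
            w = w - lr * grad
    return w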