Sharing Features Between Visual Tasks at Different Levels of Granularity
Sung Ju Hwang (1), Fei Sha (2) and Kristen Grauman (1)
(1) University of Texas at Austin, (2) University of Southern California

Presentation transcript:

Problem

A single visual instance can have multiple labels at different levels of semantic granularity. For example, an image of a Dalmatian carries the object class label "Dalmatian", superclass labels such as "Dog", "Canine", and "Placental mammal", and attribute labels such as "Spots". → How can we learn new information from these extra labels that can aid object recognition?

Main Idea

We propose to simultaneously learn shared features that are discriminative for tasks at different levels of semantic granularity.

[Figure: the input visual features x_1, ..., x_D are mapped to shared features u_1, ..., u_D on which all classifiers (object classes, attributes, super/subclasses) are trained.]

Motivation: by regularizing the classifiers to use the shared features, we aim to
1) select features that are associated with semantic concepts at each level, and
2) avoid overfitting when object-labeled data is lacking.

Sharing features via sparsity regularization (multitask feature learning)

Sparsity regularization on the parameters across different tasks (object classifiers, attribute classifiers, and classifiers at different granularity levels) results in shared features with better generalization power. Notation: x_n is the n-th feature vector, y_n^t is the n-th label for task t, w_t is the parameter (weight) vector for task t, γ is the regularization parameter, Θ is an orthogonal transformation to a shared feature space, and Ω is a covariance matrix.

How can we promote common sparsity across the different parameters? → We use a (2,1)-norm regularizer that minimizes the L1 norm across tasks [Argyriou08]: the L2 norm taken within each transformed feature dimension encourages joint data fitting across tasks, while the L1 norm taken across dimensions encourages sparsity, so that only a few shared dimensions are used. Combining this regularizer (equivalently, trace-norm regularization) with an SVM loss on the transformed (shared) features gives the objective

  min_{Θ, W}  Σ_t Σ_n L(y_n^t, w_t^T Θ x_n) + γ ||W||_{2,1}^2,   where ||W||_{2,1} = Σ_d ||w^d||_2

and w^d collects the weights of transformed feature dimension d across all tasks.

[Argyriou08] A. Argyriou, T. Evgeniou, M. Pontil. Convex Multi-Task Feature Learning. Machine Learning, 2008.

Convex optimization

However, the (2,1)-norm is nonsmooth. We instead solve an equivalent convex form, in which the shared-feature transformation is replaced with a covariance matrix Ω that measures the relative effectiveness of each feature dimension:

  min_{W, Ω}  Σ_t [ Σ_n L(y_n^t, w_t^T x_n) + γ w_t^T Ω^{-1} w_t ],   subject to Ω ⪰ 0 (positive semidefinite), trace(Ω) ≤ 1,

where the SVM loss is now taken on the original feature space.

Algorithm: learning shared features for linear classifiers

We adopt the alternating optimization algorithm from [Argyriou08], which trains the classifiers and learns the shared features at each step (see the first sketch below):
1) Initialization: initialize the covariance matrix Ω with a scaled identity matrix I/D.
2) Variable updates: transform the variables using Ω, i.e. the transformed n-th feature vector x̃_n = Ω^{1/2} x_n and the transformed classifier weights w̃_t = Ω^{-1/2} w_t.
3) Independent classifier training: solve for the optimal weights W while holding Ω fixed; each task reduces to a standard SVM on the transformed features.
4) Feature learning: update the covariance matrix, Ω = (W W^T + εI)^{1/2} / trace((W W^T + εI)^{1/2}), where ε is a smoothing parameter for numerical stability.
Alternate steps 2)-4) until W converges.

Extension to kernel classifiers (see the second sketch below):
1) Formulate the kernel matrix K.
2) Compute the basis vectors B and the diagonal matrix S using a Gram-Schmidt process.
3) Transform the data according to the learned B and S.
4) Apply the algorithm for linear classifiers to the transformed features.
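To make the alternating procedure concrete, here is a minimal sketch in Python. It is not the authors' released implementation: it assumes binary {-1, +1} labels per task stacked as columns of Y, uses scikit-learn's LinearSVC with a squared hinge loss as the per-task SVM solver, maps the regularization weight γ to C = 1/(2γ), and runs a fixed number of iterations in place of a convergence test; the function name and the hyperparameters gamma, eps, and n_iters are illustrative.

```python
# Minimal sketch of the alternating optimization described above (not the
# authors' code). Assumptions: binary {-1,+1} labels per task in the columns
# of Y, squared-hinge linear SVMs via scikit-learn, fixed iteration count.
import numpy as np
from scipy.linalg import sqrtm
from sklearn.svm import LinearSVC

def learn_shared_features(X, Y, gamma=1.0, eps=1e-4, n_iters=20):
    """X: (N, D) input features; Y: (N, T) task labels (objects, attributes,
    super/subclasses). Returns the weight matrix W and covariance Omega."""
    N, D = X.shape
    T = Y.shape[1]
    Omega = np.eye(D) / D                        # 1) initialize Omega = I / D
    W = np.zeros((D, T))
    for _ in range(n_iters):
        Omega_sqrt = np.real(sqrtm(Omega))
        X_tilde = X @ Omega_sqrt                 # 2) transformed features x~ = Omega^{1/2} x
        for t in range(T):                       # 3) train each task independently, Omega fixed
            svm = LinearSVC(C=1.0 / (2.0 * gamma),   # C = 1/(2*gamma) mirrors the gamma-weighted
                            loss='squared_hinge',    #    L2 penalty up to a constant scaling
                            fit_intercept=False)
            svm.fit(X_tilde, Y[:, t])
            w_tilde = svm.coef_.ravel()
            W[:, t] = Omega_sqrt @ w_tilde       # map back: w = Omega^{1/2} w~
        # 4) feature learning: smoothed covariance update
        M = np.real(sqrtm(W @ W.T + eps * np.eye(D)))
        Omega = M / np.trace(M)
    return W, Omega
```

With Ω fixed, each task is an ordinary L2-regularized SVM on the transformed features, which is why off-the-shelf solvers can be reused; all coupling between tasks enters through the Ω update.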
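For the kernel extension, the poster computes basis vectors B and a diagonal matrix S with a Gram-Schmidt process on the kernel matrix. The sketch below substitutes an eigendecomposition of K (an assumption, not the poster's exact procedure); it produces an empirical feature map Z with Z Z^T ≈ K, which can then be fed to learn_shared_features above. The RBF kernel and its bandwidth are illustrative choices.

```python
# Sketch of the kernel extension (steps 1-4 above), using an eigendecomposition
# of K in place of the Gram-Schmidt step; both yield a factorization K ~= Z Z^T.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def empirical_feature_map(X, kernel_gamma=0.1, tol=1e-8):
    K = rbf_kernel(X, X, gamma=kernel_gamma)   # 1) form the kernel matrix K
    evals, B = np.linalg.eigh(K)               # 2) basis vectors B and spectrum
    keep = evals > tol                         #    drop numerically zero directions
    S = np.diag(np.sqrt(evals[keep]))
    return B[:, keep] @ S                      # 3) transformed data Z, with Z Z^T ~= K

# 4) run the linear algorithm on the transformed data (hypothetical usage):
# Z = empirical_feature_map(X)
# W, Omega = learn_shared_features(Z, Y)
```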
Dataset

  Dataset                  | # images | # classes | # attributes | Hierarchy levels
  Animals with Attributes  | 30,475   | 50 (40)   | 28           | 2

Sharing features between sub/superclasses

Baselines:
1) No sharing: baseline SVM classifier for object class recognition.
2) Sharing-Same level only: sharing features between object classifiers at the same level.
3) Sharing+Superclass: subclass classifiers trained with features shared with their superclasses.*
4) Sharing+Subclass: superclass classifiers trained with features shared with their subclasses.*
*We use the algorithm for kernel classifiers.

Recognition accuracy:
[Figure: recognition accuracy for the sub/superclass sharing experiments.]
1) Finer-grained categorization tasks benefit from sharing features with their superclasses. → A subclass classifier learns features specific to its superclass, so it can better discriminate itself from the classes that do NOT belong to the same superclass.
2) Coarser-grained categorization tasks do not benefit from sharing features with their subclasses. → Features learned for the subclasses capture intra-class variation that only introduces confusion.

Sharing features between objects/attributes

Baselines:
1) No sharing-Obj. (NSO): baseline SVM classifier for object class recognition.
2) No sharing-Attr. (NSA): baseline object recognition on predicted attributes, as in Lampert et al.'s approach.
3) Sharing-Obj.: our multitask feature sharing with the object class classifiers only.
4) Sharing-Attr.: our multitask feature sharing method with object class + attribute classifiers.

Recognition accuracy:
[Figures: recognition accuracy for each class; overall recognition accuracy.]
1) We improve on 33 of the 50 AWA classes, and on all classes of OSR.
2) Classes with more distinct attributes benefit more from feature sharing, e.g. Dalmatian, leopard, giraffe, zebra.
Overall, our approach makes substantial improvements over the baselines. By exploiting the external semantics that the auxiliary attribute tasks provide, our learned features generalize better, particularly when less training data is available.

Example object class / attribute predictions:
[Figure: example predictions on three test images, comparing NSO, NSA, and our method; red indicates an incorrect prediction. Object class predictions (NSO / NSA / Ours): Dolphin / Walrus / Grizzly bear; Grizzly bear / Rhinoceros / Moose; Giant panda / Rabbit / Rhinoceros. Predicted attributes are shown for NSA and Ours.]
1) Our method is more robust to background clutter, since sparsity regularization with attributes yields a more refined set of features.
2) Our method makes robust predictions in atypical cases.
3) When our method fails, it often makes more semantically "close" predictions.

Conclusion / Future Work

By sharing features between classifiers learned at different levels of granularity, we improve object class recognition rates; the exploited semantics effectively regularize the object models. Future work:
1) Automatic selection of useful attributes and superclass groupings.
2) Leveraging the label structure to refine the degree of sharing.