Sharing Features Between Objects and Their Attributes. Sung Ju Hwang (1), Fei Sha (2), and Kristen Grauman (1). (1) University of Texas at Austin, (2) University of Southern California.

Presentation transcript:

Problem
Existing approaches to attribute-based recognition treat attributes either as mid-level features or as semantic labels from which to infer relations. In these conventional models, however, the object class classifiers and attribute classifiers are trained independently, so attribute-labeled data does not directly introduce new information when learning the objects; this separate supervision may reduce the impact of the extra semantic information attributes provide.

Main Idea
We propose to simultaneously learn shared features that are discriminative for both object and attribute tasks. For example, the object class "Dalmatian" should draw on the same visual features as the attributes "White" and "Spots". Motivation: by regularizing the object classifiers to use visual features shared by attributes, we aim to
1) select features associated with generic semantic concepts, and
2) avoid overfitting when object-labeled data is lacking.

Sharing features via sparsity regularization (multitask feature learning)
Each task t (an object class or an attribute) is trained with an SVM loss function on the original feature space, where x_n is the n-th feature vector, y_n^t is the n-th label for task t, and w_t is the parameter (weight) vector for task t. Sparsity regularization on these parameters across the different tasks yields shared features with better generalization power: the input visual features x_1, ..., x_D are mapped by an orthogonal transformation U into shared features u_1, ..., u_D, and a (2,1)-norm regularizer [Argyriou08] promotes common sparsity across tasks, combining an L2 norm over tasks within each shared dimension (joint data fitting) with an L1 norm across dimensions (sparsity).

Convex optimization
The (2,1)-norm is nonsmooth. We instead solve an equivalent convex form, closely related to trace norm regularization, in which the regularizer is expressed through a covariance matrix that measures the relative effectiveness of each feature dimension; a sketch of this formulation is given below.
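A minimal sketch of the formulation from [Argyriou08], in generic notation that may differ from the symbols used on the poster (T tasks, shared orthogonal transform U, per-task weights a_t in the shared space, loss L, regularization parameter gamma):

\min_{U,\,A}\;\sum_{t=1}^{T}\sum_{n=1}^{N_t} L\!\left(y_n^t,\;\mathbf{a}_t^{\top} U^{\top}\mathbf{x}_n\right)\;+\;\gamma\,\lVert A\rVert_{2,1}^{2},
\qquad
\lVert A\rVert_{2,1}=\sum_{d=1}^{D}\Big(\sum_{t=1}^{T}A_{dt}^{2}\Big)^{1/2}.

Writing W = UA for the stacked task weight vectors w_t, an equivalent smooth problem optimizes over a covariance matrix \Omega (positive semidefinite, with trace at most one), which is updated in closed form from the current weights:

\min_{W,\,\Omega}\;\sum_{t=1}^{T}\sum_{n=1}^{N_t} L\!\left(y_n^t,\;\mathbf{w}_t^{\top}\mathbf{x}_n\right)\;+\;\gamma\sum_{t=1}^{T}\mathbf{w}_t^{\top}\Omega^{-1}\mathbf{w}_t,
\qquad
\Omega\;\leftarrow\;\frac{\big(WW^{\top}+\varepsilon I\big)^{1/2}}{\operatorname{tr}\!\big((WW^{\top}+\varepsilon I)^{1/2}\big)},

where \varepsilon is the smoothing parameter used for numerical stability.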
Learning shared features for linear classifiers
We adopt the alternating optimization algorithm from [Argyriou08], which trains the classifiers and learns the shared features at each step:
1) Initialize the covariance matrix Omega with a scaled identity matrix I/D.
2) Transform the variables using Omega, obtaining transformed feature vectors and transformed classifier weights.
3) Solve for the optimal weights while holding Omega fixed.
4) Update Omega from the current weight vectors, using a small smoothing parameter for numerical stability.
Alternate steps 2-4 until the weight matrix W converges.

Extension to kernel classifiers
1) Formulate the kernel matrix K.
2) Compute the basis vectors B and diagonal matrix S using a Gram-Schmidt process.
3) Transform the data according to the learned B and S.
4) Apply the linear-classifier algorithm above to the transformed features.

Experimental results

Datasets
Dataset        Animals with Attributes (AWA)   Outdoor Scene Recognition (OSR)
# images       30,475                          2,688
# classes      50                              8
# attributes   85                              6

Methods compared
1) No sharing-Obj. (NSO): baseline SVM classifier for object class recognition.
2) No sharing-Attr. (NSA): baseline object recognition on predicted attributes, as in Lampert et al.'s approach.
3) Sharing-Obj.: our multitask feature sharing with the object class classifiers only.
4) Sharing-Attr.: our multitask feature sharing with the object class and attribute classifiers jointly.

Recognition accuracy (linear and kernelized variants, on AWA and OSR)
Our approach makes substantial improvements over the baselines. Exploiting the external semantics the auxiliary attribute tasks provide, our learned features generalize better, particularly when less training data is available.

Recognition accuracy for each class
1) We improve on 33 of the 50 AWA classes, and on all classes of OSR.
2) Classes with more distinct attributes benefit more from feature sharing, e.g. Dalmatian, leopard, giraffe, and zebra.

Example object class / attribute predictions (NSO / NSA / Ours; incorrect predictions are marked in red on the poster)
Predicted objects: Dolphin / Walrus / Grizzly bear; Grizzly bear / Rhinoceros / Moose; Giant panda / Rabbit / Rhinoceros.
Predicted attributes (NSA vs. Ours):
- NSA: fast, active, toughskin, chewteeth, forest, ocean, swims. Ours: fast, active, toughskin, fish, forest, meatteeth, strong.
- NSA: strong, inactive, vegetation, quadrupedal, slow, walks, big. Ours: strong, toughskin, slow, walks, vegetation, quadrupedal, inactive.
- NSA: quadrupedal, oldworld, walks, ground, furry, gray, chewteeth. Ours: quadrupedal, oldworld, ground, walks, tail, gray, furry.
Analysis:
1) Our method is more robust to background clutter, since sparsity regularization with attributes yields a more refined set of features.
2) Our method makes robust predictions in atypical cases.
3) When our method fails, it often makes semantically "closer" predictions.

Semantically meaningful prediction
Our method makes more semantically meaningful predictions: it reduces confusion between semantically distinct pairs, and the confusion it does introduce is between semantically close pairs.

Selecting useful attributes for sharing
Are all attributes equally useful? No. Attributes with high mutual information yield better features, as they provide more information for disambiguation.

Conclusion / Future work
By sharing features between objects and their attributes, we improve object class recognition rates; exploiting the auxiliary semantics effectively regularizes the object models. Future work: 1) extension to other semantic labels beyond attributes, and 2) learning to share structurally.

[Argyriou08] A. Argyriou, T. Evgeniou, and M. Pontil, "Convex Multi-Task Feature Learning," Machine Learning, 2008.
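To make the alternating optimization in "Learning shared features for linear classifiers" concrete, here is a minimal NumPy sketch under simplifying assumptions: a squared loss stands in for the SVM hinge loss so that the per-task solve in step 3 has a closed form, and all function and variable names are illustrative rather than taken from the authors' implementation.

import numpy as np


def multitask_shared_features(X, Y, gamma=1.0, eps=1e-6, n_iters=50):
    """Alternating optimization for multitask feature learning [Argyriou08].

    X : (N, D) array of visual features shared by all tasks.
    Y : (N, T) array of labels, one column per task (object classes and
        attributes), e.g. +1/-1 for one-vs-rest tasks.
    Returns W : (D, T) task weight vectors that share a sparse set of
    feature dimensions, and the learned covariance matrix Omega.

    Note: a squared loss replaces the SVM hinge loss used on the poster,
    so that step 3 (solving for W with Omega fixed) has a closed form.
    """
    N, D = X.shape
    T = Y.shape[1]
    Omega = np.eye(D) / D      # 1) initialize covariance with scaled identity I/D
    W = np.zeros((D, T))

    for _ in range(n_iters):
        # 2)+3) With Omega fixed, each task reduces to a ridge-style problem:
        #        w_t = (X^T X + gamma * Omega^{-1})^{-1} X^T y_t
        Omega_inv = np.linalg.inv(Omega + eps * np.eye(D))
        A = X.T @ X + gamma * Omega_inv
        for t in range(T):
            W[:, t] = np.linalg.solve(A, X.T @ Y[:, t])

        # 4) Update Omega from the current weights, smoothed for stability:
        #    Omega = (W W^T + eps I)^{1/2} / tr((W W^T + eps I)^{1/2})
        eigval, eigvec = np.linalg.eigh(W @ W.T + eps * np.eye(D))
        sqrt_wwt = (eigvec * np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.T
        Omega = sqrt_wwt / np.trace(sqrt_wwt)

    return W, Omega

In the poster's setting, the columns of Y would correspond to the object classes plus the attribute tasks, and step 3 would instead be carried out by an SVM solver on the transformed features.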