Geodesic Flow Kernel for Unsupervised Domain Adaptation
Boqing Gong, University of Southern California
Joint work with Yuan Shi, Fei Sha, and Kristen Grauman



Motivation
Mismatch between different domains/datasets:
– Object recognition, e.g. [Torralba & Efros'11, Perronnin et al.'10]
– Video analysis, e.g. [Duan et al.'09, '10]
– Pedestrian detection, e.g. [Dollár et al.'09]
– Other vision tasks
When the training and test sets come from different domains, performance degrades significantly! (Figure: train/test example images from [Saenko et al.'10].)

Unsupervised domain adaptation
– Source domain: labeled
– Target domain: unlabeled
Objective: train a classification model that works well on the target, even though the two distributions are not the same.

Challenges
– How, optimally w.r.t. the target domain, to define a discriminative loss function, select a model, and tune parameters, when the target has no labels?
– How to solve this ill-posed problem? Impose additional structure.

Examples of existing approaches
– Correcting sample bias, e.g. [Shimodaira'00, Huang et al.'06, Bickel et al.'07]. Assumption: the marginal distributions are the only difference between domains.
– Learning transductively, e.g. [Bergamo & Torresani'10, Bruzzone & Marconcini'10]. Assumption: classifiers make high-confidence predictions across domains.
– Learning a shared representation, e.g. [Daumé III'07, Pan et al.'09, Gopalan et al.'11]. Assumption: a latent feature space exists in which classification hypotheses fit both domains.

Our approach: learning a shared representation
Key insight: bridging the gap between source and target
– Fantasize an infinite number of intermediate domains
– Analytically integrate out the idiosyncrasies of the domains
– Learn invariant features by constructing a kernel

Main idea: geodesic flow kernel
1. Model data with linear subspaces
2. Model domain shift with a geodesic flow
3. Derive domain-invariant features with a kernel
4. Classify target data with the new features

Modeling data with linear subspaces
Assume low-dimensional structure in each domain, e.g. via PCA, or Partial Least Squares (source only, since PLS needs labels).
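The PCA variant of this step can be sketched in a few lines of numpy; the function name `pca_basis` is mine, not from the talk, and this assumes rows of `X` are feature vectors:

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of the rows of X (n x D), returned as a
    D x d matrix with orthonormal columns -- one way to realize the
    'linear subspace' that models a domain. A minimal sketch; the slides
    also mention Partial Least Squares for the labeled source domain."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                               # D x d orthonormal basis
```

Each domain (source and target) gets its own basis; the two bases are the inputs to the geodesic-flow construction on later slides.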

Characterizing domains geometrically
Grassmann manifold:
– the collection of all d-dimensional subspaces of a vector space
– each point on the manifold corresponds to one subspace (in particular, the source subspace and the target subspace).

Modeling domain shift with geodesic flow
Geodesic flow on the manifold:
– starts at the source subspace and arrives at the target subspace in unit time
– parameterized by a single parameter t in [0, 1]
– closed-form and easy to compute with an SVD.
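The closed-form flow can be sketched via the principal angles between the two subspaces; a minimal numpy version, assuming `Ps` and `Pt` are D x d matrices with orthonormal columns (the name `geodesic_flow` is mine, not the authors' code):

```python
import numpy as np

def geodesic_flow(Ps, Pt):
    """Geodesic on the Grassmannian from span(Ps) to span(Pt), via one SVD.

    Returns phi(t), a D x d orthonormal basis with span(phi(0)) = span(Ps)
    and span(phi(1)) = span(Pt)."""
    U, c, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(c, -1.0, 1.0))      # principal angles
    A = Ps @ U                                    # rotated source basis
    s = np.sin(theta)
    inv_s = np.where(s > 1e-12, 1.0 / np.where(s > 1e-12, s, 1.0), 0.0)
    # Direction of travel: orthonormal and orthogonal to span(Ps).
    B = (Pt @ Vt.T - A * np.cos(theta)) * inv_s
    return lambda t: A * np.cos(t * theta) + B * np.sin(t * theta)
```

Evaluating `phi` at intermediate `t` gives the intermediate subspaces the following slides talk about.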

Modeling domain shift with geodesic flow (figure): along this flow, each point, a subspace, represents an intermediate domain between source and target.

Domain-invariant features (figures): subspaces near the source end of the flow yield features more similar to the source; near the target end, more similar to the target; in between, a blend of the two.

Measuring feature similarities with inner products
Inner products taken in a subspace near the source are more similar to the source; near the target, more similar to the target. Combining all subspaces along the flow yields similarities invariant to either source or target.

Learning domain-invariant features with kernels
We define the geodesic flow kernel (GFK) as the inner product integrated over the entire flow: x_i^T G x_j = ∫_0^1 (Φ(t)^T x_i)^T (Φ(t)^T x_j) dt.
Advantages:
– analytically computable (closed form)
– robust to variations toward either source or target
– broadly applicable: can kernelize many classifiers.
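The integral above has a closed form in terms of the principal angles. A self-contained numpy sketch (function and variable names are mine; `Ps`, `Pt` are assumed to be D x d orthonormal bases):

```python
import numpy as np

def gfk_matrix(Ps, Pt):
    """Closed-form GFK matrix G with
    x_i^T G x_j = int_0^1 (Phi(t)^T x_i)^T (Phi(t)^T x_j) dt."""
    U, c, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(c, -1.0, 1.0))      # principal angles
    A = Ps @ U                                    # Phi(0)
    s = np.sin(theta)
    inv_s = np.where(s > 1e-12, 1.0 / np.where(s > 1e-12, s, 1.0), 0.0)
    B = (Pt @ Vt.T - A * np.cos(theta)) * inv_s   # geodesic direction
    # Phi(t) = A cos(t*theta) + B sin(t*theta); integrate termwise on [0, 1].
    small = theta <= 1e-12
    sc2 = np.where(small, 0.5,
                   np.sin(2 * theta) / np.where(small, 1.0, 4 * theta))
    i_cc = 0.5 + sc2                              # int cos^2(t*theta) dt
    i_ss = 0.5 - sc2                              # int sin^2(t*theta) dt
    i_sc = np.where(small, 0.0,
                    np.sin(theta) ** 2 / np.where(small, 1.0, 2 * theta))
    return ((A * i_cc) @ A.T + (A * i_sc) @ B.T
            + (B * i_sc) @ A.T + (B * i_ss) @ B.T)
```

`G` is a D x D symmetric positive semidefinite matrix, so `x^T G z` can be plugged into any kernelizable classifier, which is the "broadly applicable" point above.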

Contrast to discrete sampling [Gopalan et al. ICCV 2011]
– GFK (ours): no free parameters.
– [Gopalan et al. ICCV 2011]: requires dimensionality reduction; free parameters include the number of sampled subspaces, the subspace dimensionality, and the dimensionality after reduction.
GFK is conceptually cleaner and computationally more tractable.

Recap of key steps (figure: source subspace, target subspace, and the geodesic flow kernel between them).

Experimental setup
– Four domains: Caltech-256, Amazon, DSLR, Webcam
– Features: bag-of-SURF
– Classifier: 1-NN
– Results averaged over 20 random trials.
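A GFK matrix plugs into 1-NN through the metric it induces, (x - z)^T G (x - z). A minimal sketch of that protocol (the function name is mine; with `G = I` it reduces to plain Euclidean 1-NN):

```python
import numpy as np

def knn1_with_kernel(G, Xs, ys, Xt):
    """1-nearest-neighbor labels for target rows Xt, using the squared
    distance (x - z)^T G (x - z) induced by a kernel matrix G.
    Xs: n_s x D labeled source features, ys: their labels,
    Xt: n_t x D unlabeled target features."""
    ss = np.einsum('ij,jk,ik->i', Xs, G, Xs)      # x^T G x per source row
    tt = np.einsum('ij,jk,ik->i', Xt, G, Xt)      # z^T G z per target row
    d2 = ss[:, None] + tt[None, :] - 2.0 * (Xs @ G @ Xt.T)
    return ys[np.argmin(d2, axis=0)]              # label of nearest source
```

This matches the slides' experimental protocol in spirit: train on labeled source features, predict every target point from its nearest source neighbor under the GFK metric.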

Classification accuracy on the target (figures): accuracy (%) for each source → target pair, built up over three slides.

Which domain should be used as the source? (Candidates: Caltech-256, Amazon, DSLR, Webcam.)

Automatically selecting the best source
We introduce the Rank of Domains (ROD) measure. Intuition:
– geometrically, how the subspaces disagree
– statistically, how the data distributions disagree.
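The geometric half of that intuition can be sketched with principal angles alone. This is a simplified proxy, not the paper's ROD formula, which also folds in how the projected data distributions disagree (the function name is mine):

```python
import numpy as np

def subspace_disagreement(Ps, Pt):
    """Mean principal angle between span(Ps) and span(Pt): 0 for identical
    subspaces, pi/2 for orthogonal ones. Captures only the 'geometrically,
    how subspaces disagree' ingredient of a ROD-style score."""
    c = np.clip(np.linalg.svd(Ps.T @ Pt, compute_uv=False), -1.0, 1.0)
    return float(np.mean(np.arccos(c)))
```

Ranking candidate sources by such a score, lower meaning closer to the target, is the flavor of the selection rule the next slide illustrates.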

Automatically selecting the best source
Possible sources for target Amazon, with our ROD measure: Caltech 0, DSLR 0.26, Webcam 0.05. Caltech-256 adapts the best to Amazon (figure: accuracy (%) per source → target pair).

Semi-supervised domain adaptation: label three instances per category in the target (figure: accuracy (%) per source → target pair).

Analyzing datasets in light of domain adaptation
Cross-dataset generalization [Torralba & Efros'11] (figures: accuracy (%)):
– Without adaptation: Caltech-101 generalizes the worst, and the performance drop of ImageNet is big.
– With adaptation: the performance drop becomes smaller; Caltech-101 still generalizes the worst (with or without adaptation), and there is nearly no performance drop for ImageNet.

Summary
Unsupervised domain adaptation
– important in visual recognition
– challenge: no labeled data from the target
Geodesic flow kernel (GFK)
– conceptually clean formulation: no free parameters
– computationally tractable: closed-form solution
– empirically successful: state-of-the-art results
New insight on vision datasets
– cross-dataset generalization with domain adaptation
– leveraging existing datasets despite their idiosyncrasies

Future work
– Beyond subspaces: other techniques to model domain shift
– From GFK to a statistical flow kernel: add more statistical properties to the flow
– Applications of GFK: e.g., face recognition, video analysis
