Learning visual representations for unfamiliar environments. Kate Saenko, Brian Kulis, Trevor Darrell. UC Berkeley EECS & ICSI.

The challenge of large-scale visual interaction. The last decade has proven the superiority of models learned from data over hand-engineered structures!

Large-scale learning. Unsupervised: learn models from found data, often exploiting multiple modalities (text + image). Example of found text: "The Tote is the perfect example of two handbag design principles that... The lines of this tote are incredibly sleek, but... The semi buckles that form the handle attachments are..."

E.g., finding visual senses [Saenko and Darrell 09]. Artifact sense: telephone. Dictionary: 1: (n) telephone, phone, telephone set (electronic equipment that converts sound into electrical signals that can be transmitted over distances and then converts received signals back into sounds). 2: (n) telephone, telephony (transmitting speech at a distance).

Large-scale learning. Unsupervised: learn models from found data, often exploiting multiple modalities (text + image). Supervised: crowdsource labels (e.g., ImageNet). Example of found text: "The Tote is the perfect example of two handbag design principles that... The lines of this tote are incredibly sleek, but... The semi buckles that form the handle attachments are..."

Yet… Even the best collection of images from the web and strong machine learning methods can often yield poor classifiers on in-situ data! Supervised learning assumption: training distribution == test distribution. Unsupervised learning assumption: the joint distribution is stationary w.r.t. the online world and the real world. This is almost never true!

What You Saw Is Not What You Get. The models fail due to domain shift. Accuracies shown: SVM: 54%, NBNN: 61%; SVM: 20%, NBNN: 19%.

Examples of visual domain shifts: close-up vs. far-away; amazon.com vs. consumer images; Flickr vs. CCTV; digital SLR vs. webcam.

Examples of domain shift: change in camera (digital SLR vs. webcam), feature type (SURF VQ to 300 vs. SIFT VQ to 1000), and dimension (different dimensions).

Solutions? Do nothing (poor performance). Collect all types of data (impossible). Find out what changed (impractical). Learn what changed.

Prior Work on Domain Adaptation. Pre-process the data [Daumé 07]: replicate features to also create source- and domain-specific versions; re-train the learner on the new features. SVM-based methods [Yang07], [Jiang08], [Duan09], [Duan10]: adapt SVM parameters. Kernel mean matching [Gretton09]: re-weight training data to match the test data distribution.
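The feature-replication idea attributed to [Daumé 07] above can be sketched as follows. This is a minimal illustration, assuming the commonly described block layout (a shared copy, a source-only copy, and a target-only copy); the function name augment and the toy vectors are hypothetical and do not come from the slides.

```python
import numpy as np

def augment(features: np.ndarray, domain: str) -> np.ndarray:
    """Replicate features into [shared, source-specific, target-specific] blocks."""
    zeros = np.zeros_like(features)
    if domain == "source":
        return np.hstack([features, features, zeros])
    if domain == "target":
        return np.hstack([features, zeros, features])
    raise ValueError(f"unknown domain: {domain!r}")

# Toy data (assumed): one 2-dimensional example per domain.
source = np.array([[1.0, 2.0]])
target = np.array([[3.0, 4.0]])
aug_source = augment(source, "source")  # shape (1, 6)
aug_target = augment(target, "target")  # shape (1, 6)
```

A standard learner is then re-trained on the augmented vectors. Note that this layout requires both domains to share the same original feature dimension.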

Our paradigm: Transform-based Domain Adaptation. Drawbacks of previous methods: they cannot transfer a learned shift to new categories, and they cannot handle new features. We can do both by learning domain transformations (*). Example: a transformation W between the green and blue domains. * Saenko, Kulis, Fritz, and Darrell. Adapting visual category models to new domains. ECCV, 2010.

Limitations of symmetric transforms. Saenko et al. ECCV10 used metric learning, which implies symmetric transforms and the same features in both domains. The symmetric assumption fails! How do we learn more general shifts?

Latest approach*: asymmetric transforms. For an asymmetric transform (e.g., a rotation), the metric learning model is no longer applicable. We propose to learn asymmetric transforms that map from target to source and handle different dimensions. * Kulis, Saenko, and Darrell. What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. CVPR 2011.

Model Details. Learn a linear transformation to map points from one domain to another; call this transformation W. The source and target data points are collected into two matrices.

Loss Functions. Choose a point x from the source and a point y from the target, and consider the inner product x^T W y. It should be large for similar objects and small for dissimilar objects.
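A minimal sketch of this cross-domain similarity, assuming the inner product takes the form x^T W y with W of shape (source dimension, target dimension). The dimensions, the random placeholder W, and the function name cross_domain_similarity are illustrative assumptions; in practice W would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_source, d_target = 5, 3  # the two domains may differ in dimension
W = rng.standard_normal((d_source, d_target))  # placeholder; W is learned in practice

def cross_domain_similarity(x, W, y):
    """Return the inner product x^T W y between source point x and target point y."""
    return float(x @ W @ y)

x = rng.standard_normal(d_source)  # a source-domain feature vector
y = rng.standard_normal(d_target)  # a target-domain feature vector
sim = cross_domain_similarity(x, W, y)
```

Because W is rectangular, W @ y lives in the source space even when the two feature spaces have different dimensions.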

Loss Functions. The input to the problem includes a collection of m loss functions. General assumption: the loss functions depend on the data only through the inner product matrix.

Regularized Objective Function. Minimize a linear combination of the sum of the loss functions and a regularizer. We use the squared Frobenius norm as the regularizer, but the method is not restricted to this choice.
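One way such an objective might look in code is sketched below. The squared Frobenius regularizer comes from the slide; the squared-hinge form of the pairwise losses, the thresholds lower and upper, the weight lam, the plain gradient-descent solver, and the toy data are all assumptions made for illustration and are not claimed to be the authors' optimizer.

```python
import numpy as np

def objective_and_grad(W, X, Y, pairs, lam=1.0, lower=1.0, upper=0.0):
    """Regularizer plus weighted sum of pairwise losses, and its gradient.

    X: (d_source, n_source), Y: (d_target, n_target); columns are points.
    pairs: (i, j, similar) triples linking source column i to target column j.
    """
    S = X.T @ W @ Y                   # inner product matrix X^T W Y
    value = 0.5 * np.sum(W ** 2)      # squared Frobenius norm regularizer
    grad = W.copy()
    for i, j, similar in pairs:
        # Assumed squared-hinge losses: similar pairs want S >= lower,
        # dissimilar pairs want S <= upper.
        slack = max(0.0, lower - S[i, j]) if similar else max(0.0, S[i, j] - upper)
        sign = -1.0 if similar else 1.0
        value += lam * slack ** 2
        grad += lam * 2.0 * slack * sign * np.outer(X[:, i], Y[:, j])
    return value, grad

def learn_transform(X, Y, pairs, lam=1.0, step=0.01, iters=500):
    """Plain gradient descent on the objective, starting from W = 0."""
    W = np.zeros((X.shape[0], Y.shape[0]))
    for _ in range(iters):
        _, grad = objective_and_grad(W, X, Y, pairs, lam)
        W -= step * grad
    return W

# Toy data (assumed): unit-norm points, 4-dim source and 3-dim target.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
X /= np.linalg.norm(X, axis=0)
Y = rng.standard_normal((3, 6))
Y /= np.linalg.norm(Y, axis=0)
pairs = [(i, i, True) for i in range(6)] + [(i, (i + 1) % 6, False) for i in range(6)]

initial_value, _ = objective_and_grad(np.zeros((4, 3)), X, Y, pairs)
W = learn_transform(X, Y, pairs)
final_value, _ = objective_and_grad(W, X, Y, pairs)
```

The losses touch the data only through the matrix S = X^T W Y, which matches the general assumption stated for the loss functions.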

The Model Has Drawbacks. A linear transformation may be insufficient. The cost of optimization grows as the product of the dimensionalities of the source and target data. What to do?

Kernelization. Main idea: run in kernel space. Use a non-linear kernel function (e.g., an RBF kernel) to learn non-linear transformations in input space. The resulting optimization is independent of the input dimensionality. Additional assumption necessary: the regularizer is a spectral function.

Kernelization, in four parts: the original transformation learning problem; the kernel matrices for source and target; the new kernel problem; and the relationship between the original and new problems at optimality.
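The equations for this slide are not preserved in the transcript. The sketch below assumes the relationship at optimality has the form W = X K_X^(-1/2) L K_Y^(-1/2) Y^T, where K_X and K_Y are the source and target kernel matrices and L is the variable of the kernel problem. Under that assumption, and using linear kernels so the identity can be checked directly, it verifies numerically that X^T W Y equals K_X^(1/2) L K_Y^(1/2). The helper sym_power, the random data, and the random stand-in for L are hypothetical.

```python
import numpy as np

def sym_power(K, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(K)
    return (vecs * vals ** p) @ vecs.T

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))  # 4 source points in 8 dimensions (columns)
Y = rng.standard_normal((6, 3))  # 3 target points in 6 dimensions (columns)
K_X, K_Y = X.T @ X, Y.T @ Y      # linear-kernel matrices (assumed positive definite)

L = rng.standard_normal((4, 3))  # stand-in for a solution of the kernel problem
W = X @ sym_power(K_X, -0.5) @ L @ sym_power(K_Y, -0.5) @ Y.T

lhs = X.T @ W @ Y                                    # inner products in input space
rhs = sym_power(K_X, 0.5) @ L @ sym_power(K_Y, 0.5)  # same quantity from kernels only
```

The variable L has one row per source point and one column per target point, so its size does not depend on the input dimensionality.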

Summary of approach (in input space): 1. Multi-domain data. 2. Generate constraints, learn W. 3. Map via W. 4. Apply to new categories (test points y1, y2).
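Steps 3 and 4 might be sketched as below: a target test point is mapped through W and labeled by its most similar source points. This assumes the similarity is the inner product x^T W y and that a k-nearest-neighbor vote is used; the function names, the hand-picked W, and the toy labels are hypothetical.

```python
import numpy as np

def nearest_source_neighbors(y, W, X_source, k=3):
    """Indices of the k source points most similar to target point y.

    Similarity is the inner product x^T W y; X_source holds one point per column.
    """
    mapped = W @ y                 # target point mapped into the source space
    scores = X_source.T @ mapped   # x_i^T W y for every source point
    return np.argsort(-scores)[:k]

def knn_label(y, W, X_source, labels, k=3):
    """Majority label among the k most similar source points."""
    idx = nearest_source_neighbors(y, W, X_source, k)
    values, counts = np.unique(labels[idx], return_counts=True)
    return values[np.argmax(counts)]

# Toy data (assumed): four 2-dim source points, one 3-dim target query.
X_source = np.array([[1.0, 0.9, 0.0, 0.0],
                     [0.0, 0.1, 1.0, 0.9]])
labels = np.array(["mug", "mug", "phone", "phone"])
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])    # placeholder for a learned transform
y_test = np.array([0.2, 1.0, 5.0])

neighbors = nearest_source_neighbors(y_test, W, X_source)
predicted = knn_label(y_test, W, X_source, labels)
```

Since W is not tied to particular class labels, the same mapping can be applied to test points from categories that were not seen when W was learned.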

Multi-domain dataset

Experimental Setup. We used a standard bag-of-words model. We also used different features in the target domain: SURF vs. SIFT, and different visual word dictionaries. Baseline for comparing such data: KCCA.
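A bag-of-words representation of the kind mentioned above can be sketched as follows: each local descriptor is assigned to its nearest visual word, and the image is described by the normalized word histogram. The function name, the tiny codebook, and the toy descriptors are assumptions; real codebooks would typically be learned by clustering descriptors, and the image is assumed to have at least one descriptor.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Quantize local descriptors to their nearest visual word and histogram them.

    descriptors: (n, d) local features from one image, with n >= 1.
    codebook: (k, d) visual-word centers; the output has length k.
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy data (assumed): a 3-word codebook over 2-dim descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]])
hist = bag_of_words(descriptors, codebook)
```

The histogram length equals the dictionary size, so using different dictionaries in the two domains yields feature vectors of different dimensions.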

Same-Category Results. Baselines (kNN, SVM, metric learning) are explained in the paper. Figure legend: Our Method.

Novel-class experiments. Test the methods' ability to transfer domain shift to unseen classes: train the transform on half of the classes, test on the other half. Figure legend: Our Method (linear), Our Method.
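The half-and-half protocol described above could be set up roughly as follows. This is only a sketch: the random split, the seed, the class names, and both function names are hypothetical, and the slides do not say how the halves were chosen.

```python
import random

def split_classes(class_names, seed=0):
    """Randomly split class names into (transform_training, held_out) halves."""
    shuffled = sorted(class_names)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])

def samples_for_training(samples, train_classes):
    """Keep only labeled samples whose class may be used to learn the transform."""
    return [(feat, label) for feat, label in samples if label in train_classes]

# Toy data (assumed): four classes, one labeled sample each.
classes = ["mug", "phone", "laptop", "bike"]
train_classes, held_out = split_classes(classes)
samples = [([0.1], "mug"), ([0.2], "phone"), ([0.3], "laptop"), ([0.4], "bike")]
kept = samples_for_training(samples, train_classes)
```

The transform is learned only from the kept samples; test queries are then drawn from the held-out classes.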

Extreme shift example. Shown: a query from the target; its nearest neighbors in the source using the transformation; its nearest neighbors in the source using KCCA+KNN.

Conclusion. We should not rely on hand-engineered features any more than we rely on hand-engineered models! Learn feature transformations across domains. We developed a domain adaptation method based on regularized non-linear transforms. The asymmetric transform achieves the best results on more extreme shifts. See Saenko et al. ECCV 2010 and Kulis et al. CVPR 2011; journal version forthcoming.
