Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time. Yong Jae Lee, Alexei A. Efros, and Martial Hebert. Carnegie Mellon University / UC Berkeley. ICCV 2013
Long before the age of “data mining” … where? (botany, geography) when? (historical dating)
when? 1972
where? “The View From Your Window” challenge: Krakow, Poland (Church of Peter & Paul)
Visual data mining in Computer Vision. Most approaches mine globally consistent patterns in the visual world: object category discovery [Sivic et al. 2005, Grauman & Darrell 2006, Russell et al. 2006, Lee & Grauman 2010, Payet & Todorovic 2010, Faktor & Irani 2012, Kang et al. 2012, …] and low-level “visual words” [Sivic & Zisserman 2003, Laptev & Lindeberg 2003, Csurka et al. 2004, …]
Visual data mining in Computer Vision. Recent methods discover specific visual patterns (e.g., Paris vs. Prague, Paris vs. non-Paris): mid-level visual elements [Doersch et al. 2012, Endres et al. 2013, Juneja et al. 2013, Fouhey et al. 2013, Doersch et al. 2013]
Problem: Much in our visual world undergoes a gradual change. Temporal:
Much in our visual world undergoes a gradual change Spatial:
Our Goal: Mine mid-level visual elements in temporally- and spatially-varying data and model their “visual style”. when? (year): historical dating of cars [Kim et al. 2010, Fu et al. 2010, Palermo et al. 2012]. where?: geolocalization of Street View images [Cristani et al. 2008, Hays & Efros 2008, Knopp et al. 2010, Chen & Grauman 2011, Schindler et al. 2012]
Key Idea: 1) Establish connections (create a “closed world”) 2) Model style-specific differences
Approach
Mining style-sensitive elements: Sample patches and compute nearest neighbors (HOG [Dalal & Triggs 2005])
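As a rough illustration of this step, here is a minimal Python sketch that samples random patches, describes them with HOG [Dalal & Triggs 2005], and retrieves each patch's nearest neighbors; the patch size, neighbor count, and all function names are assumptions, not the paper's actual code.

```python
# Minimal sketch (assumed names and parameters, not the paper's code):
# sample random patches, describe them with HOG [Dalal & Triggs 2005],
# and retrieve nearest neighbors over the pool of all patches.
import numpy as np
from skimage.feature import hog
from sklearn.neighbors import NearestNeighbors

PATCH = 64        # patch side in pixels (assumption)
N_NEIGHBORS = 20  # neighbors kept per patch (assumption)

def sample_patches(image, n=50, rng=np.random):
    """Draw n random square patches from a grayscale image (H, W > PATCH)."""
    H, W = image.shape
    ys = rng.randint(0, H - PATCH, n)
    xs = rng.randint(0, W - PATCH, n)
    return [image[y:y + PATCH, x:x + PATCH] for y, x in zip(ys, xs)]

def describe(patches):
    """One HOG descriptor per patch."""
    return np.array([hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for p in patches])

def mine_neighbors(images, labels):
    """images: grayscale arrays; labels[i]: decade label of image i."""
    patches, patch_labels = [], []
    for img, lab in zip(images, labels):
        ps = sample_patches(img)
        patches += ps
        patch_labels += [lab] * len(ps)
    X = describe(patches)
    nn = NearestNeighbors(n_neighbors=N_NEIGHBORS + 1).fit(X)
    _, idx = nn.kneighbors(X)                # idx[:, 0] is the patch itself
    return X, np.array(patch_labels), idx[:, 1:]
```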
Mining style-sensitive elements. [Figure: example patches with their nearest neighbors, contrasting style-sensitive patches (visually tight matches concentrated in time) with style-insensitive patches (matches spread uniformly across decades).]
Mining style-sensitive elements: rank patch clusters by the entropy of their date distribution. (a) Peaky (low-entropy) clusters are style-sensitive; (b) uniform (high-entropy) clusters are style-insensitive.
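A minimal sketch of the entropy ranking suggested by these slides: histogram each patch's neighbor labels over the decades and score by entropy, keeping the peaky (low-entropy) clusters. The decade list and cutoff are illustrative.

```python
# Sketch of the entropy ranking: histogram each patch's neighbor labels
# over the decades and score by entropy; low entropy = peaky = kept.
import numpy as np
from scipy.stats import entropy

def rank_by_entropy(patch_labels, neighbor_idx, decades):
    scores = []
    for nbrs in neighbor_idx:
        counts = np.array([(patch_labels[nbrs] == d).sum() for d in decades],
                          dtype=float)
        scores.append(entropy(counts))       # scipy normalizes the histogram
    order = np.argsort(scores)               # ascending: peakiest first
    return order, np.asarray(scores)

# X, patch_labels, neighbor_idx = mine_neighbors(images, labels)
# order, H = rank_by_entropy(patch_labels, neighbor_idx, range(1920, 2000, 10))
# style_sensitive = order[:1000]             # top-ranked clusters (assumed cutoff)
```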
Making visual connections: Take the top-ranked (peaky) clusters to build correspondences across the 1920s–1990s dataset.
Making visual connections: Train a detector (HOG + linear SVM) [Singh et al. 2012] on a cluster (e.g., 1920s), using a natural-world “background” dataset as negatives.
Making visual connections: Top detection per decade, 1920s–1990s [Singh et al. 2012].
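A sketch of this detector step in the spirit of [Singh et al. 2012]: a linear SVM trained on a cluster's HOG patches against background negatives, then scored per decade. The C value and helper names are assumptions.

```python
# Sketch: linear SVM on a cluster's HOG patches (positives) vs. the
# natural-world "background" set (negatives), then the single
# highest-scoring detection in each decade.
import numpy as np
from sklearn.svm import LinearSVC

def train_element_detector(pos_feats, bg_feats):
    X = np.vstack([pos_feats, bg_feats])
    y = np.r_[np.ones(len(pos_feats)), np.zeros(len(bg_feats))]
    return LinearSVC(C=0.1).fit(X, y)        # C is an assumption

def top_detection_per_decade(det, feats, feat_decades, decades):
    """Index of the highest-scoring patch in each decade."""
    scores = det.decision_function(feats)
    return {d: int(np.argmax(np.where(feat_decades == d, scores, -np.inf)))
            for d in decades}
```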
Making visual connections: We expect style to change gradually, so the detector is extended one neighboring decade at a time (1920s → 1930s → 1940s → …), again using the natural-world “background” dataset.
Making visual connections: Top detection per decade, 1920s–1990s.
Making visual connections: Initial model (1920s) vs. final model; initial model (1940s) vs. final model.
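One plausible reading of this slide sequence, as a sketch: grow the detector outward from its starting decade, adding its top detections in each neighboring decade as new positives and retraining. It reuses train_element_detector from the sketch above; per_decade and the growth order are assumptions.

```python
# Sketch of growing a visual connection across decades (assumed
# interpretation of the slides, not the paper's exact procedure).
import numpy as np

def grow_connection(init_feats, bg_feats, feats, feat_decades,
                    start=1920, decades=range(1920, 2000, 10), per_decade=5):
    positives = list(init_feats)
    det = train_element_detector(np.array(positives), bg_feats)
    for d in sorted(decades, key=lambda d: abs(d - start))[1:]:
        scores = det.decision_function(feats)
        in_d = np.flatnonzero(feat_decades == d)
        best = in_d[np.argsort(scores[in_d])[-per_decade:]]  # top hits in d
        positives += [feats[i] for i in best]
        det = train_element_detector(np.array(positives), bg_feats)
    return det
```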
Results: Example connections
Training style-aware regression models: support vector regressors with Gaussian kernels, one per visual element (regression model 1, regression model 2, …). Input: HOG; output: date/geo-location.
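The per-element regressor described here maps HOG to a date (or location) with an RBF-kernel support vector regressor; a minimal sketch with illustrative hyperparameters:

```python
# Per-element style regressor as stated on the slide: support vector
# regression with a Gaussian (RBF) kernel, HOG in, date/geo-location out.
from sklearn.svm import SVR

def train_style_regressor(hog_feats, targets):
    return SVR(kernel='rbf', C=1.0, gamma='scale').fit(hog_feats, targets)

# reg = train_style_regressor(X_element, years)        # years in 1920-1999
# predicted_year = reg.predict(hog_patch.reshape(1, -1))
```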
Training style-aware regression models: train an image-level regression model using the outputs of the visual element detectors and regressors as features (detector → regression → output, per element).
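A sketch of this image-level stage under the stated design: each element contributes its top detection score and that patch's regressed date/location to an image feature vector, on which one final regressor is trained. The feature layout and names are assumptions.

```python
# Sketch: concatenate each element's top detection score and its
# regressor's prediction into an image-level feature vector.
import numpy as np
from sklearn.svm import SVR

def image_features(patch_hogs, detectors, regressors):
    f = []
    for det, reg in zip(detectors, regressors):
        scores = det.decision_function(patch_hogs)
        best = int(np.argmax(scores))
        f += [scores[best], reg.predict(patch_hogs[best:best + 1])[0]]
    return np.array(f)

# X = np.array([image_features(p, detectors, regressors) for p in all_images])
# final_model = SVR(kernel='rbf').fit(X, image_dates)
```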
Results
Results: Date/Geo-location prediction. Cars: 13,473 images tagged with year, 1920–1999. Street View: 4,455 images crawled from Google Street View, tagged with GPS coordinates, N. Carolina to Georgia.
Results: Date/Geo-location prediction. Mean Absolute Prediction Error:

                                       Cars (years)   Street View (miles)
Ours                                   8.56           77.66
Doersch et al. [ECCV, SIGGRAPH 2012]   –              –
Spatial pyramid matching               –              –
Dense SIFT bag-of-words                –              –
Results: Learned styles Average of top predictions per decade
Extra: Fine-grained recognition. Mean classification accuracy on the Caltech-UCSD Birds 2011 dataset: Ours vs. Zhang et al. (CVPR 2012), Berg & Belhumeur (CVPR 2013), Zhang et al. (ICCV 2013), Chai et al. (ICCV 2013), and Gavves et al. (ICCV 2013); methods grouped into weak-supervision vs. strong-supervision.
Conclusions: Models visual style, i.e., appearance correlated with time/space. First establish visual connections to create a closed world, then focus on style-specific differences.
Thank you! Code and data will be available at