
1 Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time. Yong Jae Lee, Alexei A. Efros, and Martial Hebert. Carnegie Mellon University / UC Berkeley, ICCV 2013

2 Long before the age of “data mining” … where? (botany, geography) when? (historical dating)

3 when? 1972

4 where? “The View From Your Window” challenge: Krakow, Poland (Church of Peter & Paul)

5 Visual data mining in Computer Vision Visual world Most approaches mine globally consistent patterns. Object category discovery [Sivic et al. 2005, Grauman & Darrell 2006, Russell et al. 2006, Lee & Grauman 2010, Payet & Todorovic 2010, Faktor & Irani 2012, Kang et al. 2012, …] Low-level “visual words” [Sivic & Zisserman 2003, Laptev & Lindeberg 2003, Csurka et al. 2004, …]

6 Visual data mining in Computer Vision Recent methods discover specific visual patterns Paris Prague Visual world Paris non-Paris Mid-level visual elements [Doersch et al. 2012, Endres et al. 2013, Juneja et al. 2013, Fouhey et al. 2013, Doersch et al. 2013]

7 Problem Much in our visual world undergoes a gradual change. Temporal: 1887-1900, 1900-1941, 1941-1969, 1958-1969, 1969-1987

8 Much in our visual world undergoes a gradual change Spatial:

9 Our Goal Mine mid-level visual elements in temporally- and spatially-varying data and model their “visual style”. when? Historical dating of cars (1920-2000) [Kim et al. 2010, Fu et al. 2010, Palermo et al. 2012]. where? Geolocalization of Street View images [Cristani et al. 2008, Hays & Efros 2008, Knopp et al. 2010, Chen & Grauman 2011, Schindler et al. 2012]

10 Key Idea 1) Establish connections 2) Model style-specific differences (1926, 1947, 1975; the “closed-world”)

11 Approach

12 Mining style-sensitive elements Sample patches and compute nearest neighbors (HOG features [Dalal & Triggs 2005])
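In code, the sampling-and-matching step above could be sketched as follows. The gradient-orientation histogram here is only a crude stand-in for a real HOG descriptor, and the patches are synthetic:

```python
import numpy as np

def hog_like(patch, n_bins=9):
    """Crude gradient-orientation histogram over a grayscale patch
    (a simplified stand-in for the HOG descriptor of Dalal & Triggs)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                     # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-8)          # L2-normalize

def nearest_neighbors(query_feat, feats, k=5):
    """Indices of the k nearest patches by Euclidean distance."""
    d = np.linalg.norm(feats - query_feat, axis=1)
    return np.argsort(d)[:k]

# toy usage: 100 random 16x16 "patches"
rng = np.random.default_rng(0)
patches = rng.random((100, 16, 16))
feats = np.stack([hog_like(p) for p in patches])
nn = nearest_neighbors(feats[0], feats, k=5)             # nn[0] is the query itself
```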

13 Mining style-sensitive elements Patch / Nearest neighbors

14 Mining style-sensitive elements Patch / Nearest neighbors: style-sensitive

15 Mining style-sensitive elements Patch / Nearest neighbors: style-insensitive

16 Mining style-sensitive elements Patch / Nearest neighbors: 1929 1927 1929 1923 1930; 1999 1947 1971 1938 1973; 1946 1948 1940 1939 1949; 1937 1959 1957 1981 1972

17 Mining style-sensitive elements Patch / Nearest neighbors. tight: 1929 1927 1929 1923 1930; uniform: 1999 1947 1971 1938 1973; 1946 1948 1940 1939 1949; 1937 1959 1957 1981 1972

18 Mining style-sensitive elements (a) Peaky (low-entropy) clusters: 1930 1924 1930 1931 1932 1929 1930; 1966 1981 1969 1972 1973 1969 1987; 1998 1969 1981 1970

19 193919211948 1999196319301956 1962194119851995 1932197019911962 19231937 1982 1983192219481933 (b) Uniform (high-entropy) clusters Mining style-sensitive elements

20 Making visual connections Take top-ranked clusters to build correspondences across the dataset (1920s - 1990s; shown: 1920s, 1940s)

21 Making visual connections Train a detector (HOG + linear SVM) [Singh et al. 2012] with the cluster's patches (1920s) as positives against a natural world “background” dataset
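A rough sketch of this training step. The paper uses HOG + linear SVM detectors in the style of Singh et al. 2012; here a tiny hinge-loss SGD trainer on synthetic features stands in for an off-the-shelf SVM solver:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Tiny hinge-loss SGD trainer: a numpy stand-in for a proper
    linear SVM. y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:            # inside the margin
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                    # only regularize
                w -= lr * lam * w
    return w, b

# toy data: cluster patches (positives) vs. "background" dataset (negatives)
rng = np.random.default_rng(1)
pos = rng.normal(+1.0, 0.3, size=(40, 8))
neg = rng.normal(-1.0, 0.3, size=(200, 8))
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [-1] * 200)
w, b = train_linear_svm(X, y)
scores = X @ w + b                                   # detector responses
acc = np.mean(np.sign(scores) == y)
```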

22 Making visual connections Top detection per decade: 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s [Singh et al. 2012]

23 Making visual connections We expect style to change gradually… (1920s, 1930s, 1940s; natural world “background” dataset)

24 Making visual connections Top detection per decade: 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s

25 Making visual connections Top detection per decade: 1920s 1930s 1940s 1950s 1960s 1970s 1980s 1990s

26 Making visual connections Initial model (1920s) to final model; initial model (1940s) to final model
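The gradual decade-by-decade expansion from initial to final model could be sketched like this. A mean template stands in for the retrained SVM detector, and the slowly drifting Gaussian features are purely illustrative:

```python
import numpy as np

def expand_model(model_feats, decade_feats, k=3):
    """One expansion step: score candidate patches from the adjacent
    decade against the current model (a mean template here, a deliberate
    simplification of retraining the detector) and absorb the top k."""
    template = model_feats.mean(axis=0)
    scores = decade_feats @ template
    top = np.argsort(scores)[::-1][:k]
    return np.vstack([model_feats, decade_feats[top]])

rng = np.random.default_rng(2)
model = rng.normal(1.0, 0.2, size=(10, 6))           # initial model (1920s)
# seven later decades whose appearance drifts gradually
decades = [rng.normal(1.0 - 0.1 * t, 0.2, size=(50, 6)) for t in range(1, 8)]
for feats in decades:                                # 1930s ... 1990s
    model = expand_model(model, feats)               # absorb the drift
final_model = model
```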

27 Results: Example connections

28 Training style-aware regression models (regression model 1, regression model 2) Support vector regressors with Gaussian kernels. Input: HOG; output: date / geo-location
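The per-element regressors might look like the following. Kernel ridge regression stands in for the paper's support vector regressors (same Gaussian kernel, simpler solver), and the feature-to-year mapping is synthetic:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_gaussian_regressor(X, y, gamma=0.5, lam=1e-3):
    """Kernel ridge regression with a Gaussian kernel; returns a
    predictor. A compact stand-in for an SVR with the same kernel."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xq: rbf_kernel(Xq, X, gamma) @ alpha

# toy setup: per-element appearance features -> year
rng = np.random.default_rng(3)
X = rng.random((60, 5))                  # stand-in for HOG descriptors
years = 1920 + 80 * X[:, 0]              # year driven by the first feature
predict = fit_gaussian_regressor(X, years)
train_err = np.abs(predict(X) - years).mean()
```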

29 Training style-aware regression models Train an image-level regression model using the outputs of the visual element detectors and regressors as features (each element contributes a detector score and a regression output)
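The image-level stage can be sketched by stacking hypothetical detector confidences and per-element date estimates into one feature vector; plain least squares replaces the paper's image-level regressor, and all numbers are synthetic:

```python
import numpy as np

# Hypothetical per-element outputs: for each image, each visual element
# detector fires with some confidence, and its style-aware regressor
# emits a date estimate.
rng = np.random.default_rng(4)
n_images, n_elements = 200, 6
true_year = rng.uniform(1920, 2000, n_images)
det_conf = rng.random((n_images, n_elements))            # detector scores
elem_pred = true_year[:, None] + rng.normal(0, 8, (n_images, n_elements))

# stack into one image-level feature vector (plus a bias column)
features = np.hstack([det_conf, elem_pred, np.ones((n_images, 1))])
coef, *_ = np.linalg.lstsq(features, true_year, rcond=None)
image_pred = features @ coef
mae = np.abs(image_pred - true_year).mean()              # beats any single element
```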

30 Results

31 Results: Date/Geo-location prediction. Cars: 13,473 images crawled from www.cardatabase.net, tagged with year (1920 - 1999). Street View: 4,455 images crawled from Google Street View, tagged with GPS coordinates (N. Carolina to Georgia)

32 Results: Date/Geo-location prediction. Mean Absolute Prediction Error (Cars crawled from www.cardatabase.net; Street View crawled from Google Street View):

                      Ours    Doersch et al.        Spatial pyramid   Dense SIFT
                              ECCV, SIGGRAPH 2012   matching          bag-of-words
Cars (years)          8.56    9.72                  11.81             15.39
Street View (miles)   77.66   87.47                 83.92             97.78

33 Results: Learned styles Average of top predictions per decade

34 Extra: Fine-grained recognition. Mean classification accuracy on the Caltech-UCSD Birds 2011 dataset (weak- vs. strong-supervision methods):

Ours                        41.01
Zhang et al. CVPR 2012      28.18
Berg, Belhumeur CVPR 2013   56.89
Zhang et al. ICCV 2013      50.98
Chai et al. ICCV 2013       59.40
Gavves et al. ICCV 2013     62.70

35 Conclusions Model visual style: appearance correlated with time/space. First establish visual connections to create a closed world, then focus on style-specific differences

36 Thank you! Code and data will be available at www.eecs.berkeley.edu/~yjlee22


