Opportunities of Scale
Computer Vision, James Hays, Brown
Many slides from James Hays, Alyosha Efros, and Derek Hoiem. Graphic from Antonio Torralba.

Opportunities of Scale: Data-driven Methods
Today's class:
– Scene completion
– im2gps
Next class:
– Recognition via Tiny Images
– More recognition by association

Google and massive data-driven algorithms
A.I. for the postmodern world:
– all questions have already been answered… many times, in many ways
– Google is dumb; the "intelligence" is in the data

Google Translate

Chinese Room, John Searle (1980)
If a machine can convincingly simulate an intelligent conversation, does it necessarily understand? In the thought experiment, Searle imagines himself in a room, acting as a computer by manually executing a program that convincingly simulates the behavior of a native Chinese speaker.
Most of the discussion since has consisted of attempts to refute it. "The overwhelming majority," notes BBS editor Stevan Harnad, "still think that the Chinese Room Argument is dead wrong." The sheer volume of literature that has grown up around it inspired Pat Hayes to quip that cognitive science ought to be redefined as "the ongoing research program of showing Searle's Chinese Room Argument to be false."

Big Idea
What if invariance / generalization isn't actually the core difficulty of computer vision? What if we can perform high-level reasoning with brute-force, data-driven algorithms?

Image Completion Example [Hays and Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007 and CACM October 2008.]

What should the missing region contain?

Which is the original? (a) (b) (c)

How it works:
– Find a similar image from a large dataset
– Blend a region from that image into the hole

General Principle
[Diagram: input image → image matching against a huge dataset of images with associated info → info from the most similar images]
Hopefully, if you have enough images, the dataset will contain very similar images that you can find with simple matching methods.
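
To make the principle concrete, the matching step can be as simple as brute-force nearest neighbors over fixed-length image descriptors. A minimal sketch (function and array names are illustrative, not from any released code):

```python
import numpy as np

def find_most_similar(query_desc, dataset_descs, k=200):
    """Brute-force L2 nearest neighbors: return indices of the k
    dataset images whose descriptors best match the query."""
    # dataset_descs: (N, D) array with one descriptor per image
    dists = np.linalg.norm(dataset_descs - query_desc, axis=1)
    return np.argsort(dists)[:k]
```

With enough images, even this naive matching tends to surface scenes that are semantically and structurally close to the query.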

How many images is enough?

Nearest neighbors from a collection of 20 thousand images

Nearest neighbors from a collection of 2 million images

Image Data on the Internet
Flickr (as of Sept. 19th, 2010):
– 5 billion photographs
– 100+ million geotagged images
Facebook (as of 2009):
– 15 billion photographs

Image Data on the Internet
Flickr (as of Nov 2013):
– 10 billion photographs
– 100+ million geotagged images
– 3.5 million uploaded per day
Facebook (as of Sept 2013):
– 250+ billion photographs
– 300 million uploaded per day
Instagram:
– 55 million uploaded per day

Image completion: how it works [Hays and Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007 and CACM October 2008.]

The Algorithm

Scene Matching

Scene Descriptor

Scene Gist Descriptor (Oliva and Torralba 2001)

Scene Descriptor: gist (Oliva and Torralba 2001) + coarse color layout
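
The gist descriptor summarizes a scene by pooling oriented filter energy over a coarse spatial grid. A rough approximation of the idea (not the original Oliva-Torralba code; the frequencies, orientation count, and grid size below are arbitrary choices):

```python
import numpy as np
from skimage.filters import gabor
from skimage.transform import resize

def gist_like_descriptor(gray_img, freqs=(0.1, 0.25, 0.4), n_orient=6, grid=4):
    """Pool Gabor filter energy over a grid x grid spatial layout."""
    feats = []
    for freq in freqs:
        for i in range(n_orient):
            real, imag = gabor(gray_img, frequency=freq,
                               theta=i * np.pi / n_orient)
            energy = np.hypot(real, imag)
            # downsample the energy map to a coarse grid (approximate pooling)
            feats.append(resize(energy, (grid, grid)).ravel())
    return np.concatenate(feats)
```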

2 Million Flickr Images

… 200 total

Context Matching

Graph cut + Poisson blending
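
The graph cut chooses a seam through the overlap region that is cheapest to hide; Poisson blending then matches gradients across that seam so the paste is consistent in color. As a rough stand-in for the paper's pipeline, OpenCV's built-in seamless cloning performs the Poisson step (the mask is assumed to come from the graph cut):

```python
import cv2
import numpy as np

def blend_patch(target, match_img, mask):
    """Poisson-blend the masked region of match_img into target.
    mask: uint8, 255 inside the region chosen by the graph cut."""
    ys, xs = np.nonzero(mask)
    center = (int(xs.mean()), int(ys.mean()))  # placement point in target
    return cv2.seamlessClone(match_img, target, mask, center, cv2.NORMAL_CLONE)
```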

Result Ranking
We assign each of the 200 results a score which is the sum of:
– the scene matching distance
– the context matching distance (color + texture)
– the graph cut cost
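
In code, the ranking is just a sort on the summed cost; the dictionary keys below are invented for illustration:

```python
def rank_results(candidates):
    """Sort the ~200 candidate composites by total cost (lower is better)."""
    return sorted(candidates,
                  key=lambda c: (c['scene_dist'] +
                                 c['context_dist'] +
                                 c['graphcut_cost']))
```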

… 200 scene matches

Which is the original?

Diffusion Result

Efros and Leung result

Scene Completion Result

im2gps (Hays & Efros, CVPR 2008): 6 million geo-tagged Flickr images

How much can an image tell about its geographic location?

Nearest Neighbors according to gist + bag of SIFT + color histogram + a few other features
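
One way to read this slide: each image gets several descriptors, and the overall distance is a weighted combination of per-feature distances. A hedged sketch (the feature names and weights are invented for illustration):

```python
import numpy as np

def combined_distance(query, database, weights):
    """query: dict name -> (D,) vector; database: dict name -> (N, D) array.
    Returns an (N,) array of weighted, summed L2 distances."""
    return sum(w * np.linalg.norm(database[name] - query[name], axis=1)
               for name, w in weights.items())
```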

im2gps

Example Scene Matches

Voting Scheme
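
The voting idea: retrieve the query's k nearest neighbors from the geotagged collection and let their GPS coordinates vote for a location. A simplified grid-vote sketch (the paper itself uses mean-shift clustering over the neighbor positions; the cell size here is arbitrary):

```python
import numpy as np

def geolocate_by_voting(nn_latlons, cell_deg=5.0):
    """nn_latlons: list of (lat, lon) for the k nearest neighbors.
    Returns the mean position inside the most-voted lat/lon grid cell."""
    cells = {}
    for lat, lon in nn_latlons:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells.setdefault(key, []).append((lat, lon))
    winners = max(cells.values(), key=len)  # cell with the most votes
    return tuple(np.mean(winners, axis=0))
```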

im2gps

Effect of Dataset Size

Population density ranking
[Figure: example images ranked from high predicted population density to low predicted population density]

Where is This? [Olga Vesselova, Vangelis Kalogerakis, Aaron Hertzmann, James Hays, Alexei A. Efros. Image Sequence Geolocation. ICCV’09]

Where is This?

Where are These? Two photos, timestamped 15:14 and …:31 on June 18th, 2006.

Where are These? Three photos: 15:14 and …:31 on June 18th, 2006, and …:24 on June 19th, 2006.

Results
– im2gps: 10% of queries geolocated within 400 km
– temporal im2gps: 56%