Visual Data on the Internet. With slides from Alexei Efros, James Hays, Antonio Torralba, and Frederic Heger. 15-463: Computational Photography, Jean-Francois Lalonde, CMU, Fall 2012. Title image: visualization of 53,464 English nouns, credit: A. Torralba.

The Art of Cassandra Jones

Big Issues What is out there on the Internet? How do we get it? What can we do with it?

Subject-specific Data Photos of Coliseum (Snavely et al.) Portraits of Bill Clinton

Much of Captured World is “Generic”

Generic Data: pedestrians, faces, street scenes, food plates

The Internet as a Data Source

How big is Flickr? 100M photos updated daily; 6B photos as of August 2011! ~3B public photos. Credit: Franck_Michel.

How Annotated is Flickr? (tag search) Party – 23,416,126 Paris – 11,163,625 Pittsburgh – 1,152,829 Chair – 1,893,203 Violin – 233,661 Trashcan – 31,200

“Trashcan” Results (Flickr tag search)

Big Issues What is out there on the Internet? How do we get it? What can we do with it? Let’s see a motivating example...

[Hays and Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007 and CACM October 2008.]

Diffusion Result

Efros and Leung result

Scene Matching for Image Completion

Scene Completion Result

The Algorithm

Scene Matching

Scene Descriptor

Scene Gist Descriptor (Oliva and Torralba 2001)

Scene Descriptor + Scene Gist Descriptor (Oliva and Torralba 2001)
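The gist descriptor summarizes a scene by pooling oriented filter responses over a coarse spatial grid. The sketch below is not the Oliva–Torralba implementation, just a rough numpy/scipy illustration of that idea (the function name, the two scales, and the two Sobel "orientations" are my simplifications):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def gistlike_descriptor(gray, grid=4):
    """Very rough gist-style descriptor (illustrative only):
    oriented gradient energy at two scales, averaged over a grid x grid layout."""
    feats = []
    for sigma in (1.0, 4.0):                      # two scales
        smooth = gaussian_filter(gray.astype(float), sigma)
        gx, gy = sobel(smooth, axis=1), sobel(smooth, axis=0)
        for resp in (np.abs(gx), np.abs(gy)):     # two crude "orientations"
            h, w = resp.shape
            for i in range(grid):                 # coarse spatial pooling
                for j in range(grid):
                    cell = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                    feats.append(cell.mean())
    v = np.array(feats)
    return v / (np.linalg.norm(v) + 1e-8)         # L2-normalize
```

Comparing two such vectors with Euclidean distance already captures coarse scene layout, which is what the scene matching step relies on.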

2 Million Flickr Images

… 200 total

Context Matching

Graph cut + Poisson blending

Result Ranking. We assign each of the 200 results a score which is the sum of: the scene matching distance, the context matching distance (color + texture), and the graph cut cost.
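A toy sketch of that ranking rule (the dictionary keys and equal weighting are illustrative assumptions, not taken from the paper's code):

```python
def rank_candidates(candidates):
    """candidates: list of dicts with 'scene_dist', 'context_dist', 'cut_cost'.
    Returns candidates sorted best-first by total cost."""
    def total_cost(c):
        return c['scene_dist'] + c['context_dist'] + c['cut_cost']
    return sorted(candidates, key=total_cost)   # lowest total cost first
```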

… 200 scene matches

Nearest neighbors from a collection of 20 thousand images

Nearest neighbors from a collection of 2 million images

“Unreasonable Effectiveness of Data.” Parts of our world can be explained by elegant mathematics: physics, chemistry, astronomy, etc. But much cannot: psychology, economics, genetics, etc. Enter the data! Great advances in several fields, e.g. speech recognition and machine translation. Case study: Google [Halevy, Norvig, Pereira 2009].

A.I. for the postmodern world: all questions have already been answered… many times, in many ways. Google is dumb; the “intelligence” is in the data.

How about visual data? Text is simple: clean, segmented, compact, 1D. Visual data is much harder: noisy, unsegmented, high entropy, 2D/3D. Quick overview: comparing images, uses of visual data, the dangers of data.

Distance Metrics. Euclidean distance of 5 units (points in x-y space); gray value distance of 50 levels; distance between two images = ?

SSD says these are not similar?
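SSD (sum of squared differences) is the simplest pixel-wise distance; a minimal numpy sketch, not tied to any particular codebase:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equal-size images (any number of channels)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum((a - b) ** 2)
```

Because SSD compares images pixel-by-pixel, a small spatial shift or lighting change between otherwise similar photos can produce a large distance, which is why it judges perceptually similar images as "not similar."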

Tiny Images A. Torralba, R. Fergus, and W. T. Freeman, “80 million tiny images: a large dataset for non-parametric object and scene recognition,” PAMI, 2008.

Image Segmentation (by humans)

Human Scene Recognition

Tiny Images Project Page

Powers of 10. Number of images on my hard drive: 10^4. Number of images seen during my first 10 years: 10^8 (3 images/second × 60 × 60 × 16 × 365 × 10 ≈ 6.3 × 10^8). Number of images seen by all humanity: 10^20 (106,456,367,669 humans × 60 years × 3 images/second × 60 × 60 × 16 × 365). Number of photons in the universe: ~10^88. Number of all 8-bit 32×32 images: 256^(32×32×3) ≈ 10^7400.
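These estimates are simple arithmetic; a small sanity-check script (the constants are the slide's rough assumptions, and the population figure is the commonly cited estimate of people ever born):

```python
import math

imgs_per_sec = 3
waking_secs_per_year = 60 * 60 * 16 * 365            # 16 waking hours/day

seen_in_10_years = imgs_per_sec * waking_secs_per_year * 10
print(f"{seen_in_10_years:.1e}")                      # ~6.3e8, i.e. ~10^8

humans_ever = 106_456_367_669                         # rough estimate of people ever born
seen_by_humanity = humans_ever * 60 * imgs_per_sec * waking_secs_per_year
print(f"{seen_by_humanity:.1e}")                      # ~4e20, i.e. ~10^20

log10_all_32x32 = 32 * 32 * 3 * math.log10(256)       # log10 of 256^(32*32*3)
print(f"10^{log10_all_32x32:.0f}")                    # ~10^7398 possible 8-bit 32x32 images
```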

Scenes are unique

But not all scenes are so original

A. Torralba, R. Fergus, W.T.Freeman. PAMI 2008

Automatic Colorization Result. Panels: grayscale input; high-resolution original; colorization of the input using the average. A. Torralba, R. Fergus, W. T. Freeman, 2008.
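"Using the average" here refers to transferring color from the retrieved neighbor images; the snippet below is only a rough illustration of that idea, not the paper's exact procedure (the function name and the chroma-times-luminance combination are my own simplification):

```python
import numpy as np

def colorize_from_neighbors(gray, neighbor_rgbs):
    """Illustrative colorization: modulate the average color of matched neighbors
    by the query's own intensity. gray: HxW in [0,1]; neighbor_rgbs: list of HxWx3 in [0,1]."""
    avg = np.mean(np.stack(neighbor_rgbs, axis=0), axis=0)     # average neighbor colors
    avg_intensity = avg.mean(axis=2, keepdims=True) + 1e-8
    chroma = avg / avg_intensity                               # per-pixel color ratios
    return np.clip(chroma * gray[..., None], 0.0, 1.0)         # reuse query luminance
```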

Automatic Orientation. Many images have ambiguous orientation. Look at the top 25% by confidence (correlation score). Examples of high- and low-confidence images.

Automatic Orientation Examples A. Torralba, R. Fergus, W.T.Freeman. 2008

Tiny Images Discussion Why SSD? Can we build a better image descriptor?

Image Representations: Histograms. Global histogram: represent the distribution of features (color, texture, depth, …). Example image: Space Shuttle Cargo Bay. Images from Dave Kauchak.

Image Representations: Histograms. Joint histogram: requires lots of data; loses resolution to avoid empty bins. Marginal histogram: requires independent features; more data per bin than a joint histogram. Images from Dave Kauchak.

Image Representations: Histograms. Adaptive binning: better data/bin distribution, fewer empty bins; can adapt the available resolution to relative feature importance. Example image: Space Shuttle Cargo Bay. Images from Dave Kauchak.

Image Representations: Histograms. Clusters / signatures: “super-adaptive” binning; does not require discretization along any fixed axis. Example images: EASE Truss Assembly, Space Shuttle Cargo Bay. Images from Dave Kauchak.
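To make the marginal-vs-joint trade-off concrete, here is a small numpy sketch (the function names, bin counts, and 0–255 value range are illustrative assumptions): three independent per-channel histograms versus one joint RGB histogram with bins³ cells, which needs far more data per bin.

```python
import numpy as np

def marginal_color_hists(img, bins=16):
    """Three independent per-channel histograms (marginal binning)."""
    return [np.histogram(img[..., c], bins=bins, range=(0, 256), density=True)[0]
            for c in range(3)]

def joint_color_hist(img, bins=8):
    """One joint RGB histogram: bins**3 cells, so it needs much more data per bin."""
    pixels = img.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3, density=True)
    return hist
```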

Issue: How to Compare Histograms? Bin-by-bin comparison: sensitive to bin size; could use wider bins, but at a loss of resolution. Cross-bin comparison: how much cross-bin influence is necessary/sufficient?
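Two standard bin-by-bin measures, sketched for normalized histograms (the function names are mine; cross-bin measures such as the Earth Mover's Distance also account for distances between bins, at higher computational cost):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Bin-by-bin similarity: overlap of two normalized histograms (1 = identical)."""
    return np.minimum(h1, h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Bin-by-bin chi-squared distance; still blind to how far apart the bins are."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```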

Red Car Retrievals (Color histograms) Histogram matching distance

Capturing the “essence” of texture… for real images. We don’t want an actual texture realization; we want a texture invariant. What are the tools for capturing the statistical properties of a signal?

Multi-scale filter decomposition: a filter bank applied to the input image.

Filter response histograms

Heeger & Bergen ’95. Start with a noise image as the output. Main loop: match the pixel histogram of the output image to the input; decompose the input and output images using a multi-scale filter bank (steerable pyramid); match the sub-band histograms of the input and output pyramids; reconstruct the input and output images (collapse the pyramids).
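A compact sketch of that loop under simplifying assumptions: rank-based histogram matching, and a crude Gaussian band-pass decomposition standing in for the steerable pyramid (so this is an illustration of the idea, not Heeger & Bergen's actual algorithm):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def match_histogram(src, ref):
    """Remap src values so its value histogram matches ref (rank-based matching)."""
    order = np.argsort(src, axis=None)
    matched = np.empty(src.size)
    matched[order] = np.sort(ref, axis=None)
    return matched.reshape(src.shape)

def laplacian_bands(img, sigmas=(1, 2, 4)):
    """Crude band-pass decomposition; summing the bands reconstructs the image."""
    bands, prev = [], img
    for s in sigmas:
        low = gaussian_filter(img, s)
        bands.append(prev - low)
        prev = low
    bands.append(prev)                                # residual low-pass
    return bands

def heeger_bergen_like(texture, iters=5, rng=np.random.default_rng(0)):
    texture = np.asarray(texture, dtype=float)
    out = rng.standard_normal(texture.shape)          # start from a noise image
    for _ in range(iters):
        out = match_histogram(out, texture)           # match pixel histogram
        in_b, out_b = laplacian_bands(texture), laplacian_bands(out)
        out_b = [match_histogram(o, i) for o, i in zip(out_b, in_b)]  # match sub-bands
        out = np.sum(out_b, axis=0)                   # collapse the decomposition
    return match_histogram(out, texture)
```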

Image Descriptors: blur + SSD; color/texture histograms; gradients + histograms (GIST, SIFT, HOG, etc.); “bag of visual words.”