Proceedings of the IEEE, 2010. Antonio Torralba, MIT; Jenny Yuen, MIT; Bryan C. Russell, MIT
Outline
Introduction
Web Annotation and Data Statistics
 -A. Data Set Evolution and Distribution of Objects
 -B. Study of Online Labelers
The Space of LabelMe Images
 -A. Distribution of Scene Types
 -B. The Space of Images
 -C. Recognition by Scene Alignment
Beyond 2-D Images
 -A. From Annotations to 3-D
 -B. Video Annotation
Conclusion
Introduction
From small data sets to large data sets
In 2005, the online annotation tool LabelMe was created
LabelMe provides functionality for drawing polygons to outline the spatial extent of objects in images
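Each LabelMe annotation pairs an object name with a polygon given as a list of image coordinates. A minimal sketch of reading such an annotation (the XML layout below is illustrative, a simplified stand-in for the actual LabelMe schema):

```python
import xml.etree.ElementTree as ET

# Simplified LabelMe-style annotation: each object has a name and a
# polygon stored as a list of (x, y) vertices in image coordinates.
SAMPLE = """
<annotation>
  <object>
    <name>car</name>
    <polygon>
      <pt><x>10</x><y>20</y></pt>
      <pt><x>60</x><y>20</y></pt>
      <pt><x>60</x><y>50</y></pt>
      <pt><x>10</x><y>50</y></pt>
    </polygon>
  </object>
</annotation>
"""

def parse_objects(xml_text):
    """Return a list of (name, [(x, y), ...]) tuples from an annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        pts = [(float(pt.findtext("x")), float(pt.findtext("y")))
               for pt in obj.find("polygon").findall("pt")]
        objects.append((name, pts))
    return objects

objects = parse_objects(SAMPLE)
```

Storing full polygons rather than bounding boxes is what lets later sections analyze overlap between object boundaries.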
Web Annotation and Data Statistics
A. Data Set Evolution and Distribution of Objects
B. Study of Online Labelers
The Features of the LabelMe Database
- Object class recognition
- Learning about objects embedded in a scene
- High-quality labeling
- Many diverse object classes
- Many diverse images
- Many noncopyrighted images
- Open and dynamic
Data Set Evolution and Distribution of Objects (1/2)
(a) Number of annotated objects
(b) Number of images with at least one annotated object
(c) Number of unique object descriptions
Data Set Evolution and Distribution of Objects (2/2)
Study of Online Labelers
Data collected from July 7, 2008 to March 19, 2009
(a) Number of new annotations provided by individual users
(b) Distribution of the time it takes to label an object
The Space of LabelMe Images
A. Distribution of Scene Types
B. The Space of Images
C. Recognition by Scene Alignment
Distribution of Scene Types (1/1)
Starting from ideas in cognitive psychology, we study how many distinct configurations of n objects appear in the database (n = 1, 2, 4, 8)
The distribution of configurations follows a power law
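Counting configurations can be sketched as follows: for each image, enumerate every unordered set of n object classes it contains, then tally how often each set occurs across the database. The per-image label lists below are toy stand-ins, not LabelMe data:

```python
from collections import Counter
from itertools import combinations

# Toy per-image object labels; real data would come from the annotations.
images = [
    ["car", "road", "building", "tree"],
    ["car", "road", "sky", "tree"],
    ["car", "road", "building", "sky"],
    ["person", "sidewalk", "building", "tree"],
]

def configuration_counts(images, n):
    """Frequency of each unordered n-object configuration across images."""
    counts = Counter()
    for labels in images:
        for combo in combinations(sorted(set(labels)), n):
            counts[combo] += 1
    return counts

counts = configuration_counts(images, 2)
ranked = counts.most_common()
# Under a power law, log(frequency) falls roughly linearly with log(rank).
```

Sorting by frequency gives the rank-frequency curve whose log-log slope is checked against a power law.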
The Space of Images (1/3)
Process of Defining Semantic Distance (2/3)
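One plausible sketch of a semantic distance between two annotated images (an assumption for illustration, not the paper's exact definition): compare normalized histograms of their object labels, so images sharing most object classes come out close.

```python
from collections import Counter

def label_histogram(labels):
    """Normalized histogram of object labels in one image."""
    c = Counter(labels)
    total = sum(c.values())
    return {k: v / total for k, v in c.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical label distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0))
               for k in set(h1) | set(h2))

def semantic_distance(labels_a, labels_b):
    """0 for identical label distributions, up to 1 for disjoint ones."""
    return 1.0 - histogram_intersection(label_histogram(labels_a),
                                        label_histogram(labels_b))

d = semantic_distance(["car", "road", "sky"], ["car", "road", "tree"])
# Two of three labels match, so d is 1/3.
```

Unlike pixel-based distances, this measure depends only on the annotations, which is what makes it "semantic".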
The Space of Images (3/3)
A visualization of images that are fully annotated
Recognition by Scene Alignment
Given a new image as input, we compute its GIST descriptor and use descriptor distance to retrieve similar annotated scenes
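The retrieval step reduces to nearest-neighbor search in descriptor space. A minimal sketch, assuming GIST descriptors have already been computed (real GIST vectors are several hundred dimensions; the 3-D vectors and file names here are toys):

```python
import math

# Hypothetical precomputed GIST descriptors, one vector per database image.
database = {
    "street.jpg":  [0.9, 0.1, 0.2],
    "beach.jpg":   [0.1, 0.8, 0.7],
    "kitchen.jpg": [0.4, 0.4, 0.1],
}

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, database, k=2):
    """Return the k database images closest to the query descriptor."""
    return sorted(database, key=lambda name: euclidean(query, database[name]))[:k]

matches = nearest_neighbors([0.85, 0.15, 0.25], database, k=1)
# The query descriptor is closest to street.jpg's.
```

Annotations from the retrieved neighbors can then be transferred or aligned to the input image.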
The Power of a Large-Scale Database
A simple algorithm provides an upper bound: find the nearest neighbor of the input image and use its annotation as the labeling of the input image
How this result improves with database size hints at how many more images we need to label
Beyond 2-D Images
A. From Annotations to 3-D
B. Video Annotation
From Annotations to 3-D (1/7)
Object labels contain implicit 3-D information, observed by analyzing the overlap between object boundaries
Object types:
- Ground objects
- Standing objects
- Attached objects
Relations between objects:
- supported-by
- part-of
From Annotations to 3-D (2/7)
Learning the relationships between objects:
1) part-of: evaluate the frequency of high relative overlap between the two objects' polygons
2) supported-by: the bottom of the supported object's polygon lies inside the supporting object's polygon
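Both cues can be sketched with simple geometry. For brevity this sketch approximates each polygon by its axis-aligned bounding box (an assumption; true polygon intersection would be used in practice), with image coordinates where y grows downward:

```python
def bbox(poly):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a polygon."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def overlap_ratio(inner, outer):
    """Fraction of `inner`'s box covered by `outer`'s box.
    High values, seen frequently for a label pair, suggest part-of."""
    ax0, ay0, ax1, ay1 = bbox(inner)
    bx0, by0, bx1, by1 = bbox(outer)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inner_area = (ax1 - ax0) * (ay1 - ay0)
    return (iw * ih) / inner_area if inner_area > 0 else 0.0

def is_supported_by(obj, support, tol=5.0):
    """Supported-by cue: the bottom edge of `obj` falls within the
    horizontal span and vertical extent of `support`'s box."""
    ox0, _, ox1, obottom = bbox(obj)
    sx0, stop, sx1, sbottom = bbox(support)
    horizontally_inside = ox0 >= sx0 - tol and ox1 <= sx1 + tol
    return horizontally_inside and stop - tol <= obottom <= sbottom + tol

wheel = [(12, 40), (20, 40), (20, 50), (12, 50)]
car = [(10, 20), (60, 20), (60, 50), (10, 50)]
ratio = overlap_ratio(wheel, car)  # fully inside -> part-of evidence
```

Aggregating these per-image scores over many annotated images yields the learned relation frequencies.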
From Annotations to 3-D (3/7)
From Annotations to 3-D (4/7)
Reconstructing a 3-D model for an input image:
1) determine each object's type
2) determine each polygon edge's type
3) compute the real-world distance between objects

Object type                 Edge type
Ground objects (green)      Contact (white)
Standing objects (red)      Attached (gray)
Attached objects (yellow)   Occlusion (black)
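Step 3 can be illustrated with a simple ground-plane camera model (an assumption for illustration, not the paper's exact formulation): a contact point at image row v below the horizon row v0 lies at depth z = f * h / (v - v0), where f is the focal length in pixels and h the camera height above the ground.

```python
def ground_depth(v, f=800.0, camera_height=1.6, horizon_row=300.0):
    """Depth (meters) of a ground contact point at image row v (y-down).
    f, camera_height, and horizon_row are assumed calibration values."""
    if v <= horizon_row:
        raise ValueError("point at or above the horizon; depth undefined")
    return f * camera_height / (v - horizon_row)

# Contact points lower in the image (larger v) are closer to the camera.
near = ground_depth(620)  # 800 * 1.6 / 320 = 4.0 m
far = ground_depth(340)   # 800 * 1.6 / 40 = 32.0 m
```

With depths at the contact edges fixed, standing objects are "popped up" vertically from the ground plane at those depths.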
From Annotations to 3-D (5/7)
From Annotations to 3-D (6/7)
More labeling improves the quality of the reconstruction
Incorrect labeling, however, degrades it
From Annotations to 3-D (7/7)
Video Annotation (1/1)
Conclusion
LabelMe is a web-based tool that allows labeling of objects and their locations in images
LabelMe has collected a large annotated database of images spanning many different scene and object classes
LabelMe can recover a 3-D description of an image from its annotations
The next goal is extending the database to video, a promising direction for computer vision and computer graphics