Salient Object Detection by Composition

Jie Feng¹, Yichen Wei², Litian Tao³, Chao Zhang¹, Jian Sun². ¹Key Laboratory of Machine Perception, Peking University; ²Microsoft Research Asia; ³Microsoft Search Technology Center Asia.

A key vision problem: object detection. It is fundamental for image understanding and extremely challenging, because of the huge number of object classes and the huge variation in object appearance.

What are salient objects? Objects that are visually distinctive and semantically meaningful. The notion is inherently ambiguous and subjective: it is not easy to define what a salient object is. Conceptually, a salient object is …. This definition is still very ambiguous. Let's look at a few examples: Yes! Yes? Probably. No!

Why detect salient objects? They are relatively easy to find, because they are large and distinct, and they are semantically important. Applications include image summarization, cropping…, and object-level matching, retrieval…. A generic object detector also helps later recognition: it avoids running thousands of different detectors and enables a scalable system for image understanding. It is easier to find salient objects than other ones because they are …

Traditional approach: the saliency map. It measures per-pixel importance, but it loses information and is deficient for finding objects.

Sliding-window object detection: face, human…; car, bus…; horse, dog…; table, couch…; this means millions of windows × thousands of object classes. Windows of different sizes are slid over all positions, a quality function (e.g., a car classifier) is evaluated for each window, and the windows that are locally optimal are output.
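A minimal Python sketch of the sliding-window loop just described. `quality` stands in for any window scoring function (a car classifier, or the composition saliency introduced in this talk). The function name, the stride, the list of window sizes and the 8-neighbour local-maximum test are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_window_detect(
    img_w: int,
    img_h: int,
    sizes: List[Tuple[int, int]],
    stride: int,
    quality: Callable[[Window], float],
) -> List[Tuple[float, Window]]:
    """Hypothetical helper: score every window of every size at every
    stride-aligned position, then keep the windows whose score is not
    exceeded by any of their 8 spatial neighbours of the same size."""
    results: List[Tuple[float, Window]] = []
    for w, h in sizes:
        # Evaluate the quality function at all positions for this window size.
        scores: Dict[Tuple[int, int], float] = {
            (x, y): quality((x, y, w, h))
            for x in range(0, img_w - w + 1, stride)
            for y in range(0, img_h - h + 1, stride)
        }
        for (x, y), s in scores.items():
            neighbours = [
                scores.get((x + dx, y + dy))
                for dx in (-stride, 0, stride)
                for dy in (-stride, 0, stride)
                if (dx, dy) != (0, 0)
            ]
            # Neighbours that fall outside the image are None and are ignored.
            if all(n is None or s >= n for n in neighbours):
                results.append((s, (x, y, w, h)))
    # Highest-scoring local optima first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    # Toy quality function that peaks when the window centre is at (50, 50).
    def toy_quality(win: Window) -> float:
        x, y, w, h = win
        return 100.0 - abs(x + w / 2 - 50) - abs(y + h / 2 - 50)

    print(sliding_window_detect(100, 100, [(20, 20)], 10, toy_quality))
```

The talk itself selects locally optimal windows by non-maximum suppression; the neighbour test here is only the simplest stand-in for that step.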

Salient object detection by composition: a 'composition'-based window saliency measure that is intuitive and generalizes to different objects; a sliding-window-based generic object detector that is fast and practical (1-2 seconds per image, a few dozen to a few hundred output windows); and an effective pre-processing step for later recognition tasks.

It is hard to represent a salient window. Given an image I and a window W, we define saliency(W) as the cost of composing W using the rest of the image, (I - W).

Benefits of the 'composition' definition: more information gives better estimation, moving from pixels to windows and using the entire image as context. It is less dependent on assumptions such as: is the background homogeneous? does the object have a strong and continuous boundary? is the object spatially connected? This gives better generalization ability.

Part-based representation: each part S has an (inside/outside) area A(S), and each part pair (p, q) has a composition cost c(p, q).

Parts are generated by over-segmentation, typically 100-200 segments in a natural image [P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 2004].
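The over-segmentation relies on Felzenszwalb and Huttenlocher's graph-based method. Below is a simplified Python sketch of its core merging rule on a grayscale image given as a list of rows: pixels are nodes, 4-neighbour edges are weighted by intensity difference, and two components merge when the connecting edge is no heavier than either component's internal difference plus k / |C|. The Gaussian pre-smoothing, colour handling and minimum-size post-processing of the original method are omitted, and the function name and default k are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def felzenszwalb_segment(img: Sequence[Sequence[float]], k: float = 300.0) -> List[List[int]]:
    """Simplified graph-based over-segmentation of a grayscale image.
    Returns a label map with labels 0..n_segments-1."""
    h, w = len(img), len(img[0])
    # Build 4-neighbour edges (weight, pixel_a, pixel_b); pixel index = y * w + x.
    edges: List[Tuple[float, int, int]] = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(img[y][x] - img[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(img[y][x] - img[y + 1][x]), i, i + w))
    edges.sort()  # ascending edge weight

    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # Int(C): largest MST edge inside each component

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge if the edge is no heavier than either component's internal
        # difference plus the size-dependent threshold k / |C|.
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            # Edges arrive in ascending order, so this edge is the new maximum.
            internal[ra] = weight

    # Relabel component roots to consecutive integers.
    relabel: Dict[int, int] = {}
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            labels[y][x] = relabel.setdefault(find(y * w + x), len(relabel))
    return labels
```

For real images, scikit-image ships an implementation as `skimage.segmentation.felzenszwalb`; its `scale` parameter plays the role of k (larger values give fewer, larger segments), which is one way a segment count in the 100-200 range could be tuned.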

An illustrative 'composition' example: the inside parts of the window W = {A, B, C, D, E} are composed by the outside parts a, b, c, d, e, giving saliency(W) = cost(A, a) + cost(B, b) + cost(C, c) + cost(D, d) + cost(E, e).

Computational principles, based on intuitive perceptions about saliency: appearance proximity, spatial proximity, non-reusability, and non-scale-bias.

1. Appearance proximity: salient parts have distinct appearances. Example: q1 and q2 are equally distant from p, but q2 is more similar to p, so c(p, q2) = 0.2 < c(p, q1) = 0.6.

2. Spatial proximity: salient parts are far from similar parts. Example: q1 and q2 are equally similar to p, but q2 is closer, so c(p, q2) = 0.2 < c(p, q1) = 0.3.

3. Non-reusability: an outside part can be used only once, which makes the measure robust to background clutter.

4. Non-scale-bias: the cost is normalized by the window area to avoid a bias toward large windows, so a tight bounding box is preferred over a loose one.

Define the composition cost c(p, q):
$d_a(p,q)$: appearance dissimilarity (LAB color histogram distance)
$d_{max}$: the maximum of all $d_a(p,q)$ within the image
$d_s(p,q)$: spatial distance (normalized Hausdorff distance)
$$c(p,q) = \bigl(1 - d_s(p,q)\bigr)\, d_a(p,q) + d_s(p,q)\, d_{max}$$
The cost is small when both $d_a(p,q)$ and $d_s(p,q)$ are small.
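A small Python sketch of the cost formula. The slide names the ingredients but not their exact definitions, so the following are assumptions: d_a is taken as half the L1 distance between normalized LAB histograms (so it lies in [0, 1]), and d_s is the symmetric Hausdorff distance between the segments' pixel coordinates divided by the image diagonal. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    def directed(u: Sequence[Point], v: Sequence[Point]) -> float:
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(a, b), directed(b, a))


def spatial_distance(a: Sequence[Point], b: Sequence[Point], img_w: float, img_h: float) -> float:
    """d_s: Hausdorff distance normalized by the image diagonal (assumed normalization)."""
    return hausdorff(a, b) / math.hypot(img_w, img_h)


def appearance_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """d_a: half L1 distance between two histograms that each sum to 1 (assumed metric)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(h1, h2))


def composition_cost(d_a: float, d_s: float, d_max: float) -> float:
    """c(p, q) = (1 - d_s) * d_a + d_s * d_max."""
    return (1.0 - d_s) * d_a + d_s * d_max
```

Because d_s blends d_a toward d_max, a nearby part (d_s close to 0) costs roughly its appearance distance, while a distant part costs close to d_max however similar it looks; this is the spatial-proximity principle in numeric form.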

Part-based composition: find outside parts with the same total area as the inside parts and the smallest composition cost. This means deciding which outside part composes which inside part, and with how much area. The problem is formulated as an Earth Mover's Distance (EMD), whose optimal solution has polynomial (cubic) complexity. Instead, a greedy optimization is used, with pre-computation and incremental sliding-window updates.
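One way to write the EMD formulation as a transportation problem, assuming $f_{pq}$ denotes the area of outside part q used to compose inside part p (this notation is ours, not the talk's):

```latex
\begin{aligned}
\mathrm{saliency}(W) &= \frac{1}{|W|} \min_{f \ge 0} \sum_{p \in S_i} \sum_{q \in S_o} c(p,q)\, f_{pq} \\
\text{s.t.}\quad & \sum_{q \in S_o} f_{pq} = A(p) \quad \forall p \in S_i, \\
& \sum_{p \in S_i} f_{pq} \le A(q) \quad \forall q \in S_o.
\end{aligned}
```

The first constraint says every inside part is fully composed; the second enforces non-reusability, since an outside part cannot give away more area than it has. The problem is feasible only when the total outside area is at least the window area, and the 1/|W| factor is the non-scale-bias normalization. The greedy algorithm approximates this optimum by filling each inside part from its cheapest outside parts first.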

Greedy composition algorithm
Input: window $W$, inside/outside segments $S_i$ / $S_o$ and their initial areas $A(S_{i/o})$
Output: cost $C$ of composing $S_i$ using $S_o$
  $C = 0$
  for each $p \in S_i$:
    for each $q \in S_o$ (in ascending order of $c(p,q)$):
      if $p$ still has area left:
        update the composed areas in $A(p)$ and $A(q)$
        $C = C + c(p,q) \cdot (\text{composed area})$
  $C = C / |W|$
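A direct Python transcription of the greedy composition loop, as a sketch. The dictionaries, the function name and the `leftover_cost` rule (charging any inside area that cannot be composed, once outside area runs out, at a fixed cost) are assumptions; the slide does not say how that case is handled. Inside segments are processed in dictionary order and, as with any greedy scheme, that order can change the result.

```python
from typing import Dict, List, Tuple


def greedy_composition_cost(
    inside_area: Dict[int, float],          # A(p) for each segment p inside W
    outside_area: Dict[int, float],         # A(q) for each segment q outside W
    sorted_parts: Dict[int, List[int]],     # p -> other segments, ascending c(p, .)
    cost: Dict[Tuple[int, int], float],     # c(p, q)
    window_area: float,                     # |W|
    leftover_cost: float = 1.0,             # assumed cost for area that cannot be composed
) -> float:
    remaining = dict(outside_area)  # non-reusability: outside area is consumed once
    total = 0.0
    for p, need in inside_area.items():
        for q in sorted_parts[p]:
            if need <= 0:
                break  # p is fully composed
            available = remaining.get(q, 0.0)  # 0 for inside-only or exhausted parts
            if available <= 0:
                continue
            used = min(need, available)
            remaining[q] = available - used
            need -= used
            total += cost[(p, q)] * used
        if need > 0:
            total += leftover_cost * need  # assumption: charge uncomposed area
    return total / window_area  # non-scale-bias: normalize by window area
```

In the second test case below, both inside parts prefer the same cheap outside part; only the first gets it, which is exactly the non-reusability principle at work.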

Algorithm pseudo code

Pre-computation and initialization: pre-compute all c(p, q); for each segment p, store a list of the other segments in ascending order of c(p, ·); initialize the segment areas inside/outside W. Segment areas are then updated incrementally as the window slides [Y. Wei and L. Tao. Efficient histogram-based sliding window. CVPR, 2010].
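A Python sketch of the pre-computation step. The sorted per-segment lists follow the slide directly. For segment areas inside a window, this sketch swaps in per-segment summed-area tables, a simpler alternative to the cited incremental histogram update: after one pass over the label map, the area of every segment inside any rectangle costs four lookups, at the price of one table per segment. Outside areas are then each segment's total area minus its inside area. All names are hypothetical.

```python
from typing import List, Sequence


def sorted_cost_lists(cost: Sequence[Sequence[float]]) -> List[List[int]]:
    """For each segment p, the other segments in ascending order of c(p, .)."""
    n = len(cost)
    return [sorted((q for q in range(n) if q != p), key=lambda q: cost[p][q]) for p in range(n)]


def segment_area_tables(labels: Sequence[Sequence[int]], n_segments: int) -> List[List[List[int]]]:
    """One summed-area table per segment: tables[s][y][x] counts the pixels of
    segment s in the rectangle [0, x) x [0, y)."""
    h, w = len(labels), len(labels[0])
    tables = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(n_segments)]
    for s, t in enumerate(tables):
        for y in range(h):
            row_count = 0  # pixels of segment s in row y, columns [0, x]
            for x in range(w):
                row_count += int(labels[y][x] == s)
                t[y + 1][x + 1] = t[y][x + 1] + row_count
    return tables


def areas_inside(tables: List[List[List[int]]], x: int, y: int, w: int, h: int) -> List[int]:
    """Area of every segment inside the window [x, x+w) x [y, y+h)."""
    return [t[y + h][x + w] - t[y][x + w] - t[y + h][x] + t[y][x] for t in tables]
```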

More implementation details: 6 window sizes, from 2% to 50% of the image area; 7 aspect ratios, from 1:2 to 2:1; 100-200 segments; 1-2 seconds for a 300 × 300 image. Locally optimal windows are found by non-maximum suppression.
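The window shapes and the non-maximum suppression step could look like this in Python. The slide gives only the ranges, so geometric spacing of the 6 area fractions and 7 aspect ratios is an assumption, as is the greedy IoU-based NMS with a 0.5 overlap threshold; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def window_shapes(img_w: float, img_h: float, n_sizes: int = 6, n_ratios: int = 7,
                  min_frac: float = 0.02, max_frac: float = 0.5) -> List[Tuple[float, float]]:
    """(width, height) pairs: areas from min_frac to max_frac of the image and
    aspect ratios (width / height) from 1:2 to 2:1, both geometrically spaced."""
    shapes = []
    half = (n_ratios - 1) / 2
    for i in range(n_sizes):
        area = img_w * img_h * min_frac * (max_frac / min_frac) ** (i / (n_sizes - 1))
        for j in range(n_ratios):
            ratio = 2.0 ** ((j - half) / half)  # 0.5 ... 2.0
            w, h = math.sqrt(area * ratio), math.sqrt(area / ratio)
            if w <= img_w + 1e-9 and h <= img_h + 1e-9:  # must fit in the image
                shapes.append((w, h))
    return shapes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(scored: List[Tuple[float, Box]], iou_thresh: float = 0.5) -> List[Tuple[float, Box]]:
    """Greedy NMS: keep the best window, drop those overlapping a kept one too much."""
    kept: List[Tuple[float, Box]] = []
    for score, box in sorted(scored, key=lambda t: t[0], reverse=True):
        if all(iou(box, k) < iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept
```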

Evaluation on PASCAL VOC 07. It is designed for object detection: 20 object classes with large object and background variation, which is challenging for traditional saliency methods. It is not entirely suitable for salient object detection: not all labeled objects are salient (small, occluded, repetitive), and not all salient objects are labeled (only 20 classes). But it is still the best database we have.

Example results, top 5 salient windows (Yellow: correct, Red: wrong, Blue: ground truth).

The approach outperforms the state of the art, objectness [B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In CVPR, 2010], which mainly uses local cues and therefore finds windows that are locally salient but not globally salient.

Qualitative comparison, ours vs. objectness (Yellow: correct, Red: wrong, Blue: ground truth).

Failure cases: too complex

Failure cases, lack of semantics: partial background detected together with the object (a man with background); objects that are not annotated (a painting, pillows); similar objects grouped together (two chairs).

Failure cases, lack of semantics: partial objects or object parts detected (wheels and seat).

Number of top windows vs. detection rate:
#top windows:  5     10    20    30    50
recall:        0.25  0.33  0.44  0.50  0.57
Many objects are found within a few windows, which makes the detector a practical pre-processing tool.
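A sketch of how such a detection-rate curve can be computed. The talk does not state its matching criterion; this Python sketch assumes the PASCAL convention that a ground-truth box counts as detected when some top-k window overlaps it with intersection-over-union above 0.5. Function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(windows: Sequence[Sequence[Box]], ground_truth: Sequence[Sequence[Box]],
                k: int, thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes hit (IoU > thresh) by one of the top-k
    windows of their image; windows are assumed sorted by descending saliency."""
    hits = total = 0
    for wins, gts in zip(windows, ground_truth):
        top = wins[:k]
        for g in gts:
            total += 1
            if any(iou(w, g) > thresh for w in top):
                hits += 1
    return hits / total if total else 0.0
```

Evaluating `recall_at_k` for k = 5, 10, 20, 30, 50 over a dataset yields one recall value per column of a table like the one above.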

Evaluation on the MSRA database [T. Liu, J. Sun, N. Zheng, X. Tang, and H. Shum. Learning to detect a salient object. In CVPR, 2007]. It is less challenging, since each image contains only a single large object. The most salient window of our approach is used in the evaluation, and its pixel-level precision/recall is comparable with previous methods. Because our approach is designed for multi-object detection, it benefits less from the database's simplicity than previous methods do.
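For the pixel-level comparison, one plausible reading is to treat the most salient window as a binary mask and compare it with the ground-truth object mask. The Python sketch below does that; the box convention (half-open pixel ranges) and the function name are assumptions, and the talk does not specify the exact protocol.

```python
from typing import Sequence, Tuple


def window_pixel_precision_recall(gt_mask: Sequence[Sequence[int]],
                                  box: Tuple[int, int, int, int]) -> Tuple[float, float]:
    """Pixel-level precision and recall of a window treated as a binary mask.
    gt_mask holds 0/1 values; box = (x0, y0, x1, y1) with half-open ranges."""
    x0, y0, x1, y1 = box
    true_pos = sum(gt_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    predicted = (x1 - x0) * (y1 - y0)          # pixels marked salient by the window
    actual = sum(sum(row) for row in gt_mask)  # pixels of the ground-truth object
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

A loose window around the object keeps recall at 1.0 but lowers precision, which is why a tight window matters even on a single-object dataset.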

Summary: a novel 'composition'-based saliency measure that moves from pixel saliency to window saliency, and from a saliency map to a generic (salient) object detector, with state-of-the-art accuracy and performance. Future work: better features and composition algorithms, and learning a discriminative generic object classifier.