A computational model of stereoscopic 3D visual saliency
School of Electronic Information Engineering, Tianjin University
Wang Bingren

OUTLINE
Ⅰ、 Previous works on 3D visual attention modeling
Ⅱ、 Brief introduction
Ⅲ、 Depth map and depth saliency map generation
Ⅳ、 Saliency maps combination
Ⅴ、 Eye-tracking database
Ⅵ、 Performance assessment

Ⅰ、 Previous works on 3D visual attention modeling
Two questions need to be addressed when developing a 3D visual attention model:
 The influence of 2D visual features
 The influence of depth on the deployment of visual attention in 3D viewing conditions
The first question concerns the possibility of adapting existing 2D visual attention models to the 3D case; the second concerns the means by which depth information can be taken into account.

How the deployment of 3D visual attention is affected by various visual features: previous experimental studies

Jansen et al. [24]
 Visual features: depth information; mean luminance, luminance contrast, texture contrast
 Operation: conducted a free-viewing task on the 2D and 3D versions of the same set of images
 Findings: the importance of 2D visual feature detection in the design of a 3D visual attention model; the possibility of adapting existing 2D visual attention models to the modeling of 3D visual attention

Liu et al. [25]
 Visual features: luminance contrast; disparity contrast, disparity gradient
 Operation: focused on comparing visual features extracted at fixations and at random locations in the viewing of 3D still images
 Findings: the values of some 2D visual features were generally higher at fixated areas; the disparity contrast and disparity gradient of fixated locations were lower than those at randomly selected locations

Hakkinen et al. [21]
 Visual features: binocular depth cue
 Operation: examined the difference in eye movement patterns between the viewing of 2D and 3D versions of the same video content
 Findings: eye movements are more widely distributed for 3D content; depth information from the binocular depth cue provides viewers with additional information, and thus creates new salient areas in a scene

Ramasamy et al. [26]
 Visual features: binocular depth cue
 Findings: observers' gaze points could be more concentrated when viewing the 3D version of some content (e.g., scenes containing a long deep hallway)

Wang et al. [27]
 Visual features: depth
 Operation: examined a so-called "depth bias" in the task-free viewing of still stereoscopic synthetic stimuli
 Findings: the objects closest to the observer always attract the most fixations; this location prior indicates the possibility of integrating depth information by means of a weighting

Wismeijer et al. [28]
 Visual features: monocular perspective cues, binocular disparity cues
 Operation: presented stimuli in which monocular perspective cues and binocular disparity cues conflicted
 Findings: a weighted linear combination of cues when the conflicts are small, and a cue dominance when the conflicts are large

In the literature, all computational models of 3D visual attention contain a stage in which 2D visual features are extracted and used to compute 2D saliency maps. These models can be classified into three categories, depending on the way they use depth information:
 Depth-weighting models
 Depth-saliency models
 Stereo-vision models


 Depth-weighting models
This type of model does not contain any depth-map-based feature extraction process. These models share a common step in which depth information is used as a weighting factor for the 2D saliency. The saliency of each location in the scene is thus directly related to its depth.
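As a concrete illustration, here is a minimal depth-weighting sketch in Python. It is not the formulation of any specific model from this category; the weighting function (a normalized nearness weight, assuming larger depth values mean closer to the observer) is an assumption for illustration.

```python
import numpy as np

def depth_weighting_saliency(saliency_2d, depth_map, eps=1e-8):
    """Weight a 2D saliency map by depth (illustrative sketch).

    Assumes depth_map values increase toward the observer, so that
    closer locations receive larger weights.
    """
    d = depth_map.astype(np.float64)
    weight = (d - d.min()) / (d.max() - d.min() + eps)  # nearness in [0, 1]
    s3d = saliency_2d * weight
    return s3d / (s3d.max() + eps)  # renormalize to [0, 1]
```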


 Depth-saliency models
The models in this category treat depth saliency as additional information. This type of model relies on the existence of "depth saliency maps". Depth features are first extracted from the depth map to create additional feature maps, which are then used to generate depth saliency maps. These depth saliency maps are finally combined with 2D saliency maps using a saliency-map pooling strategy to obtain the final 3D saliency map.


 Stereo-vision models
Instead of directly using a depth map, this type of model takes into account the mechanisms of stereoscopic perception in the HVS. Images from both views are taken as input, from which 2D visual features can also be extracted.
Most of the existing 3D visual attention models belong to the first two categories. A limitation of depth-weighting models is that they might fail to detect salient areas created by depth features alone.

Ⅱ、 Brief introduction
Motivation
 Lack of 3D visual attention models that quantitatively integrate experimental observations.
 Lack of an eye-tracking database of 3D natural-content images containing various types of objects and scenes (i.e., lack of ground truth).
 There is still no strong agreement on how depth information should be used in 3D visual attention modeling.

This paper proposes a depth-saliency-based model of 3D visual attention.
 Apply Bayes' theorem to the results of an eye-tracking experiment using synthetic stimuli, in order to model the correlation between depth features and the level of depth saliency.
 Conduct a binocular eye-tracking experiment on 3D natural-content images to create ground truth.
 Given this ground truth, two methods of integrating depth information are also examined in this paper: a typical depth-weighting method and the proposed depth-saliency method.

A framework of the computational model of 3D visual attention based on depth saliency (figure)

Ⅲ、 Depth map and depth saliency map generation
A. Depth map creation
B. A Bayesian approach to depth saliency map generation

A. Depth map creation
Transformation from a disparity map to a depth map: the relationship between disparity (in pixels) and perceived depth can be modeled by the following equation:

D = (V × I) / (I − P × W / Rx)

where D represents the perceived depth, V represents the viewing distance between the observer and the screen plane, I represents the interocular distance (set to 6.3 cm), P is the disparity in pixels, and W and Rx represent the width (in cm) and the horizontal resolution of the screen, respectively.
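A minimal sketch of this conversion in Python, assuming the similar-triangles relation above; the viewing parameters below are example values, not the ones used in the paper.

```python
import numpy as np

def disparity_to_depth(P, V=93.0, I=6.3, W=100.0, Rx=1920):
    """Convert a disparity map (in pixels) to perceived depth (in cm).

    P  : disparity map in pixels (positive = uncrossed, behind the screen)
    V  : viewing distance in cm (example value)
    I  : interocular distance in cm (6.3 cm, as in the slides)
    W  : screen width in cm (example value)
    Rx : horizontal screen resolution in pixels (example value)
    """
    d = P * W / Rx          # disparity on the screen, in cm
    return V * I / (I - d)  # similar-triangles relation; diverges as d -> I
```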

B. A Bayesian approach to depth saliency map generation
The proposed definition of depth saliency: the depth saliency S of each location (a pixel) equals the probability of this point being gazed at, given the depth features observed at this point:

S = P(G | f_d)

By using Bayes' rule, we can obtain:

P(G | f_d) = p(f_d | G) P(G) / p(f_d)

where G denotes the event that the point is gazed at and f_d denotes the depth features observed at the point.

The proposed approach consists of two stages:
 Depth feature extraction
 Probability distribution modeling

 Depth feature extraction
This paper particularly focuses on using only depth contrast as the feature for depth saliency map prediction. A Difference of Gaussians (DoG) filter is applied to the depth map to extract depth contrast. The DoG filters used in the proposed model were generated by the difference of two Gaussian kernels:

DoG(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)) − (1 / (2πK²σ²)) exp(−(x² + y²) / (2K²σ²))

where σ and Kσ are the standard deviations of the two Gaussians.
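A minimal sketch of the depth-contrast extraction, assuming the standard DoG form above; the scale parameters sigma and K are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_contrast(depth_map, sigma=8.0, K=1.6):
    """Extract depth contrast with a Difference-of-Gaussians filter.

    Blurring with two Gaussians and subtracting the results is
    equivalent to convolving with a DoG kernel. sigma and K are
    illustrative values.
    """
    d = depth_map.astype(np.float64)
    dog = gaussian_filter(d, sigma) - gaussian_filter(d, K * sigma)
    return np.abs(dog)  # magnitude of local depth contrast
```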

 Probability distribution modeling
We propose to model this function by probability learning from eye movement data collected in a free-viewing eye-tracking experiment. In our study, synthetic stimuli were used for the eye-tracking experiment. These stimuli consisted of 3D scenes in which a background and several identical objects were deliberately displayed at different depth planes.
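A minimal sketch of how such a probability could be learned from fixation data, assuming a simple histogram estimate of p(f_d | G) and p(f_d); the binning and the final normalization are assumptions, not the paper's exact procedure.

```python
import numpy as np

def learn_depth_saliency(feature_map, fixations, n_bins=32):
    """Estimate P(G | f_d), proportional to p(f_d | G) / p(f_d).

    feature_map : 2D array of depth-contrast values for one stimulus
    fixations   : list of (row, col) fixated pixel coordinates
    Returns bin edges and the estimated gaze probability per bin.
    """
    f_all = feature_map.ravel()
    f_fix = np.array([feature_map[r, c] for r, c in fixations])

    edges = np.linspace(f_all.min(), f_all.max(), n_bins + 1)
    p_f, _ = np.histogram(f_all, bins=edges, density=True)    # p(f_d)
    p_f_g, _ = np.histogram(f_fix, bins=edges, density=True)  # p(f_d | G)

    ratio = p_f_g / np.maximum(p_f, 1e-12)  # Bayes' rule, constant dropped
    return edges, ratio / ratio.max()       # normalized to [0, 1]
```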


Ⅳ、 Saliency maps combination
A. 2D saliency map generation
B. Saliency maps combination

A. 2D saliency map generation
Three bottom-up visual attention models based on quite different mechanisms were used to perform the 2D saliency prediction:
 Itti's model
 The AIM model from Bruce
 Hou's model
In the proposed model, the 2D saliency computation is performed only on the image from the left view, which is selected arbitrarily.

B. Saliency maps combination
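The transcript does not preserve the combination equation shown on this slide, so the following Python sketch only illustrates the two generic pooling strategies that appear among the compared methods later in the slides (an additive "2D + DSM" scheme and a multiplicative "2D * DSM" scheme); it is not the paper's exact formula.

```python
import numpy as np

def normalize(m, eps=1e-8):
    """Rescale a map to [0, 1] so maps are comparable before pooling."""
    m = m.astype(np.float64)
    return (m - m.min()) / (m.max() - m.min() + eps)

def combine_saliency(s2d, dsm, mode="additive", w=0.5):
    """Pool a 2D saliency map with a depth saliency map (illustrative).

    mode "additive"       : weighted sum, as in a '2D + DSM' scheme
    mode "multiplicative" : pointwise product, as in a '2D * DSM' scheme
    """
    s2d, dsm = normalize(s2d), normalize(dsm)
    if mode == "additive":
        return normalize(w * s2d + (1 - w) * dsm)
    return normalize(s2d * dsm)
```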

Ⅴ、 Eye-tracking database
So far, the lack of ground truth has limited studies of computational models of 3D visual attention. To evaluate the performance of computational models, we create and publish a new eye-tracking database containing eighteen stereoscopic natural-content images, the corresponding disparity maps, and the eye movement data for both eyes.
A. Stimuli
B. Apparatus and procedures
C. Participants
D. Fixation density map creation

A. Stimuli
The Middlebury 2005/2006 dataset:

1) Stereo window violation removal
We shifted the left view to the left and the right view to the right. Shifting the two views in opposite directions is equivalent to adding a constant negative disparity to every pixel of the two views.
2) Disparity map refining
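The slide's formula for the amount of added disparity is not preserved in the transcript. Below is a minimal sketch of the opposite-direction shift in Python; the shift-size criterion (the maximum crossed disparity, so that the whole scene ends up at or behind the screen plane) is an assumption for illustration.

```python
import numpy as np

def remove_window_violation(left, right, disparity):
    """Add a constant disparity by shifting the two views apart.

    left, right : H x W x 3 view images
    disparity   : H x W disparity map in pixels
    Cropping the first columns of the left view and the last columns of
    the right view, then re-aligning the two crops, is equivalent to
    shifting the views in opposite directions.
    """
    shift = int(np.ceil(max(disparity.max(), 0)))  # assumed criterion
    if shift == 0:
        return left, right
    return left[:, shift:], right[:, :-shift]
```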

The IVC 3D image dataset:

D. Fixation density map creation
 All gaze points recorded by the eye tracker, from both the left and the right eyes, were used to create the fixation density maps. The gaze-point maps for each eye were first created separately.
 The left gaze-point map was created directly from the coordinates of the left-eye gaze positions.
 The right gaze-point map was created by adding a displacement, horizontally and vertically, to the coordinates of each right-eye gaze point.
 The two gaze-point maps were then summed and filtered with a two-dimensional Gaussian kernel.
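A minimal sketch of this procedure, assuming the right-eye displacement is given per gaze point and using an illustrative Gaussian width; neither is specified in the transcript.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(shape, left_pts, right_pts, right_offsets, sigma=30):
    """Build a fixation density map from binocular gaze points.

    shape         : (H, W) of the stimulus
    left_pts      : list of (row, col) left-eye gaze points
    right_pts     : list of (row, col) right-eye gaze points
    right_offsets : list of (drow, dcol) displacements for right-eye points
    sigma         : Gaussian kernel width in pixels (illustrative value)
    """
    fdm = np.zeros(shape, dtype=np.float64)
    for r, c in left_pts:                   # left eye: coordinates used directly
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            fdm[int(r), int(c)] += 1
    for (r, c), (dr, dc) in zip(right_pts, right_offsets):
        r, c = int(r + dr), int(c + dc)     # right eye: displaced first
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            fdm[r, c] += 1
    fdm = gaussian_filter(fdm, sigma)       # 2D Gaussian smoothing
    return fdm / (fdm.max() + 1e-8)
```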

Ⅵ、 Performance assessment
A. Quantitative metrics of assessment
B. Performance of depth saliency map
C. Added value of a depth saliency map
D. Content-based analysis

A. Quantitative metrics of assessment
A range of different measures is widely used to compare saliency maps for 2D content. The most common ones include:
 Pearson linear correlation coefficient (PLCC)
 Kullback-Leibler divergence (KLD)
 Area under the receiver operating characteristic curve (AUC)
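A minimal sketch of two of these metrics, assuming the saliency map is compared against a fixation density map (PLCC) and against binary fixation locations (AUC); exact evaluation protocols vary between papers.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

def plcc(saliency, fdm):
    """Pearson linear correlation between a saliency map and a fixation map."""
    return pearsonr(saliency.ravel(), fdm.ravel())[0]

def auc(saliency, fixation_mask):
    """Area under the ROC curve, treating fixated pixels as positives."""
    return roc_auc_score(fixation_mask.ravel().astype(int),
                         saliency.ravel())
```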

B. Performance of depth saliency map

C. Added value of a depth saliency map
To compare the two different ways of making the most of depth information, the performance of the following methods was measured and compared:
 No-depth method
 Depth-weighting (DW) method
 Depth-saliency (DS) method
 Other reference methods: 2D + Depth; 2D + Depth Contrast; 2D * DSM


D. Content-based analysis
To further investigate the influence of the DSM, we analyze the performance and the added value of the DSM per image. We compute the difference in PLCC value for each image using Equations 7 and 8:
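Equations 7 and 8 are not preserved in the transcript. A plausible form, assuming they measure the per-image PLCC gain of the DW and DS methods over the no-depth baseline, would be:

```latex
\Delta_{\mathrm{DW}} = \mathrm{PLCC}(S_{\mathrm{DW}}, \mathrm{FDM}) - \mathrm{PLCC}(S_{\mathrm{2D}}, \mathrm{FDM})
\Delta_{\mathrm{DS}} = \mathrm{PLCC}(S_{\mathrm{DS}}, \mathrm{FDM}) - \mathrm{PLCC}(S_{\mathrm{2D}}, \mathrm{FDM})
```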


Thank you!