Binaural Sonification of Disparity Maps
Alfonso Alba, Carlos Zubieta, Edgar Arce
Facultad de Ciencias, Universidad Autónoma de San Luis Potosí

Contents
– Project description
– Estimation of disparity maps
– Segmentation of disparity maps
– Object sonification
– Test application
– Preliminary results
– Future work

Project description
The goal of this project is to develop a scene sonification system for the visually impaired. Images from a stereo camera pair will be used to detect objects in the scene and to estimate their distance from the subject. A binaural audio signal will be synthesized for each object, so that the subject can "hear" the objects in the scene at their corresponding locations.

Scene sonification system
The system consists of the following stages:
– Stereo image acquisition
– Disparity map estimation
– Disparity map segmentation (object detection)
– Binaural sonification of objects in the scene
Here we focus only on the segmentation of a given disparity map and on the sonification stage.

Estimation of disparity maps
Images from a pair of cameras, separated by a fixed distance, form a stereo image pair. The position of an object in one image is shifted in the other by an amount inversely proportional to the distance between the object and the camera arrangement. This displacement is called the disparity, and it can be computed for each pixel to form a disparity map. We are currently working on a technique to compute disparity maps in real time.
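For reference, the sketch below computes such a map with a generic window-based block-matching baseline (sum of absolute differences); this is not the real-time technique under development, and the window size, disparity range, and border handling are placeholder choices.

import numpy as np
from scipy.ndimage import uniform_filter

def disparity_map(left, right, max_disp=32, win=9):
    """Dense disparity by SAD block matching on a rectified grayscale pair."""
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disp = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp + 1):
        shifted = np.roll(right, d, axis=1)   # align right image for candidate shift d
        shifted[:, :d] = right[:, :1]         # crude replication at the left border
        cost = uniform_filter(np.abs(left - shifted), size=win)  # windowed SAD
        better = cost < best_cost
        best_cost[better] = cost[better]
        disp[better] = d                      # keep the best-matching shift per pixel
    return disp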

Segmentation of disparity maps
Given a disparity map D(x,y), we perform a seeded region-growing segmentation to detect the objects in the scene. Seeds are chosen by a fitness measure that combines the disparity at (x,y) with a local homogeneity term d_q computed over N(x,y), the set of nearest neighbors of (x,y); the quality parameter q increases robustness to noise. This measure favors homogeneous regions (low d_q) with the highest disparity (nearest objects).
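One plausible form of this measure is sketched below, with d_q taken as the q-norm of disparity differences over the 4-neighborhood and an assumed weight lam; both choices are assumptions, since the slide does not spell out the formula.

import numpy as np

def seed_fitness(D, q=2.0, lam=1.0):
    """Plausible seed fitness: high where disparity is large and locally
    homogeneous. The combination rule, q, and lam are assumptions."""
    diffs = [np.abs(D - np.roll(D, s, axis=ax)) ** q
             for s, ax in ((1, 0), (-1, 0), (1, 1), (-1, 1))]  # 4-neighborhood (borders wrap)
    d_q = np.mean(diffs, axis=0) ** (1.0 / q)  # local disparity variation
    return D - lam * d_q                       # near + homogeneous pixels score high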

Region-growing algorithm
– Take a pixel from the region's border.
– For each unlabeled neighbor, compare its intensity to the region's average intensity.
– If they are similar enough, include the neighbor in the region.
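A direct transcription of this loop follows; the similarity threshold tol and the 4-connected neighborhood are assumptions.

from collections import deque
import numpy as np

def grow_region(D, seed, tol=1.0):
    """Seeded region growing on disparity map D. A pixel joins the region
    when its value is within tol of the region's running average."""
    h, w = D.shape
    label = np.zeros((h, w), dtype=bool)
    label[seed] = True
    total, count = float(D[seed]), 1
    border = deque([seed])                      # pixels whose neighbors are unchecked
    while border:
        y, x = border.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not label[ny, nx]:
                if abs(D[ny, nx] - total / count) <= tol:   # similar enough?
                    label[ny, nx] = True
                    total += float(D[ny, nx])
                    count += 1
                    border.append((ny, nx))
    return label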

Object sonification
Sound coming from a specific location suffers a series of degradations before it reaches our ears. These degradations provide the cues that our brain uses to locate the sound source. Binaural spatialization models these cues so that the listener hears a sound as if it came from a specific point in space, typically defined in spherical coordinates (azimuth, elevation, and range).
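The spatializer needs an (azimuth, range) estimate per object. One simple mapping is sketched below; the horizontal field of view and the focal-length-times-baseline product are illustrative placeholders, neither given in the slides.

import numpy as np

def object_position(cx, mean_disp, width=160, fov_deg=60.0, fB=2000.0):
    """Map an object's image column cx and mean disparity to (azimuth, range).
    fov_deg and fB (focal length x baseline, pixel-meters) are assumed values."""
    azimuth = np.radians(fov_deg) * (cx / (width - 1) - 0.5)  # 0 rad = straight ahead
    rng = fB / max(mean_disp, 1e-6)    # depth Z = f*B / disparity
    return azimuth, rng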

Object sonification
We represent each object in the scene with a ping-like sound whose frequency depends on the object's disparity, so that the sound becomes more alerting as the object gets closer. The audio signal for each object is fed through a binaural spatialization system whose parameters depend on the object's position. Spatialization is performed by modeling azimuth and range cues; elevation cues have not been implemented yet.
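A minimal ping generator in this spirit; the 400 Hz to 2 kHz mapping and the 30 ms decay are assumptions, since the slides only state that frequency depends on disparity.

import numpy as np

def ping(disparity, max_disp=32, fs=44100, dur=0.15):
    """Ping whose pitch rises with disparity (nearer objects sound higher)."""
    f = 400.0 + 1600.0 * (disparity / float(max_disp))  # assumed frequency mapping
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2.0 * np.pi * f * t) * np.exp(-t / 0.03)  # fast decay envelope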

Azimuth cues
Inter-aural Time Difference (ITD):
– The sound source reaches each ear with a different delay. With a the head radius divided by the speed of sound and θ the azimuth, the near-ear and far-ear delays are T_n = a − a·sin(θ) and T_f = a + a·θ.
Inter-aural Level Difference (head shadow):
– The sound is attenuated when passing through the head.
– This cue can be modeled with a one-pole, one-zero filter (Brown & Duda, 1998).
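A sketch of both cues, assuming a typical head radius and a simplified azimuth-dependent gain alpha in place of the exact Brown & Duda coefficient formula; the far-ear signal would be delayed by t_far and filtered with a small alpha, the near ear by t_near with alpha near 2.

import numpy as np

HEAD_RADIUS = 0.0875   # m, typical adult head (assumed)
C = 343.0              # speed of sound in air, m/s

def itd_delays(theta):
    """Per-ear delays from the slide's formulas, with a = head radius / c.
    theta: azimuth in radians, 0 = straight ahead."""
    a = HEAD_RADIUS / C
    return a - a * np.sin(abs(theta)), a + a * abs(theta)  # (near, far)

def head_shadow(x, theta, fs=44100):
    """One-pole, one-zero head-shadow filter. The gain law alpha below is a
    simplified stand-in for the Brown & Duda design: highs are damped as the
    ear turns away from the source, while DC gain stays at 1."""
    w0 = C / HEAD_RADIUS                # corner frequency, rad/s
    alpha = 1.0 + np.cos(theta)         # ~2 toward the source, ~0 away (assumed)
    k = 2.0 * fs                        # bilinear transform of (alpha*s + w0)/(s + w0)
    b0 = (alpha * k + w0) / (k + w0)
    b1 = (w0 - alpha * k) / (k + w0)
    a1 = (w0 - k) / (k + w0)
    y = np.zeros(len(x))
    prev_x = prev_y = 0.0
    for n in range(len(x)):
        y[n] = b0 * x[n] + b1 * prev_x - a1 * prev_y
        prev_x, prev_y = x[n], y[n]
    return y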

Range cues
Artificial reverberation:
– Reverberation is the result of a large number of echoes produced when sound reflects off flat surfaces such as walls.
– The level of reverberation is roughly constant and independent of the source location.
– We use a simple model composed of 4 parallel delay lines with feedback.
Attenuation:
– The audio signal is attenuated according to the inverse quadratic law.
– The ratio between the direct-signal and reverberation levels provides an additional cue for range.
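A minimal sketch of the two cues; the slides specify only the structure (4 parallel feedback delay lines), so the delay times (classic Schroeder values), feedback gain, and wet/dry mix are assumptions.

import numpy as np

def simple_reverb(x, fs=44100, mix=0.3, g=0.7):
    """Four parallel feedback delay lines (comb filters): y[n] = x[n] + g*y[n-d]."""
    wet = np.zeros(len(x))
    for t in (0.0297, 0.0371, 0.0411, 0.0437):   # assumed delay times, seconds
        d = int(fs * t)
        y = np.zeros(len(x))
        for n in range(len(x)):
            y[n] = x[n] + (g * y[n - d] if n >= d else 0.0)
        wet += y / 4.0
    return (1.0 - mix) * x + mix * wet           # constant reverb level vs. dry signal

def range_gain(r, r_ref=1.0):
    """Inverse-square attenuation of the direct signal, relative to r_ref."""
    return (r_ref / max(r, r_ref)) ** 2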

Test application
We simulate a moving scene by taking a 160 x 100 sub-frame of a precomputed disparity map. The 10 most relevant objects are segmented, but only the objects that are near enough are sonified.

Preliminary results
Fast segmentation times:
– 5 ms per 160 x 100 frame on a 2.4 GHz dual-core CPU
– Over 100 frames per second including the sonification stage (but without disparity map estimation)
– An embedded implementation is viable
Good azimuth representation: object direction is easily perceived.
Object range is perceived in a relative manner (e.g., one object is nearer than another), but not in an absolute way.
Between 3 and 5 objects can be sonified before too much clutter is heard.

Future work
– Camera setup and calibration
– Real-time estimation of disparity maps
– Elevation cues in binaural spatialization
– Optimization of the sonification system
– Implementation on an embedded device

Thank you!