Auditory and Visual Spatial Sensing
Stan Birchfield
Department of Electrical and Computer Engineering, Clemson University

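Human Spatial Sensing
The five senses: hearing, taste, touch, smell, and seeing. Hearing senses a signal f(t) that varies over time; seeing senses a signal f(x, y, λ, t) that varies over image position, wavelength, and time.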

Visual and Auditory Pathways

Two Problems in Spatial Sensing
Stereo vision and acoustic localization

Clemson Vision Laboratory
Head tracking, root detection, reconstruction, highway monitoring, motion segmentation

Clemson Vision Lab (cont.)
Microphone position calibration, speaker localization

Stereo Vision
Input: a left and a right image. Output: a disparity map and the depth discontinuities. The search for correspondences is restricted by the epipolar constraint.

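Epipolar Constraint
A world point and the centers of projection of the left and right cameras define the epipolar plane. This plane intersects each image in an epipolar line, so the match for a point in the left image must lie on the corresponding epipolar line in the right image, which reduces the correspondence search from 2D to 1D.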
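For a rectified stereo pair the epipolar constraint takes a particularly simple form. The sketch below assumes two identical pinhole cameras with focal length `focal` (in pixels) whose centers of projection are separated by `baseline` along the x-axis; the function names are illustrative, not from the talk. Under that assumption a world point projects to the same row in both images, so the epipolar lines are scanlines, and depth follows from disparity as $Z = fB/d$.

```python
from typing import Tuple

Point2 = Tuple[float, float]


def project_rectified(point: Tuple[float, float, float], focal: float,
                      baseline: float) -> Tuple[Point2, Point2]:
    """Project a world point (X, Y, Z) into a rectified left/right camera pair.

    Assumes the left camera sits at the origin, the right camera is shifted by
    `baseline` along x, and both look down +Z with the same focal length.
    """
    X, Y, Z = point
    left = (focal * X / Z, focal * Y / Z)
    right = (focal * (X - baseline) / Z, focal * Y / Z)
    return left, right


def depth_from_disparity(disparity: float, focal: float, baseline: float) -> float:
    """Depth Z = f * B / d; zero disparity corresponds to a point at infinity."""
    if disparity <= 0:
        return float("inf")
    return focal * baseline / disparity


left, right = project_rectified((0.5, 0.2, 2.0), focal=700.0, baseline=0.1)
assert left[1] == right[1]          # same row: the epipolar line is a scanline
disparity = left[0] - right[0]      # about 35 pixels for this point
print(depth_from_disparity(disparity, focal=700.0, baseline=0.1))  # close to 2.0
```

Because disparity is inversely proportional to depth, a jump in the disparity map marks a depth discontinuity.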

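Energy Minimization
Match the intensities of the left and right scanlines, allowing for occluded pixels that are visible in only one image. Minimizing the dissimilarity between matched pixels alone is underconstrained, so a discontinuity penalty is added as a constraint. Minimize: dissimilarity + discontinuity penalty.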
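The energy on this slide can be written out for a single scanline. The sketch below assumes an absolute-intensity-difference dissimilarity, a constant penalty `smoothness` for every change of disparity between neighboring pixels, and (for brevity) no explicit occlusion handling; these are illustrative choices, not necessarily the ones used in the talk.

```python
from typing import Sequence


def scanline_energy(left: Sequence[float], right: Sequence[float],
                    disparities: Sequence[int], smoothness: float) -> float:
    """Energy = sum of pixel dissimilarities + discontinuity penalties.

    Pixel x of the left scanline is matched to pixel x - d of the right one.
    Occlusions are not modeled here (a simplifying assumption of this sketch).
    """
    if len(disparities) != len(left):
        raise ValueError("need one disparity per left pixel")
    energy = 0.0
    for x, d in enumerate(disparities):
        xr = x - d
        if not 0 <= xr < len(right):
            raise ValueError(f"disparity {d} at pixel {x} falls outside the right scanline")
        energy += abs(left[x] - right[xr])        # dissimilarity term
    for d_prev, d_next in zip(disparities, disparities[1:]):
        if d_prev != d_next:
            energy += smoothness                   # discontinuity penalty
    return energy
```

For example, `scanline_energy([1, 2, 3], [1, 2, 3], [0, 1, 1], smoothness=5)` pays a dissimilarity of 1 for each of the last two pixels plus one discontinuity penalty, for a total of 7. Stereo algorithms differ mainly in how they search for the disparities that minimize such an energy.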

History of Stereo Correspondence
Dynamic programming (1D): Baker & Binford 1981; Ohta & Kanade 1985; Belhumeur & Mumford 1992; Intille & Bobick 1994; Geiger et al.; Birchfield & Tomasi 1998
Multiway-cut (2D): Roy & Cox 1998; Boykov, Veksler & Zabih 1998; Birchfield & Tomasi 1999; Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002

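Dynamic Programming: 1D Search
Matching a left scanline to a right scanline resembles string editing. With the penalties mismatch = 1, insertion = 1, and deletion = 1, transforming "cat" into "cart" costs one insertion. In stereo matching, the same kind of search over the two scanlines produces the disparity map, where pixels left unmatched correspond to occlusions and jumps in disparity to depth discontinuities.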
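The string-editing analogy can be made concrete with the classic dynamic-programming table. The sketch below is a generic alignment cost with a pluggable `mismatch` function and a constant `gap` penalty (hypothetical names); with the slide's unit penalties it computes edit distance, and with intensities one could plug in an absolute difference and an occlusion penalty instead.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def align_cost(a: Sequence[T], b: Sequence[T],
               mismatch: Callable[[T, T], float], gap: float) -> float:
    """Minimum cost of aligning sequence a with sequence b.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]. Each step either
    pairs a[i-1] with b[j-1] (paying `mismatch`) or skips one element of a or b
    (paying `gap`, i.e. a deletion or an insertion).
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch(a[i - 1], b[j - 1]),  # match / mismatch
                cost[i - 1][j] + gap,                               # deletion
                cost[i][j - 1] + gap,                               # insertion
            )
    return cost[n][m]


# String editing with the slide's penalties: mismatch = insertion = deletion = 1.
print(align_cost("cat", "cart", lambda x, y: 0.0 if x == y else 1.0, gap=1.0))  # 1.0
```

Recovering the actual disparities (rather than just the minimum cost) would additionally require backtracking through `cost`, which this sketch omits.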

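Multiway-Cut: 2D Search
Instead of matching each scanline independently, label all pixels of the image at once using a graph whose nodes are the pixels and the labels [Boykov, Veksler, Zabih 1998].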

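Multiway-Cut Algorithm
Each label is a terminal node of the graph (for two labels, a source label and a sink label). A minimum cut minimizes the sum of the cost of assigning a label to each pixel and the cost of label discontinuities between neighboring pixels.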
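With only two labels, the multiway cut reduces to an ordinary source/sink minimum cut, which max-flow computes exactly; with three or more labels the problem is NP-hard in general, so practical methods rely on approximation. The sketch below covers only the two-label case, using Edmonds-Karp max-flow on a dense capacity matrix; the function names and the nonnegative-cost requirement are assumptions of this illustration.

```python
from collections import deque
from typing import List, Sequence, Tuple


def max_flow_min_cut(cap: List[List[float]], s: int, t: int) -> Tuple[float, List[bool]]:
    """Edmonds-Karp max-flow. Returns (flow value = min-cut cost, source-side flags)."""
    n = len(cap)
    residual = [row[:] for row in cap]

    def bfs() -> List[int]:
        # Breadth-first search in the residual graph; parent[v] == -1 means unreached.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs()
        if parent[t] == -1:                        # no augmenting path: cut found
            return flow, [p != -1 for p in parent]
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                              # push flow along the path
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def binary_labeling(data_cost: Sequence[Tuple[float, float]],
                    edges: Sequence[Tuple[int, int, float]]) -> Tuple[List[int], float]:
    """Exactly minimize sum_p D_p(l_p) + sum_(p,q) w_pq [l_p != l_q] over l_p in {0, 1}.

    data_cost[p] = (cost of label 0, cost of label 1); edges hold (p, q, w_pq).
    Label 0 = source side of the cut, label 1 = sink side. Costs must be >= 0.
    """
    n = len(data_cost)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for p, (cost0, cost1) in enumerate(data_cost):
        cap[s][p] += cost1   # severed when p ends on the sink side (label 1)
        cap[p][t] += cost0   # severed when p stays on the source side (label 0)
    for p, q, w in edges:
        cap[p][q] += w       # severed when neighbors p and q get different labels
        cap[q][p] += w
    energy, source_side = max_flow_min_cut(cap, s, t)
    return [0 if source_side[p] else 1 for p in range(n)], energy


data = [(0, 5), (4, 1), (3, 2), (6, 0)]          # four pixels in a chain
chain = [(0, 1, 2), (1, 2, 2), (2, 3, 2)]        # discontinuity weight 2
print(*binary_labeling(data, chain))             # [0, 1, 1, 1] 5.0
```

The data costs become terminal edges and the discontinuity costs become neighbor edges, so every cut's cost equals the energy of the labeling it induces.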

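Sampling-Insensitive Pixel Dissimilarity [Birchfield & Tomasi 1998]
Our dissimilarity measure between pixel $x_L$ of the left scanline $I_L$ and pixel $x_R$ of the right scanline $I_R$ is
$d(x_L, x_R) = \min\{\bar{d}(x_L, x_R), \bar{d}(x_R, x_L)\}$,
where $\bar{d}(x_L, x_R) = \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} |I_L(x_L) - \hat{I}_R(x)|$, $\hat{I}_R$ is the linearly interpolated right scanline, and $\bar{d}(x_R, x_L)$ is defined symmetrically.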

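Dissimilarity Measure Theorems
Given: an interval $A$ such that $[x_L - \frac{1}{2}, x_L + \frac{1}{2}] \subseteq A$ and $[x_R - \frac{1}{2}, x_R + \frac{1}{2}] \subseteq A$.
Theorem 1 (when the intensity function is convex or concave on $A$): if $|x_L - x_R| \le \frac{1}{2}$, then $d(x_L, x_R) = 0$.
Theorem 2 (when the intensity function is linear on $A$): $|x_L - x_R| \le \frac{1}{2}$ if and only if $d(x_L, x_R) = 0$.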
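The measure needs only three samples per pixel. Here $\bar{d}(x_L, x_R)$ is the distance from $I_L(x_L)$ to the range of the linearly interpolated right scanline over $[x_R - \frac{1}{2}, x_R + \frac{1}{2}]$, and $d$ takes the smaller of the two directions. Because the interpolant is piecewise linear, that range is spanned by the values at $x_R - \frac{1}{2}$, $x_R$, and $x_R + \frac{1}{2}$. The function names and the border handling below are illustrative assumptions.

```python
from typing import Sequence, Tuple


def interpolated_range(I: Sequence[float], x: int) -> Tuple[float, float]:
    """Min and max of the linearly interpolated scanline I over [x - 1/2, x + 1/2].

    At the borders the missing half-sample value is replaced by I[x]
    (a boundary assumption of this sketch).
    """
    center = I[x]
    minus = 0.5 * (center + I[x - 1]) if x > 0 else center
    plus = 0.5 * (center + I[x + 1]) if x + 1 < len(I) else center
    return min(minus, center, plus), max(minus, center, plus)


def d_bar(I_a: Sequence[float], x_a: int, I_b: Sequence[float], x_b: int) -> float:
    """Distance from I_a(x_a) to the interpolated values of I_b within half a pixel of x_b."""
    lo, hi = interpolated_range(I_b, x_b)
    value = I_a[x_a]
    return max(0.0, value - hi, lo - value)


def bt_dissimilarity(I_L: Sequence[float], x_L: int,
                     I_R: Sequence[float], x_R: int) -> float:
    """Symmetric sampling-insensitive dissimilarity d(x_L, x_R)."""
    return min(d_bar(I_L, x_L, I_R, x_R), d_bar(I_R, x_R, I_L, x_L))


# A linear ramp and the same ramp sampled with a half-pixel offset.
left = [0.0, 10.0, 20.0, 30.0, 40.0]
right = [5.0, 15.0, 25.0, 35.0, 45.0]
print(abs(left[1] - right[1]), bt_dissimilarity(left, 1, right, 1))  # 5.0 0.0
```

The plain absolute difference penalizes the half-pixel sampling offset, while the interpolated measure does not, in line with the theorems for a linear intensity function.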

Correspondence as Segmentation
Problem: labeling with fronto-parallel disparities requires relatively few labels, but labeling with slanted surfaces requires far more, which is computationally intractable.
Solution: iteratively determine which labels to use, alternating between
– labeling the pixels with multiway-cut (Expectation), and
– finding the affine parameters of the regions with Newton-Raphson (Maximization).
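The alternation between labeling and refitting can be illustrated on a single scanline. This is a deliberately simplified sketch: it assumes hypothetical per-pixel disparity observations, labels each pixel independently by the smallest residual (standing in for multiway-cut, with no smoothness term), and refits each affine model $d = ax + b$ by closed-form least squares (standing in for Newton-Raphson).

```python
import numpy as np


def fit_affine_layers(x, disparity, models, iterations=10):
    """Alternate labeling and refitting of affine disparity models d = a*x + b.

    Labeling: each pixel takes the model with the smallest residual.
    Fitting: each model is refit to its pixels by least squares.
    Stops early once the labels no longer change.
    """
    x = np.asarray(x, dtype=float)
    disparity = np.asarray(disparity, dtype=float)
    models = np.asarray(models, dtype=float).copy()     # rows: (a, b)
    labels = np.full(x.shape, -1)
    for _ in range(iterations):
        predicted = models[:, :1] * x + models[:, 1:]   # shape (n_models, n_pixels)
        new_labels = np.argmin(np.abs(predicted - disparity), axis=0)
        if np.array_equal(new_labels, labels):
            break                                       # labels stopped changing
        labels = new_labels
        for k in range(len(models)):
            mask = labels == k
            if mask.sum() >= 2:                         # a line needs two points
                models[k] = np.polyfit(x[mask], disparity[mask], 1)
    return labels, models


x = np.arange(20)
observed = np.where(x < 10, 0.5 * x + 2.0, -0.2 * x + 12.0)   # two slanted surfaces
labels, models = fit_affine_layers(x, observed, models=[(0.0, 4.0), (0.0, 9.5)])
print(labels)
print(models)   # approximately [[0.5, 2.0], [-0.2, 12.0]]
```

Starting from fronto-parallel guesses, the loop recovers the two slanted surfaces here; like any such alternation, it can converge to a poor solution from a bad initialization.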

Stereo Results (Dynamic Programming)

Stereo Results (Multiway-Cut)

Stereo Results on Middlebury Database
Image; Birchfield & Tomasi 1999; Hong & Chen 2004

Multiway-Cut Challenges
Multiway-cut vs. dynamic programming

Acoustic Localization
Problem: use microphone signals to determine the location of a sound source, with either compact or distributed microphone arrays.
Traditional solutions:
1. Delay-and-sum beamforming (accurate but not efficient)
2. Time-delay estimation (TDE) (efficient but not accurate)
Recent solutions:
3. Hemisphere sampling (efficient and accurate, compact arrays only)
4. Accumulated correlation (efficient and accurate)
5. Bayesian
6. Zero-energy

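Localization Geometry
Sound from the source reaches microphone 1 at time $t_1$ and microphone 2 at time $t_2$. The set of source locations with a given time difference $t_2 - t_1 = \tau$ is one half of a hyperboloid whose foci are the two microphones.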
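The geometry can be checked numerically: every point on one half of the hyperboloid produces the same time difference. The sketch below assumes free-field propagation at a constant speed of sound (343 m/s, roughly air at 20 °C); `time_difference` is a hypothetical helper.

```python
import math
from typing import Sequence


def time_difference(source: Sequence[float], mic1: Sequence[float],
                    mic2: Sequence[float], c: float = 343.0) -> float:
    """t2 - t1: arrival time at mic2 minus arrival time at mic1, in seconds."""
    return (math.dist(source, mic2) - math.dist(source, mic1)) / c


m1, m2 = (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)
# Points on the half of the hyperboloid nearer mic2, with foci m1, m2 and
# semi-axes a = 0.3, b = 0.4 (chosen so that a**2 + b**2 = 0.5**2).
u, theta = 1.0, 0.7
p = (0.3 * math.cosh(u),
     0.4 * math.sinh(u) * math.cos(theta),
     0.4 * math.sinh(u) * math.sin(theta))
print(time_difference((0.3, 0.0, 0.0), m1, m2), time_difference(p, m1, m2))
# both are approximately -0.6 / 343: the distance difference equals 2a = 0.6 m
```

A single measured delay therefore constrains the source only to a surface; combining several microphone pairs is what pins down a location.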

Principle of Least Commitment
“Delay decisions as long as possible” [Marr 1982; Russell & Norvig 1995]

Localization by Beamforming
Each microphone signal is prefiltered and delayed (shifted) according to each candidate location; the delayed signals are summed, the energy of the sum is computed, and the candidate with the peak energy is chosen [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002].
Accurate: the decision is made late in the pipeline (principle of least commitment).
NOT efficient: every signal must be delayed for every candidate location.
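A minimal delay-and-sum sketch follows, assuming integer-sample delays derived from candidate-to-microphone distances, no prefilter, and circular shifts (`np.roll`) in place of proper buffering; `steering_delays` and `delay_and_sum` are hypothetical names. The inner loop over candidates, which shifts and sums every signal again for each one, is exactly what makes this approach expensive.

```python
import numpy as np


def steering_delays(candidate, mics, fs, c=343.0):
    """Arrival delay (integer samples) at each microphone for a source at `candidate`."""
    offsets = np.asarray(mics, dtype=float) - np.asarray(candidate, dtype=float)
    return np.rint(np.linalg.norm(offsets, axis=1) / c * fs).astype(int)


def delay_and_sum(signals, mics, candidates, fs, c=343.0):
    """Return (index of best candidate, beam energy for every candidate)."""
    signals = np.asarray(signals, dtype=float)          # shape (n_mics, n_samples)
    energies = np.empty(len(candidates))
    for k, candidate in enumerate(candidates):
        delays = steering_delays(candidate, mics, fs, c)
        # Advance each channel by its delay so that a source at `candidate`
        # lines up across channels (np.roll wraps around; real code would pad).
        aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
        beam = np.sum(aligned, axis=0)
        energies[k] = np.sum(beam ** 2)
    return int(np.argmax(energies)), energies
```

When the candidate matches the true source, the channels add coherently and the beam energy peaks; elsewhere they partially cancel.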

Localization by Time-Delay Estimation (TDE)
For each microphone pair, the prefiltered signals are cross-correlated and the peak gives the time delay; the hyperboloids defined by these delays are then intersected, although there may be no intersection [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997].
Efficient: the cross-correlation is computed only once for each microphone pair.
NOT accurate: the decision is made early.
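The per-pair "correlate, find peak" step is often implemented as generalized cross-correlation with the PHAT (phase transform) prefilter, which whitens the cross-spectrum so that only phase, and hence delay, information remains. The sketch below is one such implementation, not necessarily the one used in the cited work; the intersection of hyperboloids is not shown, and `gcc_phat` is a hypothetical name.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag=None):
    """Estimate how many samples x1 lags x2 using PHAT-weighted cross-correlation.

    Returns (lag, correlation), where correlation[i] corresponds to lag i - max_lag.
    A positive lag means the sound reached the microphone of x1 later.
    """
    n = len(x1) + len(x2)                      # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12             # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)              # cc[k] holds lag k (negative lags wrap to the end)
    if max_lag is None:
        max_lag = n // 2 - 1
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(cc)) - max_lag, cc
```

Committing to the single peak per pair is the early decision the slide criticizes: any error in that peak propagates directly into the intersection step.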

Localization by Hemisphere Sampling
For each microphone pair, the prefiltered signals are cross-correlated, and each correlation is mapped to a common coordinate system, a sampled locus of candidate directions. The mapped correlations are summed and smoothed over time, and the peak of the final sampled locus gives the source direction [Birchfield & Gillmor 2001].
Efficient and accurate, but restricted to compact arrays.
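The "map to common coordinate system" step can be precomputed. Assuming a far-field (plane-wave) source, which is what makes a compact array tractable, the delay for a pair depends only on the direction of arrival, so each pair needs just one lookup table from sampled hemisphere directions to lags. The grid resolution and the function names below are illustrative assumptions.

```python
import numpy as np


def sample_hemisphere(n_azimuth=72, n_elevation=19):
    """Unit direction vectors on a regular azimuth/elevation grid with z >= 0."""
    azimuth = np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False)
    elevation = np.linspace(0.0, np.pi / 2.0, n_elevation)
    A, E = np.meshgrid(azimuth, elevation, indexing="ij")
    dirs = np.stack([np.cos(E) * np.cos(A), np.cos(E) * np.sin(A), np.sin(E)], axis=-1)
    return dirs.reshape(-1, 3)   # row = azimuth_index * n_elevation + elevation_index


def far_field_lags(dirs, mic1, mic2, fs, c=343.0):
    """Lag (samples) of mic1's signal relative to mic2's for a distant source in each direction.

    A plane wave from unit direction u reaches microphone m at time t0 - (m . u) / c,
    so t1 - t2 = ((mic2 - mic1) . u) / c.
    """
    baseline = np.asarray(mic2, dtype=float) - np.asarray(mic1, dtype=float)
    return np.rint(dirs @ baseline / c * fs).astype(int)
```

Localization then adds, for every sampled direction, each pair's correlation value at the tabulated lag, and picks the direction with the largest sum. For a pair 15 cm apart (borrowing the slide's d = 15 cm as an illustrative spacing), the lags are bounded by 0.15 m / c times the sampling rate.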

Localization by Accumulated Correlation
The pipeline matches that of hemisphere sampling: cross-correlate the prefiltered signals of each microphone pair, map each correlation to a common coordinate system of sampled candidate locations, sum, smooth over time, and find the peak of the final sampled locus [Birchfield & Gillmor 2002].
Efficient and accurate.

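Accumulated Correlation Algorithm
For each candidate location, the likelihood is the sum, over all microphone pairs, of the pair's cross-correlation evaluated at the time delay that the candidate location implies for that pair: likelihood = pair 1 + pair 2 + …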
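The algorithm on this slide can be sketched as follows. Each pair is cross-correlated once, which keeps the efficiency of TDE; no peak is picked per pair, which keeps the late decision of beamforming. This sketch assumes integer-sample lags, plain (unweighted) correlation in place of a prefilter, and no temporal smoothing; `accumulated_correlation` is a hypothetical name.

```python
import numpy as np


def accumulated_correlation(signals, mics, candidates, fs, c=343.0):
    """Score candidate locations by summing pairwise cross-correlations.

    Returns (index of best candidate, score for every candidate).
    """
    mics = np.asarray(mics, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Distance from every candidate to every microphone: shape (n_candidates, n_mics).
    dists = np.linalg.norm(candidates[:, None, :] - mics[None, :, :], axis=2)
    scores = np.zeros(len(candidates))
    for i in range(len(signals)):
        for j in range(i + 1, len(signals)):
            # Computed once per pair: cc[zero + k] = sum_n x_i[n + k] * x_j[n].
            cc = np.correlate(signals[i], signals[j], mode="full")
            zero = len(signals[j]) - 1
            # Lag (samples) by which mic i hears each candidate after mic j.
            lags = np.rint((dists[:, i] - dists[:, j]) / c * fs).astype(int)
            idx = zero + lags
            valid = (idx >= 0) & (idx < len(cc))
            scores[valid] += cc[idx[valid]]
    return int(np.argmax(scores)), scores
```

The candidate grid plays the role of the common coordinate system: every pair votes on the same set of locations, and only the summed votes are thresholded.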

Comparison
Bayesian, zero energy, accumulated correlation, hemisphere sampling, TDE, and beamforming, compared by similarity, energy, efficiency, and accuracy

Unifying Framework
Efficient vs. accurate

Integration Limits
Beamforming, Bayesian, zero energy, accumulated correlation, hemisphere sampling, time-delay estimation

Compact Microphone Array
Microphones (d = 15 cm) and the sampled hemisphere

Results on Compact Array
Pan and tilt, without and with the PHAT prefilter

More Comparison
Hemisphere sampling [Birchfield & Gillmor 2001], beamforming, accumulated correlation [Birchfield & Gillmor 2002]

Results on distributed array

Computational Efficiency
Computing time per window (ms): 600x faster and 50x faster

Simultaneous Speakers

Detecting Noise Sources
Background noise source

Connection with Stereo
“Multi-baseline stereo” [Okutomi & Kanade 1993]
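In multi-baseline stereo, Okutomi and Kanade sum the SSD over several stereo pairs as a function of inverse depth, a coordinate shared by all baselines, to reduce matching ambiguity. This is presumably the connection intended here: accumulated correlation similarly sums pairwise evidence over a shared set of candidate locations. The 1D sketch below assumes rectified cameras displaced along the scanline, integer disparities $d_i = \mathrm{round}(B_i f \zeta)$ for inverse depth $\zeta$, and a fixed window; all names are hypothetical.

```python
import numpy as np


def sssd_inverse_depth(reference, others, baselines, focal, zetas, x, half_window=3):
    """Return the inverse depth in `zetas` that minimizes the sum of SSDs at pixel x.

    Camera i is displaced by baselines[i] along the scanline, so a point at x in
    the reference scanline appears at x - d_i, with d_i = round(baselines[i] * focal * zeta).
    """
    reference = np.asarray(reference, dtype=float)
    window = reference[x - half_window: x + half_window + 1]
    scores = []
    for zeta in zetas:
        total = 0.0
        for scanline, baseline in zip(others, baselines):
            d = int(round(baseline * focal * zeta))
            lo, hi = x - d - half_window, x - d + half_window + 1
            if lo < 0 or hi > len(scanline):
                total = np.inf                     # window falls outside the image
                break
            candidate = np.asarray(scanline[lo:hi], dtype=float)
            total += float(np.sum((window - candidate) ** 2))
        scores.append(total)
    return zetas[int(np.argmin(scores))]
```

Because each baseline's disparity is proportional to the same $\zeta$, the per-pair costs can be added directly, just as per-pair correlations are added over candidate locations.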

Conclusion
Spatial sensing is achieved by arrays of visual and auditory sensors.
Stereo vision
– match visual signals from multiple cameras
– recent breakthrough: multiway-cut
– limitations of multiway-cut
Acoustic localization
– match acoustic signals from multiple microphones
– recent breakthrough: accumulated correlation
– connection with multi-baseline stereo