Auditory and Visual Spatial Sensing. Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University
Human Spatial Sensing. The five senses: hearing, taste, touch, smell, seeing. Signals: f(t), f(x,y,,t)
Visual and Auditory Pathways
Two Problems in Spatial Sensing: Stereo Vision; Acoustic Localization
Clemson Vision Laboratory: head tracking, root detection, reconstruction, highway monitoring, motion segmentation
Clemson Vision Lab (cont.): microphone position calibration, speaker localization
Stereo Vision. INPUT: left and right images. OUTPUT: disparity map, depth discontinuities. Epipolar constraint.
Epipolar Constraint: left camera, right camera, world point, center of projection, epipolar plane, epipolar line
Energy Minimization. Left and right intensity, with occluded pixels. Minimize: dissimilarity (underconstrained) + discontinuity penalty (constraint)
History of Stereo Correspondence. DYNAMIC PROGRAMMING (1D): Birchfield & Tomasi 1998; Geiger et al; Intille & Bobick 1994; Belhumeur & Mumford 1992; Ohta & Kanade 1985; Baker & Binford 1981. MULTIWAY-CUT (2D): Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002; Birchfield & Tomasi 1999; Boykov, Veksler, and Zabih 1998; Roy & Cox 1998
Dynamic Programming: 1D Search. Stereo matching: disparity map between the LEFT and RIGHT scanlines, with occlusion and depth discontinuity. String editing: "c a t" versus "c a r t"; penalties: mismatch = 1, insertion = 1, deletion = 1
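The string-editing analogy on this slide can be sketched in a few lines. This is a minimal illustration of the dynamic-programming recurrence using the slide's unit penalties, not the stereo matcher itself; the function name `edit_distance` is an assumption made for this example.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum total penalty to turn source into target.

    Penalties follow the slide: mismatch = 1, insertion = 1, deletion = 1.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # cost[i][j] = cheapest way to turn source[:i] into target[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete the first i characters
    for j in range(cols):
        cost[0][j] = j  # insert the first j characters
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if source[i - 1] == target[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or mismatch
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )
    return cost[-1][-1]


print(edit_distance("cat", "cart"))  # prints 1 (one insertion)
```

In the stereo setting the two strings become the left and right scanlines, and insertions and deletions play the role of occlusions.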
Multiway-Cut: 2D Search. Pixels, labels [Boykov, Veksler, Zabih 1998]
Multiway-Cut Algorithm. Minimum cut between source label and sink label through the pixels. Minimizes: (cost of assigning label to pixel) + (cost of label discontinuity). Pixels, labels
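To make the two cost terms concrete, here is a minimal sketch that only evaluates the energy of a given labeling; it does not perform the minimum cut. It assumes a 4-connected pixel grid and a constant penalty for every pair of neighbors with different labels, neither of which is stated on the slide. The names `labeling_energy`, `data_cost`, and `discontinuity_penalty` are hypothetical.

```python
from typing import List


def labeling_energy(data_cost: List[List[List[float]]],
                    labels: List[List[int]],
                    discontinuity_penalty: float) -> float:
    """Energy of a labeling: assignment costs plus discontinuity costs.

    data_cost[y][x][l] is the cost of assigning label l to pixel (x, y).
    Assumed for illustration: 4-connected grid, constant penalty per
    neighboring pair whose labels differ.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            # cost of assigning this label to this pixel
            energy += data_cost[y][x][labels[y][x]]
            # cost of label discontinuity; each neighbor pair is counted once
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += discontinuity_penalty
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += discontinuity_penalty
    return energy
```

A multiway-cut solver searches over labelings for the one with the lowest such energy; this function is only the objective it would be scoring.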
Sampling-Insensitive Pixel Dissimilarity. Our dissimilarity measure: $d(x_L, x_R) = \min\{\bar{d}(x_L, x_R), \bar{d}(x_R, x_L)\}$ [Birchfield & Tomasi 1998]. Figure labels: $I_L$, $I_R$, $x_L$, $x_R$, $\bar{d}(x_L, x_R)$
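A minimal sketch of how such a measure can be computed on two scanlines, following the usual formulation in which each one-sided term compares a pixel against the linearly interpolated intensities within half a pixel of its candidate match. The function names are hypothetical, and clamping at the image border is an assumption made for this example.

```python
from typing import Sequence


def one_sided(fixed: Sequence[float], x_fixed: int,
              other: Sequence[float], x_other: int) -> float:
    """Distance from fixed[x_fixed] to the linearly interpolated `other`
    over the interval [x_other - 1/2, x_other + 1/2]."""
    center = other[x_other]
    left = other[max(x_other - 1, 0)]                 # border clamp (assumed)
    right = other[min(x_other + 1, len(other) - 1)]   # border clamp (assumed)
    half_left = 0.5 * (center + left)     # interpolated value at x_other - 1/2
    half_right = 0.5 * (center + right)   # interpolated value at x_other + 1/2
    lowest = min(half_left, half_right, center)
    highest = max(half_left, half_right, center)
    value = fixed[x_fixed]
    # zero if value lies inside [lowest, highest], else distance to that range
    return max(0.0, value - highest, lowest - value)


def dissimilarity(left_row: Sequence[float], x_left: int,
                  right_row: Sequence[float], x_right: int) -> float:
    """Symmetric measure: the smaller of the two one-sided distances."""
    return min(one_sided(left_row, x_left, right_row, x_right),
               one_sided(right_row, x_right, left_row, x_left))
```

For a linear ramp sampled half a pixel apart, for example `[0, 10, 20, 30]` against `[5, 15, 25, 35]`, the plain absolute difference at matching indices is 5 while this measure is 0.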
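Dissimilarity Measure Theorems. Given: an interval $A$ such that $[x_L - \tfrac{1}{2},\, x_L + \tfrac{1}{2}] \subseteq A$ and $[x_R - \tfrac{1}{2},\, x_R + \tfrac{1}{2}] \subseteq A$. Theorem 1 (when the intensity function on $A$ is convex or concave): if $|x_L - x_R| \le \tfrac{1}{2}$, then $d(x_L, x_R) = 0$. Theorem 2 (when the intensity function on $A$ is linear): $|x_L - x_R| \le \tfrac{1}{2}$ iff $d(x_L, x_R) = 0$.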
Correspondence as Segmentation. Problem: disparities (fronto-parallel) O( ); surfaces (slanted) O( 2 n) => computationally intractable! Solution: iteratively determine which labels to use: label pixels by multiway-cut (Expectation); find affine parameters of regions by Newton-Raphson (Maximization)
Stereo Results (Dynamic Programming)
Stereo Results (Multiway-Cut)
Stereo Results on Middlebury Database. Columns: image; Birchfield & Tomasi 1999; Hong & Chen 2004
Multiway-Cut Challenges. Panels: multiway-cut; dynamic programming
Acoustic Localization. Problem: use microphone signals to determine sound source location. Traditional solutions: 1. Delay-and-sum beamforming ✓; 2. Time-delay estimation (TDE) ✓. Recent solutions: 3. Hemisphere sampling ✓✓; 4. Accumulated correlation ✓✓; 5. Bayesian ✓; 6. Zero-energy ✓. Array types: compact, distributed. Legend: ✓ efficient, ✓ accurate
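Localization Geometry. The sound reaches the two microphones at times $t_1$ and $t_2$; a fixed difference $t_2 - t_1$ places the sound source on one-half of a hyperboloid. Labels: microphones, sound source, time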
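The geometry can be illustrated with a short sketch that computes the arrival-time difference for a hypothesized source position. The speed of sound (343 m/s), the 3D point representation, and the function name are assumptions made for this example.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_time_difference(source: Point, mic_1: Point, mic_2: Point) -> float:
    """Return t2 - t1 in seconds for a source heard by two microphones."""
    t1 = math.dist(source, mic_1) / SPEED_OF_SOUND
    t2 = math.dist(source, mic_2) / SPEED_OF_SOUND
    return t2 - t1
```

Every source position that yields the same value of `t2 - t1` lies on the same half-hyperboloid, so a single microphone pair constrains the source to a surface rather than a point.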
Principle of Least Commitment: “Delay decisions as long as possible” [Marr 1982; Russell & Norvig 1995]. Example:
Localization by Beamforming. Pipeline: mic 1 to mic 4 signals -> prefilter -> delay -> sum -> energy -> find peak. Delays (shifts) each signal for each candidate location; makes decision late in pipeline (“principle of least commitment”). Accurate, NOT efficient. [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]
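A minimal delay-and-sum sketch, assuming integer sample delays that have already been derived from the array geometry for each candidate location, and omitting the prefilter. The names `beamform_energy`, `best_candidate`, and `candidate_delays` are hypothetical.

```python
from typing import Sequence


def beamform_energy(signals: Sequence[Sequence[float]],
                    delays: Sequence[int]) -> float:
    """Energy of the summed signals after shifting signal m by delays[m] samples."""
    length = len(signals[0])
    energy = 0.0
    for n in range(length):
        total = 0.0
        for signal, delay in zip(signals, delays):
            index = n + delay
            if 0 <= index < length:  # samples shifted out of the window count as zero
                total += signal[index]
        energy += total * total
    return energy


def best_candidate(signals: Sequence[Sequence[float]],
                   candidate_delays: Sequence[Sequence[int]]) -> int:
    """Index of the candidate location whose delays give the highest energy."""
    energies = [beamform_energy(signals, delays) for delays in candidate_delays]
    return max(range(len(energies)), key=energies.__getitem__)
```

The inner loops run over every sample of every signal for every candidate location, which is the cost the slide refers to as "NOT efficient".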
Localization by Time-Delay Estimation (TDE). Pipeline: mic 1 to mic 4 signals -> prefilter -> correlate (per microphone pair) -> find peak -> intersect (may be no intersection). Cross-correlation computed once for each microphone pair; decision is made early. Efficient, NOT accurate. [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]
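The "correlate, then find peak" step for one microphone pair can be sketched as follows. This is a plain time-domain cross-correlation without any prefilter; the function names and the `max_lag` parameter are assumptions made for this example.

```python
from typing import Dict, Sequence


def cross_correlation(a: Sequence[float], b: Sequence[float],
                      max_lag: int) -> Dict[int, float]:
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                total += a[n] * b[m]
        result[lag] = total
    return result


def estimate_delay(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) at which the cross-correlation peaks."""
    corr = cross_correlation(a, b, max_lag)
    return max(corr, key=corr.get)
```

Keeping only the peak lag is the early decision the slide mentions: everything else in the correlation function is discarded before the pairs are combined.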
Localization by Hemisphere Sampling. Pipeline: mic 1 to mic 4 signals -> prefilter -> correlate (per microphone pair, …) -> map to common coordinate system (sampled locus) -> sum -> temporal smoothing (final sampled locus) -> find peak. Efficient and accurate (but restricted to compact arrays). [Birchfield & Gillmor 2001]
Localization by Accumulated Correlation. Pipeline: mic 1 to mic 4 signals -> prefilter -> correlate (per microphone pair, …) -> map to common coordinate system (sampled locus) -> sum -> temporal smoothing (final sampled locus) -> find peak. Efficient and accurate. [Birchfield & Gillmor 2002]
Accumulated Correlation Algorithm. For each candidate location: likelihood = (pair 1) + (pair 2) + ... Labels: microphone, candidate location
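A minimal sketch of the accumulation step: each microphone pair is correlated once, and every candidate location then looks up each pair's correlation at the lag that the candidate implies. It assumes integer sample lags precomputed from the array geometry and omits the prefilter and temporal smoothing; the names `correlate`, `accumulated_correlation`, and `candidate_lags` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def correlate(a: Sequence[float], b: Sequence[float],
              max_lag: int) -> Dict[int, float]:
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in [-max_lag, max_lag]."""
    return {
        lag: sum(a[n] * b[n + lag]
                 for n in range(len(a)) if 0 <= n + lag < len(b))
        for lag in range(-max_lag, max_lag + 1)
    }


def accumulated_correlation(signals: Sequence[Sequence[float]],
                            pairs: Sequence[Tuple[int, int]],
                            candidate_lags: Sequence[Sequence[int]],
                            max_lag: int) -> List[float]:
    """Score per candidate location.

    candidate_lags[c][p] is the lag (in samples) that candidate c implies
    for microphone pair p; it must lie within [-max_lag, max_lag].
    """
    # one cross-correlation per microphone pair, computed once
    correlations = [correlate(signals[i], signals[j], max_lag) for i, j in pairs]
    scores = []
    for lags in candidate_lags:
        # accumulate each pair's correlation at the lag this candidate predicts
        scores.append(sum(corr[lag] for corr, lag in zip(correlations, lags)))
    return scores
```

The candidate with the highest score is the location estimate. Unlike peak-picking per pair, the whole correlation function of every pair contributes, while the per-candidate work is only one table lookup per pair.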
Comparison. Methods: Beamforming; TDE; Hemisphere sampling; Accumulated correlation; Zero energy; Bayesian. Axes: similarity, energy; efficient, accurate
Unifying framework efficient accurate
Integration limits: Beamforming, Bayesian, Zero energy, Accumulated correlation, Hemisphere sampling, Time-delay estimation
Compact Microphone Array: microphone; d = 15 cm; sampled hemisphere
Results on compact array: pan, tilt; without PHAT prefilter, with PHAT prefilter
More Comparison: Hemisphere Sampling [Birchfield & Gillmor 2001]; Beamforming; Accumulated Correlation [Birchfield & Gillmor 2002]
Results on distributed array
Computational efficiency: computing time per window (ms); (600x faster), (50x faster)
Simultaneous Speakers +=
Detecting Noise Sources background noise source
Connection with Stereo: “Multi-baseline stereo” [Okutomi & Kanade 1993]
Conclusion
Spatial sensing is achieved by arrays of visual and auditory sensors
Stereo vision
– match visual signals from multiple cameras
– recent breakthrough: multiway-cut
– limitations of multiway-cut
Acoustic localization
– match acoustic signals from multiple microphones
– recent breakthrough: accumulated correlation
– connection with multi-baseline stereo