Published by Blaise Merritt. Modified over 9 years ago.
1
Auditory and Visual Spatial Sensing. Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University
2
Human Spatial Sensing. The five senses: hearing, taste, touch, smell, seeing. Signals: f(t), f(x,y,λ,t)
3
Visual and Auditory Pathways
4
Two Problems in Spatial Sensing: Stereo Vision; Acoustic Localization
5
Clemson Vision Laboratory: head tracking, root detection, reconstruction, highway monitoring, motion segmentation
6
Clemson Vision Lab (cont.): microphone position calibration, speaker localization
7
Stereo Vision. INPUT: left image, right image. OUTPUT: disparity map, depth discontinuities. Label: epipolar constraint
8
Epipolar Constraint. Figure labels: left camera, right camera, world point, center of projection, epipolar plane, epipolar line
9
Energy Minimization. Figure labels: left, right, intensity, occluded pixels. Minimize: dissimilarity (underconstrained) + discontinuity penalty (constraint)
10
History of Stereo Correspondence
DYNAMIC PROGRAMMING (1D): Baker & Binford 1981; Ohta & Kanade 1985; Belhumeur & Mumford 1992; Intille & Bobick 1994; Geiger et al. 1995; Birchfield & Tomasi 1998
MULTIWAY-CUT (2D): Roy & Cox 1998; Boykov, Veksler, and Zabih 1998; Birchfield & Tomasi 1999; Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002
11
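Dynamic Programming: 1D Search
Stereo matching: LEFT scanline vs. RIGHT scanline. Figure labels: disparity map, occlusion, depth discontinuity
String editing: "cat" vs. "cart". Penalties: mismatch = 1, insertion = 1, deletion = 1
Cost table (origin at bottom left):
      c  a  r  t
t  3  2  1  1  1
a  2  1  0  1  2
c  1  0  1  2  3
   0  1  2  3  4
Alignment: c a t / c a r t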
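The string-editing analogy on this slide can be reproduced with a short dynamic program. The sketch below is only an illustration of the analogy, not the stereo matcher itself; the function name `edit_distance_table` is a hypothetical helper, and the unit penalties are the ones listed on the slide.

```python
def edit_distance_table(a, b, mismatch=1, insertion=1, deletion=1):
    """Fill the dynamic-programming cost table for editing string a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column: delete every character of a's prefix.
    for i in range(1, rows):
        table[i][0] = i * deletion
    # First row: insert every character of b's prefix.
    for j in range(1, cols):
        table[0][j] = j * insertion
    for i in range(1, rows):
        for j in range(1, cols):
            match_cost = 0 if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = min(
                table[i - 1][j - 1] + match_cost,  # match or mismatch
                table[i - 1][j] + deletion,        # drop a[i - 1]
                table[i][j - 1] + insertion,       # add b[j - 1]
            )
    return table


table = edit_distance_table("cat", "cart")
# The slide draws the table with the origin at the bottom left,
# so print the rows in reverse order to match that layout.
for row in reversed(table):
    print(" ".join(str(cost) for cost in row))
```

The last cell of the table is the total edit cost; for "cat" and "cart" it is 1, a single insertion of "r".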
12
Multiway-Cut: 2D Search [Boykov, Veksler, Zabih 1998]. Figure labels: pixels, labels
13
Multiway-Cut Algorithm. The minimum cut (between source label and sink label) minimizes: (cost of assigning label to pixel) + (cost of label discontinuity). Figure labels: pixels, labels
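The two cost terms named on this slide can be made concrete by evaluating them for a given labeling. The sketch below only computes the objective; it does not perform the minimum cut. It assumes a 4-connected pixel grid and a constant (Potts-style) discontinuity penalty, neither of which is stated on the slide, and `labeling_energy` is a hypothetical name.

```python
def labeling_energy(labels, data_cost, discontinuity_penalty=1.0):
    """Sum the per-pixel assignment costs and the label-discontinuity costs.

    labels[r][c]       -- label assigned to pixel (r, c)
    data_cost[r][c][k] -- cost of assigning label k to pixel (r, c)
    Assumption: 4-connected neighbors and a constant penalty per label change.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for r in range(height):
        for c in range(width):
            energy += data_cost[r][c][labels[r][c]]
            # Count each neighboring pair once (right and down neighbors).
            if c + 1 < width and labels[r][c] != labels[r][c + 1]:
                energy += discontinuity_penalty
            if r + 1 < height and labels[r][c] != labels[r + 1][c]:
                energy += discontinuity_penalty
    return energy


# Toy 2x2 image with two labels; the bottom-right pixel prefers label 1.
data_cost = [[[0, 5], [0, 5]],
             [[0, 5], [5, 0]]]
energy_mixed = labeling_energy([[0, 0], [0, 1]], data_cost)
energy_flat = labeling_energy([[0, 0], [0, 0]], data_cost)
```

In this toy case the mixed labeling pays two discontinuity penalties but no assignment cost, so it has lower energy than the uniform labeling.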
14
Sampling-Insensitive Pixel Dissimilarity. Our dissimilarity measure [Birchfield & Tomasi 1998]: $d(x_L, x_R) = \min\{\bar{d}(x_L, x_R),\ \bar{d}(x_R, x_L)\}$, where $\bar{d}$ is the one-sided measure. Figure labels: $I_L$, $I_R$, $x_L$, $x_R$, $\bar{d}(x_L, x_R)$
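A minimal sketch of this dissimilarity for 1D scanlines follows. It assumes linear interpolation between samples, as in the cited Birchfield & Tomasi 1998 paper, and repeats the border sample at the ends of the scanline (a choice made here for illustration). The function names are hypothetical.

```python
def half_sample_range(signal, x):
    """Min and max of the linearly interpolated signal on [x - 1/2, x + 1/2]."""
    center = signal[x]
    left = signal[x - 1] if x > 0 else center          # assumption: clamp at border
    right = signal[x + 1] if x + 1 < len(signal) else center
    minus = 0.5 * (center + left)    # interpolated value at x - 1/2
    plus = 0.5 * (center + right)    # interpolated value at x + 1/2
    return min(minus, plus, center), max(minus, plus, center)


def one_sided(i_a, x_a, i_b, x_b):
    """Distance from the sample i_a[x_a] to the interpolated range of i_b near x_b."""
    low, high = half_sample_range(i_b, x_b)
    return max(0.0, i_a[x_a] - high, low - i_a[x_a])


def dissimilarity(i_left, x_left, i_right, x_right):
    """Symmetric measure: the smaller of the two one-sided distances."""
    return min(one_sided(i_left, x_left, i_right, x_right),
               one_sided(i_right, x_right, i_left, x_left))


# The same intensity ramp, sampled half a pixel apart in the two images.
left_scanline = [0, 10, 20, 30]
right_scanline = [5, 15, 25, 35]
d_match = dissimilarity(left_scanline, 1, right_scanline, 1)
d_mismatch = dissimilarity(left_scanline, 1, right_scanline, 3)
```

For the matching pixels the absolute intensity difference is 5, yet the measure returns 0 because the half-pixel sampling shift is absorbed by the interpolated range.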
15
Dissimilarity Measure Theorems
Given: an interval $A$ such that $[x_L - \tfrac{1}{2},\ x_L + \tfrac{1}{2}] \subseteq A$ and $[x_R - \tfrac{1}{2},\ x_R + \tfrac{1}{2}] \subseteq A$
Theorem 1: If $|x_L - x_R| \le \tfrac{1}{2}$, then $d(x_L, x_R) = 0$ (when A is convex or concave)
Theorem 2: $|x_L - x_R| \le \tfrac{1}{2}$ iff $d(x_L, x_R) = 0$ (when A is linear)
16
Correspondence as Segmentation
Problem: disparities (fronto-parallel): O( ); surfaces (slanted): O( 2 n) => computationally intractable!
Solution: iteratively determine which labels to use
– label pixels: multiway-cut (Expectation)
– find affine parameters of regions: Newton-Raphson (Maximization)
17
Stereo Results (Dynamic Programming)
18
Stereo Results (Multiway-Cut)
19
Stereo Results on Middlebury Database. Columns: image; Birchfield & Tomasi 1999; Hong & Chen 2004
20
Multiway-Cut Challenges. Panels: multiway-cut; dynamic programming
21
Acoustic Localization
Problem: Use microphone signals to determine sound source location
Traditional solutions:
1. Delay-and-sum beamforming !
2. Time-delay estimation (TDE) !
Recent solutions:
3. Hemisphere sampling !!
4. Accumulated correlation !!
5. Bayesian !
6. Zero-energy !
Array types: compact, distributed
Legend: ! efficient, ! accurate
22
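Localization Geometry. A fixed arrival-time difference $t_2 - t_1$ between two microphones constrains the sound source to one-half of a hyperboloid. Figure labels: microphones, sound source, time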
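The geometry on this slide reduces to a one-line computation: the arrival-time difference at two microphones for a candidate source position. The sketch below assumes a speed of sound of 343 m/s and free-field propagation, neither of which is given on the slide; `time_difference` is a hypothetical name.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second; an assumed room-temperature value


def time_difference(source, mic1, mic2, speed=SPEED_OF_SOUND):
    """Return t2 - t1: arrival time at mic2 minus arrival time at mic1 (seconds)."""
    return (math.dist(source, mic2) - math.dist(source, mic1)) / speed


mic1 = (1.0, 0.0, 0.0)
mic2 = (-1.0, 0.0, 0.0)
# A source equidistant from both microphones gives zero time difference.
tau_center = time_difference((0.0, 3.0, 0.0), mic1, mic2)
# A source sitting at mic1 reaches mic2 later by the full inter-mic travel time.
tau_at_mic1 = time_difference(mic1, mic1, mic2)
```

All source positions that yield the same value of `t2 - t1` lie on the same half hyperboloid, which is why a single microphone pair cannot pin down the source on its own.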
23
Principle of Least Commitment: “Delay decisions as long as possible”. Example: [Marr 1982; Russell & Norvig 1995]
24
Localization by Beamforming [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]. Pipeline: mic 1 to mic 4 signals, prefilter, delay, sum, energy, find peak. Delays (shifts) each signal for each candidate location. Accurate, NOT efficient. Makes decision late in pipeline (“principle of least commitment”)
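The delay, sum, energy, find-peak chain on this slide can be sketched as follows. This is a simplified illustration that assumes integer sample delays and omits the prefilter; the function names and the candidate dictionary are hypothetical.

```python
def delay_and_sum_energy(signals, delays):
    """Energy of the summed signals after undoing each microphone's delay.

    signals -- list of equal-length sample lists, one per microphone
    delays  -- integer number of samples by which each microphone's copy
               arrives late for the candidate location (assumption: integers)
    """
    length = len(signals[0])
    energy = 0.0
    for t in range(length):
        total = 0.0
        for samples, delay in zip(signals, delays):
            index = t + delay  # read ahead to cancel the arrival delay
            if 0 <= index < length:
                total += samples[index]
        energy += total * total
    return energy


def best_candidate(signals, candidate_delays):
    """Pick the candidate location whose delays give the highest energy."""
    return max(candidate_delays,
               key=lambda name: delay_and_sum_energy(signals, candidate_delays[name]))


mic1 = [0, 0, 1, 2, 1, 0, 0, 0]
mic2 = [0, 0, 0, 0, 1, 2, 1, 0]   # same pulse, two samples later
candidates = {"A": [0, 0], "B": [0, 2]}
energy_a = delay_and_sum_energy([mic1, mic2], candidates["A"])
energy_b = delay_and_sum_energy([mic1, mic2], candidates["B"])
winner = best_candidate([mic1, mic2], candidates)
```

Every candidate location requires a fresh pass over all signals, which illustrates why the slide marks this approach as accurate but not efficient.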
25
Localization by Time-Delay Estimation (TDE) [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]. Pipeline: mic 1 to mic 4 signals, prefilter, correlate (per microphone pair), find peak, intersect (may be no intersection). Cross-correlation computed once for each microphone pair. Efficient, NOT accurate. Decision is made early
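The correlate and find-peak steps for one microphone pair can be sketched as below. The example uses plain time-domain cross-correlation over integer lags without a prefilter, which is an assumption made for brevity; the function names are hypothetical.

```python
def cross_correlation(a, b, max_lag):
    """Map each integer lag to sum over t of a[t] * b[t + lag]."""
    length = len(a)
    return {
        lag: sum(a[t] * b[t + lag] for t in range(length) if 0 <= t + lag < length)
        for lag in range(-max_lag, max_lag + 1)
    }


def estimate_delay(a, b, max_lag):
    """Commit to the single lag with the highest correlation value."""
    correlation = cross_correlation(a, b, max_lag)
    return max(correlation, key=correlation.get)


mic1 = [0, 0, 1, 2, 1, 0, 0, 0]
mic2 = [0, 0, 0, 0, 1, 2, 1, 0]   # same pulse, two samples later
correlation = cross_correlation(mic1, mic2, max_lag=3)
delay = estimate_delay(mic1, mic2, max_lag=3)
```

Only the peak lag survives this step; the rest of the correlation function is discarded, which is the early decision the slide refers to.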
26
Localization by Hemisphere Sampling [Birchfield & Gillmor 2001]. Pipeline: mic 1 to mic 4 signals, prefilter, correlate (per microphone pair), map to common coordinate system (sampled locus), sum, temporal smoothing, final sampled locus, find peak. Efficient, accurate (but restricted to compact arrays)
27
Localization by Accumulated Correlation [Birchfield & Gillmor 2002]. Pipeline: mic 1 to mic 4 signals, prefilter, correlate (per microphone pair), map to common coordinate system (sampled locus), sum, temporal smoothing, final sampled locus, find peak. Efficient, accurate
28
Accumulated Correlation Algorithm. Likelihood of a candidate location = (correlation for pair 1) + (correlation for pair 2) + ... Figure labels: microphone, candidate location
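A minimal sketch of the accumulation step follows: each pair's cross-correlation is computed once, and every candidate location then looks up and sums the values at its expected lags. It assumes integer lags, no prefilter, and no temporal smoothing; all names and the toy candidates are hypothetical.

```python
def cross_correlation(a, b, max_lag):
    """Map each integer lag to sum over t of a[t] * b[t + lag]."""
    length = len(a)
    return {
        lag: sum(a[t] * b[t + lag] for t in range(length) if 0 <= t + lag < length)
        for lag in range(-max_lag, max_lag + 1)
    }


def accumulated_correlation(signals, pairs, candidate_lags, max_lag):
    """Score each candidate by summing per-pair correlations at its expected lags.

    pairs          -- list of (i, j) microphone index pairs
    candidate_lags -- candidate name -> expected lag (in samples) for each pair
    """
    # Correlate once per pair, independent of the number of candidates.
    tables = [cross_correlation(signals[i], signals[j], max_lag) for i, j in pairs]
    return {
        name: sum(table[lag] for table, lag in zip(tables, lags))
        for name, lags in candidate_lags.items()
    }


signals = [
    [0, 0, 1, 2, 1, 0, 0, 0],   # mic 0
    [0, 0, 0, 0, 1, 2, 1, 0],   # mic 1: pulse arrives two samples later
    [0, 0, 0, 1, 2, 1, 0, 0],   # mic 2: pulse arrives one sample later
]
pairs = [(0, 1), (0, 2)]
candidate_lags = {"near_truth": [2, 1], "elsewhere": [0, 0]}
scores = accumulated_correlation(signals, pairs, candidate_lags, max_lag=3)
best = max(scores, key=scores.get)
```

Because the full correlation functions are kept until the final sum, no pair commits to a single delay before the evidence from all pairs is combined.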
29
Comparison. Methods: Beamforming; TDE; Hem samp; Acc corr; Zero energy; Bayesian. Labels: similarity, energy, efficient, accurate
30
Unifying framework efficient accurate
31
Integration limits: Beamforming; Bayesian; Zero energy; Accumulated correlation; Hemisphere sampling; Time-delay estimation
32
Compact Microphone Array. Figure labels: microphone, d = 15 cm, sampled hemisphere
33
Results on compact array. Axes: pan, tilt. Panels: without PHAT prefilter; with PHAT prefilter
34
More Comparison: Hemisphere Sampling [Birchfield & Gillmor 2001]; Beamforming; Accumulated Correlation [Birchfield & Gillmor 2002]
35
Results on distributed array
36
Computational efficiency. Computing time per window (ms). Labels: (600x faster), (50x faster)
37
Simultaneous Speakers +=
38
Detecting Noise Sources background noise source
39
Connection with Stereo: “Multi-baseline stereo” [Okutomi & Kanade 1993]
40
Conclusion
Spatial sensing achieved by arrays of visual and auditory sensors
Stereo vision
– match visual signals from multiple cameras
– recent breakthrough: multiway-cut
– limitations of multiway-cut
Acoustic localization
– match acoustic signals from multiple microphones
– recent breakthrough: accumulated correlation
– connection with multi-baseline stereo