1
Perceptive Context for Pervasive Computing
Trevor Darrell
Vision Interface Group, MIT AI Lab
2
Human-centered Interfaces
- Free users from desktop and wired interfaces
- Allow natural gesture and speech commands
- Give computers awareness of users
- Work in open and noisy environments
  - Outdoors: H21 next to a construction site!
  - Indoors: crowded meeting room (E21)
- Vision's role: provide perceptive context
3
Perceptive Context
- Who is there? (presence, identity)
- What is going on? (activity)
- Where are they? (individual location)
- Which person said that? (audiovisual grouping)
- What are they looking / pointing at? (pose, gaze)
Today:
- Tracking speakers with an audio-visual microphone array
- Tracking faces for gaze-aware dialog interfaces
- Speaker verification with higher-order joint audio-visual statistics
4
Tracking speakers
- Track location and short-term identity
- Should work with lots of fast lighting variation
  - Stereo-based methods
  - New technique for dense background modeling
- Estimate trajectory of 3D foreground points from multiple views over time
- Guide active cameras and microphone array
- Recognize activity and participant roles
5
Range-based stereo person tracking
- Range can be insensitive to fast illumination change
- Compare range values to a known background
- Project into a 2D overhead plan view
[Figure: intensity, range, and foreground images; plan view]
- Merge data from multiple stereo cameras…
- Group into trajectories…
- Examine height for sitting/standing…
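The comparison and projection steps above can be sketched as follows; the function names, the 0.15 m threshold, and the grid parameters are illustrative, not from the system:

```python
def range_foreground(range_img, bg_range, threshold=0.15):
    # Per-pixel comparison of the new range image to the background range
    # model.  Depth is largely invariant to fast illumination change, so
    # this simple test is robust where intensity differencing is not.
    return [[abs(r - b) > threshold
             for r, b in zip(row, bg_row)]
            for row, bg_row in zip(range_img, bg_range)]

def plan_view(points_3d, cell=0.25, extent=5.0):
    # Accumulate 3-D foreground points (x, y, z) into an overhead (x, z)
    # occupancy grid -- the "plan view" that tracking operates on.
    n = int(extent / cell)
    grid = [[0] * n for _ in range(n)]
    for x, _y, z in points_3d:
        i, j = int(x / cell), int(z / cell)
        if 0 <= i < n and 0 <= j < n:
            grid[i][j] += 1
    return grid
```

Points from multiple stereo views can be merged simply by accumulating them all into the same overhead grid.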
6
Fast/dense stereo foreground
[Figure: left image (reference), right image]
Standard stereo searches exhaustively per frame, but if the background is predictable, we can prune most of the search!
7
Fast/dense stereo foreground
[Figure: background and new-image intensity and range; foreground depth]
8
Sparse Stereo Model
[Figure: scene and range image]
What to do when the background has an undefined range value but the new image has a valid range value there?
- conservative: call it background
- liberal: call it foreground
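The two policies can be sketched per scanline as below; `segment` and its threshold are illustrative names, with `None` marking pixels where stereo produced no valid range:

```python
def segment(bg, new, threshold=0.15, conservative=True):
    # When the background model has no valid range at a pixel but the new
    # frame does, a choice is forced: conservative -> background (risks
    # Type I errors, missing the person), liberal -> foreground (risks
    # Type II errors, lighting changes flagged as people).
    out = []
    for b, n in zip(bg, new):
        if n is None:
            out.append(False)             # nothing measured: not foreground
        elif b is None:
            out.append(not conservative)  # the ambiguous case
        else:
            out.append(abs(n - b) > threshold)
    return out
```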
9
Conservative Segmentation
Type I errors! (Misses most of the person)
[Figure: model; lighting-change foreground; new-person foreground]
10
Liberal Segmentation
Type II errors! (False positives)
[Figure: model; new-person foreground; lighting-change foreground]
11
Dense Stereo Model Acquisition
Different gain settings yield different regions of undefined range values
12
Dense Stereo Model Acquisition
Combine valid measurements from observations at different gain and/or illumination settings
[Figure: partial range models summed into one dense model]
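A minimal sketch of that combination step, assuming each gain setting yields a per-pixel range list with `None` wherever stereo failed at that setting:

```python
def merge_gain_models(models):
    # Different gain/illumination settings fail in different image regions,
    # so averaging the valid readings per pixel yields a denser background
    # range model than any single setting produces on its own.
    merged = []
    for samples in zip(*models):
        valid = [s for s in samples if s is not None]
        merged.append(sum(valid) / len(valid) if valid else None)
    return merged
```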
13
State of the Art (cont'd)
If you want really dense range backgrounds from one stereo view…
14
Visibility Constraints
15
Visibility Constraints for Virtual Backgrounds
16
Simple background subtraction
17
Virtual Background Segmentation
18
Range-based stereo person tracking
- Range can be insensitive to fast illumination change
- Compare range values to a known background
- Project into a 2D overhead plan view
[Figure: intensity, range, and foreground images; plan view]
- Merge data from multiple stereo cameras…
- Group into trajectories…
- Examine height for sitting/standing…
19
Multiple stereo views
20
Merged Plan-view segmentation
21
Points -> trajectories -> active sensing
[Diagram: spatio-temporal points -> trajectories -> activity classification, active camera motion, microphone array]
22
Test Environment
23
Active camera tracking
24
Audio input in noisy environments
- Acquire high-quality audio from untethered, moving speakers
- "Virtual" headset microphones for all users
25
Solutions
- Wireless close-talking microphone
- Shotgun microphone
- Microphone array
Our solution: a large, vision- and audio-guided microphone array
26
Our approach
- Large array, non-linear geometry
  - allows selection of a 3-D volume of space
  - can select based on distance (more than beamforming)
- Integrated with vision tracking
  - makes real-time localization of multiple sources feasible
  - known array geometry and target location ==> simple system
  - precalibrate array with a known source tone
- Related work
  - small-aperture vision-guided microphone arrays (Waibel)
  - large-aperture audio-guided arrays (Silverman)
27
Microphone Arrays
- Microphones at known locations, synchronized in time
- Electronically focused directional receiver
28
Array focusing
Delay-and-sum beamforming: compensate for propagation delays to reinforce the target signal:
y(t) = Σ_m w_m x_m(t + τ_m), where τ_m is the propagation delay from the target to microphone m
29
Delay-and-sum array processing
- Calibrate using cross-correlation analysis with a single-source presentation
- Compute delay and weight from the geometry of array and target
  - delay: time of flight
  - weight: estimated SNR based on distance
- Filtered source is the delayed and weighted sum of all microphones
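The steps above can be sketched as follows, with uniform weights for brevity (the slides weight each microphone by estimated SNR); the sample rate and geometry are illustrative:

```python
def delay_and_sum(signals, positions, target, c=343.0, fs=16000):
    # signals[m] is the sampled waveform at microphone m; positions[m] and
    # target are (x, y, z) in meters.  Advance each channel by its time of
    # flight (quantized to samples) so the target's wavefronts align, then
    # average across microphones.
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    delays = [int(round(dist(p, target) / c * fs)) for p in positions]
    d0 = min(delays)
    n = min(len(s) - (d - d0) for s, d in zip(signals, delays))
    out = [0.0] * n
    for s, d in zip(signals, delays):
        shift = d - d0
        for i in range(n):
            out[i] += s[i + shift]
    return [v / len(signals) for v in out]
```

Aligned channels add coherently for the target while off-target sources and noise add incoherently, which is what raises the SNR.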
30
Beamforming Example
[Figure: received signals; delayed signals; delayed-and-summed signal]
31
Array Size
Beam width ∝ (array span)^-1
- Large arrays select fully bounded volumes
- Small arrays select directional beams
32
Related Work
- Small-aperture vision-guided microphone arrays (Bub, Hunke, and Waibel)
- Large-aperture audio-guided arrays (Silverman et al.)
33
Beamforming Demonstration
First person moves on an oval path while counting; second person is stationary while reciting the alphabet.
- Result from a single microphone at the center of the room
- Result from the microphone array with focus fixed at the initial position of the moving speaker
[Figure: output power (dB) vs. position (meters)]
34
Array Steering Audio-only – max-power search
35
Audio-only steering is hard.
[Figure: array output power (dB) vs. position (meters)]
36
Audio-only steering is hard.
[Figure: array output power (dB) vs. position (meters)]
37
Hybrid Localization
Vision-only steering isn't perfect:
- joint calibration
- person tracking, not mouth tracking
Can correct the vision-based estimate with a limited search (implemented as gradient ascent) in the audio domain
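The audio-domain correction can be sketched as a local hill-climb over array output power, with `power` standing in for evaluating the beamformer at a candidate focus; the grid units and step size are illustrative:

```python
def refine_focus(vision_estimate, power, step=1, iters=10):
    # Start from the vision-based head position, which is close but not
    # exact (the tracker finds the person, not the mouth), and take a few
    # gradient-ascent-style steps on array output power to correct it.
    x = vision_estimate
    for _ in range(iters):
        best = max((x - step, x, x + step), key=power)
        if best == x:
            break  # local maximum of array power reached
        x = best
    return x
```

Because the search is limited to the neighborhood of the vision estimate, it avoids the spurious distant peaks that make audio-only steering hard.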
38
System flow (single target)
[Diagram: video streams -> vision-based tracker -> gradient-ascent search in array output power -> delay-and-sum beamformer, fed by the audio streams]
39
Results
[Audio samples: single microphone; hybrid tracking with beamforming]

Localization technique     SNR (dB)
Single microphone           -6.6
Video only                  -4.4
Audio-video hybrid           2.3
40
Results continued
41
Status
- Fully 3-D, multimodal sound-source localization and separation system
- Real-time implementation of delay-and-sum array processing
Future work:
- Compare to a commercial linear array
- More sophisticated beamforming (null steering)
- Connect to automated speech recognition (in progress)
- Incorporate single-channel source-separation techniques (AVMI, ICA, source modeling)
42
Today
- Tracking speakers with an audio-visual microphone array
- Tracking faces for gaze-aware dialog interfaces [John Fisher]
- Speaker verification with higher-order joint audio-visual statistics
43
Brightness and depth motion constraints
[Figure: intensity pair I_t, I_t+1 and depth pair Z_t, Z_t+1 with their gradients]
44
Brightness and depth motion constraints
[Figure: intensity pair I_t, I_t+1 and depth pair Z_t, Z_t+1 with their gradients; parameter space with y_t = y_t-1]
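A hedged sketch of how joint brightness and depth constancy yield a linear constraint on motion, reduced here to a single 1-D translation `u` (the actual tracker estimates full 3-D pose):

```python
def estimate_shift(i0, i1, z0, z1):
    # One Lucas-Kanade-style step: linearize brightness constancy
    # I_x * u = -(I1 - I0), stack it with the analogous depth constancy
    # constraint Z_x * u = -(Z1 - Z0), and solve the joint least-squares
    # problem for u.  The depth rows keep the estimate well-conditioned
    # when lighting (and hence intensity) changes between frames.
    num = den = 0.0
    for a, b in ((i0, i1), (z0, z1)):
        for x in range(1, len(a) - 1):
            g = (a[x + 1] - a[x - 1]) / 2.0  # central spatial gradient
            num += -g * (b[x] - a[x])
            den += g * g
    return num / den if den else 0.0
```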
45
New bounded-error tracking algorithm
[Diagram: influence region; open-loop vs. closed-loop 2D tracker]
Track relative to all previous frames that are close in pose space
46
Closed-loop 3D tracker
Track the user's head gaze for hands-free pointing…
47
Head-driven cursor
Related projects: Schiele, Kjeldsen, Toyama
Current application: second pointer or scrolling / focus of attention…
48
Head-driven cursor
49
Task
50
Single cursor
51
Two hand cursors
52
Head-hand cursors
53
Gaze-aware dialog interface
Interface agent responds to the user's gaze
- agent should know when it's being attended to
- turn-taking pragmatics
- anaphora / object reference
Prototype
- E21 interface "sam"
- current experiments with face tracker on the meeting-room table
- WOZ initial user tests
- integrating with wall cameras and hand-gesture interface
54
Is that you talking?
New single-channel algorithm to prevent stray utterances: match video to audio!
- Audio-visual synchrony detection
- Analyze mutual information between the signals
- Find a maximally informative subspace projection between audio and video [Fisher and Darrell]
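A toy version of synchrony detection: under a joint-Gaussian assumption, the mutual information between two 1-D features reduces to a function of their correlation. The features and the threshold below are illustrative stand-ins; the actual system projects the full audio and video signals onto maximally informative subspaces first.

```python
import math

def gaussian_mi(a, v):
    # MI between two scalar features (e.g. per-frame audio energy and
    # mouth-region motion energy) under a joint-Gaussian model:
    # MI = -0.5 * log(1 - rho^2), where rho is their correlation.
    n = len(a)
    ma, mv = sum(a) / n, sum(v) / n
    cov = sum((x - ma) * (y - mv) for x, y in zip(a, v)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vv = sum((y - mv) ** 2 for y in v) / n
    rho = cov / math.sqrt(va * vv)
    return -0.5 * math.log(max(1.0 - rho * rho, 1e-12))

def is_speaking(audio, video, threshold=0.5):
    # Accept the utterance only when the video track is informative about
    # the audio; the threshold is illustrative, not from the paper.
    return gaussian_mi(audio, video) > threshold
```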
55
Perceptual context
Take-home message: vision provides perceptual context to make applications aware of users.
- Activity: adapting outdoor activity classification [Grimson and Stauffer] to the indoor domain…
- So far: detection, ID, head pose, audio enhancement, and synchrony verification…
- Soon: gaze (add eye tracking on a pose-stabilized face); pointing (arm gestures for selection and navigation)