Published by Juliana Jones. Modified over 9 years ago.
2
Exploiting cross-modal rhythm for robot perception of objects
Artur M. Arsenio, Paul Fitzpatrick
MIT Computer Science and Artificial Intelligence Laboratory
3
CIRAS 2003
Cog – the humanoid platform
cameras on active vision head
microphone array above torso
periodically moving object (hammer)
periodically generated sound (banging)
4
Motivation
Tools are often used in a manner that is composed of some repeated motion: consider hammers, saws, brushes, files, …
Rhythmic information across the visual and acoustic sensory modalities has complementary properties
Features extracted from visual and acoustic processing are what is needed to build an object recognition system
5
Interacting with the robot
6
Talk outline
Matching sound and vision
Matching with visual distraction
Matching with acoustic distraction
Matching multiple sources
Priming sound detection using vision
Towards object recognition
7
Detecting periodic events
Tools are often used in a manner that is composed of some repeated motion: consider hammers, saws, brushes, files.
Points tracked using the Lucas-Kanade algorithm
Periodicity analysis
–FFTs of tracked trajectories
–Periodicity histograms
–Phase verification
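The FFT step listed above can be illustrated with a minimal sketch. It assumes a single tracked coordinate sampled at a fixed rate and uses a plain discrete Fourier transform to pick the strongest non-DC frequency. The function name `dominant_period`, the 30 Hz sample rate, and the synthetic trajectory are all hypothetical; the periodicity histograms and phase verification from the slide are not covered here.

```python
import cmath
import math

def dominant_period(samples, sample_rate_hz):
    """Return the period (in seconds) of the strongest non-DC frequency, or None."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        # DFT coefficient for frequency bin k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None  # flat signal: no periodicity found
    return n / (best_k * sample_rate_hz)

# Hypothetical trajectory: a point oscillating at 2 Hz, sampled at 30 Hz for 2 s.
rate = 30.0
trajectory = [10.0 * math.sin(2 * math.pi * 2.0 * t / rate) for t in range(60)]
period = dominant_period(trajectory, rate)
flat = dominant_period([3.0] * 10, rate)
```

A real tracker would yield many such trajectories, which is presumably where a histogram over the per-point periods becomes useful.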
8
Matching sound and vision
[Figure: sound spectrogram (frequency in kHz), sound energy, and hammer position, each plotted against time (ms)]
The sound intensity peaks once per visual period of the hammer
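The observation that sound intensity peaks once per visual period can be checked with a small sketch like the one below. It assumes a sampled sound-energy envelope and an already known visual period expressed in samples; `count_peaks`, the threshold, and the toy envelope are hypothetical and not taken from the talk.

```python
def count_peaks(signal, threshold):
    """Count local maxima that rise strictly above the threshold."""
    count = 0
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if signal[i] > threshold and is_local_max:
            count += 1
    return count

def peaks_per_visual_period(energy, visual_period_samples, threshold):
    """Average number of sound-energy peaks within one visual period."""
    n_periods = len(energy) / visual_period_samples
    return count_peaks(energy, threshold) / n_periods

# Hypothetical envelope: one burst of energy every 4 samples.
energy = [0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0]
ratio = peaks_per_visual_period(energy, visual_period_samples=4, threshold=1)
```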
9
Matching with visual distraction
One object (the car) making noise
Another object (the ball) in view
–Problem: which object goes with the sound?
–Solution: Match periods of motion and sound
10
Comparing periods
[Figure: sound energy, car position, and ball position, each plotted against time (ms)]
The sound intensity peaks twice per visual period of the car
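One way to sketch this comparison: given the sound period and a visual period for each object in view, pick the object whose visual period is closest to a small integer multiple of the sound period (the car's sound peaks twice per visual period). The function names, the 10% tolerance, the limit of two peaks per period, and the period values are hypothetical choices for illustration, not the authors' implementation.

```python
def harmonic_mismatch(visual_period, sound_period, max_ratio=2):
    """Smallest relative error between visual_period and k * sound_period."""
    best = None
    for k in range(1, max_ratio + 1):
        err = abs(visual_period - k * sound_period) / visual_period
        if best is None or err < best[0]:
            best = (err, k)
    return best  # (relative error, sound peaks per visual period)

def bind_sound_to_object(visual_periods, sound_period, tolerance=0.1):
    """Return (object name, peaks per period), or None if nothing matches."""
    best_name, best_err, best_k = None, None, None
    for name, visual_period in visual_periods.items():
        err, k = harmonic_mismatch(visual_period, sound_period)
        if best_err is None or err < best_err:
            best_name, best_err, best_k = name, err, k
    if best_err is None or best_err > tolerance:
        return None  # no object moves in step with the sound
    return best_name, best_k

# Hypothetical periods in milliseconds.
visual = {"car": 1000.0, "ball": 640.0}
result = bind_sound_to_object(visual, sound_period=500.0)
no_match = bind_sound_to_object(visual, sound_period=400.0)
```

Refusing to bind when the best error exceeds the tolerance trades missed binds for fewer incorrect ones.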
11
Matching with acoustic distraction
[Figure: car position, sound spectrogram (frequency in kHz), and snake position, each plotted over time]
12
Matching multiple sources
Two objects making sounds with distinct spectra
–Problem: which object goes with which sound?
–Solution: Match periods of motion and sound
13
Binding periodicity features
[Figure: sound spectrogram (frequency in kHz) over time]
The sound intensity peaks twice per visual period of the car. For the cube rattle, the ratio between the sound and visual signals differs from one frequency band to another.
14
Statistics
An evaluation of cross-modal binding for various objects and situations

Experiment | Visual period found | Sound period found | Bind made | Correct binds (%) | Incorrect binds (%) | Missed binds (%)
Hammer | 6 | 8 | 6 | 100 | 0 | 0
Car & ball | 7 | 11 | 1 | 14 | 0 | 86
Car | 6 | 6 | 5 | 83 | 0 | 17
Car (snake rattle in background) | 5 | 16 | 5 | 100 | 0 | 0
Snake rattle (car in background) | 4 | 9 | 4 | 100 | 0 | 0
Plane and mouse | 23 | 27 | 16 | 46 | 41 | 13

The sound generated by a periodically moving object can be much more complex and ambiguous than its visual trajectory
15
Priming sound detection using vision
Signals in phase
16
Signals out of phase!
17
Object recognition
Visual object segmentation
Cross-modal object recognition
–Ratio between acoustic/visual fundamental frequencies
–Phase between acoustic and visual signals
–Range of acoustic frequency bands
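The first two features listed above can be sketched as follows. This version computes the period ratio (sound/visual), which is the inverse of the frequency ratio, and the phase of a sound peak relative to a visual reference event, wrapped into the range -180 to 180 degrees. The function name, the choice of reference event, and the timing values are hypothetical; the frequency-band feature is not covered.

```python
def cross_modal_features(visual_period, sound_period,
                         visual_event_time, sound_peak_time):
    """Return (period ratio sound/visual, phase difference in degrees)."""
    ratio = sound_period / visual_period
    # Delay of the sound peak after the visual event, folded into one visual period.
    offset = (sound_peak_time - visual_event_time) % visual_period
    phase_deg = 360.0 * offset / visual_period
    if phase_deg > 180.0:
        phase_deg -= 360.0  # wrap into (-180, 180]
    return ratio, phase_deg

# Hypothetical timings in milliseconds.
ratio, phase = cross_modal_features(
    visual_period=1000.0, sound_period=500.0,
    visual_event_time=200.0, sound_peak_time=450.0)
early_ratio, early_phase = cross_modal_features(1000.0, 1000.0, 200.0, 100.0)
```

Here the first call yields a ratio of 0.5 with the sound peak a quarter of a visual period after the event; the second shows a sound peak that precedes the event, giving a negative phase.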
18
Cross-modal object recognition
Causes sound when changing direction, often quiet during remainder of trajectory (although bells vary)
Causes sound when changing direction after striking object; quiet when changing direction to strike again
Causes sound while moving rapidly with wheels spinning; quiet when changing direction
19
Clustering
[Figure: two plots of phase difference (sound/visual) against fundamental period ratio (sound/visual); the second plot adds a frequency bands axis]
20
Conclusions
Different objects have distinct acoustic-visual patterns, which are a rich source of information for object recognition.
An object can be differentiated from both its visual and acoustic backgrounds by binding pixels and frequency bands that are oscillating together.
There is cognitive evidence that, for humans, simple visual periodicity can aid the detection of acoustic periodicity.
More features can be used for better discrimination, such as the ratio of the sound/visual peak amplitudes.
Each type of feature is important for recognition when the other is absent. But when both are present, we can do better by looking at the relationship between visual motion and the sound generated.
21
Questions?