Interactive Learning of the Acoustic Properties of Objects by a Robot. Jivko Sinapov, Mark Wiemer, Alexander Stoytchev. {jsinapov|banff|alexs}@iastate.edu. Iowa State University
Motivation: why study sound? Sound Producing Event. “In particular, such an examination suggests that a given sound provides information about an interaction of materials at a location in an environment.” Human beings and many animals have the remarkable ability to detect and infer events in the world using acoustic information. This example, adapted from Gaver, illustrates how sound waves allow us to experience events that are beyond the range of the rest of our senses. Based on our experience, we can easily tell that a car is approaching just from the sound we hear. Gaver in particular argues that when we hear a non-speech sound, our brain tries to figure out the physical event that generated that sound. [Gaver, 1993]
Motivation (2) Why should a robot use acoustic information? Human environments are cluttered with objects that generate sounds Help robot perceive events and objects outside of field of view Help robot perceive material properties of objects
Related Work Krotkov et al. (1996) and Klatzky et al. (2000): Perception of material using contact sounds. Learned sound models for tapping aluminum, brass, glass, wood, and plastic (one object per material). Richmond and Pai (2000): Robotic platform for measuring contact sounds between robot’s end effector and object surfaces. Models the contact sounds from different materials using spectrogram averaging. [Richmond and Pai, 2000]
Related Work (2) Torres-Jara, Natale and Fitzpatrick (2005): Robot taps objects and records spectrogram of sound. Recognizes objects using spectrogram matching. Recognized 4 test objects used during training. [Figures: tapping objects; spectrogram of tapping]
Our Study Demonstrate object recognition using acoustic features from interaction 18 Different Objects 3 Different behaviors: push, grasp, drop Evaluate different machine learning algorithms
Robot and Objects 7-DOF Barrett WAM arm with Barrett Hand 18 Different objects:
Robot Behaviors Three behaviors: grasp, push, drop Grasping:
Robot Behaviors Three behaviors: grasp, push, drop Pushing:
Robot Behaviors Three behaviors: grasp, push, drop Dropping:
Sound Feature Representation Step 1: Segment the sound wave recorded during the interaction. Step 2: Compute the Discrete Fourier Transform (DFT) of the sound wave. Step 3: Compute a 2-D histogram of the DFT matrix using block averaging (5 frequency bins × 10 temporal bins).
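The three steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the study: the function names, the non-overlapping framing, and the frame length are assumptions made here, and the segmentation of Step 1 is assumed to have already produced `samples`. Only the 5 × 10 grid comes from the slide.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the DFT of one frame (non-negative frequencies only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_len):
    """Split the wave into non-overlapping frames (an assumption);
    returns a time x frequency matrix with one DFT row per frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [dft_magnitudes(f) for f in frames]

def block_average(spec, n_time_bins=10, n_freq_bins=5):
    """Average the matrix over an n_time_bins x n_freq_bins grid of blocks.
    Returns a flat feature list ordered by time bin, then frequency bin."""
    n_t, n_f = len(spec), len(spec[0])
    if n_t < n_time_bins or n_f < n_freq_bins:
        raise ValueError("spectrogram is smaller than the requested grid")
    features = []
    for i in range(n_time_bins):
        t0, t1 = i * n_t // n_time_bins, (i + 1) * n_t // n_time_bins
        for j in range(n_freq_bins):
            f0, f1 = j * n_f // n_freq_bins, (j + 1) * n_f // n_freq_bins
            block = [spec[t][f] for t in range(t0, t1) for f in range(f0, f1)]
            features.append(sum(block) / len(block))
    return features
```

With the default grid, `block_average` returns 50 numbers per interaction, which is the kind of fixed-length vector the learning algorithms in the following slides can consume.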
Object Recognition using Acoustic Properties of Objects Problem: given robot’s behavior and detected sound features from interaction, predict the object. Example: Behavior: grasp; Sound Features: [figure]; Object Class: [figure]
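Problem Formulation Let $\mathcal{B} = \{\text{grasp}, \text{push}, \text{drop}\}$ be the set of exploratory behaviors. Let $\mathcal{O}$ be the set of objects, $|\mathcal{O}| = 18$. Let $(b, s, o)$ be a data point such that $b \in \mathcal{B}$, $o \in \mathcal{O}$, and $s$ is the vector of sound features detected during the interaction. For each behavior $b \in \mathcal{B}$, learn a model $M_b$ that can estimate $\Pr(o \mid s)$.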
Learning Algorithms k-NN: Simple instance-based algorithm; uses Euclidean distance function. Support Vector Machine (SVM): Discriminative approach; uses the kernel trick. Bayesian Network: Probabilistic graphical model; sound features are discretized into bins.
Learning Algorithms: k-NN, SVM, and Bayesian Network k-NN: memory-based learning algorithm. With k = 3, the test point (?) has 2 red neighbors and 1 blue neighbor. Therefore, Pr(red) = 2/3 ≈ 0.67 and Pr(blue) = 1/3 ≈ 0.33
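The k-NN probability estimate on this slide can be illustrated with a minimal sketch. The function name and the toy data are hypothetical; the Euclidean distance follows the description of k-NN earlier in the talk, and ties between equally distant neighbors are not handled specially.

```python
import math
from collections import Counter

def knn_class_probabilities(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns {label: fraction of the k nearest neighbors with that label}."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in counts.items()}

# Toy 2-D data: two red points and one blue point lie closest to the query.
train = [((0.0, 0.0), "red"), ((0.0, 1.0), "red"),
         ((1.0, 1.0), "blue"), ((5.0, 5.0), "blue")]
probs = knn_class_probabilities(train, (0.2, 0.2), k=3)
```

Here `probs["red"]` is 2/3 and `probs["blue"]` is 1/3, matching the example with k = 3.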
Learning Algorithms: k-NN, SVM, and Bayesian Network Support Vector Machine: discriminative learning algorithm Finds maximum margin hyperplane that separates two classes Uses Kernel trick to map data points into a feature space in which such a hyperplane exists [http://www.imtech.res.in/raghava/rbpred/svm.jpg]
Learning Algorithms: k-NN, SVM, and Bayesian Network Bayesian Network: a probabilistic graphical model. Full power of statistical modeling and inference. Learning: learns both the structure of the network and the parameters (conditional probability tables). Numerical features are discretized into bins. [Figure: example network with nodes A, B, C, D, E]
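The slide does not say how the numerical features are discretized. As one possible illustration, the sketch below uses equal-width binning; both the scheme and the function name are assumptions, not the method used in the study.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using
    equal-width intervals between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they all fall into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

bins = equal_width_bins([0.0, 1.0, 2.0, 3.0, 4.0], n_bins=2)
```

After discretization each feature takes one of a small number of symbolic values, which is what a conditional probability table needs.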
Using Multiple Behaviors Given trained models Mgrasp, Mpush, and Mdrop. Given novel sounds, one from each of the three behaviors performed on the same object. Assign prediction to the object class that maximizes:
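The exact combination rule did not survive in this transcript. The sketch below assumes one plausible rule, summing the per-behavior class probabilities and picking the object with the largest total; that rule, the function name, and the toy numbers are assumptions for illustration only.

```python
def combine_predictions(per_behavior_probs):
    """per_behavior_probs: {behavior: {object: probability}}.
    Sums the probabilities each behavior's model assigns to every object
    (an assumed rule) and returns (best_object, summed_scores)."""
    scores = {}
    for probs in per_behavior_probs.values():
        for obj, p in probs.items():
            scores[obj] = scores.get(obj, 0.0) + p
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical outputs of the three models for sounds from the same object.
best, scores = combine_predictions({
    "grasp": {"ball": 0.5, "cup": 0.4, "box": 0.1},
    "push":  {"ball": 0.2, "cup": 0.6, "box": 0.2},
    "drop":  {"ball": 0.3, "cup": 0.5, "box": 0.2},
})
```

In this toy case the grasp model alone would pick "ball", but the combined score favors "cup", which shows how evidence from several behaviors can override a single model's mistake.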
Evaluation 6 trials recorded with each of the 18 objects with each of the 3 behaviors Leave-one-out cross-validation Compared performance of learning algorithms as well as behaviors Performance Measure:
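A minimal sketch of leave-one-out cross-validation follows. It assumes the performance measure is the percentage of correctly classified held-out points (the formula is missing from the transcript) and uses a 1-nearest-neighbor classifier as a stand-in for the models compared in the talk; all names are hypothetical.

```python
import math

def nearest_neighbor_predict(train, query):
    """Stand-in classifier: label of the closest training point."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def leave_one_out_accuracy(data, predict):
    """Hold out each data point in turn, train on the rest, and
    return the percentage of held-out points predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, features) == label:
            correct += 1
    return 100.0 * correct / len(data)

# Toy data: two well-separated classes, so every held-out point is recovered.
data = [((0.0,), "a"), ((0.1,), "a"), ((1.0,), "b"), ((1.1,), "b")]
accuracy = leave_one_out_accuracy(data, nearest_neighbor_predict)
```

Any classifier with the same `predict(train, query)` signature can be passed in, which makes it easy to compare algorithms and behaviors under the same protocol.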
Results Chance accuracy = 1/18 ≈ 5.56%
Confusion Matrix for model Mpush using Bayesian Network [Figure: confusion matrix, predicted classes along the columns] Perfect classification and no false positives for:
Confusion Matrix for model Mcombined using Bayesian Network [Figure: confusion matrix, predicted classes along the columns] Conclusion: The errors made by models Mgrasp, Mpush and Mdrop are uncorrelated.
Learning rate of algorithms Compare performance of the model Mgrasp as a function of dataset size for: k-NN, Support Vector Machine, Bayesian Network
Learning Rate per Behavior with Bayesian Network
Summary and Conclusions Accurate acoustic-based object recognition with 18 objects and 3 behaviors. Using multiple behaviors improves recognition regardless of the type of learning algorithm being used. Bayesian network performed best with the given feature representation. Grasping and pushing interactions produce sound features that are more informative of the object than dropping.
Future Work Scaling up: Increase number of objects; vary object and robot pose; autonomous interaction. Use unsupervised learning to form object sound categories. More powerful feature representations: temporal features (e.g., periodicity) of sounds. Use models to detect events in the world performed by others (humans or other robots).