Information-Theoretic Listening
Paris Smaragdis
Machine Listening Group
MIT Media Lab
Outline
- Defining a global goal for computational audition
- Example 1: Developing a representation
- Example 2: Developing grouping functions
- Conclusions
Auditory Goals
- The goals of computational audition are all over the place; should they be?
- Lack of formal rigor in most theories
- Computational listening often amounts to fitting psychoacoustic experiment data
Auditory Development
- What really made audition? How did our hearing evolve?
- How did our environment shape our hearing?
- Can we evolve, rather than instruct, a machine to listen?
Goals of our Sensory System
- Distinguish independent events
  - Object formation
  - Gestalt grouping
- Minimize thinking and effort
  - Perceive as few objects as possible
  - Think as little as possible
Entropy Minimization as a Sensory Goal
- Long history linking entropy and perception: Barlow, Attneave, Atick, Redlich, etc. ...
- Entropy can measure statistical dependencies
- Entropy can measure economy in both 'thought' (algorithmic entropy) and 'information' (Shannon entropy)
What is Entropy?
- Shannon entropy: H(X) = -Σᵢ P(i) log P(i)
- A measure of: order, predictability, information, correlations, simplicity, stability, redundancy, ...
- High entropy = little order; low entropy = lots of order
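Below is a minimal numpy sketch (not from the original slides) of how Shannon entropy can be estimated from data, making the high/low-order contrast concrete; the histogram-based estimator and bin count are illustrative choices.

```python
import numpy as np

def shannon_entropy(samples, bins=64):
    """Estimate the Shannon entropy (bits) of a signal from a histogram."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()      # empirical probability mass
    p = p[p > 0]                   # drop empty bins (0 log 0 := 0)
    return -np.sum(p * np.log2(p))

# High entropy = little order; low entropy = lots of order.
rng = np.random.default_rng(0)
noise = rng.uniform(-1, 1, 10_000)                            # disordered
square = np.sign(np.sin(np.linspace(0, 50 * np.pi, 10_000)))  # two values
print(shannon_entropy(noise))   # near log2(64) = 6 bits
print(shannon_entropy(square))  # near 1 bit
```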
Representation in Audition
- Frequency decompositions
  - Cochlear hint
  - Easier to look at data!
- Sinusoidal bases
  - Signal processing framework
Evolving a Representation
- Develop a basis decomposition
- Bases should be statistically independent
  - Satisfies the minimal-entropy idea
- Decomposition should be data-driven
  - Accounts for different domains
Method
- Use snippets of natural sounds to derive bases
- Analyze these snippets with ICA (independent component analysis), as sketched below
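A hedged sketch of the method as read from the slide: frame natural audio and run ICA on the frames. FastICA from scikit-learn stands in here for whatever ICA algorithm was actually used, and the frame length, hop, and basis count are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Stand-in for real recorded audio; the talk used snippets of natural
# sounds (e.g. load a file with soundfile.read instead).
rng = np.random.default_rng(0)
sound = rng.normal(size=100_000)

# Slice the signal into short frames; each frame is one observation of a
# frame_len-dimensional random vector.
frame_len, hop = 256, 128
frames = np.stack([sound[i:i + frame_len]
                   for i in range(0, len(sound) - frame_len, hop)])

# ICA finds basis directions whose projections are statistically
# independent; on natural sounds these come out sinusoid-like.
ica = FastICA(n_components=64, whiten="unit-variance", random_state=0)
ica.fit(frames)
bases = ica.components_   # shape (64, frame_len): the learned bases
```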
Results
- We obtain sinusoidal bases!
- The transform is driven by the environment
- Uniform procedure across different domains
Auditory Grouping
- Heuristics
  - Weak definitions
  - Bootstrapped to individual domains (Vision Gestalt vs. Auditory Gestalt, ...)
  - Hard to implement on computers
  - Require even more heuristics to resolve ambiguity
- Examples: common AM, common FM, good continuation
Method
- Goal: find the grouping that minimizes scene entropy
- Pipeline: parameterized auditory scene s(t,n) → density estimation Pₛ(i) → Shannon entropy calculation (see the sketch below)
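The slide names three stages; this sketch is one plausible reading (my assumptions: each candidate group is rendered as a summed signal, and the density Pₛ(i) is a plain histogram). It is reused by the annealing sketch later on.

```python
import numpy as np

def scene_entropy(components, grouping, bins=64):
    """Entropy cost of a candidate grouping of scene components.

    components: array (K, T), the raw component signals (e.g. sinusoids)
    grouping:   length-K integer group labels
    Pipeline per the slide: parameterize the scene, estimate a density
    P_s(i) for each grouped signal, then sum the Shannon entropies.
    """
    grouping = np.asarray(grouping)
    total = 0.0
    for g in np.unique(grouping):
        signal = components[grouping == g].sum(axis=0)  # render this group
        counts, _ = np.histogram(signal, bins=bins)     # density estimate
        p = counts[counts > 0] / counts.sum()
        total += -np.sum(p * np.log2(p))                # Shannon entropy
    return total
```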
Common Modulation - Frequency
[Figure: scene description (frequency vs. time tracks) and entropy measurement, n = 0.5]
Common Modulation - Amplitude
[Figure: scene description (amplitude vs. time) and entropy measurement (sine 1 amplitude vs. sine 2 amplitude scatter), n = 0.5]
Common Modulation - Onset/Offset
[Figure: scene description (amplitude vs. time) and entropy measurement (sine 1 amplitude vs. sine 2 amplitude scatter), n = 0.5]
Similarity/Proximity - Harmonicity I
[Figure: scene description (frequency vs. time) and entropy measurement]
Similarity/Proximity - Harmonicity II
[Figure: scene description (frequency vs. time) and entropy measurement]
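The figures for these slides are lost, but their common idea survives in the axis labels: components that modulate together concentrate the scene's density, and so lower its entropy. The sketch below is my construction (the slides' n = 0.5 entropy parameter is not reproduced; plain Shannon entropy is used instead) and checks that two comodulated amplitude envelopes have lower joint entropy than independently modulated ones.

```python
import numpy as np

def joint_entropy(x, y, bins=32):
    """Joint Shannon entropy (bits) of two signals via a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 5000)
env = 0.5 + 0.5 * np.abs(np.sin(2 * np.pi * 3 * t))  # shared AM envelope

# Common AM: the two amplitudes move together, so their joint density
# collapses onto a curve (low entropy). Independent modulation spreads
# the joint density out (high entropy).
common = joint_entropy(env, env + 0.01 * rng.normal(size=t.size))
indep = joint_entropy(env, rng.uniform(0.5, 1.0, size=t.size))
print(common, "<", indep)
```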
Simple Scene Analysis Example
- 5 sinusoids, 2 groups
- Simulated annealing algorithm (sketched below)
  - Input: raw sinusoids
  - Goal: entropy minimization
  - Output: expected grouping
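A minimal simulated-annealing loop over group labels, reusing scene_entropy() from the grouping sketch above. This is a reconstruction, not the talk's code: the cooling schedule, step count, and move proposal are placeholder choices.

```python
import numpy as np

def anneal_grouping(components, n_groups=2, steps=2000,
                    t0=1.0, cool=0.995, seed=0):
    """Search for the group labeling that minimizes scene entropy."""
    rng = np.random.default_rng(seed)
    k = len(components)
    labels = rng.integers(n_groups, size=k)   # random initial grouping
    cost = scene_entropy(components, labels)
    temp = t0
    for _ in range(steps):
        cand = labels.copy()
        cand[rng.integers(k)] = rng.integers(n_groups)  # flip one label
        c = scene_entropy(components, cand)
        # Always accept improvements; accept uphill moves with
        # Boltzmann probability so the search can escape local minima.
        if c < cost or rng.random() < np.exp((cost - c) / temp):
            labels, cost = cand, c
        temp *= cool                                    # cool down
    return labels, cost
```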
Important Notes
- No definition of time
- Developed a concept of frequency
- No parameter estimation: operations are on data, not parameters
- No parameter setting!
Conclusions
- Elegant and consistent formulation
- No constraint on data representation
- Uniform across different domains (cross-modal!)
- No parameter estimation, no parameter tuning!
- Biological plausibility (Barlow et al., ...)
- Insight into the development of perception
Future Work
- A good cost function?
  - Joint entropy vs. entropy of sums
  - Shannon entropy vs. Kolmogorov complexity
  - Joint statistics (cumulants, moments)
- Incorporate time
  - Sounds have time dependencies that I am currently ignoring
- Generalize to include perceptual functions
Teasers
- Dissonance and entropy
- Pitch detection
- Instrument recognition