GCT731 Fall 2014
Topics in Music Technology - Music Information Retrieval
Overview of MIR Systems
Audio and Music Representations (Part 1)

Outline
Overview of MIR system
  – Human Listening
  – Machine Listening
Audio and Music representations
  – Time-domain representation: Waveform
  – Time-frequency domain representations: Sinusoids, DFT, STFT, Spectrogram

Human Listening
L. Watts, “Visualizing Complexity in the Brain”
Ears
Auditory Transduction: watch?v=PeTriGTENoc

Machine Listening
Emulate the human auditory system?
  – It might be better to understand its functionalities at a high level and implement them in ways that are efficient for machines
Basic functionalities
  – Capture sounds and convert the air vibration to a form the machine can access
  – Transform the input to get a better view of sounds
  – Extract only the necessary part
  – Obtain the desired information from the extracted part

(Content-based) MIR System
[Figure: Block diagram of MIR system: Sound Capture -> Representation Transform -> Feature Extraction -> Algorithms]

Sound Capture
Microphone
  – Converts mechanical vibration to electrical signals
A-D converter
  – Sampling and quantization
  – Produces digital waveforms
Often, we store the waveforms as audio files
  – If necessary, the audio files are compressed (mp3, wma, …)

Representation Transform
Transform waveforms to get a better view of sounds
  – Mostly using sinusoidal basis functions
Types
  – Short-time Fourier Transform (STFT): Spectrogram
  – Constant-Q transform
  – Auditory filter banks
  – Remapped spectrogram (frequency or amplitude)
  – Auto-correlation

Feature Extraction and Algorithms
Feature extraction
  – Extract only the necessary variations in the data representation
Algorithms
  – Determine categories or specific values through training
Two approaches in feature extraction and algorithms
  – Heuristic approach: make computational rules based on domain knowledge and trial-and-error
  – Learning-based approach: train the system using labeled (or unlabeled) data
  – The rest of this course is all about this

Sound Capture
Microphone
  – Converts mechanical vibration to electrical signals
  – Followed by pre-amplifiers
  – Microphones and pre-amps have characteristic frequency responses that color the input sound
A-D converter
  – Sampling: continuous-time to discrete-time signals
  – Quantization: a finite number of amplitude steps
  – Produces digital waveforms
Often, multiple input channels are used
  – Stereo (2-ch) is standard in music recordings
  – Microphone arrays: good for sound localization and spatial filtering (e.g. beam-forming)

Sampling
Convert continuous signals to a series of discrete numbers by picking up the signal values at uniform time intervals
Sampling theorem
  – The sampling rate must be more than twice the highest frequency the continuous signal contains
  – A lowpass filter is applied before sampling to avoid aliasing
Humans can hear up to about 20 kHz
  – Sampling rate of 40 kHz or above
Examples of sampling rates
  – Speech: 8 kHz, 16 kHz
  – Music: 22.05 kHz, 44.1 kHz, 48 kHz
  – Professional audio gear: 48 kHz, 96 kHz

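As a rough illustration of why the sampling rate has to exceed twice the highest frequency, the sketch below samples two tones at 8 kHz. The helper name `sample_sine` and the chosen frequencies are assumptions made for this example only. A 5 kHz tone lies above half the sampling rate, so its samples cannot be told apart (up to a sign flip) from those of a 3 kHz tone.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample sin(2*pi*f*t) at t = n / sample_rate for n = 0 .. num_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

sample_rate = 8000          # Hz (one of the speech rates listed on the slide)
nyquist = sample_rate / 2   # 4000 Hz: frequencies above this alias

# 5000 Hz is above the Nyquist frequency, so it folds back to 8000 - 5000 = 3000 Hz.
above = sample_sine(5000, sample_rate, 64)
alias = sample_sine(3000, sample_rate, 64)

# sin(2*pi*5000*n/8000) == -sin(2*pi*3000*n/8000) for every integer n,
# so the sum of the two sequences should vanish up to rounding error.
max_diff = max(abs(a + b) for a, b in zip(above, alias))
print("largest |above + alias| =", max_diff)
```

This is why a lowpass (anti-aliasing) filter is applied before the A-D converter: once the samples are taken, the 5 kHz and 3 kHz components can no longer be separated.
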
Quantization
Convert a continuous range of values to a finite set of steps in amplitude
Creates “quantization error”
  – Can be regarded as additive noise
  – A sufficient number of quantization steps is necessary to prevent the noise from being audible
Examples of quantization steps
  – 8 bit: 48 dB (dynamic range)
  – 16 bit: 96 dB
  – 24 bit: 144 dB
  – Human ears: about 110 dB (depending on frequency)

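The dynamic-range figures follow from roughly 6 dB per bit, i.e. 20*log10(2^bits). The sketch below computes that value and also quantizes a test sine to check that the error stays within half a step. The function names `dynamic_range_db` and `quantize`, the [-1, 1) input range, and the test signal are assumptions for illustration.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

def quantize(x, bits):
    """Round x (assumed to lie in [-1, 1)) to the nearest of 2**bits uniform steps."""
    step = 2.0 / (2 ** bits)
    level = round(x / step)
    # Clip to the signed integer range available at this bit depth.
    level = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, level))
    return level * step

bits = 8
step = 2.0 / (2 ** bits)

# A sine with amplitude 0.9 stays away from the clipping region.
signal = [0.9 * math.sin(2 * math.pi * n / 50) for n in range(200)]
errors = [quantize(v, bits) - v for v in signal]
max_error = max(abs(e) for e in errors)   # at most step / 2 when nothing clips

for b in (8, 16, 24):
    print(b, "bit:", round(dynamic_range_db(b), 2), "dB")
print("max quantization error:", max_error, "step/2:", step / 2)
```
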
(Digital) Waveform
The most basic audio representation that computers can take
  – x(n) = [a1, a2, a3, ...]
Good for viewing energy changes
  – Overall dynamic range when zoomed out
  – Fine timing of note onsets when zoomed in
But not very intuitive

Another View of Waveform
Waveform can be seen as representing signals with the following basis functions
For example, the signal x(n) is like:
Can we find better basis functions?
  – New basis functions:

Sinusoids
A periodic waveform drawn from a circle
  x(t) = A sin(ωt + φ)
  – A: Amplitude
  – ω: Angular Frequency
  – φ: Initial Phase
Why sinusoids are important
  – Fundamental in physics
  – Eigenfunctions of linear systems
  – The human ear is a kind of spectrum analyzer

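The eigenfunction property can be checked numerically. In the sketch below, a complex sinusoid is passed through a small FIR filter, and the output is the same sinusoid multiplied by a single complex constant. The filter coefficients `h` and the frequency `omega` are arbitrary values chosen for the example, and the filter is assumed to be linear and time-invariant.

```python
import cmath
import math

h = [0.5, 0.3, 0.2]            # hypothetical FIR filter coefficients
omega = 2 * math.pi * 0.1      # angular frequency in radians per sample

def x(n):
    """Complex sinusoid e^(j*omega*n), defined for every integer n."""
    return cmath.exp(1j * omega * n)

def y(n):
    """Filter output: convolution of h with the input sinusoid."""
    return sum(h[k] * x(n - k) for k in range(len(h)))

# Frequency response of the filter at omega: the eigenvalue for this sinusoid.
H = sum(h[k] * cmath.exp(-1j * omega * k) for k in range(len(h)))

# The output/input ratio is the same complex number H at every time index.
ratios = [y(n) / x(n) for n in range(10)]
print("gain:", abs(H), "phase shift:", cmath.phase(H))
```

The filter changes only the amplitude (by |H|) and the phase (by the angle of H) of the sinusoid, never its frequency.
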
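Discrete Fourier Transform (DFT)
Complex sinusoid
  – By Euler’s identity: e^(jθ) = cos(θ) + j sin(θ)
Discrete Fourier Transform
  – Inner product with complex sinusoid: X(k) = sum_{n=0}^{N-1} x(n) e^(-j2πkn/N), k = 0, 1, ..., N-1
Inverse Discrete Fourier Transform
  – x(n) = (1/N) sum_{k=0}^{N-1} X(k) e^(j2πkn/N), n = 0, 1, ..., N-1
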
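A direct, unoptimized sketch of the DFT as an inner product with complex sinusoids, together with its inverse. It assumes the common convention of a negative exponent in the forward transform and a 1/N factor in the inverse. The function names `dft` and `idft` and the test signal are choices made for this example.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * e^(-j*2*pi*k*n/N): inner product with each complex sinusoid."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * e^(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 16
# A cosine with exactly 3 cycles in N samples.
x = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
X = dft(x)

# Energy shows up only at bin 3 and its mirror bin N - 3, each with magnitude N / 2.
print([round(abs(v), 3) for v in X])

# The inverse transform recovers the waveform (up to rounding error).
x_rec = idft(X)
round_trip_error = max(abs(a - b) for a, b in zip(x, x_rec))
```
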
Basis Function View
DFT
Inverse DFT
Practical Form of DFT

Matrix Multiplication View of DFT
In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”
Complexity reduction by FFT: O(N^2) -> O(N log2 N)

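To make the two views concrete, the sketch below computes the DFT once as an N-by-N matrix multiplication and once with a recursive radix-2 FFT. The radix-2 decimation-in-time scheme is one common FFT variant, picked here as an assumption, and it requires N to be a power of two. The function names are illustrative.

```python
import cmath
import math

def dft_by_matrix(x):
    """DFT as a matrix-vector product: N*N multiplications, i.e. O(N^2)."""
    N = len(x)
    W = [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)] for k in range(N)]
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fft(x):
    """Recursive radix-2 FFT, O(N log2 N). Assumes len(x) is a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
slow = dft_by_matrix(x)
fast = fft(x)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print("largest difference between the two results:", max_diff)
```

Both functions return the same spectrum. The FFT reaches it by reusing the two half-size transforms instead of recomputing every inner product.
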
Practical Form of DFT
DFT produces complex numbers!
Magnitude
  – Corresponds to the amplitude at frequency k (its square corresponds to the energy)
Phase
  – Corresponds to the phase at frequency k

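A small sketch of reading magnitude and phase from the complex DFT output. The test signal is a cosine with amplitude 2 and initial phase pi/4 placed exactly on bin 3. These values, and the helper name `dft`, are assumptions for the example.

```python
import cmath
import math

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * e^(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N, k0 = 16, 3
amplitude, initial_phase = 2.0, math.pi / 4
x = [amplitude * math.cos(2 * math.pi * k0 * n / N + initial_phase) for n in range(N)]

X = dft(x)
magnitude = [abs(v) for v in X]          # |X(k)|
phase = [cmath.phase(v) for v in X]      # angle of X(k) in radians

# For a cosine sitting exactly on bin k0, |X(k0)| = amplitude * N / 2
# and the angle of X(k0) equals the initial phase.
print("magnitude at bin", k0, "=", magnitude[k0])
print("phase at bin", k0, "=", phase[k0])
```

Phase values at bins whose magnitude is near zero are dominated by rounding error and should not be interpreted.
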
Examples of DFT
[Figures: DFT of a sine waveform, a drum, and a flute]

Short-Time Fourier Transform (STFT)
DFT assumes that the signal is stationary
  – It is not a good idea to apply DFT to long and dynamically changing signals like music
  – Instead, we segment the signal and apply DFT to each segment separately
Short-Time Fourier Transform
  1. Segment a frame using a window function
  2. Zero-pad the frame if necessary
  3. Apply DFT to the zero-padded windowed waveform
  4. Advance by the “hop size”
  5. Repeat steps 1-4
This produces a 2-D time-frequency representation
  – Get the “spectrogram” from the magnitude
Parameters: window size, window type, FFT size, hop size

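The STFT steps can be sketched as follows, using a direct DFT so that the example needs only the standard library. The Hann window, the function names, and the parameter values (window size 64, FFT size 128, hop size 32, sampling rate 1024 Hz) are assumptions chosen for illustration. The test signal switches from 128 Hz to 256 Hz halfway through.

```python
import cmath
import math

def hann(M):
    """Periodic Hann window of length M."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def stft(x, window_size, fft_size, hop_size):
    w = hann(window_size)
    frames = []
    for start in range(0, len(x) - window_size + 1, hop_size):    # steps 4-5: hop and repeat
        frame = [x[start + n] * w[n] for n in range(window_size)]  # step 1: windowed frame
        frame += [0.0] * (fft_size - window_size)                  # step 2: zero-padding
        frames.append(dft(frame))                                  # step 3: DFT
    return frames

def spectrogram(frames):
    """Magnitudes of the non-negative frequency bins of every frame."""
    return [[abs(v) for v in f[:len(f) // 2 + 1]] for f in frames]

fs = 1024   # hypothetical sampling rate in Hz
x = [math.sin(2 * math.pi * (128 if n < 128 else 256) * n / fs) for n in range(256)]

spec = spectrogram(stft(x, window_size=64, fft_size=128, hop_size=32))

# Strongest bin in the first and last frames; bin k corresponds to k * fs / fft_size Hz.
peak_first = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_last = max(range(len(spec[-1])), key=lambda k: spec[-1][k])
print(len(spec), "frames; peaks at", peak_first * fs / 128, "Hz and", peak_last * fs / 128, "Hz")
```

A single DFT over all 256 samples would show both frequencies but not when each one occurs. The frame index of the spectrogram supplies that time information.
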
Windowing
Types of window functions
  – Rectangular, Triangular, Hann, Hamming, Blackman-Harris
  – Trade-off between the width of the main-lobe and the level of the side-lobes
[Figure: window spectrum annotated with main-lobe width and side-lobe level]

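The main-lobe versus side-lobe trade-off can be measured numerically. The sketch below evaluates the zero-padded spectrum of each window, walks down the main lobe to its first minimum, and reports the highest side-lobe relative to the peak. The window length of 32, the FFT size of 512, the symmetric window definitions, and the helper names are assumptions for this example.

```python
import cmath
import math

def make_window(name, M):
    """Symmetric window of length M."""
    if name == "rectangular":
        return [1.0] * M
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
    raise ValueError("unknown window: " + name)

def lobe_stats(w, fft_size=512):
    """Return (bin index of the first spectral minimum, highest side-lobe level in dB)."""
    half = fft_size // 2
    mags = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                    for n in range(len(w))))
            for k in range(half)]
    # Walk down the main lobe until the magnitude stops decreasing.
    k = 1
    while k < half and mags[k] < mags[k - 1]:
        k += 1
    first_min = k - 1
    # Highest magnitude beyond the main lobe, relative to the peak at bin 0.
    side_lobe_db = 20 * math.log10(max(mags[first_min + 1:]) / mags[0])
    return first_min, side_lobe_db

rect_edge, rect_side = lobe_stats(make_window("rectangular", 32))
hann_edge, hann_side = lobe_stats(make_window("hann", 32))

for name in ("rectangular", "hann", "hamming"):
    edge, side = lobe_stats(make_window(name, 32))
    print(name, "main-lobe edge at bin", edge, "side-lobe level", round(side, 1), "dB")
```

In this measurement the rectangular window has the narrowest main lobe but the highest side-lobes, while the Hann window trades a wider main lobe for much lower side-lobes.
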
Zero-padding
Adding zeros to a windowed frame in the time domain
Corresponds to “ideal interpolation” in the frequency domain
In practice, the FFT size increases by the amount of zero-padding

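A quick numerical check of the interpolation view: padding an 8-sample frame to 16 samples leaves the original DFT values unchanged at the even bins and adds new values in between. The frame values and the helper name `dft` are arbitrary choices for this sketch.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

frame = [0.0, 0.5, 1.0, 0.5, -0.25, -1.0, -0.5, 0.25]   # an arbitrary 8-sample frame

X = dft(frame)                          # 8 frequency bins
X_padded = dft(frame + [0.0] * 8)       # 16 bins after padding with 8 zeros

# Every second bin of the padded spectrum coincides with an original bin;
# the odd bins are interpolated values between them.
max_diff = max(abs(X_padded[2 * k] - X[k]) for k in range(len(X)))
print("largest difference at the shared bins:", max_diff)
```

Zero-padding therefore gives a denser sampling of the same underlying spectrum. It does not add frequency resolution, which is still set by the window length.
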
Example: Music

Example: Deep Note

Time-Frequency Resolutions in STFT
Trade-off between time-resolution and frequency-resolution
  – Long window: high frequency-resolution / low time-resolution
  – Short window: low frequency-resolution / high time-resolution

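The trade-off can be put in numbers. For a window of N samples at sampling rate fs, each frame spans N / fs seconds and the DFT bins are fs / N Hz apart, so the product of the two is always 1. The sketch below tabulates this for a few window sizes. The function name, the 44.1 kHz rate, and the window sizes are assumptions for illustration, and zero-padding and window shape are ignored.

```python
def stft_resolution(window_size, sample_rate_hz):
    """Return (time span of one frame in seconds, DFT bin spacing in Hz)."""
    time_span_s = window_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / window_size
    return time_span_s, bin_spacing_hz

sample_rate = 44100
results = {}
for window_size in (256, 1024, 4096):
    time_span, bin_spacing = stft_resolution(window_size, sample_rate)
    results[window_size] = (time_span, bin_spacing)
    print(window_size, "samples:",
          round(time_span * 1000, 2), "ms per frame,",
          round(bin_spacing, 2), "Hz per bin")
```

Making the window 16 times longer makes the bins 16 times finer in frequency, but each frame then averages over 16 times more signal in time.
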
References
JOS DSP Books
  – Mathematics of DFT
  – Spectral Audio Signal Processing
The Scientist and Engineer’s Guide to Digital Signal Processing
  – (See chapters 8-12)