GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Overview of MIR Systems, Audio and Music Representations (Part 1)

3 Outline
Overview of MIR system
– Human Listening
– Machine Listening
Audio and Music representations
– Time-domain representation: Waveform
– Time-frequency domain representations: Sinusoids, DFT, STFT, Spectrogram

4 Human Listening
Ears
Auditory Transduction: http://www.youtube.com/watch?v=PeTriGTENoc
Figure: L. Watts, “Visualizing Complexity in the Brain”, 2003

5 Machine Listening
Emulate the human auditory system?
– It might be better to understand its functions at a high level and implement them in ways that are efficient for machines
Basic Functionalities
– Capture sounds and convert the air vibration to a form the machine can access
– Transform the input to get a better view of the sound
– Extract only the necessary parts
– Obtain the desired information from the extracted parts

6 (Content-based) MIR System
Block Diagram of MIR system: Sound Capture → Representation Transform → Feature Extraction → Algorithms

7 Sound Capture
Microphone
– Mechanical vibration to electrical signals
A-D converter
– Sampling and Quantization
– Produces digital waveforms
Often, we store the waveforms as audio files
– If necessary, the audio files are compressed (mp3, wma, …)

8 Representation Transform
Transform waveforms to get a better view of sounds
– Mostly using sinusoidal basis functions
Types
– Short-time Fourier Transform (STFT): Spectrogram
– Constant-Q transform
– Auditory filter banks
– Remapped spectrogram (frequency or amplitude)
– Auto-correlation

9 Feature Extraction and Algorithms
Feature extraction
– Extract only the necessary variations in the data representation
Algorithms
– Determine categories or specific values through training
Two approaches in feature extraction and algorithms
– Heuristic approach: make computational rules based on domain knowledge and trial-and-error
– Learning-based approach: train the system using labeled (or unlabeled) data
– The rest of this course is all about this

10 Sound Capture
Microphone
– Mechanical vibration to electrical signals
– Followed by pre-amplifiers
– Microphones and pre-amps have characteristic frequency responses that color the input sound
A-D converter
– Sampling: continuous-time to discrete-time signals
– Quantization: a finite number of amplitude steps
– Produces digital waveforms
Often, multiple input channels are used
– Stereo (2-ch) is standard in music recordings
– Microphone arrays: good for sound localization and spatial filtering (e.g. beamforming)

11 Sampling
Convert a continuous signal to a series of discrete numbers by picking up the signal values at uniform time intervals
Sampling theorem
– The sampling rate must be at least twice the highest frequency that the continuous signal contains
– A lowpass filter is applied before sampling to avoid aliasing
Humans can hear up to about 20 kHz
– Sampling rate of 40 kHz or above
Examples of sampling rates
– Speech: 8 kHz, 16 kHz
– Music: 22.05 kHz, 44.1 kHz, 48 kHz
– Professional audio gear: 48 kHz, 96 kHz
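Why the lowpass filter matters can be seen in a small sketch. The function names and the 30 kHz test tone are illustrative assumptions, not taken from the slides: a tone above half the sampling rate produces the same samples (up to a sign flip) as a lower "alias" tone.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (Hz) of a tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz               # sampling cannot tell f from f + m * fs
    return min(f_mod, fs_hz - f_mod)   # fold into the range 0 .. fs / 2

def sample_sine(f_hz, fs_hz, num_samples):
    """Uniformly sample sin(2*pi*f*t) at t = n / fs."""
    return [math.sin(2 * math.pi * f_hz * n / fs_hz) for n in range(num_samples)]

fs = 44100
high = sample_sine(30000, fs, 8)                       # above fs / 2, no lowpass applied
low = sample_sine(alias_frequency(30000, fs), fs, 8)   # the 14100 Hz alias

# The two sample sequences are identical up to a sign flip.
same_up_to_sign = all(abs(h + l) < 1e-9 for h, l in zip(high, low))
```

Once sampled, the 30 kHz tone cannot be told apart from a 14.1 kHz tone, which is why content above half the sampling rate has to be removed before the A-D converter.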

12 Quantization
Convert a continuous range of values to a finite set of amplitude steps
Creates “quantization error”
– Can be regarded as additive noise
– A sufficient number of quantization steps is necessary to keep the noise inaudible
Examples of quantization steps
– 8 bit: 48 dB (dynamic range)
– 16 bit: 96 dB
– 24 bit: 144 dB
– Human ears: about 110 dB (depending on frequency)
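The dynamic-range figures follow from roughly 6 dB per bit. A minimal sketch, assuming a uniform quantizer over the range -1 to 1 without clipping (the function names are illustrative):

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

def quantize(x, bits):
    """Round x to the nearest multiple of the step size (no clipping)."""
    step = 2.0 / (2 ** bits)   # the range -1 .. 1 divided into 2**bits steps
    return round(x / step) * step

for bits in (8, 16, 24):
    print(bits, "bit:", round(dynamic_range_db(bits), 1), "dB")

step_8bit = 2.0 / 2 ** 8
error = quantize(0.3, 8) - 0.3   # quantization error, at most half a step
```

The error is bounded by half a step, so each extra bit halves the worst-case error and adds about 6 dB of dynamic range.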

13 (Digital) Waveform
The most basic audio representation that computers can take
– x(n) = [a1, a2, a3, ...]
Good for viewing energy changes
– Overall dynamic range when zoomed out
– Fine-time note onsets when zoomed in
But not very intuitive

14 Another View of Waveform
A waveform can be seen as representing a signal with a set of basis functions [figure: basis functions]
For example, the signal x(n) can be written in terms of them [figure: expansion of x(n)]
Can we find better basis functions?
– New basis functions [figure: new basis functions]
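The figures for this slide are not in the transcript. One plausible reading, assuming the basis functions are shifted unit impulses (the symbol δ and the index k are assumptions), is that the waveform x(n) = [a1, a2, a3, ...] is already an expansion on that basis:

```latex
x(n) = \sum_{k} a_k \, \delta(n - k), \qquad
\delta(n - k) =
\begin{cases}
1, & n = k \\
0, & n \neq k
\end{cases}
```

Under this reading each sample value a_k is simply the coefficient of one impulse, and the "better basis functions" asked for would be the sinusoids introduced next.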

15 Sinusoids
A periodic waveform drawn from a circle
– Parameters: amplitude, angular frequency, initial phase
Why sinusoids are important
– Fundamental in physics
– Eigenfunctions of linear time-invariant systems
– The human ear is a kind of spectrum analyzer
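A sampled sinusoid can be sketched from the three parameters above. The form x(n) = A sin(ω n / fs + φ), the symbols, and the function name are assumptions, since the slide's own equation is not in the transcript:

```python
import math

def sinusoid(amplitude, freq_hz, phase, fs_hz, num_samples):
    """Samples of x(n) = A * sin(omega * n / fs + phi), with omega = 2 * pi * f."""
    omega = 2 * math.pi * freq_hz   # angular frequency in radians per second
    return [amplitude * math.sin(omega * n / fs_hz + phase)
            for n in range(num_samples)]

# One second of a 441 Hz tone at 44.1 kHz: exactly 100 samples per period.
x = sinusoid(amplitude=0.5, freq_hz=441.0, phase=0.0, fs_hz=44100, num_samples=44100)
```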

16 Discrete Fourier Transform (DFT)
Complex Sinusoid
– By Euler’s identity: [equation]
Discrete Fourier Transform
– Inner product with complex sinusoids
Inverse Discrete Fourier Transform
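The equations on this slide are not in the transcript. The standard textbook forms are given here as a sketch; the symbol names and the placement of the 1/N factor on the inverse transform are assumed conventions, not confirmed by the slides:

```latex
e^{j\theta} = \cos\theta + j\sin\theta

X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1

x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{\,j 2\pi k n / N}, \qquad n = 0, 1, \dots, N-1
```

Each X(k) is the inner product of the signal with a complex sinusoid that completes k cycles over the N samples.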

17 Basis Function View
DFT
Inverse DFT
Practical Form of DFT

18 Matrix Multiplication View of DFT
In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”
Complexity reduction by FFT: O(N^2) → O(N log2 N)
Practical Form of DFT
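The complexity gap can be illustrated by comparing a direct DFT with a recursive radix-2 FFT. This is a minimal sketch, assuming the standard DFT definition and a power-of-two length; it is one common FFT variant and not necessarily the one the lecture had in mind:

```python
import cmath

def dft(x):
    """Direct DFT: one inner product per bin, O(N^2) operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 FFT, O(N log2 N); len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    twiddled = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + twiddled[k] for k in range(N // 2)] +
            [even[k] - twiddled[k] for k in range(N // 2)])

signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]   # two sine cycles in 8 samples
max_diff = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
```

Both functions return the same spectrum; the FFT gets there by reusing the two half-length transforms instead of recomputing every inner product.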

19 Practical Form of DFT
DFT produces complex numbers!
Magnitude
– Corresponds to the amplitude (and, squared, the energy) at frequency bin k
Phase
– Corresponds to the phase at frequency bin k
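Magnitude and phase can be read off a single DFT bin. A sketch under the assumption of a cosine that sits exactly on bin k0; the test values A, k0, and phi are arbitrary illustrations:

```python
import cmath
import math

def dft_bin(x, k):
    """Single DFT coefficient X(k): inner product with a complex sinusoid."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))

N = 64
A, k0, phi = 0.8, 5, math.pi / 4
x = [A * math.cos(2 * math.pi * k0 * n / N + phi) for n in range(N)]

X = dft_bin(x, k0)
magnitude = abs(X)        # A * N / 2 for a real sinusoid exactly on bin k0
phase = cmath.phase(X)    # recovers the initial phase phi of the cosine
```

The factor N / 2 appears because a real sinusoid splits its amplitude between the positive-frequency bin k0 and the negative-frequency bin N - k0.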

20 Examples of DFT
[figure panels: sine waveform, drum, flute]

21 Short-Time Fourier Transform (STFT)
DFT assumes that the signal is stationary
– It is not a good idea to apply DFT to long and dynamically changing signals like music
– Instead, we segment the signal and apply DFT to each segment separately
Short-Time Fourier Transform
1. Segment a frame using a window function
2. Zero-pad if necessary
3. Apply DFT to the zero-padded windowed waveform
4. Advance by the “hop size”
5. Repeat steps 1-4
This produces a 2-D time-frequency representation
– Get the “spectrogram” from the magnitude
Parameters: window size, window type, FFT size, hop size
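The five steps map onto a short loop. This is a minimal sketch, assuming a periodic Hann window and a direct DFT in place of an FFT for brevity; the function names, parameter values, and test signal are illustrative:

```python
import cmath
import math

def hann(M):
    """Hann window of length M (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def stft(x, window_size, fft_size, hop_size):
    """Return a list of frames; each frame holds fft_size complex bins."""
    window = hann(window_size)
    frames = []
    # Steps 4 and 5: advance by the hop size and repeat.
    for start in range(0, len(x) - window_size + 1, hop_size):
        segment = [x[start + n] * window[n] for n in range(window_size)]  # step 1
        padded = segment + [0.0] * (fft_size - window_size)               # step 2
        frames.append(dft(padded))                                        # step 3
    return frames

def spectrogram(frames):
    """Magnitudes of the non-negative-frequency bins of each frame."""
    return [[abs(v) for v in frame[:len(frame) // 2 + 1]] for frame in frames]

fs = 8000
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]   # 1 kHz test tone
S = spectrogram(stft(x, window_size=64, fft_size=128, hop_size=32))
```

With these settings the 1 kHz tone peaks at bin 16 in every frame, since the bin spacing is fs / fft_size = 62.5 Hz.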

22 Windowing
Types of window functions
– Rectangular, Triangle, Hann, Hamming, Blackman-Harris
– Trade-off between the width of the main lobe and the level of the side lobes
[figure labels: main-lobe width, side-lobe level]
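Three of the listed windows can be written in a few lines. This sketch assumes the common symmetric definitions with the usual 0.54 / 0.46 Hamming coefficients; the slides do not give formulas:

```python
import math

def rectangular(M):
    """No tapering: narrowest main lobe, highest side lobes."""
    return [1.0] * M

def hann(M):
    """Symmetric Hann window: tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]

def hamming(M):
    """Symmetric Hamming window: ends at 0.08 rather than zero."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]

w_hann = hann(9)
w_hamming = hamming(9)
```

Tapering the frame edges lowers the side lobes at the cost of a wider main lobe, which is the trade-off named on the slide.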

23 Zero-padding
Adding zeros to a windowed frame in the time domain
Corresponds to “ideal interpolation” in the frequency domain
In practice, the FFT size grows by the number of zeros added
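The interpolation property can be checked numerically. A sketch with a hypothetical 8-sample frame padded to 16 samples, using a direct DFT:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]   # hypothetical windowed frame
padded = frame + [0.0] * len(frame)                    # FFT size doubles to 16

X = dft(frame)
X_padded = dft(padded)

# Every second bin of the padded spectrum coincides with an original bin;
# the bins in between are new samples of the same underlying spectrum.
max_diff = max(abs(X_padded[2 * k] - X[k]) for k in range(len(frame)))
```

Zero-padding adds no new information about the signal; it only samples the frame's spectrum on a denser frequency grid.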

24 Example: Music

25 Example: Deep Note

26 Time-Frequency Resolutions in STFT
Trade-off between time resolution and frequency resolution
– Long window: high frequency resolution / low time resolution
– Short window: low frequency resolution / high time resolution
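The trade-off can be put in numbers. This sketch assumes an unpadded frame, so the bin spacing is fs / window size; the function names and window sizes are illustrative:

```python
def frequency_resolution_hz(fs_hz, window_size):
    """Spacing between DFT bins of an unpadded frame: fs / window size."""
    return fs_hz / window_size

def window_duration_ms(fs_hz, window_size):
    """Time span covered by one frame, in milliseconds."""
    return 1000.0 * window_size / fs_hz

fs = 44100
for window_size in (256, 1024, 4096):
    print(window_size, "samples:",
          round(frequency_resolution_hz(fs, window_size), 1), "Hz,",
          round(window_duration_ms(fs, window_size), 1), "ms")
```

The product of the two quantities is constant, so a finer frequency grid always comes with a longer frame and therefore a coarser view of timing.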

27 References
JOS DSP Books
– Mathematics of DFT: https://ccrma.stanford.edu/~jos/mdft/
– Spectral Audio Signal Processing: https://ccrma.stanford.edu/~jos/sasp/
The Scientist and Engineer’s Guide to Digital Signal Processing
– http://www.dspguide.com/pdfbook.htm (see chapters 8-12)