GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Audio and Music Representations (Part 2)


2 GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Audio and Music Representations (Part 2)

3 Outline
Audio and music representations (cont’d)
  – Frequency scaling using the spectrogram
      – Pitch scale in music
      – Pitch scale in human hearing
      – Mapping frequency scale using STFT
  – Constant-Q transform
  – Auditory filter bank
Tools

4 Motivation
The spectrogram is the most standard way of visualizing sounds.
  – It shows the harmonic structure of a single tone well.
  – However, is it the best way to visualize musical signals?
Musical signals
  – Musical notes are not linearly spaced in frequency.
  – The majority of notes lie in the low-frequency range.
Linear frequency is not intuitive for human hearing either.
[Figures: Piano (Chromatic Scale); Aerosmith “Jaded”]

5 Pitch Scale in Music
Musical notes are scaled logarithmically in frequency.
Music tuning systems
  – Different ways of subdividing the octave
  – Just intonation: uses harmonic ratios, e.g. 1:1, 9:8, 5:4, 4:3, 3:2, 5:3, 15:8, 2:1 (diatonic scale)
  – Pythagorean tuning: derives all notes from the 3:2 ratio (perfect 5th)
  – Equal temperament: a ratio of 1 : 2^(1/12) between two adjacent notes, e.g. the relation between MIDI note number (m) and frequency (f) in Hz: f = 440 · 2^((m − 69)/12)
Source: http://newt.phys.unsw.edu.au/jw/notes.html
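The equal-temperament relation between MIDI note number and frequency can be sketched in a few lines. This is a minimal illustration, assuming the common convention that MIDI note 69 is A4 = 440 Hz; the function names are hypothetical. It also compares the diatonic just-intonation ratios (with 15:8 for the major seventh) against equal temperament, measured in cents.

```python
import math

A4_MIDI = 69      # MIDI note number of A4 (assumed reference)
A4_HZ = 440.0     # assumed concert-pitch tuning

def midi_to_hz(m):
    """Frequency in Hz of MIDI note m under 12-tone equal temperament."""
    return A4_HZ * 2.0 ** ((m - A4_MIDI) / 12.0)

def hz_to_midi(f):
    """(Possibly fractional) MIDI note number of frequency f in Hz."""
    return A4_MIDI + 12.0 * math.log2(f / A4_HZ)

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Just-intonation major-scale ratios and the matching equal-tempered semitone counts
just_ratios = [(1, 1), (9, 8), (5, 4), (4, 3), (3, 2), (5, 3), (15, 8), (2, 1)]
et_semitones = [0, 2, 4, 5, 7, 9, 11, 12]

for (num, den), semis in zip(just_ratios, et_semitones):
    just_cents = cents(num / den)
    print(f"{num}:{den}  just = {just_cents:7.2f} cents, "
          f"ET = {100 * semis:4d} cents, diff = {just_cents - 100 * semis:+6.2f}")
```

Each equal-tempered semitone is exactly 100 cents, so the printed differences show how far each just interval deviates from its equal-tempered counterpart.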

6 Pitch Scale in Human Hearing
Humans also perceive tones on a log scale.
Psychoacoustic pitch scales
  – Mel scale: based on perceived pitch ratios of tones
  – Bark scale: based on critical-band measurements
  – Equivalent Rectangular Bandwidth (ERB) rate: also based on critical-band measurements, but obtained with a different approach
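For a feel of how these psychoacoustic scales warp the frequency axis, here is a small sketch using widely quoted closed-form approximations. The specific formulas are assumptions that do not appear on the slide: the 2595·log10(1 + f/700) mel formula, Traunmüller's Bark approximation, and the Glasberg and Moore ERB expressions. Several variants of each scale exist in the literature.

```python
import math

def hz_to_mel(f):
    """Mel value, using the common 2595*log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark value, using Traunmueller's approximation."""
    return 26.81 * f / (1960.0 + f) - 0.53

def erb_bandwidth(f):
    """Equivalent rectangular bandwidth in Hz at center frequency f (Glasberg & Moore form)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def hz_to_erb_rate(f):
    """ERB-rate scale value (number of ERBs below f), Glasberg & Moore form."""
    return 21.4 * math.log10(1.0 + 4.37 * f / 1000.0)

for f in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
    print(f"{f:7.1f} Hz -> {hz_to_mel(f):7.1f} mel, "
          f"{hz_to_bark(f):5.2f} Bark, {hz_to_erb_rate(f):5.2f} ERB-rate")
```

All three scales grow roughly linearly at low frequencies and roughly logarithmically at high frequencies, which is why they are often described as log-like.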

7 Mapping frequency scale using STFT
Mapping the linear frequency scale to a log-like scale: Y = M X
( M : mapping matrix, X : spectrogram, Y : scaled spectrogram)
  – Each mapped point is computed by multiplying the STFT bins by weights (i.e. interpolation coefficients).
Limitation
  – Simple, but the time-frequency resolution is still constrained by the STFT.
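One way to build such a mapping matrix is linear interpolation between the two STFT bins that surround each target frequency. The sketch below is only an illustration under that assumption; the function names, parameters, and the choice of linear interpolation are hypothetical and not taken from the slide.

```python
import math

def log_frequency_mapping(n_bins, sample_rate, n_fft, f_min, bins_per_octave, n_out):
    """Build M (n_out x n_bins) so that Y = M X resamples linear STFT bins
    onto log-spaced frequencies by linear interpolation."""
    bin_hz = sample_rate / n_fft                     # spacing of STFT bins in Hz
    M = [[0.0] * n_bins for _ in range(n_out)]
    for k in range(n_out):
        f_k = f_min * 2.0 ** (k / bins_per_octave)   # target log-spaced frequency
        pos = f_k / bin_hz                           # fractional STFT bin index
        lo = int(math.floor(pos))
        if lo >= n_bins - 1:                         # at or beyond the last bin: leave row zero
            continue
        frac = pos - lo
        M[k][lo] = 1.0 - frac                        # interpolation coefficients
        M[k][lo + 1] = frac
    return M

def apply_mapping(M, X):
    """Y = M X for a spectrogram X given as a list of rows (bins x frames)."""
    n_frames = len(X[0])
    return [[sum(w * X[b][t] for b, w in enumerate(row) if w != 0.0)
             for t in range(n_frames)] for row in M]

# Example: 1024-point STFT at 8 kHz (513 bins), 4 octaves of semitone-spaced bins from 62.5 Hz
M = log_frequency_mapping(513, 8000, 1024, 62.5, 12, 48)
X = [[float(b)] for b in range(513)]   # one frame whose value equals the bin index
Y = apply_mapping(M, X)
print(Y[0][0], Y[12][0])               # fractional bin positions of 62.5 Hz and 125 Hz
```

Because every row of M touches only STFT bins computed with the same window, the mapped spectrogram inherits the STFT's fixed resolution: the lowest semitones end up interpolated from just a handful of bins.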

8 Constant-Q transform
A more sophisticated way of obtaining a log-frequency scale.
Uses a set of sinusoidal kernels (wavelets) such that
  – the center frequencies are logarithmically spaced
  – the kernels (i.e. filters) have constant Q = frequency/bandwidth
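The two conditions above can be made concrete with a naive, direct-evaluation constant-Q transform of a single frame. This is a sketch under stated assumptions rather than an efficient or reference implementation: the bandwidth is taken as the spacing to the next bin, so Q = 1 / (2^(1/b) − 1) for b bins per octave, a Hann window is used, and all names are hypothetical.

```python
import cmath
import math

def cqt_parameters(f_min, bins_per_octave, n_bins, sample_rate):
    """Center frequencies, the shared Q, and per-bin window lengths in samples."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)    # Q = f_k / (f_{k+1} - f_k)
    freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    lengths = [int(math.ceil(q * sample_rate / f)) for f in freqs]
    return freqs, q, lengths

def cqt_frame(x, freqs, q, lengths):
    """Naive constant-Q magnitudes computed from the start of signal x."""
    out = []
    for f_k, n_k in zip(freqs, lengths):
        acc = 0j
        for n in range(n_k):
            window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)   # Hann window
            # Kernel with exactly q cycles inside the n_k-sample window
            acc += x[n] * window * cmath.exp(-2j * math.pi * q * n / n_k)
        out.append(abs(acc) / n_k)
    return out

sample_rate = 8000
freqs, q, lengths = cqt_parameters(110.0, 12, 24, sample_rate)
x = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate) for n in range(2048)]
mags = cqt_frame(x, freqs, q, lengths)
peak_bin = mags.index(max(mags))
print(f"Q = {q:.3f}, peak at bin {peak_bin} ({freqs[peak_bin]:.1f} Hz)")
```

Every kernel contains the same number of cycles (Q), so low-frequency kernels are long and high-frequency kernels are short.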

9 Constant-Q transform
Time-frequency resolution is not uniform:
  – High frequency resolution and low time resolution in the low-frequency range
  – Low frequency resolution and high time resolution in the high-frequency range
[Figure: Short-Time Fourier Transform vs. Constant-Q transform]
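The trade-off can be put into numbers. The sketch below assumes the bandwidth of a constant-Q bin is f/Q and its window spans Q cycles (a duration of Q/f seconds), and contrasts this with the fixed resolution of an STFT; the function names and the example values are illustrative assumptions.

```python
def cq_resolution(f_hz, bins_per_octave=12):
    """Return (bandwidth in Hz, window duration in seconds) of a constant-Q bin."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return f_hz / q, q / f_hz

def stft_resolution(n_fft, sample_rate):
    """Return (bin spacing in Hz, window duration in seconds) of an STFT."""
    return sample_rate / n_fft, n_fft / sample_rate

for f in (55.0, 440.0, 3520.0):
    bw, dur = cq_resolution(f)
    print(f"CQT  {f:7.1f} Hz: bandwidth {bw:7.2f} Hz, window {1000 * dur:7.2f} ms")

df, dt = stft_resolution(1024, 44100)
print(f"STFT (any bin):  spacing   {df:7.2f} Hz, window {1000 * dt:7.2f} ms")
```

The product of bandwidth and duration stays constant across constant-Q bins, whereas the STFT uses the same bandwidth and duration everywhere.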

10 Example of Constant-Q transform
[Figure panels: Log-frequency Spectrogram (mapping); Log-frequency Spectrogram (Constant-Q transform); Regular Spectrogram]

11 Example of Constant-Q transform
[Figure panels: Log-frequency Spectrogram (mapping); Log-frequency Spectrogram (Constant-Q transform); Regular Spectrogram]

12 Auditory Filter bank
A filter bank that imitates the magnitude and delay of traveling waves on the basilar membrane in the cochlea
  – Produces a 3-D representation (time-channel-lag), or “auditory images”
[Figure: input → cochlear filter banks (from the oval window, high frequency to low frequency) → hair cells (HC) → stabilize & combine → output]

13 Types of Auditory Filter banks
Gamma-tone filter banks
  – Gamma-tone
  – Used in Patterson’s auditory filter banks, based on ERB
Pole-Zero Filter Cascade (Lyon)
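As a rough illustration of a single gamma-tone channel, the sketch below samples the commonly used impulse response g(t) = t^(n−1) · exp(−2πbt) · cos(2πf_c t). The filter order of 4, the bandwidth factor b = 1.019 · ERB(f_c), and the Glasberg and Moore ERB formula are conventional choices assumed here, not values taken from the slide, and the function names are hypothetical.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore form, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, sample_rate, duration=0.05, order=4, b_scale=1.019):
    """Sampled gammatone impulse response, normalized to a peak amplitude of 1.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = b_scale * ERB(fc).
    """
    b = b_scale * erb(fc)
    n = int(round(duration * sample_rate))
    ir = []
    for i in range(n):
        t = i / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)  # gamma envelope
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t))          # tone carrier
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

ir = gammatone_ir(1000.0, 16000)
peak_index = max(range(len(ir)), key=lambda i: abs(ir[i]))
print(f"{len(ir)} samples, peak at {1000.0 * peak_index / 16000:.2f} ms")
```

A filter bank would repeat this for a set of center frequencies spaced evenly on the ERB-rate scale and convolve the input with each response.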

14 Hair-Cell
(Inner) hair cell
  – Transforms mechanical movement into neural spikes
Modeled as a cascade of
  – Half-wave rectification
  – Compression
  – Low-pass filtering
This performs non-linear processing
  – Generates new harmonic partials
  – Associated with missing fundamentals
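The three-stage cascade and its link to the missing fundamental can be demonstrated on a toy signal. The sketch below is a simplified illustration, not a physiological model: the square-root compression, the one-pole low-pass filter, its 1 kHz cutoff, and the function names are all assumptions. The input contains only the 2nd, 3rd and 4th harmonics of 200 Hz, so it has no energy at 200 Hz itself.

```python
import cmath
import math

def hair_cell(x, sample_rate, cutoff_hz=1000.0, exponent=0.5):
    """Half-wave rectify, compress (power law), then apply a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, state = [], 0.0
    for v in x:
        rectified = max(v, 0.0)                 # half-wave rectification
        compressed = rectified ** exponent      # compression (square root by default)
        state += alpha * (compressed - state)   # one-pole low-pass filtering
        y.append(state)
    return y

def magnitude_at(x, freq_hz, sample_rate):
    """Magnitude of the projection of x onto a complex sinusoid at freq_hz."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, v in enumerate(x))
    return abs(acc) / len(x)

sample_rate = 8000
n_samples = 800   # 0.1 s, an integer number of 200 Hz periods
x = [sum(math.sin(2.0 * math.pi * f * i / sample_rate) for f in (400.0, 600.0, 800.0))
     for i in range(n_samples)]
y = hair_cell(x, sample_rate)

print(magnitude_at(x, 200.0, sample_rate))   # essentially zero: no 200 Hz partial in the input
print(magnitude_at(y, 200.0, sample_rate))   # clearly non-zero after the non-linearity
```

The rectification creates difference tones between neighboring harmonics, which is how a component at the absent 200 Hz fundamental appears in the output.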

15 Example of Auditory Filter Bank (Correlogram)
[Figure: Piano (Chromatic Scale)]

16 Example of Auditory Filter Bank (Correlogram)
[Figure: Aerosmith “Jaded”]

17 Tools
Audio editors and analysis
  – Audacity
  – Adobe Audition
  – Praat
  – Sonic Visualiser
  – SndTool
Software libraries
  – Constant-Q transform Toolbox (Matlab): http://www.cs.tut.fi/sgn/arg/CQT/
  – Auditory Toolbox (Matlab): https://engineering.purdue.edu/~malcolm/interval/1998-010/
  – Auditory Image Model (C++): https://code.google.com/p/aimc/

18 References
Constant-Q transform
  – J. C. Brown, “Calculation of a constant Q spectral transform”, 1991
  – Schörkhuber and Klapuri, “Constant-Q transform toolbox for music processing”, 2010
  – M. Dörfler, N. Holighaus, T. Grill and G. Velasco, “Constructing an Invertible Constant-Q Transform with Non-stationary Gabor Frames”, 2011
Auditory filter bank
  – R. D. Patterson, M. H. Allerhand and C. Giguere, “Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform”, 1995
  – R. F. Lyon, “Machine Hearing: An Emerging Field”, 2010
  – R. F. Lyon, A. C. Katsiamis and E. M. Drakakis, “History and future of auditory filter models”, 2010
  – R. F. Lyon, “Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function”, 2011

