Multiresolution STFT for Analysis and Processing of Audio

Presentation transcript:

Multiresolution STFT for Analysis and Processing of Audio. Talk at B.U., Sept. 2010. Alexey Lukin, Moscow State University, Russia; iZotope Inc., Cambridge, MA

Short-Time Fourier Transform. Most commonly used transform for audio: spectral analysis; noise reduction (spectral subtraction algorithm); time-variable filters and other effects. Pros (+): very fast implementation for a large number of bands via FFT; good energy compaction for many musical signals. Cons (–): many oscillations in basis functions → ringing (Gibbs phenomenon); uniform frequency resolution → inadequate resolution at low frequencies. A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”

Short-Time Fourier Transform. Spectrogram: displays the evolution of the spectrum in time.

Spectrograms. Problems: most perceptually meaningful energy is concentrated in a narrow band below 4 kHz → can’t see enough detail; time/frequency resolution trade-off. [Figure: conventional STFT spectrogram (linear frequency scale)]

Spectrograms. Problems: poor frequency resolution at low frequencies → can’t separate bass harmonics from the bass drum; time/frequency resolution trade-off. [Figure: mel-scale STFT spectrogram (window size = 12 ms)]

Spectrograms. Problems: poor time resolution at transients → time-smearing of drums and other percussive sounds. [Figure: mel-scale STFT spectrogram (window size = 93 ms)]
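The trade-off between the two window sizes above can be put in numbers. The sketch below assumes a 44.1 kHz sampling rate, which the slides do not state; under that assumption the 12 ms and 93 ms windows correspond closely to 512 and 4096 samples. The function name `window_resolution` is made up for this illustration.

```python
def window_resolution(n_samples, sample_rate=44100.0):
    """Return (window duration in ms, FFT bin spacing in Hz) for a window length.

    sample_rate defaults to 44.1 kHz, an assumption not taken from the slides.
    """
    duration_ms = 1000.0 * n_samples / sample_rate
    bin_spacing_hz = sample_rate / n_samples
    return duration_ms, bin_spacing_hz


for n in (512, 4096):
    ms, hz = window_resolution(n)
    print(f"{n:5d} samples: {ms:5.1f} ms window, {hz:5.1f} Hz between FFT bins")
```

With these assumed numbers the short window spans about 11.6 ms but spaces its bins about 86 Hz apart, too coarse to separate closely spaced bass harmonics. The long window narrows the spacing to about 10.8 Hz but averages over roughly 93 ms, which smears transients.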

Filter banks. Idea: decompositions of a time-frequency plane. [Diagram: x[n] → decomposition → processing of subband signals → synthesis → y[n]] [Figure: tilings of the time-frequency plane (axes t, f) for the STFT and the DWT, limited by the uncertainty principle]

Filter banks. Perceptual coding of audio. [Diagram of an mp3 encoder; blocks: x[n], filter bank, FFT, psychoacoustic model, Q (quantization), Huffman coding, mp3 file]

Filter banks. Window size switching (guided by transient detection). [Figure panels: transient; pre-echo; reduced pre-echo]

Proposed approach. Transforms should vary their time-frequency resolution in a perceptually motivated way: imitation of the time-frequency resolution of human hearing; adaptation of resolution to local signal features.

Spectrograms. Simple solution: combine spectrograms with different resolutions. Take bass from a spectrogram with good frequency resolution and treble from a spectrogram with good time resolution. [Figure: combined resolution spectrogram (window sizes from 12 to 93 ms)]

Spectrograms. Simple solution: combine spectrograms with different resolutions. Each spectrogram is computed on the same grid of time-frequency points (using zero padding).
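The combination described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the window lengths (4096, 1024 and 512 samples), the band edges (500 Hz and 2 kHz), the hop size, the Hann window and the helper names `stft_mag` and `combined_spectrogram` are all choices made for this example. Every window is zero-padded to the same FFT size and shares one hop, so all spectrograms land on the same time-frequency grid.

```python
import numpy as np


def stft_mag(x, win_len, n_fft, hop):
    """Magnitude STFT: Hann window of win_len samples, zero-padded to n_fft bins."""
    window = np.hanning(win_len)
    pad = n_fft // 2
    xp = np.pad(np.asarray(x, dtype=float), (pad, pad))
    n_frames = 1 + len(x) // hop
    mags = np.empty((n_fft // 2 + 1, n_frames))
    for t in range(n_frames):
        # Frames are centred on the same instants for every window length.
        start = pad + t * hop - win_len // 2
        frame = xp[start:start + win_len] * window
        # rfft zero-pads the frame to n_fft; dividing by the window sum makes
        # levels comparable across window lengths.
        mags[:, t] = np.abs(np.fft.rfft(frame, n=n_fft)) / window.sum()
    return mags


def combined_spectrogram(x, sample_rate, win_lens=(4096, 1024, 512),
                         band_edges_hz=(500.0, 2000.0), hop=256):
    """Take each frequency band from a different-resolution spectrogram.

    win_lens[0] (longest window) serves the lowest band, win_lens[-1] the highest.
    """
    n_fft = max(win_lens)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    specs = [stft_mag(x, w, n_fft, hop) for w in win_lens]
    # Band index per frequency bin: 0 below the first edge, 1 between edges, ...
    band = np.searchsorted(band_edges_hz, freqs, side="right")
    combined = np.empty_like(specs[0])
    for b, spec in enumerate(specs):
        combined[band == b] = spec[band == b]
    return freqs, combined
```

Calling `combined_spectrogram(x, 44100.0)` on a mono signal `x` returns the bin frequencies and a magnitude array whose low rows come from the long-window analysis and whose high rows come from the short-window one. Hard band edges like these can leave visible seams; cross-fading between bands would be a natural refinement.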

Spectrograms. Better approach: select the best resolution for each time-frequency neighborhood. Criteria? Better frequency resolution at bass (reflects a-priori psychoacoustical knowledge); maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity). [Figure: candidate STFT window sizes of 6, 12, 24, 48 and 96 ms, with the best one selected]

Spectrograms. Calculation of sparsity (in a given block, for all T/F resolutions r). [Formula: image not captured in the transcript] Here a_{i,r} are the STFT magnitudes in the block, S_r is the spectrum sparsity for the given resolution r, and r_0 is the resolution with the best sparsity. [Figure: candidate STFT window sizes of 6, 12, 24, 48 and 96 ms, with the best one selected]
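Because the sparsity formula itself is missing from the transcript, the sketch below substitutes a common stand-in: the ratio of the l2 norm to the l1 norm of the block magnitudes, which grows as energy concentrates in fewer coefficients. This is an assumption and may differ from the measure used by the authors. The function names and block dimensions are also hypothetical, and the input spectrograms are assumed to share one time-frequency grid.

```python
import numpy as np


def sparsity(block, eps=1e-12):
    """Stand-in sparsity S_r of a block of magnitudes a_{i,r}: l2 norm / l1 norm.

    Ranges from 1/sqrt(n) for a flat block to 1 for a single nonzero coefficient.
    (Assumed measure; the slide's own formula is not available.)
    """
    a = np.abs(block).ravel()
    return np.linalg.norm(a) / (a.sum() + eps)


def best_resolution(blocks):
    """Return (r_0, scores): index of the sparsest block and all S_r values."""
    scores = [sparsity(b) for b in blocks]
    return int(np.argmax(scores)), scores


def adaptive_spectrogram(specs, block_f=16, block_t=8):
    """For each time-frequency block, copy the block from the sparsest resolution.

    specs: list of magnitude spectrograms with identical shape (n_freq, n_time).
    """
    specs = np.stack(specs)                      # (n_res, n_freq, n_time)
    out = np.empty_like(specs[0])
    _, n_freq, n_time = specs.shape
    for f0 in range(0, n_freq, block_f):
        for t0 in range(0, n_time, block_t):
            sl = (slice(f0, f0 + block_f), slice(t0, t0 + block_t))
            r0, _ = best_resolution([s[sl] for s in specs])
            out[sl] = specs[r0][sl]
    return out
```

The a-priori preference for better frequency resolution at bass, listed among the criteria above, is not modeled here; one way to add it would be to bias the scores toward longer windows in low-frequency blocks.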

Spectrograms. Benefits: sharper bass drum hits and other transients, even in the mid-frequency range; sharper guitar harmonics at high frequencies. [Figure: adaptive resolution spectrogram (window sizes from 12 to 93 ms)]

Spectrograms. More examples. [Figures: tone onset waveform; conventional STFT spectrogram]

Spectrograms. More examples. [Figures: combined resolution spectrogram; adaptive resolution spectrogram]

Processing framework. General framework for multi-resolution processing: perform processing with several different resolutions; adaptively combine (mix) the results in a time-frequency space. Mixing is controlled by a-priori knowledge of psychoacoustics and analysis of local signal features (e.g. transience or sparsity).

Noise reduction. Spectral subtraction algorithm: (1) STFT of a noisy signal; (2) estimate the power spectrum of the noise (manually or automatically); (3) subtract the noise power spectrum from the signal power spectrum; (4) inverse STFT. [Diagram: x[t] → STFT → X[f,t]; noise spectrum estimation → W[f]; X[f,t] – W[f] → S[f,t] → inverse STFT → s[t]]
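The four steps of spectral subtraction can be sketched in NumPy as below. This is a minimal single-resolution illustration, not the authors' implementation. The Hann window, the 1024-sample frame with a 256-sample hop, the spectral floor (used to avoid negative power) and the function names are all assumptions. The noise power spectrum W[f] is estimated here by averaging over a noise-only excerpt.

```python
import numpy as np


def estimate_noise_power(noise, n_fft=1024, hop=256):
    """Step 2: average power spectrum W[f] of a noise-only excerpt."""
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
    frames = [noise[s:s + n_fft] * window
              for s in range(0, len(noise) - n_fft + 1, hop)]
    return np.mean(np.abs(np.fft.rfft(frames, axis=-1)) ** 2, axis=0)


def spectral_subtraction(x, noise_power, n_fft=1024, hop=256, floor=0.01):
    """Steps 1, 3 and 4: STFT, power subtraction, inverse STFT by overlap-add.

    floor keeps at least this fraction of each bin's power (an assumed safeguard).
    """
    window = np.hanning(n_fft + 1)[:-1]
    pad = n_fft
    xp = np.pad(np.asarray(x, dtype=float), (pad, pad))
    y = np.zeros_like(xp)
    norm = np.zeros_like(xp)
    for start in range(0, len(xp) - n_fft + 1, hop):
        X = np.fft.rfft(xp[start:start + n_fft] * window)       # X[f,t]
        power = np.abs(X) ** 2
        clean_power = np.maximum(power - noise_power, floor * power)
        gain = np.sqrt(clean_power / np.maximum(power, 1e-20))  # keeps noisy phase
        y[start:start + n_fft] += np.fft.irfft(gain * X, n=n_fft) * window
        norm[start:start + n_fft] += window ** 2
    y /= np.maximum(norm, 1e-12)                 # weighted overlap-add normalization
    return y[pad:pad + len(x)]
```

With `noise_power` set to zeros the function reconstructs its input, which is a quick sanity check of the analysis/synthesis chain. The fixed 1024-sample window is exactly the kind of single resolution whose pre-ringing on transients motivates the multi-resolution variant.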

Noise reduction. Example of adaptive resolution: better frequency resolution at low frequencies (according to the resolution of human hearing); better temporal resolution near signal transients (for reduction of the Gibbs phenomenon). [Diagram blocks: STFT; spectral subtraction branches from short windows to long windows, with outputs x1[t], x2[t], x3[t]; transience analysis (control); mixer of coefficients; synthesis; output y[t]]

Noise reduction. Results of single-resolution and multi-resolution algorithms. [Figure: noisy recording (guitar + castanets)]

Noise reduction. Results of single-resolution and multi-resolution algorithms. [Figures: single resolution; multi-resolution (notice less pre-ringing on transients)]

Conclusion. When using the STFT, do care about the window size! Choose the size wisely: maximize sparsity (spectrogram sharpness); account for human perception.

Your questions? Demo web page: http://www.izotope.com/tech/aes_adapt/