Pitch Tracking (音高追蹤)
Jyh-Shing Roger Jang (張智星)
MIR Lab, Dept. of CSIE, National Taiwan University


Pitch
- Definition of pitch
  - Fundamental frequency (FF, in Hz): reciprocal of the fundamental period of a quasi-periodic waveform
  - Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (detailed later)
- Characteristics of pitch
  - Noise and unvoiced sounds do not have pitch.

Pitch Tracking
- Pitch tracking (PT): the process of computing the pitch vector of a given audio segment
- Sample applications
  - Query by singing/humming
  - Tone recognition for Mandarin
  - Intonation scoring for English
  - Prosody analysis for speech synthesis
  - Pitch scaling and duration modification

Typical Steps for Pitch Tracking
- Pre-processing
  - Filtering
  - Excitation extraction
- Main processing
  - Frame blocking
  - PDF (periodicity detection function) computation
  - Pitch candidates via max picking over PDF
- Post-processing
  - Unreliable pitch removal via volume/clarity thresholding
  - Pitch refinement via parabolic interpolation
  - Pitch smoothing via median filters, etc.

Frame Blocking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size − overlap = 512 − 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec = pitch rate
[Figure: zoomed-in waveform showing a frame and the overlap between adjacent frames]
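The frame-blocking arithmetic can be sketched in a few lines of Python. This is an illustration only, not one of the course scripts; the function name `frame_blocking` and the choice to drop the incomplete trailing frame are assumptions.

```python
def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a signal into overlapping frames (hypothetical helper).

    The hop size is frame_size - overlap; a trailing partial frame is dropped.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


# One second of dummy samples at 16 kHz
frames = frame_blocking(list(range(16000)))
```

With a 320-sample hop, consecutive frames start 20 ms apart, which is where the 50 frames/sec pitch rate comes from.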

Periodicity Detection Functions
- A PDF (periodicity detection function) is used to detect the period of a waveform
- Two categories of PDF
  - Time domain
    - ACF (autocorrelation function)
    - NSDF (normalized squared difference function)
    - AMDF (average magnitude difference function)
  - Frequency domain
    - Harmonic product spectrum
    - Cepstrum

ACF: Auto-correlation Function
[Figure: original frame s(t) and shifted frame s(t−τ) for τ = 30; acf(30) = inner product of the overlapping part; the pitch period is marked on the ACF curve]
- The frame is 0-index based: [s(0), s(1), …, s(n−1)]
- To play safe, the frame size needs to cover at least two fundamental periods! (Quiz candidate!)

ACF: Formula 1
- Assume a frame is represented by s(t), t = 0 ~ n−1
- ACF formula (shift to the right):
  $$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$
[Figure: s(t) and the right-shifted frame s(t−τ)]

ACF: Formula 2
- Assume a frame is represented by s(t), t = 0 ~ n−1
- ACF formula (shift to the left):
  $$\mathrm{acf}(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$$
- This formula is the same as the previous one!
[Figure: s(t) and the left-shifted frame s(t+τ)]
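A small Python check, assuming the ACF is the inner product of the overlapping part as described on the slides, shows that the shift-right and shift-left forms give identical values. The function names are hypothetical.

```python
def acf_shift_right(s):
    """acf(tau) = sum over t = tau..n-1 of s(t) * s(t - tau)."""
    n = len(s)
    return [sum(s[t] * s[t - tau] for t in range(tau, n)) for tau in range(n)]


def acf_shift_left(s):
    """acf(tau) = sum over t = 0..n-1-tau of s(t) * s(t + tau)."""
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


frame = [1, 2, 3, 4]
right = acf_shift_right(frame)
left = acf_shift_left(frame)
```

Both sums run over the same pairs of samples; only the index that names each pair differs.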

Example of ACF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Max of ACF occurs at index 131 (assuming zero-based indexing, so the ACF starts at index 0)
  - FF = 16000/131 = 122.14 Hz
- frame2acf01.m

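Locating the Pitch Point
- If the range of human FF is [40, 1000] Hz, then the interval for locating the fundamental period (FP, in samples) is
  $$\frac{f_s}{1000} \le \mathrm{FP} \le \frac{f_s}{40}$$
  where f_s is the sample rate. (Quiz candidate!)
- frame2acfPitchPoint01.m
[Figure: ACF curve with index 0 and the pitch point at index FP marked]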
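A minimal Python sketch of this search, assuming the pitch point is simply the largest ACF value whose lag lies between fs/1000 and fs/40 samples. The names `acf` and `pitch_by_acf` are hypothetical and this is not the course's frame2acfPitchPoint01.m.

```python
import math


def acf(frame):
    """Plain (tapering) autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + tau] for t in range(n - tau))
            for tau in range(n)]


def pitch_by_acf(frame, fs, f_min=40.0, f_max=1000.0):
    """Return the FF in Hz from the ACF maximum inside the allowed lag range."""
    lo = max(1, math.ceil(fs / f_max))            # shortest period to consider
    hi = min(len(frame) - 1, int(fs // f_min))    # longest period to consider
    if lo > hi:
        return 0.0                                # frame too short for this range
    r = acf(frame)
    period = max(range(lo, hi + 1), key=lambda tau: r[tau])
    return fs / period


# Synthetic frame: one pulse every 100 samples, i.e. 160 Hz at fs = 16 kHz
frame = [1.0 if t % 100 == 0 else 0.0 for t in range(512)]
ff = pitch_by_acf(frame, 16000)
```

Restricting the lag range keeps the search away from index 0, where the ACF always has its global maximum.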

Locating the Fundamental Period (II)
- The assumed human pitch range could be wrong
  - Pitch too high
    - Vitas (local short clip)
    - Whistling
  - Low-pitch singing/humming → requires a big frame size

Example of ACF-Based PT
- Specs
  - Sample rate ≈ 11 kHz (353 points per 32 ms)
  - Frame size = 353 points = 32 ms
  - Overlap = 0
  - Frame rate ≈ 31 frames/sec (one frame per 32 ms, no overlap)
- Playback
  - Original singing
  - Pitch by ACF
- wave2pitchByAcf01.m

Example of ACF-Based PT (II)
- Specs
  - The previous script is converted into a function, pitchTrackingSimple.m, for easy access.
- ptByAcf01.m

Demo of ACF-Based PT
- Real-time display of ACF for pitch tracking
  - goPtByAcf.mdl under SAP toolbox
- Real-time pitch tracking for mic input
  - goPtByAcf2.mdl under SAP toolbox

ACF Variants to Avoid Tapering
- Normalized version (method=2)
  - frame2acf02.m
- Half-frame shifting (method=3)
  - frame2acf03.m

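NSDF: ACF Variant with Normalized Range
- NSDF: normalized squared difference function
  - Formula:
    $$\mathrm{nsdf}(\tau) = \frac{2\sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)}{\sum_{t=0}^{n-1-\tau} \left[ s^2(t) + s^2(t+\tau) \right]}$$
  - A variant of ACF within the range [-1, 1], based on the inequality:
    $$-(x^2 + y^2) \le 2xy \le x^2 + y^2$$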
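A Python sketch of NSDF, assuming the usual definition (twice the ACF divided by the energy of the two overlapping parts). The function name and the handling of an all-zero overlap are assumptions.

```python
import math


def nsdf(frame):
    """Normalized squared difference function; every value lies in [-1, 1]."""
    n = len(frame)
    out = []
    for tau in range(n):
        num = 2.0 * sum(frame[t] * frame[t + tau] for t in range(n - tau))
        den = sum(frame[t] ** 2 + frame[t + tau] ** 2 for t in range(n - tau))
        out.append(num / den if den > 0 else 0.0)  # silent overlap: define as 0
    return out


# Sine wave with a period of 20 samples
frame = [math.sin(2 * math.pi * t / 20) for t in range(80)]
values = nsdf(frame)
```

Because the denominator shrinks together with the overlap, the peaks do not taper the way plain ACF peaks do, so the height of the pitch point can serve as a clarity measure.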

NSDF Example
- frame2nsdf01.m
- Clarity: height of the pitch point

AMDF: Average Magnitude Difference Function
[Figure: original frame s(i) and shifted frame s(i−τ) for τ = 30; amdf(30) = sum of absolute differences over the overlapping part; the pitch period is marked on the AMDF curve] (Quiz candidate!)

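Comparison between ACF & AMDF
- Formulas
  - ACF: $$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$
  - AMDF: $$\mathrm{amdf}(\tau) = \sum_{t=\tau}^{n-1} \lvert s(t) - s(t-\tau) \rvert$$
- Two major advantages of AMDF over ACF (Quiz candidate!)
  - AMDF requires less computing power
  - AMDF is less likely to overflow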
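A Python sketch of AMDF as the sum of absolute differences over the overlapping part, following the slide's description. The function name is hypothetical.

```python
def amdf(frame):
    """amdf(tau) = sum over t = tau..n-1 of |s(t) - s(t - tau)|."""
    n = len(frame)
    return [sum(abs(frame[t] - frame[t - tau]) for t in range(tau, n))
            for tau in range(n)]


ramp = amdf([1, 2, 3, 4])
periodic = amdf([0, 1, 0, -1] * 8)   # period of 4 samples
```

Unlike ACF, the pitch point of AMDF is a minimum rather than a maximum: for the periodic frame the value drops to zero at every multiple of the period. Only subtractions and absolute values are needed, which is why AMDF is cheaper and less prone to overflow than the products in ACF.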

Example of AMDF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Pitch point occurs at index 131, which is harder to determine
- frame2amdf01.m

Example of AMDF to Pitch
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Pitch point occurs at index 131, which is determined correctly
  - FF = 16000/131 = 122.14 Hz
- frame2amdf4pt01.m

Example of AMDF-Based PT
- Specs
  - Sample rate ≈ 11 kHz (353 points per 32 ms)
  - Frame size = 353 points = 32 ms
  - Overlap = 0
  - Frame rate ≈ 31 frames/sec (one frame per 32 ms, no overlap)
- Playback
  - Original singing
  - Pitch by AMDF
- ptByAmdf01.m

AMDF: Variations to Avoid Tapering
- Normalized version (method=2)
  - frame2amdf02.m
- Half-frame shifting (method=3)
  - frame2amdf03.m

Combining ACF and AMDF
[Figure: a frame together with its ACF, AMDF, and ACF/AMDF curves]

Audio Features in Time Domain
- Audio features presented in the time domain
  - Intensity
  - Fundamental period
  - Timbre: waveform within an FP

Audio Features in Frequency Domain
- Energy: sum of the power spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum
[Figure: spectrum annotated with energy, pitch frequency, first formant F1, and second formant F2]

About DFT & FFT
- Terminology
  - DFT: discrete Fourier transform
  - FFT: fast Fourier transform, an efficient method for computing the DFT
- More about DFT

Harmonic Product Spectrum (HPS)
- Procedure
  1. Compute the power spectrum of a frame
  2. Eliminate its trend, obtained from 20th-order polynomial fitting → formants are removed
  3. Apply exponential weighting to suppress high-frequency harmonics
  4. Down sample and add to enhance the harmonics at the fundamental frequency
  5. Find the max as the pitch point
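A simplified Python sketch of steps 1, 4, and 5. It deliberately omits the trend removal and exponential weighting of steps 2 and 3, uses a naive DFT instead of an FFT to stay dependency-free, and adds log-power spectra (adding logs corresponds to the "product" in the method's name). All names and the default of three harmonics are assumptions, not the course's frame2hps01.m.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a naive DFT (bins 0..n/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def hps_pitch(frame, fs, num_harmonics=3, eps=1e-12):
    """Down sample and add the log-power spectrum, then pick the max bin."""
    n = len(frame)
    log_p = [math.log(p + eps) for p in power_spectrum(frame)]
    max_bin = (len(log_p) - 1) // num_harmonics
    hps = [sum(log_p[k * r] for r in range(1, num_harmonics + 1))
           for k in range(max_bin + 1)]
    best = max(range(1, max_bin + 1), key=lambda k: hps[k])  # skip the DC bin
    return best * fs / n


# Four equal harmonics of bin 8; with n = 256 and fs = 8192 that is 256 Hz
n, fs = 256, 8192
frame = [sum(math.cos(2 * math.pi * 8 * h * t / n) for h in range(1, 5))
         for t in range(n)]
ff = hps_pitch(frame, fs)
```

Down-sampling by r moves the r-th harmonic onto the fundamental's bin, so only the true fundamental collects a peak from every down-sampled copy.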

“Down Sample and Add” in HPS

Example of HPS
- frame2hps01.m

Example of PT by HPS
- ptByHps01.m

PT by Cepstrum
- Formula for cepstrum:
  $$\mathrm{cepstrum} = \mathrm{IFFT}\big(\log\lvert\mathrm{FFT}(\mathrm{frame})\rvert\big)$$
- Procedure for PT by cepstrum
  1. Compute the power spectrum of a frame.
  2. Eliminate the trend of the power spectrum if necessary.
  3. Take the inverse FFT of the (symmetric) power spectrum. (The result is real. Why?)
  4. Find the position of the max to compute the pitch.

PT by Cepstrum: How It Works
[Figure annotations: "Close to sinusoids!" and "This should be a single pulse only!"]

Example of Cepstrum
- frame2ceps01.m

Example of PT by Cepstrum
- ptByCeps01.m

Two Parts of PT
- PT has two parts
  - Voicing detection
    - Decide whether a frame has a melody pitch or not
  - Pitch estimation
    - Estimate the most likely melody pitch of a frame
- These two parts can be performed in any order
- Performance evaluation of PT depends on these two parts

Performance Evaluation of PT
- Several criteria for PT performance evaluation
  - Raw pitch accuracy
    - Probability of a correct pitch value (to within ±¼ tone, i.e. ±0.5 semitone) over the voiced frames
  - Raw chroma accuracy
    - Probability that the chroma (i.e. the note name) is correct over the voiced frames
  - Overall accuracy
    - Probability of a correct pitch value (via pitch estimation) and a correct pitched decision (via voicing detection) over all frames
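The three criteria can be sketched in Python as follows. The sketch assumes pitch vectors in semitones with 0 marking an unvoiced frame, and it counts an estimate of 0 on a voiced frame as wrong; both conventions and the function name are assumptions.

```python
def pt_accuracy(ref, est, tol=0.5):
    """Return (raw pitch, raw chroma, overall) accuracy for two pitch vectors."""
    def pitch_ok(i):
        return est[i] > 0 and abs(est[i] - ref[i]) <= tol

    def chroma_ok(i):
        if est[i] <= 0:
            return False
        d = (est[i] - ref[i]) % 12.0          # ignore octave errors
        return min(d, 12.0 - d) <= tol

    voiced = [i for i, r in enumerate(ref) if r > 0]
    if not voiced or not ref:
        return 0.0, 0.0, 0.0
    raw_pitch = sum(pitch_ok(i) for i in voiced) / len(voiced)
    raw_chroma = sum(chroma_ok(i) for i in voiced) / len(voiced)
    # Voiced frames need a correct pitch; unvoiced frames need an unvoiced decision
    overall = sum(pitch_ok(i) if ref[i] > 0 else est[i] <= 0
                  for i in range(len(ref))) / len(ref)
    return raw_pitch, raw_chroma, overall


ref = [60, 62, 0, 64]
est = [60.3, 74, 0, 0]
raw_pitch, raw_chroma, overall = pt_accuracy(ref, est)
```

In this example the second frame is an octave error, so it counts against raw pitch accuracy but not against raw chroma accuracy.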

Preprocessing for Pitch Tracking
- Some commonly used preprocessing steps for audio signals before pitch tracking
  - Pre-filtering the signals
  - Clipping the signals
  - SIFT method for the signals

Preprocessing: Pre-filtering
- Observation
  - Range of human pitch: [40, 1000] Hz
- Idea
  - Low-pass the signals with a cutoff frequency between 800 and 1000 Hz
- Characteristics
  - The effect is yet to be verified

Preprocessing: Clipping
- Observation
  - Small signals near zero are likely to cause pitch tracking errors
- Idea
  - Clip the signals
- Characteristics
  - Saves computation for embedded systems
  - Overall effect is yet to be verified
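The slide does not say which clipping scheme is meant. One common choice is center clipping, sketched below in Python: samples whose magnitude is below a threshold are set to zero and the rest are shifted toward zero by the threshold. The function name and the 30% threshold ratio are assumptions.

```python
def center_clip(frame, ratio=0.3):
    """Zero out samples near zero; shift the remaining ones by the threshold."""
    if not frame:
        return []
    threshold = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > threshold:
            clipped.append(x - threshold)
        elif x < -threshold:
            clipped.append(x + threshold)
        else:
            clipped.append(0.0)
    return clipped


clipped = center_clip([0.1, -0.2, 1.0, -1.0, 0.5])
```

The many resulting zeros are what can save computation on an embedded system, since products or differences involving them can be skipped.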

Preprocessing: SIFT
- Observation
  - Channel effect is likely to cause pitch tracking errors
- Idea of SIFT (simple inverse filter tracking)
  - Identify the excitation via LPC
  - Use the excitation for PDF
- Characteristics
  - Overall effect is yet to be verified

Example of SIFT
- siftAcf01.m

Example of PT Based on SIFT & ACF
- ptBySiftAcf01.m

Postprocessing for Pitch Tracking
- Some commonly used postprocessing steps for pitch tracking
  - Smoothing to remove abrupt pitch changes
  - Interpolation to increase pitch precision

Postprocessing: Smoothing
- Smoothing by a median filter
- ptWithMedianFilter01.m
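A minimal Python sketch of median smoothing on a pitch vector. The function name, the default order of 5, and the edge handling (the window is simply truncated at both ends) are assumptions, not the behavior of ptWithMedianFilter01.m.

```python
def median_filter(pitch, order=5):
    """Replace each pitch value by the median of its neighborhood."""
    if order % 2 == 0:
        raise ValueError("order must be odd")
    half = order // 2
    smoothed = []
    for i in range(len(pitch)):
        window = sorted(pitch[max(0, i - half): i + half + 1])
        # For a truncated, even-length window this takes the upper median
        smoothed.append(window[len(window) // 2])
    return smoothed


# An isolated octave jump (72) inside a run of 60s is removed
smoothed = median_filter([60, 60, 72, 60, 60, 61, 61], order=3)
```

A median filter removes isolated outliers without blurring genuine note transitions, which a moving average would smear.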

Postprocessing: Interpolation
- Idea
  - Use the pitch point and its neighbors to identify the max position
- ptWithParabolicFit01.m
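A Python sketch of parabolic interpolation, assuming a parabola is fitted through the pitch point and its two immediate neighbors and its vertex is taken as the refined position. The function name is hypothetical.

```python
def parabolic_peak(values, index):
    """Refine an integer peak index to the vertex of the fitted parabola."""
    if index <= 0 or index >= len(values) - 1:
        return float(index)               # no neighbor on one side
    a, b, c = values[index - 1], values[index], values[index + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(index)               # three collinear points: no vertex
    return index + 0.5 * (a - c) / denom


# Samples of a parabola whose true peak lies at 10.3
values = [-(x - 10.3) ** 2 for x in range(20)]
refined = parabolic_peak(values, 10)
```

With a fractional period, the fundamental frequency fs/period is no longer restricted to the coarse grid fs/1, fs/2, fs/3, and so on, which matters most for high pitches where the period is only a few samples long.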

UPDUDP (1/4)
- UPDUDP: Unbroken Pitch Determination Using DP
  - Goal: to take pitch smoothness into consideration
    $$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \lvert p_i - p_{i+1} \rvert^{m}$$
- $\mathbf{p} = [p_1, \dots, p_n]$: a given path in the AMDF matrix
- $n$: number of frames
- $\theta$: transition penalty
- $m$: exponent of the transition difference
Reference: Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, No. 10, Vol. 7, Aug 2008.

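UPDUDP (2/4)
- Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j)
- Recurrent formula:
  $$D(i, j) = \mathrm{amdf}_i(j) + \min_{k} \left\{ D(i-1, k) + \theta\, \lvert k - j \rvert^{m} \right\}, \quad i = 2, \dots, n$$
- Initial conditions:
  $$D(1, j) = \mathrm{amdf}_1(j)$$
- Optimum cost:
  $$\min_{j} D(n, j)$$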
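A Python sketch of this kind of dynamic programming over an AMDF matrix (one row per frame, one column per lag). It assumes a cost made of the per-frame AMDF values plus a penalty theta times |lag difference|^m between consecutive frames; the function name, argument names, and default values are hypothetical.

```python
def updudp(amdf_matrix, theta=1.0, m=2):
    """Return (path, cost): one lag index per frame with minimum total cost."""
    n = len(amdf_matrix)
    num_lags = len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]            # D[0][j] = amdf of frame 0 at lag j
    back = []                             # back[i-1][j] = best previous lag
    for i in range(1, n):
        row, brow = [], []
        for j in range(num_lags):
            best_k = min(range(num_lags),
                         key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][best_k]
                       + theta * abs(best_k - j) ** m)
            brow.append(best_k)
        D.append(row)
        back.append(brow)
    j = min(range(num_lags), key=lambda k: D[-1][k])
    cost = D[-1][j]
    path = [j]
    for brow in reversed(back):           # backtrack from the last frame
        j = brow[j]
        path.append(j)
    path.reverse()
    return path, cost


amdf_matrix = [[5, 1, 5, 5],
               [5, 2, 5, 0],   # frame-wise minimum jumps to lag 3
               [5, 1, 5, 5]]
smooth_path, smooth_cost = updudp(amdf_matrix, theta=1.0, m=2)
greedy_path, greedy_cost = updudp(amdf_matrix, theta=0.0, m=2)
```

With theta = 0 the result is the frame-by-frame minimum, including the jump in the middle frame; a positive theta makes the jump too expensive and yields an unbroken path.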

Example of UPDUDP
- A typical example (via AMDF)

Robustness of UPDUDP
- Insensitivity in

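Another Example of UPDUDP
- Example of MATLAB code using UPDUDP (via ACF):
    waveFile = 'arina_short.wav';
    wObj = waveFile2obj(waveFile);              % load the audio file into a wave object
    ptOpt = ptOptSet(wObj.fs, wObj.nbits, 1);   % pitch tracking options for this sample rate and bit depth
    pitch = pitchTracking(wObj, ptOpt, 1);      % run pitch tracking and return the pitch vector
- Result: [Figure]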

Frequency to Semitone Conversion
- Semitone: a musical scale based on A440
  $$\mathrm{semitone} = 69 + 12 \log_2\!\left(\frac{f}{440}\right)$$
- Reasonable pitch range:
  - E2 - C6
  - 82 Hz - 1047 Hz (semitone 40 - 84)
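A Python sketch of the conversion, assuming the MIDI convention in which A440 is semitone 69 and each octave spans 12 semitones. The function name is hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Convert a frequency in Hz to a (fractional) semitone number, A440 = 69."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


a4 = freq_to_semitone(440.0)     # reference pitch
e2 = freq_to_semitone(82.41)     # lower end of the suggested range
c6 = freq_to_semitone(1046.5)    # upper end of the suggested range
```

Working in semitones makes pitch differences independent of register: one octave is always 12 semitones, whether it spans 82 to 165 Hz or 523 to 1047 Hz.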

Unreliable Pitch Removal (1/2)
- Pitch removal via volume thresholding

Unreliable Pitch Removal (2/2)
- Pitch removal via volume/clarity thresholding
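A Python sketch of volume thresholding. It assumes volume is the sum of absolute sample values after removing the frame mean, and that a frame is unreliable when its volume is below a fixed fraction of the loudest frame; the names and the 10% ratio are assumptions. Clarity thresholding would follow the same pattern with a per-frame clarity value in place of the volume.

```python
def frame_volume(frame):
    """Sum of absolute values after removing the frame's mean (DC offset)."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)


def remove_unreliable_pitch(pitch, frames, volume_ratio=0.1):
    """Set the pitch to 0 for frames quieter than volume_ratio * max volume."""
    volumes = [frame_volume(f) for f in frames]
    threshold = volume_ratio * max(volumes)
    return [p if v >= threshold else 0.0 for p, v in zip(pitch, volumes)]


frames = [[1, -1, 1, -1],
          [0.01, -0.01, 0.01, -0.01],   # nearly silent frame
          [0.5, -0.5, 0.5, -0.5]]
cleaned = remove_unreliable_pitch([60, 61, 62], frames)
```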

Rest Handling
[Figure: three versions of the same pitch vector]
- Original pitch vectors with rests.
- Rests are removed. Good for DTW.
- Rests are replaced by the previous nonzero pitch. Good for LS.
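Both strategies are short enough to sketch in Python, assuming a rest is encoded as a pitch of 0. The function names are hypothetical, and leading rests are left at 0 because there is no previous pitch to copy.

```python
def remove_rests(pitch):
    """Drop all rest frames (pitch 0)."""
    return [p for p in pitch if p > 0]


def fill_rests(pitch):
    """Replace each rest by the previous nonzero pitch."""
    filled = []
    previous = 0
    for p in pitch:
        if p > 0:
            previous = p
        filled.append(previous)
    return filled


pitch = [0, 60, 0, 0, 62, 0]
without_rests = remove_rests(pitch)
with_fill = fill_rests(pitch)
```

Removing rests changes the length of the vector, while filling preserves the timing of every note.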

Typical Result of Pitch Tracking
[Figure: pitch tracking via autocorrelation for 茉莉花 (Jasmine)]

Comparison of Pitch Vectors
- Yellow line: target pitch vector

Other Pitch-Related Demos
- Pitch scaling
  - pitchShiftDemo/project1.exe
  - pitchShift-multirate/multirate.m