Pitch Tracking in Time Domain Jyh-Shing Roger Jang ( 張智星 ) MIR Lab, Dept of CSIE National Taiwan University


Audio Features in Time Domain
- Audio features presented in the time domain
  - Intensity
  - Fundamental period (FP)
  - Timbre: waveform within a fundamental period

Pitch
- Definition of pitch
  - Fundamental frequency (FF, in Hz): reciprocal of the fundamental period in a quasi-periodic waveform
  - Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (to be detailed later)
- Characteristics of pitch
  - Noise and unvoiced sounds do not have pitch.

Pitch Tracking
- Pitch tracking (PT): the process of computing the pitch vector of a given audio segment
- Sample applications
  - Query by singing/humming
  - Tone recognition for Mandarin
  - Intonation scoring for English
  - Stress detection in English words
  - Text-to-speech synthesis
  - Pitch scaling and duration modification
Quiz!

Frame Blocking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec = 50 pitches/sec = pitch rate
[Figure: zoomed-in waveform showing consecutive frames and their overlap]
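The arithmetic on this slide can be checked with a small sketch. The helper name `frame_blocking` is made up for illustration and is not a function from the SAP toolbox; dropping the trailing samples that do not fill a whole frame is also an assumption.

```python
def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a sequence into overlapping frames.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


fs = 16000                                    # sample rate in Hz
frame_size, overlap = 512, 192
hop = frame_size - overlap                    # 320 samples
frame_duration_ms = 1000 * frame_size / fs    # frame length in milliseconds
frame_rate = fs / hop                         # frames (pitch values) per second

one_second = list(range(fs))                  # placeholder signal: sample indices
frames = frame_blocking(one_second, frame_size, overlap)
```

Because the first frame needs a full 512 samples, one second of audio yields slightly fewer than 50 complete frames, even though the long-run frame rate is 50 per second.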

Typical Steps for Pitch Tracking
- Main processing
  - Frame blocking
  - PDF (periodicity detection function) computation
  - Pitch candidates via max picking over PDF
  - Pitch refinement via parabolic interpolation
- Pre-processing
  - Filtering
  - Excitation extraction
- Post-processing
  - Unreliable pitch removal via volume/clarity thresholding
  - Pitch smoothing via median filters, etc.
[Slide labels: frame based, segment based]
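Two of the steps listed above, parabolic interpolation and median smoothing, are short enough to sketch. The function names and the edge handling (repeating the first and last pitch values) are assumptions for illustration, not the toolbox's implementation.

```python
def parabolic_peak(values, k):
    """Refine integer peak index k using a parabola through values[k-1], values[k], values[k+1]."""
    if k <= 0 or k >= len(values) - 1:
        return float(k)          # no neighbor on one side; keep the integer index
    left, mid, right = values[k - 1], values[k], values[k + 1]
    denom = left - 2 * mid + right
    if denom == 0:
        return float(k)          # flat neighborhood; nothing to refine
    return k + 0.5 * (left - right) / denom


def median_smooth(pitch, order=3):
    """Median filter of odd order; edges are padded by repeating the end values (an assumption)."""
    if not pitch:
        return []
    half = order // 2
    padded = [pitch[0]] * half + list(pitch) + [pitch[-1]] * half
    return [sorted(padded[i:i + order])[half] for i in range(len(pitch))]


# A sampled parabola whose true peak lies between integer indices.
curve = [-(x - 10.3) ** 2 for x in range(20)]
k = max(range(len(curve)), key=curve.__getitem__)   # integer peak by max picking
refined = parabolic_peak(curve, k)                  # fractional peak position

# A pitch track (in semitones) with one isolated outlier.
smoothed = median_smooth([60, 60, 72, 60, 61])
```

The interpolation recovers the fractional peak exactly here only because the test curve is itself a parabola; on a real PDF curve it is an approximation.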

Periodicity Detection Functions (PDF)
- Use a PDF to detect the period of a waveform
- Two types of PDF
  - Time domain
    - ACF (autocorrelation function)
    - AMDF (average magnitude difference function)
  - Frequency domain
    - Harmonic product spectrum
    - Cepstrum

ACF: Auto-correlation Function
[Figure: original frame s(t) and shifted frame s(t-τ) with τ = 30; the pitch period is marked]
- acf(30) = inner product of the overlapping part of s(t) and s(t-τ) at τ = 30
- To play safe, the frame size needs to cover at least two fundamental periods!
- The frame uses 0-based indexing: [s(0), s(1), …, s(n-1)]
Quiz!

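ACF: Formula 1
- Assume a frame is represented by s(t), t = 0 ~ n-1
- ACF formula (shift the frame to the right by τ, then take the inner product of s(t) and s(t-τ) over the overlap):
  $\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
Quiz!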

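ACF: Formula 2
- Assume a frame is represented by s(t), t = 0 ~ n-1
- ACF formula (shift the frame to the left by τ, then take the inner product of s(t) and s(t+τ) over the overlap):
  $\mathrm{acf}(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$
- This formula is the same as the previous one!
Quiz!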
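The claim that the two ACF formulas agree can be checked numerically. The sketch below uses a synthetic sine frame as a stand-in for real audio; the function names are illustrative only.

```python
import math


def acf_shift_right(s, tau):
    """Inner product of s(t) and s(t - tau) over the overlap, t = tau .. n-1."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def acf_shift_left(s, tau):
    """Inner product of s(t) and s(t + tau) over the overlap, t = 0 .. n-1-tau."""
    return sum(s[t] * s[t + tau] for t in range(len(s) - tau))


# Synthetic frame: a sine wave with a period of 100 samples, 512 samples long.
frame = [math.sin(2 * math.pi * t / 100) for t in range(512)]

acf = [acf_shift_left(frame, tau) for tau in range(len(frame))]
max_gap = max(abs(acf_shift_right(frame, tau) - acf[tau]) for tau in range(len(frame)))
```

Both versions multiply the same pairs of samples; only the index used to name each pair differs. Note also that the number of terms shrinks as τ grows, which is the tapering effect discussed on later slides.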

Example of ACF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Max of ACF occurs at index 131 (zero-based indexing, so the ACF curve starts at index 0)
  - FF = 16000/131 = 122.14 Hz
- Script: frame2acf01.m

Locating the Pitch Point
- If the human FF range is [40, 1000] Hz, then the interval (in zero-based sample indices) for locating the fundamental period (FP) is:
  $\frac{f_s}{1000} \le \mathrm{FP} \le \frac{f_s}{40}$, where $f_s$ is the sample rate
- Script: frame2acfPitchPoint01.m
Quiz!
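A minimal sketch of max picking restricted to the plausible lag range follows. It assumes an idealized pulse train with a 131-sample period as a stand-in for the sunday.wav frame (the real frame is not reproduced here), and the helper `find_pitch_point` is hypothetical, not the toolbox script.

```python
import math


def acf(frame, tau):
    """Unnormalized autocorrelation at lag tau."""
    return sum(frame[t] * frame[t + tau] for t in range(len(frame) - tau))


def find_pitch_point(frame, fs, ff_min=40, ff_max=1000):
    """Return the lag with maximal ACF inside [fs/ff_max, fs/ff_min]."""
    lo = max(1, math.ceil(fs / ff_max))                 # shortest period considered
    hi = min(len(frame) - 1, math.floor(fs / ff_min))   # longest period considered
    return max(range(lo, hi + 1), key=lambda tau: acf(frame, tau))


fs = 16000
# Idealized excitation: one unit pulse every 131 samples, 512 samples long.
frame = [1.0 if t % 131 == 0 else 0.0 for t in range(512)]

pitch_index = find_pitch_point(frame, fs)
ff = fs / pitch_index    # fundamental frequency in Hz
```

Skipping lags below fs/1000 matters because the ACF always has its global maximum at lag 0, which carries no pitch information.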

What Could Go Wrong?
- The assumed human pitch range could be wrong
  - Pitch too high
    - Vitas (local short clip)
    - Whistling
  - Low-pitch singing/humming requires a big frame size to cover at least two fundamental periods

Example of ACF Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 points = 32 ms
  - Overlap = 0
  - Frame rate = 11025/353 ≈ 31.23 frames/sec
- Playback
  - Original singing
  - Pitch by ACF
- Script: wave2pitchByAcf01.m

Example of ACF Based PT (II)
- Note
  - The previous script is simplified by calling pitchTrackBasic.m in the SAP toolbox.
- Script: ptByAcf01.m

Demo of ACF-based PT
- Real-time display of ACF for pitch tracking
  - goPtByAcf.mdl under SAP toolbox
- Real-time pitch tracking for mic input
  - goPtByAcf2.mdl under SAP toolbox

ACF Variants to Avoid Tapering
- Normalized version (method=2)
  - Script: frame2acf02.m
- Half-frame shifting (method=3)
  - Script: frame2acf03.m

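NSDF: ACF Variant with Normalized Range
- NSDF: normalized squared difference function
  - Formula: $\mathrm{nsdf}(\tau) = \dfrac{2 \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)}{\sum_{t=0}^{n-1-\tau} \left[ s^2(t) + s^2(t+\tau) \right]}$
  - A variant of ACF within the range [-1, 1], based on the inequality: $-(x^2 + y^2) \le 2xy \le x^2 + y^2$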
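The bounded range can be seen in a short sketch. It assumes the common form of NSDF, twice the cross term divided by the sum of the two energies over the overlap, and uses a synthetic sine frame; returning 0 for an all-zero overlap is an added convention.

```python
import math


def nsdf(frame, tau):
    """Normalized squared difference function at lag tau (values lie in [-1, 1])."""
    n = len(frame)
    num = 2.0 * sum(frame[t] * frame[t + tau] for t in range(n - tau))
    den = sum(frame[t] ** 2 + frame[t + tau] ** 2 for t in range(n - tau))
    return num / den if den > 0 else 0.0   # convention for a silent overlap (assumption)


# Synthetic frame: a sine wave with a period of 100 samples, 512 samples long.
frame = [math.sin(2 * math.pi * t / 100) for t in range(512)]

curve = [nsdf(frame, tau) for tau in range(401)]
clarity = curve[100]   # height of the NSDF curve at the true period
```

Unlike the plain ACF, the value at the true period stays close to 1 even though fewer samples overlap at larger lags, so the peak height is comparable across frames.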

NSDF Example
- Script: frame2nsdf01.m
- Clarity: height of the pitch point

AMDF: Average Magnitude Difference Function
[Figure: original frame s(i) and shifted frame s(i-τ) with τ = 30; the pitch period is marked]
- amdf(30) = sum of absolute differences over the overlapping part of s(i) and s(i-τ) at τ = 30
Quiz!

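Comparison between ACF & AMDF
- Formulas
  - ACF: $\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
  - AMDF: $\mathrm{amdf}(\tau) = \sum_{t=\tau}^{n-1} \left| s(t) - s(t-\tau) \right|$
- Two major advantages of AMDF over ACF
  - AMDF requires less computing power (subtractions and absolute values instead of multiplications)
  - AMDF is less likely to run into the risk of overflow (it sums differences rather than products)
Quiz!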
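The two functions can be compared side by side on the same frame. This is a sketch on a synthetic sine wave, not on the sunday.wav data; it only illustrates that ACF peaks where AMDF dips.

```python
import math


def acf(frame, tau):
    """Sum of products over the overlap: large when the shift matches the period."""
    return sum(frame[t] * frame[t - tau] for t in range(tau, len(frame)))


def amdf(frame, tau):
    """Sum of absolute differences over the overlap: small when the shift matches the period."""
    return sum(abs(frame[t] - frame[t - tau]) for t in range(tau, len(frame)))


# Synthetic frame: a sine wave with a period of 100 samples, 512 samples long.
frame = [math.sin(2 * math.pi * t / 100) for t in range(512)]

acf_at_period, acf_at_half = acf(frame, 100), acf(frame, 50)
amdf_at_period, amdf_at_half = amdf(frame, 100), amdf(frame, 50)
```

For AMDF the pitch point is therefore a minimum rather than a maximum. Because the sum also shrinks toward zero as the overlap gets shorter, picking that minimum on raw AMDF tends to be less clear-cut than picking the ACF peak.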

Example of AMDF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Pitch point occurs at index 131 (zero-based), which is harder to determine
- Script: frame2amdf01.m

Example of AMDF to Pitch
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Pitch point occurs at index 131 (zero-based), which is determined correctly
  - FF = 16000/131 = 122.14 Hz
- Script: frame2amdf4pt01.m

Example of AMDF Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 points = 32 ms
  - Overlap = 0
  - Frame rate = 11025/353 ≈ 31.23 frames/sec
- Playback
  - Original singing
  - Pitch by AMDF
- Script: ptByAmdf01.m

AMDF: Variations to Avoid Tapering
- Normalized version (method=2)
  - Script: frame2amdf02.m
- Half-frame shifting (method=3)
  - Script: frame2amdf03.m

Combining ACF and AMDF
[Figure: a frame together with its ACF, AMDF, and ACF/AMDF curves]

Frequency to Semitone Conversion
- Semitone: a music scale based on A440
- Reasonable pitch range:
  - E2 - C6
  - 82 Hz - 1047 Hz (semitone 40 - 84)
Quiz!
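The conversion formula itself did not survive in the transcript. A minimal sketch, assuming the usual MIDI convention in which A440 maps to semitone 69 and each octave spans 12 semitones, is given below; the function names are illustrative.

```python
import math


def freq_to_semitone(freq_hz):
    """Convert frequency in Hz to a semitone number (assumes A440 = semitone 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


e2 = freq_to_semitone(82.41)      # E2, the low end of the range on the slide
c6 = freq_to_semitone(1046.5)     # C6, the high end of the range on the slide
a4 = freq_to_semitone(440.0)      # reference pitch
```

Working in semitones makes equal musical intervals equal numeric distances, which is convenient when pitch vectors are compared, for example in query by singing/humming.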