Published by Blaise Hardy. Modified over 8 years ago.
1
Pitch Tracking in Time Domain
Jyh-Shing Roger Jang (張智星)
MIR Lab, Dept. of CSIE, National Taiwan University
jang@mirlab.org
http://mirlab.org/jang
2
Audio Features in Time Domain
- Audio features presented in the time domain
  - Intensity
  - Fundamental period (FP)
  - Timbre: waveform within an FP
3
Pitch
- Definition of pitch
  - Fundamental frequency (FF, in Hz): the reciprocal of the fundamental period of a quasi-periodic waveform
  - Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (to be detailed later)
- Characteristics of pitch
  - Noise and unvoiced sounds do not have pitch.
4
Pitch Tracking
- Pitch tracking (PT): the process of computing the pitch vector of a given audio segment
- Sample applications
  - Query by singing/humming
  - Tone recognition for Mandarin
  - Intonation scoring for English
  - Stress detection in English words
  - Text-to-speech synthesis
  - Pitch scaling and duration modification
Quiz!
5
Frame Blocking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec = 50 pitches/sec = pitch rate
[Figure: zoomed-in waveform with the labels "Frame" and "Overlap"]
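The arithmetic on this slide can be reproduced with a short sketch. The function names `frame_blocking_params` and `frame_starts` are hypothetical helpers introduced here for illustration; they are not part of the SAP toolbox.

```python
def frame_blocking_params(sample_rate, frame_size, overlap):
    """Derive frame duration, hop size and frame rate from the basic settings."""
    hop_size = frame_size - overlap
    return {
        "frame_duration_ms": 1000.0 * frame_size / sample_rate,
        "hop_size": hop_size,
        "frame_rate": sample_rate / hop_size,  # frames (and pitches) per second
    }


def frame_starts(num_samples, frame_size, hop_size):
    """Start index of every full frame that fits inside the signal."""
    return list(range(0, num_samples - frame_size + 1, hop_size))


# Values from the slide: 16 kHz, 512-sample frames, 192-sample overlap.
params = frame_blocking_params(16000, 512, 192)
# Frame start positions for one second of audio (16000 samples).
starts = frame_starts(16000, 512, params["hop_size"])
```

Only full frames are kept in this sketch, so one second of audio yields slightly fewer than 50 frames; padding the last partial frame is an alternative design choice.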
6
Typical Steps for Pitch Tracking
- Main processing
  - Frame blocking
  - PDF (periodicity detection function) computation
  - Pitch candidates via max picking over the PDF
  - Pitch refinement via parabolic interpolation
- Pre-processing
  - Filtering
  - Excitation extraction
- Post-processing
  - Unreliable pitch removal via volume/clarity thresholding
  - Pitch smoothing via median filters, etc.
[Slide labels: "Frame based", "Segment based"]
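Two of the listed steps, parabolic interpolation and median smoothing, are easy to illustrate in isolation. The following is a minimal sketch under stated assumptions: `parabolic_peak` and `median_filter` are hypothetical names, the interpolation uses the standard three-point parabola fit, and the filter shortens its window at the edges. The slides do not say which exact variants the toolbox uses.

```python
def parabolic_peak(values, i):
    """Refine the integer peak index i by fitting a parabola through
    values[i-1], values[i], values[i+1]; returns a fractional index."""
    if i <= 0 or i >= len(values) - 1:
        return float(i)  # no neighbours on both sides, nothing to refine
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0:
        return float(i)  # the three points are collinear
    return i + 0.5 * (a - c) / denom


def median_filter(seq, width=3):
    """Median smoothing with a window that shrinks at the sequence edges."""
    half = width // 2
    out = []
    for k in range(len(seq)):
        window = sorted(seq[max(0, k - half): k + half + 1])
        out.append(window[len(window) // 2])
    return out


# Samples of a parabola whose true peak lies at x = 2.3.
curve = [-(x - 2.3) ** 2 for x in range(5)]
refined = parabolic_peak(curve, 2)

# A pitch vector (in semitones) with one outlier at position 2.
smoothed = median_filter([60, 60, 72, 60, 61])
```

In a pitch tracker the refined index would replace the integer lag before converting to frequency, and the median filter would run over the per-frame pitch vector.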
7
Periodicity Detection Functions (PDF)
- Use a PDF to detect the period of a waveform
- Two types of PDF
  - Time domain
    - ACF (autocorrelation function)
    - AMDF (average magnitude difference function)
  - Frequency domain
    - Harmonic product spectrum
    - Cepstrum
8
ACF: Auto-correlation Function
- Original frame: s(t)
- Shifted frame: s(t-τ), shown for τ = 30
- acf(30) = inner product of the overlapping part of the two frames
- Indexing is 0-based: [s(0), s(1), ..., s(n-1)]
- To play it safe, the frame size needs to cover at least two fundamental periods!
[Figure label: "Pitch period"]
Quiz!
9
ACF: Formula 1
- Assume a frame is represented by s(t), t = 0 ~ n-1
- ACF formula (shift to right): $acf(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
[Figure: s(t) and its right-shifted copy s(t-τ) plotted against t]
Quiz!
10
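ACF: Formula 2
- Assume a frame is represented by s(t), t = 0 ~ n-1
- ACF formula (shift to left): $acf(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$
- This formula is the same as the previous one!
[Figure: s(t) and its left-shifted copy s(t+τ) plotted against t]
Quiz!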
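The claim that the shift-to-right and shift-to-left forms agree can be checked numerically. This is a minimal sketch: the function names are hypothetical, the summation limits are the ones implied by "inner product of the overlap part" with 0-based indexing, and the test frame is a synthetic sine with a 64-sample period, not data from the slides.

```python
import math


def acf_shift_right(s, tau):
    """ACF as the sum over t = tau .. n-1 of s(t) * s(t - tau)."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def acf_shift_left(s, tau):
    """ACF as the sum over t = 0 .. n-1-tau of s(t) * s(t + tau)."""
    return sum(s[t] * s[t + tau] for t in range(len(s) - tau))


# Synthetic 512-sample frame with a fundamental period of 64 samples.
frame = [math.sin(2 * math.pi * t / 64) for t in range(512)]

# Largest disagreement between the two forms over all lags.
max_diff = max(
    abs(acf_shift_right(frame, tau) - acf_shift_left(frame, tau))
    for tau in range(len(frame))
)
```

Substituting t' = t + τ in the second sum gives the first, which is why the difference is zero up to rounding.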
11
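Example of ACF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Max of ACF occurs at index 131 (zero-based indexing is assumed)
  - FF = 16000/131 ≈ 122.14 Hz; the value 123.077 Hz corresponds to a lag of 130 samples (16000/130), which is what index 131 means under one-based indexing
- Script: frame2acf01.m
[Figure: ACF curve with markers at index 0 and index 131]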
12
Locating the Pitch Point
- If the human FF range is [40, 1000] Hz, then the interval for locating the fundamental period (FP, in samples) is $[f_s/1000,\ f_s/40]$, where $f_s$ is the sample rate
- Script: frame2acfPitchPoint01.m
[Figure: ACF curve with markers "Index: 0" and "Index: FP"]
Quiz!
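Restricting the search to the lag interval implied by the FF range can be sketched as follows. The function name `pitch_by_acf`, its default range of 40 to 1000 Hz, and the synthetic test frame (a sine with a 100-sample period) are assumptions for illustration; the slide's own script, frame2acfPitchPoint01.m, is not reproduced here.

```python
import math


def acf(s, tau):
    """Plain ACF: inner product of the frame with itself shifted by tau."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def pitch_by_acf(frame, fs, f_min=40.0, f_max=1000.0):
    """Return (lag, FF in Hz) from the ACF maximum inside the lag
    interval [fs/f_max, fs/f_min], clipped to the frame length."""
    lag_min = math.ceil(fs / f_max)
    lag_max = min(math.floor(fs / f_min), len(frame) - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda tau: acf(frame, tau))
    return best_lag, fs / best_lag


fs = 16000
# Synthetic frame: 512 samples of a sine with period 100 samples (160 Hz).
frame = [math.sin(2 * math.pi * t / 100) for t in range(512)]
lag, ff = pitch_by_acf(frame, fs)
```

For this frame the detected lag lands at about 100 samples, i.e. about 160 Hz. Excluding lags below fs/1000 is what keeps the trivial maximum at lag 0 from being picked.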
13
What Could Go Wrong?
- The assumed human pitch range may not hold
  - Pitch too high
    - Vitas (local short clip)
    - Whistling
  - Low-pitch singing/humming requires a big frame size to cover at least two fundamental periods
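The two-period requirement translates directly into a frame-size bound. A small worked sketch, with hypothetical helper names, using the 16 kHz sample rate and 512-sample frame quoted elsewhere in the deck:

```python
import math


def min_frame_size(fs, f_min):
    """Smallest frame (in samples) that covers two periods of f_min."""
    return math.ceil(2 * fs / f_min)


def lowest_trackable_ff(fs, frame_size):
    """Lowest FF (in Hz) whose two periods still fit in the frame."""
    return 2 * fs / frame_size


needed = min_frame_size(16000, 40)          # frame size needed for 40 Hz
floor_hz = lowest_trackable_ff(16000, 512)  # limit of a 512-sample frame
```

Under this rule a 512-sample frame at 16 kHz only reaches down to 62.5 Hz, while tracking 40 Hz would need 800 samples.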
14
Example of ACF Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 points ≈ 32 ms
  - Overlap = 0
  - Frame rate = 11025/353 ≈ 31.23 f/s (31.25 f/s for an exact 32 ms frame)
- Playback
  - Original singing
  - Pitch by ACF
- Script: wave2pitchByAcf01.m
15
Example of ACF Based PT (II)
- Note
  - The previous script is simplified by calling pitchTrackBasic.m in the SAP toolbox.
- Script: ptByAcf01.m
16
Demo of ACF-based PT
- Real-time display of ACF for pitch tracking
  - goPtByAcf.mdl under the SAP toolbox
- Real-time pitch tracking for mic input
  - goPtByAcf2.mdl under the SAP toolbox
17
ACF Variants to Avoid Tapering
- Normalized version (method=2)
  - Script: frame2acf02.m
- Half-frame shifting (method=3)
  - Script: frame2acf03.m
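The slide names the two variants without giving formulas, so the following sketch rests on assumptions: the normalized version is taken to divide the ACF by the overlap length n-τ, and half-frame shifting is taken to slide only the first half of the frame so that every lag uses the same number of terms. All function names are hypothetical. A constant frame makes the tapering, and its removal, easy to see.

```python
def acf_plain(s, tau):
    """Plain ACF; the number of summed terms shrinks as tau grows (tapering)."""
    return sum(s[t] * s[t + tau] for t in range(len(s) - tau))


def acf_normalized(s, tau):
    """Assumed normalized variant: divide by the overlap length n - tau."""
    return acf_plain(s, tau) / (len(s) - tau)


def acf_half_frame(s, tau):
    """Assumed half-frame variant: shift only the first half of the frame,
    so each lag in 0 .. n/2 sums the same number of terms."""
    half = len(s) // 2
    if tau > half:
        raise ValueError("tau must not exceed half the frame size")
    return sum(s[t] * s[t + tau] for t in range(half))


frame = [1.0] * 8
plain = [acf_plain(frame, tau) for tau in range(5)]
normalized = [acf_normalized(frame, tau) for tau in range(5)]
half_frame = [acf_half_frame(frame, tau) for tau in range(5)]
```

The half-frame variant limits the usable lag to half the frame, which matches the earlier advice that a frame should cover at least two fundamental periods.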
18
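NSDF: ACF Variant with Normalized Range
- NSDF: normalized squared difference function
  - Formula: $nsdf(\tau) = \dfrac{2 \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)}{\sum_{t=0}^{n-1-\tau} \left( s^2(t) + s^2(t+\tau) \right)}$
  - A variant of ACF within the range [-1, 1], based on the inequality: $2|xy| \le x^2 + y^2$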
19
NSDF Example
- Script: frame2nsdf01.m
- Clarity: height of the pitch point
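A minimal sketch of NSDF and of clarity as the NSDF height at the pitch point. It assumes the commonly used definition (twice the ACF divided by the summed energies of the two overlapping parts); the function name `nsdf` and the synthetic sine frame with a 64-sample period are illustrative choices, not taken from frame2nsdf01.m.

```python
import math


def nsdf(s, tau):
    """Assumed NSDF: 2 * sum(s(t) s(t+tau)) / sum(s(t)^2 + s(t+tau)^2),
    with both sums taken over the overlapping part t = 0 .. n-1-tau."""
    n = len(s)
    num = 2.0 * sum(s[t] * s[t + tau] for t in range(n - tau))
    den = sum(s[t] ** 2 + s[t + tau] ** 2 for t in range(n - tau))
    return num / den if den > 0 else 0.0  # silent overlap: no periodicity


frame = [math.sin(2 * math.pi * t / 64) for t in range(512)]
curve = [nsdf(frame, tau) for tau in range(256)]

# Clarity is read off the curve at the pitch point (lag 64 for this frame).
clarity = curve[64]
```

Because the curve is bounded by [-1, 1], clarity can be compared against a fixed threshold to discard unreliable pitch, which is the thresholding step named in the post-processing list.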
20
AMDF: Average Magnitude Difference Function
- Original frame: s(i)
- Shifted frame: s(i-τ), shown for τ = 30
- amdf(30) = sum of the absolute differences over the overlapping part
[Figure label: "Pitch period"]
Quiz!
21
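Comparison between ACF & AMDF
- Formulas
  - ACF: $acf(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$
  - AMDF: $amdf(\tau) = \sum_{t=\tau}^{n-1} |s(t) - s(t-\tau)|$
- Two major advantages of AMDF over ACF
  - AMDF requires less computing power (subtractions and absolute values instead of multiplications)
  - AMDF is less likely to run into overflow
Quiz!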
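Both points can be illustrated with a small sketch. The summation limits follow the "overlap part" description on the slides, the function names are hypothetical, and the integer sawtooth frame is a synthetic stand-in chosen so that the arithmetic is exact. Scaling the samples by 1000 scales AMDF by 1000 but ACF by 1000000, which is why ACF is the one more exposed to overflow in fixed-width integer arithmetic.

```python
def acf(s, tau):
    """ACF over the overlapping part: needs one multiplication per term."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def amdf(s, tau):
    """AMDF over the overlapping part: only subtraction and abs per term."""
    return sum(abs(s[t] - s[t - tau]) for t in range(tau, len(s)))


# Integer sawtooth with a period of 64 samples.
frame = [t % 64 for t in range(512)]
loud = [1000 * x for x in frame]

amdf_growth = amdf(loud, 32) // amdf(frame, 32)
acf_growth = acf(loud, 32) // acf(frame, 32)
```

Note that the pitch point is a minimum of AMDF (zero at lag 64 for this frame), whereas it is a maximum of ACF.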
22
Example of AMDF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - The pitch point occurs at index 131, which is harder to determine
- Script: frame2amdf01.m
[Figure: AMDF curve with markers at index 0 and index 131]
23
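Example of AMDF to Pitch
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - The pitch point occurs at index 131, which is determined correctly
  - FF = 16000/131 ≈ 122.14 Hz; the value 123.077 Hz corresponds to a lag of 130 samples (16000/130), which is what index 131 means under one-based indexing
- Script: frame2amdf4pt01.m
[Figure: AMDF curve with markers at index 0 and index 131]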
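Going from AMDF to a pitch value means picking a minimum rather than a maximum. The sketch below is an assumption-laden illustration, not the method of frame2amdf4pt01.m: `pitch_by_amdf` is a hypothetical name, the search interval reuses the [40, 1000] Hz range from the ACF slides, and the frame is a synthetic integer sawtooth with a 100-sample period.

```python
import math


def amdf(s, tau):
    """Sum of absolute differences over the overlapping part."""
    return sum(abs(s[t] - s[t - tau]) for t in range(tau, len(s)))


def pitch_by_amdf(frame, fs, f_min=40.0, f_max=1000.0):
    """Return (lag, FF in Hz) from the AMDF minimum inside the lag
    interval [fs/f_max, fs/f_min], clipped to the frame length."""
    lag_min = math.ceil(fs / f_max)
    lag_max = min(math.floor(fs / f_min), len(frame) - 1)
    # min() keeps the first (smallest) lag when several lags tie exactly.
    best_lag = min(range(lag_min, lag_max + 1), key=lambda tau: amdf(frame, tau))
    return best_lag, fs / best_lag


frame = [t % 100 for t in range(512)]
lag, ff = pitch_by_amdf(frame, 16000)
```

On real frames the raw AMDF also shrinks toward large lags because fewer terms are summed, so a plain minimum search can favour multiples of the period; that is one reason the pitch point is harder to determine from the unmodified curve.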
24
Example of AMDF Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 points ≈ 32 ms
  - Overlap = 0
  - Frame rate = 11025/353 ≈ 31.23 f/s (31.25 f/s for an exact 32 ms frame)
- Playback
  - Original singing
  - Pitch by AMDF
- Script: ptByAmdf01.m
25
AMDF: Variations to Avoid Tapering
- Normalized version (method=2)
  - Script: frame2amdf02.m
- Half-frame shifting (method=3)
  - Script: frame2amdf03.m
26
Combining ACF and AMDF
[Figure panels: Frame; ACF; AMDF; ACF/AMDF]
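The panel label "ACF/AMDF" suggests dividing one curve by the other, so that a peak of ACF is sharpened by the matching dip of AMDF. The sketch below is one way to do that under stated assumptions: the function name `acf_over_amdf` is hypothetical, and the small `eps` added to the denominator to avoid division by zero is an illustrative choice, not something the slide specifies.

```python
import math


def acf(s, tau):
    """Inner product of the overlapping part."""
    return sum(s[t] * s[t - tau] for t in range(tau, len(s)))


def amdf(s, tau):
    """Sum of absolute differences over the overlapping part."""
    return sum(abs(s[t] - s[t - tau]) for t in range(tau, len(s)))


def acf_over_amdf(s, tau, eps=1e-6):
    """Ratio of ACF to AMDF; eps (an assumption) guards against a zero AMDF."""
    return acf(s, tau) / (amdf(s, tau) + eps)


# Synthetic frame with a fundamental period of 64 samples.
frame = [math.sin(2 * math.pi * t / 64) for t in range(512)]

at_period = acf_over_amdf(frame, 64)    # lag equal to the period
near_period = acf_over_amdf(frame, 60)  # a nearby lag
```

For this frame the ACF values at lags 60 and 64 are of similar size, while the ratio differs by orders of magnitude, which is the sharpening effect the combined curve is after.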
27
Frequency to Semitone Conversion
- Semitone: a musical scale based on A440
- Reasonable pitch range:
  - E2 - C6
  - 82 Hz - 1047 Hz
Quiz!
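The slide's conversion formula did not survive extraction, so the sketch below assumes the MIDI convention, in which A440 is semitone 69 and each octave spans 12 semitones. The function names are hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Assumed MIDI-style mapping: semitone = 69 + 12 * log2(f / 440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping back to Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


low = freq_to_semitone(82.0)     # lower end of the range quoted on the slide
high = freq_to_semitone(1047.0)  # upper end of the range quoted on the slide
```

Under this convention the quoted range of 82 Hz to 1047 Hz rounds to semitones 40 and 84, which are E2 and C6 in MIDI note numbering.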