Automatic Pitch Tracking January 16, 2013
The Plan for Today
One announcement: starting on Monday of next week, we’ll meet in Craigie Hall D 428.
We’ll be working on intonation transcription…
The plan for today: automatic pitch tracking.
On Friday:
1. (Brief) suprasegmentals review
2. The basics of English intonation
The Digitization of Pitch
The blue line represents the fundamental frequency (F0) of the speaker’s voice, also known as a pitch track.
How can we automatically “track” F0 in a sample of speech?
Praat can give us a representation of speech that looks like:
Pitch Tracking
Voicing: air flows through the vocal folds, which rapidly open and close due to the Bernoulli effect.
Each cycle sends an acoustic shockwave through the vocal tract… which takes the form of a complex wave.
The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.
Voicing Bars
Individual glottal pulses
Voicing = Complex Wave
Note: voicing is not perfectly periodic; there is always some random variation from one cycle to the next.
How can we measure the fundamental frequency of a complex wave?
The basic idea: figure out the period, i.e., the time from the start of one cycle of the complex wave to the start of the next.
Fundamental frequency = 1 / period duration
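As a quick worked example of this formula: if one cycle lasted 5 ms (a hypothetical value, not one measured on these slides), F0 would be 1 / 0.005 s = 200 Hz. A minimal Python sketch; the function name `f0_from_period` is our own, chosen for illustration:

```python
def f0_from_period(period_s: float) -> float:
    """Fundamental frequency in Hz = 1 / period duration in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s

# Hypothetical period of 5 ms = 0.005 s
print(f0_from_period(0.005))  # 200.0
```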
Measuring F0
To figure out where one cycle ends and the next begins, the basic idea is to find how well successive “chunks” of a waveform match up with each other.
One period = the length of the chunk that matches up best with the next chunk.
Automatic pitch tracking parameters to think about:
1. Window size (i.e., chunk size)
2. Step size
3. Frequency range (= period range)
Window (Chunk) Size
Here’s an example of a small window.
Window (Chunk) Size
Here’s an example of a large(r) window.
The initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform.
Matching
The waveforms in the two windows are compared to see how well they match up.
Correlation = a measure of how well the two windows match.
Autocorrelation
The measure of correlation = the sum of the point-by-point products of the two chunks.
The technical name for this is autocorrelation… because two parts of the same wave are being matched up against each other (“auto” = self).
Autocorrelation Example
Ex: consider window $x$, with $n$ samples… What’s its correlation with window $y$? (Note: window $y$ must also have $n$ samples.)
$x_1$ = first sample of window $x$
$x_2$ = second sample of window $x$
…
$x_n$ = $n$th (final) sample of window $x$
$y_1$ = first sample of window $y$, etc.
Correlation: $R = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
The larger $R$ is, the better the correlation.
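The sum-of-products formula translates directly into code. This is a minimal sketch with made-up 4-sample windows (not the values from the slide tables); `correlation` is a hypothetical helper name:

```python
def correlation(x, y):
    """R = x1*y1 + x2*y2 + ... + xn*yn (sum of point-by-point products)."""
    if len(x) != len(y):
        raise ValueError("windows x and y must have the same number of samples n")
    return sum(xi * yi for xi, yi in zip(x, y))

# Hypothetical 4-sample windows
x = [0.5, 1.0, 0.5, -0.5]
y = [0.5, 1.0, 0.5, -0.5]    # identical to x: a perfect match
z = [-0.5, 0.5, 1.0, 0.5]    # the same shape, shifted by one sample
print(correlation(x, y))  # 1.75
print(correlation(x, z))  # 0.5
```

The identical window scores higher than the shifted one, which is exactly the property the pitch tracker relies on.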
By the Numbers
[Table: columns Sample, x, y, product]
Sum of products = -0.48
These two chunks are poorly correlated with each other.
By the Numbers, part 2
[Table: columns Sample, x, z, product]
Sum of products = 1.26
These two chunks are well correlated with each other (or at least better than the previous pair).
Note: matching peaks count for more than matches close to 0.
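To see why matching peaks dominate, look at each sample’s individual contribution to R. The windows below are hypothetical (the slide’s own values are not reproduced here), and they match perfectly at every sample; `products` is an illustrative helper name:

```python
def products(x, y):
    """Point-by-point products x_i * y_i; their sum is the correlation R."""
    return [xi * yi for xi, yi in zip(x, y)]

# Hypothetical windows that match exactly at every sample
x = [0.25, 1.0, 0.25]
y = [0.25, 1.0, 0.25]
p = products(x, y)
print(p, sum(p))  # [0.0625, 1.0, 0.0625] 1.125
```

Every sample matches exactly, yet the two near-zero samples contribute only 0.0625 each, while the matching peak contributes 1.0 of the 1.125 total.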
Back to (Digital) Reality
The waveforms in the two windows are compared to see how well they match up: these two windows are poorly correlated.
Next: the pitch tracking algorithm moves further down the waveform and grabs a new window.
The distance the algorithm moves forward in the waveform is called the step size.
Matching, again
The next window gets compared to the original.
These two windows are also poorly correlated.
The algorithm keeps chugging along, one step at a time, and eventually…
Matching, again
The best match is found: these two windows are highly correlated.
The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well-correlated) window 2.
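Putting the matching steps together for a single window: the sketch below tries every candidate offset (lag) in a range, scores each with the raw sum of products, and converts the best lag into a period and an F0. The sampling rate, the synthetic 200 Hz test wave, the search range, and the helper name `best_lag` are all hypothetical choices for illustration; this is a teaching sketch, not Praat’s implementation.

```python
import math

def best_lag(signal, start, window_len, min_lag, max_lag):
    """Return the offset (in samples) whose window best matches the window at `start`.

    Match quality is the raw sum of point-by-point products (autocorrelation).
    """
    ref = signal[start:start + window_len]
    best, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        cand = signal[start + lag:start + lag + window_len]
        if len(cand) < window_len:
            break  # ran off the end of the signal
        r = sum(a * b for a, b in zip(ref, cand))
        if r > best_r:
            best, best_r = lag, r
    return best

SAMPLE_RATE = 8000  # Hz, hypothetical
# One 40-sample cycle of a hypothetical complex wave (200 Hz + 400 Hz components), tiled
cycle = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40) for k in range(40)]
signal = cycle * 10  # 400 samples, perfectly periodic with a period of 40 samples

lag = best_lag(signal, start=0, window_len=120, min_lag=20, max_lag=60)
period_s = lag / SAMPLE_RATE   # distance between the starts of window 1 and window 2
print(lag, period_s, 1 / period_s)  # 40 0.005 200.0
```

The best match lands at a lag of 40 samples: 40 / 8000 = 0.005 s, and 1 / 0.005 s = 200 Hz.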
Frequency is 1 / period
Q: How many possible periods does the algorithm need to check?
Only those allowed by the frequency range (default in Praat: 75 to 600 Hz), i.e., periods from 1/600 s (about 1.7 ms) to 1/75 s (about 13.3 ms).
Mopping up
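To make the question concrete, the sketch below converts a frequency range into the range of candidate periods measured in samples. The 16 kHz sampling rate is a hypothetical choice, and rounding the shortest period up and the longest period down (to stay inside the range) is our own assumption; `lag_range` is an illustrative name.

```python
import math

def lag_range(sample_rate, f_min=75.0, f_max=600.0):
    """Shortest and longest candidate periods, in samples, for an F0 search.

    Shortest period = 1 / f_max, longest period = 1 / f_min.
    """
    shortest = math.ceil(sample_rate / f_max)
    longest = math.floor(sample_rate / f_min)
    return shortest, longest

lo, hi = lag_range(16000)  # hypothetical 16 kHz sampling rate
print(lo, hi, hi - lo + 1)  # 27 213 187
```

With the 75 to 600 Hz range at 16 kHz, only lags from 27 to 213 samples (187 candidate periods) need to be checked per window, which is how the frequency range keeps the computational load down.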
Moving on
Another comparison window is selected and the whole process starts over again.
[Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque”]
The algorithm ultimately spits out a pitch track. This one shows you the F0 value at each step.
Thanks to Chilin Shih for making these materials available.
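The whole loop can be sketched in a few lines. This is a minimal Python sketch, not Praat’s implementation: it assumes “step size” means how far the analysis moves between successive F0 estimates, sets the window to three times the longest period, scores matches with the raw sum of products, and runs on a hypothetical 8 kHz synthetic 200 Hz wave instead of real speech. The function name `track_pitch` is our own.

```python
import math

def track_pitch(signal, sample_rate, window_s, step_s, f_min, f_max):
    """Return (time_s, f0_hz) pairs, one per step, using a raw-autocorrelation lag search."""
    win = int(round(window_s * sample_rate))   # window (chunk) size in samples
    step = int(round(step_s * sample_rate))    # step size in samples
    min_lag = math.ceil(sample_rate / f_max)   # shortest period checked
    max_lag = math.floor(sample_rate / f_min)  # longest period checked
    track = []
    start = 0
    # Stop once the farthest comparison window would run past the end of the signal
    while start + max_lag + win <= len(signal):
        ref = signal[start:start + win]
        best_lag, best_r = min_lag, -math.inf
        for lag in range(min_lag, max_lag + 1):
            cand = signal[start + lag:start + lag + win]
            r = sum(a * b for a, b in zip(ref, cand))
            if r > best_r:  # strict '>' keeps the shortest of equally good lags
                best_lag, best_r = lag, r
        track.append((start / sample_rate, sample_rate / best_lag))  # F0 = 1 / period
        start += step  # move on: a new window, and the process starts over
    return track

SAMPLE_RATE = 8000  # Hz, hypothetical
# Hypothetical test signal: one 40-sample cycle (200 Hz plus a 400 Hz component), tiled
cycle = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40) for k in range(40)]
signal = cycle * 40  # 0.2 s, perfectly periodic

track = track_pitch(signal, SAMPLE_RATE, window_s=3 / 75, step_s=0.01, f_min=75, f_max=600)
for t, f0 in track:
    print(f"{t:.2f} s  {f0:.1f} Hz")
```

Each tuple pairs the window’s start time with its F0 estimate; on this perfectly periodic test signal every step reports 200 Hz.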
Pitch Tracking in Praat
Play with the F0 range.
Create a Pitch object.
Also try To Manipulation… and look at the pitch.
Also check out:
Summing Up
Pitch tracking uses three parameters:
1. Window size: ensures reliability. With Praat’s default settings, the window size is three times the longest possible period, e.g., 3 × 1/75 = 0.04 sec.
2. Step size: for temporal precision.
3. Frequency range: reduces computational load.
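The window-size rule in item 1 is simple arithmetic. A one-line sketch; the helper name `praat_style_window` is hypothetical, and the factor of 3 periods per window reflects the slide’s description of Praat’s default:

```python
def praat_style_window(f_min_hz, periods_per_window=3):
    """Window duration (s) spanning `periods_per_window` copies of the longest period, 1 / f_min."""
    return periods_per_window / f_min_hz

print(praat_style_window(75))  # 0.04
print(praat_style_window(50))  # 0.06
```

Lowering the pitch floor to 50 Hz lengthens the window to 0.06 s, because the longest possible period grows to 1/50 s.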
Deep Thought Questions
What might happen if the shortest period checked is longer than the fundamental period, AND two fundamental periods fit inside a window?
Potential Problem #1: Pitch Halving
The pitch tracker thinks the fundamental period is twice as long as it is in reality, so it estimates F0 to be half of its actual value.
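Pitch halving is easy to reproduce with a single-window search. In this sketch (hypothetical 9 kHz sampling rate, synthetic 300 Hz sine wave, and helper name `estimate_f0`), the first call searches 75 to 600 Hz and finds the true period of 30 samples. The second caps the range at 200 Hz, so the shortest period checked (45 samples) is longer than the true period, while the 360-sample window holds many periods:

```python
import math

def estimate_f0(signal, sample_rate, f_min, f_max, window_len):
    """One F0 estimate for the window at the start of `signal` (raw autocorrelation)."""
    min_lag = math.ceil(sample_rate / f_max)   # shortest period checked
    max_lag = math.floor(sample_rate / f_min)  # longest period checked
    ref = signal[:window_len]
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(a * b for a, b in zip(ref, signal[lag:lag + window_len]))
        if r > best_r:  # strict '>' keeps the shortest of equally good lags
            best_lag, best_r = lag, r
    return sample_rate / best_lag

SR = 9000  # Hz, hypothetical
cycle = [math.sin(2 * math.pi * k / 30) for k in range(30)]  # one cycle of a 300 Hz wave
signal = cycle * 20  # 600 samples

print(estimate_f0(signal, SR, f_min=75, f_max=600, window_len=360))  # 300.0
print(estimate_f0(signal, SR, f_min=75, f_max=200, window_len=360))  # 150.0
```

With the true period excluded from the search, the best-matching lag is two periods (60 samples), so F0 comes out as 9000 / 60 = 150 Hz, half of the actual value.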
Pitch Halving
[Figure: pitch track in which the pitch is halved]
Check out the normal file in Praat.