Pitch Tracking + Prosody January 20, 2009

The Plan for Today
One announcement: On Thursday, we’ll meet in the Tri-Faculty Computer Lab (SS 018) Section 1
We’ll be working on intonation transcription…
1. Automatic Pitch Tracking
2. (Brief) suprasegmentals review
3. The basics of English intonation

The Thin Blue Line
The blue line represents the fundamental frequency (F0) of the speaker’s voice.
Also known as a pitch track.
How can we automatically “track” F0 in a sample of speech?
Praat can give us a representation of speech that looks like:

Pitch Tracking
Voicing:
Air flow through vocal folds
Rapid opening and closing due to Bernoulli Effect
Each cycle sends an acoustic shockwave through the vocal tract…
…which takes the form of a complex wave.
The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.

Voicing Bars

Individual glottal pulses

Voicing = Complex Wave Note: voicing is not perfectly periodic. …always some random variation from one cycle to the next. How can we measure the fundamental frequency of a complex wave?

The basic idea: figure out the period between successive cycles of the complex wave. Fundamental frequency = 1 / period duration = ???
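As a small worked example of the relation above, the sketch below converts a period into a fundamental frequency. The function name and the two period values are illustrative assumptions, not taken from the slides.

```python
def f0_from_period(period_seconds):
    """Fundamental frequency (Hz) is the reciprocal of the period (s)."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_seconds


# Hypothetical values: one glottal cycle lasting 5 ms, another lasting 10 ms.
for period in (0.005, 0.010):
    f0 = f0_from_period(period)
    print(f"period = {period * 1000:.0f} ms -> F0 = {f0:.0f} Hz")
```

A shorter period means a higher F0: halving the period doubles the frequency.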

Measuring F0
To figure out where one cycle ends and the next begins…
The basic idea is to find how well successive “chunks” of a waveform match up with each other.
One period = the length of the chunk that matches up best with the next chunk.
Automatic Pitch Tracking parameters to think about:
1. Window size (i.e., chunk size)
2. Step size
3. Frequency range (= period range)

Window (Chunk) Size Here’s an example of a small window

Window (Chunk) Size Here’s an example of a large(r) window

Initial window of the waveform is compared to another window (of the same size) at a later point in the waveform

Matching
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match

Autocorrelation The measure of correlation = Sum of the point-by-point products of the two chunks. The technical name for this is autocorrelation… because two parts of the same wave are being matched up against each other.

Autocorrelation Example
Ex: consider window x, with n samples…
What’s its correlation with window y? (Note: window y must also have n samples)
$x_1$ = first sample of window x
$x_2$ = second sample of window x
…
$x_n$ = nth (final) sample of window x
$y_1$ = first sample of window y, etc.
Correlation: $R = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$
The larger R is, the better the correlation.
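The sum of point-by-point products can be sketched in a few lines. The function name and the sample values below are hypothetical, chosen only to show that similar shapes give a large positive sum and inverted shapes give a negative one.

```python
def correlation(x, y):
    """Sum of the point-by-point products of two equal-length windows."""
    if len(x) != len(y):
        raise ValueError("windows must have the same number of samples")
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical sample values, for illustration only.
window_x = [0.5, 1.0, 0.5, -0.5, -1.0, -0.5]
in_phase = [0.4, 0.9, 0.6, -0.4, -0.9, -0.6]      # similar shape to window_x
out_of_phase = [-0.4, -0.9, -0.6, 0.4, 0.9, 0.6]  # inverted shape

print(correlation(window_x, in_phase))      # positive: the windows match well
print(correlation(window_x, out_of_phase))  # negative: the windows match poorly
```

Because each term is a product, samples near the peaks contribute far more to the sum than samples close to zero.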

By the Numbers
[Table with columns: Sample, x, y, product]
Sum of products = -0.48
These two chunks are poorly correlated with each other.

By the Numbers, part 2
[Table with columns: Sample, x, y, product]
Sum of products = 1.26
These two chunks are well correlated with each other (or at least better than the previous pair).
Note: matching peaks count for more than matches close to 0.

Back to (Digital) Reality
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match
These two windows are poorly correlated.

Next: the pitch tracking algorithm moves further down the waveform and grabs a new window

The distance the algorithm moves forward in the waveform is called the step size “step”

Matching, again
The next window gets compared to the original.
These two windows are also poorly correlated.

The algorithm keeps chugging and, eventually… another “step”

Matching, again
The best match is found.
These two windows are highly correlated.

The fundamental period can be determined by calculating the length of time between window 1 and window 2.
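The search described on the preceding slides can be sketched as follows. This is a minimal illustration under stated assumptions, not Praat's actual implementation: the function names, the one-sample step between comparison windows, and the synthetic 200 Hz test signal are all hypothetical.

```python
import math


def correlation(x, y):
    """Sum of the point-by-point products of two equal-length windows."""
    return sum(a * b for a, b in zip(x, y))


def find_period(signal, sample_rate, window_size, min_lag, max_lag):
    """Return the period (s) at which a later window best matches the first."""
    first = signal[:window_size]
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):      # step forward one sample at a time
        later = signal[lag:lag + window_size]
        if len(later) < window_size:
            break                                # ran out of waveform
        r = correlation(first, later)
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        raise ValueError("signal too short for the requested window and lags")
    return best_lag / sample_rate                # time between window 1 and window 2


# Hypothetical test signal: a 200 Hz sine wave sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(800)]

period = find_period(signal, sample_rate, window_size=320, min_lag=14, max_lag=60)
print(f"period = {period * 1000:.2f} ms, F0 = {1 / period:.1f} Hz")
```

The search starts at a lag greater than zero because a window always matches itself perfectly at lag 0, which says nothing about the period.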

Mopping up
Frequency is 1 / period
Q: How many possible periods does the algorithm need to check?
Frequency range (default in Praat: 75 to 600 Hz)
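One way to answer the question above is to convert the frequency range into a range of candidate periods measured in samples. The sketch assumes one candidate per sample of lag and a 44100 Hz sample rate; both are illustrative assumptions, as is the function name.

```python
import math


def candidate_lags(sample_rate, floor_hz=75.0, ceiling_hz=600.0):
    """Return the shortest and longest candidate periods, in samples."""
    min_lag = math.ceil(sample_rate / ceiling_hz)  # highest F0 = shortest period
    max_lag = math.floor(sample_rate / floor_hz)   # lowest F0 = longest period
    return min_lag, max_lag


# Assumed sample rate of 44100 Hz with the 75 to 600 Hz range from the slide.
min_lag, max_lag = candidate_lags(44100)
print(f"lags {min_lag} to {max_lag}: {max_lag - min_lag + 1} candidate periods")
```

Narrowing the frequency range shrinks the number of candidate periods, which is why the range limits the computational load.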

Moving on Another comparison window is selected and the whole process starts over again.

[Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque”]
The algorithm ultimately spits out a pitch track. This one shows you the F0 value at each step.
Thanks to Chilin Shih for making these materials available.

Pitch Tracking in Praat Play with F0 range. Create Pitch Object. Also go To Manipulation…Pitch. Also check out:

Summing Up
Pitch tracking uses three parameters:
1. Window size
   Ensures reliability
   In Praat, the window size is always three times the longest possible period.
   E.g.: 3 × 1/75 = 0.04 sec.
2. Step size
   For temporal accuracy
3. Frequency range
   Reduces computational load
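The window-size arithmetic can be checked with a short sketch. The function names and the 8000 Hz sample rate are assumptions made for illustration; only the "three times the longest period" rule and the 75 Hz floor come from the slide.

```python
def window_seconds(floor_hz, periods_per_window=3):
    """Window length in seconds: a fixed number of the longest possible periods."""
    return periods_per_window / floor_hz


def window_samples(floor_hz, sample_rate, periods_per_window=3):
    """The same window length expressed as a whole number of samples."""
    return round(window_seconds(floor_hz, periods_per_window) * sample_rate)


print(window_seconds(75))        # 3 x 1/75 s
print(window_samples(75, 8000))  # samples per window at an assumed 8000 Hz
```

Lowering the pitch floor lengthens the window, so each F0 estimate averages over a longer stretch of speech.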

Deep Thought Questions
What might happen if:
The shortest period checked is longer than the fundamental period?
AND two fundamental periods fit inside a window?
Potential Problem #1: Pitch Halving
The pitch tracker thinks the fundamental period is twice as long as it is in reality.
It estimates F0 to be half of its actual value.
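Pitch halving can be reproduced with a toy tracker. The sketch below is a simplified autocorrelation search, not Praat's algorithm; the function name, the synthetic 200 Hz signal, and its slight decay (a stand-in for the cycle-to-cycle variation of real voicing) are all assumptions.

```python
import math


def estimate_f0(signal, sample_rate, floor_hz, ceiling_hz, window_size):
    """Pick the lag in the allowed range with the largest sum of products."""
    min_lag = math.ceil(sample_rate / ceiling_hz)  # shortest period checked
    max_lag = math.floor(sample_rate / floor_hz)   # longest period checked
    first = signal[:window_size]
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        later = signal[lag:lag + window_size]
        r = sum(a * b for a, b in zip(first, later))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


# Hypothetical signal: a slowly decaying 200 Hz sine wave sampled at 8000 Hz.
sample_rate = 8000
signal = [math.exp(-i / 2000) * math.sin(2 * math.pi * 200 * i / sample_rate)
          for i in range(800)]

# A ceiling of 600 Hz lets the tracker check the true 5 ms period.
good = estimate_f0(signal, sample_rate, 75, 600, window_size=320)
# A ceiling of 150 Hz makes the shortest period checked (about 6.7 ms) longer
# than the true period, so the best remaining match is two periods long.
halved = estimate_f0(signal, sample_rate, 75, 150, window_size=320)

print(f"ceiling 600 Hz: {good:.0f} Hz; ceiling 150 Hz: {halved:.0f} Hz")
```

With the lower ceiling the tracker never tests the true period, so it settles on the next best match and reports half the real F0.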

Pitch Halving pitch is halved Check out normal file in Praat.

More Deep Thoughts
What might happen if:
The shortest period checked is less than half of the fundamental period?
AND the second half of the fundamental cycle is very similar to the first?
Potential Problem #2: Pitch Doubling
The pitch tracker thinks the fundamental period is half as long as it actually is.
It estimates the F0 to be twice as high as it is in reality.

Pitch Doubling pitch is doubled

Microperturbations
Another problem: Speech waveforms are partly shaped by the type of segment being produced.
Pitch tracking can become erratic at the juncture of two segments.
In particular:
voiced to voiceless segments
sonorants to obstruents
These discontinuities in F0 are known as microperturbations.
Also: transitions between modal and creaky voicing tend to be problematic.

Back to Language
F0 is important because it can be used by languages to signal differences in meaning.
Note:
Acoustic = Fundamental Frequency
Perceptual = Pitch
Linguistic = Tone

A Typology
F0 is generally used in three different ways in language:
1. Tone languages (Chinese, Navajo, Igbo)
   Lexically determined tone on every syllable
   “Syllable-based” tone languages
2. Accentual languages (Japanese, Swedish)
   The location of an accent in a particular word is lexically marked.
   “Word-based” tone languages
3. Stress languages (English, Russian)
   It’s complicated.

Mandarin Tone
ma1: mother
ma2: hemp
ma3: horse
ma4: to scold
Mandarin (Chinese) is a classic example of a tone language.

How to Transcribe Tone
Tones are defined by the pattern they make through a speaker’s frequency range.
The frequency range is usually assumed to encompass five levels (1-5) (although this can vary, depending on the language).
[Figure: scale of levels, labeled “Highest F0” and “Lowest F0”]

Tone
In Mandarin, tones span a frequency range of 1-5.
Each tone is denoted by its (numerical) path through the frequency range.
Each syllable can also be labeled with a tone number (e.g., ma1, ma2, ma3, ma4).

How to Transcribe Tone
Tone is relative, i.e., not absolute.
Each speaker has a unique frequency range. For example:
              Female    Male
Highest F0    350 Hz    200 Hz
Lowest F0     150 Hz    100 Hz
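To make the idea of relative tone concrete, the sketch below maps an F0 value onto a five-level scale within one speaker's range. The logarithmic spacing of the levels, the function name, and the reading of the slide's example as a 150 to 350 Hz female range and a 100 to 200 Hz male range are assumptions made for illustration.

```python
import math


def tone_level(f0_hz, lowest_hz, highest_hz, levels=5):
    """Map an F0 value to a level from 1 (lowest) to `levels` (highest)."""
    span = math.log(highest_hz) - math.log(lowest_hz)
    position = (math.log(f0_hz) - math.log(lowest_hz)) / span
    position = min(max(position, 0.0), 1.0)  # clamp values outside the range
    return 1 + round(position * (levels - 1))


# Assumed speaker ranges (see the lead-in): different Hz, same tone level.
print(tone_level(350, lowest_hz=150, highest_hz=350))  # top of the female range
print(tone_level(200, lowest_hz=100, highest_hz=200))  # top of the male range
print(tone_level(150, lowest_hz=150, highest_hz=350))  # bottom of the female range
```

Both 350 Hz for one speaker and 200 Hz for the other land on the top level, which is the sense in which tone is relative rather than absolute.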

General Relativity
In ordinary conversation, for European languages (Fant, 1956):
Men have an average F0 of 120 Hz
   A range of … Hz
Women have an average F0 of 220 Hz
   A range of … Hz
Children have an average F0 of 330 Hz
In a normal utterance, the F0 range is usually one octave, i.e., highest F0 = 2 * lowest F0

Relativity, in Reality
The same tones may be denoted by completely different frequencies, depending on the speaker.
Tone is an abstract linguistic unit.
[Figure: ma, tone 1 (55), produced by a female speaker and a male speaker]

Accent Languages
In accent languages, there is only one pitch accent associated with each word.
The pitch accent is realized on only one syllable in the word.
The other syllables in the word can have no accent.
Accent is lexically determined, so there can be minimal pairs.
Japanese is a pitch accent language…
for some, but not all, words
for some, but not all, dialects

Japanese
Japanese words have one High accent; it attaches to one “mora” in the word.
A mora = a vowel, or a consonant following a vowel, within a syllable.
For example:
[ni] ‘two’ has one mora.
[san] ‘three’ has two morae.
The first mora, if not accented, has a Low F0.
Morae following the accent have Low F0.
It’s actually slightly more complicated than this; for more info, see:
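The mora definition above can be turned into a small counting sketch. It assumes a romanized syllable with one letter per sound and the five vowel letters a, e, i, o, u; the function name is hypothetical, and this deliberately ignores the complications the slide alludes to.

```python
VOWELS = set("aeiou")


def count_morae(syllable):
    """Count morae: each vowel, plus each consonant that follows a vowel."""
    morae = 0
    seen_vowel = False
    for sound in syllable.lower():
        if sound in VOWELS:
            morae += 1
            seen_vowel = True
        elif seen_vowel:
            morae += 1  # a consonant after the vowel adds a mora
        # a consonant before the vowel (the onset) adds nothing
    return morae


print(count_morae("ni"))   # 'two'
print(count_morae("san"))  # 'three'
```

The onset consonants [n] and [s] contribute nothing, while the final [n] of [san] counts as a mora of its own.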

Japanese Examples
asa ‘morning’ H-L
asa ‘hemp’ L-H

“chopsticks” H-L-L
“bridge” L-H-L
“edge” L-H-H

Stress Languages
Stress is a suprasegmental property that applies to whole syllables.
It is defined by more than just differences in F0.
Stressed syllables are higher in pitch (usually)
Stressed syllables are longer (usually)
Stressed syllables are louder (usually)
Stressed syllables reflect more phonetic effort.
More aspiration, less coarticulation in stressed syllables.
Vowels often reduce to schwa in unstressed syllables.
The combination of these factors gives stressed syllables more prominence than unstressed syllables.

Stress: Pitch (N) (V) Complicating factor: pitch tends to drift downwards at the end of utterances

Intonation
Languages superimpose pitch contours on top of word-based stress or tone distinctions.
This is called intonation.
It turns out that English:
has word-based stress
and phrase-based pitch accents (intonation)
The pitch accents are pragmatically specified, rather than lexically specified.
= they change according to discourse context.

English Intonation
We’ll analyze English intonation with a framework called TOBI: Tones and Break Indices.
Note: intonational patterns vary across dialects.
The patterns and examples presented today might not match up with your own intonational system.
Also: this framework has only been applied to a few (primarily Western) languages.
There’s more info at Course in Phonetics, pp

Levels of Prominence
In English, pitch accents align with stressed syllables.
Example: “exploitation”
                ex    ploi    ta    tion
vowel           X     X       X     X
full vowel      X     X       X
stress          X             X
pitch accent                  X
Normally, the accent falls on the last stressed syllable.

Pitch Accent Types
In English, pitch accents can be either high or low: H* or L*
Examples:
High (H*)          Low (L*)
Yes.               Yes?
H*                 L*
Magnification.     Magnification?
As with tones in tone languages, “high” and “low” pitch accents are defined relative to a speaker’s pitch range.
My pitch range: H* = 155 Hz, L* = 100 Hz
Mary Beckman: H* = 260 Hz, L* = 130 Hz

Whole Utterances
The same pitch pattern can apply to an entire sentence:
H* H*: Manny came with Anna.
L* L*: Manny came with Anna?
H* H*: Marianna made the marmalade.
L* L*: Marianna made the marmalade?

Information
Note that there’s a tendency to accent new information in the discourse.
4 different patterns for 4 different contexts:
H* H*: Manny came with Anna.
H* H*: Manny came with Anna.
L* L*: Manny came with Anna?
L* L*: Manny came with Anna?

Pitch Tracking
H* is usually associated with a peak in F0; L* is usually associated with a valley (trough) in F0.
Pitch tracking can help with the identification of pitch peaks and valleys.
Note: it’s easier to analyze utterances with lots of sonorants.
Check out both productions of “Manny came with Anna” in Praat.
Note that there is more to the intonation contour than just pitch peaks and valleys:
The H* is followed by a falling pitch pattern.
The L* is followed by a rising pitch pattern.

Tone Transcription L* H%

Tone Types
There are two types of tones at play:
1. Pitch Accents
   associated with a stressed syllable
   may be either High (H) or Low (L)
   marked with a *
2. Boundary Tones
   appear at the end of a phrase
   not associated with a particular syllable
   may be either High (H) or Low (L)
   marked with a %

Phrases
Intonation organizes utterances into phrases (“chunks”).
Boundary tones mark the end of intonational phrases.
Intonational phrases are the largest phrases.
In the transcription of intonation, phrase boundaries are marked with Break Indices.
Hence, TOBI: Tones and Break Indices
Break Indices are denoted by numbers:
1 = break between words
4 = break between intonational phrases

Break Index Transcription
Tones: L* H%
Breaks: