
2 CS 4705 Lecture 22 Intonation and Discourse

3 What does prosody convey?
In general, information about:
– What the speaker is trying to convey
  Is this a statement or a question?
– The speaker state
  Is the speaker getting angry, frustrated?
In dialogue, information about:
– The structure of the dialogue
  Is the user or the system trying to start a new topic?
  Is the speaker talking about given or new information?
– The state of the interaction:
  Is the user having trouble being understood?
  Is the user having trouble understanding the system?

4 Current Trends
New description schemes (e.g. ToBI)
Corpus-based research and machine learning
Emphasis on evaluation of algorithms and systems (NLE ’00 special issue)
Investigation of spontaneous speech phenomena and variation in speaking style
Applications to CTS, ASR and SDS

5 Corpora
Public and semi-public databases
– ATIS, SwitchBoard, Call Home, Meetings (NIST/DARPA/LDC)
– TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T)
Private collections
– Acquired for speech or dialogue research (August, KTH; Voicemail, AT&T, IBM)
– Meetings, call centers, operator services, focus group collections
The Web
– Newscasts, radio

6 To(nes and)B(reak)I(ndices)
Developed by prosody researchers in four meetings over 1991-94
Goals:
– devise a common labeling scheme for Standard American English that is robust and reliable
– promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...

7 Minimal ToBI transcription:
– recording of speech
– f0 contour
– ToBI tiers:
  orthographic tier: words
  break-index tier: degrees of junction (Price et al. ’89)
  tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ’80)
  miscellaneous tier: disfluencies, non-speech sounds, etc.
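The tier structure above can be sketched as a small data structure. This is only an illustration: the class names, the `is_aligned` check, and the time stamps in the example are hypothetical and are not part of the ToBI standard or of any labeling tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Label:
    """One label on a tier, anchored at a time point in the recording."""
    time: float  # seconds from the start of the recording (made-up values below)
    text: str


@dataclass
class ToBITranscription:
    """The four label tiers of a minimal ToBI transcription."""
    words: List[Label] = field(default_factory=list)          # orthographic tier
    break_indices: List[Label] = field(default_factory=list)  # one index per word
    tones: List[Label] = field(default_factory=list)          # accents, phrase accents, boundary tones
    misc: List[Label] = field(default_factory=list)           # disfluencies, non-speech sounds

    def is_aligned(self) -> bool:
        """Check that every word has exactly one break index."""
        return len(self.words) == len(self.break_indices)


# Hypothetical labeling of "Take the bus" as a plain declarative.
example = ToBITranscription(
    words=[Label(0.30, "Take"), Label(0.45, "the"), Label(0.90, "bus")],
    break_indices=[Label(0.30, "1"), Label(0.45, "1"), Label(0.90, "4")],
    tones=[Label(0.70, "H*"), Label(0.90, "L-L%")],
)
```

The recording and f0 contour are left out here; in practice they would live in separate audio and pitch-track files aligned to the same time axis.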

8 Sample ToBI Labeling

9 Online training material, available at:
– http://www.ling.ohio-state.edu/phonetics/ToBI/
Evaluation
– Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ’92, Pitrelli et al. ’94)
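One common way to compute agreement figures of this kind is to compare every pair of labelers on every word. The sketch below assumes that approach; it is not necessarily the exact procedure used in the cited studies, and the function name and toy labels are made up for illustration.

```python
from itertools import combinations


def pairwise_agreement(labelings, agree=lambda a, b: a == b):
    """Fraction of (labeler pair, word) comparisons on which the labels agree.

    `labelings` is a list with one label sequence per labeler; all sequences
    must cover the same words. `agree` decides whether two labels match.
    """
    n_words = len(labelings[0])
    if any(len(seq) != n_words for seq in labelings):
        raise ValueError("all labelers must label the same words")
    agreeing = total = 0
    for seq_a, seq_b in combinations(labelings, 2):
        for a, b in zip(seq_a, seq_b):
            total += 1
            agreeing += bool(agree(a, b))
    return agreeing / total


# Toy break-index labels from three hypothetical labelers for four words.
breaks = [[1, 4, 1, 3], [1, 4, 2, 3], [1, 3, 1, 4]]
exact = pairwise_agreement(breaks)
within_one = pairwise_agreement(breaks, agree=lambda a, b: abs(a - b) <= 1)
```

Passing a tolerance function as `agree` gives the "to within 1 level" style of measure reported for break indices, while the default exact match corresponds to agreement on a category label.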

10 Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how?
Accent type:
– H*: simple high (declarative)
– L*: simple low (yes-no question)
– L*+H: scooped, late rise (uncertainty/incredulity)
– L+H*: early rise to stress (contrastive focus)
– H+!H*: fall onto stress (implied familiarity)

11 Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
– within a phrase: HiF0
– across phrases

12 Functions of Pitch Accent
Given/new information
– S: Do you need a return ticket?
– U: No, thanks, I don’t need a return.
Contrast (narrow focus)
– U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)
Disambiguation of discourse markers
– S: Now let me get you the train information.
– U: Okay (thanks) vs. Okay….(but I really want…)

13 Predicting Accent: Is it accented or not?
Applications: TTS and CTS
Corpora: read and spontaneous speech
Features: POS window of 3, sentence position, position within NP, # of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information
Results: 75-85% correct, depending on genre
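A few of the features listed above can be sketched as a feature extractor. This is a rough illustration under stated assumptions, not the feature set of any particular published system: the function name, the boundary symbols `<s>` and `</s>`, the Penn-style tags in the example, and the "already mentioned means given" heuristic are all assumptions made here.

```python
from typing import Dict, List, Set, Tuple


def accent_features(tagged: List[Tuple[str, str]], seen: Set[str]) -> List[Dict]:
    """Build one feature dict per word of a POS-tagged sentence.

    `seen` holds lowercased words from earlier in the discourse and is
    updated in place; a word already in it is treated as 'given'.
    """
    n = len(tagged)
    rows = []
    for i, (word, pos) in enumerate(tagged):
        prev_pos = tagged[i - 1][1] if i > 0 else "<s>"
        next_pos = tagged[i + 1][1] if i < n - 1 else "</s>"
        key = word.lower()
        rows.append({
            "word": word,
            "pos_window": (prev_pos, pos, next_pos),  # POS window of 3
            "sentence_position": i / (n - 1) if n > 1 else 0.0,
            "given": key in seen,  # crude stand-in for inferred given/new status
        })
        seen.add(key)
    return rows


# The system/user exchange from the given/new example, with hand-assigned tags.
seen: Set[str] = set()
accent_features(
    [("Do", "VBP"), ("you", "PRP"), ("need", "VB"), ("a", "DT"),
     ("return", "NN"), ("ticket", "NN")],
    seen,
)
reply = accent_features(
    [("I", "PRP"), ("don't", "VBP"), ("need", "VB"), ("a", "DT"), ("return", "NN")],
    seen,
)
```

In the reply, "return" comes out as given because the system already said it, which matches the intuition that it can be deaccented. Function words such as "a" are also marked given by this heuristic, so a real predictor would combine the flag with the POS features rather than rely on it alone.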

14 Prosodic Phrasing in ToBI
‘Levels’ of phrasing:
– intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
– intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
ToBI break-index tier
– 0: no word boundary
– 1: word boundary
– 2: strong juncture with no tonal markings
– 3: intermediate phrase boundary
– 4: intonational phrase boundary
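The relation between break indices and the two levels of phrasing can be illustrated with a short grouping routine. It assumes each word is paired with the break index that follows it; the function name and the break indices in the example sentence are hypothetical.

```python
from typing import List, Tuple


def group_phrases(words_with_breaks: List[Tuple[str, int]]) -> List[List[List[str]]]:
    """Nest words into intonational phrases made of intermediate phrases.

    A break index of 3 or 4 closes the current intermediate phrase;
    a break index of 4 also closes the current intonational phrase.
    """
    intonational: List[List[List[str]]] = []
    intermediate: List[List[str]] = []
    current: List[str] = []
    for word, break_index in words_with_breaks:
        current.append(word)
        if break_index >= 3:
            intermediate.append(current)
            current = []
        if break_index == 4:
            intonational.append(intermediate)
            intermediate = []
    # Flush any words left over if the input does not end on a 4.
    if current:
        intermediate.append(current)
    if intermediate:
        intonational.append(intermediate)
    return intonational


# One possible phrasing, with an intermediate boundary after "ticket".
sentence = [("You", 1), ("should", 1), ("buy", 1), ("the", 1), ("ticket", 3),
            ("with", 1), ("the", 1), ("discount", 1), ("coupon", 4)]
phrases = group_phrases(sentence)
```

Here the result is a single intonational phrase containing two intermediate phrases, the second being "with the discount coupon".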

15 Functions of Phrasing
Disambiguates syntactic constructions, e.g. PP attachment, restrictive/non-restrictive relative clause:
– S: You should buy the ticket with the discount coupon.
– S: The itinerary which I faxed includes deluxe accommodations.
Disambiguates scope ambiguities, e.g. negation:
– S: You aren’t booked through Rome because of the fare.
Or modifier scope:
– S: This fare is restricted to retired politicians and civil servants.

16 Predicting Phrase Boundaries
Applications: TTS, CTS, ASR
Corpora: AP news, Penn Treebank, ATIS
Features: sentence position, sentence length, POS window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure
Results: 96% correct

17 Contours: Accent + Phrasing
What do intonational contours ‘mean’ (Ladd ’80, Bolinger ’89)?
– Speech acts (statements, questions, requests)
  S: That’ll be credit card? (L* H- H%)
– Propositional attitude (uncertainty, incredulity)
  S: You’d like an evening flight. (L*+H L- H%)
– Speaker affect (anger, happiness, love)
  U: I said four SEVEN one! (L+H* L- L%)
– “Personality”
  S: Welcome to the Sunshine Travel System.
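The contour annotations in parentheses combine the three kinds of tonal label: pitch accents (marked with `*`), a phrase accent (ending in `-`), and a boundary tone (ending in `%`). A minimal sketch of splitting such a string into its parts follows; it assumes the labels are separated by spaces as in the examples above, and the function name and dictionary keys are made up for illustration.

```python
from typing import Dict, List, Optional, Union


def parse_contour(contour: str) -> Dict[str, Union[List[str], Optional[str]]]:
    """Split a space-separated ToBI contour string into its tonal parts."""
    pitch_accents: List[str] = []
    phrase_accent: Optional[str] = None
    boundary_tone: Optional[str] = None
    for token in contour.split():
        if token.endswith("%"):
            boundary_tone = token
        elif token.endswith("-"):
            phrase_accent = token
        elif "*" in token:
            pitch_accents.append(token)
        else:
            raise ValueError(f"unrecognized tonal label: {token!r}")
    return {
        "pitch_accents": pitch_accents,
        "phrase_accent": phrase_accent,
        "boundary_tone": boundary_tone,
    }


question = parse_contour("L* H- H%")
uncertain = parse_contour("L*+H L- H%")
```

A lookup from the parsed parts to a reading such as "yes-no question" or "uncertainty" would sit on top of this, though any such table can only cover the few contours whose interpretation is reasonably well understood.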

18 Pitch Range and Timing
Level of speaker engagement
– S: Welcome to InfoTravel. How may I help you?
Contour interpretation
– S: You can take the L*+H bus from Malpensa to Rome L-H%.
– U: Take the bus. vs. Take the bus!
Discourse/topic structure
– Topic beginnings have a higher pitch range, are faster, and are preceded by longer pauses
– Topic endings show the opposite pattern

19 Prosody and Speaker Emotion
What makes an utterance sound angry? Sad?
– How much comes from the lexical information?
– How much from the acoustic/prosodic?
– Does all anger, e.g., sound the same?
Cahn ’88 (examples)

20 Applications
Text-to-Speech and Concept-to-Speech generation: improve naturalness
Speech Recognition: identify suprasegmental meaning
Spoken Dialogue Systems: understand when people are confused, angry
Audio Browsing: format corpora for browsing and search

21 Challenges
We don’t really know what most contours ‘mean’
Our accent prediction needs to be sensitive to better models of given/new status, focus, and grammatical function
Our phrasing prediction needs better information about e.g. attachment
We don’t know much about emotional speech or ‘personality’, though this is critical to applications

