1
Meaningful Intonational Variation
11/12/2018
2
Today
Assigning variation for TTS, CTS
Contours
Accent
Phrasing
Pitch Range
Amplitude and timing
3
TTS Production Pipeline
Orthographic input: Dr. Smith lives on Elm Dr.
Text normalization: abbreviation expansion…
Pronunciation modeling: POS id, WS disambiguation
Intonation assignment: parsing, POS id, robust semantics…
Phonetic/phonological realization: phonological parsing, phonetic analysis
Unit selection: acoustic analysis
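The text normalization step can be illustrated on the slide's own input. The sketch below is only a toy heuristic, not the method of any particular TTS system: the function name `expand_dr` and both rules (a capitalized word after "Dr." means a title, anything else means a street suffix) are assumptions made for illustration.

```python
import re

def expand_dr(text):
    """Expand the ambiguous abbreviation 'Dr.' (toy heuristic)."""
    # Assumption: 'Dr.' followed by a capitalized word is the title 'Doctor'.
    text = re.sub(r"\bDr\.(?=\s+[A-Z])", "Doctor", text)
    # Assumption: a sentence-final 'Dr.' is 'Drive'; its period also ends
    # the sentence, so the period is kept.
    text = re.sub(r"\bDr\.\s*$", "Drive.", text)
    # Any remaining 'Dr.' is treated as a street suffix.
    text = re.sub(r"\bDr\.", "Drive", text)
    return text

print(expand_dr("Dr. Smith lives on Elm Dr."))
```

The heuristic fails when a street suffix is followed by a new sentence ("Elm Dr. Then he left"), which is one reason real front ends combine such rules with POS identification and context.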
4
Intonation Assignment: Phrasing
Traditional: hand-built rules
  Punctuation
  Context/function word: no breaks after function word
    He went to dinner
  Parse?
    She favors the nuts and bolts approach
Current: statistical analysis of large labeled corpus
  Punctuation, POS window, utterance length,…
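A minimal sketch of the traditional hand-built approach, under stated assumptions: a break is placed after punctuation, and a length-based fallback break is never placed after a function word. The function name `split_phrases`, the `max_len` threshold, and the small function-word list are all hypothetical choices for illustration.

```python
# Hypothetical, deliberately small function-word list.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "and", "but", "or",
    "he", "she", "it", "we", "i", "do", "did", "is", "be", "this",
}
PUNCTUATION = ".,;:?!"

def split_phrases(text, max_len=6):
    """Split text into phrases using two hand-built rules."""
    phrases, current = [], []
    for token in text.split():
        current.append(token)
        word = token.strip(PUNCTUATION).lower()
        # Rule 1: always break after punctuation.
        at_punct = token[-1] in PUNCTUATION
        # Rule 2: break long stretches, but never after a function word.
        too_long = len(current) >= max_len and word not in FUNCTION_WORDS
        if at_punct or too_long:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

print(split_phrases("He went to dinner", max_len=3))
print(split_phrases("Well, how did he do it?"))
```

Rules like these know nothing about constituents, so they can split a unit such as "nuts and bolts"; that gap is what the "Parse?" item points at.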
5
Functions of Phrasing
Disambiguates syntactic constructions, e.g. PP attachment:
  S: You should buy the ticket with the discount coupon.
Disambiguates scope ambiguities, e.g. negation:
  S: You aren’t booked through Rome because of the fare.
Or modifier scope:
  S: This fare is restricted to retired politicians and civil servants.
6
Intonation Assignment: Accent
Hand-built rules
  Function/content distinction
    He went out the back door / He threw out the trash
  Complex nominals:
    Main Street / Park Avenue
    city hall parking lot
Statistical procedures trained on large corpora
Contrastive stress, given/new distinction?
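The function/content rule can be sketched as follows, assuming each word already carries a part-of-speech tag. The Penn Treebank-style tags here are assigned by hand for illustration, and `accented_words` and `CONTENT_PREFIXES` are hypothetical names. The two example sentences show why the tag matters: "out" is a preposition in one and a verb particle in the other.

```python
# Assumption: nouns, verbs, adjectives, adverbs and verb particles count
# as content words; everything else is treated as a function word.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB", "RP")

def accented_words(tagged):
    """Return the words that receive a pitch accent under the rule."""
    return [word for word, tag in tagged if tag.startswith(CONTENT_PREFIXES)]

# Hand-tagged examples: 'out' is a preposition (IN) in the first
# sentence and a particle (RP) in the second.
back_door = [("He", "PRP"), ("went", "VBD"), ("out", "IN"),
             ("the", "DT"), ("back", "JJ"), ("door", "NN")]
trash = [("He", "PRP"), ("threw", "VBD"), ("out", "RP"),
         ("the", "DT"), ("trash", "NN")]

print(accented_words(back_door))
print(accented_words(trash))
```

A rule this simple accents every content word, so it cannot handle complex nominals, contrast, or given/new status.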
7
Functions of Pitch Accent
Given/new information
  S: Do you need a return ticket?
  U: No, thanks, I don’t need a return.
Contrast (narrow focus)
  U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)
Disambiguation of discourse markers
  S: Now let me get you the train information.
  U: Okay (thanks) vs. Okay….(but I really want…)
8
Intonation Assignment: Contours
Simple rules
  ‘.’ = declarative contour
  ‘?’ = yes-no-question contour unless wh-word present at/near front of sentence
    Well, how did he do it?
    And what do you know?
    What else might we do?
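These simple rules can be written down directly. In the sketch below, the function name `choose_contour`, the `window` of three words standing in for "at/near front", and the wh-word list are assumptions. The slide does not say which contour a wh-question gets, so the code only returns a separate label for that case.

```python
WH_WORDS = {"who", "whom", "whose", "what", "which",
            "when", "where", "why", "how"}

def choose_contour(sentence, window=3):
    """Pick a contour label from final punctuation and early wh-words."""
    words = [w.strip(".,;:?!'\"").lower() for w in sentence.split()]
    if sentence.rstrip().endswith("?"):
        # '?' means yes-no-question contour unless a wh-word occurs
        # within the first `window` words.
        if any(w in WH_WORDS for w in words[:window]):
            return "wh-question"
        return "yes-no-question"
    return "declarative"

for s in ["Well, how did he do it?",
          "And what do you know?",
          "What else might we do?",
          "Do you want to fly first class?",
          "He went to dinner."]:
    print(s, "->", choose_contour(s))
```

The window matters: "Do you know what he did?" has a wh-word, but not near the front, so it is still labeled as a yes-no question.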
9
Contours: Accent + Phrasing
What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
Speech acts (statements, questions, requests)
  S: That’ll be credit card? (L* H- H%)
Propositional attitude (uncertainty, incredulity)
  S: You’d like an evening flight. (L*+H L- H%)
Speaker affect (anger, happiness, love)
  U: I said four SEVEN one! (L+H* L- L%)
“Personality”
  S: Welcome to the Sunshine Travel System.
10
Propositional attitude (uncertainty)
  Did you feed the animals?
  I fed the L*+H goldfish L-H%
Distinguish direct/indirect speech acts
  Can you open the door?
11
The TTS Front End Today
Corpus-based statistical methods instead of hand-built rule-sets
Dictionaries instead of rules (but fall-back to rules)
Modest attempts to infer contrast, given/new
Text analysis tools: POS tagger, morphological analyzer, little parsing
12
TTS: Where are we now?
Natural sounding speech for some utterances
  Where good match between input and database
Still…hard to vary prosodic features and retain naturalness
  Yes-no questions: Do you want to fly first class?
Context-dependent variation still hard to infer from text and hard to realize naturally:
13
Appropriate contours from text
Emphasis, de-emphasis to convey focus, given/new distinction:
  I own a cat. Or, rather, my cat owns me.
Variation in pitch range, rate, pausal duration to convey topic structure
Characteristics of ‘emotional speech’ little understood, so hard to convey:
  …a voice that sounds friendly, sympathetic, authoritative….
How to mimic real voices?
14
TTS vs. CTS
Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure,… information explicitly available to NLG
Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
But… generating prosody for CTS isn’t so easy
In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It has, however, proven fairly hard. Why?
15
To(nes and)B(reak)I(ndices)
Developed by prosody researchers in four meetings over
Goals:
  devise common labeling scheme for Standard American English that is robust and reliable
  promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English,....
16
Minimal ToBI transcription:
  recording of speech
  f0 contour
  ToBI tiers:
    orthographic tier: words
    break-index tier: degrees of junction (Price et al ‘89)
    tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
    miscellaneous tier: disfluencies, non-speech sounds, etc.
17
Sample ToBI Labeling
18
Online training material, available at:
Evaluation
Good inter-labeler reliability for expert and naive labelers:
  88% agreement on presence/absence of tonal category
  81% agreement on category label
  91% agreement on break indices to within 1 level
  (Silverman et al. ‘92, Pitrelli et al ‘94)
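To make the three agreement figures concrete, the sketch below computes simple word-by-word agreement between two labelers. It is a simplification for illustration and not the exact procedure of the cited studies; the function names and all labels in the example are made up.

```python
def presence_agreement(a, b):
    """Share of words where both labelers agree an accent is present or absent."""
    return sum((x is None) == (y is None) for x, y in zip(a, b)) / len(a)

def label_agreement(a, b):
    """Share of words where both labelers chose the same category (or none)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def break_agreement(a, b, tolerance=1):
    """Share of words whose break indices differ by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)

# Invented labels for a five-word utterance (None = no accent).
tones_1 = ["H*", None, "L+H*", None, "H*"]
tones_2 = ["H*", None, "H*", "H*", "H*"]
breaks_1 = [1, 1, 3, 1, 4]
breaks_2 = [1, 2, 4, 1, 4]

print(presence_agreement(tones_1, tones_2))
print(label_agreement(tones_1, tones_2))
print(break_agreement(breaks_1, breaks_2))
```

Agreement on the category label can never exceed agreement on presence/absence under this definition, which matches the ordering of the 88% and 81% figures.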
19
Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how?
Accent type:
  H* simple high (declarative)
  L* simple low (yes-no question)
  L*+H scooped, late rise (uncertainty/incredulity)
  L+H* early rise to stress (contrastive focus)
  H+!H* fall onto stress (implied familiarity)
20
Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
  within a phrase: HiF0
  across phrases
21
Prosodic Phrasing in ToBI
‘Levels’ of phrasing:
  intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
  intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L%)
ToBI break-index tier
  0 no word boundary
  1 word boundary
  2 strong juncture with no tonal markings
  3 intermediate phrase boundary
  4 intonational phrase boundary
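The two levels of phrasing amount to a small grammar over tone labels, which can be checked mechanically. The sketch below is an illustration only: it covers just the pitch accents, phrase accents, and boundary tones named in these slides, and `is_intonational_phrase` is a hypothetical helper, not part of any ToBI tool.

```python
import re

# Tone inventory limited to the labels that appear in these slides.
PITCH_ACCENTS = ["H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"]
PHRASE_ACCENTS = ["H-", "L-"]
BOUNDARY_TONES = ["H%", "L%"]

def _alt(items):
    """Build a non-capturing regex alternation from literal labels."""
    return "(?:" + "|".join(re.escape(i) for i in items) + ")"

# intermediate phrase = one or more pitch accents + a phrase accent
INTERMEDIATE = f"(?:{_alt(PITCH_ACCENTS)} )+{_alt(PHRASE_ACCENTS)}"
# intonational phrase = one or more intermediate phrases + a boundary tone
INTONATIONAL = re.compile(f"(?:{INTERMEDIATE} )+{_alt(BOUNDARY_TONES)}")

def is_intonational_phrase(tones):
    """Check a list of tone labels against the phrase structure above."""
    return INTONATIONAL.fullmatch(" ".join(tones)) is not None

print(is_intonational_phrase(["L*", "H-", "H%"]))
print(is_intonational_phrase(["H*", "H*", "L-", "H*", "L-", "L%"]))
print(is_intonational_phrase(["H*", "L%"]))
```

The last call fails the check because a pitch accent is followed directly by a boundary tone with no phrase accent in between.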
22
Pitch accents: L*+H, L*, H*
Phrase accent + boundary tone: H-H%, H-L%, L-H%, L-L%
23
Pitch accents: H*, !H*, H+!H*, L+H*
Phrase accent + boundary tone: H-H%, H-L%, L-H%, L-L%
24
Contour Examples
25
And Other Things Contribute: Pitch Range and Timing (Rate, Pause)
Level of speaker engagement
  Hello vs. HELLO
Contour interpretation
  Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable
Discourse/topic structure: paratones
26
Corpus-Based Research
Predicting accent, phrasing, contours from large ToBI-labeled corpora
Features:
  Word position, POS window, word co-occurrence, punctuation, capitalization, sentence length, paragraph position, …
Results:
  ~80-85% correct accent prediction
  ~92-96% correct phrase boundary prediction
  Contours????
Reality…
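Several of the listed features can be read straight off a tagged sentence. The sketch below shows one way to build a feature dictionary per word, which could then be fed to any classifier. The function name `word_features`, the feature names, and the hand-assigned tags are assumptions for illustration; word co-occurrence and paragraph position are left out.

```python
PUNCT = {".", ",", ";", ":", "?", "!"}

def word_features(tagged, i, window=1):
    """Build a feature dict for the word at index i of a tagged sentence."""
    word, tag = tagged[i]
    feats = {
        "position": i,
        "sentence_length": len(tagged),
        "capitalized": word[:1].isupper(),
        "followed_by_punct": i + 1 < len(tagged) and tagged[i + 1][0] in PUNCT,
        "pos": tag,
    }
    # POS window: tags of the neighbours, padded at sentence edges.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"pos[{off:+d}]"] = tagged[j][1] if 0 <= j < len(tagged) else "<pad>"
    return feats

# Hand-tagged example; punctuation is kept as its own token.
sentence = [("I", "PRP"), ("own", "VBP"), ("a", "DT"), ("cat", "NN"), (".", ".")]
print(word_features(sentence, 3))
```

Keeping each word's features in a plain dictionary makes it easy to add or drop features when comparing prediction accuracy.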
27
This is my version of a rather long sentence which ideally should be broken into several phrases automatically by a smart system but we don't know if this will actually happen do we?
Is a yes-no question uttered with falling intonation?
Does that sound delightful? Mellifluous?
I don’t want cereal I want toast. ….
28
Next: Story analysis and generation (readings will be available later this week – we’ll send mail)