Predicting Phrasing and Accent

Slides:



Advertisements
Similar presentations
Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.
Advertisements

Atomatic summarization of voic messages using lexical and prosodic features Koumpis and Renals Presented by Daniel Vassilev.
Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP.
“Effect of Genre, Speaker, and Word Class on the Realization of Given and New Information” Julia Agustín Gravano & Julia Hirschberg {agus,
4/29/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.
Prosody Modeling (in Speech) by Julia Hirschberg Presented by Elaine Chew QMUL: ELE021/ELED021/ELEM March 2012.
Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan IEEE 2007 Min-Hsuan.
FLST: Prosodic Models FLST: Prosodic Models for Speech Technology Bernd Möbius
6/10/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.
Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News Gina-Anne Levow University of Chicago SIGHAN July 25, 2004.
CS 4705 Lecture 22 Intonation and Discourse What does prosody convey? In general, information about: –What the speaker is trying to convey Is this a.
Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN Speech and Audio Processing and Recognition 4/27/05.
Dianne Bradley & Eva Fern á ndez Graduate Center & Queens College CUNY Eliciting and Documenting Default Prosody ABRALIN23-FEB-05.
6/28/20151 Predicting Phrasing and Accent Julia Hirschberg.
10/10/20051 Acoustic/Prosodic and Lexical Correlates of Charismatic Speech Andrew Rosenberg & Julia Hirschberg Columbia University 10/10/05 - IBM.
Sentential issues in translation
Toshiba Update 04/09/2006 Data-Driven Prosody and Voice Quality Generation for Emotional Speech Zeynep Inanoglu & Steve Young Machine Intelligence Lab.
Intonation and Information Discourse and Dialogue CS359 October 16, 2001.
Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing.
Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC October 11, 2006.
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
Lexical, Prosodic, and Syntactics Cues for Dialog Acts.
Suprasegmental Properties of Speech Robert A. Prosek, Ph.D. CSD 301 Robert A. Prosek, Ph.D. CSD 301.
Acoustic Cues to Emotional Speech Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe) Columbia University 26 June 2003.
Suprasegmental features and Prosody Lect 6A&B LING1005/6105.
A Text-free Approach to Assessing Nonnative Intonation Joseph Tepperman, Abe Kazemzadeh, and Shrikanth Narayanan Signal Analysis and Interpretation Laboratory,
Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :
Linguistic knowledge for Speech recognition
SUPRASEGMENTAL PHONEME
Robust Semantics, Information Extraction, and Information Retrieval
Improving a Pipeline Architecture for Shallow Discourse Parsing
Why Study Spoken Language?
Studying Intonation Julia Hirschberg CS /21/2018.
Meanings of Intonational Contours
Representing Intonational Variation
Studying Intonation Julia Hirschberg CS /21/2018.
Spoken Dialogue Systems
Intonational and Its Meanings
Intonational and Its Meanings
The American School and ToBI
Meaningful Intonational Variation
Speech Generation: From Concept and from Text
Comparing American and Palestinian Perceptions of Charisma Using Acoustic-Prosodic and Lexical Analysis Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg,
Information Structure and Prosody
Why Study Spoken Language?
Meanings of Intonational Contours
Statistical NLP: Lecture 9
Turn-taking and Disfluencies
Representing Intonational Variation
Representing Intonational Variation
Advanced NLP: Speech Research and Technologies
Recognizing Structure: Sentence, Speaker, andTopic Segmentation
“Downstepped contours in the given/new distinction”
Agustín Gravano & Julia Hirschberg {agus,
Advanced NLP: Speech Research and Technologies
Spoken Dialogue Systems
Discourse Structure in Generation
Comparative Studies Avesani et al 1995; Hirschberg&Avesani 1997
Predicting Phrasing and Accent
Intonational and Its Meanings
Sentential issues in translation
Recognizing Structure: Dialogue Acts and Segmentation
Discourse & Dialogue CMSC October 28, 2004
Prosody in Generation JH 4/8/2019.
Low Level Cues to Emotion
Statistical NLP : Lecture 9 Word Sense Disambiguation
Automatic Prosodic Event Detection
CS249: Neural Language Model
Presentation transcript:

Predicting Phrasing and Accent Julia Hirschberg CS 6998 12/1/2018

A car bomb attack on a police station in the northern Iraqi city of Kirkuk early Monday killed four civilians and wounded 10 others U.S. military officials said. A leading Shiite member of Iraq's Governing Council on Sunday demanded no more "stalling" on arranging for elections to rule this country once the U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite cleric and Governing Council member said the U.S.-run coalition should have begun planning for elections months ago. 12/1/2018

Why predict phrasing and accent? TTS and CTS Naturalness Intelligibility Recognition Decrease perplexity Modify durational predictions for words at phrase boundaries Identify most ‘salient’ words 12/1/2018

How predict phrasing and accent? Default prosodic assignment from simple text analysis Hand-built rule-based system: hard to modify and adapt to new domains Corpus-based approaches (Sproat et al ’92) Train prosodic variation on large labeled corpora using machine learning techniques Accent and phrasing decisions TTS default prosodic assignment developed to be independent of domains and tasks. Uses simple text analysis to vary phrasing, accent, possibly pitch range. While hand-built rule-sets are still used for particular application domains, most systems have moved toward automatically trained prosodic assignment systems. 12/1/2018

Associate prosodic labels with simple features of transcripts distance from beginning or end of phrase orthography: punctuation, paragraphing part of speech, constituent information Apply learned rules to new text Note: problems in labeling consistency of e.g. phrase boundaries. How to produce a gold standard?? 12/1/2018

Prosodic Phrasing 2 `levels’ of phrasing in ToBI intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) intonational phrase: one or more intermediate phrases + boundary tone (H% or L% ) ToBI break-index tier 0 no word boundary 1 word boundary 2 strong juncture with no tonal markings 3 intermediate phrase boundary 4 intonational phrase boundary 12/1/2018

What are the indicators of phrasing in speech? Timing Pause Lengthening F0 changes Vocal fry/glottalization 12/1/2018

What linguistic and contextual features are linked to phrasing? Syntactic structure? We only suspected they all knew that a burglary had been committed. Abney ’91 chunking Steedman ’90, Oehrle ’91 CCGs … Length of sentence Word co-occurrence statistics I should have known and so should you have. 12/1/2018

Are words on each side accented or not? Where is previous phrase boundary? Part-of-speech of words around potential boundary site Largest constituent dominating w(i) but not w(j) Smallest constituent dominating w(i),w(j) 12/1/2018

Statistical learning methods Classification and regression trees (CART) Rule induction (Ripper) HMMs 12/1/2018

Evaluation: Hard! How to define Gold Standard? Natural speech corpus Multi-speaker/same text Subjective judgments 12/1/2018

Today Incremental improvements continue: Adding higher-accuracy parsing (Koehn et al ‘00) Collins ‘99 parser More sophisticated learning algorithms (Schapire & Singer ‘00) Better representations: tree based? Rules always impoverished Where to next? 12/1/2018

Pitch Accent/Prominence in ToBI Which items are made intonationally prominent and how? Accent type: H* simple high (declarative) L* simple low (ynq) L*+H scooped, late rise (uncertainty/ incredulity) L+H* early rise to stress (contrastive focus) H+!H* fall onto stress (implied familiarity) 12/1/2018

What are the indicators of accent? F0 excursion Durational lengthening Voice quality Vowel quality Loudness 12/1/2018

What are the correlates of accent in text and context? Word class: content vs. function words Information status: Given/new Topic/Focus Contrast Grammatical function The ball touches the cone. Surface position: Today George is hungry. 12/1/2018

Association with focus: Dogs must be carried. John only introduced Mary to Sue. 12/1/2018

How can we capture such information simply? POS window Position of candidate word in sentence Location of prior phrase boundary Pseudo-given/new Location of word in complex nominal and stress prediction for that nominal Word co-occurrence 12/1/2018

Today Concept-to-Speech (CTS) – Pan&McKeown99 systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how New features! New learning methods: Boosting and bagging (Sun02) Combine text and acoustic cues for ASR Co-training?? In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It’s proven however fairly hard. Why? 12/1/2018

MAGIC MM system for presenting cardiac patient data Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate post-operative status reporting for bypass patients Uses mostly traditional NLG hand-developed components Generate text, then annotate prosodically Corpus-trained prosodic assignment component To give an example of how more traditional NLG systems are coping with these problems, I want to briefly describe some work Shimei Pan has done for her doctoral thesis at Columbia University with Kathy McKeown and me on training a prosody module for a medical domain. 12/1/2018

Corpus: written and oral patient reports 50min multi-speaker, spontaneous + 11min single speaker, read 1.24M word text corpus of discharge summaries Transcribed, ToBI labeled Generator features labeled/extracted: syntactic function 12/1/2018

semantic ‘informativeness’ (rarity in corpus) p.o.s. semantic category semantic ‘informativeness’ (rarity in corpus) semantic constituent boundary location and length salience given/new focus theme/ rheme ‘importance’ ‘unexpectedness’ Lessons I personally learned from this: First, just because CTS systems have more information doesn’t mean this information will be useful for prosodic assignment. Second, there is something unsatisfactory about viewing prosodic decisions being made after the text has been constructed. 12/1/2018

Very hard to label features Results: new features to specify TTS prosody Of CTS-specific features only semantic informativeness (likeliness of occurring in a corpus) useful so far Looking at context, word collocation for accent placement helps predict accent RED CELL (less predictable) vs. BLOOD cell (more) Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%) Unigram+bigram model predicts accent status w/77% (+/-.51) accuracy 12/1/2018

Future Intonation Prediction Assigning affect (emotion) Personalizing TTS Conveying personality, charisma? 12/1/2018

Next Week Look at 2 other phenomena we’d like to capture in TTS systems: Information status Discourse (topic) structure Homework 2 is due on Monday! 12/1/2018