4/29/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.

Slides:



Advertisements
Similar presentations
Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.
Advertisements

M. A. K. Halliday Notes on transivity and theme in English (4.2 – 4.5) Part 2.
Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP.
“Effect of Genre, Speaker, and Word Class on the Realization of Given and New Information” Julia Agustín Gravano & Julia Hirschberg {agus,
Speech Synthesis Markup Language SSML. Introduced in September 2004 XML based Assists the generation of synthetic speech Specifies the way speech is outputted.
Prosody Modeling (in Speech) by Julia Hirschberg Presented by Elaine Chew QMUL: ELE021/ELED021/ELEM March 2012.
Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity Kjelgaard & Speer 1999 Kent Lee Ψ 526b 16 March 2006.
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan IEEE 2007 Min-Hsuan.
Making & marking text for synthesis Caroline Henton 10 August 2006.
FLST: Prosodic Models FLST: Prosodic Models for Speech Technology Bernd Möbius
Chunk Parsing CS1573: AI Application Development, Spring 2003 (modified from Steven Bird’s notes)
Chapter 18: Discourse Tianjun Fu Ling538 Presentation Nov 30th, 2006.
6/10/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.
CS 4705 Lecture 1 CS4705 Introduction to Natural Language Processing.
Comparing American and Palestinian Perceptions of Charisma Using Acoustic-Prosodic and Lexical Analysis Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg,
CS 4705 Lecture 22 Intonation and Discourse What does prosody convey? In general, information about: –What the speaker is trying to convey Is this a.
Spoken Language Processing Lab Who we are: Julia Hirschberg, Stefan Benus, Fadi Biadsy, Frank Enos, Agus Gravano, Jackson Liscombe, Sameer Maskey, Andrew.
Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN Speech and Audio Processing and Recognition 4/27/05.
Dianne Bradley & Eva Fern á ndez Graduate Center & Queens College CUNY Eliciting and Documenting Default Prosody ABRALIN23-FEB-05.
6/28/20151 Predicting Phrasing and Accent Julia Hirschberg.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
Seven Lectures on Statistical Parsing Christopher Manning LSA Linguistic Institute 2007 LSA 354 Lecture 7.
SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.
Toshiba Update 14/09/2005 Zeynep Inanoglu Machine Intelligence Laboratory CU Engineering Department Supervisor: Prof. Steve Young A Statistical Approach.
Toshiba Update 04/09/2006 Data-Driven Prosody and Voice Quality Generation for Emotional Speech Zeynep Inanoglu & Steve Young Machine Intelligence Lab.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
Intonation and Information Discourse and Dialogue CS359 October 16, 2001.
CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.
Machine Learning in Spoken Language Processing Lecture 21 Spoken Language Processing Prof. Andrew Rosenberg.
Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
CS 6961: Structured Prediction Fall 2014 Course Information.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
Bernd Möbius CoE MMCI Saarland University Lecture 7 8 Dec 2010 Unit Selection Synthesis B Möbius Unit selection synthesis Text-to-Speech Synthesis.
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
1 Computation Approaches to Emotional Speech Julia Hirschberg
Predicting Student Emotions in Computer-Human Tutoring Dialogues Diane J. Litman&Kate Forbes-Riley University of Pittsburgh Department of Computer Science.
Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC October 11, 2006.
1/21 Automatic Discovery of Intentions in Text and its Application to Question Answering (ACL 2005 Student Research Workshop )
LING 001 Introduction to Linguistics Spring 2010 Syntactic parsing Part-Of-Speech tagging Apr. 5 Computational linguistics.
4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
Acoustic Cues to Emotional Speech Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe) Columbia University 26 June 2003.
Suprasegmental features and Prosody Lect 6A&B LING1005/6105.
Neural Machine Translation
PRESENTED BY: PEAR A BHUIYAN
Studying Intonation Julia Hirschberg CS /21/2018.
Meanings of Intonational Contours
Representing Intonational Variation
Studying Intonation Julia Hirschberg CS /21/2018.
Intonational and Its Meanings
Intonational and Its Meanings
The American School and ToBI
Meaningful Intonational Variation
Speech Generation: From Concept and from Text
Comparing American and Palestinian Perceptions of Charisma Using Acoustic-Prosodic and Lexical Analysis Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg,
Information Structure and Prosody
Meanings of Intonational Contours
Representing Intonational Variation
Advanced NLP: Speech Research and Technologies
Recognizing Structure: Sentence, Speaker, andTopic Segmentation
Predicting Phrasing and Accent
Advanced NLP: Speech Research and Technologies
Predicting Phrasing and Accent
Intonational and Its Meanings
Chunk Parsing CS1573: AI Application Development, Spring 2003
CS249: Neural Language Model
Presentation transcript:

4/29/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706

Today Motivation for intonation assignment algorithms Approaches: hand-built vs. corpus-based rules Predicting phrasing Predicting accent Future research: emotion, personalization, personality 4/29/20152

3 Why worry about accent and phrasing? A car bomb attack on a police station in the northern Iraqi city of Kirkuk early Monday killed four civilians and wounded 10 others U.S. military officials said. A leading Shiite member of Iraq's Governing Council on Sunday demanded no more "stalling" on arranging for elections to rule this country once the U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite cleric and Governing Council member said the U.S.-run coalition should have begun planning for elections months ago. -- LoquendoLoquendo

4/29/20154 Why predict phrasing and accent? TTS and CTS –Naturalness –Intelligibility Recognition –Decrease perplexity –Modify durational predictions for words at phrase boundaries –Identify most ‘salient’ words Summarization, information extraction

4/29/20155 How do we predict phrasing and accent? Default prosodic assignment from simple text analysis –Accent content words –Deaccdent function words The president went to Brussels to make up with Europe. –Limitations Doesn’t work all that well, e.g. particles Hand-built rule-based systems hard to modify or adapt to new domains Corpus-based approaches (Sproat et al ’92) –Train prosodic variation on large hand-labeled corpora using machine learning techniques

4/29/20156 –Accent and phrasing decisions trained separately Binary prediction Feat1, Feat2,…Accent Feat1, Feat2,…Boundary –Associate prosodic labels with simple features of transcripts that can be automatically computed distance from beginning or end of phrase orthography: punctuation, paragraphing part of speech, constituent information –Apply automatically learned rules when processing text

4/29/20157 Reminder: Prosodic Phrasing 2 `levels’ of phrasing in ToBI –intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) –intonational phrase: one or more intermediate phrases + boundary tone (H% or L% ) ToBI break-index tier –0 no word boundary –1 word boundary –2 strong juncture with no tonal markings –3 intermediate phrase boundary –4 intonational phrase boundary

4/29/20158 What are the indicators of phrasing in speech? Timing –Pause –Lengthening F0 changes Vocal fry/glottalization

4/29/20159 What linguistic and contextual features are linked to phrasing? Syntactic information –Abney ’91 chunking major constituents –Steedman ’90, Oehrle ’91 CCGs … –Which ‘chunks’ tend to stick together? –Which ‘chunks’ tend to be separated intonationally? Largest constituent dominating w(i) but not w(j) NP[The man in the moon] |? VP[looks down on you] Smallest constituent dominating w(i),w(j) NP[The man PP[in |? moon]] –Part-of-speech of words around potential boundary site The/DET man/NN |? in/Prep moon/NN Sentence-level information –Length of sentence

4/29/ This is a very |? very very long sentence ?| which thus might have a lot of phrase boundaries in?| it ?| don’t you think? This |? isn’t. Orthographic information –They live in Butte, ?| Montana, ?| don’t they? Word co-occurrence information Vampire ?| bat …powerful ?| but benign… Are words on each side accented or not? The cat in |? the Where is the most recent previous phrase boundary? He asked for pills | but |? What else?

4/29/ Statistical learning methods Classification and regression trees (CART) Rule induction (Ripper), Support Vector Machines, HMMs, Neural Nets All take vector of independent variables and one dependent (predicted) variable, e.g. ‘there’s a phrase boundary here’ or ‘there’s not’ Feat1 Feat2 …FeatN DepVar Input from hand labeled dependent variable and automatically extracted independent variables Result can be integrated into TTS text processor

4/29/ How do we evaluate the result? How to define a Gold Standard? –Natural speech corpus –Multi-speaker/same text –Subjective judgments No simple mapping from text to prosody –Many variants can be acceptable The car was driven to the border last spring while its owner an elderly man was taking an extended vacation in the south of France.

4/29/ Integrating More Syntactic Information Incremental improvements continue: –Adding higher-accuracy parsing (Koehn et al ‘00) Collins ‘99 parser Different learning algorithms (Schapire & Singer ‘00) Different syntactic representations: relational? Tree-based? Ranking vs. classification? Rules always impoverished Where to next?

4/29/ Predicting Pitch Accent Accent: Which items are made intonationally prominent and how? Accent type: –H*simple high(declarative) –L*simple low(ynq) –L*+Hscooped, late rise (uncertainty/ incredulity) –L+H*early rise to stress(contrastive focus) –H+!H*fall onto stress (implied familiarity)

4/29/ What are the indicators of accent? F0 excursion Durational lengthening Voice quality Vowel quality Loudness

4/29/ What phenomena are associated with accent? Word class: content vs. function words Information status: –Given/new He likes dogs and dogs like him. –Topic/Focus Dogs he likes. –Contrast He likes dogs but not cats. Grammatical function –The dog ate his kibble. Surface position in sentence: Today George is hungry.

4/29/ Association with focus: –John only introduced Mary to Sue. Semantic parallelism –John likes beer but Mary prefers wine. How many of these are easy to compute automatically?

4/29/ How can we approximate such information? POS window Position of candidate word in sentence Location of prior phrase boundary Pseudo-given/new Location of word in complex nominal and stress prediction for that nominal City hall, parking lot, city hall parking lot Word co-occurrence Blood vessel, blood orange

4/29/ Current Research Concept-to-Speech (CTS) – Pan&McKeown99 –systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how Information status –Given/new –Topic/focus

4/29/ Future Intonation Prediction: Beyond Phrasing and Accent Assigning affect (emotion) from text – how? Personalizing TTS: modeling individual style in intonation – how? Conveying personality, charisma – how?

4/29/ Next Class Information status: focus and given/new information