Advanced NLP: Speech Research and Technologies


Advanced NLP: Speech Research and Technologies Julia Hirschberg CS 6998 11/24/2018

Spoken Natural Language Processing
- NLP/Computational Linguistics: historically text-oriented
- Speech research: domain of EE and Linguistics
- 1980s: efforts by DARPA to bring the two together
- Today: applications motivate collaboration
  - Automatic Speech Recognition (ASR)
  - Text/Concept-to-Speech (TTS/CTS)
  - Spoken Dialogue Systems (SDS), Speech-to-Speech Translation, Speech Search/Data Mining

Studying Speech is Different
- Understanding input and generating output are more complicated
  - ASR errors and lack of formatting cues
  - TTS/CTS naturalness issues
- But there is also more information to take advantage of
  - Pitch variation, loudness, rate, voice quality
  - Filled pauses, self-repairs

Labeled Waveform and F0 Contour
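An F0 contour of the kind named on this slide is a sequence of frame-by-frame estimates of fundamental frequency. As a rough illustration only, the sketch below estimates F0 for a single frame by picking the lag with the strongest autocorrelation. The function name, the 75-400 Hz search range, and the synthetic test tone are assumptions made for the example; real pitch trackers add windowing, voicing decisions, and smoothing across frames.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    f0_min and f0_max are assumed search bounds, not values from the slides.
    Returns None when no lag has positive correlation (e.g. silence).
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Synthetic 40 ms frame: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(320)]
print(estimate_f0(frame, sample_rate))
```

On the synthetic tone the estimate should come back at 200 Hz; on real speech a single-frame estimate like this is prone to octave errors, which is one reason hand-checked pitch tracks matter.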

Current Approaches
- Corpus-based studies
- Hand-labeled data (ToBI etc.)
- Tools: analysis (pitch tracks, spectrograms, ...), ASR toolkits, TTS systems
- Machine learning
- Laboratory studies
- Evaluation

Prosodic Generation for TTS
- Corpus-based approaches
  - Train prosodic variation on large labeled corpora using machine learning techniques
  - Accent and phrasing decisions
  - Associate prosodic labels with simple features of transcripts
- To do:
  - Contour variation

Default prosodic assignment in TTS was developed to be independent of domain and task. It uses simple text analysis to vary phrasing, accent, and possibly pitch range. While hand-built rule sets are still used for particular application domains, most systems have moved toward automatically trained prosodic assignment.
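To make "simple features of transcripts" concrete, here is a minimal sketch of the familiar content-word/function-word baseline for pitch accent assignment: function words are left unaccented and everything else is accented. This is an illustrative baseline, not the method of any system described in these slides; the word list and the function name are hypothetical, and a trained model would replace the lookup with features learned from a labeled corpus.

```python
# Hypothetical, deliberately tiny closed-class word list; a real system would
# use a part-of-speech tagger and features learned from a labeled corpus.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "at", "for", "and", "or",
    "but", "is", "are", "was", "were", "be", "it", "he", "she", "they",
    "i", "you", "we", "do", "does", "did", "so", "who", "that", "this",
}

def predict_accents(sentence):
    """Return (word, accented) pairs using the content/function-word baseline."""
    pairs = []
    for token in sentence.split():
        word = token.strip(".,?!;:\"'").lower()
        if not word:
            continue
        pairs.append((word, word not in FUNCTION_WORDS))
    return pairs

for word, accented in predict_accents("The ball touches the circle."):
    print(f"{word:10s} {'ACCENT' if accented else '-'}")
```

A baseline like this ignores context entirely, which is exactly where corpus-trained models and the open questions about contour variation come in.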

Work in spoken language generation is only beginning as a serious topic of research and development. Along the way there are large questions to answer, both for dialogue and monologue generation:
- Timing and backchanneling
- Disfluencies?
- Emotion and ‘personality’
- Personalized voices

Concept to Speech
- Decisions in TTS depend on text analysis
- Concept-to-Speech (CTS) systems should be able to do better
  - System knows what it wants to say and can specify how
- But...
  - Still need labeled corpora to train on
  - CTS features may be hard to label (focus, given/new, ...)
  - How to decide how to realize these?

In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It has, however, proven fairly hard. Why?

Prosody in ASRU
- Little success in improving ASR transcription
- More promise in other areas:
  - Improving rejection
  - Shrinking search space
  - Automatic topic segmentation for browsing/retrieval
  - Identifying ‘salient’ words in turns
  - Disambiguating speech/dialogue acts: okay

- Recognizing communicative ‘problems’
  - ASR errors
  - User corrections
  - ‘Aware’ turns
  - ‘Problematic’ dialogues
- Disfluencies and self-repairs
- Recognizing speaker emotion

My Research
- Meaning of intonational contours:
  - Rise/fall/rise (L*+H L-H%)
      A: Did you take out the garbage?
      B: Sort of.
      A: Sort of!
  - High rise questions (H* H-H%)
      This is the chicken Chermula?
      I’m from Skokie?

- Compositional theory of intonational meaning (w/Pierrehumbert)
- Intonational disambiguation across languages: Spanish, Italian and English (w/Avesani & Prieto)
    William isn’t drinking because he’s unhappy
- Disfluencies: self-repairs (w/Nakatani)
    I want to go to Ba- Baltimore.
- Cue phrases (w/Litman)
    Now let’s go to work.

Get a3 and a4 for disambig gw for other
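The self-repair example above ("Ba- Baltimore") can be spotted in a transcript with a very small heuristic. The sketch below assumes, as in the slide's example, that cut-off words are transcribed with a trailing hyphen; the function name is hypothetical. It works on text only and says nothing about detecting repairs from the speech signal itself, which is the harder research problem.

```python
def find_fragment_repairs(utterance):
    """Return (fragment, repair) pairs where a cut-off word is restarted.

    Assumes word fragments are transcribed with a trailing hyphen.
    """
    tokens = utterance.split()
    repairs = []
    for current, following in zip(tokens, tokens[1:]):
        if current.endswith("-") and len(current) > 1:
            fragment = current[:-1]
            repair = following.strip(".,?!")
            # Count it as a repair only if the next word restarts the fragment.
            if repair.lower().startswith(fragment.lower()):
                repairs.append((fragment, repair))
    return repairs

print(find_fragment_repairs("I want to go to Ba- Baltimore."))
```

Repairs that replace the fragment with a different word (rather than completing it) are missed by this check, so it is best read as a lower bound on the repairs in a transcript.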

- Accent and strict/sloppy interpretations of ellipsis (w/Ward)
    People who live in Los Angeles adore its beaches and so do people who live in New York

- Accent and given/new (w/Terken)
    The ball touches the circle. The ball touches the triangle. The ball touches the cone. The square touches the ball.
- Intonation and discourse structure (w/Grosz & Nakatani)
    Boston Directions Corpus
- Automatic assignment of accent and phrasing for TTS (w/Wang, Sproat, Koehn, Abney, Collins, Rambow)
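One simplistic way to operationalize the given/new distinction in the ball/circle example above is to treat a content word as "given" once it has been mentioned and to accent it only on first mention. The sketch below does just that. The function name and the tiny function-word list are hypothetical, and the rule deliberately ignores factors such as grammatical role, which the last example sentence varies by moving "ball" out of subject position.

```python
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and"}  # hypothetical, tiny

def accent_by_givenness(sentences):
    """Accent a content word only on its first mention (simplistic 'new' test)."""
    mentioned = set()
    result = []
    for sentence in sentences:
        labeled = []
        for token in sentence.split():
            word = token.strip(".,?!").lower()
            if not word:
                continue
            is_new = word not in FUNCTION_WORDS and word not in mentioned
            labeled.append((word, is_new))
            if word not in FUNCTION_WORDS:
                mentioned.add(word)
        result.append(labeled)
    return result

discourse = [
    "The ball touches the circle.",
    "The ball touches the triangle.",
    "The ball touches the cone.",
    "The square touches the ball.",
]
# Print each sentence with predicted accents in upper case.
for labeled in accent_by_givenness(discourse):
    print(" ".join(w.upper() if accented else w for w, accented in labeled))
```

Under this rule "ball" is accented only in the first sentence; whether speakers actually deaccent it in the fourth sentence is the kind of question the given/new work investigates.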

- ToBI prosodic labeling conventions (w/many)
- Prosody in dialogue systems (w/Litman & Swerts): generation and understanding (TOOT)
- Audio browsing and retrieval: SCAN and SCANMail (w/many)

CS 6998 Requirements:
- Class Participation:
  - Questions for class discussion
  - Helping lead a class
- Lab exercises
- Project
  - Literature review
  - Data collection and/or analysis from a corpus

  - Building a system or system component (e.g. a preprocessor to assign intonation in a generation system)

Next Week
- Read Hirschberg 2003 and ToBI conventions
- Make sure you have access to supplementary readings if you need them
- Bring 3 discussion questions to class
- Check access on cs servers to corpora and /proj/nlp/tools/mathTools/
  - Xwaves (solaris and linux): esps531.sol, esps531.linux (also downloadable from KTH)
  - wavesurfer (win, linux, mac): available at KTH

Projects: Start thinking about what area you want to work in for your project and what type of project you’d like to do