Guest Lecture: Advanced Topics in Spoken Language Processing

Entrainment
Rivka Levitan, PhD
Guest Lecture: Advanced Topics in Spoken Language Processing, Spring 2019

What is entrainment?
'Are their heads off?' shouted the Queen.
'Their heads are gone, if it please your Majesty!' the soldiers shouted in reply.
'That's right!' shouted the Queen. 'Can you play croquet?'
'Yes!' shouted Alice.
'Come on, then!' roared the Queen, and Alice joined the procession, wondering very much what would happen next.
- Alice's Adventures in Wonderland

What is entrainment?
'Jeeves,' I said, 'you're talking rot.'
'Very good, sir.'
'Absolute drivel.'
'Pure mashed potatoes.'
'Very good, sir - I mean, very good, Jeeves, that will be all,' I said.
And I drank a modicum of tea, with a good deal of hauteur.
- Very Good, Jeeves

Evidence of entrainment
Lexical:
    Referring expressions: Brennan & Clark, 1992
    High frequency words: Nenkova et al., 2008
    Syntax: Branigan et al., 2000; Reitter et al., 2010
    Linguistic Style Matching: Niederhoffer & Pennebaker, 2002; Danescu-Niculescu-Mizil et al., 2011
    To a computer: Brennan, 1996; Stoyanchev & Stent, 2009
Acoustic-prosodic:
    Response time: Matarazzo & Wiens, 1967; Street, 1984
    Intensity, pitch: Natale, 1975; Gregory et al., 2003; Ward & Litman, 2007
    To a computer: Bell et al., 2003; Coulston et al., 2002
    Intensity, pitch, speaking rate, voice quality, backchannel-inviting cues, pitch contours: Levitan et al., 2011, 2012, 2014, 2015, 2016

Entrainment theory
    Communication Accommodation Theory (Giles et al., 1991)
    Communication model (Natale, 1975)
    Perception-behavior link (Chartrand & Bargh, 1999)
    Interactive Alignment Theory (Pickering & Garrod, 2004)
Social / Automatic

Dialogue quality
    Positive interactions in married couples (Lee et al., 2010)
    Score on the Map Task (Reitter and Moore, 2007)
    Liking, smoother interaction (Chartrand & Bargh, 1999)
    Social desirability (Natale, 1975)
    Power (Danescu-Niculescu-Mizil et al., 2012)
    Smoother interaction, task success (Nenkova et al., 2008)
    Romantic interest (Ireland et al., 2014)
    Turn taking, encouraging, trying to be liked (Levitan et al., 2012)

Columbia Games Corpus
    ~9 hours recorded dialogue
    12 sessions (~30 minutes each, 4 games each)
    13 participants: 6 female, 7 male
    Native speakers of Standard American English

Units of analysis
Inter-pausal unit (IPU): pause-free segment of speech from a single speaker.
    speech <silence> speech <silence> speech
    IPU              IPU              IPU
Turn: sequence of speech from one speaker without intervening speech from the other speaker.
    speech <silence> speech <silence> speech
Session: complete interaction between two subjects.
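The unit definitions above can be sketched in code. This is a minimal illustration, not the segmentation procedure actually used for the corpus: it assumes time-aligned word intervals for each speaker, and the names `Interval`, `to_ipus`, `to_turns` and the 50 ms pause threshold are hypothetical choices. Overlapping speech is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def to_ipus(words, max_pause=0.05):
    """Merge one speaker's time-sorted word intervals into inter-pausal units.

    Words separated by a gap of at most `max_pause` seconds (an assumed
    threshold) are treated as belonging to the same pause-free segment.
    """
    ipus = []
    for w in words:
        if ipus and w.start - ipus[-1].end <= max_pause:
            # Gap is short enough: extend the current IPU.
            ipus[-1] = Interval(w.speaker, ipus[-1].start, w.end)
        else:
            ipus.append(Interval(w.speaker, w.start, w.end))
    return ipus


def to_turns(ipus_a, ipus_b):
    """Group both speakers' IPUs into turns.

    A turn is a run of consecutive IPUs (ordered by start time) from one
    speaker with no intervening IPU from the other speaker.
    """
    turns = []
    for ipu in sorted(ipus_a + ipus_b, key=lambda i: i.start):
        if turns and turns[-1][0].speaker == ipu.speaker:
            turns[-1].append(ipu)
        else:
            turns.append([ipu])
    return turns
```

A session, in this sketch, is simply the full list of turns returned by `to_turns` for one pair of speakers.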

Features
    Intensity
    Pitch (F0)
    Syllables per second
    Shimmer
    Noise-to-harmonics ratio (NHR)
    Jitter

Measuring entrainment
Global vs. local
    Global: compare average to baseline
        other speakers
        self in other conversation
    Local: compare difference at turn exchanges to baseline
        non-adjacent turns
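The global and local comparisons described on this slide might look roughly like the following. This is a sketch under stated assumptions, not the lecture's actual method: the function names are hypothetical, the "other speakers" baseline is used for the global case, and the local baseline averages over all of the partner's non-adjacent turns (a deterministic stand-in for whatever sampling scheme the original studies used).

```python
from statistics import mean


def global_entrainment(speaker_mean, partner_mean, other_means):
    """Return (partner difference, baseline difference) for one feature.

    `other_means` holds the session-level means of speakers the person
    did NOT talk to. Entrainment is suggested when the partner
    difference is smaller than the baseline difference.
    """
    partner_diff = abs(speaker_mean - partner_mean)
    baseline_diff = mean(abs(speaker_mean - m) for m in other_means)
    return partner_diff, baseline_diff


def local_differences(turns):
    """Compare feature differences at turn exchanges with a baseline.

    `turns` is a time-ordered list of (speaker, feature_value) pairs.
    For every turn that follows a turn by the other speaker, record
    (a) the absolute difference to that adjacent turn and
    (b) the mean absolute difference to the partner's other turns.
    """
    adjacent, non_adjacent = [], []
    for i in range(1, len(turns)):
        spk, val = turns[i]
        prev_spk, prev_val = turns[i - 1]
        if spk == prev_spk:
            continue  # not a turn exchange
        others = [v for j, (s, v) in enumerate(turns)
                  if s != spk and j != i - 1]
        if not others:
            continue  # no baseline available for this exchange
        adjacent.append(abs(val - prev_val))
        non_adjacent.append(mean(abs(val - v) for v in others))
    return adjacent, non_adjacent
```

In practice the two lists returned by `local_differences` would be compared with a paired significance test; the sketch stops at producing the paired values.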

Measuring entrainment
Global vs. local
Exact vs. relative
    Exact: compare difference between adjacent feature values to baseline
    Relative: correlation of adjacent feature values
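The distinction between exact and relative entrainment can be made concrete with a toy example. The sketch below assumes paired feature values at turn exchanges (one value per speaker per exchange); the function names and the numbers are illustrative only. It shows two speakers who stay far apart in absolute terms but whose values rise and fall together, so the exact measure is large while the relative measure (Pearson correlation) is high.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def exact_measure(vals_a, vals_b):
    """Mean absolute difference between adjacent feature values."""
    return mean(abs(a - b) for a, b in zip(vals_a, vals_b))


def relative_measure(vals_a, vals_b):
    """Correlation of adjacent feature values (synchrony)."""
    return pearson(vals_a, vals_b)


# Hypothetical intensity values (dB) at four turn exchanges.
speaker_a = [60, 64, 58, 66]
speaker_b = [70, 74, 68, 76]
exact = exact_measure(speaker_a, speaker_b)        # constant 10 dB gap
relative = relative_measure(speaker_a, speaker_b)  # values move together
```

As on the slide, the exact value would still need to be compared against a baseline before drawing any conclusion.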

Measuring entrainment
Global vs. local
Exact vs. relative
Converging vs. constant
    Global: compare difference in averages over time
    Local: correlate adjacent differences with time
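Convergence asks whether the speakers become more similar as the conversation goes on. A minimal sketch of both variants follows; splitting the session into halves for the global case is an assumption made here for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def global_convergence(vals_a, vals_b):
    """Change in the gap between speaker averages, first vs. second half.

    Returns (first-half gap) - (second-half gap); a positive value means
    the speakers' averages moved closer together over time.
    """
    half_a, half_b = len(vals_a) // 2, len(vals_b) // 2
    first = abs(mean(vals_a[:half_a]) - mean(vals_b[:half_b]))
    second = abs(mean(vals_a[half_a:]) - mean(vals_b[half_b:]))
    return first - second


def local_convergence(adjacent_diffs, times):
    """Correlate turn-exchange differences with time.

    A negative correlation means the differences shrink as the
    conversation proceeds, i.e. the speakers are converging.
    """
    return pearson(times, adjacent_diffs)
```

For example, `global_convergence([60, 61, 64, 65], [70, 69, 66, 66])` is positive because the gap between the two speakers' averages narrows in the second half.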

Results
Global: intensity, speaking rate
    Convergence: pitch max, NHR, speaking rate (reset effect)
Local: intensity, NHR
    Convergence: all except jitter and speaking rate; weak
    Synchrony: moderate for intensity, none for speaking rate, others weak

Variation across speakers

Variations across speakers
    Some speakers don't entrain at all
    Some entrain only positively
    Some entrain only negatively
    Some entrain positively for some features, negatively for others
This variation is not explained by gender, native language, or conversational role

Implementing entrainment

Performance

Errors
    Feature extraction
    SSML compliance
    TTS output quality
Sanity checks
    SSML compliance
    TTS output quality
"What ho!" I said.
"What ho!" said Motty.
"What ho! What ho!"
"What ho! What ho! What ho!"
After that it seemed rather difficult to go on with the conversation.
- P.G. Wodehouse, My Man Jeeves

Do users prefer an entraining system?

Do users prefer an entraining system?
    19 participants: 9 female, 10 male; ages 20-35
    Each session: ~45 user turns, entraining + control turns, ~9 minutes
    Acoustic-prosodic features extracted by Praat
    Advice logged

Do users prefer an entraining system?
Trust
    "Who gave better advice?" ✗
    Implicit trust scores ✓
Liking
    "Which advisor did you like better?" ✓
Voice
    "Whose voice did you like better?" ✗

What we don't know
    How much? (effect size)
    Significance of different kinds of entrainment (feature, measure)
    Influence of speaker traits/identity
    Influence of dialogue context

Collaborators
    Andreas Weise (CUNY Graduate Center)
    Julia Hirschberg (Columbia University)
    Stefan Benus (Constantine the Philosopher University)
    Agustin Gravano (Universidad de Buenos Aires)
    Sarah Ita Levitan (Columbia University)
    Shirley Xia (Jiangsu Normal University)