1 Back Channel Communication Antoine Raux Dialogs on Dialogs 02/25/2005.

Presentation transcript:

1 Back Channel Communication Antoine Raux Dialogs on Dialogs 02/25/2005

2 Outline
From Back Channel to backchannels
Function of the Back Channel
Characteristics of the Back Channel
The Back Channel in Spoken Dialogue Systems

3 From back channel…
1970s: Conversation Analysts attempt to describe systematic rules for turn-taking management
  – Goal: minimize gaps and overlaps between speakers
BUT many overlaps in natural speech
  – E.g.: “mm-hmm”, “okay”, “yeah”…
“Back channel” (Yngve 1970): Parallel channel for communication
(Duncan 1972)
  – “Back channel communication does not constitute a turn or a claim for a turn”
  – But it “may participate in a variety of communication functions, including the regulation of speaking turns.”

4 …to backchannels
“Backchannel”: listener-produced signal such as “mm-hmm”, “yeah”…
(“To backchannel”: to produce such signals)
Does not imply the will to take the turn
Implies some form of acknowledgment (in general)

5 Front vs Back Channel
Function
  – Front Channel: propositional, transactional, conversation management, social
  – Back Channel: conversation management, social
Protocol
  – Front Channel: turn-taking, floor sharing
  – Back Channel: ? (controlled by FC?); no floor to share
Lexical content
  – Front Channel: anything
  – Back Channel: vocalizations, short words, phrases (“That’s true”)

6 Front-channel cues to back-channel signals
Koiso et al (1998)
Analyze the relationship between different syntactic and prosodic features and the occurrence of backchannels

7 Koiso et al (Methodology)
Data: 8 dialogs from Japanese Map Task corpus:
  – replica of the Edinburgh MT
  – Face-to-face and speech only (no difference)
Features
  – Syntactic: POS
  – Duration of last mora (normal/long/short)
  – F0 pattern of last mora (flat-fall, rise…)
  – Peak F0 (low/high)
  – Energy pattern (late-decr, decr, no-decr)
  – Peak energy (low/high)

8 Koiso et al (Results)
Frequency of feature values
BC > no-BC
  – POS=verb-phrase, post-position, conjunction
  – F0 pat=flat-fall or rise-fall
  – Energy pat=late-decr
  – Peak energy=high
no-BC > BC
  – POS=adv, conjunction, interjection, filler
  – Dur=short
  – F0 pat=fall or flat
  – Energy pat=non-decr
  – Peak energy=low

9 Koiso et al (Results)
Decision Tree analysis
Compare the loss in performance by not using each feature
  – POS: single best feature
  – Prosodic features altogether: as good as POS
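
The "loss in performance by not using each feature" comparison can be sketched as a generic ablation loop. This is only an illustration: the feature names are shorthand for the features listed on the methodology slide, and `dummy_evaluate` is a placeholder scorer (it just counts features) standing in for training and scoring a decision tree. It does not reproduce any number from Koiso et al.

```python
from typing import Callable, Dict, List

# Shorthand names for the features on the methodology slide (naming is an assumption).
FEATURES = ["pos", "duration", "f0_pattern", "peak_f0", "energy_pattern", "peak_energy"]

# Groups to hold out: POS alone, and all prosodic features together.
GROUPS: Dict[str, List[str]] = {
    "pos": ["pos"],
    "prosody": ["duration", "f0_pattern", "peak_f0", "energy_pattern", "peak_energy"],
}


def ablation_losses(
    features: List[str],
    groups: Dict[str, List[str]],
    evaluate: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Return, per group, the score lost when that group is withheld."""
    full_score = evaluate(list(features))
    losses = {}
    for name, removed in groups.items():
        kept = [f for f in features if f not in removed]
        losses[name] = full_score - evaluate(kept)
    return losses


def dummy_evaluate(kept: List[str]) -> float:
    """Placeholder scorer: NOT a real classifier, it only counts features."""
    return float(len(kept))


losses = ablation_losses(FEATURES, GROUPS, dummy_evaluate)
```

In a real replication, `evaluate` would train the classifier on the kept features and return its accuracy on held-out data.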

10 Koiso et al (Discussion)
Some POS strongly inhibit BC
Individual prosodic features are not good indicators of BC occurrence
BC occurrence is conditioned by both POS and prosody (as a whole)
What about other languages?
What about BC overlapping with speech?

11 BC cues in English and Japanese
Ward and Tsukahara (2000)
Tests one hypothesis (“BC are triggered by low pitch cues”) for two languages

12 The Low Pitch Cue
Both in American English and Japanese, it appears that “after a region of low pitch lasting 110 ms the listener tends to produce back-channel feedback”.
Goal of this paper: quantitatively test this on naturally occurring conversations

13 Ward and Tsukahara (Methodology)
Data:
  – English: 8 conversations, 12 speakers (first author participates in 5 conversations!)
  – Japanese: 18 conversations, 24 speakers
Prediction:
  – Every 10 ms decide BC/no-BC by applying a hand-coded rule with 5 parameters tuned to the data
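
A minimal sketch of a frame-by-frame low-pitch rule, for illustration only. The 10 ms step and the 110 ms low-pitch duration come from the slides. Everything else is an assumption: the fixed `low_threshold`, the treatment of unvoiced frames as breaking a low region, and the `REFRACTORY_MS` spacing between predictions. The actual rule has 5 tuned parameters that the slides do not list.

```python
from typing import List, Optional

FRAME_MS = 10        # decision step (from the slide)
MIN_LOW_MS = 110     # required low-pitch duration (from the slide)
REFRACTORY_MS = 800  # assumption: minimum spacing between two predictions


def predict_backchannels(f0: List[Optional[float]], low_threshold: float) -> List[int]:
    """Return the times (in ms) at which a backchannel is predicted.

    f0 holds one pitch value per 10 ms frame, or None for unvoiced frames.
    """
    frames_needed = -(-MIN_LOW_MS // FRAME_MS)  # ceiling division: 11 frames
    predictions: List[int] = []
    low_run = 0
    last_prediction = None
    for index, value in enumerate(f0):
        if value is not None and value < low_threshold:
            low_run += 1
        else:
            low_run = 0  # assumption: unvoiced or high frames end the region
        time_ms = (index + 1) * FRAME_MS
        spaced = last_prediction is None or time_ms - last_prediction >= REFRACTORY_MS
        if low_run >= frames_needed and spaced:
            predictions.append(time_ms)
            last_prediction = time_ms
            low_run = 0
    return predictions


# 200 ms of high pitch, 110 ms of low pitch, then 100 ms of high pitch.
example_f0 = [200.0] * 20 + [90.0] * 11 + [200.0] * 10
example_predictions = predict_backchannels(example_f0, low_threshold=100.0)
```

Here the prediction fires at the end of the eleventh consecutive low frame. How "low" is defined per speaker is left open in this sketch.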

14 Ward and Tsukahara (Results)
Each predicted BC was considered correct if it fell within 500 ms of an actual BC
Low pitch region rule is better than chance both in English and Japanese
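
The tolerance-window scoring described above can be sketched as follows. Only the 500 ms window comes from the slide. The function name, the inclusive boundary, and the choice to let several predictions match the same actual BC are simplifying assumptions, not the authors' procedure.

```python
from typing import List, Tuple


def score_predictions(
    predicted_ms: List[int],
    actual_ms: List[int],
    tolerance_ms: int = 500,
) -> Tuple[float, float]:
    """Return (precision, recall) under a symmetric tolerance window.

    A prediction counts as correct if some actual BC lies within
    tolerance_ms of it; an actual BC counts as covered if some
    prediction lies within tolerance_ms of it.
    """
    correct = sum(
        1 for p in predicted_ms
        if any(abs(p - a) <= tolerance_ms for a in actual_ms)
    )
    covered = sum(
        1 for a in actual_ms
        if any(abs(p - a) <= tolerance_ms for p in predicted_ms)
    )
    precision = correct / len(predicted_ms) if predicted_ms else 0.0
    recall = covered / len(actual_ms) if actual_ms else 0.0
    return precision, recall


# Made-up timestamps, only to exercise the function.
precision, recall = score_predictions([1000, 3000, 9000], [1200, 5000])
```

Rerunning the same call with a different `tolerance_ms` shows how strongly the reported accuracy depends on the window size.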

15 Ward and Tsukahara (Results)
Issues:
  – Evaluation (tolerance window size, speakers produce BCs with different frequencies…)
  – No actual comparison between languages
  – Are low pitch regions and BCs simply correlated to other phenomena (syntactic completion, disfluencies…) or is there a direct cause/consequence relationship?

16 Effects of Native Language and Gender on BC
Feke (2003)
Conversation Analysis study of BC in native-English and native-Spanish, same- and mixed-gender dialogs

17 Definition of BC
BC: responses of the participant that is “clearly not holding the floor”…
Very loose compared to previous papers:
  – e.g. “How did you find Quechua?” is a BC
Distinguishes In-Between BC and Overlap BC

18 Feke (Methodology)
Recorded 8 non-scripted conversations between 8 different speakers (2 native languages x 2 genders x 2 subjects)
Manually coded In-Between BCs and Overlap BCs

19 Feke (Results)
No differences observed across cultures
Participants of both genders tend to use more BC when conversing with someone of the opposite gender
Difference seems bigger for females than for males

20 Feke (Discussion)
Interesting/surprising result from the ethnological/sociological point of view
Very few data points, no significance analysis
Only looked at number of BCs
Consequences on SDS? (e.g. using gender information in BC prediction, selecting the gender of an agent…)

21 BC in Practical Systems…
Takeuchi et al (2003)
Method to determine the timing of turn transitions and aizuchi (≈BC) on Japanese Human-Human corpus

22 Takeuchi (Approach)
Similar to Koiso et al, but only using automatically extracted features
Every 100 ms decide between:
  – Take turn
  – Aizuchi (BC)
  – Leave turn (wait)

23 Takeuchi (Approach)
Decision Tree using
  – Syntax (POS, content/function words)
  – Utterance duration
  – Pause duration/pause since last content word
  – Content word duration
  – F0
  – Power

24 Takeuchi (Results)
Precision/Recall of frame classification:
  – Around 80% on the training set
  – Less than 50% on a test set
Subjective evaluation:
  – Artificially insert BC at predicted time
  – Timing was judged “good” in 70-80%
  – On real utterances: 72% (!)
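
Frame-level precision and recall for a three-way decision (take turn / aizuchi / wait) could be computed along these lines. This is a sketch under assumptions: the label strings are invented for illustration, the example frames are made up, and the slides do not say whether the reported figures were per class or aggregated.

```python
from typing import Dict, List, Tuple

# Hypothetical label names for the three per-frame decisions.
LABELS = ("take", "aizuchi", "wait")


def per_class_precision_recall(
    gold: List[str], predicted: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} over aligned 100 ms frames."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have one label per frame")
    scores = {}
    for label in LABELS:
        true_positive = sum(
            1 for g, p in zip(gold, predicted) if g == label and p == label
        )
        n_predicted = sum(1 for p in predicted if p == label)
        n_gold = sum(1 for g in gold if g == label)
        precision = true_positive / n_predicted if n_predicted else 0.0
        recall = true_positive / n_gold if n_gold else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up frames, only to exercise the function.
gold_frames = ["wait", "wait", "aizuchi", "wait", "take"]
predicted_frames = ["wait", "aizuchi", "aizuchi", "wait", "wait"]
frame_scores = per_class_precision_recall(gold_frames, predicted_frames)
```

Because most frames are "wait", exact frame matching penalizes predictions that are only slightly early or late, which is one reason frame scores and subjective timing judgments can disagree.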

25 Takeuchi (Discussion)
Found that syntactic information did not help (contradicts Koiso?)
Underscores the difficulty of evaluating turn-taking/backchanneling systems

26 Conclusion
Hard to account for simultaneous turns in conversation
Back Channel framework offers one explanation
But most work remains very specific
Missing a good theory of conversation…