Published by Jemimah Hawkins. Modified over 6 years ago
1
Turn-Taking, Grounding and Speaker Segmentation
Julia Hirschberg LSA07 353 11/17/2018
2
Today
Turn-taking behaviors in human-human conversation
  Conversational Analysis accounts
  Task/circumstance/individual dependencies
  Linguistic/cultural differences
Grounding analyses
Diarization: automatic turn identification
3
Turn-taking Behavior
How do speakers know when it is appropriate to contribute to a conversation?
Conversational Analysis theory: conversational partners expect certain patterns of behavior in normal conversation
  Pat: You got an A? That’s great!
  Chris (response 1): Yeah, I’m really smart, you know.
  Chris (response 2): Well, I was just lucky. I happened to read the chapter on dialogue systems right before the test. Otherwise I never would have squeaked through.
General patterns in ordinary conversation
Deviation is significant
4
Children learn turn-taking within their first 2 years (Stern ’74)
Children liked by their peers are more skilled at it (Black & Hazen ’90)
General individual differences
  Shy people pause longer and speak less, and less often (Pilkonis ’77)
  Schizophrenic, neurotic, and depressed people are less skilled in turn-taking
5
Expectations of What to Say Depend on Task at Hand
Telephone openings
  Pat: Hello?
  Chris: Hi, Pat. It’s Chris.
  Pat: Hi!
Closings (6-turn)
  Chris: Well, I just wanted to see how you were doing.
  Pat: Thanks for calling. We'll have to have lunch sometime.
  Chris: I'd like to.
  Pat: Okay.
  Chris: Okay.
  Pat: See you.
  Chris: Yeah, see you.
6
Email
  Pat: “Hi, can we switch lunch to 12:30? I’m running late.”
  Chris: “Sure. 12:30.”
  Pat: “Great. See you.”
Service encounters
  Clerk: Good morning. Is there something I can help you with?
  Pat: Hi. Yeah. I wonder if you could show me….
Meetings
  Boss: Today I want to focus on next year’s goal statements. Chris, could you report please….
  Chris: …
  Boss: Pat, now let’s hear from you…
  Pat: …
News broadcasts
  Anchor: …Chris Smith reports from Rome now on the upcoming conclave. Chris?
  Reporter: Thanks, Pat….. And now back to Pat Jones in New York.
7
Conversational Analysis (Sacks et al. ’74)
Can we characterize expectations of ‘what to say’ more generally?
‘Rules’ of turn-taking
  1. If, during this turn, the current speaker has selected A as the next speaker, then A must speak next.
  2. If the current speaker does not select the next speaker, any other speaker may take the next turn.
  3. If no one else takes the next turn, the current speaker may take the next turn.
These rules apply at Transition-Relevance Places (TRPs): points where the structure of the talk allows a change of speaker.
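The three rules can be read as an ordered decision procedure applied at each TRP. Below is a minimal Python sketch. It assumes some upstream component (hypothetical here) already reports whom, if anyone, the current speaker selected, and who started talking at the TRP. Detecting those two things is the hard part for an SDS.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 starters: Sequence[str]) -> str:
    """Allocate the next turn at a TRP by applying the three rules in order.

    current:  the speaker whose turn has reached a TRP
    selected: the party the current speaker selected as next speaker, or None
    starters: parties who begin speaking at the TRP, in order of onset
              (a hypothetical input assumed to come from elsewhere)
    """
    # Rule 1: a party selected by the current speaker must speak next.
    if selected is not None:
        return selected
    # Rule 2: otherwise any other party may self-select; the first starter wins.
    for party in starters:
        if party != current:
            return party
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    return current
```

In the meeting example, the Boss's "Chris, could you report please" selects Chris, so `next_speaker("Boss", "Chris", [])` returns `"Chris"` under rule 1.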
8
Where Can Speaker Shifts Occur?
Adjacency pairs
  Question/answer
  Greeting/greeting
  Compliment/downplayer
Dispreferred responses
  Silence
  ‘No’ to a simple request, without explanation
  Changing the topic abruptly, without transition
Important for Spoken Dialogue Systems
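For an SDS, adjacency pairs predict the kind of move the user should make next, and the dispreferred responses are patterns worth flagging. The following is a minimal sketch. The `"request"` label, the string test for a bare "no", and the externally supplied topic-change flag are all illustrative assumptions, not part of the slide.

```python
from typing import Optional

# First pair part -> expected second pair part (the pairs listed on the slide).
ADJACENCY_PAIRS = {
    "question": "answer",
    "greeting": "greeting",
    "compliment": "downplayer",
}


def is_dispreferred(first_part: str,
                    response: Optional[str],
                    topic_changed: bool = False) -> bool:
    """Flag the dispreferred responses listed on the slide.

    response=None (or an empty string) stands for silence. topic_changed is
    assumed to be supplied by some other component (hypothetical here).
    """
    if response is None or not response.strip():
        return True  # silence
    if topic_changed:
        return True  # abrupt topic change without transition
    words = response.lower().strip(" .!?").split()
    if first_part == "request" and words == ["no"]:
        return True  # bare 'no' to a request, with no explanation
    return False
```

A "No, sorry, I'm busy until two" is not flagged, because the explanation makes the refusal less abrupt than a bare "No."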
9
Cultural Differences in Turn-Taking
Chinese telephone conversations
  Openings (Zhu ’04): Mandarin vs. British
    Identification differs: British callers identify themselves; Chinese callees ask the caller who is calling
  Closings (Sun ’05): 39 female-female telephone conversations
    Closings are initiated through a matter-of-fact statement of the intention to end the conversation
    Verbalized thanking occurs, except in mother/daughter closings
    Not the standard English model
Finnish vs. American business calls (Halmari ’93)
  Americans get right to the point
  Finns chat
10
But where is the intent? Purpose?
11
Grounding Approaches to Conversational Modeling
Conversation is a joint process through which S and H constantly negotiate a common ground (Stalnaker ’78, Clark ’96, inter alia)
  Cf. mutual belief
Principle of Closure: agents performing an action require evidence that they have succeeded (Norman ’88)… or that they have not
Clark & Schaefer ’89: Presentation (by S) and Acceptance (by H), with acceptance shown via
  continued attention
  relevant next contribution
  acknowledgement/assessment
  demonstration
  display
12
S: Jon Stewart is my favorite comedian.
  H: {continued attention}
  H: The Daily Show is not to be missed. {relevant next contribution}
  H: Mhmm. {acknowledgement}
  H: He’s the funniest person you know. {demonstration}
  H: Your favorite comedian. {display}
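Clark & Schaefer order these kinds of evidence roughly from weakest to strongest, in the order listed above. One way an SDS might use that ordering is to demand stronger evidence when a misunderstanding would be costly. Below is a minimal sketch. The `is_grounded` check and the idea of a per-task criterion are assumptions for illustration, and the ordering is a rough one rather than a strict scale.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Kinds of evidence of understanding, roughly weakest (1) to strongest (5)."""
    CONTINUED_ATTENTION = 1
    RELEVANT_NEXT_CONTRIBUTION = 2
    ACKNOWLEDGEMENT = 3
    DEMONSTRATION = 4
    DISPLAY = 5


def is_grounded(evidence: Evidence, criterion: Evidence) -> bool:
    """Treat a presentation as accepted if the hearer's evidence is at least as
    strong as the criterion chosen for the current purpose (a hypothetical
    policy, not something the slide prescribes)."""
    return evidence >= criterion
```

For example, a system confirming a flight number might require `Evidence.DISPLAY` (the user repeats it back), while casual chat might accept `Evidence.CONTINUED_ATTENTION`.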
13
Importance in SDS
Turn-taking models and theories of grounding are of considerable potential use in SDS
  What is the user likely to say next, and when?
  How can we be sure what the user has said, and how it relates to what s/he believes to be true?
  What type of response does s/he expect the system to make? When?
Obstacles to practical use
  What cues signal when it is appropriate to speak?
  How do we negotiate a common ground between system and user?
14
When Is It Appropriate to Speak? (Beattie ’82)
Data: 25m of televised interviews before the 1979 British general election
  Margaret Thatcher (Tory leader): ‘the Iron Lady’
  Jim Callaghan (Prime Minister): ‘Sunny Jim’
Who interrupts?
  The less intelligent, the highly neurotic, and extroverts
  Men interrupt women
Interruptions may indicate
  desire for dominance
  desire for social approval
  conveyance of ‘joint enthusiasm’ or heightened involvement
15
Beattie’s classification scheme:
Identify speaker 2’s attempts to take the turn
  Smooth switch: no simultaneous speech; speaker 1’s utterance is complete; the turn goes to speaker 2
  Simple interruption: simultaneous speech; speaker 1 doesn’t complete the utterance; the turn goes to speaker 2
  Overlap: simultaneous speech; speaker 1 completes the utterance; the turn goes to speaker 2
  Butting-in: simultaneous speech but no change of turn; speaker 1 keeps the turn
  Silent interruption: speaker 1’s utterance is incomplete; no simultaneous speech; the turn goes to speaker 2
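Beattie's scheme is a decision over three yes/no observations about each attempt by speaker 2. A minimal Python sketch of that decision follows. The function name and boolean inputs are our own framing, and it assumes the three observations have already been annotated.

```python
from typing import Optional


def classify_attempt(simultaneous: bool,
                     s1_complete: bool,
                     turn_to_s2: bool) -> Optional[str]:
    """Map one turn-taking attempt by speaker 2 to Beattie's category.

    simultaneous: was there simultaneous speech?
    s1_complete:  did speaker 1 complete the utterance?
    turn_to_s2:   did the turn pass to speaker 2?
    """
    if not turn_to_s2:
        # Speaker 1 keeps the turn; with overlap this is butting-in.
        # No overlap and no turn change is not covered by the scheme.
        return "butting-in" if simultaneous else None
    if simultaneous:
        return "overlap" if s1_complete else "simple interruption"
    return "smooth switch" if s1_complete else "silent interruption"
```

Ordering the tests by turn change first mirrors the scheme: butting-in is the only category where speaker 1 keeps the turn, so completeness matters only once the turn has passed.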
16
Analyze acoustic/prosodic and gestural information
Turn-yielding behavior
  Pauses
  Speaking rate slows
  Drawl at end of clause
  Drop in pitch or loudness
  Completion of syntactic clause
  Gesture of termination
Attempt-suppression signals
  Filled pauses
  Gestures
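One simple way to turn these cues into a listener's decision is to count the yielding cues present and let any suppression signal veto the switch. The sketch below takes that approach. The additive rule, the cue names, and the default threshold of 3 are assumptions for illustration; they are not taken from Beattie's analysis.

```python
from typing import AbstractSet

# Hypothetical labels for the cues listed on the slide.
YIELD_CUES = frozenset({
    "pause", "rate_slows", "clause_final_drawl",
    "pitch_or_loudness_drop", "syntactic_completion", "termination_gesture",
})
SUPPRESSION_CUES = frozenset({"filled_pause", "suppression_gesture"})


def should_take_turn(observed: AbstractSet[str], threshold: int = 3) -> bool:
    """Decide whether a listener should attempt to take the turn.

    Any attempt-suppression signal vetoes the switch; otherwise the listener
    takes the turn once at least `threshold` yielding cues co-occur.
    """
    if observed & SUPPRESSION_CUES:
        return False
    return len(observed & YIELD_CUES) >= threshold
```

Under a rule like this, a speaker who combines clause-final drawl, falling pitch, and syntactic completion without any suppression signal invites the listener to come in. That is the pattern reported for Thatcher in the results.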
17
Results
Thatcher was interrupted almost twice as often as she interrupted the interviewer (19 vs. 10), unlike Callaghan (14 vs. 23)
  Thatcher: starts slow and gets faster; few filled pauses (4)
  Callaghan: starts fast and gets slower; many filled pauses (22)
Public perception: Thatcher is domineering in interviews and Callaghan is a ‘nice guy’
  But Thatcher does not dominate
Why is Thatcher interrupted?
  Interruptions come at the end of a syntactic clause, when there is drawl on the stressed syllable in the clause and falling intonation
18
No suppression signals
Why does she do this?
  Speech training before the election?
Why is she still perceived as domineering?
  When interrupted, she doesn’t cede the floor, despite lengthy stretches of simultaneous speech
19
Automatic Speaker Identification/Segmentation
Diarization: segmentation of audio corpora (Broadcast News, meetings, telephone conversations) into speaker segments
  Speaker turns
  Speaker identification
  Speech and music
Speaker segmentation
  Initial segmentation
  Segment clustering based on acoustic features
State of the art: 8.47% error
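The "segment clustering" step groups segments that sound like the same speaker. Below is a minimal sketch using a deliberately simple technique: greedy agglomerative clustering of per-segment feature vectors by centroid distance. It is not the state-of-the-art system quoted above. It assumes each segment is already summarized as a fixed-length vector (for example, averaged acoustic features), and the stopping threshold is a hypothetical tuning parameter.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cluster_segments(features: List[Vector], threshold: float) -> List[int]:
    """Assign a speaker label to each segment.

    Repeatedly merges the two clusters with the closest centroids until the
    closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) with index_a < index_b
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = _centroid([features[i] for i in clusters[a]])
                cb = _centroid([features[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

With toy two-dimensional features `[[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.2]]` and `threshold=1.0`, segments 0, 1, and 4 end up with one label and segments 2 and 3 with another. Choosing the threshold determines how many speakers the system posits.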
20
<DOCNO> CNN19980104.1130.0000 </DOCNO>
<DOCTYPE> MISCELLANEOUS TEXT (automatic initial) </DOCTYPE> <DATE_TIME> 01/04/ :30:00.00 </DATE_TIME> <BODY> <TEXT> </TEXT> </BODY> <END_TIME> 01/04/ :30:34.71 </END_TIME> </DOC>
<DOCNO> CNN </DOCNO> <DOCTYPE> NEWS STORY </DOCTYPE> <DATE_TIME> 01/04/ :30:34.71 </DATE_TIME> in northern kentucky are forcing 3,000 people in two states to flee their homes. the fire started early this morning at the cargill company plant in maysville near the ohio river. authorities have been going door-to-door advising people in kentucky and ohio to take shelter in area high schools. the fire is in a building where several fertilizers and chemicals are stored.
21
officials say all they can do is let the fire burn itself out, because
spraying water on the flames would be too dangerous. <TURN> at the current time, our only way of getting it under control is to stay away from it. we've backed everyone off from the fire by about a mile and a quarter and evacuated homes in that radius and the chief threat at this point is a very small risk of a very large explosion caused by 400 tons of ammonia nitrate stored in the building. foir people have been taken to hospitals. one firefighter was injured and treated on the scene. </TEXT> </BODY> <END_TIME> 01/04/ :31:31.00 </END_TIME> </DOC>
<DOC> <DOCNO> CNN </DOCNO> <DOCTYPE> NEWS STORY </DOCTYPE> <DATE_TIME> 01/04/ :31:31.00 </DATE_TIME> <BODY> <TEXT> authorities in brooklyn, new york, say an explosion at a tire company has
22
caused at least three buildings to collapse.
it set off a four-alarm fire, which has been contained. officials tell cnn one person was injured. investigators have not determined the cause of the incident. </TEXT> </BODY> <END_TIME> 01/04/ :31:48.11 </END_TIME> </DOC>
<DOC> <DOCNO> CNN </DOCNO> <DOCTYPE> NEWS STORY </DOCTYPE> <DATE_TIME> 01/04/ :31:48.11 </DATE_TIME> <BODY> <TEXT> unexpected weather conditions are the rule across much of the united states this weekend. angela astore reports. <TURN> <ANNOTATION> Reporter: </ANNOTATION> it was a nice day to play along the beach -- spend a few hours fishing -- or get in a game of golf -- not uncommon -- unless it's january in chicago. record high temperatures were set yesterday from minnesota to massachusetts. warm air drawn northward from the gulf of mexico was behind the rise in the mercury.
23
it was a different scene in the northwest, where snow is the story.
but the winter weather didn't stop this man from getting in some warmer pursuits. and he wasn't bothered by the fact that he couldn't see where his golf balls landed. <TURN> it's not really where it's going to land that's important at this point while you are learning. once you've learned, then it is. we'll worry about that when the snow clears. right now, it's probably better that i don't see where they land.
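In this excerpt, `<TURN>` marks a speaker change inside a story's `<TEXT>`, and `<ANNOTATION>` sometimes gives the new speaker's role. The sketch below shows one way to recover (role, words) turns from such text. It assumes the tags are used as in the excerpt, and the function name is ours.

```python
import re
from typing import List, Optional, Tuple

# Matches an <ANNOTATION> element and captures its content, e.g. "Reporter:".
ANNOTATION = re.compile(r"<ANNOTATION>\s*(.*?)\s*</ANNOTATION>", re.DOTALL)


def split_turns(text: str) -> List[Tuple[Optional[str], str]]:
    """Split the content of one <TEXT> element into (role, words) turns.

    role is the <ANNOTATION> label without its trailing colon, or None when
    the turn carries no annotation.
    """
    turns = []
    for chunk in text.split("<TURN>"):
        role = None
        match = ANNOTATION.search(chunk)
        if match:
            role = match.group(1).rstrip(":")
            chunk = ANNOTATION.sub("", chunk)
        words = " ".join(chunk.split())  # normalize whitespace
        if words:
            turns.append((role, words))
    return turns
```

Applied to the weather story, this yields an unlabeled anchor turn ending in "angela astore reports." followed by a turn labeled `"Reporter"`. Most turns in the excerpt carry no annotation, which is why automatic speaker identification is needed.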
24
Speaker Identification
Linguistic information used to identify speaker types and speaker names (LIMSI ’04)
  Templates (“<name> has this report from <location>”)
Results: 10.9% error on the test set
  But only 10% of segments contain relevant patterns
  Estimated 25% error on broadcast news if segmentation and clustering are done to identify all of each speaker’s segments
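Template matching of this kind can be sketched with regular expressions over the transcript. In the sketch below, the first pattern encodes the slide's template. The second ("<name> reports") is our own hypothetical addition, suggested by phrases like "angela astore reports" in the excerpt. Treating a name as exactly two words is a crude assumption, and it will sometimes fire on non-names.

```python
import re
from typing import Optional, Tuple

TEMPLATES = [
    # "<name> has this report from <location>" (the slide's template)
    re.compile(r"\b(?P<name>\w+ \w+) has this report from (?P<location>[\w ,]+?)[.,]",
               re.IGNORECASE),
    # "<name> reports" (hypothetical extra template)
    re.compile(r"\b(?P<name>\w+ \w+) reports\b", re.IGNORECASE),
]


def find_speaker(segment: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (name, location) for the first template that matches, else None.

    location is None when the matching template has no location slot.
    """
    for pattern in TEMPLATES:
        match = pattern.search(segment)
        if match:
            return match.group("name"), match.groupdict().get("location")
    return None
```

As the results note, most segments contain no such phrase, so `find_speaker` returns `None` for them, for example for "officials tell cnn one person was injured."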
25
Online Turn Identification for SDS
Push-to-talk systems
Silence detection
Speech detection
Barge-in
Need more ‘natural’ turn-taking support
  When are users ready to be interrupted?
  When do they want to keep the floor?
  When do they expect the system to backchannel?
  How can we indicate when the system has finished its turn?
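Silence detection, the baseline these systems rely on, typically declares the user's turn over after a fixed stretch of low energy following speech. The sketch below shows that baseline. It assumes frame energies are already computed, and the 10 ms frame size, 700 ms timeout, and energy threshold are hypothetical values, not recommendations. Its weakness is the one the slide raises: any mid-turn pause longer than the timeout is taken as the end of the turn.

```python
from typing import Optional, Sequence


def detect_end_of_turn(frame_energies: Sequence[float],
                       energy_threshold: float,
                       frame_ms: int = 10,
                       silence_timeout_ms: int = 700) -> Optional[int]:
    """Return the index of the frame at which the user's turn is judged over.

    A frame counts as speech if its energy reaches energy_threshold. The turn
    ends once silence has lasted silence_timeout_ms after some speech was
    heard. Returns None if that never happens.
    """
    frames_needed = silence_timeout_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silent_run = 0  # speech resumed: reset the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None  # leading silence, or no pause long enough yet
```

With 10 speech frames followed by 80 silent frames (preceded by 5 silent frames), the turn is judged over at frame index 84, after 70 frames (700 ms) of silence.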
26
Conclusions
Turn-taking models and theories of grounding are of considerable potential use in SDS
  What is the user likely to say next, and when?
  How can we be sure what the user has said, and how it relates to what s/he believes to be true?
  What type of response does s/he expect the system to make? When?
Obstacles to practical use
  What cues signal when it is appropriate to speak?
  How do we negotiate a common ground between system and user?
27
Next Class
We know a few things we need to accomplish, and a bit about the difficulties….
What tools can we use to tackle the problems?
Components of SDS
  Automatic Speech Recognition
  Text-to-Speech
Readings: J&M 22.2