Speaking while monitoring addressees for understanding Torsten Jachmann Herbert H. Clark and Meredyth A. Krych Seminar „Gaze as function of instructions - and vice versa“
Research Question Speaking and listening in dialog o Unilateral Speakers and listeners act autonomously No interaction o Bilateral Speakers and listeners monitor their respective partner Joint activity What do speakers monitor? How do they use that information?
Grounding Level 1 o Attend to vocalization Level 2 o Identify words, phrases and sentences Level 3 o Understand the meaning Level 4 o Consider answering
Grounding A: Were you there when they erected the new signs? B: Th… which new signs? (Level 3) A: Little notice boards, indicating where you had to go for everything B: No. Bilateral account
Monitoring Voices o Attending to the partner's utterances Faces o Gaze and facial expressions as indicators of understanding Workspaces o Region in front of the body o Manual gestures (but also games, etc.)
Monitoring Bodies o Head and torso movement as indicator Shared Scenes o Scenery beyond workspace Signals vs. Symptoms o Signals are constructed to get meaning across o Symptoms are not intentionally created
Least joint effort Opportunistic o Selection of the available methods that take the least effort to produce “Tailored” o Overhearers (not monitored by speaker) may misunderstand utterances
Method Pairs of directors and builders o 76 students (34 male / 42 female) Instructions to build 10 simple Lego Models 2 x 2 design (interactive) o 28 pairs Additional non-interactive condition o 10 pairs Video and audio analyses
Interactive Mixed design o Workspace (between subjects) Visible Invisible o Faces (within subjects) Visible Invisible No restrictions on time and talk
Non-interactive Only one condition Director records instructions o No time or talk constraints o Prototype can be examined as long as wanted before recording Builders listen to instructions o No constraints on actions Start, stop, rewind
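The interactive design can be laid out as a small sketch. It only enumerates the four cells of the 2 x 2 design and shows that a single pair, being fixed to one workspace level, passes through both face levels. The names `WORKSPACE`, `FACES`, `design_cells` and `cells_for_pair` are illustrative assumptions, not taken from the study, and nothing here says how many pairs were assigned to each workspace level.

```python
from itertools import product

# Factor levels as named on the slide.
WORKSPACE = ("visible", "invisible")  # varied between pairs
FACES = ("visible", "invisible")      # varied within each pair


def design_cells():
    """Return all workspace x faces combinations of the 2 x 2 design."""
    return [{"workspace": w, "faces": f} for w, f in product(WORKSPACE, FACES)]


def cells_for_pair(workspace):
    """Cells one pair experiences: a fixed workspace level, both face levels."""
    return [cell for cell in design_cells() if cell["workspace"] == workspace]


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

Calling `cells_for_pair("invisible")` returns the two face conditions that a pair with a hidden workspace goes through.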
Results Efficiency Turns Gestures and grounding o Deictic expressions o Gestures by addressees o Cross-timing of actions o Timing strategies o Visual monitoring
Efficiency Visibility of workspace improves efficiency
Efficiency Non-interactive Time needed to build much longer (245s “n-i” vs. 183s “i”) Strong drop in accuracy o Inadequate instructions
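To put "much longer" into numbers, the two times from the slide can be compared directly. This is a minimal sketch; only the values 245 s and 183 s come from the slide, and the helper name `relative_increase` is an assumption made for illustration.

```python
def relative_increase(baseline_s, other_s):
    """Fractional increase of other_s over baseline_s."""
    return (other_s - baseline_s) / baseline_s


interactive_s = 183.0      # "i" condition, from the slide
non_interactive_s = 245.0  # "n-i" condition, from the slide

extra_s = non_interactive_s - interactive_s
increase = relative_increase(interactive_s, non_interactive_s)

# prints "62 s longer, about 34% more time"
print(f"{extra_s:.0f} s longer, about {increase:.0%} more time")
```

So the non-interactive builders needed roughly a third more time than the interactive pairs.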
Turns Fewer SPOKEN turns by the builder when workspace is visible
Deictic expressions Mainly unusable when workspace hidden o Joint attention needed o Then only usable to refer to a previously mentioned situation
Gestures by addressees Mostly accompanied by deictic utterances (if any) Explicit verdict usually only on such utterances (otherwise continuing)
Cross-timing Gestural signals o Reflect understanding at that moment
Cross-timing Overlapping signals o Usually not in spoken dialog o Start with “sufficient information”
Cross-timing Projecting o Prediction of following actions/instructions
Cross-timing Initiation time o Waiting for partner to be able to attend to the following utterance
Cross-timing Time uptake o Responses have to be timed exactly to the action and situation
Timing strategies Self-interruption o Dealing with evidence from the addressee o Usually not continued
Timing strategies Collaborative references o Deictic references rely on the addressee's actions
Visual monitoring Mainly used when director reaches a problem Eye gaze as support
Conclusion Grounding is fundamental Visible workspace enhances grounding speed In task-oriented dialogs faces are not important Compensation possible (only if any monitoring is available)
Conclusion Updating common ground Increments are determined jointly Much evidence for bilateral account o Addressees provide statement about current understanding o Speakers monitor to update and change utterances
Conclusion Opportunistic process o Offering options o Self-interruptions o Waiting o Instant revision Multi-modal process o Speech and gestures are combined if possible o Speech alone takes more time
Remarks Gaze only important for certain types of tasks Measurement of time may be outdated (“old” study) No contradicting studies (To some extent common sense)
Gaze and Turn-Taking Behavior in Casual Conversation Interactions Kristiina Jokinen, Hirohisa Furukawa, Masafumi Nishida and Seiichi Yamamoto
Differences Three-party dialogue No instructional task Stronger focus on eye gaze
Research Question How well can eye gaze help in predicting turn taking? What is the role of eye gaze when the speaker holds the turn? Is the role of eye gaze as important in three-party dialogues as in two-party dialogues?
Hypothesis In group discussions, eye gaze is important in turn management (especially in turn holding cases) The speaker is more influential than the other partners in coordinating interactions (selects the next speaker)
Method Three-person conversational eye gaze corpus o Natural conversations o Balanced familiarity (50% familiar; 50% unfamiliar) o Balanced gender (male-only; female-only; mixed)
Method 28 conversations among Japanese students in their early 20s with three participants each Each conversation about 10 minutes Eye gaze recorded for one participant
Method Eye tracker fixed on table to preserve naturalness
Method
Used data Estimated at the last 300 ms of an utterance if followed by a 500 ms pause
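The selection rule on this slide can be sketched as follows. This is only an illustration under stated assumptions: the pause is read as "at least 500 ms" until the next utterance starts, the final utterance of a recording is skipped because no following pause can be measured, and the `Utterance` type and `analysis_windows` function are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start_ms: int
    end_ms: int


def analysis_windows(utterances, window_ms=300, min_pause_ms=500):
    """Return (utterance, window_start_ms, window_end_ms) for every
    utterance that is followed by a pause of at least min_pause_ms.

    Assumption: the pause is the gap between the end of one utterance
    and the start of the next one in time order.
    """
    ordered = sorted(utterances, key=lambda u: u.start_ms)
    windows = []
    for current, following in zip(ordered, ordered[1:]):
        pause_ms = following.start_ms - current.end_ms
        if pause_ms >= min_pause_ms:
            # Clip the window for utterances shorter than window_ms.
            window_start = max(current.start_ms, current.end_ms - window_ms)
            windows.append((current, window_start, current.end_ms))
    return windows


if __name__ == "__main__":
    demo = [
        Utterance("A", 0, 1000),
        Utterance("B", 1600, 2000),
        Utterance("C", 2100, 2200),
    ]
    for utt, start, end in analysis_windows(demo):
        print(utt.speaker, start, end)
```

In the demo, only the first utterance qualifies: it is followed by a 600 ms gap, so its last 300 ms (700 to 1000 ms) form the analysis window, while the second utterance is followed by only 100 ms of silence.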
Used data Dialog acts Speech features o Values of F0, etc. Eye gaze
Results
Conclusion Speaker signals whether he intends to give the turn or hold it by using eye gaze o fixating listener vs. focusing attention somewhere else Eye gaze in multi-participant conversation as important as in two-participant conversations
Conclusion Eye gaze is used to select next speaker (seems to be correct) Maybe Japanese data interferes with the value of the speech data o Comparison study? Listeners focus on the speaker, not vice versa
Remarks Vague information and data presentation o Although various data exist, the interaction of factors is not presented o Some conclusions rely on the aforementioned point Setup only takes one participant into consideration Much of the data was unused o Lacking in quality and in the way it was created
Remarks Study is based on data collected for another study o Setup is not optimal Realistic design o Yet, contains biasing flaws (situation of the participants, only one eye tracker)
Comparison Clark and Krych present interesting ideas but eye gaze is only rarely handled o How could this be altered? Jokinen et al. focus on eye gaze in a (more or less) natural situation but fall short in scientific results and setup o What points and ideas of this setup could be beneficial?