AN INTEGRATED STUDY OF MULTICHANNEL DISCOURSE
Andrej A. Kibrik (Iling RAS and Lomonosov MSU)
Vera I. Podlesskaya (RSUH)
Olga V. Fedorova (Lomonosov MSU)
Integrated Approach to the Problem of Knowledge Representation, RSUH, 10.04.2017
Project #14-18-03819
The dominant practice
- The actual practice of language studies is largely oriented to writing
  - Linell 1982, “The written language bias in linguistics”
- The same is true of language studies in psychology
- Some important exceptions
  - E.A.Zemskaja, O.A.Lapteva, O.B.Sirotinina
  - Discourse psychology
- Writing is a late, derived, and “unusual” way of language use
- The dominant practice could be compared to describing trees on the basis of furniture
  - Indeed, some properties of wood are preserved even after the carpenter’s work
  - But if we want real botany, we have to go to the forest and see how trees grow from the ground
Not just speech
- When we communicate naturally, we not only produce chains of words, but also
  - intonate
  - gesticulate
  - interact with eye gaze
  - etc.
- These processes are traditionally studied by different academic disciplines
- But the actual communication process is whole and undivided
- Hence the multimodal approach
  - Gibbon et al. eds. 2000, Kress 2002, Granström et al. eds. 2002, Scollon 2006, Kibrik 2010, Knight 2011, Adolphs & Carter 2013, Müller et al. eds. 2014 …
Spoken multichannel discourse (diagram)
- Language? / Discourse
- Modalities: vocal / auditory; kinetic / visual; other modalities
- Channels: verbal, prosodic, gaze, facial expressions, gesture, proxemics, other channels
- Prosodic channel: intonation, numerous other components
- Gesture channel: manual gestures, cephalic gestures, corporal gestures, other gestures
- Note on the slide: in my opinion, head movements are more important. In any case, we committed to annotating them at the second stage, which does not yet include facial expressions and body posture
LANGUAGE AS IS: A multichannel initiative
- Goals:
  - Create a resource approaching the actual richness of human discourse
  - Explore natural communication in an integrated way
- Registered phenomena
  - verbal structure
  - prosody
  - gesture (and other aspects of “body language”)
  - eye gaze
Outline
1. Resource
   - Character of interaction: structured vs. unstructured
   - Character of environment: prepared vs. unprepared
   - Pear chats and stories corpus
     - Design
     - Technical solutions
     - Annotation
   - Underlying idea: sharpen tools for the subsequent stage of free conversation
2. Some preliminary findings
Pear chats and stories The Pear Film (Chafe 1980)
Design
- Participants: Narrator, Commentator, Reteller, Listener
- The Narrator and the Commentator watch the film
- The Narrator tells the Reteller about the film
- The Commentator adds details; all three discuss the film
- The Reteller retells the film to the Listener, who has just joined the group
- The Listener writes down the contents of the film
- Stages: telling, chat, retelling, 2nd retelling (written)
Audio recording
- Six channels
- ZOOM H6 Handy Recorder
- 96 kHz / 24 bit
- Each participant recorded with a lapel SONY ECM-88B mic, mono
- Built-in mic records all vocal events, stereo
- Automatic synchronization of all audio files
Video recording: Cover shot
- GoPro Hero 4 (wide angle)
- Frame rate: 50 FPS
- Resolution: 2700×1500
Video recording: Individual frontal cameras
- Industrial high-speed cameras JAI GO-5000M-USB
- Frame rate: 100 FPS
  - Crucial for analysis of kinetic behavior
- File format: mjpeg
  - No interframe compression
- Resolution: 1392×1000
- No audio
Eye trackers
- Tobii Glasses II Eye Tracker
- Sampling rate: 50 Hz
- Resolution: 1920×1080
- Video recording of the scene at 25 FPS, with superimposed eye movements
- Software that produces temporal coordinates of fixations
The scene (photo; labels: Tobii glasses, Listener, Narrator, Reteller, Commentator)
Pear chats and stories: Quantitative parameters
- 24 sessions recorded in the summer of 2015
- 96 participants in all (four per session)
  - Age: 18 to 36 years
  - Gender: 36 men and 60 women
  - Education: 42 persons with higher education, 54 students
- 9 hours
- About 100 K words
http://multidiscourse.ru
http://multidiscourse.ru/annotation/
- Examples of media files
- Annotation
  - Vocal
  - Gestural
    - Manual
    - Cephalic
  - Eye gaze
Vocal transcription
- Verbal structure
- Division into elementary discourse units (EDUs)
  - Quanta of talk (Ščerba 1955, Cruttenden 1986, Chafe 1994)
  - Elementary behavioral acts of discourse production
  - Identified on the basis of prosodic criteria: tempo, pausing, etc.
- Temporal dynamics
- Pauses
- Accents
- Tone in accents
- Illocutionary characteristics
- Phase
- Tempo
- Emphasis
- Reduction
- Tonal register
- Disfluencies
- Comments on specific EDUs
- General characterization
- Etc., etc.
Scores vocal transcript
Manual gesture transcript (ELAN)
- Annotation components
  - Gesture chains
  - Gesture boundaries
  - Handedness
  - Gesture phases
  - Stroke boundaries
  - ………
Eye tracking annotation
- Annotation components
  - Fixations on:
    - Interlocutor: face, hands, body, other
    - Environment
  - Durations
Multilayer annotation
Problems we are struggling with
- Synchronization of all recordings, including the problem of various frame rates
- Gesture annotation:
  - Degree of detail?
  - Gesture vs. posture
  - Gesture vs. adaptor
- Automatic detection of motion
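The frame-rate side of the synchronization problem can be illustrated with a minimal sketch. It assumes that every stream runs at exactly its nominal rate with no clock drift and that each stream's start time on a shared timeline is known. The stream names, the `starts` parameter, and the helper functions are hypothetical and are not part of the project's actual tooling. Only the rates come from the recording setup described above.

```python
import math
from fractions import Fraction

# Nominal rates from the recording setup; stream names are hypothetical labels.
RATES_HZ = {
    "cover_video": 50,     # cover shot camera
    "frontal_video": 100,  # individual frontal cameras
    "eye_samples": 50,     # eye tracker sampling rate
    "scene_video": 25,     # eye tracker scene video
}


def frame_to_seconds(frame, rate_hz, start_s=Fraction(0)):
    """Onset time of a frame on the shared timeline, as an exact fraction."""
    return start_s + Fraction(frame, rate_hz)


def seconds_to_frame(t, rate_hz, start_s=Fraction(0)):
    """Index of the frame that is on screen at time t (floor of the position)."""
    return math.floor((t - start_s) * rate_hz)


def convert_frame(frame, src, dst, starts=None):
    """Map a frame index in stream `src` to the frame covering it in `dst`.

    `starts` optionally maps a stream name to its start time in seconds on
    the shared timeline (an assumption: it must be measured separately).
    """
    starts = starts or {}
    t = frame_to_seconds(frame, RATES_HZ[src], Fraction(starts.get(src, 0)))
    return seconds_to_frame(t, RATES_HZ[dst], Fraction(starts.get(dst, 0)))


# Frame 7 of a 100 FPS camera starts at 0.07 s, inside scene-video frame 1.
print(convert_frame(7, "frontal_video", "scene_video"))  # 1
```

Exact fractions are used instead of floats so that frame boundaries such as 1/25 s do not suffer from rounding when rates are mixed. Real recordings would also need drift correction, which this sketch does not attempt.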
Some research findings
- Traditional notions of communication theory must be restated
  - pausing
  - turn-taking
  - distinction between production and comprehension
- In the kinetic modality, postures are not a separate channel but a part of each particular gestural component
- High degree of coordination between units belonging to different channels: manual gestures and EDUs (Fedorova et al. 2016)
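One simple way to look at coordination between two annotation tiers is to pair each gesture with the EDU it overlaps most in time. The sketch below illustrates that idea only; it is not the method of Fedorova et al. 2016. The tuple layout `(id, start_s, end_s)` and all sample values are invented for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align_gestures_to_edus(gestures, edus):
    """Pair every gesture with the EDU of maximal temporal overlap.

    Both arguments are lists of (id, start_s, end_s) tuples.
    Returns a list of (gesture_id, edu_id or None, overlap_s).
    """
    pairs = []
    for g_id, g_start, g_end in gestures:
        best_id, best_overlap = None, 0.0
        for e_id, e_start, e_end in edus:
            current = overlap(g_start, g_end, e_start, e_end)
            if current > best_overlap:
                best_id, best_overlap = e_id, current
        pairs.append((g_id, best_id, best_overlap))
    return pairs


# Invented sample tiers, not corpus data.
edus = [("E1", 0.0, 2.0), ("E2", 2.0, 4.0)]
gestures = [("G1", 0.5, 1.5), ("G2", 1.5, 3.0), ("G3", 5.0, 6.0)]
for pair in align_gestures_to_edus(gestures, edus):
    print(pair)
```

A gesture that overlaps no EDU is kept with `None`, so unaligned gestures stay visible in later counts.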
Repairs and gesture (Vera Podlesskaya)
- In a highly interactional context, there are additional types of repairs compared to strict monologue
  - In particular, other-initiated
- Long-range repairs: the speaker realizes that information stated earlier is inaccurate
  - Marker of discovery
  - Accompanied by an emphatic gesture: slap on the knee
Eyetracking in natural communication (Olga Fedorova)
- Most eyetracking studies are conducted in experimental settings
- Eye gaze strategies in natural communication fall into several types:
  - General: longer fixations on face (1 to 2 s) compared to fixations on hands (100 to 250 ms)
  - Context-dependent: in interactional context, speaker’s fixations on the environment are rarer than in monologue
  - Individual: total duration of fixations on interlocutors’ hands
    - Recording 23: 1.9%
    - Recording 04: 5.2%
    - Recording 06: 11.1%
    - This may be related to individual differences in peripheral vision
    - Or other individual properties
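A per-recording share such as the hands percentages above could be computed from a list of annotated fixations roughly as follows. This is a sketch under assumptions. The slides do not state the denominator, so the code assumes the share is taken over total fixation time. The `(target, start_s, end_s)` layout and the sample values are invented and are not corpus data.

```python
from collections import defaultdict


def fixation_shares(fixations):
    """Percentage of total fixation time spent on each target.

    `fixations` is an iterable of (target, start_s, end_s) tuples.
    Assumption: the denominator is the summed duration of all fixations.
    """
    totals = defaultdict(float)
    for target, start, end in fixations:
        if end < start:
            raise ValueError(f"fixation on {target!r} ends before it starts")
        totals[target] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {target: 100.0 * d / grand_total for target, d in totals.items()}


# Invented sample: one long fixation on the face, short ones elsewhere.
sample = [
    ("face", 0.0, 1.5),
    ("hands", 1.5, 1.7),
    ("environment", 1.7, 2.0),
]
shares = fixation_shares(sample)
print(round(shares["hands"], 1))
```

If the intended denominator is total recording time instead, only `grand_total` needs to change.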
Conclusion
- Clearly, the way we gesticulate and use our vision affects how we talk vocally
- In a better world, the reasonable sequence in the scientific study of language would have been:
  - (1) basic, original use of language: multichannel spoken face-to-face communication
  - (2) derived, secondary uses of language: monomodal, written, grammar
- But since we cannot reverse the history of linguistics, we attempt to explore the fundamental form of language now: better late than never