INTERACTION OF THE VERBAL, PROSODIC, AND VISUAL COMPONENTS in language understanding Andrej A. Kibrik (Institute of Linguistics RAN and Lomonosov Moscow State University) Jaroslavl’ November 22, 2012
3 The mainstream linguistic approach
- Language consists of hierarchically organized segmental units, such as phonemes, morphemes, words, phrases, and sentences.
- Linguistic form is thus equated with verbal form.
4 However
- Apart from sound, there are other channels (or components) of communication, first of all vision (body language: gesture, facial expression, gaze, posture, etc.).
- Sound itself also has prosodic, that is, non-verbal (non-segmental) aspects.
- Imagine prosody-free talk or, vice versa, talk heard from behind a wall.
5 Communication channels
- The verbal component, prosody, and body language all count as distinct communication (or information) channels.
- They all cooperate in getting the message from speaker to addressee.
- This is what is sometimes called the multimodal approach.
- Cf. Реформатский 1963: how does the non-verbal “text” interact with the verbal text?
6 Multimodality
- “A multimodal approach assumes that the message is ‘spread across’ all the modes of communication. If this is so, then each mode is a partial bearer of the overall meaning of the message.” (Kress 2002)
- “Any use of language is inescapably multimodal” (Scollon 2006)
- “Unimpaired communication is, of course, inherently multimodal, with the speech content being modified by prosody and delivered in parallel with facial expression, gesture, posture, and a range of other nonverbal communication methods.” (Alm 2006)
- “Within biology, experimental psychology, and cognitive neuroscience, a separate rapidly growing literature has clarified that multisensory perception and integration cannot be predicted by studying the senses in isolation.” (Cohen and Oviatt 2006)
7 What is the contribution of different channels?
- Traditional approach of mainstream linguistics: the verbal channel is so central that prosody and the visual channel are at best downgraded to “paralinguistics”.
- Applied psychology: it is often stated (the figures go back to Mehrabian 1971) that
  - body language conveys 55% of information
  - prosody conveys 38% of information
  - the verbal component conveys 7% of information
- «Words may be what men use when all else fails» (Крейдлин 2002: 6)
- Who is right?
8 Relative contribution of three communication channels?
[Diagram: DISCOURSE divides into the vocal channels (verbal channel, prosodic channel) and the visual channel]
9 Experimental design
- Isolate the three communication channels.
- Present a sample discourse in all possible variants (2^3 = 8).
- Present each of the eight variants to a group of subjects.
- Assess the degree of understanding in each case.
- Such assessment may lead to estimates of the contributions of the communication channels.
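The eight variants follow from switching each of the three channels on or off. A minimal sketch of that enumeration is given here; the function name `all_conditions` and the channel labels are illustrative assumptions, not part of the original study materials.

```python
from itertools import product

# Channel labels are illustrative; the study names them verbal, prosodic, visual.
CHANNELS = ("verbal", "prosodic", "visual")

def all_conditions(channels=CHANNELS):
    """Return every on/off combination of the channels.

    Each condition is a tuple of the channels that are present.
    """
    conditions = []
    for mask in product((False, True), repeat=len(channels)):
        present = tuple(ch for ch, on in zip(channels, mask) if on)
        conditions.append(present)
    # Order by number of channels present: zero channels first, all three last.
    return sorted(conditions, key=len)

for condition in all_conditions():
    label = " + ".join(condition) or "no channel (context only)"
    print(len(condition), label)
```

The empty combination corresponds to a baseline group that sees no channel of the experimental excerpt at all.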
10 Studies in this line of research
- Èl’bert 2006, term paper
- Èl’bert 2007, diploma thesis
  - Reinterpreted and refined in Kibrik and Èl’bert 2008
- Molchanova 2008, term paper
- Molchanova 2009, term paper
- Molchanova 2010, diploma thesis
  - Reinterpreted and refined in Kibrik 2011
11 Èl’bert 2007, Kibrik and Èl’bert 2008
- Russian TV serial “Tajny sledstvija” (“Mysteries of the investigation”)
- Experimental excerpt: 3 min. 20 sec.
  - Preceded by an 8-minute context (starting from the beginning of the series)
  - The excerpt consists entirely of a conversation, to ensure that we are testing the understanding of discourse rather than of the film in general
- The two vocal channels were separated:
  - Verbal: running subtitles
  - Prosodic: superimposed filter creating the “behind a wall” effect
- Participants:
  - 99 native speakers of Russian, divided into 8 groups
  - Each group comprised 10 to 17 participants
12 Eight experimental groups
- Group 0: only the context excerpt
- Groups 1 (one communication channel):
  - Verbal: subtitles, temporally aligned
  - Prosodic: filtered sound
  - Visual: video
- Groups 2 (two communication channels):
  - Verbal + prosodic = original sound
  - Verbal + visual: subtitles and video
  - Prosodic + visual: filtered sound and video
- Group 3: original material
13 Group 3: original material
14 Verbal + visual
15 Visual + prosodic
16 Procedure
- The context and the experimental excerpt were shown to a group of subjects on a large screen.
- Each subject was instructed to watch the context and the experimental excerpt and then answer a set of questions concerning the experimental excerpt alone.
- The questionnaire was constructed in accordance with the accepted principles of test design (Panchenko 2000).
  - 23 multiple-choice questions
  - A subject had to choose exactly one answer out of the four listed
  - Example: What does Tamara Stepanovna offer Masha before the conversation begins? a. to take off her coat b. to have a cup of tea c. to have a seat d. to have a drink
- The percentage of correct answers is used as the measure of a subject’s degree of understanding.
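The scoring rule on this slide (percentage of correct multiple-choice answers per subject, then compared across groups) can be sketched as follows. The function names and the four-question answer key in the usage lines are hypothetical; the real questionnaire had 23 questions and its key is not given in the slides.

```python
def percent_correct(answers, answer_key):
    """Percentage of a subject's answers that match the answer key."""
    if len(answers) != len(answer_key):
        raise ValueError("one answer is expected per question")
    correct = sum(1 for given, expected in zip(answers, answer_key) if given == expected)
    return 100.0 * correct / len(answer_key)

def group_mean(scores):
    """Mean score of a group of subjects (scores are percentages)."""
    return sum(scores) / len(scores)

# Hypothetical key and answers, for illustration only.
key = ["c", "a", "d", "b"]
subject_scores = [
    percent_correct(["c", "a", "d", "a"], key),  # three of four correct
    percent_correct(["c", "b", "d", "a"], key),  # two of four correct
]
print(group_mean(subject_scores))
```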
17 Results
- All three channels are substantially informative.
- Verbal > visual > prosodic
- Integration of the visual and prosodic channels is difficult.
18 Molchanova 2010
- “Contribution of information channels in understanding spoken discourse: methodological aspects”
- The following aspects of the prior study have been changed (improved):
  - Stimulus material
  - Prosodic channel
  - Verbal channel
  - Questionnaire
  - Interviewing procedure
19 Stimulus material: discourse type
- Shortcomings of movies
  - Plot facilitates guessing
  - Possible familiarity with the movie
  - Quasi-natural behavior of actors
- Solution: natural dialogue
  - Shared activity
  - Figure-guessing game
  - Can be filmed by one camera
  - Video: все 3 канала.avi (“all 3 channels”), 0:19 – 0:57
- Remaining problems
  - Hard to remember the sequence of events
  - Many events are similar
20 Stimulus material: speakers
- Shortcomings of the prior studies
  - Same-sex speakers are indistinguishable in the prosody-only version
- Solutions
  - Speakers of different sexes: their F0 ranges differ
- Additional features
  - Acquainted
  - Not close friends
21 Prosodic channel
- Shortcomings of the prosodic material as used in previous studies
  - Èl’bert 2007: noisy sound
  - Molchanova 2009: unnatural, “electronic” sound
- Solution:
  - Loudness is decreased radically at all frequencies except for the speaker’s average F0 frequency
  - This produces the “behind the wall” (or “behind the glass”) effect
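The filtering idea on this slide (strongly attenuate everything except a band around the speaker's average F0) can be illustrated with a small frequency-domain sketch. This is not the tool used in the study: the function name `attenuate_outside_f0`, the band half-width, and the gain value are all assumptions, and a naive DFT is used only to keep the example dependency-free, so it suits short test signals rather than real recordings.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def attenuate_outside_f0(samples, sample_rate, mean_f0, half_width_hz=20.0, gain=0.05):
    """Scale down every frequency component farther than half_width_hz from mean_f0.

    half_width_hz and gain are illustrative assumptions, not values from the study.
    """
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins k and n - k carry the same absolute frequency (positive/negative pair).
        freq = min(k, n - k) * sample_rate / n
        if abs(freq - mean_f0) > half_width_hz:
            spectrum[k] *= gain
    return idft(spectrum)

# Toy signal: a 100 Hz "pitch" component plus a 400 Hz component to be suppressed.
rate, length = 1000, 200
signal = [math.sin(2 * math.pi * 100 * t / rate) + math.sin(2 * math.pi * 400 * t / rate)
          for t in range(length)]
muffled = attenuate_outside_f0(signal, rate, mean_f0=100.0)
```

In the toy signal the 100 Hz component passes unchanged while the 400 Hz component is scaled by `gain`, which mimics the muffled quality of speech heard through a wall.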
22 Visual + prosodic
23 Verbal channel
- Shortcomings of subtitles
  - Hard to read without punctuation
  - Especially at the rate of speech
  - And especially in the “verbal + visual” condition
- Solution: spoken prosody-free signal
  - Each word in the transcript is replaced by a recording of that word pronounced in isolation
  - The recorded words are then concatenated in the original order
24 Visual + verbal
25 Verbal channel
- Remaining problem: unnatural input
  - No reduction
  - No intonation
  - etc.
26 Questionnaire
- Shortcomings of prior studies
  - Èl’bert 2007: the gap between Group 0 (38.3%) and Group 3 (87.4%) is insufficient
- Solution: testing stage
  - Identify trivial questions (high Group 0 score)
  - Identify unfortunate questions (low Group 3 score)
  - 30 questions → 17 questions
- Result
  - Group 0: 24.7% correct answers
  - Group 3: 91.2% correct answers
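The testing stage described on this slide amounts to filtering questions by their pilot scores in Group 0 and Group 3. A minimal sketch follows; the function name `select_questions`, the two thresholds, and the pilot figures in the usage lines are hypothetical, since the slides do not state the actual cut-off values.

```python
def select_questions(pilot_rates, max_group0=50.0, min_group3=80.0):
    """Keep questions that are neither trivial nor unfortunate.

    pilot_rates maps a question id to (Group 0 % correct, Group 3 % correct).
    The thresholds are illustrative assumptions, not values from the study.
    """
    kept = []
    for question_id, (group0, group3) in pilot_rates.items():
        trivial = group0 > max_group0       # answerable from context alone
        unfortunate = group3 < min_group3   # hard even with the full original
        if not trivial and not unfortunate:
            kept.append(question_id)
    return kept

# Hypothetical pilot data for three questions.
pilot = {
    "q1": (20.0, 95.0),  # kept
    "q2": (70.0, 98.0),  # trivial: most of Group 0 answers correctly
    "q3": (25.0, 40.0),  # unfortunate: most of Group 3 answers incorrectly
}
print(select_questions(pilot))
```

Dropping both kinds of question widens the gap between the baseline group and the full-material group, which is the stated goal of the revision.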
27 Interviewing procedure
- Shortcomings of prior studies
  - Participants of various ages and life experience
  - Multiple participants in one room may affect each other’s performance
  - Need for a large room, loudspeakers, and a big screen
- Solutions
  - Control for age, gender, geographical origin, social status
  - Remote implementation
    - Stimulus materials on YouTube
    - Questionnaire in Google Docs
  - All participants are in similar conditions
  - Comfortable, adjustable conditions
  - No need for audio and video control in large rooms
28 Kibrik and Èl’bert 2008 vs. Molchanova 2010
- The general picture is remarkably similar
  - All three channels are substantially informative
  - Verbal > visual > prosodic
  - The visual + prosodic dip is even sharper
- Cleaner results
  - Two channels are much better than one channel
  - The verbal and visual channels integrate well
29 Normalized contribution of three channels
- Suppose the three channels are independent.
- Sum up the percentages of correct answers obtained in the three single-channel conditions and normalize the sum to 100%.
- Each channel’s share of the sum is its normalized contribution.
30 Normalized contribution of three channels
                          Kibrik and Èl’bert 2008    Molchanova 2010
Summed percentages        72 + 51 + 62 = 185         59 + 46 + 49 = 154
Normalized contributions
  Verbal                  72% : 1.85 ≈ 39%           59% : 1.54 ≈ 38%
  Prosodic                51% : 1.85 ≈ 28%           46% : 1.54 ≈ 30%
  Visual                  62% : 1.85 ≈ 33%           49% : 1.54 ≈ 32%
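The normalization on this slide can be reproduced in a few lines. The sketch below takes the single-channel percentages from the slide as input; the function name `normalize_contributions` and the dictionary layout are illustrative assumptions. It prints unrounded shares to one decimal place, whereas the slide reports whole percentages.

```python
def normalize_contributions(single_channel_scores):
    """Rescale single-channel scores so that they sum to 100%."""
    total = sum(single_channel_scores.values())
    return {channel: 100.0 * score / total
            for channel, score in single_channel_scores.items()}

# Single-channel percentages of correct answers, as given on the slide.
studies = {
    "Kibrik and Èl’bert 2008": {"verbal": 72, "prosodic": 51, "visual": 62},
    "Molchanova 2010": {"verbal": 59, "prosodic": 46, "visual": 49},
}

for name, scores in studies.items():
    shares = normalize_contributions(scores)
    print(name, {channel: round(share, 1) for channel, share in shares.items()})
```

Dividing by the sum rests on the independence assumption stated on the previous slide; if the channels overlap in the information they carry, the shares are only a rough indication.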
31 Gender differences
- Molchanova 2010: gender advantages in the percentage of correct answers
- [Table columns: Condition, Men, Women, Advantage]
  - Verbal only: advantage for women
  - Visual + prosodic: advantage for men, +14.5
32 Conclusions
- All communication channels are highly significant
  - hence the traditional linguistic viewpoint is erroneous
- The verbal channel is the leading one
  - hence the viewpoint popular in applied psychology is erroneous
- Information from the prosodic and the visual channels is primarily used through integration with the verbal channel
- Very similar results have been obtained in different studies, in spite of very different methodological details
33 Further questions
- Auditory or graphic presentation of the “verbal alone” channel?
- Optimal discourse type?
- …and: other suggestions on this approach?
34 Thanks for your attention
[Diagram labels: language; verbal channel, prosodic channel, visual channel]