Evaluating the Effects of Text Chunking on Subtitling: A Quantitative and Qualitative Examination Andrew T. Duchowski COMPUTER SCIENCE CLEMSON UNIVERSITY Abstract Our work serves as an assay of the impact of text chunking on subtitling for the deaf and hard-of-hearing (SDH). Specifically, we aim to evaluate subtitles constructed with different chunking methods to determine whether different text groupings influence comprehension or affect the viewing experience as a whole. The results show that there are indeed disparities in the eye tracking data among the four subtitling methods tested. Methodology References Results The results were analyzed along four metrics, both within-subjects (comparing the four runs shown to each subject) and between-subjects (considering only the data from the first video clip each subject viewed). The metrics were mean durations of fixations, percentage of fixations in the subtitles, percentage of gazepoints in the subtitles, and number of saccadic crossovers (see Figure 3). Within subjects, ANOVA showed a significant difference (p < 0.01) in saccadic crossovers. Pairwise t-tests (no correction) revealed a marginally significant (p < 0.05) difference between the word by word and chunked by phrase clips. There was also a marginally significant difference (p < 0.05) between the word by word and chunked by sentence clips (see Figure 4). Between subjects, a marginally significant difference (p < 0.05) was found between the word for word and chunked by phrase in terms of durations of fixations. There was a nearly significant difference (p = 0.06) between the word for word and chunked by phrase in terms of percentage of gazepoints in the subtitles (see Figure 4). There were no significant differences in preference or comprehension. Discussion The chunked by phrase subtitles seemed to provide the relative best viewing experience, as they had the least number of crossovers and the smallest percentage of gazepoints in the subtitles (meaning viewers spent more time on the video itself), with no difference in comprehension from the other four clips (and no significant difference in preference). Likewise, the word by word subtitles seemed to provide the relative worst quality viewing experience, as it had the most number of crossovers and the largest percentage of gazepoints in the subtitles. The mean fixation duration is indicative of the speed of processing information; therefore, the data imply that the chunked by phrase style was the easiest to take in, and the word by word was the most difficult. Dhevi R. Rajendran COMPUTER SCIENCE RICE UNIVERSITY Introduction & Background This study aims to improve subtitles for the deaf and hard-of-hearing (SDH) with principles of language processing. Text chunking, or the grouping of a block of text into coherent segments (see Figure 1), has been shown in other contexts, such as studies of short-term memory (Baddeley 1993) to increase short-term information retention and speed of information processing. We examine whether these findings also apply to subtitles. Conclusion The study showed that different types of text groupings in subtitles elicit different viewing behaviors. The significance of these differences would be of interest in future research, to truly understand the effects of text chunking. A replication of this study with more participants may produce greater significance by reducing the variance. A replication of this study with deaf and hard-of-hearing participants would be a good way to confirm the results. Though eye tracking over video clips is still a nascent technology, much research has been done to improve subtitling. Most deals with the styles, speed of display, and positioning of the subtitles. Matamala and Orero (2010) explored preferences in colors, positions, and display speeds of subtitles. This study tests an aspect of subtitles that, to our knowledge, is as yet unexplored. 24 university students viewed four video clips in a unique order. The clips had the same video, but different subtitles. The four subtitling methods were: no segmentation (the area for subtitles was filled with as much text as possible); word by word (words showed up one by one); chunked by phrase (phrases showed up one by one); and chunked by sentence (sentences showed up one by one). All clips had no audio to ensure information was taken in visually (see Figure 2). Participants were instructed to watch the clip for content in both the subtitles and videos. After viewing the first clip, they were given a multiple choice quiz on the content. Participants filled out preference questionnaires after viewing each clip and after they had viewed all four. Figure 2: Screenshots of heatmaps overlaid on a single frame of the video clip with no segmentation subtitles (top left), word by word subtitles (top right), subtitles chunked by phrase (bottom left), and subtitles chunked by sentence (bottom right). Because the same clip was shown four times, fatigue or boredom cannot be excluded as a confound of within-subject analysis. The comprehension quiz may have caused participants to view the remaining three clips differently than they would have normally. The questionnaires asking for preferences may not have been reliable, as some questions were slightly ambiguous, and the participants seemed to interpret the questions differently. No subjects were deaf or hard-of-hearing, so some results may not fully generalize to the deaf and hard-of-hearing populations. Study Limitations Figure 1: Subtitle chunked by phrase, as described in methodology. Figure 3: Example of a saccadic crossover. A saccadic crossover occurs when a gazepoint in the subtitles is immediately followed by a gazepoint in the scene, or vice versa. Figure 4: Graphs showing means and standard errors for within-subject saccadic crossovers (left), mean durations of fixations (middle), and percentage of gazepoints in the subtitles (right). Matamala, Anna, and Pilar Orero. Listening to Subtitles. 1st ed. Bern, Switzerland: International Academic Publishers, Baddeley, Alan. The Psychology of Memory. 1st ed. New York, NY: New York University Press, Juan Martínez SPEECH RECOGNITION AND RESPEAKING ZURICH UNIV. OF APP. SCIENCES WINTERTHUR School of Computing, Clemson University