Published by Lorena Cole. Modified over 9 years ago.
1
User Benefits of Non-Linear Time Compression
Liwei He and Anoop Gupta
Microsoft Research
2
Introduction
Time compression: key to browsing AV content
We focus on informational content
Audio time compression algorithms
  Linear: speed up audio uniformly
  Non-linear: exploit fine-grain structure of human speech (e.g. pauses, phonemes)
How much more do users gain from more complex algorithms?
3
Methodology
Conduct user listening test
  One Linear TC algorithm
  Two Non-linear TC algorithms
    Simple: Pause-removal followed by Linear TC
    Sophisticated: Adaptive TC
Compare objective and subjective measurements
4
Time Compression Algorithms
5
Linear Time Compression
Classic algorithms
  Overlap Add (OLA) and Synchronized OLA (SOLA)
We use SOLA
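The slide names SOLA without giving details, so the following is only a minimal sketch of a textbook-style SOLA, assuming mono samples held in a Python list of floats. The function name `sola_compress` and the parameters `frame`, `hop_out` and `search` are illustrative choices, not values from the study. Frames are read from the input every `hop_out * rate` samples, written to the output roughly every `hop_out` samples, and each new frame is slid by up to `search` samples to the position where it correlates best with the audio already written before the two are cross-faded.

```python
import math


def sola_compress(x, rate, frame=400, hop_out=200, search=40):
    """Time-compress samples `x` by `rate` (>1 means faster) with basic SOLA.

    All parameter defaults are illustrative assumptions, in samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if 2 * search >= frame - hop_out:
        raise ValueError("search range too large for this frame/hop")

    hop_in = int(round(hop_out * rate))   # analysis hop: step through input
    y = list(x[:frame])                   # output starts with the first frame
    m = 1
    while m * hop_in + frame <= len(x):
        f = x[m * hop_in: m * hop_in + frame]
        nominal = m * hop_out             # where the frame would go in plain OLA

        # Try shifts around the nominal position and keep the one whose
        # overlap region has the highest normalized cross-correlation.
        best_k, best_score = 0, float("-inf")
        for k in range(-search, search + 1):
            start = nominal + k
            a = y[start:]                 # tail of the output so far
            b = f[:len(a)]                # head of the incoming frame
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score

        # Linear cross-fade over the overlap, then append the rest of the frame.
        start = nominal + best_k
        overlap = len(y) - start
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[start + i] = (1 - w) * y[start + i] + w * f[i]
        y.extend(f[overlap:])
        m += 1
    return y
```

Dropping the shift search (always using `k = 0`) turns this into plain OLA; the search is what keeps pitch periods aligned across frame boundaries.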
6
Non-Linear Time Compression
Algorithm 1: Pause removal plus TC
  Energy and Zero Crossing Rate analysis
  Leave pauses of 150 ms or less untouched
  Shorten pauses longer than 150 ms to 150 ms
  Apply SOLA algorithm
  Pause removal (PR) shortens speech by 10-25%
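The pause-removal step can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the 10 ms frame size and both thresholds are hypothetical, and a frame is treated as a pause only when its energy and its zero-crossing rate are both low (a quiet frame with many zero crossings is kept, on the assumption that it may be an unvoiced consonant). The helper `residual_linear_rate` further assumes that the linear stage makes up whatever speed-up pause removal has not already delivered, which the slides do not state explicitly.

```python
def remove_pauses(x, sr, frame_ms=10, max_pause_ms=150,
                  energy_thresh=1e-4, zcr_thresh=0.25):
    """Cap every pause in `x` (list of floats, sample rate `sr`) at max_pause_ms.

    frame_ms, energy_thresh and zcr_thresh are illustrative assumptions.
    """
    frame_len = sr * frame_ms // 1000
    max_frames = max_pause_ms // frame_ms      # pause frames kept per run
    n_full = len(x) // frame_len
    out, run = [], 0
    for i in range(n_full):
        fr = x[i * frame_len:(i + 1) * frame_len]
        energy = sum(v * v for v in fr) / frame_len
        zcr = sum((a >= 0) != (b >= 0)
                  for a, b in zip(fr, fr[1:])) / (frame_len - 1)
        if energy < energy_thresh and zcr < zcr_thresh:
            run += 1
            if run > max_frames:               # beyond 150 ms: drop the frame
                continue
        else:
            run = 0
        out.extend(fr)
    out.extend(x[n_full * frame_len:])         # keep any trailing partial frame
    return out


def residual_linear_rate(target_rate, pause_fraction):
    """Linear (SOLA) rate still needed after pause removal.

    `pause_fraction` is the share of the duration that pause removal cut,
    e.g. 0.10 to 0.25 according to the slide.
    """
    return target_rate * (1.0 - pause_fraction)
```

For example, if pause removal cuts 20% of a clip and the overall target is 1.5x, the remaining audio only needs a linear speed-up of `residual_linear_rate(1.5, 0.2)`, about 1.2x.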
7
Non-Linear Time Compression (cont.)
Algorithm 2: Adaptive TC
  Mimics how people talk fast
  Pauses and silences are compressed the most
  Stressed vowels are compressed the least
  Consonants are compressed more than vowels
  Consonants are compressed based on neighboring vowels
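One way to picture the ordering above is as a per-segment rate allocation. The sketch below is not the adaptive algorithm used in the study: it assumes the speech has already been segmented and labeled, uses hypothetical compressibility weights (`WEIGHTS`), and ignores the rule that consonant compression depends on neighboring vowels. Each segment gets a local rate `1 + alpha * weight`, and `alpha` is found by bisection so that the compressed durations add up to the overall target.

```python
# Hypothetical relative compressibility per segment class (assumption).
WEIGHTS = {"pause": 1.0, "consonant": 0.6, "vowel": 0.4, "stressed_vowel": 0.2}


def local_rates(segments, target_rate, weights=WEIGHTS, iters=60):
    """Return one local speed-up factor per (label, duration) segment.

    The factors are chosen so the whole utterance is compressed by
    `target_rate` overall. All weights must be positive.
    """
    if target_rate < 1.0:
        raise ValueError("this sketch only handles compression (rate >= 1)")
    total = sum(d for _, d in segments)
    goal = total / target_rate               # desired compressed duration

    def compressed(alpha):
        return sum(d / (1.0 + alpha * weights[c]) for c, d in segments)

    # Bracket alpha, then bisect: compressed() decreases as alpha grows.
    lo, hi = 0.0, 1.0
    while compressed(hi) > goal:
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if compressed(mid) > goal:
            lo = mid
        else:
            hi = mid
    return [1.0 + hi * weights[c] for c, _ in segments]
```

With these weights, a pause always ends up with a higher local rate than a consonant, a consonant higher than an unstressed vowel, and a stressed vowel the lowest, while the utterance as a whole still meets the requested speed.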
8
System Implications
Computational complexity
  Adaptive TC 10x more costly than Linear TC
Complexity in client-server implementation
  Buffer management required for non-linear TC
Audio-video synchronization quality
9
User Study Method
10
User Study Goals
Highest intelligible speed
Comprehension
Subjective preference
Sustainable speed
11
Experiment Method
24 subjects
4 tasks for each subject
3 time compression algorithms
  Linear TC using SOLA (Linear)
  Pause removal plus Linear TC (PR-Lin)
  Adaptive TC (Adapt)
Each test takes approximately 30 minutes
12
Highest Intelligible Speed Task
3 clips from technical talks
Find the highest speed at which most of the words are understandable
13
Comprehension Task
3 clips at 1.5x and 3 clips at 2.5x
Clips from TOEFL listening test
Answer 4 multiple choice questions
14
Subjective Preference Task
3 pairs of clips at 1.5x
3 pairs of clips at 2.5x
Each pair contains the same clip compressed with 2 of the 3 TC algorithms
Indicate preference on 3-point scale
15
Sustainable Speed Task
3 clips, each 8 minutes long
Clips from a CD audio book
Find the maximum comfortable speed
Write a 4-5 sentence summary at the end
16
User Study Results
17
Highest Intelligible Speed Task
PR-Lin is significantly better than Adapt (p<.01)
18
Comprehension Task
Adapt is better than PR-Lin (p=.083) at 2.5x
19
Preference Task at 1.5x
Slight preference for PR-Lin (p=.093)

1.5x                 Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          6              5              13
PR-Lin vs. Adapt          13              5               6
Adapt vs. Linear           8              8               8
20
Preference Task at 2.5x
PR-Lin and Adapt do significantly better than Linear

2.5x                 Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          2              8              14
PR-Lin vs. Adapt           4              9              11
Adapt vs. Linear          21              3               0
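The slides do not say which statistical test lies behind the preference results. As one plausible way to check paired preference counts like these, the sketch below applies a two-sided sign test, dropping the "Prefer None" answers. This choice of test is an assumption, so the resulting p-values need not match the ones reported on the slides. The 2.5x counts are read as 2/8/14, 4/9/11 and 21/3/0, the only splits of the fused digits that sum to the 24 subjects.

```python
from math import comb


def sign_test_p(prefer_a, prefer_b):
    """Two-sided sign test p-value for paired preferences (ties excluded)."""
    n = prefer_a + prefer_b
    if n == 0:
        return 1.0
    k = min(prefer_a, prefer_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# (prefer former, prefer none, prefer latter) at 2.5x, as read from the slide.
COUNTS_2_5X = {
    ("Linear", "PR-Lin"): (2, 8, 14),
    ("PR-Lin", "Adapt"): (4, 9, 11),
    ("Adapt", "Linear"): (21, 3, 0),
}

if __name__ == "__main__":
    for (a, b), (former, _none, latter) in COUNTS_2_5X.items():
        print(f"{a} vs. {b}: p = {sign_test_p(former, latter):.4g}")
```

Excluding ties is the usual convention for a sign test, but with 8 or 9 of 24 subjects expressing no preference it discards a sizable share of the data, which is worth keeping in mind when reading the output.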
21
Sustainable Speed Task
22
Conclusions
23
Previous Work
Mach1 (Covell et al., ICASSP 98)
  Comprehension and preference tasks
  Comparing Linear and Mach1 (Adapt) at 2.6-4.2x
  Comprehension scores 17% better w/ Mach1
  95% prefer Mach1 to Linear
  No data on < 2.0x
Other works (Harrigan, Omoigui, Li, Foulke)
  1.2-1.7x is the sustainable listening speed
24
Conclusions
Trade-off in TC algorithms is task-related
  Listening: Linear TC is sufficient
  Fast forwarding: Non-linear TC is more suitable
Adapt TC is close to the way people talk fast
The limit lies in human listening and comprehension