User Benefits of Non-Linear Time Compression 1 Liwei He & Anoop Gupta September 21st, 2000 Microsoft Research.

Slides:



Advertisements
Similar presentations
Categories of I/O Devices
Advertisements

Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Vector Processing. Vector Processors Combine vector operands (inputs) element by element to produce an output vector. Typical array-oriented operations.
QoS Impact on User Perception and Understanding of Multimedia Video Clips G. Ghinea and J.P. Thomas Department of Computer Science University of Reading,
Segmentation and Paging Considerations
Project Proposal.
Speech Compression. Introduction Use of multimedia in personal computers Requirement of more disk space Also telephone system requires compression Topics.
CLOUD COMPUTING FOR MOBILE USERS: CAN OFFLOADING COMPUTATION SAVE ENERGY? Purdue University.
Using Multiple Synchronized Views Heymo Kou.  What is the two main technologies applied for efficient video browsing? (one for audio, one for visual.
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Operating System Support Focus on Architecture
SWE 423: Multimedia Systems Chapter 7: Data Compression (1)
On improving the intelligibility of synchronized over-lap-and-add (SOLA) at low TSM factor Wong, P.H.W.; Au, O.C.; Wong, J.W.C.; Lau, W.H.B. TENCON '97.
School of Computer Science and Software Engineering A Networked Virtual Environment Communications Model using Priority Updating Monash University Yang-Wai.
1 Chapter 8 Virtual Memory Virtual memory is a storage allocation scheme in which secondary memory can be addressed as though it were part of main memory.
Streaming Video Gabriel Nell UC Berkeley. Outline Scalable MPEG-4 video – Layered coding method – Integrated transport-decoder buffer model RAP streaming.
A study on the design/development time of E-learning projects in New Zealand. Kumar Laxman, Phd Associate Professor Faculty of Education University of.
Introduction to Streaming © Nanda Ganesan, Ph.D..
Video Streaming for Web Pages Jacqueline Altieri.
Exploiting Virtualization for Delivering Cloud based IPTV Services Speaker : 吳靖緯 MA0G IEEE Conference on Computer Communications Workshops.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Welcome To. Improving Remote File Transfer Speeds By The Solution For: %
Packet and Circuit Switching
Using Multimedia on the Web
1 CMSCD1011 Introduction to Computer Audio Lecture 10: Streaming audio for Internet transmission Dr David England School of Computing and Mathematical.
Streaming Videoconferences Soh Hock Heng National University of Singapore Doug Pearson Internet2 Commons Site Coordinator Training December 3, 2003 National.
User Benefits of Non-Linear Time Compression Liwei He and Anoop Gupta Microsoft Research.
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
Digital Sound and Video Chapter 10, Exploring the Digital Domain.
Virtual Memory.
INF Web Design Using Multimedia on the Web Sound - Part 2.
Time-Domain Methods for Speech Processing 虞台文. Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time.
Managing Multi-Configuration Hardware via Dynamic Working Set Analysis By Ashutosh S.Dhodapkar and James E.Smith Presented by Kyriakos Yioutanis.
Raymond S. Pastore, Ph.D..  What is multimedia?  Verbal and Non Verbal representations better for learning than just one (Mayer, 2005)  Modality.
By Sarita Jondhale1 Pattern Comparison Techniques.
1 Lecture 12: Multimedia Not in Web 101 Text  Important Multimedia Issues  Audio  Movies and Video  Multimedia and HTML Documents.
Glencoe Introduction to Multimedia Chapter 9 Video 1 Chapter Video 9  Section 9.1 Video in Multimedia  Section 9.2 Work with Video Contents.
An Introduction to 64-bit Computing. Introduction The current trend in the market towards 64-bit computing on desktops has sparked interest in the industry.
Lector: Aliyev H.U. Lecture №15: Telecommun ication network software design multimedia services. TASHKENT UNIVERSITY OF INFORMATION TECHNOLOGIES THE DEPARTMENT.
Collaboration and Education Group Anoop GuptaJonathan Grudin David BargeronSteven White Liwei HeYong Rui.
Speech Enhancement Using Spectral Subtraction
Temporal Compression Of Speech: An Evaluation IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 4, MAY 2008 Simon Tucker and Steve.
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger Chen-Chi Wu, Kuan-Ta Chen, Yu-Chun Chang, and Chin-Laung.
Budget-based Control for Interactive Services with Partial Execution 1 Yuxiong He, Zihao Ye, Qiang Fu, Sameh Elnikety Microsoft Research.
Audio on the Web Teaching OntheNet 2002 Minneapolis, MN June 23-25, 2002.
Cosc 2150: Computer Organization Chapter 6, Part 2 Virtual Memory.
MULTIMEDIA TECHNOLOGY SMM 3001 MEDIA - VIDEO. In this chapter How digital video differs from conventional analog video How digital video differs from.
Authors: Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
December 9, 2014Computer Vision Lecture 23: Motion Analysis 1 Now we will talk about… Motion Analysis.
SAM for SQL Workloads Presenter Name.
Chapter Two Fundamentals of Data and Signals Data Communications and Computer Networks: A Business User's Approach Eighth Edition.
Oman College of Management and Technology Course – MM Topic 7 Production and Distribution of Multimedia Titles CS/MIS Department.
California State University, LA Presented by Amanda Steven StevenAamirObaid.
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 9: Virtual Memory.
INTRODUCTION TO E-LEARNING. Objectives This chapter contains information on understanding the fundamental concepts of e-learning. In this chapter, e-learning.
Operating Systems Lecture 9 Introduction to Paging Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of.
Human Computer Interaction Lecture 21 User Support
Chapter Two Fundamentals of Data and Signals
Chapter 2 Memory and process management
Jacob R. Lorch Microsoft Research
Human Computer Interaction Lecture 21,22 User Support
ITEC 202 Operating Systems
Web Programming– UFCFB Lecture 8
Chapter 9 – Real Memory Organization and Management
Paper Reading part Seo Seok Jun.
A Framework for Automatic Resource and Accuracy Management in A Cloud Environment Smita Vijayakumar.
O.S Lecture 13 Virtual Memory.
Web Programming– UFCFB Lecture 8
Year 10 Computer Science Hardware - CPU and RAM.
CS703 – Advanced Operating Systems
Presentation transcript:

User Benefits of Non-Linear Time Compression 1 Liwei He & Anoop Gupta September 21st, 2000 Microsoft Research

User Benefits of Non-Linear Time Compression2 Overview ► In comparison to text, audio-video content is much more challenging to browse ► Time-compression has been suggested as a key technology that can support browsing ► Time compression speeds-up the playback of audio-video content without causing the pitch to change ► Simple forms of time-compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks.

User Benefits of Non-Linear Time Compression3 Non-linear time compression ► In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. ► The most advanced of these algorithms exploit fine-grain structure of human speech (e.g., phonemes)  to differentially speed-up segments of speech so that the overall speed-up can be higher

User Benefits of Non-Linear Time Compression4 Overview ► Also we explore what are the actual gains achieved by end-users from these advanced algorithms ► And whether the gains are worth the additional systems complexity. ► Categories:  Time compression, Digital library, Multimedia browsing

User Benefits of Non-Linear Time Compression5 Motivation ► Digital multimedia information on the Internet is growing at an increasing rate  corporations are posting their training materials and talks online  universities are putting up their videotaped courses online  news organizations are making newscasts available ► While the network bandwidth is somewhat of a bottleneck today  The eventual bottleneck really is the limited human time.

User Benefits of Non-Linear Time Compression6 Motivation!  It is highly desirable to have Technologies that let people browse audio-video quickly  The impact of even a 10% increase in browsing speed can be large  people may have different reading rates  We can provide people the ability to speedup or slow- down audio-video content based on their preferences  Also we try to focus on informational content with speech (e.g., talks, lectures, and news) rather than entertainment content (e.g., music videos, soap operas),

User Benefits of Non-Linear Time Compression7 Technology ► Core technology is called time-compression ► Simple forms of time-compression have been used before in hardware device contexts and telephone voic systems ► systems today use linear time-compression  speech content is uniformly time compressed, ► e.g., every 100ms chunk of speech is shortened to 75ms. ► users can save more than 15 minutes on a one-hour lecture.

User Benefits of Non-Linear Time Compression8 non-linear time-compression ► we explore how much additional benefit can be achieved from non-linear time-compression techniques ► We consider two such algorithms:  The first, simpler algorithm combines pause-removal with linear time compression ► It first detects pauses (silence intervals) in the speech ► then shortens or removes the pauses ► Such a procedure can remove 10-25% from normal speech ► It then performs linear time compression on the remaining speech.

User Benefits of Non-Linear Time Compression9 non-linear time-compression ► Algorithm 2 is much more sophisticated  It tries to mimic the compression strategies that people use when they talk fast in natural settings  Also it tries to adapt the compression rate at a fine granularity based on low level features (e.g., phonemes) of human speech.

User Benefits of Non-Linear Time Compression10 Core Questions ► non-linear algorithms, while offering the potential for higher speed-ups, require:  more compute (CPU) cycles  increased complexity in client-server systems for streaming media  may result in a jerky video portion ► core questions we address:  What are the additional benefits of the non-linear algorithms over the simple linear time-compression algorithm implemented in products today?

User Benefits of Non-Linear Time Compression11 Core Questions ► Most people will not listen to speech at such fast rates.  We are interested in understanding people’s preference at more comfortable and sustainable speed-up rates.  if the difference at sustainable speed is large will it be worthwhile to implement these algorithms in products. ► How much better is the more sophisticated algorithm over the simpler non-linear algorithm?  magnitude of differences will again guide our implementation strategy in products

User Benefits of Non-Linear Time Compression12 Linear Time Compression (Linear) ► time-compression is applied consistently across the entire audio stream  with a given speed-up rate, without regard to the audio information contained therein  The most basic technique for achieving time- compressed speech involves taking short fixed length speech segments (e.g., 100ms), and discarding portions of these segments (e.g., dropping 33ms segment to get 1.5-fold compression), and abutting the retained segments.

User Benefits of Non-Linear Time Compression13 Linear Time Compression (Linear) ► Discarding segments and abutting the remnants  produces discontinuities at the interval boundaries and produces audible clicks and other forms of signal distortion ► To improve the quality of the output signal:  a windowing function or smoothing filter–such as a cross fade– can be applied at the junctions of the abutted segments

User Benefits of Non-Linear Time Compression14 Linear Time Compression (Linear) ► A technique called Overlap Add (OLA) yields good quality:

User Benefits of Non-Linear Time Compression15 Linear Time Compression (Linear) ► The technique used in this study is SOLA  It consists of shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest waveform similarity.  Once this point is found, the frames are overlapped and averaged together  SOLA provides a locally optimal match between successive frames and mitigates the reverberations

User Benefits of Non-Linear Time Compression16 Pause Removal plus Linear Time Compression (PR-Lin) ► Non-linear time compression is an improvement on linear compression:  the content of the audio stream is analyzed  and compression rates may vary from one point in time to another ► Typically, non-linear time compression involves compressing redundancies, i.e.:  pauses or elongated vowels

User Benefits of Non-Linear Time Compression17 Pause Removal plus Linear Time Compression (PR-Lin) ► The PR-Lin algorithm used in this paper, first detects pauses:  It leaves pauses below 150ms untouched, and shortens longer pauses to 150ms  It then applies linear time-compression  variety of measures can be used for detecting pauses even under noisy conditions ► “Energy” and “Zero crossing rate (ZCR)” is used ► Also, in order to adjust changes in the background noise level, a dynamic energy threshold is used

User Benefits of Non-Linear Time Compression18 Pause Removal plus Linear Time Compression (PR-Lin) ► ► If the energy of a frame is below the dynamic threshold and its ZCR is under the fixed threshold, the frame is categorized as a potential-pause frame, otherwise it is labeled as a speech frame.   Contiguous potential-pause frames are marked as real- pause frames when they exceed 150ms. ► ► Pause removal typically shortens the speech by 10-25% before linear time-compression is applied.

User Benefits of Non-Linear Time Compression19 Adaptive Time Compression (Adapt)  A variety of sophisticated algorithms have been proposed for non-linear Adpt. ► i.e. preserving the phoneme transitions in the compressed audio to improve understandability  Audio spectrum is computed first for audio frames of 10ms  If the magnitude of the spectrum difference between two successive frames is above a threshold, they are considered as a phoneme transition and not compressed ► Mach1 makes further improvements and tries to mimic the compression that takes place when people talk fast in natural settings

User Benefits of Non-Linear Time Compression20 Adaptive Time Compression (Adapt)  strategies come from the linguistic studies of natural speech: ► Pauses and silences are compressed the most ► Stressed vowels are compressed the least ► Schwas and other unstressed vowels are compressed by an intermediate amount ► Consonants are compressed based on the stress level of the neighboring vowels ► On average, consonants are compressed more than vowels

User Benefits of Non-Linear Time Compression21 Adaptive Time Compression (Adapt)  Mach1 estimates continuous-valued measures of local emphasis and relative speaking rate.  Together, these two sequences estimate the audio tension: ► the degree to which the local speech segments resist changes in rate.  High tension regions are compressed less and low- tension regions are compressed more aggressively. ► Based on the audio tension, the local target compression rates are computed and used to drive a standard time-scale modification algorithm, such as SOLA.

User Benefits of Non-Linear Time Compression22 Systems Implications of Algorithms ► In deciding between these three algorithms for inclusion in products, there are two considerations: 1.what are the relative benefits (e.g. speed-up rates) achievable 2.what are the costs (e.g. implementation challenges).  We explore the former in the User Study section

User Benefits of Non-Linear Time Compression23 (a) computational complexity  The first issue is computational complexity or CPU requirements. ► The first two algorithms, Linear and PRLin, are easily executed in real-time on any Pentium-class machine using only a small fraction of the CPU ► The Adapt algorithm, in contrast, has 10+ times higher CPU requirements although it can be executed in real-time on modern desktop CPUs.

User Benefits of Non-Linear Time Compression24 (b) complexity of client-server  Assumption:  people will like the time compression feature to be available with streaming-media clients where they can just turn a virtual knob to adjust speed-up ► a key issue has to do with buffer management and flow-control between the client and server. ► The Linear algorithm has the simplest requirements, where the server simply needs to speed-up its delivery at the same rate at which time compression is requested by client ► The nonlinear algorithms (both PR-Lin and Adapt) have much more complex requirements due to the uneven rate of data consumption at the client

User Benefits of Non-Linear Time Compression25 (c) audio-video synch. quality ► With the Linear algorithm, the rendering of video frames is speeded up at the same rate as the speed- up for speech.  While everything happens at higher speed, the video remains smooth and perfect lip synchronization between audio and video can be maintained. ► This task is much more difficult with nonlinear algorithms (PR-Lin and Adapt) i.e. :  consider removal of a 2-second pause from the audio track: ► Option 1: remove the video frames corresponding to those 2 seconds

User Benefits of Non-Linear Time Compression26 (c) audio-video synch. quality ► In this case the video will appear jerky to the end- user, although we will retain lip synchronization between audio and video for subsequent speech.  Option-2 is to make the video transition smoother by keeping some of the video frames from that 2-second interval and removing some later ones: ► but now we will loose the lip synchronization for subsequent speech. There is no perfect solution.  bottom line is that non-linear algorithms add significant complexity to the implementer’s task

User Benefits of Non-Linear Time Compression27 User Study Goals ► Highest intelligible speed  What is the highest speed-up factor at which the user still understands the majority of the content? ► Comprehension  Given the same fixed speed-up factor for all algorithms, what is a user’s relative comprehension? ► Subjective preference  When given the same audio clip compressed using two different techniques at the same speed-up factor, which one does a user prefer? ► Sustainable speed  What is the speed-up factor that end-users will settle on when listening to long pieces of content (e.g., a lecture), still assuming some time pressure?

User Benefits of Non-Linear Time Compression28 Experimental Method ► 24 people participated their study ► variety of background  from professionals in local firms to retirees to homemakers ► All of them had some computer experience ► The listener study was Web based ► All the instructions were presented to the subjects via web pages

User Benefits of Non-Linear Time Compression29 Experimental Method ► The study consisted of four tasks:  Highest Intelligible Speed Task ► find the fastest speed at which the audio was still intelligible  Comprehension Task ► four multiple-choice questions about the conversation  Subjective Preference Task ► The subjects were instructed to compare 6 pairs of clips time  Sustainable Speed Task ► asked them to imagine that they were in a hurry, but still wanted to listen to the clips

User Benefits of Non-Linear Time Compression30 Listener Study Results ► Highest Intelligible Speed  non-linear algorithms do significantly better than Linear ► Comprehension Task  Adapt to do best, followed by PR-Lin and Linear, and the comprehension differences to increase at the higher speed-up factor ► Preference Task  there is slight but non-significant preference for Adapt over PR-Lin ► Sustainable Speed  There is no significant difference between Adapt and PR-Lin

User Benefits of Non-Linear Time Compression31 Concluding Remarks ► Results show that for speed-up factors most likely to be used by people, the more sophisticated non- linear time compression algorithms do not offer a significant advantage. ► Given the substantial implementation complexity associated with these algorithms in client-server streaming-media systems, we may not see them adopted in the near future

User Benefits of Non-Linear Time Compression32 Concluding Remarks ► Based on a preliminary study  the problem is not that the benefits are small because the sophisticated algorithms are not very good.  In fact, end-users cannot distinguish between these algorithms speeding-up speech and a human speaking faster.  Thus delivering significantly larger time- compression benefits to end-users remains an open challenge for researchers.