User Benefits of Non-Linear Time Compression Liwei He and Anoop Gupta Microsoft Research.

Liwei He & Anoop Gupta, Microsoft Research, September 21st, 2000.


Introduction
- Time compression: key to browsing AV content
- We focus on informational content
- Audio time compression algorithms
  - Linear: speed up audio uniformly
  - Non-linear: exploit fine-grain structure of human speech (e.g. pauses, phonemes)
- How much more do users gain from more complex algorithms?

Methodology
- Conduct user listening test
  - One linear TC algorithm
  - Two non-linear TC algorithms
    - Simple: pause removal followed by linear TC
    - Sophisticated: adaptive TC
- Compare objective and subjective measurements

Time Compression Algorithms

Linear Time Compression
- Classic algorithms
  - Overlap Add (OLA) and Synchronized OLA (SOLA)
  - We use SOLA
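To make the SOLA idea concrete, here is a minimal pure-Python sketch, not the implementation used in the study. Frames are read from the input at a larger hop than they are written to the output. Each new frame is shifted by a few samples to the position where it best correlates with the audio already written, then cross-faded in. The frame, overlap and search sizes are illustrative assumptions.

```python
import math

def sola_compress(x, rate, frame=400, overlap=100, search=40):
    """Time-compress samples `x` by `rate` (>1 plays faster) with basic SOLA.

    All sizes are in samples and are assumed values for illustration.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not (0 < 2 * search < overlap < frame):
        raise ValueError("need 0 < 2*search < overlap < frame")
    synth_hop = frame - overlap                  # nominal hop in the output
    ana_hop = int(round(synth_hop * rate))       # hop in the input
    y = list(x[:frame])
    m = 1
    while m * ana_hop + frame <= len(x):
        seg = x[m * ana_hop : m * ana_hop + frame]
        best_k, best_c = 0, -math.inf
        # Try shifts around the nominal output position, smallest first.
        for k in sorted(range(-search, search + 1), key=abs):
            start = m * synth_hop + k
            n = len(y) - start                   # overlap length for this shift
            a, b = y[start:], seg[:n]
            den = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
            c = sum(p * q for p, q in zip(a, b)) / den if den else 0.0
            if c > best_c:
                best_k, best_c = k, c
        start = m * synth_hop + best_k
        n = len(y) - start
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(n):
            w = (i + 1) / (n + 1)
            y[start + i] = (1 - w) * y[start + i] + w * seg[i]
        y.extend(seg[n:])
        m += 1
    return y
```

With `search=0` the shift search disappears and the routine reduces to plain OLA, which is the difference between the two classic algorithms named on the slide.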

Non-Linear Time Compression
- Algorithm 1: Pause removal plus TC
  - Energy and Zero Crossing Rate analysis
  - Leave pauses of 150 ms or less untouched
  - Shorten pauses longer than 150 ms to 150 ms
  - Apply SOLA algorithm
  - PR shortens speech by 10-25%
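The pause-removal step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' detector: the 10 ms frame size and both thresholds are hypothetical, and a frame counts as silence only when both its energy and its zero crossing rate are low.

```python
def frame_features(x, frame_len):
    """Return (mean energy, zero crossing rate) for each full frame of x."""
    feats = []
    for s in range(0, len(x) - frame_len + 1, frame_len):
        f = x[s:s + frame_len]
        energy = sum(v * v for v in f) / frame_len
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def remove_pauses(x, sample_rate, frame_ms=10, max_pause_ms=150,
                  energy_thresh=1e-4, zcr_thresh=0.25):
    """Shorten every silent run longer than max_pause_ms to max_pause_ms.

    frame_ms, energy_thresh and zcr_thresh are assumed values.
    """
    frame_len = sample_rate * frame_ms // 1000
    keep_frames = max_pause_ms // frame_ms
    out, run = [], 0
    feats = frame_features(x, frame_len)
    for i, (energy, zcr) in enumerate(feats):
        silent = energy < energy_thresh and zcr < zcr_thresh
        run = run + 1 if silent else 0
        # Keep speech frames, and the first keep_frames frames of each pause.
        if not silent or run <= keep_frames:
            out.extend(x[i * frame_len:(i + 1) * frame_len])
    out.extend(x[len(feats) * frame_len:])       # trailing partial frame
    return out
```

If the speeds quoted in the study are overall speed-ups (an assumption), then after pause removal has cut a fraction s of the audio, the SOLA stage only needs a rate of R(1 - s) to reach an overall speed-up of R. For R = 1.5 and s between 0.10 and 0.25, that is a linear rate between 1.35 and 1.125.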

Non-Linear Time Compression (cont.)
- Algorithm 2: Adaptive TC
  - Mimics people when talking fast
  - Pauses and silences are compressed the most
  - Stressed vowels are compressed the least
  - Consonants are compressed more than vowels
  - Consonants are compressed based on neighboring vowels
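One way to picture the ordering above is as a rate-allocation problem: each segment gets its own local speed-up, larger for pauses and smaller for stressed vowels, chosen so that the whole utterance still hits the requested overall rate. The sketch below is a hypothetical illustration, not the adaptive algorithm evaluated in the study. The segment labels and weights are assumptions, only their ordering comes from the slide, and the dependence of consonants on neighboring vowels is not modeled.

```python
# Assumed relative compressibility per segment class (illustrative values).
WEIGHTS = {"pause": 4.0, "consonant": 2.0, "vowel": 1.0, "stressed_vowel": 0.5}

def allocate_rates(segments, target_rate):
    """Return one local rate per (kind, duration_s) segment.

    Rates have the form 1 + scale * weight; `scale` is found by bisection so
    that the total duration shrinks by exactly `target_rate`.
    """
    if target_rate < 1.0:
        raise ValueError("target_rate must be >= 1")
    total = sum(d for _, d in segments)
    goal = total / target_rate

    def compressed(scale):
        return sum(d / (1.0 + scale * WEIGHTS[k]) for k, d in segments)

    lo, hi = 0.0, 1.0
    while compressed(hi) > goal:      # grow the bracket until it contains goal
        hi *= 2.0
    for _ in range(60):               # bisection on the monotone function
        mid = (lo + hi) / 2.0
        if compressed(mid) > goal:
            lo = mid
        else:
            hi = mid
    return [1.0 + hi * WEIGHTS[k] for k, _ in segments]
```

Each segment would then be passed to a linear compressor such as SOLA at its own local rate.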

System Implications
- Computational complexity
  - Adaptive TC is 10x more costly than Linear TC
- Complexity in client-server implementation
  - Buffer management required for non-linear TC
- Audio-video synchronization quality

User Study Method

User Study Goals
- Highest intelligible speed
- Comprehension
- Subjective preference
- Sustainable speed

Experiment Method
- 24 subjects
- 4 tasks for each subject
- 3 time compression algorithms
  - Linear TC using SOLA (Linear)
  - Pause removal plus Linear TC (PR-Lin)
  - Adaptive TC (Adapt)
- Each test takes approximately 30 minutes

Highest Intelligible Speed Task
- 3 clips from technical talks
- Find the highest speed at which most of the words are understandable

Comprehension Task
- 3 clips at 1.5x and 3 clips at 2.5x
- Clips from TOEFL listening test
- Answer 4 multiple choice questions

Subjective Preference Task
- 3 pairs of clips at 1.5x
- 3 pairs of clips at 2.5x
- Each pair contains the same clip compressed with 2 of the 3 TC algorithms
- Indicate preference on a 3-point scale

Sustainable Speed Task
- 3 clips, each 8 minutes long
- Clips from a CD audio book
- Find the maximum comfortable speed
- Write a 4-5 sentence summary at the end

User Study Results

Highest Intelligible Speed Task
- PR-Lin is significantly better than Adapt (p<.01)

Comprehension Task
- Adapt is better than PR-Lin (p=.083) at 2.5x

Preference Task at 1.5x
- Slight preference for PR-Lin (p=.093)

1.5x                 Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          6              5             13
PR-Lin vs. Adapt          13              5              6
Adapt vs. Linear           8              8              8

Preference Task at 2.5x
- PR-Lin and Adapt do significantly better than Linear

2.5x                 Prefer Former   Prefer None   Prefer Latter
Linear vs. PR-Lin          2              8             14
PR-Lin vs. Adapt           4              9             11
Adapt vs. Linear          21              3              0
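For readers who want to probe paired-preference counts like these, an exact two-sided sign test is one simple option. The slides do not say which test produced the reported p-values, so this sketch is not the authors' analysis and its results need not match theirs. It drops the "Prefer None" answers, which is a common convention but an assumption here. The counts are read from the preference tables as Former / None / Latter with 24 subjects per row.

```python
from math import comb

def sign_test_p(prefer_a, prefer_b):
    """Exact two-sided sign test p-value for paired preference counts."""
    n = prefer_a + prefer_b
    if n == 0:
        return 1.0
    k = min(prefer_a, prefer_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# (prefer former, prefer none, prefer latter) per comparison
COUNTS = {
    "1.5x": {"Linear vs. PR-Lin": (6, 5, 13),
             "PR-Lin vs. Adapt": (13, 5, 6),
             "Adapt vs. Linear": (8, 8, 8)},
    "2.5x": {"Linear vs. PR-Lin": (2, 8, 14),
             "PR-Lin vs. Adapt": (4, 9, 11),
             "Adapt vs. Linear": (21, 3, 0)},
}

if __name__ == "__main__":
    for speed, rows in COUNTS.items():
        for pair, (former, _none, latter) in rows.items():
            print(f"{speed} {pair}: p = {sign_test_p(former, latter):.4f}")
```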

Sustainable Speed Task

Conclusions

Previous Works
- Mach1 (Covell et al., ICASSP 98)
  - Comprehension and preference tasks
  - Comparing Linear and Mach1 (Adapt) at x
  - Comprehension scores 17% better w/ Mach1
  - 95% prefer Mach1 to Linear
  - No data on < 2.0x
- Other works (Harrigan, Omoigui, Li, Foulke)
  - x is the sustainable listening speed

Conclusions
- Trade-off in TC algorithms is task-related
  - Listening: Linear TC is sufficient
  - Fast forwarding: Non-linear TC is more suitable
- Adapt TC is close to the way people talk fast
  - Limit lies in human listening and comprehension