Subjective Sound Quality Assessment of Mobile Phones for Production Support
Thorsten Drascher, Martin Schultes
Workshop on Wideband Speech Quality in Terminals and Networks


Subjective Sound Quality Assessment of Mobile Phones for Production Support
Thorsten Drascher, Martin Schultes
Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 8th and 9th June 2004, Mainz, Germany

© Siemens, 2004 — Subjective Audio Quality Assessment, June 2004

Introduction
The goal of the tests presented in this talk is to ensure customer acceptance of audio quality through statistically validated data.
Customers rate the sum of all speech processing: echo cancellation, noise reduction, automatic gain control, …
This conflicts with the ancillary conditions of production:
 Short time (no waste of production capacity)
 Low cost
Objective measurements correlate only to a limited degree with subjective sound perception.
Subjective audio quality tests are therefore executed before the release for unrestricted serial production.
Former results were often not reliable, due to friendly users and too few tests to guarantee statistical validity.

Presentation Outline
Test Design
 Laboratory or in-situ tests?
 Laboratory test design
 Conversational task
 Statistical reliability
First Test Presentation
 Overall Quality
 Most Annoying Properties
Discussion & Outlook

Test Design
Typical conversation situations for a mobile phone:
 Single talk
 Double talk
Two different test subject groups:
 Naive users
 Expert users
Different recommended test methods:
 Absolute category rating
 Comparison category rating
 Degradation category rating
 Threshold method
 Quantal-response detectability tests

Test Design (ctd.)
Naive user tests will be carried out as single talk and double talk.
 Naive user tests: absolute category rating of the overall quality, and collection of the most annoying properties.
 Evaluation: are the results satisfying? If yes, release for unrestricted serial production; if no, continue with trained user tests.
 Trained user tests: comparison category rating of different parameter sets on the most annoying properties (with further parameter alteration in parallel), followed by re-evaluation.

Laboratory or in-situ tests?
In-situ:
 + Nothing is more real than reality
 + More interesting for test subjects
 − Large effort
 − Difficult to control
 − Time-intensive
Laboratory:
 + Good control
 + Small effort
 + Reproducible conditions
 + Easy control of environmental conditions
 − Some effects have to be neglected
 − Psychological influence of the laboratory environment on test results
Laboratory tests are much more cost-effective than in-situ tests. But how closely can reality be rebuilt in a laboratory? There should be at least one comparison between laboratory and in-situ tests.

Laboratory test design
 Terminal A: fixed network, handheld, specified, silent office environment (e.g. according to ITU-T P.800)
 Terminal B: mobile phone or car kit under test
 Reproducible playback of previously recorded environmental noises as a diffuse sound field: car noise, babble noise, silence
 Single talk and double talk tests are carried out at different noise levels
 Roles within the tests are interchanged
 Rating interview with both test subjects

Conversational Tasks
Properties of short conversation test scenarios (SCTs) [S. Möller, 2000]:
 Typical conversation tasks, e.g. ordering a pizza or booking a flight
 A conversation lasts about 2½ min, extended to about 4 min by the following interview
 SCTs are judged as natural by the test subjects
Formal structure (caller / called person): greeting – enquiry – question – precision – offer – order – information – treating of the order – discussion of open questions – farewell

Statistical Reliability
 The moments of interest are the mean and the error of the mean.
 The error of the mean is a function of the standard deviation.
 Worst-case approximation: the error of the mean is maximised if the highest and lowest possible ratings are each given with a relative frequency of 50 %. An error of the mean below 10 % of the rating interval width is then guaranteed after 30 tests.
 30 tests of 4 min each result in an overall test duration of 2 hours.
 Tests with 3 different background noises at 3 different levels, plus a silent environment, over 2 different networks can be carried out in 40 h (1 week).
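The worst-case bound quoted above can be verified with a short sketch (the 30-test figure and the 10 % target come from the slide; the function name is illustrative):

```python
import math

def worst_case_sem(n, interval_width=1.0):
    """Worst-case standard error of the mean for n ratings on a
    bounded scale: the error is maximised when the ratings split
    50/50 between the two scale extremes, in which case the
    standard deviation equals half the rating interval width."""
    sigma = interval_width / 2.0
    return sigma / math.sqrt(n)

# After 30 tests the worst-case error of the mean is about 9.1 %
# of the rating interval width, i.e. below the 10 % target.
print(f"{worst_case_sem(30):.3f}")  # prints 0.091
```

This also shows the slide's figure is slightly conservative: the bound dips under 10 % a few tests earlier, but 30 gives a comfortable margin.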

First Test Presentation
 Internal fair at the beginning of May; not representative, just “testing the test”.
 Background: babble noise at ~70 dB(A).
 Terminal under test: known to be too silent (not known to the test subjects or the experimenter); development already concluded.
 Interview only for the mobile terminal user (19 subjects).
 Naive user tests with two questions:
  What is your opinion of the overall quality of the connection you have just been using?
  What were the most annoying properties of the connection you have just been using?
 Results given as numbers on a scale from 0 to 120, and as predefined answers without technical terms (adding new ones was possible).

Overall Quality
 The numbers were invisible to the test subjects; the scale showed only the labels Bad – Poor – Fair – Good – Excellent.
 Average overall rating: 74 ± 4, i.e. (62 ± 3) % of the rating interval width.
 The start value 60 had the highest relative frequency.
 To compare the internal scale with standard MOS ratings, a normalisation is required.

Overall Quality — MOS_c
MOS_c: MOS rating intervals with the scale labels in the centre of the intervals.
 Extreme value 5 rated 5 times (> 25 %)
 Extreme value 1 never assigned
 Average overall rating: 3.8 ± 0.2, i.e. (70 ± 5) % of the rating interval width

Overall Quality — MOS_l
MOS_l: MOS rating intervals with the scale labels at the lower end of the intervals.
 The complete range is used
 Extreme value 5 rated twice
 Average overall rating: 3.3 ± 0.2, i.e. (58 ± 5) % of the rating interval width
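The slides do not spell out the exact 0–120 → MOS mapping, so the following sketch is one plausible reading, assuming the five labels sit at 0, 30, 60, 90 and 120 on the internal scale (both the label positions and the binning are assumptions, not taken from the talk):

```python
def to_mos_centered(x):
    """MOS_c reading: labels in the centre of the rating intervals,
    i.e. each internal rating is rounded to the nearest assumed
    label position (0, 30, 60, 90, 120) and numbered 1..5."""
    return min(5, max(1, round(x / 30) + 1))

def to_mos_lower(x):
    """MOS_l reading: labels at the lower end of the intervals; a
    rating only counts for a label once it has reached it, so the
    top grade is much harder to obtain."""
    return min(5, int(x // 30) + 1)

# The same internal rating maps to a lower (or equal) MOS under the
# lower-end reading than under the centred one:
print(to_mos_centered(110), to_mos_lower(110))  # prints 5 4
```

Under these assumptions the behaviour is at least qualitatively consistent with the slides: the extreme value 5 is reached far more easily under MOS_c than under MOS_l, and the MOS_l average comes out lower.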

Most Annoying Properties
Predefined answers:
 My partner's voice was too silent
 Loud noise during the call
 I heard my own voice as an echo
 My partner's voice was reverberant
 My partner's voice sounded robotic
 I heard artificial sounds
 My partner's voice sounded modulated *
 My partner's voice was too deep *
 I heard my partner's voice as an echo
 My partner's voice was too loud
*) Properties added during the test
 About 50 % of the test subjects regarded the partner's voice as too silent (known beforehand, but not to the subjects or the experimenter).
 7 of 8 test subjects regarded the environmental noise as an annoying property.
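Tallying such interview answers is a simple frequency count over multi-choice responses; a minimal sketch with hypothetical data (the answer strings and counts below are illustrative, not the real results):

```python
from collections import Counter

# Hypothetical interview records: each subject may tick several
# predefined annoying properties, or add a new one.
answers = [
    ["partner too silent", "loud noise during the call"],
    ["partner too silent"],
    ["own voice as echo", "loud noise during the call"],
    ["partner too silent", "robotic voice"],
]

# Flatten all mentions and count how often each property occurs.
tally = Counter(prop for subject in answers for prop in subject)
for prop, count in tally.most_common():
    print(f"{prop}: {count}/{len(answers)} subjects")
```

Reporting counts relative to the number of interviewed subjects (rather than the number of mentions) matches how the slide states its results, e.g. "about 50 % of test subjects".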

Discussion & Outlook
 A short-duration, intensive subjective test method and a first test were presented.
 After the ratings of 19 test subjects:
  the error of the mean overall quality was assessed at about 3 % of the rating interval width
  the terminal being too silent was statistically confirmed
 Questions and predefined answers have to be chosen very carefully.
 Scale rating normalisation to MOS is a non-trivial problem.
 Next steps:
  Comparison of laboratory and in-situ tests
  Tests of terminals and car kits currently in the development stage
