Subjective Sound Quality Assessment of Mobile Phones for Production Support
Thorsten Drascher, Martin Schultes
Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 8th and 9th June 2004, Mainz, Germany
© Siemens, 2004. Subjective Audio Quality Assessment, June 2004.

Introduction
The goal of the tests presented in this talk is to ensure customer acceptance of audio quality, backed by statistically validated data.
- Customers rate the sum of echo cancellation, noise reduction, automatic gain control, ...
- Objective measurements show only limited correlation with subjective sound perception.
- Subjective audio quality tests are therefore executed before the release for unrestricted serial production.
- Former results were often unreliable due to friendly users and too few tests to guarantee statistical significance.
- Conflicting ancillary conditions: short testing time (no waste of production capacity) and low cost.
Presentation Outline
- Introduction
- Test Design
  - Laboratory or in-situ tests?
  - Laboratory test design
  - Conversational task
  - Statistical reliability
- First Test Presentation
  - Overall Quality
  - Most Annoying Properties
- Discussion & Outlook
Test Design
Typical conversation situations for a mobile phone:
- Single talk
- Double talk
Two different test subject groups:
- Naive users
- Expert users
Recommended test methods:
- Absolute category rating (ACR)
- Comparison category rating (CCR)
- Degradation category rating (DCR)
- Threshold method
- Quantal-response detectability tests
Test Design (ctd.)
Naive user tests are carried out as single talk and double talk.
Iterative procedure:
1. Naive user tests: absolute category rating of overall quality, plus collection of the most annoying properties.
2. Evaluation.
3. Trained user tests: comparison category rating of different parameter sets with respect to the most annoying properties (further parameter alteration in parallel).
4. Satisfying results? If yes, release for unrestricted serial production; if no, iterate again.
Laboratory or in-situ tests?
In-situ:
+ Nothing is more real than reality
+ More interesting for test persons
- Large effort
- Difficult to control
- Time intensive
Laboratory:
+ Good control
+ Small effort
+ Reproducible conditions
+ Easy control of environmental conditions
- Some effects have to be neglected
- Psychological influence of the laboratory environment on test results
Laboratory tests are much more cost-effective than in-situ tests. But: how closely can reality be rebuilt in the laboratory? There should be at least one comparison between laboratory and in-situ tests.
Laboratory test design
- Terminal A: fixed network, handheld, specified, silent office environment (e.g. according to ITU-T P.800)
- Terminal B: mobile or car kit under test
- Reproducible playback of previously recorded environmental noises as a diffuse sound field: car noise, babble noise, silence
- Single and double talk tests are carried out at different noise levels
- Roles within the tests are interchanged
- Rating interview with both test subjects
Conversational Tasks
Properties of short conversation test scenarios (SCTs):
- Typical conversation tasks: ordering a pizza, booking a flight
- A conversation lasts about 2 1/2 min, extended to about 4 min by the following interview
- SCTs are judged as natural by test subjects
Formal structure between caller and called person [S. Möller, 2000]:
Greeting, Enquiry, Question, Precision, Offer, Order, Information, Treating of the order, Discussion of open questions, Farewell
Statistical Reliability
- Moments of interest are the mean and the error of the mean.
- The error of the mean is a function of the standard deviation.
- Worst-case approximation: the error of the mean is maximised if the highest and the lowest rating are each given with a relative frequency of 50%.
- Even in this worst case, an error of the mean below 10% of the rating interval width is guaranteed after 30 tests.
- 30 tests of 4 min each result in an overall test duration of 2 hours.
- Tests with 3 different background noises at 3 different levels plus a silent environment, over 2 different networks, can be carried out in 40 h (1 week).
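The worst-case bound above can be checked in a few lines of Python. This is only a sketch: the scale width (0 to 120) and the number of tests are taken from the talk; the statistics are the standard error-of-the-mean formula.

```python
import math

SCALE_WIDTH = 120.0   # internal rating scale runs from 0 to 120
N_TESTS = 30          # number of ratings the slide claims is sufficient

# Worst case from the slide: half the subjects give the lowest rating (0),
# half give the highest (120). The mean is then 60 and every rating
# deviates from it by 60, so the standard deviation is half the width.
worst_case_std = SCALE_WIDTH / 2

# Standard error of the mean for n independent ratings: sigma / sqrt(n).
sem = worst_case_std / math.sqrt(N_TESTS)

print(f"worst-case error of the mean: {sem:.2f} points "
      f"= {100 * sem / SCALE_WIDTH:.1f}% of the rating interval width")
```

With 30 tests this gives 60 / sqrt(30), roughly 11 points or about 9% of the interval width, which is indeed below the 10% bound stated on the slide.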
First Test Presentation
- Internal fair at the beginning of May; non-representative, just "testing the test"
- Background: babble noise at ~70 dB(A)
- Terminal under test: known to be too quiet (not known to the test subjects or the experimenter)
- Concluding interview only with the mobile terminal user (19 subjects)
- Naive user tests with two questions:
  - What is your opinion of the overall quality of the connection you have just been using?
  - What were the most annoying properties of the connection you have just been using?
- Results given as:
  - Numbers on a scale from 0 to 120
  - Predefined answers without technical terms (adding new ones was possible)
Overall Quality
- Numbers invisible to the test subjects; scale labels from 0 to 120: Bad, Poor, Fair, Good, Excellent
- Average overall rating: 74 ± 4, i.e. (62 ± 3)% of the rating interval width
- The slider's start value of 60 occurred with the highest relative frequency
- To compare the internal scale with standard MOS ratings, a normalisation is required

Ratings per test subject (TS):
TS:      1    2   3   4   5   6   7   8   9  10  11  12   13  14  15  16  17  18  19
Rating: 38  103  95  60   –  82  81  60  67  72  90  74  103  73  93  38  60  82  78
Overall Quality (MOS_c)
- MOS_c: MOS rating intervals with the scale labels in the centre of each interval
- Extreme value 5 was assigned 5 times (>25%)
- Extreme value 1 was never assigned
- Average overall rating: 3.8 ± 0.2, i.e. (70 ± 5)% of the rating interval width

Ratings per test subject (TS) with derived MOS_c:
TS:      1    2   3   4   5   6   7   8   9  10  11  12   13  14  15  16  17  18  19
Rating: 38  103  95  60   –  82  81  60  67  72  90  74  103  73  93  38  60  82  78
MOS_c:   2    5   5   3   3   4   4   3   3   4   5   3    5   4   5   2   3   4   4
Overall Quality (MOS_l)
- MOS_l: MOS rating intervals with the scale labels at the lower end
- The complete range is used
- Extreme value 5 was assigned twice
- Average overall rating: 3.3 ± 0.2, i.e. (58 ± 5)% of the rating interval width

Ratings per test subject (TS) with derived MOS_l:
TS:      1    2   3   4   5   6   7   8   9  10  11  12   13  14  15  16  17  18  19
Rating: 38  103  95  60   –  82  81  60  67  72  90  74  103  73  93  38  60  82  78
MOS_l:   1    5   4   3   3   4   4   3   3   3   4   3    5   3   4   1   3   4   3
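The slides do not spell out how a slider position is converted to a MOS category under the two label placements. The sketch below assumes equal spacing; the function names and interval boundaries are assumptions, not taken from the talk. It is only meant to illustrate why placing the labels at the lower end (MOS_l) yields systematically lower scores than placing them in the interval centres (MOS_c), which is the direction of the 3.3 vs 3.8 averages reported above.

```python
SCALE_MAX = 120   # internal slider scale from the talk
CATEGORIES = 5    # Bad, Poor, Fair, Good, Excellent

def mos_centered(rating: float) -> int:
    """Labels centred in 5 equal intervals of width 24:
    category boundaries at 24, 48, 72, 96 (assumed mapping)."""
    return min(CATEGORIES, int(rating // (SCALE_MAX / CATEGORIES)) + 1)

def mos_lower(rating: float) -> int:
    """Labels at the lower end of each interval, i.e. at 0, 30, 60, 90, 120:
    a rating only reaches category k at the label position (assumed mapping)."""
    return min(CATEGORIES, int(rating // (SCALE_MAX / (CATEGORIES - 1))) + 1)

if __name__ == "__main__":
    for r in (38, 74, 95, 103):
        print(r, mos_centered(r), mos_lower(r))
    # Because the lower-end intervals are wider (30 vs 24 points), every
    # rating maps to a MOS_l that is at most its MOS_c, so averaging the
    # same slider data under the lower-end convention gives a lower MOS.
```

For example, a slider value of 74 maps to 4 under the centred convention but only 3 under the lower-end one, matching the downward shift seen in the data.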
Most Annoying Properties
Predefined answers (properties marked * were added during the test):
- My partner's voice was too quiet
- Loud noise during the call
- I heard my own voice as an echo
- My partner's voice was reverberant
- My partner's voice sounded robotic
- I heard artificial sounds
- My partner's voice sounded modulated *
- My partner's voice was too deep *
- I heard my partner's voice as an echo
- My partner's voice was too loud
[Bar chart of mention counts; the largest counts were 9 and 8, with several properties mentioned once each.]
- About 50% of the test subjects regarded the partner's voice as too quiet (known beforehand, but not to the subjects or the experimenter)
- 7 of 8 test subjects regarded the environmental noise as an annoying property
Discussion & Outlook
- A time-efficient, intensive subjective test method and a first test were presented.
- After ratings from 19 test subjects, the error of the mean overall quality was about 3% of the rating interval width: a statistical confirmation that the terminal is too quiet.
- Questions and predefined answers have to be chosen very carefully.
- Normalising scale ratings to MOS is a non-trivial problem.
Next steps:
- Comparison of laboratory and in-situ tests
- Tests of terminals and car kits currently in development