7-Speech Quality Assessment: Quality Levels, Subjective Tests, Objective Tests, Intelligibility, Naturalness.

Presentation transcript:


Quality Levels
–Synthetic quality: below 4.8 kbps
–Communication quality: 4.8 to 13 kbps
–Toll quality: 13 to 64 kbps
–Broadcast quality: above 64 kbps

Test Types
–Subjective, intelligibility: DRT, MRT
–Subjective, naturalness: MOS, DAM
–Objective, intelligibility: none (future ASR systems)
–Objective, naturalness: AI, global SNR, segmental SNR, frequency-weighted segmental SNR, Itakura measure, WSSM

First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
–The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the first consonant
–The first consonant should have specific properties
–Examples: hop/fop, than/dan
Modified Rhyme Test (MRT)
–The listener chooses among several CVC words that differ in the first consonant
–Example: cat, bat, rat, mat, fat, sat

First Class (Cont’d): Subjective Intelligibility Tests
–DRT is widely used and considered reliable
–In this test the listener hears each speech item only once

Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
–MOS is widely used and considered reliable
–In this test the listener may hear the speech repeatedly
Diagnostic Acceptability Measure (DAM)
–This test is very complex

Mean Opinion Score (MOS)
MOS scores are assigned as follows (score: speech quality):
–1: Not acceptable
–2: Weak
–3: Medium
–4: Good
–5: Excellent
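As a rough illustration of how a MOS value is obtained, the sketch below averages listener ratings. It assumes the usual 1 to 5 numbering for the five quality labels; the function name and the sample votes are hypothetical and not taken from the slides.

```python
from statistics import mean

# Quality labels from the slide; the 1-5 numbering is an assumption.
MOS_LABELS = {1: "Not acceptable", 2: "Weak", 3: "Medium", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in MOS_LABELS:
            raise ValueError(f"rating {r!r} is outside the 1-5 scale")
    return mean(ratings)


ratings = [4, 5, 3, 4, 4]  # hypothetical votes from five listeners
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} (closest label: {MOS_LABELS[round(mos)]})")
```

In practice the votes of many listeners over many utterances are pooled before averaging; the sketch only shows the final arithmetic.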

Diagnostic Acceptability Measure (DAM)
–This test is very complex
–It scores 19 different parameters
–These parameters fall into three main groups: signal quality, background quality, and total quality

Objective Tests
–These tests cannot be used for intelligibility, because a system cannot judge how intelligible speech is
–Objective tests can only be used to assess speech naturalness

Objective Tests (Cont’d)
Articulation Index (AI)
Signal-to-Noise Ratio (SNR)
–Global (classic) SNR
–Segmental SNR
–Frequency-weighted segmental SNR

Articulation Index (AI)
–AI assumes that distortions in different frequency bands are independent, and it measures signal quality band by band
–In each band it determines the percentage of the signal that is perceptible to the listener

Articulation Index (Cont’d)
A signal is perceptible to the listener when it is:
–1. Above the threshold of human hearing
–2. Below the human threshold of pain
–3. Above the masking noise level
–In any given case, either condition 1 or condition 3 is the one that prevails

Articulation Index (Cont’d)
–In AI, the SNR is measured separately in each band
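The band-by-band idea can be sketched as follows. This is only an illustration under stated assumptions: it uses a commonly quoted simplification in which each band's SNR is shifted by 12 dB and mapped onto a 30 dB range to give the audible fraction, and it uses equal band weights by default. The offset, the range, the weights, and the function name are assumptions, not values given in the slides.

```python
def articulation_index(band_snr_db, band_weights=None):
    """Weighted average of per-band audible fractions, each in [0, 1]."""
    k = len(band_snr_db)
    if k == 0:
        raise ValueError("at least one band is required")
    if band_weights is None:
        band_weights = [1.0] * k  # assumption: all bands equally important
    if len(band_weights) != k:
        raise ValueError("one weight per band is required")

    total = 0.0
    for snr, weight in zip(band_snr_db, band_weights):
        # Assumed mapping: -12 dB or less -> 0, +18 dB or more -> 1.
        audible_fraction = min(max((snr + 12.0) / 30.0, 0.0), 1.0)
        total += weight * audible_fraction
    return total / sum(band_weights)


# Hypothetical SNR values (dB) measured separately in four bands.
print(articulation_index([18.0, 3.0, -12.0, 30.0]))
```

Because each band is evaluated in isolation, a very high SNR in one band cannot compensate for an inaudible band beyond that band's own weight.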

Signal-to-Noise Ratio (SNR)
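A minimal sketch of the global (classic) SNR, assuming the usual definition: ten times the base-10 logarithm of the clean-signal energy divided by the energy of the difference between the clean and processed signals, computed over the whole utterance. The function and variable names are hypothetical.

```python
import math


def global_snr_db(clean, processed):
    """Global SNR in dB over the whole signal (assumed standard definition)."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise_energy == 0.0:
        return math.inf   # processed signal is identical to the clean one
    if signal_energy == 0.0:
        return -math.inf  # nothing but error
    return 10.0 * math.log10(signal_energy / noise_energy)


clean = [1.0, -1.0, 1.0, -1.0]      # hypothetical clean samples
processed = [0.9, -0.9, 0.9, -0.9]  # hypothetical coded/noisy samples
print(f"global SNR = {global_snr_db(clean, processed):.2f} dB")
```

Both signals must be time-aligned and equally scaled for this number to be meaningful, since any delay or gain mismatch is counted as noise.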

Segmental SNR
–The SNR is computed for each frame (the j-th frame SNR) and averaged over all frames
–M: number of frames
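The frame-wise averaging can be sketched as below. The non-overlapping framing, the small energy floor, and the clamping of each frame SNR to a fixed dB range are assumptions added for illustration (clamping is a common practice to keep silent frames from dominating); none of these values come from the slides.

```python
import math


def frame_snr_db(clean, processed, floor=1e-12):
    """SNR of a single frame in dB; `floor` avoids log of zero (assumption)."""
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((s - p) ** 2 for s, p in zip(clean, processed))
    return 10.0 * math.log10(max(signal_energy, floor) / max(noise_energy, floor))


def segmental_snr_db(clean, processed, frame_len, lo=-10.0, hi=35.0):
    """Average of the per-frame SNRs over M non-overlapping frames."""
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    m = len(clean) // frame_len  # M: number of complete frames
    if m == 0:
        raise ValueError("signal is shorter than one frame")
    total = 0.0
    for j in range(m):
        start, stop = j * frame_len, (j + 1) * frame_len
        snr_j = frame_snr_db(clean[start:stop], processed[start:stop])
        total += min(max(snr_j, lo), hi)  # assumed clamping range in dB
    return total / m


clean = [1.0] * 4 + [0.1] * 4      # loud frame followed by a quiet frame
processed = [0.9] * 4 + [0.0] * 4  # hypothetical processed signal
print(f"segmental SNR = {segmental_snr_db(clean, processed, frame_len=4):.2f} dB")
```

Unlike the global SNR, every frame contributes equally here, so errors in quiet passages are not hidden by loud ones.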

Frequency-Weighted Segmental SNR
–K: number of frequency bands
–M: number of frames
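One common form of the frequency-weighted segmental SNR is sketched below: within each of the M frames, the SNR of each of the K bands is weighted and normalized by the sum of the weights, and the result is averaged over frames. Several variants exist, so this exact form, the precomputed band energies, the weights, and all names are assumptions for illustration.

```python
import math


def fw_segmental_snr_db(signal_energy, noise_energy, weights, floor=1e-12):
    """signal_energy[j][k], noise_energy[j][k]: energy in band k of frame j."""
    m = len(signal_energy)  # M: number of frames
    k = len(weights)        # K: number of frequency bands
    if m == 0 or k == 0 or len(noise_energy) != m:
        raise ValueError("inconsistent frame counts or empty input")
    weight_sum = sum(weights)
    total = 0.0
    for sig_frame, noise_frame in zip(signal_energy, noise_energy):
        if len(sig_frame) != k or len(noise_frame) != k:
            raise ValueError("each frame needs one energy value per band")
        weighted = sum(
            w * 10.0 * math.log10(max(s, floor) / max(n, floor))
            for w, s, n in zip(weights, sig_frame, noise_frame)
        )
        total += weighted / weight_sum
    return total / m


# Hypothetical band energies for 2 frames and 2 bands.
signal = [[100.0, 10.0], [100.0, 10.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
weights = [3.0, 1.0]  # assumption: the first band matters three times as much
print(f"FW segmental SNR = {fw_segmental_snr_db(signal, noise, weights):.2f} dB")
```

The band energies would normally come from a filter bank or short-time spectrum; they are passed in directly here to keep the sketch free of signal-processing dependencies.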

Deller Formula

Other Formulas:

Itakura Measure
–Based on the spectral envelope
–Uses the all-pole (AR) model

Itakura Measure (Cont’d)
This measure is based on the spectral difference between the original signal and the signal under assessment. Coefficients involved:
–Autoregressive coefficients
–Reflection coefficients
–Autocorrelation coefficients

Itakura Measure (Cont’d)
–m: index of the frame
–l: index of the coefficient

Itakura Measure (Cont’d)
–The quantity in the formula is the l-th parameter of the frame that contains the m-th sample
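To make the all-pole comparison concrete, the sketch below computes a distance for a single frame, assuming the widely used log-ratio form of the Itakura distance: the prediction-error energy obtained on the reference frame with the test frame's AR coefficients, divided by the energy obtained with the reference frame's own coefficients. This form, the model order, the autocorrelation method with Levinson-Durbin recursion, and all names are assumptions and may differ from the formula on the slide.

```python
import math


def autocorr(x, order):
    """Autocorrelation coefficients r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """AR (prediction-error filter) coefficients [1, a1, ..., ap] from r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def quad_form(a, r):
    """a^T R a, where R is the Toeplitz matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def itakura_distance(reference, test, order=2):
    """Assumed log-ratio Itakura distance between two frames (always >= 0)."""
    r_ref = autocorr(reference, order)
    a_ref = levinson_durbin(r_ref, order)
    a_test = levinson_durbin(autocorr(test, order), order)
    return math.log(quad_form(a_test, r_ref) / quad_form(a_ref, r_ref))


# Hypothetical frames: two different mixtures of sinusoids.
ref = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(200)]
tst = [math.sin(0.9 * n) + 0.5 * math.sin(2.3 * n) for n in range(200)]
print(itakura_distance(ref, ref), itakura_distance(ref, tst))
```

The distance is zero when both frames yield the same AR coefficients and grows as their spectral envelopes diverge; a full measure would average it over all frames.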

Weighted Spectral Slope Measure (WSSM)
–The quantity in the formula is the STFT of the k-th band of the frame that contains the m-th sample