7 - Speech Quality Assessment: Quality Levels, Subjective Tests, Objective Tests, Intelligibility, Naturalness

Presentation transcript:

Quality Levels
– Synthetic quality: below 4.8 kbps
– Communication quality: 4.8 to 13 kbps
– Toll quality: 13 to 64 kbps
– Broadcast quality: above 64 kbps
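The bit-rate ranges above can be sketched as a small lookup. This is an illustration only: the function name `quality_level` is hypothetical, and because the slide does not say which class the shared boundaries (13 and 64 kbps) belong to, assigning both to toll quality is an assumption.

```python
def quality_level(bit_rate_kbps: float) -> str:
    """Map a coder bit rate (kbps) to the quality level named on the slide.

    Assumption: the boundary values 13 and 64 kbps are counted as toll
    quality; the slide leaves the boundaries ambiguous.
    """
    if bit_rate_kbps <= 0:
        raise ValueError("bit rate must be positive")
    if bit_rate_kbps < 4.8:
        return "synthetic"
    if bit_rate_kbps < 13:
        return "communication"
    if bit_rate_kbps <= 64:
        return "toll"
    return "broadcast"


if __name__ == "__main__":
    for rate in (2.4, 4.8, 8, 13, 64, 128):
        print(rate, "kbps ->", quality_level(rate))
```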

Test Types
             Intelligibility              Naturalness
Subjective   DRT, MRT                     MOS, DAM
Objective    None (future ASR systems)    AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM

First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
– The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the initial consonant
– The initial consonants should have specific properties
– Ex. hop - fop, than - dan
Modified Rhyme Test (MRT)
– The listener chooses among several CVC words that differ only in the initial consonant
– Ex. cat, bat, rat, mat, fat, sat

First Class (Cont’d): Subjective Intelligibility Tests
– DRT is widely applicable and considered credible
– In this test the listener hears the speech only once

Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
– MOS is widely applicable and considered credible
– In this test the listener may hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex

Mean Opinion Score (MOS)
MOS scores are assigned as follows:
Score   Speech Quality
1       Not acceptable
2       Weak
3       Medium
4       Good
5       Excellent
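As a rough illustration of how a MOS value is obtained, the sketch below averages listener ratings. It assumes the conventional five-point scale (1 = not acceptable up to 5 = excellent), which matches the five labels on the slide; the names `MOS_LABELS` and `mean_opinion_score` are hypothetical.

```python
# Assumption: the five labels on the slide map to the conventional 1..5 scale.
MOS_LABELS = {
    1: "Not acceptable",
    2: "Weak",
    3: "Medium",
    4: "Good",
    5: "Excellent",
}


def mean_opinion_score(ratings):
    """Return the arithmetic mean of integer listener ratings in 1..5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in MOS_LABELS:
            raise ValueError(f"rating out of range: {rating!r}")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    print(mean_opinion_score([4, 5, 3, 4]))
```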

Diagnostic Acceptability Measure (DAM)
– This test is very complex
– It scores 19 different parameters
– These parameters fall into 3 main groups:
  – Signal quality
  – Background quality
  – Total quality

Objective Tests
– These tests cannot be used for intelligibility, because an automatic system cannot recognize whether speech is intelligible
– Objective tests can only be used for speech naturalness

Objective Tests (Cont’d)
– Articulation Index (AI)
– Signal to Noise Ratio (SNR)
  – Global (classic) SNR
  – Segmental SNR
  – Frequency Weighted Segmental SNR

Articulation Index (AI)
– AI assumes that the distortions in different frequency bands are independent, and measures signal quality separately in each band
– In each band it determines the percentage of the signal that is perceptible to the listener
[Table: AI frequency bands (Hz)]

Articulation Index (Cont’d)
The signal is perceptible to the listener when it is:
– 1. Above the human hearing threshold
– 2. Below the human pain threshold
– 3. Above the masking noise level
– In each case, whichever of conditions 1 and 3 is stricter prevails
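The three conditions can be read as a lower and an upper limit on the signal level in a band. The sketch below is one possible reading, not the slide's own procedure: the function name `is_perceptible` and its parameters are hypothetical, all levels are assumed to be in dB on a common reference, and the lower limit is taken as the larger of the hearing threshold and the masking noise level.

```python
def is_perceptible(level_db, hearing_threshold_db, pain_threshold_db,
                   masking_noise_db):
    """Check the three perceptibility conditions for one frequency band.

    Assumption: conditions 1 and 3 combine into a single lower limit,
    namely the higher of the hearing threshold and the masking noise.
    """
    lower_limit = max(hearing_threshold_db, masking_noise_db)
    return lower_limit < level_db < pain_threshold_db


if __name__ == "__main__":
    # Audible: above both the hearing threshold and the masking noise.
    print(is_perceptible(60, 10, 120, 40))
    # Masked: above the hearing threshold but below the masking noise.
    print(is_perceptible(30, 10, 120, 40))
```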

Articulation Index (Cont’d)
In AI, the SNR is measured separately in each band

Signal to Noise Ratio (SNR)
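The global (classic) SNR is conventionally the ratio of the clean-signal energy to the error energy over the whole utterance, in dB: $\mathrm{SNR} = 10 \log_{10} \big( \sum_n s(n)^2 / \sum_n (s(n) - \hat{s}(n))^2 \big)$. This is the textbook definition rather than a formula recovered from the slide. A minimal sketch, assuming time-aligned sequences of equal length; the name `global_snr_db` is hypothetical.

```python
import math


def global_snr_db(clean, processed):
    """Global SNR in dB between a clean signal and its processed version.

    Assumption: both sequences are time-aligned and of equal length; the
    "noise" is taken to be the sample-wise difference clean - processed.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise_energy == 0:
        return math.inf          # identical signals: no error at all
    if signal_energy == 0:
        return -math.inf         # silent reference with non-zero error
    return 10 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    print(global_snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```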

Segmental SNR
– SNR of the j-th frame
– N: number of frames
– M: frame length
– Usually averaged over “good frames” only
– “Good frames”: frames with an SNR higher than -10 dB; frame SNRs are saturated at +30 dB
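The frame-wise averaging described above can be sketched as follows. This is one reading of the slide, not its exact formula: frames of length M are non-overlapping, frames at or below -10 dB are dropped rather than clamped, and the remaining frame SNRs are capped at +30 dB. The name `segmental_snr_db` and its parameters are hypothetical.

```python
import math


def segmental_snr_db(clean, processed, frame_length,
                     floor_db=-10.0, ceiling_db=30.0):
    """Average the per-frame SNR (dB) over the "good" frames.

    Assumptions: non-overlapping frames, trailing samples that do not fill
    a frame are ignored, frames at or below floor_db are discarded, and
    each kept frame SNR is saturated at ceiling_db.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_length
    good_frame_snrs = []
    for j in range(n_frames):
        start = j * frame_length
        s = clean[start:start + frame_length]
        p = processed[start:start + frame_length]
        signal_energy = sum(x * x for x in s)
        noise_energy = sum((x - y) ** 2 for x, y in zip(s, p))
        if signal_energy == 0:
            continue                      # silent frame: not a good frame
        if noise_energy == 0:
            snr = ceiling_db              # error-free frame saturates
        else:
            snr = 10 * math.log10(signal_energy / noise_energy)
        if snr <= floor_db:
            continue                      # below the good-frame threshold
        good_frame_snrs.append(min(snr, ceiling_db))
    if not good_frame_snrs:
        raise ValueError("no good frames to average over")
    return sum(good_frame_snrs) / len(good_frame_snrs)


if __name__ == "__main__":
    clean = [1.0] * 8
    processed = [0.9] * 4 + [1.0] * 4
    print(segmental_snr_db(clean, processed, frame_length=4))
```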

Frequency Weighted Segmental SNR
– F: number of frequency bands
– N: number of frames
Siemens formula:

Frequency Weighted Segmental SNR Deller Formula

Frequency Weighted Segmental SNR Other Formulas:

The Final Formula
The correct formula for FW-Seg. SNR is thus:

The Final Formula (Cont’d)
Where
– M is the number of frames
– j is the frame index
– k is the frequency band index
– w_{j,k} is the weight of the k-th band of the j-th frame
– E_{s,k} and E_{e,k} are the energies of the k-th band of the signal and the noise, respectively
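The formula itself did not survive in this transcript. One commonly used form that is consistent with the symbols defined above is $\mathrm{fwSNR}_{seg} = \frac{10}{M} \sum_{j} \frac{\sum_{k} w_{j,k} \log_{10}(E_{s,k} / E_{e,k})}{\sum_{k} w_{j,k}}$, with the band energies taken per frame. The sketch below implements that form as an assumption, not as the slide's confirmed formula; the function name and the nested-list input layout are hypothetical.

```python
import math


def fw_segmental_snr_db(signal_energy, noise_energy, weights):
    """Frequency-weighted segmental SNR in dB.

    Each argument is a list of M frames, and each frame is a list of
    per-band values: signal_energy[j][k] is E_s,k of frame j,
    noise_energy[j][k] is E_e,k of frame j, weights[j][k] is w_j,k.
    Assumption: all energies are strictly positive.
    """
    n_frames = len(signal_energy)
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    total = 0.0
    for j in range(n_frames):
        weighted_sum = sum(
            w * 10 * math.log10(es / ee)
            for w, es, ee in zip(weights[j], signal_energy[j],
                                 noise_energy[j])
        )
        total += weighted_sum / sum(weights[j])   # normalize by the weights
    return total / n_frames                       # average over frames


if __name__ == "__main__":
    # One frame, two bands: band SNRs of 20 dB and 10 dB, weights 1 and 3.
    print(fw_segmental_snr_db([[100.0, 10.0]], [[1.0, 1.0]], [[1.0, 3.0]]))
```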

Itakura Measure
– Works on the spectral envelope
– Uses the all-pole (AR) model

Itakura Measure (Cont’d)
– Based on the spectral difference between the original signal and the signal under assessment
– Parameters:
  – Autoregressive coefficients
  – Reflection coefficients
  – Autocorrelation coefficients

Itakura Measure (Cont’d)
– m: index of the frame
– l: index of the coefficient

Itakura Measure (Cont’d)
The l-th parameter of the frame that contains the m-th sample
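The slides' own Itakura formula is not preserved here. As a hedged illustration, the sketch below computes the log-likelihood-ratio form often associated with the Itakura measure: $d = \ln \big( a_t^T R_r a_t / a_r^T R_r a_r \big)$, where $a_r$ and $a_t$ are the AR (prediction-error) coefficient vectors of the reference and test frames and $R_r$ is the autocorrelation matrix of the reference frame. All function names are hypothetical, the choice of this variant is an assumption, and the frames are assumed to be non-silent.

```python
import math


def autocorrelation(frame, order):
    """Autocorrelation lags 0..order of one frame (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for the prediction-error filter [1, a1, ..., ap] from lags r."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # i-th reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a


def quadratic_form(a, r):
    """Compute a^T R a, where R is the Toeplitz matrix R[i][j] = r[|i-j|]."""
    size = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)]
               for i in range(size) for j in range(size))


def itakura_distance(reference_frame, test_frame, order=10):
    """Log-likelihood-ratio distance between two frames (assumed variant)."""
    r_ref = autocorrelation(reference_frame, order)
    r_test = autocorrelation(test_frame, order)
    a_ref = levinson_durbin(r_ref, order)
    a_test = levinson_durbin(r_test, order)
    return math.log(quadratic_form(a_test, r_ref)
                    / quadratic_form(a_ref, r_ref))
```

Because the reference coefficients minimize the quadratic form for the reference autocorrelation, the distance is zero for identical frames and non-negative otherwise.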

Weighted Spectral Slope Measure (WSSM)
The STFT of the k-th band of the frame that contains the m-th sample