7-Speech Quality Assessment
–Quality Levels
–Subjective Tests
–Objective Tests
–Intelligibility, Naturalness
Quality Levels
–Synthetic Quality (below 4.8 kbps)
–Communication Quality (4.8 to 13 kbps)
–Toll Quality (13 to 64 kbps)
–Broadcast Quality (above 64 kbps)
Test Types
             Intelligibility              Naturalness
Subjective   DRT, MRT                     MOS, DAM
Objective    None. Future ASR systems     AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM
First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
–The listener selects between two CVC (consonant-vowel-consonant) words that differ only in the first consonant
–The first consonants should have specific properties
–Ex. hop - fop and than - dan
Modified Rhyme Test (MRT)
–The listener selects among several CVC words that differ only in the first consonant
–Ex. cat, bat, rat, mat, fat, sat
First Class (Cont’d): Subjective Intelligibility Tests
–DRT is widely used and reliable
–In this test, the listener hears the speech only once
Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
–MOS is widely used and reliable
–In this test, the listener can hear the speech many times
Diagnostic Acceptability Measure (DAM)
–This test is very complex
Mean Opinion Score (MOS)
MOS scores are defined as follows:
Score   Speech Quality
1       Not Acceptable
2       Weak
3       Medium
4       Good
5       Excellent
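As a small illustration of how a MOS value is obtained, the sketch below averages listener ratings. It assumes the usual 1 to 5 numbering for the five quality labels; the function names and the sample ratings are hypothetical and not taken from the slides.

```python
# Labels follow the score table; the 1-5 numbering is an assumed convention.
MOS_LABELS = {1: "Not Acceptable", 2: "Weak", 3: "Medium", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average a list of integer listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in MOS_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)


def nearest_label(mos):
    """Map a MOS value to the label of the nearest integer score."""
    return MOS_LABELS[min(5, max(1, int(mos + 0.5)))]


ratings = [4, 5, 3, 4, 4]  # hypothetical ratings from five listeners
mos = mean_opinion_score(ratings)
print(mos, nearest_label(mos))
```

In practice many listeners and many utterances are averaged, since a single listener's rating is a noisy estimate.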
Diagnostic Acceptability Measure (DAM)
–This test is very complex
–The test scores 19 different parameters
–These parameters are divided into 3 main groups: Signal Quality, Background Quality, and Total Quality
Objective Tests
–These tests cannot be used for intelligibility, because an automatic system cannot yet judge whether speech is intelligible
–Objective tests can only be used for speech naturalness
Objective Tests (Cont’d)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
–Global (Classic) SNR
–Segmental SNR
–Frequency Weighted Segmental SNR
Articulation Index (AI)
–AI assumes that the distortions in different frequency bands are independent, and measures signal quality separately in each band
–In each band, it determines the percentage of the signal that is perceptible to the listener
[Table: AI frequency bands, limits in Hz]
Articulation Index (Cont’d)
The signal is perceptible to the listener when it is:
–1- Above the human hearing threshold
–2- Below the human pain threshold
–3- Above the masking noise level
–In each case, one of conditions 1 or 3 prevails (whichever level is higher)
Articulation Index (Cont’d)
In AI, the SNR is measured separately in each band
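One way the per-band SNRs might be combined into a single index is sketched below. It assumes a commonly cited textbook form in which each band's SNR (in dB) is limited to the range 0 to 30 dB, divided by 30, and averaged over the bands. The limits, the function name, and the sample values are assumptions for illustration, not taken from the slides.

```python
def articulation_index(band_snrs_db, max_snr_db=30.0):
    """Average the per-band SNRs after limiting each to [0, max_snr_db] dB.

    band_snrs_db: one SNR value (in dB) per frequency band.
    Returns a value between 0 (nothing perceptible) and 1 (all bands clean).
    """
    if not band_snrs_db:
        raise ValueError("at least one band SNR is required")
    # Assumed clipping: negative SNRs contribute 0, SNRs above the cap contribute 1.
    contributions = [
        min(max(snr, 0.0), max_snr_db) / max_snr_db for snr in band_snrs_db
    ]
    return sum(contributions) / len(contributions)


# Hypothetical SNRs for four bands, in dB.
print(articulation_index([25.0, 12.0, -3.0, 40.0]))
```

Because each band contributes independently, a band that is fully masked lowers the index by a fixed amount regardless of how clean the other bands are.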
Signal to Noise Ratio (SNR)
Segmental SNR
–The SNR of the j’th frame is computed, then averaged over the frames
–N: number of frames
–M: frame length
–Usually averaged over “good frames”
–“Good frames”: frames with an SNR higher than -10 dB; frame SNRs are saturated at +30 dB
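The global and segmental SNR can be sketched as follows. This is a minimal illustration that assumes the noise is the sample-by-sample difference between the original and the processed signal, that frames do not overlap, and that frames at or below -10 dB are dropped while the rest are capped at +30 dB. The function names and the default frame length of 256 samples are hypothetical.

```python
import numpy as np


def global_snr_db(clean, processed):
    """Classic SNR over the whole signal, in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(processed, dtype=float)
    return float(10.0 * np.log10(np.sum(clean**2) / np.sum(noise**2)))


def segmental_snr_db(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=30.0):
    """Average of per-frame SNRs (dB) over the "good frames"."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = len(clean) // frame_len
    eps = np.finfo(float).eps  # avoids log of zero in silent or perfect frames
    frame_snrs = []
    for j in range(n_frames):
        s = clean[j * frame_len:(j + 1) * frame_len]
        e = s - processed[j * frame_len:(j + 1) * frame_len]
        snr_j = 10.0 * np.log10((np.sum(s**2) + eps) / (np.sum(e**2) + eps))
        if snr_j > floor_db:  # keep only frames above the lower threshold
            frame_snrs.append(min(snr_j, ceil_db))  # saturate at the upper limit
    if not frame_snrs:
        raise ValueError("no frame exceeds the lower SNR threshold")
    return float(np.mean(frame_snrs))


# Hypothetical example: a sine wave with additive noise.
rng = np.random.default_rng(0)
t = np.arange(4096)
clean = np.sin(2 * np.pi * t / 64)
processed = clean + 0.05 * rng.standard_normal(len(t))
print(global_snr_db(clean, processed), segmental_snr_db(clean, processed))
```

Averaging in the dB domain frame by frame keeps loud segments from dominating the result, which is the main reason the segmental form is preferred over the global one for speech.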
Frequency Weighted Segmental SNR
–F: number of frequency bands
–N: number of frames
Siemens Formula:
Frequency Weighted Segmental SNR Deller Formula
Frequency Weighted Segmental SNR Other Formulas:
The Final Formula The right formula for fw-seg SNR is thus:
The Final Formula
Where
–M is the number of frames
–j is the frame index
–k is the frequency band index
–w_{j,k} is the weight of the kth band of the jth frame
–E_{s,k} and E_{e,k} are the energies of the kth band of the signal and the noise, respectively
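A computation consistent with these definitions can be sketched as follows. It assumes the commonly cited form of the measure: in each frame the band SNRs (in dB) are averaged with the weights w j,k and normalized by the sum of the weights, and the result is then averaged over the M frames. The array layout (frames by bands), the function name, and the sample numbers are assumptions for illustration.

```python
import numpy as np


def fw_seg_snr_db(signal_energy, noise_energy, weights):
    """Frequency weighted segmental SNR in dB.

    All three inputs are arrays of shape (M, K): M frames, K frequency bands.
    signal_energy[j, k] and noise_energy[j, k] are band energies of frame j,
    weights[j, k] is the weight of band k in frame j.
    """
    es = np.asarray(signal_energy, dtype=float)
    ee = np.asarray(noise_energy, dtype=float)
    w = np.asarray(weights, dtype=float)
    if es.ndim != 2 or not (es.shape == ee.shape == w.shape):
        raise ValueError("inputs must share the same (frames, bands) shape")
    eps = np.finfo(float).eps  # avoids division by zero and log of zero
    band_snr_db = 10.0 * np.log10((es + eps) / (ee + eps))
    # Weighted average over bands, one value per frame.
    frame_snr_db = np.sum(w * band_snr_db, axis=1) / np.sum(w, axis=1)
    # Plain average over the M frames.
    return float(np.mean(frame_snr_db))


# Hypothetical example: one frame, two bands at 10 dB and 30 dB, weights 3 and 1.
print(fw_seg_snr_db([[10.0, 1000.0]], [[1.0, 1.0]], [[3.0, 1.0]]))
```

The weights let perceptually important bands count more; with equal weights the measure reduces to a per-band segmental SNR averaged over bands.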
Itakura Measure
–The measure compares spectral envelopes
–The envelopes are obtained from an all-pole (AR) model
Itakura Measure (Cont’d)
This measure is based on the spectral difference between the original signal and the signal under assessment. The spectrum can be represented by:
–Autoregressive coefficients
–Reflection coefficients
–Autocorrelation coefficients
Itakura Measure (Cont’d)
–m: index of the frame
–l: index of the coefficient
Itakura Measure (Cont’d)
The measure uses the l’th parameter of the frame ending at the m’th sample
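For illustration, the sketch below computes the log-likelihood-ratio form that is commonly referred to as the Itakura distance for one pair of frames. This is an assumed form and may differ from the formula on the original slides. The AR coefficients are estimated with the autocorrelation method; the function names, the model order, and the test signals are hypothetical.

```python
import numpy as np


def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - l], frame[l:]) for l in range(order + 1)])


def toeplitz(r):
    """Symmetric Toeplitz matrix whose first row is r."""
    idx = np.abs(np.arange(len(r))[:, None] - np.arange(len(r))[None, :])
    return r[idx]


def lpc_error_filter(r):
    """Solve the normal equations; returns [1, -a_1, ..., -a_p]."""
    a = np.linalg.solve(toeplitz(r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))


def itakura_distance(reference, test, order=10):
    """log of the ratio of prediction error energies on the reference frame."""
    r_ref = autocorr(reference, order)
    a_ref = lpc_error_filter(r_ref)
    a_test = lpc_error_filter(autocorr(test, order))
    big_r = toeplitz(r_ref)
    return float(np.log((a_test @ big_r @ a_test) / (a_ref @ big_r @ a_ref)))


# Hypothetical frames: white noise versus the same noise passed through an AR(1) filter.
rng = np.random.default_rng(0)
white = rng.standard_normal(2000)
colored = np.zeros_like(white)
for n in range(1, len(white)):
    colored[n] = 0.9 * colored[n - 1] + white[n]
print(itakura_distance(white, white, order=4), itakura_distance(white, colored, order=4))
```

The reference frame's own coefficients minimize the prediction error on that frame, so the ratio is never below 1 and the distance is never negative; it is zero when both frames yield the same AR model.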
Weighted Spectral Slope Measure (WSSM)
The measure uses the STFT of the k’th band of the frame ending at the m’th sample