Perceived Speech Quality Prediction for VoIP Networks
Lingfen Sun, Emmanuel Ifeachor
University of Plymouth, United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk
ICC 2002, New York, USA

Outline
Introduction
Simulation system
Perceived speech quality analysis
  – Impact of loss on speech quality
  – Impact of talkers on speech quality
Perceived speech quality prediction using Neural Network (NN) method
Conclusions and future work

Introduction
Speech quality measurement
  – Subjective method (Mean Opinion Score, MOS)
  – Objective methods
      Intrusive methods (e.g. ITU-T P.862 PESQ)
      Non-intrusive methods (e.g. E-model, NN model)
Why do we need to predict speech quality?
  – For online monitoring of VoIP calls
  – For Quality of Service (QoS) control in VoIP applications

How to predict speech quality?
E-model
  – All impairments are mapped to the R scale (R → MOS)
  – Principle: "Psychological factors on the psychological scale are additive"
  – Static, computational model
NN model
  – To learn the non-linear relationships between network impairments and perceived speech quality
  – To adapt to dynamic IP network conditions
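The R → MOS step of the E-model is commonly taken from ITU-T G.107. The following is a minimal sketch of that conversion, assuming the standard G.107 mapping; the function name `r_to_mos` is ours and does not come from the slides.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping used between the two clamped ends of the R scale.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for r in (0, 50, 70, 93.2, 100):
        print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

The mapping is fixed, which is one reason the slide describes the E-model as static.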

Previous work
NN databases are based on subjective tests only
Subjective tests are time-consuming, costly and stringent, so the available databases are limited and cannot cover all possible scenarios
Only a limited number of subjects attended MOS tests
Limited number of codecs
Talker dependency has not been considered

Main objectives of work
To undertake a fundamental investigation of the impact of packet loss on perceived speech quality using an objective measurement algorithm (e.g. PESQ)
To investigate the impact of different talkers on perceived speech quality
To develop a robust NN model for speech quality prediction based on PESQ

Simulation system structure
[Block diagram] Simulated VoIP system: reference speech → encoder → loss simulator → decoder → degraded speech
Quality measure (PESQ): compares reference and degraded speech → measured MOS
Parameter extraction → NN model → predicted MOS
Reference speech is from a speech database

Loss simulator
Network packet loss + late-arrival loss due to jitter
2-state Gilbert model to simulate packet loss characteristics
  – State 0: no-loss; state 1: loss
  – Transitions: 0 → 1 with probability p, 0 → 0 with 1 - p, 1 → 1 with q, 1 → 0 with 1 - q
Unconditional loss probability (ulp, or average loss rate): ulp = p / (p + 1 - q)
Conditional loss probability (clp): clp = q, to reflect burst loss features
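The 2-state Gilbert model can be sketched in a few lines. This is an illustration only, not the authors' simulator: the function names, the choice to start in the no-loss state, and the use of Python's `random` module are all assumptions.

```python
import random


def gilbert_losses(n_packets, p, q, seed=None):
    """Return a list of 0/1 flags (1 = packet lost) from a 2-state Gilbert model.

    p: probability of moving from the no-loss state to the loss state.
    q: probability of staying in the loss state (the conditional loss probability).
    """
    rng = random.Random(seed)
    state = 0  # assumption: the trace starts in the no-loss state
    trace = []
    for _ in range(n_packets):
        loss_prob = q if state == 1 else p
        state = 1 if rng.random() < loss_prob else 0
        trace.append(state)
    return trace


def unconditional_loss(p, q):
    """Stationary loss probability, ulp = p / (p + 1 - q)."""
    return p / (p + 1.0 - q)


def empirical_stats(trace):
    """Estimate (ulp, clp) from a 0/1 loss trace."""
    ulp = sum(trace) / len(trace)
    # Outcomes of the packets that directly follow a lost packet.
    after_loss = [cur for prev, cur in zip(trace, trace[1:]) if prev == 1]
    clp = sum(after_loss) / len(after_loss) if after_loss else 0.0
    return ulp, clp


if __name__ == "__main__":
    p, q = 0.1, 0.5
    trace = gilbert_losses(100_000, p, q, seed=1)
    print("theoretical ulp:", unconditional_loss(p, q))
    print("empirical (ulp, clp):", empirical_stats(trace))
```

Changing `seed` moves the loss locations while keeping the same statistics, which is what averaging over many random seeds relies on.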

Impact of loss on speech quality
How do packet loss and loss burstiness affect speech quality?
How does packet size affect speech quality?
How does the codec affect speech quality?
→ Using PESQ to calculate the perceived MOS score
→ Averaging over 300 different random "seeds" to reduce the impact of different loss locations

[Figure] Bursty loss analysis (G.729)

[Figure] Bursty loss analysis (G.723.1)

Bursty loss effect
clp has an obvious impact on the perceived speech quality, even for the same average loss rate (ulp)
As loss becomes burstier (clp increases), the MOS score decreases and its variation increases
→ Identify ulp and clp as the loss-related input parameters for NN analysis
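To see why the same ulp can hide very different loss patterns, the Gilbert relations ulp = p / (p + 1 - q) and clp = q can be inverted. The sketch below does that and also reports the mean length of a loss burst, 1 / (1 - q), which follows from loss runs being geometrically distributed in a 2-state model. The helper name `gilbert_params` is hypothetical.

```python
def gilbert_params(ulp, clp):
    """Return (p, q, mean_burst_length) for a target ulp and clp, both in [0, 1)."""
    if not 0.0 <= ulp < 1.0 or not 0.0 <= clp < 1.0:
        raise ValueError("ulp and clp must lie in [0, 1)")
    q = clp
    # Solve ulp = p / (p + 1 - q) for p.
    p = ulp * (1.0 - q) / (1.0 - ulp)
    if p > 1.0:
        raise ValueError("this (ulp, clp) pair is not reachable with a 2-state model")
    mean_burst = 1.0 / (1.0 - q)  # expected number of consecutive losses
    return p, q, mean_burst


if __name__ == "__main__":
    for clp in (0.1, 0.5, 0.9):
        p, q, burst = gilbert_params(0.2, clp)
        print(f"ulp=0.2 clp={clp}: p={p:.3f}, mean burst={burst:.2f} packets")
```

For a fixed ulp, a larger clp means fewer but longer loss bursts.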

[Figure] Impact of packet size (G.729)

[Figure] Impact of packet size (G.723.1)

Impact of packet size on quality
Packet size has, in general, no obvious influence on speech quality for a given loss rate
Variation in speech quality for the same network loss rate depends on packet size and codec
Variation in quality due to loss location is the main obstacle in the prediction of speech quality
→ Consider loss only during active talkspurt frames (not silence frames or SID frames)
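One way to restrict loss statistics to active speech is to drop every frame that a voice activity detector (VAD) marks as silence before estimating ulp and clp. The slides do not say how the authors computed the VAD-based values, so the sketch below is an assumption: it takes one loss flag and one activity flag per frame and treats the remaining active frames as a single continuous sequence.

```python
def loss_stats_active(lost, active):
    """Estimate (ulp, clp) using only frames flagged as active speech.

    lost:   per-frame flags, truthy if the frame was lost.
    active: per-frame flags, truthy if the VAD marked the frame as speech.
    """
    if len(lost) != len(active):
        raise ValueError("lost and active must have the same length")
    kept = [1 if flag else 0 for flag, is_active in zip(lost, active) if is_active]
    if not kept:
        return 0.0, 0.0
    ulp = sum(kept) / len(kept)
    # Outcomes of active frames that directly follow a lost active frame.
    after_loss = [cur for prev, cur in zip(kept, kept[1:]) if prev]
    clp = sum(after_loss) / len(after_loss) if after_loss else 0.0
    return ulp, clp


if __name__ == "__main__":
    lost = [0, 1, 1, 0, 1, 0, 0, 1]
    active = [1, 1, 1, 0, 0, 1, 1, 1]
    print(loss_stats_active(lost, active))
```

Losses that fall in silence or SID frames (indices 3 and 4 in the example) no longer count towards either statistic.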

Impact of talker on speech quality
To investigate whether the talker (male or female) has an effect on perceived speech quality
The TIMIT data set and the ITU data set are used for the investigation

Talker dependency
[Figure] For 3 male and 3 female samples

Talker dependency (cont.)
[Figure] For 6 mixed male and female samples

Impact of talker on MOS
The impact of different talkers on perceived speech quality appears to depend mainly on the gender of the talker (male or female)
Quality for female talkers tends to be worse than for male talkers under the same network impairments
→ Identify gender (male or female) as one of the input parameters for NN analysis

Quality prediction based on NN
Developed a neural network model (using the Stuttgart Neural Network Simulator)
Identified four variables as inputs to the NN
  – Codec type (G.729, G.723.1 and AMR)
  – Gender (male and female)
  – Unconditional loss probability, ulp (VAD)
  – Conditional loss probability, clp (VAD)
One output (MOS)
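The four inputs mix categorical and numeric values, so they have to be turned into numbers before they reach the network. The slides do not describe the encoding that was used; the sketch below shows one plausible scheme (evenly spaced codes for the codec, 0/1 for gender, loss percentages scaled to fractions). Every name and code value here is an assumption.

```python
CODECS = ("G.729", "G.723.1", "AMR")
GENDERS = ("male", "female")


def encode_inputs(codec, gender, ulp_percent, clp_percent):
    """Return a 4-element input vector with every entry in [0, 1]."""
    # Hypothetical encoding: codecs spread evenly over [0, 1] in list order.
    codec_code = CODECS.index(codec) / (len(CODECS) - 1)
    gender_code = float(GENDERS.index(gender))
    return [codec_code, gender_code, ulp_percent / 100.0, clp_percent / 100.0]


if __name__ == "__main__":
    print(encode_inputs("G.723.1", "female", 20, 50))
```

An ordinal codec code imposes an ordering that may not be meaningful; a one-hot encoding would avoid that, but it would no longer give a four-input network.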

NN structure (for a 4-5-1 net)
[Diagram] Input layer (4 nodes): gender, codec type, ulp (VAD), clp (VAD); hidden layer (5 nodes); output layer (1 node): MOS
A three-layer, feed-forward neural network architecture
Standard backpropagation learning algorithm
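A 4-5-1 feed-forward network with standard backpropagation is small enough to write out directly. The sketch below is not the SNNS model from the talk: sigmoid activations, the weight initialisation range, the learning rate, and the scaling of the target to [0, 1] are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Net451:
    """4 inputs -> 5 sigmoid hidden units -> 1 sigmoid output."""

    def __init__(self, n_in=4, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # One row per hidden unit; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # Output weights, one per hidden unit, plus a trailing bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Return (hidden activations, output) for one input vector."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update on a single pattern; returns the error."""
        hidden, out = self.forward(x)
        # Delta at the output for squared error with a sigmoid output.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights change.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]
        return 0.5 * (out - target) ** 2


if __name__ == "__main__":
    net = Net451(seed=1)
    # Hypothetical pattern [codec, gender, ulp, clp] scaled to [0, 1],
    # with a made-up target that is NOT a measured MOS.
    x, target = [0.5, 1.0, 0.2, 0.5], 0.6
    for _ in range(500):
        net.train_step(x, target)
    print(f"output after training: {net.forward(x)[1]:.3f} (target {target})")
```

Because the output unit is a sigmoid, MOS targets would need to be rescaled into (0, 1) before training and mapped back afterwards.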

NN database generation
Codec: G.729, G.723.1 (6.3 kb/s), AMR (12.2 kb/s)
Gender: male and female
ulp: 0, 10, 20, 30 and 40%
clp: 10, 50 and 90%
Packet size: 1 to 5
→ A total of 362 samples (patterns) were generated based on PESQ; 70% were chosen as the training set and 30% as the test set
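A 70/30 split of the generated patterns can be done with a seeded shuffle. The slides do not state how the split was drawn or the exact set sizes, so the random shuffle, the rounding rule, and the function name below are assumptions.

```python
import random


def split_patterns(patterns, train_fraction=0.7, seed=0):
    """Shuffle the patterns and return (training set, test set)."""
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Stand-in pattern ids; a real run would hold (inputs, PESQ MOS) pairs.
    train, test = split_patterns(range(362))
    print(len(train), "training patterns,", len(test), "test patterns")
```

Fixing the seed keeps the split reproducible across training runs.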

NN training process
[Block diagram] Reference speech → simulated VoIP system → degraded speech
Quality measure (PESQ) → measured MOS
Network, codec and speech parameters → NN model → predicted MOS
The error between measured and predicted MOS is fed back through backpropagation (Backprop)

Predicted MOS vs measured MOS
Train: ρ = 0.967, r = 0.12
Test: ρ = 0.952, r = 0.15
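Assuming the first figure reported for each set is a Pearson correlation coefficient between predicted and measured MOS, it can be computed as below. The function name and the sample values in the usage lines are illustrative only and are not data from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values, used only to show the call.
    measured = [4.1, 3.6, 3.0, 2.4, 1.9]
    predicted = [4.0, 3.7, 2.9, 2.6, 1.8]
    print(f"correlation = {pearson(measured, predicted):.3f}")
```

A value close to 1 means the predicted scores rank and scale with the measured ones; it does not by itself show that the absolute errors are small.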

Validation of the NN model
Generated a validation dataset from other talkers and different network loss conditions (210 samples in total)
Obtained ρ = 0.946, r = 0.19 for the validation dataset using a trained 4-5-1 neural network
→ This suggests that the neural network model works well for speech quality prediction in general

Conclusions
Investigated the impact of packet loss, codec and talker on perceived speech quality based on PESQ
The loss pattern, loss burstiness and the gender of the talker have an impact on speech quality
Packet size has, in general, no obvious influence on speech quality, but the deviation in speech quality depends on packet size and codec
Based on codec, bursty loss rate and gender of the talker, an NN model was developed successfully for speech quality prediction

Future work
Extend to conversational speech quality prediction to account for the impact of delay
Use real VoIP trace data instead of data generated from the Gilbert loss model
Use more robust neural networks
Apply to QoS control in VoIP systems
