Impact of Packet Loss Location on Perceived Speech Quality
Lingfen Sun, Graham Wade, Benn Lines, Emmanuel Ifeachor
University of Plymouth, U.K.
{L.F.Sun@jack.see.plym.ac.uk} {j.wade,B.Lines,E.Ifeachor@plym.ac.uk}
IPTEL'2001, New York, USA
Outline
- Introduction
- Codec's internal concealment and convergence time
- Perceptual speech quality measurement
- Simulation system
- Loss location with perceived quality
- Loss location with convergence time
- Conclusions and future work
Introduction
[Diagram: SCN, gateway, and IP network; end-to-end speech transmission quality]
- End-to-end speech transmission quality
  - IP network performance (e.g. packet loss and jitter)
  - Gateway/terminal (codec + loss/jitter compensation)
- Impact of packet loss on perceived speech quality
  - Loss pattern (e.g. burst/random)
  - Loss location (codec's concealment)
Introduction (cont.)
- Previous research on loss location
  - Concealment performance is speech content related (e.g. voiced/unvoiced)
  - Analysis based on MSE or SNR for a limited set of codecs
  - Perceptual objective methods used only to assess overall quality under stochastic loss simulations
- Questions:
  - How does a packet loss location affect perceived speech quality?
  - How does a packet loss location affect the codec's convergence time (for loss constraint)?
Codec's internal concealment
- What is a codec's concealment?
  - When a loss occurs, the decoder interpolates the parameters for the lost frame from the parameters of previous frames.
- Which codecs have a concealment algorithm?
  - G.729/G.723.1/AMR (main VoIP codecs)
  - CELP analysis-by-synthesis
- What are the limitations of concealment algorithms?
  - During unvoiced (u) or voiced (v)
  - During u/v
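The idea of rebuilding a lost frame from earlier frames can be sketched as follows. This is a minimal illustration, not the concealment procedure of G.729, G.723.1, or AMR: the `FrameParams` fields, the `conceal` function, and the `attenuation` factor are all hypothetical, and the sketch simply repeats the last good frame with a reduced gain.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical per-frame decoder parameters (not a real codec layout)."""
    pitch_delay: int
    gain: float


def conceal(frames: List[Optional[FrameParams]],
            attenuation: float = 0.9) -> List[FrameParams]:
    """Fill each lost frame (None) from the previous frame, with attenuated gain."""
    out = []
    # Assumption: start from silence so a loss of the very first frame is defined.
    last = FrameParams(pitch_delay=0, gain=0.0)
    for frame in frames:
        if frame is None:
            # Lost frame: reuse the previous parameters, fading the gain.
            frame = replace(last, gain=last.gain * attenuation)
        out.append(frame)
        last = frame
    return out
```

With consecutive losses the gain decays geometrically, so a long burst fades towards silence rather than repeating the last frame at full level.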
Codec's convergence time
- What is convergence time?
  - The time taken by the decoder to resynchronize its state with the encoder after a loss occurs. It is also called resynchronization time.
  - It is used to set the loss constraint distance (the distance between two consecutive losses) for new packet loss metrics.
- What is the relationship between convergence time and loss location, codec type, and packet size?
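One way to estimate convergence time from decoder outputs is sketched below, assuming the same bitstream is decoded once without loss and once with a single loss. The function names, the per-frame MSE criterion, and the `threshold` value are illustrative assumptions; the default frame length of 80 samples at 8 kHz corresponds to a 10 ms frame.

```python
def frame_mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def convergence_time_ms(clean, lossy, loss_end, frame_len=80,
                        sample_rate=8000, threshold=1e-4):
    """Time after the lost frame until the two decoder outputs agree again.

    clean    -- decoder output without loss
    lossy    -- decoder output with one loss
    loss_end -- sample index of the first sample after the lost frame(s)
    """
    n = min(len(clean), len(lossy))
    last_bad = -1  # index of the last post-loss frame that still differs
    k = 0
    while loss_end + (k + 1) * frame_len <= n:
        start = loss_end + k * frame_len
        stop = start + frame_len
        if frame_mse(clean[start:stop], lossy[start:stop]) > threshold:
            last_bad = k
        k += 1
    return (last_bad + 1) * frame_len * 1000.0 / sample_rate
```

The threshold decides when the two outputs count as resynchronized, so the measured time depends on that choice as well as on the codec.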
Perceptual quality measurement
[Diagram: reference signal, system/network under test, degraded signal, objective perceptual quality test, objective MOS]
- Transform the signals into a psychophysical representation approximating human perception
- Calculate their perceptual difference
- Map the difference to an objective MOS (Mean Opinion Score)
- Algorithms: PSQM/PSQM+/MNB/EMBSD/PESQ
Simulation system
[Block diagram: reference speech, encoder, bitstream, loss simulation, decoder, degraded speech with loss, degraded speech without loss, perceptual quality measure, convergence time analysis]
- Perceptual speech quality analysis with loss location
- Convergence time analysis with loss location
Speech test sentence
[Figure: waveform of the first talkspurt]
- The speech test sentence is about 6 seconds long.
- The first talkspurt (about 1.34 seconds, waveform shown in the figure) is used for loss location analysis.
- It contains four voiced segments, V(1) to V(4), which can be identified from the pitch delay in the G.729 codec.
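The slides do not state the rule used to mark voiced segments from the pitch delay. The sketch below assumes one plausible criterion: a run of frames counts as voiced when the pitch delay changes by no more than `max_jump` between neighbouring frames and the run lasts at least `min_frames` frames. The function name and both parameters are hypothetical.

```python
def voiced_segments(pitch_delays, max_jump=5, min_frames=3):
    """Return (start, end) frame index pairs, end exclusive, of stable-delay runs.

    Assumption: a stable pitch delay indicates voiced speech; an erratic one
    indicates unvoiced speech or silence.
    """
    segments = []
    start = 0
    for i in range(1, len(pitch_delays) + 1):
        run_ends = (
            i == len(pitch_delays)
            or abs(pitch_delays[i] - pitch_delays[i - 1]) > max_jump
        )
        if run_ends:
            # Keep the run only if it is long enough to count as voiced.
            if i - start >= min_frames:
                segments.append((start, i))
            start = i
    return segments
```

For example, `voiced_segments([20, 90, 35, 60, 61, 62, 63, 120, 30, 80, 81, 80, 79])` returns `[(3, 7), (9, 13)]`.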
Pitch delay from G.729 codec
[Figure: pitch delay plot with voiced segments V(1) to V(4) marked]
Loss location with perceived quality
- Only one packet loss is created each time
- The loss position moves from left to right, one frame at a time
- Overall perceptual quality is measured with PSQM/PSQM+, MNB and EMBSD
- Packet size: 1 to 4 frames/packet
- Codec: G.729/G.723.1/AMR
- How does a loss location affect perceived speech quality?
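The single-loss sweep described above can be sketched as a generator of loss masks, one per loss position. The name `single_loss_masks` is hypothetical, and the sketch assumes that a lost packet of `frames_per_packet` frames starts at each successive frame index; the slides do not say whether the loss position is aligned to packet boundaries.

```python
def single_loss_masks(num_frames, frames_per_packet=1):
    """Yield (start_frame, mask) for every position of a single packet loss.

    mask[i] is True when frame i is lost. Exactly one packet, covering
    frames_per_packet consecutive frames, is lost in each mask.
    """
    for start in range(num_frames - frames_per_packet + 1):
        mask = [start <= i < start + frames_per_packet
                for i in range(num_frames)]
        yield start, mask
```

Each mask would drive one decoding run, whose output is then scored against the reference speech by the chosen perceptual measure.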
Loss position with quality (1)
[Figure panels: reference speech, degraded speech, PSQM, PSQM+]
Loss position with quality (2)
[Figure panels: reference speech, degraded speech, PSQM, PSQM+]
Loss position with quality (3)
[Figure panels: reference speech, degraded speech, PSQM, PSQM+]
Loss position with quality (4)
[Figure panels: reference speech, degraded speech, PSQM, PSQM+; loss position marked]
Overall PSQM+ vs loss location (G.729)
Overall MNB vs loss location (G.729)
Overall EMBSD vs loss location (G.729)
Overall PSQM+ vs loss location (G.723.1)
Loss location with perceived quality
- Loss location affects perceived quality.
- A loss in an unvoiced speech segment has no obvious impact on perceived quality.
- A loss at the beginning of a voiced segment has the most severe impact on perceived quality.
- PSQM+ yields the most detailed results compared with MNB/EMBSD.
Convergence time based on MSE
Convergence time based on PSQM+
Convergence time based on PSQM+ (cont.)
Loss location with convergence time
- Convergence time is almost the same for different packet sizes.
- Convergence time for a loss in an unvoiced segment appears stable.
- Convergence time shows a good linear relationship with loss position within a voiced segment:
  - maximum at the beginning
  - linearly descending
  - bounded above by the end of the voiced segment
Conclusions and future work
- Investigated the impact of loss location on perceived speech quality
- Investigated the impact of loss location on convergence time
- The results will help in developing a perceptually relevant packet loss metric.
- Future work will focus on a more extensive analysis of the impact of packet loss on speech content.