A STUDY OF DESIGN COMPROMISES FOR SPEECH CODERS IN PACKET NETWORKS

1. INTRODUCTION

In voice over packet networks, the coding gain achieved by prediction-based speech coders is offset by packet losses. Concealment must be applied to the missing packets, which reduces quality for two main reasons :
  - not all missing packets can be concealed, especially when concealment uses only the past signal (onsets, transients)
  - the concealment error can propagate over several frames, even frames received correctly (culprit : desynchronisation of the excitation content (LTP))

We propose to compare two approaches for alleviating this problem :
  - Adding redundancy to increase the robustness of a baseline predictive encoder (G.729)
  - Using a speech coding model which does not have interframe dependencies (iLBC)

To be compared, solutions should have comparable bit rates.

2. ADDED REDUNDANCY versus FRAME INDEPENDENCE

4. EFFECT ON ERROR PROPAGATION

[Figure : decoder output around a lost packet (labels : « ms packet », « 3rd Packet lost », « 20 ms frame encoded in « absolute » »). Panels : G.729 synthesis; G error at decoder (five panels); iLBC error at decoder (compared to iLBC synthesis without frame loss)]

5. SUBJECTIVE EXPERIMENT

A formal listening test was conducted to compare the different solutions for increasing the robustness in case of missing packets. The main features of this test are :
  - clean speech, narrowband, IRS filtered
  - 4 male, 4 female speakers
  - 32 naive listeners listening using binaural headphones
  - following guidelines of ITU-T Rec. P
  - conditions in total, including MNRU and other reference conditions
  - 0 to 20 % random packet losses, synchronized between iLBC and G

6. LISTENING TEST RESULTS

7. CONCLUSIONS

3. PROPOSED APPROACHES FOR ADDING REDUNDANCY

[Figure axes : R (kbps), D (ms)]

G : Consider only G.729 at 8 kbps (baseline predictive coder) and add redundancy to obtain bit rates similar to iLBC at 15.2 kbps.
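The bit-rate budget behind this comparison can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the standard 80-bit G.729 frame (8 kbps over 10 ms) and the reduced frame F' used in this poster (a G.729 frame without its 18 LSF bits and pitch parity bit, i.e. 19 bits fewer). The function name `payload_kbps` and the three packet layouts are hypothetical labels chosen for the example, not the authors' scheme names.

```python
# Payload bit rate of a 20 ms packet assembled from G.729 frames.
# Assumptions (not taken from the authors' implementation):
#   - a full G.729 frame is 80 bits (8 kbps x 10 ms)
#   - a reduced frame F' drops 18 LSF bits + 1 pitch parity bit (19 bits)

G729_FRAME_BITS = 80
REDUCED_FRAME_BITS = G729_FRAME_BITS - 19  # 61 bits
PACKET_MS = 20


def payload_kbps(full_frames: int, reduced_frames: int = 0) -> float:
    """Return the payload bit rate in kbps (bits per millisecond)."""
    bits = full_frames * G729_FRAME_BITS + reduced_frames * REDUCED_FRAME_BITS
    return bits / PACKET_MS


# Hypothetical packet layouts, named only for this example.
layouts = {
    "two frames, no redundancy": (2, 0),
    "two frames + two reduced copies (F')": (2, 2),
    "two frames + two full copies": (4, 0),
}

for name, (full, reduced) in layouts.items():
    print(f"{name}: {payload_kbps(full, reduced):.2f} kbps")
```

Under these assumptions the full-copy layout comes to 16 kbps, which matches the 16 kbps total quoted in the conclusions for the redundancy-plus-delay solution, against 15.2 kbps for iLBC.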
[Figure : 20 ms packet (two G.729 frames). Packets P k-1, P k, P k+1 carry G.729 frames F 2k-2, F 2k-1 | F 2k, F 2k+1 | F 2k+2, F 2k+3]

[Figure : bit rate and algorithmic delay of G, G, G, iLBC, G, G (point size proportional to quality at 10 % FER)]

[Table : content of each 20-ms packet for G, G / G and G. Cells show packets P k-1, P k, P k+1 carrying frames F 2k-4 … F 2k+4 and reduced frames F' 2k-4 … F' 2k+1]

In G and G.729-3, F' k denotes F k without the 18 LSF bits and the pitch parity bit (hence, frame F' k has 19 bits fewer than frame F k). The missing LSFs have to be extrapolated at the decoder when a frame is missing.

G and G differ at the decoder :
  - G : Decode packet P k when it arrives (do not wait for packet P k+1). If packet P k is missing, apply concealment, then resynchronise the filter memories using F' 2k and F' 2k+1, which are received when packet P k+1 arrives. Then start decoding packet P k+1.
  - G : Decode packet P k only after packet P k+1 has arrived (additional delay of 20 ms). If packet P k was missing, just use F' 2k and F' 2k+1, which are added as redundancy in packet P k+1. No concealment is applied in this case.

G : At the decoder, wait for packet P k+1 before decoding packet P k.

  - G : Every missing 20-ms packet implies that two consecutive 10-ms frames of G.729 are lost. Concealment and propagation introduce large artefacts.
  - G : Every missing 20-ms packet reduces to a single 10-ms frame loss in G.729. Concealment is more effective, and propagation is reduced.
  - G : Concealment followed by approximate resynchronisation of filter memories.
  - G : Limited concealment (there would be no concealment if F' were equal to F).
  - G : No effective loss for any single packet loss.
  - iLBC : Concealment, but limited error propagation (only due to post-filtering at the decoder to smooth frame transitions).

From the test results, we can draw the following conclusions :
  - In clean channel conditions, iLBC at 15.2 kbps has quality equivalent to G.729 at 8 kbps (i.e. a much higher bit rate is necessary in a « frame-independent » coder to increase quality in both clean channel and frame loss conditions). Extreme example : G.711 at 64 kbps.
  - The best quality in frame loss conditions was achieved by using a low-rate CELP coder with added redundancy and delay (G.729-4), with a total bit rate close to that of iLBC (16 kbps compared to 15.2 kbps).
  - The approaches studied to increase robustness represent only a subset of all possible combinations. Only solutions based on a standard CELP coder (G.729) were considered, some of them not optimal (e.g. G.729-2). Improved results could be expected by designing a solution without the constraint of using standard core codecs.
  - The G.729 RTP payload can already support solutions G and G

Roch Lefebvre, Philippe Gournay
University of Sherbrooke, Sherbrooke, Quebec, Canada
Redwan Salami
VoiceAge Corp., Montreal, Quebec, Canada

[Figure : Anticipated gains in quality. Quality (robustness to frame loss) versus % FER, for Codec_P, Codec_FI or Codec_P + R, and Codec_P + R + Delay]

[Figure : Total payload bit rate. Codec_P plus redundancy R, compared with Codec_FI]

Approach 1 : Use a lower bit rate, predictive (CELP) coder, and add channel redundancy to improve robustness to missing frames.
Approach 2 : Use a higher bit rate, non-predictive or « frame-independent » codec, to improve robustness to missing frames in the core codec itself.

[Figure : 10 ms frame, long-term prediction from the past excitation]

Codec_P : G.729 (CELP-based)
Codec_FI : iLBC (Frame-independent)
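The idea of trading delay for robustness (Codec_P + R + Delay) can be illustrated with a small sketch. It is a simplified model, not the authors' decoder: it assumes that every packet carries a full copy of the previous packet's frames and that the decoder waits for packet P k+1 before decoding packet P k. The function name `effective_losses` and the loss pattern are hypothetical.

```python
# Simplified model of decoding with one packet of added delay.
# Assumption: packet k+1 carries a redundant copy of the frames of packet k,
# so packet k is effectively lost only if packets k and k+1 are both missing.


def effective_losses(received: list[bool]) -> list[bool]:
    """Map a packet arrival pattern to the pattern of unrecoverable packets.

    received[k] is True when packet k arrived. The returned list holds True
    at index k when the frames of packet k cannot be recovered.
    """
    lost = []
    for k, arrived in enumerate(received):
        # The redundant copy is usable only if the next packet exists and arrived.
        next_arrived = k + 1 < len(received) and received[k + 1]
        lost.append(not arrived and not next_arrived)
    return lost


# One isolated loss (index 1) and one burst of two losses (indices 4 and 5).
pattern = [True, False, True, True, False, False, True]
print(effective_losses(pattern))
# [False, False, False, False, True, False, False]
```

In this model the isolated loss disappears entirely, while a burst of two lost packets still loses the first packet of the burst, whose redundant copy travelled in the second.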