Packet loss concealment using audio morphing

STQ Workshop, Sophia-Antipolis, February 11th, 2003
Franck Bouteille¹, Pascal Scalart², Balazs Kövesi²
¹ PRESCOM SA, Lannion, FRANCE
² France Telecom R&D, Lannion, FRANCE

Motivation
In packet data networks, excess traffic leads to delays or losses in the delivery of information. In voice communication, long delays are intolerable, and network delay budgets strongly influence the design of packet voice systems. To increase the tolerance of packet voice systems to lost packets, several concealment techniques have been developed. These techniques do not use the a posteriori information carried by the next packet, whose arrival indicates the loss of one or several frames. However, they are not suited to long loss periods (>15 ms), because the speech signal is not stationary over the long term. This a posteriori information is generally available thanks to playout buffer management and the real-time network protocol. The proposed technique uses knowledge of the frame received after the last lost one, models of the last received frames, and model interpolation to synthesize the missing signal.

Outline
Introduction
Morphing audio principle
    Voiced / Unvoiced strategy
    Modelling and Interpolation
    Blocks concatenation and smoothing
Some results of concealed signal
Comparisons and performances
    Configuration
    Results
Conclusion

Morphing audio principle
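Context of loss:
[Diagram: Previous Frame (Frame A), Missing Signal, Next Frame (Frame B)]

Voiced/Unvoiced strategy
Pitch estimation: Frame A gives P0, Frame B gives P1. The pitch values used depend on whether each frame is voiced (V) or unvoiced (UV):
Frame A voiced, Frame B voiced: P0, P1
Frame A voiced, Frame B unvoiced: P0, P1 = P0
Frame A unvoiced, Frame B voiced: P0 = P1, P1
Frame A unvoiced, Frame B unvoiced: Unvoiced signal
When the missing signal is classified as unvoiced, Frame A is copied into the missing signal or comfort noise is generated.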
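The voiced/unvoiced strategy can be read as a small decision rule. The sketch below is one possible reading, assuming that when only one of the two frames is voiced its pitch is reused for both ends (P1 = P0 or P0 = P1), and that the segment is treated as unvoiced only when both frames are unvoiced. The function name `select_pitches` and the use of `None` for an unvoiced frame are hypothetical choices made for illustration.

```python
from typing import Optional, Tuple


def select_pitches(p0: Optional[int], p1: Optional[int]) -> Optional[Tuple[int, int]]:
    """Choose the pitch pair (P0, P1) used to rebuild the missing signal.

    p0 / p1 are the pitch periods (in samples) estimated on Frame A / Frame B.
    None stands for an unvoiced frame (illustrative convention).
    Returns None when the missing signal should be treated as unvoiced.
    """
    if p0 is None and p1 is None:
        return None            # both unvoiced: copy Frame A or generate comfort noise
    if p1 is None:
        return (p0, p0)        # only Frame A voiced: P1 = P0
    if p0 is None:
        return (p1, p1)        # only Frame B voiced: P0 = P1
    return (p0, p1)            # both voiced: keep both estimates
```

For example, `select_pitches(40, None)` keeps the pitch of Frame A for the whole missing segment, while `select_pitches(None, None)` signals that the unvoiced fallback should be used.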

Modelling and Interpolation
P0 and P1 are used to estimate the number of intermediate blocks required (NbBloc) and the size of these blocks (SizeBloc). We model the last pitch period vector (X0) of Frame A (ModP0) and the first pitch period vector (X1) of Frame B (ModP1). The DCT (Discrete Cosine Transform) is used to model X0 and X1. The resolution is 120 points at a sampling frequency of 8 kHz. Intermediate blocks are used to transform the model vector ModP0 continuously into the model vector ModP1 by linear interpolation of the model parameters.
1 IDCT: Inverse Discrete Cosine Transform.
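As an illustration of the modelling and interpolation step, here is a minimal sketch in plain Python. Several details are assumptions that the slides do not state: NbBloc is taken as the missing length divided by the mean of P0 and P1, each pitch period is linearly resampled to the 120-point resolution before the DCT, an orthonormal DCT-II is used, and each interpolated model is brought back to SizeBloc samples by resampling after the IDCT. The names `plan_blocks`, `resample` and `morph_blocks` are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def resample(x, n):
    """Linearly resample x to n points (assumed way of reaching a fixed resolution)."""
    m = len(x)
    if m == 1 or n == 1:
        return [x[0]] * n
    out = []
    for j in range(n):
        t = j * (m - 1) / (n - 1)
        i = min(int(t), m - 2)
        frac = t - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out


def plan_blocks(missing_len, p0, p1):
    """Assumed rule: about one block per mean pitch period; leftover samples are ignored."""
    nb_bloc = max(1, round(missing_len / ((p0 + p1) / 2)))
    return nb_bloc, missing_len // nb_bloc


def morph_blocks(x0, x1, nb_bloc, size_bloc, resolution=120):
    """Interpolate linearly between the DCT models of x0 and x1."""
    mod_p0 = dct(resample(x0, resolution))
    mod_p1 = dct(resample(x1, resolution))
    blocks = []
    for k in range(1, nb_bloc + 1):
        a = k / (nb_bloc + 1)  # 0 < a < 1: position between ModP0 and ModP1
        mod = [(1 - a) * c0 + a * c1 for c0, c1 in zip(mod_p0, mod_p1)]
        blocks.append(resample(idct(mod), size_bloc))
    return blocks
```

With a 30 ms gap at 8 kHz (240 samples), P0 = 40 and P1 = 60, `plan_blocks(240, 40, 60)` gives 5 blocks of 48 samples under the assumed rule. Interpolating in the DCT domain rather than on raw samples is what lets pitch periods of different lengths be compared point by point at a common resolution.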

Blocks concatenation and smoothing
Each block is then copied into the synthesis frame.
[Diagram: Frame A, Synthesis Frame, Frame B; smoothing]
Smoothing between blocks is realized according to:
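The smoothing rule itself does not appear in this transcript. Purely as an illustration, the sketch below assumes a linear cross-fade: the first `overlap` samples of each block are blended with the periodic continuation of the previous block, which is natural when every block represents one pitch period. This is an assumed technique, not the authors' formula, and the name `concat_with_crossfade` and the `overlap` parameter are hypothetical.

```python
def concat_with_crossfade(blocks, overlap):
    """Concatenate blocks, cross-fading the start of each block (assumed smoothing).

    blocks  : list of sample lists, each at least `overlap` samples long
    overlap : number of samples blended at every block boundary
    """
    out = list(blocks[0])
    for prev, cur in zip(blocks, blocks[1:]):
        cur = list(cur)
        # Periodic continuation of the previous block: its first samples repeat.
        tail = prev[:overlap]
        for n in range(overlap):
            w = (n + 1) / (overlap + 1)  # fade-in weight of the new block
            cur[n] = (1 - w) * tail[n] + w * cur[n]
        out.extend(cur)
    return out
```

The output length equals the sum of the block lengths, so the synthesis frame keeps the duration of the missing signal. Identical consecutive blocks pass through unchanged, and a jump between two different blocks is spread over `overlap` samples.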

Some results of concealed signal
[Figure: original frame and concealed frame waveforms (x-axis: sample number)]
Case of voiced frames of a female speech signal (30 ms of missing signal).

Some results of concealed signal
[Figure: original frame and concealed frame waveforms (x-axis: sample number)]
Behaviour of the morphing technique during a transition frame (30 ms) for a male speech signal. Note that the concealed speech-to-noise transition is more voiced than in the original frame. In an enhanced morphing technique, the voiced duration could be controlled.

Comparisons and performances
Ten subjects took part in an informal test: they were asked to listen to coded speech signals that had been corrected by different concealment techniques.

Configuration
Two speech coders (G.711 and G.723.1) were tested independently;
The frame size is 30 ms;
Five concealment techniques: Previous Frame Copy (PFC), Double Sided Periodic Substitution (DSPS)¹, the ITU-T recommended technique defined for each specific coder (G.711 and G.723.1), the GFEC technique² and Audio Morphing;
Two loss rates were defined: 5 % and 10 %. Losses can occur in bursts, but are usually isolated;
The number of sentences was 15 (8 female and 7 male speech files).

1: J. Tang, "Evaluation of Double Sided Periodic Substitution (DSPS) Method for Recovering Missing Speech in Packet Voice Communications," IEEE Computers and Communications, pp , 1991.
2: B. Kövesi, D. Massaloux, "Method of Packet Errors Cancellation Suitable for any Speech and Sound Compression Scheme", ETSI STQ Workshop, February 2003, Sophia-Antipolis.

Results for G.711 codec
[Bar chart: score (/15) at loss rates of 5% and 10% for PFC, FECG711, DSPS, GFEC and MORPHING]

Results for G.723.1 codec
[Bar chart: score (/15) at loss rates of 5% and 10% for PFC, FECG723, DSPS, GFEC and MORPHING]

Conclusion
The proposed technique improves the quality of frame correction at high loss rates (5 % and 10 %);
Audio morphing adds latency (Frame B is required), but this is acceptable for VoIP applications;
Other models are possible, and the voicing condition can be controlled to improve restitution quality.
