ITU-T G.722.1 Annex C: A New Low-Complexity 14 kHz Audio Coding Standard
Minjie Xie, Dave Lindbergh, and Peter Chu
ICASSP 2006

G.722.1C: The First ITU-T Super-Wideband Audio Coding Standard
- Audio bandwidth: 14 kHz
- Sample rate: 32 kHz
- Bit rates: 24, 32, and 48 kbit/s
- Algorithm: transform coding (Siren14™)
- Frame size: 20 ms
- Algorithmic delay: 40 ms
- Complexity: < 11 WMOPS (encoder + decoder)
- Very high audio quality
- Suitable for videoconferencing, teleconferencing, and Internet streaming
- Available on royalty-free licensing terms
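
The frame size and bit rates above fix the bit budget of every 20 ms frame: a rate in kbit/s multiplied by a duration in ms gives bits. The short sketch below does that arithmetic. It also computes a compression ratio, assuming 16-bit linear PCM input at 32 kHz. That input format is an assumption made here, and the constant names are illustrative.

```python
SAMPLE_RATE_HZ = 32_000
FRAME_MS = 20
PCM_BITS_PER_SAMPLE = 16          # assumption: 16-bit linear PCM input
BIT_RATES_KBPS = (24, 32, 48)


def bits_per_frame(bit_rate_kbps: int, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by ms gives bits per frame."""
    return bit_rate_kbps * frame_ms


def compression_ratio(bit_rate_kbps: int) -> float:
    """Uncompressed PCM rate divided by the coded bit rate."""
    pcm_kbps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE / 1000  # 512 kbit/s
    return pcm_kbps / bit_rate_kbps


if __name__ == "__main__":
    for rate in BIT_RATES_KBPS:
        print(f"{rate} kbit/s: {bits_per_frame(rate)} bits/frame, "
              f"compression {compression_ratio(rate):.1f}:1")
```

At 24, 32, and 48 kbit/s this gives 480, 640, and 960 bits per frame. The 32 kbit/s budget of 640 bits happens to equal the Annex C bit-adjustment threshold listed for the encoder.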

Overview of the Main G.722.1 Mode
- Wideband coding standard approved by ITU-T in 1998
- Provides 50-7000 Hz audio bandwidth at 24 and 32 kbit/s
- Based on transform coding, using a Modulated Lapped Transform (MLT)
- Operates on 20 ms frames, corresponding to 320 samples at a 16 kHz sampling rate
- 20 ms look-ahead due to the 50% overlap between frames
- Total algorithmic delay of 40 ms
- Very low computational complexity (about 5.3 WMOPS)
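
An MLT maps each frame of N new samples to N coefficients. It uses a 2N-sample window that spans two consecutive frames, and this 50% overlap is the source of the 20 ms look-ahead. The code below is a minimal, unoptimized sketch of the textbook MLT, which is a sine-windowed MDCT in Malvar's formulation. It assumes N = 320 for the main mode and N = 640 for Annex C. It is not the normative G.722.1 transform: the standard's exact scaling, phase convention, and fast algorithm may differ. All function names are hypothetical.

```python
import math


def sine_window(n_coefs: int) -> list[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    two_n = 2 * n_coefs
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _kernel(n: int, k: int, n_coefs: int) -> float:
    """MDCT/MLT cosine kernel."""
    return math.cos(math.pi / n_coefs * (n + 0.5 + n_coefs / 2) * (k + 0.5))


def mlt(block: list[float]) -> list[float]:
    """Forward MLT: 2N input samples -> N coefficients (window applied here)."""
    n_coefs = len(block) // 2
    w = sine_window(n_coefs)
    scale = math.sqrt(2.0 / n_coefs)
    return [
        scale * sum(w[n] * block[n] * _kernel(n, k, n_coefs)
                    for n in range(2 * n_coefs))
        for k in range(n_coefs)
    ]


def imlt(coefs: list[float]) -> list[float]:
    """Inverse MLT: N coefficients -> 2N windowed samples for overlap-add."""
    n_coefs = len(coefs)
    w = sine_window(n_coefs)
    scale = math.sqrt(2.0 / n_coefs)
    return [
        scale * w[n] * sum(coefs[k] * _kernel(n, k, n_coefs)
                           for k in range(n_coefs))
        for n in range(2 * n_coefs)
    ]


def analyze_synthesize(signal: list[float], n_coefs: int) -> list[float]:
    """Hop N, window 2N (50% overlap): transform, inverse-transform, and
    overlap-add; returns the reconstruction aligned with the input."""
    padded = [0.0] * n_coefs + list(signal) + [0.0] * (2 * n_coefs)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coefs + 1, n_coefs):
        y = imlt(mlt(padded[start:start + 2 * n_coefs]))
        for i, v in enumerate(y):
            out[start + i] += v
    return out[n_coefs:n_coefs + len(signal)]
```

Overlap-adding the inverse-transformed blocks cancels the time-domain aliasing. This works because the sine window satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1. As a result, `analyze_synthesize` should return its input up to floating-point error.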

G.722.1C: Extension Mode of G.722.1
- Audio signal sampled at 32 kHz
- Doubles the audio bandwidth from 7 kHz to 14 kHz
- Same algorithmic steps as the main mode of G.722.1
- Same frame size as G.722.1 (20 ms)
- Total algorithmic delay of 40 ms
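
Keeping the 20 ms frame while doubling the sampling rate doubles the number of samples per frame. The delay and the spacing between MLT coefficients stay the same. The sketch below works through that arithmetic. It assumes the 40 ms delay is one frame of buffering plus one frame of look-ahead, as the main-mode overview states. It also assumes the MLT yields as many coefficients per frame as there are new samples. The class name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecMode:
    """Frame-level parameters of a G.722.1 operating mode (illustrative)."""
    name: str
    sample_rate_hz: int
    frame_ms: int = 20

    @property
    def samples_per_frame(self) -> int:
        return self.sample_rate_hz * self.frame_ms // 1000

    @property
    def algorithmic_delay_ms(self) -> int:
        # Assumption: one frame of buffering + one frame of look-ahead
        # (the look-ahead comes from the 50% overlap between MLT windows).
        return 2 * self.frame_ms

    @property
    def bin_spacing_hz(self) -> float:
        # N coefficients per frame cover 0 .. fs/2.
        return (self.sample_rate_hz / 2) / self.samples_per_frame


MAIN_MODE = CodecMode("G.722.1", 16_000)
ANNEX_C = CodecMode("G.722.1C", 32_000)

if __name__ == "__main__":
    for mode in (MAIN_MODE, ANNEX_C):
        print(mode.name, mode.samples_per_frame, "samples,",
              mode.algorithmic_delay_ms, "ms delay,",
              mode.bin_spacing_hz, "Hz spacing")
```

Both modes come out at 40 ms and 25 Hz per coefficient. This is consistent with Annex C reusing the main mode's algorithmic steps on a frame twice as long.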

Block Diagram of the G.722.1C Encoder

Block Diagram of the G.722.1C Decoder

Encoder of G.722.1 Annex C (changes relative to the main mode)
- MLT transform length doubled from 320 to 640 samples
- Number of frequency regions doubled from 14 to 28
- Number of Huffman coding tables for the quantized region power indices doubled
- Threshold for adjusting the number of available bits doubled from 320 to 640
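
With 25 Hz coefficient spacing, 28 regions of 20 coefficients each reach exactly 14 kHz, just as 14 such regions reach 7 kHz in the main mode. The sketch below assumes a region size of 20 MLT coefficients, which is an assumption made here rather than a value stated on these slides. It uses a hypothetical log-domain power quantizer, index = round(2·log2(rms)), which gives roughly 3 dB steps. The sketch stops at the region-to-region index differences that an entropy coder would code; the actual G.722.1C Huffman tables are not reproduced. All names are hypothetical.

```python
import math

REGION_SIZE = 20        # assumption: MLT coefficients per region
NUM_REGIONS = 28        # G.722.1C (14 in the main mode)
BIN_SPACING_HZ = 25.0   # 16 kHz Nyquist / 640 coefficients


def region_edges_hz(num_regions: int = NUM_REGIONS) -> list[tuple[float, float]]:
    """Frequency span (low, high) covered by each region."""
    width = REGION_SIZE * BIN_SPACING_HZ
    return [(r * width, (r + 1) * width) for r in range(num_regions)]


def region_power_indices(coefs: list[float],
                         num_regions: int = NUM_REGIONS) -> list[int]:
    """Hypothetical quantizer: index = round(2 * log2(rms)), about 3 dB steps."""
    indices = []
    for r in range(num_regions):
        band = coefs[r * REGION_SIZE:(r + 1) * REGION_SIZE]
        mean_square = sum(c * c for c in band) / REGION_SIZE
        rms = math.sqrt(max(mean_square, 1e-12))  # avoid log2(0) on silence
        indices.append(round(2 * math.log2(rms)))
    return indices


def differential(indices: list[int]) -> list[int]:
    """First index as-is, then differences between adjacent regions
    (the values an entropy coder would then code)."""
    return [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]
```

Differential coding suits the spectral envelope because adjacent regions usually have similar power, so small differences dominate and can receive short codewords.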

Decoder of G.722.1 Annex C (changes relative to the main mode)
- Number of frequency regions doubled from 14 to 28
- Threshold for adjusting the number of available bits doubled from 320 to 640
- Centroid table for reconstructing the MLT coefficients extended
- IMLT transform length doubled from 320 to 640 samples
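
On the decoder side, the region powers are rebuilt from their coded differences. Each quantized coefficient is then mapped to a reconstruction level taken from a centroid table. The sketch below illustrates only that flow. It assumes differential coding of the region power indices and a hypothetical index = round(2·log2(rms)) quantizer. The centroid values and the `step` parameter are made up for illustration and are not the G.722.1C table.

```python
import itertools

# Hypothetical centroid table: reconstruction level per quantization index.
# Illustrative values only; NOT the table from the standard.
CENTROIDS = [0.0, 0.7, 1.5, 2.4, 3.4]


def decode_region_power_indices(diffs: list[int]) -> list[int]:
    """Invert differential coding with a running sum."""
    return list(itertools.accumulate(diffs))


def index_to_rms(index: int) -> float:
    """Inverse of the hypothetical index = round(2 * log2(rms)) quantizer."""
    return 2.0 ** (index / 2)


def reconstruct_region(q_indices: list[int], signs: list[int],
                       rms: float, step: float = 1.0) -> list[float]:
    """Centroid reconstruction: sign * centroid[q] * step * region rms."""
    return [s * CENTROIDS[q] * step * rms for q, s in zip(q_indices, signs)]
```

Reconstructing with a centroid, rather than the midpoint of each quantization cell, places each output value at the expected value of the coefficients that fall in that cell. This is why the table has to cover every cell the decoder can receive.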

Computational Complexity and Memory Requirements of G.722.1C

Bit rate (kbit/s)   Encoder (WMOPS)   Decoder (WMOPS)   Enc.+Dec. (WMOPS)
24                  4.5               5.3               9.7
32                  4.8               5.5               10.3
48                  5.1               5.9               10.9

Memory requirements: RAM 18 kbytes, ROM 30 kbytes

Computational Complexity of G.722.1C versus the 3GPP Audio Codecs (WMOPS)

Bit rate (kbit/s)   G.722.1C   eAAC+   AMR-WB+
24                  9.7        40.8    80.1
32                  10.3       42.6    86.7
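
Dividing each 3GPP codec's figure by the G.722.1C figure at the same bit rate shows the size of the gap. eAAC+ needs roughly 4 times and AMR-WB+ roughly 8 times the WMOPS of G.722.1C. A minimal sketch of that calculation follows; the dictionary layout and function name are illustrative choices.

```python
# WMOPS values from the comparison table (G.722.1C column = encoder + decoder).
COMPLEXITY = {
    24: {"G.722.1C": 9.7, "eAAC+": 40.8, "AMR-WB+": 80.1},
    32: {"G.722.1C": 10.3, "eAAC+": 42.6, "AMR-WB+": 86.7},
}


def relative_complexity(table: dict[int, dict[str, float]]) -> dict[int, dict[str, float]]:
    """Each 3GPP codec's WMOPS as a multiple of G.722.1C at the same bit rate."""
    return {
        rate: {codec: round(wmops / row["G.722.1C"], 1)
               for codec, wmops in row.items() if codec != "G.722.1C"}
        for rate, row in table.items()
    }


if __name__ == "__main__":
    for rate, ratios in relative_complexity(COMPLEXITY).items():
        print(rate, "kbit/s:", ratios)
```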

Algorithmic Delay of G.722.1C versus the 3GPP Audio Codecs (ms)

G.722.1C   eAAC+       AMR-WB+
40.0       129.9 [1]   113.8 [2]

[1] Without bit reservoir (see 3GPP TR 26.936 V6.1.0)
[2] Internal sampling frequency (ISF) = 25.6 kHz (see 3GPP TR 26.936 V6.1.0)

ITU-T Subjective Characterization Tests
- Subjective tests performed by France Telecom according to a test plan designed by the ITU-T SG12 Speech Quality Experts Group (SQEG)
- Characterization test Phase 1: speech, using ACR (Absolute Category Rating) for clean speech and DCR (Degradation Category Rating) for noisy speech
- Characterization test Phase 2: music and mixed content, using the MUSHRA method
- Reference codec: MPEG-4 AAC-LD (PCEnc/DecPro)
- Additional reference codecs: 3GPP eAAC+ and AMR-WB+
- Requirement: not worse than the reference codec at the 99% confidence level
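
The "not worse than" requirement, and the "better than" results reported in the conclusion, can be read as one-sided comparisons of mean listener scores. The sketch below shows one plausible reading, not the procedure of the SG12 test plan. It assumes independent scores for the two codecs and a normal approximation to the difference of means; the actual analysis may use t statistics or other corrections. The function names are hypothetical. "Not worse than" passes unless the candidate is significantly below the reference at the 99% level. "Better than" requires the candidate to be significantly above it.

```python
import math
from statistics import NormalDist, mean, variance


def _diff_and_margin(test: list[float], ref: list[float],
                     confidence: float) -> tuple[float, float]:
    """Mean difference (test - ref) and its one-sided confidence margin."""
    diff = mean(test) - mean(ref)
    std_err = math.sqrt(variance(test) / len(test) + variance(ref) / len(ref))
    z = NormalDist().inv_cdf(confidence)  # about 2.326 for 0.99
    return diff, z * std_err


def not_worse_than(test: list[float], ref: list[float],
                   confidence: float = 0.99) -> bool:
    """Pass unless the candidate is significantly below the reference."""
    diff, margin = _diff_and_margin(test, ref, confidence)
    return diff + margin >= 0


def better_than(test: list[float], ref: list[float],
                confidence: float = 0.99) -> bool:
    """Pass only if the candidate is significantly above the reference."""
    diff, margin = _diff_and_margin(test, ref, confidence)
    return diff - margin > 0
```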

ITU-T Subjective Test Results (Phase 1) (MOS)

ITU-T Subjective Test Results (Phase 1) (DMOS)

ITU-T Subjective Test Results (Phase 2) (MUSHRA)

Conclusion
G.722.1C met all performance requirements.
Phase 1 (clean and noisy speech)
- 24 kbit/s: better than AAC-LD and not worse than eAAC+
- 32 kbit/s: better than AAC-LD, not worse than eAAC+, and not worse than AMR-WB+ in most of the tests
- 48 kbit/s: not worse than AAC-LD at 48 and 64 kbit/s
Phase 2 (music and mixed content)
- 24 kbit/s: better than AAC-LD
- 32 kbit/s: better than AAC-LD
- 48 kbit/s: better than AAC-LD at 48 and 64 kbit/s
Executables, audio samples, and more information are available at: http://www.polycom.com/Siren14

Acknowledgment
The authors would like to acknowledge Claude Lamblin, ITU-T Q.10/SG16 Rapporteur, and Catherine Quinquis, ITU-T Q.7/SG12 Rapporteur, for their great work guiding this project to completion. In addition, the authors would like to thank the speech quality experts and staff who performed the subjective characterization tests at France Telecom.