Published by Jerome Parks. Modified over 9 years ago.
1
EE 5359 Multimedia Processing
Project Proposal
Study and implementation of G.719 audio codec and performance analysis of G.719 with AAC (advanced audio coding) and HE-AAC (high-efficiency advanced audio coding)
Name: Yashas Prakash
Student ID:
Instructor: Dr. K. R. Rao
Date:
2
List of acronyms
AAC - Advanced audio coding
ATSC - Advanced Television Systems Committee
AES - Audio Engineering Society
EBU - European Broadcasting Union
FLVQ - Fast lattice vector quantization
HE-AAC - High-efficiency advanced audio coding
HRQ - Higher-rate lattice vector quantization
IMDCT - Inverse modified discrete cosine transform
ISO - International Organization for Standardization
ITU - International Telecommunication Union
JAES - Journal of the Audio Engineering Society
LC - Low complexity
LRQ - Lower-rate lattice vector quantization
LFE - Low-frequency enhancement
LTP - Long term prediction
MDCT - Modified discrete cosine transform
MPEG - Moving Picture Experts Group
SBR - Spectral band replication
SMR - Signal-to-mask ratio
SSR - Scalable sampling rate
TDA - Time-domain aliasing
WMOPS - Weighted million operations per second
3
Introduction to codecs
A codec is a device or computer program capable of encoding or decoding a digital data stream or signal. It can be thought of as a compressor/decompressor or encoder/decoder. Media players require codec programs to play audio and video files. A codec encodes a data stream or signal for transmission, storage or encryption and decodes it for playback or editing. Codecs are used in videoconferencing, streaming media and video editing applications.
4
Introduction to G.719 codec [1]
G.719 is an ITU-T standard audio codec providing high quality, moderate bit rate (32 to 128 kbit/s) full-band (20 Hz - 20 kHz audio bandwidth, 48 kHz audio sample rate) audio coding at low computational load [1]. It was produced through a collaboration between Polycom and Ericsson. G.719 incorporates elements of Polycom's Siren22 codec (22 kHz) and Ericsson codec technology, as well as Polycom's Siren7 and Siren14 codecs (G and G Annex C), which have been used in videoconferencing systems for many years [1].
5
Advantages and Applications of G.719 [3]
The algorithm is designed to provide 20 Hz - 20 kHz audio bandwidth using a 48 kHz sample rate, operating at 32 to 128 kbps [3]. The codec offers very high audio quality and low computational complexity, and is suitable for applications such as videoconferencing, teleconferencing and streaming audio over the Internet [3].
6
OVERVIEW OF THE G.719 CODEC [1]
The G.719 codec is a low-complexity, transform-based audio codec that provides an audio bandwidth of 20 Hz to 20 kHz at 32 to 128 kbps. The codec offers very high audio quality at extremely low computational complexity compared with other state-of-the-art audio coding algorithms. G.719 is optimized for both speech and music. It is based on transform coding with adaptive time resolution, adaptive bit allocation and low-complexity lattice vector quantization [1]. For a high-quality compressor, its computational complexity is low (18 floating-point MIPS) [1].
7
G.719 contd.
The codec operates on 20 ms frames, and the end-to-end algorithmic delay is 40 ms [2]. The encoder input and decoder output are sampled at 48 kHz [2].
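The frame figures above translate directly into sample counts and per-frame bit budgets. The helper names below are hypothetical; the arithmetic only uses the 48 kHz sampling rate, the 20 ms frame length and the 32 to 128 kbit/s range quoted in this presentation.

```c
#include <assert.h>

/* Values quoted for G.719: 48 kHz sampling and 20 ms frames. */
#define G719_FS_HZ    48000
#define G719_FRAME_MS 20

/* PCM samples in one frame of frame_ms milliseconds at fs_hz. */
int samples_per_frame(int fs_hz, int frame_ms)
{
    assert(fs_hz % 1000 == 0 && frame_ms > 0);
    return fs_hz / 1000 * frame_ms;
}

/* Bits available for one frame at bitrate_bps (bits per second). */
int bits_per_frame(int bitrate_bps, int frame_ms)
{
    assert(bitrate_bps % 1000 == 0 && frame_ms > 0);
    return bitrate_bps / 1000 * frame_ms;
}
```

At 48 kHz a 20 ms frame holds 960 samples; at 32 kbit/s each frame has a budget of 640 bits (80 bytes), and at 128 kbit/s 2560 bits (320 bytes). The 40 ms algorithmic delay corresponds to two frames.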
8
Block diagram of G.719 encoder
Block diagram of the G.719 encoder [1].
9
ADAPTIVE TIME-FREQUENCY-TRANSFORM
The adaptive time-frequency transform is based on the detection of transient sounds [3]. When a transient is detected, the time-frequency transform increases its time resolution, allowing a better representation of rapid changes in the input signal characteristics [3].
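A transient detector can be as simple as comparing short-term energy with a smoothed long-term energy. The sketch below is a generic energy-ratio detector, not the exact G.719 algorithm; the number of sub-blocks, the threshold TRANSIENT_RATIO and the smoothing factor are hypothetical values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_SUBBLOCKS     4     /* hypothetical: detection sub-blocks per frame */
#define TRANSIENT_RATIO 8.0   /* hypothetical energy-ratio threshold (about 9 dB) */

typedef struct {
    double long_term_energy;  /* smoothed energy of past sub-blocks */
} TransientDetector;

/* Returns 1 if any sub-block's energy exceeds TRANSIENT_RATIO times the
   smoothed energy of the preceding sub-blocks, otherwise 0. */
int detect_transient(TransientDetector *det, const double *frame, size_t len)
{
    assert(det && frame && len % N_SUBBLOCKS == 0);
    size_t sub_len = len / N_SUBBLOCKS;
    int transient = 0;
    for (size_t b = 0; b < N_SUBBLOCKS; b++) {
        double e = 1e-9;  /* small floor so silence never gives zero energy */
        for (size_t i = 0; i < sub_len; i++) {
            double s = frame[b * sub_len + i];
            e += s * s;
        }
        if (e > TRANSIENT_RATIO * det->long_term_energy)
            transient = 1;
        /* first-order smoothing of the long-term energy */
        det->long_term_energy = 0.75 * det->long_term_energy + 0.25 * e;
    }
    return transient;
}
```

A return value of 1 would select the transient (fine time resolution) mode for the frame; 0 keeps the stationary mode.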
10
G.719 Decoder
Block diagram of the G.719 decoder [1].
11
Complexity in G.719
Complexity is a paramount parameter for a codec. Complex codecs require more powerful and more expensive digital signal processors (DSPs) to run on [1]. This increases product cost and power consumption, which limits the codec's usability [1]. The fixed-point C-code implementation of G.719, which is an integral part of the ITU-T recommendation, is based on a set of instructions that mimics a generic DSP instruction set [1].
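In such fixed-point code, arithmetic goes through basic operators that saturate like DSP instructions and can be counted to estimate complexity (the idea behind WMOPS). The sketch below imitates this with two hypothetical operators, sat_add16 and q15_mult, and a plain operation counter; the names and the use of equal weights for every operation are assumptions, not the ITU-T basic-operator library.

```c
#include <assert.h>
#include <stdint.h>

static long op_count = 0;  /* number of basic operations executed */

/* 16-bit addition with saturation, like a typical DSP ALU instruction. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    op_count++;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Q15 fractional multiply: (a * b) >> 15, saturating the -1 * -1 case.
   Assumes an arithmetic right shift for negative products, as on common compilers. */
int16_t q15_mult(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    op_count++;
    if (p > INT16_MAX) return INT16_MAX;
    return (int16_t)p;
}

long ops_used(void) { return op_count; }
void reset_ops(void) { op_count = 0; }
```

Dividing the operation count per frame by the frame duration gives an operations-per-second figure; a real WMOPS counter additionally weights each operator by its assumed cost on a DSP.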
12
An overview of AAC codec [9]
The advanced audio coding (AAC) scheme was jointly developed by Dolby, Fraunhofer, AT&T, Sony and Nokia [9]. It is a digital audio compression scheme for medium to high bit rates that is not backward compatible with earlier moving picture experts group (MPEG) audio standards [9]. AAC is a second-generation coding scheme used for stereo and multichannel signals. Compared with earlier perceptual coders, AAC provides more flexibility and uses more coding tools [12].
13
AAC codec contd.
AAC encoding follows a modular approach, and the standard defines four profiles, which can be chosen based on factors such as the complexity of the bitstream to be encoded and the desired performance and output [9]:
Low complexity (LC)
Main profile (MAIN)
Scalable sampling rate (SSR)
Long term prediction (LTP)
14
An overview of the HE-AAC codec [9]
High-efficiency advanced audio coding (HE-AAC) is a lossy data compression scheme. It is an extension of low complexity AAC optimized for low bit rate applications such as streaming audio. HE-AAC uses spectral band replication (SBR) to improve compression efficiency in the frequency domain. Scientific testing by the European Broadcasting Union indicated that HE-AAC at 48 kbit/s was rated "Excellent" on the MUSHRA scale [9]. Testing also indicates that material decoded from 64 kbit/s HE-AAC does not yet match the audio quality of material decoded from MP3 at 128 kbit/s using high quality encoders.
15
Block diagram of the SBR encoder [15]
16
Block diagram of the SBR decoder [15]
17
Subjective performance of G.719 [1]
18
Explanation for the subjective performance of G.719 codec
DMOS (degradation mean opinion score) is a subjective measure of perceived quality: listeners in a quiet room score how much a processed sample is degraded relative to a reference.
Requirements: listeners should be seated in a quiet room with a reverberation time of less than 500 ms and a room noise level below 30 dBA.
DMOS ratings: 5 = degradation inaudible, 4 = audible but not annoying, 3 = slightly annoying, 2 = annoying, 1 = very annoying.
Experiment 1: speech. Experiment 2: mixed content and music (speech, music and noise). The reference test vectors used in these experiments are in MPEG audio format.
From the graphs: in experiment 1, the G.719 codec performed better than the reference codec at all bit rates. In experiment 2, the G.719 codec performed better than the reference codec at the lowest bit rate and almost the same as the reference at all other bit rates.
An additional subjective listening test was conducted later to evaluate the quality of the codec at rates higher than those described in the ITU-T test plan. Because the quality expectation at these high rates is high, critical items, for which the quality at the lower bit rates was most degraded, were pre-selected before testing. The test results are shown in Figure 7. They show that transparency was reached for critical material at 128 kbps.
19
Framework of G.719 audio codec
The G.719 framework is built around the transformation of time-domain signals into frequency-domain spectra. The transform is a modulated lapped transform (MLT) whose configuration depends on the mode selected by the transient detector. The MLT consists of windowing followed by a modified discrete cosine transform (MDCT). In transient mode, the block is further segmented in time into four sub-frames to improve the time resolution.
Transients are detected in the time-domain signal so that a fine time resolution is used for transients and a fine frequency resolution for stationary signals. Switching between the stationary and transient modes is instantaneous and does not require transition windows.
The MLT is applied to blocks of two consecutive frames, as shown in the frame buffering and windowing figure [16], so consecutive blocks overlap by one frame (50%). The signal is sampled at 48 kHz, so by the Nyquist criterion it can represent frequencies up to 24 kHz, which covers the 20 kHz audio bandwidth. Because of the large frequency spread of a rectangular window, the frequency analysis would be contaminated by spectral leakage. To reduce the frequency spread, windows without sharp discontinuities are used.
20
Frame buffering and windowing with overlap [16]
21
Explanation for windowing of sub-frames
In the transient mode of G.719, the time-aliased signal block is reversed in time and divided into four sub-frames. The reversal restores the temporal coherence of the input signal that was destroyed by time-domain aliasing. The first and last sub-frames are windowed by half-sine windows with a fourth of zero padding, while the second and third sub-frames are windowed with an ordinary sine window. The overlap between windowed sub-frames is 50%, and each segment is MDCT transformed. The transform lengths are equal in the stationary and transient modes of G.719.
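The reversal and segmentation step can be sketched as index arithmetic. In the sketch below, the segment length N/2, the hop N/4 and the N/8 zeros added at each end are our assumptions, chosen so that a quarter of the first and last segments is zero padding, consecutive segments overlap by 50%, and four MDCTs of N/2 samples give N coefficients in total. The function name is hypothetical and the sub-frame windows themselves are not applied here.

```c
#include <assert.h>
#include <stddef.h>

#define N_SUB 4  /* four sub-frames in transient mode */

/* Reverse an N-sample block in time and cut it into N_SUB segments of N/2
   samples with a hop of N/4. The reversed block is padded with N/8 zeros at
   each end, so the first and last segments each start/end with N/8 zeros.
   seg must hold N_SUB * N/2 values; segment s starts at seg[s * N/2]. */
void split_transient_block(const double *block, size_t N, double *seg)
{
    assert(block && seg && N % 8 == 0);
    size_t L = N / 2, hop = N / 4, pad = N / 8;
    for (size_t s = 0; s < N_SUB; s++) {
        for (size_t i = 0; i < L; i++) {
            size_t p = s * hop + i;                 /* index in padded block */
            double v = 0.0;                         /* zero padding */
            if (p >= pad && p < pad + N)
                v = block[N - 1 - (p - pad)];       /* time reversal */
            seg[s * L + i] = v;
        }
    }
}
```

Each segment would then be multiplied by its sub-frame window and transformed with an MDCT of N/2 input samples.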
22
Windowing of sub-frames in transient mode [16]
23
Steps for implementation
Compile the code with a C compiler or IDE such as Dev-C++; any C compiler can be used to generate the executable files. Building the encoder code produces encoder.exe, which is used to encode the input test vectors at 32, 48 and 64 kbps. Building the decoder code produces decoder.exe, which is used to decode the encoded test vectors at 32, 48 and 64 kbps, respectively. The decoded file is then compared with the original test vector in the console to check whether they are the same.
24
Console commands for execution
The console command to encode a test vector at 32 kbps is as follows:
encoder.exe -r -i <input file path>\test_vector.raw -o <output file path>\test_32000_en.bs
The console command for the decoder at the same bit rate is as follows (the input file here is the file encoded at 32 kbps):
decoder.exe -r -i <path of the input file>\test_32000_en.bs -o <specific path of the output file>\test_32000_dec.raw
Note: It is advisable to keep the encoded and decoded files in the same root folder, which makes it easier to compare the files after the frames are encoded and decoded.
To compare the decoded file with the test vector, type the console command:
comp test_32000_dec.raw test_vector.raw
This command checks whether the decoded file is identical to test_vector.raw, the file from which the bitstream was encoded. Screenshots of the above commands run in the console are shown.
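The comp command is specific to Windows. A portable byte-by-byte comparison in C is sketched below; the function name compare_files and its return convention are hypothetical, and it only reports whether two files are identical and where they first differ.

```c
#include <assert.h>
#include <stdio.h>

/* Byte-by-byte comparison of two files, similar in spirit to comp/cmp.
   Returns 0 if identical, 1 if they differ (the first differing offset, or
   the length of the shorter file, is stored in *first_diff), -1 on open error. */
int compare_files(const char *path_a, const char *path_b, long *first_diff)
{
    assert(path_a && path_b && first_diff);
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    if (!fa || !fb) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    long offset = 0;
    int result = 0;
    for (;;) {
        int ca = fgetc(fa), cb = fgetc(fb);
        if (ca != cb) {          /* different byte, or one file ended early */
            result = 1;
            *first_diff = offset;
            break;
        }
        if (ca == EOF)           /* both files ended together */
            break;
        offset++;
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

For the files above, compare_files("test_32000_dec.raw", "test_vector.raw", &off) returns 0 only if the two files match byte for byte.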
25
Instructions to implement the encoder and decoder
26
Implemented encoder
27
Implementation of decoder
28
Comparison of decoded sequence with default test vector
29
References
[1] M. Xie, P. Chu, A. Taleb and M. Briand, "A new low-complexity full band (20kHz) audio coding standard for high-quality conversational applications", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp , Oct
[2] A. Taleb and S. Karapetkov, "The first ITU-T standard for high-quality conversational fullband audio coding", IEEE Communications Magazine, vol.47, pp , Oct
[3] J. Wang, B. Chen, H. He, S. Zhao and J. Kuang, "An adaptive window switching method for ITU-T G.719 transient coding in TDA domain", IEEE International Conference on Wireless, Mobile and Multimedia Networks, pp , Jan
[4] J. Wang, N. Ning, X. Ji and J. Kuang, "Norm adjustment with segmental weighted SMR for ITU-T G.719 audio codec", IEEE International Conference on Multimedia and Signal Processing, vol.2, pp , May
30
[5] K. Brandenburg and M. Bosi, "Overview of MPEG audio: current and future standards for low-bit-rate audio coding", JAES, vol.45, pp.4-21, Jan/Feb
[6] A/52 B ATSC Digital Audio Compression Standard:
[7] F. Henn, R. Böhm and S. Meltzer, "Spectral band replication technology and its application in broadcasting", International Broadcasting Convention, 2003.
[8] M. Dietz and S. Meltzer, "CT-AACPlus – a state of the art audio coding scheme", Coding Technologies, EBU Technical Review, July
[9] ISO/IEC IS , "Information technology – Generic coding of moving pictures and associated audio information Part 7: advanced audio coding (AAC)", 1997.
31
[10] M. Bosi and R. E. Goldberg, "Introduction to digital audio coding standards", Norwell, MA: Kluwer, 2003.
[11] H. S. Malvar, "Signal processing with lapped transforms", Norwood, MA: Artech House, 1992.
[12] D. Meares, K. Watanabe and E. Scheirer, "Report on the MPEG-2 AAC stereo verification tests", ISO/IEC JTC1/SC29/WG11, Feb
[13] Super (c) v.2012.build.50: A simplified universal player encoder and renderer, a graphic user interface to FFmpeg, Mencoder, Mplayer, x264, Musepack, Shorten audio, True audio, Wavpack, Libavcodec library and Theora/vorbis real producers plugin:
[14] T. Ogunfunmi and M. Narasimha, "Principles of speech coding", Boca Raton, FL: CRC Press, 2010.
[15] P. Ekstrand, "Bandwidth extension of audio signals by spectral band replication", IEEE Workshop on Model Based Processing and Coding of Audio, pp.53-58, Nov
[16] T. Johnson, "Stereo coding for ITU-T G.719 codec", Uppsala University, May