
EE 5359 Multimedia Processing Project Proposal
Study and implementation of G.719 audio codec and performance analysis of G.719 with AAC (advanced audio codec) and HE-AAC (high efficiency-advanced audio codec)
Name: Yashas Prakash
Student ID: 1000803680
Instructor: Dr. K. R. Rao
Date: 04-25-2012

List of acronyms
AAC - Advanced audio coding
AES - Audio Engineering Society
ATSC - Advanced television systems committee
EBU - European broadcasting union
FLVQ - Fast lattice vector quantization
HE-AAC - High efficiency advanced audio coding
HRQ - Higher rate lattice vector quantization
IMDCT - Inverse modified discrete cosine transform
ISO - International organization for standardization
ITU - International telecommunication union
JAES - Journal of the Audio Engineering Society
LC - Low complexity
LFE - Low frequencies enhancement
LRQ - Lower rate lattice vector quantization
LTP - Long term prediction
MDCT - Modified discrete cosine transform
MPEG - Moving picture experts group
SBR - Spectral band replication
SMR - Signal-to-mask ratio
SRS - Sample rate scalable
TDA - Time domain aliased
WMOPS - Weighted million operations per second

Introduction to codecs A codec is a device or computer program capable of encoding or decoding a digital data stream or signal. It can be thought of as a compressor/de-compressor or encoder/decoder. Codec programs are required for a media player to play audio/video files. A codec encodes a data stream or signal for transmission, storage or encryption and decodes it for playback or editing. Codecs are used in videoconferencing, streaming media and video editing applications.

Introduction to G.719 codec [1] G.719 is an ITU-T standard audio codec providing high-quality, moderate bit rate (32 to 128 kbit/s) fullband (20 Hz - 20 kHz audio bandwidth, 48 kHz audio sample rate) audio coding at low computational load [1]. It was produced through a collaboration between Polycom and Ericsson. G.719 incorporates elements of Polycom's Siren22 codec (22 kHz) and Ericsson codec technology, as well as Polycom's Siren7 and Siren14 codecs (G.722.1 and G.722.1 Annex C), which have been used in videoconferencing systems for many years [1].

Advantages and Applications of G.719 [3] The algorithm is designed to provide 20 Hz - 20 kHz audio bandwidth using a 48 kHz sample rate, operating at 32 - 128 kbps [3]. The codec features very high audio quality and low computational complexity and is suitable for applications such as videoconferencing, teleconferencing, and streaming audio over the Internet [3].

OVERVIEW OF THE G.719 CODEC [1] The G.719 codec is a low-complexity transform-based audio codec and can provide an audio bandwidth of 20 Hz to 20 kHz at 32 - 128 kbps. The codec features very high audio quality and extremely low computational complexity compared to other state-of-the-art audio coding algorithms. G.719 is optimized for both speech and music. It is based on transform coding with adaptive time-resolution, adaptive bit-allocation and low complexity lattice vector quantization [1]. The computational complexity is quite low (18 floating-point MIPS) for an efficient high-quality compressor [1].

G.719 Contd.. The codec operates on 20 ms frames, and the algorithmic delay end-to-end is 40 ms [2]. The encoder input and decoder output are sampled at 48 kHz [2].
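
The frame figures above translate into sample and bit counts with simple arithmetic. The following C sketch works them out; the constant and function names are illustrative assumptions and are not taken from the G.719 reference code.

```c
#include <assert.h>

/* Values stated in the slides: 48 kHz sampling, 20 ms frames, 40 ms delay.
   The identifiers themselves are hypothetical. */
enum {
    G719_SAMPLE_RATE_HZ = 48000,
    G719_FRAME_MS = 20,
    G719_DELAY_MS = 40
};

/* Number of samples covered by a duration given in milliseconds. */
long samples_in_ms(long sample_rate_hz, long duration_ms)
{
    assert(sample_rate_hz > 0 && duration_ms >= 0);
    return sample_rate_hz * duration_ms / 1000;
}

/* Number of bits available to one frame at a given bit rate (bit/s). */
long bits_per_frame(long bit_rate_bps, long frame_ms)
{
    assert(bit_rate_bps > 0 && frame_ms > 0);
    return bit_rate_bps * frame_ms / 1000;
}
```

With these inputs a 20 ms frame holds 960 samples, the 40 ms algorithmic delay spans 1920 samples, and a 32 kbps stream leaves 640 bits for each frame.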

Block diagram of G.719 encoder Block diagram of the G.719 encoder [1].

ADAPTIVE TIME-FREQUENCY-TRANSFORM The adaptive time-frequency transform is based on the detection of transient sounds [3]. For transient sounds, the time-frequency transform increases its time resolution, which allows a better representation of rapid changes in the input signal characteristics [3].

G.719 Decoder Block diagram of the G.719 decoder [1].

Complexity in G.719 Complexity is a paramount parameter for a codec. Complex codecs require more powerful and more expensive digital signal processors (DSPs) to run on [1]. This increases the product cost and power consumption, which limits the codec usability [1]. The fixed-point C-code implementation of G.719, which is an integral part of the recommendation by ITU-T, is based on a set of instructions that mimics a generic DSP instruction set [1].

An overview of AAC codec [9] The advanced audio coding (AAC) scheme was a joint development by Dolby, Fraunhofer, AT&T, Sony and Nokia [9]. It is a digital audio compression scheme for medium to high bit rates that is not backward compatible with earlier moving picture experts group (MPEG) audio standards [9]. AAC is a second-generation coding scheme used for stereo and multichannel signals. Compared with earlier perceptual coders, AAC provides more flexibility and uses more coding tools [12].

AAC codec contd., The AAC encoding follows a modular approach, and the standard defines four profiles that can be chosen based on factors such as the complexity of the bitstream to be encoded, the desired performance and the output [9]:
Low complexity (LC)
Main profile (MAIN)
Sample-rate scalable (SRS)
Long term prediction (LTP)

An overview of the HE-AAC codec [9] High efficiency advanced audio coding (HE-AAC) is a lossy data compression scheme. It is an extension of low complexity AAC optimized for low bit rate applications such as streaming audio. HE-AAC uses spectral band replication (SBR) technology to enhance the compression efficiency in the frequency domain. Scientific testing by the European Broadcasting Union has indicated that HE-AAC at 48 kbit/s was ranked as "Excellent" quality on the MUSHRA scale [9]. Testing indicates that material decoded from 64 kbit/s HE-AAC does not yet have similar audio quality to material decoded from MP3 at 128 kbit/s using high quality encoders.

Block diagram of the SBR encoder [15]

Block diagram of the SBR decoder [15]

Subjective performance of G.719 [1]

Explanation for the subjective performance of G.719 codec
DMOS (degradation mean opinion score) is defined as the user's view of the quality of the network. It is a subjective measurement in which listeners sit in a quiet room and score the call quality as they perceive it.
Requirements: the talker should be seated in a quiet room with a reverberation time of less than 500 ms, and the room noise level should be below 30 dBA.
DMOS ratings: 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = annoying.
Experiment 1: speech.
Experiment 2: mixed content and music (speech, music and noise).
The reference test vectors used in these experiments are in MPEG audio format.
Studying the above graphs: in experiment 1, the G.719 codec performed better than the reference codec at all bit rates. In experiment 2, the G.719 codec performed better than the reference codec at the lowest bit rate and almost the same as the reference at all other bit rates.
An additional subjective listening test for the G.719 codec was conducted later to evaluate the quality of the codec at rates higher than those described in the ITU-T test plan. Because the quality expectation at these rates is high, critical items, for which the quality in the lower bit rate range was most degraded, were pre-selected prior to testing. The test results are shown in Figure 7. They indicate that transparency was reached for critical material at 128 kbps.
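
A DMOS value is the average of the individual listener ratings on the five-point scale above. The following C sketch shows that averaging step only; the function name and the sample ratings are hypothetical, and the ITU-T test plan involves further steps (listener screening, confidence intervals) that are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of listener ratings on the 1..5 scale.
   Returns 0.0 for an empty set of ratings. */
double mean_opinion_score(const int *ratings, size_t count)
{
    double sum = 0.0;
    size_t i;

    if (count == 0)
        return 0.0;
    assert(ratings != NULL);
    for (i = 0; i < count; ++i) {
        /* Every rating must lie on the five-point scale. */
        assert(ratings[i] >= 1 && ratings[i] <= 5);
        sum += (double)ratings[i];
    }
    return sum / (double)count;
}
```

For example, four made-up ratings of 5, 4, 4 and 3 average to a score of 4.0, which falls on "good".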

Framework of G.719 audio codec The G.719 framework is defined by the transformation of time domain signals into frequency domain spectra. The transform is a modulated lapped transform (MLT) whose mode is selected based on transient detection. The MLT consists of windowing followed by a modified DCT. In transient mode, the frame is further segmented in time into four sub-frames to improve the time resolution. Transients are detected from the time-domain signal so that a fine time resolution is used for transients and a fine frequency resolution for stationary signals. The switching between the stationary and transient modes is instantaneous and does not require a transition window. The MLT is applied to blocks of two consecutive frames, so successive blocks overlap, as illustrated in the frame buffering figure [16]. The signal is sampled at 48 kHz, so by the Nyquist criterion it can represent frequencies up to 24 kHz, which covers the 20 kHz audio bandwidth of the codec. Because of the large frequency spread of the rectangular window, the frequency analysis can be contaminated by aliasing. To reduce the frequency spread and suppress the aliasing effect, windows without sharp discontinuities are used.

Frame buffering and windowing with overlap [16]

Explanation for windowing of sub-frames In the transient mode of G.719, the time-aliased signal block is reversed in time and divided into four sub-frames. The reversal recreates the temporal coherence of the input signal that was destroyed by time domain aliasing. The first and last sub-frames are windowed by half sine windows with a fourth of zero padding, while the second and third sub-frames are windowed with an ordinary sine window. The overlap between windowed sub-frames is 50%, and each segment is MDCT transformed. The transform lengths are equal in the stationary and transient modes of G.719.

Windowing of sub-frames in transient mode [16]

Steps for implementation Use a C compiler such as Dev-C++ to compile the code; any C compiler can be used to generate the executable files. Compiling the encoder code produces encoder.exe, which is used to encode the input test_vectors at 32, 48 and 64 kbps. Compiling the decoder code produces decoder.exe, which is used to decode the encoded test_vectors at 32, 48 and 64 kbps respectively. The decoded file is then compared in the console with the original test_vector to check whether the two are the same.

Console commands for execution
The console command to encode a test_vector at 32 kbps is as follows:
    encoder.exe -r 32000 -i *input file path\test_vector.raw -o *output file path\test_32000_en.bs
The console command for the decoder at the same bit rate is as follows (the input file here is the file encoded at 32000 bps):
    decoder.exe -r 32000 -i *path of the input file\test_32000_en.bs -o *specific path of the output file\test_32000_dec.raw
Note: It is advisable to keep the encoded and decoded files in the same root folder, as this makes it easier to compare the files once the sound frames have been encoded and decoded.
Type the console command:
    comp test_32000_dec.raw test_vector.raw
This command validates that the decoded file is in fact the same as the test_vector.raw from which the file was encoded. Screenshots of the above commands run in the console are shown.
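
On systems without the comp command, the same byte-by-byte check can be done with a few lines of C. The sketch below is a hypothetical helper, not part of the G.719 package; it reports the offset of the first differing byte so that a mismatch can be located.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte by byte.
   Returns -1 if they are identical, -2 if either file cannot be opened,
   otherwise the 0-based offset of the first difference (which equals the
   length of the shorter file when one file is a prefix of the other). */
long first_difference(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    long offset = 0;
    long result = -1;
    int ca, cb;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -2;
    }
    for (;;) {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca != cb) {          /* bytes differ, or one file ended early */
            result = offset;
            break;
        }
        if (ca == EOF)           /* both files ended at the same point */
            break;
        ++offset;
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

Calling first_difference("test_32000_dec.raw", "test_vector.raw") would return -1 when the two files match exactly.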

Instructions to implement the encoder and decoder

Implemented encoder

Implementation of decoder

Comparison of decoded sequence with default test vector

References
[1] M. Xie, P. Chu, A. Taleb and M. Briand, "A new low-complexity full band (20kHz) audio coding standard for high-quality conversational applications", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.265-268, Oct. 2009.
[2] A. Taleb and S. Karapetkov, "The first ITU-T standard for high-quality conversational fullband audio coding", IEEE Communications Magazine, vol.47, pp.124-130, Oct. 2009.
[3] J. Wang, B. Chen, H. He, S. Zhao and J. Kuang, "An adaptive window switching method for ITU-T G.719 transient coding in TDA domain", IEEE International Conference on Wireless, Mobile and Multimedia Networks, pp.298-301, Jan. 2011.
[4] J. Wang, N. Ning, X. Ji and J. Kuang, "Norm adjustment with segmental weighted SMR for ITU-T G.719 audio codec", IEEE International Conference on Multimedia and Signal Processing, vol.2, pp.282-285, May 2011.

[5] K. Brandenburg and M. Bosi, "Overview of MPEG audio: current and future standards for low-bit-rate audio coding", JAES, vol.45, pp.4-21, Jan/Feb. 1997.
[6] A/52 B ATSC Digital Audio Compression Standard: http://www.atsc.org/cms/standards/a_52b.pdf
[7] F. Henn, R. Böhm and S. Meltzer, "Spectral band replication technology and its application in broadcasting", International Broadcasting Convention, 2003.
[8] M. Dietz and S. Meltzer, "CT-AACPlus – a state of the art audio coding scheme", Coding Technologies, EBU Technical Review, July 2002.
[9] ISO/IEC IS 13818-7, "Information technology – Generic coding of moving pictures and associated audio information Part 7: advanced audio coding (AAC)", 1997.

[10] M. Bosi and R. E. Goldberg, "Introduction to digital audio coding standards", Norwell, MA, Kluwer, 2003.
[11] H. S. Malvar, "Signal processing with lapped transforms", Artech House, Norwood, MA, 1992.
[12] D. Meares, K. Watanabe and E. Scheirer, "Report on the MPEG-2 AAC stereo verification tests", ISO/IEC JTC1/SC29/WG11, Feb. 1998.
[13] Super (c) v.2012.build.50: A simplified universal player encoder and renderer, A graphic user interface to FFmpeg, Mencoder, Mplayer, x264, Musepack, Shorten audio, True audio, Wavpack, Libavcodec library and Theora/vorbis real producers plugin: www.erightsoft.com
[14] T. Ogunfunmi and M. Narasimha, "Principles of speech coding", Boca Raton, FL: CRC Press, 2010.
[15] P. Ekstrand, "Bandwidth extension of audio signals by spectral band replication", IEEE Workshop on model based processing and coding of audio, pp.53-58, Nov. 2002.
[16] T. Johnson, "Stereo coding for ITU-T G.719 codec", Uppsala university, May 2011.