Jingming Xu Multimedia Communications Lab University of Waterloo

Presentation transcript:

Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility. Jingming Xu, Multimedia Communications Lab, University of Waterloo. Welcome, ladies and gentlemen, and thanks for coming to my master’s seminar. I am Jingming Xu. I did my master’s with Dr. En-hui Yang at the Multimedia Communications Lab. Today, I will talk about… If you have any comments or questions during my talk, I will be happy to answer them at the end of the presentation.

Outline: Introduction and motivation; MP3, AAC, and Two-nested-loop Search; Rate-distortion optimization for MP3; Rate-distortion optimization for AAC; Conclusions and Future Research. I will start with an introduction to audio coding and the MPEG standards, and the motivation of our research in ….. Then I will give a short review of the MP3 and AAC audio coding standards, with emphasis on the quantization and entropy coding constraints, followed by the state-of-the-art MP3 and AAC quantization and entropy coding scheme and its problems. Based on the standard constraints, we will develop ….; in each case, I will provide simulation results with comparison to the state-of-the-art…, first for MP3 and then for AAC. I will end with concluding remarks. September 16th, 2005

Introduction. Audio coding is different from universal data compression: long-term correlations; multi-channel correlations; subject to natural noises; subjective perceptual quality judgement. Audio coding methods, for both lossy and lossless coding: linear prediction; time-frequency mapping (DCT, FFT, MDCT, etc.); parameter coding; …. Audio signals have their own characteristics, such as …., which make universal data compression schemes inefficient when applied directly to audio. During the past 30 years, people have come up with many methods specifically for both lossy and lossless audio coding, such as … Many of them have become international or industry standards, among which MPEG is certainly the most popular.

Introduction (2). MPEG is the most successful audio coding standard series so far. MPEG-1 (1992): T/F mapping based, 3 layers with increasing complexity. MPEG-2 BC (1994): backward compatible with MPEG-1, with multi-channel and sampling frequency extensions. MPEG-2 AAC (1997): introduces more coding tools and gives up backward compatibility to improve quality. MPEG-4 AAC (1999): inherited from MPEG-2 AAC, with TwinVQ and bitrate scalability extensions. MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular “MP3”. The development of the MPEG audio standards can be roughly divided into 2 phases …. The first phase, …., MP3. MPEG-4 even supports a new vector quantization scheme, TwinVQ, and bitrate scalability. However, since those extensions are beyond the scope of our research, we simply denote both MPEG-2 AAC and MPEG-4 AAC as AAC.

Introduction (3). Motivations: MP3 and AAC leave the design of structured encoding blocks open for performance enhancement. The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is essentially incapable of exploiting the maximal standard-constrained flexibility for the best rate-distortion tradeoff. The huge success of MP3 and AAC in the digital audio industry. Like many other multimedia compression standards, …. As we will see later, …. …. There is still room we can exploit. The huge success of MP3 and AAC in the digital audio industry also motivates our research.

Introduction (4). Quality evaluation of compressed audio. Most widely used objective measure: noise-to-mask ratio (NMR). Most widely used subjective measure: ITU listening test (ITU-R Recommendation BS.1116), with triple sources A, B, C with hidden reference, double blind, and a 5-grade impairment score scale. These are the two quality evaluation methods used in our research …. The NMR of a band is the ratio of the noise energy in that band to its perceptual masking threshold; here w_{i} is the inverse of the masking threshold. During the test, the listener is free to listen to sources A, B, or C. Source A is known to be the reference signal. However, sources B and C may each be either the reference signal or the test signal. The assignment is determined randomly, so that neither the listener nor the test administrator knows it beforehand. After listening, the listener is asked to rate sources B and C relative to source A on a continuous 5-grade impairment scale.
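
The per-band NMR described in the notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the band layout (`band_edges`), and the plain arithmetic mean over bands used for the average NMR are hypothetical choices, not taken from the talk, and the masking thresholds are assumed to come from some psychoacoustic model.

```python
import math

def band_nmr(noise_energy, mask_threshold):
    """NMR of one band: noise energy weighted by w = 1 / masking threshold."""
    w = 1.0 / mask_threshold
    return w * noise_energy

def average_nmr_db(original, decoded, band_edges, mask_thresholds):
    """Average NMR over all bands, in dB.

    band_edges[i]:band_edges[i + 1] delimits band i (assumed layout).
    Averaging with a plain mean is an assumption made for this sketch.
    """
    ratios = []
    for i, thr in enumerate(mask_thresholds):
        lo, hi = band_edges[i], band_edges[i + 1]
        noise = sum((x - y) ** 2 for x, y in zip(original[lo:hi], decoded[lo:hi]))
        ratios.append(band_nmr(noise, thr))
    mean = sum(ratios) / len(ratios)
    if mean == 0.0:
        return float("-inf")  # no noise at all
    return 10.0 * math.log10(mean)
```

An NMR below 1 (below 0 dB) in a band means the quantization noise in that band stays under the masking threshold.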

MP3 and AAC audio coding standards. Encoding process: window switching; stereo coding; pre-processing in AAC (gain control, prediction, noise shaping and substitution, etc.). A high-level block diagram of the MP3 encoding process is shown in … Time-domain audio samples are first fed into a T/F mapping block, which converts them into spectral coefficients. They are also fed into a psychoacoustic model, which generates control information for T/F mapping (window switching), quantization, and entropy coding. Under the control of the psychoacoustic model, the spectral coefficients are quantized, entropy coded, and packed together with format information and control information. T/F mapping option: window switching. Quantization and entropy coding option: separate/joint channel coding.

MP3 and AAC audio coding standards (2). Quantization and entropy coding in MP3: scale factor bands and non-uniform quantization. The whole spectrum is divided into a fixed number of scale factor bands. The non-uniform quantizer, corresponding to the de-quantizer defined in MP3, can be formulated as …., where global_gain is …. scale_factor values are encoded by a fixed number of bits in the side information and a variable number of bits in the main_data stream. After quantization, the scale factor values for one frame are broken down into four parts in the bitstream for efficient storage.
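
The quantizer formula itself did not survive in the transcript. As a rough sketch only, the following Python shows the well-known power-law shape of the MP3 quantizer and de-quantizer pair (exponents 3/4 and 4/3). The single `step` parameter is a hypothetical stand-in for the combined effect of global_gain and the scale factor, and the rounding offset 0.0946 is the value commonly quoted for the ISO reference encoder; both are assumptions here, not the slide's formula.

```python
import math

def quantize(x, step, offset=0.0946):
    """Hard-decision power-law quantizer: compress |x| with exponent 3/4.

    `step` lumps together global_gain and scale factor (assumption).
    """
    mag = (abs(x) / step) ** 0.75
    q = max(int(math.floor(mag - offset + 0.5)), 0)
    return q if x >= 0 else -q

def dequantize(q, step):
    """Matching de-quantizer: expand |q| with exponent 4/3."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)
```

Because of the 4/3 expansion, large coefficients are reconstructed on a coarser grid than small ones, which is what makes the quantizer non-uniform.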

MP3 and AAC audio coding standards (3). Quantization and entropy coding in MP3: Huffman coding. 34 fixed Huffman codebooks. Huffman coding region division: big_value, count_1, zero, …. Each region is coded with a different codebook that best matches the statistics of that region. After quantization, the quantized spectrum is Huffman encoded. …. The region division is generally open to design, except for the big_value subdivision in short windows. For AAC, there are lots of … Those audio pre-processing tools are optionally applied before the quantization and entropy coding block. The coding efficiency resulting from the adoption of these tools is signal dependent, and thus they usually operate under the control of the psychoacoustic model.

MP3 and AAC audio coding standards (4). Quantization and entropy coding in AAC. Non-uniform quantizer: same as in MP3. scale_factor values are differentially encoded relative to that of the preceding band using a fixed Huffman codebook. Huffman coding: 12 fixed Huffman codebooks. Huffman coding region division: section boundaries can only be at scale factor band boundaries. For each section, the length of the section in scale factor bands and the index of the codebook used for that section are transmitted with a fixed number of bits. AAC uses the same …. In effect, the codebook index of each band is also differentially encoded relative to that of the preceding band, except for band 0: if they are the same, no bit needs to be transmitted at all; otherwise, it costs a fixed number of bits.

Two-nested-loop Search algorithm. Outer loop and inner loop. Given a target data rate, the task of the outer loop is to amplify the scale factor of each “distorted” band so that its NMR is less than 1. Since the amplified parts of the spectrum need more bits for encoding, but the number of available bits is constant, the inner loop changes the global quantizer step size until the given spectrum can be encoded with the available bits. In all, this mechanism shifts bits from spectral regions where they are not required to those where they are required.
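
The interplay of the two loops can be illustrated with a toy Python sketch. Everything specific here is an assumption made for illustration: a uniform quantizer replaces the MP3 power-law quantizer, `count_bits` is a made-up rate model instead of real Huffman tables, and the step and amplification increments are arbitrary. The cap `max_outer` is there because, as the talk notes for TNLS, the outer loop has no convergence guarantee.

```python
def quantize_band(coefs, step):
    """Uniform quantizer, used here only to keep the sketch short."""
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coefs]

def count_bits(q):
    """Hypothetical rate model: zeros are free, others cost ~2*log2|v| + 1."""
    return sum(0 if v == 0 else 1 + 2 * abs(v).bit_length() for v in q)

def tnls(bands, thresholds, bit_budget, max_outer=32):
    """Toy two-nested-loop search over per-band amplification and global step."""
    amp = [1.0] * len(bands)
    result = None
    for _ in range(max_outer):
        # Inner loop: coarsen the global step until the frame fits the budget.
        step = 1e-3
        while True:
            q = [quantize_band(c, step / a) for c, a in zip(bands, amp)]
            bits = sum(count_bits(b) for b in q)
            if bits <= bit_budget:
                break
            step *= 2 ** 0.25
        # Outer loop: measure NMR per band and amplify the "distorted" ones.
        nmr = []
        for c, qb, a, thr in zip(bands, q, amp, thresholds):
            noise = sum((x - v * step / a) ** 2 for x, v in zip(c, qb))
            nmr.append(noise / thr)
        result = {"step": step, "amp": list(amp), "q": q, "bits": bits, "nmr": nmr}
        distorted = [i for i, r in enumerate(nmr) if r > 1.0]
        if not distorted:
            break
        for i in distorted:
            amp[i] *= 2 ** 0.5
    return result
```

The sketch returns the parameters of the last iteration; a practical encoder would additionally remember the best parameters seen so far.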

Two-nested-loop Search algorithm (2). Problems in TNLS: Quantization, scale factor adaptation, and Huffman coding are considered separately. However, they actually work together to determine the rate-distortion performance; optimizing only one of these factors in one step may force a sub-optimal selection of the other factors in the following steps and degrade the performance of the whole system in the end. TNLS has no convergence guarantee: the best parameters so far have to be stored during each iteration and restored as output after the final termination, and the iteration process has to be terminated according to predefined conditions without knowing the optimality of the result. It does not target minimizing the overall distortion. It disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC. In our research, we aim to directly attack these problems …

Rate-distortion optimization for MP3. Problem formulation: Lagrangian RD cost minimization, where the symbols in the cost denote the quantized coefficients, the scale factors, the Huffman coding region division, the Huffman codebook selection, the non-uniform de-quantizer defined in MP3, and the noise-to-mask ratio. We formulate the rate-distortion optimization problem as the minimization of the actual Lagrangian RD cost …. By incorporating all coding factors in the …. stage

Rate-distortion optimization for MP3 (2). Problem formulation: soft-decision quantization. In conventional hard-decision quantization, the quantized coefficients are solely determined by the given …., i.e., …. However, in the soft-decision quantization scenario, the quantized coefficients are considered a flexible coding factor and are selected such that the actual RD cost is minimized. Therefore, …. The key point in our problem formulation is to further consider the quantized coefficients as an optimization variable, leading to the so-called soft-decision quantization.
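
The difference between hard-decision and soft-decision quantization can be shown on a single coefficient. The Python sketch below is an illustration only: it assumes a uniform quantizer, a plain squared-error distortion, a Lagrangian cost of the form D + lam * R, and a caller-supplied `rate_fn` standing in for the real Huffman codeword lengths. None of these specifics come from the talk.

```python
import math

def rd_cost(x, q, step, lam, rate_fn):
    """Lagrangian cost: squared error plus lam times the bits for index q."""
    return (x - q * step) ** 2 + lam * rate_fn(q)

def hard_decision(x, step):
    """Nearest reconstruction level, ignoring the rate entirely."""
    return int(math.floor(x / step + 0.5))

def soft_decision(x, step, lam, rate_fn):
    """Choose, among a few nearby indices, the one with the lowest RD cost."""
    q0 = hard_decision(x, step)
    candidates = {q0 - 1, q0, q0 + 1, 0}
    # Ties are broken toward smaller magnitude so the result is deterministic.
    return min(candidates, key=lambda q: (rd_cost(x, q, step, lam, rate_fn), abs(q), q))
```

With `lam = 0` the soft decision reduces to the hard decision; as `lam` grows, cheaper indices (often zero) win even when they are not the nearest level.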

Rate-distortion optimization for MP3 (3). Fixed-slope graph-based iterative RD optimization. Based on the problem formulation, we propose a fixed-slope graph-based iterative algorithm for RD optimization in MP3. Step 1: Initialize a set of scale factors from the given frame of spectrum, together with a Huffman codebook (HCB) selection. Set t = 0, and specify a tolerance as the convergence criterion. Step 2: Given the scale factors and the HCB selection for any t ≥ 0, find the optimal quantized spectrum and HCB region division through a standard-constrained graph, i.e., the pair that achieves the minimum …. Denote …. Since the maximal allowed coefficient amplitude is closely related to the Huffman codebook region in which that coefficient lies, given a fixed Huffman codebook selection, the coding gains from the quantized spectrum and the region division are exploited jointly in Step 2 by an efficient graph-based optimal path search algorithm.

Rate-distortion optimization for MP3 (4). Graph Search for MP3 Quantized Spectrum and Region Division. The directed graph is constructed based on the MP3 quantization and entropy coding constraints for long windows; a simpler version exists for short windows. Each layer corresponds to an HCB region, and each state in a layer stands for two neighboring coefficients to be encoded using the HCB selected for that region. Two special states, frame_begin and frame_end, take care of the start and the end of the frame, respectively. Each transition is assigned a cost obtained by minimizing the decomposed cost of the state that the transition goes to, by adapting the corresponding quantized coefficients; the minimization then becomes the problem of searching for the path with the minimal accumulated cost through the graph. This graph-based search is a full dynamic programming procedure and always gives the optimal solution.

Rate-distortion optimization for MP3 (5). Fixed-slope graph-based iterative RD optimization. Step 3: Given the quantized spectrum, the region division, and the HCB selection, update the scale factors so that they achieve the minimum …. Step 4: Given the quantized spectrum, the region division, and the scale factors, update the HCB selection so that …. Step 5: Repeat Steps 2, 3 and 4 for t = 0, 1, 2, …. until the convergence criterion is met, then output the final parameters. Step 3 updates the scale factors. Note that MP3 has adaptive storage for scale factors: R(q) is usually determined by a few of the largest scale factors in the frame, and there is no closed-form formula to calculate the optimal ones. Therefore, a full search over all scale factor storage fashions within the standard needs to be applied. Step 4 updates the Huffman codebook selection: for each region, the Huffman codebook that gives the minimum codeword length is selected. Step 5 repeats …. until convergence occurs.
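
The codebook update in Step 4 amounts to an independent minimum search per region. A minimal Python sketch follows; the data layout (each codebook as a dictionary from symbol to codeword length) and the toy tables in the usage example are hypothetical, not the real MP3 Huffman tables.

```python
def select_codebooks(regions, codebooks):
    """For each region, pick the codebook with the shortest total code length.

    regions:   list of symbol lists, one list per Huffman coding region.
    codebooks: {codebook_id: {symbol: codeword_length_in_bits}} (assumed layout).
    A codebook that cannot represent some symbol in the region is skipped.
    Returns a list of (codebook_id, total_bits); (None, None) if nothing fits.
    """
    choice = []
    for symbols in regions:
        best_id, best_bits = None, None
        for cb_id, lengths in sorted(codebooks.items()):
            if any(s not in lengths for s in symbols):
                continue  # e.g. amplitude exceeds what this codebook supports
            bits = sum(lengths[s] for s in symbols)
            if best_bits is None or bits < best_bits:
                best_id, best_bits = cb_id, bits
        choice.append((best_id, best_bits))
    return choice

# Toy tables: codebook 0 is short but only covers symbols 0 and 1.
toy_codebooks = {0: {0: 1, 1: 2}, 1: {0: 2, 1: 2, 2: 2}}
selection = select_codebooks([[0, 0, 1], [2, 1, 0]], toy_codebooks)
```

In the toy example the first region is cheapest with codebook 0 (4 bits), while the second region contains a symbol that only codebook 1 can represent.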

Rate-distortion optimization for MP3 (6). Simulation results: ANMR (implementation based on the ISO MP3 reference codec). Test signals: violin.wav, spme50_1.wav. We implement our optimization algorithm on top of the ISO reference codec and the most advanced state-of-the-art MP3 codec, LAME, respectively. In each case, we use the original encoder output as our initialization. For the ISO reference codec, we see that the joint optimization algorithm successfully improves the coding efficiency, with at least 0.8 dB distortion reduction for bitrates above 64 kbit/s.

Rate-distortion optimization for MP3 (7). Simulation results: ANMR (implementation based on LAME 3.96.1, best-quality mode). Test signals: violin.wav, spme50_1.wav. For LAME, there is at least 0.6 dB distortion reduction for bitrates above 64 kbit/s. In the reference codec, the best parameters so far are not stored during each outer loop, which also leads to a great advantage for our joint optimization algorithm, especially at high bitrates. In both cases, the proposed joint optimization algorithm plays a less important role at low bitrates than at high bitrates, since there are fewer non-zero quantized coefficients that can be optimized using soft-decision quantization.

Rate-distortion optimization for MP3 (8). Simulation results: ITU listening test (80 kb/s). Furthermore, in listening tests, we notice that even though the proposed optimization algorithm yields a gain of roughly 0.7 for the ISO reference codec in both the music and speech cases, it still lags behind the LAME encoder alone. We believe this is mainly due to the initial parameters, which we derive directly from the original reference codec output. From this initialization, the iterative optimization process most likely ends at an inferior local optimum.

Rate-distortion optimization for MP3 (9). Remarks: The fixed-slope graph-based iterative algorithm we proposed provides a feasible solution to the problems in TNLS. Specifically, we incorporate all quantization and entropy coding variables in our optimization framework, directly target minimizing the overall perceptual distortion, and guarantee convergence to a global or local optimum. The iteration process may only achieve local optimality, so a wisely chosen initial state is favored when one aims at the best possible RD performance. The iterative process, which can be viewed as a steepest-descent minimization approach, may only achieve local optimality because its discrete parameter space is most likely non-convex. One can adaptively adjust the value of the Lagrangian multiplier to meet rate or distortion constraints in real audio compression applications.
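
One common way to adjust the multiplier toward a rate target is a bisection search, sketched below in Python. This is not taken from the talk: the `encode` callback, the search interval, and the assumption that the rate does not increase as the multiplier grows are all hypothetical and only serve to illustrate the idea.

```python
def find_lambda(encode, target_rate, lo=0.0, hi=1e6, iters=50):
    """Bisection on the Lagrangian multiplier to meet a rate constraint.

    encode(lam) -> (rate, distortion) is a hypothetical callback that runs the
    fixed-slope optimization for a given multiplier. The rate is assumed to be
    non-increasing in lam, and encode(hi) is assumed to meet the target.
    Returns the smallest multiplier found whose rate is within the target.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        rate, _ = encode(mid)
        if rate > target_rate:
            lo = mid   # too many bits: penalize rate more
        else:
            hi = mid   # feasible: try a smaller multiplier
    return hi

# Toy stand-in for an encoder: rate falls and distortion rises with lam.
toy_encode = lambda lam: (100.0 / (1.0 + lam), lam)
lam_star = find_lambda(toy_encode, target_rate=20.0)
```

Meeting a distortion constraint instead works the same way with the comparison applied to the second return value.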

Rate-distortion optimization for AAC. Problem formulation: Lagrangian RD cost minimization, where the symbols denote the scale factor sequence and the Huffman codebook index sequence. The first-order inter-band dependency suggests dynamic programming, so we use the Viterbi algorithm to solve the optimization problem.

Rate-distortion optimization for AAC (2). Fixed-slope trellis-based RD optimization. Step 1: Build up the trellis structure. For each state in the trellis, indexed by i = 0, 1, …., N-1, j = 0, 1, …., Ns-1, and k = 0, 1, …., Nh-1, find the best quantized coefficients to minimize its decomposed RD cost, given fixed scale factor j and Huffman codebook k. Step 2: Find the optimal path through the trellis with the Viterbi algorithm, based on the state costs obtained in Step 1 and the state transition costs …. Step 3: Backtrack along the optimal path to obtain the final output. Trellis: we have N stages in total, where N denotes the number of scale factor bands in one frame. Each stage is represented by (si, hi). There are Ns*Nh possible states for each (si, hi), corresponding to the combinations of Ns possible values of si and Nh possible values of hi.
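
Steps 2 and 3 are a standard Viterbi search, which can be sketched generically in Python. In this sketch each state index would stand for one (scale factor, codebook) pair, `state_cost` would hold the decomposed RD costs from Step 1, and `trans_cost` would model the bits spent on differential coding between neighboring bands. The function signature and the toy costs in the usage example are assumptions for illustration, not the talk's implementation.

```python
def viterbi(state_cost, trans_cost):
    """Minimum-cost path through a trellis.

    state_cost[i][k]:    cost of being in state k at stage i.
    trans_cost(i, j, k): cost of moving from state j at stage i-1
                         to state k at stage i.
    Returns (total_cost, path) with one state index per stage.
    """
    n, m = len(state_cost), len(state_cost[0])
    cost = list(state_cost[0])
    back = []
    for i in range(1, n):
        new_cost, ptr = [], []
        for k in range(m):
            # Best predecessor for state k at stage i.
            j_best = min(range(m), key=lambda j: cost[j] + trans_cost(i, j, k))
            new_cost.append(cost[j_best] + trans_cost(i, j_best, k) + state_cost[i][k])
            ptr.append(j_best)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheapest final state.
    k = min(range(m), key=lambda j: cost[j])
    total, path = cost[k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path

# Toy example: changing state between neighboring bands costs 2.
toy_costs = [[0, 5], [5, 0], [0, 5]]
total, path = viterbi(toy_costs, lambda i, j, k: 0 if j == k else 2)
```

With N stages and M = Ns*Nh states per stage, this search takes on the order of N*M*M transition evaluations, instead of enumerating all M^N paths.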

Rate-distortion optimization for AAC (3). Trellis Structure for AAC Quantization and Entropy Coding.

Rate-distortion optimization for AAC (4). Simulation results: ANMR. Implementation based on the ISO AAC reference codec, also compared with Aggarwal’s approach (Steps 2 and 3 only). Test signals: violin.wav, spme50_1.wav. We see that our proposed algorithm successfully improves the coding efficiency of the ISO reference codec, with at least 2.2 dB distortion reduction for bitrates above 96 kbit/s and more distortion reduction for bitrates below that. The side information, including the scale factors and Huffman codebook indices, accounts for a larger portion of the total bitrate at low bitrates, which leads to a great advantage for Viterbi over the traditional TNLS scheme used in the ISO reference codec. Soft-decision quantization (Step 1), in contrast, plays a more important role at high bitrates. The distortion reduction from soft-decision quantization grows to roughly the same level as that from Viterbi when the bitrate reaches 192 kbit/s.

Rate-distortion optimization for AAC (5). Simulation results: ITU listening test (64 kb/s). Even though pure Viterbi already yields an improvement of roughly 1.25 over TNLS in both the music and speech cases, joint soft-decision quantization and Viterbi can still gain another 0.25.

Rate-distortion optimization for AAC (6). Remarks: The fixed-slope trellis-based algorithm we proposed achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints. Please refer to the proof of “global optimality” in the thesis. Joint design of the pre-processing decisions with our proposed optimization can theoretically achieve the globally optimal performance in the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame, since AAC supports M/S stereo coding, TNS, Prediction, LTP and PNS on a band-to-band basis ….

Conclusions and Future Research. The fixed-slope approach converts the encoding problem into a search problem through a constrained space and thus permits the implementation of efficient sequential search algorithms. The soft-decision quantization spirit completes our RD optimization frameworks and introduces significant performance enhancement. Substantial performance improvement over the state-of-the-art encoders is achieved with complete decoder compatibility in each case. …. It’s also important to emphasize that …. The additional computational complexity due to the proposed optimization is incurred only at the encoder.

Conclusions and Future Research (2). Real-time implementations: the proposed MP3 RD optimization, implemented in pure C code, runs 8 times slower than real time on a 1.7 GHz CPU, and the proposed AAC rate-distortion optimization, also in pure C code, takes even more time. Extension to scalable AAC: joint RD optimization schemes for each transmission layer so that the optimal RD performance for the entire system can be achieved. Joint pre-processing and optimization for AAC: to make the complexity affordable in real audio compression applications, an immediate challenge is how to design joint perceptual modeling and signal analysis methods to sort all possible pre-processing decision candidates. Optimal lossy audio compression without syntax constraints: the success of our RD optimization for lossy audio compression with complete MP3 and AAC decoder compatibility also gives rise to the research proposal of developing optimal lossy audio compression algorithms without any standard constraints, where many fundamental design problems are open, such as optimal settings for transform (e.g. block lengths), quantization (e.g. step sizes) and prediction, and joint design of quantization and entropy coding ….

Questions?