MP3 Optimization Exploiting Processor Architecture and Using Better Algorithms Mancia Anguita Universidad de Granada J. Manuel Martinez – Lechado Vitelcom.


MP3 Optimization Exploiting Processor Architecture and Using Better Algorithms Mancia Anguita Universidad de Granada J. Manuel Martinez – Lechado Vitelcom Mobile Technology

Abstract
An application's execution time depends on the processor architecture and clock frequency, the computational complexity of the algorithm, the choice of compiler and optimization options, and how well the programmer explicitly and implicitly exploits the processor architecture. This article quantifies the influence of these factors for an MP3 decoder through experimental results.

Outline
- What's the problem?
- MP3 decoder overview
- MP3 decoder implementations
- Performance comparison
- Experiment results
- Conclusion

What's the problem?
- What factors influence an application's execution time?
  - The architecture and clock frequency of the executing processor
  - The computational complexity of the algorithm
  - The compiler
  - The programmer's skill
- But how much influence does each of these factors have on overall performance?

MP3 decoder overview ( 1 )

MP3 decoder overview ( 2 )
- Preprocessing
  - Finds frames in the bitstream
  - Extracts their compressed audio data and information (Huffman tables, scale factors)
- Requantization
  - Reconstructs the original frequency-line samples $xr_i$ using the scale factors extracted during preprocessing
  - $xr_i = \operatorname{sign}(is_i)\,|is_i|^{4/3} \times 2^{C_j/4}$
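As a quick sanity check on the requantization formula, here is one worked instance. The values $is_i = -8$ and $C_j = 4$ are hypothetical, chosen only because the arithmetic comes out exact; they are not taken from the presentation.

\[
xr_i = \operatorname{sign}(-8)\,|-8|^{4/3} \times 2^{4/4} = -\left(8^{1/3}\right)^{4} \times 2 = -16 \times 2 = -32
\]

The power $|is_i|^{4/3}$ is the expensive part of this step, which is why it is a natural candidate for a lookup table.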

MP3 decoder overview ( 3 )
- Huffman decoding
  - Huffman coding is a lossless coding scheme
  - Decoding is based on several Huffman tables that map Huffman codes to symbols
    - 17 different tables in total
  - The most significant part of processing the compressed audio bitstream
    - Searching the Huffman tables
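The table search described above can be sketched as follows. This is a minimal illustration, not the decoder from the presentation: the four-entry `toy_table`, the `BitReader` type, and the function names are all hypothetical, and the code table is not one of the tables from the MP3 standard. Bits are read one at a time and the table is searched linearly after each bit, which is the straightforward approach that faster search schemes try to improve on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy prefix code, NOT one of the MP3 standard tables. */
typedef struct {
    uint8_t len;    /* code length in bits */
    uint8_t code;   /* code bits, right-aligned */
    int     symbol; /* decoded symbol */
} HuffEntry;

static const HuffEntry toy_table[] = {
    {1, 0x0, 0}, /* "0"   */
    {2, 0x2, 1}, /* "10"  */
    {3, 0x6, 2}, /* "110" */
    {3, 0x7, 3}, /* "111" */
};
#define TOY_MAX_LEN 3u

/* Reads bits MSB-first from a byte buffer holding nbits valid bits. */
typedef struct {
    const uint8_t *data;
    size_t nbits;
    size_t pos;
} BitReader;

static int read_bit(BitReader *br)
{
    if (br->pos >= br->nbits)
        return -1; /* out of data */
    int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decodes one symbol; returns -1 when the data runs out or no code matches. */
int huff_decode_symbol(BitReader *br)
{
    assert(br != NULL);
    unsigned code = 0;
    for (unsigned len = 1; len <= TOY_MAX_LEN; len++) {
        int bit = read_bit(br);
        if (bit < 0)
            return -1;
        code = (code << 1) | (unsigned)bit;
        /* Linear search of the table after every new bit. */
        for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
            if (toy_table[i].len == len && toy_table[i].code == code)
                return toy_table[i].symbol;
        }
    }
    return -1;
}
```

With the six bits `0 10 111` as input, three successive calls yield the symbols 0, 1, and 3, and a fourth call reports end of data.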

MP3 decoder overview ( 4 )
- Reordering
  - The encoder reorders short blocks to make Huffman coding more efficient
  - The decoder reverses this reordering
- Stereo decoding
  - Stereo coding exploits redundancies between the stereo channels
  - No stereo processing is necessary in single-channel or dual-channel mode

MP3 decoder overview ( 5 )
- Alias reduction
  - Needed to cancel the aliasing effects of the encoder's polyphase filter bank
  - Consists of eight butterfly calculations for each pair of adjacent subbands
- IMDCT
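The eight butterflies per subband boundary can be sketched as below. Several things here are assumptions rather than content from the slides: a granule laid out as 32 subbands of 18 frequency lines (576 values), the function name `alias_reduction`, and the `CS`/`CA` coefficient values, which are quoted from memory of the MPEG-1 Layer III specification and should be checked against the standard before use.

```c
#include <assert.h>
#include <stddef.h>

#define SUBBANDS 32
#define LINES_PER_SUBBAND 18

/* Butterfly coefficients (assumed values, rounded; verify against the standard). */
static const float CS[8] = {
    0.857492926f, 0.881741997f, 0.949628649f, 0.983314592f,
    0.995517816f, 0.999160558f, 0.999899195f, 0.999993155f
};
static const float CA[8] = {
    -0.514495755f, -0.471731969f, -0.313377454f, -0.181913200f,
    -0.094574193f, -0.040965583f, -0.014198569f, -0.003699975f
};

/* Applies eight butterflies across each boundary between adjacent subbands. */
void alias_reduction(float xr[SUBBANDS * LINES_PER_SUBBAND])
{
    assert(xr != NULL);
    for (int sb = 1; sb < SUBBANDS; sb++) {
        float *lower = &xr[LINES_PER_SUBBAND * sb - 1]; /* last line of subband sb-1 */
        float *upper = &xr[LINES_PER_SUBBAND * sb];     /* first line of subband sb */
        for (int i = 0; i < 8; i++) {
            float a = lower[-i];
            float b = upper[i];
            lower[-i] = a * CS[i] - b * CA[i];
            upper[i]  = b * CS[i] + a * CA[i];
        }
    }
}
```

Each butterfly mixes one line below the boundary with its mirror line above it. Because `CS[i]*CS[i] + CA[i]*CA[i]` is 1 for these values, each butterfly is a plane rotation.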

MP3 decoder overview ( 6 )
- Frequency inversion
  - To compensate for frequency inversions, this stage negates every odd sample in all odd subbands
- Synthesis polyphase filter bank
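The frequency inversion step is small enough to show in full. The sketch below assumes the time samples for one granule are stored as 32 subbands of 18 samples each; that layout and the name `frequency_inversion` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SUBBANDS 32
#define SAMPLES_PER_SUBBAND 18

/* Negates every odd-indexed sample in every odd-indexed subband. */
void frequency_inversion(float s[SUBBANDS][SAMPLES_PER_SUBBAND])
{
    assert(s != NULL);
    for (int sb = 1; sb < SUBBANDS; sb += 2) {
        for (int i = 1; i < SAMPLES_PER_SUBBAND; i += 2) {
            s[sb][i] = -s[sb][i];
        }
    }
}
```

Even subbands and even samples are left untouched, so only a quarter of the values change sign.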

MP3 decoder implementations ( 1 )
- Standard version
  - Implements the MP3 decoder by following the standard's documentation
  - Uses only the tables specified in the standard
- Basic version
  - Improves on the standard version
  - Replaces some instructions with others that take fewer clock cycles
    - Example: floating-point divisions are replaced by multiplications, and some integer multiplications by shifts
  - Replaces computationally intensive library functions with tables
  - Replaces slower high-level programmer code with library functions that use special processor instructions
  - Uses loop unrolling to improve some loops
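Three of the basic-version techniques listed above (division replaced by multiplication, multiplication replaced by a shift, and loop unrolling) can be illustrated with a small sketch. The function names are hypothetical and the code is not taken from the decoder in the presentation. Note that multiplying by a reciprocal can round differently from dividing unless the divisor is a power of two, so this kind of change trades exact results for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one floating-point division per sample. */
void scale_div(float *x, size_t n, float d)
{
    assert(x != NULL && d != 0.0f);
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] / d;
}

/* Same scaling with one division in total, then multiplications,
 * with the loop unrolled by four. */
void scale_mul_unrolled(float *x, size_t n, float d)
{
    assert(x != NULL && d != 0.0f);
    const float r = 1.0f / d; /* reciprocal computed once */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     *= r;
        x[i + 1] *= r;
        x[i + 2] *= r;
        x[i + 3] *= r;
    }
    for (; i < n; i++) /* remaining 0 to 3 samples */
        x[i] *= r;
}

/* Multiplication by a power of two expressed as a shift. */
uint32_t times8(uint32_t v)
{
    return v << 3;
}
```

Modern compilers often apply the shift and unrolling transformations on their own, but they will not replace a division by a reciprocal multiplication unless relaxed floating-point options are enabled, because the results can differ.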

MP3 decoder implementations ( 2 )
- SIMD version
  - Improves on the basic version using SIMD extensions
  - MP3 decoding is based on vector operations, so it can benefit from SIMD instructions
    - Requantization, stereo processing, IMDCT, and the synthesis filter bank
  - Uses SIMD to speed up memory initializations and block transfers
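To show what a SIMD rewrite of such a vector operation looks like, the sketch below multiplies a block of samples by a coefficient array four floats at a time using SSE intrinsics. It is an illustration only: the function name `apply_window` and the choice of an element-wise multiply are assumptions, not the code measured in the presentation. When SSE is not available at compile time, the scalar loop handles everything.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h> /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* out[i] = in[i] * win[i] for i in [0, n). */
void apply_window(float *out, const float *in, const float *win, size_t n)
{
    assert(out != NULL && in != NULL && win != NULL);
    size_t i = 0;
#ifdef HAVE_SSE
    /* Four multiplications per instruction; unaligned loads keep the
     * sketch free of alignment requirements. */
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(in + i);
        __m128 w = _mm_loadu_ps(win + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(a, w));
    }
#endif
    for (; i < n; i++) /* tail, or the whole array without SSE */
        out[i] = in[i] * win[i];
}
```

If the buffers are known to be 16-byte aligned, the aligned variants `_mm_load_ps` and `_mm_store_ps` can be used instead, which mattered more on the Pentium III and Pentium 4 processors the presentation targets than on current hardware.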

MP3 decoder implementations ( 3 )
- Algorithm version
  - Improves on the basic version with better algorithms
  - Synthesis polyphase filter bank
    - Konstantinides' method reduces the number of operations by transforming the matrixing operation into a 32-point DCT and some reordering operations
  - IMDCT
    - Marovich's method reduces the IMDCT to a fast DCT and some data-copying operations
  - Huffman decoding
    - A tree-clustering algorithm can speed up the search process

MP3 decoder implementations ( 4 )
- Algorithm-SIMD version
  - Combines the SIMD version with the improved algorithms
  - Uses the fast IMDCT and synthesis filter bank algorithms and clustered Huffman decoding

Performance comparison ( 0 )
- Optimization options

Performance comparison ( 1 )
- O2
  - Includes classical optimizations that are processor independent
  - Includes inline function expansion
- G6
  - Optimizes code for the Pentium Pro, PII, and PIII, generating code that is compatible with earlier processors
- G7
  - Optimizes code for the Pentium 4, generating code that is compatible with earlier processors
- QxK
  - Allows vectorization using the SSE and MMX instructions included in the PIII and P4
- Arch:SSE
  - Uses SSE and cmov instructions

Performance comparison ( 2 )
- Test platform
- Test MP3 file
- Note
  - We measure processor clock cycles instead of time, so the results are independent of the processor clock frequency

Experiment results ( 1 )

Experiment results ( 2 )

Experiment results ( 3 )

Conclusion
- Exploiting architecture features can be as important as choosing the right algorithms
- The programmer can exploit architecture features to a higher degree than the compiler
- The choice of optimizations depends on the application