

1/75 Embedded Audio Coder Jin Li

2/75 Outline: Introduction; Embedded audio coder - Algorithm (MLT with window switching, Quantizer, Entropy coder, Bitstream assembly); Modular software design; Experimental results & demos; Conclusion

3/75 Introduction

4/75 Introduction – Audio Compression Audio Waveform... Bitstream

5/75 EAC vs. Other Compression Existing audio compression schemes: MP3, AAC, MPEG4 audio, WMA, Real Audio, … Why research a new audio codec?

6/75 Media vs. File Compression File compression: every bit is important and has to be compressed losslessly. Media compression: the exact bit/value is not important and distortion is tolerable; the amount of media is huge, so a high compression ratio is required; media needs adaptation.

7/75 Key Features of EAC Not only good compression performance, but also flexible bitstream syntax. The compressed bitstream may be manipulated for: different bitrate; different # of audio channels; different audio sampling rate. Versatile; lossless; low delay; streaming/storage application.

8/75 EAC Encoder Encoder... Master Bitstream Companion File

9/75 Parser Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast). It may be changed according to the required bitrate, # of audio channels, and audio sampling rate. [Figure: the master bitstream and the companion file enter the parser, which outputs the application bitstream]

10/75 EAC Decoder [Figure: the bitstream enters the decoder, which outputs to the speaker (Direct Sound) or to a .wav file]

11/75 Embedded Audio Coder - Algorithm Description

12/75 Framework - Encoder [Figure: audio is split into L+R (or mono) and L-R; each goes through its own chain of Transform, Entropy coder and Bitstream Assembly to form the bitstream]
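The L+R and L-R labels suggest a sum/difference stereo split ahead of the two coding chains. A minimal Python sketch of such a split, assuming integer PCM samples and no scaling (the slide specifies neither; `to_sum_diff` and `from_sum_diff` are hypothetical names, not part of EAC):

```python
def to_sum_diff(left, right):
    """Map left/right sample lists to (L+R, L-R) lists."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d


def from_sum_diff(s, d):
    """Invert to_sum_diff.

    For integer input, L+R and L-R always have the same parity,
    so the halving below is exact and the split is lossless.
    """
    left = [(a + b) // 2 for a, b in zip(s, d)]
    right = [(a - b) // 2 for a, b in zip(s, d)]
    return left, right
```

Because the inverse is exact on integers, a split of this kind would be compatible with both the lossy and the lossless mode.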

13/75 Audio Transform Input: audio samples. Output: transform coefficients. Goal: convert audio from the time domain to the frequency domain, in order to compact energy, better match psychoacoustic characteristics, and enable audio sampling rate change.

14/75 Lossy vs Lossless Mode Lossy mode: audio, MLT (SW), quantization. Lossless mode: audio, reversible MLT (SW).

15/75 Lossy (Float) Pass

16/75 MLT - Modulated Lapped Transforms Spatial Response Frequency Domain

17/75 MLT with Window Switching Features: basic window size 2048; short window size 256. Switching criterion: a frame (2048 samples) is switched to the short window if and only if (1) its energy is bigger than a certain threshold; (2) the energy within the 8 subframes (256 samples each) differs by more than T_a; and (3) there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter by T_b.
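The switching criterion can be sketched in Python as below. This is an illustration under stated assumptions, not the EAC implementation: the slide does not say whether T_a and T_b are ratios or differences (ratios are assumed here), and the function name, the parameter names and the small `eps` guard are all hypothetical.

```python
def use_short_windows(frame, energy_min, t_a, t_b, subframes=8):
    """Return True if the frame should be coded with short windows.

    frame      : list of samples (2048 in the slides)
    energy_min : assumed absolute energy threshold (condition 1)
    t_a        : assumed max/min subframe energy ratio (condition 2)
    t_b        : assumed former/latter neighbor energy ratio (condition 3)
    """
    n = len(frame) // subframes
    energies = [sum(x * x for x in frame[k * n:(k + 1) * n])
                for k in range(subframes)]
    eps = 1e-12  # avoids division by zero for silent subframes

    # Condition 1: the frame is loud enough to matter.
    if sum(energies) <= energy_min:
        return False
    # Condition 2: subframe energies differ strongly within the frame.
    if max(energies) / (min(energies) + eps) <= t_a:
        return False
    # Condition 3: some neighboring pair where the former subframe
    # has much more energy than the latter (as stated on the slide).
    return any(energies[k] > t_b * (energies[k + 1] + eps)
               for k in range(subframes - 1))
```

A stationary frame fails condition 2 and keeps the long 2048-sample window; a frame whose energy changes abruptly between neighboring subframes passes all three tests.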

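18/75 Band Separation [Figure: audio (44.1kHz sampling) passes through the MLT with window switching, followed by band separation of the coefficients along a frequency axis starting at 0; band-edge labels not recoverable]

19/75 Synthesis (Half Sampling) [Figure: audio (22.05kHz sampling) is synthesized through the MLT with window switching from the lower bands only; frequency axis starts at 0, band-edge labels not recoverable]

20/75 Synthesis (Quarter Sampling) [Figure: audio (11.025kHz sampling) is synthesized through the MLT with window switching from the lowest band only; frequency axis starts at 0, band-edge label not recoverable]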

21/75 Quantizer Input: coefficient Output: quantized coefficient Goal: convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

22/75 Quantizer Scalar quantizer with a deadzone [Figure: quantizer characteristic around 0; the output is a sign and a quantized magnitude]
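A deadzone scalar quantizer of the kind named above can be sketched as follows. The step size, the midpoint reconstruction rule and the function names are assumptions for illustration; the slides give only the sign/magnitude output.

```python
import math


def quantize(coeff, step):
    """Deadzone scalar quantizer: returns (sign, magnitude).

    Every coefficient with |coeff| < step maps to magnitude 0, so the
    zero bin is twice as wide as the others (the "deadzone").
    """
    magnitude = int(math.floor(abs(coeff) / step))
    sign = -1 if coeff < 0 else 1
    return sign, magnitude


def dequantize(sign, magnitude, step):
    """Reconstruct at the bin midpoint (an assumed, common choice)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step
```

Splitting the output into sign and integer magnitude matches the bit-plane view used later by the entropy coder.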

23/75 Lossless (Integer) Pass

24/75 Key to Achieve Lossless Break the MLT into small steps and make every step reversible. Definition of reversible transform: integer input, integer output; the transform should have a determinant of 1 (do not expand data volume).

25/75 MLT Framework Forward MLT (lapped transform) blocks: Window, then DCT-IV built from Pre-Rotate, Complex FFT and Post Rotation. Inverse MLT blocks: Pre-Rotate^-1, Complex FFT^-1, Post Rotation^-1, Inverse Window.

26/75 Window Operation [Figure labels: x(n), x(-n-1), Complex Rotate]

27/75 Pre-Rotation [Figure: the windowed samples x_w(0) ... x_w(7) are taken in pairs through complex rotations by -π/32, -5π/32, -9π/32 and -13π/32, giving x_p(0) ... x_p(7)]

28/75 FFT (4 Point Complex) [Figure: x_p(0) ... x_p(7) form the complex inputs x_c(0) ... x_c(3); a butterfly network with twiddle factor e^{-jπ/2} produces y_c(0) ... y_c(3), which form the real-valued outputs y_p]

29/75 Post-Rotation [Figure: y_p(0) ... y_p(7) are taken in pairs through conjugate rotations by -0π, -π/8, -2π/8 and -3π/8, giving y(0) ... y(7)]

30/75 Reversible MLT Make the following operations reversible: butterfly operation; complex rotation; conjugate rotation.

31/75 Reversible Unit Transform
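One well-known way to make a plane rotation reversible on integers is to factor it into three lifting (shear) steps and round each step. The slides do not say which construction EAC uses, so the Python sketch below illustrates that general technique only; the function names are hypothetical. Each shear has determinant 1, which matches the stated requirement.

```python
import math


def _round(v):
    """Round to the nearest integer (ties go up)."""
    return math.floor(v + 0.5)


def lifting_rotate(x, y, theta):
    """Integer approximation of a rotation by theta (sin(theta) != 0).

    Uses [[c, -s], [s, c]] = shear(p) * shear'(s) * shear(p) with
    p = (cos(theta) - 1) / sin(theta) and s = sin(theta).
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x + _round(p * y)
    y = y + _round(s * x)
    x = x + _round(p * y)
    return x, y


def lifting_unrotate(x, y, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x = x - _round(p * y)
    y = y - _round(s * x)
    x = x - _round(p * y)
    return x, y
```

Each step changes one variable by a rounded function of the other, unchanged variable, so subtracting the same rounded value restores the input exactly no matter how coarse the rounding is.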

32/75 Entropy Coder Input: quantized coefficients Output: embedded coded bitstream with R-D performance curve Goal: Compression Embedded bitstream for future manipulation

33/75 Frame Grouping Time slot Frame

34/75 Entropy Coder D R Bitstream R-D curve

35/75 Entropy Coder Embedded coding Implicit psychoacoustic masking Context modeling Arithmetic coding Implementation concerns

36/75 A block of coefficients Next View graph

37/75 Bits of Coefficients [Figure: coefficients w0 ... w7, each written as a sign bit followed by bits b1 ... b7]

38/75 Conventional Coding [Figure: coefficients w0 ... w7 with sign and bits b1 ... b7; the parts coded first, second and third are marked]

39/75 Embedded Coding [Figure: coefficients w0 ... w7 with sign and bits b1 ... b7; the parts coded first, second and third are marked; labels: Value 40, Range (values lost)]
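The idea behind embedded coding, sending the most significant bits of all coefficients before any less significant bit, can be sketched in Python as below. This is a simplified illustration: signs, context modeling and arithmetic coding are left out, and the function names are hypothetical.

```python
def bitplane_stream(magnitudes, num_planes):
    """List (plane, index, bit) triples, most significant plane first."""
    stream = []
    for plane in range(num_planes - 1, -1, -1):
        for i, m in enumerate(magnitudes):
            stream.append((plane, i, (m >> plane) & 1))
    return stream


def decode_prefix(stream, count, n_coeffs):
    """Rebuild magnitudes from only the first `count` stream entries."""
    magnitudes = [0] * n_coeffs
    for plane, i, bit in stream[:count]:
        magnitudes[i] |= bit << plane
    return magnitudes
```

Cutting the stream at any point still decodes to a coarser version of every coefficient, which is what lets a parser lower the bitrate by truncation instead of re-encoding.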

40/75 Audio Masking Frequency Critical Band Neighboring Band Noise Level Signal Masking Threshold Maximum Mask Signal-to-mask ratio Noise-to-mask ratio

41/75 Psychoacoustic Masking Traditional approach (explicit masking, all existing approaches): calculate the mask; transmit the mask; modify the transform coefficients (or coding approach) according to the masking; encode the transform coefficients. Note: the mask modifies the coding content.

42/75 Implicit Psychoacoustic Masking Key: the mask modifies the coding order; the content is the same. Implicit masking: (1) calculate the static masking (Fletcher-Munson threshold); (2) encode the MSB of the transform coefficients; (3) calculate the mask based on the MSB of the coefficients; (4) modify the coding order; (5) encode the next most important part of the coefficients; (6) repeat the process.

43/75 Embedded Coding with Implicit Psychoacoustic Masking [Figure: coefficients w0 ... w7 with sign and bits b1 ... b7; the part coded first is marked; coefficients are flagged as significant or insignificant and the mask is overlaid; labels: Value 0, Range (values lost)]

44/75 Embedded Coding with Implicit Psychoacoustic Masking [Figure: coefficients w0 ... w7 with sign and bits b1 ... b7; the parts coded first and second are marked; coefficients are flagged as significant or insignificant; labels: Value 48, Range (values lost)]

45/75 Context Modeling Context: zero coding uses the significance statuses of neighbor coefficients; refinement uses whether it is the 1st refinement pass and the significance statuses of neighbor coefficients; sign uses the neighbor signs.

46/75 After Implicit Psychoacoustic Masking & Context Modeling Bit: …… Ctx: …… Automatically generated To be encoded

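47/75 Arithmetic Coding – Illustration (QM Coder used) What is arithmetic coding? [Figure: the unit interval is subdivided step by step according to the probabilities P_0, P_1, P_2 of the symbols S_0 = 0, S_1 = 1, S_2 (value lost), ending in a final interval A] Coding result: the shortest binary bitstream whose interval (B, C) is contained in A, i.e. (B, C) ⊂ A (numeric values of B and C lost).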
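The interval picture can be made concrete with exact rational arithmetic. The Python sketch below is an idealized binary arithmetic coder with a fixed probability, written only to illustrate the interval idea; it is not the QM coder, which uses finite-precision approximations and adaptive probability estimation. All names are hypothetical.

```python
import math
from fractions import Fraction


def encode_interval(symbols, p0):
    """Narrow [0, 1) once per binary symbol; p0 is P(symbol == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        if s == 0:
            width *= p0               # keep the lower part
        else:
            low += width * p0         # keep the upper part
            width *= (1 - p0)
    return low, low + width           # final interval A = [low, high)


def shortest_code(low, high):
    """Shortest bit string whose dyadic interval lies inside [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = math.ceil(low / step)     # first grid point at or above low
        if (k + 1) * step <= high:    # [k*step, (k+1)*step) fits in A
            return format(k, "0{}b".format(n))
```

With p0 = 1/2 the code is as long as the input, while a skewed p0 lets likely sequences take fewer bits; for example, four zeros with p0 = 3/4 fit in two bits.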

48/75 Entropy Coder (Summary) D R Bitstream R-D curve

49/75 Speed Up Issues Context modeling: use stored context; update the context when a coefficient becomes significant. Implicit masking: fast calculation of energy in a critical band; a lookup table converts energy to mask. R-D curve calculation: lookup-table calculation of distortion. Context entropy coder: QM coder; run-length Rice coder.

50/75 Bitstream Assembly Input: bitstream, R-D curve. Output: assembled bitstream, companion file. [Figure: ... Bitstream assembling]

51/75 EAC Bitstream Syntax Layout: EAC marker, Global Header, then a sequence of timeslots, each consisting of a Timeslot Head and a Body. Timeslot header: whether a certain channel exists (1 bit); length of the channel bitstream (1-2 bytes).

52/75 Companion File Layout: Global Header, then for each timeslot a Timeslot Head and its R-D curve.

53/75 Rate-Distortion Optimized Assembling (Single Timeslot) [Figure: four embedded bitstreams with R-D curves (D_1, R_1) ... (D_4, R_4), truncated at rates r_1, r_2, r_3, r_4, ...]
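A common way to pick the truncation rates r_1 ... r_4 is to keep taking the bitstream segment with the steepest R-D slope until the byte budget is spent. The slides do not spell out the search, so the Python sketch below is only one plausible reading; it assumes convex R-D curves given as cumulative (rate, distortion) points, and every name in it is hypothetical.

```python
def allocate(curves, budget):
    """Greedy R-D allocation across several embedded bitstreams.

    curves : one list per bitstream of (rate_bytes, distortion) points,
             with increasing rate and decreasing distortion (assumed convex)
    budget : total bytes available for the timeslot
    Returns (list of chosen rates, bytes used).
    """
    idx = [0] * len(curves)
    used = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for ch, c in enumerate(curves):
            i = idx[ch]
            if i + 1 < len(c):
                d_rate = c[i + 1][0] - c[i][0]
                d_dist = c[i][1] - c[i + 1][1]
                # Candidate must fit and reduce distortion fastest per byte.
                if used + d_rate <= budget and d_dist / d_rate > best_slope:
                    best, best_slope = ch, d_dist / d_rate
        if best is None:
            break
        i = idx[best]
        used += curves[best][i + 1][0] - curves[best][i][0]
        idx[best] += 1
    return [c[i][0] for c, i in zip(curves, idx)], used
```

Because each bitstream is embedded, the chosen rates are simply cut points: the assembler copies the first r_k bytes of each bitstream without re-encoding.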

54/75 Rate-Distortion Optimized Assembling (Multiple Timeslots) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve

55/75 Allocated Bytes Per Timeslot Allocated bytes for a certain timeslot: B_i = Buf_{i-1} - Buf_i + Rate_trans * Time, where B_i is the allocated bytes for timeslot i, Buf_i is the buffer occupancy level at timeslot i, Rate_trans is the coding (network) rate per second, and Time is the time duration of the timeslot.
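As a worked instance of the allocation formula, here is a small Python sketch. It assumes the rate is expressed in bytes per second and the duration in seconds; the function and parameter names are hypothetical.

```python
def allocated_bytes(buf_prev, buf_curr, rate_bytes_per_s, timeslot_s):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_curr + rate_bytes_per_s * timeslot_s


# Buffer drops from 4000 to 3000 bytes while 2000 bytes/s arrive for 0.5 s:
# 1000 bytes drained from the buffer plus 1000 bytes transmitted.
example = allocated_bytes(4000, 3000, 2000, 0.5)
```

Choosing the target occupancy Buf_i for each timeslot is therefore equivalent to choosing how many bytes that timeslot may spend.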

56/75 Optimization Given: the initial buffer occupancy level; the final buffer occupancy level (or an intermediate level with a sliding window); the buffer occupancy constraint. Search for the allocated # of bytes for the current timeslot.

57/75 Search (R-D slope) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Underflow (too many bytes) Overflow (too few bytes) Waste bytes

58/75 Multiple Timeslots – Constant Bitrate Buffer Occupancy (Bytes) Time (timeslots) Illegal Region

59/75 Multiple Timeslots – Internet Streaming (Slow Start) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve

60/75 Multiple Timeslots – Internet Streaming (Normal) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region

61/75 Modular Software Design MLT(SW) Quantizer Entropy coder Bitstream Assembly... MLT(SW) Quantizer Entropy coder Bitstream Assembly Audio Bitstream L+R(or mono) L-R

62/75 Modular Software Design Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probe and data input can be inserted into any part of the program. Data flow driven (with necessary memory regulator): no long delay; no need for large memory. Memory and computation efficient: working memory preallocated.

63/75 Experimental Results

64/75 EAC – Highly Efficient (NMR) Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better. [Table: NMR per codec (EAC, WMA, MP4 TwinVQ) at 8kbps, 16kbps, 32kbps and 48kbps; values not recoverable]

65/75 EAC – Lossless Results based on the average of 16 MPEG4 test clips. Compression ratio by codec: EAC 2.72; Monkey’s Audio 2.72; WinZip 1.32.

66/75 EAC (Versatile) Versatile Real time 2-way communication (Low delay mode) Storage device (Pocket PC, Xbox) Internet streaming

67/75 EAC (Low Delay Mode) Reducing frame size. Timeslot = 1 frame. Fixed-length timeslot bitstream. Delay = 2 frames (ignoring encoding/decoding delay). Network transmission time (if modem line, delay = 3 frames).

68/75 EAC (Low Delay Mode) Encoder Frame = i-1 i i+1 Start Encoding Frame i MLT, Quantizer, Entropy Bitstream Start Decoding Frame i Entropy, Quantizer network Playable here

69/75 EAC – Flexible Bitstream Syntax Flexible bitstream syntax. The parser may reassemble the bitstream at 1000x real time to change the bit rate, the # of audio channels, and the audio sampling rate.

70/75 EAC – Software Software Encoder: 8x realtime (Stereo, 44.1kHz sampling) Decoder: 20x realtime Parser: 1000x realtime

71/75 EAC - Encoder Audio Encoder Stereo,128kbps Companion file

72/75 EAC - Parser Parser Companion file Stereo,128kbps Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling Server

73/75 EAC - Decoder Decoder Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling

74/75 Comparison Original MP4 TwinVQ WMA EAC MP3

75/75 Conclusions An embedded audio coder is developed: highly efficient; versatile (low delay, constant bitrate, streaming); flexible bitstream (parsing for bitrate, # of audio channels, audio sampling rate); good prototype available (realtime execution, small memory footprint).