1/75 Embedded Audio Coder Jin Li
2/75 Outline: Introduction; Embedded audio coder algorithm (MLT with window switching, quantizer, entropy coder, bitstream assembly); Modular software design; Experimental results & demos; Conclusion
3/75 Introduction
4/75 Introduction – Audio Compression Audio Waveform... Bitstream
5/75 EAC vs. Other Compression. Existing audio compression schemes: MP3, AAC, MPEG4 audio, WMA, Real Audio, … Why research a new audio codec?
6/75 Media vs. File Compression. File compression: every bit is important and has to be compressed losslessly. Media compression: the exact bit/value is not important and distortion is tolerable; the amount of media is huge, so a high compression ratio is required; media needs adaptation.
7/75 Key Features of EAC. Not only good compression performance but also a flexible bitstream syntax: the compressed bitstream may be manipulated for a different bitrate, a different number of audio channels, or a different audio sampling rate. Versatile: lossless, low delay, streaming/storage applications.
8/75 EAC Encoder Encoder... Master Bitstream Companion File
9/75 Parser. Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast). It may be changed according to the required bitrate, number of audio channels, and audio sampling rate. [Diagram: Master Bitstream and Companion File enter the Parser, which outputs the Application Bitstream]
10/75 EAC Decoder. [Diagram: Bitstream, Decoder, Speaker (Direct Sound) or .wav file]
11/75 Embedded Audio Coder - Algorithm Description
12/75 Frame Work - Encoder Transform Entropy coder Bitstream Assembly... Transform Entropy coder Bitstream Assembly Audio Bitstream L+R(or mono) L-R
13/75 Audio Transform. Input: audio samples. Output: transform coefficients. Goal: convert audio from the time domain to the frequency domain, in order to compact energy, better match psychoacoustic characteristics, and enable audio sampling rate change.
14/75 Lossy vs Lossless Mode. Lossy mode: Audio, MLT(SW), Quantization. Lossless mode: Audio, Reversible MLT(SW).
15/75 Lossy (Float) Pass
16/75 MLT - Modulated Lapped Transforms Spatial Response Frequency Domain
17/75 MLT with Window Switching. Features: basic window size 2048; short window size 256. Switching criterion: a frame (2048 samples) is switched to short windows if and only if (1) its energy is bigger than a certain threshold, (2) the energy within the 8 subframes (256 samples each) differs by more than T_a, and (3) there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by T_b.
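The switching criterion can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not give the threshold values or say whether "differs" means a ratio or a difference, so the names `e_min`, `t_a`, `t_b`, their default values, and the use of energy ratios are all hypothetical.

```python
import numpy as np

FRAME = 2048      # basic (long) window size, from the slide
SUBFRAME = 256    # short window size, from the slide


def use_short_windows(frame, e_min=1e-3, t_a=8.0, t_b=4.0):
    """Return True if the frame should be coded with short windows.

    e_min, t_a and t_b are hypothetical thresholds; comparing energies
    by ratio is an assumption, not something the slide specifies.
    """
    frame = np.asarray(frame, dtype=np.float64)
    assert frame.shape == (FRAME,)
    sub = frame.reshape(FRAME // SUBFRAME, SUBFRAME)   # 8 subframes
    energy = (sub ** 2).sum(axis=1) + 1e-12            # avoid division by zero

    loud_enough = energy.sum() > e_min                 # condition (1)
    spread = energy.max() / energy.min() > t_a         # condition (2)
    # condition (3): some subframe is stronger than the one after it by t_b
    drop = bool(np.any(energy[:-1] / energy[1:] > t_b))
    return bool(loud_enough and spread and drop)


# A frame whose second half is silent triggers the switch;
# a stationary frame does not.
step = np.zeros(FRAME)
step[:1024] = 1.0
print(use_short_windows(step), use_short_windows(np.ones(FRAME)))
```

All three conditions must hold, matching the "if and only if" wording of the slide.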
18/75 Band Separation. [Figure: Audio (44.1kHz sampling), MLT with window switching, band separation]
19/75 Synthesis (Half Sampling). [Figure: Audio (22.05kHz sampling), MLT with window switching, band separation]
20/75 Synthesis (Quarter Sampling). [Figure: Audio (11.025kHz sampling), MLT with window switching, band separation]
21/75 Quantizer. Input: coefficients. Output: quantized coefficients. Goal: convert coefficients from float to integer, reduce signal levels, and allow a fast implementation of entropy coding.
22/75 Quantizer. Scalar quantizer with a deadzone. [Figure: quantized magnitude and sign, with the deadzone around 0]
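A deadzone scalar quantizer that outputs a magnitude and a sign can be sketched as follows. The step size, the floor-based rule, and the reconstruction offset `delta` are assumptions for illustration; the slide only states that the quantizer is scalar with a deadzone.

```python
import numpy as np


def quantize(coefs, step):
    """Deadzone quantizer: returns (integer magnitude, sign flag).

    With floor(), every |c| < step maps to 0, so the zero bin
    (the deadzone) is twice as wide as the other bins.
    """
    c = np.asarray(coefs, dtype=np.float64)
    mag = np.floor(np.abs(c) / step).astype(np.int64)
    sign = np.signbit(c)          # True for negative coefficients
    return mag, sign


def dequantize(mag, sign, step, delta=0.5):
    """Reconstruct at a point inside each bin; delta=0.5 is the bin centre
    (a hypothetical choice). Zero magnitudes stay exactly zero."""
    rec = np.where(mag > 0, (mag + delta) * step, 0.0)
    return np.where(sign, -rec, rec)


mag, sign = quantize([0.4, -0.9, 1.0, -2.7], step=1.0)
print(mag, sign, dequantize(mag, sign, step=1.0))
```

Keeping magnitude and sign separate fits the later entropy coder slides, where sign bits and magnitude bits are coded with different contexts.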
23/75 Lossless (Integer) Pass
24/75 Key to Achieve Lossless. Break the MLT into small steps and make every step reversible. Definition of a reversible transform: integer input, integer output; the transform should have a determinant of 1 (do not expand data volume).
25/75 MLT Framework. Forward MLT (lapped transform): Window, then DCT-IV, where the DCT-IV consists of Pre-Rotation, Complex FFT, and Post-Rotation. Inverse MLT blocks: Pre-Rotation^-1, Complex FFT^-1, Post-Rotation^-1, Inverse Window.
26/75 Window Operation x(n) x(-n-1) Complex Rotate
27/75 Pre-Rotation. Four complex rotations, by -π/32, -5π/32, -9π/32, and -13π/32, map x_w(0), …, x_w(7) to x_p(0), …, x_p(7).
28/75 FFT (4-Point Complex). [Figure: x_p(0), …, x_p(7) are grouped into complex values x_c(0), …, x_c(3); a butterfly network with the twiddle factor e^{-jπ/2} produces y_c(0), …, y_c(3), which are unpacked to y_p(0), …, y_p(7)]
29/75 Post-Rotation. Four conjugate rotations, by -0, -π/8, -2π/8, and -3π/8, map y_p(0), …, y_p(7) to y(0), …, y(7).
30/75 Reversible MLT. Make the following operations reversible: butterfly operation, complex rotation, conjugate rotation.
31/75 Reversible Unit Transform
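One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps, each with determinant 1, and round inside each step. The sketch below uses that standard factorization as an assumption; the slides do not show which factorization EAC actually uses, and the function names are hypothetical.

```python
import math


def _lift_coefs(theta):
    # Rotation by theta = shear(p) * shear(s) * shear(p), with
    # p = (cos(theta) - 1) / sin(theta) and s = sin(theta).
    # Requires sin(theta) != 0; a rotation by 0 is the identity anyway.
    s = math.sin(theta)
    p = (math.cos(theta) - 1.0) / s
    return p, s


def rotate_int(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta."""
    p, s = _lift_coefs(theta)
    x += round(p * y)
    y += round(s * x)
    x += round(p * y)
    return x, y


def unrotate_int(x, y, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p, s = _lift_coefs(theta)
    x -= round(p * y)
    y -= round(s * x)
    x -= round(p * y)
    return x, y


a, b = rotate_int(1000, 0, math.pi / 4)
print((a, b), unrotate_int(a, b, math.pi / 4))
```

Each step adds a rounded function of one variable to the other, so subtracting the same rounded value recovers the input exactly, whatever the rounding error.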
32/75 Entropy Coder. Input: quantized coefficients. Output: embedded coded bitstream with R-D performance curve. Goal: compression; embedded bitstream for future manipulation.
33/75 Frame Grouping Time slot Frame
34/75 Entropy Coder D R Bitstream R-D curve
35/75 Entropy Coder Embedded coding Implicit psychoacoustic masking Context modeling Arithmetic coding Implementation concerns
36/75 A block of coefficients
37/75 Bits of Coefficients. [Figure: coefficients w0 to w7, each shown as a sign bit and magnitude bits b1 to b7]
38/75 Conventional Coding. [Figure: the same bit array; coefficients are coded one after another (first, second, third, …)]
39/75 Embedded Coding. [Figure: the same bit array; bit planes are coded one after another (first, second, third, …); the slide also marks a value of 40 and a range]
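The difference between the two coding orders can be made concrete with a toy bit-plane coder. This is a minimal sketch that only reorders raw magnitude bits, most significant plane first; sign bits, masking, context modeling, and arithmetic coding are left out, and the sample magnitudes are made up for illustration.

```python
def bitplane_encode(mags, nplanes):
    """Emit magnitude bits plane by plane, most significant plane first."""
    bits = []
    for b in range(nplanes - 1, -1, -1):
        for m in mags:
            bits.append((m >> b) & 1)
    return bits


def bitplane_decode(bits, ncoef, nplanes):
    """Rebuild magnitudes from a complete or truncated bit list."""
    mags = [0] * ncoef
    for k, bit in enumerate(bits):
        plane = nplanes - 1 - k // ncoef
        mags[k % ncoef] |= bit << plane
    return mags


mags = [45, 3, 0, 74, 20, 5, 1, 0]     # hypothetical 7-bit magnitudes
bits = bitplane_encode(mags, nplanes=7)
print(bitplane_decode(bits, 8, 7))       # all planes: exact magnitudes
print(bitplane_decode(bits[:16], 8, 7))  # first two planes only: coarse version
```

Because the most significant bits of every coefficient come first, cutting the bit list at any point still yields a usable, coarser approximation, which is what makes the bitstream embedded.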
40/75 Audio Masking. [Figure: level versus frequency, showing the signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, and noise-to-mask ratio across the critical band and neighboring bands]
41/75 Psychoacoustic Masking. Traditional approach (explicit masking, used by all existing approaches): calculate the mask; transmit the mask; modify the transform coefficients (or coding approach) according to the mask; encode the transform coefficients. Note: the mask modifies the coding content.
42/75 Implicit Psychoacoustic Masking. Key: the mask modifies the coding order; the content is the same. Implicit masking: calculate the static mask (Fletcher-Munson threshold); encode the MSB of the transform coefficients; calculate the mask based on the MSB of the coefficients; modify the coding order; encode the next most important part of the coefficients; repeat the process.
43/75 Embedded Coding with Implicit Psychoacoustic Masking. [Figure: bit array of coefficients w0 to w7 after the first pass; legend: significant coefficient, insignificant coefficient, mask; marked value 0]
44/75 Embedded Coding with Implicit Psychoacoustic Masking. [Figure: the same bit array after the first and second passes; legend: significant coefficient, insignificant coefficient; marked value 48]
45/75 Context Modeling. Contexts by coding mode. Zero coding: significance statuses of neighbor coefficients. Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients. Sign: neighbor signs.
46/75 After Implicit Psychoacoustic Masking & Context Modeling Bit: …… Ctx: …… Automatically generated To be encoded
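47/75 Arithmetic Coding – Illustration (QM Coder used). What is arithmetic coding? [Figure: the interval is split successively by the probabilities P0 / 1-P0, P1 / 1-P1, P2 / 1-P2 while coding the symbols S0 = 0, S1 = 1, S2 = …, ending in the interval A. Coding result: the shortest binary bitstream whose interval (B, C) is contained in A]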
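The interval-narrowing idea in this illustration can be reproduced with exact fractions. The sketch below is an idealized binary arithmetic coder, not the QM coder named on the slide (which uses fixed-precision approximations); the symbol sequence and probabilities in the example are hypothetical.

```python
import math
from fractions import Fraction


def narrow_interval(symbols, p_zero):
    """Shrink [0, 1) once per binary symbol; p_zero[i] = P(symbol i is 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # skip the part reserved for 0
            width *= 1 - p0
    return low, low + width


def shortest_codeword(low, high):
    """Shortest bit string b1..bk such that every continuation of
    0.b1..bk, i.e. [v/2^k, (v+1)/2^k), lies inside [low, high)."""
    k = 0
    while True:
        k += 1
        scale = 2 ** k
        v = math.ceil(low * scale)
        if Fraction(v + 1, scale) <= high:
            return format(v, f"0{k}b")


low, high = narrow_interval([0, 1, 0], [Fraction(3, 4)] * 3)
print(low, high, shortest_codeword(low, high))
```

Likely symbols shrink the interval only slightly, so a long run of them still fits a short codeword; this is the source of the compression.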
48/75 Entropy Coder (Summary) D R Bitstream R-D curve
49/75 Speed-Up Issues. Context modeling: use stored context; update the context when a coefficient becomes significant. Implicit masking: fast calculation of energy in a critical band; lookup table to convert energy to mask. R-D curve calculation: lookup table calculation of distortion. Context entropy coder: QM coder; run-length Rice coder.
50/75 Bitstream Assembly. Input: bitstream, R-D curve. Output: assembled bitstream, companion file. [Diagram: bitstream assembling]
51/75 EAC Bitstream Syntax. Layout: EAC marker, Global Header, then repeated timeslots (Timeslot Head, Body). Timeslot header: whether a certain channel exists (1 bit); length of the channel bitstream (1-2 bytes).
52/75 Companion File. Layout: Global Header, then for each timeslot a Timeslot Head and its R-D curve.
53/75 Rate-Distortion Optimized Assembling (Single Timeslot). [Figure: R-D curves (D1, R1), (D2, R2), (D3, R3), (D4, R4), with allocated rates r1, r2, r3, r4, …]
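A common way to split one timeslot's byte budget across several embedded bitstreams is to keep spending bytes wherever the R-D slope is steepest. The sketch below shows that greedy rule under stated assumptions: each curve is a list of (rate, distortion) points with strictly increasing rate and convex shape, and the function name and sample curves are hypothetical. The slides do not spell out the exact search EAC uses.

```python
def allocate(rd_curves, budget):
    """Pick one truncation point per curve for a total byte budget.

    rd_curves: list of [(rate_bytes, distortion), ...] with strictly
    increasing rate, decreasing distortion, and flattening slope.
    Returns the chosen point index for each curve.
    """
    choice = [0] * len(rd_curves)
    spent = sum(curve[0][0] for curve in rd_curves)
    while True:
        best, best_slope = None, 0.0
        for i, curve in enumerate(rd_curves):
            j = choice[i]
            if j + 1 < len(curve):
                d_rate = curve[j + 1][0] - curve[j][0]
                d_dist = curve[j][1] - curve[j + 1][1]
                slope = d_dist / d_rate      # distortion saved per byte
                if spent + d_rate <= budget and slope > best_slope:
                    best, best_slope = i, slope
        if best is None:                     # nothing affordable is left
            return choice
        j = choice[best]
        spent += rd_curves[best][j + 1][0] - rd_curves[best][j][0]
        choice[best] += 1


curve_a = [(0, 100), (10, 40), (20, 20), (30, 15)]
curve_b = [(0, 50), (10, 30), (20, 25)]
print(allocate([curve_a, curve_b], budget=30))
```

Because the bitstreams are embedded, applying the result only requires truncating each one at its chosen rate.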
54/75 Rate-Distortion Optimized Assembling (Multiple Timeslots) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve
55/75 Allocated Bytes Per Timeslot. Allocated bytes for a certain timeslot: $B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$, where $B_i$ is the allocated bytes for timeslot $i$, $\mathrm{Buf}_i$ is the buffer occupancy level at timeslot $i$, $\mathrm{Rate}_{\mathrm{trans}}$ is the coding (network) rate per second, and $\mathrm{Time}$ is the time duration of the timeslot.
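A short worked example of this allocation formula follows. The numbers are hypothetical (a 16 kbps channel expressed as 2000 bytes per second, and a 0.5 second timeslot), and the rate is assumed to be measured in bytes per second so that all terms share the same unit.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time (all in bytes)."""
    return buf_prev - buf_cur + rate_trans * time


def buffer_after(buf_prev, spent_bytes, rate_trans, time):
    """The same relation solved for Buf_i once B_i bytes are spent."""
    return buf_prev + rate_trans * time - spent_bytes


# Hypothetical numbers: 2000 bytes/s channel, 0.5 s timeslot,
# buffer allowed to drop from 4000 to 3500 bytes over the timeslot.
b_i = allocated_bytes(4000, 3500, rate_trans=2000, time=0.5)
print(b_i)                                    # 1500.0 bytes for this timeslot
print(buffer_after(4000, b_i, 2000, 0.5))     # back to 3500.0
```

Spending more bytes than the channel delivers during a timeslot lowers the buffer level, and spending fewer raises it, which is why the allocation has to respect the buffer occupancy constraint.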
56/75 Optimization. Given: the initial buffer occupancy level, the final buffer occupancy level (or an intermediate level with a sliding window), and the buffer occupancy constraint. Search for the allocated number of bytes for the current timeslot.
57/75 Search (R-D slope) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Underflow (too many bytes) Overflow (too few bytes) Waste bytes
58/75 Multiple Timeslots – Constant Bitrate Buffer Occupancy (Bytes) Time (timeslots) Illegal Region
59/75 Multiple Timeslots – Internet Streaming (Slow Start) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve
60/75 Multiple Timeslots – Internet Streaming (Normal) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region
61/75 Modular Software Design MLT(SW) Quantizer Entropy coder Bitstream Assembly... MLT(SW) Quantizer Entropy coder Bitstream Assembly Audio Bitstream L+R(or mono) L-R
62/75 Modular Software Design. Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data input can be inserted into any part of the program. Data-flow driven (with necessary memory regulator): no long delay; no need for large memory. Memory and computation efficient: working memory is preallocated.
63/75 Experimental Results
64/75 EAC – Highly Efficient (NMR). Results based on the average of 16 MPEG4 test clips; the smaller the NMR, the better. [Table: NMR per codec (EAC, WMA, MP4 TwinVQ) at 8kbps, 16kbps, 32kbps, and 48kbps]
65/75 EAC – Lossless. Results based on the average of 16 MPEG4 test clips. Compression ratio by codec: EAC 2.72; Monkey’s Audio 2.72; WinZip 1.32.
66/75 EAC (Versatile). Real-time 2-way communication (low delay mode); storage devices (Pocket PC, Xbox); Internet streaming.
67/75 EAC (Low Delay Mode). Reducing frame size; timeslot = 1 frame; fixed-length timeslot bitstream. Delay = 2 frames (ignoring encoding/decoding delay). Network transmission time adds to this (over a modem line, delay = 3 frames).
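68/75 EAC (Low Delay Mode). [Timing diagram: frames i-1, i, i+1; the encoder starts encoding frame i (MLT, quantizer, entropy coder); the bitstream crosses the network; the decoder starts decoding frame i (entropy decoder, dequantizer); the point where the frame becomes playable is marked]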
69/75 EAC – Flexible Bitstream Syntax. The parser may reassemble the bitstream at 1000x real time, changing the bit rate, number of audio channels, and audio sampling rate.
70/75 EAC – Software. Encoder: 8x realtime (stereo, 44.1kHz sampling). Decoder: 20x realtime. Parser: 1000x realtime.
71/75 EAC - Encoder Audio Encoder Stereo,128kbps Companion file
72/75 EAC - Parser Parser Companion file Stereo,128kbps Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling Server
73/75 EAC - Decoder Decoder Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling
74/75 Comparison Original MP4 TwinVQ WMA EAC MP3
75/75 Conclusions. An embedded audio coder is developed. Highly efficient. Versatile: low delay, constant bitrate, streaming. Flexible bitstream: parsing for bitrate, number of audio channels, and audio sampling rate. Good prototype available: real-time execution, small memory footprint.