Published by Louisa Weaver. Modified over 9 years ago.
2
1/75 Embedded Audio Coder Jin Li
3
2/75 Outline
- Introduction
- Embedded audio coder - Algorithm
  - MLT with window switching
  - Quantizer
  - Entropy coder
  - Bitstream assembly
- Modular software design
- Experimental results & demos
- Conclusion
4
3/75 Introduction
5
4/75 Introduction – Audio Compression
[Figure: an audio waveform is compressed into a bitstream]
6
5/75 EAC vs. Other Compression
- Existing audio compression schemes
  - MP3, AAC, MPEG4 audio, WMA, Real Audio, …
- Why research a new audio codec?
7
6/75 Media vs. File Compression
- File compression
  - Every bit is important and has to be compressed losslessly
- Media compression
  - The exact bit/value is not important; distortion is tolerable
  - The amount of media is huge, so a high compression ratio is required
  - Media needs adaptation
8
7/75 Key Features of EAC
- Not only good compression performance
- But also flexible bitstream syntax
  - The compressed bitstream may be manipulated for
    - Different bitrate
    - Different # of audio channels
    - Different audio sampling rate
- Versatile
  - Lossless
  - Low delay
  - Streaming/storage application
9
8/75 EAC Encoder
[Figure: the encoder produces a master bitstream and a companion file]
10
9/75 Parser
- Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast)
- May be changed according to the required bitrate, # of audio channels, and audio sampling rate
[Figure: the parser takes the master bitstream and the companion file and produces the application bitstream]
11
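10/75 EAC Decoder
[Figure: the bitstream is decoded and sent to the speaker (DirectSound) or written to a .wav file; the processing box in the transcript is labeled "Encoder"]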
12
11/75 Embedded Audio Coder - Algorithm Description
13
12/75 Framework - Encoder
[Figure: the audio is split into an L+R (or mono) channel and an L-R channel; each channel path is drawn as transform, entropy coder, and bitstream assembly, ending in the bitstream]
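The L+R and L-R channels named on this slide can be illustrated with a plain sum/difference conversion. This is only a sketch: the function names are hypothetical, and the slide does not say whether the coder scales the channels (for example by 1/2), so no scaling is assumed on the forward side.

```python
def to_sum_diff(left, right):
    """Convert a stereo pair into (L+R, L-R) channels, sample by sample."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover (left, right) from the (L+R, L-R) channels."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right
```

For highly correlated stereo material the L-R channel carries little energy, which is the usual motivation for coding the pair this way.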
14
13/75 Audio Transform
- Input: audio samples
- Output: transform coefficients
- Goal: convert audio from the time domain to the frequency domain
  - Compact energy
  - Better match with psychoacoustic characteristics
  - Enable audio sampling rate change
15
14/75 Lossy vs Lossless Mode
- Lossy mode: audio -> MLT (SW) -> quantization
- Lossless mode: audio -> reversible MLT (SW)
16
15/75 Lossy (Float) Pass
17
16/75 MLT - Modulated Lapped Transforms
[Figure panels: spatial response, frequency domain]
18
17/75 MLT with Window Switching
- Features
  - Basic window size: 2048
  - Short window size: 256
- Switching criterion: a frame (2048 samples) is switched to the short window if and only if
  - Its energy is bigger than a certain threshold
  - The energy within the 8 subframes (256 samples each) differs by more than T_a
  - There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by T_b
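The three switching conditions can be sketched as a single predicate. The slide leaves several details open, so this sketch makes assumptions that are not in the source: "differs by more than T_a" and "greater by T_b" are read as energy ratios, the energy threshold applies to the whole frame, and the names `t_energy`, `t_a`, and `t_b` are placeholders for thresholds the slide does not give.

```python
def subframe_energies(frame, n_sub=8):
    """Sum of squared samples in each of n_sub equal subframes."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def use_short_windows(frame, t_energy, t_a, t_b):
    """Return True if all three switching conditions hold (ratio reading assumed)."""
    energies = subframe_energies(frame)
    eps = 1e-12  # guards the ratio tests against silent subframes

    # 1. Frame energy must exceed a threshold.
    if sum(energies) <= t_energy:
        return False
    # 2. Subframe energies must differ by more than T_a (assumed: max/min ratio).
    if max(energies) <= t_a * max(min(energies), eps):
        return False
    # 3. Some adjacent pair: the former exceeds the latter by T_b (assumed: ratio).
    return any(prev > t_b * max(nxt, eps)
               for prev, nxt in zip(energies, energies[1:]))
```

A steady 2048-sample frame fails condition 2 and keeps the long window, while a frame whose first subframe is much louder than the rest passes all three tests.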
19
18/75 Band Separation
[Figure labels: audio (44.1 kHz sampling), MLT with window switching, band separation]
20
19/75 Synthesis (Half Sampling)
[Figure labels: audio (22.05 kHz sampling), MLT with window switching, band separation]
21
20/75 Synthesis (Quarter Sampling)
[Figure labels: audio (11.025 kHz sampling), MLT with window switching, band separation]
22
21/75 Quantizer
- Input: coefficient
- Output: quantized coefficient
- Goal: convert the coefficient from float to integer
  - Reduce signal levels
  - Fast implementation of entropy coding
23
22/75 Quantizer
- Scalar quantizer with a deadzone
[Figure: quantizer characteristic around 0; the output is a quantized magnitude and a sign]
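A deadzone scalar quantizer that outputs a magnitude and a sign can be sketched as follows. The step size parameter and the midpoint reconstruction rule are assumptions for illustration; the slide gives neither.

```python
def quantize(coef, step):
    """Map a float coefficient to (sign, magnitude).

    Truncating |coef| / step toward zero makes the zero bin (-step, step)
    twice as wide as the other bins, which is the deadzone.
    """
    sign = -1 if coef < 0 else 1
    magnitude = int(abs(coef) / step)
    return sign, magnitude


def dequantize(sign, magnitude, step):
    """Reconstruct at the bin midpoint (an assumed rule); zero stays zero."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step
```

With `step = 1.0`, a coefficient of 0.9 falls in the deadzone and quantizes to magnitude 0, while -2.7 quantizes to sign -1, magnitude 2, and is reconstructed as -2.5.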
24
23/75 Lossless (Integer) Pass
25
24/75 Key to Achieve Lossless
- Break the MLT into small steps
- Make every step reversible
- Definition of a reversible transform
  - Integer input, integer output
  - The transform should have a determinant of 1 (do not expand the data volume)
26
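25/75 MLT Framework
[Figure: the forward MLT (lapped transform) is drawn as a window block and a DCT-IV block made of pre-rotation, complex FFT, and post-rotation; the inverse MLT is drawn with pre-rotation^-1, complex FFT^-1, post-rotation^-1, and inverse window blocks]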
27
26/75 Window Operation
[Figure: samples x(n) and x(-n-1) enter a complex rotation]
28
27/75 Pre-Rotation
[Figure: inputs x_w(0)..x_w(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32, and -13π/32, giving x_p(0)..x_p(7)]
29
28/75 FFT (4 Point Complex)
[Figure: x_p(0)..x_p(7) are treated as four complex values x_c(0)..x_c(3) and passed through a 4-point complex FFT butterfly network with twiddle factor e^{-jπ/2}, giving y_c(0)..y_c(3), written back as y_p(0)..y_p(7)]
30
29/75 Post-Rotation
[Figure: y_p(0)..y_p(7) pass through four conjugate rotations, by -0, -π/8, -2π/8, and -3π/8, giving y(0)..y(7)]
31
30/75 Reversible MLT
- Make the following operations reversible
  - Butterfly operation
  - Complex rotation
  - Conjugate rotation
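One well-known way to make a rotation map integers to integers reversibly is to factor it into three lifting (shear) steps, each with determinant 1, and round inside each step. The transcript does not show how the deck's reversible unit transform is built, so the following is a sketch of that general technique under this assumption, not the presenter's method. The function names are hypothetical, and `theta` must not be a multiple of π because the factorization divides by sin(theta).

```python
import math


def rot_fwd(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta (three lifting steps)."""
    c, s = math.cos(theta), math.sin(theta)
    p = (c - 1) / s            # shear coefficient of the first and third steps
    x = x + round(p * y)       # step 1: changes x only
    y = y + round(s * x)       # step 2: changes y only
    x = x + round(p * y)       # step 3: changes x only
    return x, y


def rot_inv(x, y, theta):
    """Exact inverse: undo the lifting steps in reverse order."""
    c, s = math.cos(theta), math.sin(theta)
    p = (c - 1) / s
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

Each step adds a rounded function of the other variable, so subtracting the same rounded value undoes it exactly, whatever the rounding error was. The output only approximates the true rotation, but `rot_inv(*rot_fwd(x, y, theta), theta)` returns `(x, y)` for any integers.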
32
31/75 Reversible Unit Transform
33
32/75 Entropy Coder
- Input: quantized coefficients
- Output: embedded coded bitstream with R-D performance curve
- Goal
  - Compression
  - Embedded bitstream for future manipulation
34
33/75 Frame Grouping
[Figure labels: time slot, frame, positions numbered 1 to 8]
35
34/75 Entropy Coder
[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D against rate R)]
36
35/75 Entropy Coder
- Embedded coding
- Implicit psychoacoustic masking
- Context modeling
- Arithmetic coding
- Implementation concerns
37
36/75 A block of coefficients
[Figure: an example block of quantized coefficients, including 45, -74, 21, and 14; the grid layout is not recoverable from the transcript]
38
37/75 Bits of Coefficients
Coefficient  Value  Sign  b1 b2 b3 b4 b5 b6 b7
w0             45    +    0  1  0  1  1  0  1
w1            -74    -    1  0  0  1  0  1  0
w2             21    +    0  0  1  0  1  0  1
w3             14    +    0  0  0  1  1  1  0
w4             -4    -    0  0  0  0  1  0  0
w5            -18    -    0  0  1  0  0  1  0
w6              4    +    0  0  0  0  1  0  0
w7             -1    -    0  0  0  0  0  0  1
39
38/75 Conventional Coding
[Figure: the same bit table (sign, b1..b7 for w0..w7), coded coefficient by coefficient in the order first, second, third. Decoded values shown: 46, -74, 22, 0, 0, 0, 0, 0]
40
39/75 Embedded Coding
[Figure: the same bit table (sign, b1..b7 for w0..w7), coded bit plane by bit plane in the order first, second, third. Decoded values shown: 40, -72, 24, 0, 0, -24, 0, 0. Ranges shown: 32..47, -79..-64, 16..31, and -31..31]
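The effect of cutting an embedded bitstream after a given number of bit planes can be sketched as below. The sketch assumes whole bit planes are kept and that a decoder reconstructs each significant coefficient at the midpoint of its remaining range; the function name is hypothetical, and the actual coder also reorders bits within and across planes, which is not modeled here.

```python
def truncate_to_bitplanes(coefs, total_planes, kept_planes):
    """Decode integer coefficients from only the top `kept_planes` bit planes.

    Significant coefficients are placed at the midpoint of their uncertainty
    interval; coefficients whose kept bits are all zero decode to 0.
    """
    shift = total_planes - kept_planes      # number of discarded low bits
    half = (1 << shift) >> 1                # midpoint offset (0 when shift == 0)
    decoded = []
    for c in coefs:
        top = abs(c) >> shift               # bits actually received
        if top == 0:
            decoded.append(0)
        else:
            magnitude = (top << shift) + half
            decoded.append(magnitude if c > 0 else -magnitude)
    return decoded
```

With the slide's eight coefficients `[45, -74, 21, 14, -4, -18, 4, -1]` and 7 bit planes, keeping 3 planes gives `[40, -72, 24, 0, 0, -24, 0, 0]`, keeping 1 plane gives `[0, -96, 0, 0, 0, 0, 0, 0]`, and keeping all 7 returns the input unchanged.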
41
40/75 Audio Masking
[Figure: signal and noise level against frequency across a critical band and its neighboring band; labels: masking threshold, maximum mask, signal-to-mask ratio, noise-to-mask ratio]
42
41/75 Psychoacoustic Masking
- Traditional approach (explicit masking, all existing approaches)
  - Calculate the mask
  - Transmit the mask
  - Modify the transform coefficients (or the coding approach) according to the mask
  - Encode the transform coefficients
- Note: the mask modifies the coding content
43
42/75 Implicit Psychoacoustic Masking
- Key: the mask modifies the coding order; the content is the same
- Implicit masking
  - Calculate the static masking (Fletcher-Munson threshold)
  - Encode the MSB of the transform coefficients
  - Calculate the mask based on the MSB of the coefficients
  - Modify the coding order
  - Encode the next most important part of the coefficients
  - Repeat the process
44
43/75 Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table (sign, b1..b7 for w0..w7) after the first pass. Decoded values shown: 0, -96, 0, 0, 0, 0, 0, 0. Ranges shown: -127..-64, -63..63, and -127..127. Legend: significant coefficient, insignificant coefficient, mask]
45
44/75 Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table (sign, b1..b7 for w0..w7) after the first and second passes. Decoded values shown: 48, -96, 0, 0, 0, 0, 0, 0. Ranges shown: 32..63, -127..-64, -31..31, and -63..63. Legend: significant coefficient, insignificant coefficient]
46
45/75 Context Modeling
- Context
  - Zero coding
    - Significance statuses of neighbor coefficients
  - Refinement
    - Whether it is the 1st refinement pass
    - Significance statuses of neighbor coefficients
  - Sign
    - Neighbor signs
47
46/75 After Implicit Psychoacoustic Masking & Context Modeling
[Figure: the example block of coefficients is turned into a sequence of bits, each with a context; labels: automatically generated, to be encoded]
Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ……
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……
48
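47/75 Arithmetic Coding – Illustration (QM Coder used)
- What is arithmetic coding
[Figure: the interval from 0 to 1 is subdivided step by step using the probabilities P0, P1, P2 (and 1-P0, 1-P1, 1-P2) as the symbols S0=0, S1=1, S2=0 are coded; the final interval is A]
- Coding result: 0.100 (the shortest binary bitstream that ensures the interval (B, C), from B=0.100 0000000 to C=0.100 1111111, satisfies (B, C) ⊂ A)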
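The interval narrowing and the "shortest binary bitstream" rule can be sketched with exact fractions. This is an idealized arithmetic coder for illustration, not the QM coder, which replaces the multiplications with approximations. The assignment of symbol 0 to the lower part of the interval, and the names `narrow` and `shortest_code`, are assumptions.

```python
from fractions import Fraction
import math


def narrow(symbols, p_zero):
    """Return (low, width) of the interval A after coding binary symbols.

    p_zero[i] is the probability that symbol i is 0; symbol 0 takes the
    lower part of the current interval (an assumed convention).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p
        else:
            low += width * p
            width *= 1 - p
    return low, width


def shortest_code(low, width):
    """Shortest bit string b such that [0.b000..., 0.b111...] lies inside A."""
    n = 1
    while True:
        k = math.ceil(low * 2 ** n)                   # first n-bit grid point >= low
        if Fraction(k + 1, 2 ** n) <= low + width:    # whole cell fits inside A
            return format(k, "0{}b".format(n))
        n += 1
```

With symbols 0, 1, 0 and every probability set to 1/2, the interval narrows to [1/4, 3/8) and the code is `010`.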
49
48/75 Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D against rate R)]
50
49/75 Speed Up Issues
- Context modeling
  - Use stored context
  - Update the context when a coefficient becomes significant
- Implicit masking
  - Fast calculation of energy in a critical band
  - Lookup table converts energy to mask
- R-D curve calculation
  - Lookup table calculation of distortion
- Context entropy coder
  - QM coder
  - Run-length Rice coder
51
50/75 Bitstream Assembly
- Input: bitstream, R-D curve
- Output: assembled bitstream, companion file
[Figure: bitstream assembling]
52
51/75 EAC Bitstream Syntax
- Timeslot header
  - Whether a certain channel exists (1 bit)
  - Length of the channel bitstream (1-2 bytes)
[Figure: bitstream layout: EAC marker, global header, then repeated timeslots, each with a head and a body]
53
52/75 Companion File
[Figure: file layout: global header, then for each timeslot a timeslot head and an R-D curve]
54
53/75 Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 against R1 through D4 against R4), each truncated at a rate r1..r4]
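A common way to pick truncation points across several embedded bitstreams under a byte budget is to spend bytes where the distortion drops fastest. The transcript does not describe the deck's actual search, so the greedy sketch below is an assumption for illustration. It expects each curve as a list of hypothetical `(rate_bytes, distortion)` points with increasing rate and decreasing distortion, and it is only optimal when the curves are convex.

```python
def rd_allocate(curves, budget):
    """Greedy truncation: return the chosen point index for each R-D curve.

    At every step, advance the curve whose next segment has the steepest
    distortion reduction per byte and still fits in the remaining budget.
    """
    chosen = [0] * len(curves)
    used = 0
    while True:
        best, best_slope, best_cost = None, 0.0, 0
        for k, curve in enumerate(curves):
            i = chosen[k]
            if i + 1 >= len(curve):
                continue                      # this bitstream is fully sent
            cost = curve[i + 1][0] - curve[i][0]
            gain = curve[i][1] - curve[i + 1][1]
            if used + cost > budget:
                continue                      # next segment does not fit
            slope = gain / cost
            if slope > best_slope:
                best, best_slope, best_cost = k, slope, cost
        if best is None:
            return chosen
        chosen[best] += 1
        used += best_cost
```

For two curves `[(0, 100), (10, 40), (20, 30)]` and `[(0, 50), (10, 30), (20, 25)]` with a 30-byte budget, the result is `[2, 1]`: 20 bytes go to the first bitstream and 10 to the second.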
55
54/75 Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve; x-axis: time (timeslots); y-axis: buffer occupancy (bytes); an illegal region is marked]
56
55/75 Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$
where
- $B_i$: allocated bytes for timeslot $i$
- $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
- $\mathrm{Rate}_{trans}$: coding (network) rate per second
- $\mathrm{Time}$: time duration of the timeslot
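The allocation formula translates directly into code. In this sketch the function and parameter names are hypothetical, and the rate is assumed to be expressed in bytes per second so that all terms share the same unit.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf_prev, buf_cur: buffer occupancy (bytes) at timeslots i-1 and i
    rate_trans: coding (network) rate, assumed here to be in bytes per second
    time: duration of the timeslot in seconds
    """
    return buf_prev - buf_cur + rate_trans * time
```

For example, if the buffer level is allowed to fall from 4000 to 3000 bytes over a 0.5 s timeslot while the channel delivers 2000 bytes per second, the timeslot may use 1000 + 1000 = 2000 bytes.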
57
56/75 Optimization
- Given
  - Initial buffer occupancy level
  - Final buffer occupancy level (or intermediate level with a sliding window)
  - Buffer occupancy constraint
- Search for the allocated # of bytes for the current timeslot
58
57/75 Search (R-D slope)
[Figure: buffer occupancy (bytes) against time (timeslots) with an illegal region; annotations: underflow (too many bytes), overflow (too few bytes), waste bytes]
59
58/75 Multiple Timeslots – Constant Bitrate
[Figure: buffer occupancy (bytes) against time (timeslots); an illegal region is marked]
60
59/75 Multiple Timeslots – Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve; x-axis: time (timeslots); y-axis: buffer occupancy (bytes); an illegal region is marked]
61
60/75 Multiple Timeslots – Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) against time (timeslots); an illegal region is marked]
62
61/75 Modular Software Design
[Figure: the audio is split into an L+R (or mono) channel and an L-R channel; each channel path is drawn as MLT (SW), quantizer, entropy coder, and bitstream assembly, ending in the bitstream]
63
62/75 Modular Software Design
- Highly modularized pipeline design
  - Quantizer and entropy coder can be used for image/video compression as well
  - Probe and data input can be inserted into any part of the program
- Data flow driven (with necessary memory regulator)
  - No long delay
  - No need for large memory
- Memory and computation efficient
  - Working memory preallocated
64
63/75 Experimental Results
65
64/75 EAC – Highly Efficient (NMR)
- Results based on the average of 16 MPEG4 test clips
- The smaller the NMR, the better

Codec        48kbps  32kbps  16kbps  8kbps
MP4 TwinVQ    4.48    5.71    7.00   7.48
WMA           0.40    3.25    5.56   8.47
EAC          -0.22    2.80    5.68   6.69
66
65/75 EAC – Lossless
- Results based on the average of 16 MPEG4 test clips

Codec           Compression ratio
EAC             2.72
Monkey's Audio  2.72
WinZip          1.32
67
66/75 EAC (Versatile)
- Real-time 2-way communication (low delay mode)
- Storage device (Pocket PC, Xbox)
- Internet streaming
68
67/75 EAC (Low Delay Mode)
- Reducing frame size
- Timeslot = 1 frame
- Fixed-length timeslot bitstream
- Delay = 2 frames (ignoring encoding/decoding delay)
- Network transmission time (if modem line, delay = 3 frames)
69
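68/75 EAC (Low Delay Mode)
[Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder); the bitstream crosses the network; the decoder starts decoding frame i (entropy coder, quantizer); the point where the frame becomes playable is marked]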
70
69/75 EAC – Flexible Bitstream Syntax
- Parser may reassemble the bitstream
  - 1000x real time
  - Change bit rate, # of audio channels, audio sampling rate
71
70/75 EAC – Software
- Encoder: 8x realtime (stereo, 44.1 kHz sampling)
- Decoder: 20x realtime
- Parser: 1000x realtime
72
71/75 EAC - Encoder
[Demo: the encoder turns the audio into a stereo, 128kbps bitstream plus a companion file]
73
72/75 EAC - Parser
[Demo: on the server, the parser takes the stereo, 128kbps bitstream and the companion file and produces four bitstreams: stereo, 16kbps; mono, 8kbps; stereo, 16kbps, slow start; mono, 8kbps, 11kHz sampling]
74
73/75 EAC - Decoder
[Demo: the decoder plays four bitstreams: stereo, 16kbps; mono, 8kbps; stereo, 16kbps, slow start; mono, 8kbps, 11kHz sampling]
75
74/75 Comparison
[Demo clips: original, MP4 TwinVQ, WMA, EAC, MP3]
76
75/75 Conclusions
- An embedded audio coder is developed
  - Highly efficient
  - Versatile
    - Low delay, constant bitrate, streaming
  - Flexible bitstream
    - Parsing for bitrate, # of audio channels, audio sampling rate
  - Good prototype available
    - Real-time execution, small memory footprint