
2 1/75 Embedded Audio Coder Jin Li

3 2/75 Outline Introduction Embedded audio coder - Algorithm MLT with window switching Quantizer Entropy coder Bitstream assembly Modular software design Experimental results & demos Conclusion

4 3/75 Introduction

5 4/75 Introduction – Audio Compression Audio Waveform... Bitstream

6 5/75 EAC vs. Other Compression Existing audio compression schemes MP3, AAC, MPEG4 audio, WMA, Real Audio, … Why research for a new audio codec?

7 6/75 Media vs. File Compression File compression Every bit is important, has to be compressed losslessly Media compression Exact bit/value is not important, distortion is tolerable Amount of media is huge, high compression ratio is required Media needs adaptation

8 7/75 Key Features of EAC Not only good compression performance, but also a flexible bitstream syntax. The compressed bitstream may be manipulated for: different bitrate; different # of audio channels; different audio sampling rate. Versatile: lossless, low delay, streaming/storage application

9 8/75 EAC Encoder Encoder... Master Bitstream Companion File

10 9/75 Parser Except header, application bitstream is a subset of the master bitstream (parsing is fast) May be changed according to the required bitrate, # of audio channels, and audio sampling rate Parser... Master Bitstream Companion File... Application Bitstream

11 10/75 EAC Decoder Decoder... Bitstream Speaker (DirectSound) .wav file

12 11/75 Embedded Audio Coder - Algorithm Description

13 12/75 Framework - Encoder Transform Entropy coder Bitstream Assembly... Transform Entropy coder Bitstream Assembly Audio Bitstream L+R (or mono) L-R

14 13/75 Audio Transform Input: audio sample Output: transform coefficient Goal: convert audio from the time domain to the frequency domain Compact energy Better match with psychoacoustic characteristics Enable audio sampling rate change

15 14/75 Lossy vs Lossless Mode MLT(SW) Audio Quantization Lossy mode Reversible MLT(SW) Audio Lossless mode

16 15/75 Lossy (Float) Pass

17 16/75 MLT - Modulated Lapped Transforms Spatial Response Frequency Domain

18 17/75 MLT with Window Switching Features: basic window size 2048; short window size 256. Switching criterion: a frame (2048 samples) is switched to the short window if and only if (1) its energy is bigger than a certain threshold, (2) the energy within the 8 subframes (256 samples each) differs by more than T_a, and (3) there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by T_b
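
The switching criterion above can be sketched in a few lines. This is only one possible reading of the slide: the threshold names `energy_min`, `t_a`, `t_b` are hypothetical, and treating "differs by more than" and "greater by" as energy ratios is an assumption, since the slide does not say whether differences or ratios are meant.

```python
import numpy as np

FRAME = 2048     # long-window frame size (from the slide)
SUBFRAME = 256   # short-window size (from the slide)

def use_short_window(frame, energy_min, t_a, t_b):
    """Return True if all three switching conditions hold.

    energy_min, t_a and t_b are hypothetical thresholds; energy
    comparisons are done as ratios, which is an assumption.
    """
    frame = np.asarray(frame, dtype=float)
    if frame.shape != (FRAME,):
        raise ValueError("expected one frame of 2048 samples")
    sub = frame.reshape(FRAME // SUBFRAME, SUBFRAME)
    e = np.sum(sub ** 2, axis=1) + 1e-12        # energy of each of the 8 subframes
    cond_energy = float(np.sum(e)) > energy_min  # (1) frame is not near-silent
    cond_spread = e.max() / e.min() > t_a        # (2) subframe energies differ strongly
    # (3) some subframe is much stronger than the one right after it,
    #     following the slide's wording (former greater than latter)
    cond_step = bool(np.any(e[:-1] / e[1:] > t_b))
    return bool(cond_energy and cond_spread and cond_step)
```

With thresholds such as `energy_min=1.0, t_a=10.0, t_b=5.0` (illustrative values only), a steady tone stays on the long window, while a frame whose energy drops abruptly between two subframes is switched.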

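19 18/75 Band Separation Audio (44.1kHz sampling) MLT with window switching Band separation [Figure: frequency axis starting at 0, split into bands; band-edge labels lost in extraction]

20 19/75 Synthesis (Half Sampling) Audio (22.05kHz sampling) MLT with window switching Band separation [Figure: frequency axis starting at 0, lower bands only; band-edge labels lost in extraction]

21 20/75 Synthesis (Quarter Sampling) Audio (11.025kHz sampling) MLT with window switching Band separation [Figure: frequency axis starting at 0, lowest band only; band-edge label lost in extraction]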

22 21/75 Quantizer Input: coefficient Output: quantized coefficient Goal: convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding

23 22/75 Quantizer Scalar quantizer with a deadzone [Figure: quantizer output split into quantized magnitude and sign, with the deadzone around 0]
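
A minimal sketch of a deadzone scalar quantizer that outputs a magnitude and a sign, as on the slide. The step size `step` and the midpoint reconstruction rule are assumptions; the slides do not give the actual step or the decoder's reconstruction point.

```python
import numpy as np

def quantize(coeffs, step):
    """Deadzone scalar quantizer: returns (magnitude, sign) arrays.

    Truncating |c| / step toward zero maps every coefficient in
    (-step, step) to 0, so the zero bin is twice as wide as the others.
    """
    c = np.asarray(coeffs, dtype=float)
    mag = np.floor(np.abs(c) / step).astype(int)
    sign = np.where(c < 0, -1, 1)
    return mag, sign

def dequantize(mag, sign, step):
    """Reconstruct at the middle of each nonzero bin (an assumed rule)."""
    mag = np.asarray(mag)
    return np.where(mag > 0, sign * (mag + 0.5) * step, 0.0)
```

For example, with `step = 1.0` the inputs 0.9 and -0.9 both fall in the deadzone and become 0, while 2.3 becomes magnitude 2 with a positive sign.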

24 23/75 Lossless (Integer) Pass

25 24/75 Key to Achieving Lossless Break the MLT into small steps Make every step reversible Definition of a reversible transform: integer input, integer output; the transform should have a determinant of 1 (does not expand data volume)

26 25/75 MLT Framework Forward MLT: Window (Lapped Transform), then DCT-IV (Pre-Rotation, Complex FFT, Post-Rotation). Inverse MLT: Pre-Rotation^-1, Complex FFT^-1, Post-Rotation^-1, Window^-1

27 26/75 Window Operation x(n) x(-n-1) Complex Rotate

28 27/75 Pre-Rotation Inputs x_w(0) ... x_w(7) are taken as four pairs; the pairs go through complex rotations by –π/32, –5π/32, –9π/32 and –13π/32, giving outputs x_p(0) ... x_p(7)

29 28/75 FFT (4 Point Complex) x_p(0) ... x_p(7) -> complex inputs x_c(0) x_c(1) x_c(2) x_c(3) -> butterflies (twiddle factor e^{-jπ/2}) -> complex outputs y_c(0) y_c(1) y_c(2) y_c(3) -> y_p(0) ... y_p(7)

30 29/75 Post-Rotation Inputs y_p(0) ... y_p(7) are taken as four pairs; the pairs go through conjugate rotations by –0π, –π/8, –2π/8 and –3π/8, giving outputs y(0) ... y(7)

31 30/75 Reversible MLT Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
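
The slides do not show how each rotation is made reversible. One widely used construction, given here as an assumption rather than as the method of this codec, factors a rotation into three lifting (shear) steps, each of which has determinant 1, and rounds inside every step. Because each step adds a rounded function of the *other* variable, it can be undone exactly by subtracting the same value.

```python
import math

def _rnd(v):
    """Round to nearest integer (ties upward); any fixed rule works."""
    return math.floor(v + 0.5)

def rotate_forward(x, y, theta):
    """Integer-to-integer approximation of a rotation by theta.

    Uses R(theta) = S(p) * L(s) * S(p) with p = (cos - 1) / sin, s = sin,
    where S is an upper and L a lower unit-triangular shear.
    """
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("theta must not be a multiple of pi in this sketch")
    p = (c - 1.0) / s
    x = x + _rnd(p * y)
    y = y + _rnd(s * x)
    x = x + _rnd(p * y)
    return x, y

def rotate_inverse(x, y, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("theta must not be a multiple of pi in this sketch")
    p = (c - 1.0) / s
    x = x - _rnd(p * y)
    y = y - _rnd(s * x)
    x = x - _rnd(p * y)
    return x, y
```

The forward output is only close to the true rotated values, but `rotate_inverse(*rotate_forward(x, y, theta), theta)` returns `(x, y)` exactly for integer inputs, which is the property the lossless mode needs.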

32 31/75 Reversible Unit Transform

33 32/75 Entropy Coder Input: quantized coefficients Output: embedded coded bitstream with R-D performance curve Goal: Compression Embedded bitstream for future manipulation

34 33/75 Frame Grouping Time slot 1 2 3 4 5 6 7 8 Frame

35 34/75 Entropy Coder D R Bitstream R-D curve

36 35/75 Entropy Coder Embedded coding Implicit psychoacoustic masking Context modeling Arithmetic coding Implementation concerns

37 36/75 A block of coefficients 45000 -74-1300 21040 14023 0000 3040 0350 0000 010 -4330 0010 0000 -4500 -180019 40230 000 Next View graph

38 37/75 Bits of Coefficients (coefficient: sign, then bits b1..b7, most significant first)
w0: 45 = + 0101101
w1: -74 = - 1001010
w2: 21 = + 0010101
w3: 14 = + 0001110
w4: -4 = - 0000100
w5: -18 = - 0010010
w6: 4 = + 0000100
w7: -1 = - 0000001

39 38/75 Conventional Coding First Second Third 01011010101101+ 1001010- 0010101+ 0001110+ 0000100- 0010010- 0000100+ 0000001- 01011010101101+1001010- 0010101+ Sign b 1 b 2 b 3 b 4 b 5 b 6 b 7 w0w1w2w3w4w5w6w7w0w1w2w3w4w5w6w7 46 -74 22 0 0 0 0 0

40 39/75 Embedded Coding 0 1- 0 0 0 0 0 0 1+ 0 0 0 0 0 0 0 0 0 1+ 0 0 1- 0 0 Sign b 1 b 2 b 3 b 4 b 5 b 6 b 7 01011010101101+ 1001010- 0010101+ 0001110+ 0000100- 0010010- 0000100+ 0000001- FirstSecondThird w0w1w2w3w4w5w6w7w0w1w2w3w4w5w6w7 Value 40 Range 32..47 -72 -79..-64 16..31 24 -31..31 0 0 -24 -31..31 0 0
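
The value/range pairs on this slide follow from how many bitplanes of each coefficient the decoder has received. A minimal sketch, assuming 7 magnitude bitplanes as in the example and midpoint reconstruction (the helper names are hypothetical):

```python
def to_bits(coeff, total_planes=7):
    """Sign character and magnitude bits, most significant bit first."""
    sign = '-' if coeff < 0 else '+'
    return sign, format(abs(coeff), '0{}b'.format(total_planes))

def decode_prefix(coeff, planes_known, total_planes=7):
    """Decoder's view after the top `planes_known` bitplanes.

    Returns (reconstructed value, range low, range high). A coefficient
    whose received bits are all zero is still insignificant: its sign is
    unknown, so the range is symmetric around zero.
    """
    shift = total_planes - planes_known
    prefix = abs(coeff) >> shift            # bits received so far
    if prefix == 0:
        bound = (1 << shift) - 1
        return 0, -bound, bound
    lo = prefix << shift                    # smallest magnitude with this prefix
    hi = lo + (1 << shift) - 1              # largest magnitude with this prefix
    mid = lo + ((1 << shift) >> 1)          # reconstruct in the middle
    if coeff < 0:
        return -mid, -hi, -lo
    return mid, lo, hi
```

With three bitplanes known, 45 decodes to 40 in the range 32..47 and -74 decodes to -72 in the range -79..-64, matching the slide; with two bitplanes known, 14 is still insignificant with range -31..31. Every extra bitplane halves the range, which is why the embedded bitstream can be cut at any point.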

41 40/75 Audio Masking Frequency Critical Band Neighboring Band Noise Level Signal Masking Threshold Maximum Mask Signal-to mask ratio Noise-to mask ratio

42 41/75 Psychoacoustic Masking Traditional approach (explicit masking, all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding approach) according to the masking Encode the transform coefficients Note Mask modifies the coding content

43 42/75 Implicit Psychoacoustic Masking Key: the mask modifies the coding order; the content is the same. Implicit masking: calculate the static masking (Fletcher-Munson threshold); encode the MSB of the transform coefficients; calculate the mask based on the MSB of the coefficients; modify the coding order; encode the next most important part of the coefficients; repeat the process

44 43/75 Embedded Coding with Implicit Psychoacoustic Masking 0 1- 0 0 0 0 0 0 Sign b 1 b 2 b 3 b 4 b 5 b 6 b 7 00 1- 0 0 0 0 0 0 First w0w1w2w3w4w5w6w7w0w1w2w3w4w5w6w7 Value 0 Range -63..63 -96 -127..-64 -63..63 0 0 0 0 -127..127 0 0 Coefficient: Significant Insignificant Mask

45 44/75 Embedded Coding with Implicit Psychoacoustic Masking 0 1- 0 0 0 0 0 0 1+ 0 0 0 0 0 0 0 Sign b 1 b 2 b 3 b 4 b 5 b 6 b 7 0101+ 10- 00 00 00 00 00 00 FirstSecond w0w1w2w3w4w5w6w7w0w1w2w3w4w5w6w7 Value 48 Range 32..63 -96 -127..-64 -31..31 0 0 0 0 -63..63 0 0 Coefficient: Significant Insignificant

46 45/75 Context Modeling Context. Zero coding: significance statuses of neighbor coefficients. Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients. Sign: neighbor signs

47 46/75 After Implicit Psychoacoustic Masking & Context Modeling 45000 -74-1300 21040 14023 0000 3040 0350 0000 010 -4330 0010 0000 -4500 -180019 40230 000 Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 …… Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 …… Automatically generated To be encoded

48 47/75 Arithmetic Coding – Illustration (QM Coder used) What is arithmetic coding? [Figure: the interval from 0 to 1 is subdivided in turn by P_0 / 1-P_0, P_1 / 1-P_1 and P_2 / 1-P_2 as the symbols S_0=0, S_1=1, S_2=0 are coded, leaving a final interval A] Coding result: 0.100 (the shortest binary bitstream that ensures the interval from B=0.100 0000000 to C=0.100 1111111 satisfies (B,C) ⊂ A)
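
The interval-narrowing idea can be sketched with exact fractions. This is an idealized binary arithmetic coder for illustration only: the QM coder named on the slide replaces the multiplications with table-driven approximations, and the probabilities used below are made-up values, not those of the slide.

```python
import math
from fractions import Fraction

def narrow(symbols, p_zero):
    """Shrink [0, 1) symbol by symbol; p_zero[i] is P(symbol i == 0).

    A 0 keeps the lower part of the current interval, a 1 the upper part.
    Returns the final half-open interval A = [low, high).
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        p0 = Fraction(p0)
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string b such that every continuation of 0.b
    stays inside [low, high)."""
    n = 1
    while True:
        scale = 1 << n
        k = math.ceil(low * scale)             # first n-bit grid point >= low
        if Fraction(k + 1, scale) <= high:     # whole cell [k, k+1)/2^n fits
            return format(k, '0{}b'.format(n))
        n += 1
```

With the slide's symbols 0, 1, 0 and equal probabilities, `narrow` returns [1/4, 3/8) and the code is three bits long; if zeros are very likely (say 0.9 each), three zeros in a row fit in a single bit, which is where the compression comes from.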

49 48/75 Entropy Coder (Summary) D R Bitstream R-D curve

50 49/75 Speed Up Issues Context Modeling Use stored context Update context when a coefficient becomes significant Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask R-D curve calculation Lookup table calculation of distortion Context entropy coder QM coder Run-length Rice coder

51 50/75 Bitstream Assembly Input : Bitstream R-D curve Output : Assembled bitstream Companion file... Bitstream assembling

52 51/75 EAC Bitstream Syntax Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes) EAC marker Global Header Timeslot Head Body Timeslot Head Body Timeslot Head Body

53 52/75 Companion File Global Header Timeslot Head R-D curve Timeslot Head R-D curve Timeslot Head R-D curve

54 53/75 Rate-Distortion Optimized Assembling (Single Timeslot) [Figure: four embedded bitstreams with R-D curves (D_1, R_1), (D_2, R_2), (D_3, R_3), (D_4, R_4), truncated at rates r_1, r_2, r_3, r_4 ...]
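
The slide does not spell out how the truncation rates r_k are chosen. A common approach for embedded bitstreams, shown here as an assumption, is a greedy equal-slope allocation: keep extending whichever bitstream currently offers the largest distortion drop per byte until the byte budget is used up. The function name and the curve format are hypothetical, and the curves are assumed convex (slopes decreasing along each curve).

```python
import heapq

def allocate(curves, budget):
    """Pick one truncation rate per embedded bitstream.

    curves[k] is a list of (rate_bytes, distortion) points with
    increasing rate and decreasing distortion, starting at the
    smallest allowed rate. Returns the chosen rates r_k.
    """
    idx = [0] * len(curves)
    heap = []

    def push_next_segment(k):
        i = idx[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            slope = (d0 - d1) / (r1 - r0)     # distortion saved per byte
            heapq.heappush(heap, (-slope, k))  # max-heap via negation

    for k in range(len(curves)):
        push_next_segment(k)
    used = sum(c[0][0] for c in curves)
    while heap:
        _, k = heapq.heappop(heap)
        r0 = curves[k][idx[k]][0]
        r1 = curves[k][idx[k] + 1][0]
        if used + (r1 - r0) > budget:
            continue                           # does not fit: stream k stops here
        used += r1 - r0
        idx[k] += 1
        push_next_segment(k)
    return [curves[k][idx[k]][0] for k in range(len(curves))]
```

Because every stream is embedded, truncating it at the chosen rate is just a byte copy, which is consistent with the parser being much faster than the encoder.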

55 54/75 Rate-Distortion Optimized Assembling (Multiple Timeslots) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve

56 55/75 Allocated Bytes Per Timeslot Allocated bytes for a certain timeslot: B_i = Buf_{i-1} – Buf_i + Rate_trans × Time, where B_i: allocated bytes for timeslot i; Buf_i: buffer occupancy level at timeslot i; Rate_trans: coding (network) rate per second; Time: time duration of the timeslot
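
A small worked example of the allocation formula. The numbers are made up for illustration, and expressing Rate_trans in bytes per second is an assumption (the slide only says "rate per second").

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf_prev, buf_cur: buffer occupancy (bytes) before and after timeslot i.
    rate_trans: transmission rate, assumed here to be in bytes per second.
    time: duration of the timeslot in seconds.
    """
    return buf_prev - buf_cur + rate_trans * time

def allocation_schedule(buf_levels, rate_trans, time):
    """Apply the formula along a planned buffer-occupancy curve."""
    return [allocated_bytes(buf_levels[i - 1], buf_levels[i], rate_trans, time)
            for i in range(1, len(buf_levels))]
```

For a 16 kbps channel (2000 bytes per second) and 0.5 s timeslots, a planned occupancy curve of 4000, 3000, 3500 bytes gives 2000 bytes for the first timeslot (the buffer drains by 1000 on top of the 1000 that arrive) and 500 bytes for the second (half of the arriving bytes go to refilling the buffer).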

57 56/75 Optimization Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate level with a sliding window ) Buffer occupancy constraint Search for the allocated # of bytes for the current timeslot

58 57/75 Search (R-D slope) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Underflow (too many bytes) Overflow (too few bytes) Waste bytes

59 58/75 Multiple Timeslots – Constant Bitrate Buffer Occupancy (Bytes) Time (timeslots) Illegal Region

60 59/75 Multiple Timeslots – Internet Streaming (Slow Start) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region Buffer-Occupancy Curve

61 60/75 Multiple Timeslots – Internet Streaming (Normal) Buffer Occupancy (Bytes) Time (timeslots) Illegal Region

62 61/75 Modular Software Design MLT(SW) Quantizer Entropy coder Bitstream Assembly... MLT(SW) Quantizer Entropy coder Bitstream Assembly Audio Bitstream L+R(or mono) L-R

63 62/75 Modular Software Design Highly modularized pipeline design Quantizer, entropy coder can be used for image/video compression as well Probe and data input can be inserted into any part of the program Data flow driven (with necessary memory regulator ) No long delay No need for large memory Memory and computation efficient Working memory preallocated

64 63/75 Experimental Results

65 64/75 EAC – Highly Efficient (NMR) Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
Codec        48kbps  32kbps  16kbps  8kbps
EAC          -0.22   2.80    5.68    6.69
WMA           0.40   3.25    5.56    8.47
MP4 TwinVQ    4.48   5.71    7.00    7.48

66 65/75 EAC – Lossless Results based on the average of 16 MPEG4 test clips.
Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32

67 66/75 EAC (Versatile) Versatile Real time 2-way communication (Low delay mode) Storage device (Pocket PC, Xbox) Internet streaming

68 67/75 EAC (Low Delay Mode) Reducing frame size Timeslot = 1 frame Fixed length timeslot bitstream Delay = 2 frames (ignoring encoding/decoding delay) Network transmission time (if modem line, delay = 3 frames)

69 68/75 EAC (Low Delay Mode) Encoder Frame = i-1 i i+1 Start Encoding Frame i MLT, Quantizer, Entropy Bitstream Start Decoding Frame i Entropy, Quantizer network Playable here

70 69/75 EAC – Flexible Bitstream Syntax Flexible bitstream syntax Parser may reassemble the bitstream at 1000x real time Change: bit rate; # of audio channels; audio sampling rate

71 70/75 EAC – Software Software Encoder: 8x realtime (Stereo, 44.1kHz sampling) Decoder: 20x realtime Parser: 1000x realtime

72 71/75 EAC - Encoder Audio Encoder Stereo,128kbps Companion file

73 72/75 EAC - Parser Parser Companion file Stereo,128kbps Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling Server

74 73/75 EAC - Decoder Decoder Stereo, 16kbps Mono, 8kbps Stereo, 16kbps, Slow start Mono, 8kbps, 11kHz sampling

75 74/75 Comparison Original MP4 TwinVQ WMA EAC MP3

76 75/75 Conclusions An embedded audio coder is developed. Highly efficient. Versatile: low delay, constant bitrate, streaming. Flexible bitstream: parsing for bitrate, # of audio channels, audio sampling rate. Good prototype available: realtime execution, small memory footprint

