Multiplexing/De-multiplexing Dirac Video with AAC Audio bit-stream Presented By, Ashwini Urs Under guidance of Dr.K.R.Rao
Motivation Exponential rise in demand for video streaming over internet and mobile devices. To counter the bandwidth limitations, an efficient video and audio compression scheme is essential. Exploiting the open source benefits of Dirac video codec.
Outline Applications Overview of Dirac video codec Overview of AAC audio codec Multiplexing process De-multiplexing process Results Conclusions Future work References
Areas of Application Digital video broadcasting : DVB –T (Terrestrial), DVB – S (Satellite), DVB-C (cable). [10] [21] IPTV.[24] ATSC- M/H (Mobile/Handheld). [16][17] [18]
Block diagram of Multiplexing /De-multiplexing
Overview of Dirac video codec Dirac [31] is a hybrid video codec developed by British Broadcasting Corporation (BBC). [1] Dirac is an open technology, meaning that it can be used without payment of licensing fees. Dirac uses modern techniques like, wavelets for transform and arithmetic coding for entropy coding. [2] Dirac video codec has two software implementations, namely, Dirac research and Schrodinger. Dirac research is used in this thesis. [28]
Features of Dirac Dirac video codec achieves state of the art performance - good quality at low bitrates, leading to lower costs. [26] The applications of Dirac range from high definition television (HDTV) to web streaming due to its flexibility. [1] Performance of Dirac for HD media seems as apt as H.264, due to absence of large and intolerable variations between the two codecs [2]. With industry plans for “On demand” TV and enormous demand for streaming over internet, open platform technologies like Dirac have become highly significant.[26]
Figure.1 Block diagram of Dirac encoder [1]
Figure.2 Block diagram of Dirac decoder
Dirac Bit- stream syntax The Dirac bit stream begins with a sequence header followed by the byte stream data in the form of frames and ends with end of a sequence. The start of the frame data is signaled by the parse info header. Parse info header shall contain 4 byte code i.e. 0x42, 0x42,0x43 and 0x44. [23] This is followed by parse code which indicates the type of picture : intra (I), predictive (P) or bi-directional (B) frame. Next parse offset consists of 4 bytes signaling the data from first byte of current parse info header to the first byte of the next parse info header, if present. It shall be zero, if there is no further parse info header. [23] Previous parse offset consists of 4 bytes signaling the data from first byte of current parse info header to the first byte of the previous parse info header, if present. It shall be zero, if there is no further parse info header. [23]
Table.1 Parse codes. [23]
Overview of Advanced audio coder (AAC) AAC [4][37] consists of three profiles, namely: main profile, low-complexity profile and scalable sampling rate (SSR) profile. The key feature of low-complexity profile is, it eliminates the use of prediction tool and reduces the complexity for the temporal noise shaping tool Supports Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz). [4][37] Superior performance at bit rates > 64 kbps and at bit rates reaching as low as 16 kbps.
Features of AAC AAC provides improved compression over earlier audio codecs and provides higher-quality audio at low bit rates. [4] AAC can accommodate two basic bit stream formats: Audio data interchange format (ADIF) and Audio data transport stream (ADTS) . [4] The feature of ADTS is that it has one header for each frame followed by raw block of data.
Figure.3 Block Diagram AAC encoder [4]
Figure.4 Block Diagram AAC decoder [4]
Table.2 ADTS bit stream header format [36] [37] Field Number of bits Description syncword 12 Always ‘111111111111’ ADTS fixed header ID 1 0:MPEG-4 1: MPEG-2 layer 2 Always”00” protection_absent profile Explained below sampling_frequency_index 4 private_bit channel_configuration 3 original/copy home copyright_identification_bit ADTS variable header copyright_identification_start aac_frame_length 13 Length of frame including header (in bytes) adts_buffer_fullness 11 0x7FF indicates VBR no_raw_data_blocks_in_frame crc_check 16 Only if protection_absent ==0 raw_data_blocks Variable size Table.2 ADTS bit stream header format [36] [37]
Table.3 AAC profile bits expansion. [36] [37] Bits ID == 1 (MPEG -2 profile) ID ==0 (MPEG-4 object type) 00(0) Main profile 01(1) Low complexity profile 10(2) Scalable sampling rate 11(3) reserved Table.3 AAC profile bits expansion. [36] [37]
Figure.5 Block diagram of ATSC transreception showing MPEG-2 Transport stream(TS) multiplexer [20]
Factors to be considered during multiplexing data to a single stream Each one of video/audio elementary stream data needs to be prioritized equally. This is essential to maintain buffer fullness at the receiver’s end. Information in the form of time stamps is embedded to the PES header which will assist in maintaining lip sync. Finally, since the multiplexed data will be transmitted in an error prone network, a provision for error detection and correction methods is to be made.
Figure.6 Encapsulation of PES from elementary streams [9]
Table.6 PES header description [19] Header fields Size (Bytes) Description PES packet start prefix 3 0x000001 Video/audio stream ID 1 Unique for Video/audio PES packet length 2 Unbounded when exceeds 65536 bytes and value of 0 is indicated under such circumstances. Time stamps Frame numbers used a s time stamps Table.6 PES header description [19]
Figure.7 Splitting of PES packet to several TS packets [9]
Time stamps Frame numbers are used as time stamps. Time stamp information is embedded in the PES header of each frame. Time stamp information helps to achieve synchronization during playback.
Advantages of using frame numbers over program clock reference (PCR) Less complex. Can be easily incorporated in software and is well suited. Savings in PES header bytes. (In case of PCR, time information needs to be sent periodically) Synchronization can be effectively achieved due to absence of clock jitters and inaccurate clock samples.
Figure.8 MPEG transport stream and standard structure of MPEG-TS packet [12] [24].
Figure.9 Structure of TS packet header used in this thesis. AFC : Adaptation field control
Table.7 TS packet header Header fields Size Sync byte 8 bits PUSI 1bit AFC CC 4 bits PID (Unique for video/audio) 10 bits Byte offset (optional) (If AFC ==1 && LPES < 185 bytes) Stuffing bytes Added in case of byte offset Table.7 TS packet header
Example of multiplexing process Consider, A video PES packet of 500 bytes @ frame rate = 25 frames per second (fps). Number of TS packets for PES = 500/185 = 2.7 ~= 3 TS packets. Video PES duration = 1/25 = 0.04 = 40ms. Duration of each TS packet = 40/3 = 13.33ms. Consider , An audio PES packet of 600 bytes @ sampling frequency = 44100 Hz. Number of TS packets for PES = 600/185 = 3.2 ~ =4 TS packets. Audio PES duration = 1024/44100 = 0.04 = 23.2ms. Duration of each TS packet = 23.2/4 = 5.8ms.
De-multiplexing De-multiplexing is the process of retrieving video and audio elementary streams from the transmitted elementary stream. Two buffers maintained separately to extract video and audio data respectively. While extracting video/audio data, the video and audio buffers need to be constantly monitored to ensure no buffer overflow or underflow of data takes place. In this thesis, the data can be de-multiplexed from start to end as well as from any TS packet.
Figure.10 Flowchart of the de-multiplexer
Buffer fullness check When the buffer fullness of the data is monitored at the de-multiplexer, the video and audio content playback time is calculated as follows : Video content playback time = Number of video frames × (1/fps) Audio content playback time = Number of audio frames × (1024/sampling frequency)
Calculation of presentation time The data is de-multiplexed from any TS packet, video and audio presentation time for the chosen frame number is calculated as follows:
Lip synchronization Calculating the delay between audio and video presentation time at the de-multiplexer, helps to determine audio/video synchronization during playback. The maximum delay that is allowed by MPEG-2 standard is ± 40ms while synchronization between audio and video. [25]
Results
Table .8 Inter coding: test clips characteristics Details of test clips Clip 1 (Shrek) Clip 2 (Despicable me) Clip Duration (seconds) 50 36 Resolution 480 × 204 480 × 270 Chroma sub-sampling format YUV 420P No: of video frames 1206 885 No: of audio frames 2166 1589 Video frame rate (fps) 24 24000/1001 Audio sampling frequency(kHz) 44.1 YUV file size (kB) 172985.63 168011.72 WAV file size (kB) 8658.04 6349.54 .DRC file size(kB) 4041.28 2629.67 .AAC file size (kB) 676.32 563.95 Video compression ratio 43:1 64:1 Audio compression ratio 13:1 12:1 DRC video bitrate (kbps) 80.83 73.05 AAC audio bitrate (kbps) 13.53 15.67 Video TS packets 23042 15039 Audio TS packets 4772 3927 TS packets 27814 18966 TS file size(kB) 5106.48 3482.039 TS bitrate(kbps) 102.13 96.72 Original AVI file size (kB) 4227 3199 Table .8 Inter coding: test clips characteristics
Table .9 Intra coding: test clips characteristics Details of test clips Clip 1 (Despicable me) Clip 2 Clip Duration (seconds) 36 10 Resolution 480 × 270 480 × 360 Chroma sub-sampling format YUV 420P No: of video frames 885 224 No: of audio frames 1589 385 Video frame rate (fps) 24000/1001 25 Audio sampling frequency(kHz) 44.1 YUV file size (kB) 168011.72 56700 WAV file size (kB) 6349.54 1534.54 .DRC file size(kB) 16290.97 5496.87 .AAC file size (kB) 563.95 126.15 Video compression ratio 10:1 Audio compression ratio 12:1 DRC video bitrate (kbps) 452.53 549.69 AAC audio bitrate (kbps) 15.67 12.62 Video TS packets 90714 30563 Audio TS packets 3927 842 TS packets 94641 31405 TS file size (kB) 17375.5 5765.76 TS bitrate (kbps) 482.65 576.57 Original AVI file size (kB) 3199 1083 Table .9 Intra coding: test clips characteristics
Figure.11 Frame 140 : luma component of Clip2 : 36 sec
Figure. 12 Frame 80 : luma component of Clip2 :10 sec
Check for buffer fullness # of Video Frames in buffer # of Audio frames in buffer Size of Video buffer (kB) Size of Audio buffer Video content playback time* (in sec) Audio content playback time (in sec) 100 185 277.9268 64.7295 4.2960 4.2957 200 367 631.0459 130.1924 8.5085 8.5217 400 732 1055.0078 257.5605 17.0170 16.9970 500 916 1425.7266 328.4414 21.2713 21.2695 600 1098 1670.6875 392.0518 25.4838 25.4955 700 1281 2068.3242 456.8135 29.7381 29.7448 800 1464 2382.5303 524.0430 33.9923 33.9940 Table. 10 Clip2: Check for buffer fullness and video/audio content playback time (Inter coding) *Video content playback time (sec) = (# of video frames + # of SPS/PPS) × (1/fps)
No: of Video Frames in buffer No : of Audio frames in buffer Size of Video buffer (kB) Size of Audio buffer Video content playback time* (in sec) Audio content playback time (in sec) 30 56 711.6436 16.6016 1.2800 1.3003 60 112 1419.9668 35.0361 2.6000 2.6006 100 188 2369.0645 60.0449 4.3600 4.3654 150 283 3545.3672 92.5586 6.5600 6.5712 200 378 4873.7568 123.8281 8.7600 8.7771 Table .11 Clip2: Check for buffer fullness and video/audio content playback time (Intra coding)
De- multiplexer output TS packet number Video IDR frame number chosen Synchronized audio frame number Chosen video frame presentation time (sec) Chosen audio frame presentation time (sec) Delay (ms) Visual delay Perceptible? 1 2 0.0417 0.0464 4.7 No 100 65 117 2.7110 2.7167 5.7 2000 146 262 6.0894 6.0836 5.8 10000 491 882 20.4788 20.4800 1.2 12000 611 1097 25.4838 25.4723 11.5 Table.12 Clip 2: Output of de-multiplexer (Inter coding) IDR : Instantaneous decoder refresh
Table.13 Clip 2: Output of de-multiplexer (Intra coding) TS packet number Video IDR frame number chosen Synchronized audio frame number Chosen video frame presentation time (sec) Chosen audio frame presentation time (sec) Delay (ms) Visual delay Perceptible? 1 2 0.0400 0.0464 6.4 No 5000 38 65 1.5200 1.5093 10.7 10000 75 129 3.0000 2.9954 4.6 20000 149 257 5.9600 5.9675 7.5 25000 182 314 7.2800 7.2911 11.1 Table.13 Clip 2: Output of de-multiplexer (Intra coding)
Conclusions Non-NAL (network abstraction layer) syntax format of Dirac was used for multiplexing process and there is no need for conversion to NAL syntax, if not available. Visually no delay is perceptible between video and audio. Lip synchronization was effectively achieved using time stamps as frame numbers and from any TS packet.
Future Work Multiplexing algorithm can be modified to support for multiple program elementary streams. Mkvmerge [32] supports Intra (I) and Predictive (P) pictures and is ignoring the fact that Dirac video has bi-directional predicted (B) frames. Hence, mkvmerge can be modified to support for B pictures of Dirac. In practice, the data is broadcasted in an error prone network. Rugged error correction codes can be incorporated to this implementation.
References [1] T. Borer and T. Davies, “Dirac video compression using open technology”, BBC EBU Technical Review, July 2005. [2] K.Oriantar, K.K. Loo and Z.Xue, “Performance comparison of emerging Dirac video codec with H.264/AVC”, School of engineering and design, Brunel university, UK. [3] A. Ravi, "Performance analysis and comparison of Dirac video codec with H.264/ MPEG-4 Part 10 AVC", M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, Aug. 2009. [4] MPEG. Information technology - Generic coding of moving pictures and associated audio information, part 4: Conformance testing. International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998. [5] M. Bosi and M. Goldberg, “Introduction to digital audio coding and standards”, Boston: Kluwer, 2003. [6] H. Eeckhaut et al, “Speeding up Dirac’s entropy coder”, proceedings of the 5th WSEAS conference on multimedia , internet and video technologies, Corfu, Greece, pp: 120 -125, August 2005. [7] “Encapsulation of Dirac in ISO/IEC 13818-1”, British Broadcasting Corporation. [8] “Dirac ISO/IEC 13818-1 Transport stream mapping”, encapsulation of Dirac video content and time code markers in ISO/IEC 13818-1 transport streams, draft, Feb 2007.
[9] P.A. Sarginson, “MPEG-2: Overview of systems layer”, BBC RD 1996/2. [10] J.M. Boyce, “The United States television broadcasting transition”, IEEE signal processing magazine, vol.26, pp. 110-112, May 2009. [11] K. Brandenburg, “MP3 and AAC Explained”, AES 17th International conference, Florence, Italy, Sept. 1999. [12] “Introduction to MPEG”, excerpted from IPTV basics, Lawrence Harte. [13] Website: http://www.atsc.org/cms/ ATSC : Advanced television systems committee. [14] Dirac information : http://en.wikipedia.org/wiki/Dirac_%28video_compression_format%29 [15] Audio sampling frequency table link : http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Sampling_Frequencies [16] DVB-H link : http://dvb-h.org/ [17] ATSC-M/H link : http://atsc.org/cms/ [18] Mobile video DTV : http://www.openmobilevideo.com/about-mobile-dtv/standards/
[19] Information Technology – Generic coding of moving pictures and associated audio: Systems, International Standard 13818-1, ISO/IEC JTC1/SC29/WG11 N0801, 1994. [20] B. J. Lechner, et al, “The ATSC transport layer, including program and system information (PSIP),” Proceedings of the IEEE , vol. 94, no. 1, pp.77–101, Jan. 2006. [21] “Special issue on global digital television: technology and emerging services”, Proceedings of the IEEE, vol. 94, pp. 5-332, Jan. 2006. [22] H. Murugan, “Multiplexing H264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback”, M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, May 2007. [23] Dirac specification: Version 2.3, Issued: Sept.2008. [24] MPEG transport stream: http://www.iptvdictionary.com/iptv_dictionary_MPEG_Transport_Stream_TS_definition.html [25] MPEG. Information technology — generic coding of moving pictures and associated audio information, part 4: Conformance testing .International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998. [26] Dirac document: “Dirac compression family using open technology”. http://www.bbc.co.uk/rd/projects/dirac/index.shtml
[27] Digital Video Systems Characteristics Standard for Cable Television, ANSI/SCTE 432004. [28] Dirac website: www.diracvideo.org [29] Digital audio compression standard (AC-3, E-AC-3), revision B, ATSC Document A/52B, Advanced Television Systems Committee, Washington, D.C., Jun. 14, 2005 : www.atsc.org [30] /CMPT 365 Course Slides/, School of Computing Science, Simon Fraser ... fig3: http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.3/chap 4.3.html [31] Dirac software: http://sourceforge.net/projects/dirac/ [32] Mkvmerge software: http://www.bunkus.org/videotools/mkvtoolnix/downloads.html [33] I.H.Witten, R.M. Neal and J.G.Cleary: “Arithmetic coding for data compression”, Communications of the ACM, Vol.30, No.6, pp.520-540, June 1987. [34] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal processing: image communication, vol. 19, issue 9, pp. 793-849, Oct. 2004. [35] G. A. Davidson, “Digital audio coding: Dolby AC-3,” in the Digital Signal Processing Handbook, V. K. Madisetti and D. B. Williams, Eds. Boca Raton, FL: CRC, pp. 41-1–41-21, 1998.
[36] MPEG-4:ISO/IEC JTC1/SC29 14496-3: Information technology – coding of audio-visual objects-part 3 : Audio, amendment 4:audio lossless coding(ALS), new audio profiles and ASAC extensions. [37] MPEG-2: ISO/IEC JTC1/SC29 13818-7: Advanced audio coding, AAC. IS WG11, 1997. [38] ffmpeg software: http://www.ffmpeg.org/ [39] FAAC software : www.audiocoding.com [40] VLC mediaplayer : www.videolan.org [41] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with H.264/ MPEG-4, part 10”, for the book “Advances in reasoning-based image processing, analysis and intelligent systems: conventional and intelligent paradigms”, 2011. [42] H.Murugan and K.R.Rao, “ Multiplexing H.264 video with AAC audio bit streams, de-multiplexing and achieving lip sync”, ICEAST 2007, Bangkok, Thailand, 21 – 23 Nov.2007. [43] H. Kalva, et al, “Implementing multiplexing, streaming and server interaction for MPEG-4”, IEEE Trans. on circuits and systems for video technology, vol. 9, No. 8, pp. 1299-1311, Dec. 1999. [44] C. C. Todd, et al, “AC-3: perceptual coding for audio transmission and storage,” presented at the 96th Conv. Audio Engineering Society., 1994, preprint 3796.
[45] M. Uehara, “Application of MPEG-2 systems to terrestrial ISDB (ISDB-T)”, Proceedings of the IEEE, vol.94, pp. 261-268, Jan. 2006. [46] G.A. Davidson, et al, “ATSC video and audio coding”, Proceedings of the IEEE, vol. 94, pp. 60 - 76, Jan. 2006. [47] J.Jain and P. Desale, “Low power transport de-multiplexer for ATSC and DVB broadcast format”, LSI research and development Pune pvt. Ltd. [48] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with H.264/ MPEG-4, part 10”, IJWMIP accepted : July 2011. [49] AVI sequences: www.youtube.com [50] AVAREX YUV PLAYER: http://www.avarexgroup.com/en/solutions.php?sub=2&product=rawplayer [51] Information technology- advanced coding of audio and video part -7: mobile video [52] Geraold Blakowski et.al “A media synchronization survey: Reference model, specification and case studies”, IEEE journal on selected areas in communications, vol.14, no.1, January 1996.
Questions ?
Thank You!