Multiplexing H.264 and HEAACv2 elementary streams, de-multiplexing and achieving lip synchronization during playback
Naveen Siddaraju
Contents:
- Introduction: need for multiplexing
- Overview of codecs used
- Transport protocols
- Multiplexing
- De-multiplexing and synchronization
- Results
- Conclusions
- Future work
- References
Introduction: need for multiplexing
- Digital television broadcasting: ATSC-M/H [17], DVB-H, DVB-T
- Internet streaming: IPTV, YouTube etc.
MPEG transport system [17]
Choice of CODECs
- Depends on the application.
- Transport bandwidth
  - ATSC-M/H channel bandwidth: 19.6 Mbps
  - DVB-H channel bandwidth: 14 Mbps
- Processing power of the target device
H.264/AVC
- Defined in MPEG-4 Part 10.
- Jointly developed by ITU-T VCEG and the MPEG group of ISO/IEC.
- Provides better compression than its predecessors such as MPEG-2 video and MPEG-4 Part 2.
- Suitable for a wide variety of applications.
- Adopted standard in ATSC-M/H, DVB etc.
- Used in Blu-ray discs, HD DVDs, iTunes, Flash Player, video conferencing applications etc.
Different profiles of H.264[5]
Frame types
- Three basic types:
  - Intra predictive (I) frame
  - Predictive (P) frame
  - Bi-predictive (B) frame
- An IDR frame is a special type of I frame.
  - indicates the start of a video sequence.
Bitstream syntax of H.264
- Data is organized into two layers:
  - VCL (video coding layer)
  - NAL (network abstraction layer)
NAL formatting of VCL and non-VCL data [6]
NAL unit format [6]
- Forbidden bit - 1 bit
- NRI - 2 bits
- Type - 5 bits
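The three header fields above fit in the first byte of every NAL unit. A minimal sketch of splitting that byte into its fields follows; the function name `parse_nal_header` and the small name table are illustrative, not taken from the slides.

```python
# Illustrative helper: split the one-byte NAL unit header into its fields.
# Bit layout (most significant bit first): forbidden (1), NRI (2), type (5).

# A few NAL unit type codes, for readability of the result.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def parse_nal_header(first_byte):
    """Return the forbidden bit, NRI and type of a NAL unit header byte."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("NAL header must be a single byte value")
    nal_type = first_byte & 0x1F
    return {
        "forbidden_bit": (first_byte >> 7) & 0x01,
        "nri": (first_byte >> 5) & 0x03,
        "type": nal_type,
        "name": NAL_TYPE_NAMES.get(nal_type, "other"),
    }
```

For example, a header byte of 0x65 decodes to NRI 3 and type 5, which marks an IDR slice.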
NAL unit types [1]
Important NAL unit types
- IDR frames
  - indicate the start of a new video sequence
- Sequence parameter sets (SPS)
  - contain parameters common to the entire sequence
  - profile, level, size of the video, number of reference frames
- Picture parameter sets (PPS)
  - contain parameters that apply to one frame or to several frames in a sequence
  - entropy coding, quantization parameters etc.
H.264 stream [37]
HEAACv2
- Also called Enhanced aacPlus.
- Developed by Coding Technologies for very low bitrate applications.
- Defined in MPEG-4 Part 3, Amendment 2.
- Enables coding in mono, stereo and multichannel (up to 48 channels).
- Is a combination of AAC, SBR and PS.
- Targets high perceived audio quality at very low bitrates.
- Adopted as the audio standard in ATSC-M/H, DVB and XM satellite radio.
- Can be stored in a variety of file formats such as mp4 and m4a.
- Controlled testing conducted by 3GPP [27] indicates that HEAACv2 provides good quality audio at 24 kbps.
HEAACv2 family of codecs [7]
AAC (Advanced Audio Coding)
- Successor of the MP3 format.
- Defined both in MPEG-2 [3] and MPEG-4 [2].
- Achieves better sound quality than MP3 at the same bitrate.
- AAC is also the standard audio format for Apple iPhone, iPod, iPad, Sony PlayStation etc.
- Up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode).
- More sampling frequencies (from 8 to 96 kHz) than MP3 (16 to 48 kHz).
- Achieves good quality audio at 128 kbps for stereo.
SBR (spectral band replication) [2]
- SBR is a bandwidth expansion technique.
- Exploits the correlation between the high and low frequencies.
- Using SBR along with AAC, high quality stereo sound can be achieved at 48 kbps.
High band reconstruction through SBR [28]
Figure panels: original audio signal [28]; high band reconstruction through SBR [28].
PS (parametric stereo) [2]
- Only used for low bitrate applications (< 32 kbps).
- Parameterizes the stereo image using cues such as inter-channel time/phase differences and inter-channel intensity differences.
- Only a monaural version of the stereo signal is encoded by the AAC encoder.
- At the decoder side the monaural signal is decoded first, and then the stereo signal is reconstructed using the PS parameters.
- Using PS along with AAC and SBR, reasonable quality stereo sound can be achieved at 24 kbps.
HEAACv2 bitstream formats
- ADIF (audio data interchange format)
  - has just one header for the whole stream
  - used in storage media
- ADTS (audio data transport stream)
  - used in transport streams
  - has a header in every access unit
ADTS header format[2][3]
Profile bits expansion [2] [3]
ADTS bit stream [3]
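Because every ADTS access unit carries its own header, a receiver can find frame boundaries from the header alone. The sketch below reads the main fields of the 7-byte ADTS header (the layout without the optional CRC); the function name and the returned dictionary keys are illustrative choices, not names from the slides.

```python
# Illustrative ADTS header reader (7-byte header, CRC not handled).

# Sampling frequencies selected by the 4-bit sampling_frequency_index.
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(h):
    """Decode the main fields of an ADTS header given at least 7 bytes."""
    if len(h) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    # The 12-bit syncword is all ones.
    if h[0] != 0xFF or (h[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword not found")
    sf_index = (h[2] >> 2) & 0x0F
    return {
        "protection_absent": h[1] & 0x01,
        "profile": (h[2] >> 6) & 0x03,
        "sampling_frequency_index": sf_index,
        "sampling_frequency": (ADTS_SAMPLE_RATES[sf_index]
                               if sf_index < len(ADTS_SAMPLE_RATES) else None),
        "channel_configuration": ((h[2] & 0x01) << 2) | (h[3] >> 6),
        # frame_length counts the header plus the payload, in bytes.
        "frame_length": ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5),
        "buffer_fullness": ((h[5] & 0x1F) << 6) | (h[6] >> 2),
    }
```

Stepping through a stream by `frame_length` bytes at a time yields one access unit per step, which is what the packetizer needs as input.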
Transport protocols
- Most multimedia applications involve communication channels or storage.
- RTP (Real-time Transport Protocol)
  - transport over IP networks
- MPEG-2 systems
  - digital television broadcast
  - storage (asset management)
MPEG-2 systems
- Defines two types of streams:
  - Program stream (PS) - used for storage, e.g. DVD
  - Transport stream (TS) - used for digital broadcast
- Two layers of packetization:
  - PES (packetized elementary stream)
  - TS (transport stream)
MPEG2 transport stream [22]
PES (packetized elementary stream)
- First layer of packetization.
- Separates the audio and video elementary streams into access units.
- Variable length.
- Each packet contains a header and payload (frame) data.
- The header adds fields such as time stamp, stream ID and packet length.
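As a rough illustration of this first packetization layer, the sketch below wraps one access unit in a simplified PES-like packet. The field order and widths are assumptions made for the example (a 3-byte start code prefix, 1-byte stream ID, 2-byte length and a hypothetical 4-byte frame-number time stamp); the header actually used in this work is the one shown in the PES header figure, which is not reproduced here.

```python
import struct

# Simplified PES-like packet, for illustration only.
# Assumed layout: start code prefix (3 bytes) | stream ID (1) |
# packet length (2) | frame number used as time stamp (4) | frame data.
PES_START_CODE = b"\x00\x00\x01"


def build_pes(stream_id, frame_number, frame):
    """Wrap one access unit (frame) in the simplified PES-like packet."""
    body = struct.pack(">I", frame_number) + frame
    if len(body) > 0xFFFF:
        # The 16-bit length field cannot describe larger packets.
        raise ValueError("frame too large for a 16-bit length field")
    return PES_START_CODE + struct.pack(">BH", stream_id, len(body)) + body


def parse_pes(packet):
    """Return (stream_id, frame_number, frame) from a packet built above."""
    if packet[:3] != PES_START_CODE:
        raise ValueError("missing PES start code prefix")
    stream_id, length = struct.unpack(">BH", packet[3:6])
    body = packet[6:6 + length]
    (frame_number,) = struct.unpack(">I", body[:4])
    return stream_id, frame_number, body[4:]
```

The packet length is variable because it follows the size of the access unit, which matches the "variable length" property listed above.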
Conversion of an elementary stream into PES packets [29]
PES packet header format used [4]
Frame number as time stamp
- For video, the frame rate (fps) is constant throughout the sequence.
- For audio, the sampling frequency is constant throughout the sequence.
- The frame number is therefore enough to determine the playback time of a frame.
TS packets
- Second layer of packetization.
- Fixed length (188 bytes).
- Each PES packet is logically broken down into 188-byte packets.
- The three-byte header contains the packet ID, payload unit start flag, continuity counter etc.
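With a constant frame rate and a constant sampling frequency, the playback time of a frame follows directly from its frame number. A minimal sketch, assuming frame numbers start at 0 and that the number of audio samples per frame is passed in by the caller (1024 for plain AAC frames; the value for the HEAACv2 streams used here is not stated on the slide and is left as a parameter):

```python
# Illustrative helpers: playback time derived from the frame number alone.
# Assumes frame numbering starts at 0.

def video_time(frame_number, fps):
    """Playback time in seconds of a video frame at a constant frame rate."""
    return frame_number / fps


def audio_time(frame_number, sampling_frequency, samples_per_frame):
    """Playback time in seconds of an audio frame.

    samples_per_frame is an assumption supplied by the caller,
    for example 1024 for a plain AAC frame.
    """
    return frame_number * samples_per_frame / sampling_frequency
```

For instance, video frame 24 at 24 fps starts at 1 second, and audio frame 375 at 24,000 Hz with 1024 samples per frame starts at 16 seconds.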
Transport stream (TS) packet format
TS header description:
- Payload unit start indicator (PUSI) flag
  - indicates that the payload contains a PES header.
- Adaptation field control (AFC) flag
  - indicates that the payload is less than 185 bytes.
- Continuity counter (CC) (4 bits)
  - 4-bit counter, used to check for packet losses, out-of-sequence packets etc.
- Packet ID (PID) (10 bits)
  - uniquely identifies the particular ES that the packet belongs to.
- Optional offset byte
  - contains the offset value if AFC is set.
Multiplexing
- What is multiplexing?
  - Multiplexing is the process of transmitting TS packets belonging to different elementary streams in a single stream.
  - The quality of a multiplexer depends on how effectively the TS packets are interleaved in the transport stream, so that the audio and video content are transmitted simultaneously.
- Buffer overflow/underflow
  - can cause picture loss or skips during audio/video playback.
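The fields above add up to a three-byte header if a one-byte sync value is assumed in front of the 16 flag and ID bits. The sketch below packs that header and splits a PES packet into 188-byte TS packets. The sync value 0x47, the bit order (PUSI, AFC, CC, PID), the 0xFF stuffing and the reading of the offset byte as a stuffing count are all assumptions for illustration; this is the simplified header described on the slides, not the four-byte header with 13-bit PID of standard MPEG-2 TS.

```python
# Illustrative packetizer for the simplified 3-byte TS header.
# Assumed layout: sync byte 0x47 | PUSI (1) AFC (1) CC (4) PID (10).
TS_SIZE = 188
HEADER_SIZE = 3
MAX_PAYLOAD = TS_SIZE - HEADER_SIZE  # 185 bytes


def pack_header(pid, cc, pusi, afc):
    """Build the 3-byte header from its fields."""
    word = (int(pusi) << 15) | (int(afc) << 14) | ((cc & 0xF) << 10) | (pid & 0x3FF)
    return bytes([0x47, word >> 8, word & 0xFF])


def unpack_header(header):
    """Return (pid, cc, pusi, afc) from a 3-byte header."""
    word = (header[1] << 8) | header[2]
    return word & 0x3FF, (word >> 10) & 0xF, bool(word >> 15), bool((word >> 14) & 1)


def packetize(pes, pid, cc=0):
    """Split one PES packet into 188-byte TS packets."""
    packets = []
    pos = 0
    while pos < len(pes):
        first = pos == 0  # only the first TS packet carries the PES header
        if len(pes) - pos >= MAX_PAYLOAD:
            chunk = pes[pos:pos + MAX_PAYLOAD]
            packet = pack_header(pid, cc, first, False) + chunk
        else:
            # Short payload: set AFC, write the offset byte (assumed to be
            # the number of stuffing bytes), then stuffing, then the data.
            chunk = pes[pos:pos + MAX_PAYLOAD - 1]
            stuffing = MAX_PAYLOAD - 1 - len(chunk)
            packet = (pack_header(pid, cc, first, True)
                      + bytes([stuffing]) + b"\xff" * stuffing + chunk)
        packets.append(packet)
        pos += len(chunk)
        cc = (cc + 1) & 0xF  # 4-bit counter wraps around
    return packets
```

A 400-byte PES packet, for example, becomes two full packets of 185 payload bytes and a third packet with AFC set that carries the remaining 30 bytes.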
Multiplexing flowchart
Calculation of presentation time of a TS packet: For video TS packet For audio TS packet
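One way to picture the interleaving decision is to give every TS packet a share of its frame's playback time and always send the packet from the stream that is currently behind. The sketch below follows that idea; it is an illustration under stated assumptions, not the exact rule from the flowchart. It assumes each TS packet's duration is its frame duration divided by the number of TS packets in that frame, and that ties go to video.

```python
# Illustrative interleaver: send the packet of whichever stream has
# accumulated less playback time so far (ties go to video).

def packet_duration(frame_duration, packets_in_frame):
    """Assumed share of a frame's playback time carried by one TS packet."""
    return frame_duration / packets_in_frame


def interleave(video, audio):
    """Merge two lists of (duration_seconds, packet) into one packet list."""
    out = []
    video_time = audio_time = 0.0  # playback time already multiplexed
    i = j = 0
    while i < len(video) or j < len(audio):
        take_video = j >= len(audio) or (i < len(video) and video_time <= audio_time)
        if take_video:
            duration, packet = video[i]
            video_time += duration
            i += 1
        else:
            duration, packet = audio[j]
            audio_time += duration
            j += 1
        out.append(packet)
    return out
```

Keeping the two accumulated times close together is what limits the difference in buffered content between the audio and video buffers at the receiver.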
Video processing
Audio processing
De-multiplexing
- The transport stream (TS) input to a receiver is separated into a video elementary stream and an audio elementary stream.
- These elementary streams are initially written into video and audio buffers respectively.
- Once one of the buffers is full, the elementary stream is reconstructed from the point of synchronization.
Audio-video synchronization
- Once the video buffer is full, it is searched for the next occurring IDR frame.
- The audio frame corresponding to that IDR frame is then calculated.
- The elementary streams are reconstructed from that point, merged into a container format (using mkv merge), and then played back.
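The separation step can be sketched as a loop over 188-byte packets that routes each payload by its packet ID. The sketch assumes the simplified 3-byte header described earlier (sync byte, then PUSI, AFC, 4-bit CC and 10-bit PID), reads the optional offset byte as a count of stuffing bytes, and uses hypothetical PID values chosen by the caller. The recovered payloads still contain the PES headers, which a later stage would strip.

```python
# Illustrative de-multiplexer: route TS payloads into two buffers by PID.
# Header layout and offset-byte meaning are assumptions for this sketch.
TS_SIZE = 188
HEADER_SIZE = 3


def demux(ts, video_pid, audio_pid):
    """Return (video_bytes, audio_bytes) recovered from a transport stream."""
    video = bytearray()
    audio = bytearray()
    for start in range(0, len(ts) - TS_SIZE + 1, TS_SIZE):
        packet = ts[start:start + TS_SIZE]
        word = (packet[1] << 8) | packet[2]
        afc = (word >> 14) & 1
        pid = word & 0x3FF
        if afc:
            # Skip the offset byte and the stuffing bytes it announces.
            payload = packet[HEADER_SIZE + 1 + packet[HEADER_SIZE]:]
        else:
            payload = packet[HEADER_SIZE:]
        if pid == video_pid:
            video += payload
        elif pid == audio_pid:
            audio += payload
        # Packets with any other PID are ignored.
    return bytes(video), bytes(audio)
```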
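The equation on this slide is not reproduced here. One plausible form, offered only as a sketch, equates the playback times of the two frames: the audio frame whose start time is closest to the IDR frame's playback time is selected. The rounding rule and the `samples_per_frame` parameter are assumptions.

```python
# Illustrative mapping from an IDR video frame number to the audio frame
# whose start time is closest to it. Frame numbers are assumed to start at 0.

def matching_audio_frame(video_frame, fps, sampling_frequency, samples_per_frame):
    """Return (audio_frame_number, skew_seconds) for a given video frame."""
    video_time = video_frame / fps
    audio_frame = round(video_time * sampling_frequency / samples_per_frame)
    audio_time = audio_frame * samples_per_frame / sampling_frequency
    # Positive skew means the audio frame starts after the video frame.
    return audio_frame, audio_time - video_time
```

With rounding to the nearest frame, the skew stays within half an audio frame duration, which is the quantity reported in the skew results.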
Results : Buffer fullness
Test conditions:
- Video
  - H.264 baseline profile
  - Resolution: 416x240
  - GOP: IPPP (IDR forced)
  - Fps: 24
- Audio
  - HEAACv2, ADTS format
  - Sampling frequency: 24,000 Hz
De-multiplexer output
- Test clip: 1, 2
- Clip length (sec): 30, 50
- Video FPS: 24
- Audio sampling frequency (Hz): 24000
- AAC encoder bitrate (kbps): 32
- Rows whose values are missing from this copy: total video frames; total audio frames; video raw file (.yuv) size (kB); audio raw file (.wav) size (kB); H.264 file size (kB); AAC file size (kB); video compression ratio; audio compression ratio; H.264 encoder bitrate (kBps); total TS packets; transport stream size (kB); transport stream bitrate (kBps); test clip size (kB); reconstructed clip size (kB)
Skew observed
Conclusions
- Buffer fullness was handled effectively: the maximum difference observed between the buffers was around 20 ms of media content.
- Audio-video synchronization was achieved with a maximum skew of 13 ms.
Future work
- Expand the multiplexing algorithm to multiplex multiple programs.
- Implement the same multiplexing algorithm for other transport protocols such as RTP/IP.
- Add error correction to the transport stream.
References:
[1] MPEG-4: ISO/IEC JTC1/SC : Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding, ISO/IEC,
[2] MPEG-4: ISO/IEC JTC1/SC : Information technology - Coding of audio-visual objects - Part 3: Audio, Amendment 4: Audio Lossless Coding (ALS), new audio profiles and BSAC extensions
[3] MPEG-2: ISO/IEC JTC1/SC -7, Advanced Audio Coding, AAC. International Standard IS WG11,
[4] MPEG-2: ISO/IEC Information technology - Generic coding of moving pictures and associated audio - Part 1: Systems, ISO/IEC:
[5] Soon-kak Kwon et al., "Overview of H.264 / MPEG-4 Part 10 (pp )", Special issue on "Emerging H.264/AVC video coding standard", J. Visual Communication and Image Representation, vol. 17, pp , April
[6] A. Puri et al., "Video coding using the H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp , Oct
[7] "MPEG-4 HE-AAC v2 - audio coding for today's digital media world", article in the EBU Technical Review (01/2006) giving explanations on HE-AAC. Link: http://tech.ebu.ch/docs/techreview/trev_305-moser.pdf
[8] ETSI TS "Implementation guidelines for the use of video and audio coding in broadcasting applications based on the MPEG-2 transport stream".
[9] 3GPP TS : General Audio Codec audio processing functions; Enhanced aacPlus General Audio Codec; 2009
[10] 3GPP TS : Enhanced aacPlus general audio codec; Encoder Specification AAC part.
[11] 3GPP TS : Enhanced aacPlus general audio codec; Encoder Specification SBR part.
[12] 3GPP TS : Enhanced aacPlus general audio codec; Encoder Specification Parametric Stereo part.
[13]
[14] MPEG Transport Stream. Link:
[15] MPEG-4: ISO/IEC JTC1/SC : Information technology - Coding of audio-visual objects - Part 14: MP4 file format, 2003
[16] DVB-H: Global mobile TV. Link:
[17] ATSC-M/H. Link:
[18] Open Mobile Video Coalition. Link:
[19] VC-1 Compressed Video Bitstream Format and Decoding Process (SMPTE 421M-2006), SMPTE Standard, 2006
[20] Henning Schulzrinne's RTP page. Link: http://
[21] G. A. Davidson et al., "ATSC video and audio coding", Proc. IEEE, vol. 94, pp , Jan
[22] I. E. G. Richardson, "H.264 and MPEG-4 video compression: video coding for next-generation multimedia", Wiley,
[23] European Broadcasting Union,
[24] Shintaro Ueda et al., "NAL level stream authentication for H.264/AVC", IPSJ Digital Courier, vol. 3, Feb
[25] World DMB. Link:
[26] ISDB website. Link:
[27] 3GPP website. Link:
[28] Mihir Modi, "Audio compression gets better and more complex". Link:
[29] P. A. Sarginson, "MPEG-2: Overview of systems layer". Link:
[30] MPEG-2 ISO/IEC : Generic coding of moving pictures and audio: Part 1 - Systems, Amendment 3: Transport of AVC video data over ITU-T Rec H |ISO/IEC streams, 2003
[31] MKV merge software. Link:
[32] VLC media player. Link:
[33] Gom media player. Link:
[34] H. Murugan, "Multiplexing H264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback", M.S.E.E. Thesis, University of Texas at Arlington, TX, May
[34] Gerold Blakowski et al., "A Media Synchronization Survey: Reference Model, Specification, and Case Studies", IEEE Journal on Selected Areas in Communications, vol. 14, no. 1, January 1996
[35] H.264/AVC JM Software. Link:
[36] 3GPP Enhanced aacPlus reference software. Link:
[37] H.264 bitstream. Link:
Thank you