MPEG-4 & Wireless Multimedia Streaming 11/12/2018 CIS 642 Dimosthenis Anthomelidis
Overview: Math Background; MPEG Family; MPEG-4 Overview; Packet-Video Technology.
Math Background - DCT: The Discrete Cosine Transform (DCT) is a key method of the MPEG compression standards. The DCT helps separate the image into parts of differing importance. Similar to the DFT, it transforms an image from the spatial domain to the frequency domain. MPEG applies a 2-dimensional DCT to 8x8-pixel blocks of the source picture (four such luminance blocks make up a 16x16 macroblock).
Math Background – DCT (2): A is the input image; A(i,j) is the intensity of the pixel in row i and column j.
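The formula shown on this slide did not survive extraction. As a hedged reconstruction, one common form of the 2-D DCT, written with the slide's symbols A(i,j) and B(k1,k2) and assuming an N1 x N2 input block (N1 and N2 are names introduced here), is given below. The constant scale factor is omitted because its convention varies between texts.

```latex
B(k_1, k_2) \;\propto\; \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1}
  A(i,j)\,
  \cos\!\left[\frac{\pi k_1 (2i+1)}{2 N_1}\right]
  \cos\!\left[\frac{\pi k_2 (2j+1)}{2 N_2}\right]
```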
Math Background – DCT (3): B is the output “image” of coefficients; B(k1,k2) is the DCT coefficient. For typical images, most of the signal energy lies at low frequencies, which appear in the upper-left corner of the DCT output. (Figure axis label: Increasing Horizontal Frequency.)
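As a rough illustration of the energy compaction described above, the sketch below computes an unnormalized 2-D DCT-II in pure Python. The 8x8 block of pixel values is a made-up smooth ramp, and the function name `dct_2d` is introduced here for illustration only. It is not how a real codec implements the transform, since real codecs use fast, fixed-point versions.

```python
import math

def dct_2d(block):
    """Unnormalized 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for k1 in range(n):          # vertical frequency index
        for k2 in range(n):      # horizontal frequency index
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos(math.pi * k1 * (2 * i + 1) / (2 * n))
                              * math.cos(math.pi * k2 * (2 * j + 1) / (2 * n)))
            out[k1][k2] = total
    return out

# Hypothetical 8x8 block: intensities rise smoothly from left to right.
block = [[100 + 2 * j for j in range(8)] for _ in range(8)]
coeffs = dct_2d(block)

# Position of the largest-magnitude coefficient.
peak = max((abs(coeffs[a][b]), (a, b)) for a in range(8) for b in range(8))[1]
```

For this smooth block the largest coefficient sits at position (0, 0), the upper-left corner, and every coefficient outside the first row is numerically zero because the block does not vary vertically.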
MPEG Family: The Moving Picture Experts Group (MPEG) is a group of experts dedicated to standards for digital audio and video. History: MPEG-1 and MPEG-2 have given rise to DVD, digital TV, Digital Audio Broadcasting, and MP3 codecs (coder-decoders). MPEG-4 follows them. More to come: MPEG-7 (content description).
MPEG-4 Overview: Formally ISO/IEC international standard 14496. An audio-visual coding standard, issued in Versions 1 and 2. Builds on the success of digital TV and interactive graphics. Adopts an object-based audiovisual representation model. Aims to satisfy authors (reusability, owner rights) and end users (interaction with content, multimedia for mobile users).
MPEG-4 Parts: Part 1: Systems; Part 2: Visual; Part 3: Audio; Part 4: Conformance Testing; Part 5: Reference Software; Part 6: DMIF (Delivery Multimedia Integration Framework); Part 7: Optimised software for MPEG-4 tools.
Major Forces: The scene is modeled as a composition of objects (object-oriented paradigm). Coding: units of audio and visual content are coded as media objects. Integration: natural and synthetic AV objects. Multiplexing and synchronization of the data associated with media objects. Interactivity: locally at the receiver or via a back channel. High compression. Mobility (low bit rate) and real-time data. Identification and protection of intellectual property.
Convergence of 3 worlds
Functionalities: Content-based interactivity: the user is able to select one object in the scene; hybrid natural and synthetic data coding. Compression: improved coding efficiency; multiple concurrent data streams (3D natural ‘objects’, virtual reality). Universal access: robustness in error-prone environments; content-based scalability (fine granularity in content).
Part 1: Systems. A framework for integrating natural and synthetic components of complex multimedia scenes.
Part 1: Systems (2). (Figure labels: Network; TransMux, e.g. MPEG-2 Transport; FlexMux; DAI; Elementary Streams; Decoding; Primitive AV Objects; Scene Description Information; Object Descriptor; Composition and Rendering; Audiovisual Interactive Scene; Display & local user interaction.)
Systems Structure. (Figure labels: Composition; Compression; ESI; Sync.; DAI; FlexMux Tool; TransMux Layer.)
Media Objects: Content-based AV representation. AVOs (AV objects): VOC (Video Object Component), AOC (Audio Object Component); the user may access them. An AV scene is a composition of several media objects organized in hierarchical fashion. The leaves are primitive media objects: still images, video objects, etc. Objects are placed in elementary streams (ESs). VOP (Video Object Plane): a time sample of a 2D VOC with arbitrary shape; it contains motion parameters, shape info, and texture data.
Media Objects (2): Sprites are used to code unchanging backgrounds. A scalable object can have an ES for basic-quality info plus one or more enhancement layers (Video Object Layers). Visual objects in a scene are described mathematically and given a position in 2D or 3D space. An object descriptor identifies all streams associated with one media object: it informs the system which ESs belong to an object. Object descriptors are carried in their own ES. BIFS (Binary Format for Scenes) is a language for describing and dynamically changing the scene; it borrows concepts from VRML.
MPEG-4 scene
Composition: The task of combining all of the separate entities that make up the scene. Multimedia scenes are conceived as hierarchical structures represented as a graph. Each leaf is a media object. The graph structure isn’t necessarily static. Composition info is delivered in one elementary stream.
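The hierarchical scene structure can be pictured with a small tree. The sketch below is only a toy model in Python: the class name `SceneNode` and the object names are invented for illustration, and it does not reproduce BIFS or any real MPEG-4 API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SceneNode:
    """A node in a hypothetical scene tree; leaves stand for media objects."""
    name: str
    children: List["SceneNode"] = field(default_factory=list)

    def leaves(self):
        """Return the names of all leaf nodes, in depth-first order."""
        if not self.children:
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found

# Illustrative scene: a presenter (video + audio) in front of a still background.
person = SceneNode("person", [SceneNode("person_video"), SceneNode("person_audio")])
scene = SceneNode("scene", [person, SceneNode("background_image")])

# The graph is not static: a node can be added after the scene is built.
scene.children.append(SceneNode("logo_overlay"))
media_objects = scene.leaves()
```

Here `media_objects` lists the four leaves in depth-first order, which is the set of primitive objects a compositor would have to place and render.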
Multiplex (1): A 3-layer multiplex. Sync layer: adds info for timing and synchronization. FlexMux layer: multiplexes streams with different characteristics. TransMux layer: adapts the multiplexed stream to the particular network characteristics. Elementary streams are packetized by adding headers with timing info (clock references) and synchronization data (time stamps); these packets make up the synchronization layer.
Multiplex (2): Flexible multiplex layer: an intermediate multiplex layer that groups together several low-bit-rate streams (with similar QoS requirements). Transport multiplex layer: specific to the characteristics of the transport network. No specific transport mechanism is defined: existing transport formats such as ATM and RTP suffice.
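The idea of grouping several low-bit-rate streams into one can be sketched as simple interleaving with a channel tag on every packet. This is a minimal Python illustration under that assumption. The functions `flexmux` and `demux` are hypothetical names, and the actual FlexMux packet syntax in the standard is not reproduced here.

```python
def flexmux(streams):
    """Interleave packets round-robin, tagging each with its channel number."""
    muxed = []
    longest = max((len(packets) for packets in streams.values()), default=0)
    for n in range(longest):
        for channel, packets in streams.items():
            if n < len(packets):
                muxed.append((channel, packets[n]))
    return muxed

def demux(muxed):
    """Rebuild the per-channel packet lists from the tagged stream."""
    out = {}
    for channel, payload in muxed:
        out.setdefault(channel, []).append(payload)
    return out

# Two illustrative low-bit-rate streams: channel 0 (audio) and channel 1 (video).
streams = {0: [b"a0", b"a1", b"a2"], 1: [b"v0", b"v1"]}
muxed = flexmux(streams)
recovered = demux(muxed)
```

The channel tag is what lets the receiver undo the interleaving, so `recovered` equals the original `streams` dictionary.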
Multiplex (3)
Multiplex (4)
Synchronization layer: Associates timing and synchronization info with the streams. Elementary streams (ESs) consist of access units: portions of the stream with a specific decoding and composition time. ESs are split into SL packets, which do not necessarily match the size of the access units. The attached header contains: a sequence number; an object clock reference, a time stamp used to reconstruct the time base for the object (the speed of the encoder clock); a decoding time stamp, which identifies the correct time to decode the access unit; and a composition time stamp, which identifies the correct time to render a decoded access unit.
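To make the header fields concrete, the sketch below splits one access unit into SL-style packets in Python. The class `SLPacket`, the function `packetize`, and the choice to carry the time stamps only on the first packet of an access unit are assumptions of this sketch, not the normative SL packet syntax.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SLPacket:
    sequence_number: int
    payload: bytes
    ocr: Optional[int] = None  # object clock reference
    dts: Optional[int] = None  # decoding time stamp
    cts: Optional[int] = None  # composition time stamp

def packetize(access_unit, dts, cts, ocr, max_payload, first_seq=0):
    """Split one access unit into packets of at most max_payload bytes."""
    packets = []
    for n, start in enumerate(range(0, len(access_unit), max_payload)):
        chunk = access_unit[start:start + max_payload]
        first = (n == 0)
        # Assumption: timing fields travel only with the first packet.
        packets.append(SLPacket(first_seq + n, chunk,
                                ocr if first else None,
                                dts if first else None,
                                cts if first else None))
    return packets

# A 25-byte access unit with made-up time stamps, cut into 10-byte packets.
packets = packetize(b"x" * 25, dts=3000, cts=6000, ocr=2900, max_payload=10)
```

The 25-byte access unit becomes three packets with consecutive sequence numbers, which shows that packet boundaries need not match access-unit boundaries.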
MP4 File Format: A reliable way for users to exchange complete files of MPEG-4 content.
MPEG-J: An MPEG-4-specific subset of Java. Defines interfaces to elements in the scene, network resources, and terminal resources. Personal Profile: a lightweight package for personal devices. (Figure labels: Network; Scene; Resource.)
Part 2 – Visual: “Rectangular” video objects. Arbitrarily shaped objects. Binary shape: an encoded pixel either is or is not part of the object in question (on/off); useful for low-bit-rate environments. Alpha shape: for higher-quality content, each pixel is assigned a value for its transparency.
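The difference between the two shape types shows up when an object is composited over a background. The Python sketch below uses an 8-bit mask value per pixel; the function name `composite_row` and all pixel values are made up for illustration, and shape coding itself (how the mask is compressed) is not shown.

```python
def composite_row(obj_row, bg_row, mask_row):
    """Blend object pixels over background pixels using a 0..255 mask."""
    return [(o * a + b * (255 - a)) // 255
            for o, b, a in zip(obj_row, bg_row, mask_row)]

obj_row = [200, 200, 200, 200]  # object intensities
bg_row = [50, 50, 50, 50]       # background intensities

# Binary shape: each pixel is either fully outside (0) or fully inside (255).
binary_mask = [0, 255, 255, 0]
# Alpha shape: each pixel carries its own transparency value.
alpha_mask = [0, 128, 255, 64]

binary_result = composite_row(obj_row, bg_row, binary_mask)
alpha_result = composite_row(obj_row, bg_row, alpha_mask)
```

With the binary mask every output pixel is either pure object or pure background. With the alpha mask the edge pixels take intermediate values, which gives softer object boundaries at the cost of more shape data.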
Visual (Cont’d): MPEG-4 defines the decoding process; encoding processes are left to the marketplace. Gives users a new level of interaction with visual content: they can manipulate objects. Error robustness. Scalability: the minimum subset of the bitstream that can be decoded is the base layer; each of the other bitstreams is called an enhancement layer. Optimized for 3 bit-rate ranges: < 64 kbps (wireless scenario), 64-384 kbps, and 384 kbps-4 Mbps.
Error Resilience: Very important for mobile communications because of error burstiness. Resynchronization: errors are localized through the use of resynchronization markers, which can be inserted in the bitstream. On an error, the decoder skips data until the next marker and restarts from that point. Markers are inserted after a constant number of coded bits, forming “video packets”. Data partitioning: motion info is separated from texture info, so if an error hits the texture bits, the decoded motion info can still be used. Header extension code: redundant info that is vital for decoding the video correctly. Reversible variable-length codes: codewords can be decoded in both the forward and backward directions, so after an error it is possible to decode portions of the corrupted bitstream in reverse order.
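The resynchronization behaviour can be sketched with a toy symbol stream. In this Python sketch `MARKER` and `CORRUPT` are stand-ins invented for illustration. A real decoder works on bits, detects errors through invalid codewords, and may also discard data decoded earlier in the damaged video packet.

```python
MARKER = "RESYNC"   # stand-in for a resynchronization marker
CORRUPT = None      # stand-in for data the decoder cannot parse

def decode(stream):
    """Decode symbols, dropping everything from an error to the next marker."""
    decoded = []
    skipping = False
    for symbol in stream:
        if symbol == MARKER:
            skipping = False      # restart decoding at the marker
        elif skipping:
            continue              # discard data until the next marker
        elif symbol is CORRUPT:
            skipping = True       # error detected: start skipping
        else:
            decoded.append(symbol)
    return decoded

# One error occurs inside the second video packet.
stream = ["a", "b", MARKER, "c", CORRUPT, "d", MARKER, "e", "f"]
result = decode(stream)
```

Only the symbol "d", which follows the error inside the same packet, is lost; decoding resumes cleanly at the next marker, so the damage stays localized.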
Scalability: Uses multiple VOLs (base layer and enhancement layer). Spatial scalability: the enhancement layer improves spatial resolution. Temporal scalability: offers a higher frame rate and improves smoothness of motion (temporal resolution). Generalized framework: a scalability preprocessor implements the desired scalability. For spatial scalability, it down-samples the input VOPs to produce the base layer, which is encoded by the base-layer encoder. The reconstructed base layer is up-sampled by a mid-processor. The difference from the original VOP is the input to the enhancement-layer encoder.
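The spatial-scalability pipeline can be traced on a single row of pixels. The Python sketch below makes several simplifying assumptions: the signal is 1-D with an even number of samples, down-sampling is pair averaging, up-sampling is sample repetition, and the base layer is treated as losslessly coded, so the reconstructed base layer equals the base layer. All function names are illustrative.

```python
def downsample(samples):
    """Halve the resolution by averaging neighbouring pairs (preprocessor)."""
    return [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]

def upsample(samples):
    """Double the resolution by repeating each sample (mid-processor)."""
    return [s for s in samples for _ in range(2)]

original = [10, 12, 20, 24, 30, 30, 8, 4]    # one row of a hypothetical VOP

base_layer = downsample(original)            # input to the base-layer encoder
prediction = upsample(base_layer)            # up-sampled reconstructed base layer
enhancement = [o - p for o, p in zip(original, prediction)]  # enhancement input

# Decoder side: the base layer alone gives a low-resolution picture;
# adding the enhancement layer restores full resolution.
reconstructed = [p + e for p, e in zip(upsample(base_layer), enhancement)]
```

The enhancement layer carries only small differences, which is why it is cheaper to code than a second full-resolution stream.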
Hold that smile: Map images onto computer-generated shapes.
Applications: Criteria: timing constraints (real-time or non-real-time); symmetry of transmission facilities; interactivity. Non-real-time, non-symmetric, non-interactive: multimedia broadcasting for mobile devices. Manufacturers of mobile equipment and providers of mobile services have been adopting MPEG-4.
Mobile Interactive Multimedia: Mobile computing = portable computer + wireless communication. Limitations: limited computation capacity, narrow bandwidth, unreliable channel. Requirements: high compression, error resilience.
Thinking small: Moving video is possible at very low bit rates for mobile devices, even at 10 kb/s (GSM’s data rate). Use of scalable objects: providers need to encode clips only once; a base layer conveys all the info in some basic quality. MPEG-4 hardware decoders and encoders already exist to bring video to mobile devices (e.g., Toshiba).
Packet Video Technology: Visual communication “anywhere – anytime”. Compliant with the MPEG-4 Visual spec. Optimized for single rectangular objects, based on motion compensation and DCT coding of macroblocks. Scalability: allows subsets of a single bitstream to go to a receiver; you encode once and deliver to multiple decoders with different capabilities.
Video Encoding
Rate Control: Rate control with multiple-layer bitstreams. Temporal scalability: adding enhancement to a base layer. Spatial scalability: adding enhancement with differential images.
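Temporal layering can be sketched by splitting a frame sequence into a base layer and an enhancement layer. The Python sketch below assumes the simplest split, with every other frame in the base layer. The function names are hypothetical, and this is not a description of PacketVideo's actual rate-control scheme.

```python
def split_temporal_layers(frames):
    """Put even-indexed frames in the base layer, odd-indexed in enhancement."""
    return frames[0::2], frames[1::2]

def merge_layers(base, enhancement):
    """Re-interleave the two layers to recover the full frame rate."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged

frames = ["f0", "f1", "f2", "f3", "f4"]
base, enhancement = split_temporal_layers(frames)

# A low-rate receiver plays only `base` (half the frame rate);
# a receiver with more bandwidth merges both layers.
full_rate = merge_layers(base, enhancement)
```

A server can drop the enhancement layer when the channel rate falls, without re-encoding the clip.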
Video Decoding
PV error-resilient decoding
Products: Software-based solutions. PVPlayer: decoder application for rendering. PVServer: server application. PVAuthor: encoder that creates MP4 file format bitstreams.
Conclusion: Extensive tests show that MPEG-4 achieves better or similar image quality at all targeted bit rates, with the bonus of added functionality.
References: http://www.cselt.it/mpeg/ ; http://www.packetvideo.com