Overview of MPEG-4 Lihang Ying Department of Computing Science University of Alberta, Edmonton, Canada These slides are available online:
Outline MPEG-4 Demos and Overview Demos Overview How to Organize MPEG-4 Contents – Scene/Object Description Examples Study Synthetic and Natural Hybrid Coding(SNHC) – Visual Part 2D Mesh Coding 3D Mesh Coding
Demos EnvivioTV:
It’s a plug-in for RealPlayer, Media Player, or QuickTime
Characteristics (1) MPEG-4 vs. MPEG-1/2 Not merely video and audio Interactive Object-based Scalability
Characteristics (2) Why MPEG-4? Interoperability: Run on all kinds of platforms and devices Reuse multimedia content: Create once, use everywhere Multi-network delivery: Internet/Mobile/Broadcast networks, different bandwidths Scalability: Different capacities (e.g. display resolution) of different devices
MPEG-J API: org.iso.mpeg.mpegj org.iso.mpeg.mpegj.scene org.iso.mpeg.mpegj.resource org.iso.mpeg.mpegj.decoder org.iso.mpeg.mpegj.net Implement MPEG-4 Coder/Decoder conveniently with MPEG-J API Create Coder/Decoder once, run on all kinds of devices and platforms
Profile/Level Different Implementations: Profile: Divide functionality into different subsets Level: Constraints on parameters (bit rate, frames/sec, …) Example: EnvivioTV Video: Advanced Simple Profile at levels Audio: High-quality profile at levels Graphics: Advanced profile
Interactive Multi-network Delivery Coder/Decoder: Using MPEG-J Scalability: Different Capacity Profile/Level Not merely audio/video Object-based Interoperability
How to Organize Contents Scene Descriptor Assemble objects into audiovisual scene Scene description format — binary format for MPEG-4 scenes (BIFS) Object Descriptor Describe objects
[Figure: The initial object descriptor carries ES_Descriptors that point to the scene descriptor stream (scene descriptions and BIFS updates, e.g. replace scene) and to the object descriptor stream (object descriptors and object descriptor updates). The object descriptors in turn reference, by ES_ID, the visual stream (base layer), a further visual stream (e.g. temporal enhancement layer), and the audio stream.]
Scene Description - BIFS Represented by XMT-A Format: Similar to XML Express bitstream syntax in document Enable easy generation of bitstream parser BIFS Examples: …
BIFS Example(1) –Trivial Scene(MPEG-2/DVD) Scene Tree Layer2D Sound2D AudioSource Shape Bitmap Appearance MovieTexture
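The scene tree above can be sketched as a small nested data structure. This is only an illustration: the slide lists the node names, and the parent/child nesting used here (Sound2D holding AudioSource, Shape holding Bitmap and Appearance, Appearance holding MovieTexture) is an assumption based on how these node types are normally combined. The `Node` class and `render` helper are hypothetical names, not part of BIFS or XMT-A.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One scene-tree node: a node type name plus its child nodes."""
    name: str
    children: list = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Assumed nesting of the node names listed on the slide.
scene = Node("Layer2D", [
    Node("Sound2D", [Node("AudioSource")]),
    Node("Shape", [
        Node("Bitmap"),
        Node("Appearance", [Node("MovieTexture")]),
    ]),
])

print("\n".join(render(scene)))
```

The audio branch and the video branch hang off the same root, which is what makes even this trivial movie scene object-based.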
BIFS Example(1) –Trivial Scene(MPEG-2/DVD)
BIFS Example(2) –Movie with Subtitles
BIFS Example(3) –Icons Icons
BIFS Example(4) –Buttons Event Response
Object Description Syntactic Description Language (SDL) Express bitstream syntax in document Enable easy generation of bitstream parser SDL Example: …
Object Description - SDL ObjectDescriptor
class ObjectDescriptor extends ObjectDescriptorBase : bit(8) tag=ObjectDescrTag {
    bit(10) ObjectDescriptorID;
    bit(1) URL_Flag;
    const bit(5) reserved=0b1111.1;
    if (URL_Flag) {
        bit(8) URLlength;
        bit(8) URLstring[URLlength];
    } else {
        ES_Descriptor esDescr[1..255];
        OCI_Descriptor ociDescr[0..255];
        IPMP_DescriptorPointer ipmpDescriPtr[0..255];
    }
    ExtensionDescriptor extDescr[0..255];
}
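To show how an SDL description turns into a parser, here is a minimal sketch that reads the first fields of the ObjectDescriptor above. It assumes the caller has already consumed the 8-bit tag (and any size field, which the slide does not show), handles only the URL branch, and decodes the URL as UTF-8; the function name and these choices are illustrative assumptions.

```python
def parse_object_descriptor_body(data: bytes) -> dict:
    """Parse ObjectDescriptorID, URL_Flag, reserved, and the optional URL.

    Assumes `data` starts at ObjectDescriptorID, i.e. the tag byte has
    already been read. The non-URL branch (ES_Descriptor list etc.) is
    not handled in this sketch.
    """
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    head = int.from_bytes(data[:2], "big")
    result = {
        "ObjectDescriptorID": head >> 6,   # top 10 bits
        "URL_Flag": (head >> 5) & 0x1,     # next 1 bit
        "reserved": head & 0x1F,           # low 5 bits
    }
    if result["URL_Flag"]:
        if len(data) < 3:
            raise ValueError("missing URLlength")
        url_length = data[2]               # bit(8) URLlength
        result["URLstring"] = data[3:3 + url_length].decode("utf-8")
    return result


# ID = 5, URL_Flag = 1, reserved = 0b11111, then a 3-byte URL "abc".
example = bytes([0x01, 0x7F, 3]) + b"abc"
print(parse_object_descriptor_body(example))
```

Each `bit(n)` field in the SDL maps to one shift-and-mask step, which is why SDL makes bitstream parsers easy to generate.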
Object Descriptor Summary
ObjectDescriptor: ObjectDescriptorID, URL_Flag
ES_Descriptor // Elementary Stream: ES_ID, streamDependenceFlag, URL_Flag, OCRstreamFlag, streamPriority, DecoderConfigDescriptor, SLConfigDescriptor, IPI_DescrPointer, IP_IdentificationDataSet, IPMP_DescriptorPointer, LanguageDescriptor, QoS_Descriptor, ...
OCI_Descriptor // Object Content Information: ContentClassificationDescriptor, KeywordDescriptor, RatingDescriptor, LanguageDescriptor, ShortTextualDescriptor, ExpandedTextualDescriptor, ContentCreatorNameDescriptor, ContentCreationDateDescriptor, OCICreatorNameDescriptor, OCICreationDateDescriptor, SmpteCameraPositionDescriptor, MediaTimeDescriptor, ...
IPMP_DescriptorPointer // Intellectual Property Management and Protection
Applications of OCI/IPMP: eDonkey’s problems
MPEG-4 Objects and Tools Audio Natural Audio Synthetic and Natural Hybrid Coding(SNHC) Visual Natural Video Object-based/Scalability SNHC 2D/3D Mesh Object/Face and Body Animation Image Text …
[2D Mesh Coding] Natural Video Coding: Block-based texture and motion coding, Shape information coding 2D Mesh Coding: Designed for video manipulation 2D mesh or 2D planar graphs with triangles Natural images and video mapped on 2D meshes Applications: Object tracking, Content-based video retrieval (e.g. motion-based queries), 2D animation, Augmented reality, …
Example (a) Original frame (b) Mesh generated (c) Text overlaid on video: the text moves along with the fish's mesh
Architecture of 2D Mesh Coding
2D Mesh Object Also called 2D Dynamic Mesh Supports video coding by moving the vertices of the mesh Topology of the mesh does not change in one session Mesh Data includes: Connectivity: how vertices are connected Geometry: 2D coordinates of vertices Motion: temporal difference of vertices' positions
I-MOP and P-MOP I-MOP: Intra-Mesh Object Plane For a given session, connectivity and geometry information needs to be transmitted only once P-MOP: Inter-Mesh Object Plane The deformation of the given mesh over time can be described as temporal difference of the geometry, or geometry motion
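The I-MOP/P-MOP split can be sketched as follows: geometry and connectivity arrive once, and every later frame only adds per-vertex motion vectors. The vertex coordinates, motion values, and the helper name `apply_p_mop` are made up for illustration.

```python
def apply_p_mop(vertices, motion):
    """Return new vertex positions: previous position plus motion vector."""
    if len(vertices) != len(motion):
        raise ValueError("one motion vector per vertex is required")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(vertices, motion)]


# I-MOP: geometry and connectivity, transmitted once per session.
i_mop_vertices = [(0, 0), (16, 0), (0, 16), (16, 16)]
triangles = [(0, 1, 2), (1, 3, 2)]  # never changes within the session

# P-MOPs: only temporal differences of the vertex positions.
p_mops = [
    [(1, 0), (0, 0), (0, 2), (-1, 1)],
    [(0, 1), (2, 0), (0, 0), (0, -1)],
]

frames = [i_mop_vertices]
for motion in p_mops:
    frames.append(apply_p_mop(frames[-1], motion))

for index, frame in enumerate(frames):
    print(index, frame)
```

Because `triangles` is reused for every frame, only the small motion lists have to be coded after the first mesh object plane.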
2D Mesh Decoding Scheme
Mesh Data - Connectivity Uniform Triangulation: Suited for rectangular video objects Located in x and y grids Specify the length of grid intervals
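A uniform mesh needs very few parameters, which the following sketch tries to make concrete. The function name, the parameter names, and the choice of diagonal used to split each grid cell are assumptions; the slide only states that the nodes lie on an x/y grid with given interval lengths.

```python
def uniform_mesh(cols, rows, dx, dy):
    """Build a uniform triangular mesh on a grid of cols x rows nodes.

    dx and dy are the grid interval lengths. Each grid cell is split into
    two triangles along one diagonal (an assumed, fixed split direction).
    """
    vertices = [(c * dx, r * dy) for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            triangles.append((top_left, top_right, bottom_left))
            triangles.append((top_right, bottom_right, bottom_left))
    return vertices, triangles


vertices, triangles = uniform_mesh(cols=3, rows=2, dx=8, dy=8)
print(vertices)
print(triangles)
```

The whole mesh is recoverable from four numbers, so no per-vertex coordinates need to be sent for this mesh type.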
Mesh Data - Connectivity Delaunay Triangulation: Suited for arbitrarily shaped video objects Guarantee: Close to Equilateral: producing the largest minimal angle Unique: unique triangulation for given vertices
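The property behind these guarantees is the empty-circumcircle condition: in a Delaunay triangulation no vertex lies strictly inside the circumcircle of any triangle. The slide does not show this test, so the sketch below is supplementary; it uses the standard in-circle determinant and assumes the triangle vertices are given in counterclockwise order.

```python
def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc.

    Assumes a, b, c are in counterclockwise order. Uses the standard
    3x3 in-circle determinant with coordinates taken relative to d.
    """
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    return det > 0


triangle = ((0, 0), (1, 0), (0, 1))  # counterclockwise
print(in_circumcircle(*triangle, (0.25, 0.25)))  # point near the triangle
print(in_circumcircle(*triangle, (2, 2)))        # point far away
```

A triangulation is Delaunay when this test is false for every triangle and every other vertex; uniqueness additionally requires that no four vertices lie on a common circle.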
Coding of Connectivity Data Uniform Triangulation: Delaunay Triangulation: Differential coding: $x_n = x_{n-1} + dx_n$, $y_n = y_{n-1} + dy_n$
Coding Order of Delaunay Triangulation 1) Boundary vertices: Start from the top-left-most vertex, proceed counterclockwise 2) Interior vertices: Choose the nearest uncoded vertex next
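The differential coding of vertex coordinates can be sketched as a round trip: the encoder sends the first vertex plus the differences dx_n, dy_n, and the decoder accumulates them. The function names and sample coordinates are illustrative assumptions, and the entropy coding of the differences is left out.

```python
def encode_deltas(points):
    """Return the first point and the list of (dx, dy) differences."""
    first = points[0]
    deltas = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    return first, deltas


def decode_deltas(first, deltas):
    """Rebuild the points: x_n = x_{n-1} + dx_n, y_n = y_{n-1} + dy_n."""
    points = [first]
    for dx, dy in deltas:
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


points = [(10, 10), (14, 9), (20, 15)]
first, deltas = encode_deltas(points)
print(first, deltas)
print(decode_deltas(first, deltas) == points)
```

Neighbouring vertices tend to be close together, so the differences are small numbers that compress better than absolute coordinates.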
Coding of Mesh Motion Motion: temporal difference of vertices' positions Mesh Traversal: 1) Start from top-left, breadth-first 2) Right (next counterclockwise) 3) Left This order remains unchanged until the next I-MOP is decoded Mesh Motion Coding: Each motion vector is encoded based on two previously encoded neighboring vertices, e.g.
[3D Mesh Coding] 2D Mesh Coding: Supports mapping natural images and video onto 2D meshes 3D Mesh Coding: Represent and compress 3D objects onto which images and videos may be mapped Compress static 3D models, not their animation
Functionalities High compression: 2%-4% of the size of the VRML ASCII file Incremental rendering: Build the model from only part of the bitstream Error resilience: Suffer less from network errors Hierarchical buildup: Scalable bitstream with different resolutions, depending on viewing distance
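Predictive motion coding from two neighbours can be sketched as below. The slide does not state the exact predictor, so this sketch assumes the prediction is the average of the two neighbouring motion vectors and that only the residual is coded; the function names and values are hypothetical.

```python
def predict(mv_a, mv_b):
    """Assumed predictor: average of two already-coded motion vectors."""
    return ((mv_a[0] + mv_b[0]) / 2, (mv_a[1] + mv_b[1]) / 2)


def encode_residual(mv, mv_a, mv_b):
    """Encoder side: residual = actual motion vector minus prediction."""
    px, py = predict(mv_a, mv_b)
    return (mv[0] - px, mv[1] - py)


def decode_motion(residual, mv_a, mv_b):
    """Decoder side: motion vector = prediction plus residual."""
    px, py = predict(mv_a, mv_b)
    return (residual[0] + px, residual[1] + py)


neighbour_a = (2, 0)   # motion vectors of two previously coded vertices
neighbour_b = (4, 2)
actual = (3, 2)        # motion vector of the vertex being coded

residual = encode_residual(actual, neighbour_a, neighbour_b)
print(residual)
print(decode_motion(residual, neighbour_a, neighbour_b))
```

Because the decoder visits vertices in the same fixed traversal order, it always has the two neighbours available before it needs them.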
Incremental Rendering
Data of 3D Mesh Object Connectivity: how vertices are connected Geometry: 3D coordinates of vertices Photometry Colors Normals Texture
Bitstream of 3D Mesh Coding Connectivity Data Vertex graph Triangle tree Triangle Data Contains: geometry coordinates, colors, normals, texture coordinates Largest part of the bitstream
Bitstream of 3D Mesh Coding Connectivity Data is packed separately and before the Triangle Data. Benefits: Incremental rendering: Could decode Triangle Data incrementally since full Connectivity(topology) Data is already available Shorten the latency Error resilience: Can form 3D structure even with some missing Triangle Data
Decoding Scheme of 3D Mesh
Vertex Graph
Triangle Tree
Coding of Geometry and Photometry Data 1) Quantization 2) Differential Coding: No prediction, Parallelogram prediction, or Tree prediction 3) Adaptive Arithmetic Entropy Coding: Code the differential values
3D Mesh Coding Modes Error-Resilience Mode: To minimize the impact of errors, divide the bitstream into partitions or packets Render partitions independently Progressive Transmission Mode: Scalable coding One base layer One or more enhancement layers Provide Forest Split operations Contains face forest, triangle tree, triangle data
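The first two steps can be sketched for the parallelogram case: coordinates are uniformly quantized to integers, and a new vertex is predicted from an adjacent, already decoded triangle as a + b - c, where (a, b) is the shared edge and c the opposite vertex. The function names, bit depth, and coordinates are illustrative assumptions, and the arithmetic coder of step 3 is omitted.

```python
def quantize(value, lo, hi, bits):
    """Uniformly map value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)


def parallelogram_predict(a, b, c):
    """Predict the vertex across edge (a, b) from triangle (a, b, c)."""
    return tuple(ai + bi - ci for ai, bi, ci in zip(a, b, c))


def residual(actual, predicted):
    """Differential value that would be passed to the entropy coder."""
    return tuple(x - p for x, p in zip(actual, predicted))


# Step 1: quantization of a coordinate in the bounding range [0.0, 1.0].
print(quantize(0.25, 0.0, 1.0, bits=8))

# Step 2: prediction on already quantized integer vertices.
a, b, c = (10, 0, 0), (0, 10, 0), (0, 0, 0)
actual = (11, 9, 1)
predicted = parallelogram_predict(a, b, c)
print(predicted, residual(actual, predicted))
```

On smooth surfaces the prediction lands near the real vertex, so the residuals cluster around zero and entropy-code compactly.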
Forest Split Operation (a) Cut through the edges of vertex tree (b) Open the dotted line (c) Triangulate the opening to form a triangle tree (d) Refined mesh
References Books: Major Reference: Fernando Pereira, Touradj Ebrahimi, The MPEG-4 Book, Prentice Hall PTR, 2002 Natural Video Coding Technology: Joan L. Mitchell et al., MPEG Video Compression Standard, Chapman & Hall, 1996 MPEG Official Websites: Overview: Resources: Demos: MPEG-4 Series Slides, Course Presentation of C640/2003 Winter, U. of Alberta:
The End Acknowledgements Yongjie Liu Michael Closson Questions and Comments?
DecoderConfigDescriptor
class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
    bit(8) objectTypeIndication;
    bit(6) streamType;
    bit(1) upStream;
    const bit(1) reserved=1;
    bit(24) bufferSizeDB;
    bit(32) maxBitrate;
    bit(32) avgBitrate;
    DecoderSpecificInfo decSpecificInfo[0..1];
    profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr[0..255];
}
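The fixed-size fields of this descriptor add up to 13 bytes, which the following sketch reads. It assumes the tag (and any size field) has already been consumed, ignores the optional trailing descriptors, and uses arbitrary example values; the function name is hypothetical.

```python
import struct


def parse_decoder_config(data: bytes) -> dict:
    """Parse the fixed 13-byte part of a DecoderConfigDescriptor body.

    Assumes `data` starts at objectTypeIndication. DecoderSpecificInfo and
    the profile/level index descriptors that may follow are not parsed.
    """
    if len(data) < 13:
        raise ValueError("need at least 13 bytes")
    packed = data[1]  # streamType (6 bits), upStream (1), reserved (1)
    max_bitrate, avg_bitrate = struct.unpack(">II", data[5:13])
    return {
        "objectTypeIndication": data[0],
        "streamType": packed >> 2,
        "upStream": (packed >> 1) & 0x1,
        "reserved": packed & 0x1,
        "bufferSizeDB": int.from_bytes(data[2:5], "big"),
        "maxBitrate": max_bitrate,
        "avgBitrate": avg_bitrate,
    }


# Arbitrary illustrative values, packed in the field order of the SDL class.
example_config = (
    bytes([0x20, (4 << 2) | (0 << 1) | 1])
    + (4096).to_bytes(3, "big")
    + struct.pack(">II", 128000, 64000)
)
print(parse_decoder_config(example_config))
```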