Modular Refinement of H.264, Kermin Fleming

Presentation transcript:

1 Modular Refinement of H.264 Kermin Fleming

2 What is H.264?
Mobile Devices
Low bit-rate Video Decoder
  –Follow-on to MPEG-2 and H.26x
Operates on pixel blocks
  –Smaller blocks: 4x4, 8x4, 4x8
In-loop deblocking filter
Base profile
Bluespec implementation
  –Works on FPGA!

3 H.264 Overview

4 H.264 Modules
NAL unwrap
  –Unwraps network packets
  –Byte stream separated by special tags
Entropy Decoder
  –Decodes various slices, parameters
  –Primarily Golomb encoded
  –Residual data uses CAVLC
Inverse Transform
  –Reconstructs whole blocks
  –Quantized frequency coefficients
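The first two modules can be sketched in a few lines of Python. This is only an illustration, not the Bluespec design from the talk: it assumes the "special tags" are the 0x000001 start codes of the H.264 byte stream format, that "Golomb" refers to the unsigned Exp-Golomb code ue(v), and it ignores emulation-prevention bytes. The function names `split_nal_units` and `read_ue` are made up for this example.

```python
def split_nal_units(stream: bytes) -> list[bytes]:
    """Split a byte stream on 0x000001 start codes (sketch, no emulation-prevention handling)."""
    start_code = b"\x00\x00\x01"
    units = []
    i = stream.find(start_code)
    while i != -1:
        begin = i + len(start_code)
        j = stream.find(start_code, begin)
        end = len(stream) if j == -1 else j
        # A 4-byte start code (0x00000001) leaves a zero byte on the previous
        # unit; strip it, assuming a NAL unit payload never ends in 0x00.
        units.append(stream[begin:end].rstrip(b"\x00"))
        i = j
    return units


def read_ue(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned Exp-Golomb value from a string of '0'/'1' characters.

    Returns (value, position of the next unread bit).
    """
    zeros = 0
    while bits[pos + zeros] == "0":      # count the leading zero bits
        zeros += 1
    end = pos + 2 * zeros + 1            # prefix zeros + marker bit + suffix bits
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end
```

For example, `read_ue("00100")` consumes all five bits and yields the value 3, and a stream holding two start codes splits into two units.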

5 H.264 Modules
Intra-prediction
  –Prediction based on previous blocks
  –Corrected by residual
Inter-prediction
  –Correlation between frames
  –Motion vectors
Deblocking filter
  –Removes prediction artifacts
Frame Buffer
  –Maintains cache of previous frames
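"Prediction corrected by residual" can be illustrated with the DC mode of 4x4 intra prediction, one of several intra modes in H.264. The sketch below assumes 8-bit samples and that both the above and left neighbours are available; the function names are hypothetical and the code is not taken from the presented design.

```python
def predict_dc_4x4(above: list[int], left: list[int]) -> int:
    """DC prediction: rounded mean of the 4 samples above and the 4 to the left."""
    return (sum(above) + sum(left) + 4) >> 3


def reconstruct_4x4(above, left, residual):
    """Add the decoded residual to the prediction and clip to the 8-bit range."""
    dc = predict_dc_4x4(above, left)
    return [[max(0, min(255, dc + r)) for r in row] for row in residual]
```

The same "predict, then add residual, then clip" shape applies to inter-prediction, where the prediction comes from a motion-compensated reference frame instead of neighbouring pixels.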

6 Modular Refinement
Latency insensitive design
  –Data centric
  –Swap functionally equivalent modules
  –Design exploration easy
Bluespec generates control
  –Design timing change?
  –No problem.

7 Deblocking Filter Details
Block prediction leaves artifacts
Apply a smoothing filter across macroblock boundaries
Highly configurable
Macroblock Filter Order
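To make the smoothing step concrete, here is a sketch of the normal (non-strong) H.264 edge filter for the two samples nearest an edge. In the standard the thresholds `alpha`, `beta` and the clipping bound `tc` are derived from the quantization parameter and boundary strength via tables; here they are simply passed in, and only `p0` and `q0` are updated. The function names are illustrative.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_edge(p1, p0, q0, q1, alpha, beta, tc):
    """Filter one line of samples across an edge: p1 p0 | q0 q1.

    Returns the (possibly unchanged) pair (p0, q0).
    """
    # Large steps are treated as real image edges and left untouched.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)
```

The threshold test is what makes the filter "highly configurable": the same datapath is reused with different parameters for each edge.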

8 Original Implementation
Store the whole macroblock
Iteratively filter the macroblock
Store and stream left macroblock
Simple to reason about
  –very like software
BAD!!!!
  –Highly sequential
  –Large storage requirements
  –Wiring:

9 Pipelining
Sequential execution was a problem
Unclear how to pipeline design
  –Data stored in row major
  –Can be rotated to column major
16-stage pipeline
  –Horizontal Filter
  –Row-to-Column
  –Vertical Filter
  –Column-to-Row
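The four pipeline stages can be mimicked in software: rotating a 4x4 block from row major to column major lets a single row-oriented filter serve both directions. The sketch below is an assumption about the data flow, not the Bluespec code, and `row_filter` stands in for whatever 1-D filter is applied.

```python
def transpose_4x4(block):
    """Row-to-Column (and, applied again, Column-to-Row) rotation."""
    return [list(col) for col in zip(*block)]


def filter_rows(block, row_filter):
    """Apply a 1-D filter to every row of the block."""
    return [row_filter(row) for row in block]


def filter_cols(block, row_filter):
    """Filter columns by rotating, reusing the row filter, and rotating back."""
    return transpose_4x4(filter_rows(transpose_4x4(block), row_filter))
```

Because `transpose_4x4` is its own inverse, the Column-to-Row stage is the same operation as Row-to-Column.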

10 Pipelining
Parallelism Improved
  –Two filtrations per cycle
Memory Reduced
  –5/8 of macroblock stored
  –Accesses simplified
Fewer Filters
  –Only need one…
Design now far more complex
  –2x code size

11 Pipeline Issues
Throughput improved, but not perfect
Structural Hazards
  –Loads and Stores to the Above memory
  –Third and Fourth Macroblocks conflict
    Both need to be rotated at the same time
  –Outputting Left Blocks
    Pipeline drain
  –Control data shared
    Pipeline control state

12 Relaxed Memory Ordering
Original Sequential Ordering too conservative
Above data is not immediately used
  –Allowing stores to bypass loads
  –Separate load and store request queues
Stalls eliminated
  –Design complexity stays the same
  –Artificial dependency removed
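A toy cycle-count model shows why splitting the request queue removes stalls. Everything here is an assumption made for illustration: one memory port serving one request per cycle, a `load_ready` predicate saying whether the load consumer can accept data in a given cycle, and loads and stores that touch different addresses (otherwise bypassing would not be safe).

```python
from collections import deque


def run_single_fifo(reqs, load_ready):
    """One in-order queue: a blocked load at the head stalls the stores behind it."""
    queue, cycle, done = deque(reqs), 0, []
    while queue:
        kind, _addr = queue[0]
        if kind == "store" or load_ready(cycle):
            done.append(queue.popleft())
        cycle += 1
    return cycle, done


def run_split_queues(reqs, load_ready):
    """Separate load and store queues: stores may bypass a blocked load."""
    loads = deque(r for r in reqs if r[0] == "load")
    stores = deque(r for r in reqs if r[0] == "store")
    cycle, done = 0, []
    while loads or stores:
        if loads and load_ready(cycle):
            done.append(loads.popleft())
        elif stores:
            done.append(stores.popleft())
        cycle += 1
    return cycle, done
```

With requests `load, store, store, load` and loads blocked for the first two cycles, the single queue needs six cycles while the split queues need four, since the stores fill the otherwise idle slots.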

13 Side Buffering
Frequent conflicts between 4x4 blocks
Store one of them in a side buffer
When the resource is available, release the stored data
  –Sometimes ordering matters, sometimes not
  –Memory acts as a reorder buffer
Encode priority in rule
Deadlock can be a problem…
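The idea can be sketched as a scheduler for one shared resource. The model below is hypothetical: `arrivals` lists the blocks that request the resource in each cycle, parked blocks get priority over new arrivals (one possible priority encoding), and a full side buffer raises an error where real hardware would have to stall the producer, which is where deadlock can creep in.

```python
from collections import deque


def share_unit(arrivals, capacity=1):
    """Serve one block per cycle; park conflicting blocks in a bounded side buffer.

    Returns a list of (cycle, block) pairs in service order.
    """
    side, served = deque(), []
    for cycle, blocks in enumerate(arrivals):
        waiting = list(side) + list(blocks)   # older, parked blocks go first
        side.clear()
        if waiting:
            served.append((cycle, waiting[0]))
            for block in waiting[1:]:
                if len(side) == capacity:
                    raise RuntimeError("side buffer full: producer must stall")
                side.append(block)
    cycle = len(arrivals)
    while side:                               # drain whatever is still parked
        served.append((cycle, side.popleft()))
        cycle += 1
    return served
```

Two blocks arriving in the same cycle are served in consecutive cycles as long as the following cycle is free.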

14 Other Refinement
Pipelined Interpredict rules
  –Chroma interpolation
Improved Interpolator filter implementation
Improved memory subsystem
  –Previously too general
  –Needless crossbar
Interpolation Sampling
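For reference, chroma interpolation in H.264 is a bilinear blend of four neighbouring samples with eighth-sample weights. The sketch below shows the arithmetic only, with a hypothetical function name; it says nothing about how the presented pipeline schedules it.

```python
def chroma_sample(a: int, b: int, c: int, d: int, dx: int, dy: int) -> int:
    """Bilinear chroma interpolation.

    a, b are the top-left and top-right samples, c, d the bottom-left and
    bottom-right; dx, dy are fractional offsets in eighths (0..7).
    """
    return ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
            + (8 - dx) * dy * c + dx * dy * d + 32) >> 6
```

With `dx = dy = 0` the result is just sample `a`, and with `dx = 4, dy = 0` it is the rounded average of `a` and `b`.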

15 Results

16 Results
Nearly 60 fps at 1080p
Power, area, and throughput improvements
Fast Deblocking filter implementation
  –Faster than any known implementation
  –Does it really matter?
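To put the frame rate in perspective, a quick calculation gives the macroblock rate it implies. The 16x16 macroblock size is standard; the 100 MHz clock below is purely a hypothetical figure for illustration, since the slides do not state a clock frequency.

```python
def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame (rounding up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def cycles_per_macroblock(clock_hz: int, width: int, height: int, fps: int) -> int:
    """Whole clock cycles available per macroblock at a given frame rate."""
    return clock_hz // (macroblocks_per_frame(width, height) * fps)


mbs = macroblocks_per_frame(1920, 1080)   # 120 columns x 68 rows
mb_rate = mbs * 60                        # upper bound, since the talk says "nearly 60"
budget = cycles_per_macroblock(100_000_000, 1920, 1080, 60)  # hypothetical 100 MHz clock
```

At that assumed clock, each macroblock would have to pass through every stage, including the deblocking filter, in roughly two hundred cycles.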

17 Questions?