1 Single Reference Frame Multiple Current Macroblocks Scheme for Multiple Reference IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Tung-Chien.

Presentation transcript:

1 Single Reference Frame Multiple Current Macroblocks Scheme for Multiple Reference IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Tung-Chien Chen, Chuan-Yung Tsai, Yu-Wen Huang, and Liang-Gee Chen, Fellow, IEEE

2 Outline: Introduction; Fundamentals and Problem Statement; Proposed Data Reuse Scheme; Proposed Framework for SRMC Scheme; Simulation Results and Performance Evaluation

3 Introduction H.264/AVC can save 25%-45% and 50%-70% of the bitrate compared with MPEG-4 Advanced Simple Profile and MPEG-2, respectively, but at the cost of higher computation and memory bandwidth. Inter prediction occupies over 95% of the computational resources, mainly because of multiple reference frame motion estimation (MRF-ME).

4 Introduction We propose a new frame-level data reuse (DR) scheme. With frame-level rescheduling, the data of one loaded search window (SW) can be reused by multiple current MBs in different original frames for MRF-ME, so the system bandwidth and local memory size are greatly reduced.

5 Fundamentals and Problem Statement A. Inter Prediction in H.264/AVC For variable block size ME (VBS-ME), there are 41 different blocks within one MB, which gives rise to a large number of possible combinations. Lagrangian mode decision: the Lagrangian cost function considers both distortion and rate. Distortion: sum of absolute differences (SAD). Rate: the number of bits required to code the reference frame number and the motion vectors (MVs).
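The count of 41 blocks and the cost function can be sketched as follows. This is a minimal illustration, not the reference software: the function names are hypothetical, the cost is written in the usual form J = D + lambda * R, and the value passed as `lam` is an arbitrary placeholder.

```python
# Partition sizes (width, height) available to variable block size ME.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_per_macroblock(mb_size=16):
    """Count how many blocks of every partition size tile one macroblock."""
    return sum((mb_size // w) * (mb_size // h) for w, h in PARTITIONS)


def sad(current, reference):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    )


def lagrangian_cost(distortion, rate_bits, lam):
    """J = D + lambda * R. `lam` is a placeholder value chosen by the caller."""
    return distortion + lam * rate_bits


# 1 + 2 + 2 + 4 + 8 + 8 + 16 blocks per macroblock
print(blocks_per_macroblock())
```

The rate term here would cover the bits for the reference frame number and the MVs, as stated on the slide.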

6 Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement The bandwidth demand between system memory and the ME core is very high if all required pixels are loaded from system memory. A common solution is to add local buffers that store reusable data.

7 Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement When the ME of MB-a is finished and MB-b is about to be processed, only the reference pixels in region D are loaded, replacing region A in the local memory.
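A rough sketch of why this helps, assuming a square SW whose side is the MB size plus twice the search range (a simplifying convention, not taken from the slides) and assuming the newly loaded region D is a strip one MB wide:

```python
MB = 16  # macroblock width and height in pixels


def search_window_side(search_range, mb=MB):
    """SW side length in pixels, assuming MB size + 2 * search range."""
    return mb + 2 * search_range


def pixels_loaded_per_mb(search_range, reuse=True, mb=MB):
    """Reference pixels fetched from system memory for one current MB."""
    side = search_window_side(search_range, mb)
    if reuse:
        # Only the new strip (region D) is loaded; the overlap stays in the buffer.
        return mb * side
    # Without reuse the whole SW is reloaded for every MB.
    return side * side


print(pixels_loaded_per_mb(16, reuse=False), pixels_loaded_per_mb(16, reuse=True))
```

Under these assumptions a 16-pel search range needs one third of the reference traffic when the overlapping part of the SW is kept in local memory.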

8 Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement There are four SW memories, and each SW memory will be independently loaded and updated.

9 Fundamentals and Problem Statement B. Conventional Data Reuse Scheme and Problem Statement The hardware cost is almost proportional to the maximum reference frame number. A higher system bandwidth requirement means more power consumption, and a larger memory means more silicon area and cost.

10 Proposed Data Reuse Scheme A. Frame-Level Data Reuse

11 Proposed Data Reuse Scheme A. Frame-Level Data Reuse In the MRSC (multiple reference frames single current macroblock) scheme, one current MB is loaded only once, and one reference SW is loaded several times. In the SRMC (single reference frame multiple current macroblocks) scheme, one current MB is loaded several times, while one reference SW is loaded only once. Since the SW is much larger than one MB, both the bandwidth and the memory size can be greatly reduced.
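The trade-off can be illustrated with a back-of-the-envelope model. Everything below is an assumption made for illustration: traffic is counted in pixels for the same number of ME tasks in both schemes, each SW update costs one strip of `sw_strip_pixels`, frame boundaries and the traffic for intermediate results are ignored, and only SW buffers are counted as local memory.

```python
MB_PIXELS = 16 * 16  # pixels in one current macroblock


def mrsc_traffic(num_refs, sw_strip_pixels):
    """One current MB loaded once, one SW strip loaded per reference frame."""
    return MB_PIXELS + num_refs * sw_strip_pixels


def srmc_traffic(num_refs, sw_strip_pixels):
    """One SW strip loaded once, one current MB loaded per dependent frame."""
    return num_refs * MB_PIXELS + sw_strip_pixels


def sw_buffer_saving(num_refs):
    """Fraction of SW buffer memory saved when one SW replaces num_refs SWs."""
    return 1 - 1 / num_refs


refs, strip = 4, 16 * 48  # four reference frames, 16x48 strip (assumed)
print(mrsc_traffic(refs, strip), srmc_traffic(refs, strip), sw_buffer_saving(refs))
```

With these example numbers the model gives 3328 versus 1792 pixels per group of four ME tasks. These are properties of the simplified model only and are not the measured results of the presentation.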

12 Proposed Data Reuse Scheme B. Frame-Level Rescheduling It is assumed that there are six MBs in each frame and five P-frames to be coded. The maximum reference frame number is four.

13 Proposed Data Reuse Scheme B. Frame-Level Rescheduling The first, second, third, and fourth ME cubes in one column represent the step-1, step-2, step-3, and step-4 searching processes in Fig. 3.

14 Proposed Data Reuse Scheme B. Frame-Level Rescheduling The first, second, third, and fourth ME cubes in one vertical column represent the step-1, step-2, step-3, and step-4 searching processes in Fig. 4.
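One way to picture the rescheduled order for the example of six MBs per frame, five P-frames, and at most four reference frames is the sketch below. The frame numbering is an assumption (frame 0 is taken to be an I-frame, frames 1 to 5 are the P-frames), and the function name is hypothetical; the loop nesting is one plausible reading of the scheme, not a transcription of the figures.

```python
def srmc_schedule(num_p_frames=5, mbs_per_frame=6, max_refs=4):
    """Return ME tasks as (reference_frame, mb_index, current_frame) in SRMC order.

    Frame 0 is assumed to be an I-frame; frames 1..num_p_frames are P-frames.
    """
    tasks = []
    for ref in range(num_p_frames):            # frames 0..num_p_frames-1 act as references
        last_cur = min(ref + max_refs, num_p_frames)
        for mb in range(mbs_per_frame):        # the SW slides over the reference frame
            for cur in range(ref + 1, last_cur + 1):
                # Every current MB at this location reuses the same loaded SW.
                tasks.append((ref, mb, cur))
    return tasks


tasks = srmc_schedule()
sw_loads = len({(ref, mb) for ref, mb, _ in tasks})
print(len(tasks), sw_loads)
```

In this sketch each SW position is loaded once and shared by up to four current MBs, so the number of SW loads is smaller than the number of ME tasks, whereas loading one SW per task would need as many loads as tasks.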

15 Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme Inaccurate mode decision becomes a problem once block-level data reuse with parallel hardware and frame-level rescheduling with the proposed SRMC scheme are applied.

16 Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme In the reference software (JM8.5), the Lagrangian cost function takes MV costs into consideration. The MV of each block is generally predicted by the median of the MVs of the left, top, and top-right neighboring blocks. To facilitate parallel processing with block-level data reuse, the exact MV predictors (MVPs) of the variable blocks are replaced by the median of the MVs of the left, top, and top-right MBs.
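The median prediction itself is a component-wise median of three vectors. A minimal sketch follows; the function name is hypothetical, and the special cases of the standard (unavailable neighbors, directional prediction for 16x8 and 8x16 partitions) are left out.

```python
def median_mvp(left, top, top_right):
    """Component-wise median of three (x, y) motion vectors."""
    neighbors = (left, top, top_right)
    xs = sorted(v[0] for v in neighbors)
    ys = sorted(v[1] for v in neighbors)
    return (xs[1], ys[1])  # middle element of each sorted component


print(median_mvp((1, 5), (3, -2), (2, 0)))
```

Using MB-level neighbors instead of block-level neighbors means the same predictor serves all variable blocks of the current MB, which is what lets them be evaluated in parallel.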

17 Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme

18 Proposed Framework for SRMC Scheme A. Mode Decision for SRMC Scheme The mode decision flow is divided into partial mode decision (PMD) and final mode decision (FMD), as shown in Table II. The MVs and distortion costs of these suboptimal results are written to external memory. After the PMD results of all reference frames have been generated for a given current MB, the FMD, running on the system RISC, decides the best combination of variable blocks in different reference frames.
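The split between PMD and FMD can be sketched as a two-stage minimum search. The data layout below is hypothetical: `pmd_results[ref][mode]` holds one cost per partition of that mode, as produced by the PMD for reference frame `ref`, and the costs are assumed to already include any rate term. Constraints of the standard, such as sub-partitions of one 8x8 block sharing a reference frame, are ignored in this simplified version.

```python
def final_mode_decision(pmd_results):
    """Pick the best mode and per-partition reference frames from PMD results.

    Returns (best_mode, total_cost, refs), where refs[i] is the reference
    frame chosen for partition i of the best mode.
    """
    first = next(iter(pmd_results.values()))
    best = None
    for mode, costs in first.items():
        refs, total = [], 0
        for part in range(len(costs)):
            # Choose the reference frame with the lowest cost for this partition.
            ref = min(pmd_results, key=lambda r: pmd_results[r][mode][part])
            refs.append(ref)
            total += pmd_results[ref][mode][part]
        if best is None or total < best[1]:
            best = (mode, total, refs)
    return best


pmd = {
    0: {"16x16": [100], "16x8": [40, 70]},
    1: {"16x16": [90], "16x8": [60, 35]},
}
print(final_mode_decision(pmd))
```

In the example the two 16x8 partitions pick different reference frames, which is the kind of combination that can only be decided after the PMD results of all reference frames are available.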

19 Proposed Framework for SRMC Scheme B. Architecture Design The ME core computes the candidates' distortion values, and the PMD engine decides the MVs of the variable blocks on-line according to the estimated MVPs. The PMD results are buffered in system memory, and then the RISC performs FMD.

20 Proposed Framework for SRMC Scheme B. Architecture Design

21 Proposed Framework for SRMC Scheme B. Architecture Design The SW of frame t-4 is loaded into the SW buffer first. Then the ME task of the current MB in frame t-3 is performed, and once its PMD results are written out, the RISC performs the FMD of this current MB. At the same time, the current MBs at the same location in the following frames t-2, t-1, and t are processed one after another.

22 Simulation Results and Performance Evaluation A. Simulation Results Four sequences: Foreman, Mobile, Akiyo, and Stefan. The encoding parameters are Baseline profile, IPPP … structure, four reference frames, 16-pel search range, and low complexity mode decision.

23 Simulation Results and Performance Evaluation A. Simulation Results

24 Simulation Results and Performance Evaluation B. Performance Evaluation MRSC: [equation not recovered] SRMC: [equation not recovered] The [symbol not recovered] term includes the MVs and the matching costs of the variable blocks, so it is relatively small.

25 Simulation Results and Performance Evaluation B. Performance Evaluation The [symbol not recovered] term increases with a larger search range. Therefore, the proposed SRMC scheme performs better for videos with larger frame sizes, which inherently require a larger search range.

26 Conclusion With frame-level rescheduling, the procedures for multiple current MBs in different original frames can simultaneously use the data of a single SW. In the proposed framework, the SRMC scheme reduces external system bandwidth by 63% and internal memory size by 75% for HDTV specifications.