Data Compression Conference 2013
Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li
Outline
Introduction
Related Work
Proposed Method
Experimental Results
Conclusion
Introduction (1/2)
HEVC coding tree unit (CTU)
Introduction (2/2)
Local parallel method (LPM): the maximum parallelism of LPM is at most 8.
Independent PUs (IPUs)
Directed acyclic graph (DAG)
Related Work (1/2)
Local parallel method (LPM) [16]
Motion estimation region (MER)
[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012.
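In LPM, a spatial merge candidate that lies in the same MER as the current PU is treated as unavailable, so the PUs inside one MER do not wait on each other. A minimal sketch of that availability test, assuming square MERs whose side is a power of two (for example 8 or 16) tiled from the picture origin; `same_mer` and `merge_candidate_available` are hypothetical names, not functions from the HEVC reference software.

```python
def same_mer(x_pu, y_pu, x_nb, y_nb, mer_size):
    """True if the PU at (x_pu, y_pu) and the neighbouring sample at
    (x_nb, y_nb) fall inside the same MER.

    Assumes square MERs of side mer_size (a power of two) tiled from (0, 0).
    """
    shift = mer_size.bit_length() - 1  # log2(mer_size) for a power of two
    return (x_pu >> shift) == (x_nb >> shift) and (y_pu >> shift) == (y_nb >> shift)


def merge_candidate_available(x_pu, y_pu, x_nb, y_nb, mer_size):
    # A candidate inside the current MER is dropped, so it cannot create a
    # dependency between PUs of the same MER.
    return not same_mer(x_pu, y_pu, x_nb, y_nb, mer_size)


# 16x16 MERs: PU at (20, 20) and its left sample (19, 20) share an MER.
print(merge_candidate_available(20, 20, 19, 20, 16))  # False
# PU at (16, 16): its left sample (15, 16) lies in the previous MER.
print(merge_candidate_available(16, 16, 15, 16, 16))  # True
```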
Related Work (2/2)
Local parallel method (LPM)
[Figure: M = 16 or 8]
Proposed Method
A. Data Dependency Analysis
B. DAG for CTUs
C. Highly Parallel Framework
Proposed Method.A (1/3)
Independent PUs (IPUs): an IPU satisfies both conditions below.
The IPU’s left boundary does not overlap the MER’s left boundary.
The IPU’s upper boundary does not overlap the MER’s upper boundary.
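A minimal sketch of the IPU test, assuming PU positions in luma samples and M×M MERs tiled from the picture origin, so that a PU's left (upper) boundary lies on its MER's left (upper) boundary exactly when its x (y) coordinate is a multiple of M. `is_ipu` is a hypothetical helper.

```python
def is_ipu(x, y, mer_size):
    """True if the PU whose top-left corner is (x, y) is an IPU.

    Assumes mer_size x mer_size MERs tiled from (0, 0), so a PU boundary
    coincides with an MER boundary when its coordinate is a multiple of mer_size.
    """
    left_on_mer_boundary = x % mer_size == 0
    upper_on_mer_boundary = y % mer_size == 0
    # Both conditions on the slide must hold.
    return not left_on_mer_boundary and not upper_on_mer_boundary


# Four 8x8 PUs inside one 16x16 MER: only the bottom-right PU is an IPU.
print([(x, y) for y in (0, 8) for x in (0, 8) if is_ipu(x, y, 16)])  # [(8, 8)]
```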
Proposed Method.A (2/3)
Proposed Method.A (3/3)
Neighboring CTUs: each CTU depends on its left, upper, upper-left and upper-right CTUs.
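A sketch of this CTU neighborhood, assuming (i, j) denotes CTU row i and column j (the convention under which the CM update in Step 4 matches these four neighbors). `neighbors` lists the CTUs a CTU waits for; `dependents` reverses the offsets and yields the CTUs that wait for it. Both names are hypothetical.

```python
# (i, j) = (CTU row, CTU column). Offsets (d_row, d_col) of the CTUs that
# CTU (i, j) depends on.
NEIGHBOR_OFFSETS = {
    "left": (0, -1),
    "upper": (-1, 0),
    "upper-left": (-1, -1),
    "upper-right": (-1, 1),
}


def _in_frame(i, j, rows, cols):
    return 0 <= i < rows and 0 <= j < cols


def neighbors(i, j, rows, cols):
    """CTUs that CTU (i, j) must wait for, clipped to the frame."""
    return [(i + di, j + dj) for di, dj in NEIGHBOR_OFFSETS.values()
            if _in_frame(i + di, j + dj, rows, cols)]


def dependents(i, j, rows, cols):
    """CTUs that wait for CTU (i, j): the neighbour offsets reversed."""
    return [(i - di, j - dj) for di, dj in NEIGHBOR_OFFSETS.values()
            if _in_frame(i - di, j - dj, rows, cols)]


print(neighbors(1, 1, 3, 3))           # [(1, 0), (0, 1), (0, 0), (0, 2)]
print(sorted(dependents(1, 1, 3, 3)))  # [(1, 2), (2, 0), (2, 1), (2, 2)]
```

For an interior CTU, `dependents` returns exactly the four positions (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) listed in Step 4 of the framework.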
Proposed Method.B (1/4)
Generate a DAG to capture the dependency relationships among CTUs.
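Proposed Method.B (2/4)
A DAG consists of a set of vertices V and a set of edges E.
Each CTU is a vertex; each data dependency between CTUs is an edge.
Once a CTU has been processed, it is removed from the DAG together with its outgoing edges.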
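A minimal sketch of the CTU DAG, assuming a frame of rows × cols CTUs, (i, j) as (row, column), and the four neighbor dependencies (left, upper, upper-left, upper-right). An edge (u, v) means CTU v depends on CTU u; removing a processed CTU deletes its outgoing edges and exposes the CTUs that no longer wait on anything. `build_ctu_dag` and `remove_processed` are hypothetical names.

```python
# (i, j) = (CTU row, CTU column); offsets of the CTUs that (i, j) depends on.
NEIGHBOR_OFFSETS = ((0, -1), (-1, 0), (-1, -1), (-1, 1))  # left, upper, upper-left, upper-right


def build_ctu_dag(rows, cols):
    """Return (V, E) for a rows x cols frame; edge (u, v) means CTU v depends on CTU u."""
    vertices = [(i, j) for i in range(rows) for j in range(cols)]
    edges = set()
    for i, j in vertices:
        for di, dj in NEIGHBOR_OFFSETS:
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                edges.add(((ni, nj), (i, j)))
    return vertices, edges


def remove_processed(vertex, vertices, edges):
    """Delete a processed CTU and its outgoing edges from the DAG (in place).

    Returns the CTUs that no longer have any incoming edge, i.e. are now ready.
    """
    assert all(v != vertex for _, v in edges), "CTU still depends on unprocessed CTUs"
    vertices.remove(vertex)
    outgoing = {(u, v) for u, v in edges if u == vertex}
    edges.difference_update(outgoing)
    still_waiting = {v for _, v in edges}
    return sorted(v for _, v in outgoing if v not in still_waiting)


V, E = build_ctu_dag(2, 3)
print(len(V), len(E))                  # 6 11
print(remove_processed((0, 0), V, E))  # [(0, 1)]
```

Explicit edge sets keep the example readable; scanning all edges on every removal is slow for large frames, which is why a per-CTU counter such as the condition matrix is the practical representation.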
Proposed Method.B (3/4)
Condition matrix (CM)
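Under the same four-neighbor assumption, each CM entry can be read as the in-degree of the corresponding vertex in the CTU DAG. A sketch of its initialization, with `init_condition_matrix` as a hypothetical name and (i, j) as (row, column):

```python
# (i, j) = (CTU row, CTU column); left, upper, upper-left, upper-right neighbours.
NEIGHBOR_OFFSETS = ((0, -1), (-1, 0), (-1, -1), (-1, 1))


def init_condition_matrix(rows, cols):
    """CM[i][j] = number of neighbouring CTUs that CTU (i, j) waits for."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di, dj in NEIGHBOR_OFFSETS:
                if 0 <= i + di < rows and 0 <= j + dj < cols:
                    cm[i][j] += 1
    return cm


for row in init_condition_matrix(3, 4):
    print(row)
# [0, 1, 1, 1]
# [2, 4, 4, 3]
# [2, 4, 4, 3]
```

Only CTU (0, 0) starts at zero, so it is the only CTU ready at the start of a frame.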
Proposed Method.B (4/4)
Proposed Method.C (1/5)
Proposed Method.C (2/5)
Step 1: Initialize DQ and CM. DQ is a waiting queue of CTUs that are ready to be processed. CM records, for each CTU, the number of CTUs it depends on that have not yet been processed.
Step 2: When an entry of CM becomes zero, push the corresponding coordinates into DQ.
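Proposed Method.C (3/5)
Step 3: Pop coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.
Step 4: Update CM. When the CTU at coordinate (i, j) has been processed, the CM entries at (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) are each decremented by one. Here i indexes CTU rows and j indexes CTU columns, so these are the lower, lower-left, right and lower-right CTUs, which have CTU (i, j) as their upper, upper-right, left and upper-left neighbor, respectively.
Step 5: Repeat Steps 2 to 4 until all CTUs of the frame have been processed.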
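Steps 1 to 5 can be sketched with a thread pool standing in for the many-core platform. This is an illustration under stated assumptions, not the authors' implementation: `encode_frame`, `process_ctu` and `workers` are hypothetical, (i, j) is (row, column), and Python threads only demonstrate the scheduling order (the GIL prevents real speedups for CPU-bound work). Only the main thread touches CM and DQ, so no locking is needed.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
import time

# (i, j) = (CTU row, CTU column).
NEIGHBOR_OFFSETS = ((0, -1), (-1, 0), (-1, -1), (-1, 1))  # left, upper, upper-left, upper-right
DEPENDENT_OFFSETS = ((1, 0), (1, -1), (0, 1), (1, 1))     # the CM entries updated in Step 4


def encode_frame(rows, cols, process_ctu, workers=4):
    """Process every CTU of a rows x cols frame in DAG order; return the finish order."""
    # Step 1: CM holds the number of unfinished neighbours of each CTU; DQ is the waiting queue.
    cm = [[sum(0 <= i + di < rows and 0 <= j + dj < cols for di, dj in NEIGHBOR_OFFSETS)
           for j in range(cols)] for i in range(rows)]
    # Step 2: CTUs whose CM entry is zero are ready.
    dq = deque((i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0)
    finished = []
    running = {}  # future -> CTU coordinate
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 5: loop until every CTU of the frame has been processed.
        while dq or running:
            # Step 3: hand ready CTUs from DQ to idle workers.
            while dq and len(running) < workers:
                i, j = dq.popleft()
                running[pool.submit(process_ctu, i, j)] = (i, j)
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                i, j = running.pop(future)
                future.result()  # re-raise any exception from the worker
                finished.append((i, j))
                # Step 4: each dependent CTU has one fewer unfinished neighbour.
                for di, dj in DEPENDENT_OFFSETS:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        cm[ni][nj] -= 1
                        if cm[ni][nj] == 0:
                            dq.append((ni, nj))  # Step 2 for newly ready CTUs
    return finished


def fake_motion_estimation(i, j):
    time.sleep(0.001)  # placeholder for the real per-CTU work


order = encode_frame(3, 4, fake_motion_estimation)
print(order[0], len(order))  # (0, 0) 12
```

Feeding workers from DQ as soon as any CTU finishes, rather than waiting for a whole batch, keeps cores busy when CTUs take unequal time.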
Proposed Method.C (4/5)
[Figure panels: Maximum parallelism of CTU; Maximum parallelism of highly parallel framework; Average parallelism of highly parallel framework]
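Parallelism figures of this kind can be estimated with a level-synchronous simulation in which every CTU in DQ is processed in one time step. This sketch models only the CTU-level DAG, not the IPU-level parallelism inside each CTU, so it corresponds to CTU parallelism rather than that of the full framework; `parallelism_profile` is a hypothetical name and (i, j) is (row, column).

```python
# (i, j) = (CTU row, CTU column).
NEIGHBOR_OFFSETS = ((0, -1), (-1, 0), (-1, -1), (-1, 1))  # CTUs that (i, j) waits for
DEPENDENT_OFFSETS = ((1, 0), (1, -1), (0, 1), (1, 1))     # CM entries updated in Step 4


def parallelism_profile(rows, cols):
    """Number of CTUs processed in each step when all of DQ runs concurrently."""
    cm = [[sum(0 <= i + di < rows and 0 <= j + dj < cols for di, dj in NEIGHBOR_OFFSETS)
           for j in range(cols)] for i in range(rows)]
    dq = [(i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0]
    profile = []
    while dq:
        profile.append(len(dq))  # every ready CTU runs in this step
        next_dq = []
        for i, j in dq:
            for di, dj in DEPENDENT_OFFSETS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    cm[ni][nj] -= 1
                    if cm[ni][nj] == 0:
                        next_dq.append((ni, nj))
        dq = next_dq
    return profile


profile = parallelism_profile(3, 4)
print(profile)                                    # [1, 1, 2, 2, 2, 2, 1, 1]
print(max(profile), sum(profile) / len(profile))  # 2 1.5
```

The maximum is the largest step and the average is the number of CTUs divided by the number of steps. For frames at least two CTUs wide, CTU (i, j) lands in step j + 2i, so under this model the CTU-level peak is min(rows, ⌈cols/2⌉).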
Proposed Method.C (5/5)
Experimental Results (1/5)
Experimental Results (2/5)
Experimental Results (3/5)
Experimental Results (4/5)
Experimental Results (5/5)
Conclusion (1/1)
The highly parallel framework provides sufficient parallelism for many-core platforms.
It uses the DAG-based order to parallelize CTUs.