Data Compression Conference 2013: Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li
Outline: Introduction, Related Work, Proposed Method, Experimental Results, Conclusion
Introduction (1/2): HEVC coding tree unit (CTU)
Introduction (2/2): Local parallel method (LPM), whose maximum parallelism is at most 8. Independent PUs (IPUs). Directed acyclic graph (DAG).
Related Work (1/2): Local parallel method (LPM) [16]; motion estimation region (MER).
[16] Minhua Zhou, "AHG10: Configurable and CU-group level parallel merge/skip," JCTVC-H0082, Feb. 2012.
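The core MER rule can be illustrated with the position test HEVC uses for its parallel merge level (Log2ParMrgLevel): a spatial neighbor that falls inside the same MER as the current PU is treated as unavailable, so PUs within one MER do not wait on each other. This is a minimal sketch of that single condition only, based on our reading of the HEVC text; the function names and the `log2_mer` parameter are illustrative, and the other availability checks (picture boundary, slice/tile, decoding order) are deliberately omitted.

```python
def in_same_mer(x_pb: int, y_pb: int, x_nb: int, y_nb: int, log2_mer: int) -> bool:
    """True when the PU's top-left sample (x_pb, y_pb) and the neighbor
    sample (x_nb, y_nb) fall in the same MER-sized grid cell."""
    return (x_pb >> log2_mer) == (x_nb >> log2_mer) and (
        y_pb >> log2_mer
    ) == (y_nb >> log2_mer)


def spatial_candidate_usable(x_pb: int, y_pb: int, x_nb: int, y_nb: int, log2_mer: int) -> bool:
    """Models only the MER condition: a neighbor inside the same MER is
    unusable. Other availability checks are not modelled here."""
    return not in_same_mer(x_pb, y_pb, x_nb, y_nb, log2_mer)


if __name__ == "__main__":
    # 16x16 MER (log2_mer = 4): the left neighbor of a PU at (8, 8) is inside the MER.
    print(spatial_candidate_usable(8, 8, 7, 8, 4))   # False
    # A PU at (16, 8) has its left neighbor at (15, 8), in the previous MER.
    print(spatial_candidate_usable(16, 8, 15, 8, 4))  # True
```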
Related Work (2/2): Local parallel method (LPM), M = 16 or 8.
Proposed Method: A. Data Dependency Analysis; B. DAG for CTUs; C. Highly Parallel Framework
Proposed Method.A (1/3): Independent PUs (IPUs). A PU is independent when its left boundary does not overlap the MER's left boundary and its upper boundary does not overlap the MER's upper boundary.
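Read literally, the two IPU conditions reduce to a coordinate test. This is a minimal sketch under assumptions not stated on the slide: positions are in luma samples, MERs are aligned to a grid of size `mer_size`, and a PU's left (upper) boundary "overlaps" the MER's when the PU's x (y) coordinate is a multiple of `mer_size`. The names `is_ipu` and `ipus_in_mer` are hypothetical.

```python
def is_ipu(x: int, y: int, mer_size: int) -> bool:
    """Hypothetical IPU test for a PU whose top-left corner is (x, y).

    Assumes the PU's left boundary coincides with its MER's left boundary
    exactly when x is a multiple of mer_size (likewise y for the upper one).
    """
    left_overlaps = x % mer_size == 0
    upper_overlaps = y % mer_size == 0
    return not left_overlaps and not upper_overlaps


def ipus_in_mer(mer_x: int, mer_y: int, mer_size: int, pu_size: int) -> list:
    """List IPU positions when one MER is tiled by equal square PUs (an assumption)."""
    return [
        (x, y)
        for y in range(mer_y, mer_y + mer_size, pu_size)
        for x in range(mer_x, mer_x + mer_size, pu_size)
        if is_ipu(x, y, mer_size)
    ]


if __name__ == "__main__":
    print(ipus_in_mer(0, 0, 16, 8))       # [(8, 8)]
    print(len(ipus_in_mer(0, 0, 16, 4)))  # 9
```

Under this reading, a 16x16 MER tiled with 8x8 PUs has only one IPU (the bottom-right PU), while 4x4 PUs give 9 IPUs out of 16.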
Proposed Method.A (2/3)
Proposed Method.A (3/3): Neighboring CTUs that a CTU depends on: left, upper, upper-left and upper-right.
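To make the neighbor set concrete, the sketch below lists the in-frame CTUs that CTU (i, j) must wait for. It assumes i is the CTU row and j the CTU column; this is the reading under which the coordinates in Step 4 of Part C, namely (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1), are exactly the CTUs that have (i, j) as their upper, upper-right, left and upper-left neighbor. The function name is hypothetical.

```python
def ctu_dependencies(i: int, j: int, rows: int, cols: int) -> list:
    """Return the in-frame neighbors CTU (i, j) depends on.

    Assumed convention: i = CTU row, j = CTU column, (0, 0) = top-left CTU.
    Neighbors that fall outside the frame are simply dropped.
    """
    candidates = [
        (i, j - 1),      # left
        (i - 1, j),      # upper
        (i - 1, j - 1),  # upper-left
        (i - 1, j + 1),  # upper-right
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


if __name__ == "__main__":
    print(ctu_dependencies(0, 0, 4, 4))  # [] : the first CTU has no dependencies
    print(ctu_dependencies(1, 1, 4, 4))  # [(1, 0), (0, 1), (0, 0), (0, 2)]
```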
Proposed Method.B (1/4): Generate a DAG to capture the dependency relationships among CTUs.
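Proposed Method.B (2/4): The DAG consists of a set of vertices V (one per CTU) and a set of edges E; each data dependency between two CTUs is represented as an edge. Once a CTU has been processed, it is removed from the DAG together with its edges.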
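A minimal sketch of building V and E for a frame of `rows` x `cols` CTUs and removing a processed CTU. It assumes the left, upper, upper-left and upper-right dependencies from Part A, with i as the CTU row and j as the column; an edge (u, v) means u must finish before v. All function names are hypothetical, and a CTU is "ready" when no edge points into it.

```python
def ctu_dependencies(i, j, rows, cols):
    """In-frame left, upper, upper-left and upper-right neighbors of (i, j)."""
    cand = [(i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)]
    return [(r, c) for r, c in cand if 0 <= r < rows and 0 <= c < cols]


def build_ctu_dag(rows, cols):
    """Return (V, E): V lists all CTUs, E holds (predecessor, successor) pairs."""
    vertices = [(i, j) for i in range(rows) for j in range(cols)]
    edges = {(dep, v) for v in vertices for dep in ctu_dependencies(*v, rows, cols)}
    return vertices, edges


def remove_processed(ctu, vertices, edges):
    """Drop a processed CTU and every edge touching it."""
    remaining_v = [v for v in vertices if v != ctu]
    remaining_e = {(u, w) for (u, w) in edges if ctu not in (u, w)}
    return remaining_v, remaining_e


def ready_ctus(vertices, edges):
    """CTUs with no incoming edge can be processed now."""
    blocked = {w for _, w in edges}
    return {v for v in vertices if v not in blocked}


if __name__ == "__main__":
    V, E = build_ctu_dag(2, 2)
    print(len(V), len(E))     # 4 6
    V, E = remove_processed((0, 0), V, E)
    print(ready_ctus(V, E))   # {(0, 1)}
```

In the 2x2 case, after (0, 0) is removed only (0, 1) becomes ready, because (1, 0) still waits for its upper-right neighbor.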
Proposed Method.B (3/4): Condition matrix (CM)
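Since the CM records, for each CTU, the number of related CTUs it depends on, its initial value is the in-degree of that CTU in the DAG. A minimal sketch, assuming the four-neighbor dependency set from Part A, i as the row and j as the column; `condition_matrix` is a hypothetical name.

```python
def condition_matrix(rows: int, cols: int) -> list:
    """Initial CM: number of in-frame left/upper/upper-left/upper-right
    neighbors each CTU must wait for (its in-degree in the DAG)."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for r, c in ((i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)):
                if 0 <= r < rows and 0 <= c < cols:
                    cm[i][j] += 1
    return cm


if __name__ == "__main__":
    for row in condition_matrix(3, 4):
        print(row)
    # [0, 1, 1, 1]
    # [2, 4, 4, 3]
    # [2, 4, 4, 3]
```

Only the top-left CTU starts at zero, so it is the only CTU that is ready at the beginning of a frame.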
Proposed Method.B (4/4)
Proposed Method.C (1/5)
Proposed Method.C (2/5)
Step 1: Initialize DQ and CM. DQ is a waiting queue; CM records, for each CTU, the number of related CTUs it is still waiting for.
Step 2: When entries in CM become zero, take their coordinates and push them into DQ.
Proposed Method.C (3/5)
Step 3: Take the coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.
Step 4: Update CM. After the CTU at coordinate (i, j) is processed, the CM values at (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) are each decremented by one.
Step 5: Repeat Steps 2 to 4 until the whole frame has been processed.
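Steps 1 to 5 can be sketched as a batch-synchronous loop. This is a minimal illustration under assumptions: i is the CTU row and j the column, out-of-frame coordinates in Step 4 are skipped, `process_ctu` is a hypothetical callback standing in for per-CTU motion estimation, and a Python thread pool stands in for the many-core platform. Each batch is finished before CM is updated, which is a simplification; a real scheduler could dispatch a CTU as soon as its CM entry reaches zero.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def init_cm(rows, cols):
    """CM entry = number of in-frame left/upper/upper-left/upper-right neighbors."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for r, c in ((i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)):
                if 0 <= r < rows and 0 <= c < cols:
                    cm[i][j] += 1
    return cm


def encode_frame(rows, cols, process_ctu, workers=4):
    """Run one frame through the DQ/CM schedule; return the processed batches."""
    cm = init_cm(rows, cols)  # Step 1: initialize CM ...
    # ... and DQ with every CTU whose CM entry is already zero (Step 2).
    dq = deque((i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0)
    batches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while dq:  # Step 5: repeat until no CTU is left
            batch = list(dq)  # Step 3: take all ready coordinates from DQ
            dq.clear()
            # Process the batch in parallel; list() waits and re-raises errors.
            list(pool.map(lambda ij: process_ctu(*ij), batch))
            batches.append(batch)
            for i, j in batch:  # Step 4: decrement the dependents' CM entries
                for r, c in ((i + 1, j), (i + 1, j - 1), (i, j + 1), (i + 1, j + 1)):
                    if 0 <= r < rows and 0 <= c < cols:
                        cm[r][c] -= 1
                        if cm[r][c] == 0:  # Step 2 for the next round
                            dq.append((r, c))
    return batches


if __name__ == "__main__":
    for t, batch in enumerate(encode_frame(3, 4, lambda i, j: None)):
        print(t, batch)
```

For a 3x4 frame this yields 8 batches, and every CTU in batch t satisfies j + 2i = t: the familiar two-CTU-offset wavefront that these four dependencies imply.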
Proposed Method.C (4/5): Maximum parallelism of a CTU; maximum parallelism of the highly parallel framework; average parallelism of the highly parallel framework.
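The slide's formulas did not survive in this transcript, so the sketch below does not reproduce them. As an illustration only, it measures the CTU-level part of the schedule: the earliest batch of each CTU is one more than the latest batch among its left, upper, upper-left and upper-right neighbors, from which the maximum and average number of concurrently processed CTUs follow. It does not model the intra-CTU (IPU) parallelism the framework adds. The 1920x1080 frame and 64x64 CTU size are example assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def wavefront_levels(rows, cols):
    """Earliest batch index of each CTU under the four-neighbor dependencies."""
    level = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            preds = ((i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1))
            later = [level[r][c] + 1 for r, c in preds if 0 <= r < rows and 0 <= c < cols]
            level[i][j] = max(later, default=0)
    return level


def ctu_parallelism(width, height, ctu_size=64):
    """Return (max CTUs in one batch, average CTUs per batch, number of batches)."""
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    counts = Counter(v for row in wavefront_levels(rows, cols) for v in row)
    n_batches = len(counts)
    return max(counts.values()), rows * cols / n_batches, n_batches


if __name__ == "__main__":
    mx, avg, n = ctu_parallelism(1920, 1080)
    print(mx, round(avg, 2), n)  # 15 8.23 62
```

For 1080p with 64x64 CTUs (30 x 17 CTUs), CTU-level parallelism alone peaks at 15 and averages about 8.2 CTUs per batch over 62 batches.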
Proposed Method.C (5/5)
Experimental Results (1/5)
Experimental Results (2/5)
Experimental Results (3/5)
Experimental Results (4/5)
Experimental Results (5/5)
Conclusion (1/1): The highly parallel framework provides sufficient parallelism for many-core platforms. It uses the DAG-based order to parallelize CTUs.