1 GLOpenCL: OpenCL Support on Hardware- and Software-Managed Cache Multicores
Konstantis Daloukas Nikolaos Bellas Christos D. Antonopoulos Department of Computer and Communications Engineering University of Thessaly Volos, Greece

2 Introduction
OpenCL: a unified programming standard and framework
- Targets: homogeneous and heterogeneous multicores, accelerator-based systems
- Aims at being platform-agnostic
- Enhances portability
20/11/2018 HiPEAC 2011

3 Motivation
- Vendor-specific OpenCL implementations target either hardware-controlled or software-controlled cache multicores, but not both:
  - AMD Stream SDK – x86
  - IBM OpenCL SDK – Cell B.E.
- There is no unified framework for architectures with diverse characteristics

4 Contribution
GLOpenCL: a unified OpenCL framework
- Compilation infrastructure
- Run-time support
Enables native execution of OpenCL applications on:
- Hardware-controlled cache multicores
- Software-controlled cache multicores
Achieves comparable performance to architecture-specific implementations

5 Outline
- Introduction
- The OpenCL Programming Model
- Compilation Infrastructure
- Run-Time Support
- Experimental Evaluation
- Conclusions

6 OpenCL Platform Model
[Figure: OpenCL platform model. From: http://www.viznet.ac.uk]

7 OpenCL Execution Model
An OpenCL application consists of two parts:
- A main program that executes on the host
- A number of kernels that execute on the compute devices
Main constructs of the OpenCL execution model:
- Kernels
- Memory buffers
- Command queues

8 OpenCL Kernel Execution “Geometry”

9 Outline
- Introduction
- The OpenCL Programming Model
- Compilation Infrastructure
- Run-Time Support
- Experimental Evaluation
- Conclusions

10 Granularity Management
[Figure label: work-group]

11 Serialization of Work Items
OpenCL code:
    __kernel void Vadd(…) {
        int index = get_global_id(0);
        c[index] = a[index] + b[index];
    }
C code:
    #define get_item_gid(N) \
        (__global_id[N] + (N == 0 ? __k : (N == 1 ? __j : __i)))

    __kernel void Vadd(…) {
        int index;
        for (__i = 0; __i < get_local_size(2); __i++)
            for (__j = 0; __j < get_local_size(1); __j++)
                for (__k = 0; __k < get_local_size(0); __k++) {
                    index = get_item_gid(0);
                    c[index] = a[index] + b[index];
                }
    }
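The serialization on this slide can be tried out in plain C. The following is a minimal sketch, not GLOpenCL's generated code: the `wg_ctx` struct and the functions `item_gid0`, `vadd_workgroup` and `vadd_ndrange` are hypothetical names introduced for illustration, and a 1-D index space is assumed. One call to `vadd_workgroup` executes every work-item of one work-group through the triple loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-group descriptor (an assumption, not GLOpenCL's API). */
typedef struct {
    size_t global_id[3];   /* global ID of the work-group's first work-item */
    size_t local_size[3];  /* number of work-items in each dimension */
} wg_ctx;

/* Mirrors get_item_gid(0): base global ID plus the innermost loop counter. */
static size_t item_gid0(const wg_ctx *wg, size_t k)
{
    return wg->global_id[0] + k;
}

/* Serialized Vadd: the triple loop iterates over all work-items of a group. */
void vadd_workgroup(const wg_ctx *wg, const float *a, const float *b, float *c)
{
    assert(wg != NULL);
    for (size_t i = 0; i < wg->local_size[2]; i++)
        for (size_t j = 0; j < wg->local_size[1]; j++)
            for (size_t k = 0; k < wg->local_size[0]; k++) {
                size_t index = item_gid0(wg, k);
                c[index] = a[index] + b[index];
            }
}

/* Runs a 1-D index space by calling the serialized kernel once per group. */
void vadd_ndrange(size_t global_size, size_t local_size,
                  const float *a, const float *b, float *c)
{
    assert(local_size > 0 && global_size % local_size == 0);
    for (size_t base = 0; base < global_size; base += local_size) {
        wg_ctx wg = { { base, 0, 0 }, { local_size, 1, 1 } };
        vadd_workgroup(&wg, a, b, c);
    }
}
```

Calling `vadd_ndrange(8, 4, a, b, c)` processes two work-groups of four work-items each, so the unit of scheduling is the work-group rather than the individual work-item.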

12 Elimination of Synchronization Operations
OpenCL code:
    Statements_block1
    barrier();
    Statements_block2
C code (after serialization):
    triple_nested_loop {
        Statements_block1
        barrier();
        Statements_block2
    }

13 Elimination of Synchronization Operations
C code (before):
    triple_nested_loop {
        Statements_block1
        barrier();
        Statements_block2
    }
C code (after):
    triple_nested_loop {
        Statements_block1
    }
    //barrier();
    triple_nested_loop {
        Statements_block2
    }
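Splitting the loop at the barrier can be illustrated with a small C example. This is a sketch under stated assumptions: the kernel (a per-work-group reversal through a scratch array), the function name `reverse_workgroup` and the fixed `WG_SIZE` are all hypothetical, and a single loop stands in for the triple loop. In OpenCL the kernel would write `tmp[lid]`, call `barrier()`, and then read `tmp[WG_SIZE - 1 - lid]`.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4  /* assumed work-group size for this sketch */

/* Executes one whole work-group; `base` is the group's first global ID. */
void reverse_workgroup(size_t base, const int *in, int *out)
{
    int tmp[WG_SIZE];  /* stands in for the kernel's __local scratch array */
    assert(in != NULL && out != NULL);

    /* Statements_block1, run for every work-item of the group. */
    for (size_t k = 0; k < WG_SIZE; k++)
        tmp[k] = in[base + k];

    /* barrier(): no call is needed, because the first loop has finished
       for all work-items before the second loop starts. */

    /* Statements_block2, run for every work-item of the group. */
    for (size_t k = 0; k < WG_SIZE; k++)
        out[base + k] = tmp[WG_SIZE - 1 - k];
}
```

Keeping both statement blocks in one loop would let work-item 0 read `tmp[WG_SIZE - 1]` before any iteration had written it, which is why the loop is split exactly where the barrier stood.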

14 Variable Privatization
C code (before):
    triple_nested_loop {
        Statements_block1
        x = …;
    }
    triple_nested_loop {
        Statements_block2
        … = … x …;
    }
C code (after):
    triple_nested_loop {
        Statements_block1
        x[i][j][k] = …;
    }
    triple_nested_loop {
        Statements_block2
        … = … x[i][j][k] …;
    }
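A concrete instance of privatization, again as a hedged sketch rather than the framework's actual output: the kernel, the name `shift_add_workgroup` and `WG_SIZE` are hypothetical, and one dimension replaces the three-dimensional `x[i][j][k]` of the slide. The scalar `x` is written before the barrier and read after it, so once the loop is split each work-item needs its own slot.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4  /* assumed work-group size for this sketch */

/* Original per-work-item logic (OpenCL-style pseudocode):
 *     x = 2 * in[gid];  tmp[lid] = in[gid];
 *     barrier();
 *     out[gid] = x + tmp[(lid + 1) % WG_SIZE];
 */
void shift_add_workgroup(size_t base, const int *in, int *out)
{
    int tmp[WG_SIZE];  /* stands in for __local memory shared by the group */
    int x[WG_SIZE];    /* privatized x: one element per work-item */
    assert(in != NULL && out != NULL);

    /* Statements_block1: each work-item stores its own value of x. */
    for (size_t k = 0; k < WG_SIZE; k++) {
        x[k] = 2 * in[base + k];
        tmp[k] = in[base + k];
    }

    /* Statements_block2: each work-item reads back its own x[k]. */
    for (size_t k = 0; k < WG_SIZE; k++)
        out[base + k] = x[k] + tmp[(k + 1) % WG_SIZE];
}
```

With a plain scalar `x`, the second loop would see only the value left by the last iteration of the first loop; expanding `x` into an array indexed by the loop counter preserves the per-work-item value.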

15 Outline
- Introduction
- The OpenCL Programming Model
- Compilation Infrastructure
- Run-Time Support
- Experimental Evaluation
- Conclusions

16 Run-Time System Library
Provides run-time support for:
- Work management
- Manipulation of memory buffers
The run-time system provides a unified design for both architectures

17 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Command Queue; commands C1, C2]

18 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Command Queue; commands C1, C2]

19 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, two Worker Threads, Command Queue, Ready Command Queue; commands C1, C2, C3]

20 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; commands C1, C2, C3; tasks T1, T2, TN-1, TN]

21 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; commands C2, C3; tasks T1, T2, TN-1, TN]

22 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; commands C2, C3]

23 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; commands C2, C3; tasks T1, T2, TN-1, TN]

24 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; command C3; tasks T1, T2, TN]

25 Architecture for Hardware-Controlled Cache Multicores
[Diagram labels: Helper Thread, Async. Copy Queue, Main Thread, Worker Thread 1, Worker Thread N, Command Queue, Ready Command Queue, Work Queue 1, Work Queue N]
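The diagrams in this sequence show commands (C1..C3) in the command queues and tasks (T1..TN) in per-worker work queues. The sketch below illustrates one way a kernel command could be split into tasks; it rests on two assumptions the slides do not state, namely that each task corresponds to one work-group and that tasks are spread round-robin over the work queues. All type and function names (`task_t`, `work_queue_t`, `kernel_cmd_t`, `dispatch_command`) are hypothetical, and threading is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS   64  /* capacity of each work queue in this sketch */
#define NUM_WORKERS 2   /* assumed number of worker threads */

typedef struct { size_t group_id; } task_t;               /* one work-group */
typedef struct { task_t tasks[MAX_TASKS]; size_t count; } work_queue_t;
typedef struct { size_t global_size, local_size; } kernel_cmd_t;

/* Splits a ready kernel command into one task per work-group and appends
   the tasks to the work queues in round-robin order (assumed policy).
   Returns the number of tasks created. */
size_t dispatch_command(const kernel_cmd_t *cmd,
                        work_queue_t queues[NUM_WORKERS])
{
    assert(cmd != NULL && cmd->local_size > 0);
    assert(cmd->global_size % cmd->local_size == 0);

    size_t num_groups = cmd->global_size / cmd->local_size;
    for (size_t g = 0; g < num_groups; g++) {
        work_queue_t *q = &queues[g % NUM_WORKERS];
        assert(q->count < MAX_TASKS);      /* queue must have room */
        q->tasks[q->count++].group_id = g;
    }
    return num_groups;
}
```

For a command with 16 work-items and a work-group size of 4, this creates four tasks: groups 0 and 2 land in the first queue, groups 1 and 3 in the second. Each worker would then drain its own queue by running the serialized kernel once per task.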

26 Architecture for Software-Controlled Cache Multicores
[Diagram labels: Host Side, Accelerator Side; Main Thread, Helper Thread, Worker Thread 1, Worker Thread N; Command Queue, Ready Command Queue, Work Queue 1, Work Queue N; work-related requests/replies, SW cache requests/replies]

27 Outline
- Introduction
- The OpenCL Programming Model
- Compilation Infrastructure
- Run-Time Support
- Experimental Evaluation
- Conclusions

28 Experimental Evaluation
Evaluate the framework on two representative architectures:
- Homogeneous, hardware-controlled cache memory processor – Intel’s E5520 i7 (Nehalem) processor
- Heterogeneous, software-controlled cache memory processor – Cell B.E. processor
Compare with vendor-provided, platform-customized implementations:
- AMD Stream SDK for x86 architectures
- IBM OpenCL SDK for the Cell B.E.

29 Vector Add – x86

30 Vector Add – Cell
- Average Performance Overhead: 27.5%

31 AES Encryption – x86

32 AES Encryption – Cell

33 BlackScholes – x86

34 BlackScholes – Cell

35 Outline
- Introduction
- The OpenCL Programming Model
- Compilation Infrastructure
- Run-Time Support
- Experimental Evaluation
- Conclusions

36 Conclusions
- GLOpenCL achieves comparable or better performance than architecture-specific implementations
- The OpenCL standard leaves enough room for platform-specific optimizations
  - These may yield significant performance improvements
- Future work:
  - Reduce the effect of shared data structures
  - Memory access pattern analysis

37 Acknowledgements
This project is partially supported by the EC Marie Curie International Reintegration Grant (IRG)

