
1 7th Workshop on Fusion Data Processing Validation and Analysis
Integration of GPU Technologies in EPICs for Real Time Data Preprocessing Applications
J. Nieto (1), D. Sanz (1), G. de Arcas (1), R. Castro (2), J.M. López (1), J. Vega (2)
(1) Universidad Politécnica de Madrid (UPM), Spain
(2) Asociación EURATOM/CIEMAT para Fusión, Spain

2 Index
• Scope of the project
  - Project goals
  - Sample algorithm
  - Test system
• Subtask 1: GPU benchmarking
• Subtask 2: EPICS integration (DPD)
• Results
• Conclusions

3 FPSC Project
• FPSC Project objective: to develop an FPSC prototype focused on data acquisition for ITER IO
• The “functional requirements” of the FPSC prototype:
  - To provide high-rate data acquisition, pre-processing, archiving and efficient data distribution among the different FPSC software modules
  - To interface with CODAC and to provide archiving
  - FPSC software compatible with RHEL and EPICS
  - To use COTS solutions

4 FPSC HW architecture
[Figure: FPSC hardware architecture; labels: development host, GPUs]

5 GPU subtasks
• Goals:
  - To provide benchmarking of Fermi GPUs (subtask 1)
      - Analyze the GPU development cycle (methodology)
      - Compare execution times on GPU and CPU for a similar development effort
  - To provide a methodology to integrate GPU processing units into EPICS (subtask 2)
• Requirements:
  - Use an algorithm representative of the type of operations that would be needed in plasma pre-processing

6 GPU Test System
[Figure: software stack on Linux RedHat Enterprise v5.5 64 bits; labels: EPICS IOC, DPD Subsystem, CPU Asyn, GPU Asyn, IPP v7.0, CULA R11, CUBLAS v3.2, CODAC CORE SYSTEM 2.0]
Host processor software:
  OS:             RedHat Enterprise Linux 5.5
  Middleware:     EPICS 3.14.12 and asynDriver 4.16
  Compilers:      gcc 4.1.2 20080704 and nvcc V0.2.1221
  CPU libraries:  MKL 10.3 Update 9 and IPP 7.0
  GPU libraries:  NVIDIA SDK 3.2, NVIDIA CUBLAS 3.2, EM Photonics CULA R11
Hardware: NVIDIA GTX580, Xeon X5550 QuadCore

7 Sample algorithm
• Best-fit code for detecting the position and amplitude of the components of a spectrum composed of a set of Gaussians, based on the Levenberg-Marquardt method
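The slide names the technique but shows no code. As a rough illustration only, the following is a minimal pure-Python sketch of a Levenberg-Marquardt fit of a sum of Gaussians. It is not the benchmarked code: the parameter layout (amplitude, position, width per Gaussian), the damping schedule, the synthetic spectrum and all function names are assumptions made for this example.

```python
import math

def model(params, x):
    """Sum of Gaussians; params = [amp0, pos0, width0, amp1, pos1, width1, ...]."""
    total = 0.0
    for k in range(0, len(params), 3):
        amp, pos, width = params[k:k + 3]
        z = (x - pos) / width
        total += amp * math.exp(-0.5 * z * z)
    return total

def jacobian_row(params, x):
    """Partial derivatives of the model at x with respect to every parameter."""
    row = []
    for k in range(0, len(params), 3):
        amp, pos, width = params[k:k + 3]
        z = (x - pos) / width
        e = math.exp(-0.5 * z * z)
        row += [e, amp * e * z / width, amp * e * z * z / width]
    return row

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def cost(params, xs, ys):
    """Sum of squared residuals between the data and the model."""
    return sum((y - model(params, x)) * (y - model(params, x)) for x, y in zip(xs, ys))

def levenberg_marquardt(xs, ys, params, iterations=100, lam=1e-3):
    """Damped Gauss-Newton iteration: accept a step only if it lowers the cost."""
    params = list(params)
    n = len(params)
    best = cost(params, xs, ys)
    for _ in range(iterations):
        jtj = [[0.0] * n for _ in range(n)]
        jtr = [0.0] * n
        for x, y in zip(xs, ys):
            row = jacobian_row(params, x)
            r = y - model(params, x)
            for i in range(n):
                jtr[i] += row[i] * r
                for j in range(n):
                    jtj[i][j] += row[i] * row[j]
        for i in range(n):
            jtj[i][i] *= 1.0 + lam          # Marquardt damping of the diagonal
        step = solve(jtj, jtr)
        trial = [p + d for p, d in zip(params, step)]
        trial_cost = cost(trial, xs, ys)
        if trial_cost < best:               # step accepted: trust the model more
            params, best = trial, trial_cost
            lam /= 10.0
        else:                               # step rejected: increase damping
            lam *= 10.0
    return params

# Synthetic, noise-free spectrum with two Gaussians (illustrative values only).
true_params = [10.0, 30.0, 5.0, 6.0, 70.0, 8.0]
xs = list(range(100))
ys = [model(true_params, x) for x in xs]
fit = levenberg_marquardt(xs, ys, [8.0, 28.0, 6.0, 5.0, 73.0, 7.0])
print([round(p, 3) for p in fit])
```

Most of the work per iteration is building and solving the normal equations, which is dense linear algebra of the kind that library primitives such as CUBLAS and CULA provide on the GPU.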

8 Subtask 1
• Goal: benchmarking of a Fermi GPU
• Standard GPU programming methodology:
  - The GPU is operated from the host as a coprocessor
  - Host threads sequence GPU operations:
      - They are responsible for moving data (Host↔Device)
      - Operations are coded by:
          - Programming kernels: CUDA
          - Using library primitives: CULA, CUBLAS…

9 Results S1 (I)
Block Size   Exec. Time (ms)      Throughput (MB/s)   Improv. Ratio
             GPU       CPU        GPU       CPU
256          2,86      2,75            0,7            1,0
512          2,85      4,83       1,4       0,8       1,7
1024         3,6       8,69       2,3       0,9       2,4
2048         5,21      16,07      3,1       1,0       3,1
4096         16,42     28,55      2,0       1,1       1,7
8192         42,85     55,26      1,5       1,2       1,3
16384        85,4      107,5      1,5       1,2       1,3
32768        168,65    210,99     1,6       1,2       1,3
65536        334,77    425,96     1,6       1,2       1,3
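The derived columns of this table can be recomputed from the execution times. The sketch below does so under two assumptions that the slide does not state: each sample occupies 8 bytes, and 1 MB is 10^6 bytes. Both are chosen only because they reproduce the tabulated throughput figures. The improvement ratio is taken as CPU time divided by GPU time.

```python
# (block size, GPU time in ms, CPU time in ms), transcribed from the table
# with decimal commas converted to decimal points.
RESULTS_S1 = [
    (256, 2.86, 2.75),
    (512, 2.85, 4.83),
    (1024, 3.6, 8.69),
    (2048, 5.21, 16.07),
    (4096, 16.42, 28.55),
    (8192, 42.85, 55.26),
    (16384, 85.4, 107.5),
    (32768, 168.65, 210.99),
    (65536, 334.77, 425.96),
]

BYTES_PER_SAMPLE = 8  # assumption: not stated on the slide

def throughput_mb_s(block_size, time_ms):
    """Bytes processed per second, expressed in MB/s (1 MB = 1e6 bytes)."""
    return block_size * BYTES_PER_SAMPLE / (time_ms * 1e-3) / 1e6

def improvement_ratio(gpu_ms, cpu_ms):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_ms / gpu_ms

for size, gpu_ms, cpu_ms in RESULTS_S1:
    print(size,
          round(throughput_mb_s(size, gpu_ms), 1),
          round(throughput_mb_s(size, cpu_ms), 1),
          round(improvement_ratio(gpu_ms, cpu_ms), 1))

# Block size with the largest GPU advantage in this data set.
best_block = max(RESULTS_S1, key=lambda row: improvement_ratio(row[1], row[2]))[0]
```

In this data the GPU advantage peaks at a block size of 2048 and settles at about 1,3 for blocks of 8192 samples and larger.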

10 Results S1 (II)

11 Subtask 2
• Goal: to provide EPICS support for GPU processing
[Figure: single process approach compared with the DPD approach. Single process approach labels: Data Generation, Acquisition & Processing, CPU, EPICS IOC, Asyn Layer. DPD approach labels: EPICS IOC, DPD, Asyn Layer, processing units (FPGA, GPU, others: archiving…)]

12 Proposed methodology
• The core of the FPSC software is the DPD, which allows for:
  - Moving data with very good performance
  - Integrating all the functional elements (EPICS monitoring, data processing, data acquisition, remote archiving, etc.)
  - Having code completely based on the standard asynDriver
  - Full compatibility with any type of required data
[Figure: FPSC software architecture; labels: EPICS IOC, State Machine, CODAC, Configuration, Hardware Monitoring, DPD (Data Processing and Distribution) Subsystem, Timing TCN/1588, FPGA, GPU Proc., CPU Proc., Hardware/Cubicle Signals, Archiving, Monitoring, SDN, Asyn Layer]

13 DPD features (I)
• The DPD makes it possible to configure both the different functional elements of the FPSC (FPGA acquisition, GPU processing, SDN, EPICS monitoring, data processing, data archiving) and the connections (links) between them.
• Functional elements allow:
  - reading data blocks from inputs
  - processing received data
  - generating new signals
  - routing data blocks to output links
• The DPD allows new types of functional elements to be integrated to extend the FPSC functionality. This requires creating the corresponding asynDrivers, which can be done in a simple way.
• It enables a very easy integration of any existing asynDriver.
[Figure: functional element in the EPICS IOC with input links and output links]

14 DPD features (II)
• The DPD makes it possible to configure the data routing at configuration time or even at run time (to implement fault-tolerant solutions)
• The DPD provides a common set of EPICS PVs for the functional elements and their respective links
• The DPD provides on-line measurements of both throughput and buffer occupancy in the links
• The DPD implements an optional multi-level buffering (memory, disk) backup solution for any link of the system
[Figure: multi-level backup of a link; labels: Level 0, Level 1, Level 2, Backup, Block, Link]
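To make the idea of functional elements joined by links more concrete, here is a toy Python model. It is not the DPD, which the slides describe as asynDriver-based code running inside an EPICS IOC. The class names, methods and the byte-reversing "processing" step are all hypothetical and only illustrate bounded links, occupancy measurement and run-time re-routing.

```python
from collections import deque

class Link:
    """Bounded FIFO between two functional elements (hypothetical model)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.blocks = deque()
        self.bytes_moved = 0      # running total, usable for throughput figures

    def push(self, block):
        if len(self.blocks) >= self.capacity:
            return False          # link full: the caller decides what to do
        self.blocks.append(block)
        self.bytes_moved += len(block)
        return True

    def pop(self):
        return self.blocks.popleft() if self.blocks else None

    def occupancy(self):
        """Fraction of the buffer currently in use."""
        return len(self.blocks) / self.capacity

class FunctionalElement:
    """Reads blocks from input links, processes them, routes them to outputs."""

    def __init__(self, name, process):
        self.name = name
        self.process = process
        self.inputs = []
        self.outputs = []

    def step(self):
        """Take at most one block from each input and send the result on."""
        for link in self.inputs:
            block = link.pop()
            if block is not None:
                result = self.process(block)
                for out in self.outputs:
                    out.push(result)

# Illustrative wiring: raw data -> "fit" element -> fitted data.
raw = Link("raw", capacity=4)
fitted = Link("fitted", capacity=4)
backup = Link("backup", capacity=4)
fit = FunctionalElement("fit", process=lambda block: bytes(reversed(block)))
fit.inputs.append(raw)
fit.outputs.append(fitted)

raw.push(b"\x01\x02\x03\x04")
raw.push(b"\x05\x06\x07\x08")
fit.step()                 # first block goes to the "fitted" link
fit.outputs = [backup]     # routing changed at run time
fit.step()                 # second block goes to the "backup" link
```

Keeping the routing in plain data (the `inputs` and `outputs` lists) is what lets it change while the system runs, which is the property the slide ties to fault-tolerant solutions.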

15 Test scenario
[Figure: path of a data block from Data Generation through the DPD (Data Processing and Distribution) Subsystem to the GPU Proc. module and the GPU, with the steps Host → Dev, Processing and Dev → Host and the timestamps T0 to T4 marked]
• T3 - T2: Processing Time (T_P)
• T4 - T1: Module Service Time (T_MS)
• T4 - T0: Total Service Time (T_TS)
• Internal Process Time (T_P0)

16 Timing (II)
[Figure: timeline of one data block between the TiCamera DataGenerator and the DataFit processing, with the intervals T_P, T_MS and T_TS marked]
• T0: New data block is generated (TiCamera DataGenerator)
• T1: Data block is received in the module
• T2: Data block is ready to be processed
• T3: DataFit processing is finished
• T4: New DataFit processed data is packed and sent
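The three intervals follow directly from the five timestamps, using the definitions given on the test scenario slide (T_P = T3 - T2, T_MS = T4 - T1, T_TS = T4 - T0). A minimal sketch, in which the class name, the field names and the sample values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class BlockTimestamps:
    """Timestamps of one data block, all in the same time unit."""
    t0: float  # new data block is generated
    t1: float  # data block is received in the module
    t2: float  # data block is ready to be processed
    t3: float  # DataFit processing is finished
    t4: float  # processed data is packed and sent

    @property
    def processing_time(self):
        """T_P = T3 - T2."""
        return self.t3 - self.t2

    @property
    def module_service_time(self):
        """T_MS = T4 - T1."""
        return self.t4 - self.t1

    @property
    def total_service_time(self):
        """T_TS = T4 - T0."""
        return self.t4 - self.t0

# Made-up timestamps in milliseconds, not measurements from the talk.
sample = BlockTimestamps(t0=0.0, t1=1.0, t2=1.5, t3=14.5, t4=15.0)
print(sample.processing_time, sample.module_service_time, sample.total_service_time)
```

The difference between T_TS and T_MS is the time a block spends between generation and arrival at the module, while the difference between T_MS and T_P is the time the module spends outside the processing itself.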

17 Test scenario 1
[Figure: TiCamera DataGenerator → GPU processing: TiCameraFit → Monitoring (EPICS waveform)]

18 Test scenario 2
[Figure: TiCamera DataGenerator → two TiCameraFit processing modules, both on GPU#0 → Monitoring (EPICS waveform)]

19 Test scenario 3
[Figure: TiCamera DataGenerator → two TiCameraFit processing modules, one on GPU#0 and one on GPU#1 → Monitoring (EPICS waveform)]

20 Results S2
1. To determine the DPD overhead with respect to the “hard coded” approach
2. To test DPD scalability (multi-module, multiple-hardware support)
Block Size   SP App    1M/1GPU   2M/1GPU   2M/2GPU
4096         13,2      14,1      29,7      14,2
8192         43,1      44,8      89,9      45,4
16384        85,6      86,6      172,3     87,0
- Using the third solution (2 modules on 2 different GPUs), we have been able to process 3 MB/s
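The two questions posed on this slide can be put in numbers from the table. The sketch below assumes that the entries are service times in milliseconds, that "SP App" is the single process ("hard coded") application, and that each sample occupies 8 bytes. None of these is stated explicitly in the transcript, and the function names are invented.

```python
# block size -> (SP App, 1M/1GPU, 2M/1GPU, 2M/2GPU), decimal commas converted.
RESULTS_S2 = {
    4096: (13.2, 14.1, 29.7, 14.2),
    8192: (43.1, 44.8, 89.9, 45.4),
    16384: (85.6, 86.6, 172.3, 87.0),
}

BYTES_PER_SAMPLE = 8  # assumption: not stated on the slide

def dpd_overhead_percent(sp_app, one_module):
    """Extra time of one DPD module on one GPU relative to the single process app."""
    return 100.0 * (one_module - sp_app) / sp_app

def slowdown(reference, value):
    """How many times longer a configuration takes than the reference."""
    return value / reference

def aggregate_mb_s(block_size, modules, time_ms):
    """Combined rate of several modules that each handle one block in time_ms."""
    return modules * block_size * BYTES_PER_SAMPLE / (time_ms * 1e-3) / 1e6

for size, (sp, m1g1, m2g1, m2g2) in RESULTS_S2.items():
    print(size,
          round(dpd_overhead_percent(sp, m1g1), 1),   # DPD overhead in percent
          round(slowdown(m1g1, m2g1), 2),             # two modules sharing one GPU
          round(slowdown(m1g1, m2g2), 2))             # two modules on two GPUs

rate_2m_2gpu = aggregate_mb_s(16384, 2, RESULTS_S2[16384][3])
```

Read this way, the DPD overhead falls from about 7% at 4096 samples to about 1% at 16384. Two modules sharing one GPU take about twice as long, and two modules on two GPUs take about as long as one module on one GPU. Under the 8-byte assumption, the last case works out to about 3 MB/s at the largest block size, in line with the figure quoted on the slide.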

21 Conclusions
• The development methodology for using GPUs is being standardized, providing increasing levels of abstraction from hardware implementation details
• “Hard coded” implementations seriously compromise scalability and maintainability, without guaranteeing a relevant increase in performance
• Specific frameworks are being developed for different scenarios (Thrust, DPD…):
  - To simplify development
  - To promote reusability
  - To provide scalability and maintainability
  - To include first-level parallelism (internal load balancing based on multithreading)

