Experiments with SmartGridSolve: Achieving Higher Performance by Improving the GridRPC Model
Thomas Brady, Michele Guidolin, Alexey Lastovetsky
Heterogeneous Computing Laboratory, School of Computer Science and Informatics, University College Dublin
Introduction
GridSolve: a programming system for distributed computing based on RPC.
SmartGridSolve: an extension of GridSolve that aims to achieve higher performance.
Motivation
Core aspects of GridSolve affecting application performance:
Mapping: how tasks are assigned to servers.
Execution model: how tasks are executed in the distributed environment.
Motivation: GridSolve Overview
GridSolve maps tasks individually on a star network.
Mapping: tasks are mapped one at a time.
Execution model: client-server model over a star network.
Motivation: SmartGridSolve Overview
SmartGridSolve maps a group of tasks on a fully connected network.
Mapping: tasks are mapped collectively as a group.
Execution model: fully connected network.
Motivation: Performance Increase
Sources of the performance increase:
Improved load balancing of computation.
Reduced volume of communication.
Improved load balancing of communication.
Motivation: Performance Increase
GridSolve: tasks are mapped individually, so slow servers may be assigned large tasks.
SmartGridSolve improves the load balancing of computation: every task in the group is known prior to mapping, so the workload of the whole group can be balanced across the servers.
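As a toy illustration (hypothetical numbers, not taken from the experiments): with two servers of relative speeds 3 and 1 and three tasks of sizes 7, 5 and 1, mapping the group as a whole can place the tasks of sizes 7 and 5 on the fast server and the task of size 1 on the slow one, finishing in about 4 time units, whereas a per-task mapping that happens to send the size-7 task to the slow server needs at least 7 time units for that task alone.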
Motivation: Performance Increase
GridSolve: data transfers are mapped to client links only, which causes unnecessary data transfers and heavily loaded client links.
SmartGridSolve reduces the volume of communication: because data dependencies are known, data transfers can be reduced and bridge communication eliminated through direct server-server communication and caching. For example, when the output of one task is an input of a later task, GridSolve routes it through the client, whereas SmartGridSolve can send it directly between servers or keep it cached on the server that produced it.
SmartGridSolve also improves the load balancing of communication: the volumes of the data transfers are known, allowing a more even distribution of the communication load.
Design: GridSolve Mapping
Diagram: star network discovery and task discovery (time, flops) feed an individual-task mapping heuristic, which produces the mapping.
Design: GridSolve Execution Model
Diagram: the client maps each task, sends the input arguments to the chosen server, the server executes the task, and the client receives the output arguments.
Design: SmartGridSolve, Mapping a Group of Tasks
Diagram: the task graph and the performance model of the fully connected network are fed to a set of pluggable mapping heuristics (Mapping Heuristic 1, 2, ..., N) that map the whole group.
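The slides do not describe the internals of these heuristics. As a rough sketch only (not the actual SmartGridSolve algorithm), a greedy heuristic could walk the group and assign each task to the server with the smallest estimated completion time; all names and numbers below are illustrative:

#include <stdio.h>

#define NTASKS   3
#define NSERVERS 2

/* est[t][s]: estimated execution time of task t on server s.
   Hypothetical values; in SmartGridSolve these would come from the
   task graph and the network performance model. */
static const double est[NTASKS][NSERVERS] = {
    {2.3, 7.0},
    {1.7, 5.0},
    {0.4, 1.0},
};

int main(void)
{
    double ready[NSERVERS] = {0.0};   /* time at which each server becomes free */

    for (int t = 0; t < NTASKS; t++) {
        int best = 0;
        double best_finish = ready[0] + est[t][0];
        for (int s = 1; s < NSERVERS; s++) {
            double finish = ready[s] + est[t][s];
            if (finish < best_finish) {
                best = s;
                best_finish = finish;
            }
        }
        ready[best] = best_finish;   /* commit the task to that server */
        printf("task %d -> server %d (finishes at %.1f)\n", t, best, best_finish);
    }
    return 0;
}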
Design: SmartGridSolve, Executing a Group of Tasks
Diagram: the client maps the group, and during execution the servers communicate with each other directly and cache intermediate results.
Design: SmartGridSolve Extensions
Processes for mapping individual tasks on a star network: star network discovery, individual task discovery, mapping heuristic, execution on the client-server execution model.
Processes for mapping a group of tasks on a fully connected network: fully connected network discovery, group-of-tasks discovery, mapping heuristic for a group of tasks, execution on the communication-enabled execution model.
Design: Network Discovery
Star network discovery: static performance (LINPACK benchmark), dynamic performance (CPU load), dynamic client-server bandwidth.
Fully connected network discovery: dynamic server-server bandwidth.
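As an illustration of how these discovered quantities might be combined (the formula and names below are assumptions, not GridSolve's actual model), a task's time on a server can be estimated from its flop count, the benchmarked speed discounted by the current CPU load, and the argument volume divided by the measured bandwidth:

#include <stdio.h>

/* Illustrative estimate only: combine static speed (e.g. from the LINPACK
   benchmark), dynamic CPU load and measured bandwidth into a task time. */
static double estimate_time(double task_flops,   /* complexity of the task         */
                            double peak_flops,   /* static server performance      */
                            double cpu_load,     /* dynamic load, 0.0 .. 1.0       */
                            double bytes,        /* input + output argument volume */
                            double bandwidth)    /* bytes per second on the link   */
{
    double compute  = task_flops / (peak_flops * (1.0 - cpu_load));
    double transfer = bytes / bandwidth;
    return compute + transfer;
}

int main(void)
{
    /* Hypothetical numbers: a 10 Gflop task on a 5 Gflop/s server at 20% load,
       with 800 MB of arguments over a 100 MB/s link. */
    double t = estimate_time(10e9, 5e9, 0.2, 800e6, 100e6);
    printf("estimated time: %.1f s\n", t);   /* 2.5 s compute + 8.0 s transfer */
    return 0;
}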
Design: GridSolve Task Discovery
Static parameters: number of arguments; argument direction (input, output); argument kind (scalar/non-scalar); argument object types (matrix, vector); argument data types (int, double, etc.); function for complexity (flops).
Run-time parameters: dimensions of arguments, variables of the complexity function.
Design: GridSolve Task Discovery
Static parameters are discovered when the server is registered; run-time parameters are discovered when the task is called.
Diagram: the per-task record holds the complexity together with the size of each input and output argument.
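A task descriptor carrying these parameters might look roughly like the following C struct; the names and layout are assumptions for illustration, not the actual GridSolve data structures:

#include <stddef.h>

enum arg_dir  { ARG_IN, ARG_OUT };
enum arg_kind { ARG_SCALAR, ARG_NONSCALAR };

struct arg_desc {
    enum arg_dir  dir;        /* input or output                    */
    enum arg_kind kind;       /* scalar or non-scalar               */
    int           object;     /* matrix, vector, ...                */
    int           datatype;   /* int, double, ...                   */
    size_t        size;       /* run-time size, known at call time  */
    void         *data;       /* address of the argument            */
};

struct task_desc {
    const char      *name;    /* task/service name                          */
    int              nargs;   /* number of arguments (static)               */
    struct arg_desc *args;    /* per-argument descriptors                   */
    double           flops;   /* complexity, evaluated from run-time sizes  */
};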
Design: SmartGridSolve Group of Tasks Discovery
Build a task graph before any of the tasks are called.
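One plausible way to build such a graph during discovery (an assumption, not necessarily SmartGridSolve's implementation) is to record the arguments of each call and add an edge whenever an input argument of a later task has the same address as an output argument of an earlier one:

#include <stdbool.h>

/* Minimal argument record for dependency detection (illustrative only). */
struct arg_ref {
    bool  is_output;   /* true for output arguments          */
    void *data;        /* address passed to the GridRPC call */
};

struct task_ref {
    int             nargs;
    struct arg_ref *args;
};

/* A task 'later' depends on 'earlier' if one of its input arguments
   is an output argument of 'earlier' (same address). */
bool depends_on(const struct task_ref *later, const struct task_ref *earlier)
{
    for (int i = 0; i < later->nargs; i++) {
        if (later->args[i].is_output)
            continue;
        for (int j = 0; j < earlier->nargs; j++) {
            if (earlier->args[j].is_output &&
                earlier->args[j].data == later->args[i].data)
                return true;   /* edge: earlier -> later */
        }
    }
    return false;
}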
Design: SmartGridSolve Group of Tasks Discovery
GridSolve: discovery, mapping and execution are one atomic operation.
SmartGridSolve: to map a group of tasks, discovery and mapping are separated from the execution of the tasks.
Addition to the GridSolve API:

gs_smart_map("ex_map") {
    /* group of tasks */
}
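As a usage sketch only (the handles, session id and arguments below are hypothetical, and GridRPC initialisation and error handling are omitted): every call issued inside the gs_smart_map region is discovered and mapped as part of the group before any of them executes, as the Hydropad example below shows.

gs_smart_map("ex_map") {
    /* hypothetical handles and arguments, for illustration only */
    grpc_call(task_a_hndl, a_in, a_out);                /* discovered and mapped with the group */
    grpc_call_async(task_b_hndl, &sid_b, b_in, b_out);  /* may run on another server in parallel */
    grpc_wait(sid_b);                                   /* executed according to the group mapping */
}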
Application
Real-world application: Hydropad, an astrophysics application that simulates the clustering of galaxies from the Big Bang to the present.
Internal Structure
Hydropad consists of four parts:
Initialisation
Gravitation (FFT)
Dark Matter (N-Body)
Baryonic Matter (PPM)
Individual Mapping
With GridSolve, each GridRPC call is discovered, mapped and executed as a single operation, one call at a time:

for(i = 0; i < nb_evolutions; i++) {
    ...
    grpc_call(grav_hndl, phiold, ...);                 /* discover, map, execute */
    grpc_call_async(dark_hndl, &sid_dark, x3dm, ...);  /* discover, map, execute */
    grpc_call_async(bary_hndl, &sid_bary, v3bm, ...);  /* discover, map, execute */
    /* wait for non-blocking calls to finish */
    grpc_wait(sid_dark);
    grpc_wait(sid_bary);
    ...
}
Group Mapping
With SmartGridSolve, the whole gs_smart_map block is first processed to discover the group of tasks and build the task graph; the group is then mapped using the network performance model, and only after that are the tasks executed:

gs_smart_map("ex_map") {
    for(i = 0; i < nb_evolutions; i++) {
        ...
        grpc_call(grav_hndl, phiold, ...);
        grpc_call_async(dark_hndl, &sid_dark, x3dm, ...);
        grpc_call_async(bary_hndl, &sid_bary, v3bm, ...);
        /* wait for non-blocking calls to finish */
        grpc_wait(sid_dark);
        grpc_wait(sid_bary);
        ...
    }
}
GridSolve Mapping of Evolution Stage
Diagram: the evolution tasks (barmatter, fields, darkmatter) are assigned to servers hcl06 and hcl08 one call at a time, resulting in poor load balancing and forced bridge communication through the client.
SmartGridSolve Mapping of Evolution Stage
Diagram: the same evolution tasks are mapped as a group across hcl06 and hcl08, giving improved load balancing and a reduced communication volume through caching and server-server communication.
Results
Published in the paper: ~2 times speedup (client and servers on a local network).
Results
New enhancements: client and server broadcast, asynchronous communication.
~5 times speedup (client and servers on a local network).
~13 times speedup (client and servers on remote networks).
Conclusion
SmartGridSolve improves performance through:
Improved load balancing of computation.
Reduced volume of communication.
Improved load balancing of communication.
Future Work
Fault tolerance.
Improved synchronisation of tasks.
Functional performance model.
ADL language and compiler.