Communication Synthesis in Low Level Software for Hierarchical Heterogeneous Systems
CASTNESS'11: Computer Architectures and Software Tools for Numerical Embedded Scalable Systems, Workshop & School, Roma, January 17-18th 2011
Frédéric ROUSSEAU, Professor at University Joseph Fourier, Grenoble (France)
TIMA Lab - SLS, 46 av. Félix Viallet, Grenoble, France
TIMA Laboratory - Frédéric ROUSSEAU - CASTNESS'11, Roma, January 18th
Context of MPSoC
- An increasing number of processors: 380 processors on a chip by 2015 (ITRS)
- Heterogeneity is the trend (good FLOPS/W ratio)
  - In High Performance Computing:
    - TOP500 (Nov. 2010): 2 heterogeneous architectures in the top 3
    - GREEN500 (June 2010): 3 heterogeneous architectures in the top 3
  - In the embedded world: TI OMAP, Nexperia, D940, …
- A hierarchical structure is mandatory
  - 3 levels: tile, chip, system (multi-chip)
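The three hierarchy levels (tile, chip, system) decide how far a transfer has to travel, and therefore which kind of communication path it needs. Below is a minimal C sketch, assuming a hypothetical Location record (chip, tile and CPU indices) and level names chosen only for illustration. It is not part of the TIMA tools.

```c
/* Hypothetical location of a processor in the tile/chip/system hierarchy. */
typedef struct {
    int chip;  /* chip index within the (multi-chip) system */
    int tile;  /* tile index within the chip */
    int cpu;   /* processor index within the tile */
} Location;

/* Outermost hierarchy level that a transfer between two processors crosses. */
typedef enum {
    LEVEL_LOCAL,      /* same processor */
    LEVEL_INTRA_TILE, /* different processors, same tile */
    LEVEL_INTER_TILE, /* different tiles, same chip */
    LEVEL_INTER_CHIP  /* different chips of the system */
} Level;

/* Compare from the outermost level inward: the first difference wins. */
static Level crossing_level(Location a, Location b)
{
    if (a.chip != b.chip) return LEVEL_INTER_CHIP;
    if (a.tile != b.tile) return LEVEL_INTER_TILE; /* same chip here */
    if (a.cpu != b.cpu)   return LEVEL_INTRA_TILE; /* same tile here */
    return LEVEL_LOCAL;
}
```

Each level would typically call for a different family of HW paths and drivers, e.g. a shared-memory FIFO inside a tile versus a network-based protocol between tiles.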
Communication in a hierarchical structure
Challenges in communication synthesis:
- Hierarchy and HW should be transparent to the system designer
  - Complexity of the infrastructure and of its abstraction
- Heterogeneity of tiles, chips and system
  - Specific processors (VLIW)
  - Non-Uniform Memory Access (NUMA)
  - Multiple hierarchy levels
  - Use of complex network interfaces
- Efficient use of the communication infrastructure
  - Control of limited resources (memory)
TIMA is in charge of providing the low-level software, including communication synthesis.
Binary code generation flow
[Figure: Y-chart flow. Inputs: application & source code of tasks, architecture, mapping. FRONT-END and BACK-END steps: parsing of input models, SW component selection (from the SW component libraries: Com, OS), compilation and linking tools. Output: binary.]
Binary code generation flow: communication paths
[Figure: the same Y-chart flow, highlighting the communication-related inputs.]
- The architecture model provides the communication paths
- The application is a KPN model whose channels are FIFOs
- The mapping associates each FIFO with a communication path
Outline
- Introduction
- HW communication paths
- Software components for communication
- Conclusion
The need for HW paths
Introduction to HW paths:
- A HW path is the set of HW components used for a communication (data transfer)
  - With or without specific components (DMA, …)
  - Possibly through intermediate memories
- HW paths are given by the architecture designer
Why do we need HW paths?
- Communication synthesis
- System designers want control over communication
Where are HW paths used?
- In simulation (architecture exploration, cf. the DOL methodology)
- In the mapping
- Perspectives: analysis and verification, …
Read and write paths for intra-tile communication
[Figure: one tile with processors CPU1 to CPU3, memories Mem1 to Mem3, network interfaces NI1 to NI4 and Networks 1 to 4.]
Read and write paths for inter-tile communication
[Figure: the same tile (CPU1 to CPU3, Mem1 to Mem3, NI1 to NI4, Networks 1 to 4), plus network interfaces NI5 and NI6 and Multi-Tile Networks 1 and 2.]
How to use these HW paths?
Hypotheses:
- All HW paths are listed in the architecture model
- In the mapping, each channel of the application model is associated with one HW path
- A protocol may also be specified
Communication synthesis consists of:
1. Parsing the architecture and mapping models
2. Selecting the SW components
3. Specializing the SW components (e.g. FIFO size, base address, …)
4. Producing source code ready to be compiled and linked
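The synthesis steps above can be sketched in C as follows. This is a minimal illustration under stated assumptions. The HwPath and MappingEntry layouts and the driver names swfifo_driver and rdma_driver are hypothetical. So is the selection rule (an inter-tile path gets the RDMA driver, anything else a software FIFO). Parsing of the input models is assumed to have already filled the tables. The output is one line of C initializer code per channel, i.e. source ready to be compiled and linked.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical HW path entry, as it could be listed in the architecture model. */
typedef struct {
    const char   *id;         /* path identifier */
    int           inter_tile; /* 1 if the path crosses tiles (network interfaces) */
    unsigned long base;       /* base address of the memory used by the path */
} HwPath;

/* Hypothetical mapping entry: one application channel bound to one HW path. */
typedef struct {
    const char *channel;   /* KPN channel (FIFO) name */
    const char *path_id;   /* HW path chosen in the mapping */
    unsigned    fifo_size; /* FIFO depth in bytes */
} MappingEntry;

/* Look up the HW path referenced by a mapping entry (models already parsed). */
static const HwPath *find_path(const HwPath *paths, int n, const char *id)
{
    for (int i = 0; i < n; i++)
        if (strcmp(paths[i].id, id) == 0)
            return &paths[i];
    return NULL;
}

/* Select a driver from a (hypothetical) library according to the path. */
static const char *select_driver(const HwPath *p)
{
    return p->inter_tile ? "rdma_driver" : "swfifo_driver";
}

/* Specialize the driver and emit a C initializer for the channel.
   Returns the number of characters written, or -1 if the path is unknown. */
static int emit_channel_config(const HwPath *paths, int n,
                               const MappingEntry *m, char *out, size_t len)
{
    const HwPath *p = find_path(paths, n, m->path_id);
    if (p == NULL)
        return -1;
    return snprintf(out, len, "{ \"%s\", %s, 0x%lx, %u },",
                    m->channel, select_driver(p), p->base, m->fifo_size);
}
```

For instance, a channel c1 mapped on an intra-tile path with base address 0x10000000 and a 64-byte FIFO yields `{ "c1", swfifo_driver, 0x10000000, 64 },`.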
Software stack
- Application: 1 task per process; source code of the tasks
- OS: task and driver management; virtual file system (VFS); HW access only via the HAL
- COM: based on the VFS
- HAL: interface for HW access (interrupts, locks, caches, endianness, …)
Software stack: write function (application level)
function t1_behavior(Channel c1)
begin
    …
    channel_write(c1, buffer, len);
end

int main() {
    Channel c1;
    Thread t1;
    // Communication channel initialization
    c1 = channel_init("/dev/fifo.0");
    // Task initialization
    t1 = thread_create(…, t1_behavior);
    …
}
Software stack: write function (COM level, called from t1_behavior)
function channel_write(Channel c, char *buffer, int len)
begin
    …
    vfs_write(c->desc, buffer, len);
end
Software stack: write function (VFS level, called from channel_write)
function vfs_write(Vfile f, char *buffer, int len)
begin
    …
    f->stream->write(f->id, buffer, len);   // dispatch to the driver bound to f
end
The call through f->stream->write is where the driver choice takes effect (software FIFO between CPUs, rendez-vous, …).
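The binding done by channel_init("/dev/fifo.0") and the dispatch through f->stream->write can be sketched as follows. The sketch assumes a hypothetical table of drivers keyed by device-name prefix. The "/dev/rdv." name, the Stream layout and the two stub drivers are illustrative only, not the TIMA implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Driver interface seen by the VFS: one write entry point per driver. */
typedef struct {
    const char *prefix;                             /* device name prefix */
    int (*write)(int id, const char *buf, int len); /* driver write function */
} Stream;

/* An open VFS file: the selected driver plus the device instance it addresses. */
typedef struct {
    const Stream *stream;
    int id;
} Vfile;

/* Two hypothetical drivers; they only record which one was called. */
static int last_driver; /* 1 = software FIFO, 2 = rendez-vous */
static int swfifo_write(int id, const char *buf, int len)
{ (void)id; (void)buf; last_driver = 1; return len; }
static int rdv_write(int id, const char *buf, int len)
{ (void)id; (void)buf; last_driver = 2; return len; }

static const Stream drivers[] = {
    { "/dev/fifo.", swfifo_write },
    { "/dev/rdv.",  rdv_write },
};

/* Bind a device name such as "/dev/fifo.0" to its driver: the driver choice. */
static int vfs_open(const char *name, Vfile *f)
{
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        size_t n = strlen(drivers[i].prefix);
        if (strncmp(name, drivers[i].prefix, n) == 0) {
            f->stream = &drivers[i];
            f->id = atoi(name + n); /* instance number after the prefix */
            return 0;
        }
    }
    return -1; /* no driver registered for this device */
}

/* The VFS ignores the protocol: it simply dispatches through the stream. */
static int vfs_write(Vfile *f, const char *buf, int len)
{
    return f->stream->write(f->id, buf, len);
}
```

Because the protocol is hidden behind the function pointer, the layers above the VFS (channel_write, the task code) stay the same whichever driver the mapping selects.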
Software stack: write function (driver level, reached through vfs_write)
function fifo_write(char *buffer, int len)
begin
    config = getConfiguration();
    …
    HAL_WRITE(buffer, config->writeptr, len);
end
Software stack: write function (HAL level, called from fifo_write)
function HAL_WRITE(char *from, char *to, int len)
begin
    // May use a DMA
end
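The driver and HAL levels could look like the sketch below. FifoConfig stands for what getConfiguration() returns after specialization (buffer base address, size, read and write pointers). Here the configuration is passed explicitly rather than fetched, and HAL_WRITE is a plain memcpy instead of a possible DMA transfer. Every name other than fifo_write, HAL_WRITE and writeptr is hypothetical. The sketch also ignores the locking that a FIFO shared between CPUs would need.

```c
#include <stddef.h>
#include <string.h>

/* HAL copy primitive (from -> to). This sketch uses memcpy; a platform HAL
   might program a DMA engine here instead (hypothetical, not shown). */
static void HAL_WRITE(const char *from, char *to, size_t len)
{
    memcpy(to, from, len);
}

/* Hypothetical configuration produced by driver specialization. */
typedef struct {
    char  *base;     /* base address of the FIFO buffer in (shared) memory */
    size_t size;     /* capacity in bytes */
    size_t writeptr; /* next write offset */
    size_t readptr;  /* next read offset */
    size_t count;    /* bytes currently stored */
} FifoConfig;

/* Write len bytes if they fit; returns len, or 0 if there is not enough room. */
static size_t fifo_write(FifoConfig *c, const char *buf, size_t len)
{
    if (len > c->size - c->count)
        return 0;
    size_t first = c->size - c->writeptr; /* bytes before the buffer end */
    if (first > len)
        first = len;
    HAL_WRITE(buf, c->base + c->writeptr, first);
    if (len > first)                      /* wrap around to the start */
        HAL_WRITE(buf + first, c->base, len - first);
    c->writeptr = (c->writeptr + len) % c->size;
    c->count += len;
    return len;
}

/* Matching read side (same copy primitive, opposite direction). */
static size_t fifo_read(FifoConfig *c, char *buf, size_t len)
{
    if (len > c->count)
        return 0;
    size_t first = c->size - c->readptr;
    if (first > len)
        first = len;
    HAL_WRITE(c->base + c->readptr, buf, first);
    if (len > first)
        HAL_WRITE(c->base, buf + first, len - first);
    c->readptr = (c->readptr + len) % c->size;
    c->count -= len;
    return len;
}
```

The same driver code serves any intra-tile path: only the FifoConfig values (base address, size) change from one specialization to another.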
The need for a driver library
- One driver per HW path is not realistic: too much development
- Only a few drivers, corresponding to a few HW paths
- Hence drivers must be configurable:
  - Memory addresses
  - Platform resources: locks, timers, …
  - Exotic configurations when using specific network interfaces (DNP!)
=> Trade-off between efficiency and the number of paths represented
About the selected HW path
- Each driver should be specialized to respect the selected HW path:
  - Right configuration
  - Access to all the HW components mentioned in the HW path
- BUT it has to be compatible with the HAL
  - The HAL has a limited number of interfaces (and limited HW access), for efficiency and to ease porting to another platform
- It is therefore difficult to respect the HW paths given in the mapping:
  - Due to the HAL (usually minimal, but expected to be optimal)
  - Local memory placement is not necessarily respected by compilers
Available protocols
For the D940 platform (ARM and mAgicV processors):
- Intra-tile: SW FIFO; rendez-vous in synchronous mode
- Inter-tile: sockets; RDMA protocols (eager and rendez-vous)
Example of results
LQCD application (from INFN): about 50 processes and 100 channels
Protocols used:
- Intra-tile: rendez-vous
- Inter-tile: eager for small messages, rendez-vous otherwise
[Table columns: Mapping, Intra-proc, Intra-tile inter-proc, Inter-tile, #Drivers, Specializations. Mappings listed: 1 tile (ARM); tile (ARM+DSP); tiles (ARM); tiles (ARM+DSP); tiles (ARM); tiles (ARM+DSP).]
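The inter-tile policy used for LQCD (eager for small messages, rendez-vous otherwise) can be sketched as follows. The 1024-byte threshold and the message names are assumptions made for illustration, and the actual RDMA protocols of the D940 platform may differ in their details.

```c
#include <stddef.h>

/* Hypothetical cutoff between "small" and "large" messages, in bytes. */
#define EAGER_THRESHOLD 1024u

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } Protocol;

/* Message kinds exchanged on the inter-tile link (illustrative names). */
typedef enum {
    MSG_EAGER_DATA, /* header + payload, copied into a pre-allocated receive buffer */
    MSG_RTS,        /* request to send: announces the size, no payload */
    MSG_CTS,        /* clear to send: receiver returns the destination address */
    MSG_RDMA_PUT,   /* payload written directly into the receiver's buffer */
    MSG_FIN         /* completion notification */
} MsgKind;

/* Eager for small messages, rendez-vous otherwise. */
static Protocol choose_protocol(size_t len)
{
    return len <= EAGER_THRESHOLD ? PROTO_EAGER : PROTO_RENDEZVOUS;
}

/* Fill seq with the messages needed to transfer len bytes; returns their count.
   seq must hold at least 4 entries. */
static int message_sequence(size_t len, MsgKind *seq)
{
    if (choose_protocol(len) == PROTO_EAGER) {
        seq[0] = MSG_EAGER_DATA; /* single message, extra copy at the receiver */
        return 1;
    }
    seq[0] = MSG_RTS;      /* sender -> receiver */
    seq[1] = MSG_CTS;      /* receiver -> sender, destination buffer is ready */
    seq[2] = MSG_RDMA_PUT; /* zero-copy transfer of the payload */
    seq[3] = MSG_FIN;      /* sender -> receiver */
    return 4;
}
```

Eager keeps latency low for small messages at the cost of a receive buffer and an extra copy. Rendez-vous adds a handshake round trip but avoids reserving large buffers in advance, which matters when memory is a limited resource.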
What next for EURETILE?
WP4: Distributed Hardware Dependent Software Generation (OS, HAL, communication mechanisms)
3 main topics:
- SW requirements of brain-inspired applications with many processes
- Fault tolerance, aware of the capabilities provided by the HW
- Real-time aspects
An interesting solution is task migration, but it is challenging:
- Heterogeneity of the architecture
- NUMA
- Message passing
- Semi-centralized architecture
Conclusion & perspectives
Communication synthesis for multi-tile platforms:
- Formalization of multi-tile communications
- Introduction of HW paths
- Development of a communication driver library
- Automatic selection and configuration of drivers
What is actually implemented may not be what was decided (HAL constraints).
Communication is the basis for task migration in a message-passing system.