MAPLD Reconfigurable Computing Birds-of-a-Feather
Programming Tools
Jeffrey S. Vetter, M. C. Smith, P. C. Roth, O. O. Storaasli, S. R. Alam
2 Vetter, Smith, Roth, Alam, Storaasli

Multi-Paradigm Computing Systems are Quickly Becoming a Reality

In addition to general purpose processors, they include specialized devices in a tightly-coupled system
–FPGAs
–Multithreaded accelerators
–Graphics processors
–Game chips
–Digital signal processors

New designs coming
–DARPA HPCS
–DARPA PCA – TRIPS, MONARCH, etc.

Multiple vendors are designing, building, selling multi-paradigm systems & components
–IBM
–SGI
–Cray
–SRC Computers
–ClearSpeed
–Linux Networx
–Others…

For computational scientists to realize the benefit offered by MPC systems, we must develop a consistent, user-friendly infrastructure that provides both portability and performance across devices and platforms.
We Propose the Multi-Paradigm Procedure Call as a Solution to Improve Productivity

Multi-paradigm Procedure Call (MPPC), for exploiting diverse devices within an MPC system
–Software development kit for MPPC provides ease of programming
–Open protocol for infrastructure communication that can be shared by vendors, developers, and application users

MPC runtime system (MPCRS), including
–a runtime management system and
–directory service, to discover and bind applications to specific devices within a single MPC system

Policies for scheduling applications onto the devices of an MPC system
–Blocking, non-blocking, work queue

Transparency/Portability across a diverse set of existing and future devices
–“Write once, run on any MPC system”
–Vendor and application neutral protocol

Optimizations for different architectures exploit their benefits
–E.g., memory model
Design of Multi-Paradigm Procedure Call

GOAL: allow application nodes to discover, request, and schedule services from registered MPPC devices

Stubs in the compiled code represent the generic MPPC devices
–Handle the interface with the MPC runtime system (MPCRS) to send requests and receive results
–Generated with an Interface Definition Language (IDL), which automates the creation of the interface software between the application, device, and resource manager

Synchronous operation like a normal procedure call; however, threads may be used to perform multiple MPPCs concurrently

[Figure: host processor code calls funcX() and funcY(); the MPPC interface is specified in IDL; the devices shown are Device A / FPGA and Device B / IBM CELL.]
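The stub idea above can be illustrated with a minimal Python sketch. It is not the MPPC implementation: the `RuntimeManager` class, the `make_stub` helper, and the lambda "devices" are hypothetical stand-ins, and the stubs here are written by hand rather than generated from IDL. The sketch only shows that each stub call is synchronous while threads allow two MPPCs to be in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


class RuntimeManager:
    """Hypothetical stand-in for the MPCRS: maps a service name to a device handler."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def request(self, name, *args):
        # Send the request to the bound handler and wait for the result.
        return self._handlers[name](*args)


def make_stub(manager, name):
    """Return a stub that looks like an ordinary function to the application."""
    def stub(*args):
        return manager.request(name, *args)
    stub.__name__ = name
    return stub


manager = RuntimeManager()
manager.register("funcX", lambda v: [x * 2 for x in v])  # stand-in for one device
manager.register("funcY", lambda v: sum(v))              # stand-in for another device

funcX = make_stub(manager, "funcX")
funcY = make_stub(manager, "funcY")

# Each stub call blocks like a normal procedure call;
# submitting them from separate threads overlaps the two MPPCs.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_x = pool.submit(funcX, [1, 2, 3])
    future_y = pool.submit(funcY, [1, 2, 3])
    result_x, result_y = future_x.result(), future_y.result()
```

In a real system the handlers would marshal arguments to an FPGA or other device; the application-facing call shape would stay the same.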
Bonus Slides
Broad Consensus on Programming these Devices

From 2005 DSB Report on Microchips…

From Federal Plan on High End Computing, 2004
“…high-level programming tools should eventually include support for non-traditional HEC systems, for example, based on reconfigurable FPGA processors or PIMs.”

From DARPA HPCS program…
MPC systems present programming hurdles…

Use different programming systems
Assume at most two types of devices in the system
Explicit management of data movement and parallelism
Use simplistic scheduling algorithms
Link statically to available resources
Etc.

For computational scientists to realize the benefit offered by MPC systems, we must develop a consistent, user-friendly infrastructure that provides both portability and performance.
Benefits of Multi-Paradigm Procedure Call

Ease of programming increasingly complex MPC environments
–High productivity computing systems for scientific applications

Transparency/Portability across a diverse set of existing and future devices
–“Write once, run on any MPC system”
–Vendor and application neutral protocol

In much the same way that the development of the Remote Procedure Call (RPC) in the 1980s enabled thousands of users to begin programming complex distributed systems, we believe that our Multi-paradigm Procedure Call will provide the same benefit for users of MPC systems.
MPPC Scheduling and Resource Allocation

Allows the MPCRS to make intelligent scheduling decisions for competing services
–Selects the most effective service for the application based on available devices
–Policy selection driven by techniques including empirical measurements, historical data, and performance models
–Initial design will use a lookup table of empirical data to select the most effective device
–Later designs will incorporate more elegant methods such as analytical performance models and run-time performance monitoring

Exploit concurrency by scheduling multiple devices in parallel, taking into account synchronization requirements
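The lookup-table policy for the initial design could look roughly like the sketch below. The table contents, service names, device names, and the `select_device` function are all invented for illustration; the slides do not specify the table layout or the metric, so measured runtime in seconds is assumed here.

```python
# Hypothetical measured runtimes in seconds, keyed by (service, device).
EMPIRICAL_RUNTIME = {
    ("fft", "host"): 4.0,
    ("fft", "fpga"): 1.5,
    ("fft", "cell"): 2.0,
    ("sort", "host"): 3.0,
    ("sort", "cell"): 3.5,
}


def select_device(service, available):
    """Pick the available device with the lowest measured runtime for a service."""
    candidates = [
        (runtime, device)
        for (svc, device), runtime in EMPIRICAL_RUNTIME.items()
        if svc == service and device in available
    ]
    if not candidates:
        raise LookupError(f"no measured device available for {service!r}")
    # Tuples sort by runtime first, so min() yields the fastest device.
    return min(candidates)[1]
```

For example, `select_device("fft", {"host", "fpga", "cell"})` picks the FPGA entry, and if the FPGA is busy or absent, the same call with `{"host", "cell"}` falls back to the next-fastest measured device.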
MPPC Sequence of Operations

[Figure: sequence diagram over time, with columns for the host processor, the runtime manager, and Devices A, B, and C. Step labels from the diagram:]

Boot phase
–Boot device
–Register
–Boot manager
–Accept requests

Application build
–Compile with MPPC Library
–Link with MPPC Library

Application Execution Phase
–Load application
–Query MPPC manager
–Response to the query
–Load MPPC libraries
–Schedule MPPC resources
–Schedule MPPC devices
–Schedule device C
–Acknowledge
–Reset, load binary
–Run MPPC protocol
–Notify manager
–Notify application
–MPPC call, transfer data and synchronize
–Receive data and compute
–Return response
–Receive response, return MPPC call
–Notify manager
–Release device C
MPC Runtime System (MPCRS)

Manages discovery, binding, scheduling of MPC nodes
–At boot time, devices register their capabilities with the server
–Each device will require a small kernel that boots, configures, and communicates with the server

Provides a generalized interface to all MPC resources on the system

Will use empirical or analytical performance models to select the device that maximizes the benefit-to-cost ratio

[Figure: Application on host, MPPC stub, MPPC Runtime Manager, MPPC stub, Device B.]
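Registration and binding as described above might be sketched as follows. The `DeviceEntry` and `Directory` names, and the choice of a single benefit and cost figure per device, are assumptions made for illustration; the slides do not define how benefit or cost is measured, and a real model would likely vary per service.

```python
from dataclasses import dataclass


@dataclass
class DeviceEntry:
    name: str
    services: set    # services the device registers at boot
    benefit: float   # assumed metric, e.g. modeled speedup over the host
    cost: float      # assumed metric, e.g. setup plus transfer overhead (must be > 0)


class Directory:
    """Hypothetical directory service: devices register, applications bind."""

    def __init__(self):
        self._devices = []

    def register(self, entry):
        self._devices.append(entry)

    def bind(self, service):
        """Return the registered device with the highest benefit-to-cost ratio, or None."""
        matches = [d for d in self._devices if service in d.services]
        if not matches:
            return None
        return max(matches, key=lambda d: d.benefit / d.cost)


directory = Directory()
directory.register(DeviceEntry("fpga", {"fft"}, benefit=8.0, cost=4.0))
directory.register(DeviceEntry("cell", {"fft", "sort"}, benefit=6.0, cost=2.0))
chosen = directory.bind("fft")   # the "cell" entry: ratio 3.0 beats 2.0
```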
MPPC Dynamic Scheduling

Scheduling and resource management dynamically service requests using a work queue
–Devices are not statically allocated to an application for its entire lifetime
–MPC devices service requests from multiple application nodes

Advantages
–Fine-grained scheduling of devices improves efficiency
–Transparent load-balancing

Challenges
–Fairness of scheduling
–Security
–More complex data movement
–OS, TLB management

[Figure: MPPC Work Queue holding App A Request 3, App C Request 3, App A Request 2, App B Request 4, and App A Request 1, feeding an MPC Device.]
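A work queue shared by several applications can be sketched with Python's standard `queue` and `threading` modules. The `device_worker` function and the request tuples are hypothetical, and the sketch uses plain first-in, first-out order, which sidesteps the fairness and security challenges listed above rather than solving them.

```python
import queue
import threading


def device_worker(requests, results):
    """Service requests in arrival order until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        app, request_id, func, arg = item
        results.append((app, request_id, func(arg)))


requests = queue.Queue()
results = []
worker = threading.Thread(target=device_worker, args=(requests, results))
worker.start()

# Requests from several applications share one device;
# no application owns the device for its whole lifetime.
requests.put(("A", 1, abs, -5))
requests.put(("B", 1, abs, -7))
requests.put(("A", 2, abs, 3))
requests.put(None)   # sentinel: shut the device worker down
worker.join()
```

After `worker.join()` returns, `results` holds one entry per request in the order the device serviced them.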
Optimizations for MPPC Operation

Zero-copy memory semantics on platforms that support globally addressable memory
–Cray Rainier
–SGI UV, RASC

MPPC call aggregation to reduce overhead for calling MPC devices
–Use static analysis to collect and aggregate sequences of MPPC calls
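The effect of call aggregation can be illustrated with a small sketch. Note the substitution: the slide proposes static analysis at compile time, whereas this example batches calls at run time, which is simpler to show but is a different technique. The `BatchingStub` class and its `send_batch` transport are hypothetical.

```python
class BatchingStub:
    """Collects calls and sends them to the device in one transfer."""

    def __init__(self, send_batch):
        # send_batch is a hypothetical transport: list of args -> list of results.
        self._send_batch = send_batch
        self._pending = []
        self.transfers = 0

    def call(self, arg):
        # Queue the call instead of paying the device overhead immediately.
        self._pending.append(arg)

    def flush(self):
        """Send all queued calls at once and return their results in order."""
        if not self._pending:
            return []
        batch, self._pending = self._pending, []
        self.transfers += 1
        return self._send_batch(batch)


stub = BatchingStub(lambda batch: [x * x for x in batch])
for value in (1, 2, 3):
    stub.call(value)
squares = stub.flush()   # three calls, one transfer
```

The per-call overhead of reaching the device is paid once per batch rather than once per call, which is the saving the aggregation bullet describes.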
Summary

Multi-Paradigm Computing systems are quickly becoming a reality
–Demonstrated by wide vendor support and customer interest
–Need high productivity programming support for computational scientists to make use of these systems

Propose the Multi-paradigm Procedure Call (MPPC) as a solution to improve productivity on MPC systems
–Uses familiar programming techniques enabling high productivity computing systems
–Provides runtime system and resource management for scheduling and resource discovery
–Is an open protocol that can be supported by vendors, developers, & users