MAPLD Reconfigurable Computing Birds-of-a-Feather: Programming Tools. Jeffrey S. Vetter, M. C. Smith, P. C. Roth, O. O. Storaasli, S. R. Alam. www.csm.ornl.gov/ft


Slide 1: MAPLD Reconfigurable Computing Birds-of-a-Feather: Programming Tools
Jeffrey S. Vetter, M. C. Smith, P. C. Roth, O. O. Storaasli, S. R. Alam

Slide 2: Multi-Paradigm Computing Systems Are Quickly Becoming a Reality
- In addition to general-purpose processors, they include specialized devices in a tightly coupled system
  – FPGAs
  – Multithreaded accelerators
  – Graphics processors
  – Game chips
  – Digital signal processors
- New designs are coming
  – DARPA HPCS
  – DARPA PCA (TRIPS, MONARCH, etc.)
- Multiple vendors are designing, building, and selling multi-paradigm systems and components
  – IBM
  – SGI
  – Cray
  – SRC Computers
  – ClearSpeed
  – Linux Networx
  – Others
For computational scientists to realize the benefit offered by MPC systems, we must develop a consistent, user-friendly infrastructure that provides both portability and performance across devices and platforms.

Slide 3: We Propose the Multi-Paradigm Procedure Call as a Solution to Improve Productivity
- Multi-paradigm Procedure Call (MPPC) for exploiting diverse devices within an MPC system
  – A software development kit for MPPC provides ease of programming
  – An open protocol for infrastructure communication that can be shared by vendors, developers, and application users
- MPC runtime system (MPCRS), including
  – a runtime management system and
  – a directory service to discover and bind applications to specific devices within a single MPC system
- Policies for scheduling applications onto the devices of an MPC system
  – Blocking, non-blocking, work queue
- Transparency/portability across a diverse set of existing and future devices
  – "Write once, run on any MPC system"
  – Vendor- and application-neutral protocol
- Optimizations for different architectures exploit their benefits
  – e.g., memory model
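To make the proposal concrete, the following sketch shows how an application might issue MPPC requests under the blocking and non-blocking policies listed above. Every name in it (mppc_call, mppc_call_async, mppc_wait, MppcHandle) is a hypothetical stand-in rather than an API defined by this proposal, and the bodies merely simulate a device so the example runs.

```cpp
// Hypothetical host-side view of blocking and non-blocking MPPC calls.
// Names and signatures are illustrative; the mock bodies stand in for the
// real MPC runtime system (MPCRS) and devices.
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

struct MppcHandle { std::future<double> result; };   // tracks an outstanding request

// Blocking policy: behaves like a normal procedure call.
double mppc_call(const char* service, const std::vector<double>& in) {
    (void)service;
    return std::accumulate(in.begin(), in.end(), 0.0);   // stand-in for device work
}

// Non-blocking policy: returns a handle immediately; the caller synchronizes later.
MppcHandle mppc_call_async(const char* service, std::vector<double> in) {
    return MppcHandle{ std::async(std::launch::async,
                                  [service, in] { return mppc_call(service, in); }) };
}

double mppc_wait(MppcHandle& h) { return h.result.get(); }

int main() {
    std::vector<double> data{1.0, 2.0, 3.0};
    double x = mppc_call("funcX", data);              // blocking call
    MppcHandle h = mppc_call_async("funcY", data);    // non-blocking call
    double y = mppc_wait(h);                          // synchronize later
    std::printf("funcX=%g funcY=%g\n", x, y);
    return 0;
}
```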

Slide 4: Design of the Multi-Paradigm Procedure Call
- GOAL: allow application nodes to discover, request, and schedule services from registered MPPC devices
- Stubs in the compiled code represent the generic MPPC devices
  – Handle the interface with the MPC runtime system (MPCRS) to send requests and receive results
  – Generated from an Interface Definition Language (IDL) specification, which automates the creation of the interface software between the application, device, and resource manager
- Synchronous operation, like a normal procedure call; however, threads may be used to perform multiple MPPCs concurrently
[Figure: host-processor code calls funcX() and funcY(); the MPPC interface is specified in IDL, with funcX() served by Device A (FPGA) and funcY() by Device B (IBM Cell).]
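As a rough illustration of the stub concept, here is a minimal sketch of the host-side stub an MPPC IDL compiler might generate for a funcX() service. Only the funcX() name and the idea of an IDL-generated stub come from the slide; the MPCRS entry points (mpcrs_send_request, mpcrs_receive_result), the wire format, and the mock "device" are assumptions added so the example compiles and runs on its own.

```cpp
// Sketch of a host-side stub an MPPC IDL compiler might generate for
// "double funcX(const double* a, int n)". The MPCRS functions below are
// invented mocks, not a real runtime API.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// --- mock MPCRS transport (assumed names) -----------------------------------
static std::vector<std::uint8_t> g_last_request;    // pretend message buffer

int mpcrs_send_request(const char* service, const void* buf, std::size_t len) {
    std::printf("MPCRS: forwarding %s (%zu bytes) to a registered device\n", service, len);
    const auto* p = static_cast<const std::uint8_t*>(buf);
    g_last_request.assign(p, p + len);
    return 1;                                        // request id
}

int mpcrs_receive_result(int /*request_id*/, void* buf, std::size_t len) {
    // Mock "device": unmarshal the arguments and sum them.
    int n = 0;
    std::memcpy(&n, g_last_request.data(), sizeof(int));
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double v;
        std::memcpy(&v, g_last_request.data() + sizeof(int) + i * sizeof(double), sizeof(double));
        sum += v;
    }
    std::memcpy(buf, &sum, len);
    return 0;
}

// --- generated stub: looks like a normal procedure call to the application --
double funcX(const double* a, int n) {
    std::vector<std::uint8_t> msg(sizeof(int) + n * sizeof(double));
    std::memcpy(msg.data(), &n, sizeof(int));                        // marshal arguments
    std::memcpy(msg.data() + sizeof(int), a, n * sizeof(double));
    int id = mpcrs_send_request("funcX", msg.data(), msg.size());    // hand off to MPCRS
    double result = 0.0;
    mpcrs_receive_result(id, &result, sizeof(result));               // block for the result
    return result;
}

int main() {
    double a[] = {1.0, 2.0, 3.0};
    std::printf("funcX returned %g\n", funcX(a, 3));                 // prints 6
    return 0;
}
```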

Bonus Slides

Slide 6: Broad Consensus on Programming these Devices
- From the 2005 DSB Report on Microchips…
- From the Federal Plan on High End Computing, 2004: "…high-level programming tools should eventually include support for non-traditional HEC systems, for example, based on reconfigurable FPGA processors or PIMs."
- From the DARPA HPCS program…

Slide 7: MPC Systems Present Programming Hurdles…
- Use different programming systems
- Assume at most two types of devices in the system
- Require explicit management of data movement and parallelism
- Use simplistic scheduling algorithms
- Link statically to available resources
- Etc.
For computational scientists to realize the benefit offered by MPC systems, we must develop a consistent, user-friendly infrastructure that provides both portability and performance.

Slide 8: Benefits of the Multi-Paradigm Procedure Call
- Ease of programming increasingly complex MPC environments
  – High-productivity computing systems for scientific applications
- Transparency/portability across a diverse set of existing and future devices
  – "Write once, run on any MPC system"
  – Vendor- and application-neutral protocol
In much the same way that the development of the Remote Procedure Call (RPC) in the 1980s enabled thousands of users to begin programming complex distributed systems, we believe that our Multi-paradigm Procedure Call will provide the same benefit for users of MPC systems.

Slide 9: MPPC Scheduling and Resource Allocation
- Allows the MPCRS to make intelligent scheduling decisions for competing services
  – Selects the most effective service for the application based on available devices
  – Policy selection is driven by techniques including empirical measurements, historical data, and performance models
  – The initial design will use a lookup table of empirical data to select the most effective device (see the sketch below)
  – Later designs will incorporate more sophisticated methods such as analytical performance models and run-time performance monitoring
- Exploits concurrency by scheduling multiple devices in parallel, taking synchronization requirements into account
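A minimal sketch of the lookup-table approach mentioned above, assuming the runtime stores one measured execution time per (service, device) pair and simply picks the fastest; the services, devices, and timings are invented for illustration.

```cpp
// Toy device selection from a table of empirical measurements.
// Compile with C++17 (structured bindings).
#include <iostream>
#include <limits>
#include <map>
#include <string>
#include <utility>

int main() {
    // Measured seconds per call, keyed by (service, device). Invented numbers.
    std::map<std::pair<std::string, std::string>, double> measured = {
        {{"fft1d",  "FPGA"},    0.8},
        {{"fft1d",  "CellBE"},  1.1},
        {{"fft1d",  "HostCPU"}, 3.0},
        {{"matmul", "FPGA"},    2.5},
        {{"matmul", "CellBE"},  0.9},
    };

    // Pick the registered device with the smallest measured time for a service.
    auto select_device = [&](const std::string& service) {
        std::string best;
        double best_t = std::numeric_limits<double>::max();
        for (const auto& [key, t] : measured)
            if (key.first == service && t < best_t) { best_t = t; best = key.second; }
        return best;
    };

    std::cout << "fft1d  -> " << select_device("fft1d")  << "\n";   // FPGA
    std::cout << "matmul -> " << select_device("matmul") << "\n";   // CellBE
    return 0;
}
```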

Slide 10: MPPC Sequence of Operations
[Figure: timeline across the host processor, the runtime manager, and Devices A, B, and C.]
- Boot phase
  – Each device boots and registers with the runtime manager
  – The manager boots and begins accepting requests
- Application build
  – Compile and link the application with the MPPC library
- Application execution phase
  – Load the application and the MPPC libraries
  – Query the MPPC manager; the manager responds to the query
  – Schedule MPPC resources and devices: schedule device C, acknowledge, reset and load its binary, run the MPPC protocol, notify the manager and the application
  – MPPC call: transfer data and synchronize; the device receives the data, computes, and returns the response; the host receives the response and the MPPC call returns
  – Notify the manager and release device C
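The toy program below replays the execution-phase handshake from the sequence above as console output, simply to make the ordering of messages between the application, the runtime manager, and the device explicit; all function names are illustrative and nothing here talks to real hardware.

```cpp
// Console walk-through of the MPPC execution-phase handshake.
#include <iostream>
#include <string>

void query_manager(const std::string& service) {
    std::cout << "app    : query MPPC manager for service " << service << "\n";
    std::cout << "manager: respond with the device that will serve it\n";
}
void schedule_device(const std::string& dev) {
    std::cout << "manager: schedule " << dev << ", acknowledge, reset and load its binary\n";
    std::cout << dev << ": run MPPC protocol, notify manager and application\n";
}
void run_call(const std::string& dev) {
    std::cout << "app    : MPPC call, transfer data, synchronize\n";
    std::cout << dev << ": receive data, compute, return response\n";
    std::cout << "app    : receive response, MPPC call returns\n";
}
void release_device(const std::string& dev) {
    std::cout << "app    : notify manager\n";
    std::cout << "manager: release " << dev << "\n";
}

int main() {
    const std::string device = "Device C";
    query_manager("funcX");     // execution phase, after boot and application build
    schedule_device(device);
    run_call(device);
    release_device(device);
    return 0;
}
```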

Slide 11: MPPC Runtime System (MPCRS)
- Manages discovery, binding, and scheduling of MPC nodes
  – At boot time, devices register their capabilities with the server
  – Each device will require a small kernel that boots, configures, and communicates with the server
- Provides a generalized interface to all MPC resources on the system
- Will use empirical or analytical performance models to select the device that maximizes the benefit-to-cost ratio
[Figure: the application on the host calls through an MPPC stub to the MPPC runtime manager, which reaches Device B through its own MPPC stub.]
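A small sketch of the boot-time registration and directory lookup described above, assuming a minimal record of device name plus advertised services; a real MPCRS would presumably also track device state, location, and load.

```cpp
// Toy directory service: devices register capabilities at boot, the host
// stub queries for candidates. Record layout and names are assumptions.
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct DeviceRecord {
    std::string name;                      // e.g. "FPGA-0"
    std::vector<std::string> services;     // capabilities announced at boot
};

class Directory {
    std::vector<DeviceRecord> devices_;
public:
    // Called by each device's small boot kernel after it configures itself.
    void register_device(DeviceRecord rec) { devices_.push_back(std::move(rec)); }

    // Called on behalf of the host stub to discover candidates for a service.
    std::vector<std::string> lookup(const std::string& service) const {
        std::vector<std::string> hits;
        for (const auto& d : devices_)
            for (const auto& s : d.services)
                if (s == service) hits.push_back(d.name);
        return hits;
    }
};

int main() {
    Directory dir;
    dir.register_device({"FPGA-0", {"fft1d", "matmul"}});
    dir.register_device({"Cell-0", {"matmul"}});
    for (const auto& name : dir.lookup("matmul"))
        std::cout << "matmul available on " << name << "\n";
    return 0;
}
```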

Slide 12: MPPC Dynamic Scheduling
- Scheduling and resource management dynamically service requests using a work queue
  – Devices are not statically allocated to an application for its entire lifetime
  – MPC devices service requests from multiple application nodes
- Advantages
  – Fine-grained scheduling of devices improves efficiency
  – Transparent load balancing
- Challenges
  – Fairness of scheduling
  – Security
  – More complex data movement
  – OS and TLB management
[Figure: MPPC work queue holding App A Request 1, App A Request 2, App A Request 3, App B Request 4, and App C Request 3, all feeding a single MPC device.]
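The following sketch illustrates the work-queue model: requests from several application nodes are queued and serviced in arrival order by a single MPC device that is never bound to one application. The request mix mirrors the slide's figure; the plain FIFO policy is a simplification, since the slide itself lists fairness as an open challenge.

```cpp
// Minimal work-queue illustration: one MPC device drains requests queued by
// multiple applications.
#include <iostream>
#include <queue>
#include <string>

struct Request { std::string app; int id; };

int main() {
    std::queue<Request> work_queue;
    // Requests arrive from different application nodes, as in the figure.
    work_queue.push({"App A", 1});
    work_queue.push({"App A", 2});
    work_queue.push({"App A", 3});
    work_queue.push({"App B", 4});
    work_queue.push({"App C", 3});

    // The device is not statically allocated; it services whatever is queued.
    while (!work_queue.empty()) {
        Request r = work_queue.front();
        work_queue.pop();
        std::cout << "MPC device servicing " << r.app << " request " << r.id << "\n";
    }
    return 0;
}
```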

Slide 13: Optimizations for MPPC Operation
- Zero-copy memory semantics on platforms that support globally addressable memory
  – Cray Rainier
  – SGI UV, RASC
- MPPC call aggregation to reduce the overhead of calling MPC devices
  – Use static analysis to collect and aggregate sequences of MPPC calls (a sketch of the batched-call idea follows)
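Below is a hedged sketch of what call aggregation could look like from the host side: several logically separate MPPC calls are collected into one batch so that per-call overhead (scheduling, transfer setup) is paid once. The Call structure and mppc_call_batch name are assumptions; the slide proposes building such batches automatically via static analysis rather than by hand.

```cpp
// Toy illustration of MPPC call aggregation into a single batched submission.
#include <iostream>
#include <string>
#include <vector>

struct Call { std::string service; std::vector<double> args; };

// Assumed aggregated entry point: one runtime round-trip for the whole batch.
void mppc_call_batch(const std::vector<Call>& batch) {
    std::cout << "submitting batch of " << batch.size() << " calls in one request\n";
    for (const auto& c : batch)
        std::cout << "  " << c.service << " (" << c.args.size() << " args)\n";
}

int main() {
    std::vector<Call> batch = {
        {"funcX", {1.0, 2.0}},
        {"funcY", {3.0}},
        {"funcX", {4.0, 5.0}},
    };
    mppc_call_batch(batch);   // one round-trip instead of three separate calls
    return 0;
}
```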

Slide 14: Summary
- Multi-Paradigm Computing systems are quickly becoming a reality
  – Demonstrated by wide vendor support and customer interest
  – Computational scientists need high-productivity programming support to make use of these systems
- We propose the Multi-paradigm Procedure Call (MPPC) as a solution to improve productivity on MPC systems
  – Uses familiar programming techniques, enabling high-productivity computing systems
  – Provides a runtime system and resource management for scheduling and resource discovery
  – Is an open protocol that can be supported by vendors, developers, and users