
Slide 1 Starbridge Viva™ Starbridge Solutions to Supercomputing Problems Reconfigurable Systems Summer Institute Esmail Chitalwala Starbridge Customer.


1 Slide 1 Starbridge Viva™ Starbridge Solutions to Supercomputing Problems Reconfigurable Systems Summer Institute Esmail Chitalwala Starbridge Customer Support and Services 12th July 2005

2 Slide 2 Outline
• Current problems faced by application designers:
– Code Development and Application Design
– Execution Environment
– Application Portability
– Application Speed-up and Performance
– Toolset
• Solution:
– Current emphasis - Development environment, programming tools
– Concern - Application speed-up
– Future directions …

3 Slide 3 Code Development
• Current HPC applications are designed using ‘C’ and ‘C’-based languages that execute serially on processors.
• Parallel computing languages and architectures, e.g., Unified Parallel C (UPC), MPI.
• Languages designed for developing applications to run on single or multiple processors, clusters, supercomputers.

4 Slide 4 Viva™ - Graphical Interface
• Windows-based application
– Menu/Toolbar
– Window Panes
• Object oriented
– Drag and drop
– Connect the dots
• Abstraction
– High level (“black box”)
– Low level (bits)

5 Slide 5 Viva™ - Graphical Interface

6 Slide 6 Viva™ - “3D Development”
[Figure: design hierarchy with Top Sheet, 2nd Level and 3rd Level; axes x,y and z]

7 Slide 7 Graphical Interface Advantages
• Capture native parallelism
• Tune algorithms for speed or space
• Interactively debug code running in hardware

8 Slide 8 Execution Environment
• Current generation of parallel computing applications is based on single or multiple processors, clusters, supercomputers.
• Next-generation processors comprise multiple cores on a single processor, allowing for parallel thread execution.
• Significant overheads in processing and transfer of data.
• Huge set-up costs in terms of space, time, power and money.

9 Slide 9 Execution Environment
• Reconfigurable FPGA-based computers already allow the creation of parallel execution modules.
• This could potentially allow the instantiation of multiple parallel execution modules depending on application scalability.
• Lower overheads when communicating and transferring data between modules.
• Significantly lower ownership, operation and maintenance costs.

10 Slide 10 Reconfigurable Computers
• Hypercomputer®
– 8 - Virtex II - 6000 (6M gates)
– 1 - Virtex II - Router
– 1 - Virtex II - Cross Point Switch
– 1 - Virtex II - PCI-X
– 36 GB RAM in 36 banks
[Figure labels: FPGA Virtex II 6000; 0.5 GB DDR RAM]

11 Slide 11 When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop. -- Alan Perlis

12 Slide 12 Application Portability
• No direct or straightforward path for application portability.
• What might help:
– Using Viva, there is no need to know Verilog/VHDL to design for FPGA hardware
– Abundance of design and application libraries to easily build newer optimized scalable applications for FPGA execution
– Allows existing VHDL/Verilog cores to be ported into the development environment
– Allows code portability across different hardware platforms

13 Slide 13 Porting to Viva®
[Diagram labels, in slide order: Algorithm analysis; Un-optimized; Design considerations; Parallelization; Internals; Multiple “pipes”; Hardware efficiency; I/O; Memory; Data width; Code/Test/Modify]

14 Slide 14 Design Flow in Viva®
[Flowchart, stages marked as Viva® or Xilinx: START → Load x86 System Description → Design Sheet (.IDL)/Project (.IPG) → Algorithm Implementation → Viva® synthesis → Functional Test and Simulation (NO/YES) → Load FPGA System Description → Viva® synthesis → Pass? (NO/YES) → Xilinx PAR → Timing, Area? (NO/YES) → END/RUN]

15 Slide 15 Viva®: Library and Composite Objects
• Contained within CoreLib.
• Composite objects consist of modules constructed using primitives, EDIF imports and other composite objects.
• Objects can be polymorphic or mapped to a particular data set.
• Contains modules with a host of functionality like logic gates, math operators, communication objects, memory modules and grammatical objects.

16 Slide 16 Simulation in X86 Environment
• The x86 SD is used in the initial stages of design to test functionality.
• Almost every object in CoreLib has an equivalent x86 SD for simulation.
• Runs on the microprocessor and provides accurate simulation of the design, ensuring successful place-and-route during synthesis.
• Performs functional simulation of the design.
• May not be cycle accurate.

17 Slide 17 Application Interface
• Viva provides a widget-based interface to the application, whether you are simulating or executing on the hardware.

18 Slide 18 Execution using Hardware specific System Description
• Contains objects and system level implementations mapped to specific components and primitives within FPGA system.
• All Library objects and components contain equivalent descriptions for each FPGA SD.
• Different SDs can be created using Viva® for different FPGA-based systems from other vendors.

19 Slide 19 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL Editor; System Definition; EDIF; HDL; X86; Xilinx Tools; Behavioral Communication System; FPGA System Description; Compiler]

20 Slide 20 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA System Description; Compiler; target: Hypercomputer HC-62]

21 Slide 21 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA System Description; Compiler; target: NASA RSC]

22 Slide 22 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA System Description; Compiler; target: SGI Athena]

23 Slide 23 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA System Description; Compiler; target: Nallatech]

24 Slide 24 Viva™ - COM/ActiveX Interface and ‘C’ API
• Provides link to/from host
– Data requests (e.g., File I/O) using COM or ‘C’ API (for HC-xx)
– Process “spawning” (e.g., multiple execution threads)

25 Slide 25 Viva Bridges to Existing Environments
[Diagram labels, in slide order: EDIF Import & Export; HDL code; EDIF; Import Process; Viva Primitive; Viva Design; Export Process; EDIF]

26 Slide 26 Application Speed-Up
[Diagram: factors contributing to Speed-Up]
• FPGA Clock Speed
– Design Complexity
– Operations
• IO (Communication) Speed
– PCI/PCI-X
– PCI Express
– JTAG
– Proprietary / Non-standard IO
• Parallelism within Algorithm
– Data dependency
– Loops/Iterations

27 Slide 27 Application speed-up
• Factors affecting application speed-up can be split into three broad categories:
– FPGA clock speed
– IO communication and bus speeds
– Parallelism within the algorithm being implemented

28 Slide 28 FPGA Clock speed
• FPGA clock speed directly relates to the speed of execution in hardware
• Higher FPGA clock speeds require more stringent design rules, heavy use of pipelining and potentially more area on the FPGA
• May increase synthesis and place-and-route time of applications
• The maximum clock speed at which an application can be clocked depends to a large extent on the complexity of the application

29 Slide 29 FPGA Clock Speed
• Viva allows the user to adjust the clock speed depending on the constraints and complexity of the algorithm being implemented
• Viva allows for quick synthesis, with a major portion of the time being spent in place and route
• Objects and libraries created in Viva support high clock speeds, removing one more barrier for an application designer

30 Slide 30 IO Communication and Bus Speeds
• IO bandwidth determines to a large extent the efficiency of the system
• Could potentially affect the processing rate on the FPGA
• A variety of protocols exist to facilitate IO communication between the host and the FPGA
• Some are industry standards, e.g., PCI, PCI-X, PCI-Express, VME, JTAG, etc.
• Others are non-standard or proprietary, employing innovative solutions to achieve high bandwidth
• Using industry standard protocols allows easy upgrade and use of COTS components

31 Slide 31 IO Communication and Bus Speeds
• The Hypercomputers use a standard PCI-X interface (66 MHz) to communicate with the host processors.
• The Hypercomputer itself could be placed in a PCI slot within any standard desktop or server configuration.
• Provides an easy path for migration from PCI to PCI-Express.
• Presence of external IO pins allows for real-time data acquisition and processing using FPGAs.

32 Slide 32 IO Communication and Bus Speeds
• Performance: HC-62
  Memory        76.0 GB/s
  Interconnect  12.7 GB/s
  Crosspoint    12.5 GB/s
  Router        12.5 GB/s
  External IO    8.5 GB/s
  PCI-X          200 MB/s
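
To put the HC-62 figures above in perspective, here is a rough back-of-the-envelope sketch of the ideal time needed to move a data set over each path. It assumes decimal units (1 GB = 1e9 bytes) and sustained peak rates, neither of which the slide states, and the 1 GB payload is purely hypothetical.

```python
# Peak bandwidths taken from the HC-62 performance figures, in bytes/second.
# Assumptions (not from the slides): decimal units, sustained peak rates.
BANDWIDTH_BYTES_PER_S = {
    "Memory": 76.0e9,
    "Interconnect": 12.7e9,
    "Crosspoint": 12.5e9,
    "Router": 12.5e9,
    "External IO": 8.5e9,
    "PCI-X": 200e6,
}


def transfer_seconds(num_bytes: float, path: str) -> float:
    """Ideal time to move num_bytes over the named path at peak rate."""
    return num_bytes / BANDWIDTH_BYTES_PER_S[path]


def slowest_path() -> str:
    """Name of the path with the lowest peak bandwidth."""
    return min(BANDWIDTH_BYTES_PER_S, key=BANDWIDTH_BYTES_PER_S.get)


if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB data set
    for path in BANDWIDTH_BYTES_PER_S:
        print(f"{path:12s} {transfer_seconds(payload, path):8.3f} s")
    print("Slowest path:", slowest_path())
```

Under these assumptions the host link is the narrowest path by a wide margin, which suggests that an application stays fast only if most data movement remains on the board rather than crossing the bus to the host.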

33 Slide 33 Parallelism within algorithm being implemented
• The advantage of reconfigurable hardware lies in the ability of the designer to unroll software loops and parallelize data-independent statements on the FPGA.

//Typical software loop
loop (1, 3) {
    statement 1;
    statement 2;
}

//Software loop unrolled
statement 1;
statement 2;
statement 1;
statement 2;
statement 1;
statement 2;
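
The unrolling shown on this slide can be sketched in ordinary software. In the minimal Python example below, `statement_1` and `statement_2` are hypothetical stand-ins for the slide's two statements. It only demonstrates that the rolled and unrolled forms compute the same result. Unrolling by itself does not make anything parallel; it exposes the individual operations so that independent ones can be placed side by side in hardware.

```python
def statement_1(x: int) -> int:
    """Hypothetical stand-in for 'statement 1'."""
    return x + 1


def statement_2(x: int) -> int:
    """Hypothetical stand-in for 'statement 2'."""
    return x * 2


def rolled(x: int) -> int:
    """Typical software loop: three iterations of the two statements."""
    for _ in range(3):
        x = statement_1(x)
        x = statement_2(x)
    return x


def unrolled(x: int) -> int:
    """The same work with the loop unrolled by hand."""
    x = statement_1(x)
    x = statement_2(x)
    x = statement_1(x)
    x = statement_2(x)
    x = statement_1(x)
    x = statement_2(x)
    return x


if __name__ == "__main__":
    print(rolled(1), unrolled(1))
```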

34 Slide 34 Parallelism within algorithm being implemented
[Diagrams: arrangement of Statement 1 and Statement 2 across three loop iterations for each case]
• Case 1: Statement 1 and 2 are dependent. Every iteration of the loop is dependent on the results from the previous one.
• Case 2: Statement 1 and 2 are independent. Every iteration of the loop is dependent on the results from the previous one.

35 Slide 35 Parallelism within algorithm being implemented
[Diagram: arrangement of Statement 1 and Statement 2 across three loop iterations]
• Case 3: Statement 1 and 2 are independent. Every iteration of the loop is independent from the results of the previous one.
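
Cases 1 to 3 can be compared with a small scheduling model. The sketch below is an illustration only: it assumes every statement takes one time step and that the hardware has room to run all independent statements at once, which the slides do not state. The function name and parameters are hypothetical.

```python
def time_steps(iterations: int,
               statements_independent: bool,
               iterations_independent: bool) -> int:
    """Sequential time steps for a loop body of two statements.

    Assumptions (illustrative): each statement takes one step, and
    independent work can always run side by side in hardware.
    """
    # Two independent statements share a step; dependent ones need two.
    steps_per_iteration = 1 if statements_independent else 2
    if iterations_independent:
        # All iterations run concurrently.
        return steps_per_iteration
    # Iterations must run one after another.
    return iterations * steps_per_iteration


if __name__ == "__main__":
    print("Case 1:", time_steps(3, False, False))
    print("Case 2:", time_steps(3, True, False))
    print("Case 3:", time_steps(3, True, True))
```

In this model the three-iteration loop from the earlier slide takes 6 steps in Case 1, 3 steps in Case 2 and 1 step in Case 3, so the achievable speed-up is set by the data dependencies rather than by the number of statements.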

36 Slide 36 Viva™ - Application Speed-up
• Smith-Waterman
– Pattern matching algorithm
– Multi-million gates (60-70M)
– Full HC-62 (10 FPGAs, 2 GB SDRAM)
– Compile time of 20 minutes
– 14.7 billion S-W steps/s
– 4 bits per character
– National Cancer Institute tests
  • Data load, process, visualize, single data set
  • 1M x 1M (Rat/Human): Starbridge approx. 5 min.; NCI approx. 24 hours; 288x performance
  • 167M x 47M (Human X/Y): Starbridge approx. 5.5 days; NCI N/A
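
For readers unfamiliar with Smith-Waterman, the following is a minimal software sketch of the local-alignment scoring recurrence. It is not the Viva implementation. The scoring values (match, mismatch, linear gap penalty) are illustrative assumptions, and the function name is hypothetical. Each inner-loop pass fills one cell of the scoring matrix, which is presumably what the slide counts as one "S-W step".

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    The scoring parameters are illustrative assumptions.
    Only the previous row is kept, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            cell = max(0, diag, up, left)  # scores never drop below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on the same anti-diagonal can be computed at the same time. That property is what makes the algorithm a natural fit for parallel hardware.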

37 Slide 37 Viva™ - Application Speed-up
• Traveling Salesman Problem (TSP)
– Multi-million gates (approx. 5.5M)
– Single HC-62 FPGA
– NASA tests
  • Base: 3.2 GHz Xeon w/compiler optimization
  • 65 city tour
  • Viva/FPGA: over 11x improvement

38 Slide 38 Future Direction
• Take the best of both worlds:
– Include a text-based programming interface to supplement the GUI
– Include a Petri-net based simulation environment for more accurate, fast and reliable simulation
– Create support for team-based development of FPGA-based modules
– Speed up place and route by employing processors within a network

39 Slide 39 Star Bridge Systems, Inc. Esmail Chitalwala echitalwala@starbridgesystems.com support@starbridgesystems.com “The computer is the first metamedium, and as such it has degrees of freedom for representation and expression never before encountered and as yet barely investigated.” - Alan Kay

