Published by Darcy Cook. Modified over 9 years ago.
Slide 1 Starbridge Viva™
Starbridge Solutions to Supercomputing Problems
Reconfigurable Systems Summer Institute
Esmail Chitalwala, Starbridge Customer Support and Services
12th July 2005
Slide 2 Outline
Current problems faced by application designers:
– Code Development and Application Design
– Execution Environment
– Application Portability
– Application Speed-up and Performance
– Toolset
Solution:
– Current emphasis: development environment, programming tools
– Concern: application speed-up
– Future directions …
Slide 3 Code Development
Current HPC applications are designed using ‘C’ and ‘C’-based languages that execute serially on processors.
Parallel computing languages and architectures, e.g., Unified Parallel C (UPC), MPI.
These languages are designed for developing applications that run on single or multiple processors, clusters, and supercomputers.
Slide 4 Viva™ - Graphical Interface
Windows-based application
– Menu/Toolbar
– Window Panes
Object oriented
– Drag and drop
– Connect the dots
Abstraction
– High level (“black box”)
– Low level (bits)
Slide 5 Viva™ - Graphical Interface
Slide 6 Viva™ - “3D Development”
[Figure: nested design sheets labeled Top Sheet, 2nd Level, 3rd Level; axes labeled x,y and z]
Slide 7 Graphical Interface Advantages
– Capture native parallelism
– Tune algorithms for speed or space
– Interactively debug code running in hardware
Slide 8 Execution Environment
The current generation of parallel computing applications is based on single or multiple processors, clusters, and supercomputers.
Next-generation processors place multiple cores on a single processor, allowing parallel thread execution.
Significant overhead in processing and transferring data.
Huge set-up costs in terms of space, time, power, and money.
Slide 9 Execution Environment
Reconfigurable FPGA-based computers already allow the creation of parallel execution modules.
This could potentially allow multiple parallel execution modules to be instantiated, depending on application scalability.
Lower overhead when communicating and transferring data between modules.
Significantly lower ownership, operation, and maintenance costs.
Slide 10 Reconfigurable Computers
Hypercomputer®
– 8 x Virtex II 6000 (6M gates)
– 1 x Virtex II (Router)
– 1 x Virtex II (Cross Point Switch)
– 1 x Virtex II (PCI-X)
– 36 GB RAM in 36 banks
[Figure labels: FPGA Virtex II 6000; 0.5 GB DDR RAM]
Slide 11 “When someone says ‘I want a programming language in which I need only say what I wish done,’ give him a lollipop.” - Alan Perlis
Slide 12 Application Portability
No direct or straightforward path for application portability.
What might help:
– With Viva, there is no need to know Verilog/VHDL to design for FPGA hardware
– An abundance of design and application libraries for easily building newer, optimized, scalable applications for FPGA execution
– Existing VHDL/Verilog cores can be ported into the development environment
– Code is portable across different hardware platforms
Slide 13 Porting to Viva®
– Algorithm analysis
– Un-optimized
– Design considerations
– Parallelization
– Internals
– Multiple “pipes”
– Hardware efficiency
– I/O
– Memory
– Data width
– Code/Test/Modify
Slide 14 Design Flow in Viva®
[Flowchart, steps in order:]
– START
– Load x86 System Description
– Design Sheet (.IDL)/Project (.IPG)
– Algorithm Implementation
– Viva® synthesis
– Functional Test and Simulation
– Load FPGA System Description
– Viva® synthesis
– Xilinx PAR
– END/RUN
[Decision points: “Pass ?” and “Timing, Area ?”, each with NO and YES branches. Legend: Viva® steps vs. Xilinx steps]
Slide 15 Viva®: Library and Composite Objects
Contained within CoreLib.
Composite objects consist of modules constructed using primitives, EDIF imports, and other composite objects.
Objects can be polymorphic or mapped to a particular data set.
Contains modules with a host of functionality, such as logic gates, math operators, communication objects, memory modules, and grammatical objects.
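The idea of building composite objects from primitives and from other composite objects can be illustrated outside Viva with an ordinary software sketch. The Python below is not Viva code and does not reflect how CoreLib is actually implemented; the gate functions, `full_adder`, and `ripple_adder` are hypothetical names chosen for illustration. The `width` parameter stands in, loosely, for an object that is not tied to one data size.

```python
# Illustrative only: plain Python, not Viva or CoreLib.

# "Primitives": single-bit logic gates.
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def xor_gate(a, b):
    return a ^ b

# "Composite object" built from primitives: a one-bit full adder.
def full_adder(a, b, carry_in):
    partial = xor_gate(a, b)
    total = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return total, carry_out

# Composite built from another composite: an adder of any bit width.
def ripple_adder(x, y, width):
    result, carry = 0, 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry  # (sum modulo 2**width, final carry-out)

if __name__ == "__main__":
    print(ripple_adder(5, 3, 4))
    print(ripple_adder(200, 100, 8))
```

The same `ripple_adder` serves 4-bit and 8-bit operands without being rewritten, which is the kind of reuse the slide attributes to library objects.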
Slide 16 Simulation in x86 Environment
The x86 SD (System Description) is used in the initial stages of design to test functionality.
Almost every object in CoreLib has an equivalent x86 SD implementation for simulation.
Runs on the microprocessor and provides accurate simulation of the design, helping ensure successful place-and-route during synthesis.
Performs functional simulation of the design; may not be cycle accurate.
Slide 17 Application Interface
Viva provides a widget-based interface to the application, whether you are simulating or executing on the hardware.
Slide 18 Execution using Hardware-specific System Description
Contains objects and system-level implementations mapped to specific components and primitives within the FPGA system.
All library objects and components contain equivalent descriptions for each FPGA SD.
Different SDs can be created using Viva® for different FPGA-based systems from other vendors.
Slide 19 Viva™ Execution Environment
[Diagram labels: CoreLib IIADL; Editor; System Definition; EDIF; HDL; X86; Xilinx Tools; Behavioral; Communication; System; FPGA System Description; Compiler]
Slide 20 Viva™ Execution Environment
[Same diagram: CoreLib IIADL; Editor; EDIF; HDL; X86; Xilinx Tools; FPGA System Description; Compiler. Target: Hypercomputer HC-62]
Slide 21 Viva™ Execution Environment
[Same diagram. Target: NASA RSC]
Slide 22 Viva™ Execution Environment
[Same diagram. Target: SGI Athena]
Slide 23 Viva™ Execution Environment
[Same diagram. Target: Nallatech]
Slide 24 Viva™ - COM/ActiveX Interface and ‘C’ API
Provides link to/from host
– Data requests (e.g., File I/O) using COM or ‘C’ API (for HC-xx)
– Process “spawning” (e.g., multiple execution threads)
Slide 25 Viva Bridges to Existing Environments
EDIF Import & Export
[Diagram labels: HDL code; EDIF; Import Process; Viva Primitive; Viva Design; Export Process; EDIF]
Slide 26 Application Speed-Up
[Diagram: Speed-Up depends on FPGA Clock Speed, IO (Communication) Speed, and Parallelism within Algorithm. Sub-labels: Design Complexity; Operations; PCI/PCI-X; PCI Express; JTAG; Proprietary / Non-standard IO; Data dependency; Loops/Iterations]
Slide 27 Application speed-up
Factors affecting application speed-up can be split into three broad categories:
– FPGA clock speed
– IO communication and bus speeds
– Parallelism within the algorithm being implemented
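A back-of-the-envelope model can show how the three factors interact. The sketch below is not from the presentation: the function name, its parameters, and the simplifying assumption that computation and IO overlap perfectly (so the slower of the two sets the runtime) are all illustrative.

```python
def estimated_runtime_s(operations, clock_hz, parallel_lanes,
                        bytes_moved, io_bytes_per_s):
    """Rough runtime estimate under strong simplifying assumptions.

    Assumes each parallel lane completes one operation per clock cycle
    and that IO fully overlaps with compute. Real designs will differ.
    """
    compute_s = operations / (clock_hz * parallel_lanes)  # clock speed and parallelism
    io_s = bytes_moved / io_bytes_per_s                   # IO and bus speed
    return max(compute_s, io_s)                           # the bottleneck dominates

if __name__ == "__main__":
    # Hypothetical workload: 1e9 operations, 100 MHz clock, 10 lanes,
    # 400 MB moved over a 200 MB/s link.
    print(estimated_runtime_s(1e9, 100e6, 10, 400e6, 200e6))
```

In this hypothetical case the computation needs 1 s but the data transfer needs 2 s, so adding more parallel lanes would not shorten the run; only a faster link would.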
Slide 28 FPGA Clock Speed
FPGA clock speed directly relates to the speed of execution in hardware.
Higher FPGA clock speeds require more stringent design rules, heavy use of pipelining, and potentially more area on the FPGA.
May increase the synthesis and place-and-route time of applications.
The maximum speed at which an application can be clocked depends to a large extent on the complexity of the application.
Slide 29 FPGA Clock Speed
Viva allows the user to adjust the clock speed depending on the constraints and complexity of the algorithm being implemented.
Viva allows for quick synthesis, with a major portion of the time being spent in place and route.
Objects and libraries created in Viva support high clock speeds, removing one more barrier for an application designer.
Slide 30 IO Communication and Bus Speeds
IO bandwidth determines to a large extent the efficiency of the system.
It could potentially limit the processing rate on the FPGA.
A variety of protocols exist to facilitate IO communication between the host and the FPGA.
Some are industry standards, e.g., PCI, PCI-X, PCI-Express, VME, JTAG, etc.
Others are non-standard or proprietary, employing innovative solutions to achieve high bandwidth.
Using industry-standard protocols allows easy upgrades and use of COTS components.
Slide 31 IO Communication and Bus Speeds
The Hypercomputers use a standard PCI-X interface (66 MHz) to communicate with the host processors.
The Hypercomputer itself could be placed in a PCI slot within any standard desktop or server configuration.
Provides an easy migration path from PCI to PCI-Express.
The presence of external IO pins allows for real-time data acquisition and processing using FPGAs.
Slide 32 IO Communication and Bus Speeds
Performance: HC-62
Memory         76.0 GB/s
Interconnect   12.7 GB/s
Crosspoint     12.5 GB/s
Router         12.5 GB/s
External IO     8.5 GB/s
PCI-X           200 MB/s
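To put the HC-62 figures in perspective, the sketch below estimates how long a fixed amount of data would take to cross each link. The bandwidths are the ones listed on the slide; the helper names, the 1 GB payload, and the reading of GB and MB as decimal units (10^9 and 10^6 bytes) are assumptions, and the estimate ignores protocol overhead and latency.

```python
# Peak bandwidths from the HC-62 slide, in bytes per second.
# Assumption: GB = 1e9 bytes and MB = 1e6 bytes.
HC62_BANDWIDTH = {
    "Memory": 76.0e9,
    "Interconnect": 12.7e9,
    "Crosspoint": 12.5e9,
    "Router": 12.5e9,
    "External IO": 8.5e9,
    "PCI-X": 200e6,
}

def transfer_seconds(num_bytes, link):
    """Idealized transfer time: payload size divided by peak bandwidth."""
    return num_bytes / HC62_BANDWIDTH[link]

def slowest_link():
    """Name of the link with the lowest listed bandwidth."""
    return min(HC62_BANDWIDTH, key=HC62_BANDWIDTH.get)

if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB data set
    for link in HC62_BANDWIDTH:
        print(f"{link:13s} {transfer_seconds(payload, link):8.4f} s")
    print("Slowest link:", slowest_link())
```

On these numbers the host link is the narrowest path by a wide margin (76.0 GB/s is 380 times 200 MB/s), which suggests why keeping data resident on the board between processing steps matters.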
Slide 33 Parallelism within algorithm being implemented
The advantage of reconfigurable hardware lies in the ability of the designer to unroll software loops and parallelize data-independent statements on the FPGA.
    //Typical software loop
    loop (1, 3) {
        statement 1;
        statement 2;
    }
    //Software loop unrolled
    statement 1; statement 2;
    statement 1; statement 2;
    statement 1; statement 2;
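The unrolling shown on the slide can be made concrete in ordinary software. In the sketch below, `statement_1` and `statement_2` are hypothetical stand-ins for the slide's two statements; the point is only that the rolled and unrolled forms compute the same results, with the unrolled form exposing six separate operations that hardware could, dependencies permitting, place side by side.

```python
def statement_1(x):
    # Hypothetical stand-in for "statement 1".
    return x + 1

def statement_2(x):
    # Hypothetical stand-in for "statement 2".
    return x * 2

def rolled(values):
    """Software loop: one loop body reused for each of the iterations."""
    results = []
    for v in values:
        results.append((statement_1(v), statement_2(v)))
    return results

def unrolled(values):
    """The same three iterations written out explicitly."""
    v0, v1, v2 = values
    return [
        (statement_1(v0), statement_2(v0)),
        (statement_1(v1), statement_2(v1)),
        (statement_1(v2), statement_2(v2)),
    ]

if __name__ == "__main__":
    data = [1, 2, 3]
    print(rolled(data) == unrolled(data))
```

In software the unrolled version still runs one statement after another; the benefit on an FPGA comes from instantiating each written-out statement as its own piece of hardware.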
Slide 34 Parallelism within algorithm being implemented
[Figure: execution order of Statement 1 and Statement 2 over the three loop iterations, for Case 1 and Case 2]
Case 1: Statements 1 and 2 are dependent. Every iteration of the loop is dependent on the results from the previous one.
Case 2: Statements 1 and 2 are independent. Every iteration of the loop is dependent on the results from the previous one.
Slide 35 Parallelism within algorithm being implemented
[Figure: execution order of Statement 1 and Statement 2 over the three loop iterations, for Case 3]
Case 3: Statements 1 and 2 are independent. Every iteration of the loop is independent of the results of the previous one.
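The three cases differ in how many time steps the unrolled loop needs. The sketch below counts steps under idealized assumptions that are not stated in the slides: every statement takes exactly one time step, and the hardware has room for as many copies of a statement as are needed. The function name and parameters are hypothetical.

```python
def schedule_steps(iterations, statements_independent, iterations_independent,
                   statements_per_iteration=2):
    """Idealized number of time steps for the unrolled loop.

    Assumes one time step per statement and unlimited hardware resources.
    """
    # Independent statements within an iteration can run side by side.
    per_iteration = 1 if statements_independent else statements_per_iteration
    # Independent iterations can all run at the same time.
    return per_iteration if iterations_independent else per_iteration * iterations

if __name__ == "__main__":
    n = 3  # the loop on the slides runs three times
    print("Case 1:", schedule_steps(n, False, False))
    print("Case 2:", schedule_steps(n, True, False))
    print("Case 3:", schedule_steps(n, True, True))
```

Under these assumptions the three-iteration loop takes 6 steps in Case 1, 3 steps in Case 2, and 1 step in Case 3, so the gain from unrolling depends entirely on the dependencies in the algorithm.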
Slide 36 Viva™ - Application Speed-up
Smith-Waterman
o Pattern matching algorithm
o Multi-million gates (60-70M)
o Full HC-62 (10 FPGAs, 2 GB SDRAM)
o Compile time of 20 minutes
o 14.7 billion S-W steps/s
o 4 bits per character
o National Cancer Institute Tests
Data load, process, visualize, single data set
1M x 1M (Rat/Human)
  Starbridge: approx. 5 min.
  NCI: approx. 24 hours
  288x performance
167M x 47M (Human X/Y)
  Starbridge: approx. 5.5 days
  NCI: N/A
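For readers unfamiliar with Smith-Waterman, the sketch below computes the best local alignment score between two strings in plain Python. It is a textbook formulation, not the Starbridge implementation; the scoring values (match 2, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    Textbook dynamic programming with a linear gap penalty, keeping only
    two rows of the score matrix in memory. Scoring values are illustrative.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may restart anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbors, so all cells on the same anti-diagonal are independent of one another and can be computed at the same time, which is what makes the algorithm a good fit for parallel hardware. The 288x figure on the slide follows from the reported times: 24 hours is 1440 minutes, and 1440 / 5 = 288.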
Slide 37 Viva™ - Application Speed-up
Traveling Salesman Problem (TSP)
o Multi-million gates (approx. 5.5M)
o Single HC-62 FPGA
o NASA Tests
Base: 3.2 GHz Xeon w/compiler optimization
65-city tour
Viva/FPGA: over 11x improvement
Slide 38 Future Direction
Take the best of both worlds:
– Include a text-based programming interface to supplement the GUI
– Include a Petri-net-based simulation environment for more accurate, fast, and reliable simulation
– Create support for team-based development of FPGA-based modules
– Speed up place-and-route time by employing processors within a network
Slide 39 Star Bridge Systems, Inc.
Esmail Chitalwala
echitalwala@starbridgesystems.com
support@starbridgesystems.com
“The computer is the first metamedium, and as such it has degrees of freedom for representation and expression never before encountered and as yet barely investigated.” - Alan Kay