
Slide 1 Starbridge Viva™: Starbridge Solutions to Supercomputing Problems
Reconfigurable Systems Summer Institute
Esmail Chitalwala, Starbridge Customer Support and Services
12th July 2005

Slide 2 Outline
• Current problems faced by application designers:
  – Code development and application design
  – Execution environment
  – Application portability
  – Application speed-up and performance
  – Toolset
• Solution:
  – Current emphasis: development environment, programming tools
  – Concern: application speed-up
  – Future directions …

Slide 3 Code Development
• Current HPC applications are designed using C and C-based languages that execute serially on processors.
• Parallel programming models exist, e.g., Unified Parallel C (UPC) and MPI.
• These are designed for developing applications that run on single or multiple processors, clusters, and supercomputers.

Slide 4 Viva™ - Graphical Interface
• Windows-based application
  – Menu/toolbar
  – Window panes
• Object oriented
  – Drag and drop
  – Connect the dots
• Abstraction
  – High level (“black box”)
  – Low level (bits)

Slide 5 Viva™ - Graphical Interface

Slide 6 Viva™ - “3D Development”
(Diagram: top sheet, 2nd-level and 3rd-level sheets; axes labelled x, y and z)

Slide 7 Graphical Interface Advantages
• Capture native parallelism
• Tune algorithms for speed or space
• Interactively debug code running in hardware

Slide 8 Execution Environment
• The current generation of parallel computing applications is based on single or multiple processors, clusters, and supercomputers.
• Next-generation processors integrate multiple cores on a single chip, allowing parallel thread execution.
• Data processing and data transfer carry significant overheads.
• Set-up costs are high in terms of space, time, power, and money.

Slide 9 Execution Environment
• Reconfigurable FPGA-based computers already allow the creation of parallel execution modules.
• This could allow multiple parallel execution modules to be instantiated, depending on how well the application scales.
• Communication and data transfer between modules have lower overhead.
• Ownership, operation, and maintenance costs are significantly lower.

Slide 10 Reconfigurable Computers
• Hypercomputer®
  – 8 × Virtex II 6000 (6M gates)
  – 1 × Virtex II router
  – 1 × Virtex II cross point switch
  – 1 × Virtex II PCI-X
  – 36 GB RAM in 36 banks
(Diagram: Virtex II FPGAs with DDR RAM)

Slide 11 When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop. -- Alan Perlis

Slide 12 Application Portability
• There is no direct or straightforward path to application portability.
• What might help:
  – With Viva, no knowledge of Verilog/VHDL is needed to design for FPGA hardware
  – An abundance of design and application libraries makes it easy to build new, optimized, scalable applications for FPGA execution
  – Existing VHDL/Verilog cores can be ported into the development environment
  – Code is portable across different hardware platforms

Slide 13 Porting to Viva®
• Algorithm analysis
  – Un-optimized
• Design considerations
  – Parallelization: internals, multiple “pipes”
  – Hardware efficiency: I/O, memory, data width
• Code/Test/Modify

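Slide 14 Design Flow in Viva®
START
1. Load the x86 System Description.
2. Design sheet (.IDL) / project (.IPG): algorithm implementation.
3. Viva® synthesis.
4. Functional test and simulation. Pass? NO: return to the design. YES: continue.
5. Load the FPGA System Description.
6. Viva® synthesis. Pass? NO: return to the design. YES: continue.
7. Xilinx place and route (PAR).
8. Timing and area met? NO: return to the design. YES: END/RUN.
(The flowchart legend distinguishes Viva® steps from Xilinx steps.)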

Slide 15 Viva®: Library and Composite Objects
• Contained within CoreLib.
• Composite objects are modules built from primitives, EDIF imports, and other composite objects.
• Objects can be polymorphic or mapped to a particular data set.
• CoreLib contains modules covering logic gates, math operators, communication objects, memory modules, and grammatical objects.

Slide 16 Simulation in the x86 Environment
• The x86 System Description (SD) is used in the initial stages of design to test functionality.
• Almost every object in CoreLib has an x86 SD equivalent for simulation.
• Simulation runs on the microprocessor and accurately checks the design's function, helping ensure a working design before synthesis and place-and-route.
• Performs functional simulation of the design.
• May not be cycle accurate.

Slide 17 Application Interface
• Viva provides a widget-based interface to the application, whether you are simulating or executing on the hardware.

Slide 18 Execution Using a Hardware-Specific System Description
• Contains objects and system-level implementations mapped to specific components and primitives within the FPGA system.
• All library objects and components contain equivalent descriptions for each FPGA SD.
• Different SDs can be created using Viva® for FPGA-based systems from other vendors.

Slide 19 Viva™ Execution Environment
(Diagram components: CoreLib, IIADL Editor, System Definition, EDIF, HDL, x86, Xilinx Tools, Behavioral, Communication, System, FPGA System Description, Compiler)

Slide 20 Viva™ Execution Environment
(Diagram: CoreLib, IIADL Editor, EDIF, HDL, x86, Xilinx Tools, FPGA System Description, Compiler; target: Hypercomputer HC-62)

Slide 21 Viva™ Execution Environment
(Diagram: CoreLib, IIADL Editor, EDIF, HDL, x86, Xilinx Tools, FPGA System Description, Compiler; target: NASA RSC)

Slide 22 Viva™ Execution Environment
(Diagram: CoreLib, IIADL Editor, EDIF, HDL, x86, Xilinx Tools, FPGA System Description, Compiler; target: SGI Athena)

Slide 23 Viva™ Execution Environment
(Diagram: CoreLib, IIADL Editor, EDIF, HDL, x86, Xilinx Tools, FPGA System Description, Compiler; target: Nallatech)

Slide 24 Viva™ - COM/ActiveX Interface and C API
• Provides a link to/from the host
  – Data requests (e.g., file I/O) using COM or the C API (for HC-xx)
  – Process “spawning” (e.g., multiple execution threads)

Slide 25 Viva Bridges to Existing Environments
• EDIF import and export of HDL code
(Diagram: EDIF → import process → Viva primitive; Viva design → export process → EDIF)

Slide 26 Application Speed-Up
(Diagram of speed-up factors)
• FPGA clock speed: design complexity, operations
• IO (communication) speed: PCI/PCI-X, PCI Express, JTAG, proprietary/non-standard IO
• Parallelism within the algorithm: data dependency, loops/iterations

Slide 27 Application Speed-Up
• Factors affecting application speed-up fall into three broad categories:
  – FPGA clock speed
  – IO communication and bus speeds
  – Parallelism within the algorithm being implemented

Slide 28 FPGA Clock Speed
• FPGA clock speed relates directly to the speed of execution in hardware.
• Higher FPGA clock speeds require more stringent design rules, heavy use of pipelining, and potentially more area on the FPGA.
• Higher clock targets may increase synthesis and place-and-route times.
• The maximum clock speed an application can reach depends largely on its complexity.

Slide 29 FPGA Clock Speed
• Viva allows the user to adjust the clock speed according to the constraints and complexity of the algorithm being implemented.
• Viva synthesis is quick; most of the build time is spent in place and route.
• Objects and libraries created in Viva support high clock speeds, removing one more barrier for the application designer.

Slide 30 IO Communication and Bus Speeds
• IO bandwidth largely determines the efficiency of the system.
• It can limit the processing rate on the FPGA.
• A variety of protocols exist for IO communication between the host and the FPGA:
  – Some are industry standards, e.g., PCI, PCI-X, PCI Express, VME, JTAG
  – Others are non-standard or proprietary, using innovative solutions to achieve high bandwidth
• Using industry-standard protocols allows easy upgrades and the use of COTS components.

Slide 31 IO Communication and Bus Speeds
• The Hypercomputers use a standard PCI-X interface (66 MHz) to communicate with the host processors.
• The Hypercomputer itself can be placed in a PCI slot within any standard desktop or server configuration.
• This provides an easy migration path from PCI to PCI Express.
• External IO pins allow real-time data acquisition and processing on the FPGAs.

Slide 32 IO Communication and Bus Speeds
Performance, HC-62:
  Path            Bandwidth
  Memory          76.0 GB/s
  Interconnect    12.7 GB/s
  Crosspoint      12.5 GB/s
  Router          12.5 GB/s
  External IO      8.5 GB/s
  PCI-X            200 MB/s

Slide 33 Parallelism within the algorithm being implemented
• The advantage of reconfigurable hardware lies in the designer's ability to unroll software loops and parallelize data-independent statements on the FPGA.

  // Typical software loop: three iterations
  for (i = 1; i <= 3; i++) {
      statement 1;
      statement 2;
  }

  // Software loop unrolled
  statement 1; statement 2;
  statement 1; statement 2;
  statement 1; statement 2;

Slide 34 Parallelism within the algorithm being implemented
(Diagram: Statement 1 / Statement 2 pairs for the three iterations, arranged for each case)
• Case 1: Statements 1 and 2 are dependent. Every iteration of the loop depends on the results of the previous one.
• Case 2: Statements 1 and 2 are independent. Every iteration of the loop depends on the results of the previous one.

Slide 35 Parallelism within the algorithm being implemented
(Diagram: three Statement 1 / Statement 2 pairs)
• Case 3: Statements 1 and 2 are independent. Every iteration of the loop is independent of the results of the previous one.

Slide 36 Viva™ - Application Speed-Up
• Smith-Waterman
  – Pattern matching algorithm
  – Multi-million gates (60-70M)
  – Full HC-62 (10 FPGAs, 2 GB SDRAM)
  – Compile time of 20 minutes
  – 14.7 billion S-W steps/s
  – 4 bits per character
  – National Cancer Institute tests
    • Data load, process, visualize, single data set
    • 1M x 1M (Rat/Human): Starbridge approx. 5 min; NCI approx. 24 hours; 288X performance (24 hours = 1440 min, and 1440 / 5 = 288)
    • 167M x 47M (Human X/Y): Starbridge approx. 5.5 days; NCI: N/A

Slide 37 Viva™ - Application Speed-Up
• Traveling Salesman Problem (TSP)
  – Multi-million gates (approx. 5.5M)
  – Single HC-62 FPGA
  – NASA tests
    • Base: 3.2 GHz Xeon with compiler optimization, 65-city tour
    • Viva/FPGA: over 11x improvement

Slide 38 Future Direction
• Take the best of both worlds:
  – Include a text-based programming interface to supplement the GUI
  – Include a Petri-net-based simulation environment for more accurate, faster, and more reliable simulation
  – Support team-based development of FPGA-based modules
  – Speed up place and route by distributing it across networked processors

Slide 39 Star Bridge Systems, Inc.
Esmail Chitalwala
“The computer is the first metamedium, and as such it has degrees of freedom for representation and expression never before encountered and as yet barely investigated.” - Alan Kay