Reconfigurable Computing: HPC Network Aspects Mitch Sukalski (8961) David Thompson (8963) Craig Ulmer (8963) Pete Dean R&D Seminar December.

Presentation transcript:

Reconfigurable Computing: HPC Network Aspects
Mitch Sukalski (8961), David Thompson (8963), Craig Ulmer (8963)
Pete Dean R&D Seminar, December 11, 2003

FPGAs are promising… But what’s the catch?
Three main challenges must be addressed before FPGAs can be applied to practical scientific computing.

RC Challenge #1: Floating Point
Most FPGAs are fine grained
Floating point units are large
– 32b FP occupies ~1,000 CLBs
– Commercial capacity improving
  2000: 6,000 CLBs
  2003: 40,000 CLBs (Max: 220,000)
Keith Underwood at Sandia/NM
– LDRD: Working on high-speed 64b floating-point cores
[Figure: 32b FP in Xilinx V2P7]
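The capacity figures above can be turned into a rough upper bound on how many 32-bit floating-point units fit on a device. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted on the slide; the function and variable names are illustrative, and it ignores the CLBs consumed by routing, control, and I/O logic.

```python
# Rough capacity estimate using only the figures quoted on the slide.
CLBS_PER_FP32_UNIT = 1_000  # "~1,000 CLBs" per 32-bit FP unit (approximate)

# CLB counts as quoted; the labels are illustrative names, not part numbers.
DEVICE_CLBS = {
    "2000": 6_000,
    "2003": 40_000,
    "2003 (max)": 220_000,
}

def max_fp_units(total_clbs, clbs_per_unit=CLBS_PER_FP32_UNIT):
    """Upper bound on FP units, assuming every CLB is available for them."""
    return total_clbs // clbs_per_unit

for label, clbs in DEVICE_CLBS.items():
    print(f"{label}: at most {max_fp_units(clbs)} 32-bit FP units")
```

Even under this optimistic assumption, a 2000-era part holds only a handful of units, which is why the growth in capacity matters for scientific codes.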

RC Challenge #2: Design Tools
Hardware design is non-trivial
– Micromanage computations, clock-by-clock
– Not appropriate for most scientists
– Need languages, APIs that are easy to use
Maya Gokhale at LANL
– Streams-C: C-like language for HW design
– Pipeline/unroll loops
– Schedules access to external memory

RC Challenge #3: High-speed I/O
FPGAs have large internal computational power
– How do we get data into/out of FPGA?
– How do we connect to our existing HPC machines?
Mitch Sukalski, David Thompson, Craig Ulmer
– LDRD: Connect FPGAs to high-performance SANs
[Diagram label: FPGA]

Outline
Where we have been
– Networking FPGAs using external NI cards
Where we are going
– Networking FPGAs using internal transceivers
Project status
– Early details

Previous Work
Where we’ve been…

Networking Earlier FPGAs
Previous generation of FPGAs were like blank ASICs
– Configurable logic and pins
Attach a network card to an FPGA card
– Communication over PCI
Examples:
– Virginia Tech: Myrinet
– Washington U. in St. Louis: ATM (inline)
– Clemson University: Gigabit Ethernet
– Georgia Tech: Myrinet
[Diagram labels: CPU, FPGA, NIC, PCI Bus]

GRIM Project at Georgia Tech
Add multimedia devices to cluster
– Message layer connects CPUs, memory, and peripherals
– Myrinet between hosts, PCI within hosts
Celoxica RC-1000 FPGA
– Virtex FPGA (1M logic gates)
– Four SRAM banks
– PCI w/ PMC
[Diagram labels: SRAM 0, SRAM 1, SRAM 2, SRAM 3, PCI, FPGA, Control & Switching; CPU, FPGA, RAID, FPGA, Ethernet, GRIM]

FPGA Organization
[Diagram labels: Frame; Incoming Message Queues; Outgoing Message Queues; Communication Library API; Application Data; Memory API; FPGA Card Memory; FPGA Circuit Canvas; User Circuit API; User Circuit 1 … User Circuit n]
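The organization on this slide (message queues on one side, card memory and user circuits on the other, with the frame in between) can be modeled in software. The following is a toy sketch, not the GRIM implementation: the class, method names, and message layout are all hypothetical, and a Python function stands in for each hardware user circuit.

```python
from collections import deque

class Frame:
    """Toy software model of the frame: queues, card memory, user circuits."""

    def __init__(self, memory_size=1024):
        self.incoming = deque()               # incoming message queue
        self.outgoing = deque()               # outgoing message queue
        self.memory = bytearray(memory_size)  # stands in for FPGA card memory
        self.circuits = {}                    # name -> callable ("user circuit")

    def load_circuit(self, name, fn):
        """Place a user circuit on the (modeled) circuit canvas."""
        self.circuits[name] = fn

    def post(self, circuit, addr, length, reply_to):
        """Enqueue a request: run `circuit` on memory[addr:addr+length]."""
        self.incoming.append((circuit, addr, length, reply_to))

    def step(self):
        """Process one incoming message; return False if the queue is empty."""
        if not self.incoming:
            return False
        circuit, addr, length, reply_to = self.incoming.popleft()
        data = bytes(self.memory[addr:addr + length])
        result = self.circuits[circuit](data)
        self.outgoing.append((reply_to, result))
        return True

frame = Frame()
frame.memory[0:4] = bytes([1, 2, 3, 4])
frame.load_circuit("sum", lambda data: sum(data))
frame.post("sum", addr=0, length=4, reply_to="host0")
frame.step()
print(frame.outgoing[0])
```

The point of the model is the isolation: user circuits see only the data handed to them, while the frame owns the queues and the memory.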

Lessons Learned
Frame provides simple OS
– Isolates users from board
– Portability
Dynamically manage resources
– Card memory
– Computational circuits
PCI bottleneck
– Distance between NI and FPGA
– PCI difficult to work with
[Diagram labels: Page A, SRAM 1; Page B, SRAM 2; Host CPU; FPGA; Circuit X, Circuit Y; Circuit E, Circuit F, Circuit G; Message: Use Circuit F on $C; Function Fault; Page Fault; Page C; NIC]
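The diagram on this slide shows a message ("Use Circuit F on $C") that triggers both a function fault (Circuit F is not loaded) and a page fault (Page C is not in card memory). A minimal sketch of that idea follows. It assumes a fixed number of slots and least-recently-used eviction; the slide does not state the replacement policy, and the class and names here are hypothetical.

```python
from collections import OrderedDict

class ResourceCache:
    """Fixed number of slots with LRU eviction (the policy is an assumption)."""

    def __init__(self, slots):
        self.slots = slots
        self.resident = OrderedDict()  # name -> True, ordered oldest first
        self.faults = 0

    def use(self, name):
        """Touch a resource; return True if this access caused a fault."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return False
        self.faults += 1
        if len(self.resident) >= self.slots:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[name] = True
        return True

# Two circuit slots and two SRAM pages, initially holding X, Y and A, B.
circuits = ResourceCache(slots=2)
pages = ResourceCache(slots=2)
circuits.use("X")
circuits.use("Y")
pages.use("A")
pages.use("B")

# Message: use Circuit F on page C. Neither is resident, so both fault.
function_fault = circuits.use("F")
page_fault = pages.use("C")
print(function_fault, page_fault, list(circuits.resident), list(pages.resident))
```

In the real system each fault implies a transfer from the host across PCI, which is one reason the PCI bottleneck hurts.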

Network Features of Recent FPGAs
Where we’re going…

FPGA Network Improvements
Recent FPGAs have special, built-in cores
– High-speed transceivers, dedicated processors
Idea: Build our NI inside the FPGA
– FPGA becomes a networked compute resource
– Removes the PCI bottleneck
[Diagram labels: FPGA with two NIs (Tx/Rx) and User-defined Computational Circuits; System Area Network; three CPU/NIC hosts]

Xilinx Virtex-II/Pro FPGA
Up to 4 PowerPC405 cores
– Embedded version of PPC – MHz
Multiple gigabit transceivers
– Run at 600Mbps to 3.125Gbps
– Up to twenty-four transceivers
Additional cores
– Distributed internal memory
– Arrays of 18b multipliers
– Digital clock multipliers, PLLs
[Figure: Xilinx V2P20]

Multi-Gigabit Transceivers: Rocket I/O
Flexible, high-speed transceivers
– Can be configured to connect with different physical layers
– InfiniBand, GigE, FC, 10GigE, Aurora
– Note: low-level interface (commas, disparity, clock mismatches)
[Diagram labels: FPGA Fabric; Tx FIFO; CRC; 8B/10B Encoder; Serializer; PIN; Deserializer; Clock Recover; 8B/10B Decoder; Rx Elastic Buffer; CRC check; FPGA Fabric; three further Rocket I/O blocks, each with a PIN]
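Because the transceiver path includes an 8B/10B encoder, the usable payload rate is lower than the line rate: 8B/10B sends 10 line bits for every 8 data bits. The sketch below works through that arithmetic using the 600 Mbps to 3.125 Gbps range and the 24-transceiver maximum quoted on the Virtex-II/Pro slide. It assumes 8B/10B is enabled on every link and ignores framing, CRC, and protocol overhead; the names are illustrative.

```python
def payload_rate_gbps(line_rate_gbps):
    """Payload rate after 8B/10B coding: 8 data bits per 10 line bits."""
    return line_rate_gbps * 8 / 10

MAX_LINE_RATE_GBPS = 3.125   # upper end of the quoted range
MIN_LINE_RATE_GBPS = 0.6     # "600Mbps", lower end of the quoted range
MAX_TRANSCEIVERS = 24        # "Up to twenty-four transceivers"

per_link = payload_rate_gbps(MAX_LINE_RATE_GBPS)
aggregate = per_link * MAX_TRANSCEIVERS  # all links at full rate, one direction

print(f"Payload per link at {MAX_LINE_RATE_GBPS} Gbps line rate: {per_link} Gbps")
print(f"Payload per link at {MIN_LINE_RATE_GBPS} Gbps line rate: "
      f"{payload_rate_gbps(MIN_LINE_RATE_GBPS):.2f} Gbps")
print(f"Aggregate over {MAX_TRANSCEIVERS} links: {aggregate} Gbps")
```

The aggregate figure is an upper bound for a fully populated device; a real design is also limited by how fast the fabric can feed each Tx FIFO.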

Why MGTs are Important
Direct connection to networks
– Same chip, different network
– Remove PCI from equation
Fast connections between FPGAs
– Reduces analog design issues
– Chain FPGAs together
– Reduce pin count
Update: Virtex II/ProX
– Now Gbps – Gbps
– Chips have either 8 or 20 transceivers
[Figure: Gbps over 44” FR4 *]
* From Xilinx,

Hard PowerPC Core
PowerPC 405
– 16KB Instruction / 16KB Data caches
– Real and Virtual memory modes
– GCC is available
Multiple memory ports for core
– On-chip memory (OCM)
– Processor Local Bus (PLB)
User-defined memory map
– Connect memory blocks or cores
– External memory cores available
[Diagram labels: PowerPC; I-Cache; D-Cache; On-Chip Memory (OCM) Interface; Processor Local Bus (PLB)]

System on a Chip (SoC)
Commercial SoC
– Designing with cores
– Customize system
New tools
– Rapidly connect cores
– Library of cores & buses
– Saves on wiring legwork
[Figure: Xilinx Platform Studio]

Current Status
Exploring V2P
– New architecture, new tools
Two reference boards
– ML300 (V2P7-6)
– Avnet (V2P20-6)
Transceiver work
– Raw transmission over fiber
– Working towards IB