© 2006 Regents University of California. All Rights Reserved RAMP Blue Status Andrew Schultz, John Wawrzynek June 21, 2006 RAMP MIT Summer Workshop.

Presentation transcript:

RAMP Blue Status
Andrew Schultz, John Wawrzynek
June 21, 2006, RAMP MIT Summer Workshop

Contributors
Andrew Schultz, Dave Patterson, and the Spring 2006 CS252 (grad computer architecture) class:
Mitch Harwell, David Tylman, Xiaofen Jiang, Neelima Balakrishnan, Khang Tran, Matt Brockmeyer, Marghoob Mohiyuddin, Jue Sun, Zhangxi Tan, Wei Xu, Gary Voronel, Luke Beamer

Outline
Review of project goal and requirements
RAMP Blue Architecture
– Design principles
– Processor infrastructure
– Network interface and on-chip switch
– Double precision floating point
– Software support
Implementation experience
Future work

Project Goal and Requirements
Goal: 1000-node cluster of MicroBlaze cores running uClinux and real MPI benchmarks
Requirements:
– Infrastructure to boot uClinux on MicroBlaze cores situated on BEE2 user FPGAs
– Double precision floating point unit for real MPI benchmarks
– On-chip switch capable of routing packets between FPGAs on and off module
– Port of message passing framework (MPI, UPC, etc.)

BEE2 Module Design
[Figure: BEE2 module with five 2VP70 FPGAs, system I/O, and inter-module connections]
Per module:
– 5 Virtex-II Pro 70 FPGAs
– 20 GB DRAM
– 20 10 Gbps connections
Supports 10GigE/InfiniBand
RAMP Blue maps target MBs to the four “user” FPGAs, and uses the hard PowerPC on the “control” FPGA as host maintenance processor.

Andrew’s Design Principles
KISS: We tried to keep everything simple. Don’t over-engineer the network, FPU, or infrastructure until we have a working design.
Share the wealth: Resources are tight and MicroBlazes are wimpy. Share infrastructure such as interchip pins, memory controllers, and even FPUs.
Cut the fat: Wherever possible, remove logic and interfaces that MicroBlaze does not require in this context.
FSL everywhere: FSL is simply FIFO-based communication (very similar to a very basic RAMP channel). It eases routing and provides easy migration to RDL.

Processor Interfaces

Console Network
Console network serves several purposes:
– Download application/kernel from control FPGA
– Provide terminal to booted uClinux
– Network conduit to route packets from MB to control FPGA (or even off board via 10/100 Ethernet)
Simple, general purpose, FSL-based network with OPB FIFO attachment at PPC
Linux driver for TTY, char device, and Ethernet abstraction

MB/MB Network Interface
Current network interface is raw FSL connected directly to an on-chip switch
– Interrupt-driven, programmed I/O approach
– Simple Linux driver provides Ethernet interface so applications can use the network via the traditional socket interface
– Very inefficient, yet very simple for a first network implementation
Discussion and paper design of second generation network interface
– Direct memory access through direct port to memory controller
– Possible RDMA support for UPC as well

On-Chip Switch
Switch provides drop-free transmission of variable length packets from MB to MB
Composed of two units: buffer unit and switch
– Buffer unit provides buffering at each hop and address lookup logic
– Switch provides cross-bar connectivity between input ports and output ports and arbitration for each port
Packets are source routed (currently encapsulated Ethernet packets)
CRCs are end-to-end, so end-points must manage retransmits or fail-stop
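
The source routing and end-to-end CRC described on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Packet` layout, the function names, and the use of CRC-32 from `zlib` are hypothetical and chosen for illustration; the slides do not specify the header format or the CRC polynomial used in RAMP Blue.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    route: List[int]  # remaining output ports, one per hop (source routing)
    payload: bytes
    crc: int          # computed once by the sender, checked only at the destination


def make_packet(route, payload):
    """Sender chooses the whole route up front and attaches an end-to-end CRC.

    CRC-32 is an assumption made for this sketch.
    """
    return Packet(route=list(route), payload=payload, crc=zlib.crc32(payload))


def forward(packet):
    """One switch hop: consume the next output port from the route.

    The hop never inspects or recomputes the CRC.
    """
    return packet.route.pop(0)


def receive(packet):
    """Destination end point: verify the CRC.

    On a mismatch the end point must retransmit or fail-stop.
    """
    return zlib.crc32(packet.payload) == packet.crc
```

Because intermediate hops only pop a port number, corruption anywhere along the path is detected solely by `receive`, which is why the end points carry the recovery burden.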

Double Precision FPU
FPU is treated as a co-processor
– Integrating the FPU with the RF, as the SP FPU does, was investigated but was too complicated and didn’t facilitate sharing
Operands are transferred via FSL in four instructions, and MB blocks for result
FPU is highly pipelined, so sharing it makes better use of it (and saves loads of resources)
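
One plausible reading of "four instructions" is that each of the two 64-bit operands is sent as two 32-bit FSL words. The sketch below models that split in software. The 32-bit word width, the high-word-first order, and the function names are assumptions for illustration, not details confirmed by the slides.

```python
import struct


def double_to_words(x):
    """Split an IEEE 754 double into (high, low) 32-bit words.

    High-word-first order is an assumption of this sketch.
    """
    hi, lo = struct.unpack(">II", struct.pack(">d", x))
    return hi, lo


def operand_words(a, b):
    """Return the four 32-bit words that would carry two double operands."""
    return [*double_to_words(a), *double_to_words(b)]


def words_to_double(hi, lo):
    """Reassemble a double from its two 32-bit halves."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]
```

For example, `operand_words(1.0, 2.0)` yields four words, the first of which is `0x3FF00000`, the upper half of the encoding of 1.0.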

Initial FPU Performance
[Chart: 2D FFT (ffbench) execution times for MicroBlaze FP emulation, MicroBlaze DP FPU, and Sun 386/250; values not recoverable from the transcript]
Mitch Harwell & David Tylman

Main Memory
Clusters of MBs share a single physical DIMM (1GB)
Memory is partitioned so each core has its own physical address space
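
A minimal sketch of such a partitioning, assuming equal-sized contiguous regions per core. The slides do not say how many cores share a DIMM or how regions are laid out, so `cores_per_dimm`, the layout, and the function names are hypothetical.

```python
DIMM_BYTES = 1 << 30  # 1 GB DIMM, as stated on the slide


def partition(core_id, cores_per_dimm):
    """Return (base, size) of the region owned by core_id.

    Equal contiguous regions are an assumption of this sketch.
    """
    if not 0 <= core_id < cores_per_dimm:
        raise ValueError("core_id out of range")
    size = DIMM_BYTES // cores_per_dimm
    return core_id * size, size


def translate(core_id, cores_per_dimm, local_addr):
    """Map a core-local address to a DIMM address, rejecting out-of-range accesses."""
    base, size = partition(core_id, cores_per_dimm)
    if not 0 <= local_addr < size:
        raise ValueError("address outside this core's partition")
    return base + local_addr
```

With two cores per DIMM, core 1 would own the upper 512 MB, and its local address `0x10` would map to `(1 << 29) + 0x10`.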

Other Infrastructure
Bootstrapping: Reduced boot-strap block RAM from four to one, and fit a simple boot-loader and cache-invalidation code in a single, read-only BRAM.
Peripherals: Remove the OPB bus and port the interrupt controller and timer to LMB to save logic (pending).
Debugging: Using the existing opb_mdm core and JTAG, we can use existing debugging infrastructure (i.e. XMD/GDB) to debug up to 8 cores. A group of students also worked on ideas for real time instruction tracing.

Software Support
MBs boot a relatively unmodified version of uClinux and run stably
MPICH2 has been successfully compiled and run on an XUP test system with a pair of MicroBlaze cores
UPC has also been built and run on an XUP test system
GCC has been modified to emit instructions that utilize the double precision FPU co-processor (when the SOFTFPU flag is turned on)
Currently finishing up final modifications to the first network driver to allow proper source routing of packets between FPGAs and to other BEE2 boards

RAMP Blue FPGA Floor-plan

Implementation Experience
System with 8 MicroBlaze cores per user FPGA running on the BEE2
– This system has the integrated SP FPU per core. We haven’t yet integrated the DP FPU core into this base system, although we expect it to use fewer resources with sharing (each SP FPU is ~1300 slices and the DP FPU is ~2000 slices)
Early attempts to implement a 16 MicroBlaze system have failed in placement, although there are enough raw resources
– We expect that with some simple floor planning we should be able to reach a 16 core system
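
The figures on this slide, together with the four user FPGAs per BEE2 module and the 1000-core goal stated earlier, allow a rough back-of-the-envelope calculation. The sketch below is only arithmetic on the quoted numbers; the function names are hypothetical, and the number of cores sharing one DP FPU is left as a parameter because the slides do not state it.

```python
import math

USER_FPGAS_PER_MODULE = 4  # from the BEE2 Module Design slide
SP_FPU_SLICES = 1300       # approximate, per the slide
DP_FPU_SLICES = 2000       # approximate, per the slide


def cores_per_module(cores_per_fpga):
    """MicroBlaze cores on one BEE2 module (user FPGAs only)."""
    return cores_per_fpga * USER_FPGAS_PER_MODULE


def modules_needed(target_cores, cores_per_fpga):
    """Whole BEE2 modules required to reach target_cores."""
    return math.ceil(target_cores / cores_per_module(cores_per_fpga))


def fpu_slices(cores, cores_per_shared_dp=None):
    """Slices spent on FPUs for one FPGA.

    With cores_per_shared_dp=None each core has its own SP FPU; otherwise
    one DP FPU is shared by that many cores (a hypothetical sharing ratio).
    """
    if cores_per_shared_dp is None:
        return cores * SP_FPU_SLICES
    return math.ceil(cores / cores_per_shared_dp) * DP_FPU_SLICES
```

Under these numbers, 8 cores per FPGA gives 32 cores per module and 32 modules for 1000 cores, while 16 cores per FPGA halves that to 16 modules. Eight private SP FPUs cost roughly 10,400 slices, against roughly 2,000 for a single DP FPU shared by all eight cores.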

Resource utilization for one user FPGA: 8 MicroBlaze cores (SP FPU each), 4 memory controllers, 4 XAUI controllers
– 4-LUTs: 40,625 out of 66,176 (61%)
– FFs: 27,085 out of 66,176 (40%)
– BRAMs and MULTs: 116 and 56 used, respectively (totals and percentages missing from the transcript)
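
The quoted percentages can be checked directly from the counts. The short sketch below is plain arithmetic; the helper name is hypothetical. It suggests the slide truncates rather than rounds, since the flip-flop figure works out to just under 41%.

```python
def utilization(used, available):
    """Percentage of a resource in use."""
    return 100.0 * used / available


lut_pct = utilization(40_625, 66_176)  # 4-input LUTs
ff_pct = utilization(27_085, 66_176)   # flip-flops
```

Here `lut_pct` is about 61.4 and `ff_pct` is about 40.9, consistent with the 61% and 40% printed on the slide if the values were truncated.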

Near-term Work
Improve density to get 16 core system
– Analysis of data paths and floor planning should allow us to increase the density of cores, since the current design does very little deliberate area optimization
– Integrate shared DP FP core
Convert design to RDL
– Present design is XPS only (however, it is fully parametrized with embedded TCL to allow fast changes to topology)
– Have a version of the network switch in RDL; need to wrap the rest in RDL
Improve performance of known bottlenecks
– Second generation NIC with direct memory access to take load off MB
– Add buffering of FPU operands to allow single cycle sharing of FPU

Spares

Processor Infrastructure
Key components are required for each processor:
– MicroBlaze core
– Console interface
– Network interface
– Floating point unit
– Memory interface
– Debug port
– Miscellaneous infrastructure (timer, interrupt controller)
Build one and then replicate, connect with on-chip switch

Farther Future Work
System level possibilities?
– Hardware paging of memory (à la VMware ESX) to better utilize memory capacity and allow content-based sharing
– Coherent shared memory between MicroBlazes
– More exploration of tracing and system level debugging
Networking possibilities?
– Highly FPGA-optimized switch
– More complicated routing mechanisms

Conclusion
Close to functional multi-MB system:
– Successfully provided infrastructure to boot multiple MicroBlaze cores on a single FPGA with full uClinux support
– Determined ease of porting and running MPI and UPC on uClinux
– Areas targeted for both performance increase (network interface, FPU integration) and on-chip density