OpenSPARC-Xilinx Collaboration Durgam Vahia Paul Hartke OpenSPARC.

Presentation transcript:

OpenSPARC-Xilinx Collaboration
Durgam Vahia, OpenSPARC Engineering
Paul Hartke, Xilinx University Program (XUP)
RAMP Retreat, UC Berkeley, January 2007

Agenda
  Goals
  OpenSPARC T1 – Quick Recap
  What we have been up to – T1 on FPGAs
  Current Status and Results
  Road-map

Big Goals
  Proliferation of Sun OpenSPARC technology
  Proliferation of Xilinx FPGA technology
    – Make OpenSPARC FPGA-friendly
    – Create reference design with complete system functionality and proven path to hardware
    – Boot Solaris/Linux on the reference design
    – Open it up
    – Seed ideas in the community
  Significant enabler for future research in multi-core

What is OpenSPARC T1
  SPARC V9 implementation
  Eight cores, four threads each: 32 simultaneous threads
  All cores connect through a 134.4 GB/s crossbar switch
  High-bandwidth, 12-way associative, 3 MB on-chip L2 cache
  4 DDR2 channels (23 GB/s)
  70 W power
  ~300M transistors

OpenSPARC T1: Design Choices
  Simpler core architecture to maximize cores on die
  Caches, DRAM channels shared across cores
  Shared L2 decreases cost of coherence misses significantly
  Crossbar good for bandwidth, latency and functional verification

OpenSPARC Core
  Four threads per core
  Single-issue, 6-stage pipeline
  16KB I-cache, 8KB D-cache
  Unique resources per thread
    – Registers
    – Portions of I-fetch datapath
    – Store and Miss buffers
  Resources shared by 4 threads
    – Caches, TLBs, Execution units
    – Pipeline registers and datapath
  [Figure: core block diagram with units labeled IFU, EXU, MUL, TRAP, MMU, LSU]

OpenSPARC Pipeline
  All processor I/O (including interrupts) goes via the crossbar interface

OpenSPARC T1 on FPGAs
  Create a single-core, single-thread implementation of T1 for FPGAs
  Map it onto a Xilinx FPGA board and use board peripherals to build the working hardware system
  Boot a commercial OS on it

OpenSPARC FPGA Implementation
  Single-core, single-thread implementation of T1
    – Small, clean and modular FPGA implementation
      – About 39K 4-input LUTs, 123 BRAMs (Synplicity synthesis for Virtex-II, Virtex-II Pro, and Virtex-4)
      – Synchronous, no latches or gated clocks
      – Better utilization of FPGA resources (BRAMs, multiplier)
    – Functionally equivalent to the custom implementation, except
      – 8-entry fully associative TLB as opposed to 64-entry
      – Removed crypto unit (modular arithmetic operations)

Single Thread T1 on FPGAs
  Functionally stable
    – Passing mini and full regressions
  Completely routed
    – No timing violations
    – Easily meets 20 ns (50 MHz) cycle time
  Expandable to more threads
    – Reasonable overhead for most blocks (~30% for 4 threads)
    – Some bottlenecks exist (multi-port register files)

System Block Diagram
  [Figure: system block diagram. Labeled blocks: SPARC T1 core; processor-to-crossbar interface (PCX); PCX-FSL interposer; Fast Simplex Links interface (FSL); Microblaze processor; Microblaze debug UART; SPARC T1 UART; 10/100 Ethernet; MCH-OPB MemCon; MultiPort Memory Controller; IBM CoreConnect OPB bus; external DDR2 DIMM; FPGA boundary. Legend: Xilinx Embedded Development Kit (EDK) design; block must be developed.]

System Theory of Operation
  OpenSPARC T1 core communicates exclusively via the processor-to-crossbar interface (PCX)
    – PCX is a packet-based interface
  Microblaze softcore will sit in a polling loop and accept these packets, perform any protocol conversion, and forward them to the appropriate peripheral
    – Could even implement floating point operations via the Microblaze FPU
  Microblaze will also poll (or accept interrupts from) the peripherals, convert the info to a PCX packet, and forward it to the PCX interface
    – Microblaze has its own UART for its own diagnostic input/output
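
The forwarding loop described on this slide can be sketched as a small behavioural model. This is a minimal Python sketch, not the project's firmware (which would run on the Microblaze itself). The packet fields, the `Peripheral` class, and the address map are all hypothetical, since the slides do not give the PCX packet format or the system memory map.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical stand-in for a PCX packet; the real format is not given here."""
    kind: str      # assumed request kinds: "load" or "store"
    address: int
    data: int = 0


class Peripheral:
    """Toy peripheral (for example memory or a UART) backed by a dictionary."""

    def __init__(self, name):
        self.name = name
        self.cells = {}

    def handle(self, pkt):
        # Stores update state; loads return the stored value (0 if never written).
        if pkt.kind == "store":
            self.cells[pkt.address] = pkt.data
            return Packet("store_ack", pkt.address)
        return Packet("load_return", pkt.address, self.cells.get(pkt.address, 0))


def route(address, address_map):
    """Pick the peripheral whose [base, limit) range contains the address."""
    for base, limit, peripheral in address_map:
        if base <= address < limit:
            return peripheral
    raise ValueError(f"no peripheral mapped at {address:#x}")


def poll_once(from_core, to_core, address_map):
    """One polling-loop iteration: forward at most one request, queue its reply."""
    if not from_core:
        return False          # nothing pending on the core-side link
    pkt = from_core.popleft()
    reply = route(pkt.address, address_map).handle(pkt)
    to_core.append(reply)     # the reply travels back toward the core
    return True


# Assumed address map, for illustration only.
dram = Peripheral("dram")
uart = Peripheral("uart")
ADDRESS_MAP = [(0x0000, 0x8000, dram), (0x8000, 0x8100, uart)]

from_core = deque([Packet("store", 0x10, 42), Packet("load", 0x10)])
to_core = deque()
while poll_once(from_core, to_core, ADDRESS_MAP):
    pass
```

The two queues stand in for the two directions of the link between the core and the Microblaze. In hardware the protocol conversion step would sit where `handle` is called.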

Implementation Results
  XC4VFX100-11FF1152 FPGA
    – 42,649/84,352 LUT4s (50%)
    – 131/376 BRAM-16kbits (34%)
    – 50 MHz operation
      – Have not attempted any faster
    – Synplicity synthesis: 25 minutes
    – Place and route: 42 minutes
  (Microblaze & Related Logic)
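
The utilization percentages follow directly from the used/available counts. The short Python sketch below recomputes them, along with the 20 ns to 50 MHz conversion quoted for the single-thread core. The function names are illustrative and do not come from the slides.

```python
def utilization(used, available):
    """Return resource utilization as a percentage."""
    if available <= 0:
        raise ValueError("available must be positive")
    return 100.0 * used / available


def clock_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


lut_pct = utilization(42_649, 84_352)    # LUT4s on the XC4VFX100
bram_pct = utilization(131, 376)         # 16-kbit block RAMs
freq = clock_mhz(20.0)                   # 20 ns cycle time

print(f"LUT4: {lut_pct:.1f}%  BRAM: {bram_pct:.1f}%  clock: {freq:.0f} MHz")
```

With these inputs the ratios come to roughly 50.6% and 34.8%, so the quoted 50% and 34% appear to be truncated rather than rounded. The same function applies unchanged to the Virtex5 counts on the next slide.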

Preliminary Virtex5 Results
  Virtex5 xc5vlx110tff1136
    – Same as Bee3 FPGA
  30,508/69,120 LUT6s (44%)
  119/148 BRAM-36kbits (80%)
    – Working through mapping issues…
  50 MHz placed and routed design
    – Have not attempted any faster

OpenSPARC FPGA HW Roadmap
  Current reference design occupies about 45% of the XC4VFX100 FPGA. This design includes
    – Single core, single thread of OpenSPARC T1
    – Microblaze to communicate with peripherals (DRAM, Ethernet)
    – Glue logic to connect T1 core with Microblaze
  More design paths exist, e.g.
    1) Two single-thread cores in a single FPGA
    2) Up to 4 threads per FPGA

OpenSPARC FPGA SW Roadmap
  Boot Solaris and Linux on a single-thread FPGA version of the design
    – Include support for all packet types with Microblaze
    – Hypervisor changes to support this variant of T1
      – Reduction in TLB size
    – Device driver support for the system
    – Emulation routines in OS for floating point operations
      – Mainly for ISA compliance

Reference Design
  ML410 board with Virtex4-100 FPGA (aka ML411)
    – Bit file and ELF are stored on a CompactFlash card
  Each design is a hardware implementation of one regression suite test
    – Microblaze soft-core sends the test packets to the OpenSPARC core and verifies the return packets
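
The send-and-verify flow on this slide can be modelled in a few lines. This is a hedged Python sketch: the `echo_plus_one` function is a hypothetical stand-in for the OpenSPARC core, and the test vectors are invented for illustration. The actual packet contents used by the reference design are not given in the slides.

```python
def run_test(core, test_vectors):
    """Send each request to the core and compare the reply with the expected one.

    Returns a list of (index, request, expected, actual) for every mismatch.
    """
    failures = []
    for index, (request, expected) in enumerate(test_vectors):
        actual = core(request)
        if actual != expected:
            failures.append((index, request, expected, actual))
    return failures


def echo_plus_one(request):
    """Hypothetical stand-in for the core: replies with request + 1."""
    return request + 1


# Assumed test vectors: (request word, expected reply word).
vectors = [(0x00, 0x01), (0x10, 0x11), (0xFF, 0x100)]
failures = run_test(echo_plus_one, vectors)
print("PASS" if not failures else f"FAIL: {failures}")
```

Collecting every mismatch, rather than stopping at the first, makes a failed regression test easier to diagnose from a single run.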

  All of this will be available under GPL(2) license
    – Complete Verilog code of FPGA T1 and glue logic to Microblaze
    – Synplicity scripts for synthesis
    – The whole reference design

  (2) Verification Environment
    – Very very important
    – Change and VERIFY
    – Scripts for running regression in three modes
      – Chip8: full-chip test suite
      – Core1: single core (four threads) test suite
      – Thread1: single core, single thread test suite for FPGA version
    – Supports Synopsys VCS and Cadence NC-Verilog
      – Considering supporting Mentor ModelSim as well
  Bring down as many barriers as possible

Development Team
  Sun OpenSPARC Team
    – Durgam Vahia
    – Ismet Bayraktaroglu
    – Thomas Thatcher
  Xilinx University Program
    – Paul Hartke

OpenSPARC & Xilinx FPGAs!!