ASPLOS ’08 Ramp Tutorial BEE3 Update Chuck Thacker John Davis Microsoft Research 2 March 2008.


Outline
- BEE3 Overview
- BEE3 Status
- BEE3 Gateware
- Moving forward

BEE3 System

BEE3 Package

BEE3 Tidbits
- Design uses essentially every pin on the chip.
- Design was done to be “PC-like” to leverage PC economies:
  – PWB is about half the area of BEE2.
  – PWB is 18 layers rather than 22 for BEE2.
  – Uses PC power and peripherals.
- System is divided into a main board plus a separate (and separately designed) Control Board.
  – Allowed designs to proceed in parallel at Celestica and BWRC, and reduced the risk of having to spin the (expensive) main board.
  – Control board has JTAG, flash for bitstreams, and boot flash for each FPGA. Can operate without it.
- The use of pros for PCB and mechanical design was an enormous win.
  – Celestica’s design was 100% correct, and five systems worked with only one problem (which was easily corrected).
  – Took (probably) half the time to produce something much more manufacturable and robust (and therefore cheaper).

BEE3 Subsystems

BEE3 Control Board

Project Participants and Roles
- Microsoft Research (Silicon Valley)
  – Funds, manages system engineering, does some gateware
- Celestica (Ottawa and Shanghai)
  – Did main board engineering, prototype fabrication
  – Microsoft has a very deep relationship with Celestica
- BEECube
  – Builds and delivers functioning systems
- Function Engineering (Palo Alto)
  – Did thermal and mechanical engineering
- Xilinx (San Jose)
  – Provides FPGAs for academic machines
  – Provides FPGA application expertise
- Ramp Group (BWRC)
  – Control board, basic software
- Ramp Community
  – Uses the systems for research
  – Expanding to industrial users (e.g., us)

BEE3 Status
- All subsystems work!
- Board spin is required to correct MGT placement.
  – 10 Gbit channels require long routing. This was due to a lack of information from Xilinx, not an error by Celestica.
  – Respin is in progress. ETA for final board is 1 May.

BEE3 Gateware
- Today, consists primarily of test and characterization routines.
  – Much of this was ported from BEE2, although some is new:
  – DDR2 Controller
  – Control RISC
- MS designs use a minimal subset of the Xilinx tool suite:
  – Just ISE, ChipScope, and (soon) Data2MEM.
  – May need EDK, but not yet.

DDR2 Controller
- Largest piece of new gateware.
  – 5 modules, ~2000 lines of Verilog.
- Supports two 4 GB DIMMs per channel, 2 channels per FPGA.
- Transfers are DDR 400 (5 ns clock) with -2 speed grade parts.
- Supports only x4 registered DIMMs.
  – Unbuffered DIMMs can’t work because of address/control loading.
- Handles all initialization, refresh, and calibration (semi) automatically.
  – Keeps track of up to 16 open banks per controller.
- Calibration is fast (768 clocks).
  – So it can be done at frequent intervals or in response to single errors.
- Primary user commands are Read and Write:
  – Both deal with 36-byte blocks.
- Simple FIFO interfaces.
- Each channel is about 3% of the LX110T LUTs (no BRAMs).
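The figures on this slide imply a few derived numbers that are easy to check. The sketch below is only back-of-the-envelope arithmetic: the constants are quoted from the slide, while the function name `ddr2_summary` and the dictionary keys are made up for illustration and do not come from the BEE3 gateware.

```python
# Back-of-the-envelope arithmetic for the DDR2 controller figures.
# Constants are quoted from the slide; names are hypothetical.

CLOCK_NS = 5.0            # DDR 400 with a 5 ns clock
DIMM_GB = 4               # 4 GB DIMMs
DIMMS_PER_CHANNEL = 2     # two DIMMs per channel
CHANNELS_PER_FPGA = 2     # two channels per FPGA
CALIBRATION_CLOCKS = 768  # calibration length in clocks
BLOCK_BYTES = 36          # Read/Write block size


def ddr2_summary() -> dict:
    """Derive clock rate, transfer rate, capacity and calibration time."""
    clock_mhz = 1000.0 / CLOCK_NS
    return {
        "clock_mhz": clock_mhz,
        # Double data rate: two transfers per clock cycle.
        "mega_transfers_per_sec": 2 * clock_mhz,
        "capacity_gb_per_fpga": DIMM_GB * DIMMS_PER_CHANNEL * CHANNELS_PER_FPGA,
        "calibration_us": CALIBRATION_CLOCKS * CLOCK_NS / 1000.0,
        "block_bits": BLOCK_BYTES * 8,
    }


if __name__ == "__main__":
    for key, value in ddr2_summary().items():
        print(f"{key}: {value}")
```

Under these numbers a full calibration takes under 4 microseconds, which is consistent with the slide's point that it can be rerun frequently.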

DDR Controller Organization
- Centralized main controller
  – Main control FSM
  – Address FIFO (64 30-bit command/addresses)
  – Open bank CAMs
  – Clock generation, timing limit enforcement
- Six replicated I/O pin bank logic blocks:
  – Read and Write FIFOs for 24 data bits (three 4-bit lanes, with one RAM chip/lane on each DIMM).
  – Calibration state machine, so that all 6 banks can calibrate in parallel.
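The role of the open bank CAMs can be illustrated with a small software model. This is a sketch under stated assumptions, not the actual Verilog: the class name `OpenBankTracker`, its methods, and the three outcome labels are hypothetical, and the real controller's bank numbering and command sequencing are not described in the slides. The only figure taken from the source is the limit of 16 tracked banks.

```python
from typing import Dict


class OpenBankTracker:
    """Toy model of an open-bank table: bank id -> currently open row.

    Hypothetical illustration only. Outcome labels follow common DRAM usage:
      "hit"      - requested row is already open (column command only)
      "miss"     - bank is closed (activate, then column command)
      "conflict" - a different row is open (precharge, activate, column command)
    """

    def __init__(self, max_banks: int = 16) -> None:
        self.max_banks = max_banks          # slide: up to 16 open banks
        self.open_rows: Dict[int, int] = {}

    def access(self, bank: int, row: int) -> str:
        """Classify an access and record the row as open afterwards."""
        if not 0 <= bank < self.max_banks:
            raise ValueError(f"bank {bank} out of range")
        current = self.open_rows.get(bank)
        if current == row:
            return "hit"
        self.open_rows[bank] = row
        return "miss" if current is None else "conflict"

    def close_all(self) -> None:
        """Forget all open rows (simplification: e.g. before a refresh)."""
        self.open_rows.clear()


if __name__ == "__main__":
    tracker = OpenBankTracker()
    for bank, row in [(0, 10), (0, 10), (0, 11), (1, 10)]:
        print(bank, row, tracker.access(bank, row))
```

A hardware CAM performs the lookup in a single cycle; the dictionary here only mirrors the lookup result, not the timing.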

DDR Controller (simplified)

Control RISC (TC4)
- 36 bits (memories are 36n bits wide)
- Harvard architecture
  – 1K instruction memory (1 BRAM)
  – 1K data memory (1 BRAM)
  – 256-register, 3-port register file (2 BRAMs)
- Very small (~100 slices) “Tiny Computer”
  – All instructions execute in three 5 ns phases. No pipelining.
- Assembler, no C compiler. Sigh…
- So far: DRAM initialization, DRAM calibration, control shell with UART interface.
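The TC4 figures above can be turned into a rough throughput and storage estimate. The sketch assumes that "1K" means 1024 words and that every word is 36 bits wide; the function name `tc4_figures` and the dictionary keys are hypothetical.

```python
# Rough arithmetic for the TC4 figures. Constants are quoted from the slide;
# "1K = 1024 words of 36 bits" is an assumption.

PHASE_NS = 5                  # each phase is 5 ns
PHASES_PER_INSTRUCTION = 3    # every instruction takes three phases
WORD_BITS = 36
INSTRUCTION_WORDS = 1024      # assumed meaning of "1K"
DATA_WORDS = 1024             # assumed meaning of "1K"
REGISTERS = 256


def tc4_figures() -> dict:
    """Derive instruction time, peak rate and storage sizes."""
    instruction_ns = PHASE_NS * PHASES_PER_INSTRUCTION
    return {
        "instruction_ns": instruction_ns,
        # No pipelining, so one instruction completes every instruction_ns.
        "peak_mips": 1000.0 / instruction_ns,
        "brams": 1 + 1 + 2,   # instruction + data + register file
        "instruction_memory_bits": INSTRUCTION_WORDS * WORD_BITS,
        "data_memory_bits": DATA_WORDS * WORD_BITS,
        "register_file_bits": REGISTERS * WORD_BITS,
    }


if __name__ == "__main__":
    for key, value in tc4_figures().items():
        print(f"{key}: {value}")
```

With these assumptions each 1K memory is 36,864 bits, which fits the slide's count of one BRAM per memory.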

TC4

Next Steps
- Use Data2MEM to speed up the TC4 edit, assemble, load cycle:
  – Currently takes 30 minutes, since we regenerate cores and rebuild the entire design.
  – Should be a couple of minutes.
- Add a DDR2 test system (LFSRs) to do full-speed testing with random addresses and data. Should be rock solid.
- Use Xilinx PlanAhead to lock the design so that it can be used as a component in larger designs.
- Develop an on-chip interconnect to allow multiple DDR2 requesters without needing huge cross-chip buses.
- Use BEE3 in our own research programs.
  – A couple have already started.
  – This is the fun part. Building it was just work.
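The LFSR-based memory test mentioned above can be sketched in software. This is an illustration of the general technique, not the planned gateware: the slides do not give the LFSR width or polynomial, so the 16-bit register with taps 16, 14, 13, 11 (a well-known maximal-length choice), the seeds, the 8-bit address mask, and all function and class names are assumptions.

```python
def lfsr16(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr_period(seed: int) -> int:
    """Count steps until the LFSR returns to its seed."""
    state, steps = lfsr16(seed), 1
    while state != seed:
        state, steps = lfsr16(state), steps + 1
    return steps


def run_memory_test(memory, n_ops: int, addr_seed: int = 0xACE1,
                    data_seed: int = 0x1D2C, addr_mask: int = 0xFF) -> int:
    """Write pseudo-random data to pseudo-random addresses, then verify.

    `memory` is any dict-like object standing in for the DRAM.
    Returns the number of addresses whose read-back value is wrong.
    """
    addr_state, data_state = addr_seed, data_seed
    expected = {}  # software reference model: last value written per address
    for _ in range(n_ops):
        addr_state, data_state = lfsr16(addr_state), lfsr16(data_state)
        addr = addr_state & addr_mask
        memory[addr] = data_state
        expected[addr] = data_state
    return sum(1 for addr, value in expected.items() if memory.get(addr) != value)


class StuckAtOneMemory(dict):
    """Fault model for demonstration: data bit 0 is stuck at 1 on writes."""

    def __setitem__(self, addr, value):
        super().__setitem__(addr, value | 1)


if __name__ == "__main__":
    print("errors, good memory:  ", run_memory_test({}, 1000))
    print("errors, faulty memory:", run_memory_test(StuckAtOneMemory(), 1000))
```

In hardware the expected values would normally be regenerated by replaying the LFSRs rather than stored; the dictionary is a software convenience.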

Questions?