Computer Architecture Lab at Combining Simulators and FPGAs “An Out-of-Body Experience” Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi {echung,

Computer Architecture Lab at
Combining Simulators and FPGAs: “An Out-of-Body Experience”
Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi
{echung, bgold, jhoe,
SimFlex/ProtoFlex

June 22, 2006, Eric S. Chung / RAMP 2006 Summer Retreat

Slide 2: The RAMP full-system challenge
RAMP vision for studying systems w/ FPGAs
– functional & cycle-accurate simulation
– scalability, speed, & flexibility on FPGAs
– full-system (run unmodified binaries & OS)
[Figure: a full target system: CPU, memory, PCI bus, I/O MMU controller, Ethernet controller, graphics card, disk, DMA controller, IRQ controller, SCSI controller, terminal]
A "full-system" RAMP will incur a large development effort, yet not all behaviors (e.g., I/O) are used frequently.

Slide 3: Combining simulators & FPGAs
Simulators already provide full-system support → why not simulate infrequent behaviors (e.g., I/O devices)?
Advantages
– avoid implementing infrequent behaviors → lowers full-system FPGA development effort
– low impact on scalability & performance on the FPGA
[Figure: FPGA and simulator hosts; labels: CPU, CPU, memory, SCSI disk, SCSI disk, Ethernet]

Slide 4: Outline
– Motivation
– Migration
– Implementation status
– Conclusion

Slide 5: Migration
3 ways to map a target object to a host: FPGA-only, simulation-only, migratable
Migratable objects
– switch modes between the FPGA & simulator hosts
– target behavior need not be 100% implemented in FPGA mode, e.g., implement 80% of target behavior in the FPGA and 100% in the simulator
[Figure: a target design made of "target objects" (e.g., a functional or timing CPU model) mapped onto FPGA and simulator hosts]
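The three target-to-host mappings can be sketched as a small dispatch rule. This is a minimal illustration, not ProtoFlex code. The `Mapping` and `Host` enums, the `choose_host` function, and the set of FPGA-supported behaviors are all hypothetical. The sketch shows why a migratable object only needs partial behavior in FPGA mode: anything the FPGA lacks falls back to the simulator, which models 100% of the target.

```python
from __future__ import annotations

from enum import Enum


class Mapping(Enum):
    """How a target object is mapped onto the hosts (the three ways on the slide)."""
    FPGA_ONLY = "fpga-only"
    SIM_ONLY = "simulation-only"
    MIGRATABLE = "migratable"


class Host(Enum):
    FPGA = "fpga"
    SIMULATOR = "simulator"


def choose_host(mapping: Mapping, behavior: str, fpga_supported: set[str]) -> Host:
    """Return the host that should execute `behavior` for an object with `mapping`.

    Hypothetical policy: a migratable object runs on the FPGA while the FPGA
    implements the behavior, and otherwise falls back to the simulator,
    which models 100% of the target behavior.
    """
    if mapping is Mapping.FPGA_ONLY:
        return Host.FPGA
    if mapping is Mapping.SIM_ONLY:
        return Host.SIMULATOR
    return Host.FPGA if behavior in fpga_supported else Host.SIMULATOR


if __name__ == "__main__":
    # Hypothetical subset of CPU behaviors implemented in FPGA mode.
    supported = {"load", "store", "add", "sub"}
    for op in ("add", "load", "scsi_cmd"):
        print(op, "->", choose_host(Mapping.MIGRATABLE, op, supported).value)
```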

Slide 6: Migration example
Target-to-host mappings: CPU = migratable, Memory = FPGA-only, Devices = SW-only
[Figure: an example CPU instruction stream over time (load, add, multiply, I/O SCSI command, add, sub, ..., SCSI command, load); the CPU moves between FPGA and simulator via a CPU state transfer, while memory is on the FPGA and the SCSI disk is in the simulator]
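The migration example can be modeled as a loop over the instruction stream that copies CPU state whenever the current host cannot execute the next instruction. This is a minimal sketch under stated assumptions. `FPGA_OPS`, `CpuState`, and the "migrate back immediately" policy are hypothetical. `copy.deepcopy` merely stands in for the real FPGA/simulator state transfer.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

# Hypothetical: behaviors the FPGA CPU implements; the simulator implements all.
FPGA_OPS = {"load", "store", "add", "sub", "multiply"}


@dataclass
class CpuState:
    """Architectural state that must be copied on every migration."""
    pc: int = 0
    regs: list[int] = field(default_factory=lambda: [0] * 32)
    tlb: dict[int, int] = field(default_factory=dict)


def run(stream: list[str]) -> tuple[list[str], int]:
    """Execute `stream`, migrating the CPU whenever its current host lacks an op.

    Returns the host that executed each instruction and the number of state
    transfers. Assumed policy: go to the simulator on an unsupported op and
    come back as soon as the FPGA supports the next op.
    """
    host, state = "fpga", CpuState()
    trace: list[str] = []
    transfers = 0
    for op in stream:
        wanted = "fpga" if op in FPGA_OPS else "simulator"
        if wanted != host:
            state = copy.deepcopy(state)  # stands in for the cross-host state transfer
            host = wanted
            transfers += 1
        trace.append(host)
        state.pc += 4  # stands in for executing `op` on `host`
    return trace, transfers


if __name__ == "__main__":
    stream = ["load", "add", "multiply", "scsi_cmd", "add", "sub", "scsi_cmd", "load"]
    trace, transfers = run(stream)
    for op, host in zip(stream, trace):
        print(f"{op:9} on {host}")
    print("state transfers:", transfers)
```

On this stream, loosely modeled on the slide's example, the CPU makes four state transfers: out to the simulator and back around each of the two SCSI commands.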

Slide 7: Advantages
Lowers development effort
– avoid bring-up of infrequent behaviors
– migrate & validate reference models from the simulator
– tailor the implementation to the workload (avoid rarely used instructions; good for CISC such as x86)
Fast & scalable
– performance-critical objects on the FPGA (e.g., CPU, memory)
– scalable for MPs → add migratable CPUs
[Figure: FPGA and simulator hosts with several CPUs, memory, and a SCSI disk]

Slide 8: Subtleties
Objects separated across the simulator and FPGA still interact
– examples: interrupts, DMA
– handle by forwarding messages between the FPGA and simulator
– FPGA-only & SW-only mapped objects are easy to locate
– migrated objects require tracking
[Figure: a DMA from the simulated SCSI disk is forwarded to memory on the FPGA]

Slide 9: Subtleties (interrupts)
[Figure: an interrupt crossing between the simulator and the FPGA. Option 1: forwarded interrupt; Option 2: forced migration]
Cross-host interactions are rare → low impact on FPGA performance.
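Forwarding with location tracking amounts to a lookup table. The table is fixed for FPGA-only and SW-only objects and is updated on every migration. This is a hypothetical sketch: the `Router` and `Message` names are illustrative and do not come from the talk. The router reports whether a message must cross hosts, which is exactly the case where either a forwarded interrupt (option 1) or a forced migration (option 2) would be needed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    kind: str            # e.g. "dma" or "interrupt"
    target: str          # name of the destination object
    payload: object = None


class Router:
    """Routes cross-host messages to whichever host currently holds the target.

    FPGA-only and SW-only objects have fixed locations; migratable objects
    are tracked in `location` and updated on every migration.
    """

    def __init__(self) -> None:
        self.location: dict[str, str] = {}
        self.fixed: set[str] = set()

    def register(self, name: str, host: str, migratable: bool) -> None:
        self.location[name] = host
        if not migratable:
            self.fixed.add(name)

    def migrate(self, name: str, host: str) -> None:
        if name in self.fixed:
            raise ValueError(f"{name} is not migratable")
        self.location[name] = host  # keep tracking info current

    def route(self, sender_host: str, msg: Message) -> tuple[str, bool]:
        """Return (destination host, whether the message crosses hosts)."""
        dest = self.location[msg.target]
        return dest, dest != sender_host


if __name__ == "__main__":
    r = Router()
    r.register("memory", "fpga", migratable=False)    # FPGA-only
    r.register("scsi", "simulator", migratable=False)  # SW-only
    r.register("cpu", "fpga", migratable=True)
    print(r.route("simulator", Message("dma", "memory")))      # forwarded DMA
    print(r.route("simulator", Message("interrupt", "cpu")))   # option 1: forward
    r.migrate("cpu", "simulator")                              # option 2: force migration
    print(r.route("simulator", Message("interrupt", "cpu")))   # now delivered locally
```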

Slide 10: Subtleties (cont.)
Migration cost
– migrating an object requires a state copy, e.g., a migratable CPU has registers & TLBs
– FPGA-to-simulator latency & simulation time limit how many migrations per instruction are affordable
FPGA & simulator asynchrony
– simulated time "ticks" at different rates in the FPGA & simulator
– must synchronize for deterministic replay & accurate device timing
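A back-of-the-envelope model shows why transfer latency and simulator speed bound how often migration can happen. If a fraction p of instructions must run in the simulator, the average time per instruction is t = (1 − p)/R_fpga + p·(1/R_sim + 2·T_xfer). This sketch assumes each such instruction costs a full round trip (FPGA → simulator → FPGA). The state size, link bandwidth, latency, and simulator speed below are illustrative assumptions, not measurements. Only the 12 MIPS FPGA rate comes from the worst-case BlueSPARC figure given in this talk.

```python
def transfer_time(state_bytes: int, link_bits_per_s: float, latency_s: float) -> float:
    """One-way time (s) to copy `state_bytes` of object state between hosts."""
    return latency_s + state_bytes * 8 / link_bits_per_s


def effective_mips(fpga_ips: float, sim_ips: float, xfer_s: float, p_sim: float) -> float:
    """Average MIPS when a fraction `p_sim` of instructions must run in the simulator.

    Assumes each such instruction costs one simulated instruction plus a round
    trip of state transfers (FPGA -> simulator -> FPGA).
    """
    t_per_insn = (1 - p_sim) / fpga_ips + p_sim * (1 / sim_ips + 2 * xfer_s)
    return 1 / t_per_insn / 1e6


if __name__ == "__main__":
    # Illustrative values only: 2 KiB of registers + TLBs over a 100 Mbit/s
    # link with 50 us latency, and a 1 MIPS simulator.
    xfer = transfer_time(2048, 100e6, 50e-6)
    for p in (0.0, 1e-5, 1e-4, 1e-3):
        print(f"p_sim={p:g}: {effective_mips(12e6, 1e6, xfer, p):.2f} MIPS")
```

With these assumed numbers, one simulator excursion per 10,000 instructions already drops throughput from 12 to about 7.9 MIPS. The 2·T_xfer term dominates, which is why cheap state transfer matters as much as rare migration.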

Slide 11: Outline
– Motivation
– Migration
– Implementation in progress
– Conclusion

Slide 12: Implementation status
Target system
– Sun Fire™ 3800 Server (up to 24-way)
– UltraSPARC III ISA
– Solaris 8
Proof-of-concept software-to-software migration
– run 2 instances of Virtutech Simics
– migration designed & tested in 2 weeks
– can migrate on arbitrary behavior (e.g., an ADD instruction)

Slide 13: BlueSPARC core (in progress)
In-order SPARC V9 core
– supports 144 of 170 integer instruction behaviors
– supports a partial MMU w/ I- & D-TLBs
– goal: % of instructions & behaviors in target workloads: SPEC (mostly user-level), OLTP/DB2 (high TLB miss rate, 40% of time in privileged mode)
– CPI ranges from 5 to 7 cycles
– synthesis: 15k LUTs on Virtex-II Pro 30, 85 MHz, 12 MIPS (worst case)
– developed in Bluespec HDL, 6000 lines in 6 weeks
Core validation
– run RTL in lockstep w/ Simics's UltraSPARC simulation model
– workload validation w/ SPEC, OLTP/DB2, and the OpenSPARC verification suite
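Lockstep validation steps the design under test and a reference model together, compares architectural state after every instruction, and stops at the first mismatch. This sketch uses a hypothetical toy ISA: each 16-bit `insn` adds an unsigned 8-bit immediate to register `insn >> 8`. It also includes a deliberately buggy DUT that sign-extends the immediate. In the talk's setup the DUT is the BlueSPARC RTL and the reference is Simics's UltraSPARC model; neither is modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ArchState:
    """Architectural state compared after every instruction."""
    pc: int
    regs: tuple[int, ...]


Step = Callable[[ArchState, int], ArchState]


def lockstep(dut: Step, ref: Step, init: ArchState, insns: Iterable[int]) -> Optional[int]:
    """Step DUT and reference together; return the index of the first divergence."""
    dut_state = ref_state = init
    for i, insn in enumerate(insns):
        dut_state, ref_state = dut(dut_state, insn), ref(ref_state, insn)
        if dut_state != ref_state:
            return i
    return None


def ref_step(s: ArchState, insn: int) -> ArchState:
    """Toy reference ISA: add the unsigned 8-bit immediate to register insn >> 8."""
    rd, imm = insn >> 8, insn & 0xFF
    regs = list(s.regs)
    regs[rd] = (regs[rd] + imm) & 0xFFFFFFFF
    return ArchState(s.pc + 4, tuple(regs))


def buggy_step(s: ArchState, insn: int) -> ArchState:
    """Toy DUT with an injected bug: it sign-extends the immediate."""
    rd, imm = insn >> 8, insn & 0xFF
    if imm & 0x80:
        imm -= 256
    regs = list(s.regs)
    regs[rd] = (regs[rd] + imm) & 0xFFFFFFFF
    return ArchState(s.pc + 4, tuple(regs))


if __name__ == "__main__":
    init = ArchState(pc=0, regs=(0, 0, 0, 0))
    program = [0x0105, 0x02FF]  # r1 += 5; r2 += 0xFF
    print("self-check:", lockstep(ref_step, ref_step, init, program))
    print("buggy DUT diverges at:", lockstep(buggy_step, ref_step, init, program))
```

Comparing full state after every instruction pinpoints the first bad instruction. Comparing only at the end of a workload would report a failure without saying where it started.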

Slide 14: Migration on FPGA (in progress)
Xilinx XUP Virtex-II Pro 30 board + Virtutech Simics, connected by a migration & message interface
PowerPC functions
– core & memory initialization from Simics checkpoints
– facilitates migration for BlueSPARC
– connects simulated devices to memory (e.g., SCSI DMA)
[Figure: Simics (UltraSPARC, simulated target devices) connected over Ethernet to the FPGA board (BlueSPARC, PowerPC, DDR memory)]
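The migration & message interface needs some wire format for DMA writes, interrupts, and state transfers between Simics and the FPGA board. The talk does not describe one, so the layout below is purely an assumption for illustration, built on Python's standard `struct` module. It uses a 16-byte big-endian header (message type, object id, reserved field, 64-bit target address, payload length) followed by the payload. The `MSG_*` codes and the `Msg` type are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: type (u8), object id (u8), reserved (u16),
# address (u64), payload length (u32), network byte order, no padding.
HEADER = struct.Struct("!BBHQI")

MSG_DMA_WRITE = 1  # hypothetical message codes
MSG_INTERRUPT = 2
MSG_MIGRATE = 3


@dataclass
class Msg:
    kind: int
    obj: int
    addr: int
    payload: bytes = b""


def encode(m: Msg) -> bytes:
    """Serialize a message as header + payload."""
    return HEADER.pack(m.kind, m.obj, 0, m.addr, len(m.payload)) + m.payload


def decode(buf: bytes) -> Msg:
    """Parse one message; raise if the payload is shorter than advertised."""
    kind, obj, _reserved, addr, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return Msg(kind, obj, addr, payload)


if __name__ == "__main__":
    # e.g. a simulated SCSI controller DMA-writing 4 bytes into FPGA memory
    wire = encode(Msg(MSG_DMA_WRITE, obj=1, addr=0x1000, payload=b"\xde\xad\xbe\xef"))
    print(len(wire), decode(wire))
```

A fixed-size header lets the receiver learn the payload length before reading the payload itself. That keeps the receiver simple whether it is software on the embedded PowerPC or logic on the FPGA.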

Slide 15: Conclusion
Contributions
– virtualizes infrequent behaviors using simulation
– simplifies the full-system FPGA emulator while keeping it fast & scalable
– incremental validation from a reference system
Future work
– support migration in RDL?
– adding cores + scaling across multiple FPGAs
We are ready for BEE2
Thanks! Questions?
ProtoFlex/SimFlex (