Building a Synthesizable x86
Eriko Nurvitadhi, James C. Hoe, Babak Falsafi
Computer Architecture Lab at SIMFLEX/PROTOFLEX

June 22,
Motivation
Build a synthesizable x86 functional model for prototyping
  - most widely used ISA
  - Intel won't give out theirs
Problem: a very complicated ISA
  - many instructions
      - 482 instructions total (ADD has 14 variations)
  - many individually complicated instructions
      - PUSHAD: push all GP registers to the stack
  - many under-specified instructions
      - LOADALL inst; BCD operation flag updates
Also must be maintainable & extensible
  - return on investment

Overcoming Complexity
4 key ingredients in our approach
  - working SW simulator as design spec
  - simplified multi-cycle datapath
  - high-level HDL
  - HW-SW co-simulation for validation & evaluation
What we have today...
  - an x86 functional model in Bluespec
      - all real-mode general-purpose insts
      - includes I/O instructions!
  - boots FreeDOS OS in co-simulation testbench
  - synthesizes to 85% of a Virtex II Pro 70 FPGA
  - max 10 MIPS (based on synthesis + simulation)

Outline
  - Introduction
  - Our Approach
  - Status and Results
  - Discussions and Future Work

Functional View of an ISA
ISA = architectural states + instructions
instruction = set of alternate behaviors
  - e.g., due to different addressing modes
  - x86 has 482 insts but ~1000 behaviors
behavior = sequence of actions that read & alter states
[Figure: functional model. Labels: Inst_1, Inst_2, ..., Inst_n; beh_1, beh_2, ..., beh_m; ACT]
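
The ISA view on this slide (states, instructions, alternate behaviors, actions) can be sketched as a small data model. This is only an illustration: the class names, the toy 16-bit ADD with two behaviors, and the single zero flag are assumptions, not taken from the authors' model or from Bochs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]                  # architectural state (hypothetical toy subset)
Action = Callable[[State, int], None]   # an action reads and alters state; int is an operand


@dataclass
class Behavior:
    """One alternate behavior of an instruction: an ordered sequence of actions."""
    name: str
    actions: List[Action]

    def run(self, state: State, operand: int) -> None:
        for action in self.actions:
            action(state, operand)


@dataclass
class Instruction:
    """An instruction is a set of alternate behaviors, keyed here by operand form."""
    mnemonic: str
    behaviors: Dict[str, Behavior] = field(default_factory=dict)


def add_imm(state: State, imm: int) -> None:
    state["ax"] = (state["ax"] + imm) & 0xFFFF


def add_bx(state: State, _unused: int) -> None:
    state["ax"] = (state["ax"] + state["bx"]) & 0xFFFF


def update_zf(state: State, _unused: int) -> None:
    # Toy flag update: a real x86 ADD also updates CF, OF, SF, AF and PF.
    state["zf"] = int(state["ax"] == 0)


ADD = Instruction("ADD", {
    "ax_imm16": Behavior("ax_imm16", [add_imm, update_zf]),
    "ax_bx": Behavior("ax_bx", [add_bx, update_zf]),
})

state = {"ax": 0xFFFF, "bx": 2, "zf": 0}
ADD.behaviors["ax_imm16"].run(state, 1)   # ax wraps to 0, zf becomes 1
zf_after_wrap = state["zf"]
ADD.behaviors["ax_bx"].run(state, 0)      # ax becomes 2, zf becomes 0
```

Counting behaviors rather than instructions is what turns the slide's 482 instructions into roughly 1000 units of implementation work.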

SW x86 Sim as ISA Spec
Simulator source code = precise and executable design spec
We use Bochs
  - open-source
  - code structure fits our high-level ISA view
      - i.e., explicit architecture state declaration
      - one C++ function per instruction behavior
  - (essentially) complete x86 functionality
      - simulates a complete PC system
      - runs various OSs (e.g., Linux, Win XP)
      - supports 386 through Pentium Pro

Multi-cycle Implementation
[Figure: top-level view of the x86 functional model. Labels: decoder, FU (several), arch and aux states, mem accesses, I/O operations]
[Figure: sequential, multi-cycle execution. Labels: Start, Fetch, Decode, Execute, Commit, Finish]
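
A minimal sketch of sequential, multi-cycle execution, assuming the stages run in the order Fetch, Decode, Execute, Commit with one stage per cycle (the slide gives the stage names but the ordering and cycle counts here are assumptions). The three-opcode toy ISA and the `aux` dictionary are hypothetical stand-ins for the x86 decoder and the auxiliary states.

```python
from enum import Enum, auto


class Stage(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    COMMIT = auto()


def run(memory, state):
    """Execute until a 'halt' opcode; return the number of cycles consumed."""
    stage, cycles = Stage.FETCH, 0
    aux = {}  # auxiliary (non-architectural) state carried between stages
    while True:
        cycles += 1
        if stage is Stage.FETCH:
            aux["raw"] = memory[state["pc"]]
            stage = Stage.DECODE
        elif stage is Stage.DECODE:
            aux["op"], aux["arg"] = aux["raw"]
            stage = Stage.EXECUTE
        elif stage is Stage.EXECUTE:
            if aux["op"] == "halt":
                return cycles
            if aux["op"] == "mov":
                aux["result"] = aux["arg"] & 0xFFFF
            elif aux["op"] == "add":
                aux["result"] = (state["ax"] + aux["arg"]) & 0xFFFF
            else:
                raise ValueError(f"unknown opcode {aux['op']!r}")
            stage = Stage.COMMIT
        else:  # Stage.COMMIT: only here does architectural state change
            state["ax"] = aux["result"]
            state["pc"] += 1
            stage = Stage.FETCH


program = [("mov", 5), ("add", 7), ("halt", 0)]
state = {"ax": 0, "pc": 0}
cycles = run(program, state)
```

Only one instruction is in flight at a time, so there are no hazards to resolve; the cost is several cycles per instruction.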

Bluespec Design Capture
Explicit state declaration
  - x86 architectural states
  - auxiliary simulation states used by Bochs
Predicated atomic rules
  - one rule per action in our ISA view
Maintainability & extensibility
  - new behavior: add rules
  - changing behavior: add/modify rules
Optimizations (low-level)
  - reduce logic: reuse + combine rules
  - reduce critical path delay: split rules
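
The idea of predicated atomic rules can be imitated in software as guard/body pairs. The sketch below is Python, not Bluespec, and it fires one enabled rule per step, which mirrors the one-rule-at-a-time reading of rule semantics; the rule names, the `stage` and `op` encodings, and the state fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]    # predicate: may the rule fire in this state?
    body: Callable[[State], State]    # returns the complete next state


def step(rules: List[Rule], state: State) -> Optional[str]:
    """Fire the first enabled rule atomically; return its name, or None if none is enabled."""
    for rule in rules:
        if rule.guard(state):
            new_state = rule.body(dict(state))  # body sees a snapshot, never a half-updated state
            state.clear()
            state.update(new_state)
            return rule.name
    return None


# stage 0 = execute, stage 1 = commit, stage 2 = done; op 1 = INC, op 2 = DEC
rules = [
    Rule("exec_inc", lambda s: s["stage"] == 0 and s["op"] == 1,
         lambda s: {**s, "tmp": (s["ax"] + 1) & 0xFFFF, "stage": 1}),
    Rule("exec_dec", lambda s: s["stage"] == 0 and s["op"] == 2,
         lambda s: {**s, "tmp": (s["ax"] - 1) & 0xFFFF, "stage": 1}),
    Rule("commit", lambda s: s["stage"] == 1,
         lambda s: {**s, "ax": s["tmp"], "stage": 2}),
]

state = {"ax": 0, "op": 2, "tmp": 0, "stage": 0}
fired = []
name = step(rules, state)
while name is not None:
    fired.append(name)
    name = step(rules, state)
```

Supporting a new behavior here means appending another `Rule` to the list without touching the existing ones, which is the maintainability argument the slide makes.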

HW-SW Co-simulation for Validation and Evaluation
Virtually "plug in" our model into a PC
  - execute Bochs to provide reference behavior
  - simulate RTL alongside the simulated Bochs PC
For validation and performance (CPI) evaluation
[Figure: two co-simulation setups, labeled Validation and Performance Evaluation. Labels: Bochs CPU, CPU RTL, Bochs MEM, I/Os, ==]
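
Validation against a reference simulator can be sketched as a lockstep comparison. The slide does not say how often or which state is compared, so comparing the full state after every instruction is an assumption, and the two toy step functions below stand in for Bochs and the RTL model.

```python
def first_divergence(reference_step, dut_step, ref_state, dut_state, max_insts):
    """Step both models in lockstep; return (index, differing fields) at the first
    mismatch, or None if the states agree for max_insts instructions."""
    for i in range(max_insts):
        reference_step(ref_state)
        dut_step(dut_state)
        if ref_state != dut_state:
            diff = {k: (ref_state[k], dut_state.get(k))
                    for k in ref_state if ref_state[k] != dut_state.get(k)}
            return i, diff
    return None


def ref_inc(state):
    # Reference behavior: 16-bit increment with wraparound.
    state["ax"] = (state["ax"] + 1) & 0xFFFF


def buggy_inc(state):
    # Hypothetical model bug: the 16-bit wraparound is missing.
    state["ax"] = state["ax"] + 1


mismatch = first_divergence(ref_inc, buggy_inc, {"ax": 0xFFFE}, {"ax": 0xFFFE}, 10)
clean = first_divergence(ref_inc, ref_inc, {"ax": 0xFFFE}, {"ax": 0xFFFE}, 10)
```

Reporting the first differing instruction keeps the debugging scope small, since later mismatches are usually consequences of the first one.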

[Figure: design and validation flow.
  Bochs src code -> (manual coding) -> Bluespec x86 -> (Bluespec compilation) -> Verilog x86 -> (C++ conversion, Verilator) -> C++ x86
  Workloads on Bochs -> (Bochs simulation) -> Traces
  C++ x86 + Traces -> (co-simulation in the co-simulation testbench) -> validation and performance evaluation results
  Label: Automated]

Outline
  - Introduction
  - Our Approach
  - Status and Results
  - Discussions and Future Work

Implementation Progress
Implemented ISA subset
  - all real-mode general-purpose instructions
      - 166 insts, 369 inst behaviors
  - compared to complete x86
      - 482 insts, ~1000 inst behaviors
Synthesis
  - convert Bluespec to synthesizable Verilog
  - Xilinx ISE 7.1, Virtex II Pro 70 (FPGA on BEE2)
  - results: 98 MHz, 28K slices (85% utilization)

Co-simulation Results
Validation
  - validated our model w/ FreeDOS bootup traces
  - tested first 140M dynamic instructions
  - exercised 183 inst behaviors
Performance evaluation
  - also with FreeDOS bootup traces
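
The figures quoted on the progress and results slides can be cross-checked with a little arithmetic. The derived values below are inferences, not numbers from the slides: in particular the implied CPI assumes the 10 MIPS peak is reached at the 98 MHz synthesis clock, and "~1000" behaviors is treated as exactly 1000.

```python
insts_done, insts_total = 166, 482
behaviors_done, behaviors_total = 369, 1000   # slide says "~1000"
behaviors_exercised = 183                     # during the FreeDOS boot trace
clock_mhz, peak_mips = 98, 10
slices_used, utilization = 28_000, 0.85       # slide says "28K" and "85%"

inst_coverage = insts_done / insts_total                     # share of x86 instructions implemented
behavior_coverage = behaviors_done / behaviors_total         # share of x86 behaviors implemented
exercised_fraction = behaviors_exercised / behaviors_done    # share of implemented behaviors validated
implied_cpi = clock_mhz / peak_mips                          # cycles per instruction at peak rate
implied_total_slices = slices_used / utilization             # device capacity implied by the slide

print(f"instructions implemented: {inst_coverage:.1%}")
print(f"behaviors implemented:    {behavior_coverage:.1%}")
print(f"behaviors exercised:      {exercised_fraction:.1%}")
print(f"implied CPI:              {implied_cpi:.1f}")
print(f"implied device slices:    {implied_total_slices:,.0f}")
```

Read this way, about a third of the ISA is implemented, and the FreeDOS boot trace touches about half of the implemented behaviors, which is consistent with the plan to validate with more workloads.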

A Complete x86?
To finish the x86 model
  - can be done, but takes effort
  - consumes a lot of FPGA resources
Do we really need all of it?
  - a workload uses only a subset of the ISA
  - some insts are used more often than others
  - parts of the ISA are never or rarely used
PROTOFLEX migration
  - combine FPGA & simulation
  - model necessary subset in HW, the rest in SW
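
One way to picture "necessary subset in HW, the rest in SW" is a dispatcher that sends frequent behaviors to a hardware model and everything else to a software fallback. Choosing the subset by dynamic frequency is an assumption made for this sketch, not the PROTOFLEX mechanism; the trace, the budget, and the counters are hypothetical.

```python
from collections import Counter


def partition(trace, hw_budget):
    """Pick the hw_budget most frequent behaviors for hardware.
    Returns (hardware set, fraction of dynamic instructions it covers)."""
    counts = Counter(trace)
    hw_set = {name for name, _ in counts.most_common(hw_budget)}
    covered = sum(counts[name] for name in hw_set)
    return hw_set, covered / len(trace)


def execute(trace, hw_set, stats):
    """Dispatch each dynamic instruction to the hardware or the software path."""
    for behavior in trace:
        if behavior in hw_set:
            stats["hw"] += 1   # would run on the FPGA model
        else:
            stats["sw"] += 1   # would fall back to the software simulator


trace = ["mov"] * 6 + ["add"] * 3 + ["aaa"] * 1
hw_set, hw_fraction = partition(trace, hw_budget=2)
stats = {"hw": 0, "sw": 0}
execute(trace, hw_set, stats)
```

In this toy trace, two of three behaviors in hardware already cover 90% of the dynamic instructions, which is the kind of skew the slide's "do we really need all of it?" question relies on.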

Future Work
Short-term (Fall '06)
  - implement protected-mode support
  - validate/evaluate w/ more workloads
      - Linux, SPEC-CPU, commercial apps (DB2)
  - deployment on the BEE2 board
Long-term
  - full-system prototype execution
  - architectural exploration