HASim Implementing a Functional/Timing Partitioned Microprocessor Simulator with an FPGA Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind* Massachusetts.

Presentation transcript:

HASim Implementing a Functional/Timing Partitioned Microprocessor Simulator with an FPGA Nirav Dave*, Michael Pellauer*, Joel Emer†*, & Arvind* Massachusetts Institute of Technology* Computer Science and Artificial Intelligence Lab Cambridge, MA Intel Corporation VSSAD† Hudson, MA

HASim - Why? Micro-architectural simulations are important: they give better estimates for expected outcomes. SW simulations are slow to run (100s of KIPS). HW "simulations" take a long time to design.

Underlying Beliefs: Modeling something is generally easier than designing it; you don't need to be totally faithful to the design for what you need. It's easy to make modeling mistakes; you need to insert checks to assure you didn't cheat. Appropriate partitioning improves reuse; split computational aspects from timing.

HASim – What? HASim is a partitioned hardware simulation framework. Two partitions: Functional (FP), which executes instructions, and Timing (TP), which is responsible for determining the timing of the emulated machine.
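The split between the two partitions can be sketched in software as follows. This is only an illustrative model, not HASim's hardware implementation: the class names, the toy `addi` instruction, and the fixed 5-cycle cost are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalPartition:
    """Executes instructions; has no notion of time."""
    regs: dict = field(default_factory=lambda: {"r1": 0, "r2": 0})

    def execute(self, inst):
        # Hypothetical toy ISA: ("addi", dest, src, imm)
        op, dest, src, imm = inst
        assert op == "addi"
        self.regs[dest] = self.regs[src] + imm
        return self.regs[dest]


@dataclass
class TimingPartition:
    """Decides how many emulated cycles each instruction costs."""
    fp: FunctionalPartition
    cycles_per_inst: int = 5  # assumed unpipelined machine, 5 cycles each
    clock: int = 0            # emulated clock, in target cycles

    def run(self, program):
        for inst in program:
            self.fp.execute(inst)               # what happens (FP)
            self.clock += self.cycles_per_inst  # when it happens (TP)
        return self.clock


fp = FunctionalPartition()
tp = TimingPartition(fp)
total = tp.run([("addi", "r1", "r1", 2), ("addi", "r2", "r1", 3)])
```

Swapping in a different `TimingPartition` changes the cycle count without touching how instructions are executed, which is the reuse argument made on the earlier slide.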

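HASim: The Picture [Diagram: a Timing Partition connected to a Functional Partition. Functional Partition stages: Token Gen, Fet, Dec, Exe, Mem, LCom, GCom. Other Functional Partition blocks: Memory, Bypassing Unit, RegFile.]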

Functional Partition Zoom-In [Diagram: the Decoder Unit and its Token Table. TP request to do instruction i: <Token>. Response to TP's request: <Token, DependencyInfo>. Info from previous stage: <Token, Inst>. Information to next stage: <Token, DecodedInst>. The Decoder Unit also writes to the MapTable (in the BypassUnit).]
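The message flow labeled in this diagram can be approximated in software as below. It is a sketch under assumptions: the `DecodeStage` class, its method names, and the tuple instruction encoding are hypothetical, and the MapTable update is left out.

```python
class DecodeStage:
    """Toy model of a functional-partition decode stage keyed by token."""

    def __init__(self):
        self.from_fetch = {}  # token -> raw instruction (<Token, Inst>)
        self.to_execute = {}  # token -> decoded instruction (<Token, DecodedInst>)

    def accept(self, token, inst):
        # Info from the previous stage arrives as <Token, Inst>.
        self.from_fetch[token] = inst

    def decode(self, token):
        # The TP asks for decode by token only; the FP already holds the instruction.
        op, dest, *srcs = self.from_fetch.pop(token)
        self.to_execute[token] = {"op": op, "dest": dest, "srcs": tuple(srcs)}
        # Response to the TP: <Token, DependencyInfo>.
        return token, {"writes": dest, "reads": tuple(srcs)}


stage = DecodeStage()
stage.accept(7, ("add", "r3", "r1", "r2"))
response = stage.decode(7)
```

The timing partition only ever handles the token and the dependency information, so it can model stalls without seeing instruction contents.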

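Functional Partition - Execute [Diagram: the Execute unit and its Token Table. TP request: <Token>. Response to TP: <Token, Result Value>. Info from previous stage: <Token, DecodedInst>. Information to next stage: <Token, ExecedInst>. The Execute unit also writes to the RegFile (in the BypassUnit).]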

Automated Checks: We'd like our model to obey causality of data usage (no reading values before they're created) and to meet expected times for different stages, e.g. decode of an instruction takes at least 1 cycle, and decode should not take more than two cycles. We want these very simple checks. Let's have the FP verify these!
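A stage-latency check of this kind might look like the following sketch. The `StageLatencyChecker` and `TimingViolation` names are invented for illustration; the bounds of 1 and 2 cycles are the decode example from the slide.

```python
class TimingViolation(Exception):
    """Raised when a stage finishes outside its allowed latency window."""


class StageLatencyChecker:
    def __init__(self, min_cycles, max_cycles):
        self.min_cycles = min_cycles
        self.max_cycles = max_cycles
        self.started = {}  # token -> emulated cycle at which the stage began

    def start(self, token, cycle):
        self.started[token] = cycle

    def finish(self, token, cycle):
        elapsed = cycle - self.started.pop(token)
        if not (self.min_cycles <= elapsed <= self.max_cycles):
            raise TimingViolation(
                f"token {token}: stage took {elapsed} cycles, "
                f"expected {self.min_cycles}..{self.max_cycles}"
            )
        return elapsed


# Decode must take at least 1 and at most 2 emulated cycles.
checker = StageLatencyChecker(min_cycles=1, max_cycles=2)
checker.start(token=0, cycle=10)
elapsed = checker.finish(token=0, cycle=11)
```

A timing partition that claimed decode finished in the same cycle it started, or three cycles later, would raise `TimingViolation` instead of silently producing a wrong cycle count.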

Verifying Causality: All execution interactions to the functional model are provided. Annotate all data with the emulated clock cycle it was created on. The FP checks the time on accesses of data.
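One way to picture the timestamp annotation is a register file that stores the creation cycle next to each value. This is a minimal sketch with hypothetical names; it also assumes that a read in the same emulated cycle as the write is legal, which the slides do not specify.

```python
class CausalityViolation(Exception):
    """Raised when data is read before the emulated cycle that created it."""


class TimestampedRegFile:
    def __init__(self):
        self.values = {}  # register -> (value, emulated cycle it was created on)

    def write(self, reg, value, cycle):
        self.values[reg] = (value, cycle)

    def read(self, reg, cycle):
        value, created = self.values[reg]
        # Assumption: reading in the creation cycle itself is allowed.
        if cycle < created:
            raise CausalityViolation(
                f"{reg} read at cycle {cycle} but created at cycle {created}"
            )
        return value


rf = TimestampedRegFile()
rf.write("r1", 42, cycle=5)
late_read = rf.read("r1", cycle=6)  # fine: the value already exists
```

Reading `r1` at cycle 4 would raise `CausalityViolation`, flagging a timing model that used a result before the emulated machine could have produced it.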

Leveraging FP Structure for Timing: Sometimes the best way to model something is to just make it. Use the target design's cache structure as the FP's cache structure. Can just measure the number of target ticks. May need to record some more information (where misses occurred) to get appropriate timing.
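As an illustration of building the structure and measuring it, the sketch below models a direct-mapped cache that returns a tick count per access and logs where misses occurred. The geometry (4 lines of 16 bytes) and the latencies (1 tick for a hit, 10 for a miss) are made-up assumptions, not HASim parameters.

```python
class DirectMappedCache:
    def __init__(self, num_lines, line_bytes, hit_cycles=1, miss_cycles=10):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit_cycles = hit_cycles    # assumed hit latency in target ticks
        self.miss_cycles = miss_cycles  # assumed miss latency in target ticks
        self.tags = [None] * num_lines
        self.miss_log = []              # addresses where misses occurred

    def access(self, addr):
        """Return the number of target ticks this access takes."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            return self.hit_cycles
        self.tags[index] = tag          # fill the line on a miss
        self.miss_log.append(addr)
        return self.miss_cycles


cache = DirectMappedCache(num_lines=4, line_bytes=16)
ticks = [cache.access(a) for a in (0, 4, 64, 0)]
```

Addresses 0 and 64 map to the same line here, so the final access to 0 misses again; the miss log gives the timing partition the extra information it would need to charge the right penalty.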

Similar Ideas - FAST: FAST has similar underlying beliefs. Differences: SW vs. HW functional partitions; decoupled partitions vs. tight coupling; some additional correctness checks needed. Not clear which approach is more effective.

Similar Ideas - UNUM: UNUM is another parameterized HW framework. Much more emphasis on HW quality and structure. Much more work to generate "believable" low-level values. Aimed later in the design selection cycle.

Current Progress: initial functional partition; singlescalar OOO design; simple RISC ISA; physical reg file; fast branch rewinds; simple pipeline timing partition.

Future Progress: porting a real ISA to the design, x86 (w/ µops); more complicated timing models; reorder buffer designs; large cache simulations.

Thanks! {ndave, pellauer, arvind}@csail.mit.edu emer@intel.com