HAsim
Joel Emer †‡, Michael Adler †, Artur Klauser †, Angshuman Parashar †, Michael Pellauer ‡, Murali Vijayaraghavan ‡
† VSSAD, Intel; ‡ CSAIL, MIT

Slide 2: Overview
Goal
  – Produce compelling evidence for architecture ideas
Requirements
  – Cycle-accurate simulation
  – Representative simulation length
  – Software development (often)
Current approach
  – Mostly software simulation (10 kHz to 1 kHz)
New approach
  – Build a performance model in an FPGA

Slide 3: FPGA-based approaches
Prototyping
  – Build a logically isomorphic representation of the design
Modeling
  – Build a performance simulation in gates
Hybrids
  – Build something that is partially a prototype and partially a model

Slide 4: Recreate Asim in hardware
  – Modularity
  – Inter-module communication
  – Functional/Timing Partitioning
  – Modeling Utilities

Slide 5: Why modularity?
  – Speed of model development
  – Shared components between products
  – Reuse across generations
  – Encourages isomorphism to design
  – Improved fidelity
  – Facilitates speed/fidelity trade-offs
  – Architectural experimentation
  – Factorial development and evaluations
  – Sharing

Slide 6: ASIM Module Hierarchy
[Diagram; labels: S, MCN, DRXCWF, B]

Slide 7: ASIM Module Selection
[Diagram; labels: S, MCN, DRXCWF, B (repeated)]

Slide 8: Module Selection
[Diagram; labels: S, MCN, CMN, DRXCWF (twice), B (repeated)]

Slide 9: Module Replacement
[Diagram; labels: S, MCN, DRXCWF, B (repeated), X]

Slide 10: (H)ASIM Module Hierarchy
[Diagram]

Slide 11: Communication
[Diagram; labels: C, DRXCWF, N, N]

Slide 12: Named connections
[Diagram; labels: S, D, A-out, A-in]

Slide 13: Model and FPGA Cycles
[Diagram; labels: Module A, Module B, Port, A, B]

Slide 14: Functional/Timing Decomposition
[Diagram; labels: ISA semantics, Platform semantics, Micro-architecture, Timing Partition, Functional Partition, Fetch(PC) … Instruction]
  – Simplifies timing model
  – Amortize functional model design effort over many models
  – Can be pipelined for performance
  – Can be FPGA-friendly design
  – Can be split across hardware and software
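The split on this slide can be illustrated with a small software sketch. It reads the diagram labels as the timing partition issuing Fetch(PC) and receiving an instruction back, which is an interpretation of the slide, not something it states. The class names, the `fetch_latency` parameter, and the toy program are all hypothetical and are not part of HAsim.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionalPartition:
    """Hypothetical functional model: knows what instructions are, not how long they take."""
    program: dict  # pc -> instruction mnemonic

    def fetch(self, pc):
        # Addresses outside the toy program read as "nop" (an assumption of this sketch).
        return self.program.get(pc, "nop")

@dataclass
class TimingPartition:
    """Hypothetical timing model: counts model cycles and asks the functional side for results."""
    functional: FunctionalPartition
    fetch_latency: int = 1          # assumed model cycles charged per fetch
    cycle: int = 0
    trace: list = field(default_factory=list)

    def step(self, pc):
        inst = self.functional.fetch(pc)
        self.cycle += self.fetch_latency
        self.trace.append((self.cycle, pc, inst))
        return inst

# One functional model is shared by two different timing models.
func = FunctionalPartition({0: "add", 4: "load"})
slow = TimingPartition(func, fetch_latency=3)
fast = TimingPartition(func, fetch_latency=1)
for pc in (0, 4, 8):
    slow.step(pc)
    fast.step(pc)
```

Both timing models see the same instruction stream and differ only in the cycle counts they assign, which is one way to read "amortize functional model design effort over many models".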

Slide 15: Phases
  – Fetch instruction
  – Speculatively execute instruction
  – Read memory *
  – Speculatively write memory * (locally visible)
  – Commit or Abort instruction
  – Write memory * (globally visible)
* Optional depending on instruction type

Slide 16: Execution in phases
[Diagram; phase sequences as extracted: FDXRCFDXWCWFDXC, FDXRA, FDXXCW]
Assertion: All data dependencies can be represented in these phases
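The letter strings on this slide can be read as per-instruction phase sequences. The sketch below is one possible reading, not the authors' definition: the letter meanings (in particular D as decode, which the phase list does not name) and the ordering rule encoded in the regular expression are assumptions.

```python
import re

# Assumed meanings of the phase letters; inferred from the phase list, not given in the slides.
PHASES = {"F": "fetch", "D": "decode", "X": "execute", "R": "read memory",
          "W": "write memory", "C": "commit", "A": "abort"}

# Assumed ordering rule: fetch, decode, one or more execute steps, an optional
# read, an optional (locally visible) write, then either abort, or commit
# followed by an optional (globally visible) write.
SEQUENCE = re.compile(r"FDX+R?W?(?:A|CW?)")

def is_valid_sequence(seq):
    """True if seq is exactly one instruction's phase sequence under the assumed rule."""
    return SEQUENCE.fullmatch(seq) is not None

def split_sequences(stream):
    """Split back-to-back sequences such as 'FDXRCFDXWCWFDXC' into per-instruction strings."""
    return SEQUENCE.findall(stream)

def describe(seq):
    """Spell out a sequence using the assumed letter meanings."""
    return [PHASES[letter] for letter in seq]

first_row = split_sequences("FDXRCFDXWCWFDXC")
```

Under this reading the first extracted string splits into a load-like, a store-like, and a register-only instruction, and a write after an abort is rejected.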

Slide 17: HAsim: Partitioning Overview
[Diagram; labels: Timing Partition, Functional Partition, Token Gen, Fet, Dec, Exe, Mem, LCom, GCom, Memory State, Register State, RegFile]

Slide 18: Common Infrastructure
  – Modules
  – Inter-module communication
  – Statistics gathering
  – Event logging
  – Debug Tracing
  – Simulation control
  – …

Slide 19: Bluespec (Asim-style) module

    module [HAsim_module] mkCache#() (Empty);
        // The cache receives requests on "a2cache" and sends responses on "cache2a".
        Port#(Addr) req_port  <- mkRecvPort("a2cache");
        Port#(Bool) resp_port <- mkSendPort("cache2a");
        TagArray tagarray <- mkTagArray();

        // Each model cycle: take a request if one is present and reply with hit/miss.
        rule cycle (True);
            Maybe#(Addr) mx = req_port.get();
            if (isValid(mx))
                resp_port.put(tagarray.lookup(validValue(mx)));
        endrule
    endmodule
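The behavior of the module above can be mimicked in software to show how name-matched ports decouple modules. This is a loose Python analogy, not HAsim's port implementation: the `port` helper, the `Cache` class, and the use of a set of resident addresses in place of the tag array are all hypothetical.

```python
from collections import deque

_channels = {}

def port(name):
    """Return the shared FIFO for `name`, creating it on first use (hypothetical helper)."""
    return _channels.setdefault(name, deque())

class Cache:
    """Toy stand-in for mkCache: one request consumed and one response produced per cycle."""
    def __init__(self, resident_addrs):
        self.req = port("a2cache")     # requests arrive here
        self.resp = port("cache2a")    # hit/miss answers leave here
        self.resident = set(resident_addrs)

    def cycle(self):
        # Mirrors the rule body: do nothing unless a request is present.
        if self.req:
            addr = self.req.popleft()
            self.resp.append(addr in self.resident)

cache = Cache({0x1000})
# The requester only needs the port names, not a reference to the cache object.
to_cache, from_cache = port("a2cache"), port("cache2a")
to_cache.extend([0x1000, 0x2000])
for _ in range(3):
    cache.cycle()
results = list(from_cache)
```

The third cycle finds no request and produces no response, matching the `isValid` guard in the rule.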

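Slide 20: Bluespec (Asim-style) submodule

    module mkTagArray(TagArray);
        RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

        // Helper functions come before the method that uses them;
        // interface methods go last in the module body.
        // Upper 4 bits of the 16-bit address form the tag.
        function Bit#(4) getTag(Address x);
            return x[15:12];
        endfunction

        // Lower 12 bits of the address select the entry.
        function Bit#(12) getIndex(Address x);
            return x[11:0];
        endfunction

        method Bool lookup(Bit#(16) a);
            return (tagArray.sub(getIndex(a)) == getTag(a));
        endmethod
    endmodule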
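The tag/index split in the submodule above can be checked with a few lines of Python. The bit ranges come from the slide; the `fill` method and the use of `None` for never-written entries are assumptions of this sketch, since the slide shows only the lookup path.

```python
def get_tag(addr):
    """Bits 15:12 of a 16-bit address."""
    return (addr >> 12) & 0xF

def get_index(addr):
    """Bits 11:0 of a 16-bit address."""
    return addr & 0xFFF

class TagArray:
    """Direct-mapped tag store with 4096 entries, one 4-bit tag each."""
    def __init__(self):
        self.tags = [None] * 4096   # None marks an entry that was never filled (assumption)

    def fill(self, addr):
        # Hypothetical fill path; not shown on the slide.
        self.tags[get_index(addr)] = get_tag(addr)

    def lookup(self, addr):
        return self.tags[get_index(addr)] == get_tag(addr)

tags = TagArray()
tags.fill(0xA123)
```

Address 0xB123 maps to the same entry as 0xA123 but carries a different tag, so it misses; 0xA124 maps to a different, unfilled entry.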

Slide 21: Support functions - stats
[Diagram: three Module / Stat Counter pairs and a Stat Dumper]

    module mkCache#(...) (Empty);
        ...
        cache_hits <- mkStat(...);
        ...
        hit = tagarray.lookup(...);
        if (hit)
            cache_hits.increment();
        ...
    endmodule
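The counter-plus-dumper arrangement in the diagram can be sketched in software as follows. This is an analogy only: the `Stat` and `StatDumper` classes and the `cache_misses` counter are hypothetical, and the slide does not say how counters reach the dumper in hardware.

```python
class StatDumper:
    """Collects every registered counter so they can be reported together."""
    def __init__(self):
        self.stats = []

    def dump(self):
        return {stat.name: stat.value for stat in self.stats}

class Stat:
    """Hypothetical counter that registers itself with a dumper when created."""
    def __init__(self, name, dumper):
        self.name = name
        self.value = 0
        dumper.stats.append(self)

    def increment(self):
        self.value += 1

dumper = StatDumper()
cache_hits = Stat("cache_hits", dumper)
cache_misses = Stat("cache_misses", dumper)   # not on the slide; added for illustration

# Stand-in for three lookups: hit, miss, hit.
for hit in (True, False, True):
    (cache_hits if hit else cache_misses).increment()

report = dumper.dump()
```

The module code only touches its own counter; the dumper is the single place that knows about all of them.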

Slide 22: 2Dreams

Slide 23: Support functions - events
[Diagram: three Module / Event Reg pairs and an Event Dumper]

    module mkCache#(...) (Empty);
        ...
        cache_event <- mkEvent(...);
        ...
        hit = tagarray.lookup(...);
        cache_event.report(hit);
        ...
    endmodule

Slide 24: Support functions – global controller
[Diagram: three Module / Controller pairs and a Global Controller]

    module mkCache#(...) (Empty);
        ...
        ctrl <- mkCntrlr(...);
        ...
        rule (ctrl.run())
            ...
        endrule
    endmodule

Slide 26: FPGA-based prototype
Prototyping Catch-22…

Slide 27: Module Instantiation
[Diagram; labels: U, DRXCWF, MCN, C, DRXCWF, M, C, DRXCWF]

Slide 28: Factorial Coding/Experiments
[Diagram; four configurations, each with S and MCN: SC + SM, RC + SM, SC + RM, RC + RM]

Slide 29: HAsim: Current status - models
Simple RISC functional model operating
  – Simple RISC ISA
  – Pipelined multi-phase instruction execution
  – Supports speculative OOO design
      Physical Reg File and ROB
      Small physically addressed memory
      Fast speculative rewinds
Instruction-per-cycle (APE) model
  – Runs simple benchmarks on FPGA
Five-stage pipeline
  – Supports branch mis-speculation
  – Runs simple benchmarks (in software simulation)
X86 functional model architecture under development

Slide 30: Connections Implement Ports
[Diagram: PM (Module Tree w. Connections) with modules foo, bar, baz; PM (Hardware Modules w. Wrappers) with modules foo, bar, baz]
Implemented via connections.

Slide 31: Timing Model Resources
(Fast) OOO, branch prediction, three functional units, 32KB 2-way set-associative ICache and DCache, iTLB, dTLB
  – 2142 slices (15% of a 2VP30)
  – 21 block RAMs (15% of a 2VP30)
Configurable cache model
  32KB 4-way set-associative cache with 16B cache lines
  – 165 slices (1% of a 2VP30)
  – 17 block RAMs (12% of a 2VP30)
  2MB 4-way set-associative cache with 64B cache lines
  – 140 slices (1% of a 2VP30)
  – 40 block RAMs (29% of a 2VP30)
Current FPGAs (4VFX140)
  – 142,128 logic cells (63,168 slices)
  – 552 block RAMs
  – 2 PowerPCs
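The utilization percentages on this slide can be reproduced with a short calculation. The XC2VP30 totals used below (13,696 slices and 136 block RAMs) are taken from the device's published specifications rather than from the slides, and the percentages match the slide when truncated rather than rounded. The cache geometry helper is added for illustration only.

```python
# Device totals for the Virtex-II Pro XC2VP30 (from the data sheet, not from the slides).
XC2VP30_SLICES = 13696
XC2VP30_BRAMS = 136

def percent_used(used, total):
    """Whole-number percentage, truncated, which is what matches the slide's figures."""
    return 100 * used // total

def cache_geometry(size_bytes, ways, line_bytes):
    """Return (number of lines, number of sets) for a set-associative cache."""
    lines = size_bytes // line_bytes
    return lines, lines // ways

KB, MB = 1024, 1024 * 1024

ooo_model = (percent_used(2142, XC2VP30_SLICES), percent_used(21, XC2VP30_BRAMS))
small_cache = (percent_used(165, XC2VP30_SLICES), percent_used(17, XC2VP30_BRAMS))
large_cache = (percent_used(140, XC2VP30_SLICES), percent_used(40, XC2VP30_BRAMS))

small_geometry = cache_geometry(32 * KB, ways=4, line_bytes=16)
large_geometry = cache_geometry(2 * MB, ways=4, line_bytes=64)
```

The 2MB model tracks sixteen times as many lines as the 32KB model, yet uses slightly fewer slices and only about 2.4 times the block RAMs: the growth falls on block RAM rather than on logic.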