HAsim Status Update Joel Emer Michael Adler Angshuman Parashar Michael Pellauer Murali Vijayaraghavan Nikhil Patil Abhishek Bhattacharjee VSSAD, Intel.

Presentation transcript:

HAsim Status Update
Joel Emer, Michael Adler, Angshuman Parashar, Michael Pellauer, Murali Vijayaraghavan, Nikhil Patil, Abhishek Bhattacharjee
VSSAD, Intel; CSG Group, CSAIL MIT; UT Austin; Princeton University

2 Recap: Virtual Platform
Set of Abstractions
  – Provide common set of functionalities across multiple physical platforms
      XUP Board
      PCI-express Board
      Intel FSB Socket
      Bluesim/Vsim
      BEE3
  – Leverage Asim Plug N Play
      Minimize module replacements/recoding while moving across platforms

3 Virtual Platform Infrastructure
[Diagram: hardware/software stack. Labels: Hardware, Software; FPGA Modules, Software Modules; Virtual Platform, Platform Interface; RRR Layers and Communication Layers (each shown twice); module names Fetch, Decode, Exe, Memory, Front Panel, Func Model, Control, Decode, Front Panel, Memory.]

4 RRR Specification Language
//
// create a new service called ISA_EMULATOR
//
service ISA_EMULATOR
{
    //
    // declare services provided by CPU
    //
    server CPU <- FPGA;
    {
        method UpdateRegister(in REG_INDEX i, in REG_VALUE v);
        method Emulate(in INST_INFO i, out INST_ADDR a);
    };

    //
    // declare services provided by FPGA
    //
    server FPGA <- CPU;
    {
        method SyncRegister(in REG_INDEX i, in REG_VALUE v);
    };
};

5 Remote Request/Response
FPGA (user code, calls the Client Stub):
    ClientStub_ISA_EMULATOR cpu;
    ...
    cpu.UpdateRegister_MakeRequest(REG_R27, regFile[REG_R27]);
    ...
    cpu.Emulate_MakeRequest(inst);
    ...
    targetPC <- cpu.Emulate_GetResponse();
CPU (user code, called by the Server Stub):
    ISA_EMULATOR::UpdateRegister(REG_INDEX i, REG_VALUE v)
    {
        regFile[i] = v;
    }
    ISA_EMULATOR::Emulate(INST_INFO inst)
    {
        // emulate the instruction
        return target_PC;
    }
[Diagram labels: User Code, Client Stub, Server Stub, Communication Layers (Runtime System), RRR specification files.]
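The request/response flow on this slide can be mimicked entirely in software. The following is a minimal C++ sketch, not the RRR-generated code: the names `Channel`, `Message`, `IsaEmulatorClient` and `IsaEmulatorServer` are hypothetical, an in-memory queue stands in for the communication layers, and the "emulation" simply returns pc + 4 as a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical wire format: a method id plus two 64-bit arguments.
enum class Method : uint8_t { UpdateRegister, Emulate };

struct Message {
    Method method;
    uint64_t arg0;
    uint64_t arg1;
};

// Stand-in for the communication layers: two in-memory FIFOs.
struct Channel {
    std::deque<Message> requests;
    std::deque<uint64_t> responses;
};

// Client stub (the FPGA side on the slide): only marshals requests.
class IsaEmulatorClient {
public:
    explicit IsaEmulatorClient(Channel& ch) : ch_(ch) {}

    void updateRegisterMakeRequest(uint64_t index, uint64_t value) {
        ch_.requests.push_back(Message{Method::UpdateRegister, index, value});
    }
    void emulateMakeRequest(uint64_t inst, uint64_t pc) {
        ch_.requests.push_back(Message{Method::Emulate, inst, pc});
    }
    // Returns false while no response has arrived yet.
    bool emulateGetResponse(uint64_t& newPc) {
        if (ch_.responses.empty()) return false;
        newPc = ch_.responses.front();
        ch_.responses.pop_front();
        return true;
    }

private:
    Channel& ch_;
};

// Server stub plus user code (the CPU side on the slide).
class IsaEmulatorServer {
public:
    explicit IsaEmulatorServer(Channel& ch) : ch_(ch) {}

    // Drain pending requests and dispatch on the method id.
    void poll() {
        while (!ch_.requests.empty()) {
            Message m = ch_.requests.front();
            ch_.requests.pop_front();
            switch (m.method) {
            case Method::UpdateRegister:
                regFile_[m.arg0] = m.arg1;            // one-way call, no response
                break;
            case Method::Emulate:
                ch_.responses.push_back(m.arg1 + 4);  // placeholder: fall through to pc + 4
                break;
            }
        }
    }
    uint64_t reg(uint64_t index) const {
        auto it = regFile_.find(index);
        assert(it != regFile_.end() && "register was never synced");
        return it->second;
    }

private:
    Channel& ch_;
    std::map<uint64_t, uint64_t> regFile_;
};
```

The split between `MakeRequest` and `GetResponse` mirrors the slide: the client can keep working after issuing a request and collect the result later.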

6 Virtual Platform/RRR Status Update
Software + Hardware, Client + Server Stubs
Multiple Arguments for method calls
Auto-generation of Soft Connections through Platform Interface, and Remote Stubs
PCI-Express Physical Platform
  – Physical Channel implementation using CSRs
  – Soft Reset
Several services in HAsim
  – Very positive feedback from developers

7 HAsim: MIPS → Alpha
Motivation
  – Couldn’t find any Full System MIPS simulator with multiprocessor + large memory support
HAsim-Alpha
  – M5 “running” in software
      Target Memory Image
      Syscall Emulation
      Other instructions not implemented on FPGA (e.g. FP currently)
  – Functional + Timing model on FPGA

8 HAsim-Alpha Highlights
Implemented Alpha Functional Model
  – Primary changes
      ISA spec
        – Instruction Format + Queries
      Datapath
        – Execution Semantics
  – Unchanged
      Dependency logic
      Register File
      Memory Subsystem (incl. Store Buffer)
Multiple timing models
  – Unpipelined
  – 5 Stage
  – In order with caches
  – OoO
Running long Alpha programs (e.g. SPEC2k)

9 Old Instruction Emulation with Cache Flush
[Timeline diagram: FPGA and Software exchanging messages across the RRR Layer over time. Components: Execute, Functional Cache, Memory Server, Emulation Server, Instruction Simulator. Messages: Sync Registers, Flush, Write Line, Emulate Instruction, Emulation Done, Done.]

10 Hybrid Instruction Emulation
[Timeline diagram: FPGA and Software exchanging messages across the RRR Layer over time. Components: Execute, Functional Cache, Memory Server, Emulation Server, Instruction Simulator. Messages: Sync Registers, Emulate Instruction, Write Back or Invalidate, Write Line, Ack, Emulation Done, Done.]

11 RRR ISA Emulation Specification
service ISA_EMULATOR
{
    server sw (cpp, method) <- hw (bsv, connection)
    {
        method sync(in RNAME[RNAME_BITS] rname, in RVAL[RVAL_BITS] rval);
        method emulate(in INST[INST_BITS] inst,
                       in ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] pc,
                       out ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] newPc);
    };

    server hw (bsv, connection) <- sw (cpp, method)
    {
        method sync(in RNAME[RNAME_BITS] rname, in RVAL[RVAL_BITS] rval);
    };
};

12 Dynamic Simulator Configuration
[Timeline diagram: FPGA and Software exchanging messages across the RRR Layer over time. Components: Dynamic Param Controller (on both sides), Param Node. Messages: Set Parameters, Set Value, Done?, Done. Example parameter: Enable Functional Cache?]

13 RRR Dynamic Parameter Specification
service PARAMS
{
    //
    // Send one dynamic parameter ID and value to the hardware.
    // An ACK is returned to guarantee that the parameter has
    // been received.
    //
    server hw (bsv, connection) <- sw (cpp, method)
    {
        method sendParam(in UINT32[32] pname, in UINT64[64] pval, out UINT8[8] ack);
    };
};
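A rough software model of the acknowledged parameter handshake, assuming one parameter is sent at a time and the sender stops at the first missing ACK. `ParamNodeModel` and `configure` are hypothetical names, and the ACK value of 1 is an assumption; the specification above only fixes the argument widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the hardware side of the PARAMS service.
class ParamNodeModel {
public:
    // Mirrors sendParam(in pname, in pval, out ack).
    uint8_t sendParam(uint32_t pname, uint64_t pval) {
        values_[pname] = pval;
        return 1;  // assumed ACK encoding
    }
    uint64_t value(uint32_t pname) const {
        auto it = values_.find(pname);
        assert(it != values_.end() && "parameter was never sent");
        return it->second;
    }

private:
    std::map<uint32_t, uint64_t> values_;
};

// Software-side controller: send each parameter and wait for its ACK
// before sending the next. Returns how many parameters were acknowledged.
std::size_t configure(ParamNodeModel& hw,
                      const std::vector<std::pair<uint32_t, uint64_t>>& params) {
    std::size_t acked = 0;
    for (const auto& p : params) {
        if (hw.sendParam(p.first, p.second) == 0) break;  // no ACK: stop early
        ++acked;
    }
    return acked;
}
```

Because each call blocks until its ACK returns, the caller knows every parameter has been received before simulation starts.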

14 Other Uses of RRR
Stats
Events
Assertions
Control Messages
Streams

15 Modeling Back-Pressure using A-Ports
[Diagram: Producer and Consumer connected by a Data A-Port and a Credits A-Port; the pair forms a Credit Port.]
No buffering present within the Ports
Producer Interface:
    Bool canSend()             Do we have enough credits?
    Action enq(Maybe#(t) x)    Send data or invalid.
    Action pass()              Indicate end of cycle
Consumer Interface:
    Bool canReceive()          Is data available?
    AV#(Data) pop()            Receive data
    Action done (cred)         Indicate end of cycle, and send back credits
Producer usage:
    if (canSend) enq(x) else pass()
Consumer usage:
    if (canReceive) x <- pop()
    done(x)
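The credit scheme can be illustrated with a small software model. This is a sketch under assumptions, not HAsim code: both the data and the credit path are given a one-cycle latency, `tick()` is a hypothetical method that advances the model cycle, and `runModel` is a hypothetical driver in which the consumer owns the only buffer and frees one entry every `drainPeriod` cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

template <typename T>
class CreditPort {
public:
    explicit CreditPort(int initialCredits) : credits_(initialCredits) {}

    // Producer interface
    bool canSend() const { return credits_ > 0; }
    void enq(const T& x) {
        assert(canSend());
        --credits_;
        next_ = x;
        hasNext_ = true;
    }
    void pass() { hasNext_ = false; }  // nothing sent this cycle

    // Consumer interface
    bool canReceive() const { return hasCur_; }
    T pop() {
        assert(hasCur_);
        hasCur_ = false;
        return cur_;
    }
    void done(int cred) { creditsNext_ = cred; }  // credits to hand back

    // Advance one model cycle: data and credits each arrive one cycle later.
    void tick() {
        hasCur_ = hasNext_;
        cur_ = next_;
        hasNext_ = false;
        credits_ += creditsNext_;
        creditsNext_ = 0;
    }

private:
    int credits_;
    int creditsNext_ = 0;
    bool hasCur_ = false;
    bool hasNext_ = false;
    T cur_{};
    T next_{};
};

struct RunResult {
    std::vector<int> received;     // items the consumer has retired, in order
    int stalls = 0;                // cycles in which the producer had no credit
    std::size_t maxOccupancy = 0;  // peak size of the consumer-side buffer
};

RunResult runModel(int capacity, int cycles, int drainPeriod) {
    CreditPort<int> port(capacity);
    std::deque<int> buffer;  // buffering lives on the consumer side, not in the port
    RunResult r;
    int nextItem = 0;
    for (int cycle = 0; cycle < cycles; ++cycle) {
        // Producer: if (canSend) enq(x) else pass()
        if (port.canSend()) {
            port.enq(nextItem++);
        } else {
            port.pass();
            ++r.stalls;
        }
        // Consumer: if (canReceive) x <- pop(); done(credits)
        if (port.canReceive()) buffer.push_back(port.pop());
        r.maxOccupancy = std::max(r.maxOccupancy, buffer.size());
        int freed = 0;
        if (!buffer.empty() && cycle % drainPeriod == 0) {
            r.received.push_back(buffer.front());
            buffer.pop_front();
            freed = 1;
        }
        port.done(freed);
        port.tick();
    }
    return r;
}
```

With `runModel(2, 10, 2)`, the producer stalls whenever its two credits are in use, so the consumer-side buffer never holds more than two items.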

16 Structures using Credit Ports
Model FIFOs using Credit Ports
“Stall ports”: Producer to Consumer with Data (A1) and Credits (A1)
    A stall down the pipeline doesn’t get combinationally propagated
“Pipeline ports”: Producer to Consumer with Data (A1) and Credits (A0)
    The pipeline registers in traditional pipelines

17 Caches
Functional Partition
  – Functional Cache
      Target memory image data from M5
  – Functional TLB
      Target V → P translations
Timing Partition
  – I and D Cache models
  – Attempting to unify interface for all caches

18 Timing Partition Cache Interface
[Diagram: the MEMORY stage sends a Request to the L1 Cache, which sits in front of MAIN MEMORY; the cache returns an Immediate Response and a Delayed Response.]
Cache Req Interface:
    LOAD
    STORE
    PREFETCH
    INVALIDATE LINE
    INVALIDATE ALL
    KILL ALL
    FLUSH LINE
    FLUSH ALL
Cache Response:
    Immediate Response: HIT, MISS SERVICING, MISS RETRY
    Delayed Response: MISS RESPONSE
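The immediate/delayed response split can be sketched as a tag-only cache model. The following C++ is an illustrative assumption rather than the HAsim interface: `TimingCache` is a hypothetical direct-mapped cache with a single outstanding-miss slot, so a second miss gets MISS RETRY until the first one completes, and only load/store-style accesses are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Immediate { Hit, MissServicing, MissRetry };

// Sentinel for an empty line; assumes no real line address equals it.
const uint64_t kInvalidTag = UINT64_MAX;

class TimingCache {
public:
    TimingCache(std::size_t lines, int missLatency)
        : tags_(lines, kInvalidTag), missLatency_(missLatency) {
        assert(lines > 0 && missLatency > 0);
    }

    // Immediate response, produced in the same model cycle as the request.
    Immediate access(uint64_t lineAddr) {
        std::size_t idx = static_cast<std::size_t>(lineAddr % tags_.size());
        if (tags_[idx] == lineAddr) return Immediate::Hit;
        if (missPending_) return Immediate::MissRetry;  // miss slot busy
        missPending_ = true;
        missAddr_ = lineAddr;
        cyclesLeft_ = missLatency_;
        return Immediate::MissServicing;
    }

    // Advance one model cycle. Returns true when the delayed MISS RESPONSE
    // is delivered, and reports which line was filled.
    bool tick(uint64_t& filledAddr) {
        if (!missPending_) return false;
        if (--cyclesLeft_ > 0) return false;
        tags_[static_cast<std::size_t>(missAddr_ % tags_.size())] = missAddr_;
        missPending_ = false;
        filledAddr = missAddr_;
        return true;
    }

private:
    std::vector<uint64_t> tags_;
    int missLatency_;
    bool missPending_ = false;
    uint64_t missAddr_ = 0;
    int cyclesLeft_ = 0;
};
```

A memory stage using this model would stall or replay on `MissRetry` and wait for `tick` to report the fill after `MissServicing`.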

19 Ongoing/Future Work
Virtual Platform Infrastructure
  – More Sophisticated Type System
  – Virtual Memory for FPGA
      Share page tables with software application
      Cache V → P translations in a TLB
        – FPGA asks user software for translations
        – Software kernel must shoot down FPGA TLB when mapping changes
        – Note: distinct from HAsim Functional TLB
Functional Model
  – Multiple Contexts
  – Ultimate goal: Run a full system
Timing Model
  – Multiple Contexts
  – Realistic Microarchitecture
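The planned FPGA-side TLB behavior (cache translations, ask software on a miss, invalidate on shootdown) can be sketched in software. This is only an illustration of the idea on the slide: `FpgaTlbModel` is a hypothetical name, the translation callback stands in for the request to user software, and capacity limits and replacement are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

class FpgaTlbModel {
public:
    // Callback standing in for "FPGA asks user software for a translation".
    using Translator = std::function<uint64_t(uint64_t)>;

    explicit FpgaTlbModel(Translator translate) : translate_(std::move(translate)) {
        assert(static_cast<bool>(translate_));
    }

    // Return the cached physical page, asking software only on a miss.
    uint64_t lookup(uint64_t vpage) {
        auto it = entries_.find(vpage);
        if (it != entries_.end()) return it->second;
        ++misses_;
        uint64_t ppage = translate_(vpage);
        entries_[vpage] = ppage;
        return ppage;
    }

    // Called when the mapping for vpage changes.
    void shootdown(uint64_t vpage) { entries_.erase(vpage); }

    int misses() const { return misses_; }

private:
    Translator translate_;
    std::unordered_map<uint64_t, uint64_t> entries_;
    int misses_ = 0;
};
```

Until `shootdown` is called, a lookup keeps returning the old translation, which is why the software kernel has to notify the FPGA whenever a mapping changes.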

Backup

21 “Connection”-style Stubs
Stub:
    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(REG_INFO reg_info);
    endinterface
User Code:
    typedef struct {...} REG_INFO deriving (Bits, Eq);
    Connection_Send#(REG_INFO) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");
    link.send(reg_info);
Platform Interface:
    Connection_Receive#(REG_INFO) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR <- mkClient...
    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);
Open questions:
    Connections: Per-method or Per-service?
    How does Platform Interface get the RRR types?
[Diagram labels: User Code, Stub, Platform Interface, RRR Stack, Soft connections. Legend: auto-generated, hand-written.]

22
Stub:
    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(Bit#(70) reg_info);
    endinterface
User Code:
    typedef struct {...} REG_INFO deriving (Bits, Eq);
    `include "remote_client_stub_ISA_EMULATOR.bsh"
    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...
    stub.makeRequest_UpdateRegister(reg_info);
Platform Interface:
    Connection_Receive#(Bit#(70)) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...
    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);
Remote Stub:
    Connection_Send#(Bit#(70)) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");
    method Action makeRequest_UpdateRegister(REG_INFO reg_info);
        link.send(pack(reg_info));
    endmethod
[Diagram labels: User Code, Stub, Remote Stub, Platform Interface, RRR Stack, Soft connections. Legend: auto-generated, hand-written.]

23 Hello, World!
hello.bsv:
    module mkSystem#(LowLevelPlatformInterface llpi)();
        Streams streams <- mkStreams(llpi);
        Reg#(Bool) done <- mkReg(False);

        rule hello (!done);
            streams.makeRequest(`STREAMS_MESSAGE_HELLO);
            done <= True;
        endrule
    endmodule
hello.dict:
    def STREAMS.MESSAGE.HELLO "Hello, World!\n";

24 RRR Memory Interface Specification
service FUNCP_MEMORY
{
    server sw (cpp, method) <- hw (bsv, connection)
    {
        method Load(in MEM_ADDRESS_RRR[64] addr, out MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] data);
        method LoadCacheLine(in MEM_ADDRESS_RRR[64] addr, out MEM_CACHELINE[FUNCP_CACHELINE_BITS] data);
        method Store(in MEM_STORE_INFO_RRR[MEMORY_STORE_INFO_SIZE] info);
        method StoreCacheLine(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info);
        // Store cache line with ACK
        method StoreCacheLine_Sync(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info, out UINT32[32] ack);
        method VtoP(in MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] va, out MEM_ADDRESS_RRR[64] pa);
    };

    server hw (bsv, connection) <- sw (cpp, method)
    {
        method Invalidate(in MEM_INVAL_CACHELINE_INFO_RRR[96] info, out UINT32[32] ack);
        method InvalidateAll(in UINT32[32] req, out UINT32[32] ack);
    };
};

25 Timing Partition Cache Interface
[Diagram: the MEMORY stage sends a Request to the L1 Cache, which sits in front of MAIN MEMORY; the cache returns an Immediate Response and a Delayed Response.]
Cache Req Interface:
    LOAD
    STORE
    PREFETCH
    INVALIDATE LINE
    INVALIDATE ALL
    KILL ALL
    FLUSH LINE
    FLUSH ALL
Cache Response:
    Immediate Response: HIT, HIT SERVICING, MISS SERVICING, MISS RETRY
    Delayed Response: MISS RESPONSE, HIT RESPONSE

26 Credit Ports
[Diagram: Producer and Consumer connected by Data and Credits, implemented as a Data A-Port and a Credits A-Port.]
No buffering present in the Ports
Producer Interface:
    Bool canSend()             Do we have enough credits?
    Action enq(Maybe#(t) x)    Send data or invalid.
    Action pass()              Indicate end of cycle
Consumer Interface:
    Bool canReceive()          Is data available?
    AV#(Data) pop()            Receive data
    Action done (cred)         Indicate end of cycle, and send back credits
Producer usage:
    if (canSend) enq(x) else pass()
Consumer usage:
    if (canReceive) x <- pop()
    done(x)

27 Structures using Credit Ports
[Diagram: Producer and Consumer connected by Data and Credits, with a Completion Buffer.]
Since buffering is not modeled in credit ports using FIFOs, any sort of buffer can sit on the consumer side
Reduced the code size of timing models drastically