Introduction to SimpleScalar (Based on SimpleScalar Tutorial)

Presentation transcript:

Introduction to SimpleScalar (Based on SimpleScalar Tutorial)
CPSC 614, Texas A&M University

Overview
What is an architectural simulator? A tool that reproduces the behavior of a computing device.
Why use a simulator?
- Leverages a faster, more flexible software development cycle
- Permits more design space exploration
- Facilitates validation before hardware becomes available
- Level of abstraction can be tailored to the design task
- Makes it possible to increase or improve system instrumentation
- Usually less expensive than building a real system

A Taxonomy of Simulation Tools
Before going into the details of SimpleScalar, here is some general background on simulators. The figure on this slide shows a classification of simulation tools; the shaded tools are included in the SimpleScalar tool set.

Functional vs. Performance
Functional simulators implement the architecture:
- Perform real execution
- Implement what programmers see
Performance simulators implement the microarchitecture:
- Model system resources/internals
- Are concerned with time
- Do not implement what programmers see
SimpleScalar is highly flexible because it provides both functional and performance simulators.
Functional simulators: for a branch predictor, for example, you care more about the prediction accuracy than about the actual timing. Memory and registers are examples of resources visible to a programmer using assembly language.
Performance simulators: programmers cannot see how an instruction moves through the machine's internals, yet that process is important for performance evaluation.

Trace- vs. Execution-Driven
Trace-driven:
- The simulator reads a ‘trace’ of the instructions captured during a previous execution
- Easy to implement; no functional components necessary
Execution-driven:
- The simulator runs the program itself (trace on the fly)
- Hard to implement
- Advantages: faster than tracing; no need to store traces; has access to register and memory values, which usually are not in a trace; supports mis-speculation cost modeling
Note that a simulator can be both execution-driven and a performance simulator.

SimpleScalar Tool Set
- Computer architecture research test bed
- Compilers, assembler, linker, libraries, and simulators
- Targeted to the virtual SimpleScalar architecture
- Hosted on almost any Unix-like machine
Note: the tool set also supports Alpha AXP, a 64-bit RISC ISA from DEC. The virtual SimpleScalar architecture (PISA) is derived from the MIPS (Microprocessor without Interlocked Pipeline Stages) ISA.

Advantages of SimpleScalar
- Highly flexible: functional simulator + performance simulator
- Portable
  - Host: virtual target runs on most Unix-like systems
  - Target: simulators can support multiple ISAs
- Extensible
  - Source is included for compiler, libraries, simulators
  - Easy to write simulators
- Performance: runs codes approaching ‘real’ sizes

Simulator Suite
Ordered from highest performance (Sim-Fast) to highest detail (Sim-Outorder):
- Sim-Fast: 300 lines; functional; 4+ MIPS
- Sim-Safe: 350 lines; functional with checks
- Sim-Profile: 900 lines; functional; lots of stats
- Sim-Cache and Sim-BPred: < 1000 lines; functional; cache stats and branch stats, respectively
- Sim-Outorder: 3900 lines; performance; OoO issue, branch prediction, mis-speculation, ALUs, cache, TLB; 200+ KIPS

Sim-Fast
- Functional simulation
- Optimized for speed
- Assumes no cache
- Assumes no instruction checking
- Does not support DLite!
- Does not allow command line arguments
- < 300 lines of code

Sim-Cache
- Cache simulation
- Ideal for fast simulation of caches (when the effect of cache performance on execution time is not needed)
- Accepts command line arguments for:
  - level 1 and 2 instruction and data caches
  - TLB configuration (data and instruction)
  - flush and compress
  - and more
- Ideal for performing high-level cache studies that don’t take the access time of the caches into account

Sim-Bpred
- Simulates different branch prediction mechanisms
- Generates prediction hit and miss rate reports
- Does not simulate the effect of branch prediction on total execution time
- Predictor types:
  - nottaken
  - taken
  - perfect
  - bimod: bimodal predictor
  - 2lev: 2-level adaptive predictor
  - comb: combined predictor (bimodal and 2-level)
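To make the "bimod" entry concrete, here is a minimal sketch of a bimodal predictor built from a table of 2-bit saturating counters, counting hits the way a hit/miss rate report would. This is not SimpleScalar's source: the names (`bimod_t`, `bimod_update`, and so on), the table size, and the PC-to-index mapping are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table size for this sketch; must be a power of two. */
#define BIMOD_ENTRIES 2048u

typedef struct {
    uint8_t counter[BIMOD_ENTRIES]; /* 2-bit saturating counters, 0..3 */
    unsigned long lookups;          /* branches seen so far */
    unsigned long hits;             /* correct predictions so far */
} bimod_t;

/* Assumption: instructions are at least 4-byte aligned, so the two
 * low PC bits carry no information and are dropped. */
static unsigned bimod_index(uint32_t pc)
{
    return (unsigned)((pc >> 2) & (BIMOD_ENTRIES - 1u));
}

void bimod_init(bimod_t *bp)
{
    assert((BIMOD_ENTRIES & (BIMOD_ENTRIES - 1u)) == 0u);
    memset(bp->counter, 1, sizeof bp->counter); /* 1 = weakly not-taken */
    bp->lookups = 0;
    bp->hits = 0;
}

/* Returns 1 for "predict taken", 0 for "predict not taken". */
int bimod_predict(const bimod_t *bp, uint32_t pc)
{
    return bp->counter[bimod_index(pc)] >= 2;
}

/* Records the real outcome of a branch and trains the counter.
 * Returns 1 if the prediction made before training was correct. */
int bimod_update(bimod_t *bp, uint32_t pc, int taken)
{
    uint8_t *c = &bp->counter[bimod_index(pc)];
    int correct = ((*c >= 2) == (taken != 0));

    bp->lookups++;
    if (correct)
        bp->hits++;
    if (taken && *c < 3)
        (*c)++;          /* move toward strongly taken */
    else if (!taken && *c > 0)
        (*c)--;          /* move toward strongly not-taken */
    return correct;
}

/* Prediction hit rate in [0, 1]; 0 when no branch has been seen. */
double bimod_hit_rate(const bimod_t *bp)
{
    return bp->lookups ? (double)bp->hits / (double)bp->lookups : 0.0;
}
```

Calling `bimod_update()` once per executed branch and reading `bimod_hit_rate()` at the end gives prediction accuracy only, with no notion of cycles, which matches the slide's point that this kind of simulation does not model the effect on execution time.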

Sim-Outorder
- Most complicated and detailed simulator
- Supports out-of-order issue and execution
- Provides reports on: branch prediction, cache, external memory, various configurations

Sim-Outorder HW Architecture
[Figure: block diagram of the Sim-Outorder pipeline. Labels: Fetch, Dispatch, Register Scheduler, Memory Scheduler, Exe, Mem, Writeback, Commit, I-Cache, I-TLB, D-Cache, D-TLB, Virtual Memory. Slide dated 09/18/13.]

Sim-Outorder (Main Loop)
sim_main() in sim-outorder.c:

    ruu_init();
    for (;;) {
        ruu_commit();
        ruu_writeback();
        lsq_refresh();
        ruu_issue();
        ruu_dispatch();
        ruu_fetch();
    }

- The loop body is executed once for each simulated machine cycle
- Walks the pipeline from Commit to Fetch
- The reverse traversal handles inter-stage latch synchronization in a single pass
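The reason the reverse walk needs only one pass can be seen in a toy model. The sketch below is not taken from sim-outorder.c: it is a hypothetical three-stage pipeline (all `toy_*` names are invented here) in which each stage holds at most one instruction. Calling the stages from last to first frees each latch before the earlier stage writes into it, so every instruction advances exactly one stage per cycle; calling them first to last lets a newly fetched instruction fall through every stage in the same cycle.

```c
#include <assert.h>

enum { FETCH, DISPATCH, COMMIT, NUM_STAGES };
#define EMPTY (-1)

typedef struct {
    int latch[NUM_STAGES]; /* instruction id held in each stage, or EMPTY */
    int next_id;           /* id of the next instruction to fetch */
    int committed;         /* number of retired instructions */
} toy_pipe_t;

void toy_init(toy_pipe_t *p)
{
    int s;
    for (s = 0; s < NUM_STAGES; s++)
        p->latch[s] = EMPTY;
    p->next_id = 0;
    p->committed = 0;
}

/* Bring a new instruction into the fetch stage if it is free. */
static void toy_fetch(toy_pipe_t *p)
{
    if (p->latch[FETCH] == EMPTY)
        p->latch[FETCH] = p->next_id++;
}

/* Move an instruction to the next stage if that stage is free. */
static void toy_advance(toy_pipe_t *p, int from, int to)
{
    assert(from >= 0 && to < NUM_STAGES && to == from + 1);
    if (p->latch[from] != EMPTY && p->latch[to] == EMPTY) {
        p->latch[to] = p->latch[from];
        p->latch[from] = EMPTY;
    }
}

/* Retire whatever sits in the commit stage. */
static void toy_commit(toy_pipe_t *p)
{
    if (p->latch[COMMIT] != EMPTY) {
        p->committed++;
        p->latch[COMMIT] = EMPTY;
    }
}

/* One cycle, walking from Commit back to Fetch (the order on the slide). */
void toy_cycle_reverse(toy_pipe_t *p)
{
    toy_commit(p);
    toy_advance(p, DISPATCH, COMMIT);
    toy_advance(p, FETCH, DISPATCH);
    toy_fetch(p);
}

/* One cycle in the naive forward order, shown only for contrast. */
void toy_cycle_forward(toy_pipe_t *p)
{
    toy_fetch(p);
    toy_advance(p, FETCH, DISPATCH);
    toy_advance(p, DISPATCH, COMMIT);
    toy_commit(p);
}
```

With the reverse order the first instruction retires in the fourth cycle, as expected for three stages plus retirement. With the forward order it retires in the very first cycle, which is the timing error that a forward walk would otherwise have to avoid with double-buffered latches or a second pass.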

Specifying Sim-outorder
-fetch:ifqsize <size>    instruction fetch queue size (in insts)
-fetch:mplat <cycles>    extra branch misprediction latency (cycles)
…
-bpred <type>
-bpred:bimod <size>
-bpred:2lev <l1size> <l2size> <hist_size>
…
-config <file>
-dumpconfig <file>

$ sim-outorder -config <file> <benchmark command line>

Benchmark
- SPEC CPU 2000 suite
- Consists of 26 benchmarks in two groups:
  - CINT: 12 benchmarks
  - CFP: 14 benchmarks
- http://www.spec.org/cpu2000
- CPU 2000 is now retired; its successor is CPU2006
- For homework: Alpha binaries, input data files

References
- SimpleScalar Tutorial/Hack Guide
- WWW Computer Architecture: http://www.cs.wisc.edu/arch/www
- Read the tutorial; run, test, and debug

Column Associative Caches
- The biggest drawback of direct-mapped caches is the large number of conflict misses.
- The idea is to resolve conflicts by dynamically choosing different locations, which are accessed through different hashing functions.
- The simplest rehashing function is bit selection with the highest-order bit of the index inverted, called bit flipping.
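A small sketch may help picture bit flipping. It assumes a hypothetical 8-line cache with 16-byte blocks and, for simplicity, stores the full block address as the tag; all names (`cac_t`, `cac_access`, and so on) are invented for illustration. It models only the basic scheme described above, where a miss at the primary location triggers a second probe at the bit-flipped location, and does not include any later refinements.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 4u                 /* hypothetical: 16-byte blocks */
#define INDEX_BITS  3u                 /* hypothetical: 8 cache lines */
#define NUM_LINES   (1u << INDEX_BITS)

enum { CAC_MISS = 0, CAC_FIRST_HIT = 1, CAC_REHASH_HIT = 2 };

typedef struct {
    int valid;
    uint32_t block; /* full block address kept as the tag, for simplicity */
} cac_line_t;

typedef struct {
    cac_line_t line[NUM_LINES];
} cac_t;

/* Ordinary bit selection: the index field of the address. */
uint32_t primary_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & (NUM_LINES - 1u);
}

/* Bit flipping: same index with its highest-order bit inverted. */
uint32_t rehash_index(uint32_t addr)
{
    return primary_index(addr) ^ (1u << (INDEX_BITS - 1u));
}

void cac_init(cac_t *c)
{
    uint32_t i;
    assert(INDEX_BITS >= 1u);
    for (i = 0; i < NUM_LINES; i++) {
        c->line[i].valid = 0;
        c->line[i].block = 0;
    }
}

/* Probe the primary location, then the rehashed one. */
int cac_access(cac_t *c, uint32_t addr)
{
    uint32_t block = addr >> OFFSET_BITS;
    cac_line_t *first = &c->line[primary_index(addr)];
    cac_line_t *second = &c->line[rehash_index(addr)];
    cac_line_t tmp;

    if (first->valid && first->block == block)
        return CAC_FIRST_HIT;

    if (second->valid && second->block == block) {
        /* Swap so the next access to this block hits on the first probe. */
        tmp = *first;
        *first = *second;
        *second = tmp;
        return CAC_REHASH_HIT;
    }

    /* Miss in both places: the old primary line moves to the rehashed
     * location (evicting what was there) and the new block fills the
     * primary location. */
    *second = *first;
    first->valid = 1;
    first->block = block;
    return CAC_MISS;
}
```

In this configuration, addresses 0x00 and 0x80 both select line 0, so a plain direct-mapped cache would miss on every access when they alternate. Here, after the two initial misses, each later access hits at either the primary or the rehashed location.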

Example

Problem with Bit-Flipping scheme

Decision tree

Secondary Thrashing

Rehash-bit