Introduction to SimpleScalar (Based on SimpleScalar Tutorial)

Introduction to SimpleScalar (Based on SimpleScalar Tutorial) CPSC 614 Texas A&M University

Overview
What is an architectural simulator? A tool that reproduces the behavior of a computing device.
Why do we use a simulator? It gives a faster, more flexible software development cycle and permits more design-space exploration. It facilitates validation before the hardware becomes available, and its level of abstraction can be tailored to the design task. It also makes it possible to add or improve system instrumentation, and it is usually less expensive than building a real system.

A Taxonomy of Simulation Tools
Before introducing the details of SimpleScalar, here is some general background on simulators.
[Figure: a classification of simulation tools; shaded tools are included in the SimpleScalar Tool Set.]

Functional vs. Performance
Functional simulators implement the architecture: they perform real execution and implement what programmers see. For example, memory and registers are resources visible to a programmer using assembly language.
Performance simulators implement the microarchitecture: they model system resources and internals and are concerned with time, which is not what programmers see. A programmer cannot see how an instruction moves through the machine, but that process is exactly what matters for performance evaluation.
SimpleScalar is highly flexible because it provides both functional and performance simulators. A functional simulator is often enough: for a branch predictor study, for example, you may care more about prediction accuracy than about actual timing.
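To make the distinction concrete, here is a minimal sketch built around a hypothetical three-instruction toy ISA (the opcodes, the eight-register state, and the per-opcode latencies are all invented for illustration, not taken from SimpleScalar). The functional step only updates programmer-visible state; the performance run reuses that functional step and additionally accumulates cycles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA, used only to contrast the two kinds of simulator. */
typedef enum { OP_ADDI, OP_MUL, OP_LOAD } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs;     /* destination and source register numbers */
    int32_t imm;    /* immediate operand, or memory word index for OP_LOAD */
} insn_t;

/* Architected state: what an assembly-language programmer can see. */
typedef struct {
    int32_t regs[8];
    int32_t mem[16];
} arch_state_t;

/* Functional model: execute one instruction, update visible state, no timing. */
void functional_step(arch_state_t *s, const insn_t *i)
{
    switch (i->op) {
    case OP_ADDI: s->regs[i->rd] = s->regs[i->rs] + i->imm; break;
    case OP_MUL:  s->regs[i->rd] = s->regs[i->rd] * s->regs[i->rs]; break;
    case OP_LOAD: s->regs[i->rd] = s->mem[i->imm]; break;
    }
}

/* Performance model: same results, plus an assumed latency per opcode on a
   simple in-order, non-pipelined machine. Returns total cycles. */
uint64_t performance_run(arch_state_t *s, const insn_t *prog, size_t n)
{
    static const unsigned latency[] = { [OP_ADDI] = 1, [OP_MUL] = 3, [OP_LOAD] = 2 };
    uint64_t cycles = 0;
    for (size_t k = 0; k < n; k++) {
        functional_step(s, &prog[k]);   /* values come from the functional core */
        cycles += latency[prog[k].op];  /* timing is the extra thing modeled */
    }
    return cycles;
}
```

Running ADDI r1 = r0 + 2, LOAD r2 = mem[3] (holding 5), then MUL r1 = r1 * r2 leaves r1 = 10 either way; only the performance run also reports 6 cycles.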

Trace- vs. Execution-Driven
A trace-driven simulator reads a ‘trace’ of the instructions captured during a previous execution. It is easy to implement because no functional components are necessary.
An execution-driven simulator runs the program itself (trace-on-the-fly). It is harder to implement, but it has several advantages: it is faster than tracing, there is no need to store traces, register and memory values (which usually are not in a trace) are available, and it supports modeling the cost of mis-speculation.
Note that a simulator can be both execution-driven and a performance simulator.
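A minimal sketch of the trace-driven side, assuming a hypothetical trace record of (PC, is-branch, taken) and a made-up timing rule of one cycle per instruction plus a fixed penalty per taken branch. It needs no functional core, but it also cannot see register or memory values, and because only the committed path was recorded it has no wrong-path instructions to model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record captured during an earlier run.
   Note what is missing: no register or memory values. */
typedef struct {
    uint32_t pc;
    bool is_branch;
    bool taken;
} trace_rec_t;

/* Trace-driven timing model: it only replays the recorded stream.
   Assumed rule: 1 cycle per instruction, plus taken_penalty cycles for
   every taken branch (a stand-in for a "predict not-taken" front end). */
uint64_t replay_trace(const trace_rec_t *trace, size_t n, unsigned taken_penalty)
{
    uint64_t cycles = 0;
    for (size_t k = 0; k < n; k++) {
        cycles += 1;
        if (trace[k].is_branch && trace[k].taken)
            cycles += taken_penalty;
    }
    return cycles;
}
```

An execution-driven simulator would instead generate each record on the fly by executing the instruction, which is what lets it follow mispredicted paths.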

SimpleScalar Tool Set
A computer architecture research test bed: compilers, an assembler, a linker, libraries, and simulators, targeted to the virtual SimpleScalar architecture and hosted on almost any Unix-like machine. Target ISAs include the Alpha AXP (DEC's 64-bit RISC architecture) and PISA, the portable SimpleScalar ISA, which is derived from MIPS (Microprocessor without Interlocked Pipeline Stages).

Advantages of SimpleScalar
Highly flexible: it provides both functional and performance simulators.
Portable: on the host side, the simulators for the virtual target run on most Unix-like systems; on the target side, the simulators can support multiple ISAs.
Extensible: source is included for the compiler, libraries, and simulators, so it is easy to write new simulators.
Performance: it runs codes approaching ‘real’ sizes.

Simulator Suite (from most performance, i.e. fastest, to most detail)
Sim-Fast: ~300 lines; functional; 4+ MIPS
Sim-Safe: ~350 lines; functional with checks
Sim-Profile: ~900 lines; functional; lots of statistics
Sim-Cache / Sim-BPred: < 1000 lines each; functional; cache statistics / branch statistics
Sim-Outorder: ~3900 lines; performance; out-of-order issue, branch prediction, mis-speculation, ALUs, cache, TLB; 200+ KIPS
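The speed gap matters in practice. A small sketch of the arithmetic, using the slide's approximate speeds (4 MIPS for Sim-Fast, 200 KIPS for Sim-Outorder) and a 100-million-instruction run as an example size. Since the slide gives lower bounds ("4+", "200+"), these are rough upper estimates for the hardware the slide describes, not measurements.

```c
#include <stdint.h>

/* Rough wall-clock time to simulate n_insts instructions at a given
   simulator speed (instructions per second). */
double sim_seconds(uint64_t n_insts, double insts_per_sec)
{
    return (double)n_insts / insts_per_sec;
}
```

With these figures, 10^8 instructions take about 25 s in Sim-Fast but about 500 s in Sim-Outorder, a 20x difference, which is why the detailed simulator is usually run only on selected regions of a program.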

Sim-Fast
Functional simulation optimized for speed. It assumes no cache and performs no instruction checking. It does not support DLite! (the SimpleScalar debugger) and does not allow command-line arguments. It is under 300 lines of code.

Sim-Cache
Cache simulation. It is ideal for fast simulation of caches when the effect of cache performance on execution time is not needed. It accepts command-line arguments for level 1 and level 2 instruction and data caches, TLB configuration (data and instruction), flush and compress, and more. It is ideal for high-level cache studies that do not take cache access time into account.
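A minimal sketch of the kind of model Sim-Cache evaluates: a direct-mapped cache that only counts hits and misses and models no access latency. The geometry (64 sets, 32-byte blocks, one way) is an assumption chosen for illustration; in SimpleScalar's `<name>:<nsets>:<bsize>:<assoc>:<repl>` cache descriptor this would roughly correspond to 64 sets, 32-byte blocks, associativity 1. This is not Sim-Cache's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSETS 64        /* assumed: 64 sets, direct-mapped (1 way) */
#define BLOCK_BITS 5    /* assumed: 32-byte blocks */

typedef struct {
    bool valid[NSETS];
    uint32_t tag[NSETS];
    uint64_t hits, misses;
} dm_cache_t;

void cache_init(dm_cache_t *c) { memset(c, 0, sizeof *c); }

/* Look up one address; on a miss, fill the block (evicting the old one).
   Only hit/miss counts are kept, no timing. Returns true on a hit. */
bool cache_access(dm_cache_t *c, uint32_t addr)
{
    uint32_t block = addr >> BLOCK_BITS;
    uint32_t set = block % NSETS;
    uint32_t tag = block / NSETS;
    if (c->valid[set] && c->tag[set] == tag) {
        c->hits++;
        return true;
    }
    c->valid[set] = true;
    c->tag[set] = tag;
    c->misses++;
    return false;
}
```

For example, 0x0000 misses, 0x001C hits (same 32-byte block), 0x0800 maps to the same set with a different tag and evicts it, so a second access to 0x0000 misses again.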

Sim-BPred
Simulates different branch prediction mechanisms and generates prediction hit and miss rate reports. It does not simulate the effect of branch prediction on total execution time. Supported predictor types: nottaken (always predict not taken), taken (always predict taken), perfect (perfect prediction), bimod (bimodal predictor), 2lev (2-level adaptive predictor), and comb (combined bimodal and 2-level predictor).
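A sketch of the idea behind the bimod predictor: a table of 2-bit saturating counters indexed by branch address, with hit/miss counting in the style of Sim-BPred's reports. The table size, the initial counter value (weakly not-taken), and the `pc >> 3` indexing (assuming 8-byte instructions) are assumptions for illustration; SimpleScalar's own implementation differs in detail.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BIMOD_SIZE 2048   /* assumed table size; must be a power of two */

/* 2-bit saturating counters: 0,1 predict not-taken; 2,3 predict taken. */
typedef struct {
    uint8_t ctr[BIMOD_SIZE];
    uint64_t lookups, hits;
} bimod_t;

void bimod_init(bimod_t *b)
{
    memset(b, 0, sizeof *b);
    memset(b->ctr, 1, sizeof b->ctr);   /* assumed start: weakly not-taken */
}

static unsigned bimod_index(uint32_t pc)
{
    return (pc >> 3) & (BIMOD_SIZE - 1);   /* assumes 8-byte instructions */
}

bool bimod_predict(const bimod_t *b, uint32_t pc)
{
    return b->ctr[bimod_index(pc)] >= 2;
}

/* Resolve a branch: score the prediction, then train the counter. */
void bimod_update(bimod_t *b, uint32_t pc, bool taken)
{
    uint8_t *c = &b->ctr[bimod_index(pc)];
    b->lookups++;
    if ((*c >= 2) == taken)
        b->hits++;
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}
```

The hit rate is then hits / lookups. The 2-bit hysteresis means a loop branch that is usually taken mispredicts only once at loop exit rather than twice.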

Sim-Profile
A program profiler that generates detailed profiles by symbol and by address. It keeps track of and reports dynamic instruction counts, instruction class counts, branch class counts, usage of addressing modes, and profiles of the text and data segments.
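Instruction class counts are essentially a histogram over the dynamic instruction stream. A minimal sketch, with a hypothetical set of classes loosely echoing what a profiler like Sim-Profile reports (the class names and functions here are illustrative, not Sim-Profile's API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction classes for illustration. */
typedef enum { IC_INT, IC_FP, IC_LOAD, IC_STORE, IC_BRANCH, IC_NCLASSES } iclass_t;

typedef struct {
    uint64_t total;
    uint64_t per_class[IC_NCLASSES];
} profile_t;

/* Called once per dynamically executed (not static) instruction. */
void profile_record(profile_t *p, iclass_t c)
{
    p->total++;
    p->per_class[c]++;
}

/* Fraction of the dynamic instruction stream that falls in class c. */
double profile_fraction(const profile_t *p, iclass_t c)
{
    return p->total ? (double)p->per_class[c] / (double)p->total : 0.0;
}
```

Because counts are per executed instruction, a load inside a loop that runs 1000 times contributes 1000 to the load count even though it appears once in the text segment.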

Sim-Outorder
The most complicated and detailed simulator. It supports out-of-order issue and execution, and it provides reports on branch prediction, caches, external memory, and various configuration parameters.

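Sim-Outorder HW Architecture
[Figure: sim-outorder hardware architecture. Pipeline stages: Fetch, Dispatch, Register Scheduler, Exe, Writeback, Commit; memory path: Memory Scheduler, Mem; memory hierarchy: I-Cache, I-TLB, D-Cache, D-TLB, Virtual Memory.]
2018-08-28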

Sim-Outorder (Main Loop)
sim_main() in sim-outorder.c:

    ruu_init();
    for (;;) {
        ruu_commit();
        ruu_writeback();
        lsq_refresh();
        ruu_issue();
        ruu_dispatch();
        ruu_fetch();
    }

The loop body is executed once for each simulated machine cycle and walks the pipeline from Commit back to Fetch. This reverse traversal handles inter-stage latch synchronization in only one pass.
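Why does walking from Commit to Fetch need only one pass? A minimal sketch with a hypothetical three-stage pipeline (fetch, execute, commit) and single-entry latches between stages; the structure and function names are invented for illustration and are not sim-outorder's. Because each stage consumes its input latch before the upstream stage refills it in the same cycle, an instruction advances at most one stage per cycle without keeping a second copy of every latch.

```c
#define EMPTY (-1)

/* Hypothetical 3-stage pipeline: fetch -> execute -> commit. */
typedef struct {
    int fe_ex;                 /* latch between fetch and execute */
    int ex_co;                 /* latch between execute and commit */
    int next_fetch;            /* id of the next instruction to fetch */
    int last_committed;        /* id of the most recently committed insn */
    unsigned long ncommitted;  /* number of instructions committed */
} pipe_t;

static void commit(pipe_t *p)
{
    if (p->ex_co != EMPTY) {
        p->last_committed = p->ex_co;
        p->ncommitted++;
        p->ex_co = EMPTY;
    }
}

static void execute(pipe_t *p)
{
    if (p->fe_ex != EMPTY && p->ex_co == EMPTY) {
        p->ex_co = p->fe_ex;
        p->fe_ex = EMPTY;
    }
}

static void fetch(pipe_t *p)
{
    if (p->fe_ex == EMPTY)
        p->fe_ex = p->next_fetch++;
}

/* One simulated cycle, walked from the last stage to the first,
   mirroring the call order in sim_main(). */
void cycle(pipe_t *p)
{
    commit(p);
    execute(p);
    fetch(p);
}
```

If the calls were made in forward order (fetch, execute, commit), instruction 0 would be fetched, executed, and committed in a single cycle; the reverse order gives the expected one-stage-per-cycle behavior, with instruction 0 committing in cycle 3.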

RUU/LSQ in Sim-Outorder
RUU (Register Update Unit): handles register synchronization and communication, serves as both the reorder buffer and the reservation stations, and performs out-of-order issue when register and memory dependences are satisfied.
LSQ (Load/Store Queue): handles memory synchronization and communication, and contains all loads and stores in program order.
Relationship between the RUU and the LSQ: memory dependences are resolved by the LSQ, while load/store effective addresses are calculated in the RUU.
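The reorder-buffer role of the RUU can be sketched as a circular queue: entries are allocated at the tail in program order, may complete in any order, and retire only from the head. All names and the capacity below are hypothetical (prefixed `toy_` to avoid confusion with sim-outorder's real `ruu_*` functions); this shows the general technique, not sim-outorder's data structure.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_RUU_SIZE 8   /* assumed capacity */

typedef struct {
    int id;          /* program-order sequence number */
    bool completed;  /* result written back (may happen out of order) */
} toy_entry_t;

typedef struct {
    toy_entry_t e[TOY_RUU_SIZE];
    size_t head, tail, num;   /* circular queue kept in program order */
} toy_ruu_t;

/* Dispatch: allocate at the tail. Returns the slot, or -1 if full. */
int toy_dispatch(toy_ruu_t *r, int id)
{
    if (r->num == TOY_RUU_SIZE) return -1;
    size_t slot = r->tail;
    r->e[slot] = (toy_entry_t){ .id = id, .completed = false };
    r->tail = (r->tail + 1) % TOY_RUU_SIZE;
    r->num++;
    return (int)slot;
}

/* Writeback: any in-flight entry may complete, in any order. */
void toy_writeback(toy_ruu_t *r, int slot) { r->e[slot].completed = true; }

/* Commit: retire completed entries from the head only, so architectural
   state is updated in program order. Returns how many retired. */
size_t toy_commit(toy_ruu_t *r)
{
    size_t n = 0;
    while (r->num > 0 && r->e[r->head].completed) {
        r->head = (r->head + 1) % TOY_RUU_SIZE;
        r->num--;
        n++;
    }
    return n;
}
```

If the two youngest of three instructions finish first, nothing retires until the oldest completes; then all three retire together.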

Specifying Sim-Outorder
-fetch:ifqsize <size>    instruction fetch queue size (in instructions)
-fetch:mplat <cycles>    extra branch misprediction latency (in cycles)
…
-bpred <type>    branch predictor type
-bpred:bimod <size>    bimodal predictor table size
-bpred:2lev <l1size> <l2size> <hist_size>    2-level predictor configuration
…
-config <file>    load configuration from a file
-dumpconfig <file>    dump configuration to a file
For Assignment #1, change at least l1size.

    $ sim-outorder -config <file> <benchmark command line>
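To see what the three -bpred:2lev numbers control, here is a hedged sketch of a generic two-level predictor's indexing: l1size per-branch history registers, each holding hist_size bits, and a second-level table of l2size counters indexed by history concatenated with address bits. The struct, function names, and exact bit layout are assumptions (the real indexing in SimpleScalar's predictor code may differ), and 8-byte instructions are assumed for `pc >> 3`. Increasing l1size gives more branches their own history register, which reduces aliasing between branches.

```c
#include <stdint.h>

/* Hypothetical 2-level predictor indexing.
   l1size:    number of history registers (power of two)
   l2size:    number of counters in the pattern table (power of two)
   hist_size: history bits per register (< 32) */
typedef struct {
    uint32_t l1size, l2size, hist_size;
    uint32_t *hist;   /* l1size history shift registers */
} twolev_cfg_t;

uint32_t twolev_l1_index(const twolev_cfg_t *c, uint32_t pc)
{
    return (pc >> 3) & (c->l1size - 1);   /* assumes 8-byte instructions */
}

/* Second-level index: history bits in the low positions, branch address
   bits above them, masked to the table size. */
uint32_t twolev_l2_index(const twolev_cfg_t *c, uint32_t pc)
{
    uint32_t h = c->hist[twolev_l1_index(c, pc)] & ((1u << c->hist_size) - 1);
    return (h | ((pc >> 3) << c->hist_size)) & (c->l2size - 1);
}

/* Shift the resolved outcome into this branch's history register. */
void twolev_update_history(twolev_cfg_t *c, uint32_t pc, int taken)
{
    uint32_t *r = &c->hist[twolev_l1_index(c, pc)];
    *r = ((*r << 1) | (taken ? 1u : 0u)) & ((1u << c->hist_size) - 1);
}
```

For example, with l1size = 4, l2size = 16, hist_size = 2, a branch at 0x18 uses history register 3; after outcomes taken, not-taken its history is binary 10, and its second-level index is (3 << 2) | 2 = 14.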

Benchmark
SPEC CPU 2000 Integer/Floating Point (http://www.spec.org)
For homework: Alpha binaries and input data files.
Directory organization:
CINT2000/
    164.gzip/
    …
CFP2000/
    179.art/
        data/
            ref/
                input/
                output/
            test/
            train/
        src/
    …

SimPoint
Goal: to find simulation points that accurately represent the complete execution of a program, based on phase analysis.
Single simulation points (the standard for homework): if the simulation point is 90, start simulating at instruction 90 × 100 million (9 billion) and stop simulating at instruction 9.1 billion.
Multiple simulation points
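The start/stop arithmetic for a single simulation point, as a small sketch assuming the slide's 100-million-instruction interval. Note the need for 64-bit counts: 9.1 billion does not fit in a 32-bit integer.

```c
#include <stdint.h>

#define INTERVAL 100000000ULL   /* 100 million instructions per interval */

/* First instruction to simulate in detail for simulation point `point`. */
uint64_t simpoint_start(uint64_t point) { return point * INTERVAL; }

/* Instruction at which detailed simulation stops (one interval later). */
uint64_t simpoint_stop(uint64_t point) { return (point + 1) * INTERVAL; }
```

In practice one fast-forwards functionally up to simpoint_start and then runs the detailed simulator for one interval; sim-outorder's `-fastfwd` and `-max:inst` options are commonly used for this, but check `sim-outorder -h` for exactly how each counts instructions.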

References
SimpleScalar Tutorial / Hack Guide
WWW Computer Architecture page: http://www.cs.wisc.edu/arch/www
Read the tutorial; then run, test, and debug.