1 Introduction to SimpleScalar (Based on SimpleScalar Tutorial) CPSC 614 Texas A&M University.



2 Overview
What is an architectural simulator?
  – A tool that reproduces the behavior of a computing device
Why use a simulator?
  – Leverages a faster, more flexible software development cycle
  – Permits more design space exploration
  – Facilitates validation before H/W becomes available
  – Level of abstraction is tailored to the design task
  – Makes it possible to increase/improve system instrumentation
  – Usually less expensive than building a real system

3 A Taxonomy of Simulation Tools
[Figure: taxonomy of simulation tools. Shaded tools are included in the SimpleScalar Tool Set.]

4 Functional vs. Performance
Functional simulators implement the architecture.
  – Perform real execution
  – Implement what programmers see
Performance simulators implement the microarchitecture.
  – Model system resources/internals
  – Are concerned with timing
  – Do not implement what programmers see

5 Trace- vs. Execution-Driven
Trace-Driven
  – Simulator reads a ‘trace’ of the instructions captured during a previous execution
  – Easy to implement, no functional components necessary
Execution-Driven
  – Simulator runs the program (trace-on-the-fly)
  – Hard to implement
  – Advantages
      Faster than tracing
      No need to store traces
      Register and memory values are available (they usually are not in a trace)
      Supports mis-speculation cost modeling
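The distinction can be sketched in a few lines of Python. This is a toy illustration only: the three-instruction ISA (`li`, `addi`, `bnez`), the function names, and the trace format are all hypothetical and unrelated to SimpleScalar's real implementation. The execution-driven function actually runs the program, so register values are known at every step; the trace-driven function only replays a recorded list of program counters.

```python
# Toy illustration only: the ISA, function names, and trace format are
# hypothetical and unrelated to SimpleScalar's real implementation.

PROGRAM = [
    ("li", "r1", 3),      # 0: r1 = 3
    ("addi", "r1", -1),   # 1: r1 = r1 - 1
    ("bnez", "r1", 1),    # 2: if r1 != 0 goto 1
]


def execution_driven(program):
    """Run the program; yield (pc, register snapshot) as it executes."""
    regs = {}
    pc = 0
    while pc < len(program):
        op, reg, arg = program[pc]
        next_pc = pc + 1
        if op == "li":
            regs[reg] = arg
        elif op == "addi":
            regs[reg] += arg
        elif op == "bnez" and regs[reg] != 0:
            next_pc = arg
        yield pc, dict(regs)
        pc = next_pc


def trace_driven(trace, program):
    """Replay a stored trace of PCs; nothing executes, so no register values exist."""
    counts = {}
    for pc in trace:
        op = program[pc][0]
        counts[op] = counts.get(op, 0) + 1
    return counts


# Capture a trace from one execution, then replay it later.
stored_trace = [pc for pc, _ in execution_driven(PROGRAM)]
op_counts = trace_driven(stored_trace, PROGRAM)
```

The replay can count instruction classes, but it cannot answer questions that depend on data values or on instructions that were never on the recorded path, which is why mis-speculation modeling favors the execution-driven style.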

6 SimpleScalar Tool Set
Computer architecture research test bed
  – Compilers, assembler, linker, libraries, and simulators
  – Targeted to the virtual SimpleScalar architecture
  – Hosted on almost any Unix-like machine

7 Advantages of SimpleScalar
Highly flexible
  – Functional simulator + performance simulator
Portable
  – Host: virtual target runs on most Unix-like systems
  – Target: simulators can support multiple ISAs
Extensible
  – Source is included for compiler, libraries, simulators
  – Easy to write simulators
Performance
  – Runs codes approaching ‘real’ sizes

8 Simulator Suite
Simulators are arranged along an axis from Performance (Sim-Fast) to Detail (Sim-Outorder).
Sim-Fast
  – 300 lines
  – Functional
  – 4+ MIPS
Sim-Safe
  – 350 lines
  – Functional w/checks
Sim-Profile
  – 900 lines
  – Functional
  – Lot of stats
Sim-Cache, Sim-BPred
  – < 1000 lines
  – Functional
  – Cache stats
  – Branch stats
Sim-Outorder
  – … lines
  – Performance
  – OoO issue
  – Branch pred.
  – Mis-spec.
  – ALUs
  – Cache
  – TLB
  – … KIPS

9 Sim-Fast
Functional simulation
Optimized for speed
Assumes no cache
Assumes no instruction checking
Does not support DLite!
Does not allow command line arguments
<300 lines of code

10 Sim-Cache
Cache simulation
Ideal for fast simulation of caches (when the effect of cache performance on execution time is not needed)
Accepts command line arguments for:
  – Level 1 & 2 instruction and data caches
  – TLB configuration (data and instruction)
  – Flush and compress
  – and more
Ideal for performing high-level cache studies that don’t take the access time of the caches into account
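A minimal sketch of what "cache simulation without timing" means, assuming a direct-mapped cache. The class name, parameters, and address stream are hypothetical and this is not sim-cache's actual code; it only counts hits and misses and models no access latency.

```python
# Hypothetical sketch: a direct-mapped cache that only counts hits and misses.
# It models no access latency, matching the "no timing" style of study
# described above; it is not sim-cache's actual code.

class DirectMappedCache:
    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets   # one block (tag) per set
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag          # fill the block on a miss
        self.misses += 1
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


cache = DirectMappedCache(num_sets=4, block_size=16)
for addr in [0, 4, 16, 64, 0]:
    cache.access(addr)
```

In this stream, addresses 0 and 64 map to the same set, so the final access to address 0 misses again: a conflict miss that the counters capture even though no cycle counts are modeled.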

11 Sim-Bpred
Simulates different branch prediction mechanisms
Generates prediction hit and miss rate reports
Does not simulate the effect of branch prediction on total execution time
Predictor types:
  nottaken
  taken
  perfect
  bimod     bimodal predictor
  2lev      2-level adaptive predictor
  comb      combined predictor (bimodal and 2-level)
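As an illustration of the bimodal scheme, here is a minimal sketch using a table of 2-bit saturating counters indexed by branch address. The table size, the indexing, the initial counter value, and the branch address are assumptions made for the example, not sim-bpred's actual parameters.

```python
# Hypothetical sketch of a bimodal predictor: a table of 2-bit saturating
# counters indexed by the branch address. Table size, indexing, and initial
# counter value are assumptions, not sim-bpred's actual parameters.

class BimodalPredictor:
    def __init__(self, table_size=2048):
        self.table_size = table_size
        self.counters = [1] * table_size   # 0-1 predict not taken, 2-3 predict taken
        self.hits = 0
        self.misses = 0

    def predict(self, pc):
        return self.counters[pc % self.table_size] >= 2

    def update(self, pc, taken):
        """Score the prediction, then train the counter with the real outcome."""
        if self.predict(pc) == taken:
            self.hits += 1
        else:
            self.misses += 1
        i = pc % self.table_size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BimodalPredictor()
# A loop branch at a made-up address: taken nine times, then falls through.
for outcome in [True] * 9 + [False]:
    bp.update(0x400100, outcome)
```

The predictor mispredicts the first iteration and the loop exit, and the saturating counter keeps predicting taken afterward, so re-entering the loop does not cost another warm-up miss.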

12 Sim-Profile
Program profiler
Generates detailed profiles, by symbol and by address
Keeps track of and reports
  – Dynamic instruction counts
  – Instruction class counts
  – Branch class counts
  – Usage of address modes
  – Profiles of the text & data segment

13 Sim-Outorder
Most complicated and detailed simulator
Supports out-of-order issue and execution
Provides reports
  – Branch prediction
  – Cache
  – External memory
  – Various configuration

14 Sim-Outorder HW Architecture
[Figure: pipeline of Fetch, Dispatch, Scheduler (Register Scheduler and Memory Scheduler), Exe, Mem, Writeback, Commit; memory system of I-Cache, I-TLB, D-Cache, D-TLB, Virtual Memory.]

15 Sim-Outorder (Main Loop)
sim_main() in sim-outorder.c:

    ruu_init();
    for (;;) {
        ruu_commit();
        ruu_writeback();
        lsq_refresh();
        ruu_issue();
        ruu_dispatch();
        ruu_fetch();
    }

Executed once for each simulated machine cycle
Walks pipeline from Commit to Fetch
  – Reverse traversal handles inter-stage latch synchronization in only one pass
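The reason a reverse walk needs only one pass can be sketched as follows. This is a simplified model under stated assumptions: the stage names and helper functions are hypothetical, and each list slot stands in for the latch holding that stage's current instruction. Visiting stages last to first moves each instruction exactly one stage per cycle; visiting them first to last in place lets an instruction race through every stage in a single cycle.

```python
# Hypothetical sketch: why walking the pipeline from Commit back to Fetch
# needs only one pass per cycle. Stage names are illustrative; each list
# slot stands in for the latch holding that stage's current instruction.

STAGES = ["fetch", "dispatch", "issue", "writeback", "commit"]


def cycle_reverse(latches, incoming):
    """Advance one cycle, visiting stages last to first."""
    retired = latches[-1]
    for i in range(len(latches) - 1, 0, -1):
        latches[i] = latches[i - 1]      # downstream slot was already vacated
    latches[0] = incoming
    return retired


def cycle_forward_in_place(latches, incoming):
    """Visiting first to last overwrites downstream latches before they move."""
    retired = latches[-1]
    for i in range(len(latches) - 1):
        latches[i + 1] = latches[i]      # same instruction copied down the pipe
    latches[0] = incoming
    return retired


def run(cycle_fn, program, cycles):
    """Feed the program in, one instruction per cycle; record (cycle, retired)."""
    latches = [None] * len(STAGES)
    feed = iter(program)
    retired = []
    for cycle in range(1, cycles + 1):
        out = cycle_fn(latches, next(feed, None))
        if out is not None:
            retired.append((cycle, out))
    return retired


in_order = run(cycle_reverse, ["i0", "i1", "i2"], cycles=8)
raced = run(cycle_forward_in_place, ["i0", "i1", "i2"], cycles=8)
```

With the reverse walk, `i0` spends one cycle in each of the five stages before it retires. The forward in-place walk retires it far too early, and a correct forward walk would need a second set of latches or a second pass.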

16 RUU/LSQ in Sim-Outorder
RUU (Register Update Unit)
  – Handles register synchronization/communication
  – Serves as reorder buffer and reservation stations
  – Performs out-of-order issue when register and memory dependences are satisfied
LSQ (Load/Store Queue)
  – Handles memory synchronization/communication
  – Contains all loads and stores in program order
Relationship between RUU and LSQ
  – Memory dependences are resolved by the LSQ
  – Load/store effective address is calculated in the RUU

17 Specifying Sim-outorder
Options:
  -fetch:ifqsize    instruction fetch queue size (in insts)
  -fetch:mplat      extra branch mis-prediction latency (cycles)
  -bpred
  -bpred:bimod
  -bpred:2lev
  …
  -config
  -dumpconfig
  …
For Assignment #1, change at least l1size.
$ sim-outorder -config

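18 Benchmark
SPEC CPU 2000
  – Integer/Floating Point
  – For homework: Alpha binaries, input data files
Directory organization
[Figure: directory tree. Top level: CFP2000 and CINT. Benchmark directories include art … and 164.gzip …. Each benchmark has data (ref, test, train, each with input and output) and src.]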

19 SimPoint
Goal
  – To find simulation points that accurately represent the complete execution of the program, based on phase analysis
Single Simulation Points (standard for homework)
  – If the Simulation Point is 90, then you start simulating at instruction 90 * 100 million (9 billion) and stop simulating at instruction 9.1 billion.
Multiple Simulation Points
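The arithmetic for a single simulation point can be written out as a short sketch. The interval size of 100 million instructions comes from the example above; the function name is hypothetical.

```python
# Interval size of 100 million instructions is taken from the example above;
# the function name is hypothetical.

INTERVAL = 100_000_000


def simulation_window(sim_point, interval=INTERVAL):
    """Return (first instruction to simulate, instruction at which to stop)."""
    start = sim_point * interval
    return start, start + interval


start, stop = simulation_window(90)
```

For simulation point 90 this gives a start of 9 billion and a stop of 9.1 billion instructions. In sim-outorder this would typically map to skipping `start` instructions with `-fastfwd` and then simulating one interval with `-max:inst`, though the homework setup may specify this differently.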

20 References
SimpleScalar Tutorial/Hack Guide
  – Read tutorial/Run, test, and debug
WWW Computer Architecture