Precise and Accurate Processor Simulation Harold Cain, Kevin Lepak, Brandon Schwartz, and Mikko H. Lipasti University of Wisconsin—Madison


February 8, 2002 Precise and Accurate Processor Simulation--Mikko Lipasti
Architecture Research?
- Genius is one percent inspiration and ninety-nine percent perspiration. --Thomas Edison
- This is not a talk about inspiration
  - No new ideas or gimmicks
- This is a talk about perspiration
  - Mostly graduate student perspiration
  - Infrastructure, tools, methodology
[CAECW Talk/Paper, February 2002]

Performance Modeling
- Analytical models
- Queuing models
- Simulation
  - Trace-driven
  - Execution-driven
  - Full system
- Why?
  - Most widely used in academic research
  - Perceived accuracy and precision

Performance Modeling
[Diagram: Inputs -> Performance Model -> Performance results]
- Inputs
  - Program characteristics: instruction mix, miss rates
  - Execution trace: instruction, address, both
  - Program binary + input sets
  - Operating system, program(s), etc.
- Performance results
  - Execution characteristics
  - Bottlenecks
  - Etc.
- Accuracy? Precision? Garbage in, garbage out

Talk Outline
- Introduction & Motivation
- Performance Modeling
- Precision, Accuracy, Flexibility
- PharmSim Overview
- Causes of Inaccuracy
- O/S Effects
- Coherent I/O (DMA)
- Wrong-path Effects
- Summary & Conclusions

Precision, Accuracy, Flexibility
- Precision
  - How closely simulator matches design
  - Latency, bandwidth, resource occupancy, etc.
- Accuracy
  - How closely simulation matches reality
  - Requires precision
  - Also requires replication of real-world conditions, inputs
- Flexibility?
  - Enables exploration of broad design space

Uses for Simulation
[Diagram; only its labels survive]
- Design phases: High-level Design, Microarchitectural Definition, Design and Implementation, Verification
- Uses: Design Space Exploration, Quantitative Tradeoff Analysis, Functional validation, Late design changes, Performance Validation
- Other labels: Precision, Flexibility, Academic Research, Accuracy???

Causes of Inaccuracy
- Many possible causes
  - Software differences
  - Hardware differences
  - System effects
  - Time dilation: interaction with physical world
- Here, we consider:
  - Operating system code
  - DMA traffic
  - Wrong-path effects

Validating Accuracy
- How do we validate?
  - Against real hardware with perf. counters
    - Different “input” since O/S now present
    - Also, post-mortem: too late
  - Against HDL
    - Same input as model, same error?
- Without full system simulation, cannot:
  - Replicate runtime environment
  - Really validate accuracy
  - Compensating errors mask inaccuracy
- Hence: build simulator that does not cheat
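
The point about compensating errors can be made concrete with a first-order CPI model. The sketch below is only an illustration: the `cpi` helper and every number in it are invented, not taken from the talk. It shows a simulator that undercounts misses (for example by omitting O/S references) while overestimating the miss penalty, so the end-to-end CPI matches the reference even though the miss count is off by half.

```python
def cpi(base_cpi, misses_per_kilo_instr, miss_penalty):
    """First-order CPI model: base CPI plus memory stall cycles per instruction."""
    return base_cpi + misses_per_kilo_instr * miss_penalty / 1000


# All numbers below are invented for illustration.
REF_MPKI, REF_PENALTY = 20, 100   # hypothetical "real" machine
SIM_MPKI, SIM_PENALTY = 10, 200   # hypothetical simulator with two errors

reference = cpi(1.0, REF_MPKI, REF_PENALTY)
simulated = cpi(1.0, SIM_MPKI, SIM_PENALTY)

# The end-to-end metric looks perfect...
total_error = abs(simulated - reference) / reference
# ...while an individual component is badly wrong.
miss_count_error = abs(SIM_MPKI - REF_MPKI) / REF_MPKI

print(f"CPI error: {total_error:.0%}, miss-count error: {miss_count_error:.0%}")
```

Validating only the final CPI would accept this model; checking the components separately exposes the two errors.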

PharmSim Overview
- Device simulation, etc. from SimOS-PPC
- PharmSim replaces functional simulators
  - Full OOO core model, values in rename registers
  - Based on SimpleMP [Rajwar]
  - Adds priv. mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
[Block diagram labels: Block, Simple, Ethernet; SimOS-PPC (AIX disk driver, E’net driver); PharmSim (OOO core, Gigaplane)]

PharmSim Pipeline
[Pipeline diagram; stage labels: Fetch, Translate, Decode, Execute, Mem, Commit]
- Substantially similar to IBM Power4
  - Some instructions “cracked” (1:2 expansion)
  - Others (e.g. lmw) microcode stream
- Mem Stage
  - Interface to 2-level cache model
  - Sun Gigaplane XB snoopy MP coherence
  - Caches contain values, must remain coherent
- No cheating!
  - No “flat” memory model for reference/redirect

Talk Outline
- Introduction & Motivation
- Performance Modeling
- Precision, Accuracy, Flexibility
- PharmSim Overview
- Causes of Inaccuracy
- O/S Effects
- Coherent I/O (DMA)
- Wrong-path Effects
- Summary & Conclusions

Operating System Effects
- Fairly well-understood for commercial:
  - Must account for O/S references
- For SPEC?
  - Widely accepted: Safe to ignore O/S paths
- Most popular tool (SimpleScalar)
  - Intercepts system calls
  - Emulates on host, updates “flat” memory
  - Returns “magically” with caches intact
- Is this really OK?
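
A minimal sketch of why returning "with caches intact" matters, assuming a toy direct-mapped cache and a made-up reference trace (the class, the `run` helper, and the addresses are all hypothetical, not SimpleScalar or PharmSim code). Dropping the O/S references mimics a system call emulated on the host; replaying them lets kernel lines evict user lines.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (hypothetical sizes)."""

    def __init__(self, num_sets=4, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags = [None] * num_sets
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        if self.tags[index] != tag:
            self.misses += 1
            self.tags[index] = tag


def run(trace, model_os):
    """Replay ("user" | "os", address) pairs and return the miss count.

    With model_os=False, O/S references are dropped, which mimics a system
    call emulated on the host that returns with the caches untouched.
    """
    cache = DirectMappedCache()
    for mode, addr in trace:
        if mode == "os" and not model_os:
            continue
        cache.access(addr)
    return cache.misses


# Made-up trace: a user loop over four lines, a system call whose kernel
# references conflict with those lines, then the same user loop again.
user_loop = [("user", a) for a in (0, 16, 32, 48)]
syscall = [("os", a) for a in (1024, 1040, 1056, 1072)]
trace = user_loop + syscall + user_loop

proxy_misses = run(trace, model_os=False)
full_misses = run(trace, model_os=True)
print(proxy_misses, full_misses)
```

In this contrived trace the proxied run sees only the cold misses of the first loop, while the run that models the O/S also pays for the kernel references and for the user lines they evicted.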

Operating System Effects
References Modeled                         Example
User-mode only                             ATOM
User + Shared library                      SimpleScalar with static link
User + Sh Lib + O/S                        H/W bus trace
User + Sh Lib + O/S + cache control ops    PharmSim

Operating System Effects
[Chart not reproduced; annotation: 5.8x]
- Dramatic error (5.8x in mcf, 2-3x commonplace)
- Note compensating errors (e.g. crafty, gzip, perl)
- IPC error > 100% (more detail at ISCA)

Talk Outline
- Introduction & Motivation
- Performance Modeling
- Precision, Accuracy, Flexibility
- PharmSim Overview
- Causes of Inaccuracy
- O/S Effects
- Coherent I/O (DMA)
- Wrong-path Effects
- Summary & Conclusions

Coherent I/O with DMA
[Diagram labels: Proc, Cache, Proc, Cache, Memory, PCI Bridge, Graphics, SCSI]

DMA Traffic
- How do we support DMA?
  - No “flat” memory image in simulator
  - Lines may be in caches
    - Invalidate (if DMA write)
    - Flush (if DMA read)
  - Must use existing coherence protocol
  - Everything has to work correctly
    - No subtle coherence bugs
- How much does this matter?
  - Affects cache miss rates
  - Introduces bus contention
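
The invalidate-on-write and flush-on-read rules can be sketched with a toy write-back cache that holds values. This is an assumption-laden illustration, not PharmSim's coherence protocol: the `Cache` class, the `dma_read`/`dma_write` helpers, and the addresses are hypothetical, and a DMA write is assumed to overwrite a whole line.

```python
class Cache:
    """Toy write-back cache holding values, keyed by line address."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # addr -> (value, dirty)

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def store(self, addr, value):
        self.lines[addr] = (value, True)

    def snoop_flush(self, addr):
        """DMA read: write back dirty data so the device sees the latest value."""
        if addr in self.lines:
            value, dirty = self.lines[addr]
            if dirty:
                self.memory[addr] = value
                self.lines[addr] = (value, False)

    def snoop_invalidate(self, addr):
        """DMA write: drop any cached copy so later loads refetch from memory."""
        self.lines.pop(addr, None)


def dma_read(memory, caches, addr):
    """Device reads memory (e.g. disk write): flush cached copies first."""
    for cache in caches:
        cache.snoop_flush(addr)
    return memory[addr]


def dma_write(memory, caches, addr, value):
    """Device writes memory (e.g. disk read): invalidate cached copies first."""
    for cache in caches:
        cache.snoop_invalidate(addr)
    memory[addr] = value


memory = {0x100: 1, 0x200: 2}
cache = Cache(memory)

cache.store(0x100, 42)                        # dirty line, memory still holds 1
device_saw = dma_read(memory, [cache], 0x100)

cache.load(0x200)                             # clean copy of the old value
dma_write(memory, [cache], 0x200, 7)
cpu_saw = cache.load(0x200)
print(device_saw, cpu_saw)
```

Without the two snoops the device would read the stale 1 and the processor would keep reading the stale 2, which is the kind of subtle coherence bug a value-holding cache model cannot hide.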

DMA Traffic
- PharmSim incorporates accurate DMA engine:
  - Issues bus invalidates, snoops
  - Concurrent data transfer: No “magic” flat memory
- Bottom line:
  - Unimportant for SPECINT
  - Unimportant for SPECWEB, SPECJBB
    - Others in progress
  - Contrived multiprogrammed workload
    - 4.8% of all coherence traffic due to I/O, 1% IPC effect
  - Results understated due to “overbuilt” MP bus
  - MP workloads likely much more sensitive
    - Additional evaluation in progress

Talk Outline
- Introduction & Motivation
- Performance Modeling
- Precision, Accuracy, Flexibility
- PharmSim Overview
- Causes of Inaccuracy
- O/S Effects
- Coherent I/O (DMA)
- Wrong-path Effects
- Summary & Conclusions

Wrong-Path Execution
- Branch predictor predicts control flow
- Branch execute redirects mispredictions
- Extra instructions on wrong path

Wrong-path Execution
- Multiple effects on unarchitected state
  - Pollute/prefetch I-cache, D-cache, TLB
  - Pollute/train branch predictor (BHR, PHT, RAS)
- PharmSim (current status):
  - BHR is updated and repaired
  - PHT is not updated speculatively
  - RAS is updated, no repair
  - No speculative TLB fill
- How can we filter wrong-path instructions?
  - No “cheating”: don’t know branch outcomes
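
The difference between "updated and repaired" (BHR) and "updated, no repair" (RAS) can be sketched as follows. The `FrontEndState` class, the 8-bit history width, and the addresses are assumptions made for illustration; they are not PharmSim structures.

```python
class FrontEndState:
    """Toy speculative front-end state: an 8-bit BHR and an unbounded RAS."""

    def __init__(self):
        self.bhr = 0   # branch history register, newest outcome in bit 0
        self.ras = []  # return address stack

    def speculate_branch(self, predicted_taken):
        """Shift the prediction into the BHR; return a checkpoint for repair."""
        checkpoint = self.bhr
        self.bhr = ((self.bhr << 1) | int(predicted_taken)) & 0xFF
        return checkpoint

    def repair_bhr(self, checkpoint, actual_taken):
        """On a misprediction, rebuild the BHR from the checkpoint."""
        self.bhr = ((checkpoint << 1) | int(actual_taken)) & 0xFF

    def call(self, return_addr):
        self.ras.append(return_addr)

    def predict_return(self):
        return self.ras.pop() if self.ras else None


fe = FrontEndState()
fe.call(0x1000)                          # right path: call, return address 0x1000
ckpt = fe.speculate_branch(True)         # predicted taken, actually not taken

# Wrong path: a return followed by a call, both updating the RAS.
fe.predict_return()
fe.call(0x2000)

fe.repair_bhr(ckpt, actual_taken=False)  # BHR is repaired; the RAS is not
predicted = fe.predict_return()          # right-path return; real target is 0x1000
ras_mispredicted = predicted != 0x1000
print(hex(predicted), ras_mispredicted)
```

After the misprediction the BHR again reflects only right-path history, but the RAS still holds the wrong-path call's address, so the next right-path return is mispredicted.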

Eliminating Wrong-Path
[Diagram: Runahead PharmSim -> Branch Outcome Trace -> Right-path PharmSim]
- On correctly predicted branch
  - continue fetching (BAU)
- On mispredicted branch
  - stall instruction fetch
  - restart once branch resolves
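
A rough sketch of the right-path fetch policy, assuming one instruction fetched per cycle, a fixed branch resolve latency, and an outcome trace recorded by an earlier run. The `simulate_fetch` function and its parameters are hypothetical. It compares a front end that keeps fetching down the wrong path with one that stalls fetch until the branch resolves.

```python
def simulate_fetch(right_path, outcome_trace, predict, resolve_latency,
                   filter_wrong_path):
    """Count fetch cycles and wrong-path instructions fetched.

    right_path:    list of (pc, is_branch) in committed order
    outcome_trace: actual taken/not-taken outcomes from a prior (runahead) run
    predict:       function pc -> predicted taken (bool)
    """
    outcomes = iter(outcome_trace)
    cycles = 0
    wrong_path_fetched = 0
    for pc, is_branch in right_path:
        cycles += 1                      # assumed fetch width of one
        if not is_branch:
            continue
        actual = next(outcomes)
        if predict(pc) == actual:
            continue                     # correct prediction: business as usual
        cycles += resolve_latency        # fetch resumes once the branch resolves
        if not filter_wrong_path:
            # An unfiltered front end fills these cycles with wrong-path work.
            wrong_path_fetched += resolve_latency
    return cycles, wrong_path_fetched


# Made-up stream with two branches; the predictor always says "taken".
stream = [(0, False), (4, True), (8, False), (12, True), (16, False)]
trace = [True, False]


def always_taken(pc):
    return True


baseline = simulate_fetch(stream, trace, always_taken, 3, filter_wrong_path=False)
filtered = simulate_fetch(stream, trace, always_taken, 3, filter_wrong_path=True)
print(baseline, filtered)
```

Both runs take the same number of fetch cycles in this simplified model; the filtered run simply never injects wrong-path instructions, which isolates their side effects on caches and predictors.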

Wrong-path Instructions
[Chart not reproduced]
- Aggressive core model; 25%-40% wrong-path

Wrong-path Memory Stalls
[Chart not reproduced]
- Minor effect: better or worse

Wrong-path RAS Accuracy
[Chart not reproduced]
- Prediction accuracy degrades up to 29%
- Could add fixup logic

Wrong-path IPC
[Chart not reproduced]
- Negligible effect (0.9%)
- RAS mispredictions overlapped

Summary
- PharmSim
  - Simulator that does not cheat
  - Can be used to validate assumptions, simplifications, abstractions
- Evaluated three effects on accuracy
  - O/S: dramatic error, even for SPECINT
  - DMA: not important for uniprocessors
    - MP, bus-constrained results TBD
  - Wrong path: unimportant

Conclusions
- Ignoring O/S effects fraught with danger
  - Should always model O/S effects
- Trace-driven vs. execution-driven
  - Traces with O/S much better
  - Invest in trace quality vs. complexity of execution-driven simulation
- Precision without accuracy?
  - Of questionable value
- Validation difficult due to compensating errors
  - Hard to know if model is precise or accurate