Simulation: a user point of view
André Seznec, Caps Team, IRISA/INRIA, Sept. 1998

Myself (1)
- Senior researcher
- Working on computer architecture for 15 years
- Works on:
  - memory systems
  - pipeline structure
  - cache structures
  - branch prediction mechanisms
  - Simultaneous Multithreading

Myself (2)
- Interested in computer architecture
- For me, tools are the dark side of architecture!

Validating microarchitecture concepts
- Just a description with some explanation:
  - beginning of the 80's
  - not so bad
- Analytical model?
  - May work for coarse-grain evaluation on networks, on multiprocessors, ...
  - For microarchitecture, just be serious!
- Simulation
  - I have not found a better method!
  - but I do not overtrust simulation results

Simulation: my own experience
- DSPA and multi-DSPA
- OPAC floating-point coprocessor
- cache simulation
- processor simulation
- branch prediction
- Simultaneous Multithreading

DSPA and multi-DSPA
- Decoupled Access Execute Architecture
- Shared memory architecture
- Original memory and interconnection network
- Hardware FIFOs everywhere!

DSPA simulation
- Primitive!
- Benchmarks:
  - Just a few numerical kernels
  - hand-coded assembly
- Cycle-accurate simulation!
- Validation of the memory system

OPAC floating-point coprocessor
- Floating-point coprocessor
- Dedicated to compute-bound kernels
  - matrix operations: BLAS3 library
  - FFTs, convolutions, ...
- Built real hardware!
  - 300-IC board
  - a special-purpose VLSI sequencer

OPAC (2)
- Developed in parallel:
  - HDL simulator on CAD tool
  - C simulator
  - pseudo-language + pseudo-compiler
  - applications
- Total interactions:
  - completely accurate simulator
  - design decisions based on:
    - hardware constraints
    - performance evaluation
    - code generation

OPAC (3)
- It was real fun!
- We learned a lot of things!
- Research impact: ??
- Killer micros appeared during this period

Cache simulation
- We began in 1991
- Among the first groups in Europe
- We had to learn everything:
  - how to get the traces
  - which benchmarks
  - how to simulate
  - what is important

Getting the first traces:
- ATUM traces:
  - hardware-monitored VAX traces
  - available by ftp
  - very short!
- DLX traces:
  - available by ftp
- SparcSim simulator
  - very, very slow

Choosing the first benchmarks:
- Picking our own applications!
- Not a worse choice than SPEC92 or SPEC95
- But reviewers like standards!

The first simulators:
- Tried the Dinero simulator from Mark Hill
- But, skewed-associative caches?

First results:
- Skewed-associative caches (1992)
  - simulation helps to convince the reviewers
  - good presentation and good figures are far more important
  - Five years later, simulation results just become noise
- Semi-unified caches (1993)
  - simulation needed
  - quantify the benefit
  - Unfortunately, we just picked the wrong conference

Further cache studies
- Using more accepted traces: SPEC92-95, SPLASH, etc.
- Trace collection through spa (Gordon Irlam)
- Quantifying the impact through simulations
- Explaining and analyzing is more important

Other simulations
- Cache simulation is a piece of cake
- « Real architects » simulate complete processors
  - scalar processors
  - in-order superscalar processors
  - out-of-order superscalar processors
  - Simultaneous Multithreading processors
- Next needed step: complete wrong-path simulations

Two-block-ahead branch prediction
- ASPLOS 96
- Great idea!
- Poor simulation methodology!
- Performance impact ignored!

Two-block-ahead branch predictor (continued)
- Now:
  - complete processor simulation
  - IBS traces (hardware monitored)
  - pros and cons understood:
    - high instruction bandwidth
    - misprediction penalty problem
- Limitations:
  - No wrong-path execution
  - simplified execution core
  - IBS traces (1993, 16 MHz processor, 16 MB memory)

Simultaneous Multithreading
- Several processes sharing functional units in a superscalar processor
- First paper (Tullsen et al.): 1995
- We began in parallel: 1993 (one year too late)

Simultaneous Multithreading (2)
- We got results on:
  - branch prediction
  - cache behavior
  - in-order versus out-of-order execution
- Methodology:
  - Trace-driven simulation
  - mixing traces from different processes
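The trace-mixing methodology above can be sketched as follows. This is only an illustration: it assumes each per-process trace is a plain list of addresses and that threads are interleaved round-robin. The name `mix_traces` and the interleaving policy are assumptions made for the example, not the simulator used in these studies.

```python
from itertools import zip_longest

_END = object()  # sentinel marking an exhausted trace


def mix_traces(traces):
    """Interleave per-process traces round-robin.

    Yields (thread_id, record) pairs; traces that end early simply drop out.
    Round-robin is an assumed policy, chosen for illustration only.
    """
    for step in zip_longest(*traces, fillvalue=_END):
        for thread_id, record in enumerate(step):
            if record is not _END:
                yield thread_id, record


if __name__ == "__main__":
    trace_a = [0x1000, 0x1004, 0x1008]  # hypothetical addresses, process A
    trace_b = [0x8000, 0x8004]          # hypothetical addresses, process B
    for tid, addr in mix_traces([trace_a, trace_b]):
        print(tid, hex(addr))
```

The thread identifier travels with every record so that a shared cache or branch predictor model downstream can tell the processes apart.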

Simultaneous Multithreading (3)
- Known limitations:
  - Operating system
  - Context switches
  - spectrum of applications
  - wrong-path execution
- However:
  - Solid results

Skewed Branch Predictors
- The most complete analysis we have ever done:
  - Explanations
  - Simulations
  - Mathematical analysis
- Likely to become a reference paper

What I have learned from my experience
- Simulations help to:
  - convince:
    - yourself (most important)
    - most of the reviewers (that's life)
  - in-depth analysis:
    - discover why it works
    - explain to « real » architects

What I have learned from my experience (2)
- Don't overtrust simulation results!
- Be aware: it is just one measurement point

Traces are ALWAYS limited:
- toy applications and/or
- lack of operating system activity and/or
- old applications:
  - IBS traces: 16 MHz scalar processor, 16 MB memory
  - Future processors: > 1 GHz, 10-way superscalar, > 1 GB memory

Simulator limitations
- « Complete » simulation:
  - Complex (tens of thousands of lines of code)
  - CPU-time consuming
- Always some simplifying assumptions:
  - Sometimes valid,
  - Sometimes not...
- Simulators are slow!

Simulator user limitations
- 99% of the simulation results go straight into the garbage:
  - Just a bug in the simulator!
  - « This idea was ridiculous! » (two months of work)
  - Just lacking the interesting measurement!
  - « Finally, I do not have room for this graph! »

Trusting or distrusting YOUR OWN simulation results
- Trust the tendencies:
  - « This mechanism behaves better than this other one »
- Always distrust absolute numbers:
  - « A 32-Kbyte cache is sufficient as it exhibits a 1% miss rate »
  - on your benchmark, with your tracing tool, without kernel, without context switches, ...
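A small sketch of why an absolute miss rate only holds "on your benchmark". It assumes a direct-mapped 32-Kbyte cache with 32-byte lines and two synthetic sweep traces; the cache organization, the line size, and the traces are all assumptions made for illustration, not measurements from the talk.

```python
def miss_rate(trace, cache_bytes=32 * 1024, line_bytes=32):
    """Miss rate of a direct-mapped cache on a list of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines       # block number held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_lines
        if resident[index] != block:    # cold or conflict miss
            resident[index] = block
            misses += 1
    return misses / len(trace)


def sweep(working_set_bytes, passes=4, stride=4):
    """Synthetic trace: repeated sequential sweeps over one array."""
    return [a for _ in range(passes)
            for a in range(0, working_set_bytes, stride)]


small = sweep(16 * 1024)   # working set fits in the cache
large = sweep(64 * 1024)   # working set is twice the cache size

if __name__ == "__main__":
    print(miss_rate(small), miss_rate(large))
```

The same cache shows a very different miss rate on the two traces, so a single figure says little without the workload that produced it.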

As a reviewer or PC member
- I just do not trust simulation results that I do not understand
- What is needed:
  - Clear explanation
  - Insight into why things work
  - Insight into possible limitations
- Simulation-free papers are sometimes refreshing:
  - « Difference-bit cache », Toni Juan et al., 1996

Needs for microarchitecture simulation today
- The performance of some existing processors was misestimated by 20% by the manufacturer
- Wrong-path execution
- Kernel activities

Execution-driven or trace-driven simulation?
- Trace-driven simulation is sufficient for:
  - comparing two cache structures or two branch predictors
  - in-order processor simulation
- Execution-driven simulation is required for:
  - precise out-of-order processor simulation
  - studying bandwidth impact
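As an illustration of a trace-driven comparison of two branch predictors, here is a minimal sketch. It assumes a trace of taken/not-taken outcomes for a single branch and compares a last-outcome (1-bit) predictor with a 2-bit saturating counter. The synthetic loop trace and both function names are assumptions made for the example.

```python
def mispredictions_1bit(outcomes):
    """Last-outcome predictor: predict whatever the branch did last time."""
    last, wrong = True, 0
    for taken in outcomes:
        wrong += (last != taken)
        last = taken
    return wrong


def mispredictions_2bit(outcomes):
    """2-bit saturating counter in 0..3; predict taken when counter >= 2."""
    counter, wrong = 3, 0
    for taken in outcomes:
        wrong += ((counter >= 2) != taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return wrong


# Synthetic trace: a loop branch taken 9 times, then not taken, 100 times over.
trace = ([True] * 9 + [False]) * 100

if __name__ == "__main__":
    print(mispredictions_1bit(trace), mispredictions_2bit(trace))
```

Both predictors see exactly the same recorded outcomes, which is what makes the comparison fair without executing the program again.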

Which workloads?
- We need real workloads:
  - all processes running on a workstation
  - user and kernel activities
  - on real data
- Ideally, let us capture the whole activity of a workstation for a second each hour
- Calvin
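The "one second each hour" idea can be sketched as a filter over a timestamped event stream. The event format (timestamp in seconds, record) and the function name are assumptions for illustration; a real capture tool would switch tracing on and off at the source rather than filter afterwards.

```python
def sample_one_second_per_hour(events, period_s=3600, window_s=1.0):
    """Keep the (timestamp_s, record) events that fall in the first
    window_s seconds of every period_s-second period."""
    return [(t, rec) for t, rec in events if (t % period_s) < window_s]


if __name__ == "__main__":
    # Hypothetical events: two inside a sampling window, two outside.
    events = [(0.5, "a"), (1.5, "b"), (3600.2, "c"), (7199.9, "d")]
    print(sample_one_second_per_hour(events))
```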

TOOLS, WHAT WE ARE DOING
- Salto: a System for Assembly Languages Transformation and Optimizations
- Calvin: Cloning Assembly Languages in View of Instrumentation Needs

Salto Overview
- assembly source-to-source preprocessor
- retargetable: exists for SPARC, Alpha, MIPS, Philips TM-1000, Pentium, TI C6x
- can be used
  - to instrument or transform assembly code,
  - to schedule assembly code,
  - for register allocation, basic block layout, etc.
- derive simulators
- fine-grain machine description
- object-oriented interface

Salto Organisation
[Diagram labels: Transformation tool, SALTO interface, C++, Machine Description, assembly language]

Calvin
- With current instrumentation tools, code runs slowly
- We only want to pay the penalty when collecting traces
- Code-cloning tracing allows it
- Calvin status:
  - built using Salto
  - works on single-user applications
- Overall project: instrument a Linux workstation
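The idea of paying the instrumentation penalty only while tracing can be illustrated with a high-level analogy. Calvin works on assembly code through Salto; the Python sketch below is not its implementation. It assumes that cloning means keeping a plain copy and an instrumented copy of the same code and switching between them with one flag; the class `ClonedFunction` and its fields are hypothetical.

```python
class ClonedFunction:
    """Holds a plain clone and an instrumented clone of the same code."""

    def __init__(self, func, trace_log):
        self._plain = func
        self._log = trace_log
        self.tracing = False  # off by default: the plain clone runs untouched

    def _instrumented(self, *args):
        # Instrumented clone: record the call, then do the same work.
        self._log.append((self._plain.__name__, args))
        return self._plain(*args)

    def __call__(self, *args):
        # Outside tracing periods the only added cost is this single test.
        clone = self._instrumented if self.tracing else self._plain
        return clone(*args)


def square(x):
    return x * x


log = []
traced_square = ClonedFunction(square, log)
traced_square(3)               # plain clone: nothing is logged
traced_square.tracing = True
traced_square(4)               # instrumented clone: one record is logged
```

Toggling `tracing` plays the role of entering and leaving a trace-collection period.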

My conclusion
- Architecture is fun
- Simulation is
  - boring (always)
  - necessary (sometimes)
  - misleading (often)
- As processors become more and more complex, « numbers » will become less and less accurate. Only trust tendencies.
- Tools are more and more needed