Simulation: a user point of view. André Seznec, Caps Team, IRISA/INRIA, Sept. 1998.


1 André Seznec Caps Team IRISA/INRIA 1 Simulation: a user point of view André Seznec IRISA/INRIA Sept. 1998

2 Myself (1)
- Senior researcher
- Working on computer architecture for 15 years
- Works on:
  - memory systems
  - pipeline structure
  - cache structures
  - branch prediction mechanisms
  - Simultaneous Multithreading

3 Myself (2)
- Interested in computer architecture
- For me, tools are the dark side of architecture!

4 Validating microarchitecture concepts
- Just a description with some explanation:
  - beginning of the '80s
  - not so bad
- Analytical models?
  - may work for coarse-grain evaluation of networks, multiprocessors, ...
  - for microarchitecture, let's be serious!
- Simulation:
  - I have not found a better method!
  - but I do not overtrust simulation results

5 Simulation: my own experience
- 1986-88: DSPA and multi-DSPA
- 1988-91: OPAC floating-point coprocessor
- 1991-96: cache simulation
- 1993-..: processor simulation
- 1996-..: branch prediction
- 1994-..: Simultaneous Multithreading

6 DSPA and multi-DSPA
- Decoupled access/execute architecture
- Shared-memory architecture
- Original memory and interconnection network
- Hardware FIFOs everywhere!

7 DSPA simulation
- Primitive!
- Benchmarks:
  - just a few numerical kernels
  - hand-coded assembly
- Cycle-accurate simulation!
- Validation of the memory system

8 OPAC floating-point coprocessor
- Floating-point coprocessor
- Dedicated to compute-bound kernels:
  - matrix operations: BLAS3 library
  - FFTs, convolutions, ...
- Built real hardware!
  - a 300-IC board
  - a special-purpose VLSI sequencer

9 OPAC (2)
- Developed in parallel:
  - HDL simulator on a CAD tool
  - C simulator
  - pseudo-language + pseudo-compiler
  - applications
- Full interaction between them:
  - completely accurate simulator
  - design decisions based on hardware constraints, performance evaluation, and code generation

10 OPAC (3)
- It was real fun!
- We learned a lot!
- Research impact: ??
  - killer micros appeared during this period

11 Cache simulation
- We began in 1991
- Among the first groups in Europe
- We had to learn everything:
  - how to get the traces
  - which benchmarks to use
  - how to simulate
  - what is important

12 Getting the first traces
- ATUM traces:
  - hardware-monitored VAX traces
  - available by ftp
  - very short!
- DLX traces:
  - available by ftp
- SparcSim simulator:
  - very, very slow

13 Choosing the first benchmarks
- Picking our own applications!
- Not a worse choice than SPEC92 or SPEC95
- But reviewers like standards!

14 The first simulators
- Tried the Dinero simulator from Mark Hill
- But what about skewed-associative caches?
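A skewed-associative cache indexes each bank with a different function of the address, which a conventional cache simulator does not model. The Python sketch below shows the idea for two banks under our own assumptions: the class name `SkewedCache`, the index functions (bank 0 uses the conventional index, bank 1 XORs it with the next address field), and the LRU choice between the two candidate entries are illustrative, not the functions from the original papers.

```python
class SkewedCache:
    """Minimal 2-bank skewed-associative cache model (illustrative sketch)."""

    def __init__(self, sets_per_bank: int, block_bytes: int = 32):
        assert sets_per_bank & (sets_per_bank - 1) == 0, "power of two expected"
        self.n = sets_per_bank
        self.bits = sets_per_bank.bit_length() - 1
        self.block_bytes = block_bytes
        # banks[b][i] holds (block_address, last_use_time) or None
        self.banks = [[None] * sets_per_bank for _ in range(2)]
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def _index(self, bank: int, block: int) -> int:
        low = block & (self.n - 1)
        high = (block >> self.bits) & (self.n - 1)
        # Hypothetical skewing functions: blocks that collide in bank 0
        # are spread over bank 1 by XORing in the next address field.
        return low if bank == 0 else low ^ high

    def access(self, addr: int) -> bool:
        """Simulate one reference; return True on a hit."""
        self.clock += 1
        block = addr // self.block_bytes
        slots = [self._index(b, block) for b in range(2)]
        for b, i in enumerate(slots):
            entry = self.banks[b][i]
            if entry is not None and entry[0] == block:
                self.banks[b][i] = (block, self.clock)  # refresh recency
                self.hits += 1
                return True
        self.misses += 1

        def age(b: int) -> int:
            entry = self.banks[b][slots[b]]
            return -1 if entry is None else entry[1]  # empty slots are preferred

        # Replace the less recently used of the two candidate entries
        victim = min(range(2), key=age)
        self.banks[victim][slots[victim]] = (block, self.clock)
        return False
```

With 4 entries per bank and 1-byte blocks, the cyclic trace 0, 4, 8 maps all three blocks to entry 0 of bank 0 but to three different entries of bank 1, so after three compulsory misses every access hits. A 2-way set-associative LRU cache with 4 sets would miss on every access of that trace, since all three blocks fall into the same set.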

15 First results
- Skewed-associative caches (1992):
  - simulation helps to convince the reviewers
  - good presentation and good figures are far more important
  - five years later, the simulation results are just noise
- Semi-unified caches (1993):
  - simulation needed to quantify the benefit
  - unfortunately, we picked the wrong conference

16 Further cache studies
- Using more widely accepted traces: SPEC92-95, SPLASH, etc.
  - trace collection through spa (Gordon Irlam)
- Quantifying the impact through simulations
- Explaining and analyzing is more important

17 Other simulations
- Cache simulation is a piece of cake
- "Real architects" simulate complete processors:
  - scalar processors
  - in-order superscalar processors
  - out-of-order superscalar processors
  - Simultaneous Multithreading processors
- Next needed step: complete wrong-path simulation

18 Two-block-ahead branch prediction
- ASPLOS 96
- Great idea!
- Poor simulation methodology!
- Performance impact ignored!

19 Two-block-ahead branch predictor (continued)
- Now:
  - complete processor simulation
  - IBS traces (hardware monitored)
  - pros and cons understood: high instruction bandwidth vs. the misprediction penalty problem
- Limitations:
  - no wrong-path execution
  - simplified execution core
  - IBS traces (1993, 16 MHz processor, 16 Mbyte memory)

20 Simultaneous Multithreading
- Several processes sharing the functional units of a superscalar processor
- First paper (Tullsen et al.): 1995
- We began in parallel in 1993: one year too late

21 Simultaneous Multithreading (2)
- We got results on:
  - branch prediction
  - cache behavior
  - in-order versus out-of-order execution
- Methodology:
  - trace-driven simulation
  - mixing traces from different processes
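The slides do not say how the per-process traces were combined. One simple possibility, sketched here in Python purely as an assumption, is a round-robin merge that takes `chunk` records from each live thread in turn and tags every record with its thread id so the simulator can keep per-thread state apart; `mix_traces` and `chunk` are hypothetical names.

```python
def mix_traces(traces, chunk=1):
    """Round-robin merge of per-process traces (hypothetical policy).

    Yields (thread_id, record) pairs, taking up to `chunk` records from
    each still-active trace per round until every trace is exhausted.
    """
    iters = [iter(t) for t in traces]
    active = list(range(len(iters)))
    while active:
        still_active = []
        for tid in active:
            taken = 0
            for record in iters[tid]:
                yield tid, record
                taken += 1
                if taken == chunk:
                    break
            if taken == chunk:  # may still have records left
                still_active.append(tid)
        active = still_active
```

Such a merge inherits the limitations the author lists for this methodology: no operating system activity and no real context switches.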

22 Simultaneous Multithreading (3)
- Known limitations:
  - operating system
  - context switches
  - spectrum of applications
  - wrong-path execution
- However:
  - solid results

23 Skewed Branch Predictors

24 Skewed Branch Predictors
- The most complete analysis we have ever done:
  - explanations
  - simulations
  - mathematical analysis
- Likely to become a reference paper
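A Python sketch of the general skewed-predictor idea: three banks of 2-bit saturating counters, each indexed by a different hash of the branch address and the global history, combined by majority vote. The hash functions, table sizes, initial counter value, and the partial-update rule (on a correct prediction, a bank that voted the wrong way is left untouched; on a misprediction, all banks are updated) are our assumptions for illustration, not the exact published configuration.

```python
class SkewedPredictor:
    """3-bank skewed branch predictor with majority vote (illustrative sketch)."""

    def __init__(self, log_entries: int = 10, history_bits: int = 8):
        self.mask = (1 << log_entries) - 1
        self.history_bits = history_bits
        self.hmask = (1 << history_bits) - 1
        # Three banks of 2-bit counters, initialised to "weakly taken" (2)
        self.banks = [[2] * (1 << log_entries) for _ in range(3)]
        self.history = 0

    def _indices(self, pc: int):
        # Concatenate the word address and the global history into one vector
        v = ((pc >> 2) << self.history_bits) | self.history
        # Hypothetical skewing functions: three different XOR-shift hashes
        return (
            v & self.mask,
            (v ^ (v >> 3)) & self.mask,
            (v ^ (v >> 5) ^ (v << 2)) & self.mask,
        )

    def _votes(self, idx):
        return [self.banks[b][i] >= 2 for b, i in enumerate(idx)]

    def predict(self, pc: int) -> bool:
        return sum(self._votes(self._indices(pc))) >= 2

    def update(self, pc: int, taken: bool) -> None:
        idx = self._indices(pc)  # computed with the history before this branch
        votes = self._votes(idx)
        prediction = sum(votes) >= 2
        for b, i in enumerate(idx):
            # Partial update: on a correct prediction, skip dissenting banks
            if prediction == taken and votes[b] != taken:
                continue
            c = self.banks[b][i]
            self.banks[b][i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.hmask
```

Because the three banks use different index functions, two branch contexts that alias in one bank are unlikely to alias in the other two, and the majority vote outvotes the single polluted entry.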

25 What I have learned from my experience
- Simulations help to:
  - convince: yourself (most important) and most of the reviewers (that's life)
  - analyze in depth: discover why it works, explain it to "real" architects

26 What I have learned from my experience (2)
- Don't overtrust simulation results!
- Be aware that a simulation is:
  - just one measurement point

27 Traces are ALWAYS limited
- toy applications, and/or
- no operating system activity, and/or
- old applications:
  - IBS traces: 16 MHz scalar processor, 16 Mbyte memory
  - future processors: > 1 GHz, 10-way superscalar, > 1 Gbyte memory

28 Simulator limitations
- "Complete" simulation:
  - complex (tens of thousands of lines of code)
  - CPU-time consuming
- Always some simplifying assumptions:
  - sometimes valid,
  - sometimes not...
- Simulators are slow!

29 Simulator user limitations
- 99% of the simulation results go straight into the garbage:
  - just a bug in the simulator!
  - "This idea was ridiculous!" (two months of work)
  - just lacking the interesting measurement!
  - "Finally, I do not have room for this graph!"

30 Trusting or distrusting YOUR OWN simulation results
- Trust the tendencies:
  - "This mechanism behaves better than this other one"
- Always distrust absolute numbers:
  - "A 32-Kbyte cache is sufficient as it exhibits a 1% miss rate" ... on your benchmark, with your tracing tool, without the kernel, without context switches, ...

31 As a reviewer or PC member
- I just do not trust simulation results that I do not understand
- What is needed:
  - a clear explanation
  - insight into why things are working
  - insight into possible limitations
- Simulation-free papers are sometimes refreshing:
  - "Difference-bit cache", Toni Juan et al., 1996

32 Needs for microarchitecture simulation today
- The performance of some existing processors was misestimated by 20% by their manufacturer
- Wrong-path execution
- Kernel activities

33 Execution-driven or trace-driven simulation?
- Trace-driven simulation is sufficient for:
  - comparing two cache structures or two branch predictors
  - in-order processor simulation
- Execution-driven simulation is required for:
  - precise out-of-order processor simulation
  - studying bandwidth impact
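As an illustration of the first case, the hypothetical `simulate` function below replays one address trace through LRU caches of equal capacity but different associativity. The trace and parameters are made up; only the relative ranking of the two configurations carries meaning, in line with the advice to trust tendencies rather than absolute numbers.

```python
from collections import OrderedDict


def simulate(trace, size_bytes, block_bytes, ways):
    """Count misses of a set-associative LRU cache on an address trace."""
    n_sets = size_bytes // (block_bytes * ways)
    # One OrderedDict per set: keys are block addresses, oldest first
    sets = [OrderedDict() for _ in range(n_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_bytes
        s = sets[block % n_sets]
        if block in s:
            s.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used block
            s[block] = True
    return misses


trace = [0, 1024] * 50
print(simulate(trace, 1024, 32, ways=1), simulate(trace, 1024, 32, ways=2))
```

For this trace the two addresses conflict in the 1-Kbyte direct-mapped cache, so every access misses, while the 2-way cache of the same size takes only two compulsory misses.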

34 Which >?
- We need real workloads:
  - all processes running on a workstation
  - user and kernel activities
  - on real data
- Ideally, capture the whole activity of a workstation for one second every hour
  - Calvin
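Capturing "one second every hour" keeps a fixed window at the start of each period, i.e. 1/3600 (about 0.03%) of the activity. A minimal filter, assuming time-ordered events carrying integer millisecond timestamps; `periodic_samples` and its parameters are hypothetical names.

```python
def periodic_samples(events, window_ms, period_ms):
    """Keep events in the first `window_ms` of each `period_ms`, grouped per period.

    `events` is an iterable of (timestamp_ms, record) pairs in time order.
    Returns a dict mapping period number -> list of records in that sample.
    """
    samples = {}
    for ts, record in events:
        if ts % period_ms < window_ms:
            samples.setdefault(ts // period_ms, []).append(record)
    return samples


events = [(0, "a"), (500, "b"), (1000, "c"), (3_600_200, "d"), (3_601_000, "e")]
print(periodic_samples(events, window_ms=1000, period_ms=3_600_000))
```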

35 TOOLS: WHAT WE ARE DOING
- Salto: a System for Assembly Languages Transformation and Optimizations
- Calvin: Cloning Assembly Languages in View of Instrumentation Needs

36 Salto overview
- Assembly source-to-source preprocessor
- Retargetable: exists for SPARC, Alpha, MIPS, Philips TM-1000, Pentium, TI C6x
- Can be used:
  - to instrument or transform assembly code
  - to schedule assembly code
  - for register allocation, basic block layout, etc.
  - to derive simulators
- Fine-grain machine description
- Object-oriented interface

37 Salto organisation
[Diagram: transformation tool, Salto C++ interface, machine description, assembly language]

38 Calvin
- With current instrumentation tools, code runs slowly
- We just want to pay the penalty while collecting traces
- Code-cloning tracing allows this
- Calvin status:
  - built using Salto
  - works on a single user application
- Overall project: instrument a Linux workstation
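Calvin clones assembly code through Salto; the Python sketch below only illustrates the general code-cloning idea at function granularity, as an analogy. Each function gets an original and an instrumented copy, and a flag test picks one at every call, so the instrumentation cost is paid only while tracing is switched on. `Tracer`, `cloned`, and switching at function entry are hypothetical choices, not Calvin's design.

```python
import functools


class Tracer:
    """Hypothetical tracing controller, switched on only for sampling windows."""

    def __init__(self):
        self.enabled = False
        self.log = []

    def cloned(self, fn):
        @functools.wraps(fn)
        def instrumented(*args):
            # Slow clone: records every call before running the original body
            self.log.append((fn.__name__, args))
            return fn(*args)

        @functools.wraps(fn)
        def dispatch(*args):
            # Fast path runs the uninstrumented original when tracing is off
            return instrumented(*args) if self.enabled else fn(*args)

        return dispatch


tracer = Tracer()


@tracer.cloned
def add(a, b):
    return a + b


add(1, 2)               # not traced
tracer.enabled = True
add(3, 4)               # traced
print(tracer.log)
```

In this Python version the dispatch wrapper itself still costs a call; in cloned machine code, the switch between copies can be made much cheaper.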


40 My conclusion
- Architecture is fun
- Simulation is:
  - boring (always)
  - necessary (sometimes)
  - misleading (often)
- As processors become more and more complex, "numbers" will become less and less accurate. Only trust tendencies.
- Tools are more and more needed

