CAPS team Compiler and Architecture for superscalar and embedded processors.

CAPS project 2 CAPS members  2 INRIA researchers: A. Seznec, P. Michaud  2 professors: F. Bodin, J. Lenfant  11 Ph.D. students: R. Amicel, R. Dolbeau, A. Monsifrot, L. Bertaux, K. Heydemann, L. Morin, G. Pokam, A. Djabelkhir, A. Fraboulet, O. Rochecouste, E. Toullec  3 engineers: S. Bihan, P. Villalon, J. Simonnet

CAPS project 3 CAPS themes  Two interacting activities  High performance microprocessor architecture  Performance oriented compilation

CAPS project 4 CAPS Grail  Performance at the best cost  Progress in computer science and applications is driven by performance

CAPS project 5 CAPS path to the Grail  Defining the tradeoffs between:  what should be done through hardware  what can be done by the compiler  for maximum performance  or for minimum cost  or for minimum size, power..

CAPS project 6 Need for high-performance processors  Current applications  general purpose: scientific, multimedia, databases, …  embedded systems: cell phones, automotive, set-top boxes, …  Future applications  don’t worry: users have a lot of imagination!  New software engineering techniques are CPU hungry:  reusability, generality  portability, extensibility (indirections, virtual machines)  safety (run-time verifications)  encryption/decryption

CAPS project 7 CAPS (ancient) background  « ancient » background in hardware and software management of ILP  decoupled pipeline architectures  OPAC, a hardware matrix floating-point coprocessor  software pipelining for LIW  « Supercomputing » background  interleaved memories  Fortran-S

CAPS project 8 CAPS background in architecture  Solid knowledge in microprocessor architecture  technology watch on microprocessors  A. Seznec worked with Alpha Development Group in 1999-2000  Research in cache architecture  Research in branch prediction mechanisms

CAPS project 9 CAPS background in compilers  Software optimizations for cache memories  Numerical algorithms on dense structures  Optimizing data layout  Many prototype environments for parallel compilers:  CT++ (with CEA): image processing C++ library for a SIMD architecture  Menhir: a parallel compiler for MatLab  IPF (with Thomson-LER): Fortran compiler for image processing on Maspar  Sage (with Indiana): Infrastructure for source level transformation

CAPS project 10 We build on  SALTO: System for Assembly-Language Transformations and Optimizations  retargetable assembly source to source preprocessor  Erven Rohou’s Ph. D  TSF:  Scripting language for program transformation on top of ForeSys (Simulog)  Yann Mevel’s Ph. D

CAPS project 11 Salto overview  Assembly source to source preprocessor  Fine grain machine description  Independent from compilers  [Figure: SALTO overview; labels: Transformation tool, SALTO interface, C++, Machine Description, assembly language]

CAPS project 12 Compiler activities  Code optimizations for embedded applications  infrastructures rather than compilers  optimizing compiler strategies rather than new code optimizations  Global constraints  performance / code size / low power (starting)  Focus on interactive tools rather than automatic ones  code tuning  case based reasoning  assembly code optimizations

CAPS project 13 Computer aided hand tuning  Automatic optimization has many shortcomings  rather provide the user with a testbed to hand-tune applications  Target applications  Fortran codes and embedded C applications  Our approach  case based reasoning  static code analysis and pattern matching  profiling  learning techniques  the user bears ultimate responsibility

CAPS project 14 CAHT  Prototype built on:  Foresys: Fortran interactive front-end (from Simulog)  TSF: Scripting language for program transformation  Sage++: Infrastructure for source level transformation

CAPS project 15 Analysis and Tuning tool for Low Level Assembly and Source code (with Thomson Multimedia)  ATLLAS objectives:  Has the compiler done a good job?  Try to match source and optimized assembly at fine grain  Development/analysis environment:  Models for both source and assembly  Global and local analysis (WCET, …) at both levels  Interactive environment for code visualization and manual/automatic analysis and optimization  Built using Salto and Sage++:  Retargetable with compilers and architectures

CAPS project 16 ATLLAS - Analysis and Tuning tool for Low Level Assembly and Source code: tuning method  [Flowchart; labels: Source Code, Assembly Code, Atllas, compilation, profiling, Processing, Post-Processing, Support, Code matching, analysis and evaluations, Graphic Display of Assembly and Source Code, Good?, Yes, End, Half-Automatic or Manual Source Optimisations, Half-Automatic or Manual Assembly Optimisations]

CAPS project 17 Assembly Level Infrastructure for Software Enhancement (with STMicroelectronics)  ALISE  enhanced SALTO for code optimization: better integration with code generation – interface with front-end – interface for profiling data  targets global optimization  based on component software  optimization engines  Answer to a real need from industry:  A retargetable infrastructure

CAPS project 18 ALISE  Environment for:  global assembly code optimization  providing optimization alternatives  Support for new embedded processors  ISAs with ILP support (VLIW, EPIC)  Predicated instructions  Functional unit clusters,..

CAPS project 19 ALISE architecture  [Figure; labels: Architecture Description, D to M, Architecture Model, Text Input, P to IR, Intermediate Representation, Opt 1, Opt 2, Opt n, IR to Ass (Emit), Optimized Program, High Level API, Interfaces, User interface, G.U.I., Intermediate Code, External Infrastructure]

CAPS project 20 Preprocessor for media processors (MEDEA+ Mesa project)  Multimedia instructions are available on embedded and general-purpose processors, but:  there is no consensus on multimedia instructions among manufacturers: saturated arithmetic or not, different instructions, …  multimedia instructions are not well handled by compilers, yet performance depends heavily on them

CAPS project 21 Preprocessor for media processors: our approach  C source to source preprocessor  user-oriented idiom recognition:  easy to retarget  target dedicated recognition  exploiting loop parallelism  vectorization techniques  multiprocessor systems  available soon  Collaboration with STMicroelectronics

CAPS project 22 Iterative compilation  Embedded systems:  Compile time is not critical  Performance/code size/power are critical  One can often rely on profiling  Classical compiler: local optimizations  but constraints are GLOBAL  Proof of concept for code size (Rohou’s Ph.D.)  new Ph.D. beginning in September 2000
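
The tension between local optimizations and a global constraint can be illustrated with a small sketch. It is not the project's tool: the three loops, their cycle and byte counts, the unrolling cost model, and the names `LOOPS`, `evaluate` and `search` are all hypothetical. In a real iterative compiler, `evaluate` would compile and profile the program rather than apply a formula.

```python
from itertools import product

# Hypothetical program: loop name -> (cycles, code bytes) without unrolling.
LOOPS = {"a": (1000, 40), "b": (400, 60), "c": (100, 80)}
UNROLL_FACTORS = (1, 2, 4)


def evaluate(config):
    """Stand-in for 'compile, run, profile': returns (cycles, code size).

    Made-up model: unrolling by u scales cycles by (u + 1) / (2u)
    and multiplies the loop's code size by u.
    """
    cycles = size = 0
    for name, unroll in config.items():
        base_cycles, base_bytes = LOOPS[name]
        cycles += base_cycles * (unroll + 1) // (2 * unroll)
        size += base_bytes * unroll
    return cycles, size


def search(size_budget):
    """Try every configuration; keep the fastest one that fits the budget."""
    best = None
    for factors in product(UNROLL_FACTORS, repeat=len(LOOPS)):
        config = dict(zip(LOOPS, factors))
        cycles, size = evaluate(config)
        if size > size_budget:
            continue  # violates the global code-size constraint
        if best is None or cycles < best[1]:
            best = (config, cycles, size)
    return best


best_config, best_cycles, best_size = search(size_budget=400)
```

Unrolling every loop by 4, the best choice for each loop taken alone, would need 720 bytes in this toy model. The exhaustive search instead spends the 400-byte budget where it pays most.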

CAPS project 23 High performance instruction set simulation  Embedded processors:  parallel development of silicon, ISA, compiler and applications  Need for flexible instruction set simulation:  high performance  simulation of large codes  debugging  retargetable, to experiment with new ISAs and various microarchitecture options  First results: up to 50x faster than ad-hoc simulator

CAPS project 24 ABSCISS: Assembly Based System for Compiled Instruction Set Simulation  [Figure: tool flow; labels: C Source, tmcc, TriMedia Assembly, tmas, TriMedia Binary, tmsim, Architecture Description, ABSCISS, C/C++ Source, gcc, Compiled simulator]

CAPS project 25 Enabling superscalar processor simulation  Complete O-O-O microprocessor simulation:  10,000-100,000x slower than real hardware  cannot simulate realistic applications, only slices  even fast mode emulation is slow (50-100x): simulation is generally limited to slices at the beginning of the application  representativeness?  Calvin2 + DICE:  combines direct execution with simulation  really fast mode: 1-2x slowdown  enables simulating slices distributed over the whole application
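
A rough way to see why distributing short simulated slices over a directly executed run is affordable is to add up the cost of each mode. The sketch below is not Calvin2 or DICE: `plan`, `estimated_slowdown`, the fixed-period slice placement and the chosen factors are assumptions made for illustration.

```python
def plan(total_instructions, slice_length, period):
    """Split a run into ('fast', n) and ('detailed', n) segments.

    Assumed policy: in every window of `period` instructions, the last
    `slice_length` are simulated in detail and the rest run in fast mode.
    """
    if not 0 < slice_length < period:
        raise ValueError("need 0 < slice_length < period")
    segments = []
    done = 0
    while done < total_instructions:
        fast = min(period - slice_length, total_instructions - done)
        segments.append(("fast", fast))
        done += fast
        detailed = min(slice_length, total_instructions - done)
        if detailed:
            segments.append(("detailed", detailed))
            done += detailed
    return segments


def estimated_slowdown(segments, fast_factor, detailed_factor):
    """Average slowdown over the run, weighted by instruction count."""
    cost = sum(n * (fast_factor if mode == "fast" else detailed_factor)
               for mode, n in segments)
    return cost / sum(n for _, n in segments)


# 10% of the run simulated in detail, spread over the whole application.
segments = plan(total_instructions=1000, slice_length=10, period=100)
slowdown = estimated_slowdown(segments, fast_factor=2, detailed_factor=10000)
```

With these assumed factors the detailed slices dominate the total cost, so the fraction of instructions simulated in detail, not the fast mode, sets the overall simulation time.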

CAPS project 26 Calvin2 + DICE  [Figure; labels: Original code, SPARC V9 assembly code, calvin2 (Static Code Annotation Tool), DICE (Host ISA Emulator), User analysis routines, checkpoint, Switching event, Emulation mode]

CAPS project 27 Moving tools to IA64  New 64-bit ISA from Intel/HP:  Explicitly Parallel Instruction Computing  Predicated execution  Advanced (i.e. speculative) loads  A very interesting platform for research!  Porting SALTO and the Calvin2+DICE approach to IA64  Exploring new trade-offs enabled by instruction sets:  predicting the predicates?  advanced loads against predicting dependencies  ultimate out-of-order execution against compiler

CAPS project 28 Low power, compilation, architecture, … (just beginning)  Power consumption becomes a major issue:  Embedded and general purpose  Compilation (setting up a collaboration with STMicroelectronics/Stanford/Milan):  Is it different from performance optimization?  Global constraint optimization  Instruction Set Architecture support?  Architecture:  High order bits are generally null, …  registers and memory  ALUs

CAPS project 29 Caches and branch predictors  International CAPS visibility in architecture =  skewed associative cache  + decoupled sectored cache  + multiple block ahead branch prediction  + skewed branch predictor  Continue recurrent work on these topics:  multiple block ahead + tradeoffs complexity/accuracy
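
For readers unfamiliar with the first item, a skewed associative cache indexes each way with a different function, so blocks that collide in one way usually land on different lines in another. The sketch below is a toy model under stated assumptions: the two index functions, the 16-line ways and the round-robin replacement are illustrative choices, not the functions or policies from the CAPS publications.

```python
INDEX_BITS = 4                      # 16 lines per way (illustrative size)
MASK = (1 << INDEX_BITS) - 1


def index_way0(block):
    """Conventional indexing: low-order bits of the block address."""
    return block & MASK


def index_way1(block):
    """Skewed indexing (assumed hash): XOR low bits with the next bits up."""
    return (block ^ (block >> INDEX_BITS)) & MASK


INDEX_FUNCTIONS = (index_way0, index_way1)


class SkewedCache:
    def __init__(self):
        self.ways = [[None] * (1 << INDEX_BITS) for _ in INDEX_FUNCTIONS]
        self.victim = 0  # round-robin pointer, an assumed replacement policy

    def access(self, block):
        """Return True on a hit; on a miss, install the block, return False."""
        slots = [(w, f(block)) for w, f in enumerate(INDEX_FUNCTIONS)]
        for w, i in slots:
            if self.ways[w][i] == block:
                return True
        for w, i in slots:          # prefer an empty candidate line
            if self.ways[w][i] is None:
                self.ways[w][i] = block
                return False
        w, i = slots[self.victim]   # otherwise evict one candidate
        self.ways[w][i] = block
        self.victim = (self.victim + 1) % len(slots)
        return False


cache = SkewedCache()
blocks = (0x00, 0x10, 0x20)         # all three share index 0 in way 0
first_pass = [cache.access(b) for b in blocks]
second_pass = [cache.access(b) for b in blocks]
```

In a conventional 2-way set-associative cache with 16 sets, these three blocks would compete for the two lines of set 0. Here the second way spreads them over lines 0, 1 and 2, so the second pass hits on all three.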

CAPS project 30 Simultaneous Multithreading  Sharing functional units among several processes  Among the first groups working on this topic  S. Hily’s Ph.D.  SMT behavior well understood for independent threads  now, focus on parallel threads from a single application  Current research directions:  speculative multithreading: ultimate performance with a single thread through predicting threads  performance/complexity tradeoffs: SMT/CMP/hybrid

CAPS project 31 « Enlarging » the instruction window (supported by Intel)  In an O-O-O processor, fireable instructions are chosen in a window of a few tens of RISC-like instructions.  Limitations are:  size of the window  number of physical registers  Prescheduling:  separate data flow scheduling from resource arbitration.  coarser units of work ?  Reducing the number of physical registers:  how to detect when a physical register is dead ?  Per group validation ? revisiting CISC/RISC war ?

CAPS project 32 Unwritten rule on superscalar processor designs  For general purpose registers: Any physical register can be the source or the result of any instruction executed on any functional unit

CAPS project 33 4-cluster WSRS architecture (supported by Intel)  [Figure; labels: S0, C0, S1, C1, S2, C2, S3, C3]  Half the read ports, one fourth the write ports  Register file: silicon area x 1/8, power x 1/2, access time x 0.6  Gains on: bypass network, selection logic

CAPS project 34 Multiprocessor on a chip  Not just replicating board level solutions !  A way to manage a large on-chip cache capacity:  how can a sequential application use efficiently a distributed cache ?  architectural supports for distributing a sequential application on several processors ?  how should instructions and data be distributed ?

CAPS project 35 HIPSOR HIgh Performance SOftware Random number generation  Need for unpredictable random number generation:  sequences that cannot be reproduced  State of the art:  < 100 bit/s using the operating system  75 Kbit/s using hardware generator on Pentium III  Internal state of a superscalar processor cannot be reproduced  use this state to generate unpredictable random numbers

CAPS project 36 HIPSOR (2)  1000’s of unmonitorable states modified by OS interrupts  Hardware clock counter to indirectly probe these states  Combined with in-line pseudo-random number generation  100 Mbit/s unpredictable random numbers ARC INRIA with CODES
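
The combination described on these two slides, a timing probe folded into an in-line pseudo-random generator, can be sketched as follows. This is not the HIPSOR algorithm and has not been vetted for cryptographic use: Python's `time.perf_counter_ns` stands in for the hardware clock counter, the generator is a plain xorshift64, and `timing_sample` and `MixedGenerator` are hypothetical names.

```python
import time


def timing_sample():
    """Low-order bits of the time taken by a short loop.

    Stand-in for reading a hardware cycle counter: the duration depends
    on machine state (caches, interrupts) that the program cannot observe.
    """
    start = time.perf_counter_ns()
    acc = 0
    for i in range(64):
        acc += i * i
    return (time.perf_counter_ns() - start) & 0xFF


class MixedGenerator:
    """xorshift64 generator whose state is perturbed by timing samples."""

    MASK = (1 << 64) - 1

    def __init__(self, seed=0x9E3779B97F4A7C15):
        self.state = (seed & self.MASK) or 1   # xorshift state must be nonzero

    def next_u64(self, reseed=True):
        if reseed:
            # Fold the timing measurement into the generator state.
            self.state = (self.state ^ timing_sample()) or 1
        s = self.state
        s ^= (s << 13) & self.MASK
        s ^= s >> 7
        s ^= (s << 17) & self.MASK
        self.state = s
        return s


generator = MixedGenerator()
values = [generator.next_u64() for _ in range(10)]
```

With `reseed=False` the sequence is an ordinary reproducible xorshift64 stream. The timing samples are what make two runs from the same seed diverge.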
