Hy-C: A Compiler Retargetable for 2014 and Beyond
Philip Sweany
4/29/2014
In the Beginning
We had compilers specific to:
– One language (C, Fortran, Lisp, Java, Ruby, …)
– One computer architecture (x86, MIPS, …)
The first retargetable compiler concept was to mix M front ends (language-specific) and N back ends (architecture-specific)
– e.g. pcc (portable C compiler)
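The M front ends / N back ends idea can be sketched as follows: with a shared intermediate representation, M + N components yield M × N compilers. Everything in this sketch is hypothetical and for illustration only (the two toy "languages", the two toy "targets", the tuple-shaped IR, and all function names); it is not how pcc is structured.

```python
# Shared intermediate representation (IR): a tuple (op, left, right) holding an
# operator name and two integer operands. This IR is purely illustrative.
OPS = {"+": "add", "*": "mul"}


def front_end_infix(src):
    """Toy language A: expressions such as '2 + 3'."""
    left, op, right = src.split()
    return (OPS[op], int(left), int(right))


def front_end_prefix(src):
    """Toy language B: expressions such as '(+ 2 3)'."""
    op, left, right = src.strip("()").split()
    return (OPS[op], int(left), int(right))


def back_end_stack(ir):
    """Toy stack-machine target."""
    op, left, right = ir
    return [f"push {left}", f"push {right}", op]


def back_end_register(ir):
    """Toy register-machine target."""
    op, left, right = ir
    return [f"li r1, {left}", f"li r2, {right}", f"{op} r1, r1, r2"]


FRONT_ENDS = {"infix": front_end_infix, "prefix": front_end_prefix}
BACK_ENDS = {"stack": back_end_stack, "register": back_end_register}


def compile_source(src, language, target):
    """Any front end can be paired with any back end through the shared IR."""
    return BACK_ENDS[target](FRONT_ENDS[language](src))
```

Here two front ends and two back ends (four components) give four language/target combinations; adding a third target would need one new back end, not one per language.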
Then …
Two approaches:
– Generic: build a compiler for an instruction set architecture (ISA) that generates “ok” code for lots of architectures, e.g. gcc, LLVM
– Build a very detailed architecture description language AND include an ISA description as well, e.g. LisaTek
Problem
Supporting a retargetable ISA (gcc) alone is daunting. Adding a hardware description on top of it makes retargeting almost impossible.
Until now, that is.
Hybrid Computing
Heterogeneous processors on a single chip
– “CPU”
– FPGA
– ASIC
– N “CPU”s, M FPGAs, K ASICs
Tradeoffs of performance, power, flexibility
Wave of the future (?) (or maybe the present?)
Generic Hybrid Architecture
[Diagram: CPU 1, CPU 2, …, CPU m (Multi-CPU) and FPGA 1, FPGA 2, …, FPGA n (Multi-FPGA) with Shared Memory]
Generic Hy-C Tools
[Diagram blocks: System Specification; Source Code; Optimization Control Objectives/Constraints; Partitioning; CPU Compiler; FPGA Synthesis; CPU Power-Performance Model; FPGA Power-Performance Model]
OMAP Resources (old)
[Diagram: Veyron, Tesla, Ducati (Multi-CPU) with Shared Memory]
OMAP Processor Resources
Chiron
– 2 x 600 MHz (2 symmetric processors each at 600 MHz with shared L2)
– Power 600 µW/MHz
Tesla
– DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP
– Power 200 µW/MHz
Ducati
– 200 MHz (targeted for control, low-latency code)
– Power 100 µW/MHz
“Canonical” Resources
[Diagram: StrongArm, C64x, WimpyArm, FPGA with Shared Memory]
“Canonical” Processor Resources
StrongArm
– 2 x 600 MHz (2 symmetric processors each at 600 MHz with shared L2)
– Power 600 µW/MHz
C64x
– DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP
– Power 200 µW/MHz
WimpyArm
– 200 MHz (targeted for control, low-latency code)
– Power 100 µW/MHz
FPGA fabric
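As a worked example of what the µW/MHz figures imply, the sketch below multiplies each rating by the clock rate to get power at full clock. It assumes the rating applies per core (the slide does not say whether 600 µW/MHz covers one StrongArm core or the pair) and that power scales linearly with frequency. The FPGA fabric is omitted because no figures are given for it. The names `RESOURCES` and `peak_power_mw` are illustrative.

```python
# name -> (cores, clock in MHz, power in uW per MHz), taken from the slide.
# Assumption: the uW/MHz rating is per core.
RESOURCES = {
    "StrongArm": (2, 600, 600),
    "C64x": (1, 400, 200),
    "WimpyArm": (1, 200, 100),
}


def peak_power_mw(cores, mhz, uw_per_mhz):
    """Power at full clock in milliwatts: cores * MHz * (uW/MHz) / 1000."""
    return cores * mhz * uw_per_mhz / 1000


for name, spec in RESOURCES.items():
    print(f"{name}: {peak_power_mw(*spec):.0f} mW")
```

Under these assumptions the StrongArm pair draws 720 mW, the C64x 80 mW, and the WimpyArm 20 mW, which is the spread the partitioner has to trade against performance.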
Hy-C for Canonical Chip
[Diagram blocks: System Specification; Source Code; Optimization Control Objectives/Constraints; Partitioning; Strong; Wimpy; C64x; FPGA]
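One way to picture the partitioning step with an objective and a constraint is the minimal sketch below: it exhaustively assigns tasks to processors and keeps the lowest-energy assignment that meets a deadline. This is not the Hy-C partitioner. The task names and cycle counts are made up, each task is assumed to take the same number of cycles on every processor, StrongArm is modeled as a single core, tasks on different processors are assumed to run in parallel, and the FPGA is left out because the slides give no figures for it. Clock and power numbers come from the “Canonical” Processor Resources slide.

```python
from itertools import product

# name -> (clock in MHz, power in uW per MHz), from the "Canonical" slide.
# Simplification: StrongArm is modeled as one core; the FPGA is omitted.
PROCESSORS = {
    "StrongArm": (600, 600),
    "C64x": (400, 200),
    "WimpyArm": (200, 100),
}

# Hypothetical workload: task name -> cycle count (assumed equal on every processor).
TASKS = {"filter": 4_000_000, "control": 1_000_000, "decode": 6_000_000}


def evaluate(assignment):
    """Return (makespan in seconds, energy in picojoules) for one assignment."""
    busy = {name: 0.0 for name in PROCESSORS}
    energy_pj = 0
    for task, proc in assignment.items():
        mhz, uw_per_mhz = PROCESSORS[proc]
        cycles = TASKS[task]
        busy[proc] += cycles / (mhz * 1e6)  # tasks sharing a processor run serially
        energy_pj += cycles * uw_per_mhz    # 1 uW/MHz equals 1 pJ per cycle
    return max(busy.values()), energy_pj


def partition(deadline_s):
    """Exhaustive search: lowest-energy assignment whose makespan meets the deadline."""
    best = None
    names = list(TASKS)
    for choice in product(PROCESSORS, repeat=len(names)):
        assignment = dict(zip(names, choice))
        makespan, energy_pj = evaluate(assignment)
        if makespan <= deadline_s and (best is None or energy_pj < best[1]):
            best = (assignment, energy_pj, makespan)
    return best  # None when no assignment meets the deadline
```

With a 22 ms deadline, `partition(0.022)` places `filter` on WimpyArm and `control` and `decode` on C64x for 1.8 mJ in total. Because 1 µW/MHz is 1 pJ per cycle, the slowest processor is always the cheapest per cycle in this model, and only the deadline pushes work onto the faster ones. Exhaustive search is exponential in the number of tasks, so a real tool would need a heuristic or a solver.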
Open Issue(s)
– How should we describe the architecture?
– How will we build “programs” for the FPGA?
– How should we describe the optimization constraints?
– How/when shall we implement this beast?
– How will we evaluate the “performance” of the generated code?