1
Just-in-Time Compilation for FPGA Processor Cores
Andrew Becker (1), Scott Sirowy (2), Frank Vahid
Department of Computer Science and Engineering, University of California, Riverside
{abecker | ssirowy | vahid}@cs.ucr.edu
(1) Now at EPFL. (2) Now at ESRI.
This work was supported in part by the National Science Foundation (CNS1016792) and by the Semiconductor Research Corporation (GRC 2143.001).
2
Andrew Becker 2 of 20
Motivation
SystemC is a useful capture language: concurrency, structure, timing
Simulation is typical, but in-system I/O is often useful
Design/synthesis to FPGA may take hours or days and require advanced tools
[Figure: simulation compared with in-system I/O (switches/LEDs, cameras/displays)]
3
Background
Want rapid design iteration with in-system I/O
Compile the design description; avoid design/synthesis
Previously: hybrid approach (SystemC bytecode)
Portable SystemC-on-a-chip – Sirowy [CODES+ISSS ’09]
SystemC code:
    class CLK_GEN : public sc_module {
        sc_in clock;
        …
        CLK_GEN(){
        …
Bytecode:
    process(clock)
    READ $1 dataRdy
    BGT $1 $0 Start
    J Done
    Start: ADDI $2 $2 1
    ADDI $3 $0 7
    …
[Figure: SystemC code, compiler, bytecode; also shown: simulator (no in-system I/O) and design/synthesis (time-consuming)]
4
Background
Emulate bytecode in an engine on the FPGA
Fast compilation
Bytecode is also portable (FPGA-device independent)
Portable SystemC-on-a-chip – Sirowy [CODES+ISSS ’09]
SystemC code:
    class CLK_GEN : public sc_module {
        sc_in clock;
        …
        CLK_GEN(){
        …
Bytecode:
    process(clock)
    READ $1 dataRdy
    BGT $1 $0 Start
    J Done
    Start: ADDI $2 $2 1
    ADDI $3 $0 7
    …
[Figure: SystemC code, compiler, bytecode, FPGA emulation engine, in-system I/O]
5
Emulation Engine
Discrete event simulator
C code on a processor (currently a MicroBlaze soft-core; could be a hard-core)
Support circuits for architectural features and peripheral I/O
[Figure: block diagram with processor core, event kernel, instruction memory, read signal memory, write signal memory, peripheral bus, UART, LEDs, buttons, frame buffer]
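To make "emulating bytecode" concrete, here is a minimal sketch of an interpreter loop for the four mnemonics that appear in the bytecode fragment on the earlier slides. The opcode names, operand layout, register and signal counts, the hardwired-zero treatment of $0, and the OP_END terminator are all assumptions made for illustration; the slides do not show the engine's actual C code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes modeled on the mnemonics shown on the slides.
 * OP_END is an invented terminator for this sketch. */
typedef enum { OP_READ, OP_BGT, OP_J, OP_ADDI, OP_END } Opcode;

/* Operand meaning depends on the opcode (see run_process). */
typedef struct {
    Opcode op;
    int a, b, c;
} Insn;

#define NUM_REGS 8
#define NUM_SIGNALS 4

typedef struct {
    int32_t regs[NUM_REGS];       /* bytecode registers $0..$7 */
    int32_t signals[NUM_SIGNALS]; /* stand-in for read signal memory */
} Machine;

/* Interpret one process until OP_END or the end of the code array.
 * Returns the number of bytecode instructions emulated.
 * No bounds checking on register or signal indices: sketch only. */
size_t run_process(Machine *m, const Insn *code, size_t len)
{
    size_t pc = 0, count = 0;
    while (pc < len) {
        const Insn *i = &code[pc];
        count++;
        m->regs[0] = 0; /* assumption: $0 always reads as zero */
        switch (i->op) {
        case OP_READ:  /* READ $a, signal b */
            m->regs[i->a] = m->signals[i->b];
            pc++;
            break;
        case OP_BGT:   /* BGT $a, $b, target c */
            pc = (m->regs[i->a] > m->regs[i->b]) ? (size_t)i->c : pc + 1;
            break;
        case OP_J:     /* J target a */
            pc = (size_t)i->a;
            break;
        case OP_ADDI:  /* ADDI $a, $b, immediate c */
            m->regs[i->a] = m->regs[i->b] + i->c;
            pc++;
            break;
        case OP_END:
            return count;
        }
    }
    return count;
}
```

Every bytecode instruction costs a fetch, a decode through the switch, and several loads and stores on the host processor, which is the per-instruction overhead that makes emulation slow.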
6
Caveat Emptor
Emulation is slow
On a soft-core, it is even slower than PC simulation
Won't meet many real-time constraints
7
This work – Speed up emulator
First analyzed emulator performance
8
Low-Hanging Fruit
69% of time spent emulating bytecode
Two strategies to reduce it:
    Reduce each instruction’s emulation time
    Reduce instruction memory latency
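A rough ceiling on what attacking this 69% can buy follows from Amdahl's law. This is a sketch that assumes the 69% figure holds for the workload of interest and that the remaining 31% of run time is left untouched; the symbols f, s, and S are introduced here and do not appear in the slides.

```latex
S = \frac{1}{(1 - f) + \dfrac{f}{s}}, \qquad
f = 0.69 \;\Rightarrow\; \lim_{s \to \infty} S = \frac{1}{0.31} \approx 3.2
```

Here f is the fraction of time spent emulating bytecode, s is the speedup applied to that fraction, and S is the resulting overall speedup. Under these assumptions, gains beyond roughly 3.2x would also require speeding up work outside bytecode emulation, and the fractions themselves can differ from one workload to another.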
9
First Step
Reduce instruction emulation time
Optimize event kernel?
[Figure: block diagram with processor core, event kernel, instruction memory, read signal memory, write signal memory, peripheral bus, UART, LEDs, buttons, frame buffer]
10
First Step
Reduce instruction emulation time
Optimize event kernel?
Just-in-time (JIT) compile bytecode to native processor code, done transparently by the event kernel
[Figure: block diagram with processor core, event kernel, instruction memory, read signal memory, write signal memory, peripheral bus, UART, LEDs, buttons, frame buffer]
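One way a kernel could make JIT compilation transparent is to compile a process the first time it is scheduled and reuse the native code afterwards, falling back to interpretation if compilation fails. The sketch below illustrates that idea only; every type and function name in it is hypothetical, and the demo_* functions are PC-side stand-ins for a real compiler, interpreter, and generated code.

```c
#include <stddef.h>

/* All names here are hypothetical; the slides do not show the kernel's API. */
typedef void (*NativeFn)(void *state);
typedef NativeFn (*CompileFn)(const void *bytecode);
typedef void (*InterpretFn)(const void *bytecode, void *state);

typedef struct {
    const void *bytecode; /* opaque bytecode for this process */
    NativeFn native;      /* NULL until the process has been compiled */
} Process;

typedef struct {
    CompileFn compile;      /* bytecode-to-native translator */
    InterpretFn interpret;  /* fallback bytecode interpreter */
    unsigned compile_count; /* successful compilations so far */
} Kernel;

/* Run one process: compile lazily on first use, then call native code.
 * If compilation fails, interpret (and retry compiling on the next run). */
void kernel_run(Kernel *k, Process *p, void *state)
{
    if (p->native == NULL && k->compile != NULL) {
        p->native = k->compile(p->bytecode);
        if (p->native != NULL)
            k->compile_count++;
    }
    if (p->native != NULL)
        p->native(state);
    else
        k->interpret(p->bytecode, state);
}

/* Stand-ins so the sketch can be exercised on a PC. */
static void demo_native(void *state) { *(int *)state += 1; }
static NativeFn demo_compile(const void *bytecode) { (void)bytecode; return demo_native; }
static NativeFn failing_compile(const void *bytecode) { (void)bytecode; return NULL; }
static void demo_interpret(const void *bytecode, void *state) { (void)bytecode; *(int *)state += 100; }
```

Because the process table is the only place that records whether native code exists, the bytecode handed to the engine stays unchanged, which is consistent with the portability goal stated on the slides.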
11
Just-in-Time Compilation of Bytecode
Implemented a SystemC-bytecode to MicroBlaze JIT compiler
3x speedup; still portable
Tunable delay/jitter
Still want more speed
Bytecode:
    process(clock)
    READ $1 dataRdy
    BGT $1 $0 Start
    J Done
    Start: ADDI $2 $2 1
    ADDI $3 $0 7
    …
Machine code:
    IMM 0xDEAD
    LWI $11 $0 0xBEEF
    BGTI $11 Start
    BRAI Done
    Start: …
[Figure: bytecode, JIT, machine code, event kernel, emulation engine]
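The bytecode-to-machine-code pairing on this slide can be mimicked with a small translator that emits assembly text. This is a sketch under stated assumptions, not the authors' JIT: it produces mnemonics rather than encoded instruction words, the +10 register mapping is inferred only from the slide's $1 to $11 example, the signal address is a caller-supplied placeholder, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed mapping from bytecode register $n to a native register.
 * $0 stays $0; other registers are offset by 10, mirroring the slide's
 * $1 -> $11 example. */
static int native_reg(int bc_reg)
{
    return bc_reg == 0 ? 0 : bc_reg + 10;
}

/* READ $r, signal: load a word from the signal's absolute address.
 * On MicroBlaze, IMM supplies the upper 16 bits of the immediate used by
 * the following instruction, so a 32-bit address takes two instructions.
 * Returns snprintf's result (characters needed, excluding the NUL). */
int emit_read(char *out, size_t cap, int bc_reg, uint32_t signal_addr)
{
    return snprintf(out, cap, "IMM 0x%04X\nLWI $%d $0 0x%04X\n",
                    (unsigned)(signal_addr >> 16),
                    native_reg(bc_reg),
                    (unsigned)(signal_addr & 0xFFFFu));
}

/* BGT $r, $0, label: compare against zero only, which maps directly to
 * BGTI. A comparison between two arbitrary registers would need extra
 * instructions and is not handled in this sketch. */
int emit_bgt_zero(char *out, size_t cap, int bc_reg, const char *label)
{
    return snprintf(out, cap, "BGTI $%d %s\n", native_reg(bc_reg), label);
}

/* J label: unconditional jump, emitted as BRAI as on the slide. */
int emit_jump(char *out, size_t cap, const char *label)
{
    return snprintf(out, cap, "BRAI %s\n", label);
}
```

Calling emit_read with register 1 and the placeholder address 0xDEADBEEF yields the same two lines shown under "Machine code" above. The decode work that an interpreter repeats on every execution is done once here, at translation time.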
12
Further Improvement
Reduce instruction memory latency
Add a dedicated small, fast memory for JIT code on a fast, local bus
Unique JIT possibility due to FPGA configurability
13
Architecture Changes
[Figure: emulation engine block diagram with processor core, instr. mem., read signal memory, write signal memory, peripheral bus, UART, LEDs, buttons, frame buffer, plus a local memory bus with JIT mem.]
14
Even Further Improvement
23% of time spent maintaining the signal queue
What can be done?
Optimize signal queue maintenance code?
15
Common Denominator
FPGA offers configurability
Engine designer can make tradeoffs
Trade hardware resources for speed
[Figure: FPGA, emulation engine, extra resources]
16
Common Denominator
FPGA offers configurability
Engine designer can make tradeoffs
Trade hardware resources for speed
Add another soft-core?
[Figure: FPGA, emulation engine, extra resources]
17
Even Further Improvement
23% of time spent maintaining the signal queue
What can be done?
Optimize signal queue maintenance code?
Offload the job to a coprocessor
Again, a unique JIT option due to FPGA configurability
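The slides do not say what signal queue maintenance involves. As an illustration only, the sketch below assumes SystemC-style deferred writes: a process posts a write to a queue, and the kernel later commits all queued writes and reports which signals changed so that sensitive processes can be woken. The names, queue capacity, and signal count are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 16
#define NUM_SIGNALS 8

/* One deferred signal write (hypothetical layout). */
typedef struct {
    int signal;
    int32_t value;
} PendingWrite;

typedef struct {
    PendingWrite items[QUEUE_CAP];
    size_t count;
} SignalQueue;

/* Called when process code writes a signal: defer the write.
 * Returns 0 on success, -1 if the queue is full or the index is invalid. */
int sq_post(SignalQueue *q, int signal, int32_t value)
{
    if (q->count == QUEUE_CAP || signal < 0 || signal >= NUM_SIGNALS)
        return -1;
    q->items[q->count].signal = signal;
    q->items[q->count].value = value;
    q->count++;
    return 0;
}

/* Update phase: apply queued writes in order and empty the queue.
 * Returns a bitmask with bit s set if any applied write changed the
 * stored value of signal s. */
uint32_t sq_commit(SignalQueue *q, int32_t signals[NUM_SIGNALS])
{
    uint32_t changed = 0;
    for (size_t i = 0; i < q->count; i++) {
        int s = q->items[i].signal;
        if (signals[s] != q->items[i].value) {
            signals[s] = q->items[i].value;
            changed |= (uint32_t)1 << s;
        }
    }
    q->count = 0;
    return changed;
}
```

Bookkeeping of this kind is regular and independent of the bytecode being executed, which is one reason it could be a candidate for offloading to separate hardware while the processor keeps running process code.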
18
Architecture Changes
[Figure: emulation engine block diagram with processor core, instr. mem., read signal memory, write signal memory, peripheral bus, UART, LEDs, buttons, frame buffer, local memory bus, JIT mem., and Signal Queue Emulation Memory Controller]
19
Experimental Results
20
Conclusions
Approach: rapid design iteration with in-system I/O
Uses:
    Education (typically loose timing constraints)
    System prototypes that can tolerate real-time slowdown (e.g., slow frame rate)
Portable and flexible
Engine design sets speed, not compiler or CAD flow
This work: 15x speedup via normal JIT (3x) + FPGA-specific JIT (5x)
But still orders of magnitude slower than design/synthesis
Future work: bytecode accelerators, JIT synthesis