Designing Customized ISA Processors using High Level Synthesis


1 Designing Customized ISA Processors using High Level Synthesis
Sam Skalicky, Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2 Outline
- Motivation
- Background
- Our Approach
- Implementation Flow
- Experiments & Results
- Conclusion

3 Motivation
- Wide availability of soft processors
  - MB (MicroBlaze), NIOS, MIPS, custom, etc.
- Reconfigurable logic allows for extreme configurability
- Processor configurability alone is not enough
  - Pipeline stages, cache, mult/div, float, peripherals
- A typical application uses only a subset of the ISA's instructions
  - MB ISA (144), MIPS (153)
  - Kernels (linear algebra, encryption) use fewer than 20

4 Background
- Classic processor design
  - Low level: HDL
  - High level: LISA, Lava, Bluespec, Chisel
- C/C++ processor simulators available
  - MIPS => SPIM, MB => ISS, ...
- High-level synthesis tools are now much more capable
  - VivadoHLS, LegUp, ImpulseC, Synphony, ...

5 Our Approach
- Take a C/C++ processor simulator
  - Implements the ISA
  - Keep only the necessary instructions
- Produce an HDL implementation using HLS
- Customize the implementation using directives
  - Pipelining, register partitioning, etc.

6 Implementation Flow

7 Sample Architecture C/C++
void datapath(unsigned IM[], unsigned DM[]) {
    int reg[32];
    unsigned PC = 0;
main_loop:
    while (true) {
        unsigned instr = IM[PC];
        switch (OPCODE(instr)) {
        case 0x00: // Begin 0x00 R-Type
            switch (FUNCT(instr)) {
#ifdef ADD_INST
            case 0x20: // add
                reg[RD(instr)] = reg[RS(instr)] + reg[RT(instr)];
                break;
#endif
            }
            break; // End 0x00 R-Type
#ifdef LW_INST
        case 0x23: // lw
            reg[RT(instr)] = DM[reg[RS(instr)] + IMM(instr)];
            break;
#endif
#ifdef SW_INST
        case 0x2b: // sw
            DM[reg[RS(instr)] + IMM(instr)] = reg[RT(instr)];
            break;
#endif
        } // End instruction decoding
        PC += 1;
    }
}
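The sample code uses the field-extraction helpers OPCODE, FUNCT, RS, RT, RD and IMM, which the slide does not define. A minimal sketch of how they might look, assuming the standard MIPS32 instruction encoding (opcode in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11, funct in 5..0, a sign-extended 16-bit immediate). The definitions are illustrative assumptions, not the authors' code:

```c
#include <stdint.h>

/* Assumed MIPS32 field layout; the slides do not show these macros. */
#define OPCODE(i) ((((uint32_t)(i)) >> 26) & 0x3Fu) /* bits 31..26 */
#define RS(i)     ((((uint32_t)(i)) >> 21) & 0x1Fu) /* bits 25..21 */
#define RT(i)     ((((uint32_t)(i)) >> 16) & 0x1Fu) /* bits 20..16 */
#define RD(i)     ((((uint32_t)(i)) >> 11) & 0x1Fu) /* bits 15..11 */
#define FUNCT(i)  (((uint32_t)(i)) & 0x3Fu)         /* bits 5..0   */

/* Sign-extend the low 16 bits: flip the sign bit, then subtract its weight. */
#define IMM(i) ((int32_t)((((uint32_t)(i)) & 0xFFFFu) ^ 0x8000u) - 0x8000)
```

For example, the word 0x00221820 (add $3, $1, $2) decodes to opcode 0x00 and funct 0x20, and 0x8C22FFFC (lw $2, -4($1)) yields an immediate of -4.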

8 Sample Architecture HDL

9 Experiments
- Create customized processors from C MIPS simulator code (based on SPIM)
- Kernels: dot product & AES
- Apply HLS directives to improve design
- Compare to common soft processors
- Using Xilinx VivadoHLS & Vivado tools
- Goal: evaluate this approach in terms of ease of use, resource utilization, and performance

10 Experiments

11 Experiments
- Analyze kernel code to determine which instructions to implement
- Customize architecture code
- Base HLS design
- Apply HLS directives
- Improved HLS design
- Compare results
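The slides do not say how the kernel code was analyzed. One way the first step might be sketched is to scan a compiled kernel's instruction words and record which primary opcodes and R-type funct codes occur. The type isa_usage and the function scan_program are hypothetical names, and the MIPS32 field positions are an assumption:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of which instructions a program uses. */
typedef struct {
    unsigned char opcode[64]; /* opcode[k] != 0: primary opcode k occurs   */
    unsigned char funct[64];  /* funct[k]  != 0: R-type funct code k occurs */
} isa_usage;

/* Scan n instruction words; return the number of distinct instructions. */
size_t scan_program(const uint32_t *im, size_t n, isa_usage *u)
{
    size_t distinct = 0;
    memset(u, 0, sizeof *u);
    for (size_t k = 0; k < n; ++k) {
        uint32_t op = (im[k] >> 26) & 0x3Fu;
        if (op == 0) {                    /* R-type: identified by funct */
            uint32_t fn = im[k] & 0x3Fu;
            if (!u->funct[fn]) { u->funct[fn] = 1; ++distinct; }
        } else if (!u->opcode[op]) {      /* I/J-type: identified by opcode */
            u->opcode[op] = 1;
            ++distinct;
        }
    }
    return distinct;
}
```

Each flagged entry would then correspond to one #ifdef-guarded case to enable in the simulator source.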

12 Experiments - directives
Code as on slide 7, annotated with two directives:
- Partition (register array, int reg[32])
- Pipeline (main loop)
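The two directives named on this slide could be expressed as Vivado HLS pragmas roughly as follows. This is a sketch under assumptions: the pragma options (complete partitioning, default initiation interval) are not given in the slides, the decode macros assume the MIPS32 encoding, and datapath_n with its steps bound is a hypothetical variant that terminates so it can also be run as plain C, where a standard compiler ignores the HLS pragmas:

```c
#include <stdint.h>

#define ADD_INST
#define LW_INST
#define SW_INST

/* Assumed MIPS32 field extraction (not shown in the slides). */
#define OPCODE(i) (((i) >> 26) & 0x3Fu)
#define RS(i)     (((i) >> 21) & 0x1Fu)
#define RT(i)     (((i) >> 16) & 0x1Fu)
#define RD(i)     (((i) >> 11) & 0x1Fu)
#define FUNCT(i)  ((i) & 0x3Fu)
#define IMM(i)    ((int32_t)((((i) & 0xFFFFu)) ^ 0x8000u) - 0x8000)

/* Executes `steps` instructions starting at PC = 0 (word-addressed). */
void datapath_n(const uint32_t IM[], uint32_t DM[], unsigned steps)
{
    int32_t reg[32] = {0};
/* Partition: split the register file into individual registers. */
#pragma HLS ARRAY_PARTITION variable=reg complete dim=1
    uint32_t PC = 0;

    for (unsigned s = 0; s < steps; ++s) {
/* Pipeline: overlap successive iterations (instructions) of the loop. */
#pragma HLS PIPELINE
        uint32_t instr = IM[PC];
        switch (OPCODE(instr)) {
        case 0x00: /* R-type */
            switch (FUNCT(instr)) {
#ifdef ADD_INST
            case 0x20:
                reg[RD(instr)] = reg[RS(instr)] + reg[RT(instr)];
                break;
#endif
            default:
                break;
            }
            break;
#ifdef LW_INST
        case 0x23:
            reg[RT(instr)] = (int32_t)DM[reg[RS(instr)] + IMM(instr)];
            break;
#endif
#ifdef SW_INST
        case 0x2b:
            DM[reg[RS(instr)] + IMM(instr)] = (uint32_t)reg[RT(instr)];
            break;
#endif
        default:
            break;
        }
        PC += 1;
    }
}
```

Running the four-instruction program lw $1,0($0); lw $2,1($0); add $3,$1,$2; sw $3,2($0) on a data memory holding 5 and 7 stores their sum, 12, in DM[2].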

13 Results – General Observations
- Base processors were multi-cycle
- Pipelined processors were not fully pipelined
  - Initiation interval > 1 due to hazards
  - VivadoHLS only stalls on top-level interfaces (FIFO)
- Registers implemented as BRAM
- Separate functional units (no ALU)

14 Results – Dot product
Base design – no directives (code as on slide 7)
- 3-9 cycles per instruction
- All functional units pipelined

15 Results – Dot product
Improved design – pipelining (code as on slide 7, with the Pipeline directive)
- 8 cycles per instruction
- 4-cycle initiation interval
- 2 simultaneous instructions

16 Results – AES
Base design – no directives (code as on slide 7)
- 3-4 cycles per instruction
- All functional units combinational
  - Bit manipulations

17 Results – AES
Improved design – pipelining (code as on slide 7, with the Pipeline directive)
- 4 cycles per instruction
- 3-cycle initiation interval
- 2 simultaneous instructions

18 Results – AES
Improved design – pipelining & register partitioning (code as on slide 7, with the Partition and Pipeline directives)
- 3 cycles per instruction
- 2-cycle initiation interval
- 2 simultaneous instructions
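The pipelined figures on slides 15, 17 and 18 are consistent with the usual model of a pipelined loop, in which a new instruction starts every II cycles and each takes a fixed latency to finish. The helpers below are a sketch of that textbook model, not something taken from the slides; in_flight and total_cycles are hypothetical names, and stalls are ignored:

```c
/* Instructions overlapping in the pipeline: ceil(latency / ii). */
unsigned in_flight(unsigned latency, unsigned ii)
{
    return (latency + ii - 1) / ii;
}

/* Cycles to run n instructions: the first takes `latency` cycles,
 * each later one completes `ii` cycles after its predecessor. */
unsigned long total_cycles(unsigned long n, unsigned latency, unsigned ii)
{
    return n == 0 ? 0 : latency + (n - 1) * (unsigned long)ii;
}
```

With the reported values, in_flight(8, 4), in_flight(4, 3) and in_flight(3, 2) all evaluate to 2, matching the "2 simultaneous instructions" on each slide. Under this model 100 instructions would take 404 cycles on the pipelined dot-product design and 201 cycles on the pipelined and partitioned AES design.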

19 Results
- Custom ISA HLS designs
  - Separate processor for each kernel & combined
- Existing soft processors: MB, MIPSfpga
  - MB: minimal & standard
- Implemented using Vivado
  - Digilent Nexys4 board, Artix-7 100T

20 Results

21 Results
- v1 – base
- v2 – pipelining
- v3 – pipelining & register partitioning
- std – standard (MB)
- min – minimal (MB)

22 Results - Summary
- Execution time was never more than 2.2x that of MB
- Base designs used 3x fewer resources than minimal MB, 6x fewer than standard MB
  - Dot product used more FFs, similar LUTs, similar slices
- Pipelining improved execution time to 1.7x MB for DP & 1.5x for AES
- Combined design was limited by DP instructions (MULT)

23 Conclusion
- Presented an approach for designing custom ISA processors, using HLS for ease of use
- HLS-produced HDL used 1/6th the resources of a standard MicroBlaze processor
- Reasonable trade-off in terms of performance
- Very minimal user effort required
- In resource-limited designs, customized soft processors can be produced quickly and easily
