SPREE Tutorial Peter Yiannacouras April 13, 2006
Processors on FPGAs
  You have all used FPGAs (ECE241): adders, 7-segment decoders, etc.
  We are putting whole microprocessors on them
  We call these soft processors
Hard Versus Soft Processors
  Soft Processor
    Written in HDL (Verilog)
    Programmed onto chip
  Hard Processors
    Made of transistors
    Costs millions to make
    Faster, smaller, less power
Processors and FPGA Systems
  FPGAs are a common platform for digital systems
  [Diagram: FPGA system containing a Soft Processor, Memory Interface, UART, Custom Logic, and Ethernet]
  The soft processor performs coordination and even computation
  Better processors => less hardware to design
  We aim to improve soft processors by customizing them
Our Research Problem
  Soft processors have worse
    Area
    Speed
    Power
  But they are flexible: use that flexibility to counteract these costs
  HOW??? Customize the processor’s architecture
    e.g. Intel vs AMD
    e.g. Motorola vs
Research Goals
  1. Understand tradeoffs in soft processors
     E.g. a hardware multiplier is big but can perform multiplies fast
  2. Customize the processor to the application
     E.g. bubble sort doesn’t use multiplies, so remove the hardware multiplier and save on area
  We developed SPREE, software to help us do both
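The bubble sort example can be sketched as a simple check over the instructions a benchmark uses. This is only an illustration of the idea, not how SPREE itself decides what to remove: the function name `usesHardwareMultiply` and the idea of feeding it a list of mnemonics are assumptions made for the sketch. `mult` and `multu` are the MIPS I multiply instructions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of SPREE): returns true if any instruction
// in the list is a MIPS I multiply, i.e. the benchmark needs a multiplier.
bool usesHardwareMultiply(const std::vector<std::string> &mnemonics) {
    return std::any_of(mnemonics.begin(), mnemonics.end(),
                       [](const std::string &m) {
                           return m == "mult" || m == "multu";
                       });
}
```

If the check returns false for every instruction in an application, the hardware multiplier is a candidate for removal to save area.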
SPREE System (Soft Processor Rapid Exploration Environment)
  ■ Input: Processor description (ISA + Datapath)
  ■ SPREE System
      1. Verify ISA against datapath
      2. Datapath Instantiation
      3. Control Generation
  ■ Output: Synthesizable Verilog Processor Description
Input: Instruction Set Architecture (ISA) Description
  ■ Graph of Generic Operations (GENOPs)
  ■ Edges indicate flow of data
  Example: MIPS ADD – add rd, rs, rt
    [Diagram: GENOP graph with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE]
  ISA currently fixed (subset of MIPS I)
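To make the "graph of GENOPs" idea concrete, here is a minimal sketch of how the MIPS ADD instruction could be held as a small directed graph. The `GenopGraph` type, its methods, and the exact edge set are assumptions for illustration; the slide only names the nodes (FETCH, two RFREADs, ADD, RFWRITE) and says that edges carry data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container, not SPREE's own data structure.
struct GenopGraph {
    std::vector<std::string> ops;  // node i performs ops[i]
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // data: first -> second

    std::size_t addOp(const std::string &op) {
        ops.push_back(op);
        return ops.size() - 1;
    }
    void addEdge(std::size_t from, std::size_t to) {
        assert(from < ops.size() && to < ops.size());
        edges.emplace_back(from, to);
    }
};

// Builds an assumed graph for "add rd, rs, rt".
GenopGraph buildAddGraph() {
    GenopGraph g;
    const std::size_t fetch   = g.addOp("FETCH");
    const std::size_t read_rs = g.addOp("RFREAD");
    const std::size_t read_rt = g.addOp("RFREAD");
    const std::size_t add     = g.addOp("ADD");
    const std::size_t write   = g.addOp("RFWRITE");
    g.addEdge(fetch, read_rs);  // instruction supplies the rs register index
    g.addEdge(fetch, read_rt);  // instruction supplies the rt register index
    g.addEdge(read_rs, add);    // first operand
    g.addEdge(read_rt, add);    // second operand
    g.addEdge(add, write);      // sum is written back to the register file
    return g;
}
```

A graph in this form is what a tool would walk when checking that the datapath offers a component for each operation an instruction needs.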
Input: Datapath Description
  ■ Interconnection of hand-coded components
  ■ Allows efficient synthesis
  ■ Described using C++
  SPREE Component Library: Mul, Ifetch, Reg file, ALU, Write Back, Data Mem
  [Diagram: example datapaths built from Ifetch, Reg File, ALU, Mul, Shifter, Data Mem, and Write Back]
Component Selection
  Select by name
  Names looked up in library
  Stored in cpugen/rtl_lib

    RTLComponent *ifetch=new RTLComponent("ifetch");
    RTLComponent *reg_file=new RTLComponent("reg_file");
Datapath Wiring Example
  Ifetch ports: rd, rs, rt, offset
  Regfile ports: dst, a_reg, a_data, b_reg, b_data, writedata
  ALU ports: opA, opB, result

    proc.addConnection(ifetch,"rs",reg_file,"a_reg");
    proc.addConnection(ifetch,"rt",reg_file,"b_reg");
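The two wiring calls above can be tried out in isolation with small stand-in classes. The `RTLComponent` and `Processor` definitions below are hypothetical stubs written for this sketch; the real SPREE classes live in cpugen and do far more (library lookup, port checking, Verilog emission). Only the call shape `addConnection(component, port, component, port)` is taken from the slide.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stub: a component is identified only by its library name here.
struct RTLComponent {
    std::string name;
    explicit RTLComponent(std::string n) : name(std::move(n)) {}
};

// One wire from a source port to a destination port.
struct Connection {
    const RTLComponent *src;
    std::string src_port;
    const RTLComponent *dst;
    std::string dst_port;
};

// Stub: records connections instead of generating hardware.
class Processor {
public:
    void addConnection(const RTLComponent *src, const std::string &src_port,
                       const RTLComponent *dst, const std::string &dst_port) {
        assert(src != nullptr && dst != nullptr);
        wires_.push_back({src, src_port, dst, dst_port});
    }
    const std::vector<Connection> &connections() const { return wires_; }

private:
    std::vector<Connection> wires_;
};

// Wires the instruction fetch unit's register indices to the register file.
Processor buildFetchToRegfile(const RTLComponent *ifetch,
                              const RTLComponent *reg_file) {
    Processor proc;
    proc.addConnection(ifetch, "rs", reg_file, "a_reg");
    proc.addConnection(ifetch, "rt", reg_file, "b_reg");
    return proc;
}
```

Keeping connections as plain (component, port) pairs is what lets a generator later walk the list and emit the hookup logic.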
SPREE System + Backend (Soft Processor Rapid Exploration Environment)
  SPREE generator (spegen) produces the Verilog Processor Description
  Quartus II CAD Software (specadflow) reports:
    1. Area
    2. Clock Frequency
    3. Power
  Modelsim Verilog Simulator (spebenchmark) runs the Benchmarks and reports:
    4. Cycle Count
  Mint MIPS Simulator (simulator/run) runs the same Benchmarks
  Compare traces from Modelsim and Mint
Walking through an Example (see README.txt)
  Choose a pre-built processor
    cpugen/src/arch lists all the processors
  Let’s choose pipe3_serialshift
    3-stage pipeline with serial shifter
Using SPREE on a Processor
  Generate, benchmark, synthesize

    % spegen pipe3_serialshift         ← Generates Verilog
    % spebenchmark pipe3_serialshift   ← Runs benchmarks
    % specadflow pipe3_serialshift     ← Synthesizes processor
    % specompare pipe3_serialshift     ← Displays results
spegen – Generating Processors
  Input: Processor description
  Syntax: spegen
  Output: A folder named after the processor
    Hand-coded Verilog modules
    system.v: generated hookup and control
    OUT.cpugen: stages per instruction, hazard window/branch penalty
    test_bench.v: test bench for Modelsim simulation
Benchmarking
  Run programs on the processor
  Measure time taken until completion
  Verify functionality
  Can do this without knowing anything about the benchmarks themselves
spebenchmark – Benchmarking
  Input: Processor implementation
  Syntax: spebenchmark
  Output: (ideally) Cycle counts of all benchmarks
  Traces: /tmp/modelsim_trace.txt

    ******* Benchmarking pipe3_serialshift ********
    Simulating bubble_sort... Success! Cycle count=2994
    Simulating crc... Success! Cycle count=
    Simulating des... Success! Cycle count=5129
    Simulating fft... Success! Cycle count=5077
    Simulating fir... Success! Cycle count=
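Benchmarking – under the hood
  C source benchmarks (applications/) are compiled by the Compiler (gcc - MIPS) into a Binary Executable
  The Binary Executable runs on the Mint MIPS Simulator (simulator/run, /mint), producing a Trace
  The Binary Executable also runs on the Verilog processor in the Modelsim Verilog Simulator (spebenchmark)
    Produces the Cycle Count
    Produces /tmp/modelsim_trace.txt and /tmp/modelsim_store_trace.txt
  spebenchmark compares the traces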
specompiler - Setup compiler
  Choose the path to your compiler (prebuilt)
    Default: /jayar/b/b0/yiannac/spe/compiler
      GCC 3.3.3, software division
    Another: /jayar/b/b0/yiannac/spe/compiler-softmul
      GCC 3.3.3, software division and software multiplication
  specompiler will:
    1. Compile all benchmarks (and store binaries)
    2. Simulate all benchmarks (and store traces)

    % specompiler /jayar/b/b0/yiannac/spe/compiler-softmul

  After this point, you can just run spebenchmark
spebenchmark - failure
  Shows discrepancy between MINT and Modelsim

    ******* Benchmarking pipe3_serialshift ********
    Simulating bubble_sort... Error: Trace does not match, Cycle count=381
    Discrepancy found at ps
    Modelsim: PC= | IR= | 05:
    Mint: PC=040000b8 | IR=8c47004c | 07:

  (annotation on the trace lines: destination register value being written)
  Clues to where the error occurred
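The trace check behind this message can be sketched as a search for the first entry where the two simulators disagree. This is a minimal illustration, not the actual spebenchmark code: it assumes each trace has already been read into one string per entry, and the names `firstDiscrepancy` and `kNoDiscrepancy` are made up for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sentinel returned when the two traces agree everywhere.
const std::size_t kNoDiscrepancy = static_cast<std::size_t>(-1);

// Returns the index of the first entry where the traces differ. If one trace
// is a strict prefix of the other, the index of the first missing entry is
// returned. Returns kNoDiscrepancy when the traces are identical.
std::size_t firstDiscrepancy(const std::vector<std::string> &modelsim,
                             const std::vector<std::string> &mint) {
    const std::size_t common = std::min(modelsim.size(), mint.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (modelsim[i] != mint[i]) {
            return i;
        }
    }
    if (modelsim.size() != mint.size()) {
        return common;
    }
    return kNoDiscrepancy;
}
```

Reporting only the first mismatch fits the debugging flow on the slide: everything after the first wrong write is likely to be wrong as well, so the earliest entry is the useful clue.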
spebenchmark - waveforms
  Can see any signal within the processor

    % sim_gui bubble_sort pipe3_serialshift
Modelsim LEARN IT!!! Quartus Simulator is vastly inferior, and even unusable for our purposes
The Testbench (test_bench.v)
  What is it? The stimulus and monitor for your circuit
  SPREE generates it automatically, so it works right away
  Hand-coding your own processor means you have to interface with the test bench yourself
  Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
  test_bench.v connects to your soft processor
  Need only 6 wires, to track writes to the register file and data mem:
    regfile_we, regfile_dst, regfile_data
    datamem_we, datamem_addr, datamem_data
specadflow – Synthesis
  Input: Processor implementation
  Syntax: specadflow
  Performs a “seed sweep”
    Average several runs since results are noisy
    Run several instances of Quartus across several machines in parallel
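The averaging step of a seed sweep is just an arithmetic mean over the per-seed results. The sketch below assumes one numeric result per seed (for example a clock frequency in MHz) has already been collected; the function name `averageOverSeeds` is hypothetical and the sketch says nothing about how specadflow launches or parses Quartus runs.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of one metric across all seeds of a sweep.
double averageOverSeeds(const std::vector<double> &per_seed_results) {
    assert(!per_seed_results.empty());  // a sweep needs at least one run
    const double sum = std::accumulate(per_seed_results.begin(),
                                       per_seed_results.end(), 0.0);
    return sum / static_cast<double>(per_seed_results.size());
}
```

Averaging over several seeds smooths out the run-to-run noise in place and route, so two processors are compared on a typical result rather than on a lucky or unlucky single run.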
specadflow Output
  Output: Synthesis results (hidden)
  Summary output:
    Started Tue 6:27PM, Waiting for processes:
    Finished Tue 6:33PM
    Waiting on eda writer
    Area (LEs or ALUTs)
    Clock Frequency (MHz)
    Estimated Energy/cycle dissipated (nJ/cycle)
Any Questions? Technical support, ask me
Setup/Install
  Copy and unpack the SPREE tarball: /jayar/b/b0/yiannac/spree.tar.gz
  Build all the SPREE software

    % cd spree
    % make

  Follow instructions in INSTALL.txt
  If there are any errors, ask me
SPREE Directory Structure
  spree
    applications   Benchmarks (C source)
    cpugen         the cpu generator + processor descriptions
    modelsim       Verilog simulator
    quartus        synthesis
    simulator      MIPS simulator
    compiler       binutils, gcc, newlib
Setup cluster
  Choose the cluster you’re using
    aenao – high performance, limited access
    eecg – any eecg-connected machine
      Edit quartus/machines.txt
      Put a list of 11 or so good eecg machines

    % specluster eecg
  OR
    % specluster aenao