SPREE Tutorial Peter Yiannacouras April 13, 2006.



Processors on FPGAs
■ You have all used FPGAs (ECE241): adders, 7-segment decoders, etc.
■ We are putting whole microprocessors on them
■ We call these soft processors

Hard Versus Soft Processors
■ Soft processor
  Written in HDL (Verilog)
  Programmed onto chip
■ Hard processor
  Made of transistors
  Costs millions to make
  Faster, smaller, less power

Processors and FPGA Systems
■ FPGAs are a common platform for digital systems
■ [Diagram: a soft processor alongside a memory interface, UART, Ethernet, and custom logic]
■ The soft processor performs coordination and even computation
■ Better processors => less hardware to design
■ We aim to improve soft processors by customizing them

Our Research Problem
■ Soft processors have worse:
  Area
  Speed
  Power
■ But they are flexible: use that to counteract
■ How? Customize the processor’s architecture
  i.e. Intel vs AMD
  i.e. Motorola vs

Research Goals
1. Understand tradeoffs in soft processors
   E.g. a hardware multiplier is big but can perform multiplies fast
2. Customize the processor to the application
   E.g. bubble sort doesn’t use multiplies, so remove the hardware multiplier and save on area
■ We developed SPREE, software to help us do both
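The second goal can be illustrated with a small sketch. The mnemonic-to-unit table and the instruction mix below are made up for illustration and are not taken from SPREE or its benchmarks; the point is only that a hardware unit which no executed instruction needs is a candidate for removal.

```python
# Hypothetical mapping from MIPS mnemonics to the hardware unit they need.
UNIT_FOR_MNEMONIC = {
    "mult": "multiplier",
    "multu": "multiplier",
    "sll": "shifter",
    "srl": "shifter",
    "sra": "shifter",
}

def unused_units(executed_mnemonics):
    """Return the units that no executed instruction requires."""
    needed = {
        UNIT_FOR_MNEMONIC[m] for m in executed_mnemonics if m in UNIT_FOR_MNEMONIC
    }
    return sorted(set(UNIT_FOR_MNEMONIC.values()) - needed)

# Illustrative instruction mix for a sort-like kernel (made up, no multiplies).
sort_like = ["lw", "slt", "beq", "sw", "addiu", "sll"]
print(unused_units(sort_like))  # ['multiplier']
```

A real flow would derive the instruction mix from the compiled benchmark or its trace rather than from a hand-written list.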

SPREE System (Soft Processor Rapid Exploration Environment)
■ Input: Processor description (ISA + Datapath)
■ SPREE System
  1. Verify ISA against datapath
  2. Datapath Instantiation
  3. Control Generation
■ Output: Synthesizable Verilog

Input: Instruction Set Architecture (ISA) Description
■ Graph of Generic Operations (GENOPs)
■ Edges indicate flow of data
■ Example: MIPS ADD (add rd, rs, rt) uses the GENOPs FETCH, RFREAD (twice), ADD, RFWRITE
■ ISA currently fixed (subset of MIPS I)
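A minimal sketch of what a GENOP graph and a check of the ISA against a datapath could look like. This is not SPREE's actual representation (SPREE descriptions are written in C++): the edge list is an assumed reading of the slide's diagram, and the component table and function names are hypothetical.

```python
# GENOP graph for MIPS ADD; each edge means "data flows from -> to".
# The edge list is an assumed reading of the slide's diagram.
ADD_GRAPH = [
    ("FETCH", "RFREAD_A"),
    ("FETCH", "RFREAD_B"),
    ("RFREAD_A", "ADD"),
    ("RFREAD_B", "ADD"),
    ("ADD", "RFWRITE"),
]

# Hypothetical table: which GENOPs each datapath component can perform.
COMPONENT_GENOPS = {
    "ifetch": {"FETCH"},
    "reg_file": {"RFREAD", "RFWRITE"},
    "alu": {"ADD"},
}

def genop_kind(node):
    """Strip the instance suffix, e.g. 'RFREAD_A' -> 'RFREAD'."""
    return node.split("_")[0]

def missing_genops(graph, components):
    """Return GENOP kinds used by the graph that no component provides."""
    provided = set().union(*components.values())
    used = {genop_kind(node) for edge in graph for node in edge}
    return sorted(used - provided)

print(missing_genops(ADD_GRAPH, COMPONENT_GENOPS))  # []
```

This only checks that every operation has some component able to perform it; checking that the datapath's wiring also supports each edge of the graph would need the connection list as well.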

Input: Datapath Description
■ Interconnection of hand-coded components
■ Allows efficient synthesis
■ Described using C++
■ SPREE Component Library: Mul, Ifetch, Reg file, ALU, Write Back, Data Mem
■ [Diagram: example datapaths assembled from Ifetch, Reg File, ALU, Mul, Shifter, Data Mem, and Write Back]

Component Selection
■ Select by name
■ Names looked up in library
■ Stored in cpugen/rtl_lib

    RTLComponent *ifetch=new RTLComponent("ifetch");
    RTLComponent *reg_file=new RTLComponent("reg_file");

Datapath Wiring Example
■ Ifetch ports: rd, rs, rt, offset
■ Regfile ports: dst, a_reg, a_data, b_reg, b_data, writedata
■ ALU ports: opA, opB, result

    proc.addConnection(ifetch,"rs",reg_file,"a_reg");
    proc.addConnection(ifetch,"rt",reg_file,"b_reg");

SPREE System + Backend (Soft Processor Rapid Exploration Environment)
■ Processor Description -> SPREE generator (spegen) -> Verilog
■ Verilog -> Quartus II CAD Software (specadflow) -> 1. Area, 2. Clock Frequency, 3. Power
■ Verilog + Benchmarks -> Modelsim Verilog Simulator (spebenchmark) -> 4. Cycle Count
■ Benchmarks -> Mint MIPS Simulator (simulator/run)
■ Compare traces from Modelsim and Mint

Walking through an Example (see README.txt)
■ Choose a pre-built processor
■ cpugen/src/arch lists all the processors
■ Let’s choose pipe3_serialshift: a 3-stage pipeline with a serial shifter

Using SPREE on a Processor
■ Generate, benchmark, synthesize

    % spegen pipe3_serialshift          ← Generates Verilog
    % spebenchmark pipe3_serialshift    ← Runs benchmarks
    % specadflow pipe3_serialshift      ← Synthesizes processor
    % specompare pipe3_serialshift      ← Displays results
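The four steps can be scripted. The sketch below is one possible wrapper, assuming the four commands are on the PATH and take the processor name as their only argument, as on the slide; the function names and the injectable runner are illustrative and not part of SPREE.

```python
import subprocess

# The four commands and their order come from the slide; everything else
# (function names, the injectable runner) is illustrative.
STEPS = ["spegen", "spebenchmark", "specadflow", "specompare"]

def flow_commands(processor):
    """Build one argv list per step for the given processor name."""
    return [[step, processor] for step in STEPS]

def run_flow(processor, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for argv in flow_commands(processor):
        runner(argv, check=True)

for argv in flow_commands("pipe3_serialshift"):
    print(" ".join(argv))
```

Calling `run_flow("pipe3_serialshift")` would execute the real tools. Passing a different `runner` makes it possible to dry-run the sequence without them installed.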

spegen – Generating Processors
■ Input: Processor description
■ Syntax: spegen
■ Output: A folder named after the processor, containing:
  Hand-coded Verilog modules
  system.v: generated hookup and control
  OUT.cpugen: stages per instruction, hazard window/branch penalty
  test_bench.v: test bench for Modelsim simulation

Benchmarking
■ Run programs on the processor
■ Measure time taken until completion
■ Verify functionality
■ Can do this without knowing anything about the benchmarks themselves

spebenchmark – Benchmarking
■ Input: Processor implementation
■ Syntax: spebenchmark
■ Output: (ideally) cycle counts of all benchmarks
■ Traces: /tmp/modelsim_trace.txt

    ******* Benchmarking pipe3_serialshift ********
    Simulating bubble_sort... Success! Cycle count=2994
    Simulating crc... Success! Cycle count=
    Simulating des... Success! Cycle count=5129
    Simulating fft... Success! Cycle count=5077
    Simulating fir... Success! Cycle count=
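To collect the cycle counts for later analysis, the printed output can be parsed. The sketch below assumes the line format shown on the slide ("Simulating NAME... Success! Cycle count=N"); the real tool's output may differ, and the function name is illustrative. Lines without a number after "Cycle count=" are skipped rather than guessed.

```python
import re

# Matches one successful benchmark line in the format shown on the slide.
LINE_RE = re.compile(r"Simulating (\w+)\.\.\. Success! Cycle count=(\d+)")

def parse_cycle_counts(log_text):
    """Map benchmark name -> cycle count for every line that reports one."""
    return {name: int(count) for name, count in LINE_RE.findall(log_text)}

log = """\
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort... Success! Cycle count=2994
Simulating crc... Success! Cycle count=
Simulating des... Success! Cycle count=5129
Simulating fft... Success! Cycle count=5077
"""
print(parse_cycle_counts(log))
```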

Benchmarking – under the hood
■ C source benchmarks (applications/) -> Compiler (gcc - MIPS) -> Binary Executable
■ Binary Executable -> Mint MIPS Simulator (simulator/run) -> Trace (applications/ /mint)
■ Verilog + Binary Executable -> Modelsim Verilog Simulator (spebenchmark) -> Trace and Cycle Count
  /tmp/modelsim_trace.txt
  /tmp/modelsim_store_trace.txt
■ spebenchmark compares the two traces

specompiler - Setup compiler
■ Choose the path to your compiler (prebuilt)
  Default: /jayar/b/b0/yiannac/spe/compiler (GCC 3.3.3, software division)
  Another: /jayar/b/b0/yiannac/spe/compiler-softmul (GCC 3.3.3, software division and software multiplication)
■ specompiler will:
  1. Compile all benchmarks (and store binaries)
  2. Simulate all benchmarks (and store traces)

    % specompiler /jayar/b/b0/yiannac/spe/compiler-softmul

■ After this point, you can just run spebenchmark

spebenchmark - failure
■ Shows discrepancy between MINT and Modelsim

    ******* Benchmarking pipe3_serialshift ********
    Simulating bubble_sort...
    Error: Trace does not match, Cycle count=381
    Discrepancy found at ps
    Modelsim: PC= | IR= | 05:
    Mint: PC=040000b8 | IR=8c47004c | 07:

■ The last two fields are the destination register and the value being written
■ These are clues to where the error occurred
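The comparison itself amounts to walking both traces in lockstep and reporting the first entry that differs. A minimal sketch, assuming each trace entry is a (pc, ir, destination register, value written) tuple; that layout and the function name are hypothetical, and only the Mint PC, IR, and register number in the second entry come from the slide.

```python
from itertools import zip_longest

def first_discrepancy(modelsim_trace, mint_trace):
    """Return (index, modelsim_entry, mint_entry) at the first mismatch, else None.

    zip_longest pads the shorter trace with None, so a trace that ends early
    is also reported as a discrepancy.
    """
    for index, (hw, ref) in enumerate(zip_longest(modelsim_trace, mint_trace)):
        if hw != ref:
            return index, hw, ref
    return None

# Hypothetical entry layout: (pc, ir, destination_register, value_written).
mint = [
    (0x040000B4, 0x00000000, 0, 0x0),   # made-up entry
    (0x040000B8, 0x8C47004C, 7, 0x10),  # PC, IR, register 07 as on the slide
]
modelsim = [
    (0x040000B4, 0x00000000, 0, 0x0),
    (0x040000B8, 0x8C47004C, 5, 0x10),  # wrong destination register (05)
]
print(first_discrepancy(modelsim, mint))
```

The index of the first mismatch points at the instruction to inspect in the waveform viewer.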

spebenchmark - waveforms
■ Can see any signal within the processor

    % sim_gui bubble_sort pipe3_serialshift

Modelsim
■ LEARN IT!!!
■ The Quartus simulator is vastly inferior, and even unusable for our purposes

The Testbench (test_bench.v)
■ What is it? The stimulus and monitor for your circuit
■ SPREE generates it automatically, so it works right away
■ Hand-coding your own processor means you have to interface with the test bench yourself
■ Once you have the testbench you can use spebenchmark

Manual Interfacing with the Testbench
■ Need only 6 wires between test_bench.v and your soft processor
  regfile_we, regfile_dst, regfile_data
  datamem_we, datamem_addr, datamem_data
■ They track writes to the register file and data memory


specadflow – Synthesis
■ Input: Processor implementation
■ Syntax: specadflow
■ Performs a “seed sweep”
  Averages several runs since results are noisy
  Runs several instances of Quartus, across several machines in parallel
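The averaging step of a seed sweep is simple to sketch. The per-seed numbers and metric names below are made up for illustration; they are not SPREE results, and the function is not part of specadflow.

```python
from statistics import mean

def average_runs(runs):
    """Average each metric over all seeds; every run must report the same metrics."""
    metrics = runs[0].keys()
    return {m: mean([run[m] for run in runs]) for m in metrics}

# Made-up results for three seeds of one processor.
runs = [
    {"area_les": 990, "clock_mhz": 80.0},
    {"area_les": 1000, "clock_mhz": 82.0},
    {"area_les": 1010, "clock_mhz": 84.0},
]
print(average_runs(runs))
```

Reporting the spread across seeds next to the mean would show how noisy the results are for a given design.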

specadflow Output
■ Output: Synthesis results (hidden)
■ Summary output:

    Started Tue 6:27PM, Waiting for processes:
    Finished Tue 6:33PM
    Waiting on eda writer
    Area (LEs or ALUTs)
    Clock Frequency (MHz)
    Estimated Energy/cycle dissipated (nJ/cycle)
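The synthesis metrics combine with the cycle count from benchmarking in the usual way: wall-clock time is the cycle count divided by the clock frequency, and energy is the energy per cycle times the cycle count. In the sketch below the cycle count is the bubble_sort figure from the spebenchmark slide; the clock frequency and energy per cycle are made-up values.

```python
def execution_time_us(cycle_count, clock_mhz):
    """Cycles divided by MHz gives microseconds."""
    return cycle_count / clock_mhz

def energy_nj(cycle_count, nj_per_cycle):
    """Total energy in nJ for a run of cycle_count cycles."""
    return cycle_count * nj_per_cycle

cycles = 2994        # bubble_sort cycle count shown on the spebenchmark slide
clock_mhz = 80.0     # made-up clock frequency
nj_per_cycle = 1.5   # made-up energy per cycle

print(f"time   = {execution_time_us(cycles, clock_mhz):.3f} us")
print(f"energy = {energy_nj(cycles, nj_per_cycle):.1f} nJ")
```

This is why a design with a lower cycle count is not automatically faster: a change that cuts cycles but also lowers the clock frequency can leave wall-clock time unchanged or worse.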

Any Questions? Technical support, ask me

EXTRAS

Setup/Install
■ Copy and unpack the SPREE tarball: /jayar/b/b0/yiannac/spree.tar.gz
■ Build all the SPREE software

    % cd spree
    % make

■ Follow the instructions in INSTALL.txt
■ If there are any errors, contact me

SPREE Directory Structure

    spree/
      applications/   Benchmarks (C source)
      cpugen/         the cpu generator + processor descriptions
      modelsim/       Verilog simulator
      quartus/        synthesis
      simulator/      MIPS simulator
      compiler/       binutils, gcc, newlib

Setup cluster
■ Choose the cluster you’re using
  aenao – high performance, limited access
  eecg – any eecg-connected machine
    Edit quartus/machines.txt
    Put a list of 11 or so good eecg machines

    % specluster eecg
  OR
    % specluster aenao