Bluespec SystemVerilog™

Copyright © Bluespec Inc

Bluespec SystemVerilog™ Design Example: A DMA Controller with a Socket Interface

Overview
- A DMA controller written in Bluespec is presented, with several examples showing one possible refinement flow.
- The first model is a simple 1-channel, 1-port design (222 lines).
- The final model is a 2-channel, 2-port design with pipelined and concurrent transactions (308 lines).
- Testbenches are included for all models.

General Setup
[Diagram: the Testbench (master) drives the DMA's slave port; the DMA (master) drives the Target's slave port.]

DMA
- All DMA models contain a configuration port (slave) for reading and writing the configuration and status registers.
- One or two memory ports are included.
- Configuration/status registers:
  - Source and destination address registers
  - Transfer count
  - Enable and current status
  - Port selection

File Organization
- DMA.bsv: the DMA model
- TestBench.bsv: the testbench
- sysTestBenchBench.out.expected: expected simulation results
- Socket_IFC.bsv: defines a simple socket protocol, interfaces, structures, and utility functions
- EdgeFIFOs.bsv: some specialized FIFOs used throughout the designs
- Targets.bsv: a simple target module used for testing
- Makefile: the usual

Socket Interface
- We use a simple socket interface, similar to OCP.
- Requests and responses are decoupled.
- Several requests may be outstanding (pipelined).

Socket Interface
[Diagram: the Initiator's master interface connects to the Target's slave interface.]
- Request side: reqOp, reqAddr, reqData, reqInfo, reqAccept
- Response side: respOp, respAddr, respData, respInfo, respAccept

Socket_Ifc.bsv
- Defines structures for requests and responses.
- Defines interfaces for the master and slave.

    typedef struct {
       RespOp   respOp;
       RespInfo respInfo;
       RespAddr respAddr;
       RespData respData;
    } Socket_Resp;

    interface Socket_master_req_ifc;
       method ReqOp   getReqOp ();
       method ReqInfo getReqInfo ();
       method ReqAddr getReqAddr ();
       method ReqData getReqData ();
       method Action  reqAccept ();
    endinterface

Socket_Ifc.bsv
- Conversion functions from FIFO interfaces to socket interfaces
- Utilities to connect master and slave interfaces
- Convenience functions for debug

    function Socket_master_ifc fifos_to_master_ifc (FIFOF#(Socket_Req)  reqs,
                                                    FIFOF#(Socket_Resp) resps);

Edge FIFOs
- Specialized FIFOs:
  - Pipeline FIFOs: a pipeline register with a FIFO interface.
  - Bypass FIFOs: a 1-element FIFO that allows non-registered operations.
  - Unguarded versions: disable Bluespec's implicit conditions; needed so the socket protocol can provide data every cycle.

Targets
- Contains just a "dummy" target that acts like a "memory" for testing.
- Minimum 2-cycle latency.
[Diagram: requests enter the slave interface, rules handle read and write requests, and responses leave through the slave interface.]
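To make the request/response decoupling concrete, here is a minimal Python sketch of a dummy target with a fixed 2-cycle latency. It is an illustration only, not a translation of Targets.bsv: the names `Req` and `DummyTarget` are hypothetical, and the sketch assumes that only reads produce a response and that unwritten addresses read as 0.

```python
from collections import deque
from dataclasses import dataclass

LATENCY = 2  # minimum cycles between a request and its response (from the slide)

@dataclass
class Req:
    op: str        # "RD" or "WR"
    info: int      # tag carried through to the response
    addr: int
    data: int = 0

class DummyTarget:
    """Hypothetical memory-like target; only reads respond (an assumption)."""

    def __init__(self):
        self.mem = {}
        self.in_flight = deque()  # (cycle the response is ready, response)

    def request(self, cycle, req):
        if req.op == "WR":
            self.mem[req.addr] = req.data
        else:
            resp = (req.info, req.addr, self.mem.get(req.addr, 0))
            self.in_flight.append((cycle + LATENCY, resp))

    def response(self, cycle):
        """Return the oldest response that is ready this cycle, else None."""
        if self.in_flight and self.in_flight[0][0] <= cycle:
            return self.in_flight.popleft()[1]
        return None

target = DummyTarget()
target.request(0, Req("WR", 2, 0x10, 0xABCD))
target.request(1, Req("RD", 1, 0x10))  # a second read is issued before
target.request(2, Req("RD", 1, 0x11))  # the first one has been answered
for cycle in range(1, 6):
    print(cycle, target.response(cycle))
```

Because requests are accepted without waiting for earlier responses, an initiator can keep several reads in flight, which is what the later pipelined DMA versions rely on.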

Model development details
- V0: Simple FSM-based DMA controller; 1 channel, 1 bus.
- V1: Rule-based DMA controller, allowing pipelined requests.
- V2: 2 memory ports, allowing pipelined, concurrent read and write requests.
- V3: Modified version of V2 showing Bluespec's elaboration features.
- V4: 2 channels, 2 ports, concurrent and pipelined transactions.

V0: simple DMA FSM
[Diagram: a slave config port receives config requests and returns config responses; rules read and write the config/status registers. An FSM with states Idle, Read, Write, and Finish drives the mmu master port through a pipeline FIFO and a bypass FIFO. Each non-idle arc writes an mmu request or reads a response.]

V0 DMA behavior
- 8 cycles to move each word.
- Each request spends 2 cycles in memory and 2 cycles in the DMA (enqueue the request and grab the data).

    cycle: 25
    Target mem: Socket_Req{RD, 001, 001001, 0000000000000000}}
    cycle: 26
    cycle: 27
    cycle: 28
    cycle: 29
    Target mem: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 30
    cycle: 31
    cycle: 32
    cycle: 33
    Target mem: Socket_Req{RD, 001, 001002, 0000000000000000}}
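The 8-cycle figure is the sum of two strictly serialized requests per word (one read, one write), each costing 2 cycles in memory plus 2 cycles in the DMA. The Python sketch below reproduces that arithmetic; the function name `v0_schedule` is hypothetical and the model is a simplification of the FSM, not the DMA.bsv code.

```python
MEM_CYCLES = 2  # cycles each request spends in the target memory (from the slide)
DMA_CYCLES = 2  # cycles each request spends in the DMA (enqueue request, grab data)

def v0_schedule(start_cycle, words):
    """Return (cycle, op) pairs for a strictly serialized read-then-write FSM."""
    per_request = MEM_CYCLES + DMA_CYCLES
    events = []
    cycle = start_cycle
    for _ in range(words):
        events.append((cycle, "RD"))
        cycle += per_request          # wait for the read to complete
        events.append((cycle, "WR"))
        cycle += per_request          # wait for the write to complete
    return events

for cycle, op in v0_schedule(25, 2):
    print(cycle, op)
```

Starting at cycle 25, the sketch issues RD at 25, WR at 29, and the next RD at 33, matching the spacing in the trace.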

V0 thoughts
- Most cycles are spent waiting for the mmu to respond.
- Read requests cannot overlap with write requests.
- V1 decouples all these activities:
  - Read and write requests start any time one is needed (and all preconditions are met).
  - Responses are taken and acted upon when they arrive.

DMA V1
- Each read request passes the write address to the write side via a FIFO.
- Reads or writes can start at any time.

    rule startRead (dmaEnabledR && readCntrR > currentReadR);
       let req = Socket_Req {reqAddr : readAddrR,
                             reqData : 0,
                             reqOp   : RD,
                             reqInfo : 1};
       mmuReqF.enq( req );
       // Enqueue the write destination address
       destAddrF.enq( destAddrR );
       // Increment the addresses and the count of reads issued
       readAddrR    <= readAddrR + 1;
       currentReadR <= currentReadR + 1;
       destAddrR    <= destAddrR + 1;
    endrule

V1 DMA write rule
- Implicit conditions check FIFO states as needed.
- Rule urgency puts writes before reads.

    (* descending_urgency = "startWrite, startRead" *)
    rule startWrite ( True );
       let wreq = Socket_Req {reqAddr : destAddrF.first,
                              reqData : responseDataF.first,
                              reqOp   : WR,
                              reqInfo : 2};   // tag info with 2
       // Enqueue the request
       mmuReqF.enq( wreq );
       // Remove the write address and data from the FIFOs
       destAddrF.deq;
       responseDataF.deq;
    endrule

V1 Behavior
- Fully pipelined behavior.
- Achieves maximum throughput: 2 cycles per word.

    Target mem: Socket_Req{RD, 001, 001000, 0000000000000000}}
    cycle: 18
    Target mem: Socket_Req{RD, 001, 001001, 0000000000000000}}
    cycle: 19
    Target mem: Socket_Req{RD, 001, 001002, 0000000000000000}}
    cycle: 20
    Target mem: Socket_Req{RD, 001, 001003, 0000000000000000}}
    cycle: 21
    Target mem: Socket_Req{WR, 002, 005000, 0000100000001000}}
    cycle: 22
    Target mem: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 23
    Target mem: Socket_Req{WR, 002, 005002, 0000100200001002}}
    cycle: 24
    Target mem: Socket_Req{WR, 002, 005003, 0000100300001003}}
    cycle: 25
    Target mem: Socket_Req{RD, 001, 001004, 0000000000000000}}
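The read/write pattern in this trace can be approximated with a small Python model of one shared port that accepts one request per cycle, with writes taking priority over reads. This is a sketch under stated assumptions, not the DMA.bsv implementation: the 4-cycle round trip is read off the V0 trace (RD at cycle 25, WR at cycle 29), the first unlabelled request in the excerpt is assumed to fall in cycle 17, the FIFOs are treated as unbounded, and `v1_schedule` is a hypothetical name.

```python
from collections import deque

def v1_schedule(start_cycle, words, round_trip=4):
    """One port, one request per cycle; a ready write beats a read."""
    pending = deque()      # cycles at which read data becomes available to write
    events = []
    reads = writes = 0
    cycle = start_cycle
    while writes < words:
        if pending and pending[0] <= cycle:
            # Write side has data: higher urgency, so it takes the port
            pending.popleft()
            events.append((cycle, "WR"))
            writes += 1
        elif reads < words:
            # Otherwise issue the next read and remember when its data returns
            pending.append(cycle + round_trip)
            events.append((cycle, "RD"))
            reads += 1
        cycle += 1
    return events

for cycle, op in v1_schedule(17, 8):
    print(cycle, op)
```

With these parameters the model issues reads in cycles 17 to 20, writes in cycles 21 to 24, and the next read in cycle 25. The port is busy every cycle, and since each word needs one read and one write, the result is 2 cycles per word.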

V2: A second memory port
- The port can be a second memory, bus, or peripheral, or separate read/write ports.
- Hardware additions:
  - Second master interface to the DMA
  - New FIFOs for the interface
  - Configuration register to mark the port
  - Duplicated rules for each port
- Bluespec's rule analysis ensures safe use of shared hardware (MMUs, FIFOs, etc.).
- Muxes and control logic are added automatically.

V2 DMA behavior
- Concurrent reads and writes on different memories.
- Peak throughput: 1 word per cycle.
- Pipeline behavior maintained.

    cycle: 23
    Target memA: Socket_Req{WR, 002, 005000, 0000100000001000}}
    cycle: 24
    Target memB: Socket_Req{RD, 001, 001004, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005001, 0000100100001001}}
    cycle: 25
    Target memB: Socket_Req{RD, 001, 001005, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005002, 0000100200001002}}
    cycle: 26
    Target memB: Socket_Req{RD, 001, 001006, 0000000000000000}}
    Target memA: Socket_Req{WR, 002, 005003, 0000100300001003}}
    cycle: 27
    Target memB: Socket_Req{RD, 001, 001007, 0000000000000000}}
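The 1-word-per-cycle peak follows from giving reads and writes their own ports, so both can fire in the same cycle. The Python sketch below illustrates the steady state only; it does not try to reproduce the exact cycle numbers of the trace. The `round_trip` value, the cycle-0 start, and the name `v2_schedule` are assumptions made for illustration.

```python
from collections import deque

def v2_schedule(words, round_trip=4):
    """Reads go to port B and writes to port A; both may fire each cycle."""
    pending = deque()      # cycles at which read data becomes available to write
    events = []
    reads = writes = 0
    cycle = 0
    while writes < words:
        if pending and pending[0] <= cycle:
            pending.popleft()
            events.append((cycle, "memA", "WR"))
            writes += 1
        if reads < words:  # independent port, so no 'elif' here
            pending.append(cycle + round_trip)
            events.append((cycle, "memB", "RD"))
            reads += 1
        cycle += 1
    return events

events = v2_schedule(8)
total_cycles = events[-1][0] + 1
print("cycles for 8 words:", total_cycles)
```

In this model a transfer of N words takes N plus the round-trip latency in cycles, so the cost per word approaches 1 cycle as N grows, compared with 2 cycles per word on a single shared port.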

V3: Bluespec Elaboration
- Bluespec allows manipulation of most objects (e.g., rules, FIFOs) during elaboration.
- This reduces cut-and-paste code and allows better reuse.
- V3 defines a function to generate the rules for each mmu port.

    function Rules generatePortDMARules (Bool rdPortCond,
                                         Bool wrPortCond,
                                         FIFOF#(Socket_Req)  requestF,
                                         FIFOF#(Socket_Resp) responseF);

V3 Results
- Same behavior as V2.
- The real difference is about 26 lines of code out of 227 lines. Less than 10% for a second mmu port.

V4: multiple channels
- Each channel is a separate DMA engine with its own read/write addresses, config/status registers, etc.
- The V4 model changes constructs from scalar to vector, e.g.:

    // The destination address registers
    Vector#(NumChannels, Reg#(ReqAddr)) destAddrRs <- replicateM( mkReg(0) );

- The rule generation function is used to create a second set of rules for the second channel.
- Bluespec's rules manage concurrency.

V4 Behavior
- Multiple channels, multiple ports, fully pipelined, concurrent reads and writes across channels.

    Target memB: Socket_Req{RD, 000, 001026, 0000000000000000}}
    Target memA: Socket_Req{WR, 001, 005023, 0000102300001023}}
    cycle: 135
    Target memB: Socket_Req{RD, 000, 001027, 0000000000000000}}
    Target memA: Socket_Req{RD, 002, 002017, 0000000000000000}}
    cycle: 136
    Target memB: Socket_Req{WR, 003, 007016, 0000201600002016}}
    Target memA: Socket_Req{WR, 001, 005024, 0000102400001024}}

Summary
- Concurrency is automatically analyzed and control logic is automatically synthesized.
- 4 unique DMA architectures with minimum development effort (compare lines of code).
- Allows rapid exploration and analysis of different architectures.

    Model          V0   V1   V2   V3   V4
    Lines of code  222  227  298  266  308
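The line counts can be compared directly. The short Python sketch below computes how much each version grows relative to V0; it assumes the five numbers in the summary map to V0 through V4 in order, which agrees with the 222-line first model and 308-line final model quoted in the overview. The helper name `growth_over_v0` is hypothetical.

```python
# Line counts as listed in the summary.
LINES = {"V0": 222, "V1": 227, "V2": 298, "V3": 266, "V4": 308}

def growth_over_v0(model):
    """Percentage increase in lines of code relative to V0, rounded to 0.1."""
    base = LINES["V0"]
    return round(100.0 * (LINES[model] - base) / base, 1)

for name, count in LINES.items():
    print(f"{name}: {count} lines, {growth_over_v0(name):+.1f}% vs V0")
```

On these figures the final 2-channel, 2-port pipelined design (V4) is about 39% larger than the single-channel FSM it started from.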