ARMOR: Asynchronous RISC Microprocessor
Technion - Israel Institute of Technology
High-Speed Digital Systems Laboratory, Faculty of Electrical Engineering
Submitted by: Tziki Oz-Sinay, Ori Lempel
Supervised by: Rony Mitleman
Final Presentation (First Semester)
General Overview
The benefits of asynchronous VLSI circuit design include:
– Elimination of clock skew problems
– Average-case performance
– Adaptivity to processing and environmental variations
– Lower system power requirements
– Reduced noise
Project Description
[Block diagram. Blocks: ARMOR core, Inst Cache, Data Cache, SDRAM 64KB (two instances), PCI Interface, Watch Window (debug), Program Code (assembler). Target device: Xilinx VertexPro.]
ARMOR Architecture
Register Set: The ARMOR provides eight 16-bit general-purpose data registers.
Memory Management:
– Separate instruction memory and data memory, each with an address space of up to 64 KB.
– Both memory spaces are laid out in Little-Endian format.
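To make the Little-Endian layout concrete, the following Python sketch stores one 16-bit word into a byte-addressed 64 KB memory, low byte first. The function names and the bytearray memory model are assumptions made for illustration; they are not part of the ARMOR specification.

```python
MEM_SIZE = 64 * 1024  # 64 KB address space, as stated on the slide


def store_word(mem, addr, value):
    """Store a 16-bit value at addr, least significant byte first."""
    mem[addr] = value & 0xFF             # low byte at the lower address
    mem[addr + 1] = (value >> 8) & 0xFF  # high byte at the next address


def load_word(mem, addr):
    """Read back a 16-bit Little-Endian value starting at addr."""
    return mem[addr] | (mem[addr + 1] << 8)


data_mem = bytearray(MEM_SIZE)  # hypothetical model of the data memory
store_word(data_mem, 0x0010, 0x1234)
```

After the store, address 0x0010 holds 0x34 and address 0x0011 holds 0x12.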
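Instruction Set
Four 16-bit instruction formats (field widths in bits):
  OpCode (4) | Rx (3) | Ry (3) | Imm (6)          lw Rx, Ry, Imm    sw Rx, Ry, Imm    bez Rx, Ry, Imm
  OpCode (4) | Rx (3) | Imm (9)                   movi Rx, Imm    addi Rx, Imm    subi Rx, Imm
  OpCode (4) | Rx (3) | Ry (3) | (6, unlabeled)   mov Rx, Ry    add Rx, Ry    sub Rx, Ry    or Rx, Ry    and Rx, Ry
  OpCode (4) | Imm (12)                           jump Imm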
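The four formats can be sketched as a small Python encoder. Several details here are assumptions, since the slide gives only field names and widths: the opcode numbering is hypothetical (the slide does not list opcode values), fields are packed from the most significant bit in the order shown, the six unlabeled bits of the register-register format are left at zero, and immediates are simply masked to their field width.

```python
# Hypothetical opcode numbering: the slide lists mnemonics, not their codes.
MNEMONICS = ["mov", "add", "sub", "or", "and",
             "movi", "addi", "subi",
             "lw", "sw", "bez", "jump"]
OPCODES = {name: code for code, name in enumerate(MNEMONICS)}

RR_FORMAT = {"mov", "add", "sub", "or", "and"}   # OpCode | Rx | Ry | 6 bits
RI_FORMAT = {"movi", "addi", "subi"}             # OpCode | Rx | Imm(9)
RRI_FORMAT = {"lw", "sw", "bez"}                 # OpCode | Rx | Ry | Imm(6)


def _reg(index):
    """Validate a register index against the eight general-purpose registers."""
    if not 0 <= index <= 7:
        raise ValueError(f"register index out of range: {index}")
    return index


def encode(mnemonic, rx=0, ry=0, imm=0):
    """Pack one instruction into a 16-bit word (assumed MSB-first field order)."""
    op = OPCODES[mnemonic] << 12
    if mnemonic in RR_FORMAT:
        return op | (_reg(rx) << 9) | (_reg(ry) << 6)
    if mnemonic in RI_FORMAT:
        return op | (_reg(rx) << 9) | (imm & 0x1FF)
    if mnemonic in RRI_FORMAT:
        return op | (_reg(rx) << 9) | (_reg(ry) << 6) | (imm & 0x3F)
    return op | (imm & 0xFFF)  # jump: OpCode | Imm(12)


word = encode("addi", rx=2, imm=5)
```

With the hypothetical numbering above, `addi R2, 5` encodes to 0x6405. A real assembler would use the opcode values from the ARMOR specification instead.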
ARMOR Pipeline
[Pipeline block diagram. Stages: Instruction Fetch, Decode, Rename, Execute, Mem Access, Write Back, Retire. Other labels: Out Of Order Engine, SYNCHRONIZATION, SDRAM (two instances).]
Signals shown: PC[15:0], Inst[15:0]; Op[3:0], LDst[3:0], LSrc[3:0], Imm[15:0]; Op[3:0], PDst[4:0], SrcVal1[15:0], SrcVal2[15:0], Imm[15:0]; DataIn[15:0], PDst[4:0], Addr[15:0], ReadWrite#; ALU0PDst[4:0], ALU0Res[15:0], ALU1PDst[4:0], ALU1Res[15:0], MemPDst[4:0], DataOut[15:0]; LDst[2:0], Val[15:0]; BranchDecision; SDRAM Addr[15:0], DataIn[15:0], DataOut[15:0].
Out-Of-Order Engine
[Block diagram. Blocks: ROB, RRF, RAT, RS0, RS1, ALU0, ALU1, DATA CACHE. Interfaces: Inst from ID, BranchDecision to IFU. Region labels: In Order, Out of Order. Path labels: branches, non-branch inst, mem inst, non-mem inst.]
Development Platform
Two development environments were examined:
– Balsa: a language for synthesising large asynchronous circuits and systems; it compiles to a small, parametric set of handshake components.
– Petrify: a synthesis tool for Petri nets and asynchronous controllers.
The Balsa environment was chosen.
Example: Single Place Buffer
  import [balsa.types.basic]      -- library mechanism
  type word is 16 bits            -- type declaration
  -- procedure definition; i and o are the channel declarations
  procedure buffer (input i : word; output o : word) is
    variable x : word             -- implies latch
  begin
    loop                          -- repeat forever
      i -> x                      -- Input communication: read input channel into local variable x
      ;                           -- sequential operation
      o <- x                      -- Output communication: output local variable x to output channel
    end
  end
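For readers unfamiliar with Balsa, the buffer's behaviour can be approximated in Python with a generator. This is only a behavioural sketch under stated assumptions: the function name and the range check are illustrative, and the loop ends when the input sequence is exhausted, whereas the Balsa loop repeats forever.

```python
WORD_BITS = 16  # matches "type word is 16 bits"


def single_place_buffer(inputs):
    """Behavioural model: accept one word, then emit it, then repeat."""
    for value in inputs:                       # loop ... end
        if not 0 <= value < (1 << WORD_BITS):
            raise ValueError(f"value does not fit in {WORD_BITS} bits: {value}")
        x = value                              # i -> x  (input communication)
        yield x                                # o <- x  (output communication)


outputs = list(single_place_buffer([1, 2, 65535]))
```

Because a generator is lazy, the next input is not taken until the previous output has been consumed, which loosely mirrors the single storage place of the buffer.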
Single-place buffer: Buffer Handshake Circuit
[Animated diagram. Components: repeater (#), sequencer (;), two transferrers (T), variable (x); channels i and o; activation channel.]
Handshake sequence, in order:
– Repeater is activated
– Sequencer handshakes to left transferrer
– Transferrer requests data from environment
– Data transferred to variable x
– Variable handshake completes
– Transferrer handshake completes to environment
– Transferrer handshake completes
– Sequencer handshakes to right transferrer
– Transferrer reads variable
– Transferrer outputs to environment
– Sequencer initiated handshakes complete
– Sequencer completes its activation handshake
– Repeater initiates another transfer, repeat
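The handshake sequence for the single-place buffer can be imitated in Python by treating each handshake as a method call: the call is the request and the return is the acknowledge. This is a rough sketch under that assumption, not the circuit Balsa generates; the class names follow the component names on the slides, while the event strings, the `run_buffer` helper and the bounded repeat count are illustrative.

```python
from collections import deque


class Variable:
    def __init__(self, log):
        self.value, self.log = None, log

    def write(self, data):
        self.value = data
        self.log.append("variable x written")

    def read(self):
        self.log.append("variable x read")
        return self.value


class Transferrer:
    """On activation, pulls data from one side and pushes it to the other."""
    def __init__(self, name, pull, push, log):
        self.name, self.pull, self.push, self.log = name, pull, push, log

    def activate(self):
        self.log.append(f"{self.name} transferrer activated")
        self.push(self.pull())
        self.log.append(f"{self.name} transferrer done")


class Sequencer:
    """Activates its children one after another, then acknowledges."""
    def __init__(self, children, log):
        self.children, self.log = children, log

    def activate(self):
        for child in self.children:
            child.activate()
        self.log.append("sequencer done")


class Repeater:
    """Re-activates its child; bounded here so the simulation terminates."""
    def __init__(self, child, log):
        self.child, self.log = child, log

    def activate(self, times):
        for _ in range(times):
            self.log.append("repeater activates sequencer")
            self.child.activate()


def run_buffer(inputs):
    log, env_in, env_out = [], deque(inputs), []
    x = Variable(log)
    left = Transferrer("left", env_in.popleft, x.write, log)   # i -> x
    right = Transferrer("right", x.read, env_out.append, log)  # o <- x
    Repeater(Sequencer([left, right], log), log).activate(times=len(inputs))
    return env_out, log


outputs, log = run_buffer([10, 20])
```

The resulting log shows the left transfer finishing before the right one starts, once per input word, which is the ordering the sequencer enforces.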
8-Bit Buffer Handshake Diagram
[Figure. Labeled parts: repeater, sequencer, transferrers, register, internal channel names, I/O ports.]
ARMOR IFU Handshake Diagram
ARMOR IFU Balsa Simulation
Milestones Reached
Thorough ramp-up made on asynchronous circuit design: algorithms and methodologies
Development platform selected
– Balsa over Petrify
ARMOR architecture defined
Micro-Architecture Specification (MAS) completed
– functional block partition, datapath interface defined
– asynchronous handshaking protocol defined
Detailed asynchronous pseudo-code implementation written
Balsa code writing, dynamic simulation and synthesis started
– IFU, ID, ALU completed
What's the Holdup?
The original plan was to demonstrate a complete (in-order) datapath flow through the pipeline on silicon.
So far, we have not been able to burn any Balsa-generated netlists (translated to both EDIF and Verilog formats) onto the VertexPro.
Root cause: we are missing several Balsa standard-cell library files on which the netlists are based.
FPGA burning cannot be achieved until we gain access to these files!
Plans for Second Semester
Our tasks include (in order of priority):
– Completion of Balsa coding for the ARMOR's core
– Dynamic simulation (and validation) of each of the ARMOR's logical units and (if possible) of the full chip
– Writing the ARMOR's compiler (to be implemented in C)
– Burning the ARMOR core onto the VertexPro FPGA and connecting it to the outside (synchronous) world via the code/data SDRAM modules