Published by Terence Harrington. Modified over 6 years ago.
1
Introduction to VLSI Programming Lecture 7: Introduction to the DLX
(course 2IN30) Prof. dr. ir. Kees van Berkel
2
VLSI programming for …
Low costs: introduce resource sharing.
Low delay (high throughput): introduce parallelism.
Low energy (low power): reduce activity; …
2/27/2019 Kees van Berkel
3
VLSI programming for low costs
Keep it simple!!
Introduce resource sharing: commands, auxiliary variables, expressions, operators.
Enable resource sharing, by:
  reducing parallelism
  making similar commands equal
4
Procedure definition vs declaration
Procedure definition: P = proc (). S
  provides a textual shorthand (expansion)
  each call generates a copy of the resource, i.e. no sharing
Procedure declaration: P : proc (). S
  defines a sharable resource
  each call generates access to this resource
5
Hints and Tips: optimization
When asked to optimize for area (low cost), it is allowed to invest time (execution time, extra iterations, …).
When asked to optimize for speed, it is allowed to invest area (pipeline stages, parallelism, …).
6
Hints and Tips: a known bug
Statement of the form: if -x then S0 else S1 fi
During simulation the wrong alternative is selected (e.g. S0 when x = true).
Workaround: remove the negation and swap the alternatives: if x then S1 else S0 fi
7
Instruction Set Architecture
The ISA is the interface between hardware and software. Hence, a good ISA:
  allows easy programming (compilers, OS, ..);
  allows efficient implementations (hardware);
  has a long lifetime (survives many HW generations);
  is general purpose.
8
ISA classification
Code sequence for C := A+B (table not preserved)
9
Reduced Instruction Set Computer
1980: Patterson and Ditzel: “The Case for RISC”
  fixed 32-bit instruction set, with few formats
  load-store architecture
  large register bank (32 registers), all general purpose
On processor organization:
  hard-wired decode logic
  pipelined execution
  single clock-cycle execution
10
RISC processors
Advantages:
  smaller die size (single chip processor)
  shorter development time (simplicity)
  higher performance
Disadvantages:
  poor code density
  cannot execute X86 code
11
A “Typical” RISC
32-bit instructions, 3 fixed formats
32 general purpose registers, 32-bit
3-address arithmetic instructions, reg-reg
single address mode for load/store: “address + displacement”
simple branch conditions; delayed branch
12
DLX (“Deluxe”)
(AMD 29K + DECstation HP850 + IBM801 + Intel i860 + MIPS M/120A + MIPS M/ Motorola 88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun 4/110 + Sun-4/260) / 13 = DLX
Other RISC examples include: Cray-1,2,3, AMD2900, DEC Alpha, ARM.
13
DLX instruction formats
R-type: Opcode | rs1 | rd | rs2 | function (reg-reg ALU operations)
I-type: Opcode | rs1 | rd | Immediate (loads, stores, conditional branch, ..)
J-type: Opcode | offset (jump, jump and link, trap, return from exception)
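As a rough illustration of the three formats, the sketch below splits a 32-bit instruction word into its fields. The bit widths did not survive on this slide, so the widths used here (6-bit opcode, 5-bit register numbers, 16-bit immediate, 26-bit offset, 11-bit function) and the R-type field order rs1, rs2, rd are assumptions taken from the DLX description in H&P. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_r(word):
    """R-type: opcode(6) rs1(5) rs2(5) rd(5) function(11). Layout assumed."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rs2": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "function": word & 0x7FF,
    }


def decode_i(word):
    """I-type: opcode(6) rs1(5) rd(5) immediate(16, signed). Layout assumed."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "immediate": sign_extend(word & 0xFFFF, 16),
    }


def decode_j(word):
    """J-type: opcode(6) offset(26, signed). Layout assumed."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "offset": sign_extend(word & 0x3FFFFFF, 26),
    }


# Example word with an arbitrary opcode value, rs1 = 0, rd = 1, immediate = 4.
example = (0x23 << 26) | (0 << 21) | (1 << 16) | 4
fields = decode_i(example)
```

Because all three formats keep the opcode in the same position, a decoder can read the opcode first and only then choose how to split the remaining 26 bits.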
14
Example instructions
15
GCD in GCL
x,y := X,Y
; do x≠y →
     if x>y → x := x-y
     [] x<y → y := y-x
     fi
  od
{ R: x=gcd(X,Y) }
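For readers less used to guarded commands, here is a minimal Python sketch of the same subtraction-based algorithm. The function name and the check for positive inputs are additions for illustration. The GCL program leaves the precondition implicit, and the loop would not terminate if either input were zero.

```python
def gcd_by_subtraction(X, Y):
    """Greatest common divisor of two positive integers by repeated subtraction."""
    if X <= 0 or Y <= 0:
        # Assumed precondition: with a zero or negative input the loop never ends.
        raise ValueError("both arguments must be positive")
    x, y = X, Y
    while x != y:      # loop guard: x and y differ
        if x > y:      # first alternative: subtract y from x
            x = x - y
        else:          # second alternative: x < y holds here because x != y
            y = y - x
    return x           # postcondition R: x = gcd(X, Y)


result = gcd_by_subtraction(12, 18)
```

The two guards are mutually exclusive inside the loop, which is why a plain if/else is a faithful translation.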
16
GCD in DLX assembler
pre:   LW   R1,4(R0)       R1 := Mem[4+0]
loop:  SUB  R3,R1,R2       R3 := R1-R2
       BEQZ R3,"exit"      if (R3=0) then PC := "exit"
       SLT  R4,R1,R2       R4 := (R1<R2)
       BEQZ R4,"pos2"      if (R4=0) then PC := "pos2"
pos1:  SUB  R2,R2,R1       R2 := R2-R1
       J    "loop"         PC := "loop"
pos2:  SUB  R1,R1,R2       R1 := R1-R2
exit:  SW   20(R0),R1      Mem[20+0] := R1
       HLT
17
DLX instruction mixes [from H&P, Figs 2.26, 2.27]
18
DLX interface, state
[Figure: the DLX CPU with state pc and Reg (r0, r1, r2, … r31); an instruction memory (address, instruction); a data memory Mem (address, data, r/w); inputs clock and interrupt.]
19
DLX: “Moore machine” (ignoring interrupts)
Reg[0],pc := 0,0
; do Mem[Reg[rs1]+immediate], pc, Reg[rd] :=
       if SW → Reg[rd] fi
     , if J → pc+4+offset
       [] BEQZ → if Reg[rs]=0 → pc+4+immediate
                 [] Reg[rs]#0 → pc+4
                 fi
       [] else → pc+4
       fi
     , if LW → Mem[Reg[rs1]+immediate]
       [] ADD → ALU(add, Reg[rs1], Reg[rs2])
       fi
  od
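The state-transition view can be tried out with a small Python sketch: one `step` call maps the current state (Mem, pc, Reg) to the next one, and `run` repeats it until HLT. This is an illustration under stated assumptions, not the course's implementation. The tuple encoding, `assemble`, `step`, and `run` are hypothetical names, and registers hold unbounded Python integers rather than 32-bit words. The program is the GCD listing from the assembler slide with two instructions added as assumptions, each marked in the comments: a second load for R2 and a jump back to the loop after pos2.

```python
SOURCE = [
    (None,   "LW",   1, 0, 4),    # R1 := Mem[4+0]
    (None,   "LW",   2, 0, 8),    # ASSUMED: R2 := Mem[8+0], not on the slide
    ("loop", "SUB",  3, 1, 2),    # R3 := R1-R2
    (None,   "BEQZ", 3, "exit"),  # if R3=0 then PC := exit
    (None,   "SLT",  4, 1, 2),    # R4 := (R1<R2)
    (None,   "BEQZ", 4, "pos2"),  # if R4=0 then PC := pos2
    ("pos1", "SUB",  2, 2, 1),    # R2 := R2-R1
    (None,   "J",    "loop"),     # PC := loop
    ("pos2", "SUB",  1, 1, 2),    # R1 := R1-R2
    (None,   "J",    "loop"),     # ASSUMED: jump back, not on the slide
    ("exit", "SW",   1, 0, 20),   # Mem[20+0] := R1
    (None,   "HLT"),
]


def assemble(source):
    """Replace label operands by pc-relative offsets: target - (pc + 4)."""
    labels = {line[0]: 4 * i for i, line in enumerate(source) if line[0]}
    program = []
    for i, (_, op, *args) in enumerate(source):
        args = [labels[a] - (4 * i + 4) if isinstance(a, str) else a for a in args]
        program.append((op, *args))
    return program


def step(program, mem, reg, pc):
    """Perform one state transition; return the next pc, or None after HLT."""
    op, *args = program[pc // 4]
    next_pc = pc + 4
    if op == "LW":
        rd, rs1, imm = args
        reg[rd] = mem.get(reg[rs1] + imm, 0)
    elif op == "SW":
        rd, rs1, imm = args
        mem[reg[rs1] + imm] = reg[rd]
    elif op in ("ADD", "SUB", "SLT"):
        rd, rs1, rs2 = args
        a, b = reg[rs1], reg[rs2]
        reg[rd] = {"ADD": a + b, "SUB": a - b, "SLT": int(a < b)}[op]
    elif op == "BEQZ":
        rs, offset = args
        if reg[rs] == 0:
            next_pc = pc + 4 + offset
    elif op == "J":
        (offset,) = args
        next_pc = pc + 4 + offset
    elif op == "HLT":
        return None
    else:
        raise ValueError(f"unknown instruction {op}")
    reg[0] = 0  # register 0 always reads as zero
    return next_pc


def run(program, mem, limit=1000):
    """Start from Reg = 0, pc = 0 and step until HLT or the step limit."""
    reg, pc = [0] * 32, 0
    for _ in range(limit):
        pc = step(program, mem, reg, pc)
        if pc is None:
            return mem
    raise RuntimeError("step limit reached")


memory = run(assemble(SOURCE), {4: 12, 8: 18})
```

After the run, `memory[20]` holds the result that the final SW stores. Keeping `step` free of any loop makes the correspondence with the single guarded assignment on the slide easier to see.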
20
DLX: 5-step sequential execution
21
DLX: 5-step sequential execution
[Figure: datapath for the five steps IF, ID, EX, MM, WB; components Instr. mem, Reg, Mem, the constant 4 and a zero test (0?); registers pc, npc, ir, A, B, Imm, aluo, cond, lmd.]
22
Bibliography
Computer Architecture; a Quantitative Approach (3rd Ed.); John L Hennessy & David A Patterson; Morgan Kaufmann Publishers Inc, 1996.
ARM System Architecture; Steve Furber; Addison Wesley, 1996.
DSP Processor Fundamentals, Architectures and Features; Phil Lapsley et al (Berkeley Design Technology Inc.), IEEE, 1996.
newscenter/archive/2004/handshake.html
23
Some references
www.handshakesolutions
www.arm.com/news/6936.html
newscenter/archive/2004/handshake.html
24
Pipelining in Tangram (cont'd)
Output sequence b is identical for P0, P1, and P2.
P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
P2 vs P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, f2.
26
DLX: pipelined execution
[Figure: pipeline diagram. Horizontal axis: time in clock cycles. Vertical axis: program execution in instructions. Each instruction passes through IF, ID, EX, MM, WB.]
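To make the gain from overlapping the five steps concrete, the following sketch counts clock cycles for n instructions and prints a small pipeline diagram. It assumes an idealized pipeline: one step per clock cycle, one new instruction entering per cycle, and no stalls or hazards. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MM", "WB"]


def sequential_cycles(n):
    """Cycles when each instruction finishes all five steps before the next starts."""
    return n * len(STAGES)


def pipelined_cycles(n):
    """Cycles when a new instruction enters every cycle (ideal case, no stalls)."""
    return 0 if n == 0 else n + len(STAGES) - 1


def pipeline_diagram(n):
    """One text row per instruction, one column per clock cycle."""
    width = pipelined_cycles(n)
    rows = []
    for i in range(n):
        # Instruction i enters the pipeline in cycle i, so it is shifted i columns.
        cells = ["  "] * i + STAGES + ["  "] * (width - i - len(STAGES))
        rows.append(" ".join(cells).rstrip())
    return "\n".join(rows)


print(pipeline_diagram(4))
print(sequential_cycles(4), "cycles sequential,", pipelined_cycles(4), "cycles pipelined")
```

Under these assumptions the speed-up approaches the number of steps, 5, as n grows; real programs fall short of that because of hazards.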
27
DLX: pipelined execution
[Figure: pipelined datapath with stages Instruction Fetch, Inst. Decode, EXecute, Memory, Write Back; components pc, Instr. mem, Reg, Mem, the constant 4 and a zero test (0?).]
28
DLX system organization
[Figure: system_dlx(…) places dlx(…), rom(…) and ram(…) inside the system boundary. Channels: ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM. Files: gcd.bin, RAMin, RAMout.]
29
dlx0.ht
#include "types.ht"
& dlx0 : export proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30
  ).
  begin
    …
    RF: ram array U5 of S30
  end
30
system_dlx0.ht
#include "dlx0.ht"
& dlx0 : proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30
  ). import
& env_dlx4 : main proc (
  & ROMfile? chan word
  & RAMinfile? chan S30
  & RAMfile! chan S30 /* <<address,data>> */
  ).
  begin
    next slide
  end
31
system_dlx0.ht : main body
begin
  & ROMaddr : chan adtype
  & ROMdata : chan word
  & RAMaddr : chan rwadtype
  & datatoRAM : chan S30
  & datafromRAM: chan S30
  …
  & ROMinterface : proc() . begin .. end
  & RAMinterface : proc() . begin .. end
|
  initialise()
; ROMinterface() || RAMinterface()
  || dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
end
32
script
htcomp -B system_dlx0
htsim -limit 1000 system_dlx0 gcd.bin RAMin RAMout
htview system_dlx0