1
Introduction to Silicon Programming in the Tangram/Haste language
Material adapted from lectures by: Prof.dr.ir. Kees van Berkel [Dr. Johan Lukkien] [Dr.ir. Ad Peeters] at the Technical University of Eindhoven, the Netherlands
2
Philips Research, Kees van Berkel, Ad Peeters, 2002-09-10, TU/e
VLSI programming for …
Low costs:
- introduce resource sharing.
Low delay (high throughput):
- introduce parallelism.
Low energy (low power):
- reduce activity; …
3
VLSI programming for high performance
- Keep it simple!
- Make the analysis; focus on bottlenecks.
- Introduce parallelism: expressions, commands, loops, pipelining.
- Enable parallelism by reducing dependencies such as resource sharing.
4
Expression-level parallelism
Examples:
- balancing: (v+w)+(x+y) is faster than v+w+x+y
- substitution: z:= g(f(x)) is faster than y:= f(x) ; z:= g(y)
- carry-select adder
- carry-save multiplier
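The balancing example can be checked by counting the adders on the longest path through each expression. The sketch below is illustrative only: it assumes every two-input adder contributes one unit of delay, and the names `Add` and `depth` are hypothetical helpers, not part of Tangram.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Add:
    """A two-input adder node in an expression tree (hypothetical helper)."""
    left: "Expr"
    right: "Expr"


Expr = Union[str, Add]  # a leaf is just a variable name


def depth(e: Expr) -> int:
    """Number of adders on the longest path from any input to the output."""
    if isinstance(e, str):
        return 0
    return 1 + max(depth(e.left), depth(e.right))


# v+w+x+y read left-associatively: ((v+w)+x)+y
chain = Add(Add(Add("v", "w"), "x"), "y")
# The balanced form: (v+w)+(x+y)
balanced = Add(Add("v", "w"), Add("x", "y"))

print(depth(chain), depth(balanced))  # 3 adders in series versus 2
```

Both forms use three adders, so under this simple model balancing shortens the critical path without adding hardware.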
5
Command-level parallelism
If S2 does not depend on the outcome of S1, then S1 ; S2 can be transformed into S1 || S2. (Dependencies: data, sharing, synchronization.)
This reduces computation time, unless ordering is enforced through external synchronization. With t(S) denoting the computation time of S:
  t(S1 ; S2)  = t(;) + t(S1) + t(S2)
  t(S1 || S2) = t(||) + max(t(S1), t(S2))
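The two timing rules can be evaluated directly. In this sketch t(S) stands for the computation time of a command S (the symbol is an assumption, since the slide text lost it), and the overheads of the `;` and `||` operators are passed in as hypothetical parameters.

```python
def t_seq(t1: float, t2: float, t_op: float = 0.0) -> float:
    """Time of S1 ; S2: operator overhead plus both commands in series."""
    return t_op + t1 + t2


def t_par(t1: float, t2: float, t_op: float = 0.0) -> float:
    """Time of S1 || S2: operator overhead plus the slower of the two."""
    return t_op + max(t1, t2)


# Assumed example figures: S1 takes 5 units, S2 takes 3, each operator costs 1.
print(t_seq(5, 3, t_op=1))  # 9
print(t_par(5, 3, t_op=1))  # 6
```

The gain is largest when S1 and S2 take about equally long; if one command dominates, parallel composition saves little.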
6
Exposure of command-level parallelism
Let *[S] be shorthand for forever do S od.
Assume S0 must precede S1 and S1 must precede S2. How to speed up *[ S0 ; S1 ; S2 ]?
   *[ S0 ; S1 ; S2 ]
=    { loop unfolding }
   S0 ; *[ S1 ; S2 ; S0 ]
=    { S0 does not depend on S2 }
   S0 ; *[ S1 ; (S2 || S0) ]
7
Wagging
   *[ a?x ; b!f(x) ]
=    { loop unrolling, renaming }
   *[ a?x ; b!f(x) ; a?y ; b!f(y) ]
=    { loop folding }
   a?x ; *[ b!f(x) ; a?y ; b!f(y) ; a?x ]
     { increases slack by 1 }
   a?x ; *[ (b!f(x) || a?y) ; (b!f(y) || a?x) ]
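A quick way to gain confidence in the wagging transformation is to check that the transformed loop emits the same output sequence on b. The sketch below models channel a as a finite Python iterable and channel b as a list; the parallel pairs are executed one after the other, so it checks only the order of outputs, not timing. The function names and the end-of-input marker are assumptions made for illustration.

```python
def original(inputs, f):
    """*[ a?x ; b!f(x) ] over a finite input sequence."""
    out = []
    for x in inputs:
        out.append(f(x))
    return out


def wagged(inputs, f):
    """a?x ; *[ (b!f(x) || a?y) ; (b!f(y) || a?x) ], serialized."""
    a = iter(inputs)
    out = []
    end = object()               # marks exhaustion of the finite input
    x = next(a, end)             # a?x
    while x is not end:
        out.append(f(x))         # b!f(x)  || ...
        y = next(a, end)         # ...     || a?y
        if y is end:
            break
        out.append(f(y))         # b!f(y)  || ...
        x = next(a, end)         # ...     || a?x
    return out


square = lambda v: v * v
print(original(range(5), square) == wagged(range(5), square))  # True
```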
8
Parallel reads from REG file
Let RF be a register file. Then x:= RF[i] ; y:= RF[j] cannot be parallelized. (Register files have a single read port.)
Parallel read actions can be realized by doubling the register file:
  > := > { write }
and
  > := > { read }
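The doubling idea can be sketched as two copies of the register file that are always written together, so that each copy can serve one read. This is a behavioral model only; the class and method names are hypothetical, and Python evaluates the two reads one after the other rather than truly in parallel.

```python
class DoubledRegisterFile:
    """Two identical single-read-port copies kept in sync on every write."""

    def __init__(self, size: int):
        self.rf0 = [0] * size
        self.rf1 = [0] * size

    def write(self, i: int, value: int) -> None:
        # Every write updates both copies, so they never diverge.
        self.rf0[i] = value
        self.rf1[i] = value

    def read2(self, i: int, j: int):
        # Each read uses a different copy, so neither competes for a port.
        return self.rf0[i], self.rf1[j]


rf = DoubledRegisterFile(32)
rf.write(3, 7)
rf.write(5, 9)
print(rf.read2(3, 5))  # (7, 9)
```

The price is twice the storage and a write that drives both copies.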
9
Pipelining in Tangram
Compare three programs:
P0: *[ a?x0 ; b!f2(f1(f0(x0))) ]
P1: *[ a?x0 ; x1:= f0(x0) ; x2:= f1(x1) ; b!f2(x2) ]
P2: *[ a?x0 ; a1!f0(x0) ] || *[ a1?x1 ; a2!f1(x1) ] || *[ a2?x2 ; b!f2(x2) ]
10
Pipelining in Tangram (continued)
- Output sequence b is identical for P0, P1, and P2.
- P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
- P2 vs P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, f2.
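The three-stage program P2 can be imitated with threads and bounded queues to confirm that its output sequence matches P0. This is only an approximation: a `queue.Queue(maxsize=1)` buffers one value, whereas a Tangram channel is a handshake, and the `None` end marker, `stage`, `run_p2`, and `throughput_gain` are assumptions introduced for this sketch. The last function gives an idealized bound that ignores communication overhead.

```python
import queue
import threading


def stage(f, src, dst):
    """One pipeline stage, *[ src?x ; dst!f(x) ], stopped by a None marker."""
    while True:
        x = src.get()
        if x is None:
            dst.put(None)
            return
        dst.put(f(x))


def run_p2(inputs, f0, f1, f2):
    """Run P2 on a finite input sequence (inputs must not contain None)."""
    a, a1, a2, b = (queue.Queue(maxsize=1) for _ in range(4))
    out = []

    def consume():
        while True:
            y = b.get()
            if y is None:
                return
            out.append(y)

    workers = [threading.Thread(target=stage, args=args)
               for args in ((f0, a, a1), (f1, a1, a2), (f2, a2, b))]
    workers.append(threading.Thread(target=consume))
    for w in workers:
        w.start()
    for x in inputs:
        a.put(x)
    a.put(None)
    for w in workers:
        w.join()
    return out


def throughput_gain(d0, d1, d2):
    """Idealized P2/P1 throughput ratio: sum of stage delays over the slowest."""
    return (d0 + d1 + d2) / max(d0, d1, d2)


f0 = lambda x: x + 1
f1 = lambda x: x * 2
f2 = lambda x: x - 3
p0 = [f2(f1(f0(x))) for x in range(5)]
print(run_p2(range(5), f0, f1, f2) == p0)  # True
print(throughput_gain(1, 1, 1))            # 3.0
```

The factor of 3 is reached only when the three functions are equally complex; one slow stage limits the whole pipeline.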
11
A Processor Example: DLX (“Deluxe”)
(AMD 29K + DECstation 3100 + HP850 + IBM801 + Intel i860 + MIPS M/120A + MIPS M/1000 + Motorola 88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun 4/110 + Sun-4/260) / 13 = DLX
Other RISC examples include: Cray-1,2,3, AMD2900, DEC Alpha, ARM.
12
DLX instruction formats
Bits:     31..26   25..21   20..16   15..11   10..0
I-type:   Opcode   rs1      rd       Immediate (15..0)     loads, stores, conditional branch, ..
R-type:   Opcode   rs1      rs2      rd       function     Reg-reg ALU operations
J-type:   Opcode   offset (25..0)                          Jump, jump and link, trap, return from exception
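The bit boundaries on this slide (31..26, 25..21, 20..16, 15..11, 10..0) translate into simple shift-and-mask field extraction. The sketch below assumes the usual DLX field order (rs1, rd, immediate for I-type; rs1, rs2, rd, function for R-type), returns the immediate unsigned without sign extension, and uses hypothetical function names. The opcode value in the example is arbitrary.

```python
def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive, bit 0 is the least significant)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_i(word: int) -> dict:
    return {"opcode": field(word, 31, 26), "rs1": field(word, 25, 21),
            "rd": field(word, 20, 16), "immediate": field(word, 15, 0)}


def decode_r(word: int) -> dict:
    return {"opcode": field(word, 31, 26), "rs1": field(word, 25, 21),
            "rs2": field(word, 20, 16), "rd": field(word, 15, 11),
            "function": field(word, 10, 0)}


def decode_j(word: int) -> dict:
    return {"opcode": field(word, 31, 26), "offset": field(word, 25, 0)}


# Assemble an I-type word by hand: opcode 0x23, rs1 = 0, rd = 1, immediate = 4.
word = (0x23 << 26) | (0 << 21) | (1 << 16) | 4
print(decode_i(word))
```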
13
Example instructions
14
GCD in DLX assembler
pre:   LW    R1,4(R0)       R1 := Mem[4+0]
       LW    R2,8(R0)       R2 := Mem[8+0]
loop:  SUB   R3,R1,R2       R3 := R1-R2
       BEQZ  R3,"exit"      if (R3=0) then PC := "exit"
       SLT   R4,R1,R2       R4 := (R1<R2)
       BEQZ  R4,"pos2"      if (R4=0) then PC := "pos2"
pos1:  SUB   R2,R2,R1       R2 := R2-R1
       J     "loop"         PC := "loop"
pos2:  SUB   R1,R1,R2       R1 := R1-R2
       J     "loop"         PC := "loop"
exit:  SW    20(R0),R1      Mem[20+0] := R1
       HLT
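To follow the GCD program step by step, it helps to run it on a tiny interpreter for just the instructions it uses. The sketch below is an assumption-laden model, not the DLX itself: instructions are tuples, the program counter indexes the instruction list instead of holding a byte address, labels are resolved through a hand-written table, and memory is a Python dict. It expects positive inputs at addresses 4 and 8.

```python
def run(program, labels, mem, max_steps=10_000):
    """Execute the LW/SW/SUB/SLT/BEQZ/J/HLT subset; returns mem at HLT."""
    reg = [0] * 32
    pc = 0                                   # index into program
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "LW":                       # LW rd, offset(rs1)
            rd, offset, rs1 = args
            reg[rd] = mem[reg[rs1] + offset]
        elif op == "SW":                     # SW offset(rs1), rd
            offset, rs1, rd = args
            mem[reg[rs1] + offset] = reg[rd]
        elif op == "SUB":
            rd, a, b = args
            reg[rd] = reg[a] - reg[b]
        elif op == "SLT":
            rd, a, b = args
            reg[rd] = int(reg[a] < reg[b])
        elif op == "BEQZ":
            rs, target = args
            if reg[rs] == 0:
                pc = labels[target]
        elif op == "J":
            pc = labels[args[0]]
        elif op == "HLT":
            return mem
        reg[0] = 0                           # R0 always reads as zero
    raise RuntimeError("step limit reached")


GCD = [
    ("LW", 1, 4, 0),        # pre:  R1 := Mem[4+0]
    ("LW", 2, 8, 0),        #       R2 := Mem[8+0]
    ("SUB", 3, 1, 2),       # loop: R3 := R1-R2
    ("BEQZ", 3, "exit"),
    ("SLT", 4, 1, 2),       #       R4 := (R1<R2)
    ("BEQZ", 4, "pos2"),
    ("SUB", 2, 2, 1),       # pos1: R2 := R2-R1
    ("J", "loop"),
    ("SUB", 1, 1, 2),       # pos2: R1 := R1-R2
    ("J", "loop"),
    ("SW", 20, 0, 1),       # exit: Mem[20+0] := R1
    ("HLT",),
]
LABELS = {"loop": 2, "pos1": 6, "pos2": 8, "exit": 10}

result = run(GCD, LABELS, {4: 48, 8: 36})
print(result[20])  # 12
```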
15
DLX interface, state
[Figure: the DLX CPU with its state (pc and the registers Reg: r0, r1, r2, ..., r31), connected to the instruction memory (address, instruction) and to the data memory Mem (address, data, r/w); inputs clock and interrupt.]
16
DLX: “Moore machine” (ignoring interrupts)
Reg[0], pc := 0, 0
; do Mem[Reg[rs1]+immediate], pc, Reg[rd] :=
       if SW -> Reg[rd] fi,
       if J    -> pc+4+offset
       [] BEQZ -> if Reg[rs]=0 -> pc+4+immediate
                  [] Reg[rs]#0 -> pc+4
                  fi
       [] else -> pc+4
       fi,
       if LW  -> Mem[Reg[rs1]+immediate]
       [] ADD -> ALU(add, Reg[rs1], Reg[rs2])
       fi
  od
17
DLX: 5-step sequential execution
[Figure: datapath for the five steps IF, ID, EX, MM, WB, showing the instruction memory, Reg, Mem, the "+4" increment and "0?" test, and the registers pc, npc, ir, A, B, Imm, aluo, cond, lmd.]
18
DLX: pipelined execution
Time [in clock cycles]:   1   2   3   4   5   6   7   8  ...
Program execution [instructions]:
  instruction 1           IF  ID  EX  MM  WB
  instruction 2               IF  ID  EX  MM  WB
  instruction 3                   IF  ID  EX  MM  WB
  instruction 4                       IF  ID  EX  MM  WB
  instruction 5                           IF  ID  EX  MM
  instruction 6                               IF  ID  EX
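In the ideal pipeline on this slide, each instruction enters IF one clock cycle after its predecessor, which gives simple cycle counts. The sketch below assumes no stalls or hazards, numbers instructions and cycles from 1, and uses hypothetical function names.

```python
STAGES = ("IF", "ID", "EX", "MM", "WB")


def sequential_cycles(n: int, stages: int = 5) -> int:
    """Cycles for n instructions when each finishes before the next starts."""
    return n * stages


def pipelined_cycles(n: int, stages: int = 5) -> int:
    """Cycles for n instructions in an ideal pipeline without stalls."""
    return stages + n - 1 if n > 0 else 0


def stage_at(instr: int, cycle: int):
    """Stage occupied by instruction instr in the given cycle, or None."""
    k = cycle - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None


print(sequential_cycles(4), pipelined_cycles(4))  # 20 8
print(stage_at(5, 8), stage_at(6, 8))             # MM EX
```

For long instruction sequences the ratio of the two counts approaches the number of stages, here 5.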
19
DLX: pipelined execution
[Figure: pipelined datapath with the stages Instruction Fetch, Instruction Decode, EXecute, Memory, Write Back, showing pc, the "+4" increment, the instruction memory, Reg, the "0?" test, and Mem.]
20
DLX system organization
[Figure: system_dlx(…) contains dlx(…), rom(…), and ram(…). Channels: ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM. Across the system boundary: file gcd.bin; files RAMin, RAMout.]
21
dlx0.ht
#include types.ht
& dlx0 : export proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30 ).
begin
  …
  RF: ram array U5 of S30
end
22
system_dlx0.ht
#include "dlx0.ht"
& dlx0 : proc (
    ROMaddr!chan adtype
  & ROMdata?chan word
  & RAMaddr!chan rwadtype
  & datatoRAM!chan S30
  & datafromRAM?chan S30 ). import
& env_dlx4 : main proc (
  & ROMfile? chan word
  & RAMinfile? chan S30
  & RAMfile! chan S30 /* > */ ).
begin
  next slide
end
23
system_dlx0.ht : main body
begin
  & ROMaddr : chan adtype
  & ROMdata : chan word
  & RAMaddr : chan rwadtype
  & datatoRAM : chan S30
  & datafromRAM: chan S30
  …
  & ROMinterface : proc(). begin .. end
  & RAMinterface : proc(). begin .. end
| initialise() ;
  ROMinterface() || RAMinterface() ||
  dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
end
24
script
htcomp system_dlx0
htsim -limit 1000 system_dlx0 RAMin RAMout
htview system_dlx0
htmap system_dlx0
25
DLX0: instruction loop
do -halted then
  ROMaddr!PC ; ROMdata?ir ; PC:=PC+4 {auxPC:=PC+4 ; PC:=PCaux}
; case (ir cast Itype.0) is
     > then LW()
  or > then SW()
  or > then if (ir cast Rtype.4 = 1) then SLT() fi
  or > then BEQZ()
  or > then J()
  or > then halted:=true
  si
od