Introduction to VLSI Programming Lecture 7: Introduction to the DLX


Course 2IN30, Prof. dr. ir. Kees van Berkel

VLSI programming for …
- Low costs: introduce resource sharing.
- Low delay (high throughput): introduce parallelism.
- Low energy (low power): reduce activity; …
2/27/2019 Kees van Berkel

VLSI programming for low costs
Keep it simple!
- Introduce resource sharing: commands, auxiliary variables, expressions, operators.
- Enable resource sharing by reducing parallelism and by making similar commands equal.

Procedure definition vs. declaration
- Procedure definition, P = proc (). S: provides a textual shorthand (expansion); each call generates a copy of the resource, i.e. no sharing.
- Procedure declaration, P : proc (). S: defines a sharable resource; each call generates an access to this resource.
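To make the difference concrete, here is a minimal Python sketch (not Haste) that counts hardware instances. The `Resource` class and the `expanded`/`shared` functions are hypothetical names introduced only for illustration: `expanded` mimics a procedure definition, where every call site gets its own copy, and `shared` mimics a procedure declaration, where all calls access one instance. It assumes the calls to the shared instance happen one after another, which is why reducing parallelism helps enable sharing.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical model of one hardware block, e.g. an adder."""
    name: str
    accesses: int = 0

    def call(self, a: int, b: int) -> int:
        self.accesses += 1
        return a + b


def expanded(calls):
    """Like 'P = proc (). S': each call site gets a fresh copy (no sharing)."""
    blocks = []
    total = 0
    for a, b in calls:
        block = Resource("add")      # one new adder per call
        blocks.append(block)
        total += block.call(a, b)
    return total, blocks


def shared(calls):
    """Like 'P : proc (). S': all (sequential) calls access one instance."""
    block = Resource("add")          # a single shared adder
    total = 0
    for a, b in calls:
        total += block.call(a, b)
    return total, [block]
```

Both versions compute the same result; for three calls the expanded version instantiates three adders, while the shared version uses one adder that is accessed three times.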

Hints and tips: optimization
- When asked to optimize for area (low cost), you may invest time (execution time, extra iterations, …).
- When asked to optimize for speed, you may invest area (pipeline stages, parallelism, …).

Hints and tips: a known bug
For a statement of the form if –x then S0 else S1 fi, the wrong alternative is selected during simulation (e.g. S0 when x = true). Workaround: remove the negation and swap the alternatives: if x then S1 else S0 fi.

Instruction Set Architecture
The ISA is the interface between hardware and software. Hence, a good ISA:
- allows easy programming (compilers, OS, …);
- allows efficient implementations (hardware);
- has a long lifetime (survives many hardware generations);
- is general purpose.

ISA classification: code sequence for C := A+B (table not reproduced)
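As a hedged stand-in for the missing table, the sketch below evaluates C := A+B on three toy machines that follow the usual textbook ISA classes: stack, accumulator, and load-store. The mnemonics, tuple encoding, and function names are hypothetical; only the shape of each code sequence matters. Note that only the load-store machine needs explicit loads into registers before the ALU operation, which is the style a load-store RISC such as DLX adopts.

```python
def stack_machine(prog, mem):
    """Stack ISA: operands are implicit on top of an evaluation stack."""
    stack = []
    for op, *args in prog:
        if op == "PUSH":
            stack.append(mem[args[0]])
        elif op == "ADD":                  # pop two operands, push the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            mem[args[0]] = stack.pop()
    return mem


def accumulator_machine(prog, mem):
    """Accumulator ISA: one implicit register, one explicit memory operand."""
    acc = 0
    for op, addr in prog:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc += mem[addr]
        elif op == "STORE":
            mem[addr] = acc
    return mem


def load_store_machine(prog, mem):
    """Load-store ISA: ALU operations only take registers."""
    reg = [0] * 4
    for op, *args in prog:
        if op == "LOAD":                   # LOAD rd, addr
            reg[args[0]] = mem[args[1]]
        elif op == "ADD":                  # ADD rd, rs1, rs2
            rd, rs1, rs2 = args
            reg[rd] = reg[rs1] + reg[rs2]
        elif op == "STORE":                # STORE rs, addr
            mem[args[1]] = reg[args[0]]
    return mem


# Code sequences for C := A+B (hypothetical mnemonics)
STACK = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
ACC = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
LOAD_STORE = [("LOAD", 1, "A"), ("LOAD", 2, "B"), ("ADD", 3, 1, 2), ("STORE", 3, "C")]
```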

Reduced Instruction Set Computer
1980: Patterson and Ditzel, "The Case for the Reduced Instruction Set Computer":
- fixed 32-bit instruction set, with few formats;
- load-store architecture;
- large register bank (32 registers), all general purpose.
On processor organization:
- hard-wired decode logic;
- pipelined execution;
- single clock-cycle execution.

RISC processors
Advantages:
- smaller die size (single-chip processor);
- shorter development time (simplicity);
- higher performance.
Disadvantages:
- poor code density;
- cannot execute x86 code.

A "typical" RISC
- 32-bit instructions, 3 fixed formats;
- 32 general-purpose registers of 32 bits each;
- 3-address, register-register arithmetic instructions;
- a single addressing mode for load/store: "address + displacement";
- simple branch conditions; delayed branch.

DLX ("Deluxe")
(AMD 29K + DECstation 3100 + HP850 + IBM801 + Intel i860 + MIPS M/120A + MIPS M/1000 + Motorola 88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun 4/110 + Sun-4/260) / 13 = DLX, i.e. 560 in Roman numerals.
Other RISC examples include: Cray-1, -2, -3, AMD 2900, DEC Alpha, ARM.

DLX instruction formats (bit fields 31–26 | 25–21 | 20–16 | 15–11 | 10–0)
  R-type: opcode | rs1 | rs2 | rd | function           register-register ALU operations
  I-type: opcode | rs1 | rd  | immediate (bits 15–0)   loads, stores, conditional branches, …
  J-type: opcode | offset (bits 25–0)                  jump, jump and link, trap, return from exception
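The fixed field boundaries translate directly into shifts and masks. This Python sketch extracts the fields of each format from a 32-bit word. The helper names are hypothetical, no real DLX opcode values are assumed, and it assumes the 16-bit immediate and 26-bit offset are sign-extended (needed for negative displacements and backward branches).

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret a width-bit field as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_r(word: int) -> dict:
    return {"opcode": bits(word, 31, 26), "rs1": bits(word, 25, 21),
            "rs2": bits(word, 20, 16), "rd": bits(word, 15, 11),
            "func": bits(word, 10, 0)}


def decode_i(word: int) -> dict:
    return {"opcode": bits(word, 31, 26), "rs1": bits(word, 25, 21),
            "rd": bits(word, 20, 16),
            "imm": sign_extend(bits(word, 15, 0), 16)}


def decode_j(word: int) -> dict:
    return {"opcode": bits(word, 31, 26),
            "offset": sign_extend(bits(word, 25, 0), 26)}


def encode_i(opcode: int, rs1: int, rd: int, imm: int) -> int:
    """Pack an I-type word; imm is truncated to 16 bits (two's complement)."""
    return (opcode << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)
```

Because every format keeps the opcode and rs1 in the same place, a decoder can read them before knowing the instruction type; this regularity is what makes the hard-wired decode logic mentioned earlier simple.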

Example instructions (table not reproduced)

GCD in GCL
x, y := X, Y
; do x ≠ y → if x > y → x := x - y
             [] x < y → y := y - x
             fi
  od
{ R: x = gcd(X, Y) }
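A direct Python transcription of the GCL program can help check the algorithm before mapping it onto DLX. It is a sketch that assumes X > 0 and Y > 0 (otherwise the loop would not terminate); inside the loop x ≠ y holds, so exactly one guard of the inner if is true and a plain else suffices.

```python
def gcd_gcl(X: int, Y: int) -> int:
    """Subtractive GCD following the GCL program; requires X, Y > 0."""
    if X <= 0 or Y <= 0:
        raise ValueError("the GCL program assumes X > 0 and Y > 0")
    x, y = X, Y
    while x != y:          # do x ≠ y →
        if x > y:          #   if x > y → x := x - y
            x -= y
        else:              #   [] x < y → y := y - x
            y -= x
    return x               # { R: x = gcd(X, Y) }
```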

GCD in DLX assembler
pre:   LW   R1,4(R0)     ; R1 := Mem[4+0]  (R2 is assumed to already hold Y)
loop:  SUB  R3,R1,R2     ; R3 := R1-R2
       BEQZ R3,"exit"    ; if (R3=0) then PC := "exit"
       SLT  R4,R1,R2     ; R4 := (R1<R2)
       BEQZ R4,"pos2"    ; if (R4=0) then PC := "pos2"
pos1:  SUB  R2,R2,R1     ; R2 := R2-R1
       J    "loop"       ; PC := "loop"
pos2:  SUB  R1,R1,R2     ; R1 := R1-R2
       J    "loop"       ; PC := "loop"
exit:  SW   20(R0),R1    ; Mem[20+0] := R1
       HLT
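To see the assembler version run, here is a tiny Python interpreter for just the DLX subset the GCD program uses. It is a sketch under several assumptions: pc indexes instruction slots rather than bytes, branch delay slots are ignored, memory is a dict, and Y is loaded into R2 from address 8, a hypothetical address since the slide does not show how R2 is initialized. The program also jumps back to loop after pos2 so that the loop continues until R1 = R2.

```python
def run_dlx(program, labels, mem, max_steps=10_000):
    """Tiny interpreter for the DLX subset used by the GCD program.

    Assumptions: pc indexes instruction slots (not bytes), there are no
    branch delay slots, and mem is a dict from address to word.
    """
    reg = [0] * 32                       # R0..R31, R0 kept at 0
    pc = 0
    for _ in range(max_steps):
        op, *a = program[pc]
        npc = pc + 1
        if op == "LW":                   # LW rd, imm(rs1)
            rd, imm, rs1 = a
            reg[rd] = mem[imm + reg[rs1]]
        elif op == "SW":                 # SW imm(rs1), rd
            imm, rs1, rd = a
            mem[imm + reg[rs1]] = reg[rd]
        elif op == "SUB":                # SUB rd, rs1, rs2
            rd, rs1, rs2 = a
            reg[rd] = reg[rs1] - reg[rs2]
        elif op == "SLT":                # SLT rd, rs1, rs2
            rd, rs1, rs2 = a
            reg[rd] = int(reg[rs1] < reg[rs2])
        elif op == "BEQZ":               # BEQZ rs, label
            rs, target = a
            if reg[rs] == 0:
                npc = labels[target]
        elif op == "J":                  # J label
            npc = labels[a[0]]
        elif op == "HLT":
            return mem
        reg[0] = 0
        pc = npc
    raise RuntimeError("step limit reached")


GCD = [
    ("LW", 1, 4, 0),      # pre:  R1 := Mem[4+0]  (X)
    ("LW", 2, 8, 0),      #       R2 := Mem[8+0]  (Y; hypothetical address)
    ("SUB", 3, 1, 2),     # loop: R3 := R1-R2
    ("BEQZ", 3, "exit"),
    ("SLT", 4, 1, 2),
    ("BEQZ", 4, "pos2"),
    ("SUB", 2, 2, 1),     # pos1: R2 := R2-R1
    ("J", "loop"),
    ("SUB", 1, 1, 2),     # pos2: R1 := R1-R2
    ("J", "loop"),
    ("SW", 20, 0, 1),     # exit: Mem[20+0] := R1
    ("HLT",),
]
LABELS = {"loop": 2, "pos2": 8, "exit": 10}
```

For example, `run_dlx(GCD, LABELS, {4: 12, 8: 18})[20]` leaves gcd(12, 18) in memory word 20.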

DLX instruction mixes [from H&P, Figs 2.26, 2.27] (figures not reproduced)

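DLX interface and state (figure): the DLX CPU holds state pc and the register file Reg (r0, r1, r2, …, r31). It connects to an instruction memory (address, instruction) and to the data memory Mem (address, data, r/w); further inputs are clock and interrupt.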

DLX: "Moore machine" (ignoring interrupts)
Reg[0], pc := 0, 0
; do Mem[Reg[rs1]+immediate], pc, Reg[rd] :=
       if SW → Reg[rd] fi
     , if J → pc+4+offset
       [] BEQZ → if Reg[rs1] = 0 → pc+4+immediate
                 [] Reg[rs1] ≠ 0 → pc+4
                 fi
       [] else → pc+4
       fi
     , if LW → Mem[Reg[rs1]+immediate]
       [] ADD → ALU(add, Reg[rs1], Reg[rs2])
       fi
  od
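The key property of this formulation is the multiple assignment: all three right-hand sides are evaluated in the old state before Mem, pc, and Reg[rd] change together. The Python sketch below models one iteration of the loop body under that reading. Instructions are given as hypothetical pre-decoded dicts (fields op, rs1, rs2, rd, imm, offset), pc is byte-addressed with steps of 4, and 32-bit overflow is ignored.

```python
def step(pc, reg, mem, ins):
    """One iteration of the do-loop: every right-hand side reads the OLD
    state, mirroring the multiple assignment on the slide."""
    op = ins["op"]
    rs1, rs2, rd = ins.get("rs1", 0), ins.get("rs2", 0), ins.get("rd", 0)
    imm, offset = ins.get("imm", 0), ins.get("offset", 0)
    new_mem, new_reg = dict(mem), list(reg)

    # Mem[Reg[rs1]+immediate] := if SW -> Reg[rd] fi
    if op == "SW":
        new_mem[reg[rs1] + imm] = reg[rd]

    # pc := if J -> pc+4+offset [] BEQZ -> ... [] else -> pc+4 fi
    if op == "J":
        new_pc = pc + 4 + offset
    elif op == "BEQZ":
        new_pc = pc + 4 + imm if reg[rs1] == 0 else pc + 4
    else:
        new_pc = pc + 4

    # Reg[rd] := if LW -> Mem[Reg[rs1]+immediate] [] ADD -> ALU(add, ...) fi
    if op == "LW":
        new_reg[rd] = mem[reg[rs1] + imm]
    elif op == "ADD":
        new_reg[rd] = reg[rs1] + reg[rs2]

    new_reg[0] = 0                  # Reg[0] stays 0
    return new_pc, new_reg, new_mem
```

Returning fresh copies instead of updating in place is a deliberate choice here: it makes the "read old state, write new state" semantics explicit, just as the registers of a Moore machine only change at the clock edge.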

DLX: 5-step sequential execution

DLX: 5-step sequential execution (figure): steps IF, ID, EX, MM, WB. Datapath elements: pc, Instr. mem, +4 incrementer, Reg, zero test (0?), Mem; intermediate registers ir, npc, A, B, Imm, aluo, cond, lmd.
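The intermediate registers in the figure suggest how one instruction is split into five steps. The following Python sketch executes a single instruction through IF, ID, EX, MM, and WB using those register names; it supports only ADD, LW, SW, and BEQZ, and it takes instructions as hypothetical pre-decoded dicts rather than 32-bit words. B is read from the second register field (bits 20–16), which holds rs2 for R-type and rd for I-type instructions, so a store writes Reg[rd], as in the Moore-machine slide.

```python
def five_step(pc, reg, imem, dmem):
    """Execute ONE instruction in five sequential steps; returns the new pc."""
    # IF: instruction fetch
    ir = imem[pc]
    npc = pc + 4
    # ID: decode and register fetch (B uses bits 20..16: rs2 or rd)
    a = reg[ir.get("rs1", 0)]
    b = reg[ir.get("rs2", ir.get("rd", 0))]
    imm = ir.get("imm", 0)
    # EX: ALU operation and zero test "0?"
    op = ir["op"]
    cond = False
    if op == "ADD":
        aluo = a + b
    elif op in ("LW", "SW"):
        aluo = a + imm              # effective address
    elif op == "BEQZ":
        aluo = npc + imm            # branch target
        cond = a == 0
    else:
        raise ValueError(f"unsupported op {op}")
    # MM: memory access and pc update
    lmd = None
    if op == "LW":
        lmd = dmem[aluo]
    elif op == "SW":
        dmem[aluo] = b
    new_pc = aluo if cond else npc
    # WB: write back
    if op == "ADD":
        reg[ir["rd"]] = aluo
    elif op == "LW":
        reg[ir["rd"]] = lmd
    reg[0] = 0
    return new_pc
```

In this sequential organization each instruction occupies the whole datapath for all five steps; the pipelined organization that follows overlaps the steps of consecutive instructions instead.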

Bibliography
- Computer Architecture: A Quantitative Approach (2nd Ed.); John L. Hennessy & David A. Patterson; Morgan Kaufmann Publishers Inc., 1996.
- ARM System Architecture; Steve Furber; Addison Wesley, 1996.
- DSP Processor Fundamentals: Architectures and Features; Phil Lapsley et al. (Berkeley Design Technology Inc.); IEEE, 1996.
- www.handshakesolutions.com
- www.arm.com/news/6936.html
- www.research.philips.com/newscenter/archive/2004/handshake.html


Pipelining in Tangram (cntd)
- The output sequence b is identical for P0, P1, and P2.
- P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
- P2 vs. P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, f2.
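The "up to 3 times" bound follows from where throughput is limited. The Python sketch below assumes (hypothetically, since the slide defining P1 and P2 is not part of this transcript) that P1 evaluates f0, f1, f2 one after another for each input, while P2 runs them as three concurrent pipeline stages. Stage delays are in arbitrary units and handshake overhead is ignored.

```python
def sequential_period(delays):
    """P1-style: f0, f1, f2 run in turn, so a new input is accepted
    only after the whole chain has finished."""
    return sum(delays)


def pipelined_period(delays):
    """P2-style: each f_i has its own stage; the slowest stage sets
    the rate at which new inputs are accepted."""
    return max(delays)


def throughput_gain(delays):
    """Ratio of P2 throughput to P1 throughput."""
    return sequential_period(delays) / pipelined_period(delays)
```

With three equally complex functions the gain is exactly 3; the more unbalanced the stages, the closer the gain drops toward 1. The latency of one item stays sum(delays) in both cases, which matches "similar latency".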


DLX: pipelined execution
Time [in clock cycles] →
                 1   2   3   4   5   6   7   8   ...
instruction i    IF  ID  EX  MM  WB
instruction i+1      IF  ID  EX  MM  WB
instruction i+2          IF  ID  EX  MM  WB
instruction i+3              IF  ID  EX  MM  WB
↓ Program execution [instructions]
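The staircase follows a simple rule: in an ideal pipeline, instruction i (counting from 0) is in stage s during clock cycle i + s + 1, so n instructions through k stages take k + n - 1 cycles. This Python sketch builds and prints such a diagram; it assumes no hazards or stalls, and the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MM", "WB"]


def cycles(n_instr: int, n_stages: int = len(STAGES)) -> int:
    """Total clock cycles for n_instr instructions in an ideal pipeline."""
    return n_stages + n_instr - 1


def pipeline_table(n_instr: int, stages=STAGES):
    """rows[i][c] is the stage of instruction i in cycle c+1 ('' if idle)."""
    total = cycles(n_instr, len(stages))
    rows = []
    for i in range(n_instr):
        row = [""] * total
        for s, name in enumerate(stages):
            row[i + s] = name           # shifted one cycle per instruction
        rows.append(row)
    return rows


def render(rows) -> str:
    width = len(rows[0])
    lines = ["cycle " + " ".join(f"{c + 1:>3}" for c in range(width))]
    for i, row in enumerate(rows):
        lines.append(f"i+{i:<4}" + " ".join(f"{x:>3}" for x in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_table(4)))
```

Four instructions fill exactly the eight cycles drawn on the slide; for long instruction sequences the cost approaches one cycle per instruction.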

DLX: pipelined execution (figure): stages Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MM), and Write Back (WB); datapath elements pc, +4 incrementer, Instr. mem, Reg, zero test (0?), Mem.

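DLX system organization (figure): inside the system boundary system_dlx(…), dlx(…) is connected to rom(…) through channels ROMaddr and ROMdata, and to ram(…) through RAMaddr, datatoRAM, and datafromRAM. The ROM contents come from file gcd.bin; the RAM uses files RAMin and RAMout.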

dlx0.ht
#include "types.ht"
& dlx0 : export proc ( ROMaddr!chan adtype
                     & ROMdata?chan word
                     & RAMaddr!chan rwadtype
                     & datatoRAM!chan S30
                     & datafromRAM?chan S30
                     ) .
  begin
    …
    RF: ram array U5 of S30
  end

system_dlx0.ht
#include "dlx0.ht"
& dlx0 : proc ( ROMaddr!chan adtype
              & ROMdata?chan word
              & RAMaddr!chan rwadtype
              & datatoRAM!chan S30
              & datafromRAM?chan S30
              ) . import
& env_dlx4 : main proc ( & ROMfile? chan word
                       & RAMinfile? chan S30
                       & RAMfile! chan S30 /* <<address,data>> */
                       ) .
  begin
    /* main body: next slide */
  end

system_dlx0.ht: main body
begin
  & ROMaddr    : chan adtype
  & ROMdata    : chan word
  & RAMaddr    : chan rwadtype
  & datatoRAM  : chan S30
  & datafromRAM: chan S30
  …
  & ROMinterface : proc() . begin .. end
  & RAMinterface : proc() . begin .. end
|
  initialise()
; ROMinterface() || RAMinterface() || dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
end

Script
htcomp -B system_dlx0
htsim -limit 1000 system_dlx0 gcd.bin RAMin RAMout
htview system_dlx0