University of Amsterdam Computer Systems – the processor architecture Arnoud Visser 1 Computer Systems The processor architecture.


Slide 2: Basic Knowledge
The relative timing of the elements is important.

Slide 3: Programmer-visible state
Von Neumann architecture: both instructions and data reside in memory.
– Program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
– Condition codes (CC)
– Program counter (PC)
– Memory

Slide 4: Program counter
– The program counter (PC) holds the address of the instruction currently being executed.
– The next instruction has to be fetched from memory (slow!).
[Figure: virtual address space of the hello program, from high to low addresses]
0xffffffff
  Kernel virtual memory (memory invisible to user code)
0xc0000000
  User stack (created at runtime)
  Memory mapped region for shared libraries, from 0x40000000 (contains the printf() function)
  Run-time heap (created at runtime by malloc)
  Read/write data (loaded from the hello executable file)
  Read-only code and data, from 0x08048000 (loaded from the hello executable file)
  Unused
0

Slide 5: Processing a single instruction
– Fetch: read the instruction (1-6 bytes) from memory
– Decode: read the operand values from the registers
– Execute: perform an arithmetic/logic operation OR test the jump conditions
– Memory: read from or write to memory
– Write back: update the registers
– PC update: set the address of the next instruction

Slide 6: Seq. architecture
Hardware blocks connected with named wires (word & bytes, byte & bits, bit).
[Figure: SEQ datapath with the stages Fetch (PC, instruction memory, PC increment), Decode and Write back (register file with ports A, B, M, E), Execute (ALU, CC) and Memory (data memory)]
[Figure: fetch logic. The instruction memory is read at PC; Split divides byte 0 into icode and ifun; Align extracts rA, rB and valC from bytes 1-5; control signals Instr valid, Need regids and Need valC; PC increment computes valP]

Slide 7: Stage Computation: ALU Operation
– Formulate instruction execution as a sequence of simple steps
– Use the same general form for all instructions

OPl rA, rB
Fetch
  icode:ifun ← M1[PC]       Read instruction byte
  rA:rB ← M1[PC+1]          Read register byte
  valP ← PC+2               Compute next PC
Decode
  valA ← R[rA]              Read operand A
  valB ← R[rB]              Read operand B
Execute
  valE ← valB OP valA       Perform ALU operation
  Set CC                    Set condition code register
Memory
  (no operation)
Write back
  R[rB] ← valE              Write back result
PC update
  PC ← valP                 Update PC
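The stage table for OPl can be traced in software. The following is a minimal sketch, not the simulator from the course: the function name `step_opl`, the mapping of ifun values to operations (0 add, 1 sub, 2 and, 3 xor) and the register numbering are assumptions taken from the usual Y86 encoding, and the overflow flag is left out for brevity.

```python
import operator

# Assumed Y86 mapping of ifun to ALU operation (not stated on the slide).
ALU_OPS = {0: operator.add, 1: operator.sub, 2: operator.and_, 3: operator.xor}


def step_opl(pc, regs, mem):
    """Run one 'OPl rA, rB' instruction through the six stages.

    regs is a list of 8 register values, mem a bytearray.
    Returns the new PC and the condition codes (ZF and SF only).
    """
    # Fetch: split byte 0 into icode:ifun and byte 1 into rA:rB.
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    valP = pc + 2
    # Decode: read both operands from the register file.
    valA, valB = regs[rA], regs[rB]
    # Execute: valE <- valB OP valA, truncated to 32 bits, then set CC.
    valE = ALU_OPS[ifun](valB, valA) & 0xFFFFFFFF
    cc = {"ZF": valE == 0, "SF": bool(valE >> 31)}
    # Memory: nothing to do for OPl.
    # Write back: the result replaces operand B.
    regs[rB] = valE
    # PC update: continue with the next sequential instruction.
    return valP, cc


# Example: bytes 0x61 0x23 encode "subl %edx, %ebx" under the assumed encoding.
regs = [0, 0, 5, 5, 0, 0, 0, 0]
new_pc, cc = step_opl(0, regs, bytearray([0x61, 0x23]))
```

In the example both operands are 5, so the subtraction leaves 0 in %ebx and sets the zero flag.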

Slide 8: Stage Computation: procedure call
– Use the ALU to decrement the stack pointer
– Store the incremented PC

call Dest
Fetch
  icode:ifun ← M1[PC]       Read instruction byte
  valC ← M4[PC+1]           Read destination address
  valP ← PC+5               Compute return point
Decode
  valB ← R[%esp]            Read stack pointer
Execute
  valE ← valB + (–4)        Decrement stack pointer
Memory
  M4[valE] ← valP           Write return address on stack
Write back
  R[%esp] ← valE            Update stack pointer
PC update
  PC ← valC                 Set PC to destination
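The same trace for a procedure call, again as a hedged sketch: `step_call` is a hypothetical helper, the register number 4 for %esp and little-endian storage of 4-byte words are assumptions from the usual Y86 conventions.

```python
ESP = 4  # assumed register number of %esp


def step_call(pc, regs, mem):
    """Run one 'call Dest' instruction; returns the new PC."""
    # Fetch: the 4-byte destination follows the instruction byte.
    valC = int.from_bytes(mem[pc + 1:pc + 5], "little")
    valP = pc + 5  # return point
    # Decode: read the stack pointer.
    valB = regs[ESP]
    # Execute: the ALU adds -4 to the stack pointer.
    valE = (valB + -4) & 0xFFFFFFFF
    # Memory: M4[valE] <- valP, the return address goes on the stack.
    mem[valE:valE + 4] = valP.to_bytes(4, "little")
    # Write back: store the decremented stack pointer.
    regs[ESP] = valE
    # PC update: jump to the destination.
    return valC


# Example: "call 0x20" at address 0 with the stack pointer at 0x30.
mem = bytearray(64)
mem[0] = 0x80
mem[1:5] = (0x20).to_bytes(4, "little")
regs = [0] * 8
regs[ESP] = 0x30
new_pc = step_call(0, regs, mem)
```

After the call the stack pointer is 0x2c, the word at that address holds the return point 5, and execution continues at 0x20.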

Slide 9: Stage Computation: jump
– Compute both addresses
– Choose based on the setting of the condition codes and the branch condition

jXX Dest
Fetch
  icode:ifun ← M1[PC]       Read instruction byte
  valC ← M4[PC+1]           Read destination address
  valP ← PC+5               Fall through address
Decode
  (no operation)
Execute
  Bch ← Cond(CC, ifun)      Take branch?
Memory
  (no operation)
Write back
  (no operation)
PC update
  PC ← Bch ? valC : valP    Update PC

Slide 10: Branch conditions

jXX   Encoding (icode ifun)   Condition codes     Description
jmp   7 0                     1                   Direct jump
jle   7 1                     (SF^OF) | ZF        Less or equal (<=)
jl    7 2                     SF^OF               Less (<)
je    7 3                     ZF                  Equal (==)
jne   7 4                     ~ZF                 Not equal (!=)
jge   7 5                     ~(SF^OF)            Greater or equal (>=)
jg    7 6                     ~(SF^OF) & ~ZF      Greater (>)
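The branch conditions combine with the PC update of the jump instruction as follows. This is a small sketch; the names `cond` and `next_pc` and the dictionary representation of the condition codes are chosen for this example only.

```python
def cond(cc, ifun):
    """Evaluate the branch condition selected by ifun on the flags in cc."""
    zf, sf, of = cc["ZF"], cc["SF"], cc["OF"]
    less = sf != of  # SF ^ OF
    table = {
        0: True,                     # jmp: always taken
        1: less or zf,               # jle
        2: less,                     # jl
        3: zf,                       # je
        4: not zf,                   # jne
        5: not less,                 # jge
        6: (not less) and (not zf),  # jg
    }
    return table[ifun]


def next_pc(cc, ifun, valC, valP):
    """PC update for jXX: the target if the branch is taken, else fall through."""
    return valC if cond(cc, ifun) else valP


# Example: flags after comparing two equal values.
equal_flags = {"ZF": True, "SF": False, "OF": False}
taken = [name for name, ifun in
         [("jmp", 0), ("jle", 1), ("jl", 2), ("je", 3),
          ("jne", 4), ("jge", 5), ("jg", 6)]
         if cond(equal_flags, ifun)]
```

With only ZF set, the taken branches are jmp, jle, je and jge.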

Slide 11: Datapaths & Control Logic
Execute logic:
– ALU fun: select the function
– ALU A: select input A
– ALU B: select input B
– Set CC: should the condition code register be loaded?

Slide 12: Control logic: ALU A

Instruction         Execute stage           Description
OPl rA, rB          valE ← valB OP valA     Perform ALU operation
rmmovl rA, D(rB)    valE ← valB + valC      Compute effective address
popl rA             valE ← valB + 4         Increment stack pointer
jXX Dest            (none)                  No operation
call Dest           valE ← valB + (–4)      Decrement stack pointer
ret                 valE ← valB + 4         Increment stack pointer

    int aluA = [
        icode in { IRRMOVL, IOPL } : valA;
        icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
        icode in { ICALL, IPUSHL } : -4;
        icode in { IRET, IPOPL } : 4;
        # Other instructions don't need ALU
    ];
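The HCL case expression selects the first matching case, which can be mirrored directly in software. A minimal sketch, using the instruction-code names from the HCL as plain strings and returning None where the ALU input is unused (both are choices made for this example):

```python
def alu_a(icode, valA, valC):
    """Select the ALU A input for an instruction, following the aluA HCL."""
    if icode in {"IRRMOVL", "IOPL"}:
        return valA
    if icode in {"IIRMOVL", "IRMMOVL", "IMRMOVL"}:
        return valC
    if icode in {"ICALL", "IPUSHL"}:
        return -4
    if icode in {"IRET", "IPOPL"}:
        return 4
    return None  # other instructions don't need the ALU


# Example: a call decrements the stack pointer, a ret increments it.
call_input = alu_a("ICALL", valA=7, valC=0x20)
ret_input = alu_a("IRET", valA=7, valC=0x20)
```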

Slide 13: Hardware structure
This can be translated into silicon.
[Figure: complete SEQ hardware structure. Fetch: PC, instruction memory, PC increment, producing icode, ifun, rA, rB, valC, valP. Decode and Write back: register file (ports A, B, M, E) with srcA, srcB, dstE, dstM, producing valA, valB. Execute: ALU with ALU A, ALU B, ALU fun. and CC, producing valE and Bch. Memory: data memory with Mem. control (read, write), Addr and Data, producing valM. PC update: New PC, producing newPC.]

Slide 14: Sequential is too slow
– The clock has to be slow enough to let the signals propagate through all wires and transistors.
– Critical path: the slowest path between any two storage devices.

Slide 15: Pipelining
– Divide the operation into stages, and allow the next operation to start as soon as the previous operation has finished the first stage.
– This increases the throughput, but also increases the latency.
[Figure: three-stage pipeline driven by one clock: Comb. logic A (100 ps), Reg (20 ps), Comb. logic B (100 ps), Reg (20 ps), Comb. logic C (100 ps), Reg (20 ps)]
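The trade-off between throughput and latency follows from the stage delays in the figure. A minimal sketch, assuming that the clock period is set by the slowest stage plus one register delay and that one instruction completes per cycle; `pipeline_timing` is a name chosen for this example.

```python
def pipeline_timing(stage_delays_ps, reg_delay_ps):
    """Return (clock period in ps, latency in ps, throughput in GIPS)."""
    # The slowest stage plus the register load time limits the clock.
    clock_ps = max(stage_delays_ps) + reg_delay_ps
    # An instruction needs one clock cycle per stage.
    latency_ps = clock_ps * len(stage_delays_ps)
    # One instruction per cycle: 1 / clock_ps per picosecond = 1000 / clock_ps GIPS.
    throughput_gips = 1000.0 / clock_ps
    return clock_ps, latency_ps, throughput_gips


# The three-stage pipeline of the figure, and the same logic as one stage.
pipelined = pipeline_timing([100, 100, 100], 20)
unpipelined = pipeline_timing([300], 20)
```

For these numbers the pipelined version has a 120 ps clock and a latency of 360 ps at about 8.33 GIPS, against 320 ps latency and 3.125 GIPS for the single-stage version.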

Slide 16: Insert registers between stages
Pipeline registers mean extra silicon and extra delay.
[Figure: pipelined datapath with pipeline registers F, D, E, M and W between the stages Fetch, Decode, Execute, Memory and Write back. Signals shown: f_PC, predPC, valP, icode, ifun, rA, rB, valC, d_srcA, d_srcB, valA, valB, aluA, aluB, Bch, valE, Addr, Data, valM, M_icode, M_Bch, M_valA, W_icode, W_valE, W_valM, W_dstE, W_dstM]

Cycle   1  2  3  4  5  6  7  8  9
I1      F  D  E  M  W
I2         F  D  E  M  W
I3            F  D  E  M  W
I4               F  D  E  M  W
I5                  F  D  E  M  W

In cycle 5: I1 is in W, I2 in M, I3 in E, I4 in D, I5 in F.
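The pipeline diagram follows a simple rule: instruction k enters Fetch in cycle k and advances one stage per cycle. A small sketch of that rule, assuming no stalls or bubbles; `stage_of` is a hypothetical helper.

```python
STAGES = "FDEMW"


def stage_of(instr, cycle):
    """Stage occupied by instruction number instr (1-based) in a given cycle.

    Returns None if the instruction has not started yet or has already finished.
    """
    index = cycle - instr  # instruction k is fetched in cycle k
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


# Snapshot of the pipeline in cycle 5 for instructions I1 to I5.
cycle5 = {f"I{k}": stage_of(k, 5) for k in range(1, 6)}
```

The snapshot for cycle 5 gives W, M, E, D and F for I1 through I5, matching the diagram.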

Slide 17: Data hazards
Additional pipeline control is needed to prevent unintended interactions between instructions:
– Stalling (wait a few cycles until the hazard is gone)
– Data forwarding (passing a value to the Execute stage before it has gone through Memory/Write back)
A pipeline architecture was already used for the i386: http://www.pcmech.com/show/processors/35/

Slide 18: Pipeline efficiency
Pipeline control can prevent many, but not all, interactions between instructions; the remaining ones cause bubbles.
For the model described in the book:
– Load/use hazards (20% of load instructions → 1 bubble)
– Mispredicted branches (40% of jump instructions → 2 bubbles)
– Return from procedure calls (100% of ret instructions → 3 bubbles)
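The bubble counts translate into an average number of cycles per instruction (CPI) once an instruction mix is fixed. The sketch below uses the hazard rates and bubble counts from the slide; the instruction frequencies (25% loads, 20% jumps, 2% returns) are an assumption for illustration and do not appear on the slide.

```python
# Hazard rate and bubbles per hazard, as listed on the slide.
HAZARD_RATE = {"load": 0.20, "jump": 0.40, "ret": 1.00}
BUBBLES = {"load": 1, "jump": 2, "ret": 3}

# ASSUMED instruction mix (fraction of all executed instructions).
ASSUMED_FREQUENCY = {"load": 0.25, "jump": 0.20, "ret": 0.02}


def cpi(frequency, hazard_rate, bubbles):
    """Cycles per instruction: 1 plus the average number of bubbles."""
    penalty = sum(frequency[kind] * hazard_rate[kind] * bubbles[kind]
                  for kind in bubbles)
    return 1.0 + penalty


average_cpi = cpi(ASSUMED_FREQUENCY, HAZARD_RATE, BUBBLES)
```

Under this assumed mix the penalties are 0.05, 0.16 and 0.06 bubbles per instruction, so the CPI comes out at about 1.27; mispredicted branches dominate.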

Slide 19: Today’s architectures
– Superscalar (Pentium): often two instructions per cycle
– Dynamic execution (P6): three instructions per cycle, out of order
– Explicit parallelism (Itanium): six execution units

Slide 20: Metrics of performance
[Figure: levels of a computer system from top to bottom: Application, Programming Language, Compiler, ISA, Datapath and Control, Function Units, Transistors, Wires, Pins]
Metrics:
– Answers per month
– Scaling of algorithms
– (Millions of) instructions per second: MIPS
– (Millions of) floating-point operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)
Each metric has a place and a purpose, and each can be optimized.

Slide 21: Summary
Shown that an instruction set architecture can be translated onto multiple processor architectures:
– Complicated control logic on the datapaths
– Compilers have to optimize for the control logic of multiple machines/targets
– A programmer can aid or frustrate the compiler

Slide 22: Assignment
Practice Problem 4.21 (page 314): calculate the throughput and latency of an n-stage pipeline for the given 6 blocks.
Block delays: A 80 ps, B 30 ps, C 60 ps, D 50 ps, E 70 ps, F 10 ps; pipeline register (Reg) 20 ps.
