
1 Computer = ALU + Memory. [Diagram: memory, registers and ALU, with operands 3 and 2 and result 5.] Let’s try to compute 3 + 2 = 5. Go to jail and do not collect £200.

2

3 GPR Architecture (General Purpose Register). Let’s compute 3 + 2 = 5 again! [Diagram: registers and ALU connected by bus X, bus Y and bus W.] Steps: (1) Put 3 on bus X. (2) Put 2 on bus Y. (3) Stuff X and Y into the ALU. (4) The ALU adds X and Y. (5) The ALU sends the result to bus W. (6) Put bus W into memory. Our programmer needs to do this!

4

5 GPR Machine Details. [Diagram: memory at byte addresses 0, 8, 16, 24, 32, ...; registers r0, r1, r2, r3, r4, ..., r10, r11; ALU; paths labelled “Load from Memory” and “Store to Memory”.] Steps: (1) Load reg from mem. (2) Add reg to reg into reg. (3) Store reg in mem. Our programmer needs to do this!

6

7 Accumulator Architecture. [Diagram: memory at byte addresses 0, 8, 16, 24, 32, ...; ALU; accumulator.] Get 3 from memory and ADD! 1. Assume 8 is already in the accumulator. The programmer writes “Add 3”. 2. The ALU does 3 + 8 = 11 and writes the result back into the accumulator.

8

9 Let’s build a Computer. Let’s take a RISC. What do we need? Memory; registers; ALU; control circuits; a programming language; a good name, simple although meaningful.

10

11 What’s needed to build Sam-4? Code memory, addressed by the PC, to store the program. An arithmetic-logic unit (ALU) to do the maths business. Registers (r0, r1, r2, with read ports X and Y and write port W) to hold results of computations. Data memory (addresses 0, 1, ..., 7, with mar and mdr) to hold the source and results of our work.

12

13 Program Memory. Memory stores program instructions at a sequence of byte addresses. Each instruction is 32 bits, so the addresses increment by 4 bytes. Here the Program Counter (PC = 4) inputs address 4 to the memory, which reads out the data word (32 bits) at address 4. This is the instruction ‘add’. [Diagram: code memory holding load, store, add and halt instructions at addresses 0, 4, 8, 12, with ports “Address in” and “Data out”.]

14

15 Registers, Registers. 1. Registers store data at addresses. Yep, that’s memory! 2. There are TWO read ports (X and Y) where data can be simultaneously read out of the reg file. 3. Multiport registers have an input port (W) where data is sent to be written into the register file. 4. The addresses for the read ports (X and Y) and the write port (W) come in here. [Diagram: register file r0, r1, r2 with ports X, Y and W.]

16

17 Data Memory. Here’s the memory (addresses 0, 1, ..., 7). The Memory Data Register (MDR) is a parking place for data coming and going from the memory. The Memory Address Register (MAR) holds the address of the data location selected for read or write, e.g. 7.

18

19 Here’s Sam. [Diagram: code memory, instruction reg, register file (r0, r1, r2; ports X, Y, W), ALU, and data memory (addresses 0, 1, ..., 7) with mar and mdr.]

20

21 Fetch-Execute Cycle (Much more Clever and Useful). 1. Fetch instruction from memory. 2. Decode the opcode and read any registers. 3. Do any ALU operations. 4. Do any memory access. 5. Write back results to registers. For add r3,r2,r1 the steps are: decode and register read, ALU <- r1 and ALU <- r2; ALU operation, ALU add; memory access, none needed; write back, r3 <- ALU. [Diagram label: “Get contents of address 1”.]

22

23 First Example. ld r0, [1]: load r0 with data at address 1. ld r1, [2]: load r1 with data at address 2. add r2,r1,r0: add r0 and r1, put result in r2. st r2, [7]: store r2 in memory address 7. Note each of these instructions runs through the 5 steps of its own F-E cycle.
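
The four-instruction example can be traced with a small interpreter. This is only a sketch: the tuple encoding of instructions, the function name `run` and the data values 3 and 2 placed at addresses 1 and 2 are assumptions made for illustration, not part of Sam’s definition.

```python
# Hypothetical mini-interpreter for the ld / add / st example.
# Instruction tuples and the data values are assumed for illustration.

def run(program, memory):
    """Execute ld/add/st instructions and return the final registers."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0  # byte address; each instruction occupies 4 bytes
    while pc // 4 < len(program):
        op, *args = program[pc // 4]   # fetch and decode
        pc += 4                        # PC moves on to the next instruction
        if op == "ld":                 # ld rd, [addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "add":              # add rd, rs, rt
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "st":               # st rs, [addr]
            rs, addr = args
            memory[addr] = regs[rs]
    return regs

program = [
    ("ld", "r0", 1),
    ("ld", "r1", 2),
    ("add", "r2", "r1", "r0"),
    ("st", "r2", 7),
]
memory = [0] * 8
memory[1], memory[2] = 3, 2            # assumed operands
regs = run(program, memory)
print(memory[7])
```

With the assumed operands the sum ends up both in r2 and in data memory at address 7.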

24

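25 1. Instruction Fetch. Instruction: ld r0,[1]; PC = 0. [Diagram: Sam datapath (code memory, data memory with mar and mdr, ALU, registers r0, r1, r2 with ports X, Y, W); the instruction register holds the fields ld, 0, 1.]

26

27 2. Decode, Reg Ops. Instruction: ld r0,[1]; PC = 4. [Diagram: Sam datapath with a ‘+’ unit at the PC; instruction register fields ld, 0, 1; the value 1 is highlighted.]

28

29 3. ALU Operation. Instruction: ld r0,[1]; PC = 4. [Diagram: Sam datapath; instruction register fields ld, 0, 1; the value 1 is shown along the ALU path.]

30

31 4. Memory Access. Instruction: ld r0,[1]; PC = 4. [Diagram: Sam datapath; instruction register fields ld, 0, 1; data memory is accessed through mar and mdr.]

32

33 5. Register Write. Instruction: ld r0,[1]; PC = 4. [Diagram: Sam datapath; instruction register fields ld, 0, 1; the write port W is highlighted.]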

34

35 1. Instruction Fetch. Instruction: add r2,r0,r1; PC = 4. [Diagram: Sam datapath (code memory, data memory with mar and mdr, ALU, registers r0, r1, r2 with ports X, Y, W); the instruction register holds the fields add, 2, 0, 1.]

36

37 2. Decode, Reg Ops. Instruction: add r2,r0,r1; PC = 8. [Diagram: Sam datapath with a ‘+’ unit at the PC; instruction register fields add, 2, 0, 1; read port Y is highlighted.]

38

39 3. ALU Operation. Instruction: add r2,r0,r1; PC = 8. [Diagram: Sam datapath; instruction register fields add, 2, 0, 1.]

40

41 4. Memory Access. Instruction: add r2,r0,r1; PC = 8. [Diagram: Sam datapath; instruction register fields add, 2, 0, 1.]

42

43 5. Register Write. Instruction: add r2,r0,r1; PC = 8. [Diagram: Sam datapath; instruction register fields add, 2, 0, 1; the write port W is highlighted.]

44

45 Instruction Encoding. Example: add rd, rs, rt means rd <- rs + rt, e.g. add r3, r1, r2 means r3 = r1 + r2. All Sam’s instructions take up 32 bits. Sam’s instructions start with the opcode, then the destination register, then the source registers. The first 6 bits are for the opcode. Fields, from bit 32 down to bit 1: opcode (6 bits) | rd, destination (5 bits) | rs, source (5 bits) | rt, source (5 bits) | unused (11 bits). Bit pattern shown: 010110 | 00011 | 00010 | 00001 | unused.

46

47 The Instruction Register. Loaded with the instruction, the IR decodes this into bits which drive the CPU digital logic circuits over electronic wires. Example: add r2,r1,r3 arrives from code memory as 010110 | 00010 | 00001 | 00011 | unused, that is, the fields add, 2, 1, 3.
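
The 6/5/5/5/11-bit layout can be checked by packing and unpacking a word. A minimal sketch, assuming the opcode sits in the most significant bits; the helper names `encode` and `decode` are made up for illustration, and the add opcode 010110 is the one printed on the instruction register slide.

```python
OPCODE_ADD = 0b010110  # add opcode as printed on the instruction register slide

def encode(opcode, rd, rs, rt):
    """Pack opcode(6) | rd(5) | rs(5) | rt(5); the low 11 bits stay unused (0)."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register number must fit in 5 bits")
    return (opcode << 26) | (rd << 21) | (rs << 16) | (rt << 11)

def decode(word):
    """Unpack a 32-bit word into (opcode, rd, rs, rt)."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
    )

word = encode(OPCODE_ADD, 2, 1, 3)   # add r2, r1, r3
bits = format(word, "032b")
print(bits[:6], bits[6:11], bits[11:16], bits[16:21])
```

The first 21 bits of `bits` reproduce the pattern shown for add r2,r1,r3, and `decode(word)` recovers the four fields.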

48

49 Control Path. add r2, r1, r3 is 001010 | 00010 | 00001 | 00011 | unused. sub r2, r1, r3 is 000101 | 00010 | 00001 | 00011 | unused. The add instruction is decoded and produces digital signals which select the + function in the ALU: Add! The sub instruction, when decoded, produces different digital signals, which select the - function: Subtract! In both cases the ALU inputs are r1 and r3.

50

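51 Sam and MIPS are 32 bit. Every instruction is 32 bits wide. Three formats are shown. add rd,rs,rt: opcode | rd | rs | rt | unused (example 001010 | 00110 | 01001 | 00011 | unused). ldr rd,[rs+c]: opcode | rd | rs | 16-bit address. ldr rd,[c]: opcode | 26-bit address. Further bit patterns shown: 001010101001111110010101011011111 and 00101000010000010101001111111011.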

52

53 Other Arithmetic Instructions. Format: sub | rd | rs | rt | unused, meaning rd <- rs - rt. Fields: opcode (6 bits), destination (5 bits), two source regs (5 bits each). The same coding applies to other arithmetic instructions: sub r3,r2,r1; and r2,r1,r0; or r5,r1,r2.

54

55 A simple ‘Load’ instruction. Format: ld | rd | rs | unused (opcode, destination, single source reg). ‘Load into rd the contents of memory at the address which is in reg rs.’ Simple! Example: ldr r9, [r1]. Memory (address: data): 6: 96, 5: 231, 4: 115, 3: 145. 1. Let’s say we have already loaded r1 with 3. 2. Get data from mem at addr r1 (= 3). 3. Load the data (145) into r9.

56

57 A more complex ‘Load’. Format: ldr | rd | rs | constant c (opcode, destination, source). Load register rd with the contents of memory which you find at address rs + c. Example: ldr r9, [r1 + 2] with r1 = 3. The mem address is formed as a sum: 3 + 2 = 5. Memory (address: data): 6: 96, 5: 231, 4: 115, 3: 145, so r9 receives 231.

58

59 … and a ‘Store’ instruction. Format: str | rd | rs | constant c (opcode, destination, source). Note here the data is moved from the ‘destination’ register to the store. Confusing? Mm. Example: str r9, [r1 + 2] with r1 = 3 and r9 = 196. The address is 3 + 2 = 5. 1. Get data from r9. 2. Write it to memory. Memory afterwards (address: data): 6: 96, 5: 196, 4: 115, 3: 145. What’s this?

60

61 ‘Load Immediate’. Format: ldi | rd | constant C (opcode, destination). In load immediate we get the constant C immediately following the opcode into the reg. Example: ldi r9, 5 loads ‘5’ straight into r9. All reference to memory has gone!
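
The load, store and load-immediate slides can be replayed in a few lines. A sketch only: the dictionary model of memory and registers and the helper names `ldr`, `str_` and `ldi` are assumptions for illustration; the memory contents are read off the slides’ memory diagram (address 3 holds 145, 4 holds 115, 5 holds 231, 6 holds 96).

```python
# Memory contents read off the slides' memory diagram (address -> data).
memory = {3: 145, 4: 115, 5: 231, 6: 96}
regs = {"r1": 3, "r9": 0}

def ldr(rd, rs, c=0):
    """ldr rd, [rs + c]: load rd from memory at address rs + c."""
    regs[rd] = memory[regs[rs] + c]

def str_(rd, rs, c=0):
    """str rd, [rs + c]: write register rd to memory at address rs + c."""
    memory[regs[rs] + c] = regs[rd]

def ldi(rd, constant):
    """ldi rd, C: put the constant straight into rd, no memory access."""
    regs[rd] = constant

ldr("r9", "r1")          # ldr r9, [r1]
simple = regs["r9"]
ldr("r9", "r1", 2)       # ldr r9, [r1 + 2]
indexed = regs["r9"]
ldi("r9", 196)           # put 196 in r9 so the store matches the slide
str_("r9", "r1", 2)      # str r9, [r1 + 2]
print(simple, indexed, memory[5])
```

The trailing underscore in `str_` only avoids shadowing Python’s built-in `str`.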

62

63 A Summary So Far … Instruction form, then example. add rd,rs,rt: add r3,r1,r1. str rd, [rs + c]: str r6,[r1 + 1]. st rd, [rs]: str r0, [r1]. ldr rd, [rs + c]: ldr r2,[r3 + 4]. ld rd, [rs]: ldr r2,[4]. ldi rd,C: ldi r0,3. Now it’s time to move on and look in detail at the hierarchy of computer languages, to see the influence on the ISA.

64

65 Assembling a Spreadsheet. Layers shown: Application (Excel); HLL: Main() { int f,g,h; f = g + h; }; Assembler: ld r0, [ g ]; ld r1, [ h ]; add r2,r0,r1; st r2, [ f ]; ISA; Implementation; Electronics. The Great Idea here is that the ISA we need at the bottom must serve the grand master at the top, the Application. The ISA must support the HLL implementation.

66

67 Arrays (= Tables). How do we sum the array of numbers in column B? 1. We would use the instruction ld r1,[r0 + B] where B = 3, the start address of the array. 2. Then we load r0 with 0, then 1, then 2, … to scan down the array. Code: ld r0, 0; ld r3, 0; ld r1, [r0 + 3]. The first load reads address r0 (= 0) + 3 = 3.

68

69 Arrays (= Tables). How do we sum the array of numbers in column B? Get the next cell, load its value and add it to the sum in r3. Code: inc r0; ld r1, [r0 + 3]; add r3,r3,r1. 1. Increment r0 to get the next data value: inc r0 (0 + 1 = 1). 2. ld r1,[r0 + B] where B = 3, the start address of the array, but now r0 contains 1.
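
The scan down column B amounts to a loop over an index register. A sketch under assumptions: the function name `sum_array`, the element count and the cell values are invented for illustration; only the base address 3 and the roles of r0 (index), r1 (loaded cell) and r3 (sum) come from the slides.

```python
def sum_array(memory, base, count):
    """Sum `count` cells starting at address `base`, Sam style."""
    r0 = 0                       # index register
    r3 = 0                       # running sum
    while r0 < count:
        r1 = memory[r0 + base]   # ld r1, [r0 + base]
        r3 = r3 + r1             # add r3, r3, r1
        r0 = r0 + 1              # inc r0
    return r3

# Addresses 3..6 hold the made-up column values 10, 20, 30, 40.
memory = [0, 0, 0, 10, 20, 30, 40]
total = sum_array(memory, base=3, count=4)
print(total)
```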

70

71 Making Decisions. Let’s say we want to add 2 to a number B if another number C is equal to 10. You mean, ‘If C = 10, then add 2 to B’. Yep. Here’s how we would do it in C: if(c == 10) b = b + 2; What about SAM? First load the test number 10: ldi r1,10. Then branch around the add: bne r2,r1,36 (branch to addr 36 if r2 and r1 are not equal). The add itself is addi r3,r3,2. [Diagram: code memory at addresses 16, 20, 24, 28, 32, 36.]

72

73 Loops. Let’s say we want to make the sequence 0, 3, 6, 9, 12 and stop. We take 4 steps and each step adds 3: x = x + 3. So we need a register to keep track of the number of steps (r0, which counts 0, 1, 2, 3, 4) and a register to hold the sum at each step (r2, which goes 0, 3, 6, 9, 12). Code (address: instruction): 0: ldi r0, 0. 4: ldi r1, 4. 8: ldi r2, 0. 12: addi r2, r2, 3. 16: addi r0, r0, 1. 20: bne r0,r1,12 (branch unless r0 = r1 = 4).
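
The loop can be stepped through with a program counter to confirm the sequence. A sketch, assuming the instructions sit at byte addresses 0 to 20 in the order ldi r0, ldi r1, ldi r2, addi r2, addi r0, bne; the tuple format and the name `run_loop` are illustrative only.

```python
def run_loop():
    """Run the Sam loop and return (registers, every value written to r2)."""
    program = {
        0: ("ldi", "r0", 0),
        4: ("ldi", "r1", 4),
        8: ("ldi", "r2", 0),
        12: ("addi", "r2", "r2", 3),
        16: ("addi", "r0", "r0", 1),
        20: ("bne", "r0", "r1", 12),
    }
    regs = {"r0": 0, "r1": 0, "r2": 0}
    trace = []
    pc = 0
    while pc in program:               # stop when we run off the end
        op, *args = program[pc]
        pc += 4
        if op == "ldi":                # ldi rd, C
            rd, constant = args
            regs[rd] = constant
        elif op == "addi":             # addi rd, rs, C
            rd, rs, constant = args
            regs[rd] = regs[rs] + constant
        elif op == "bne":              # bne rs, rt, target
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = target
            continue
        if rd == "r2":                 # record each value written to r2
            trace.append(regs["r2"])
    return regs, trace

regs, trace = run_loop()
print(trace)
```

The branch is taken three times and falls through on the fourth pass, when r0 has reached 4.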

74

75 [CBP 2005, Comp3070 Computer Architecture] Some x86 instructions. mov ax, [bx + c]; mov [ax], bx; add ax, bx: these look rather like Sam’s RISC ops. add [bx], ax: but this does not. Here the contents of ax are being added straight into memory! The x86 is a register-memory ISA and Sam is a register-register ISA. Let’s compare the RR and RM ISAs. Sam: ldi r1, a; ldi r2, b; add r3,r1,r2; st r3, b. Intel x86: mov ax, a; add b,ax. Clearly RR needs more code memory (four instructions against two) while RM uses stronger operations.

76

77 Intel Instruction Format. [Figure: IA-32 instruction format.]

78

79 Variable Length Instructions. All Sam’s instructions have the same length, 32 bits. This is also true for other RISC ISAs such as SPARC and MIPS. Compare this with the x86, whose instructions vary from 1 to 17 bytes. Here are some stats. [Chart: frequency of use (0% to 30%) against instruction length (1 to 10 bytes) for Espresso, Gcc, Spice and Nasa.] Clearly long complex instructions are used infrequently. But the use does depend on the app.

80

81 Instruction Timing. [Diagram: clock cycles T1 to T5, one step per clock cycle: Fetch; Decode, Reg Op; ALU Op; Mem Access; Reg Write.] All Sam’s instructions take 5 clock cycles. A 1 gigahertz SPARC runs through 1 giga clock cycles in 1 second. That’s 10^9 cycles. That’s 1,000,000,000 cycles. That’s 200,000,000 add ops!
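
The 200,000,000 figure is simply the clock rate divided by the cycles per instruction. A tiny sketch, with the function name `instructions_per_second` chosen for illustration and no pipelining assumed:

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Instructions completed per second when instructions do not overlap."""
    return clock_hz // cycles_per_instruction

sam_rate = instructions_per_second(10**9, 5)   # 1 GHz clock, 5 cycles each
print(sam_rate)
```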

82

83 Variable Time Instructions. Here’s a timing diagram for an Intel add: add ax, [bx + c], that is, ax = ax + mem[bx + c]. [Diagram: two passes through T1 to T5 (Fetch; Decode, Reg Op; ALU; Mem Access; Reg Write).] We need two adds: the first to get the address [bx + c] summed up, and the second to actually add memory to register ax.

84

85 Potent x86 Instructions. 1. Application: Greenspan. 2. High-level language (‘C’): strcmp(str, Greenspan); 3. Intel ISA code. Instruction, what it does, and the number shown beside it: mov x,2: immediate to memory, 6. xlat x: translate al via table, 1. imul x: multiply memory with ax, 4. inc x: increment memory by 1, 4. repne scasb: scan string for match!, various.

86

87 Top 10 Intel x86 Instructions. Rank, instruction, usage: 1. load, 22%. 2. conditional branch, 20%. 3. arithmetic / logic, 19%. 4. compare, 16%. 5. store, 12%. 6. move reg-reg, 4%. 7. call-return, 2%. We see that most instructions are simple: load, store, calculate, branch. None of Intel’s potent stuff figures here. So why did Intel design instructions no-one uses?

88

89 ISA R&D into the 80’s. 1980: Berkeley, Patterson, RISC (SPARC). 1981: Stanford, Hennessy, MIPS. Let’s downshift and make things simpler … Use simple instructions: load, store, add. Many of these will do one x86 potent op. Need more memory, but memory is cheap. More CPU cycles, but can still be faster. Emerging design guidelines: easy-to-decode ops; fast issue rate; only load and store reference memory; lots of registers.

90

91 Intel Architecture. Looks great from the outside … but is a golden mishmash with a history of add-ons.

92

93 RISC Architecture. Minimalist. Functional.

94

95 Summary … so far. RISC (SPARC, MIPS): minimalist, something like Zen; all instructions the same length in memory; small number of instructions; small number of addressing modes; simple instructions; 5 clock cycles. CISC (Intel): different lengths in memory; large number of instructions; huge number of addressing modes; complex instructions; variable number of clock cycles.

96

97 Today the consequences of … Intel (CISC) and MIPS (RISC).

98

99 Laundry Model. [Diagram: washer, drier, store, basket, wardrobe.]

100

101 Process Steps. A. Wash then Dry. [Timeline: 9.00, 10.00, 11.00, showing when each machine is running or idle.] 1. Load the washer at 9.00. 2. Done at 10, load the drier. 3. Drier done at 11.

102

103 Sequential Process. 3 loads take 6 hours (9.00 to 15.00). 1. Load washer at 9.00. 2. Done at 10, load drier. 3. Drier done at 11. 4. Reload washer at 11. 5. Done at 12, load drier. 6. Drier done at 13. 7. Reload washer at 13. 8. Done at 14, load drier. 9. Drier done at 15.

104

105 Overlapping Process. 3 loads take 4 hours (9.00 to 13.00). 1. Load washer at 9.00. 2. Done at 10: load drier, reload washer. 3. Both done at 11: reload drier, reload washer. 4. Both done at 12: reload drier. 5. Drier done at 13. From 10.00 till 11.00 both washer and drier are running concurrently.

106

107 Washing Pipeline. 5 cycles!!! 1. Get washing. 2. Wash. 3. Dry. 4. Store. 5. Put away. 5 loads in 9 hours. [Timeline: 9.00 to 18.00, with the pipeline filling at the start.]
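
The laundry timings follow one simple rule: without overlap every load pays for every stage, while in a pipeline the first load takes as many hours as there are stages and each further load finishes one stage-time later. A sketch assuming every stage takes exactly one hour; the function names are illustrative.

```python
def sequential_hours(loads, stages, hours_per_stage=1):
    """Total time when each load finishes all stages before the next starts."""
    return loads * stages * hours_per_stage

def pipelined_hours(loads, stages, hours_per_stage=1):
    """Total time when a new load enters as soon as the first stage is free."""
    if loads == 0:
        return 0
    return (stages + loads - 1) * hours_per_stage

print(sequential_hours(3, 2))   # wash then dry, three loads, no overlap
print(pipelined_hours(3, 2))    # the same three loads, overlapped
print(pipelined_hours(5, 5))    # five loads through the five-stage pipeline
```

These reproduce the 6, 4 and 9 hours quoted on the laundry slides; the same five loads without overlap would take `sequential_hours(5, 5)` hours.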

108

109 Can we Pipeline SAM? [Diagram: Sam datapath (code memory, instruction reg, registers r0, r1, r2 with ports X, Y, W, ALU, data memory with mar and mdr) divided into five stages: 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW.]

110

111 Pipelined Sam4. [Diagram: code memory, register file (r0, r1, r2; ports X, Y, W) and data memory laid out along a time axis as stages 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW, with an element labelled “Buffer”.]

112

113 5 Stages in Pipeline. Let’s take the instruction add r3,r1,r2 and show which stage is needed for each part of the instruction. [Diagram: stages 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW along a time axis, drawn as Mem, Reg, ALU, Mem, Reg; labels: add r3,r1,r2; r1,r2; add; r3.]

114

115 Two Instructions. Two instructions into the pipeline: ld r3,[r0+2] and add r4,r1,r2. [Diagram: both instructions along a time axis; labels for the ld: r0, 2, Mem, r3; labels for the add: r1,r2, ALU, r4.]

116

117 Structural Hazard. Instructions shown: add r4,r1,r2 and st r0,[5]. Here we are being asked to read from memory (fetch) and write to it (store) simultaneously. Impossible! Solution: use separate code and data memories. [Diagram: five overlapping instructions, each drawn as Mem, Reg, ALU, Mem, Reg.]

118

119 Hazardous Washing. The washing basket contains both clean and dirty washing! [Timeline: 9.00 to 18.00.]

120

121 Code and Data Memories. [Diagram: five overlapping instructions, each drawn as Mem, Reg, ALU, Mem, Reg.]

122

123 Data Hazard. add r3,r1,r2 is followed by add r4,r1,r3. r3 is set here, at the end of the first instruction, but the second instruction needs r3 EARLIER! [Diagram: the two instructions along a time axis.]

124

125 Data Hazard. Instructions: add r3,r1,r2; add r4,r1,r3. We need the value of r3 for the second instruction before the first is complete. [Diagram: five overlapping instructions, each drawn as Mem, Reg, ALU, Mem, Reg.]

126

127 Pipeline Stalls. Instructions: add r3,r1,r2; add r4,r1,r3. Resolve the hazard: insert delay, ‘stall’ cycles, into the second instruction stream. But this needs extra electronics on the chip. Complex and costly. [Diagram: overlapping instructions drawn as Mem, Reg, ALU, Mem, Reg, with a stall in the second.]

128

129 Forwarding. Instructions: add r3,r1,r2; add r4,r1,r3. We need the value of r3 for the second instruction before the first is complete. So build in extra circuits to get the data as soon as it is available from the ALU. [Diagram: five overlapping instructions, each drawn as Mem, Reg, ALU, Mem, Reg.]

130

131 Compiler resolves Hazard. Instructions: add r3,r1,r2; add r4,r1,r3. The compiler can detect a possible hazard and insert 2 nops (‘no ops’) between the two instructions. [Diagram: overlapping instructions drawn as Mem, Reg, ALU, Mem, Reg, with nop slots between the two adds.]
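
A compiler pass of this kind can be sketched as a scan that looks back over the last two issued slots. This is an illustration under assumptions, not Sam’s actual toolchain: the instruction tuples `(op, dest, sources)`, the name `insert_nops` and the rule that two intervening slots are always enough (taken from the slide’s two nops) are all hypothetical.

```python
NOP = ("nop", None, ())

def insert_nops(program, gap=2):
    """Pad with nops so no instruction reads a register written by one of
    the previous `gap` slots."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # distance 1 is the slot issued just before this instruction
        for distance, (_, prev_dest, _) in enumerate(reversed(out[-gap:]), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, gap - distance + 1)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out

hazard = [
    ("add", "r3", ("r1", "r2")),
    ("add", "r4", ("r1", "r3")),   # reads r3 straight after it is written
]
padded = insert_nops(hazard)
for instruction in padded:
    print(instruction)
```

If an independent instruction already sits between the two adds, only one nop is inserted, since that instruction fills one of the two slots.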

132

133 Example. Program: ld r1,[7]; ld r2,[8]; add r3,r1,r2. Table columns: op code, regs, alu, mem, reg write. Entries shown: ld, r1, [7]; ld, r2, [8]; add, r1, r2, r3.

134

