Connecting Computer Modules


1 Connecting Computer Modules

2 Connecting
All the units must be connected
Different type of connection for different type of unit
  Memory
  Input/Output
  CPU

3 Computer Modules

4 Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
  Read
  Write
  Timing

5 Input/Output Connection(1)
Similar to memory from computer’s viewpoint Output Receive data from computer Send data to peripheral Input Receive data from peripheral Send data to computer

6 Input/Output Connection(2)
Receive control signals from computer Send control signals to peripherals e.g. spin disk Receive addresses from computer e.g. port number to identify peripheral Send interrupt signals (control)

7 CPU Connection Reads instruction and data Writes out data (after processing) Sends control signals to other units Receives (& acts on) interrupts

8 Buses There are a number of possible interconnection systems Single and multiple BUS structures are most common e.g. Control/Address/Data bus (PC) e.g. Unibus (DEC-PDP)

9 What is a Bus?
A communication pathway connecting two or more devices
Usually broadcast
Often grouped
  A number of channels in one bus
  e.g. a 32 bit data bus is 32 separate single bit channels
Power lines may not be shown

10 Data Bus
Carries data
  Remember that there is no difference between “data” and “instruction” at this level
Width is a key determinant of performance
  8, 16, 32, 64 bit

11 Address bus
Identify the source or destination of data
  e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines maximum memory capacity of system
  e.g. a 16 bit address bus gives a 64k address space
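As a rough illustration of how address bus width bounds the address space, the sketch below counts the locations each width can select. The function name `address_space` is hypothetical, and it assumes one addressable unit per address.

```python
def address_space(address_bits: int) -> int:
    """Number of distinct locations an address bus of this width can select."""
    return 2 ** address_bits


# A 16 bit bus selects 65536 (64k) locations; each extra line doubles that.
for bits in (16, 24, 32):
    print(bits, "address lines ->", address_space(bits), "locations")
```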

12 Control and timing information
Control Bus Control and timing information Memory read/write signal Interrupt request Clock signals

13 Bus Interconnection Scheme

14 Big and Yellow? What do buses look like?
Parallel lines on circuit boards Ribbon cables Strip connectors on mother boards e.g. PCI Sets of wires

15 Single Bus Problems
Lots of devices on one bus leads to:
  Propagation delays
    Long data paths mean that co-ordination of bus use can adversely affect performance
  The bus becomes a bottleneck if aggregate data transfer approaches bus capacity
Most systems use multiple buses to overcome these problems

16 Traditional (ISA) (with cache)

17 High Performance Bus

18 Bus Types
Dedicated
  Separate data & address lines
Multiplexed
  Shared lines
  Address valid or data valid control line
  Advantage: fewer lines
  Disadvantages
    More complex control
    Reduced ultimate performance

19 Bus Arbitration More than one module controlling the bus e.g. CPU and DMA controller Only one module may control bus at one time Arbitration may be centralised or distributed

20 Centralised Arbitration
Single hardware device controlling bus access Bus Controller Arbiter May be part of CPU or separate

21 Distributed Arbitration
Each module may claim the bus Control logic on all modules

22 Timing
Co-ordination of events on bus
Synchronous
  Events determined by clock signals
  Control Bus includes clock line
  A single 1-0 is a bus cycle
  All devices can read clock line
  Usually sync on leading edge
  Usually a single cycle for an event

23 Synchronous Timing Diagram

24 Asynchronous Timing – Read Diagram

25 Asynchronous Timing – Write Diagram

26 PCI Bus
Peripheral Component Interconnect
Intel released to public domain
32 or 64 bit
50 lines

27 PCI Bus Lines (required)
Systems lines Including clock and reset Address & Data 32 time mux lines for address/data Interrupt & validate lines Interface Control Arbitration Not shared Direct connection to PCI bus arbiter Error lines

28 PCI Bus Lines (Optional)
Interrupt lines Not shared Cache support 64-bit Bus Extension Additional 32 lines Time multiplexed 2 lines to enable devices to agree to use 64-bit transfer JTAG/Boundary Scan For testing procedures

29 Transaction between initiator (master) and target Master claims bus
PCI Commands Transaction between initiator (master) and target Master claims bus Determine type of transaction e.g. I/O read/write Address phase One or more data phases

30 PCI Read Timing Diagram

31 PCI Bus Arbitration

32 Memory Structures Mix chapter 4 and 5

33 Characteristics of Memory
Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics Organisation

34 Location CPU Internal External

35 Capacity
Word size
  The natural unit of organization
Number of words
  or Bytes

36 Unit of Transfer
Internal
  Usually governed by data bus width
External
  Usually a block which is much larger than a word
Addressable unit
  Smallest location which can be uniquely addressed
  Word internally
  Cluster on Microsoft (M$) disks

37 Access Methods (1)
Sequential
  Start at the beginning and read through in order
  Access time depends on location of data and previous location
  e.g. tape
Direct
  Individual blocks have unique address
  Access is by jumping to vicinity plus sequential search
  Access time depends on location and previous location
  e.g. disk

38 Access Methods (2)
Random
  Individual addresses identify locations exactly
  Access time is independent of location or previous access
  e.g. RAM
Associative
  Data is located by a comparison with contents of a portion of the store
  e.g. cache

39 Internal or Main memory
Memory Hierarchy Registers In CPU Internal or Main memory May include one or more levels of cache “RAM” External memory Backing store

40 Memory Hierarchy - Diagram

41 Performance
Access time
  Time between presenting the address and getting the valid data
Memory Cycle time
  Time may be required for the memory to “recover” before next access
  Cycle time is access + recovery
Transfer Rate
  Rate at which data can be moved
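The relation "cycle time is access + recovery" can be sketched numerically. The timing values below are made up for illustration, and treating the transfer rate as one transfer per cycle (1 / cycle time) is an assumption that fits random-access memory, not something the slide states.

```python
def cycle_time_ns(access_ns: float, recovery_ns: float) -> float:
    """Memory cycle time: access time plus the recovery time before the next access."""
    return access_ns + recovery_ns


def transfers_per_second(cycle_ns: float) -> float:
    """Assumed model: one transfer per memory cycle."""
    return 1e9 / cycle_ns


# Hypothetical part: 60 ns access, 40 ns recovery.
cycle = cycle_time_ns(60, 40)
print(cycle, "ns cycle,", transfers_per_second(cycle), "transfers/s")
```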

42 Physical Types Semiconductor Magnetic Optical Others RAM Disk & Tape
CD & DVD Others Bubble Hologram

43 Physical Characteristics
Decay Volatility Erasable Power consumption

44 Organization Physical arrangement of bits into words Not always obvious e.g. interleaved

45 The Bottom Line How much? How fast? How expensive? Capacity
Time is money How expensive?

46 Hierarchy List Registers L1 Cache L2 Cache Main memory Disk cache Disk Optical Tape

47 Internal Memory

48 Semiconductor Memory Types

49 Semiconductor Memory RAM
Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary storage Static or dynamic

50 Memory Cell Operation

51 Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Essentially analogue
  Level of charge determines value

52 Dynamic RAM Structure

53 DRAM Operation
Address line active when bit read or written
  Transistor switch closed (current flows)
Write
  Voltage to bit line
    High for 1 low for 0
  Then signal address line
    Transfers charge to capacitor
Read
  Address line selected
    Transistor turns on
  Charge from capacitor fed via bit line to sense amplifier
    Compares with reference value to determine 0 or 1
  Capacitor charge must be restored

54 Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Digital
  Uses flip-flops

55 Static RAM Structure

56 Static RAM Operation
Transistor arrangement gives stable logic state
State 1
  C1 high, C2 low
  T1 T4 off, T2 T3 on
State 0
  C2 high, C1 low
  T2 T3 off, T1 T4 on
Address line transistors T5 T6 act as a switch
Write – apply value to B & complement to B̄
Read – value is on line B

57 SRAM v DRAM
Both volatile
  Power needed to preserve data
Dynamic cell
  Simpler to build, smaller
  More dense
  Less expensive
  Needs refresh
  Larger memory units
Static
  Faster
  Cache

58 Microprogramming (see later) Library subroutines
Read Only Memory (ROM) Permanent storage Nonvolatile Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

59 Types of ROM
Written during manufacture
  Very expensive for small runs
Programmable (once)
  PROM
  Needs special equipment to program
Read “mostly”
  Erasable Programmable (EPROM)
    Erased by UV
  Electrically Erasable (EEPROM)
    Takes much longer to write than read
  Flash memory
    Erase whole memory electrically

60 Organization in detail
A 16Mbit chip can be organised as 1M of 16 bit words
A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on
A 16Mbit chip can be organised as a 2048 x 2048 x 4 bit array
  Reduces number of address pins
    Multiplex row address and column address
    11 pins to address (2^11 = 2048)
    Adding one more pin doubles the range of both row and column values, so x4 capacity
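The pin arithmetic can be checked with a short sketch. The function names are hypothetical, and sending the row as the high half of the cell address is an assumed convention chosen only for illustration.

```python
def dram_capacity_bits(address_pins: int, bits_per_location: int = 4) -> int:
    """Capacity of a square array whose row and column share the same address pins."""
    rows = 2 ** address_pins
    cols = 2 ** address_pins
    return rows * cols * bits_per_location


def split_row_col(cell_address: int, address_pins: int = 11):
    """Split a cell address into the (row, column) halves sent over the shared pins."""
    mask = (1 << address_pins) - 1
    row = (cell_address >> address_pins) & mask  # assumed: row is the high half
    col = cell_address & mask
    return row, col


print(dram_capacity_bits(11))   # 2048 x 2048 x 4 bits = 16 Mbit
print(dram_capacity_bits(12))   # one extra pin: four times the capacity
print(split_row_col(2048))      # first cell of row 1
```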

61 Refreshing Refresh circuit included on chip Disable chip Count through rows Read & Write back Takes time Slows down apparent performance

62 Typical 16 Mb DRAM (4M x 4)

63 Packaging

64 Module Organization

65 Module Organization (2)

66 Error Correction
Hard Failure
  Permanent defect
Soft Error
  Random, non-destructive
  No permanent damage to memory
Detected using Hamming error correcting code
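The slide names Hamming codes without giving a specific one. As a minimal sketch, the textbook Hamming(7,4) code below stores 4 data bits with 3 check bits and corrects any single flipped bit. The choice of (7,4), the bit ordering and the function names are assumptions for illustration; real memory systems use wider words.

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits; check bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected word, error position); position 0 means no error found."""
    c = [0] + list(word)                 # pad so indices match positions 1..7
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4      # syndrome reads out the failing position
    if position:
        c[position] ^= 1                 # flip the faulty bit back
    return c[1:], position


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1                          # simulate a soft error at position 5
print(hamming74_correct(damaged))
```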

67 Error Correcting Code Function

68 Advanced DRAM Organization
Basic DRAM same since first RAM chips Enhanced DRAM Contains small SRAM as well SRAM holds last line read (c.f. Cache!) Cache DRAM Larger SRAM component Use as cache or serial buffer

69 Synchronous DRAM (SDRAM)
Access is synchronized with an external clock Address is presented to RAM RAM finds data (CPU waits in conventional DRAM) Since SDRAM moves data in time with system clock, CPU knows when data will be ready CPU does not have to wait, it can do something else Burst mode allows SDRAM to set up stream of data and fire it out in block DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

70 IBM 64Mb SDRAM

71 SDRAM Operation

72 Adopted by Intel for Pentium & Itanium Main competitor to SDRAM
RAMBUS Adopted by Intel for Pentium & Itanium Main competitor to SDRAM Vertical package – all pins on one side Data exchange over 28 wires < cm long Bus addresses up to 320 RDRAM chips at 1.6Gbps Asynchronous block protocol 480ns access time Then 1.6 Gbps

73 RAMBUS Diagram

74 Cache Memory

75 So you want fast?
It is possible to build a computer which uses only static RAM (see later)
This would be very fast
This would need no cache
  How can you cache cache?
This would cost a very large amount

76 Locality of Reference During the course of the execution of a program, memory references tend to cluster e.g. loops

77 Cache Small amount of fast memory Sits between normal main memory and CPU May be located on CPU chip or module

78 Cache operation - overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
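The read sequence above can be sketched as a toy model. `SimpleCache` is a hypothetical class: it uses the block number as the tag, never evicts anything, and uses a Python list to stand in for main memory.

```python
BLOCK_SIZE = 4  # words per block, chosen arbitrarily for the sketch


class SimpleCache:
    def __init__(self, memory):
        self.memory = memory   # list standing in for main memory
        self.slots = {}        # tag (block number) -> copy of that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block, offset = divmod(address, BLOCK_SIZE)
        if block in self.slots:
            self.hits += 1     # present: serve from cache (fast path)
        else:
            self.misses += 1   # absent: read the whole block from main memory
            start = block * BLOCK_SIZE
            self.slots[block] = self.memory[start:start + BLOCK_SIZE]
        return self.slots[block][offset]   # always delivered from the cache


cache = SimpleCache(list(range(100, 116)))
print(cache.read(5), cache.read(6), cache.hits, cache.misses)
```

The second read lands in the block fetched by the first, which is the behaviour locality of reference relies on.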

79 Cache Design Size Mapping Function Replacement Algorithm Write Policy Block Size Number of Caches

80 Size does matter
Cost
  More cache is expensive
Speed
  More cache is faster (up to a point)
  Checking cache for data takes time

81 Typical Cache Organization

82 Mapping Function
Cache of 64kByte
Cache block of 4 bytes
  i.e. cache is 16k (2^14) lines of 4 bytes
16MBytes main memory
24 bit address (2^24 = 16M)

83 Direct Mapping
Each block of main memory maps to only one cache line
  i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of s-r (most significant)

84 Direct Mapping Address Structure
Field               Width (bits)
Tag (s-r)           8
Line or Slot (r)    14
Word (w)            2
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
  8 bit tag (=22-14)
  14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
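A small sketch of pulling the three fields out of a 24 bit address, using the 8/14/2 widths from this slide. The function name and the sample address are arbitrary choices for illustration.

```python
WORD_BITS = 2    # w: selects a byte within the 4 byte block
LINE_BITS = 14   # r: selects one of the 16k cache lines
TAG_BITS = 8     # s - r: distinguishes the blocks that share a line


def split_direct(address: int):
    """Return (tag, line, word) for a 24 bit address."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = (address >> (WORD_BITS + LINE_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

To check for a hit, hardware indexes the cache with `line` and compares the stored tag with `tag`.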

85 Direct Mapping Cache Line Table
Cache line    Main Memory blocks held
0             0, m, 2m, 3m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
m-1           m-1, 2m-1, 3m-1, …, 2^s - 1

86 Direct Mapping Cache Organization

87 Direct Mapping Example

88 Direct Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of tag = (s – r) bits

89 Direct Mapping pros & cons
Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
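The last point can be seen in a small simulation. It assumes the usual rule that block j maps to line j mod m; the function name and the reference patterns are made up for the sketch.

```python
def direct_mapped_misses(block_refs, num_lines):
    """Count misses for a sequence of block numbers in a direct mapped cache."""
    lines = [None] * num_lines
    misses = 0
    for block in block_refs:
        line = block % num_lines        # the only place this block may live
        if lines[line] != block:
            misses += 1
            lines[line] = block         # no choice: replace whatever is there
    return misses


M = 2 ** 14
print(direct_mapped_misses([0, M] * 4, M))   # blocks 0 and m fight over line 0
print(direct_mapped_misses([0, 1] * 4, M))   # blocks 0 and 1 use different lines
```

In the first pattern every reference misses, while the second misses only on the first touch of each block.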

90 Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

91 Fully Associative Cache Organization

92 Associative Mapping Example

93 Associative Mapping Address Structure
Field    Width (bits)
Tag      22
Word     2
22 bit tag stored with each 32 bit block of data
Compare tag field with tag entry in cache to check for hit
Least significant 2 bits of address identify which byte is required from the 32 bit data block
e.g. Address Tag Data Cache line FFFFFC FFFFFC FFF

94 Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = undetermined
Size of tag = s bits

95 Set Associative Mapping
Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set

96 Set Associative Mapping Example
13 bit set number
Block number in main memory is modulo 2^13
000000, 008000, 010000, 018000 … map to same set

97 Two Way Set Associative Cache Organization

98 Set Associative Mapping Address Structure
Field    Width (bits)
Tag      9
Set      13
Word     2
Use set field to determine cache set to look in
Compare tag field to see if we have a hit
e.g. Address Tag Data Set number 1FF 7FFC 1FF FFF 001 7FFC FFF
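As a sketch of the set associative address structure, the function below splits a 24 bit address into the 9 bit tag, 13 bit set and 2 bit word fields. The function name is hypothetical, and the two sample addresses are the slide's tag and low-bit pairs (1FF 7FFC and 001 7FFC) joined into full 24 bit values.

```python
WORD_BITS = 2
SET_BITS = 13
TAG_BITS = 9


def split_set_associative(address: int):
    """Return (tag, set number, word) for a 24 bit address."""
    word = address & ((1 << WORD_BITS) - 1)
    set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = (address >> (WORD_BITS + SET_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, set_number, word


for address in (0xFFFFFC, 0x00FFFC):
    tag, set_number, word = split_set_associative(address)
    print(f"{address:06X}: tag={tag:03X} set={set_number:04X} word={word}")
```

Both addresses select the same set and differ only in tag, so a 2 way cache can hold both blocks at once.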

99 Two Way Set Associative Mapping Example

100 Set Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^s
Number of lines in set = k
Number of sets = v = 2^d
Number of lines in cache = kv = k * 2^d
Size of tag = (s – d) bits

101 Replacement Algorithms (1) Direct mapping
No choice Each block only maps to one line Replace that line

102 Replacement Algorithms (2) Associative & Set Associative
Hardware implemented algorithm (speed)
Least Recently used (LRU)
  e.g. in 2 way set associative
    Which of the 2 blocks is LRU?
First in first out (FIFO)
  replace block that has been in cache longest
Least frequently used
  replace block which has had fewest hits
Random
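Real caches implement replacement in hardware, so the following is only a software model of the LRU policy for a single set. `LRUSet` is a hypothetical name, and the tags are arbitrary labels.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set of `ways` lines with least recently used replacement."""

    def __init__(self, ways: int = 2):
        self.ways = ways
        self.lines = OrderedDict()   # tags ordered from least to most recently used

    def access(self, tag) -> bool:
        """Return True on a hit, False on a miss (evicting the LRU line if full)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # drop the least recently used tag
        self.lines[tag] = None
        return False


two_way = LRUSet(ways=2)
print([two_way.access(t) for t in ["A", "B", "A", "C", "B"]])
```

Changing the eviction to ignore `move_to_end` would turn the same structure into FIFO.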

103 Write Policy Must not overwrite a cache block unless main memory is up to date Multiple CPUs may have individual caches I/O may address main memory directly

104 Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date Lots of traffic Slows down writes Remember bogus write through caches!

105 Write back
Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block is to be replaced, write to main memory only if update bit is set
Other caches get out of sync
I/O must access main memory through cache
N.B. 15% of memory references are writes
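The update bit behaviour can be modelled in a few lines. This is a simplified sketch, not a description of any real controller: `WriteBackCache` is a hypothetical direct mapped cache that holds one value per block and counts how often main memory is actually written.

```python
class WriteBackCache:
    def __init__(self, memory, num_lines):
        self.memory = memory          # list standing in for main memory
        self.num_lines = num_lines
        self.lines = {}               # line index -> [block, value, update_bit]
        self.memory_writes = 0

    def _load(self, block):
        index = block % self.num_lines
        entry = self.lines.get(index)
        if entry is None or entry[0] != block:
            if entry is not None and entry[2]:
                # Replaced block was updated: write it back before reuse.
                self.memory[entry[0]] = entry[1]
                self.memory_writes += 1
            entry = [block, self.memory[block], False]
            self.lines[index] = entry
        return entry

    def read(self, block):
        return self._load(block)[1]

    def write(self, block, value):
        entry = self._load(block)
        entry[1] = value
        entry[2] = True               # set the update bit; memory is now stale


memory = [0] * 8
cache = WriteBackCache(memory, num_lines=4)
cache.write(1, 99)
print(memory[1], cache.memory_writes)   # memory not yet updated
cache.read(5)                            # block 5 shares line 1, forcing write back
print(memory[1], cache.memory_writes)
```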

106 Pentium 4 Cache
80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set associative organization
Pentium (all versions) – two on chip L1 caches
  Data & instructions
Pentium 4 – L1 caches
  8k bytes
  64 byte lines
  four way set associative
L2 cache
  Feeding both L1 caches
  256k
  128 byte lines
  8 way set associative

107 Pentium 4 Diagram (Simplified)

108 Pentium 4 Core Processor
Fetch/Decode Unit Fetches instructions from L2 cache Decode into micro-ops Store micro-ops in L1 cache Out of order execution logic Schedules micro-ops Based on data dependence and resources May speculatively execute Execution units Execute micro-ops Data from L1 cache Results in registers Memory subsystem L2 cache and systems bus

109 Pentium 4 Design Reasoning
Decodes instructions into RISC like micro-ops before L1 cache Micro-ops fixed length Superscalar pipelining and scheduling Pentium instructions long & complex Performance improved by separating decoding from scheduling & pipelining (More later – ch14) Data cache is write back Can be configured to write through L1 cache controlled by 2 bits in register CD = cache disable NW = not write through 2 instructions to invalidate (flush) cache and write back then invalidate

110 Power PC Cache Organization
601 – single 32kb 8 way set associative 603 – 16kb (2 x 8kb) two way set associative 604 – 32kb 610 – 64kb G3 & G4 64kb L1 cache 8 way set associative 256k, 512k or 1M L2 cache two way set associative

111 PowerPC G4

112 Comparison of Cache Sizes

