Presentation is loading. Please wait.

Presentation is loading. Please wait.

331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Buses and I/O system [Adapted from Dave Patterson’s.

Similar presentations


Presentation on theme: "331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Buses and I/O system [Adapted from Dave Patterson’s."— Presentation transcript:

1 331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Buses and I/O system [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

2 331 Week 13.2Spring 2006 Head’s Up  This week’s material l Buses: Connecting I/O devices -Reading assignment – PH 8.4 l Memory hierarchies -Reading assignment – PH 7.1 and B.8-9  Reminders l Next week’s material l Basics of caches -Reading assignment – PH 7.2

3 331 Week 13.3Spring 2006 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output Cache Main Memory Secondary Memory (Disk)

4 331 Week 13.4Spring 2006 Input and Output Devices  I/O devices are incredibly diverse wrt l Behavior l Partner l Data rate DeviceBehaviorPartnerData rate (KB/sec) Keyboardinputhuman0.01 Mouseinputhuman0.02 Laser printeroutputhuman200.00 Graphics displayoutputhuman60,000.00 Network/LANinput or output machine500.00-6000.00 Floppy diskstoragemachine100.00 Magnetic diskstoragemachine2000.00-10,000.00

5 331 Week 13.5Spring 2006 Magnetic Disk  Purpose l Long term, nonvolatile storage l Lowest level in the memory hierarchy -slow, large, inexpensive  General structure l A rotating platter coated with a magnetic surface l Use a moveable read/write head to access the disk  Advantages of hard disks over floppy disks l Platters are more rigid (metal or glass) so they can be larger l Higher density because it can be controlled more precisely l Higher data rate because it spins faster l Can incorporate more than one platter

6 331 Week 13.6Spring 2006 Organization of a Magnetic Disk  Typical numbers (depending on the disk size) l 1 to 15 (2 surface) platters per disk with 1” to 8” diameter l 1,000 to 5,000 tracks per surface l 63 to 256 sectors per track -the smallest unit that can be read/written (typically 512 to 1,024 B) l Traditionally all tracks have the same number of sectors -Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density) Platters Track Sector

7 331 Week 13.7Spring 2006 Magnetic Disk Characteristic  Cylinder: all the tracks under the heads at a given point on all surfaces  Read/write data is a three-stage process: l Seek time: position the arm over the proper track (6 to 14 ms avg.) -due to locality of disk references the actual average seek time may be only 25% to 33% of the advertised number l Rotational latency: wait for the desired sector to rotate under the read/write head (½ of 1/RPM) l Transfer time: transfer a block of bits (sector) under the read-write head (2 to 20 MB/sec typical) l Controller time: the overhead the disk controller imposes in performing an disk I/O access (typically < 2 ms) Sector Track Cylinder Head Platter

8 331 Week 13.8Spring 2006 Magnetic Disk Examples CharacteristicSun X6713AToshiba MK2016 Disk diameter (inches)3.52.5 Capacity73 GB20 GB MTTF (k hr’s)1,200300 # of platters - heads2 - 4 # cylinders16,383 # B/sector - # sectors/track512 - 63 Rotation speed (RPM)10,0004,200 Max. - Avg. seek time (ms)? - 6.624 - 13 Avg. rot. latency (ms)37.14 Transfer rate (PIO)35 MB/sec16.6 MB/sec Power (watts)< 2.5 Volume (in 3 )4.01 Weight (oz)3.49

9 331 Week 13.9Spring 2006 I/O System Interconnect Issues  A bus is a shared communication link (a set of wires used to connect multiple subsystems) l Performance l Expandability l Resilience in the face of failure – fault tolerance Processor Receiver Main Memory Keyboard bus

10 331 Week 13.10Spring 2006 Performance Measures  Latency (execution time, response time) is the total time from the start to finish of one instruction or action l usually used to measure processor performance  Throughput – total amount of work done in a given amount of time l aka execution bandwidth l the number of operations performed per second  Bandwidth – amount of information communicated across an interconnect (e.g., a bus) per unit time l the bit width of the operation * rate of the operation l usually used to measure I/O performance

11 331 Week 13.11Spring 2006 I/O System Expandability Cache Memory Memory - I/O Bus Main Memory I/O Controller Disk I/O Controller I/O Controller Terminal Network interrupt signals  Usually have more than one I/O device in the system l each I/O device is controlled by an I/O Controller Processor

12 331 Week 13.12Spring 2006 Bus Characteristics  Control lines l Signal requests and acknowledgments l Indicate what type of information is on the data lines  Data lines l Data, complex commands, and addresses  Bus transaction consists of l Sending the address l Receiving (or sending) the data Data Lines Control Lines

13 331 Week 13.13Spring 2006 Output (Read) Bus Transaction  Defined by what they do to memory l read = output: transfers data from memory (read) to I/O device (write) Processor Main Memory Control Data Step 1: Processor sends read request and read address to memory Processor Control Data Main Memory Step 2: Memory accesses data Processor Control Data Main Memory Step 3: Memory transfers data to disk

14 331 Week 13.14Spring 2006 Input (Write) Bus Transaction  Defined by what they do to memory l write = input: transfers data from I/O device (read) to memory (write) Processor Main Memory Control Data Step 1: Processor sends write request and write address to memory Processor Control Data Main Memory Step 2: Disk transfers data to memory

15 331 Week 13.15Spring 2006 Advantages and Disadvantages of Buses  Advantages l Versatility: -New devices can be added easily -Peripherals can be moved between computer systems that use the same bus standard l Low Cost: -A single set of wires is shared in multiple ways  Disadvantages l It creates a communication bottleneck -The bus bandwidth limits the maximum I/O throughput l The maximum bus speed is largely limited by -The length of the bus -The number of devices on the bus l It needs to support a range of devices with widely varying latencies and data transfer rates

16 331 Week 13.16Spring 2006 Types of Buses  Processor-Memory Bus (proprietary) l Short and high speed l Matched to the memory system to maximize the memory- processor bandwidth l Optimized for cache block transfers  I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE) l Usually is lengthy and slower l Needs to accommodate a wide range of I/O devices l Connects to the processor-memory bus or backplane bus  Backplane Bus (industry standard, e.g., PCI) l The backplane is an interconnection structure within the chassis l Used as an intermediary bus connecting I/O busses to the processor-memory bus

17 331 Week 13.17Spring 2006 A Two Bus System  I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses) l Processor-memory bus: mainly for processor-memory traffic l I/O busses: provide expansion slots for I/O devices ProcessorMemory Processor-Memory Bus I/O Bus Adaptor Bus Adaptor Bus Adaptor I/O Bus I/O Bus

18 331 Week 13.18Spring 2006 A Three Bus System  A small number of Backplane Buses tap into the Processor- Memory Bus l Processor-Memory Bus is used for processor memory traffic l I/O buses are connected to the Backplane Bus  Advantage: loading on the Processor-Memory Bus is greatly reduced ProcessorMemory Processor-Memory Bus Bus Adaptor Backplane Bus Bus Adaptor Bus Adaptor I/O Bus

19 331 Week 13.19Spring 2006 I/O System Example (Apple Mac 7200) Cache Memory PCI Main Memory I/O Controller I/O Controller Graphic Terminal Network Processor  Typical of midrange to high-end desktop system in 1997 PCI Interface/ Memory Controller I/O Controller I/O Controller SCSI bus Disk CDRom Tape Processor-Memory Bus Serial portsAudio I/O

20 331 Week 13.20Spring 2006 Example: Pentium System Organization Processor-Memory Bus PCI Bus I/O Busses Memory controller (“Northbridge”) http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

21 331 Week 13.21Spring 2006 Synchronous and Asynchronous Buses  Synchronous Bus l Includes a clock in the control lines l A fixed protocol for communication that is relative to the clock l Advantage: involves very little logic and can run very fast l Disadvantages: -Every device on the bus must run at the same clock rate -To avoid clock skew, they cannot be long if they are fast  Asynchronous Bus l It is not clocked, so requires handshaking protocol (req, ack) -Implemented with additional control lines l Advantages: -Can accommodate a wide range of devices -Can be lengthened without worrying about clock skew or synchronization problems l Disadvantage: slow(er)

22 331 Week 13.22Spring 2006 Asynchronous Handshaking Protocol 1. Memory sees ReadReq, reads addr from data lines, and raises Ack 2. I/O device sees Ack and releases the ReadReq and data lines 3. Memory sees ReadReq go low and drops Ack 4. When memory has data ready, it places it on data lines and raises DataRdy 5. I/O device sees DataRdy, reads the data from data lines, and raises Ack 6. Memory sees Ack, releases the data lines, and drops DataRdy 7. I/O device sees DataRdy go low and drops Ack  Output (read) data from memory to an I/O device. I/O device signals a request by raising ReadReq and putting the addr on the data lines 1 2 3 ReadReq Data Ack DataRdy addrdata 4 56 7

23 331 Week 13.23Spring 2006 Key Characteristics of Two Bus Standards CharacteristicFirewire (1394)USB 2.0 TypeI/O Data bus width(signals) 42 Clockingasynchronous Theoretical Peak bandwidth 50 MB/sec (Firewire 400) or 100 MB/sec (Firewire 800) 0.2 MB/sec (low speed), 1.5 MB/sec (full) or 60MB/sec (high) Hot plugableYesyes Max. devices63127 Max. length (copper wire) 4.5 meters5 meters

24 331 Week 13.24Spring 2006 Memory Hierarchy Technologies  Random Access l “Random” is good: access time is the same for all locations l DRAM: Dynamic Random Access Memory -High density (1 transistor cells), low power, cheap, slow -Dynamic: need to be “refreshed” regularly (~ every 8 ms) l SRAM: Static Random Access Memory -Low density (6 transistor cells), high power, expensive, fast -Static: content will last “forever” (until power turned off) l Size: DRAM/SRAM ­ 4 to 8 l Cost/Cycle time: SRAM/DRAM ­ 8 to 16  “Non-so-random” Access Technology l Access time varies from location to location and from time to time (e.g., Disk, CDROM)

25 331 Week 13.25Spring 2006 Classical SRAM Organization (~Square) rowdecoderrowdecoder row address data word RAM Cell Array word (row) select bit (data) lines Each intersection represents a 6-T SRAM cell Column Selector & I/O Circuits column address One memory row holds a block of data, so the column address selects the requested word from that block

26 331 Week 13.26Spring 2006 data bit Classical DRAM Organization (~Square Planes) rowdecoderrowdecoder row address Column Selector & I/O Circuits column address data bit word (row) select bit (data) lines Each intersection represents a 1-T DRAM cell The column address selects the requested bit from the row in each plane data word... RAM Cell Array

27 331 Week 13.27Spring 2006 RAM Memory Definitions  Caches use SRAM for speed  Main Memory is DRAM for density l Addresses divided into 2 halves (row and column) -RAS or Row Access Strobe triggering row decoder -CAS or Column Access Strobe triggering column selector  Performance of Main Memory DRAMs l Latency: Time to access one word -Access Time: time between request and when word arrives -Cycle Time: time between requests -Usually cycle time > access time l Bandwidth: How much data can be supplied per unit time -width of the data channel * the rate at which it can be used

28 331 Week 13.28Spring 2006 Classical DRAM Operation  DRAM Organization: l N rows x N column x M-bit l Read or Write M-bit at a time l Each M-bit access requires a RAS / CAS cycle Row Address CAS RAS Col AddressRow AddressCol Address 1st M-bit Access 2nd M-bit Access N rows N cols DRAM M bits Row Address Column Address M-bit Output Cycle Time

29 331 Week 13.29Spring 2006 Ways to Improve DRAM Performance  Memory interleaving  Fast Page Mode DRAMs – FPM DRAMs l www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.h tm www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.h tm  Extended Data Out DRAMs – EDO DRAMs l www.chips.ibm.com/products/memory/88H2011/88H2011.pdf www.chips.ibm.com/products/memory/88H2011/88H2011.pdf  Synchronous DRAMS – SDRAMS l www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D. htm www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D. htm  Rambus DRAMS l www.rambus.com/developer/quickfind_documents.html www.rambus.com/developer/quickfind_documents.html l www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669 B.htm www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669 B.htm  Double Data Rate DRAMs – DDR DRAMS l www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323H A.htm www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323H A.htm ...

30 331 Week 13.30Spring 2006 Increasing Bandwidth - Interleaving Access pattern without Interleaving: Start Access for D1 CPUMemory Start Access for D2 D1 available Access pattern with 4-way Interleaving: CPU Memory Bank 1 Memory Bank 0 Memory Bank 3 Memory Bank 2 Cycle Time Access Time Access Bank 0 Access Bank 1 Access Bank 2 Access Bank 3 We can Access Bank 0 again D2 available

31 331 Week 13.31Spring 2006 Problems with Interleaving  How many banks? l Ideally, the number of banks  number of clocks we have to wait to access the next word in the bank l Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)  Increasing DRAM sizes => fewer chips => harder to have banks l Growth bits/chip DRAM : 50%-60%/yr  Only can use for very large memory systems (e.g., those encountered in supercomputer systems)

32 331 Week 13.32Spring 2006 N rows N cols DRAM Column Address M-bit Output M bits N x M “SRAM” Row Address Fast Page Mode DRAM Operation  Fast Page Mode DRAM l N x M “SRAM” to save a row Row Address CAS RAS Col Address 1st M-bit Access Col Address 2nd M-bit3rd M-bit4th M-bit  After a row is read into the SRAM “register” l Only CAS is needed to access other M-bit blocks on that row l RAS remains asserted while CAS is toggled


Download ppt "331 Week 13.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Buses and I/O system [Adapted from Dave Patterson’s."

Similar presentations


Ads by Google