August 1, 2001 | Systems Architecture II | 1
Systems Architecture II (CS )
Lecture 9: I/O Devices and Communication Buses*
Jeremy R. Johnson
Wednesday, August 1, 2001
*This lecture was derived from material in the text (Chap. 8). All figures from Computer Organization and Design: The Hardware/Software Approach, Second Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
Introduction
Objective: To understand the basic principles of different I/O devices and to develop protocols for connecting I/O devices to processors and memory; to analyze and compare the performance of I/O devices and communication protocols.
Topics
– Design issues and importance of I/O
– I/O devices: keyboard and monitor, mouse, magnetic disk, network
– Buses: synchronous vs. asynchronous, handshaking protocol, bus arbitration
Interfacing Processors and Peripherals
Design issues:
– performance
– resilience
– expandability
– different devices
Performance:
– access latency
– throughput
  – transfer bandwidth
  – transfers per second
Interface:
– bus protocols
– interrupts
Types and Characteristics of I/O Devices
Diverse devices, characterized by
– behavior (input, output, or storage)
– partner (human vs. machine)
– data rate
Magnetic Disk
Rotating disk with a magnetic surface
– 3600 to 10,000 RPM
– $0.10 per MB
Hard disk organized into platters
– each surface made up of tracks
– tracks divided into sectors
– 512 bytes per sector
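The organization above gives a simple capacity formula: surfaces × tracks per surface × sectors per track × bytes per sector. This minimal sketch assumes every track holds the same number of sectors. Real drives use zoned recording and put more sectors on outer tracks. The geometry in the usage lines is hypothetical and does not describe any particular drive. The sketch also converts the slide's RPM range into the time for one full rotation.

```python
def capacity_bytes(surfaces, tracks_per_surface, sectors_per_track, bytes_per_sector=512):
    """Capacity of an idealized disk with the same number of sectors on every track."""
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000 / rpm  # 60,000 ms per minute


# Hypothetical geometry: 4 platters (8 surfaces), 10,000 tracks/surface, 200 sectors/track
print(capacity_bytes(8, 10_000, 200))  # 8192000000 bytes
for rpm in (3600, 10_000):
    print(rpm, round(rotation_time_ms(rpm), 1))  # prints "3600 16.7", then "10000 6.0"
```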
Disk Performance
Average disk access time = avg. seek time + avg. rotational delay + transfer time + controller overhead
Avg. seek time (time to move the head to the desired track)
– may be only 25% of the manufacturer-reported time, due to locality of disk references
Avg. rotational delay
– 0.5 rotation / rotation rate (in rotations per second, RPM/60)
Transfer time
– depends on rotation speed, sector size, and track density
– caching is used to improve the transfer rate
Example: What is the average time to read a 512-byte sector from a typical disk rotating at 5400 RPM?
– Average seek time = 12 ms
– Transfer rate = 5 MB/sec
– Controller overhead = 2 ms
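Disk Performance
Average seek time
– 12 ms (advertised; averaged over all possible seeks)
– 3 ms (measured; typically 25% of advertised)
Average rotational delay
– $\dfrac{0.5\ \text{rotation}}{5400\ \text{RPM}} = \dfrac{0.5\ \text{rotation}}{(5400/60)\ \text{rotations/sec}} = 0.0056\ \text{sec} = 5.6\ \text{ms}$
Transfer time
– $\dfrac{0.5\ \text{KB}}{5\ \text{MB/sec}} = 0.0001\ \text{sec} = 0.1\ \text{ms}$
Controller time
– 2 ms
Average disk access time
– $12 + 5.6 + 0.1 + 2 = 19.7\ \text{ms}$ (advertised seek time)
– $3 + 5.6 + 0.1 + 2 = 10.7\ \text{ms}$ (measured seek time)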
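The access-time calculation can be checked with a few lines of Python. This sketch treats 1 MB/sec as 10^6 bytes/sec. Under that convention the 512-byte transfer comes out as 0.1024 ms instead of the slide's rounded 0.1 ms, but both totals still round to the same values. The function name and parameters are illustrative.

```python
def avg_access_time_ms(seek_ms, rpm, sector_bytes, transfer_mb_per_s, controller_ms):
    """Average time to read one sector: seek + rotational delay + transfer + controller."""
    rotational_ms = 0.5 / (rpm / 60) * 1000  # half a rotation, on average
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000  # 1 MB = 10^6 bytes
    return seek_ms + rotational_ms + transfer_ms + controller_ms


advertised = avg_access_time_ms(12, 5400, 512, 5, 2)
measured = avg_access_time_ms(12 * 0.25, 5400, 512, 5, 2)  # measured seek ~25% of advertised
print(round(advertised, 1), round(measured, 1))  # 19.7 10.7
```

Rotational delay is independent of the seek, so only the seek term changes between the two estimates. At 5400 RPM that delay alone is more than half of the measured total.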
Buses
Shared communication link that uses one or more wires to connect multiple subsystems
– versatile
– low cost
– can be a bottleneck
– physical limits (length of wire)
– conflicting goals (fast bus access vs. high bandwidth)
– must support a range of devices
Types of buses
– Processor-memory (short, high speed, custom)
– Backplane (high speed, often standardized, e.g. PCI)
– I/O (lengthy, not directly connected to memory, multiple types of devices, often standardized, e.g. SCSI)
Bus Configurations
Bus Input and Output
(Input and output are defined from the memory's point of view.)
Output operation (memory to device)
a) Read request
b) Memory access
c) Memory transfer
Input operation (device to memory)
a) Write request
b) Memory transfer
Synchronous vs. Asynchronous
Synchronous
– uses a clock and a synchronous protocol
– fast and small
– but every device must operate at the same clock rate, and
– clock skew requires the bus to be short
Asynchronous
– doesn't use a clock; uses handshaking instead
– can accommodate a wide variety of devices
– can be lengthened without clock-skew problems
Handshaking Protocol
ReadReq
– indicates a read request for memory; the address is put on the data lines at the same time
DataRdy
– indicates that data is now ready on the data lines; the data is placed on the data lines at the same time (set by either memory or the device, depending on whether it is an output or an input operation)
Ack
– acknowledges the ReadReq or DataRdy signal
ReadReq and DataRdy remain asserted until the other party has seen the control lines and read the data lines; the other party indicates this by asserting Ack.
Handshaking Protocol
FSM Control for Handshaking Protocol
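The two finite-state machines can be sketched as cooperating Python objects that communicate only through the shared ReadReq, DataRdy, Ack, and data lines. This is an illustrative model, not the figure's exact state diagram. The following are all assumptions made for the sketch:
– the state names;
– the hypothetical `Bus`/`Device`/`Memory` classes and the `run_read` helper;
– the one-tick memory access after step 3 (in the real protocol the access overlaps steps 2 and 3).

The logged step numbers follow the seven-step read handshake used in the performance comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bus:
    read_req: bool = False
    data_rdy: bool = False
    ack: bool = False
    lines: Optional[int] = None  # shared address/data lines (None = released)


class Device:
    """I/O device FSM that reads one word from memory."""

    def __init__(self, addr):
        self.addr = addr
        self.state = "request"
        self.value = None

    def tick(self, bus, log):
        if self.state == "request":
            bus.lines, bus.read_req = self.addr, True  # put address, raise ReadReq
            self.state = "wait_ack"
        elif self.state == "wait_ack" and bus.ack:
            bus.read_req, bus.lines = False, None  # step 2: release ReadReq and lines
            log.append(2)
            self.state = "wait_data"
        elif self.state == "wait_data" and bus.data_rdy:
            self.value, bus.ack = bus.lines, True  # step 5: read data, raise Ack
            log.append(5)
            self.state = "wait_release"
        elif self.state == "wait_release" and not bus.data_rdy:
            bus.ack = False  # step 7: drop Ack; transfer complete
            log.append(7)
            self.state = "done"


class Memory:
    """Memory FSM; the access is modeled as a single tick after step 3."""

    def __init__(self, contents):
        self.contents = contents
        self.state = "idle"
        self.addr = None

    def tick(self, bus, log):
        if self.state == "idle" and bus.read_req:
            self.addr, bus.ack = bus.lines, True  # step 1: latch address, raise Ack
            log.append(1)
            self.state = "wait_req_low"
        elif self.state == "wait_req_low" and not bus.read_req:
            bus.ack = False  # step 3: drop Ack
            log.append(3)
            self.state = "access"
        elif self.state == "access":
            bus.lines, bus.data_rdy = self.contents[self.addr], True  # step 4: data, DataRdy
            log.append(4)
            self.state = "wait_ack"
        elif self.state == "wait_ack" and bus.ack:
            bus.data_rdy, bus.lines = False, None  # step 6: drop DataRdy, release lines
            log.append(6)
            self.state = "idle"


def run_read(contents, addr, max_ticks=100):
    """Alternate the two FSMs until the device finishes; return (value, step log)."""
    bus, log = Bus(), []
    device, memory = Device(addr), Memory(contents)
    for _ in range(max_ticks):
        device.tick(bus, log)
        memory.tick(bus, log)
        if device.state == "done":
            return device.value, log
    raise RuntimeError("handshake did not complete")


print(run_read({0x40: 1234}, 0x40))  # (1234, [1, 2, 3, 4, 5, 6, 7])
```

Each side advances only when it observes a change on a line the other side drives. That is what lets the asynchronous protocol work without a shared clock.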
Performance Comparison
Synchronous bus
– 50 ns clock
– 32 data bits
– 200 ns memory access
Time
– send address: 50 ns
– read memory: 200 ns
– send data: 50 ns
– total time = 300 ns
Bandwidth
– 4 bytes/300 ns = 4 MB/0.3 sec = 13.3 MB/sec
Asynchronous bus
– 40 ns per handshake
– 32 data bits
– 200 ns memory access
Time
– step 1: 40 ns
– max(steps 2, 3, 4, memory read): 200 ns
– steps 5, 6, 7: 120 ns
– total time = 360 ns
Bandwidth
– 4 bytes/360 ns = 4 MB/0.36 sec = 11.1 MB/sec
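The comparison can be written as two small timing functions. This sketch mirrors the slide's accounting, including the `max(...)` that overlaps the memory read with handshake steps 2 to 4. It takes 1 MB as 10^6 bytes, and the function names are illustrative.

```python
def sync_read_ns(clock_ns, memory_ns):
    """Synchronous bus: send address (1 cycle), read memory, send data (1 cycle)."""
    return clock_ns + memory_ns + clock_ns


def async_read_ns(handshake_ns, memory_ns):
    """Asynchronous bus: step 1, memory read overlapped with steps 2-4, then steps 5-7."""
    return handshake_ns + max(3 * handshake_ns, memory_ns) + 3 * handshake_ns


def bandwidth_mb_per_s(bytes_per_read, read_ns):
    return bytes_per_read / read_ns * 1000  # bytes/ns * 1000 = MB/sec (1 MB = 10^6 bytes)


for name, ns in (("sync", sync_read_ns(50, 200)), ("async", async_read_ns(40, 200))):
    print(name, ns, round(bandwidth_mb_per_s(4, ns), 1))
# sync 300 13.3
# async 360 11.1
```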
Improving Bus Performance
– Data bus width: increasing the width of the data bus lets transfers of multiple words take fewer bus cycles.
– Separate vs. multiplexed address and data lines: separate address and data lines improve the performance of writes, since the address and data can be sent at the same time.
– Block transfers: allowing the bus to transfer multiple words in back-to-back bus cycles, without sending an address or releasing the bus, reduces the time to transfer a large block.
Performance Example
– Memory supports block access of 4 to 16 32-bit words
– 64-bit synchronous bus clocked at 200 MHz (5 ns clock); each 64-bit transfer takes 1 cycle, and sending an address takes 1 cycle
– Two idle cycles are needed between bus operations
– Memory access time is 200 ns for the first 4 words and 20 ns for each additional set of 4 words
– A bus transfer of the most recently read data can be overlapped with the read of the next 4 words
Find the sustained bandwidth and latency to read 256 words for transfers that use 4-word blocks and 16-word blocks.
Performance Example
4-word block transfer
– 1 clock cycle to send the address
– 200 ns/(5 ns/cycle) = 40 cycles to read 4 words from memory
– 2 cycles to send the data (4 words)
– 2 idle cycles before the next transfer
– Total time (latency): $45\ \text{cycles} \times (256/4)\ \text{transfers} = 2880\ \text{cycles} = 14{,}400\ \text{ns}$
– Bandwidth: $(256 \times 4)\ \text{bytes} / 14{,}400\ \text{ns} = 71.1\ \text{MB/sec}$
16-word block transfer
– 1 clock cycle to send the address
– 200 ns = 40 cycles to read the first 4 words from memory
– 20 ns = 4 cycles to read each of the remaining 3 sets of 4 words
– Send 4 words of data (4 times):
  – 2 cycles to send 4 words of data (overlapped with the next read)
  – 2 idle cycles (overlapped with the next read)
– Total time (latency): $57\ \text{cycles} \times (256/16)\ \text{transfers} = 912\ \text{cycles} = 4560\ \text{ns}$, where $57 = 1 + 40 + 3 \times 4 + 4$ (the last 2 send and 2 idle cycles are not overlapped)
– Bandwidth: $(256 \times 4)\ \text{bytes} / 4560\ \text{ns} = 224.6\ \text{MB/sec}$
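The cycle counting above can be reproduced by a short function. The constants come from the example; the names `block_cycles` and `read_words` are hypothetical. The sketch expresses the overlap as `max(next read, send + idle)` for every 4-word group except the last, and it takes 1 MB as 10^6 bytes.

```python
CLOCK_NS = 5                          # 200 MHz bus
ADDRESS_CYCLES = 1                    # send the address
FIRST_READ_CYCLES = 200 // CLOCK_NS   # first 4 words: 200 ns = 40 cycles
NEXT_READ_CYCLES = 20 // CLOCK_NS     # each further 4 words: 20 ns = 4 cycles
SEND_CYCLES = 2                       # 4 words (128 bits) over a 64-bit bus
IDLE_CYCLES = 2                       # required gap between bus operations
WORD_BYTES = 4                        # 32-bit words


def block_cycles(block_words):
    """Cycles for one block transfer; sending a group overlaps reading the next."""
    groups = block_words // 4
    cycles = ADDRESS_CYCLES + FIRST_READ_CYCLES
    # Every group except the last: its send + idle overlaps the next 4-word read.
    cycles += (groups - 1) * max(NEXT_READ_CYCLES, SEND_CYCLES + IDLE_CYCLES)
    # The last group's send + idle is not overlapped with anything.
    return cycles + SEND_CYCLES + IDLE_CYCLES


def read_words(block_words, total_words=256):
    """Return (cycles, ns, MB/sec) to read total_words using block_words-word blocks."""
    cycles = block_cycles(block_words) * (total_words // block_words)
    ns = cycles * CLOCK_NS
    mb_per_s = total_words * WORD_BYTES / ns * 1000  # 1 MB = 10^6 bytes
    return cycles, ns, round(mb_per_s, 1)


print(read_words(4))   # (2880, 14400, 71.1)
print(read_words(16))  # (912, 4560, 224.6)
```

Larger blocks pay the 40-cycle first access once per 16 words instead of once per 4. That is where most of the roughly threefold bandwidth gain comes from.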
Bus Arbitration
Need a mechanism to determine which device can use the bus at a given time.
Use a bus master to initiate and control all bus requests
– the simplest scheme uses a single bus master (the processor)
– multiple bus masters are more efficient
Deciding which bus master gets to use the bus next is called bus arbitration. Arbitration schemes usually assign priorities but must also maintain fairness, so that no device waits indefinitely.
Bus Arbitration (detail)
Daisy chain arbitration (e.g. VME)
– the grant line runs from the highest-priority device to the lowest
– a high-priority device that wants access intercepts the bus grant signal
– simple, but cannot assure fairness and may limit speed
Centralized, parallel arbitration (e.g. PCI)
– devices independently request the bus through multiple request lines
– a centralized arbiter chooses which device will act as master
– the central arbiter is required and may become a bottleneck
Distributed arbitration by self-selection (e.g. NuBus in the Mac II)
– devices independently request the bus through multiple request lines
– devices identify themselves to the bus and broadcast their priority
– each device determines independently whether it is the highest-priority requestor
– drawback: requires more lines for request signals
Distributed arbitration by collision detection (e.g. Ethernet)
– devices independently request the bus; simultaneous requests result in a collision
– a scheme is used for selecting among the colliding parties to be the master
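Daisy-chain arbitration reduces to "the first requesting device along the grant line wins." This is a purely behavioral sketch with hypothetical function names. It ignores signal timing and the bus-release protocol, but it makes the fairness problem concrete: a high-priority device that keeps requesting starves every device behind it.

```python
from collections import Counter


def daisy_chain_grant(requests):
    """Return the index of the device that keeps the grant, or None.

    requests[i] is True if device i wants the bus; device 0 sits closest to
    the arbiter, so the grant reaches it first (highest priority).
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device  # intercepts the grant; it is not passed further down
    return None  # nobody requested; the grant passes the end of the chain


def simulate(request_pattern, cycles):
    """Count grants per device when request_pattern(cycle) gives the request lines."""
    wins = Counter()
    for cycle in range(cycles):
        winner = daisy_chain_grant(request_pattern(cycle))
        if winner is not None:
            wins[winner] += 1
    return wins


# Devices 0 and 2 request the bus on every cycle: device 2 never gets it.
print(simulate(lambda cycle: [True, False, True], 10))  # Counter({0: 10})
```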
Single Bus Master (Processor)
Daisy Chain Arbitration