1
I/O Devices
2
Recap: Busses

- Fundamental tool for designing and building computer systems: divide the problem into independent components (processor, memory, I/O) operating against a well-defined interface, then compose the components efficiently
- Shared collection of wires: command, address, data
- Communication path between multiple subsystems
- Inexpensive, but limited bandwidth
- Layers of a bus specification: mechanical, electrical, signalling, timing, transactions
3
The Big Picture: Where are We Now?
Today’s Topic: I/O Systems

[Diagram: two systems, each with a processor (control + datapath), memory, input, and output, connected by a network]
4
I/O System Design Issues
- Performance
- Expandability
- Resilience in the face of failure

[Diagram: a processor and cache on a memory-I/O bus, together with main memory and I/O controllers for a disk, a graphics display, and a network; the I/O controllers signal the processor via interrupts]
5
I/O Device Examples

| Device | Behavior | Partner | Data Rate (KB/sec) |
|---|---|---|---|
| Keyboard | Input | Human | |
| Mouse | Input | Human | |
| Line Printer | Output | Human | |
| Floppy Disk | Storage | Machine | |
| Laser Printer | Output | Human | |
| Optical Disk | Storage | Machine | |
| Magnetic Disk | Storage | Machine | 5,000.00 |
| Network-LAN | Input or Output | Machine | – 1,000.00 |
| Graphics Display | Output | Human | 30,000.00 |
6
I/O System Performance
- I/O system performance depends on many aspects of the system ("limited by the weakest link in the chain"):
  - The CPU
  - The memory system: internal and external caches, main memory
  - The underlying interconnection (buses)
  - The I/O controller
  - The I/O device
  - The speed of the I/O software (operating system)
  - The efficiency of the software's use of the I/O devices
- Two common performance metrics:
  - Throughput: I/O bandwidth
  - Response time: latency

Even looking at performance alone, I/O performance is not as easy to quantify as CPU performance, because it depends on many other aspects of the system. Besides the obvious factors (the speed of the I/O devices, their controllers, and the software), the CPU, the memory system, and the underlying interconnect also play a major role in determining a computer's I/O performance. Two common I/O performance metrics are I/O throughput, also known as I/O bandwidth, and I/O response time, also known as I/O latency.
7
Simple Producer-Server Model
[Diagram: producer → queue → server]

- Throughput: the number of tasks completed by the server in unit time
  - To get the highest possible throughput: the server should never be idle, so the queue should never be empty
- Response time: begins when a task is placed in the queue and ends when the server completes it
  - To minimize response time: the queue should be empty, so the server will be idle

Throughput and response time are related through this producer-server model. Throughput is the number of tasks completed by the server in unit time, while response time begins when a task is placed in the queue and ends when the server completes it. To get the highest possible throughput, the server should never be idle, so the queue should never be empty. But to minimize response time, you want the queue to be empty, so that the server is idle and can serve you as soon as you place your order. Like many other things in life, throughput and response time are a tradeoff.
8
Throughput versus Response Time
[Plot: response time in ms (up to 300) versus percentage of maximum throughput (20% to 100%); response time rises steeply as throughput approaches its maximum]

This response time versus percentage-of-maximum-throughput plot shows the tradeoff: to get the last few percent of maximum throughput, you pay a steep price in response time. Notice that the horizontal scale is the percentage of maximum throughput, so this tradeoff curve is in terms of relative throughput. The absolute maximum throughput can be increased without sacrificing response time.
9
Throughput Enhancement
[Diagram: one producer feeding two parallel queue-server pairs]

- In general, throughput can be improved by throwing more hardware at the problem, which reduces load-related latency
- Response time is much harder to reduce
10
Magnetic Disk

- Purpose: long-term, nonvolatile storage
- Large, inexpensive, and slow: the lowest level in the memory hierarchy (registers → cache → memory → disk)
- Two major types: floppy disk and hard disk
- Both types of disks:
  - Rely on a rotating platter coated with a magnetic surface
  - Use a moveable read/write head to access the disk
- Advantages of hard disks over floppy disks:
  - Platters are more rigid (metal or glass), so they can be larger
  - Higher density, because the head can be controlled more precisely
  - Higher data rate, because the disk spins faster
  - Can incorporate more than one platter

The purpose of the magnetic disk is to provide long-term, nonvolatile storage. Disks are large in capacity, inexpensive, but slow, so they reside at the lowest level of the memory hierarchy. There are two types of disks: floppy and hard drives. Both rely on a rotating platter coated with a magnetic surface, and a movable head is used to access the disk. The advantages of hard disks over floppy disks are: (a) platters are made of metal or glass, so they are more rigid and can be larger; (b) a hard disk has higher density because the head can be controlled more precisely; (c) a hard disk has a higher data rate because it spins faster; and (d) each hard disk drive can incorporate more than one platter.
11
Organization of a Hard Magnetic Disk
[Diagram: a stack of platters; each surface is divided into tracks, and each track into sectors]

- Typical numbers (depending on the disk size):
  - 500 to 2,000 tracks per surface
  - 32 to 128 sectors per track
- A sector is the smallest unit that can be read or written
- Traditionally, all tracks have the same number of sectors
  - Constant bit density: record more sectors on the outer tracks

Here is a primitive picture of how a disk drive can have multiple platters. Each platter surface is divided into tracks, and each track is further divided into sectors. A sector is the smallest unit that can be read or written. By simple geometry, the outer tracks have more area, so you would think they would have more sectors. This, however, is not the case in traditional disk design, where all tracks have the same number of sectors. You might say this is dumb, but dumb is the reason they do it: by keeping the number of sectors the same, the disk controller hardware and software can be dumb and do not have to know which track has how many sectors. With more intelligent disk controller hardware and software, it is becoming more popular to record more sectors on the outer tracks. This is referred to as constant bit density.
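As a quick sanity check on these numbers, here is the capacity of a hypothetical drive. The two platters (hence four surfaces) are an assumption for illustration; the track and sector counts are the slide's upper figures, and the 512-byte sector comes from the worked example later in the deck:

$$\underbrace{4}_{\text{surfaces}} \times \underbrace{2{,}000}_{\text{tracks/surface}} \times \underbrace{128}_{\text{sectors/track}} \times \underbrace{512~\mathrm{B}}_{\text{bytes/sector}} = 524{,}288{,}000~\mathrm{B} \approx 500~\mathrm{MB}$$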
12
Magnetic Disk Characteristics
[Diagram: platters, head, and arm; a cylinder is the set of tracks at one arm position across all surfaces]

- Cylinder: all the tracks under the heads at a given point on all surfaces
- Reading/writing data is a three-stage process:
  - Seek time: position the arm over the proper track
  - Rotational latency: wait for the desired sector to rotate under the read/write head
  - Transfer time: transfer a block of bits (sector) under the read/write head
- Average seek time as reported by the industry:
  - Typically in the range of 8 ms to 12 ms
  - (Sum of the times for all possible seeks) / (total number of possible seeks)
- Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number

To read or write a sector, a movable arm containing a read/write head is positioned over each surface. The term cylinder refers to all the tracks under the read/write heads at a given point on all surfaces. To access data, the operating system must direct the disk through a three-stage process. (a) The first step is to position the arm over the proper track. This is the seek operation, and the time to complete it is called the seek time. (b) Once the head has reached the correct track, we must wait for the desired sector to rotate under the read/write head. This is referred to as the rotational latency. (c) Finally, once the desired sector is under the read/write head, the data transfer can begin. The average seek time reported by manufacturers is in the range of 8 ms to 12 ms, calculated as the sum of the times for all possible seeks divided by the number of possible seeks. This number is usually on the pessimistic side: due to locality of disk references, the actual average seek time may be only 25% to 33% of the published number.
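Writing out the slide's parenthetical formula for the advertised average seek time, where $t(i \to j)$ is the time to move the arm from track $i$ to track $j$:

$$\bar{t}_{\mathrm{seek}} = \frac{\displaystyle\sum_{\text{all pairs } (i,j),\; i \ne j} t(i \to j)}{\text{total number of possible seeks}}$$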
13
Typical Numbers of a Magnetic Disk
[Diagram: track, sector, cylinder, platter, head]

- Rotational latency:
  - Most disks rotate at 3,600 to 7,200 RPM: approximately 16 ms to 8 ms per revolution, respectively
  - The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM, 4 ms at 7,200 RPM
- Transfer time is a function of:
  - Transfer size (usually a sector): 1 KB/sector
  - Rotation speed: 3,600 RPM to 7,200 RPM
  - Recording density: bits per inch on a track
  - Diameter: typically ranges from 2.5 to 5.25 inches
  - Typical values: 2 to 12 MB per second

As far as rotational latency is concerned, most disks rotate at 3,600 RPM, or approximately 16 ms per revolution. Since, on average, the information you want is halfway around the disk, the average rotational latency is 8 ms. The transfer time is a function of transfer size, rotation speed, and recording density; typical transfer rates are 2 to 12 MB per second. Notice that the transfer time is much shorter than the rotational latency and seek time. This is similar to the DRAM situation, where the DRAM access time is much shorter than the DRAM cycle time. Does anybody remember what we did to take advantage of the short access time versus the cycle time? We interleaved!
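The average rotational latency is half a revolution, which yields the slide's numbers directly:

$$t_{\text{avg rot}} = \frac{0.5~\text{rev}}{\mathrm{RPM}/60} = \begin{cases} 0.5/60~\text{s} \approx 8.3~\text{ms} & \text{at } 3{,}600~\mathrm{RPM} \\[4pt] 0.5/120~\text{s} \approx 4.2~\text{ms} & \text{at } 7{,}200~\mathrm{RPM} \end{cases}$$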
14
Disk I/O Performance

[Diagram: the processor issues requests at rate λ into per-disk queues, each served at rate μ by a disk controller and its disk]

- Disk access time = seek time + rotational latency + transfer time + controller time + queueing delay
- Estimating queue length:
  - Utilization: U = request rate / service rate
  - Mean queue length = U / (1 - U)
  - As the request rate approaches the service rate, the mean queue length approaches infinity

Similarly for disk access: to take advantage of the short transfer time relative to the seek and rotational delays, we can use multiple disks. Since the transfer time is often a small fraction of a full disk access, the controller in a higher-performance system will disconnect the data path from a disk while it is seeking, so that the other disks can transfer their data to memory. Furthermore, to handle a sudden burst of disk access requests gracefully, each disk can have its own queue to accept more requests before the previous request is completed. How long does the queue have to be? We can calculate that with queueing theory. If utilization is defined as the request rate over the service rate, then the mean queue length is the utilization over one minus the utilization. What happens as the utilization approaches one, that is, when the request rate approaches the service rate? The mean queue length goes to infinity. In other words, no matter how long you make the queue, it will overflow, and the processor must stall and wait for the queue to drain before making any more requests. Where have we seen a similar situation before, where we put things into a queue faster than we can empty it?
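A minimal C sketch of the queue-length estimate above. The service rate of 100 requests per second and the sample request rates are illustrative assumptions, not from the slide:

```c
#include <stdio.h>

/* Mean queue length for the slide's model:
   U = request rate / service rate, mean length = U / (1 - U). */
double mean_queue_length(double request_rate, double service_rate) {
    double u = request_rate / service_rate;
    return u / (1.0 - u);
}

int main(void) {
    double service_rate = 100.0;                 /* requests served per second */
    double request_rates[] = {50.0, 80.0, 90.0, 99.0};
    for (int i = 0; i < 4; i++) {
        double u = request_rates[i] / service_rate;
        printf("U = %.2f -> mean queue length = %.1f\n",
               u, mean_queue_length(request_rates[i], service_rate));
    }
    return 0;   /* as U -> 1, the mean queue length grows without bound */
}
```

At U = 0.50 the mean queue length is 1.0; at U = 0.99 it is already 99, which is the "queue length goes to infinity" behavior the notes describe.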
15
Example

- 512-byte sector, rotation at 5,400 RPM, advertised seek of 12 ms, transfer rate of 4 MB/sec, controller overhead of 1 ms, and an idle queue, so no queueing delay
- Disk access time = seek time + rotational latency + transfer time + controller time + queueing delay
- Disk access time = 12 ms + 0.5 rotation / 5,400 RPM + 0.5 KB / 4 MB/s + 1 ms + 0
- Disk access time = 12 ms + 0.5 / 90 RPS + 0.125 / 1,024 s + 1 ms + 0
- Disk access time = 12 ms + 5.5 ms + 0.1 ms + 1 ms + 0 ms
- Disk access time = 18.6 ms
- If real seeks are one third of the advertised seek time, the total is 10.6 ms, with the rotational delay accounting for about 50% of the time!
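The same calculation as a reusable C function, a sketch using the slide's parameters and taking 1 MB as 1,024 KB to match the slide's arithmetic:

```c
#include <stdio.h>

/* Disk access time = seek + rotational latency + transfer + controller + queueing. */
double disk_access_ms(double seek_ms, double rpm, double sector_kb,
                      double transfer_mb_s, double controller_ms, double queue_ms) {
    double rot_ms  = 0.5 * 60000.0 / rpm;                        /* half a revolution */
    double xfer_ms = sector_kb / (transfer_mb_s * 1024.0) * 1000.0;
    return seek_ms + rot_ms + xfer_ms + controller_ms + queue_ms;
}

int main(void) {
    /* advertised 12 ms seek: prints ~18.7 ms (the slide rounds each term and gets 18.6) */
    printf("advertised seek: %.1f ms\n", disk_access_ms(12.0, 5400.0, 0.5, 4.0, 1.0, 0.0));
    /* real seeks at 1/3 of advertised: prints ~10.7 ms, rotation is about half of it */
    printf("realistic seek:  %.1f ms\n", disk_access_ms( 4.0, 5400.0, 0.5, 4.0, 1.0, 0.0));
    return 0;
}
```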
16
Reliability and Availability
- Two terms that are often confused:
  - Reliability: is anything broken?
  - Availability: is the system still available to the user?
- Availability can be improved by adding hardware
- Reliability can only be improved by:
  - Bettering environmental conditions
  - Building more reliable components
  - Building with fewer components
- Improving availability may come at the cost of lower reliability

This brings us to two terms that are often confused: reliability and availability. Here is the proper distinction. Reliability asks the question: is anything broken? Availability, on the other hand, asks: is the system still available to the user? Adding hardware can therefore improve availability; for example, an airplane with two engines is more "available" than an airplane with one engine. Reliability, on the other hand, can only be improved by bettering environmental conditions, building more reliable components, or reducing the number of components in the system. Notice that adding hardware to improve availability may actually reduce reliability: an airplane with two engines is twice as likely to have an engine failure as an airplane with only one engine, so its reliability is lower even though its availability is higher.
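To make the distinction concrete, the standard mean-time definition of availability (an addition here, not on the slide, but consistent with the MTTF/MTTR terms used two slides later) is:

$$\text{Availability} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$

where MTTF is the mean time to failure and MTTR is the mean time to repair.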
17
Disk Arrays

A new organization of disk storage:
- Arrays of small and inexpensive disks increase potential throughput by having many disk drives:
  - Data is spread over multiple disks
  - Multiple accesses are made to several disks
- Reliability is lower than that of a single disk:
  - But availability can be improved by adding redundant disks (RAID): lost information can be reconstructed from redundant information

The discussion of reliability and availability brings us to a new organization of disk storage in which arrays of small and inexpensive disks are used to increase the potential throughput. This is how it works: data is spread over multiple disks, so multiple accesses can be made to several disks, either interleaved or in parallel. While disk arrays improve throughput, latency is not necessarily improved. Also, with N disks in the disk array, its reliability is only 1/N the reliability of a single disk. But availability can be improved by adding redundant disks, so lost information can be reconstructed from the redundant information. Since mean time to repair (MTTR) is measured in hours and mean time to failure (MTTF) is measured in years, redundancy can make the availability of a disk array much higher than that of a single disk.
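The "1/N reliability" claim in the notes, written out with hypothetical numbers (the 50,000-hour per-disk MTTF and the 100-disk array are illustrative assumptions):

$$\mathrm{MTTF}_{\text{array}} = \frac{\mathrm{MTTF}_{\text{disk}}}{N} = \frac{50{,}000~\text{hours}}{100} = 500~\text{hours}$$

This is why redundancy matters: with MTTR measured in hours and a redundant disk covering each failure, the array's availability stays high even though its MTTF is far lower than a single disk's.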
18
Giving Commands to I/O Devices
- Two methods are used to address a device:
  - Special I/O instructions
  - Memory-mapped I/O
- Special I/O instructions specify both the device number and the command word:
  - Device number: the processor communicates this via a set of wires normally included as part of the I/O bus
  - Command word: this is usually sent on the bus's data lines
- Memory-mapped I/O:
  - Portions of the address space are assigned to I/O devices
  - Reads and writes to those addresses are interpreted as commands to the I/O devices
  - User programs are prevented from issuing I/O operations directly, because the I/O address space is protected by the address translation mechanism

How does the OS give commands to an I/O device? There are two methods: special I/O instructions and memory-mapped I/O. If special I/O instructions are used, the OS uses the instruction to specify both the device number and the command word. The processor executes the special I/O instruction by passing the device number to the I/O device, in most cases via a set of control lines on the bus, while sending the command to the device on the bus's data lines. Special I/O instructions are not used that widely. Most processors use memory-mapped I/O, where portions of the address space are assigned to I/O devices. Reads and writes to this special address space are interpreted by the memory controller as I/O commands, and the memory controller does the right thing to communicate with the I/O device. Why is memory-mapped I/O so popular? Because we can reuse the protection mechanism already implemented for virtual memory to prevent user programs from issuing commands to the I/O device directly.
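A minimal C sketch of memory-mapped I/O. The register addresses, their layout, and the command encoding are hypothetical; on real hardware they come from the device's data sheet. The `volatile` qualifier keeps the compiler from optimizing away or reordering the device accesses:

```c
#include <stdint.h>

/* Hypothetical device registers mapped into the address space. */
#define DEV_STATUS ((volatile uint32_t *)0xFFFF0000u)  /* status register  */
#define DEV_CMD    ((volatile uint32_t *)0xFFFF0004u)  /* command register */
#define DEV_DATA   ((volatile uint32_t *)0xFFFF0008u)  /* data register    */

#define CMD_READ_SECTOR 0x1u   /* hypothetical command encoding */

/* An ordinary store to an I/O address is interpreted as a command. */
void issue_read(uint32_t sector) {
    *DEV_DATA = sector;           /* pass the parameter */
    *DEV_CMD  = CMD_READ_SECTOR;  /* this write *is* the I/O command */
}
```

Because these addresses live in a protected region of the address space, the same address-translation machinery that protects virtual memory keeps user programs from issuing such stores directly.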
19
I/O Device Notifying the OS
- The OS needs to know when:
  - The I/O device has completed an operation
  - The I/O operation has encountered an error
- This can be accomplished in two different ways:
  - Polling: the I/O device puts information in a status register, and the OS periodically checks the status register
  - I/O interrupt: whenever an I/O device needs attention from the processor, it interrupts the processor from whatever it is currently doing

After the OS has issued a command to the I/O device, either via a special I/O instruction or by writing to a location in the I/O address space, the OS needs to be notified when (a) the I/O device has completed the operation, or (b) the I/O device has encountered an error. This can be accomplished in two different ways: polling and I/O interrupts.
20
Polling: Programmed I/O
[Flowchart: the CPU asks "is the data ready?"; if no, loop (busy wait); if yes, read the data, store the data, check "done?", and repeat until done. The busy-wait loop is not an efficient way to use the CPU unless the device is very fast, but the checks for I/O completion can be dispersed among compute-intensive code.]

- Advantage: simple, the processor is totally in control and does all the work
- Disadvantage: polling overhead can consume a lot of CPU time

In polling, the I/O device puts information in a status register, and the OS periodically checks it (the busy-wait loop) to see if the data is ready or if an error condition has occurred. If the data is ready, fine: read the data and move on. If not, we stay in the loop and try again later. The advantage of this approach is simplicity: the processor is totally in control and does all the work. But the processor being in total control is also the problem. Needless to say, polling overhead can consume a lot of CPU time, and for this reason most I/O devices notify the processor via an I/O interrupt instead.
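The busy-wait loop from the flowchart as a C sketch, reusing the hypothetical registers from the memory-mapped I/O example plus hypothetical READY and ERROR status bits:

```c
#include <stdint.h>

#define DEV_STATUS ((volatile uint32_t *)0xFFFF0000u)  /* hypothetical */
#define DEV_DATA   ((volatile uint32_t *)0xFFFF0008u)  /* hypothetical */
#define STATUS_READY 0x1u
#define STATUS_ERROR 0x2u

/* Programmed I/O: spin on the status register until the device is ready. */
int poll_read(uint32_t *out) {
    for (;;) {
        uint32_t status = *DEV_STATUS;
        if (status & STATUS_ERROR) return -1;  /* device reported an error     */
        if (status & STATUS_READY) break;      /* data is ready, stop spinning */
        /* not ready yet: this is the busy-wait loop, burning CPU time */
    }
    *out = *DEV_DATA;                          /* read the data */
    return 0;
}
```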
21
Interrupt Driven Data Transfer
[Diagram: a user program (add, sub, and, or, nop) is interrupted: (1) the I/O device raises an I/O interrupt; (2) the PC is saved; (3) the processor jumps to the interrupt service address and runs the service routine (read, store, ..., rti), which (4) moves data to memory; control then returns to the user program]

- Advantage: user program progress is only halted during the actual transfer
- Disadvantage: special hardware is needed to:
  - Cause an interrupt (I/O device)
  - Detect an interrupt (processor)
  - Save the proper state to resume after the interrupt (processor)

That is, whenever an I/O device needs attention from the processor, it interrupts the processor from whatever it is currently doing. This is how an I/O interrupt looks in the overall scheme of things. The processor is minding its own business when one of the I/O devices wants attention and raises an I/O interrupt. The processor then saves the current PC, branches to the address where the interrupt service routine resides, and starts executing it. When it finishes the interrupt service routine, it branches back to the point in the original program where it stopped and continues. The advantage of this approach is efficiency: the user program's progress is halted only during the actual transfer. The disadvantage is that it requires special hardware in the I/O device to generate the interrupt and, on the processor side, special hardware to detect the interrupt and to save the proper state so execution can resume after the interrupt.
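A sketch of what the interrupt service routine might do for a receive interrupt. How the handler is registered in the vector table, and the save-PC/`rti` mechanics, are architecture-specific and omitted; the registers and the ACK bit are hypothetical, as before:

```c
#include <stdint.h>

#define DEV_STATUS ((volatile uint32_t *)0xFFFF0000u)  /* hypothetical */
#define DEV_DATA   ((volatile uint32_t *)0xFFFF0008u)  /* hypothetical */
#define STATUS_ACK 0x4u                                /* hypothetical */

#define RX_RING 64
static volatile uint32_t rx_buf[RX_RING];
static volatile unsigned rx_head;

/* Runs only when the device raises an interrupt; the user program
   runs at full speed the rest of the time. */
void device_isr(void) {
    rx_buf[rx_head++ % RX_RING] = *DEV_DATA;  /* one device-to-memory transfer */
    *DEV_STATUS = STATUS_ACK;                 /* acknowledge, so the device
                                                 drops its interrupt request */
}
```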
22
I/O Interrupt

An I/O interrupt is just like an exception, except:
- An I/O interrupt is asynchronous
- Further information needs to be conveyed

An I/O interrupt is asynchronous with respect to instruction execution:
- It is not associated with any instruction
- It does not prevent any instruction from completing, so you can pick your own convenient point to take the interrupt

An I/O interrupt is more complicated than an exception:
- It needs to convey the identity of the device generating the interrupt
- Interrupt requests can have different urgencies, so they need to be prioritized

How is an I/O interrupt different from the exceptions you have already learned about? An I/O interrupt is asynchronous with respect to instruction execution, while exceptions such as overflow or page fault are always associated with a certain instruction. Also, for an exception, the only information that needs to be conveyed is the fact that an exceptional condition has occurred, but for an interrupt there is more information to convey. Let me elaborate on each of these two points. Unlike an exception, which is always associated with an instruction, an interrupt is not associated with any instruction; the user program is just doing its thing when an I/O interrupt occurs. So an I/O interrupt does not prevent any instruction from completing, and you can pick your own convenient point to take the interrupt. As far as conveying more information is concerned, the interrupt detection hardware must somehow let the OS know who is causing the interrupt. Furthermore, interrupt requests need to be prioritized.
23
Delegating I/O Responsibility from the CPU: DMA
- The CPU sends a starting address, direction, and length count to the DMA controller (DMAC), then issues a "start" command
- Direct Memory Access (DMA):
  - External to the CPU
  - Acts as a master on the bus
  - Transfers blocks of data to or from memory without CPU intervention
- The DMAC provides the handshake signals for the peripheral controller, and the memory addresses and handshake signals for memory

[Diagram: CPU, memory, DMAC, and an I/O controller (IOC) with its device, all on the bus]

Finally, let's see how we can delegate some of the I/O responsibilities from the CPU. The first option is direct memory access, which takes advantage of the fact that I/O events often involve block transfers: you are not going to access the disk one byte at a time. The DMA controller is external to the processor and can act as a bus master to transfer blocks of data to or from memory and the I/O device without CPU intervention. This is how it works: the CPU sends the starting address, the direction, and the length of the transfer to the DMA controller and issues a start command. The DMA controller takes over from there and provides the handshake signals required to complete the entire block transfer. So DMA controllers are pretty intelligent, and if you add more intelligence to the DMA controller, you end up with an I/O processor, or IOP for short.
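The CPU side of the protocol described above, as a C sketch against a hypothetical DMA controller (the register addresses and control bits are invented for illustration):

```c
#include <stdint.h>

/* Hypothetical DMAC registers. */
#define DMA_ADDR ((volatile uint32_t *)0xFFFF1000u)  /* starting address */
#define DMA_LEN  ((volatile uint32_t *)0xFFFF1004u)  /* length count     */
#define DMA_CTRL ((volatile uint32_t *)0xFFFF1008u)  /* control/start    */

#define DMA_DIR_TO_MEM 0x2u   /* direction: device -> memory */
#define DMA_START      0x1u

/* Program the DMAC exactly as the slide says: starting address,
   direction, length count, then "start". */
void dma_read_block(uint32_t dest_addr, uint32_t nbytes) {
    *DMA_ADDR = dest_addr;
    *DMA_LEN  = nbytes;
    *DMA_CTRL = DMA_DIR_TO_MEM | DMA_START;
    /* The DMAC now masters the bus and moves the block without the CPU;
       completion is typically signalled by an interrupt. */
}
```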
24
Delegating I/O Responsibility from the CPU: IOP
[Diagram: the CPU and the IOP share the main memory bus; the IOP also drives the I/O bus with devices D1..Dn. (1) The CPU issues an instruction (OP, Device, Address) to the IOP, naming the target device and where the commands are; (2) the IOP looks in memory for command blocks of the form (OP, Addr, Cnt, Other): what to do, where to put the data, how much, and special requests; (3) device to/from memory transfers are controlled by the IOP directly, stealing memory cycles; (4) the IOP interrupts the CPU when done]

The IOP is so smart that the CPU only needs to issue a simple instruction (OP, Device, Address) that tells it the target device and where to find more commands. The IOP then fetches commands of the form (OP, Addr, Cnt, Other) from memory and does all the necessary data transfers between the I/O device and the memory system. The IOP performs the transfer in the background without affecting the CPU, because it accesses memory only when the CPU is not using it: this is called stealing memory cycles. Only when the IOP finishes its operation does it interrupt the CPU.
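The memory-resident command block the IOP fetches in step (2) might look like this in C. The field names and widths are hypothetical; they simply mirror the slide's OP, Addr, Cnt, Other fields:

```c
#include <stdint.h>

/* One IOP command, fetched by the IOP from main memory. */
struct iop_cmd {
    uint8_t  op;     /* what to do                        */
    uint32_t addr;   /* where to put (or fetch) the data  */
    uint32_t cnt;    /* how much to transfer              */
    uint32_t other;  /* special, device-specific requests */
};
```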
25
Responsibilities of the Operating System
- The operating system acts as the interface between the I/O hardware and the program that requests I/O
- Three characteristics of I/O systems:
  - The I/O system is shared by multiple programs using the processor
  - I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations; interrupts must be handled by the OS because they cause a transfer to supervisor mode
  - The low-level control of an I/O device is complex: it means managing a set of concurrent events, and the requirements for correct device control are very detailed

The OS acts as the interface between the I/O hardware and the program that requests I/O. The responsibilities of the operating system arise from three characteristics of I/O systems: (a) the I/O system is shared by multiple programs using the processor; (b) I/O systems, as I will show you, often use interrupts to communicate information about I/O operations, and interrupts must be handled by the OS; and (c) the low-level control of an I/O device is very complex, so we should leave it to those crazy kernel programmers.
26
Operating System Requirements
- Provides protection to shared I/O resources: guarantees that a user's program can only access the portions of an I/O device to which the user has rights
- Provides abstractions for accessing devices: supplies routines that handle low-level device operations
- Handles the interrupts generated by I/O devices
- Provides equitable access to the shared I/O resources: all user programs must have equal access
- Schedules accesses in order to enhance system throughput

Here is a list of the functions the OS must provide. First, it must guarantee that a user's program can only access the portions of an I/O device to which it has rights. Then the OS must hide low-level complexity from the user by supplying routines that handle low-level device operations. The OS also needs to handle the interrupts generated by I/O devices. The OS must be fair: all user programs must have equal access to the I/O resources. Finally, the OS needs to schedule accesses in a way that enhances system throughput.
27
OS and I/O Systems Communication Requirements
- The operating system must be able to prevent the user program from communicating with the I/O device directly: if user programs could perform I/O directly, protection of the shared I/O resources could not be provided
- Three types of communication are required:
  - The OS must be able to give commands to the I/O devices
  - The I/O device must be able to notify the OS when it has completed an operation or has encountered an error
  - Data must be transferred between memory and an I/O device

The OS must be able to communicate with the I/O system, but at the same time it must be able to prevent the user from communicating with the I/O device directly. Why? Because if user programs could perform I/O directly, we would not be able to provide protection for the shared I/O devices. Three types of communication are required: (1) the OS must be able to give commands to the I/O devices; (2) the device must be able to notify the OS when it has completed an operation or has encountered an error; and (3) data must be transferred between memory and an I/O device.
28
Multimedia Bandwidth Requirements
- High-quality video: digital data = (30 frames/second) × (640 × 480 pels) × (24-bit color/pel) = 221 Mbps (26.4 MB/s)
- Reduced-quality video: digital data = (15 frames/second) × (320 × 240 pels) × (16-bit color/pel) = 18 Mbps (2.2 MB/s)
- High-quality audio: digital data = (44,100 audio samples/sec) × (16-bit audio samples) × (2 audio channels for stereo) = 1.4 Mbps
- Reduced-quality audio: digital data = (11,050 audio samples/sec) × (8-bit audio samples) × (1 audio channel for monaural) = 0.1 Mbps
- Compression changes the whole story!
29
Multimedia and Latency
- How sensitive is your eye / ear to variations in audio / video rate?
- How can you ensure a constant rate of delivery?
- Jitter (latency) bounds vs. constant bit rate transfer
- Synchronizing audio and video streams: you can tolerate ms early to ms late
30
Summary:
- I/O performance is limited by the weakest link in the chain between the OS and the device
- Disk I/O benchmarks: I/O rate vs. data rate vs. latency
- Three components of disk access time:
  - Seek time: advertised to be 8 to 12 ms; may be lower in real life
  - Rotational latency: 4.2 ms at 7,200 RPM and 8.3 ms at 3,600 RPM
  - Transfer time: 2 to 12 MB per second
- I/O devices notifying the operating system:
  - Polling: can waste a lot of processor time
  - I/O interrupt: similar to an exception, except that it is asynchronous
- Delegating I/O responsibility from the CPU: DMA, or even an IOP
- A wide range of devices: multimedia and high-speed networking pose important challenges

First, we showed the diversity of I/O requirements by talking about three categories of disk I/O benchmarks: a supercomputer application's main concern is data rate, the main concern of transaction processing is I/O rate, and a file system's main concern is file access. Then we talked about magnetic disks. One thing to remember is that disk access time has three components. The first two components, seek time and rotational latency, involve mechanical moving parts and are therefore very slow compared to the third component, the transfer time. One good thing about seek time is that this is probably one of the few times in life when you can actually do better than the "advertised" result, thanks to the locality of disk accesses. As far as graphics displays are concerned, resolution is the basic measure of how much information is on the screen and is usually described as "some number" by "some number": the first number is the horizontal resolution in pixels, and the second is the vertical resolution in scan lines. Then I showed how the size as well as the bandwidth requirement of a color frame buffer can be reduced if a color map is placed between the frame buffer and the graphics display. Finally, we talked about a special memory, the VRAM, that can be used to build the frame buffer: it is nothing but a DRAM core with a high-speed shift register attached to it. That's all I have for today, and we will continue our discussion of I/O on Friday.