The Big Picture: Where are We Now?

Presentation transcript:

The Big Picture: Where are We Now? [Diagram: Processor (Control, Datapath), Memory, Input, Output, with a Network connecting two such systems.] In terms of the overall picture, we started out on the left, showing you how to design a processor's datapath and control. The three lectures before the Spring break covered the memory system. Today we will cover the input and output devices. Friday, we will talk about how to interface the I/O devices to the processor and memory via buses and the OS software. Next Wednesday, we will show you how multiple computers can be connected together into a network through their I/O devices.

I/O System Design Issues [Diagram: Processor and Cache on a Memory-I/O Bus, together with Main Memory and I/O controllers for Disk, Graphics, and Network; interrupts flow back to the processor.] The design goals are performance, expandability, and resilience in the face of failure. This is a more in-depth picture of the I/O system of a typical computer. The I/O devices are connected to the computer via I/O controllers that sit on the memory-I/O buses. We will talk about buses on Friday. For now, notice that the I/O system is inherently more irregular than the processor and the memory system because of all the different devices (disk, graphics, network) that can attach to it. So when one designs an I/O system, performance is still an important consideration, but besides raw performance one also has to think about expandability and resilience in the face of failure. For example, one has to ask questions such as: (a) Expandability: is there an easy way to connect another disk to the system? (b) If this I/O controller (say, the network controller) fails, will it affect the rest of the system?

I/O Device Examples

Device           Behavior        Partner    Data Rate (KB/sec)
Keyboard         Input           Human                   0.01
Mouse            Input           Human                   0.02
Line Printer     Output          Human                   1.00
Floppy disk      Storage         Machine                50.00
Laser Printer    Output          Human                 100.00
Optical Disk     Storage         Machine               500.00
Magnetic Disk    Storage         Machine             5,000.00
Network-LAN      Input/Output    Machine        20 - 1,000.00
Graphic Display  Output          Human              30,000.00

Here are some examples of the various I/O devices you are probably familiar with. Notice that most I/O devices that have a human as their partner have relatively low peak data rates, because humans are in general slow relative to the computer system. The exceptions are the laser printer and the graphics display. A laser printer requires a high data rate because it takes a lot of bits to describe the high-resolution image you would like it to print. The graphics display requires a high data rate because, as I will show you later in today's lecture, all the colored objects we see in the real world and take for granted are very hard to replicate on a graphics display. Let's take a closer look at one of the most popular storage devices, the magnetic disk.

I/O System Performance. I/O system performance depends on many aspects of the system (it is limited by the weakest link in the chain): the CPU; the memory system (internal and external caches, main memory); the underlying interconnect (buses); the I/O controller; the I/O device; the speed of the I/O software (the operating system); and the efficiency of the software's use of the I/O devices. Two common performance metrics are throughput (I/O bandwidth) and response time (latency). Even if we look at performance alone, I/O performance is not as easy to quantify as CPU performance because it depends on many other aspects of the system. Besides the obvious factors, such as the speed of the I/O devices and their controllers and software, the CPU, the memory system, and the underlying interconnect also play a major role in determining the I/O performance of a computer.

Simple Producer-Server Model. [Diagram: producer, queue, server.] Throughput is the number of tasks completed by the server in unit time. To get the highest possible throughput, the server should never be idle, so the queue should never be empty. Response time begins when a task is placed in the queue and ends when it is completed by the server. To minimize response time, the queue should be empty, so the server is idle and can serve you as soon as you place your order. So obviously, like many other things in life, throughput and response time are a tradeoff.

Throughput versus Response Time. [Plot: response time in ms (0 to 300) against percentage of maximum throughput (20% to 100%); the curve rises steeply as throughput approaches 100%.] To get the last few percent of maximum throughput, you really have to pay a steep price in response time. Notice that the horizontal scale is the percentage of maximum throughput; that is, this tradeoff curve is in terms of relative throughput. The absolute maximum throughput can be increased without sacrificing response time.
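The shape of this curve can be illustrated with a short script. This is a sketch under an assumed M/M/1 queueing model (the slide does not specify one), with a hypothetical fixed service time of 10 ms:

```python
# Assumption for illustration: M/M/1 queue, where mean response time
# R = S / (1 - U) for service time S and utilization U
# (U = fraction of maximum throughput).
def response_time_ms(service_ms, utilization):
    assert 0.0 <= utilization < 1.0, "utilization must stay below 100%"
    return service_ms / (1.0 - utilization)

# Response time grows slowly at first, then explodes near 100%.
for u in (0.2, 0.6, 0.9, 0.99):
    print(f"{u:5.0%} of max throughput -> {response_time_ms(10.0, u):7.1f} ms")
```

The steep price of the last few percent shows up directly: going from 90% to 99% utilization multiplies the response time by ten.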

Throughput Enhancement. [Diagram: one producer feeding a queue served by two servers.] For example, one way to improve the maximum throughput without sacrificing response time is to add another server. In general, throughput can be improved by throwing more hardware at the problem, which reduces load-related latency. This brings us to an interesting fact, or joke, in I/O system and network design: throughput is easy to improve because you can always throw more hardware at the problem. Response time, however, is much harder to reduce, because ultimately it is limited by the speed of light, and you cannot bribe God (even though a lot of people do try by going to church regularly). But we are still far from that limit.

I/O Benchmarks for Magnetic Disks. Supercomputer applications involve large-scale scientific problems and hence large files; the pattern is one large read and many small writes to snapshot the computation, and the key metric is data rate, in MB/second between memory and disk. Transaction processing (examples: airline reservation systems and bank ATMs) makes small changes to large shared databases; the key metric is I/O rate, the number of disk accesses per second given an upper limit on latency. File-system benchmarks are based on measurements of UNIX file systems in an engineering environment: 80% of accesses are to files of less than 10 KB; 90% of all file accesses are to data with sequential addresses on the disk; and 67% of accesses are reads, 27% writes, and 6% read-write. The metrics are I/O rate and latency: disk accesses per second and response time. Well, you cannot talk about performance without also talking about benchmarks. As far as I/O performance benchmarks for magnetic disks are concerned, they can be divided into these three categories.

Magnetic Disk. The purpose of the magnetic disk is to provide long-term, nonvolatile storage. Disks are large in capacity, inexpensive, but slow, so they reside at the lowest level of the memory hierarchy (registers, cache, memory, disk). There are two major types: floppy disks and hard disks. Both types rely on a rotating platter coated with a magnetic surface and use a moveable read/write head to access the disk. The advantages of hard disks over floppy disks are: (a) the platters are more rigid (metal or glass), so they can be larger; (b) higher density, because the head position can be controlled more precisely; (c) higher data rate, because the disk spins faster; and (d) each drive can incorporate more than one platter.

Organization of a Hard Magnetic Disk. [Diagram: platters divided into tracks, each track divided into sectors.] Typical numbers (depending on the disk size): 500 to 2,000 tracks per surface and 32 to 128 sectors per track, where a sector is the smallest unit that can be read or written. Here is a primitive picture showing how a disk drive can have multiple platters. Each surface of a platter is divided into tracks, and each track is further divided into sectors. By simple geometry you know the outer tracks have more area, so you would think the outer tracks would have more sectors. This, however, is not the case in traditional disk design, where all tracks have the same number of sectors. Well, you will say, this is dumb, but dumb is the reason they do it: by keeping the number of sectors per track the same, the disk controller hardware and software can be dumb and do not have to know which track has how many sectors. With more intelligent disk controller hardware and software, it is becoming more popular to record more sectors on the outer tracks; this is referred to as constant bit density. Recently the constraint has been relaxed further: the bit size is kept constant, and the speed varies with track location.
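With these numbers, a rough capacity estimate follows by multiplication. The 512-byte sector size and four-surface geometry below are assumptions for illustration, not from the slide:

```python
# Sketch: capacity of a traditional disk with a constant number of
# sectors per track. All parameters are illustrative.
def disk_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                        bytes_per_sector=512):
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

# Hypothetical drive at the high end of the slide's ranges:
cap = disk_capacity_bytes(surfaces=4, tracks_per_surface=2000,
                          sectors_per_track=128)
print(cap / 2**20, "MiB")  # -> 500.0 MiB
```

Note how quickly the low end differs: 500 tracks and 32 sectors per track on the same four surfaces give only about 31 MiB.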

Magnetic Disk Characteristics. [Diagram: cylinder, sector, track, head, platter.] A cylinder is all the tracks under the heads at a given arm position on all surfaces. Reading or writing data is a three-stage process: (a) seek time: position the arm over the proper track; (b) rotational latency: wait for the desired sector to rotate under the read/write head; (c) transfer time: transfer a block of bits (a sector) under the read/write head. To read or write information in a sector, a movable arm containing a read/write head is located over each surface, and the operating system must direct the disk through the three stages above. The average seek time as reported by the industry, calculated as the sum of the times for all possible seeks divided by the total number of possible seeks, is typically in the range of 8 ms to 12 ms. This number is usually on the pessimistic side: due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number.

Typical Numbers for a Magnetic Disk. Rotational latency: most disks rotate at 3,600 to 7,200 RPM, which is approximately 16 ms to 8 ms per revolution, respectively. On average, the desired information is halfway around the disk, so the average rotational latency is about 8 ms at 3,600 RPM and about 4 ms at 7,200 RPM. Transfer time is a function of the transfer size (usually a sector, about 1 KB), the rotation speed (3,600 to 7,200 RPM), the recording density (bits per inch on a track), and the diameter (typically 2.5 to 5.25 inches); typical values are 2 to 12 MB per second. Notice that the transfer time is much shorter than the rotational latency and the seek time. This is similar to the DRAM situation, where the DRAM access time is much shorter than the DRAM cycle time. Does anybody remember what we did to take advantage of the short access time versus cycle time? Well, we interleave!
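The average rotational latency follows directly from the rotation speed: half a revolution on average. A two-line check (the slide's 8 ms and 4 ms are rounded):

```python
# Average rotational latency: half a revolution at the given RPM.
def avg_rotational_latency_ms(rpm):
    return 0.5 * 60_000 / rpm  # 60,000 ms per minute

print(f"{avg_rotational_latency_ms(3600):.2f} ms at 3600 RPM")  # -> 8.33 ms
print(f"{avg_rotational_latency_ms(7200):.2f} ms at 7200 RPM")  # -> 4.17 ms
```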

Disk I/O Performance. [Diagram: processor issuing requests at some request rate to per-disk queues; the disk controllers serve them at some service rate.] Disk Access Time = Seek time + Rotational latency + Transfer time + Controller time + Queueing delay. To take advantage of the short transfer time relative to the seek and rotational delay, we can have multiple disks. Since the transfer time is often a small fraction of a full disk access, the controller in higher-performance systems will disconnect the data path from a disk while it is seeking, so the other disks can transfer their data to memory. Furthermore, to handle a sudden burst of disk access requests gracefully, each disk can have its own queue to accept more requests before the previous request is completed. How long does the queue have to be? We can estimate it from queueing theory: if the utilization U is defined as the request rate divided by the service rate, then the mean queue length is U / (1 - U). What happens as the utilization approaches one, that is, as the request rate approaches the service rate? The mean queue length goes to infinity. In other words, no matter how long you make the queue, it will overflow, and the processor must stall and wait for the queue to drain before making any more requests. Where have we seen a similar situation before, where we put things into a queue faster than we can empty it?
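The queue-length estimate is easy to compute directly, and it makes the blow-up near full utilization concrete:

```python
# Mean queue length from the slide's formula: U / (1 - U),
# where U = request rate / service rate.
def mean_queue_length(request_rate, service_rate):
    u = request_rate / service_rate
    assert u < 1.0, "request rate >= service rate: queue grows without bound"
    return u / (1.0 - u)

for req in (50, 90, 99):  # requests/sec against a 100 requests/sec disk
    print(f"U = {req}% -> mean queue length {mean_queue_length(req, 100):.1f}")
```

At 50% utilization the queue averages one request; at 99% it averages ninety-nine.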

Example: 512-byte sector, disk rotating at 5400 RPM, advertised seek time of 12 ms, transfer rate of 4 MB/sec, controller overhead of 1 ms, and an idle queue (so no queueing delay).
Disk Access Time = Seek time + Rotational latency + Transfer time + Controller time + Queueing delay
= 12 ms + 0.5 rev / 5400 RPM + 0.5 KB / 4 MB/s + 1 ms + 0 ms
= 12 ms + 0.5 / 90 rev/s + 0.125 / 1024 s + 1 ms + 0 ms
= 12 ms + 5.5 ms + 0.1 ms + 1 ms + 0 ms
= 18.6 ms
If real seeks are 1/3 of the advertised seek time, the total is 10.6 ms, with the rotational delay accounting for more than 50% of that time!
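The arithmetic above can be checked with a short script. With exact values the total comes out to about 18.7 ms; the slide's 18.6 ms comes from first rounding the rotational latency to 5.5 ms and the transfer time to 0.1 ms:

```python
# Worked disk-access-time example from the slide, computed exactly.
seek_ms = 12.0
rot_ms = 0.5 * 60_000 / 5400             # half a revolution at 5400 RPM ~ 5.56 ms
transfer_ms = 0.5 / (4 * 1024) * 1000    # 0.5 KB at 4 MB/s ~ 0.12 ms
controller_ms = 1.0
queueing_ms = 0.0                        # queue assumed idle

total_ms = seek_ms + rot_ms + transfer_ms + controller_ms + queueing_ms
print(f"total access time: {total_ms:.1f} ms")  # -> 18.7 ms

# With real seeks at 1/3 of the advertised time, rotation dominates:
fast_ms = seek_ms / 3 + rot_ms + transfer_ms + controller_ms
print(f"with 1/3 seek: {fast_ms:.1f} ms, "
      f"rotation is {rot_ms / fast_ms:.0%} of it")
```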

Reliability and Availability. Two terms that are often confused: reliability asks, is anything broken? Availability asks, is the system still available to the user? Availability can be improved by adding hardware; an example is adding ECC (Error Correcting Code) to memory. Reliability can only be improved by bettering environmental conditions, building more reliable components, or building with fewer components. Note that improving availability may come at the cost of lower reliability. For example, an airplane with two engines is more "available" than an airplane with one engine, but it is twice as likely to have an engine failure, so its reliability is lower although its availability is higher.

Disk Arrays. The discussion of reliability and availability brings us to a new organization of disk storage: arrays of small and inexpensive disks, used to increase potential throughput by having many disk drives. This is how it works: data is spread over multiple disks, so multiple accesses can be made to several disks, either interleaved or in parallel. While disk arrays improve throughput, latency is not necessarily improved. Reliability is lower than that of a single disk: with N disks in the array, its reliability is only 1/N the reliability of one disk. But availability can be improved by adding redundant disks (RAID), so that lost information can be reconstructed from redundant information. Since MTTR (mean time to repair) is on the order of hours while MTTF (mean time to failure) of disks is tens of years, redundancy can make the availability of disk arrays much higher than that of a single disk.
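This scaling can be sketched numerically. The figures below (a 100,000-hour per-disk MTTF, a 24-hour MTTR) are hypothetical round numbers chosen for illustration, and the 1/N rule assumes independent failures:

```python
# Sketch: with N disks and independent failures, the array fails
# N times as often, so array MTTF ~ disk MTTF / N.
def array_mttf_hours(disk_mttf_hours, n_disks):
    return disk_mttf_hours / n_disks

# Availability = fraction of time the system is up.
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

mttf = array_mttf_hours(100_000, n_disks=100)   # 1,000 hours between failures
print(f"array MTTF: {mttf:.0f} h")
print(f"availability with 24 h repairs: {availability(mttf, 24):.3%}")
```

Even though a 100-disk array fails two orders of magnitude more often than one disk, quick repairs (enabled by redundancy) keep it available more than 97% of the time in this sketch.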

Optical Compact Disks. Another challenger to the magnetic disk as a secondary storage device is the optical disk, or CD. Its main disadvantage is that it is primarily a read-only medium. Its advantages are that it is removable and inexpensive to manufacture, and some variants are write-once, which means you can make one reliable write to them. The write-once feature gives CDs the potential to compete with new tape technologies for archival storage.

Giving Commands to I/O Devices. Two methods are used to address a device: special I/O instructions and memory-mapped I/O. Special I/O instructions specify both the device number and the command word: the processor communicates the device number via a set of wires normally included as part of the I/O bus, while the command word is usually sent on the bus's data lines. Special I/O instructions are not used that widely. Most processors use memory-mapped I/O, in which portions of the address space are assigned to I/O devices, and reads and writes to those addresses are interpreted as commands to the I/O devices; the memory controller does the right thing to communicate with the device. User programs are prevented from issuing I/O operations directly, because the I/O address space is protected by the address translation. Why is memory-mapped I/O so popular? Because we can use the same protection mechanism we already implemented for virtual memory to prevent users from issuing commands to the I/O devices directly.
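As a sketch (the address range and register offsets here are invented for illustration), memory-mapped I/O can be modeled as an address decoder that routes each load or store either to RAM or to a device register:

```python
# Hypothetical memory map: the top of the address space is reserved
# for device registers; everything else is ordinary RAM.
IO_BASE, IO_LIMIT = 0xFFFF0000, 0xFFFFFFFF
ram = {}
device_regs = {}  # stores here act as commands to the device

def store(addr, value):
    if IO_BASE <= addr <= IO_LIMIT:
        device_regs[addr - IO_BASE] = value   # interpreted as an I/O command
    else:
        ram[addr] = value

def load(addr):
    if IO_BASE <= addr <= IO_LIMIT:
        return device_regs.get(addr - IO_BASE, 0)  # read a device register
    return ram.get(addr, 0)

store(0x00001000, 0xAB)     # ordinary store: goes to RAM
store(IO_BASE + 0x4, 0x1)   # same store instruction, but lands on a device
print(hex(load(0x00001000)), hex(device_regs[0x4]))  # -> 0xab 0x1
```

The point of the model: the program uses the same load/store instructions for both cases, and protecting the reserved range via address translation is what keeps user code away from the device.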

I/O Devices Notifying the OS. After the OS has issued a command to an I/O device, either via a special I/O instruction or by writing to a location in the I/O address space, the OS needs to know when (a) the I/O device has completed the operation, or (b) the I/O operation has encountered an error. This can be accomplished in two different ways: polling, in which the I/O device puts information in a status register and the OS periodically checks that register; and I/O interrupts, in which the I/O device interrupts the processor from whatever it is currently doing whenever it needs attention.

Polling: Programmed I/O. [Flowchart: CPU asks "is the data ready?"; if yes, read the data, store it, and check whether the transfer is done; if no, loop back and ask again.] In polling, the I/O device puts information in a status register and the OS periodically checks it (the busy-wait loop) to see if the data is ready or if an error condition has occurred. If the data is ready, fine: read the data and move on. If not, we stay in this loop and try again later. The busy-wait loop is not an efficient way to use the CPU unless the device is very fast, though checks for I/O completion can be dispersed among compute-intensive code. The advantage of this approach is simplicity: the processor is totally in control and does all the work. But the processor being in total control is also the problem: polling overhead can consume a lot of CPU time. For this reason, most I/O devices notify the processor via I/O interrupts instead.
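A toy model of the busy-wait loop makes the cost visible. The device here is simulated (a real driver would read an actual status register); the poll counter stands in for wasted CPU time:

```python
# Simulated device: not ready until it has been polled `ready_after` times.
class FakeDevice:
    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def status_ready(self):        # stands in for reading a status register
        self.polls += 1
        return self.polls >= self.ready_after

    def read_data(self):           # stands in for reading a data register
        return 0x5A

def polled_read(dev):
    while not dev.status_ready():  # busy-wait: the CPU does nothing useful here
        pass
    return dev.read_data()

dev = FakeDevice(ready_after=1000)
data = polled_read(dev)
print(hex(data), "after", dev.polls, "status checks")  # -> 0x5a after 1000 status checks
```

One byte of data cost a thousand status checks; a slow device makes that ratio far worse, which is exactly why interrupts exist.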

Interrupt-Driven Data Transfer. [Diagram: the user program (add, sub, and, or, nop) is interrupted (1); the PC is saved (2); control transfers to the interrupt service routine's address (3); the service routine (read, store, ..., rti) runs and returns (4).] Whenever an I/O device needs attention from the processor, it interrupts the processor from what it is currently doing. This is how an I/O interrupt looks in the overall scheme of things: the processor is minding its own business when one of the I/O devices wants attention and raises an I/O interrupt. The processor then saves the current PC, branches to the address where the interrupt service routine resides, and starts executing it. When it finishes the interrupt service routine, it branches back to the point in the original program where it stopped and continues. The advantage of this approach is efficiency: the user program's progress is halted only during the actual transfer. The disadvantage is that special hardware is needed to cause an interrupt (in the I/O device), to detect the interrupt (in the processor), and to save the proper state so execution can resume after the interrupt (in the processor).

I/O Interrupts. An I/O interrupt is just like the exceptions you have already learned about, except that (a) an I/O interrupt is asynchronous and (b) further information needs to be conveyed. An I/O interrupt is asynchronous with respect to instruction execution: it is not associated with any instruction and does not prevent any instruction from completing, so you can pick your own convenient point to take the interrupt. An exception such as overflow or a page fault, by contrast, is always associated with a certain instruction. An I/O interrupt is also more complicated than an exception: for an exception, the only information that needs to be conveyed is the fact that an exceptional condition has occurred, but for an interrupt, the detection hardware must somehow let the OS know the identity of the device generating the interrupt. Furthermore, interrupt requests can have different urgencies, so they need to be prioritized.
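Those two requirements, conveying the device's identity and honoring urgency, can be sketched with a priority queue of pending requests. The device names and priority numbers below are hypothetical:

```python
import heapq

# Pending interrupt requests as (priority, device_id) pairs;
# a lower number means more urgent.
pending = []

def raise_irq(priority, device_id):
    heapq.heappush(pending, (priority, device_id))

def dispatch():
    """Serve all pending requests, most urgent first."""
    handled = []
    while pending:
        _prio, dev = heapq.heappop(pending)
        handled.append(dev)   # here we would run that device's service routine
    return handled

raise_irq(3, "printer")
raise_irq(1, "disk")
raise_irq(2, "network")
print(dispatch())  # -> ['disk', 'network', 'printer']
```

The heap plays the role of the interrupt controller: it remembers who asked and always hands the processor the most urgent requester first.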

Delegating I/O Responsibility from the CPU: DMA

(Diagram: CPU, DMAC, IOC, device, and memory on a bus.) The CPU sends a starting address, a direction, and a length count to the DMAC, then issues a "start" command. The DMAC provides the handshake signals for the peripheral controller, and the memory addresses and handshake signals for memory.

Direct Memory Access (DMA):
- external to the CPU
- acts as a master on the bus
- transfers blocks of data to or from memory without CPU intervention

Finally, let's see how we can delegate some of the I/O responsibility away from the CPU. The first option is Direct Memory Access, which takes advantage of the fact that I/O events often involve block transfers: you are not going to access the disk one byte at a time. The DMA controller is external to the processor and can act as a bus master to transfer blocks of data between memory and the I/O device without CPU intervention. The CPU sends the starting address, direction, and length of the transfer to the DMA controller and issues a start command; the DMA controller takes over from there and provides all the handshake signals required to complete the entire block transfer. DMA controllers are thus fairly intelligent; add more intelligence and you end up with an I/O processor, or IOP for short.
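The CPU-to-DMAC programming model described above can be sketched as follows. This is a hedged simulation: the `Dmac` class, its `start` method, and the list-based "memory" and "device" stand-ins are invented for illustration, not a real controller's register interface.

```python
# Hedged sketch of the DMA handshake above: the CPU programs a start
# address, direction, and length into the controller, issues "start",
# and the DMAC moves the whole block without further CPU involvement.

class Dmac:
    def __init__(self, memory, device_buffer):
        self.memory = memory          # list standing in for main memory
        self.device = device_buffer   # list standing in for the device

    def start(self, addr, length, direction):
        """direction 'to_mem': device -> memory; 'from_mem': memory -> device."""
        for i in range(length):       # one bus handshake per word, in effect
            if direction == "to_mem":
                self.memory[addr + i] = self.device[i]
            else:
                self.device[i] = self.memory[addr + i]
        return length                 # a real DMAC would now interrupt the CPU

mem = [0] * 8
dmac = Dmac(mem, [10, 20, 30])
dmac.start(addr=2, length=3, direction="to_mem")
```

The key point is that the CPU's involvement ends after programming `(addr, length, direction)` and issuing start; the loop body is work the CPU no longer does.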

Delegating I/O Responsibility from the CPU: IOP

(Diagram: CPU and IOP share the main memory bus with memory; devices D1 ... Dn sit on an I/O bus behind the IOP. (1) The CPU issues an instruction (OP, Device, Address) to the IOP, naming the target device and where the commands are. (2) The IOP looks in memory for command blocks (OP, Addr, Cnt, Other): what to do, where to put the data, how much, and any special requests. (3) Device-to/from-memory transfers are controlled by the IOP directly; the IOP steals memory cycles. (4) The IOP interrupts the CPU when done.)

The IOP is smart enough that the CPU only needs to issue a simple instruction (OP, Device, Address) telling it which device is the target and where to find further commands. The IOP then fetches command blocks of the form (OP, Addr, Cnt, Other) from memory and performs all the necessary data transfers between the I/O device and the memory system. The IOP does the transfer in the background without slowing the CPU, because it accesses memory only when the CPU is not using it; this is called stealing memory cycles. Only when the IOP finishes its operation does it interrupt the CPU.
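The command-block mechanism above can be sketched in a few lines. This is a hedged model: the `iop_run` function and the flat `(op, addr, cnt)` layout of command blocks in a list-based "memory" are invented for illustration; real IOP command formats are device-specific.

```python
# Hedged sketch of the IOP command flow: the CPU hands the IOP a pointer
# into memory; the IOP fetches an (op, addr, cnt) command block from there,
# performs the transfer itself, and reports completion.

def iop_run(memory, cmd_addr, device_data):
    """Fetch one (op, addr, cnt) command block at cmd_addr and execute it."""
    op, addr, cnt = memory[cmd_addr:cmd_addr + 3]  # (2) IOP fetches command
    if op == "READ":                               # (3) IOP moves the data,
        for i in range(cnt):                       #     stealing memory cycles
            memory[addr + i] = device_data[i]
    return "done"                                  # (4) would interrupt the CPU

mem = ["READ", 5, 2, None, None, 0, 0, 0]
status = iop_run(mem, cmd_addr=0, device_data=[7, 9])
```

Contrast this with DMA: the CPU no longer even programs the transfer parameters; it only points the IOP at a command block sitting in memory.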

Responsibilities of the Operating System

The operating system acts as the interface between the I/O hardware and the program that requests I/O.

Three characteristics of I/O systems give rise to these responsibilities:
- the I/O system is shared by multiple programs using the processor
- I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations; interrupts must be handled by the OS because they cause a transfer to supervisor mode
- the low-level control of an I/O device is complex: it involves managing a set of concurrent events, and the requirements for correct device control are very detailed, so it is best left to kernel programmers

Operating System Requirements

- Provide protection for shared I/O resources: guarantee that a user's program can access only the portions of an I/O device to which the user has rights.
- Provide abstractions for accessing devices: supply routines that handle low-level device operation, and handle the interrupts generated by I/O devices.
- Provide equitable access to shared I/O resources: all user programs must have equal access to the I/O resources.
- Schedule accesses in a way that enhances system throughput.

OS and I/O System Communication Requirements

The operating system must be able to prevent user programs from communicating with an I/O device directly: if user programs could perform I/O directly, protection of the shared I/O resources could not be provided.

Three types of communication are required:
- the OS must be able to give commands to the I/O devices
- the I/O device must be able to notify the OS when it has completed an operation or has encountered an error
- data must be transferred between memory and an I/O device
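The three communication types can be sketched with a simulated device. This is a hedged model: the `SimDevice` class and its `cmd`/`status`/`data` "registers" are invented stand-ins for memory-mapped device registers, and completion is modeled as a status value rather than an interrupt.

```python
# Hedged sketch of the three communication types: the OS writes a command
# register (1), the device signals completion or error via status (2), and
# data moves through a data register (3). Register names are invented.

class SimDevice:
    CMD, STATUS, DATA = "cmd", "status", "data"  # pretend register names

    def __init__(self):
        self.regs = {self.CMD: None, self.STATUS: "idle", self.DATA: None}

    def os_command(self, cmd, data=None):
        """(1) The OS gives a command; user programs must not call this
        directly, since that would bypass protection."""
        self.regs[self.CMD] = cmd
        self.regs[self.DATA] = data              # (3) data transfer
        # (2) the device notifies the OS of completion or error:
        self.regs[self.STATUS] = "done" if cmd in ("read", "write") else "error"
        return self.regs[self.STATUS]

dev = SimDevice()
```

In a real system step (2) would usually arrive as an interrupt and step (3) might go through DMA, as described in the earlier slides.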

Multimedia Bandwidth Requirements

High-quality video:
(30 frames/s) × (640 × 480 pixels) × (24-bit color/pixel) = 221 Mbps (27.6 MB/s)

Reduced-quality video:
(15 frames/s) × (320 × 240 pixels) × (16-bit color/pixel) = 18 Mbps (2.3 MB/s)

High-quality audio:
(44,100 audio samples/s) × (16-bit samples) × (2 audio channels for stereo) = 1.4 Mbps

Reduced-quality audio:
(11,050 audio samples/s) × (8-bit samples) × (1 audio channel, monaural) = 0.1 Mbps

Compression changes the whole story!
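The figures above can be checked directly. The helper names below are invented; Mbps and MB/s are taken as decimal (10^6).

```python
# Checking the multimedia bandwidth arithmetic above (decimal Mbps / MB).

def video_mbps(fps, width, height, bits_per_pixel):
    return fps * width * height * bits_per_pixel / 1e6

def audio_mbps(sample_rate, bits_per_sample, channels):
    return sample_rate * bits_per_sample * channels / 1e6

hq_video = video_mbps(30, 640, 480, 24)   # ~221 Mbps, i.e. ~27.6 MB/s
lq_video = video_mbps(15, 320, 240, 16)   # ~18.4 Mbps, ~2.3 MB/s
hq_audio = audio_mbps(44100, 16, 2)       # ~1.4 Mbps
lq_audio = audio_mbps(11050, 8, 1)        # ~0.09 Mbps
```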

Multimedia and Latency

- How sensitive is your eye/ear to variations in the audio/video rate?
- How can you ensure a constant rate of delivery?
- Jitter (latency) bounds vs. constant-bit-rate transfer
- Synchronizing audio and video streams: skew from roughly 15-20 ms early to 30-40 ms late can be tolerated

P1394 High-Speed Serial Bus (FireWire)

- Digital interface: no need to convert digital data into analog and tolerate a loss of data integrity
- Physically small: the thin serial cable can replace larger and more expensive interfaces
- Easy to use: no need for terminators, device IDs, or elaborate setup
- Hot pluggable: users can add or remove 1394 devices with the bus active
- Inexpensive: priced for consumer products
- Scalable architecture: may mix 100, 200, and 400 Mbps devices on a bus
- Flexible topology: supports daisy chaining and branching for true peer-to-peer communication
- Fast: even multimedia data can be guaranteed its bandwidth for just-in-time delivery
- Non-proprietary
- Mixed asynchronous and isochronous traffic

FireWire Operations

(Diagram: a fixed frame of 125 µs begins with a timing indicator and contains the isochronous channel time slots followed by the time left over for asynchronous transport.)

- A fixed frame is divided into preallocated CBR (isochronous) slots plus a best-effort asynchronous slot
- Each slot carries a packet containing an "ID", a command, and data
- Example: a digital video camera can expect to send one 64-byte packet every 125 µs, i.e. 8,000 packets/s × 64 bytes ≈ 0.5 MB/s
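The camera example is easy to verify from the frame period. The helper names below are invented for illustration.

```python
# Checking the camera example: one 64-byte packet per 125 microsecond frame.

FRAME_SECONDS = 125e-6

def packets_per_second():
    return 1 / FRAME_SECONDS          # 8,000 frames (and packets) per second

def bytes_per_second(packet_bytes):
    return packets_per_second() * packet_bytes

rate = bytes_per_second(64)           # 512,000 B/s, about 0.5 MB/s
```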

Summary

- I/O performance is limited by the weakest link in the chain between the OS and the device
- Disk I/O benchmarks: I/O rate vs. data rate vs. latency
- Three components of disk access time:
  - Seek time: advertised as 8 to 12 ms, but may be lower in real life
  - Rotational latency: 4.1 ms at 7200 RPM, 8.3 ms at 3600 RPM
  - Transfer time: 2 to 12 MB per second
- I/O device notifying the operating system:
  - Polling: can waste a lot of processor time
  - I/O interrupt: similar to an exception, except it is asynchronous
- Delegating I/O responsibility from the CPU: DMA, or even an IOP
- Wide range of devices: multimedia and high-speed networking pose important challenges

First we showed the diversity of I/O requirements by discussing three categories of disk I/O benchmarks: a supercomputer application's main concern is data rate, transaction processing's main concern is I/O rate, and a file system's main concern is file access. Then we talked about magnetic disks. One thing to remember is that disk access time has three components. The first two, seek time and rotational latency, involve mechanical moving parts and are therefore very slow compared to the third, the transfer time. One good thing about seek time is that it is one of the few times in life when you can actually do better than the "advertised" figure, thanks to the locality of disk accesses. As far as graphics displays are concerned, resolution is the basic measurement of how much information is on the screen, usually described as one number by another: horizontal resolution in pixels by vertical resolution in scan lines. We then showed how the size and bandwidth requirements of a color frame buffer can be reduced by placing a color map between the frame buffer and the display. Finally, we talked about a special memory, the VRAM, that can be used to build the frame buffer: it is nothing but a DRAM core with a high-speed shift register attached to it. That's all for today; we will continue our discussion of I/O on Friday.