EE 108B Review Session Friday, March 9th, 2007.

Today’s Agenda
– Announcements
– Final quiz logistics
– Snapshot of the quarter’s topics
– I/O systems
– Sample problems (previous finals and other sources)

Announcements
– HW 4 due 3/13 (Tuesday)
– Sample final solutions to be posted Monday
– Quiz #2: Thursday 3/15
– Lab 4 due 3/19

Final Quiz Logistics
– URL with info:
– Date and time: Thursday, 3/15/07, 11:00 AM-12:30 PM
– Location: McCullough 115
– Coverage: Lectures 1-18
– Format: 1 page of notes. No electronic devices other than a calculator will be permitted.
– Local SCPD students (within a 50-mile radius of Stanford) must attend. Parking is difficult to find on campus, so arrive early (45 minutes in advance).
– Remote SCPD students can take the test any time on the same day. Their proctors will receive a PDF copy of the exam from SCPD the morning of the exam. Contact SCPD if there are any issues with the delivery.

Main Topics Covered this Quarter
– MIPS Assembly Language
– CPU Performance and Compiler Optimizations
– Single Cycle Processor Design
– Pipeline Processor Design
– Memory Design
– Processes, Interrupts and Exceptions
– Virtual Memory
– I/O and OS basics: bus interfaces, DMA controllers, etc.
– Intro to Multiprocessor Design
(On the slide, the topics are bracketed into pre-midterm and post-midterm groups.)

Disk Memory Devices
– Largest and slowest level of storage. We want to store the majority of data here (VM, FS).
– Magnetic disks: rotating platter coated with a magnetic surface.
– Organization: platter → side → track (cylinders) → sector
– Latency = seek time + rotational latency + transfer time
– Seek time is usually reduced by locality and increased by random access.
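The latency formula above can be sketched as a small Python helper. The function name, the parameters, and the sample drive (4 ms seek, 7,200 RPM, 100 MB/s) are illustrative assumptions, not values from the slides. The rotational term assumes the head waits half a revolution on average.

```python
def disk_latency_ms(seek_ms, rpm, size_mb, bandwidth_mb_per_s, controller_ms=0.0):
    """Average latency in ms for one contiguous read (hypothetical helper)."""
    # Average rotational latency: half a revolution, converted to ms.
    rotational_ms = 0.5 / (rpm / 60.0) * 1000.0
    # Transfer time: request size divided by sustained bandwidth, in ms.
    transfer_ms = size_mb / bandwidth_mb_per_s * 1000.0
    return controller_ms + seek_ms + rotational_ms + transfer_ms


# Assumed sample drive: 4 ms seek, 7,200 RPM, 1 MB read at 100 MB/s.
print(round(disk_latency_ms(4.0, 7200, 1.0, 100.0), 2))  # 18.17
```

For small requests the seek and rotational terms dominate, which is why locality matters so much for disk performance.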

Hard Disk Terms
Important definitions
– Each drive uses one or more magnetic platters to store data
– A head is used to read/write data on each side of each platter
– Each platter is divided into a series of concentric rings called tracks
– Each track is in turn divided into a series of sectors, which are the basic unit of transfer for disks (“block size”)
– A common track across multiple platters is referred to as a cylinder

Example
The operation of BlockCluster for each incoming request is simple. The request is routed by the Ethernet switch to the PC with the disk that stores the corresponding file. The disk is accessed to read the requested data from contiguous locations. The processor is used to encrypt the data before it is sent to the requestor through the Ethernet switch.
There are two types of requests: a) requests for preview videos that read 0.25 Mbytes of data (1 Mbyte = 10^6 bytes) and b) requests for a movie fragment that read 2 Mbytes. For the typical workload of BlockCluster, 60% of the requests refer to a preview video.
Your overall design goal for BlockCluster is to achieve maximum performance within a budget of $50,000. Performance is measured in requests per second (RPS) for the typical workload.

Example (a)
Your first design decision is the type of hard disk to use. You have identified two candidates, both with a capacity of 50 Gbytes.
– Disk A: 30 MBytes/sec bandwidth, 10,000 RPM rotational speed, 2ms average seek time, costs $300.
– Disk B: 50 MBytes/sec bandwidth, 5,400 RPM rotational speed, 5ms average seek time, costs $500.
Accessing either disk requires an additional 3ms due to the controller associated with each disk.

Example
(The rotational and transfer terms below are computed in seconds and converted to ms.)
DISK A:
Latency_preview = 3ms + 2ms + 0.5/(10,000/60) + 0.25/30 = 16.3ms
Latency_movie = 3ms + 2ms + 0.5/(10,000/60) + 2/30 = 74.6ms
Average_Latency = 0.6*Latency_preview + 0.4*Latency_movie = 39.62ms
RPS = 1/Average_Latency = 1/39.62ms = 25.23 RPS
RPS/$ = 25.23/300 = 0.084
DISK B:
Latency_preview = 3ms + 5ms + 0.5/(5,400/60) + 0.25/50 = 18.5ms
Latency_movie = 3ms + 5ms + 0.5/(5,400/60) + 2/50 = 53.5ms
Average_Latency = 0.6*Latency_preview + 0.4*Latency_movie = 32.5ms
RPS = 1/Average_Latency = 1/32.5ms = 30.7 RPS
RPS/$ = 30.7/500 = 0.061
Disk B has a better RPS rating than disk A. However, disk A has better RPS/$ than disk B. Since our overall goal is the best performance within a given budget, we should choose disk A.
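As a cross-check on the comparison above, here is a minimal Python sketch that recomputes RPS and RPS/$ for both disks from the figures given in part (a). The function and variable names are illustrative assumptions. Because it does not round intermediate latencies, its results can differ from the slide's figures in the second decimal place.

```python
CONTROLLER_MS = 3.0          # per-access controller overhead from part (a)
PREVIEW_MB, MOVIE_MB = 0.25, 2.0
PREVIEW_SHARE = 0.6          # 60% of requests are previews

# name: (seek ms, RPM, bandwidth MB/s, cost $), all taken from part (a)
DISKS = {"A": (2.0, 10_000, 30.0, 300), "B": (5.0, 5_400, 50.0, 500)}


def latency_ms(seek_ms, rpm, size_mb, bw_mb_s):
    rotational_ms = 0.5 / (rpm / 60.0) * 1000.0   # half a revolution
    transfer_ms = size_mb / bw_mb_s * 1000.0
    return CONTROLLER_MS + seek_ms + rotational_ms + transfer_ms


def requests_per_second(seek_ms, rpm, bw_mb_s):
    avg_ms = (PREVIEW_SHARE * latency_ms(seek_ms, rpm, PREVIEW_MB, bw_mb_s)
              + (1 - PREVIEW_SHARE) * latency_ms(seek_ms, rpm, MOVIE_MB, bw_mb_s))
    return 1000.0 / avg_ms


results = {}
for name, (seek, rpm, bw, cost) in DISKS.items():
    r = requests_per_second(seek, rpm, bw)
    results[name] = (r, r / cost)
    print(f"Disk {name}: {r:.2f} RPS, {r / cost:.4f} RPS/$")
```

Disk B comes out ahead on raw RPS and disk A on RPS per dollar, matching the conclusion on the slide.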

Example (b)
Calculate the following parameters for the BlockCluster video server: number of disks per PC; the number of PCs in the cluster; overall performance (in RPS); and storage capacity (in Gbytes; 1 Gbyte = 10^9 bytes). Remember that your goal is to optimize performance within the design budget. Assume the following:
– You use disk A from (a) (this is not necessarily the correct answer for (a)).
– You can connect up to 25 disks on the IO bus.
– The processor’s sustained performance is 500 MIPS (millions of instructions per second).
– The number of instructions it must execute per request is given by the formula 200,000 + 800,000*(Size/Mbyte), where Size is the size of the data associated with the request, measured in Mbytes.
– Each PC costs $1,000 (disks not included).
– The Ethernet switch costs $5,000.
– The memory bus and main memory in each PC are never a performance bottleneck.
– The Ethernet switch is never a performance bottleneck.

Example
Each disk can support 25.23 RPS on average (see (a)).
For the CPU:
– For a preview video the CPU executes 200K+800K*0.25 = 400K instructions.
– For a movie fragment the CPU executes 200K+800K*2 = 1,800K instructions.
– On average, the CPU executes 0.6*400K+0.4*1,800K = 960K instructions per request.
– Hence the CPU can support up to 500 MIPS/(960K instructions per request) = 520.8 RPS.
– Hence the CPU can sustain the request rate of 520.8/25.23 = 20.6 disks. We need to round this number down to 20.
The IO bus saturates at 25 disks, which is more than the 20 the CPU can support, hence the IO bus is not a bottleneck. Overall, we will have 20 disks per CPU.
The overall budget is $50,000. Assume that we have x PCs in our system. The cost of the system is x*(1,000+20*300)+5,000 = 7,000*x+5,000. Since the overall cost cannot exceed the budget, 7,000*x+5,000 <= 50,000, or x <= 6.43, so x = 6.
In summary, the BlockCluster will have:
– 6 PCs with 20 disks per PC
– Overall performance of 6*20*25.23 = 3,027 RPS
– Overall capacity of 6*20*50 GB = 6,000 GB, or 6 TB
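The sizing steps above can be replayed in a few lines of Python. This is a sketch that follows the slide's reasoning (saturate each CPU first, then fit PCs into the budget); the variable names are assumptions, and the per-disk rate of 25.23 RPS is taken from part (a) as given.

```python
import math

DISK_RPS = 25.23             # per-disk rate from part (a)
DISK_COST, DISK_GB = 300, 50
PC_COST, SWITCH_COST, BUDGET = 1_000, 5_000, 50_000
CPU_INSTR_PER_S = 500e6      # 500 MIPS
MAX_DISKS_ON_BUS = 25


def instructions(size_mb):
    # Per-request instruction count: 200,000 + 800,000 * (Size/Mbyte)
    return 200_000 + 800_000 * size_mb


avg_instr = 0.6 * instructions(0.25) + 0.4 * instructions(2.0)  # about 960K
cpu_rps = CPU_INSTR_PER_S / avg_instr                           # about 520.8

# Disks one CPU can keep busy, rounded down and capped by the IO bus.
disks_per_pc = min(math.floor(cpu_rps / DISK_RPS), MAX_DISKS_ON_BUS)

# Whole PCs (each with its disks) that fit in the budget after the switch.
pcs = (BUDGET - SWITCH_COST) // (PC_COST + disks_per_pc * DISK_COST)

total_rps = pcs * disks_per_pc * DISK_RPS
capacity_gb = pcs * disks_per_pc * DISK_GB
print(disks_per_pc, pcs, round(total_rps, 1), capacity_gb)
```

Rounding down at both steps matters: a fractional disk or PC cannot be bought, and rounding up would exceed either the CPU's capacity or the budget.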

Disk, cont.
– Dependability, availability, reliability
– ECC, RAID, MTTF, MTTR
– RAID0: striping, no redundancy
– RAID1: mirroring
– RAID2: ECC, not used
– RAID3/4/5: interleaved parity
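To illustrate the parity idea behind RAID3/4/5, here is a minimal Python sketch of XOR parity over equal-sized blocks. The block contents and the `parity` helper are made up for illustration; real arrays differ in how they interleave data and where they place the parity block.

```python
from functools import reduce


def parity(blocks):
    """Bytewise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up data blocks, one per data disk.
data = [b"\x0f\xa0", b"\x33\x55", b"\xc1\x7e"]
p = parity(data)  # stored on the parity disk

# If the second disk fails, XOR-ing the survivors with the parity
# block reconstructs its contents.
recovered = parity([data[0], data[2], p])
print(recovered == data[1])  # True
```

The same XOR serves for both generating parity and rebuilding a lost block, which is why a single parity block tolerates exactly one disk failure per stripe.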

Accessing the Bus
– Master/slave protocol
– Arbitration
– Multiple masters
– Priority, starvation, fairness

Interfacing to the CPU
– Memory-mapped I/O
– Polling vs. interrupts
– Priority scheme for multiple interrupts
– DMA

Bus Types

Optimizations to improve bus bandwidth?
– Cost
– Wider buses
– Separate data-address lines
– Block transfers
– Split transactions
– Synchronous vs. asynchronous
– Critical word first

Advantages and disadvantages of buses when connecting Ps, MEMs, or I/O devices
Advantages
– Simple
– Implements broadcast
– Provides an easy way to implement ordering (serialization)
– Interoperability (assuming standard buses)
Disadvantages
– Does not scale well (limited bandwidth, limited distance, small number of devices, or limited clock frequency)
– Turn-around time and arbitration overhead
– Unnecessary serialization in communication when ordering is not necessary

That’s all folks…
All the best on the final exam!
It was a pleasure TA-ing you all this quarter.
