Computer Architecture Lec 18 – Storage



01/19/10 Storage2 Review. Disks: areal density now improving 30%/yr vs. 100%/yr in the 2000s. TPC: price-performance reported on a normalized configuration – auditing to ensure no foul play – throughput with restricted response time is the normal measure. Fault → latent errors in system → failure in service. Components often fail slowly. Real systems: problems in maintenance and operation as well as hardware and software.

Introduction to Queueing Theory. More interested in long-term, steady-state behavior than in startup → Arrivals = Departures. Little's Law: Mean number of tasks in system = Arrival rate × Mean response time – observed by many, but Little was the first to prove it. Applies to any system in equilibrium, as long as the black box is not creating or destroying tasks.

Deriving Little's Law. Time_observe = elapsed time over which we observe the system; Number_task = number of (overlapping) tasks during Time_observe; Time_accumulated = sum of the elapsed times of all tasks. Then: Mean number of tasks in system = Time_accumulated / Time_observe; Mean response time = Time_accumulated / Number_task; Arrival rate = Number_task / Time_observe. Factoring the RHS of the 1st equation: Time_accumulated / Time_observe = (Time_accumulated / Number_task) × (Number_task / Time_observe), which gives Little's Law: Mean number of tasks in system = Arrival rate × Mean response time.
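The derivation above can be checked numerically on a small trace of task arrival and departure times (the timestamps below are invented purely for illustration):

```python
# A made-up trace of (arrival, departure) times over a 10-second window.
tasks = [(0.0, 3.0), (1.0, 4.0), (2.0, 7.0), (5.0, 9.0), (8.0, 10.0)]

time_observe = 10.0                              # elapsed observation time
time_accumulated = sum(d - a for a, d in tasks)  # sum of per-task elapsed times
number_task = len(tasks)

mean_tasks_in_system = time_accumulated / time_observe
mean_response_time = time_accumulated / number_task
arrival_rate = number_task / time_observe

# Little's Law: mean number in system = arrival rate x mean response time
assert abs(mean_tasks_in_system - arrival_rate * mean_response_time) < 1e-12
```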

A Little Queuing Theory: Notation. Time_server = average time to service a task; average service rate = 1 / Time_server (traditionally µ). Time_queue = average time per task in the queue. Time_system = average time per task in the system = Time_queue + Time_server. Arrival rate = average number of arriving tasks/sec (traditionally λ). Length_server = average number of tasks in service. Length_queue = average length of the queue. Length_system = average number of tasks in the system = Length_queue + Length_server. Little's Law: Length_server = Arrival rate × Time_server (mean number of tasks in service = arrival rate × mean service time). (Diagram: Proc → IOC → Device; the queue plus the server make up the system.)

Server Utilization. For a single server, service rate = 1 / Time_server. Server utilization must be between 0 and 1, since the system is in equilibrium (arrivals = departures); it is often called traffic intensity (traditionally ρ). Server utilization = mean number of tasks in service = Arrival rate × Time_server. What is the disk utilization if we get 50 I/O requests per second for a disk and the average disk service time is 10 ms (0.01 sec)? Server utilization = 50/sec × 0.01 sec = 0.5, i.e., the server is busy 50% of the time on average.
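The utilization formula on this slide is a one-liner; a minimal sketch with the disk example as input:

```python
def server_utilization(arrival_rate, time_server):
    """Utilization (traffic intensity rho) = arrival rate x mean service time."""
    rho = arrival_rate * time_server
    assert 0 <= rho < 1, "system not in equilibrium: arrivals outpace service"
    return rho

print(server_utilization(50, 0.01))  # the 50 I/O/sec, 10 ms disk: 0.5
```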

Time in Queue vs. Length of Queue. We assume a First In First Out (FIFO) queue. What is the relationship of time in queue (Time_queue) to the mean number of tasks in the queue (Length_queue)? Time_queue = Length_queue × Time_server + "mean time to complete service of the task in progress when the new task arrives, given the server is busy." A new task can arrive at any instant; how do we predict that last part? To predict performance, we need to know something about the distribution of events.

Distribution of Random Variables. A variable is random if it takes one of a specified set of values with a specified probability – we cannot know exactly the next value, but we may know the probability of all possible values. I/O requests can be modeled by a random variable because the OS is normally switching between several processes generating independent I/O requests – also given the probabilistic nature of disk seek and rotational delays. Can characterize the distribution of values of a random variable with discrete values using a histogram – divides the range between the min and max values into buckets – then plots the number in each bucket as columns – works for discrete values, e.g., number of I/O requests. What if values are not discrete? Use very fine buckets.

Characterizing the distribution of a random variable. Need the mean time and a measure of variance. For the mean, use the weighted arithmetic mean (WAM), where fi = frequency of task i and Ti = time for task i: WAM = f1×T1 + f2×T2 + … + fn×Tn. For variance, instead of the standard deviation, use the variance (square of the standard deviation) about the WAM: Variance = (f1×T1² + f2×T2² + … + fn×Tn²) − WAM². If time is in milliseconds, the variance units are square milliseconds! Can we get a unitless measure of variance?
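The WAM and variance formulas translate directly to code; the service-time mix below (30% of requests take 10 ms, 50% take 20 ms, 20% take 50 ms) is a hypothetical example, not from the slides:

```python
def wam_and_variance(freqs, times):
    """Weighted arithmetic mean and variance; freqs must sum to 1."""
    wam = sum(f * t for f, t in zip(freqs, times))
    variance = sum(f * t * t for f, t in zip(freqs, times)) - wam * wam
    return wam, variance

# hypothetical mix of service times in milliseconds
wam, variance = wam_and_variance([0.3, 0.5, 0.2], [10, 20, 50])
# wam is in ms, variance in ms^2 -- hence the need for a unitless measure
```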

Squared Coefficient of Variance (C²). C² = Variance / WAM², so C = sqrt(Variance)/WAM = StDev/WAM – a unitless measure. Trying to characterize random events, but we need a distribution of random events with tractable math. The most popular such distribution is the exponential distribution, where C = 1. Note we are using a constant to characterize variability about the mean – invariance of C over time → the history of events has no impact on the probability of an event occurring now – called memoryless, an important assumption for predicting behavior – (suppose not; then we would have to worry about the exact arrival times of requests relative to each other → the math becomes intractable!).
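The claim that the exponential distribution has C = 1 can be sanity-checked by sampling (a sketch; the 20 ms mean is an arbitrary choice):

```python
import random
import statistics

random.seed(42)
# 200,000 exponentially distributed "service times" with mean 20 ms
samples = [random.expovariate(1 / 20) for _ in range(200_000)]

# coefficient of variation C = standard deviation / mean
c = statistics.pstdev(samples) / statistics.fmean(samples)
# for an exponential distribution, C approaches 1 (so C^2 approaches 1)
```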

Poisson Distribution. The most widely used distribution connected to the exponential is the Poisson, described by the probability mass function: Probability(k) = e^(−a) × a^k / k!, where a = Rate of events × Elapsed time. If interarrival times are exponentially distributed and we use the arrival rate from above for the rate of events, the number of arrivals in a time interval t is a Poisson process.

Time in Queue. The time a new task must wait for the server to complete a task in progress, assuming the server is busy – and assuming a Poisson process. Average residual service time = ½ × Arithmetic mean × (1 + C²). When the distribution is not random and all values equal the average → standard deviation is 0 → C is 0 → average residual service time = half the average service time. When the distribution is random and Poisson → C is 1 → average residual service time = the weighted arithmetic mean.
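The two special cases above fall straight out of the residual-time formula:

```python
def avg_residual_service_time(mean_service, c_squared):
    """Average residual service time = 1/2 x mean x (1 + C^2)."""
    return 0.5 * mean_service * (1 + c_squared)

avg_residual_service_time(10, 0)  # deterministic (C = 0): half the mean, 5 ms
avg_residual_service_time(10, 1)  # exponential  (C = 1): the mean itself, 10 ms
```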

Time in Queue. All tasks in the queue (Length_queue) ahead of the new task must be completed before the task can be serviced – each takes Time_server on average – and the task at the server takes the average residual service time to complete. The chance the server is busy is the server utilization → the expected time for that service is Server utilization × Average residual service time. So: Time_queue = Length_queue × Time_server + Server utilization × Average residual service time. Substituting the definitions of Length_queue and Average residual service time, and rearranging: Time_queue = Time_server × Server utilization / (1 − Server utilization).

Time in Queue vs. Length of Queue. Length_queue = Arrival rate × Time_queue – Little's Law applied to a component of the black box, since it must also be in equilibrium. Given 1. Time_queue = Time_server × Server utilization / (1 − Server utilization) and 2. Arrival rate × Time_server = Server utilization, we get: Length_queue = Server utilization² / (1 − Server utilization). Mean number of requests in the queue for the disk of slide 6 (utilization 50%)? Length_queue = (0.5)² / (1 − 0.5) = 0.25/0.5 = 0.5 → 0.5 requests in the queue on average.
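The two identities on this slide, as a minimal sketch, reproduce the 50%-utilized disk result:

```python
def time_in_queue(time_server, utilization):
    """Time_queue = Time_server x rho / (1 - rho)."""
    return time_server * utilization / (1 - utilization)

def length_of_queue(utilization):
    """Little's Law applied to the queue: Length_queue = rho^2 / (1 - rho)."""
    return utilization ** 2 / (1 - utilization)

print(length_of_queue(0.5))  # the 50%-utilized disk: 0.5 requests waiting on average
```

Note how both blow up as utilization approaches 1, which is the practical warning of the whole lecture.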

M/M/1 Queuing Model. Assumptions: the system is in equilibrium; times between two successive arriving requests ("interarrival times") are exponentially distributed; the number of sources of requests is unlimited ("infinite population model"); the server can start the next job immediately; there is a single queue with no limit on its length and FIFO discipline, so all tasks in line must be completed; and there is one server. Called M/M/1 (the book also derives M/M/m): 1. exponentially random request arrival (C² = 1); 2. exponentially random service time (C² = 1); 3. one server – M stands for Markov, the mathematician who defined and analyzed memoryless processes.

Example. 40 disk I/Os per second, requests are exponentially distributed, and the average service time is 20 ms → Arrival rate = 40/sec, Time_server = 0.02 sec. 1. On average, how utilized is the disk? Server utilization = Arrival rate × Time_server = 40 × 0.02 = 0.8 = 80%. 2. What is the average time spent in the queue? Time_queue = Time_server × Server utilization / (1 − Server utilization) = 20 ms × 0.8/(1 − 0.8) = 20 × 4 = 80 ms. 3. What is the average response time for a disk request, including the queuing time and disk service time? Time_system = Time_queue + Time_server = 80 + 20 ms = 100 ms.

How much better with a 2X faster disk? Average service time is now 10 ms → Arrival rate = 40/sec, Time_server = 0.01 sec. 1. On average, how utilized is the disk? Server utilization = Arrival rate × Time_server = 40 × 0.01 = 0.4 = 40%. 2. What is the average time spent in the queue? Time_queue = Time_server × Server utilization / (1 − Server utilization) = 10 ms × 0.4/(1 − 0.4) = 10 × 2/3 = 6.7 ms. 3. What is the average response time for a disk request, including the queuing time and disk service time? Time_system = Time_queue + Time_server = 6.7 + 10 ms = 16.7 ms. 6X better response time with a 2X faster disk!
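Both worked examples can be reproduced with one small M/M/1 helper:

```python
def mm1_response_time(arrival_rate, time_server):
    """Return (utilization, time_queue, time_system) for an M/M/1 server."""
    rho = arrival_rate * time_server
    assert rho < 1, "queue grows without bound"
    time_queue = time_server * rho / (1 - rho)
    return rho, time_queue, time_queue + time_server

slow = mm1_response_time(40, 0.020)  # 20 ms disk: 80% busy, 80 ms queue, 100 ms total
fast = mm1_response_time(40, 0.010)  # 10 ms disk: 40% busy, ~6.7 ms queue, ~16.7 ms total
print(slow[2] / fast[2])             # ~6x better response time from a 2x faster disk
```

The nonlinearity comes entirely from the 1/(1 − rho) term: halving the service time also halves the utilization, which shrinks the queueing term far more than 2x.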

Value of queueing theory in practice. We learn quickly not to try to utilize a resource at 100% – but how far should we back off? It allows designers to decide the impact of faster hardware on utilization and hence on response time. It works surprisingly well.

Cross-cutting Issues: Buses → point-to-point links and switches

Standard             Width    Length  Clock rate     MB/s  Max devices
(Parallel) ATA       8b       0.5 m   133 MHz        133   2
Serial ATA           2b       2 m     3 GHz          300   ?
(Parallel) SCSI      16b      12 m    80 MHz (DDR)   320   15
Serial Attach SCSI   1b       10 m    –              –     16,256
PCI                  32/64b   0.5 m   33 / 66 MHz    533   ?
PCI Express          2b       0.5 m   3 GHz          250   ?

The number of bits and the bandwidth are per direction → 2X for both directions (not shown). Since serial links use fewer wires, bandwidth is commonly increased via versions with 2X-12X the number of wires (lanes) and bandwidth.

Storage Example: Internet Archive. Goal: make a historical record of the Internet – the Internet Archive began in 1996 – the Wayback Machine interface performs time travel to see what the website at a URL looked like in the past. It contains over a petabyte (10^15 bytes) and is growing by 20 terabytes (10^12 bytes) of new data per month. In addition to storing the historical record, the same hardware is used to crawl the Web every few months to get snapshots of the Internet.

Internet Archive Cluster. 1U storage node: PetaBox GB2000 from Capricorn Technologies. Contains four 500 GB Parallel ATA (PATA) disk drives, 512 MB of DDR266 DRAM, one 10/100/1000 Ethernet interface, and a 1 GHz C3 processor from VIA (80x86). A node dissipates ≈ 80 watts. 40 GB2000s fit in a standard VME rack, giving ≈ 80 TB of raw storage capacity. The 40 nodes are connected with a 48-port 10/100 or 10/100/1000 Ethernet switch. A rack dissipates about 3 KW. 1 petabyte ≈ 12 racks.
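The rack-level figures quoted above are simple products; the sketch below redoes the arithmetic (the quoted "12 racks" and "3 KW" are roundings of 12.5 and 3.2 kW):

```python
# Back-of-envelope arithmetic for the PetaBox rack figures.
disks_per_node, gb_per_disk = 4, 500      # GB2000 node: 4 x 500 GB = 2 TB
nodes_per_rack = 40
watts_per_node = 80

tb_per_rack = nodes_per_rack * disks_per_node * gb_per_disk / 1000  # 80 TB raw
racks_per_petabyte = 1000 / tb_per_rack                             # 12.5 racks
watts_per_rack = nodes_per_rack * watts_per_node                    # 3200 W
```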

Estimated Cost. VIA processor, 512 MB of DDR266 DRAM, ATA disk controller, power supply, fans, and enclosure = $500. Each 7200 RPM Parallel ATA drive holding 500 GB = $375. 48-port 10/100/1000 Ethernet switch and all cables for a rack = $3000. Cost: $84,500 for an 80-TB rack. The 160 disks are ≈ 60% of the cost.

Estimated Performance. Each 7200 RPM Parallel ATA drive holds 500 GB, has an average seek time of 8.5 ms, and transfers at 50 MB/second from the disk; the PATA link speed is 133 MB/second. The performance of the VIA processor is 1000 MIPS; the operating system uses 50,000 CPU instructions per disk I/O; the network protocol stack uses 100,000 CPU instructions to transmit a data block between the cluster and the external world. ATA controller overhead is 0.1 ms per disk I/O. Average I/O size is 16 KB for accesses to the historical record via the Wayback interface, and 50 KB when collecting a new snapshot. Disks are the limit: ≈ 75 I/Os/sec per disk, 300/sec per node, 12,000/sec per rack, or about 200 to 600 MBytes/sec of bandwidth per rack. The switch needs to support 1.6 to 3.8 Gbits/second over its 40 1-Gbit/sec links.
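The ≈ 75 I/Os/sec per disk figure can be reproduced by summing the per-request latencies. The half-rotation rotational delay is an assumption on my part, since the slide lists only seek time, transfer rate, and controller overhead:

```python
seek_ms = 8.5
rotation_ms = 0.5 * 60_000 / 7200    # assumed: half a rotation at 7200 RPM, ~4.2 ms
transfer_ms = 16 / 50_000 * 1000     # 16 KB at 50 MB/s, ~0.32 ms
controller_ms = 0.1                  # ATA controller overhead

service_ms = seek_ms + rotation_ms + transfer_ms + controller_ms
ios_per_sec = 1000 / service_ms      # comes out near the quoted ~75 I/Os/sec
```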

Estimated Reliability. CPU/memory/enclosure MTTF is 1,000,000 hours (× 40); PATA disk MTTF is 125,000 hours (× 160); PATA controller MTTF is 500,000 hours (× 40); Ethernet switch MTTF is 500,000 hours (× 1); power supply MTTF is 200,000 hours (× 40); fan MTTF is 200,000 hours (× 40); PATA cable MTTF is 1,000,000 hours (× 40). MTTF for the system is 531 hours (≈ 3 weeks). 70% of the time the failure is a disk; 20% of the time it is a fan or power supply.
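Treating all components as a series system whose failure rates add, a sketch of the calculation lands close to the quoted 531 hours and reproduces the ~70% disk share:

```python
# System MTTF from component MTTFs: for components in series, failure rates add.
components = {              # name: (MTTF in hours, count)
    "cpu/mem/enclosure": (1_000_000, 40),
    "pata disk":         (125_000, 160),
    "pata controller":   (500_000, 40),
    "ethernet switch":   (500_000, 1),
    "power supply":      (200_000, 40),
    "fan":               (200_000, 40),
    "pata cable":        (1_000_000, 40),
}

failure_rate = sum(n / mttf for mttf, n in components.values())  # failures/hour
mttf_system = 1 / failure_rate                                   # a few hundred hours
disk_share = (160 / 125_000) / failure_rate                      # ~0.70 of failures
```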

RAID Paper Discussion. What was the main motivation for RAID in the paper? Did the predictions of processor performance and disk capacity hold? What were the performance figures of merit used to compare RAID levels? What RAID group sizes were in the paper? Are they realistic? Why? Why would RAID 2 (ECC) have a lower predicted MTTF than RAID 3 (parity)?

RAID Paper Discussion. How did the authors propose to balance the performance and capacity of RAID 1 through RAID 5? What do you think of it? What were some of the open issues? Which were significant? In retrospect, what do you think were the important contributions? What did the authors get wrong? In retrospect: RAID in hardware vs. RAID in software; rated MTTF vs. MTTF in the field; synchronization of disks in an array; EMC ($10B sales in 2005) and RAID; who invented RAID?

Summary. Little's Law: Length_system = Arrival rate × Time_system (mean number of customers = arrival rate × mean time in system). Appreciation for the relationship of latency and utilization: Time_system = Time_server + Time_queue, where Time_queue = Time_server × Server utilization / (1 − Server utilization). Clusters serve for storage as well as computation. RAID paper: it is reliability, not performance, that matters for storage.