
1 CS 136, Advanced Architecture Storage Performance Measurement

2 Outline
I/O Benchmarks, Performance, and Dependability
Introduction to Queueing Theory

3 I/O Performance
Response time = Queue Time + Device Service Time
Metrics: Response Time vs. Throughput
[Figure: response time (ms, 0-300) vs. throughput (0-100% of total BW) for the Proc → Queue → IOC → Device path]

4 I/O Benchmarks
For better or worse, benchmarks shape a field
–Processor benchmarks classically aimed at response time for a fixed-size problem
–I/O benchmarks typically measure throughput, possibly with an upper limit on response times (or on 90% of response times)
Transaction Processing (TP) (or On-Line TP = OLTP)
–If bank computer fails when customer withdraws money, TP system guarantees account debited if customer gets $ & account unchanged if no $
–Airline reservation systems & banks use TP
Atomic transactions make this work
Classic metric is Transactions Per Second (TPS)

5 I/O Benchmarks: Transaction Processing
Early 1980s: great interest in OLTP
–Expecting demand for high TPS (e.g., ATMs, credit cards)
–Tandem's success implied medium-range OLTP expanding
–Each vendor picked own conditions for TPS claims, reported only CPU times with widely different I/O
–Conflicting claims led to disbelief in all benchmarks ⇒ chaos
1984: Jim Gray (Tandem) distributed paper to Tandem + 19 in other companies proposing standard benchmark
Published "A measure of transaction processing power," Datamation, 1985, by Anonymous et al.
–To indicate that this was effort of large group
–To avoid delays in legal departments at each author's firm
Led to Transaction Processing Council in 1988
–www.tpc.org

6 I/O Benchmarks: TP1 by Anon et al.
Debit/Credit Scalability: # of accounts, branches, tellers, history all a function of throughput

TPS     Number of ATMs   Account-file size
10      1,000            0.1 GB
100     10,000           1.0 GB
1,000   100,000          10.0 GB
10,000  1,000,000        100.0 GB

–Each input TPS ⇒ 100,000 account records, 10 branches, 100 ATMs
–Accounts must grow since customer unlikely to use bank more often just because they have faster computer!
Response time: 95% of transactions take ≤ 1 second
Report price (initial purchase price + 5-year maintenance = cost of ownership)
Hire auditor to certify results
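As a rough illustration (not part of the original benchmark specification), the scaling rule above can be captured in a few lines of Python; the function name and structure are my own:

```python
# Hypothetical helper illustrating the TP1 scaling rule from the slide:
# each claimed TPS requires 100,000 account records, 10 branches, 100 ATMs,
# and roughly 0.01 GB of account-file storage (0.1 GB per 10 TPS).
def tp1_scaling(tps):
    return {
        "accounts": tps * 100_000,
        "branches": tps * 10,
        "atms":     tps * 100,
        "account_file_gb": tps * 0.01,
    }

print(tp1_scaling(10))      # 1,000 ATMs, 0.1 GB -- matches first table row
print(tp1_scaling(10_000))  # 1,000,000 ATMs, 100 GB -- matches last row
```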

7 Unusual Characteristics of TPC
Price included in benchmarks
–Cost of HW, SW, 5-year maintenance agreement
»Price-performance as well as performance
Data set must scale up as throughput increases
–Trying to model real systems: demand on system and size of data stored in it increase together
Benchmark results are audited
–Ensures only fair results are submitted
Throughput is the performance metric, but response times are limited
–E.g., TPC-C: 90% of transaction response times < 5 seconds
Independent organization maintains the benchmarks
–Ballots on changes, holds meetings to settle disputes...

8 TPC Benchmark History/Status

9 I/O Benchmarks via SPEC
SFS 3.0: Attempt by NFS companies to agree on a standard benchmark
–Run on multiple clients & networks (to prevent bottlenecks)
–Same caching policy in all clients
–Reads: 85% full-block & 15% partial-block
–Writes: 50% full-block & 50% partial-block
–Average response time: 40 ms
–Scaling: for every 100 NFS ops/sec, increase capacity 1 GB
Results: plot of server load (throughput) vs. response time & number of users
–Assumes: 1 user => 10 NFS ops/sec
–3.0 for NFS 3.0
Added SPECMail (mail server), SPECWeb (web server) benchmarks
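A small, hypothetical sizing helper (names and structure are mine) for the two rules above, 10 NFS ops/sec per user and 1 GB of capacity per 100 NFS ops/sec:

```python
# Rough SPEC SFS 3.0 sizing per the rules on the slide (illustrative only):
# each user generates ~10 NFS ops/sec, and capacity must grow by 1 GB
# for every 100 NFS ops/sec of offered load.
def sfs_sizing(users):
    ops_per_sec = users * 10
    capacity_gb = ops_per_sec / 100
    return ops_per_sec, capacity_gb

ops, gb = sfs_sizing(1000)
print(f"{ops} NFS ops/sec, {gb:.0f} GB of file data")  # 10000 ops/sec, 100 GB
```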

10 2005 Example SPEC SFS Result: NetApp FAS3050c NFS servers
2.8 GHz Pentium Xeons, 2 GB DRAM per processor, 1 GB non-volatile memory per system
4 FDDI nets; 32 NFS daemons, 24 GB file size
168 Fibre Channel disks: 72 GB, 15,000 RPM, 2 or 4 FC controllers

11 Availability Benchmark Methodology
Goal: quantify variation in QoS metrics as events occur that affect system availability
Leverage existing performance benchmarks
–To generate fair workloads
–To measure & trace quality-of-service metrics
Use fault injection to compromise system
–Hardware faults (disk, memory, network, power)
–Software faults (corrupt input, driver error returns)
–Maintenance events (repairs, SW/HW upgrades)
Examine single-fault and multi-fault workloads
–The availability analogues of performance micro- and macro-benchmarks

12 Example single-fault result
Compares Linux and Solaris reconstruction
–Linux: minimal performance impact but longer window of vulnerability to second fault
–Solaris: large perf. impact but restores redundancy fast
[Figure: reconstruction behavior, Linux vs. Solaris panels]

13 Reconstruction policy (2)
Linux: favors performance over data availability
–Automatically-initiated reconstruction, idle bandwidth
–Virtually no performance impact on application
–Very long window of vulnerability (>1 hr for 3 GB RAID)
Solaris: favors data availability over app. perf.
–Automatically-initiated reconstruction at high BW
–As much as 34% drop in application performance
–Short window of vulnerability (10 minutes for 3 GB)
Windows: favors neither!
–Manually-initiated reconstruction at moderate BW
–As much as 18% app. performance drop
–Somewhat short window of vulnerability (23 min for 3 GB)

14 Introduction to Queueing Theory
More interested in long-term steady state than in startup ⇒ Arrivals = Departures
Little's Law: Mean number of tasks in system = arrival rate × mean response time
–Observed by many, Little was first to prove
–Makes sense: large number of customers means long waits
Applies to any system in equilibrium, as long as the black box is not creating or destroying tasks
[Diagram: arrivals into and departures from the system]

15 Deriving Little's Law
Define arr(t) = # arrivals in interval (0,t)
Define dep(t) = # departures in (0,t)
Clearly, N(t) = # in system at time t = arr(t) – dep(t)
Area between curves = spent(t) = total time spent in system by all customers (measured in customer-seconds)
[Figure: arr(t) and dep(t) vs. time; the vertical gap between the curves is N(t)]

16 Deriving Little's Law (cont'd)
Define average arrival rate during interval t, in customers/second, as λ_t = arr(t)/t
Define T_t as system time per customer, averaged over all customers in (0,t)
–Since spent(t) = accumulated customer-seconds, divide by arrivals up to that point to get T_t = spent(t)/arr(t)
Mean tasks in system over (0,t) is accumulated customer-seconds divided by seconds: Mean_tasks_t = spent(t)/t
The above three equations give us: Mean_tasks_t = λ_t × T_t
Assuming the limits of λ_t and T_t exist, the limit of Mean_tasks_t also exists and gives Little's result:
Mean tasks in system = arrival rate × mean time in system
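As a sanity check on Little's result (my own sketch, not material from the lecture), the following Python simulation of a FIFO single-server queue compares the time-averaged number of tasks in the system with arrival rate × mean time in system; the rates chosen are arbitrary:

```python
import random

random.seed(1)
T_END = 100_000.0
arrival_rate, service_rate = 1.0, 2.0            # illustrative rates only
arrivals, departures = [], []
t, busy_until = 0.0, 0.0
while True:
    t += random.expovariate(arrival_rate)        # exponential interarrivals
    if t > T_END:
        break
    start = max(t, busy_until)                   # FIFO single server
    busy_until = start + random.expovariate(service_rate)
    arrivals.append(t)
    departures.append(busy_until)

# Time-average of N(t) over (0, T_END), by sweeping arrival/departure events
events = sorted([(a, +1) for a in arrivals] +
                [(d, -1) for d in departures if d <= T_END])
area, n, last = 0.0, 0, 0.0
for when, delta in events:
    area += n * (when - last)
    n, last = n + delta, when
area += n * (T_END - last)

mean_tasks = area / T_END                        # mean number in system
lam        = len(arrivals) / T_END               # observed arrival rate
mean_t_sys = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
print(mean_tasks, lam * mean_t_sys)              # approximately equal (~1.0)
```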

17 A Little Queuing Theory: Notation
Time_server: average time to service a task; average service rate = 1/Time_server (traditionally µ)
Time_queue: average time per task in queue
Time_system: average time per task in system = Time_queue + Time_server
Arrival rate: average number of arriving tasks/sec (traditionally λ)
Length_server: average number of tasks in service
Length_queue: average length of queue
Length_system: average number of tasks in system = Length_queue + Length_server
Little's Law: Length_server = Arrival rate × Time_server
[Diagram: Proc → Queue → IOC/Device; queue + server together form the system]

18 Server Utilization
For a single server, service rate = 1/Time_server
Server utilization must be between 0 and 1, since the system is in equilibrium (arrivals = departures); often called traffic intensity, traditionally ρ
Server utilization = mean number of tasks in service = Arrival rate × Time_server
What is disk utilization if the disk gets 50 I/O requests per second and average disk service time is 10 ms (0.01 sec)?
Server utilization = 50/sec × 0.01 sec = 0.5
Or, on average the server is busy 50% of the time

19 Time in Queue vs. Length of Queue
We assume a First In, First Out (FIFO) queue
What is the relationship of time in queue (Time_queue) to mean number of tasks in queue (Length_queue)?
Time_queue = Length_queue × Time_server + "mean time to finish serving the task already in progress when a new task arrives, if the server is busy"
A new task can arrive at any instant; how do we predict that last part?
To predict performance, we need to know something about the distribution of events

20 I/O Request Distributions
I/O request arrivals can be modeled by a random variable
–Multiple processes generate independent I/O requests
–Disk seeks and rotational delays are probabilistic
What distribution to use for the model?
–True distributions are complicated
»Self-similar (fractal)
»Zipf
–We often ignore that and use Poisson
»Highly tractable for analysis
»Intuitively appealing (independence of arrival times)

21 The Poisson Distribution
Probability of exactly k arrivals in (0,t) is: P_k(t) = (λt)^k e^(−λt) / k!
–λ is the arrival-rate parameter
More useful formulation is the Poisson interarrival-time distribution:
–CDF A(t) = P[next arrival takes time ≤ t] = 1 − e^(−λt)
–pdf a(t) = λe^(−λt)
–Also known as the exponential distribution
–Mean = standard deviation = 1/λ
The Poisson (exponential) arrival process is memoryless:
–Assume P[arrival within 1 second] at time t_0 = x
–Then P[arrival within 1 second] at time t_1 > t_0 is also x
»I.e., no memory that time has passed
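The memoryless property can be demonstrated numerically; this sketch (mine, with an arbitrary rate λ = 2) estimates P[arrival within 1 second] both from a fresh start and after having already waited 3 seconds:

```python
# Illustration (not from the slides) of the memoryless property:
# for exponential interarrival times, P[arrival within 1 s | already waited w]
# is the same for any w, namely 1 - e**(-lam).
import math, random

random.seed(0)
lam = 2.0                                        # arrival rate (per second)
samples = [random.expovariate(lam) for _ in range(1_000_000)]

def p_arrival_within_1s(already_waited):
    survivors = [x for x in samples if x > already_waited]
    hits = sum(1 for x in survivors if x <= already_waited + 1.0)
    return hits / len(survivors)

print(1 - math.exp(-lam))          # theoretical value, ~0.8647
print(p_arrival_within_1s(0.0))    # fresh start
print(p_arrival_within_1s(3.0))    # after waiting 3 s: ~same probability
```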

22 Kendall's Notation
A queueing system is notated A/S/s/c, where:
–A encodes the interarrival-time distribution
–S encodes the service-time distribution
»Both A and S can be M (Memoryless, Markov, or exponential), D (deterministic), E_r (r-stage Erlang), G (general), or others
–s is the number of servers
–c is the capacity of the queue, if not infinite
Examples:
–D/D/1 is arrivals on a clock tick, fixed service times, one server
–M/M/m is memoryless arrivals, memoryless service, multiple servers (a good model of a bank)
–M/M/m/m is the case where customers go away rather than wait in line
–G/G/1 is what a disk drive is really like (but mostly intractable to analyze)

23 M/M/1 Queuing Model
System is in equilibrium
Exponential interarrival and service times
Unlimited source of customers ("infinite population model")
FIFO queue
The book also derives M/M/m
Most important results:
–Let arrival rate = λ = 1/average interarrival time
–Let service rate = μ = 1/average service time
–Define utilization ρ = λ/μ
–Then average number in system = ρ/(1−ρ)
–And time in system = (1/μ)/(1−ρ)

24 Explosion of Load with Utilization
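The plot on this slide is not reproduced in the transcript; the short table below (my own addition, using the M/M/1 result ρ/(1−ρ) from the previous slide) shows the same blow-up as utilization approaches 1:

```python
# Mean number in an M/M/1 system, rho/(1-rho), for increasing utilization.
for rho in (0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:4.2f}  ->  mean tasks in system = {rho / (1 - rho):6.1f}")
```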

25 Example M/M/1 Analysis
Assume 40 disk I/Os per second
–Exponential interarrival time
–Exponential service time with mean 20 ms
⇒ λ = 40, Time_server = 1/μ = 0.02 sec
Server utilization ρ = Arrival rate × Time_server = λ/μ = 40 × 0.02 = 0.8 = 80%
Time_queue = Time_server × ρ/(1−ρ) = 20 ms × 0.8/(1−0.8) = 20 × 4 = 80 ms
Time_system = Time_queue + Time_server = 80 + 20 ms = 100 ms

26 How Much Better With 2X Faster Disk?
Average service time is now 10 ms
⇒ Arrival rate = 40/sec, Time_server = 0.01 sec
Now server utilization ρ = Arrival rate × Time_server = 40 × 0.01 = 0.4 = 40%
Time_queue = Time_server × ρ/(1−ρ) = 10 ms × 0.4/(1−0.4) = 10 × 2/3 = 6.7 ms
Time_system = Time_queue + Time_server = 6.7 + 10 ms = 16.7 ms
6X faster response time with 2X faster disk!
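Both disk calculations can be reproduced with a small helper built only from the formulas on the preceding slides (a sketch; the function name is mine):

```python
# M/M/1 response-time helper using the formulas from the preceding slides.
def mm1_times(arrival_rate, service_time):
    rho = arrival_rate * service_time            # server utilization
    t_queue = service_time * rho / (1 - rho)     # mean time waiting in queue
    return rho, t_queue, t_queue + service_time  # utilization, queue, system

print(mm1_times(40, 0.020))  # (0.8, ~0.080, ~0.100) -> 100 ms response time
print(mm1_times(40, 0.010))  # (0.4, ~0.0067, ~0.0167) -> 16.7 ms, ~6x better
```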

27 Value of Queueing Theory in Practice
Quick lesson:
–Don't try for 100% utilization
–But how far to back off?
Theory allows designers to:
–Estimate impact of faster hardware on utilization
–Find the knee of the response curve
–Thus find the impact of HW changes on response time
Works surprisingly well

28 Crosscutting Issues: Buses → Point-to-Point Links & Switches

Standard               Width   Length   Clock rate      MB/s   Max devices
(Parallel) ATA         8 b     0.5 m    133 MHz         133    2
Serial ATA             2 b     2 m      3 GHz           300    ?
(Parallel) SCSI        16 b    12 m     80 MHz (DDR)    320    15
Serial Attached SCSI   1 b     10 m     --              375    16,256
PCI                    32/64   0.5 m    33 / 66 MHz     533    ?
PCI Express            2 b     0.5 m    3 GHz           250    ?

Number of bits and BW are per direction ⇒ 2X for both directions (not shown).
Since the serial links use fewer wires, BW is commonly increased via versions with 2X-12X the number of wires and BW
–...but timing problems arise

29 Storage Example: Internet Archive
Goal of making a historical record of the Internet
–Internet Archive began in 1996
–Wayback Machine interface performs time travel to see what a web page looked like in the past
Contains over a petabyte (10^15 bytes)
–Growing by 20 terabytes (10^12 bytes) of new data per month
Besides storing the historical record, the same hardware crawls the Web to get new snapshots

30 Internet Archive Cluster
1U storage node: PetaBox GB2000 from Capricorn Technologies
–Has 4 × 500-GB Parallel ATA (PATA) drives, 512 MB of DDR266 DRAM, Gbit Ethernet, and a 1 GHz VIA C3 processor (80x86)
–Node dissipates ≈ 80 watts
40 GB2000s in a standard VME rack ⇒ ≈ 80 TB raw storage capacity
–40 nodes connected with a 48-port Ethernet switch
–Rack dissipates about 3 KW
1 petabyte = 12 racks
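The raw-capacity figures follow directly from the configuration above; a quick check (illustrative arithmetic only):

```python
# Raw capacity per the slide's configuration (illustrative arithmetic only).
disks_per_node, gb_per_disk, nodes_per_rack = 4, 500, 40
tb_per_rack = disks_per_node * gb_per_disk * nodes_per_rack / 1000
print(tb_per_rack)        # 80 TB raw per rack
print(12 * tb_per_rack)   # 960 TB, i.e. roughly 1 petabyte in 12 racks
```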

31 Estimated Cost
VIA processor, 512 MB of DDR266 DRAM, ATA disk controller, power supply, fans, and enclosure = $500
7200 RPM 500-GB PATA drive = $375 (in 2006)
48-port 10/100/1000 Ethernet switch and all cables for a rack = $3,000
Total cost: $84,500 for an 80-TB rack
–The 160 disks are ≈ 60% of the total

32 Estimated Performance
7200 RPM drive:
–Average seek time = 8.5 ms
–Transfer bandwidth 50 MB/second
–PATA link can handle 133 MB/second
–ATA controller overhead is 0.1 ms per I/O
VIA processor is 1000 MIPS
–OS needs 50K CPU instructions for a disk I/O
–Network stack uses 100K instructions per data block
Average I/O size:
–16 KB for archive fetches
–50 KB when crawling the Web
Disks are the limit:
–≈ 75 I/Os/s per disk, thus 300/s per node, 12,000/s per rack
–About 200-600 MB/sec of bandwidth per rack
Switch must handle 1.6-3.8 Gbit/s over its 40 Gbit/s of links
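A back-of-envelope check of the per-disk and per-rack figures (a sketch of mine; it adds average rotational latency, half a revolution at 7200 RPM, which the slide implies but does not list):

```python
# Per-disk random I/O rate: seek + average rotational latency + transfer +
# controller overhead, using the parameters from the slide.
def disk_iops(io_bytes, seek_ms=8.5, rpm=7200, xfer_mb_s=50, ctrl_ms=0.1):
    rotate_ms = 0.5 * 60_000 / rpm                  # ~4.17 ms average
    xfer_ms = io_bytes / (xfer_mb_s * 1e6) * 1000
    return 1000 / (seek_ms + rotate_ms + xfer_ms + ctrl_ms)

for size in (16 * 1024, 50 * 1024):                 # archive fetch vs. crawl
    iops = disk_iops(size)
    print(f"{size // 1024} KB: ~{iops:.0f} I/Os/s per disk, "
          f"~{iops * 4:.0f}/s per node, ~{iops * 160:.0f}/s per rack, "
          f"~{iops * 160 * size / 1e6:.0f} MB/s per rack")
# Output is ~75 I/Os/s per disk and ~200-600 MB/s per rack, as on the slide.
```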

33 Estimated Reliability
CPU/memory/enclosure MTTF is 1,000,000 hours (× 40)
Disk MTTF 125,000 hours (× 160)
PATA controller MTTF 500,000 hours (× 40)
PATA cable MTTF 1,000,000 hours (× 40)
Ethernet switch MTTF 500,000 hours (× 1)
Power supply MTTF 200,000 hours (× 40)
Fan MTTF 200,000 hours (× 40)
MTTF for the system works out to 531 hours (≈ 3 weeks)
70% of failures in time are disks
20% of failures in time are fans or power supplies
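Assuming independent failures, component failure rates simply add, and the system MTTF follows from the counts and MTTFs above. Using only the seven component types listed here the result comes out near 540 hours, slightly above the slide's 531 but still roughly three weeks; a sketch:

```python
# System MTTF from the component MTTFs and counts on the slide, assuming
# independent failures (failure rates add).
components = {                      # (MTTF in hours, count)
    "CPU/memory/enclosure": (1_000_000, 40),
    "disk":                 (  125_000, 160),
    "PATA controller":      (  500_000, 40),
    "PATA cable":           (1_000_000, 40),
    "Ethernet switch":      (  500_000, 1),
    "power supply":         (  200_000, 40),
    "fan":                  (  200_000, 40),
}
total_rate = sum(n / mttf for mttf, n in components.values())
print(1 / total_rate)               # ~540 hours with these inputs, ~3 weeks
disk_mttf, disk_count = components["disk"]
print(disk_count / disk_mttf / total_rate)   # ~0.70: share of failures from disks
```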

34 Summary
Little's Law: Length_system = Arrival rate × Time_system
(mean number of customers = arrival rate × mean time in system)
Appreciation for the relationship of latency and utilization:
Time_system = Time_server + Time_queue
Time_queue = Time_server × ρ/(1−ρ)
Clusters for storage as well as computation
RAID: Reliability matters, not performance

