Measuring End-User Availability on the Web: Practical Experience Matthew Merzbacher (visiting research scientist) Dan Patterson (undergraduate) Recovery-Oriented.


1 Measuring End-User Availability on the Web: Practical Experience
Matthew Merzbacher (visiting research scientist)
Dan Patterson (undergraduate)
Recovery-Oriented Computing (ROC)
University of California, Berkeley
http://roc.cs.berkeley.edu

2 E-Commerce Goal
Non-stop availability
  – 24 hours/day
  – 365 days/year
How realistic is this goal?
How do we measure availability?
  – To evaluate competing systems
  – To see how close we are to optimum

3 The State of the World
Uptime measured in “nines”
  – Four nines == 99.99% uptime (just under an hour downtime per year)
  – Does not include scheduled downtime
Manufacturers advertise six nines
  – About 30 s unscheduled downtime/year
  – May be true in a perfect world
  – Not true in practice on the real Internet
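The downtime figures quoted for "nines" follow from simple arithmetic on the length of a year. A minimal sketch, assuming a 365-day year and counting only unscheduled downtime (the function name is illustrative, not from the slides):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s in a non-leap year


def downtime_seconds_per_year(nines: int) -> float:
    """Downtime per year allowed by an availability of `nines` nines.

    Four nines means availability 0.9999, i.e. unavailability 10**-4.
    """
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


for n in (4, 6):
    secs = downtime_seconds_per_year(n)
    print(f"{n} nines: {secs:.1f} s/year ({secs / 60:.2f} min/year)")
```

Four nines works out to roughly 52.6 minutes per year, and six nines to roughly 31.5 seconds per year.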

4 Measuring Availability
Measuring “nines” of uptime is not sufficient
  – Reflects unrealistic operating conditions
Must capture the end user’s experience
  – Server + Network + Client
      Client machine and client software

5 Existing Systems
Topaz, Porivo, SiteAngel
  – Measure response time, not availability
  – Monitor service-level agreements
NetCraft
  – Measures availability, not performance or end-user experience
We measured end-user experience and located common problems

6 Experiment
“Hourly” small web transactions
  – From two relatively proximate sites (Mills CS, Berkeley CS)
  – To a variety of sites, including
      Internet retailer (US and international)
      Search engine
      Directory service (US and international)
Ran for 6+ months
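The slides do not show the measurement tooling, so the following is only a sketch of what one small web transaction probe might look like. The `ProbeResult` record, the `probe` function, and the 30-second timeout are all assumptions made for illustration; the `opener` parameter exists so the probe can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    started: float        # Unix timestamp when the attempt began
    ok: bool              # True if the transaction completed
    elapsed: float        # seconds spent on the attempt
    error: Optional[str]  # exception text for a failed attempt


def probe(url: str, timeout: float = 30.0,
          opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Fetch `url` once and record whether the transaction succeeded."""
    started = time.time()
    t0 = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # pull the whole body, as a user's browser would
        return ProbeResult(url, started, True, time.monotonic() - t0, None)
    except (urllib.error.URLError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return ProbeResult(url, started, False, time.monotonic() - t0, repr(exc))
```

A scheduler such as cron could call `probe` roughly once an hour per target and append each `ProbeResult` to a log for later classification into local, network, server, and corporate errors.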

7 Availability: Did the Transaction Succeed?
                                                  All     Retailer  Search  Directory
Raw (overall)                                     .9305   .9311     .9355   .9267
Ignoring local problems                           .9888   .9887     .9935   .9857
Ignoring local and network problems               .9991   .9976     1.00    .9997
Ignoring local, network, and transient problems   .9994   .9984     1.00    .9999
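One way to read the rows of this table is as the same success ratio recomputed after discarding progressively more failure classes. The sketch below assumes that an ignored failure is dropped from both the numerator and the denominator; the slides do not state the exact rule, and the sample counts are invented purely for illustration.

```python
from collections import Counter


def availability(outcomes, ignore=()):
    """Fraction of successful transactions, dropping ignored failure types.

    `outcomes` is an iterable of labels such as "ok", "local", "network".
    """
    counts = Counter(outcomes)
    considered = sum(n for kind, n in counts.items() if kind not in ignore)
    if considered == 0:
        raise ValueError("no transactions left after filtering")
    return counts["ok"] / considered


# Made-up log of 1000 attempts, not the experiment's data.
sample = (["ok"] * 930 + ["local"] * 58 + ["network"] * 10
          + ["transient"] * 1 + ["server"] * 1)

print(f"raw:                     {availability(sample):.4f}")
print(f"ignoring local:          {availability(sample, ignore={'local'}):.4f}")
print(f"ignoring local+network:  "
      f"{availability(sample, ignore={'local', 'network'}):.4f}")
```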

8 Types of Errors
Local (82%)
Network:
  – Medium (11%)
  – Severe (4%)
Server (2%)
Corporate (1%)

9 Client Hardware Problems Dominate User Experience
System-wide crashes
Administration errors
Power outages
And many, many more…
  – Many, if not most, caused or aggravated by human error

10 What About Speed?

11 Does Retry Help?
Error Type       All     Retailer  Search  Directory
Client           0.267, 0.271, 0.265 (only three of the four values survive in the transcript)
Medium Network   0.862   0.870     0.929   0.838
Severe Network   0.789   0.923     1.00    0.689
Server           0.911   0.786     1.00    0.96
Corporate        0.421   0.312     1.00    n/a
Green > 80%, Red < 50%
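The values in this table appear to be per-error-type fractions of failed transactions that succeeded when retried; that reading is an assumption, since the slide does not define them. Under it, the computation could be sketched as follows, with a hypothetical event format of `(error_type, retry_succeeded)` pairs and invented sample data.

```python
from collections import defaultdict


def retry_success_rate(events):
    """Map each error type to the fraction of its failures fixed by a retry."""
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for error_type, retry_ok in events:
        totals[error_type] += 1
        if retry_ok:
            recovered[error_type] += 1
    return {kind: recovered[kind] / totals[kind] for kind in totals}


# Made-up events, not the experiment's data.
events = [
    ("client", False), ("client", False), ("client", True), ("client", False),
    ("medium network", True), ("medium network", True),
    ("server", True), ("server", False),
]

for kind, rate in retry_success_rate(events).items():
    print(f"{kind:15s} {rate:.3f}")
```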

12 What Guides Retry?
Uniqueness of data
Importance of data to user
Loyalty of user to site
Transience of information
And more…

13 Conclusion
Experiment modeled user experience
Vast majority (81%) of errors were on the local end
Almost all errors were in the “last mile” of service
Retry doesn’t help for local errors
  – User may be aware of the problem and therefore less frustrated by it

