Published by Xavier Fernando Jiménez Redondo. Modified over 6 years ago.
Latency as a Performability Metric: Experimental Results
Pete Broadwell, UC Berkeley

Motivation
- Goal of the ROC project: develop metrics to evaluate new recovery techniques.
- Assertion: latency and data quality describe the user experience during a failure better than throughput does [2].
- Question: what are the best ways to represent latency in performability reports?

Abstract
This study uses experimental results obtained with the PRESS web server and the Mendosus fault injection system to consider the best ways to present latency-based measurements of the behavior of online services during failures.

Test setup
Fault injection: Mendosus injects faults into different versions of PRESS:
- Perf-PRESS: the basic version
- HA-PRESS: a version built for high availability

[Figure: Mendosus fault injection setup. Components shown: workstations (real or VMs) running the apps; a global controller (Java); a user-level daemon (Java); a modified NIC driver, a SCSI module and a proc module; an apps config file, a LAN emulation config file and a fault config file. Injected faults: app hang, app crash, node crash, node freeze, link down.]

Results
Throughput-based availability:
$$\text{availability} = \frac{\text{responses served}}{\text{total requests}}$$
Latency-based “punctuality”:
$$\text{punctuality} = \frac{\text{ideal latency}}{\text{actual total latency}}$$
Latency and throughput can be combined into “demerits”. Demerit weights:
- Aborted connection: 2
- Connection error: 1
- User timeout: 8
- Each second of total latency above ideal: 1 × scaling factor
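The three Results metrics reduce to simple arithmetic over per-run counters. What follows is a minimal Python sketch, not the study's actual tooling, and it rests on several assumptions:
- The `RunStats` fields and all function names are hypothetical.
- "Ideal latency" is taken to mean the total latency the same requests would have incurred with no fault.
- The poster does not give a value for the scaling factor, so it is left as a parameter with a default of 1.0.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-run counters from one fault-injection experiment."""
    total_requests: int
    responses_served: int
    aborted_connections: int
    connection_errors: int
    user_timeouts: int
    ideal_latency_s: float   # assumed: total latency (s) with no fault injected
    actual_latency_s: float  # total latency (s) actually observed


def availability(s: RunStats) -> float:
    """Throughput-based availability: responses served / total requests."""
    return s.responses_served / s.total_requests


def punctuality(s: RunStats) -> float:
    """Latency-based punctuality: ideal latency / actual total latency."""
    return s.ideal_latency_s / s.actual_latency_s


def demerits(s: RunStats, scaling_factor: float = 1.0) -> float:
    """Weighted demerit total using the poster's weights (lower is better).

    scaling_factor is an assumed free parameter; the poster gives no value.
    """
    # Only latency above the ideal is penalized; faster-than-ideal counts as 0.
    excess_latency_s = max(0.0, s.actual_latency_s - s.ideal_latency_s)
    return (2 * s.aborted_connections
            + 1 * s.connection_errors
            + 8 * s.user_timeouts
            + 1 * excess_latency_s * scaling_factor)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the study.
    run = RunStats(total_requests=1000, responses_served=950,
                   aborted_connections=10, connection_errors=20,
                   user_timeouts=5, ideal_latency_s=50.0,
                   actual_latency_s=80.0)
    print(f"availability = {availability(run):.3f}")
    print(f"punctuality  = {punctuality(run):.3f}")
    print(f"demerits     = {demerits(run):.1f}")
```

The example run uses invented numbers, not data from the study. It yields an availability of 0.950, a punctuality of 0.625 and a demerit total of 110.0 (20 + 20 + 40 + 30). Clamping the excess latency at zero is a choice made in this sketch: it keeps a faster-than-ideal run from earning negative demerits.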
What is “performability”?
Performability is a class of metrics that describe how failures influence the performance of a system [1]. Performability metrics for Internet services:
- Throughput: requests/sec
- Latency: response time
- Data quality: harvest (response completeness) and yield (% of queries answered)

Fault test cases
Category      Fault                  Possible root cause
Node          Node crash             Operator error, OS bug, hardware component failure, power outage
Node          Node freeze            OS or kernel module bug
Application   App crash              Application bug or resource unavailability
Application   App hang               Application bug or resource contention with other processes
Network       Link down or flaky     Broken, damaged or misattached cable
Network       Switch down or flaky   Damaged or misconfigured switch, power outage

[Figure: Sample result: server crash, Perf-PRESS vs. HA-PRESS.]

[Figure: Recovery stages: performance over time through the failure and recovery stages.]

References
1. J. F. Meyer, "Performability Evaluation: Where It Is and What Lies Ahead," 1994.
2. Zona Research and Keynote Systems, "The Need for Speed II," 2001.