Slide 1 Breaking databases for fun and publications: availability benchmarks Aaron Brown UC Berkeley ROC Group HPTS 2001.


Slide 1 Breaking databases for fun and publications: availability benchmarks
Aaron Brown, UC Berkeley ROC Group, HPTS 2001

Slide 2 Motivation
Drinking the availability Kool-Aid
  – availability is the key metric for modern apps
Database stack’s availability is especially important
  – guardians of the world’s hard state
  – almost any user’s request for electronic information hits a database stack
    » web services, directories, enterprise apps, ...
Can we trust database software stacks in the face of failure?

Slide 3 Availability benchmarking 101
Availability benchmarks quantify system behavior under failures, maintenance, recovery
They require
  – a realistic workload for the system: TPC-C
  – quality of service metrics: txn rates, OK and aborted
  – fault injection to simulate failures: single-disk errors
[Figure: QoS over time. Annotations: normal behavior (99% conf.), failure, QoS degradation, Repair Time]
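The figure on this slide suggests a simple analysis: build a "normal behavior" band from fault-free throughput samples, then measure how long QoS stays outside that band after a fault is injected. The sketch below is one way this could be done, not the author's actual tooling. It assumes the 99% band is the mean plus or minus 2.576 standard deviations of a baseline run (a normal approximation), and that recovery means `settle` consecutive in-band samples. The function names, the `settle` parameter, and the sample data are all hypothetical.

```python
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% normal quantile (assumed way to build the band)


def normal_band(baseline, z=Z_99):
    """Return (low, high) throughput bounds estimated from a fault-free run."""
    m = mean(baseline)
    s = stdev(baseline)
    return m - z * s, m + z * s


def degraded_interval(samples, inject_idx, band, settle=3):
    """Locate the QoS degradation that follows a fault injection.

    Returns None if throughput never leaves the band, otherwise (start, end):
    start is the first out-of-band sample at or after inject_idx, and end is
    the first later index that begins `settle` consecutive in-band samples
    (None if the run ends before the system recovers).
    """
    low, high = band
    in_band = [low <= x <= high for x in samples]
    start = next(
        (i for i in range(inject_idx, len(samples)) if not in_band[i]), None
    )
    if start is None:
        return None
    for i in range(start + 1, len(samples) - settle + 1):
        if all(in_band[i:i + settle]):
            return start, i
    return start, None


# Hypothetical per-minute throughput of correct transactions.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
faulty_run = [100, 99, 101, 40, 35, 60, 80, 96, 99, 101, 100]

band = normal_band(baseline)
interval = degraded_interval(faulty_run, inject_idx=3, band=band)
start, end = interval
repair_time = end - start  # in sample intervals (minutes here)
```

With one sample per minute, `repair_time` is the number of minutes between the first out-of-band sample and the start of sustained in-band behavior.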

Slide 4 Well, what happens?
Setup
  – 3-tier: Microsoft SQLServer/COM+/IIS & bus. logic
  – TPC-C-like workload; faults injected into DB data & log
Results
  – DBMS tolerates transient and recoverable failures, reflecting errors back via transaction aborts
  – middleware highly unstable: degrades or crashes when DBMS fails or undergoes lengthy recovery
[Figure: two example traces. Panels: Disk hang during write to data disk; Sticky uncorrectable write error, log disk. Annotations: middleware causes degraded performance; database recovers; database fails, middleware degrades; middleware crashes]

Slide 5 Summary
Database is pretty resilient
  – transaction abort == good error-reflection mechanism
Middleware/applications suck (well, at least this instance of them)
Robustness is end-to-end
  – user cannot distinguish DBMS and middleware failures
  – failure recovery must go beyond the DBMS
Achievable Grand Challenges?
  – build and run availability benchmarks on your systems
  – tolerate and recover from non-failstop system-level faults
Does performance matter?

Slide 6 Backup slides

Slide 7 Experimental setup
Database
  – Microsoft SQL Server 2000, default configuration
Middleware/front-end software
  – Microsoft COM+ transaction monitor/coordinator
  – IIS 5.0 web server with Microsoft’s tpcc.dll HTML terminal interface and business logic
  – Microsoft BenchCraft remote terminal emulator
TPC-C-like OLTP order-entry workload
  – 10 warehouses, 100 active users, ~860 MB database
Measured metrics
  – throughput of correct NewOrder transactions/min
  – rate of aborted NewOrder transactions (txn/min)
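The two measured metrics can be derived from a log of transaction outcomes. The sketch below is only an illustration of that bookkeeping. The record layout `(timestamp_seconds, txn_type, outcome)` and the outcome strings `"ok"` and `"aborted"` are assumptions made for this example, not the format produced by BenchCraft or tpcc.dll.

```python
from collections import Counter


def per_minute_rates(records):
    """Count correct and aborted NewOrder transactions per minute.

    `records` is an iterable of (timestamp_seconds, txn_type, outcome)
    tuples; this layout is hypothetical. Returns a list of
    (minute, ok_count, aborted_count) covering minute 0 through the last
    minute that saw a NewOrder transaction, including idle minutes.
    """
    ok = Counter()
    aborted = Counter()
    for timestamp, txn_type, outcome in records:
        if txn_type != "NewOrder":
            continue  # only NewOrder transactions are measured
        minute = int(timestamp // 60)
        if outcome == "ok":
            ok[minute] += 1
        elif outcome == "aborted":
            aborted[minute] += 1
    last_minute = max(list(ok) + list(aborted), default=-1)
    return [(m, ok[m], aborted[m]) for m in range(last_minute + 1)]


# Hypothetical log excerpt.
log = [
    (5, "NewOrder", "ok"),
    (30, "NewOrder", "ok"),
    (59.9, "NewOrder", "aborted"),
    (61, "Payment", "ok"),
    (130, "NewOrder", "ok"),
]
rates = per_minute_rates(log)
```

Reporting idle minutes as zeros matters here: a hung DBMS shows up as a run of `(minute, 0, 0)` entries rather than as a gap in the series.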

Slide 8 Experimental setup (2)
Database installed in one of two configurations:
  – data on emulated disk, log on real (IBM) disk
  – data on real (IBM) disk, log on emulated disk
[Figure: testbed diagram. Labels: IBM 18 GB 10k RPM; DB Server; IDE system disk; Fast/Wide SCSI bus, 20 MB/sec (legend entry); Adaptec 3940; Emulated Disk; DB data/log disks; Front End; SCSI system disk; 100mb Ethernet; IBM 18 GB 10k RPM; SCSI system disk; Disk Emulator; Intel P-II/300, 128 MB DRAM, Windows NT 4.0; Adaptec 2940; emulator backing disk (NTFS); AdvStor ASC-U2W UltraSCSI; ASC VirtualSCSI lib.; Intel P-III/450, 256 MB DRAM, Windows 2000 AS; MS BenchCraft RTE; IIS + MS tpcc.dll; MS COM+; AMD K6-2/333, 128 MB DRAM, Windows 2000 AS; SQL Server 2000]

Slide 9 Results
All results are from single-fault micro-benchmarks
14 different fault types
  – injected once for each of data and log partitions
4 categories of behavior detected
  1) normal
  2) transient glitch
  3) degraded
  4) failed

Slide 10 Type 1: normal behavior
System tolerates fault
Demonstrated for all sector-level faults except:
  – sticky uncorrectable read, data partition
  – sticky uncorrectable write, log partition

Slide 11 Type 2: transient glitch
One transaction is affected, aborts with error
Subsequent transactions using same data would fail
Demonstrated for one fault only:
  – sticky uncorrectable read, data partition

Slide 12 Type 3: degraded behavior
DBMS survives error after running log recovery
Middleware partially fails, results in degraded perf.
Demonstrated for one fault only:
  – sticky uncorrectable write, log partition

Slide 13 Type 4: failure
DBMS hangs or aborts all transactions
Middleware behaves erratically, sometimes crashing
Demonstrated for all fatal disk-level faults
  – SCSI hangs, disk power failures
[Figure: example behaviors (10 distinct variants observed). Panels: Disk hang during write to data disk; Simulated log disk power failure]
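The four behavior types could be assigned mechanically from a post-fault trace. The sketch below is a rough illustration under stated assumptions, not the classification procedure used in the talk. It assumes per-minute counts of correct and aborted transactions, a lower bound `low` for normal throughput, and that the last `tail` samples show where the system settled. The function name, parameters, and rules are all hypothetical.

```python
def classify_run(ok_rates, abort_counts, low, tail=3):
    """Map a post-fault trace to one of the four behavior types.

    Assumed rules (illustrative only):
      failed           -> no correct transactions in the last `tail` samples
      degraded         -> last `tail` samples all below the normal lower bound
      transient glitch -> throughput settles in the normal range, some aborts
      normal           -> throughput settles in the normal range, no aborts
    """
    tail_rates = ok_rates[-tail:]
    if all(rate == 0 for rate in tail_rates):
        return "failed"
    if all(rate < low for rate in tail_rates):
        return "degraded"
    if sum(abort_counts) > 0:
        return "transient glitch"
    return "normal"


# Hypothetical traces; `low` is the lower edge of the normal-behavior band.
low = 94.8
examples = {
    "tolerated": classify_run([100, 99, 101, 100], [0, 0, 0, 0], low),
    "one abort": classify_run([100, 99, 101, 100], [0, 1, 0, 0], low),
    "slow": classify_run([100, 60, 62, 58, 61], [0, 5, 0, 0, 0], low),
    "hung": classify_run([100, 40, 0, 0, 0], [0, 50, 60, 0, 0], low),
}
```

A real workload may abort a small fraction of transactions even without faults, so the zero-abort test for "normal" would likely need a baseline abort rate in practice.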

