Slide 1 Breaking databases for fun and publications: availability benchmarks Aaron Brown UC Berkeley ROC Group HPTS 2001.

Presentation transcript:

Slide 1 Breaking databases for fun and publications: availability benchmarks Aaron Brown UC Berkeley ROC Group HPTS 2001

Slide 2 Motivation
Drinking the availability Kool-Aid
 – availability is the key metric for modern apps
Database stack’s availability is especially important
 – guardians of the world’s hard state
 – almost any user’s request for electronic information hits a database stack
   » web services, directories, enterprise apps, ...
Can we trust database software stacks in the face of failure?

Slide 3 Availability benchmarking 101
Availability benchmarks quantify system behavior under failures, maintenance, recovery
They require
 – a realistic workload for the system: TPC-C
 – quality of service metrics: txn rates, OK and aborted
 – fault-injection to simulate failures: single-disk errors
[Figure labels: failure, QoS degradation, Repair Time, normal behavior (99% conf.)]
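The "normal behavior (99% conf.)" band and the "Repair Time" label on this slide can be illustrated with a small sketch. The slides do not say how either is computed, so everything here is an assumption: QoS is sampled at fixed intervals, the band is a normal approximation (mean ± 2.576 standard deviations) over fault-free samples, and repair time is counted from the injection point to the last out-of-band sample. The function names are hypothetical.

```python
import statistics

Z_99 = 2.576  # two-sided 99% quantile of the standard normal distribution


def normal_band(fault_free_samples):
    """Return (low, high) bounds of the assumed 99% normal-behavior band."""
    mean = statistics.mean(fault_free_samples)
    sd = statistics.stdev(fault_free_samples)
    return mean - Z_99 * sd, mean + Z_99 * sd


def repair_time(qos, inject_idx, band):
    """Intervals from fault injection until QoS is back in the band for good.

    Returns 0 if QoS never leaves the band, and None if the run ends
    while QoS is still outside the band (no recovery observed).
    """
    low, high = band
    last_bad = None
    for i in range(inject_idx, len(qos)):
        if not (low <= qos[i] <= high):
            last_bad = i
    if last_bad is None:
        return 0
    if last_bad == len(qos) - 1:
        return None
    return last_bad - inject_idx + 1
```

For example, with a band built from fault-free samples near 100 txn/min, a run whose throughput is out of band for three consecutive intervals after injection and then returns has a repair time of 3 intervals under this definition.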

Slide 4 Well, what happens?
Setup
 – 3-tier: Microsoft SQL Server/COM+/IIS & business logic
 – TPC-C-like workload; faults injected into DB data & log
Results
 – DBMS tolerates transient and recoverable failures, reflecting errors back via transaction aborts
 – middleware highly unstable: degrades or crashes when DBMS fails or undergoes lengthy recovery
[Figure panels: "Disk hang during write to data disk"; "sticky uncorrectable write error, log disk". Annotations: middleware causes degraded performance; database recovers; database fails, middleware degrades; middleware crashes]

Slide 5 Summary
Database is pretty resilient
 – transaction abort == good error-reflection mechanism
Middleware/applications suck (well, at least this instance of them)
Robustness is end-to-end
 – user cannot distinguish DBMS and middleware failures
 – failure recovery must go beyond the DBMS
Achievable Grand Challenges?
 – build and run availability benchmarks on your systems
 – tolerate and recover from non-failstop system-level faults
Does performance matter?

Slide 6 Backup slides

Slide 7 Experimental setup
Database
 – Microsoft SQL Server 2000, default configuration
Middleware/front-end software
 – Microsoft COM+ transaction monitor/coordinator
 – IIS 5.0 web server with Microsoft’s tpcc.dll HTML terminal interface and business logic
 – Microsoft BenchCraft remote terminal emulator
TPC-C-like OLTP order-entry workload
 – 10 warehouses, 100 active users, ~860 MB database
Measured metrics
 – throughput of correct NewOrder transactions/min
 – rate of aborted NewOrder transactions (txn/min)
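The two measured metrics could be derived from a transaction log roughly as follows. This is only a sketch: the record layout `(timestamp_seconds, txn_type, status)` and the status strings "ok" and "aborted" are assumptions made for illustration, not the format produced by BenchCraft or tpcc.dll.

```python
from collections import Counter


def per_minute_rates(records):
    """Bucket NewOrder transactions into one-minute bins.

    records: iterable of (timestamp_seconds, txn_type, status) tuples,
    a hypothetical log format. Returns a list of
    (minute, ok_count, aborted_count), including minutes with no activity.
    """
    ok = Counter()
    aborted = Counter()
    for ts, txn_type, status in records:
        if txn_type != "NewOrder":
            continue  # only NewOrder transactions count toward the metrics
        minute = int(ts // 60)
        if status == "ok":
            ok[minute] += 1
        elif status == "aborted":
            aborted[minute] += 1
    if not ok and not aborted:
        return []
    last = max(list(ok) + list(aborted))
    # Counter returns 0 for missing keys, so idle minutes show up as zeros.
    return [(m, ok[m], aborted[m]) for m in range(last + 1)]
```

Reporting idle minutes as explicit zeros matters here, because a hung DBMS shows up as a run of zero-throughput intervals rather than as missing data.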

Slide 8 Experimental setup (2)
Database installed in one of two configurations:
 – data on emulated disk, log on real (IBM) disk
 – data on real (IBM) disk, log on emulated disk
[Diagram labels]
Front End: Intel P-III/ MB DRAM; Windows 2000 AS; MS BenchCraft RTE; IIS + MS tpcc.dll; MS COM+; SCSI system disk
DB Server: AMD K6-2/ MB DRAM; Windows 2000 AS; SQL Server 2000; IDE system disk; Adaptec 3940; DB data/log disks (IBM 18 GB 10k RPM, Emulated Disk)
Disk Emulator: Intel P-II/ MB DRAM; Windows NT 4.0; ASC VirtualSCSI lib.; Adaptec 2940; AdvStor ASC-U2W UltraSCSI; emulator backing disk (NTFS); IBM 18 GB 10k RPM SCSI system disk
Interconnect: 100mb Ethernet; Fast/Wide SCSI bus, 20 MB/sec

Slide 9 Results
All results are from single-fault micro-benchmarks
14 different fault types
 – injected once for each of data and log partitions
4 categories of behavior detected
 1) normal
 2) transient glitch
 3) degraded
 4) failed
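The four categories could be assigned mechanically from the per-minute metrics, along these lines. The decision rules below are assumptions chosen to mirror the slide descriptions (the talk does not state how runs were categorized), and `classify_run` and `band_low` are hypothetical names.

```python
def classify_run(ok_per_min, aborted_per_min, inject_idx, band_low):
    """Assign one of the four behavior categories to a single-fault run.

    ok_per_min / aborted_per_min: per-minute NewOrder counts for the run.
    inject_idx: index of the minute in which the fault was injected.
    band_low: lower bound of the fault-free "normal behavior" band.
    All thresholds are illustrative assumptions.
    """
    post_ok = ok_per_min[inject_idx:]
    post_aborted = aborted_per_min[inject_idx:]
    if post_ok and post_ok[-1] == 0:
        return "failed"            # no correct transactions by the end of the run
    if any(v < band_low for v in post_ok):
        return "degraded"          # still running, but below normal throughput
    if sum(post_aborted) > 0:
        return "transient glitch"  # throughput normal, but some aborts reflected
    return "normal"                # fault tolerated with no visible effect
```

The checks run from most to least severe, so a run that both aborts transactions and ends at zero throughput is reported as failed rather than as a glitch.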

Slide 10 Type 1: normal behavior
System tolerates fault
Demonstrated for all sector-level faults except:
 – sticky uncorrectable read, data partition
 – sticky uncorrectable write, log partition

Slide 11 Type 2: transient glitch
One transaction is affected, aborts with error
Subsequent transactions using same data would fail
Demonstrated for one fault only:
 – sticky uncorrectable read, data partition
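Why a sticky fault makes subsequent transactions on the same data fail, while a one-shot fault would not, can be shown with a toy in-memory disk. This is a hypothetical sketch, not the ASC VirtualSCSI-based emulator used in the experiments; the class, its methods, and the `sticky` flag are all invented for illustration.

```python
class EmulatedDisk:
    """Toy in-memory disk with injectable per-sector faults (hypothetical API)."""

    def __init__(self, num_sectors, sector_size=512):
        self.sectors = [bytes(sector_size) for _ in range(num_sectors)]
        self.faults = {}  # (sector, op) -> True if sticky, False if one-shot

    def inject(self, sector, op, sticky):
        """Arm an uncorrectable error for op ("read" or "write") on a sector."""
        self.faults[(sector, op)] = sticky

    def _maybe_fail(self, sector, op):
        sticky = self.faults.get((sector, op))
        if sticky is None:
            return
        if not sticky:
            del self.faults[(sector, op)]  # one-shot fault clears after firing
        raise OSError(f"uncorrectable {op} error at sector {sector}")

    def read(self, sector):
        self._maybe_fail(sector, "read")
        return self.sectors[sector]

    def write(self, sector, data):
        self._maybe_fail(sector, "write")
        self.sectors[sector] = bytes(data)
```

With a sticky read fault armed on a sector, every later read of that sector raises the same error, which matches the slide's note that any transaction touching the same data would also abort.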

Slide 12 Type 3: degraded behavior
DBMS survives error after running log recovery
Middleware partially fails, results in degraded perf.
Demonstrated for one fault only:
 – sticky uncorrectable write, log partition

Slide 13 Type 4: failure
DBMS hangs or aborts all transactions
Middleware behaves erratically, sometimes crashing
Demonstrated for all fatal disk-level faults
 – SCSI hangs, disk power failures
Example behaviors (10 distinct variants observed)
[Figure panels: "Disk hang during write to data disk"; "Simulated log disk power failure"]