An Empirical Study of OS Errors
Chou, Yang, Chelf, Hallem, and Engler, SOSP 2001
Characterizing a workload w.r.t. reliability

Workloads
[Figure: taxonomy slide mapping experimental environments (prototype, real system, execution-driven sim, trace-driven sim, stochastic sim) to workload types (live workload, benchmark applications, micro-benchmark programs, synthetic benchmark programs, traces via a monitor, distributions & other statistics via analysis, synthetic/made-up traces via a generator). This study's position: a real data set, the Linux source, examined with compiler analysis. © 2006, Carla Ellis]

Method: Checkers
- Evolution: 21 snapshots of Linux spanning 7 years
- Structure: 7 main subdirectories
- Over 1000 unique errors detected
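For flavor, here is a deliberately toy, text-only stand-in for one of the paper's checkers (the Null checker). The real checkers are xgcc compiler extensions with flow analysis; this sketch, including the sample C source, is invented for illustration.

```python
import re

# Toy Null-checker heuristic: flag a pointer assigned from kmalloc()
# that is dereferenced on the very next line without a NULL test.
# The sample C code below is invented for the example.
c_source = """\
buf = kmalloc(size, GFP_KERNEL);
buf->len = size;
ptr = kmalloc(n, GFP_KERNEL);
if (!ptr)
    return -ENOMEM;
"""

lines = c_source.splitlines()
for i, line in enumerate(lines[:-1]):
    m = re.match(r"\s*(\w+) = kmalloc\(", line)
    if m and f"{m.group(1)}->" in lines[i + 1]:
        print(f"line {i + 2}: '{m.group(1)}' dereferenced without a NULL check")
```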

Metrics
- Inspected errors: manually reviewed, and propagated back through earlier versions
- Projected errors: reported automatically by checkers with low false-positive rates
- Notes: the number of times a check was applied
- Relative error rate: errors / notes
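To make the last metric concrete, a minimal sketch with hypothetical counts (not the paper's data): dividing by notes normalizes for how often a check could fire, so large subdirectories are not penalized just for containing more code.

```python
# Hypothetical counts, invented for illustration; not the paper's data.
notes = {"drivers": 285_000, "fs": 82_000, "net": 68_000}   # checks applied
errors = {"drivers": 1_100, "fs": 120, "net": 95}           # errors flagged

# Relative error rate = errors / notes.
for subdir in notes:
    rate = errors[subdir] / notes[subdir]
    print(f"{subdir:8s} relative error rate = {rate:.2%}")
```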

Caveats
- Compiler analysis: is the targeted set of bugs representative of all bugs?
- All bugs are treated equally, rather than weighting important bugs more heavily
- Narrow focus; the claim is that truly bad code is unlikely not to expose at least some of the errors they look for
- The checks target low-level bookkeeping operations

Size of Subdirectories

Projected Bug Counts

Where are the errors?

Error-rate by function size

Log series distribution
[Plot: o = data points, x = fitted log series distribution, θ = 0.567]
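A quick sketch of what that fit says, using SciPy's logarithmic-series distribution with the slide's θ; reading k as the number of errors in an error-containing file is this note's interpretation.

```python
import numpy as np
from scipy.stats import logser

theta = 0.567  # fitted parameter from the slide

# Log series pmf: P(k) = -theta**k / (k * ln(1 - theta)), k = 1, 2, ...
# Mass concentrates at k = 1 with a long tail: most files with errors
# hold one error, but a few hold many.
for k in range(1, 6):
    print(f"P(K = {k}) = {logser.pmf(k, theta):.3f}")

# Sanity check against the closed form.
k = 3
assert np.isclose(logser.pmf(k, theta), -theta**k / (k * np.log(1 - theta)))
```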

Bug Lifetimes

Error birth for 2.4.1

Birth & Death
- Only the Block, Null, and Var checkers (the low false-positive checkers)
- Bottom graph: shifted so that the peaks line up
- Lifetimes are computed mostly from odd-numbered (development) releases

Kaplan-Meier Estimates of Lifetime
- The method handles censoring (truncated observations): a bug still present in the last snapshot is known only to survive at least that long
- Issues included snapshot granularity, and interference from errors being found (and fixed) as a result of the authors' previous work
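A minimal hand-rolled Kaplan-Meier sketch; the lifetimes are invented for the example (a real analysis would use a library such as lifelines):

```python
# Each observation is (time_in_years, observed):
#   observed=True  -> the bug died (was fixed) at `time`
#   observed=False -> right-censored: still alive when observation ended
# Lifetimes below are invented for illustration, not the paper's data.
bugs = [(0.5, True), (1.0, True), (1.8, False), (2.5, True), (3.0, False)]

def kaplan_meier(observations):
    """Return (time, S(time)) pairs, where S drops by the factor
    (1 - 1/n) at each death time, with n the number still at risk."""
    survival, curve = 1.0, []
    n_at_risk = len(observations)
    for time, observed in sorted(observations):
        if observed:                      # a death: the estimate steps down
            survival *= 1 - 1 / n_at_risk
            curve.append((time, survival))
        n_at_risk -= 1                    # dead or censored: leaves risk set
    return curve

for t, s in kaplan_meier(bugs):
    print(f"S({t}) = {s:.3f}")
```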

Do bugs cluster?
- Expectation: errors should be a stable fraction of notes, but the data is spiky
- Graph A: 80% of the errors are accounted for by 50% of the files that contain errors
- Graphs B & C: random experiments for comparison
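A sketch of the random-baseline idea behind graphs B & C, with hypothetical counts: scatter errors uniformly over files, then ask what fraction of error-containing files it takes to cover 80% of the errors; under randomness that fraction tends to sit well above the observed 50%.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, invented for illustration.
F, E = 1000, 1500                        # files, errors
files = rng.integers(0, F, size=E)       # each error lands in a random file
counts = np.bincount(files, minlength=F)
counts = np.sort(counts[counts > 0])[::-1]   # error files, largest first

cum = np.cumsum(counts)
k = np.searchsorted(cum, 0.8 * E) + 1    # files needed to cover 80% of errors
print(f"{k / len(counts):.0%} of error files hold 80% of the errors")
```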

Global cluster metric
- c_theoretical is derived from the fitted log series distribution
- c > 1 means more clustering than random
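The slide does not give the formula for c, so the sketch below is only a dispersion-style stand-in (observed variance of errors per file over the variance of the fitted log series), not necessarily the paper's exact metric; the counts are hypothetical.

```python
import numpy as np
from scipy.stats import logser

theta = 0.567
# Hypothetical errors-per-file counts for files with >= 1 error.
errors_per_file = np.array([1, 1, 1, 1, 2, 1, 3, 1, 1, 7, 1, 2, 1, 1, 4])

# Stand-in clustering ratio: observed dispersion vs. the fitted model's.
c = errors_per_file.var() / logser.var(theta)
print(f"c = {c:.2f} -> {'more' if c > 1 else 'no more'} clustered than the model")
```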

Intuitively, why clusters?
- Widespread ignorance of the system rules
- Poor programming concentrated in one place
- Cut-and-paste errors
- Less-executed code is less well tested

Summary
- Driver code is error-prone
- Error distributions seem to fit a log series distribution
- The average bug lifetime is about 1.8 years
- Clustering exists

For next Tuesday
- Chapter 10
- Assignment on data presentation. More examples are better, but I'd rather have one exceptionally bad example than a survey of garden-variety plots.
- Of potential interest: BugBench, a benchmark suite of known buggy programs