An Empirical Study of OS Errors Chou, Yang, Chelf, Hallem, and Engler SOSP 2001 Characterizing a workload w.r.t reliability.

1 An Empirical Study of OS Errors Chou, Yang, Chelf, Hallem, and Engler SOSP 2001 Characterizing a workload w.r.t reliability.

2 Workloads [diagram] Experimental environments: prototype / real system, execution-driven simulation, trace-driven simulation, stochastic simulation. Workload sources: live workload, benchmark applications, micro-benchmark programs, synthetic benchmark programs, traces, synthetic traces, distributions & other statistics, made-up, data sets. Tools: monitor, analysis, generator. This study: Linux, compiler analysis. © 2006, Carla Ellis

3 Method: Checkers. Evolution: 21 snapshots of Linux over 7 years. Structure: 7 main subdirectories. Over 1000 unique errors detected.

4 Metrics: Inspected errors – manually reviewed and propagated back through versions. Projected errors – automatically found by low false positive checkers. Notes – number of times a check was applied. Relative error rate – errors/notes.
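The relative error rate defined on this slide is a simple ratio, and a minimal sketch may make the normalization concrete. The function name and the per-directory counts below are hypothetical, chosen only for illustration; they are not figures from the study.

```python
def relative_error_rate(errors: int, notes: int) -> float:
    """Errors divided by notes (the number of times a check was applied)."""
    if notes <= 0:
        raise ValueError("notes must be positive")
    return errors / notes


# Hypothetical (errors, notes) pairs per subdirectory, for illustration only.
counts = {"drivers": (30, 1000), "fs": (5, 500)}
rates = {name: relative_error_rate(e, n) for name, (e, n) in counts.items()}
```

Dividing by notes rather than by lines of code means a directory is only compared on the places where a check could actually have fired.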

5 Caveats: Compiler analysis – is the targeted set of errors representative of all bugs? All bugs are treated equally, with no extra weight for important bugs. Narrow focus – claim: bad code is unlikely to avoid exposing at least some of the errors they look for. Low-level bookkeeping operations.

6 Size of Subdirectories

7 Projected Bug Counts

8 Where are the errors?

9

10 Error-rate by function size

11 Log series distribution [plot: o = data points, x = distribution], θ = 0.567
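For readers unfamiliar with the log series distribution, the sketch below evaluates its standard probability mass function, P(k) = -θ^k / (k ln(1 - θ)) for k = 1, 2, ..., using the value 0.567 quoted on the slide. Treating 0.567 as the distribution's single parameter (named `theta` here) is an assumption, as is reading k as a per-unit error count; the slide itself does not spell either out.

```python
import math


def log_series_pmf(k: int, theta: float) -> float:
    """Standard log series PMF: -theta**k / (k * ln(1 - theta)), for k >= 1."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return -(theta ** k) / (k * math.log(1.0 - theta))


theta = 0.567  # value from the slide; its role as the parameter is assumed
first_terms = [log_series_pmf(k, theta) for k in range(1, 6)]
```

The probabilities fall off quickly with k, so under this model most units have a single error and a few have many.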

12 Bug Lifetimes

13 Error birth for 2.4.1

14 Birth & Death: Just Block, Null, and Var (low false positive checkers). Bottom graph – shift to connect peaks. Mostly using odd-numbered releases toward lifetimes.

15 Kaplan-Meier Estimates of Lifetime: Method deals with censoring (truncating). Survives at least as long as… Issues included granularity & interference by finding errors in previous work.
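A minimal sketch of the Kaplan-Meier product-limit estimator shows how censored observations are handled: a bug that is still alive at the last snapshot counts as "at risk" up to that point but never as a death. The function name and input layout are hypothetical, and this is a textbook version of the estimator, not the authors' tooling.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one death occurred.

    durations: lifetime (or time until censoring) of each bug.
    observed:  True if the bug's death (fix) was seen, False if censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every observation that ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored items both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical lifetimes in years; the third bug is censored at 2 years.
curve = kaplan_meier([1, 2, 2, 3], [True, True, False, True])
```

Each point reads as "the estimated probability that a bug survives beyond time t", which matches the slide's "survives at least as long as" phrasing.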

16 Do bugs cluster? One would expect the # errors to be a stable fraction of the # notes, but the data are spiky. A: 80% of errors are accounted for by 50% of the files with errors. B&C: random exp
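A figure of the form "80% of errors come from 50% of the files with errors" can be computed from per-file error counts as sketched below. The function name and the sample counts are hypothetical; this illustrates the kind of calculation behind such a statement and is not the authors' procedure.

```python
def fraction_of_files_for_share(errors_per_file, share=0.8):
    """Smallest fraction of error-containing files covering `share` of all errors."""
    counts = sorted((c for c in errors_per_file if c > 0), reverse=True)
    if not counts:
        raise ValueError("no file contains an error")
    total = sum(counts)
    running = 0
    for files_used, c in enumerate(counts, start=1):
        running += c
        if running / total >= share:
            return files_used / len(counts)
    return 1.0


# Hypothetical per-file error counts; files with zero errors are ignored.
fraction = fraction_of_files_for_share([7, 2, 1, 0])
```

Files are taken from most to least buggy, so a small returned fraction indicates that errors are concentrated in a few files.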

17 Global cluster metric c: c_theor uses the log series distribution. c > 1 means more clustering than random.

18 Intuitively, why clusters? Widespread ignorance of the system rules. Poor programming in a focused place. Cut-and-paste errors. Less-executed code is less well tested.

19 Summary: Driver code is error-prone. Error distributions seem to fit the log series distribution. Average lifetime of bugs is 1.8 years. Clustering exists.

20 For next Tuesday Chapter 10 Assignment on data presentation. Actually the more examples the better but I’d rather have 1 exceptionally bad example than a survey of garden-variety plots. Of potential interest: BugBench – a benchmark suite of known buggy programs

