An Empirical Study of OS Errors. Chou, Yang, Chelf, Hallem, and Engler. SOSP 2001. Characterizing a workload with respect to reliability.
Workloads [Diagram. Experimental environments: prototype or real system; execution-driven simulation; trace-driven simulation; stochastic simulation. Workload types: live workload; benchmark applications; micro-benchmark programs; synthetic benchmark programs; traces; synthetic traces; distributions & other statistics; made-up; data sets. Steps: monitor; analysis; generator. Annotations for this paper: Linux; compiler analysis.] © 2006, Carla Ellis
Method: Checkers. Evolution: 21 snapshots of Linux over 7 years. Structure: 7 main subdirectories. Over 1000 unique errors detected.
Metrics. Inspected errors – manually reviewed and propagated back through versions. Projected errors – automatically found by low-false-positive checkers. Notes – number of times a check was applied. Relative error rate – errors/notes.
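The relative error rate above is a plain ratio, so it can be sketched in a few lines. The function name and the counts below are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_error_rate(errors: int, notes: int) -> float:
    """Relative error rate: errors found divided by the number of times
    the check was applied (notes)."""
    if notes <= 0:
        raise ValueError("notes must be positive")
    return errors / notes


# Hypothetical (errors, notes) counts per directory, NOT data from the paper.
counts = {"drivers": (30, 1000), "fs": (5, 500)}
rates = {name: relative_error_rate(e, n) for name, (e, n) in counts.items()}
```

Normalizing by notes lets directories of very different sizes be compared: a large directory can hold more errors in absolute terms while still having a lower rate per check applied.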
Caveats. Compiler analysis – is the targeted set of errors representative of all bugs? All bugs are treated equally, with no extra weight for important bugs. Narrow focus – claim: bad code is unlikely to avoid exposing some of the errors they look for. Low-level bookkeeping operations.
Size of Subdirectories
Projected Bug Counts
Where are the errors?
Error-rate by function size
Log series distribution. Legend: o = data points, x = distribution; parameter = 0.567.
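For reference, the log series distribution has the closed form P(k) = -θ^k / (k ln(1 - θ)) for k = 1, 2, ... A minimal sketch follows. Reading the slide's 0.567 as the parameter θ is an assumption, since the symbol did not survive on the slide.

```python
import math


def log_series_pmf(k: int, theta: float) -> float:
    """Probability of exactly k (k >= 1) under a log series distribution
    with parameter 0 < theta < 1."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return -(theta ** k) / (k * math.log(1.0 - theta))


# Assumption: 0.567 on the slide is the parameter theta.
THETA = 0.567
first_five = [log_series_pmf(k, THETA) for k in range(1, 6)]
```

The probabilities fall off quickly as k grows, which matches the shape such a fit is meant to capture: many units with one error and few units with many.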
Bug Lifetimes
Error birth for 2.4.1
Birth & Death. Just Block, Null, and Var (low-false-positive checkers). Bottom graph – shift to connect peaks. Mostly using odd-numbered releases toward lifetimes.
Kaplan-Meier Estimates of Lifetime. The method deals with censoring (truncated observations). Survives at least as long as… Issues included granularity and interference caused by finding errors in previous work.
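A minimal sketch of the Kaplan-Meier product-limit estimator may help show how censoring is handled. This is a generic textbook version, not the authors' code. The function name and the sample lifetimes are hypothetical. A censored bug (one still alive at the last snapshot) counts as at risk up to its last observed time but never counts as a death.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one death occurs.

    durations: observed lifetime of each bug (any consistent time unit)
    observed:  True if the bug was seen to die (be fixed); False if censored
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have equal length")
    death_times = sorted({d for d, o in zip(durations, observed) if o})
    survival = 1.0
    curve = []
    for t in death_times:
        # Everything that lasted at least until t is still at risk at t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical lifetimes in years; False marks a censored observation.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

For this made-up sample the estimated survival is roughly 0.8 after year 1, 0.6 after year 2, and 0.3 after year 3. Dropping the censored bugs, or treating them as deaths, would bias the lifetime estimate downward.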
Do bugs cluster? One would expect the number of errors to be a stable fraction of the number of notes, but the data are spiky. A: 80% of errors are accounted for by 50% of the files with errors. B&C: random exp
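The kind of concentration figure quoted for graph A can be computed as below. This is only a sketch of one plausible reading of that statistic: rank the files that contain errors by error count and take the share held by the top fraction. The function name and the per-file counts are hypothetical.

```python
import math


def share_of_errors_in_top_files(errors_per_file, file_fraction):
    """Fraction of all errors found in the top `file_fraction` of files,
    counting only files that contain at least one error."""
    if not 0.0 < file_fraction <= 1.0:
        raise ValueError("file_fraction must be in (0, 1]")
    counts = sorted((c for c in errors_per_file if c > 0), reverse=True)
    if not counts:
        raise ValueError("no files with errors")
    top_n = math.ceil(file_fraction * len(counts))
    return sum(counts[:top_n]) / sum(counts)


# Hypothetical per-file error counts, NOT data from the paper.
share = share_of_errors_in_top_files([8, 4, 2, 1, 1, 0, 0], 0.5)
```

If errors were spread evenly over the files that have any, the top 50% of files would hold about 50% of the errors; a share well above that indicates clustering.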
Global cluster metric c. c_theor uses the log series distribution. c > 1 means more clustering than random.
Intuitively, why clusters? Widespread ignorance of the system rules. Poor programming in a focused place. Cut-and-paste errors. Less-executed code is less well tested.
Summary. Driver code is error-prone. Error distributions seem to fit the log series distribution. Average bug lifetime: 1.8 years. Clustering exists.
For next Tuesday: Chapter 10. Assignment on data presentation. Actually, the more examples the better, but I’d rather have one exceptionally bad example than a survey of garden-variety plots. Of potential interest: BugBench – a benchmark suite of known buggy programs.