Automatic Misconfiguration Troubleshooting with PeerPressure Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang Microsoft Research Presenter:


1 Automatic Misconfiguration Troubleshooting with PeerPressure
Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang
Microsoft Research
Presenter: Sara Salahi, Northwestern University

2 Agenda
Importance of this work
Key ideas
PeerPressure: Architecture & Algorithm
Prototype
Performance
Future Work

3 Importance
Tech support accounts for 17% of the total cost of ownership of today’s desktop PCs
A large share of tech support effort is spent on troubleshooting
Many troubleshooting cases are due to misconfiguration
Misconfiguration is often caused by data in shared persistent stores (e.g. the Windows registry); the authors focus on this

4 Key Ideas: Misconfigurations
Can have many different “root causes”
–Seemingly innocuous changes to shared system configurations
–System bugs
–Security patches may introduce incompatible registry settings
–Failed uninstallation of applications
–Manual intervention using the Registry editor

5 Key Ideas: The Golden State
“Golden State”: a perfect configuration
Assume that the golden state is in the mass
Combine the statistical golden state with Bayesian statistics to identify anomalous misconfigurations on “sick” machines

6 Key Ideas: Goals of Troubleshooting
Effectiveness
–System should identify a small set of sick configuration candidates in a short amount of time
Automation
–Minimize the number of manual steps and the number of users involved

7 PeerPressure: Architecture
1) Sick computer
2) I found you
3) Turns user- or machine-specific entries into canonicalized form
4) Database containing a number of machine configuration snapshots
5) Bayesian estimation used to calculate the probability of a suspect being sick
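A minimal sketch of what step 3 (canonicalization) could look like. The rewrite rules, the placeholder tokens, and the function name `canonicalize` are hypothetical illustrations and are not taken from the PeerPressure prototype.

```python
import re

# Hypothetical rewrite rules: each maps a user-specific fragment of a
# registry key path to a fixed placeholder token.
RULES = [
    # Per-user security identifiers, e.g. S-1-5-21-111-222-333-1001
    (re.compile(r"S-1-5-21(?:-\d+)+"), "<USER_SID>"),
    # Profile directories such as C:\Documents and Settings\alice
    (re.compile(r"(?i)C:\\Documents and Settings\\[^\\]+"), "<USER_PROFILE>"),
]


def canonicalize(entry: str, machine_name: str) -> str:
    """Return the entry with user- and machine-specific parts replaced."""
    for pattern, token in RULES:
        entry = pattern.sub(token, entry)
    if machine_name:
        # Machine names are matched literally and case-insensitively.
        entry = re.sub(re.escape(machine_name), "<MACHINE>", entry,
                       flags=re.IGNORECASE)
    return entry
```

After this step, the same logical entry from two different machines or users compares equal, so snapshots in the database can be matched against the suspects from the sick machine.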

8 PeerPressure: Architecture
Manual Steps
–User runs the faulty application to record suspects
–User determines whether the sickness is cured
Manual steps involve only the troubleshooting user and no second party

9 PeerPressure: Algorithm
Intuition and Objectives
e1: Probably healthy
e2: Most probably sick
e3: “Natural biological diversity”
Type I: application configuration states
–e1 and e2
Type II: operational states (timestamps, caches, etc.)
–e3
–Want to weed out; most likely false positives

10 PeerPressure: Algorithm
Formulation:
Combining (3) with (1): when m = 0, P(S|V) = 1
Bayesian estimation is used to overcome this
Vector p_j: probability of the event happening and its outcome being V_j; p_j follows a Dirichlet distribution
m_j: count of sample values matching the suspect value
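The m = 0 problem and its Bayesian fix can be illustrated with a short sketch. The slide's equations did not survive extraction, so the following are stated assumptions rather than the slide's own formulas: exactly one of t suspects is sick (P(S) = 1/t), a sick entry takes any of its c possible values with equal probability (P(V|S) = 1/c), and the Dirichlet prior is uniform with a pseudo-count of 1 per value. The function name and the `smooth` flag are hypothetical.

```python
def sick_probability(N: int, m: int, c: int, t: int, smooth: bool = True) -> float:
    """Posterior P(S|V) that a suspect entry is the sick one.

    N: number of sample machines
    m: number of samples whose value matches the suspect's value
    c: cardinality of the entry (number of possible values)
    t: number of suspects
    Assumed here: P(S) = 1/t, P(V|S) = 1/c, and a uniform Dirichlet
    prior with pseudo-count 1 per possible value.
    """
    p_sick = 1.0 / t
    p_v_given_sick = 1.0 / c
    if smooth:
        # Posterior mean under the assumed Dirichlet prior.
        p_v_given_healthy = (m + 1) / (N + c)
    else:
        # Maximum-likelihood estimate; zero when m == 0.
        p_v_given_healthy = m / N
    numerator = p_v_given_sick * p_sick
    return numerator / (numerator + p_v_given_healthy * (1.0 - p_sick))
```

With `smooth=False` and m = 0 the posterior is exactly 1 regardless of the other inputs, which is the degenerate case the slide mentions. Under the assumptions above, the smoothed posterior simplifies to (N + c) / (N + ct + cm(t - 1)).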

11 PeerPressure: Algorithm Asymptotic Analysis:

12 Prototype
GeneBank Database: Microsoft SQL Server 2000 containing snapshots from 87 Windows XP PCs
PeerPressure troubleshooter implemented in C#
“Data Sanitization”
–Unification of different representations of the same value
Dual Intel Xeon 2.4 GHz CPU workstation with 1 GB RAM hosts SQL Server

13 Performance
Response Time vs. Number of Suspects
20 real-world troubleshooting cases used
Database queries dominate troubleshooting response time (one query per suspect entry)

14 Prototype: GeneBank
Registry characteristics in GeneBank
Unseen: values that are unknown to the GeneBank; increments the observed cardinality by 1
–Any entry from GeneBank has a cardinality of at least 2
Entries that do not exist on some sample machines have the value “no entry”
When cardinality is low, conformity among samples is strong
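The cardinality rules above can be sketched as follows. The `NO_ENTRY` marker, the use of `None` for a missing entry, and the function name are hypothetical choices for illustration only.

```python
NO_ENTRY = "<no entry>"  # hypothetical marker for an entry missing on a machine


def observed_cardinality(sample_values):
    """Count distinct values across samples, plus one for 'unseen'.

    sample_values holds one item per sample machine; None means the
    entry does not exist on that machine and is counted as the
    distinct value 'no entry'.
    """
    distinct = {NO_ENTRY if v is None else v for v in sample_values}
    # The extra 1 accounts for values unknown to the database ('unseen').
    return len(distinct) + 1
```

An entry on which every sample agrees gets cardinality 2 (the shared value plus "unseen"), which matches the stated lower bound.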

15 Performance
Root-Cause Ranking Results
87% have cardinality of 2, 94% no more than 3, 97% no more than 4

16 Performance
False Positives
Large cardinality of root-cause entry
Relation between root-cause entry and other entries in the suspect set
GeneBank is not pristine

17 Performance Impact of Sample Set Size

18 Performance
Sick Machine Sensitivity
Format: RootCauseRanking (NumberOfTies) / NumberOfSuspects
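To make the result format concrete, here is a small sketch that produces such a string from per-suspect sick probabilities. How the original evaluation counts ties is not stated on the slide; this sketch assumes that a tie is any other suspect with exactly the same score as the root cause, and that rank 1 is the most suspicious entry. The function name is hypothetical.

```python
def format_ranking(scores, root_cause):
    """Return 'RootCauseRanking (NumberOfTies) / NumberOfSuspects'.

    scores maps each suspect entry name to its sick probability
    (higher means more suspicious); root_cause is the name of the
    entry known to be the actual root cause.
    """
    target = scores[root_cause]
    # Rank is 1 plus the number of suspects scored strictly higher.
    rank = 1 + sum(1 for s in scores.values() if s > target)
    # Assumed tie definition: other suspects with an identical score.
    ties = sum(1 for name, s in scores.items()
               if s == target and name != root_cause)
    return f"{rank} ({ties}) / {len(scores)}"
```

For example, a root cause that is outscored by one suspect and shares its score with one other, out of four suspects in total, is reported as `2 (1) / 4`.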

19 Future Work
Multi-gene troubleshooting
–Multiple sick entries among suspects
Cross-application misconfiguration
Heavy customization of apps can break the assumption of strong conformance in most configuration entries
GeneBank maintenance – privacy issue

