1 Automatic Misconfiguration Diagnosis with PeerPressure
Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang
Microsoft Research
OSDI 2004, San Francisco, CA
2 Misconfiguration Diagnosis
Technical support contributes 17% of TCO [Tolly2000]
Much of application malfunctioning comes from misconfigurations
Why?
–Shared configuration data (e.g., the Registry) and uncoordinated access and update from different applications
How about maintaining the golden config state?
–Very hard [Larsson2001]: complex software components and compositions, third-party applications, …
3 Outline
Motivation
Goals
Design
Prototype
Evaluation results
Future work
Concluding remarks
4 Goals
Effectiveness
–A small set of sick-configuration candidates that contains the root-cause entries
Automation
–No second-party involvement
–No need to remember or identify what is healthy
5 Intuition behind PeerPressure
Assumption
–Applications function correctly on most machines; malfunctioning is an anomaly
Succumb to the peer pressure
6 An Example

Suspects  Mine  P1's  P2's  P3's  P4's
e1        0     1     1     1     1
e2        on    off   …     …     …
e3        5     740   100   3     4

Is e1 sick? Most likely
Is e2 sick? Probably not
Is e3 sick? Maybe not; e3 looks like an operational state
We use Bayesian statistics to estimate the sick probability of a suspect, which serves as our ranking metric
7 System Overview
[Figure: the PeerPressure pipeline. The App Tracer runs the faulty application and collects registry entry suspects, e.g., (HKLM\System\Setup\..., 0), (HKLM\Software\Msft\..., On), (HKCU\%\Software\..., null). The suspects pass through the Canonicalizer; Search & Fetch queries the Peer-to-Peer Troubleshooting Community Database; and the Statistical Analyzer emits the troubleshooting result: each entry with its sick probability, e.g., (HKLM\System\Setup\..., 0.2), (HKLM\Software\Msft\..., 0.6), (HKCU\%\Software\..., 0.003).]
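Below is a minimal, self-contained sketch of this data flow. All function names, signatures, and the cardinality estimate are illustrative assumptions; the actual prototype is a C# system backed by a SQL database (slide 9), and the ranking metric is defined on slide 8.

```python
# Hypothetical sketch of the PeerPressure pipeline shown on this slide;
# names and data layout are assumptions, not the prototype's real API.

def canonicalize(v):
    # Stand-in for the Canonicalizer (heuristics sketched on slide 9).
    return str(v).strip().strip('"').lstrip('#').lower()

def sick_probability(N, c, t, m):
    # The ranking metric from slide 8.
    return (N + c) / (N + c * t + c * m * (t - 1))

def troubleshoot(suspects, peer_db):
    # suspects: {registry key: value} collected by the App Tracer from a run
    # of the faulty application.
    # peer_db: {registry key: [peer values]} from the community database.
    t = len(suspects)
    ranked = []
    for key, value in suspects.items():
        peers = [canonicalize(v) for v in peer_db.get(key, [])]  # Search & Fetch
        v = canonicalize(value)
        N = len(peers)
        m = peers.count(v)          # peer samples matching the suspect's value
        c = len(set(peers)) or 1    # assumption: cardinality estimated from the peer samples
        ranked.append((sick_probability(N, c, t, m), key))
    # Statistical Analyzer output: suspects ranked by sick probability.
    return sorted(ranked, reverse=True)
```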
8 The Sick Probability

P(sick) = (N + c) / (N + c·t + c·m·(t - 1))

–N: the number of peer samples
–c: the cardinality of the entry (the number of distinct values)
–t: the number of suspects
–m: the number of samples whose entry value matches the suspect's value

Properties:
–As m increases, P decreases
–As c increases, P decreases; in particular, when m = 0, a smaller c implies a larger P (a unique value on a low-cardinality entry is the most suspicious)
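To make the metric concrete, here is the formula applied to the example on slide 6, under the assumption (not stated on the slide) that the cardinality c is estimated from the four peer samples:

–e1 (mine = 0, all four peers = 1): N = 4, t = 3, c = 1, m = 0, so P = (4 + 1) / (4 + 1·3 + 0) = 5/7 ≈ 0.71
–e3 (all peer values distinct): N = 4, t = 3, c = 4, m = 0, so P = (4 + 4) / (4 + 4·3 + 0) = 8/16 = 0.50

e1 therefore ranks above e3, matching the intuition on slide 6 that a mismatch on a strongly conforming entry is more suspicious than one on an operational-state entry.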
9 The PeerPressure Prototype
Database of 87 live Windows XP Registry snapshots as our sample pool
–The Registry: hierarchical persistent storage for named, typed entries
PeerPressure troubleshooter implemented in C#
Needed to "sanitize" the entry values, e.g., 1, "1", "#1" (see the sketch below)
–Heuristics: unifying values of entries with different types
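Since the slide only hints at these heuristics, here is a small illustrative canonicalizer; the concrete rules (quote stripping, "#" prefixes, unifying boolean-like values across types) are assumptions in the spirit of the examples above, not the prototype's actual C# implementation.

```python
# Hypothetical value canonicalization in the spirit of this slide; the exact
# rules of the C# prototype are not specified, so these are assumptions.

def canonicalize(value):
    s = str(value).strip()
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        s = s[1:-1]                 # unify quoted and unquoted forms: "1" -> 1
    s = s.lstrip('#')               # unify prefixed numerics: #1 -> 1
    low = s.lower()                 # case-insensitive comparison
    if low in ("on", "true", "yes", "enabled", "1"):
        return "1"                  # unify boolean-like values stored under different types
    if low in ("off", "false", "no", "disabled", "0"):
        return "0"
    return low

# The three spellings from the slide collapse to one canonical value:
assert canonicalize(1) == canonicalize('"1"') == canonicalize("#1") == "1"
```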
10 Outline
Motivation
Goals
Design
Prototype
Evaluation results
Future work
Concluding remarks
11 Windows Registry Characteristics
Registry size (number of entries): max 333,193; min 77,517; average 198,376; median 198,608
Cardinality: 87% of entries have cardinality 1; 94% have cardinality <= 2
Distinct canonicalized entries in GeneBank: 1,476,665
Common canonicalized entries: 43,913
Distinct data-sanitized entries: 1,820,706
12 Evaluation Data Set
87 live Windows XP registry snapshots (in the database)
–Half of the snapshots are from three diverse organizations within Microsoft: the Operations and Technology Group (OTG) Helpdesk in Colorado, MSR-Asia, and MSR-Redmond
–The other half are from machines across Microsoft that were reported to have potential Registry problems
20 real-world troubleshooting cases with known root causes
13 Response Time
Number of suspects: 8 to 26,308, with a median of 1,171
45 seconds on average, with the SQL server hosted on a workstation with a 2.4 GHz CPU and 1 GB RAM
Sequential database queries dominate the response time
14 Troubleshooting Effectiveness
Metric: root-cause ranking
Results:
–Rank = 1 for 12 cases
–Rank = 2 for 3 cases
–Rank = 3, 9, 12, and 16 for one case each
–One case could not be solved
15 Source of False Positives
Nature of the root-cause entry
–A root-cause entry with a large cardinality
How unique the other suspects are
–A highly customized machine likely produces more noise
The database is not pristine
16 Impact of the Sample Set Size
A larger sample set does not necessarily yield better accuracy
–Strong conformity does not depend on the number of samples
–Operational state does not depend on the number of samples
–More samples only help with a non-pristine sample set
10 samples are enough for most cases
17 Related Work
Blackbox-based techniques
–Strider: needs to identify the healthy state [Wang '03]
–Hardware and software component dependencies [Brown '01]
Much prior work on leveraging statistics to pinpoint anomalies
–Bugs as deviant behavior [Engler et al., SOSP '01]
–Host-based intrusion detection based on system calls [Forrest '96] and on Registry behavior [Apap et al. '99]
18 Future Work
We have only scratched the surface!
Multiple root-cause entries
Cross-application troubleshooting
Database maintenance
Privacy
–Friends Troubleshooting Network
19 Concluding Remarks
Automatic misconfiguration diagnosis is possible
–Use statistics from the mass to automate the manual identification of the healthy
–Initial results are promising