Published by Carmel Terry. Modified over 9 years ago.
1
Before Terabytes Fall: Disk reliability in Windows Vista and beyond
  Frank Shu, Program Manager, WDEG-Storage, Microsoft Corporation
  Matthew Kerner, Program Manager, Windows Diagnosis, Microsoft Corporation
2
Windows Storage Devices: Strategic pillars
  Optical Platform (Client/Consumer): Timely, comprehensive, quality platform support for optical devices
  Personal Storage (Client/Consumer): Optimized platform features enabling your Windows experience, here and now
  Storage Fabrics (Server/Enterprise): Leading platform enabling storage fabric adoption
  Preferred Storage Platform (Partner/Customer): Preferred platform for developing, deploying, and using storage devices
3
Session Outline
  Introduction (Frank Shu)
  Windows Vista Disk Diagnostics (Matthew Kerner)
  Future Technology (Frank Shu)
  Demo (Microsoft and Samsung)
4
What Matters Most To Our Users?
  A consumer bought a new computer and it works great at work and at home. She couldn’t do her everyday tasks without it. What matters most to her?
    a) CPU power
    b) Network connection
    c) Battery life
    d) Something else…
5
The Answer Is… The Data
6
Protecting Data: Windows Vista disk diagnostics Matthew Kerner
7
Quantifying Disk Failures
  Catastrophic disk failures
    ~200 disks replaced per week at Microsoft in 2003
    Top driver of Microsoft support’s hardware-related support calls in both client and server
    Based on Microsoft figures, disk failures cost many millions of dollars per year in enterprises
  Localized failures (bad blocks)
    Kernel and user-mode crashes
    1.7% of customer-reported Microsoft Online Crash Analysis crashes are due to disk errors
    Application hangs during read recovery
8
Disk Failure Mitigations
  Prevention
    Hybrid hard disks (mobile systems)
    RAID
  Catastrophic failure recovery
    Data backup
    Disk replacement
  Localized failure recovery
    Repair from redundant copy
    Restore from backup
9
Windows Vista Disk Diagnostics
  Purpose: Save user data before catastrophic disk failure
  Client SKUs
  Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) polling triggers the diagnostic
  Uses S.M.A.R.T. trip status only, with no threshold/attribute comparison
  Warns the user of impending failure and walks them through backup and replacement
  Windows Vista backup improvements
10
Disk Diagnostics Details
  Disk class driver polls S.M.A.R.T. status hourly, as it has done since Windows 2000
    Based on industry feedback, no use of Disk Self-Test or attribute comparison
  Failure triggers user-mode code
    Filter out duplicate failures
    Log SMART READ LOG details to OS event log
      Device error count from summary error log sector
      Life timestamp from most recent error log entry
    Trigger user-context interactive resolution
      Customizable by Group Policy
      Print instructions, walk user through backup
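The flow on this slide (poll, filter duplicates, log, then notify the user) can be sketched in a few lines of Python. This is only an illustrative model, not the Windows Vista implementation: `SmartStatus`, `DiskDiagnostic`, and `notify_user` are hypothetical names, and no real S.M.A.R.T. or Windows API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class SmartStatus:
    """Hypothetical snapshot of one hourly poll result for a disk."""
    disk_id: str
    tripped: bool            # S.M.A.R.T. trip status only (pass/fail)
    device_error_count: int  # from the summary error log sector
    life_timestamp: int      # from the most recent error log entry


@dataclass
class DiskDiagnostic:
    """Models the user-mode side: de-duplicate, log, then notify."""
    notify_user: Callable[[str], None]
    event_log: List[str] = field(default_factory=list)
    _reported: Set[str] = field(default_factory=set, init=False)

    def on_poll(self, status: SmartStatus) -> bool:
        """Handle one poll result; return True if the user was notified."""
        if not status.tripped:
            return False  # healthy disk: nothing to do
        if status.disk_id in self._reported:
            return False  # duplicate failure: filter it out
        self._reported.add(status.disk_id)
        # Record the two details named on the slide in the event log.
        self.event_log.append(
            f"{status.disk_id}: errors={status.device_error_count}, "
            f"life_timestamp={status.life_timestamp}"
        )
        # Hand off to the interactive resolution (backup walkthrough).
        self.notify_user(status.disk_id)
        return True


if __name__ == "__main__":
    diag = DiskDiagnostic(notify_user=lambda d: print(f"Back up {d} now"))
    failing = SmartStatus("disk0", True, 12, 4380)
    diag.on_poll(failing)  # first report: logged and surfaced to the user
    diag.on_poll(failing)  # same failure next hour: filtered as duplicate
```

Keeping the duplicate filter ahead of the logging step means a disk that stays tripped produces one event log entry and one prompt rather than one per hourly poll.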
11
Startup Repair/Windows Recovery Environment
  Purpose: Recover from non-bootable states, including those caused by disk failures
  Automatic failover on boot failure to recovery partition
  Optionally deployed by OEM
  Available on installation media
  Hands-free diagnosis and repair of top non-boot issues
12
Corrupted File Recovery
  Purpose: Turn repeat user-mode crashes caused by corrupted system binaries into a one-time crash with silent repair from cache
  Windows Error Reporting crash handler triggers the diagnostic on inpage error crashes due to bad blocks
  Diagnoses corrupted system files
  Silent repair from System File Cache
13
Windows Vista Disk Diagnostics Matthew Kerner Program Manager Windows Diagnosis
14
Opportunities For Future Technology
  Proactive failure prevention
  Reduce scenario pain by enabling resolutions other than just data recovery
  Requires finer-grained failure description to help host choose the best resolution
  Increase warning time before failures to allow users to save data
15
Future Technology: Protecting User Data And Preventing Hard Drive Failure Proactively Frank Shu
16
What Is PRCS?
  Proactive Reporting and Correcting Safeguard (PRCS) enables a device and host to correct failure conditions proactively
  Device can report hostile conditions before damage or failure occurs
  Host reacts to a device event in real time based on policy and user preference
  A proposal for the PRCS protocol has been submitted to T13
17
Why Is PRCS Important?
  User’s digital data is more valuable than ever before
  Disk drive capacity continues to increase
  Not every PC user can afford RAID
  Deliver on opportunities for improvements beyond S.M.A.R.T.
18
Goals Of PRCS
  Proactively protect user data
  Improve the user experience when data is at risk
  Reduce OEM’s customer support costs
  Reduce warranty costs for disk drive vendors
19
PRCS Features
  Device monitors its own conditions in real time
    Reduce host monitoring performance impact
  Device sends meaningful PRCS events to the host for correction of hostile conditions and data protection
    No translations or guesses required
  Host acts on device’s PRCS event proactively according to policy and user preference
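One way to picture "host acts on the device's event according to policy and user preference" is a small lookup in which a user preference, when present, overrides the default policy. The sketch below is purely illustrative: the condition names, the actions, and `choose_action` are invented for this example and are not taken from the PRCS proposal submitted to T13.

```python
from enum import Enum
from typing import Dict, Optional


class Condition(Enum):
    """Hypothetical hostile conditions a device might report."""
    HIGH_TEMPERATURE = "high_temperature"
    VIBRATION = "vibration"
    SHOCK = "shock"


class Action(Enum):
    """Hypothetical host responses (assumptions, not from the proposal)."""
    WARN_USER = "warn_user"
    START_BACKUP = "start_backup"
    REDUCE_ACTIVITY = "reduce_activity"


# Assumed default policy, e.g. as an administrator might configure it.
DEFAULT_POLICY: Dict[Condition, Action] = {
    Condition.HIGH_TEMPERATURE: Action.REDUCE_ACTIVITY,
    Condition.SHOCK: Action.START_BACKUP,
}


def choose_action(
    condition: Condition,
    user_prefs: Optional[Dict[Condition, Action]] = None,
    policy: Dict[Condition, Action] = DEFAULT_POLICY,
) -> Action:
    """Pick the host response: user preference, then policy, then a warning."""
    if user_prefs and condition in user_prefs:
        return user_prefs[condition]
    return policy.get(condition, Action.WARN_USER)


if __name__ == "__main__":
    prefs = {Condition.SHOCK: Action.WARN_USER}
    print(choose_action(Condition.SHOCK))             # policy applies
    print(choose_action(Condition.SHOCK, prefs))      # user preference wins
    print(choose_action(Condition.VIBRATION, prefs))  # fallback warning
```

Because the device names the condition directly, the host needs no threshold comparison or attribute decoding before choosing a response, which is the "no translations or guesses" point on the slide.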
20
PRCS Advantages
  PRCS is proactive
    Taking a corrective action before errors occur
    Protecting data when it is at risk
  PRCS is designed for end users, not just computer experts
    No need to understand a cryptic message to benefit from PRCS. For example: “The previous self-test completed having the electrical element of the test failed”
  PRCS enables transparent mitigation of a hostile condition or a recovery process
    Users do not need to configure a self-test mode or reporting method
    Users control policy as desired
21
Proactive Disk Diagnostics Debasis Baral Vice President of Engineering Samsung
22
HDD Reliability 101
  HDD reliability and performance are negatively impacted by extremes in the following operating conditions
    Temperature (demo)
    Vibration (demo)
    Shock (demo)
    Duty cycle
    Altitude
    Humidity
    A combination of the above conditions
    A history of the above combinations
23
Reliability Versus Temperature
  HDD life decreases with temperature
  Failure rates increase exponentially with temperature for all HDD suppliers
  Environmental temperature increase from 25 °C to 100 °C could translate into 10 to 50x shorter life
  Ref.: Samsung reliability tests
  Samsung HDD Lab Engineering Sample Data
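To get a feel for what the 10 to 50x figure implies over smaller temperature steps, the sketch below interpolates it under a strong assumption: that life shortens as a simple exponential in temperature across the whole 75-degree span. That model and the function name `shortening_factor` are assumptions made for illustration, not something the Samsung data on this slide states.

```python
import math


def shortening_factor(total_factor: float, delta_c: float,
                      span_c: float = 75.0) -> float:
    """Life-shortening factor for a rise of delta_c degrees Celsius.

    Assumes life is proportional to exp(-k * T), with k fitted so that
    a rise of span_c degrees shortens life by total_factor.
    """
    k = math.log(total_factor) / span_c  # per-degree exponential rate
    return math.exp(k * delta_c)


if __name__ == "__main__":
    for total in (10.0, 50.0):
        per_10c = shortening_factor(total, 10.0)
        print(f"{total:.0f}x over 75 C -> {per_10c:.2f}x per 10 C")
```

Under that assumption, each 10-degree rise shortens life by roughly 1.36x at the low end of the slide's range and roughly 1.68x at the high end.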
24
Performance Versus Vibration
  Data throughput or drive performance can be significantly affected in the presence of vibration
  Effect of vibration is reversible
  Cumulative effects of vibration on long-term drive reliability are a subject of ongoing research
  Samsung HDD Lab Engineering Sample Data
25
Reliability Versus Shock
  Excessive shock is the major cause of failure in both PC and consumer electronics environments
  Shock modeling courtesy of E. Jayson and Frank Talke, UC San Diego
  Figure labels: operating shock damage (scratches; damage by the corners, leading edge, and side edges of the slider) and non-operating shock damage
26
Reliability Design Guidelines
  Failure modes and failure rates of disk drives depend on their operating environments
  Temperature and handling (shock and vibration) are major factors impacting HDD reliability
  HDD reliability will be enhanced if the OS detects and manages reliability risks and stress events intelligently (PRCS)
  Users can improve HDD data reliability by correctly responding to PRCS events
27
PRCS Kai Chen Microsoft Corporation Debasis Baral Samsung
28
Call To Action
  Test your drives with Windows Vista Disk Diagnostics and send feedback
  Ensure your drives comply with ATA-7 specs to surface device error count and life timestamp
  Engage with the Startup Repair team to build a plan for Startup Repair in OEM factory processes
  Participate in T13 discussions on PRCS
  Plan your device designs in line with PRCS guidelines
29
Additional Resources
  Whitepapers
    Windows Recovery Environment/Startup Repair/Built-in Diagnostics: http://www.microsoft.com/technet/windowsvista/evaluate/feat/relperf.mspx
  Feedback/Questions
    Windows Vista Disk Diagnosis: Corrupt File Recovery: Windows Recovery Environment/Startup Repair: PRCS: Dfdfeed @ microsoft.com Recovery @ microsoft.com Prcsdisc @ microsoft.com
30
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.