1
University of Washington Computing & Communications
Ten Minutes on Five Nines
Terry Gray
Associate VP, IT Infrastructure
University of Washington
Common PROBLEMS Group
6 January 2005
2
Vision
- Systems/Services (and Staff!) characterized as Reliable and Responsive
- Reliability = job one
- But: I.T. = Inevitable Tensions
- We all want:
  – High MTTF (mean time to failure), performance, and function
  – Low MTTR (mean time to repair) and support cost
- The art is balancing those conflicting goals: we are jugglers and technology actuaries
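The MTTF/MTTR tension can be made concrete with the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). This Python sketch is illustrative only. The function name and the sample hours are assumptions and do not come from the talk.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up.

    Assumes the classic MTTF/MTTR model, in which every failure is
    followed by a repair and the cycle repeats.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Two routes to the same availability: fail less often, or repair faster.
baseline = availability(mttf_hours=1000, mttr_hours=10)
longer_life = availability(mttf_hours=2000, mttr_hours=10)  # double MTTF
faster_fix = availability(mttf_hours=1000, mttr_hours=5)    # halve MTTR

for label, a in [("baseline", baseline), ("2x MTTF", longer_life), ("1/2 MTTR", faster_fix)]:
    print(f"{label:>9}: {a:.5f}")
```

Doubling MTTF and halving MTTR produce exactly the same availability (200/201, about 99.5%). This is one way to see why prevention and fast remediation trade off against each other.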
3
Success Metrics
- Tom’s:
  – Nobody gets hurt
  – Nobody goes to jail
- Terry’s:
  – “Works fine, lasts a long time”
  – Low ROI (Risk Of Interruption)
4
Design Tradeoffs
- Fault-zone size vs. economy/simplicity
- Reliability vs. complexity
- Prevention vs. (fast) remediation
- Security vs. supportability vs. functionality
- Networks = connectivity; security = isolation
- Balancing priorities (security vs. ops vs. function)
5
Context: A Perfect Storm
- Increased dependency on I.T.
- Decreased tolerance for outages
- Deferred maintenance
- Inadequate infrastructure investment
- Some extraordinarily fragile applications
- Fragmented host management
- Increasingly hostile network environment
  – especially spam, spyware, and social-engineering attacks
- Increasing legal/regulatory liability
- Highly decentralized culture
- Growth of portable devices
6
System Elements
- Environmentals (power, A/C, physical security)
- Network
- Client workstations (incl. portable devices)
- Servers
- Applications
- Personnel, procedures, policy, and architecture
- Failures at one level can trigger problems at another; we need a total-system perspective
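The total-system perspective has a simple arithmetic consequence. Suppose users need every layer up at once and the layers fail independently. Overall availability is then the product of the layer availabilities, which is lower than that of the weakest layer. Here is a minimal Python sketch. The per-layer figures are hypothetical and chosen only for illustration.

```python
from math import prod


def series_availability(layer_availabilities):
    """Availability of a chain in which every layer must be up to serve users.

    Assumes layers fail independently. Correlated failures (for example, a
    power event that takes out network and servers together) make reality worse.
    """
    return prod(layer_availabilities)


# Hypothetical per-layer availabilities, for illustration only.
layers = {
    "environmentals": 0.9995,
    "network": 0.9995,
    "servers": 0.999,
    "application": 0.999,
}

overall = series_availability(layers.values())
print(f"overall: {overall:.4f}")
```

Four layers that each look respectable combine to roughly 99.7%. Independence is the optimistic case, because a shared power or A/C failure hits several layers at once.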
7
Dimensions
- How often is there a user-visible failure?
- How many people are affected?
- For how long?
- How severely?
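One hypothetical way to fold these four dimensions into a single figure is severity-weighted user-minutes: people affected × duration × a severity weight, summed over every outage in the period. The summation accounts for "how often." The severity scale, the class and function names, and the two sample outages are all invented for illustration. The talk does not propose this metric.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    users_affected: int  # how many people
    minutes: float       # for how long
    severity: float      # how severely (hypothetical scale: 1.0 = total loss, 0.1 = minor degradation)


def user_minutes_lost(outages):
    """Severity-weighted user-minutes lost across all outages in a period.

    Summing over every outage captures the "how often" dimension.
    """
    return sum(o.users_affected * o.minutes * o.severity for o in outages)


# Invented sample data for one month.
month = [
    Outage(users_affected=40_000, minutes=15, severity=1.0),  # broad, short, total
    Outage(users_affected=200, minutes=240, severity=0.5),    # narrow, long, partial
]
print(user_minutes_lost(month))
```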
8
Basics
- How many nines?
- Problem one: what to measure?
  – How do you reduce the behavior of a complex network to a single number?
  – Difficult with either uptime or utilization metrics
- Problem two: data networks are not like phone or power services…
  – Imagine if phones could assume anyone’s number
  – Or place a million calls per second!
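For the "how many nines?" question, each additional nine cuts the yearly downtime budget by a factor of ten. This Python sketch assumes an average 365.25-day year and prints the budget for two through five nines. Five nines works out to roughly 5.3 minutes per year. The function and constant names are illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, accounting for leap years


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year at `nines` nines of availability (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability_pct = 100 * (1 - 10 ** -n)
    # Show as many decimal places as the number of nines requires.
    print(f"{n} nines ({availability_pct:.{n - 2}f}%): {downtime_budget_minutes(n):8.1f} min/year")
```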
9
Security vs. Reliability
- Obviously lack of security is bad… but:
- Defense in depth is not free
  – Each additional defensive perimeter increases MTTR
- Defense-in-depth conjecture (for N layers):
  – Security: MTTE (mean time to exploit) scales as N**2
  – Functionality: MTTI (mean time to innovation) scales as N**2
  – Supportability: MTTR (mean time to repair) scales as N**2
- Next-generation threats: firewalls won’t help
10
Complexity vs. Reliability
- How do you measure availability in complex systems?
- Death of the network utility model
- Organizational vs. geographic networking
- SAN virtualization
- Web load-leveler appliances
- Organizational boundary conditions
- Networks: from stochastic to non-deterministic
- Subnets with both clients and critical servers
- Documentation deficiencies
11
Complex System Failures: Inevitable?
- IEEE Spectrum article on power-grid failures, Jan 2004 (?)
- Point: it will happen, so plan for mitigation
12
Work in Progress
- New trouble-ticket system
- New network management system
- Next-generation network architecture
- Next-generation security architecture
- Improving change-control process
- Improving DRBR process
- Lots of work on improving monitoring/diagnostic tools
13
In Short…
- Expectations are growing (unrealistically?)
- Complexity is growing
- Few are prepared to pay for true HA (high availability)
- Cultural barriers to change control
- Hospitals are a whole other world
- Biggest SPoF (single point of failure): power/HVAC
- Organizational complexity undermines HA
- Both security and the lack of it undermine HA
- Redundancy can mask failures too well!
- With redundancy, we must have better tools
- Need ops-centric design, better DRBR
- Need application procurement standards
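The warning that redundancy can mask failures follows from the parallel-availability formula. With n independent replicas, each available a fraction A of the time, the service is up with probability 1 - (1 - A)^n. A dead replica can therefore be invisible to users. This Python sketch pairs that formula with a hypothetical monitoring rule that alerts on lost redundancy, not only on lost service. The function names, the three states, and the figures are assumptions for illustration.

```python
def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes replicas fail independently, which shared power or HVAC can violate.
    """
    return 1 - (1 - replica_availability) ** replicas


def check_redundancy(healthy: int, configured: int) -> str:
    """Hypothetical monitoring rule: report lost redundancy, not just lost service."""
    if healthy == 0:
        return "OUTAGE"    # users notice this one
    if healthy < configured:
        return "DEGRADED"  # users notice nothing; this is the failure redundancy hides
    return "OK"


print(f"one replica:  {parallel_availability(0.99, 1):.4f}")
print(f"two replicas: {parallel_availability(0.99, 2):.4f}")
print(check_redundancy(healthy=1, configured=2))
```

With two replicas at 99% each, a single failure leaves users unaffected but quietly drops the service back to 99%. Only the DEGRADED state surfaces that loss.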
14
Questions? Comments?