
1 University of Washington Computing & Communications
Ten Minutes on Five Nines
Terry Gray, Associate VP, IT Infrastructure, University of Washington
Common PROBLEMS Group, 6 January 2005

2 Vision
Systems/Services (and Staff!) characterized as Reliable and Responsive
Reliability = job one
But: I.T. = Inevitable Tensions
• We all want:
  - High MTTF, Performance and Function
  - Low MTTR and support cost
• The art is to balance those conflicting goals
  - we are jugglers and technology actuaries
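The MTTF/MTTR tension on this slide can be made concrete with the standard steady-state availability formula, MTTF / (MTTF + MTTR). The sketch below is illustrative only: the function name and the hour figures are assumptions, not numbers from the talk.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time a service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: a service that fails every 1000 hours and takes 4 hours to fix.
baseline = availability(1000.0, 4.0)

# Two ways to improve it: fail half as often, or repair twice as fast.
longer_mttf = availability(2000.0, 4.0)
faster_repair = availability(1000.0, 2.0)

print(f"baseline:      {baseline:.6f}")
print(f"double MTTF:   {longer_mttf:.6f}")
print(f"halve MTTR:    {faster_repair:.6f}")
```

Under this model, doubling MTTF and halving MTTR give the same availability, so the choice between prevention and fast repair comes down to which is cheaper to achieve.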

3 Success Metrics
• Tom’s
  - Nobody gets hurt
  - Nobody goes to jail
• Terry’s
  - “Works fine, lasts a long time”
  - Low ROI (Risk Of Interruption)

4 Design Tradeoffs
• Fault Zone size vs. Economy/Simplicity
• Reliability vs. Complexity
• Prevention vs. (Fast) Remediation
• Security vs. Supportability vs. Functionality
• Networks = Connectivity; Security = Isolation
• Balancing priorities (security vs. ops vs. function)

5 Context: A Perfect Storm
• Increased dependency on I.T.
• Decreased tolerance for outages
• Deferred maintenance
• Inadequate infrastructure investment
• Some extraordinarily fragile applications
• Fragmented host management
• Increasingly hostile network environment
  - esp. spam, spyware, social engr attacks
• Increasing legal/regulatory liability
• Highly de-centralized culture
• Growth of portable devices

6 System Elements
• Environmentals (Power, A/C, Physical Security)
• Network
• Client Workstations (incl. portable devices)
• Servers
• Applications
• Personnel, Procedures, Policy, and Architecture
Failures at one level can trigger problems at another level; need Total System perspective
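One way to see why a Total System perspective matters: if a user needs every element to work at once, and failures are assumed independent, end-to-end availability is the product of the per-element availabilities. The sketch below uses made-up figures for each layer; both the numbers and the independence assumption are illustrative, not from the talk.

```python
from math import prod


def series_availability(element_availabilities):
    """Availability of a chain in which every element must be up (independent failures assumed)."""
    return prod(element_availabilities)


# Hypothetical per-element availabilities, for illustration only.
layers = {
    "environmentals": 0.9999,
    "network": 0.9999,
    "client workstation": 0.999,
    "server": 0.9995,
    "application": 0.999,
}

end_to_end = series_availability(layers.values())
weakest = min(layers.values())

print(f"weakest element: {weakest:.4f}")
print(f"end to end:      {end_to_end:.4f}")
```

The end-to-end figure is always lower than the weakest single element, so measuring one layer in isolation overstates what the user experiences.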

7 Dimensions
• How often is there a user-visible failure?
• How many people are affected?
• For how long?
• How severely?
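The four dimensions above can be recorded per incident and then summarized. The sketch below is one hypothetical way to do that; the `Incident` fields, the 0-to-1 severity scale, and the "user-minutes" summary are assumptions made for illustration, not a metric proposed in the talk.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    users_affected: int      # how many people are affected
    duration_minutes: float  # for how long
    severity: float          # how severely, on an assumed scale from 0.0 to 1.0


def impact_summary(incidents):
    """Summarize incidents along the four dimensions: count, reach, duration, severity."""
    return {
        "failures": len(incidents),
        "user_minutes": sum(i.users_affected * i.duration_minutes for i in incidents),
        "weighted_user_minutes": sum(
            i.users_affected * i.duration_minutes * i.severity for i in incidents
        ),
    }


# Two hypothetical incidents: a short campus-wide outage and a long, mild departmental one.
incidents = [
    Incident(users_affected=5000, duration_minutes=10, severity=1.0),
    Incident(users_affected=40, duration_minutes=240, severity=0.25),
]
print(impact_summary(incidents))
```

Keeping the dimensions separate until the final summary avoids hiding a rare, severe outage behind many small ones.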

8 Basics
• How many nines?
• Problem one: what to measure?
  - How do you reduce behavior of a complex net to a single number?
  - Difficult for either uptime or utilization metrics
• Problem two: data networks are not like phone or power services…
  - Imagine if phones could assume anyone’s number
  - Or place a million calls per second!
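For reference, "N nines" means an availability of 1 - 10^-N, which translates directly into an annual downtime budget. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Annual downtime budget for an availability of 'nines' nines (e.g. 5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):8.1f} minutes/year")
```

Five nines leaves roughly 5.3 minutes of downtime per year, which is why the question of what exactly is being measured matters so much.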

9 Security vs. Reliability
• Obviously lack of security is bad… but:
• Defense in depth is not free
• Each add’l defensive perimeter increases MTTR
• Defense-in-depth conjecture (for N layers)
  - Security: MTTE (exploit) ∝ N**2
  - Functionality: MTTI (innovation) ∝ N**2
  - Supportability: MTTR (repair) ∝ N**2
• Next-gen threats: firewalls won’t help

10 Complexity vs. Reliability
• How do you measure avail in complex systems?
• Death of the Network Utility Model
• Organizational vs. geographic networking
• SAN virtualization
• Web load-leveler appliances
• Organizational boundary conditions
• Networks: from stochastic to non-deterministic
• Subnets with clients and critical servers
• Documentation deficiencies

11 Complex System Failures: Inevitable?
• Jan 2004 (?) IEEE Spectrum on Power Grid failures
• Point: it will happen, so plan for mitigation

12 Work in Progress
New trouble-ticket system
New network management system
Next-generation network architecture
Next-generation security architecture
Improving change control process
Improving DRBR process
Lots of work on improving mon/diag tools

13 In Short…
Expectations are growing (unrealistically?)
Complexity is growing
Few are prepared to pay for true HA
Cultural barriers to change control
Hospitals are a whole other world
Biggest SPoF: power/HVAC
Organizational complexity undermines HA
Both security and lack of it undermine HA
Redundancy can mask failures too well!
With redundancy, must have better tools
Need Ops-centric design, better DRBR
Need application procurement standards
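The appeal of redundancy, and the reason masked failures are dangerous, can be sketched with the usual parallel-availability formula, 1 - (1 - a)^n. This assumes independent failures and that a failed unit is noticed and repaired; the per-unit availability used below is a made-up figure.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of 'units' redundant copies where any one copy suffices.

    Assumes independent failures and that failed copies are detected and repaired.
    """
    if not 0.0 <= unit_availability <= 1.0 or units < 1:
        raise ValueError("need 0 <= availability <= 1 and at least one unit")
    return 1.0 - (1.0 - unit_availability) ** units


unit = 0.99  # hypothetical availability of a single component

print(f"one unit:  {parallel_availability(unit, 1):.6f}")
print(f"two units: {parallel_availability(unit, 2):.6f}")
```

If the first failure in a redundant pair goes unnoticed, the system is effectively back to the single-unit figure without anyone knowing, which is the case for better monitoring tools alongside redundancy.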

14 Questions? Comments?

