Chapter 2: Non-functional Attributes



• IT infrastructure provides services to applications.
• Many of these services can be defined as functions, such as disk space, processing and connectivity.
• However, most of these services are non-functional in nature.
• Non-functional attributes describe the qualitative behavior of the system rather than its specific functionality. They include:
  • Availability
  • Security
  • Performance
  • Recoverability
  • Testability
  • Scalability

• The figure on this slide (not reproduced in the transcript) shows the major groups of non-functional attributes.

• Based on these groups, ISO 9126 defines 27 non-functional attributes, each with its own scope. In the following table they are defined and mapped to the three major non-functional attributes and to the issues that are most relevant for the systems management realm.

• It is not unusual to encounter conflicting non-functional requirements (NFRs). For instance, users may want a system that is secure but not want to be bothered by passwords.
• It is the task of the infrastructure architect to balance these NFRs. In some cases some NFRs take priority over others, and the architect must involve the relevant stakeholders.

• Everyone expects their infrastructure to be available all the time, but regardless of the amount of time invested there is always a chance of downtime, and 100% uptime is impossible.
• Calculating availability
  • Availability can neither be calculated nor guaranteed upfront; it is reported after the system has run for some time, probably years.
  • Fortunately, over the years a lot of information has accumulated on the subject and certain design patterns have emerged, such as redundancy, failover, structured programming, avoiding Single Points of Failure and implementing proper systems management.

• Availability is given as a percentage of uptime over a given time period, usually one year. The following table shows the permitted downtime for a given availability over one year.
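The permitted downtime for a given availability level can be recomputed with a short sketch. The function name and the list of availability levels are illustrative assumptions, and a year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def permitted_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # The levels below are common examples, not values taken from the slides.
    for level in (99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = permitted_downtime_minutes(level)
        print(f"{level:>7}% -> {minutes:8.1f} minutes per year")
```

For 99.9% this gives roughly 525.6 minutes per year, which matches the 525 minutes quoted for three nines.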

• Most requirements used today are 99.9% (three nines) or 99.95% for a full IT system.
• 99.999% (five nines) is also known as carrier grade; this availability originates from telecommunications components, which need a very high availability.
• Although 99.9% availability means about 525 minutes of downtime a year, this downtime must not occur in a single event, and there should also not be 525 one-minute downtime events in a year. In other words, unavailability intervals must be defined.

• Unavailability intervals are characterized by two figures: MTBF (Mean Time Between Failures), which is the average time between successive downtime events, and MTTR (Mean Time To Repair), which is the average duration of a downtime event.
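A minimal sketch of why the interval matters: the same yearly downtime can come from very different outage patterns. The function name `outage_profile` and the two MTBF/MTTR pairs are illustrative assumptions, not figures from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def outage_profile(mtbf_hours: float, mttr_hours: float) -> tuple[float, float]:
    """Return (expected outages per year, total downtime hours per year).

    One failure cycle is assumed to be a period of uptime (MTBF)
    followed by a period of repair (MTTR).
    """
    cycle_hours = mtbf_hours + mttr_hours
    outages_per_year = HOURS_PER_YEAR / cycle_hours
    downtime_hours = outages_per_year * mttr_hours
    return outages_per_year, downtime_hours


if __name__ == "__main__":
    # Few long outages versus many short ones, with equal total downtime.
    for mtbf, mttr in ((999.0, 1.0), (99.9, 0.1)):
        outages, downtime = outage_profile(mtbf, mttr)
        print(f"MTBF={mtbf}h MTTR={mttr}h -> {outages:.2f} outages, {downtime:.2f}h down per year")
```

Both pairs lose the same number of hours per year, but the second one interrupts users ten times as often, which is why a requirement should state the acceptable interval and not only the percentage.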

• Manufacturers usually run tests on large batches of devices; for instance, they could test 1000 hard disks for 3 months (1/4 of a year).
• If 5 hard disks fail in that period, the extrapolated figure for a full year is 4 x 5 = 20 hard disks.
• The total uptime for 1000 disks over a year is 1000 x 365 x 24 = 8,760,000 hours.
• So the MTBF is the total uptime divided by the number of failed drives (each failed drive is a single failure event): 8,760,000 / 20 = 438,000 hours per drive.
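The extrapolation above can be written as a small sketch. The function and parameter names are assumptions made for illustration; the numbers are the ones from the hard disk example.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def extrapolated_mtbf_hours(units_tested: int,
                            failures_observed: int,
                            test_fraction_of_year: float) -> float:
    """Estimate MTBF in hours from a batch test shorter than one year."""
    # Scale the observed failures up to a full year (5 failures in 1/4 year -> 20).
    failures_per_year = failures_observed / test_fraction_of_year
    # Total uptime of the whole batch over one year.
    total_uptime_hours = units_tested * HOURS_PER_YEAR
    return total_uptime_hours / failures_per_year


if __name__ == "__main__":
    print(extrapolated_mtbf_hours(1000, 5, 0.25))  # 438000.0
```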

• The MTTR of components is usually kept low by having a service contract with the supplier of the component.
• Sometimes spares are kept onsite.
• MTTR is made up of the following components:
  • Notification of the fault (time before seeing an alarm message)
  • Processing the alarm
  • Diagnosing the problem
  • Looking up repair information
  • Getting spare components
  • Retrieving the components
  • Repairing the fault

• Availability = 100% x MTBF / (MTBF + MTTR)
• As a system becomes more complex, availability normally decreases.
• If the failure of any system component leads to failure of the system as a whole, the system is said to have serial availability.
• To calculate the availability of such a system, multiply the availabilities of all its components.
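A short sketch of both calculations, with availability expressed as a fraction between 0 and 1. The function names, the 8-hour MTTR and the component values are assumptions chosen for illustration.

```python
from math import prod


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial_availability(component_availabilities: list[float]) -> float:
    """Availability of a chain in which any single component failure stops the system."""
    return prod(component_availabilities)


if __name__ == "__main__":
    # Hypothetical disk: MTBF of 438,000 hours, repaired within 8 hours.
    disk = availability(438_000, 8)
    print(f"disk: {100 * disk:.4f}%")

    # Hypothetical server made of three components in series.
    components = [0.99, 0.99, 0.99]
    print(f"server: {100 * serial_availability(components):.4f}%")
```

With three components at 99% each, the chain ends up at about 97.03%, lower than any single component.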

• As can be seen from the illustration, the availability of the full server is less than that of any individual component. To increase availability, components can be arranged in parallel.
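The effect of a parallel arrangement can be sketched as follows. The sketch assumes that the redundant components fail independently and that switching between them is instantaneous and always succeeds, which real systems only approximate; the function name and the values are illustrative.

```python
from math import prod


def parallel_availability(component_availabilities: list[float]) -> float:
    """Availability of redundant components where one working component is enough.

    The system is down only when every component is down at the same time,
    so the unavailabilities (1 - a) are multiplied.
    """
    return 1 - prod(1 - a for a in component_availabilities)


if __name__ == "__main__":
    # Two hypothetical components of 99% each, arranged in parallel.
    print(f"{100 * parallel_availability([0.99, 0.99]):.2f}%")
```

Two 99% components in parallel reach about 99.99% under these assumptions, whereas the same two components in series would drop to about 98.01%.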

Sources of unavailability include:
• Human error
• Software bugs
• Planned maintenance
• Physical defects
• Environmental issues
• System complexity: it is generally much more difficult to maintain the availability of large, complex systems with many components

• The likelihood of failure of a component is highest at the beginning of its life cycle.
• Sometimes a component does not work at all after it is unpacked: the so-called DOA, or Dead on Arrival.
• If a component works without failure for the first month, it becomes increasingly likely that it will work uninterrupted until the end of its life cycle. That is the other end of the bathtub curve, where the likelihood of failure increases exponentially.

• Single Points of Failure (SPOFs): infrastructure components whose failure implies system downtime. They are not desirable, but in practice they may be difficult to eliminate.
• Redundancy: the duplication of infrastructure components to eliminate a SPOF.
• Failover: the semi-automatic changeover from a failed component to a standby component in the same location, e.g. Oracle Real Application Clusters (RAC) and VMware's high availability technology.
• Fallback: the changeover from a failed computer to another with an identical configuration in a different location.

• Hot site
  • A fully configured fallback computer facility with cooling, redundant power and installed applications, which permits rapid restoration of services if the primary system fails. It is expensive to maintain.
• Warm site
  • A mix between a hot site and a cold site. Like a hot site it has power, cooling and computers, but applications may not be installed or configured.
• Cold site
  • A cold site differs from the other two in that there are no computers onsite. It is a room with power and cooling facilities, and computers must be brought in rapidly before it can be brought online.

• Although measures can be taken to provide high availability, there are always situations that cannot be completely safeguarded against, such as natural disasters. For such cases you have to think of Business Continuity Management (BCM) and Disaster Recovery Planning (DRP). BCM is concerned with the business issues, including IT, whereas DRP is about the IT.

• BCM is about identifying the threats an organization faces and creating appropriate contingencies. It is about ensuring that a business continues operating in times of disaster, and includes managing business processes and the availability of people and workplaces in disaster situations.
• It includes disaster recovery, business recovery, crisis management, incident management, emergency management, product recall and contingency planning.
• BCM has two objectives, namely the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO).
• RTO defines the time and service level within which an organization must be restored after a disaster so as to avoid the unacceptable consequences of non-operation.
• RPO describes the amount of data loss an organization is willing to accept. Defined in time, it is the point to which data must be restored, allowing for some acceptable data loss during a disaster.
• DRP is the IT component of BCM.
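As a rough sketch of how the two objectives can be checked against a recovery design: the worst-case data loss of a periodic backup is one full backup interval, and the estimated recovery time must fit within the RTO. The function names and the example values are hypothetical, and the sketch ignores factors such as backup duration and failed backups.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss (one backup interval) stays within the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    """True if the estimated time to restore service stays within the RTO."""
    return estimated_recovery_time <= rto


if __name__ == "__main__":
    # Hypothetical design: nightly backups, 6 hours to restore service.
    print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=4)))  # False
    print(meets_rto(timedelta(hours=6), rto=timedelta(hours=8)))   # True
```

In this example the nightly backup fails a 4-hour RPO even though the recovery time is acceptable, so the backup frequency would have to be raised or replication introduced.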