

Reliability Week 11 - Lecture 2

What do we mean by reliability?
– Correctness: the system/application does what it has to do correctly.
– Availability: the system is available within the agreed time frame.
– Consistency: the system provides much the same response time on each occasion.

Service Level Agreement
– Reliability and performance requirements are usually built into a Service Level Agreement (SLA).
– An SLA defines the level of service the organisation and the users can expect from the DIS.
– It is negotiated between the organisation and the service provider, whether that is the internal IT department or an outside body.
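To make the availability part of an SLA concrete, the sketch below compares measured availability with a target and works out the downtime that target allows. The function names, the 99.9% target and the service hours are assumptions chosen for illustration; the lecture does not give figures.

```python
def availability(agreed_minutes: float, downtime_minutes: float) -> float:
    """Fraction of the agreed service time during which the system was up."""
    if agreed_minutes <= 0:
        raise ValueError("agreed_minutes must be positive")
    return (agreed_minutes - downtime_minutes) / agreed_minutes


def allowed_downtime(agreed_minutes: float, target: float) -> float:
    """Minutes of downtime that still meet the availability target."""
    return agreed_minutes * (1.0 - target)


# Hypothetical SLA: service agreed for 12 hours a day over a 30-day month.
agreed = 30 * 12 * 60   # agreed service minutes in the period
target = 0.999          # assumed target, not taken from the lecture
measured = availability(agreed, downtime_minutes=45)

print(f"measured availability: {measured:.4%}")
print(f"downtime budget: {allowed_downtime(agreed, target):.1f} minutes")
print("SLA met" if measured >= target else "SLA missed")
```

Only downtime inside the agreed time frame counts here, which matches the definition of availability given earlier.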

All components affect reliability
– Any component can affect the reliability of the whole system, but each component can affect different aspects: correctness, availability and consistency.
– We will look at:
  – Application software
  – System software (O/S, DBMS and middleware)
  – Server hardware
  – Network
  – Storage
  – Change management and problem management

Application Software
– Application software can affect availability for a few, some or all customers in the event of a failure.
– It is the main area for bugs, particularly if developed in-house or modified.
– It can affect correctness and consistency if changes to application software are not rigorously tested.

System software (DBMS, O/S, etc.)
– System software failures generally affect availability for all customers on a server.
– Operating at high utilisation (90-95% capacity) can affect reliability.
– Parts of the system that are not often used can become active (e.g. queuing logic).
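One way to see why running at 90-95% utilisation is risky is a simple queueing estimate. The sketch below assumes a single-server queue with random arrivals (the textbook M/M/1 model), which is a simplification introduced here and not part of the lecture; under that model the mean response time is the service time divided by (1 - utilisation).

```python
def mean_response_time(service_time: float, utilisation: float) -> float:
    """Mean response time under the assumed M/M/1 model: S / (1 - U)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return service_time / (1.0 - utilisation)


# Hypothetical service time of 0.1 s per request.
for u in (0.50, 0.80, 0.90, 0.95):
    r = mean_response_time(0.1, u)
    print(f"utilisation {u:.0%}: mean response time {r:.2f} s")
```

Under these assumptions, moving from 90% to 95% utilisation roughly doubles the mean response time, so queues that are normally empty start to matter.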

Server hardware
– Hardware failure will affect availability for all users on the server.
– One server supporting an application/database is a single point of failure (to be avoided).
– Server problems can affect consistency (e.g. failure of one processor in a multi-processor server will affect performance).

Networks - LAN
– LAN failures will affect availability for a few or many users.
– Changes to routers, switches or cabling can affect availability.
– LAN component failures/changes generally affect availability and consistency.

Networks - WAN
– It is a purchased service, controlled by an external company.
– WAN failure will generally affect all users (e.g. ISP failure will affect all access to the Internet).
– It requires:
  – careful selection of supplier
  – sufficient capacity for peak loads
  – a carefully negotiated SLA
  – capable network management

Planning for Reliability
– Managing problems and changes
– Planning for application and system software reliability
– Planning for hardware reliability
– Planning for disaster recovery

Managing Problems/Changes
– The cause of all problems MUST be determined and then resolved (or they will simply return again and again to affect availability).
– All application and system software changes MUST:
  – be reviewed by a committee before implementation
  – have been thoroughly tested
  – have a back-out plan
  – be APPROVED by all affected parties
  – be implemented outside normal availability periods

Planning System Reliability
– Server selection and operating system must fit the scale of the operation.
– A regular system software update plan should be followed to fix bugs and implement new features.
– Each update in the plan should be fully investigated:
  – the update may introduce new bugs
  – it may cause problems for applications
  – it may introduce performance problems

Planning Application Reliability
– Starts in design: how the objects and components are packaged and the interfaces designed.
– Software package selection must place high weight on reliability factors (availability etc.).
– Implementations need formal processes:
  – test plans
  – testing techniques
  – test scripts

Planning for Hardware Reliability
– Build in redundancy; avoid single points of failure (even within hardware items).
– Use servers with multiple processors and hot-swap capability.
– Use server clusters if appropriate.
– Build redundancy and alternate routes into the network. The LAN can be controlled.
– Disks have many mechanical parts and will fail often. Use RAID or redundancy whenever possible.
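The benefit of removing a single point of failure can be estimated with basic availability arithmetic. The sketch below assumes component failures are independent and uses made-up availability figures; both are assumptions for illustration. Components that must all work are multiplied together (series), while n redundant copies fail only if every copy fails (parallel).

```python
from math import prod


def series(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components (any one suffices)."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figures: server 99%, network 99.5%, storage 99%.
single = series([0.99, 0.995, 0.99])
clustered = series([parallel(0.99, 2), 0.995, 0.99])

print(f"single server:    {single:.4%}")
print(f"two-node cluster: {clustered:.4%}")
```

In this example the remaining single components (network and storage) now dominate the result, which is why redundancy is recommended at every level and not only for servers.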

RAID: Redundant Arrays of Independent Disks
– Groups of drives are linked to a special controller.
– They appear as a single logical drive.
– They take advantage of multiple physical drives to store data redundantly.
– Six different RAID approaches, numbered 0 to 5.

RAID 0
– Data striping, block oriented.
– No redundancy: no protection from disk loss.
– Reads and writes of contiguous blocks overlap across the disks, giving improved performance.
– No space overhead.

RAID 1
– Disk mirroring: all data is written to two disks.
– Full data protection.
– Improved read access.
– Doubles the disk space required.
– Easy to implement, easy to recover.

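RAID 5
– Data striping, block oriented, with distributed parity.
– Protects against the loss of any single disk, but is slower to recover than RAID 1.
– Slow write, good read performance.
– Space overhead is one disk's worth of capacity (25% in a four-disk array).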
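The parity idea behind RAID 5 can be sketched in a few lines: the parity block is the byte-wise XOR of the data blocks in a stripe, so any one missing block can be rebuilt by XOR-ing the survivors. This is a simplified illustration of a single stripe with hypothetical block contents; a real controller also rotates the parity block across the disks, which is not modelled here. The `usable_fraction` helper is likewise an illustrative name, and it assumes a two-way mirror for RAID 1.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_fraction(level: int, disks: int) -> float:
    """Fraction of raw capacity available for data at RAID 0, 1 or 5."""
    if level == 0:
        return 1.0                      # striping only, no overhead
    if level == 1:
        return 0.5                      # assumes a two-way mirror
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) / disks      # one disk's worth of parity
    raise ValueError("only levels 0, 1 and 5 are covered here")


# One stripe across three data disks plus parity (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches:", rebuilt == data[1])
print("RAID 5 overhead with 4 disks:", 1 - usable_fraction(5, 4))
```

The rebuild has to read every surviving disk in the stripe, which is one reason recovery is slower than with a mirror.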

Planning for Business Continuance (or Disaster Recovery)
– Planning to continue business in the event of a disaster (e.g. 9/11) is a design job.
– Consider all scenarios, plan the recovery approach, test and document.
– Common causes are fires (Sydney), floods (Brisbane) or back-hoes.
– Test recovery regularly (every 3-6 months).

Performance Week 11 - Lecture 2

Why is Performance Important?
– DIS systems have potential for performance issues.
– New systems almost always require performance tuning.
– DIS performance affects user productivity.
– Performance is a measure of value for money.

A simple test
In most systems, what is likely to be the highest priority for users?
– Improved functionality
– Improved reliability
– Improved performance

Performance Measures
– Response time: the time taken to complete a task or transaction.
– Throughput: the amount of work (transactions) that can be completed in a set time period (second or hour).
– The relationship between the two is generally inverse (although not always).
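Both measures can be derived from the same transaction log. The sketch below assumes each transaction is recorded as a (start, end) pair in seconds; the log format and the sample figures are hypothetical. It reports mean response time and throughput over the observed window.

```python
from statistics import mean


def summarise(transactions: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (mean response time in s, throughput in transactions/s)."""
    response_times = [end - start for start, end in transactions]
    window = max(end for _, end in transactions) - min(
        start for start, _ in transactions
    )
    return mean(response_times), len(transactions) / window


# Four transactions run one at a time, 2 s each.
sequential = [(0, 2), (2, 4), (4, 6), (6, 8)]
# The same four run two at a time, each slowed to 3 s by contention.
concurrent = [(0, 3), (0, 3), (1, 4), (1, 4)]

for name, log in (("sequential", sequential), ("concurrent", concurrent)):
    rt, tput = summarise(log)
    print(f"{name}: mean response {rt:.1f} s, throughput {tput:.2f}/s")
```

In this made-up log the concurrent run has the slower response time but the higher throughput, which is one way the two measures can pull in different directions.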

Concurrency is the answer
[Figure: timeline (axis: Time) contrasting slow response time with high throughput against fast response time with low throughput]

A user requires consistency, then speed.
– A user wants a transaction to run consistently. The faster, the better.
– A user sees response time at the PC or terminal.
– A user is not concerned with the entire infrastructure that supports a transaction.
– IT staff see response time only in their domain of responsibility (server, database, network etc.).

Difficult to measure total response time
– How do you add together web server + application server + database server + network?
– Do you get statistics from each group?
– Will each group maintain statistics in the same format?
– You need to measure total response time and the response time in each area (server, database etc.).
– New network monitors may be able to provide statistics closer to what you need.
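Once every group reports in the same unit, combining per-area figures with an end-to-end measurement is straightforward. The sketch below uses hypothetical tier names and millisecond values; it sums the time each area accounts for, finds the largest contributor, and shows how much of the user's measured response time no group has explained.

```python
def breakdown(tier_ms: dict[str, int], measured_total_ms: int) -> dict:
    """Compare per-tier timings with the total measured at the user's PC."""
    accounted = sum(tier_ms.values())
    return {
        "accounted_ms": accounted,
        "unaccounted_ms": measured_total_ms - accounted,
        "largest_tier": max(tier_ms, key=tier_ms.get),
    }


# Hypothetical statistics supplied by each group, all in milliseconds.
tiers = {"web server": 40, "application server": 120,
         "database server": 310, "network": 90}

report = breakdown(tiers, measured_total_ms=650)
print(report)
```

A large unaccounted figure suggests that time is being spent somewhere no group is measuring, which is the gap the slide warns about.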

Improving performance
– You can add more resources (faster servers, faster disks, networks etc.) to improve response time and throughput.
– However, performance improvements may not be proportional to the additional resources.
– A 100% increase in resources may only bring, say, a 70% performance improvement.
– This is the question of scalability.
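The slide's figures can be turned into a simple scaling-efficiency ratio: the performance gained divided by the resources added. The function name and the baseline throughput of 200 transactions per second are assumptions for illustration; the 100% and 70% figures are the ones quoted above.

```python
def scaling_efficiency(resource_increase: float,
                       performance_increase: float) -> float:
    """Performance gained per unit of resource added (1.0 = linear scaling)."""
    if resource_increase <= 0:
        raise ValueError("resource_increase must be positive")
    return performance_increase / resource_increase


# Lecture example: +100% resources yields +70% performance.
efficiency = scaling_efficiency(1.00, 0.70)

baseline_tps = 200                       # hypothetical baseline throughput
expected_tps = baseline_tps * (1 + 0.70)

print(f"scaling efficiency: {efficiency:.0%}")
print(f"throughput after upgrade: about {expected_tps:.0f} tps")
```

An efficiency below 1.0 means each further upgrade buys less than the previous one, so the cost of added resources should be checked against the measured gain.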

Monitoring Performance
– Performance is a process, not a task.
– Performance should be constantly monitored.
– The cost of monitoring must be weighed against the cost of doing nothing.
– Performance tuning should be carried out to correct performance problems.