High Availability and Fault-Tolerance in Real-Time Databases. Jan Lindström, University of Helsinki, Department of Computer Science.

Presentation transcript:


Overview
- The causes of downtime
- Availability solutions
- CASE 1: Clustra
- CASE 2: TelORB
- CASE 3: RODAIN

The Causes of Downtime
- Planned downtime
  - Hardware expansion
  - Database software upgrades
  - Operating system upgrades
- Unplanned downtime
  - Hardware failure
  - OS failure
  - Database software bugs
  - Power failure
  - Disaster
  - Human error

Traditional Availability Solutions
- Replication
- Failover
- Primary restart

CASE 1: Clustra
- Developed for telephony applications such as mobility management and intelligent networks.
- Relational database with location and replication transparency.
- Real-time data is locked in main memory, and the API provides precompiled transactions.
- NOT a real-time database!

Clustra hardware architecture

Data distribution and replication

How Clustra Handles Failures
- Real-time failover: hot-standby data is up to date, so failover occurs in milliseconds.
- Automatic restart and takeback: restart of the failed node and takeback of operations is automatic, and again transparent to users and operators.
- Self-repair: if a node fails completely, data is copied from the complementary node to a standby. This is also automatic and transparent.
- Limited failure effects

How Clustra Handles Upgrades
- Hardware, operating system, and database software upgrades without ever going down.
- Process called "rolling upgrade"
  - I.e. required changes are performed node by node.
  - Each upgraded node then catches up to the status of its complementary node.
  - When this is completed, the operation is performed on the next node.
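The rolling upgrade steps above can be sketched as follows. This is a minimal illustration, not Clustra's implementation: the `Node` class, the `applied` counter, and the fixed number of updates that arrive during each upgrade are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: int
    applied: int = 0      # hypothetical counter of updates applied so far
    online: bool = True


def rolling_upgrade(pair, new_version, updates_during_upgrade=3):
    """Upgrade both nodes of a mirrored pair, one node at a time."""
    availability = []
    for node, partner in ((pair[0], pair[1]), (pair[1], pair[0])):
        node.online = False                         # only this node leaves service
        availability.append(partner.online)         # the partner must still be serving
        partner.applied += updates_during_upgrade   # partner keeps accepting updates
        node.version = new_version                  # perform the required change
        node.applied = partner.applied              # catch up to the complementary node
        node.online = True                          # rejoin before the next node starts
    return availability


a, b = Node("A", version=1), Node("B", version=1)
served_throughout = rolling_upgrade((a, b), new_version=2)
```

Because only one node of the pair is offline at any moment, the pair as a whole keeps serving requests during the whole procedure.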

CASE 2: TelORB
Characteristics
- Very high availability (HA), robustness implemented in SW
- (Soft) real time
- Scalability by using loosely coupled processors
Openness
- Hardware: Intel/Pentium
- Language: C++, Java
- Interoperability: CORBA/IIOP, TCP/IP, Java RMI
- Third-party SW: Java

TelORB Availability
- Real-time object-oriented DBMS supporting
  - Distributed transactions
  - ACID properties expected from a DBMS
  - Data replication (providing redundancy)
- Network redundancy
- Software configuration control
  - Automatic restart, on working processors, of processes that originally executed on a faulty processor
  - Self-healing
  - In-service upgrade of software with no disturbance to operation
  - Hot replacement of faulty processors
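The automatic restart of processes on working processors can be pictured with a small placement function. This is only a sketch under stated assumptions: the slides do not say how TelORB chooses a target processor, so the least-loaded policy, the `reassign` function, and the process and processor names are all hypothetical.

```python
def reassign(placement, failed, processors):
    """Return a new process-to-processor map with nothing left on `failed`.

    Assumed policy: each displaced process goes to the surviving processor
    that currently runs the fewest processes (ties broken by name).
    """
    survivors = [p for p in processors if p != failed]
    if not survivors:
        raise RuntimeError("no working processor left")

    # Count the processes already running on each surviving processor.
    load = {p: 0 for p in survivors}
    for cpu in placement.values():
        if cpu != failed:
            load[cpu] += 1

    new_placement = dict(placement)
    for process in sorted(placement):
        if placement[process] == failed:
            target = min(survivors, key=lambda p: (load[p], p))
            new_placement[process] = target   # restart the process here
            load[target] += 1
    return new_placement


before = {"p1": "cpu0", "p2": "cpu0", "p3": "cpu1", "p4": "cpu2"}
after = reassign(before, failed="cpu0", processors=["cpu0", "cpu1", "cpu2"])
```

The input map is left unchanged, so the old placement remains available if the faulty processor is later hot-replaced and processes are moved back.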

Automatic Reconfiguration reloading

Software upgrade
- Smooth software upgrade when old and new versions of the same process can coexist
- The application can arrange for state transfer between the old and the new static process (unless the important state is already stored in the database)

Partitioning: Types and Data (figure labels: A, B)

Advantages
- Standard interfaces through CORBA
- Standard languages: C++, Java
- Based on commercial hardware
- (Soft) real-time OS
- Fault tolerance implemented in software
- Fully scalable architecture
- Includes powerful middleware: a database management system and functions for software management
- Fully compatible simulated environment for development on Unix/Linux/NT workstations

CASE 3: RODAIN
- Real-Time Object-Oriented Database Architecture for Intelligent Networks
- Real-time main-memory database system
- Runs on a real-time OS: Chorus/ClassiX (and Linux)

Rodain Cluster

Rodain Database Node
- Database Primary Unit: User Request Interpreter Subsystem, Distributed Database Subsystem, Watchdog Subsystem, Fault-Tolerance and Recovery Subsystem, Object-Oriented Database Management Subsystem
- Database Mirror Unit: User Request Interpreter Subsystem, Distributed Database Subsystem, Watchdog Subsystem, Fault-Tolerance and Recovery Subsystem, Object-Oriented Database Management Subsystem
- Shared disk

RODAIN Database Node II
- Database Primary Unit: User Request Interpreter Subsystem, Distributed Database Subsystem, Watchdog Subsystem, Fault-Tolerance and Recovery Subsystem, Object-Oriented Database Management Subsystem
- Database Mirror Unit: User Request Interpreter Subsystem, Distributed Database Subsystem, Watchdog Subsystem, Fault-Tolerance and Recovery Subsystem, Object-Oriented Database Management Subsystem
- Shared disk

ORD Architecture (figure labels: TRP, FTRS, DDS, ORD, OCC, Data, Index)

Fault-Tolerance
- Based on logs and mirroring
- Logs are sent to the Mirror
- The Mirror stores the logs on disk in the SSS
- The Mirror maintains a copy of the main-memory database
- The Mirror makes disk copies of its database image
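The log-and-mirror scheme above can be sketched in a few lines. This is an illustration under assumptions, not RODAIN code: the `Primary` and `Mirror` classes, the key/value log record format, and the in-memory lists standing in for the disks in the SSS are all hypothetical.

```python
class Mirror:
    def __init__(self):
        self.db = {}          # the Mirror's copy of the main-memory database
        self.disk_log = []    # stands in for the log stored on disk in the SSS
        self.disk_image = {}  # stands in for the latest database image on disk

    def receive(self, record):
        """Store a log record on disk, then apply it to the in-memory copy."""
        self.disk_log.append(record)
        key, value = record
        self.db[key] = value

    def checkpoint(self):
        """Write a disk copy of the current database image."""
        self.disk_image = dict(self.db)


class Primary:
    def __init__(self, mirror):
        self.db = {}
        self.mirror = mirror

    def commit(self, writes):
        """Send the log records to the Mirror, then apply the writes locally."""
        for key, value in writes.items():
            self.mirror.receive((key, value))
        self.db.update(writes)


mirror = Mirror()
primary = Primary(mirror)
primary.commit({"x": 1, "y": 2})
mirror.checkpoint()
primary.commit({"x": 3})
```

In this sketch the Primary never touches the disk on the commit path; persistence is delegated to the Mirror, which is the point of shipping the logs.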

Recovery
- Based on role switching
- When the Primary fails
  - The Mirror brings its MMDB up to date
  - The Mirror starts acting as the new Primary
  - Active transactions are restarted or lost
- When the Mirror fails
  - The Primary stores logs directly to the SSS
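Role switching can be sketched as two small handlers. The `Unit` class, its field names, and the function names are hypothetical. The sketch also assumes that a newly promoted Primary logs directly to the SSS until a Mirror rejoins, which follows from the "when the Mirror fails" rule but is not stated explicitly in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    role: str                                      # "primary" or "mirror"
    db: dict = field(default_factory=dict)         # main-memory database
    unapplied: list = field(default_factory=list)  # received but unapplied log records
    log_target: str = "mirror"                     # where a Primary sends its logs


def primary_failed(mirror, active_txns):
    """Promote the Mirror; return the transactions to restart or report lost."""
    for key, value in mirror.unapplied:   # bring the MMDB up to date
        mirror.db[key] = value
    mirror.unapplied.clear()
    mirror.role = "primary"               # start acting as the new Primary
    mirror.log_target = "sss"             # assumption: no Mirror exists right now
    return sorted(active_txns)


def mirror_failed(primary):
    """Without a Mirror, the Primary stores its logs directly to the SSS."""
    primary.log_target = "sss"


standby = Unit(role="mirror", db={"x": 1}, unapplied=[("x", 2), ("y", 9)])
to_restart = primary_failed(standby, {"t7", "t3"})

survivor = Unit(role="primary")
mirror_failed(survivor)
```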

Recovery II
- During recovery the failed node
  - always starts as a mirror node
  - loads the most recent database image from disks in the SSS
  - applies the log tail to the loaded image
  - receives the logs from the primary node
  - continues as a normal mirror node
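The recovery sequence above can be sketched as a single function. The log sequence numbers (LSNs), the record layout `(lsn, key, value)`, and the function name are assumptions introduced so that the "log tail" has a concrete meaning: the records newer than the loaded image.

```python
def recover_as_mirror(disk_image, image_lsn, disk_log, primary_stream):
    """Rebuild a failed node's state; it always rejoins as a mirror node."""
    db = dict(disk_image)        # load the most recent database image
    last_lsn = image_lsn

    # Apply the log tail: stored records that are newer than the image.
    for lsn, key, value in disk_log:
        if lsn > image_lsn:
            db[key] = value
            last_lsn = lsn

    # Receive the logs the primary node produced in the meantime.
    for lsn, key, value in primary_stream:
        if lsn > last_lsn:
            db[key] = value
            last_lsn = lsn

    return {"role": "mirror", "db": db, "last_lsn": last_lsn}


node = recover_as_mirror(
    disk_image={"a": 1, "b": 2},
    image_lsn=2,
    disk_log=[(1, "a", 1), (2, "b", 2), (3, "a", 5)],
    primary_stream=[(4, "c", 7)],
)
```

After the function returns, the node holds the same state as the primary up to `last_lsn` and can continue as a normal mirror node.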

Further reading
- Bratsberg, Humborstad: Online Scaling in a Highly Available Database, Proceedings of the 27th VLDB Conference, Rome, Italy, pp ,
- Clustra Database: Technical Overview,
- Björnerstedt, Ketoja, Sintorn, Sköld: Replication between Geographically Separated Clusters - An Asynchronous Scalable Replication Mechanism for Very High Availability, Proceedings of the International Workshop on Databases in Telecommunications II, LNCS vol 2209, pp ,
- Lindström, Niklander, Porkka, Raatikainen: A Distributed Real-Time Main-Memory Database for Telecommunications, Proceedings of the International Workshop on Databases in Telecommunications, LNCS vol 1819, pp , 2000.