DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi

DISTRIBUTED SYSTEMS
A collection of autonomous computers, processes, or processors (nodes) interconnected by a communication network.
Motivations:
  – Information exchange (collaborative work)
  – Resource sharing (e.g. printers, backup storage, disk units)
  – Cost reduction
  – Tolerance of partial failures (enables increased availability)
  – Increased performance through parallelism, etc.

Main characteristics
No shared memory between different nodes
  – each node has its own memory
  – communication is by messages
No global clock
  – each node has its own clock
It is impossible for a node to obtain an instantaneous global state of the system

Why do we need distributed algorithms?
Distributed algorithms are the backbone of distributed computing systems
  – they are essential for the implementation of distributed systems:
      distributed operating systems
      distributed databases, communication systems, real-time process-control systems, transportation, etc.

Focus of the course
How to design distributed algorithms
  – Study of fundamental problems
  – Analysis of distributed algorithms
How to achieve fault-tolerance in a distributed system
  – Very important for high availability
  – I.e. almost nonstop operation

Categories of Distributed Algorithms
Fully decentralized
  – more difficult in general
With a centralized coordinator
  – conceptually simpler, but requires an efficient mechanism for selecting a new coordinator if the current one fails

Material
Book
  – Distributed Operating Systems & Algorithms, by Randy Chow and Theodore Johnson
Others

Content
Introductory chapter (chapter 9)
  – Causality
      Ordering of events (timestamps)
      Causal communication
  – Distributed snapshots
      Detecting stable properties
      Diffusing computation
  – Modeling a distributed computation
      Expressing correctness properties of a distributed algorithm
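
The slides do not say which timestamping scheme the course uses for ordering events. As one common illustration, the sketch below implements a Lamport-style logical clock. The class and method names are made up for this example. Each node keeps its own counter, and a receive is always stamped later than the matching send.

```python
class LamportClock:
    """Logical clock for one node; no shared memory or global clock is assumed."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                # local event on node a
ts = a.send()           # a sends a message stamped with its clock
b_recv = b.receive(ts)  # b's receive gets a strictly larger timestamp
```

If event e causally precedes event f, this scheme gives e the smaller timestamp. The converse does not hold: a smaller timestamp alone does not imply causal precedence.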

Content (cont.)
Introductory chapter (cont.)
  – Failures in a distributed system
      In a distributed system, a node may fail, or a communication link may fail.
      There is then a need to minimize the damage that a component failure causes to the users of the system.
      This is the goal of fault-tolerance.

Content (cont.)
Chapter 10:
  – Synchronization
      distributed mutual exclusion: needed to regulate access to a common resource that can be used by only one process at a time
  – Election
      as previously mentioned, when a central coordinator fails, a distributed algorithm is needed to select a new coordinator (important for fault-tolerance)
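
The slides do not name a particular election algorithm. The sketch below only illustrates one possible outcome rule, picking the live node with the highest identifier, which is an assumption of this example. The function name and the `alive` map are hypothetical. A real distributed algorithm has to reach the same decision through message exchange and failure detection, which this local simulation leaves out.

```python
def elect_coordinator(node_ids, alive):
    """Return the highest id among live nodes, or None if no node is alive."""
    candidates = [n for n in node_ids if alive.get(n, False)]
    return max(candidates) if candidates else None


nodes = [1, 2, 3, 4]
alive = {1: True, 2: True, 3: True, 4: True}

coordinator = elect_coordinator(nodes, alive)      # all nodes up

alive[4] = False                                   # the coordinator fails
new_coordinator = elect_coordinator(nodes, alive)  # a new one is selected
```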

Content (cont.)
Chapter 11:
  – Distributed agreement
      How to get a set of nodes to agree on a value.
      Distributed agreement is used, for instance,
        – to determine which nodes are alive in the system
        – to confine malicious behavior of some components
            » Fault-tolerance

Content (cont.)
Chapter 12:
  – Replicated data management
      A key to high availability is to replicate components (data/files, servers, etc.)
      We shall be concerned with
        – techniques for maintaining replicated data in a distributed system (database techniques)
        – atomic broadcast/multicast
        – membership

Content (cont.)
Chapter 13:
  – Checkpointing and recovery
      Error recovery is essential for fault-tolerance.
      When a processor fails and is then repaired, it needs to recover its state of the computation.
      To enable recovery, checkpointing (recording the state to stable storage) is needed.
      We will be concerned with techniques used for this in the context of distributed systems.
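
As a rough single-node illustration of checkpointing and recovery, the sketch below records a state to a file and reloads it after a restart. It assumes that a local file stands in for stable storage and that the state is JSON-serializable. The function names and the file name are made up for this example.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    # Write to a temporary file first, then rename it into place, so a crash
    # in the middle of the write never leaves a half-written checkpoint at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def recover(path, default):
    # After repair, reload the last checkpoint, or start fresh if none exists.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "node.ckpt")
    first = recover(path, {"step": 0})   # no checkpoint yet: use the default
    checkpoint({"step": 42}, path)       # record progress
    restored = recover(path, {"step": 0})  # simulate restart after a failure
```

This covers one node only. In a distributed system, checkpoints taken independently on different nodes need not form a consistent global state, which is why the distributed setting calls for dedicated techniques.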