Challenges to address for distributed systems Yvon Kermarrec Télécom Bretagne Institut Mines Télécom.


Challenges in Distributed System Design
Distributed systems are great, but they require a change in how we consider a system:
- From centralized to distributed
- From both the programming and the administration perspectives
- A new way to develop applications that target not one PC but thousands of them
- New paradigms to deal with the difficulties related to distributed systems (DS): faults, network, coordination, …

Challenges in Distributed System Design
- Heterogeneity
- Openness
- Security
- Scalability
- Failure handling
- Transparencies

Challenge 1: heterogeneity
Sources of heterogeneity
- networks (protocols), operating systems (APIs) and hardware
- programming languages (data structures, data types)
- implementations by different developers (lack of standards)
Solution: middleware
- can mask heterogeneity
- provides an augmented machine for the users: more services
- provides a uniform computational model for use by the programmers of servers and distributed applications
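One way middleware masks heterogeneity in hardware and data types is by agreeing on a single wire representation. The following is a minimal sketch of that idea, not the method of any particular middleware: the record layout (`WIRE_FORMAT`) and the `marshal` / `unmarshal` helpers are assumptions made up for illustration. It uses Python's standard `struct` module with network (big-endian) byte order so that machines with different native byte orders decode the same bytes to the same values.

```python
import struct
from typing import Tuple

# Hypothetical wire format (an assumption for this sketch): a 4-byte signed
# integer followed by an 8-byte float, in network (big-endian) byte order.
WIRE_FORMAT = "!id"


def marshal(account_id: int, balance: float) -> bytes:
    """Encode the values in a machine-independent byte layout."""
    return struct.pack(WIRE_FORMAT, account_id, balance)


def unmarshal(payload: bytes) -> Tuple[int, float]:
    """Decode bytes produced by marshal(), whatever the local byte order."""
    account_id, balance = struct.unpack(WIRE_FORMAT, payload)
    return account_id, balance


message = marshal(42, 12.5)
decoded = unmarshal(message)
```

The sender and the receiver only need to share the format string; neither needs to know the other's hardware or operating system.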

Challenge 2: openness
The degree to which new resource-sharing services can be added and made available for use by a variety of client programs
- Specification and documentation of the key software interfaces of the components can be published, discovered and then used
- Extension may be at the hardware level, by introducing additional computers

Challenge 3: security
Classic security issues in an open world …
- Confidentiality
- Integrity
- Origin and trust
Continued challenges
- Denial of service attacks
- Security of mobile code

Challenge 4: scalability (1/2)
Scalability: the system remains effective when there is a significant increase in the number of resources and the number of users
- controlling the cost of performance loss
- preventing software resources from running out
- avoiding performance bottlenecks

Challenge 4: scalability (2/2)
Example of a DNS organization
Performance must not degrade with the growth of the system. Generally, any form of centralized resource becomes a performance bottleneck:
- components (single server),
- tables (directories), or
- algorithms (based on complete information).

Challenge 5: failure handling
In distributed systems, some components fail while others continue executing
- Detected failures can be hidden, made less severe, or tolerated
  - messages can be retransmitted
  - data can be written to multiple disks to minimize the chance of corruption
  - data can be recovered when computation is "rolled back"
  - redundant components or computations tolerate failure
- Failures might result in loss of data and services
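Retransmission, the first technique listed above, can be sketched as follows. This is only an illustration under stated assumptions: `send_with_retransmission` and `make_lossy_channel` are hypothetical names, and the "channel" is a scripted stand-in that reports whether each attempt was acknowledged, not a real network.

```python
def send_with_retransmission(send, message, max_attempts=5):
    """Resend until the channel acknowledges; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):  # True means an acknowledgement came back
            return attempt
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


def make_lossy_channel(outcomes):
    """Hypothetical channel: each call consumes one scripted outcome."""
    remaining = iter(outcomes)

    def send(message):
        # Once the script is exhausted, every further send is lost.
        return next(remaining, False)

    return send


# The first two transmissions are lost, the third one is acknowledged.
channel = make_lossy_channel([False, False, True])
attempts_used = send_with_retransmission(channel, "hello")
```

The failure is hidden from the caller as long as one attempt succeeds; when every attempt fails, the sketch gives up and reports the failure instead of masking it.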

Challenge 6: concurrency
Several clients may attempt to access a shared resource at the same time
- eBay bids
Generally, multiple requests are handled concurrently rather than sequentially
Every shared resource is responsible for ensuring that it operates correctly in a concurrent environment
Threads, synchronization, deadlock …
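The bidding example can be sketched with a lock that protects the shared state. This is a minimal illustration, not how any real auction site works; the `Auction` class and its fields are assumptions made up for the example.

```python
import threading


class Auction:
    """Shared resource that stays correct under concurrent bids."""

    def __init__(self):
        self._lock = threading.Lock()
        self.highest_bid = 0
        self.bid_count = 0

    def place_bid(self, amount):
        # The lock makes the read-compare-write sequence atomic.
        with self._lock:
            self.bid_count += 1
            if amount > self.highest_bid:
                self.highest_bid = amount
                return True
            return False


auction = Auction()
bidders = [
    threading.Thread(target=auction.place_bid, args=(amount,))
    for amount in range(1, 101)
]
for bidder in bidders:
    bidder.start()
for bidder in bidders:
    bidder.join()
```

Whatever order the threads run in, the auction ends with the largest bid recorded and every bid counted exactly once.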

Transparency?
- It is the concealment from the user and the application program of the separation of the components of a distributed system (single image view)
- It is a strong property that is often difficult to achieve
- There are a number of different forms of transparency
Transparency: the system is perceived as a whole rather than as a collection of independent components

Different forms of transparency
- Location: users are unaware of the location of resources
- Migration: resources can migrate without a name change
- Replication: users are unaware of the existence of multiple copies
- Failure: users are unaware of the failure of individual components
- Concurrency: users are unaware of sharing resources with others
- Parallelism: users are unaware of the parallel execution of activities

How to deal with these transparencies?
For each form of transparency, indicate how you would implement it.

How to develop a distributed application
- A sequential application + communication calls (similar to C + a thread library)
- A middleware + an application
- A specific language
See next course …

One approach to ease the development of an application
Client-server model: client processes interact with individual server processes
- server processes are in separate host computers
- clients access the resources the servers manage
- servers may be clients of other servers
Examples
- Web servers are clients of the DNS service

Client-Server

Multiple Servers
Separate processors interact to provide a service

Peer Processes
All processors play a similar role: servers are eliminated

Distributed Algorithms
A definition of the steps to be taken by each of the processes of which the system is composed, including the messages transmitted between them
Types of distributed algorithms
- Interprocess Communication (IPC)
- Timing Model
- Failure Model

Distributed Algorithms
Address problems of
- resource allocation
- communication
- consensus
- concurrency control
- deadlock detection
- global snapshots
- synchronization
- object implementation
Have a high degree of uncertainty and independence of activities
- unknown number of processes and unknown network topology
- independent inputs at different locations
- several programs executing at once, starting at different times, operating at different speeds
- processor non-determinism
- uncertain message ordering and delivery times
- processor and communication failures

Interprocess Communication
Distributed algorithms run on a collection of processors
- Communication methods may be shared memory, point-to-point or broadcast messages, and RPC
- Communication is important even for the system
  - Multiple server processes may cooperate with one another to provide a service
    - DNS partitions and replicates its data at multiple servers throughout the Internet
  - Peer processes may cooperate with one another to achieve a common goal

Difficulties and algorithms
For sequential programs
- An algorithm consists of a set of successive steps
- Execution rate is immaterial
For distributed algorithms
- Processors execute at unpredictable and different rates
- Communication delays and latencies
- Errors and failures may happen
- A global state (i.e., memory …) does not exist
- Debugging is difficult

3 major difficulties
- Time issues
- Interaction model
- Failures

Time issues
Each processor has an internal clock
- Used to date local events
- The clock may drift
- Different time values are obtained when reading the clocks at the "same time"
Issues
- Local time is not enough to time-stamp events
- It is difficult to order events and to compare them
- The clocks need to be resynchronized
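A small worked calculation shows why resynchronization is needed. The sketch below assumes each clock drifts from real time by at most a fixed rate (the 50 parts per million figure is an illustrative assumption, not a value from the slides), so two clocks drifting in opposite directions move apart at up to twice that rate. The function names are hypothetical.

```python
def max_skew(drift_rate, elapsed_seconds):
    """Worst-case gap between two clocks that each drift by drift_rate."""
    return 2 * drift_rate * elapsed_seconds


def resync_interval(drift_rate, max_allowed_skew):
    """Longest time between resynchronizations that keeps the gap bounded."""
    return max_allowed_skew / (2 * drift_rate)


DRIFT_RATE = 50e-6  # assumed drift: 50 microseconds per second

skew_after_one_hour = max_skew(DRIFT_RATE, 3600)  # about 0.36 seconds
interval_for_1ms = resync_interval(DRIFT_RATE, 0.001)  # about 10 seconds
```

Under these assumptions, two clocks left alone for an hour can disagree by about a third of a second, which is more than enough to misorder events stamped with local time.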

Time issues: event order
MSC (Message Sequence Chart): a way to present interactions and communications
[Figure: message sequence chart between sites X, Y, Z and A]
- Site X broadcasts a message to all sites; the others broadcast their responses
- Because of different network speeds and latencies, node A receives the response of Z before the question from X
- Idea: be able to order the events and to compare them

Time issues
- In the MSC presented earlier, the processes see the messages / events in different orders
- How can they be ordered (how can a logical order be reconstructed) so that the processes can take coherent decisions?
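One well-known way to reconstruct a logical order is a Lamport logical clock. The slides do not name a technique, so this is an illustrative choice rather than the author's method. The sketch replays the scenario described for the MSC: X broadcasts a question, Z answers, and A receives the answer before the question. The `LamportClock` class is a hypothetical helper.

```python
class LamportClock:
    """Logical clock: a counter that moves forward on every event."""

    def __init__(self):
        self.time = 0

    def send(self):
        # Sending is an event; the message carries the new timestamp.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, message_time) + 1
        return self.time


x, z, a = LamportClock(), LamportClock(), LamportClock()

question_ts = x.send()   # X broadcasts its question
z.receive(question_ts)   # Z receives the question
response_ts = z.send()   # Z broadcasts its response

# A receives the response from Z before the question from X.
arrival_order = [("response from Z", response_ts), ("question from X", question_ts)]
for _, timestamp in arrival_order:
    a.receive(timestamp)

# Sorting by timestamp restores the causal order at A.
logical_order = [name for name, _ in sorted(arrival_order, key=lambda m: m[1])]
```

The timestamps guarantee that a message is ordered after everything that caused it, so A can place the question before the response even though they arrived the other way round.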

Synchronization model
Synchronous model
- Simple model
- Lower and upper bounds for execution times and communication are known
- No clock drift
Asynchronous model
- Execution speeds and communication delays are "random"
- Universal model in LAN + WAN
  - Routers introduce delays
  - Servers may be loaded / the CPU may be shared
  - Errors and faults may occur

Timing Model
Different assumptions can be made about the timing of the events in the system
- Synchronous: processor communication and computation are done in lock-step
- Asynchronous: processors run at arbitrary speeds and in arbitrary order
- Partially synchronous: processors have partial information about timing

Synchronous Model (1/2)
- Simplest to describe, program, and reason about
- Components take steps simultaneously
  - not what actually happens, but used as a foundation for more complexity
    - intermediate comprehension step
    - impossibility results carry over
- Very difficult to implement
- Synchronous languages for specialized purposes

Synchronous Model (2/2)
Two armies, one leader (the first to attack): the two armies must attack together or not at all
- Message transmission time is bounded by known (min, max) values and there is no fault
- Army 1 sends "attack!", waits for min, and then attacks
- Army 2 receives "attack!", waits for one time unit (TU), and then attacks
- Army 1 is the leader and army 2 charges within max - min + 1
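The bound stated for the two armies can be checked with a short calculation. The sketch below is one reading of the slide, with assumptions labeled: army 1 sends at time 0, times are counted in whole time units, and `attack_times` is a hypothetical helper.

```python
def attack_times(delay, min_delay, max_delay, wait_unit=1):
    """Return (army 1 attack time, army 2 attack time) for one message delay.

    Assumes army 1 sends "attack!" at time 0 and that the actual delay
    lies within the known [min_delay, max_delay] bounds.
    """
    if not min_delay <= delay <= max_delay:
        raise ValueError("delay outside the bounds of the synchronous model")
    leader_attack = min_delay            # army 1 waits for min, then attacks
    follower_attack = delay + wait_unit  # army 2 waits one TU after receipt
    return leader_attack, follower_attack


MIN_DELAY, MAX_DELAY = 2, 5
results = [
    attack_times(delay, MIN_DELAY, MAX_DELAY)
    for delay in range(MIN_DELAY, MAX_DELAY + 1)
]
# Largest gap between the two attacks over every admissible delay.
worst_gap = max(follower - leader for leader, follower in results)
```

For every admissible delay army 1 attacks first, and army 2 follows no later than max - min + 1 time units afterwards, which is the guarantee the known bounds make possible.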

Asynchronous Model (1/2)
- Separate components take steps arbitrarily
- Reasonably simple to describe, with the exception of liveness guarantees
- Harder to program accurately
- Allows timing considerations to be ignored
- May not provide enough power to solve problems efficiently

Asynchronous Model (2/2)
Coordination is more difficult for the armies
- Select a sufficiently large T
- Army 1 sends "attack!", waits for T, and then attacks
- Army 2 receives "attack!", waits for one time unit (TU), and then attacks
- Cannot guarantee that army 1 is the leader
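Why the leader cannot be guaranteed becomes visible once the message delay is allowed to take any value. The sketch below is an interpretation of the slide under stated assumptions: army 1 sends at time 0 and attacks at time T, army 2 attacks one time unit after receipt, and `first_to_attack` is a hypothetical helper.

```python
def first_to_attack(delay, wait_t, wait_unit=1):
    """Say which army attacks first when the message delay is unbounded."""
    leader_attack = wait_t               # army 1 waits for T, then attacks
    follower_attack = delay + wait_unit  # army 2 waits one TU after receipt
    if follower_attack < leader_attack:
        return "army 2"
    if follower_attack > leader_attack:
        return "army 1"
    return "together"


T = 10
fast_network = first_to_attack(delay=2, wait_t=T)   # message arrives early
slow_network = first_to_attack(delay=50, wait_t=T)  # message arrives late
```

With the same T, a fast message lets army 2 charge before army 1, while a slow one leaves army 1 attacking alone for a long time. No choice of T removes both cases without a bound on the delay.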

Partially Synchronous Model
- Some restrictions on the timing of events, but not exactly lock-step
- Most realistic model
- Trade-offs must be considered when balancing efficiency against portability

Failure Model (1/6)
The algorithm might need to tolerate failures
- Processors
  - might stop
  - degrade gracefully
  - exhibit Byzantine failures
- There may also be failures of
  - communication mechanisms

Failure Model (2/6)
Various types of failure
- A message may not arrive: omission failure
- Processes may stop and the others may detect this situation: stopping failure
- Processes may crash and the others may not be warned: crash failure
- For real-time systems, a deadline may not be met: timing failure

Failure Model (3/6)
Failure types
- Benign: omission, stopping, timing failures
- Severe: altered messages, bad results, Byzantine failures

Failure Model (4/6): crash failure
Processes crash and do not respond anymore
Crash detection
- Use time outs
- Difficulties with the asynchronous model
  - Slow processes
  - Messages that have not arrived
  - Stopped processes, etc.
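Crash detection by time out can be sketched as a heartbeat-based detector. This is a minimal illustration under assumptions: the `TimeoutFailureDetector` class, its method names and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all made up for the sketch.

```python
class TimeoutFailureDetector:
    """Suspect any process whose last heartbeat is older than the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}

    def heartbeat(self, process, now):
        # Record the arrival time of a heartbeat message.
        self.last_heartbeat[process] = now

    def suspected(self, now):
        return sorted(
            process
            for process, seen in self.last_heartbeat.items()
            if now - seen > self.timeout
        )


detector = TimeoutFailureDetector(timeout=3.0)
detector.heartbeat("p1", now=0.0)
detector.heartbeat("p2", now=0.0)
detector.heartbeat("p1", now=2.0)
suspects_at_4 = detector.suspected(now=4.0)  # p2 has been silent too long

# p2 was only slow: its heartbeat finally arrives, while p1 goes silent.
detector.heartbeat("p2", now=5.0)
suspects_at_6 = detector.suspected(now=6.0)
```

The second half of the example shows the difficulty listed above: in an asynchronous system a suspicion may be wrong, because a slow process or a delayed message looks exactly like a crash until the next heartbeat arrives.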

Failure Model (5/6): stopping failure
Processes stop their execution and this can be observed
- Synchronous model: time out
- Asynchronous model: hard to distinguish between a slow message and a stopping failure

Failure Model (6/6): Byzantine failure
- The most difficult to deal with
- 3 processes cannot resolve the situation in the presence of one fault
- Need n > 3 * f (f: number of faulty processes, n: number of processes)
- Complex algorithms which monitor all the messages exchanged between the nodes / processes
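The bound n > 3 * f can be turned into two small helper calculations. The function names below are hypothetical; the arithmetic follows directly from the inequality.

```python
def max_byzantine_faults(n):
    """Largest f such that n > 3 * f holds."""
    return (n - 1) // 3


def min_processes(f):
    """Smallest n such that n > 3 * f holds."""
    return 3 * f + 1


tolerated_with_3 = max_byzantine_faults(3)  # 0: three processes tolerate none
tolerated_with_4 = max_byzantine_faults(4)  # 1: a fourth process is required
needed_for_2 = min_processes(2)             # 7 processes to tolerate 2 faults
```

This matches the statement above that three processes cannot cope with a single Byzantine fault: at least four are needed.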

Conclusions
Distributed algorithms are sensitive to
- The interaction model
- Failure type
- Timing issues
Design issues
- Control timing issues with time outs
- Introduce fault tolerance and recovery

Conclusions
Quality of a distributed algorithm
- Local state vs. global state
- Distribution degree
- Fault tolerance
- Assumptions on the network
- Traffic and number of messages required

Design issues
- Use direct calls to the O/S
  - Simple and complex
- Use a middleware to ensure portability and ease of use
  - PVM, MPI, POSIX
  - CORBA, DCE, SOA and web services
- Use a specific distributed language
  - Linda, Occam, Java RMI, Ada 95

Various forms of communications
Communication paradigms
- Message passing: send + receive
- Shared memory: read / write
- Distributed object: remote invocation
- Service invocation
Communication patterns
- Unicast
- Multicast and broadcast
- RPC
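The message-passing paradigm (send + receive) can be sketched with two queues standing in for the network. This is an illustration only: the two "processes" are threads inside one Python program, and the `server` function and its upper-casing "service" are assumptions made up for the example.

```python
import queue
import threading


def server(inbox, outbox):
    """Receive requests until a None sentinel arrives; send one reply each."""
    while True:
        request = inbox.get()        # receive: blocks until a message arrives
        if request is None:
            break
        outbox.put(request.upper())  # send the reply


requests, replies = queue.Queue(), queue.Queue()
server_thread = threading.Thread(target=server, args=(requests, replies))
server_thread.start()

requests.put("ping")            # client: send
reply = replies.get(timeout=5)  # client: receive, with a time out

requests.put(None)              # ask the server to stop
server_thread.join()
```

The request followed by a blocking wait for the reply is the same unicast exchange that RPC wraps behind a procedure call.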