Introduction & Background. Lakshmish Ramaswamy.

Presentation transcript:

Introduction & Background Lakshmish Ramaswamy

Why Distributed Systems?
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Reasons for distribution
  – Distributed (and mobile) users
  – Distributed data/information
  – Distributed organizations
  – Distributed resources
Enabling technology: communications and networking

Distributed System Organization
[Figure 1.1: A distributed system organized as middleware. Note that the middleware layer extends over multiple machines.]

Design Goals
Enable controlled resource sharing
Transparency
Openness
Scalability
Performance
Failure resilience
Security & privacy

Examples of Distributed Systems
World Wide Web
  – Information dissemination
  – E-commerce
Distributed file systems
Distributed databases
Web farms
P2P file sharing systems
Ad-hoc networks
Sensor networks

Middleware
Layer on top of network OS services
Hides heterogeneity
Doesn’t manage individual nodes
Provides a complete set of services

Client-Server Model
Earliest model
  – Simple
  – Still applicable in many scenarios
Server
  – Implements a specific service
Client
  – Requests the service
Models of communication
  – Connectionless
  – Connection-oriented

Clients and Servers
[Figure 1.25: General interaction between a client and a server.]

Multitiered Architectures (2)
[Figure 1-30: An example of a server acting as a client.]

Modern Architectures
[Figure: An example of horizontal distribution of a Web service.]
Vertical distribution: different components are placed on different machines
Horizontal distribution: each part operates on its own share of the complete data set
Hybrid: incorporates features of both vertical and horizontal distribution

Peer-to-Peer Architectures
No distinction between client and server
  – Nodes can act as both client and server
Promotes interaction within social groups
Provides better scalability
File sharing has been the dominant application
  – Napster, Gnutella, Kazaa
Other applications are still in nascent stages
Decentralized protocols

Network Protocols
[Figure 2-1: Layers, interfaces, and protocols in the OSI model.]

Functionalities of Layers
Physical layer: Standardizes signaling interfaces
Data link layer: Organizes bits to form frames; detects and corrects transmission errors
Network layer: Routing (Internet Protocol [IP])
Transport layer: Reliability (retransmission, ordering of packets)
Session layer: Dialog control and synchronization
Presentation layer: Formats of messages and records
Application layer: Specific to applications (HTTP, FTP)

Types of Communication
Persistence
  – Persistent communication: the message is stored until it is delivered to the receiver
  – Transient communication: the message is stored only while the sending and receiving processes are alive
Transport-level protocols provide transient communication
Synchronicity
  – Asynchronous: the sender continues immediately after sending the message
  – Synchronous: the sender blocks until the message is stored in the receiver's local buffer, delivered to the receiver, or processed by the receiver

Message-Oriented Transient Communication: Berkeley Sockets
A socket is a communication end point
Sockets are an interface to the transport layer
Communication pattern using TCP/IP sockets
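
The socket pattern named on this slide can be illustrated with a minimal sketch, assuming a loopback TCP connection and Python's standard `socket` module. The helper names `serve_once` and `echo_roundtrip` are hypothetical and chosen only for this example; the server side follows socket, bind, listen, accept, and the client side follows socket, connect, send, receive, close.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection and echo the received bytes back, upper-cased."""
    conn, _addr = server_sock.accept()   # accept: block until a client connects
    with conn:
        data = conn.recv(1024)           # receive the request
        conn.sendall(data.upper())       # send the reply


def echo_roundtrip(message: bytes) -> bytes:
    # Server side: socket -> bind -> listen.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client side: socket -> connect -> send -> receive -> close.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```

The communication is transient in the sense used above: if the server process is not running when the client connects, the message is not stored anywhere and the connection attempt simply fails.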

Processes & Threads
Virtual processors
  – Created by the OS to execute a program
A process is a program in execution
  – Executed on one of the virtual processors
Operating systems ensure that processes are independent and transparent
  – Resource sharing is transparent
Creating processes is costly
Switching processes is costly too

Threads
Similar to a process
  – Perceived as the execution of (a part of) a program
  – Information maintained for sharing the CPU is minimal
The context of a thread is captured by the CPU context
  – A little more information may be needed for management (such as locks)
Very little overhead
  – Thread switching is easy
Can provide performance gains
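
As a rough illustration of threads sharing one address space, with a lock as the extra management information mentioned above, here is a minimal sketch using Python's standard `threading` module. The function name `count_with_threads` is hypothetical.

```python
import threading


def count_with_threads(n_threads: int, increments: int) -> int:
    """Have several threads increment one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:               # only one thread updates the counter at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

All threads see the same `total` variable without any copying, which is what makes threads cheaper than separate processes, and also why the lock is needed.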

Names & Naming System
Names are required for identifying entities, locating them, and communicating with them
A name is a string of bits used to refer to an entity
A name can be resolved to the entity it refers to
An entity can be a resource, a user, data, or a process
Access point: host of another entity
  – The name of an access point is its address
A naming system resolves names
The naming system in a distributed system can itself be distributed

Name Spaces
[Figure: A general naming graph with a single root node.]
Names are usually organized as a directed graph
Leaf node: represents a named entity
Directory node: lists other names
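
Name resolution over such a graph can be sketched as a walk from the root, one label at a time. This is a minimal illustration, assuming that directory nodes are Python dicts and that anything else is a leaf; the graph contents and the function name `resolve` are hypothetical.

```python
def resolve(root: dict, path: str):
    """Resolve a slash-separated path name to the entity stored at a leaf."""
    node = root
    for label in (part for part in path.split("/") if part):
        if not isinstance(node, dict):
            # We hit a leaf before consuming the whole path.
            raise KeyError(f"cannot look up {label!r} inside a leaf node")
        node = node[label]           # follow the outgoing edge labeled `label`
    return node


# Hypothetical naming graph: dicts are directory nodes, strings are leaves.
naming_graph = {
    "home": {"alice": {"mbox": "entity:alice-mailbox"}},
    "keys": "entity:key-file",
}
```

For example, `resolve(naming_graph, "/home/alice/mbox")` follows the edges `home`, `alice`, and `mbox` in turn. In a distributed naming system each directory node could live on a different server, and each step of the loop would become a remote lookup.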

Name Space Distribution
[Figure: An example partitioning of the DNS name space, including Internet-accessible files, into three layers.]

Importance of Clocks & Synchronization
Avoiding simultaneous access to resources
Processes may need to agree upon the ordering of events
Synchronization and ordering are difficult in a distributed setting
The notion of time is tricky in a distributed setting
  – How to deal with clock drift?
Logical clocks
  – Agreement on the ordering of events suffices
Happens-before relation
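
One common way to realize logical clocks is Lamport's scheme, sketched below under the usual assumptions: each process keeps an integer counter, increments it on every local event and send, and on receipt advances it past the timestamp carried by the message. The class name `LamportClock` is chosen for this example only.

```python
class LamportClock:
    """Integer logical clock that respects the happens-before relation."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value is the message timestamp."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Jump past the sender's timestamp so the receive orders after the send."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

If event a happens before event b, the clock value assigned to a is smaller than the one assigned to b. The converse does not hold: a smaller timestamp alone does not prove that one event happened before the other.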

Mutual Exclusion
Ensuring consistency of data sometimes requires exclusive access to the data
Critical regions provide mutual exclusion
  – When a process wants to read or update shared data structures, it first enters a critical region
  – Only one process is allowed to be in the critical region at a time
Algorithms
  – Coordinator-based centralized algorithm
  – Ricart and Agrawala’s algorithm
  – Token ring algorithm
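
The coordinator-based centralized algorithm can be sketched as follows. This is a single-process simulation with hypothetical names (`Coordinator`, `request`, `release`); in a real system the method calls would be messages sent to the coordinator, and a denied request would simply receive no reply until the region becomes free.

```python
from collections import deque


class Coordinator:
    """Grants the critical region to one process at a time, in FIFO order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid) -> bool:
        """Return True if permission is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Leave the critical region; return the next holder, or None if nobody waits."""
        if pid != self.holder:
            raise ValueError(f"process {pid!r} does not hold the critical region")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The scheme is simple and fair, but the coordinator is a single point of failure and a potential bottleneck, which motivates the distributed alternatives listed above.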

Transactions
A transaction protects data and allows processes to access and modify multiple data items as a single atomic operation
  – If a process backs out halfway, everything is restored
Originated in the business world
  – Parties are free to negotiate and back off during negotiation
  – No backing off after the contract is signed
The initiator process announces the beginning of a transaction
Processes create, update, and delete entries
The initiator announces that it wants the others to “commit”
  – The transaction is made permanent if everyone agrees
  – Otherwise the transaction is aborted and all entries are restored

Transaction Primitives
Examples of primitives for transactions.

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

ACID Properties of Transactions
Atomic: happens indivisibly to the outside world
Consistent: does not violate system constraints
Isolated: concurrent transactions do not interfere with each other
Durable: changes are permanent once a transaction commits

How to Implement Transactions?
Private workspace
  – When a process starts a transaction, it gets a private workspace containing all the files it needs to use
  – Operations are performed only on the private workspace
  – The private workspace is written back on commit and discarded on abort
  – Efficiency problem: copying everything is costly
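
The private-workspace idea can be sketched in a few lines, assuming the shared data is a plain Python dict standing in for the files. The class and method names are hypothetical, loosely mirroring the BEGIN, READ, WRITE, END, and ABORT primitives.

```python
class Transaction:
    """Private-workspace transaction over a dict that stands in for shared files."""

    def __init__(self, store: dict):
        # BEGIN_TRANSACTION: copy everything into a private workspace.
        # This full copy is the efficiency problem noted above.
        self.store = store
        self.workspace = dict(store)

    def read(self, key):
        return self.workspace[key]

    def write(self, key, value):
        self.workspace[key] = value      # only the private copy changes

    def end(self):
        """END_TRANSACTION: write the workspace back to the shared store."""
        self.store.clear()
        self.store.update(self.workspace)

    def abort(self):
        """ABORT_TRANSACTION: discard the workspace; the store is untouched."""
        self.workspace = dict(self.store)
```

This sketch ignores concurrency control and durability; it only shows why an abort is cheap (the shared data was never modified) and why starting a transaction is expensive.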

Distributed Transactions
A distributed transaction is a transaction in which the data is distributed
Two-phase commit protocol
Commit request phase
  – The coordinator sends a query-to-commit message to all nodes
  – Nodes place an entry into their undo and redo logs
  – Nodes send agreement/abort messages
Commit phase
  – The coordinator places an entry into its log
  – The coordinator sends commit/abort messages to all nodes
  – Nodes send acknowledgements
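
The two phases can be sketched as a single-process simulation. The names `Participant`, `prepare`, `finish`, and `two_phase_commit` are hypothetical, and the sketch assumes no message loss, timeouts, or crashes, which are exactly the cases a real protocol must handle.

```python
class Participant:
    """A node taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "active"

    def prepare(self) -> bool:
        # Commit request phase: log undo/redo information, then vote.
        self.log.append("undo/redo entry")
        return self.can_commit

    def finish(self, decision: str) -> str:
        # Commit phase: apply the coordinator's decision and acknowledge it.
        self.state = decision
        return "ack"


def two_phase_commit(participants, coordinator_log) -> str:
    # Phase 1: query every node and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: log the decision, then broadcast it and collect acknowledgements.
    coordinator_log.append(decision)
    acks = [p.finish(decision) for p in participants]
    assert len(acks) == len(participants)
    return decision
```

A single negative vote forces every participant to abort, which is what makes the outcome atomic across nodes.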

Concurrency Control
Concurrent transactions are isolated
  – The final result should be the same as if the transactions were executed one after another in some order
Synchronization classification
  – Locking
  – Timestamps
Two-phase locking: growing and shrinking phases
  – A transaction acquires all its locks before releasing any of them
Distributed 2PL
  – Coordinator manages all lock operations
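
The growing and shrinking phases can be made concrete with a small sketch that tracks only one transaction's own lock set and rejects any acquire that follows a release. The class name `TwoPhaseTransaction` is hypothetical, and conflicts between different transactions are deliberately left out.

```python
class TwoPhaseTransaction:
    """Enforces the two-phase rule: no lock may be acquired after any release."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after a release")
        self.held.add(item)          # growing phase

    def release(self, item):
        self.held.remove(item)
        self.shrinking = True        # the shrinking phase has begun
```

A full lock manager would also block or queue transactions that request a lock held by someone else; this sketch shows only the ordering constraint that gives two-phase locking its name.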

Replication
Two primary reasons
  – Improving the reliability of the system
  – Improving the scalability and performance of the system
Reliability
  – Resilience to failures
  – Protection against data corruption: Byzantine failures and quorum-based systems
Scalability
  – Scaling in numbers
  – Geographical scaling

Problems of Replication
Creating and maintaining replicas is not free
Multiple copies lead to consistency problems
  – What happens when one of the replicas gets modified?
  – Modifications have to be carried out at all replicas
  – How and when this is done determines the cost of replication
WWW-based systems
  – Browser and client-side caches
  – May lead to stale pages
  – TTL model, update/invalidate model
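
The TTL and invalidate models can be contrasted with a small cache sketch. The class name `TTLCache`, the `fetch` callback, and the explicit `now` argument are assumptions made for this illustration; the current time is passed in by the caller so the behavior is deterministic.

```python
class TTLCache:
    """Client-side cache whose entries expire after `ttl` time units."""

    def __init__(self, ttl, fetch):
        self.ttl = ttl
        self.fetch = fetch           # callable that reads from the origin server
        self.entries = {}            # key -> (value, time the value was fetched)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]          # still fresh by TTL, though possibly stale
        value = self.fetch(key)      # expired or missing: go back to the origin
        self.entries[key] = (value, now)
        return value

    def invalidate(self, key):
        """Invalidate model: the origin tells the cache to drop its copy."""
        self.entries.pop(key, None)
```

With TTL alone, a page modified at the origin can be served stale until the entry expires. With invalidation, the copy is dropped as soon as the origin announces the change, at the cost of the origin having to track and notify its caches.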

Consistency Models
Strict
Sequential
Linearizable
Causal
FIFO
Weak
Release
Entry

Fault Tolerance & Dependability
Availability
  – Ready to be used immediately
Reliability
  – Runs continuously without failure
Safety
  – When the system fails, nothing catastrophic happens
Maintainability
  – How easily a failed system can be repaired
Failures can be malicious or non-malicious

Failure Masking
Hiding failures from other processes
Fault tolerance by redundancy
  – Information redundancy: error-correcting codes
  – Temporal redundancy: transactions
  – Physical redundancy: multiple disks
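
Physical redundancy can mask a faulty replica by majority voting over several copies. The sketch below assumes the values read from each replica are hashable and that a strict majority of replicas is correct; the function name `masked_read` is hypothetical.

```python
from collections import Counter


def masked_read(replica_values):
    """Return the value reported by a strict majority of replicas."""
    value, count = Counter(replica_values).most_common(1)[0]
    if count <= len(replica_values) // 2:
        raise RuntimeError("no majority: the failure cannot be masked")
    return value
```

With three replicas, one corrupted copy is outvoted and the caller never observes the failure. Two disagreeing faulty copies out of three leave no majority, and the failure becomes visible.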