Introduction & Background Lakshmish Ramaswamy
Why Distributed Systems? A collection of independent computers that appears to its users as a single coherent system Reasons for distribution –Distributed (and mobile) users –Distributed data/information –Distributed organizations –Distributed resources Enabling technology – Communications and networking
Distributed System Organization A distributed system organized as middleware. Note that the middleware layer extends over multiple machines. 1.1
Design Goals Enable controlled resource sharing Transparency Openness Scalability Performance Failure resilience Security & privacy
Examples of Distributed Systems World Wide Web –Information dissemination –E-commerce Distributed file systems Distributed databases Web-farms P2P file sharing systems Ad-hoc networks Sensor networks
Middleware Layer on top of Network OS services Hide heterogeneity Doesn’t manage individual nodes Provides complete set of services
Client Server Model Earliest model –Simple –Still applicable in many scenarios Server –Implements specific service Client –Requests service Models of communication –Connectionless –Connection-oriented
Clients and Servers General interaction between a client and a server. 1.25
Multitiered Architectures (2) An example of a server acting as a client. 1-30
Modern Architectures An example of horizontal distribution of a Web service Vertical Distribution: Different components on different machines Horizontal Distribution: Each part operates on its own share of the complete data set Hybrid: Incorporates features of both vertical and horizontal distribution
Peer-to-Peer Architectures No distinction between client and server –Nodes can act both as client and server Promotes interaction within social groups Provides better scalability File sharing has been the dominant application –Napster, Gnutella, Kazaa Other applications are still in nascent stages Decentralized protocols
Network Protocols Layers, interfaces, and protocols in the OSI model. 2-1
Functionalities of Layers Physical: Standardizes signaling interfaces Data link: Organizes bits to form frames, detects and corrects transmission errors Network layer: Routing (Internet protocol [IP]) Transport layer: Reliability (retransmission, ordering of packets) Session layer: Dialog control and synchronization Presentation layer: Formats of messages and records Application layer: Specific to applications (HTTP, FTP)
Types of Communication Persistence –Persistent communication – Message is stored until it is delivered to the receiver –Transient communication – Message is stored only while the sending and receiving processes are alive Transport level protocols provide transient communication Synchronicity –Asynchronous – Sender continues after sending message –Synchronous – Sender blocks until message is stored at receiver's local buffer, delivered to receiver, or processed by receiver
Message Oriented Transient Communication -Berkeley Sockets Communication pattern using TCP/IP sockets Interface for transport layer A communications end point
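The socket communication pattern can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production server: the helper names `echo_once` and `run_demo`, the loopback address, and the upper-casing "service" are all assumptions made for the example.

```python
import socket
import threading

def echo_once(server_sock):
    # accept() blocks until a client connects to the listening end point.
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)       # receive the request
        conn.sendall(data.upper())   # send the reply (hypothetical service)

def run_demo(message=b"hello"):
    # Server side: socket -> bind -> listen -> accept
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    # Client side: socket -> connect -> send -> receive -> close
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```

The communication is transient: if the server process is not running when the client connects, the message is not stored anywhere and the connection attempt simply fails.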
Processes & Threads Virtual processors –Created by OS to execute a program Process is a program in execution –Executed on one of the virtual processors Operating systems ensure that processes are independent and transparent –Resource sharing is transparent Creating processes is costly Switching processes is costly too
Threads Similar to a process –Perceived as execution of (a part of) program –Information maintained for sharing CPU is minimal Context of threads is captured by CPU context –Maybe a little more information is needed for management (like locks) Very little overhead –Thread switching is easy Can provide performance gains
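A small sketch of threads sharing data inside one process, using Python's `threading` module. The function name `count_with_threads` and its parameters are hypothetical; the lock is the kind of extra management information the slide mentions.

```python
import threading

def count_with_threads(num_threads=4, increments=1000):
    counter = 0
    lock = threading.Lock()   # protects the shared counter

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread updates at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

All threads see the same `counter` variable because they live in one address space, which is what makes them cheap to create and switch compared with processes.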
Names & Naming System Required for identifying entities, locating them, communicating with them Name can be resolved to the entity it refers to Name is a string of bits used to refer to an entity Entity can be a resource, user, data item, or process Access Point – Host of another entity –Name of access point is its address Naming system resolves names Naming system in distributed systems can itself be distributed
Name Spaces A general naming graph with a single root node. Organization of names usually as a directed graph Leaf Node – Represents named entity Directory node – Lists other names
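Name resolution over such a graph can be sketched as a walk from the root, one label at a time. In this illustrative model (an assumption, not the slide's notation) a directory node is a Python dict mapping labels to child nodes, and a leaf is any non-dict value standing for the named entity.

```python
def resolve(root, path):
    """Resolve a '/'-separated path name starting at the root directory node."""
    node = root
    for label in (part for part in path.split("/") if part):
        if not isinstance(node, dict):
            # A leaf node has no outgoing edges, so resolution cannot continue.
            raise KeyError(f"{label!r}: parent is a leaf, not a directory")
        node = node[label]    # follow the edge labelled `label`
    return node

# Hypothetical name space with a single root node.
name_space = {
    "home": {"alice": {"notes.txt": "entity:notes"}},
    "etc": {"hosts": "entity:hosts"},
}
```

For example, `resolve(name_space, "/home/alice/notes.txt")` follows three edges and returns the leaf `"entity:notes"`.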
Name Space Distribution An example partitioning of the DNS name space, including Internet-accessible files, into three layers.
Importance of Clocks & Synchronization Avoiding simultaneous access of resources Processes may need to agree upon ordering of events Synchronization & ordering is difficult in distributed setting Notion of time is tricky in distributed setting –How to deal with clock drifts? Logical clocks –Agreement on the ordering of events suffices Happens-before relation
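One well-known way to realize logical clocks is Lamport's scheme, sketched below. The slide does not name a specific algorithm, so treat this as one possible instance; the class and method names are hypothetical.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()                  # a's clock: 1
ts = a.send()             # a's clock: 2
received = b.receive(ts)  # b's clock: max(0, 2) + 1 = 3
```

If event x happens-before event y, this scheme guarantees that x's timestamp is smaller than y's; the converse does not hold in general.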
Mutual Exclusion Ensuring consistency of data sometimes needs exclusive access to data Critical regions for mutual exclusion When a process wants to read/update shared data structures it first enters a critical region Only one process allowed to be in the critical region Coordinator-based centralized algorithm Ricart and Agrawala’s algorithm Token ring algorithm
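The coordinator-based centralized algorithm can be sketched as a single object that grants the critical region to one process at a time and queues the rest. Message passing is replaced by direct method calls here, and the names `Coordinator`, `request`, and `release` are assumptions for illustration.

```python
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None       # process currently in the critical region
        self.waiting = deque()   # FIFO queue of deferred requests

    def request(self, pid):
        """Return True if permission is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Leave the critical region; return the next process granted entry, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the critical region")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The design is simple and fair (first come, first served), but the coordinator is a single point of failure and a potential bottleneck, which motivates the distributed and token-based alternatives.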
Transactions Protects data and allows processes to access and modify multiple data items as a single atomic transaction –If process backs out halfway, everything is restored Originated in business world –Parties free to negotiate and back off during negotiation –No backing off after the contract is signed Initiator process announces the beginning of a transaction Processes create, update, and delete entries Initiator announces that it wants others to “commit” –Transaction made permanent if everyone agrees –Otherwise transaction is aborted and all entries are restored
Transaction Primitives Examples of primitives for transactions.
Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise
ACID Properties of Transactions Atomic – Happens indivisibly to the outside world Consistent – Does not violate system constraints Isolated – Concurrent transactions do not interfere with each other Durable – Changes are permanent when a transaction commits
How to Implement Transactions? Private workspace –When a process starts a transaction, it gets a private workspace of all files it needs to use –Operations only on private workspace –Private workspace is written back (ignored) on commit (abort) –Efficiency problems – copying everything is costly.
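A minimal sketch of the private workspace idea, assuming the shared data is a plain Python dict. The class `Transaction` and its method names are hypothetical; it copies everything up front, which is exactly the efficiency problem noted above.

```python
class Transaction:
    def __init__(self, store):
        self.store = store
        self.workspace = dict(store)   # private copy made when the transaction starts

    def read(self, key):
        return self.workspace[key]     # operations touch only the workspace

    def write(self, key, value):
        self.workspace[key] = value

    def commit(self):
        # Write the private workspace back to the shared store.
        self.store.clear()
        self.store.update(self.workspace)

    def abort(self):
        # Ignore the workspace; the shared store was never modified.
        self.workspace = None

accounts = {"alice": 100, "bob": 50}

t1 = Transaction(accounts)
t1.write("alice", t1.read("alice") - 30)
t1.abort()                             # accounts is unchanged

t2 = Transaction(accounts)
t2.write("alice", t2.read("alice") - 30)
t2.write("bob", t2.read("bob") + 30)
t2.commit()                            # both updates become visible together
```

This sketch ignores concurrency between transactions; it only shows how commit and abort map onto writing back or discarding the workspace.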
Distributed Transactions A distributed transaction is a transaction in which data is distributed across multiple nodes 2 Phase commit protocol Commit request phase –Coordinator sends a query-to-commit message to all nodes –Nodes place an entry into their undo and redo logs –Nodes send agreement/abort messages Commit phase –Coordinator places an entry into log –Sends commit/abort messages to all nodes –Nodes send acknowledgements
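The two phases can be sketched with in-process objects standing in for nodes. This is a simplified illustration under stated assumptions: messages are method calls, logs are Python lists, and failures or timeouts are not modeled. The names `Participant`, `prepare`, `finish`, and `two_phase_commit` are hypothetical.

```python
class Participant:
    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree   # assumed vote, fixed for the example
        self.log = []                  # stands in for the undo/redo logs
        self.state = "init"

    def prepare(self):
        # Commit request phase: log first, then vote.
        self.log.append("prepared")
        return self.will_agree

    def finish(self, decision):
        # Commit phase: apply the coordinator's decision and acknowledge.
        self.log.append(decision)
        self.state = decision
        return "ack"

def two_phase_commit(participants):
    coordinator_log = []
    # Phase 1: query every node and collect agreement/abort votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: log the decision, then broadcast it and collect acknowledgements.
    coordinator_log.append(decision)
    acks = [p.finish(decision) for p in participants]
    return decision, coordinator_log, acks
```

A single abort vote is enough to abort the whole transaction, which is what gives the protocol its all-or-nothing behavior.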
Concurrency Control Concurrent transactions are isolated –Final result should be the same as if the transactions were executed one after another in some order Synchronization classification –Locking –Timestamps Two phase locking – Growing & shrinking phases –Transaction acquires all locks before releasing any of them Distributed 2PL –Coordinator manages all lock operations
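The growing and shrinking phases of two-phase locking can be enforced with a small amount of bookkeeping. The sketch below uses exclusive locks only and a single in-process lock manager; `LockManager`, `Txn`, and `TwoPhaseLockingError` are hypothetical names introduced for illustration.

```python
class TwoPhaseLockingError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.owners = {}                       # item -> transaction name

    def acquire(self, txn, item):
        owner = self.owners.get(item)
        if owner is not None and owner != txn:
            return False                       # held by another transaction
        self.owners[item] = txn
        return True

    def release(self, txn, item):
        if self.owners.get(item) == txn:
            del self.owners[item]

class Txn:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.shrinking = False                 # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            # Growing phase is over: no new locks may be acquired.
            raise TwoPhaseLockingError(f"{self.name} already released a lock")
        return self.manager.acquire(self.name, item)

    def unlock(self, item):
        self.shrinking = True                  # enter the shrinking phase
        self.manager.release(self.name, item)
```

A real scheduler would block or queue a transaction whose lock request is refused; here `lock` simply returns `False` so the caller can decide.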
Replication Two primary reasons –Improving reliability of system –Improving scalability and performance of system Reliability –Resilience to failures –Protection against data corruption: Byzantine failures and quorum-based systems Scalability –Scaling in numbers –Geographical scaling
Problems of Replication Creating and maintaining replicas is not free Multiple copies lead to consistency problems –What happens when one of the replicas gets modified? –Modifications have to be carried out at all replicas –How and when determines the cost of replication WWW-based systems –Browser and client side caches –May lead to stale pages –TTL model, Update/Invalidate model
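The TTL model can be sketched as a cache that serves a stored copy until its time-to-live expires and only then goes back to the origin. `TTLCache`, the `fetch` callback, and the injectable `clock` parameter are assumptions for illustration.

```python
import time

class TTLCache:
    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch       # function that retrieves a fresh copy from the origin
        self.ttl = ttl           # lifetime of a cached copy, in seconds
        self.clock = clock       # injectable so the behavior can be tested
        self.entries = {}        # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]      # still within its TTL; may be stale at the origin
        value = self.fetch(key)  # expired or missing: contact the origin
        self.entries[key] = (value, now + self.ttl)
        return value
```

The trade-off is visible in `get`: a longer TTL means fewer origin requests but a wider window in which clients can see a stale page. An update/invalidate model would instead have the origin notify caches when the data changes.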
Consistency Models Strict Sequential Linearizable Causal FIFO Weak Release Entry
Fault Tolerance & Dependability Availability –Ready to be used IMMEDIATELY Reliability –Run continuously without FAILURE Safety –When fails, nothing catastrophic happens Maintainability –How easily a failed system can be repaired Failures can be malicious or non-malicious
Failure Masking Hiding failures from other processes Fault tolerance by redundancy Information redundancy – Error correcting codes Temporal redundancy – Transactions Physical redundancy – Multiple disks
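Information redundancy can be illustrated with the simplest error correcting code, a triple repetition code with majority decoding. This is chosen for brevity and is an assumption of the example; practical systems use more efficient codes. The function names `encode` and `decode` are hypothetical.

```python
def encode(bits):
    # Repeat every bit three times: the extra copies are the redundancy.
    return [b for b in bits for _ in range(3)]

def decode(coded):
    # Majority vote over each group of three masks any single flipped bit.
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1]
sent = encode(message)
sent[4] ^= 1                 # simulate one transmission error
recovered = decode(sent)     # equals the original message
```

The receiver never notices the error, which is the sense in which the failure is masked. Two flipped bits in the same group would defeat this particular code.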