CS542: Topics in Distributed Systems


CS542: Topics in Distributed Systems Diganta Goswami

Distributed System
- A collection of independent computers that appears to its users as a single coherent system.
- Autonomous computers.
- Many components, connected by a network, sharing resources.

Distributed System
- A system of networked components that communicate and coordinate their actions only by passing messages.
- Concurrent execution of programs.
- No global clock.
- Components fail independently of one another.

Another definition
- You know you have a distributed system when the crash of a computer you've never heard of stops you from getting any work done.
- Inter-dependencies.
- Shared state.
- Independent failure of components.

A working definition for us
- A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium using message passing.
- Entity = a process on a device (PC, PDA).
- Communication medium = wired or wireless network.
- Our interest in distributed systems involves design and implementation, maintenance, and algorithmics.
- This is an informal definition, for us as designers and programmers of distributed systems.
- Fits the definition: peer-to-peer systems.
- Questionable case: a computer without ROM or disk drives that needs to boot over the network. Is a collection of these computers a distributed system?

“Important” Distributed Systems Issues
- No global clock: no single global notion of the correct time (asynchrony).
- Unpredictable failures of components: a lack of response may be due to the failure of a network component, a network path being down, or a computer crash (failure-prone, unreliable).
- Highly variable bandwidth: from 16 Kbps (slow modems or Google Balloon) to Gbps (Internet2) to Tbps (between data centers of the same big company).
- Possibly large and variable latency: a few ms to several seconds.
- Large numbers of hosts: from 2 to several million.

There is a range of interesting problems for distributed system designers
- Real distributed systems: cloud computing, peer-to-peer systems, Hadoop, distributed file systems, sensor networks, graph processing, …
- Classical problems: failure detection, asynchrony, snapshots, multicast, consensus, mutual exclusion, election, …
- Concurrency: RPCs, concurrency control, replication control, …
- Security: Byzantine faults, …
- Others…

Typical Distributed Systems Design Goals
Common goals:
- Heterogeneity: can the system handle a large variety of types of PCs and devices?
- Robustness: is the system resilient to host crashes and failures, and to the network dropping messages?
- Availability: are data and services always there for clients?
- Transparency: can the system hide its internal workings from the users?
- Concurrency: can the server handle multiple clients simultaneously?
- Efficiency: is the service fast enough? Does it utilize 100% of all resources?
- Scalability: can it handle 100 million nodes without degrading service? (nodes = clients and/or servers)
- Security: can the system withstand hacker attacks?
- Openness: is the system extensible?

Challenges and Goals of Distributed Systems
- Heterogeneity
- Openness
- Security
- Scalability
- Failure handling
- Concurrency
- Transparency

Challenges
- Heterogeneity (variety and difference) of the underlying network infrastructure.
- The Internet consists of many different sorts of network; their differences are masked by the fact that all of the computers attached to them use the Internet protocols for communication.
- E.g., a computer attached to an Ethernet has an implementation of the Internet protocols over Ethernet, whereas a computer on a different sort of network needs an implementation of the Internet protocols for that network.

Heterogeneity
- Computer hardware and software, e.g., operating systems: compare UNIX socket and Winsock calls.
- Programming languages: in particular, data representations.
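One concrete form of data-representation heterogeneity is byte order. The sketch below is only an illustration (the value 1025 and the variable names are made up for this example): it packs the same 32-bit integer in little-endian and big-endian layouts and shows what a receiver decodes if it assumes the wrong one.

```python
import struct

value = 1025  # 0x00000401, an arbitrary example value

# The same 32-bit unsigned integer in the two common byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

# A receiver that assumes the wrong byte order decodes a different number.
misread = struct.unpack("<I", big)[0]

# If both sides agree on one wire format (here big-endian), the value survives.
decoded = struct.unpack(">I", big)[0]

print(little.hex(), big.hex(), misread, decoded)
```

Agreeing on a single external representation, or tagging messages with the representation used, is the kind of detail that middleware typically takes off the programmer's hands.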

Some approaches: Middleware
- A software layer that provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, operating systems and programming languages.
- Middleware (e.g., CORBA) provides transparency of network, hardware, software and programming-language heterogeneity.
- Another example: Java RMI.
- In addition to solving the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications.

Positioning Middleware
[Figure 1-22: General structure of a distributed system as middleware.]

Openness
- The characteristic that determines whether the system can be extended and re-implemented in various ways.
- Determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.
- Cannot be achieved unless the specification and documentation of the key software interfaces are made available to software developers (i.e., the key interfaces are published).

Openness
- The designers of the Internet protocols introduced a series of documents called RFCs.
- RFCs contain specifications of the Internet communication protocols and, by the mid-80s, specifications for applications run over them, e.g., email, telnet, file transfer.
- RFCs are not the only means of publication: e.g., CORBA is published through a series of documents, including a complete specification of the interfaces of its services (www.omg.org).

Openness
- Offering services according to standard rules that describe the syntax and semantics of those services, e.g., network protocol rules (RFCs).
- Services are specified through interfaces.
- Interface definition languages (IDLs) specify the names of the available functions as well as their parameters, return values, exceptions, etc.
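Python is not an IDL, but an abstract base class can illustrate the idea of a published interface: the function name, parameters, return type and declared exception are fixed, while any number of implementations may sit behind them. Everything named below (`FileService`, `FileNotFound`, `InMemoryFileService`, `first_bytes`) is hypothetical and invented for this sketch.

```python
from abc import ABC, abstractmethod


class FileNotFound(Exception):
    """Declared failure mode: part of the interface, not of one server."""


class FileService(ABC):
    """The published interface: name, parameters, return value, exception."""

    @abstractmethod
    def read(self, name: str, offset: int, length: int) -> bytes:
        """Return up to `length` bytes of `name` starting at `offset`.

        Raises FileNotFound if no file called `name` exists.
        """


class InMemoryFileService(FileService):
    """One possible implementation; clients never depend on it directly."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, name, offset, length):
        if name not in self._files:
            raise FileNotFound(name)
        return self._files[name][offset:offset + length]


def first_bytes(service: FileService, name: str, n: int) -> bytes:
    # Client code is written against the interface only.
    return service.read(name, 0, n)


service = InMemoryFileService({"a.txt": b"hello world"})
print(first_bytes(service, "a.txt", 5))
```

Because `first_bytes` only relies on the interface, a different implementation (for example one that forwards the call over a network) could be substituted without changing the client.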

Security
- Distributed systems must protect shared information and resources.
- The openness of distributed systems makes them vulnerable to security threats.
- Providing security is a significant challenge for distributed systems.

Security
- Privacy / confidentiality: protection against disclosure to unauthorized individuals.
- Integrity: protection against alteration or corruption.
- Availability: protection against interference with the means to access the resources.

Scalability
- A scalable system is one that can handle additional users and resources without suffering a noticeable loss of performance.
- Three metrics of a scalable system:
  - Number of users and resources.
  - Distance between the farthest nodes in the system (network radius).
  - Number of organizations exerting control over pieces of the system.

Challenges in designing scalable DS
- Controlling the cost of physical resources: as the demand for a resource grows, it should be possible to extend the system, at reasonable cost, to meet it.
- E.g., as the number of users and computers in an intranet increases, the frequency of file access requests grows. It must be possible to add server computers to avoid the performance bottleneck that would arise if a single file server had to handle all file access requests.
- www.amazon.com is more than one computer.

Challenges in designing scalable DS
- Controlling the performance loss: management of a set of data whose size is proportional to the number of users or resources in the system.
- E.g., the Domain Name System holds the table of correspondences between the domain names of computers and their Internet addresses.
- Hierarchic structures scale better than linear structures.

Scaling Techniques
[Figure 1.5: An example of dividing the DNS name space into zones.]
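To make the point about hierarchic structures concrete, here is a toy sketch of a name space split into zones. It is not how real DNS servers are implemented; the `Zone` class, the `resolve` function and the example name and address are all assumptions made for illustration. Each zone stores only its own records and pointers to delegated sub-zones, so no single table has to grow with the whole system.

```python
class Zone:
    """A zone knows only its own records and its delegated sub-zones."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # label -> Zone (delegated sub-zone)
        self.records = {}   # label -> address held directly in this zone

    def delegate(self, label):
        # Create the sub-zone on first use, then return it.
        return self.children.setdefault(label, Zone(label))


def resolve(root, domain):
    """Walk from the root zone down to the zone that holds the record.

    Returns the address and the number of delegations followed.
    """
    labels = domain.split(".")[::-1]  # "www.example.org" -> org, example, www
    zone = root
    hops = 0
    for label in labels[:-1]:
        zone = zone.children[label]
        hops += 1
    return zone.records[labels[-1]], hops


root = Zone("")
example = root.delegate("org").delegate("example")
example.records["www"] = "192.0.2.10"  # documentation-only address

print(resolve(root, "www.example.org"))
```

The number of steps depends on the depth of the name, not on the total number of names in the system, which is the property a single flat table lacks.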

Challenges in designing scalable DS
- Preventing software resources from running out: the numbers used as Internet addresses were fixed at 32 bits in the late 70s, but that supply may run out soon. Change from 32 bits to 128 bits?
- Demand is difficult to predict.
- Over-compensating for future growth may be worse than adapting to a change when we are forced to: larger Internet addresses occupy extra space in messages and in computer storage.
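The trade-off can be put into numbers. The sketch below uses Python's standard `ipaddress` module with two documentation-only example addresses (chosen for this illustration) to compare the size of the two address spaces and the extra bytes a message carries when source and destination addresses grow from 32 to 128 bits.

```python
import ipaddress

# Number of distinct addresses available with each width.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

# Documentation-only example addresses, used just to measure their size.
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

v4_bytes = len(v4.packed)  # bytes needed to store one 32-bit address
v6_bytes = len(v6.packed)  # bytes needed to store one 128-bit address

# Each message carries a source and a destination address.
extra_per_packet = 2 * (v6_bytes - v4_bytes)

print(ipv4_space, v4_bytes, v6_bytes, extra_per_packet)
```

So the larger format offers 2^96 times as many addresses, at a cost of 24 additional address bytes in every message.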

Failure Handling
- Failure in a DS is partial: some components fail while others continue to function.
- This makes the handling of failures difficult.

Techniques for dealing with failures
- Detecting failures.
- Some failures may be impossible to detect with certainty: has the remote site crashed, or is the message merely delayed in transmission?
- Others can be detected, e.g., checksums can be used to detect corrupted data.
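A minimal sketch of checksum-based detection, assuming a made-up framing in which a 4-byte CRC-32 is appended to each payload (the `frame` and `check` helpers are hypothetical names; `zlib.crc32` is from the Python standard library):

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload so the receiver can verify it."""
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the one that was received."""
    payload, received = framed[:-4], int.from_bytes(framed[-4:], "big")
    return zlib.crc32(payload) == received


message = frame(b"transfer 100 to account 42")

# Simulate corruption in transit by flipping one bit of the first byte.
corrupted = bytes([message[0] ^ 0x01]) + message[1:]

print(check(message), check(corrupted))
```

A checksum only detects the corruption; deciding what to do next (discard, request retransmission) is a separate step.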

Techniques for dealing with failures
- Masking failures: some failures can be hidden or made less severe.
- E.g., retransmission when messages fail to arrive.

Techniques for dealing with failures
- Tolerating failures: it would not be practical to detect and hide all failures, but a system can be designed to tolerate some of them.
- E.g., timeouts when waiting for a web resource: clients give up after a predetermined number of attempts, take other actions and inform the user.
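Masking by retransmission and tolerating by giving up after a fixed number of attempts can be combined in one small routine. The sketch below is an illustration under stated assumptions: `TimeoutExpired`, `send_with_retries` and the simulated `make_flaky_channel` are hypothetical names, and the channel fails a fixed number of times rather than touching a real network.

```python
class TimeoutExpired(Exception):
    """Raised by a channel when no reply arrives in time."""


def send_with_retries(send, message, max_attempts=3):
    """Retransmit on timeout; give up after max_attempts tries.

    Returns the reply and the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message), attempt
        except TimeoutExpired:
            continue  # masking: the caller never sees this lost message
    # tolerating: stop trying and let the caller inform the user
    raise ConnectionError(f"no reply after {max_attempts} attempts")


def make_flaky_channel(failures_before_success):
    """Simulated channel that times out a fixed number of times."""
    state = {"remaining": failures_before_success}

    def send(message):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TimeoutExpired()
        return "ack:" + message

    return send


print(send_with_retries(make_flaky_channel(2), "hello"))
```

With two lost messages the caller still gets its reply, on the third attempt; with more losses than `max_attempts` allows, the failure is reported instead of being retried forever.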

Failure Handling
- Recovery from failures:
  - Rollback.
  - Undo/redo in transactions.
- Redundancy makes the system more available through replication of resources and data:
  - Redundant routes in the network.
  - Replication of name tables in multiple domain name servers.
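As a rough sketch of how replication improves availability, the code below keeps the same name table on several simulated replicas and answers a lookup from the first one that responds. The `Replica` class, the `lookup_any` function, the server names and the example address are all hypothetical.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot be reached."""


class Replica:
    """A simulated name server holding a full copy of the name table."""

    def __init__(self, name, table, up=True):
        self.name = name
        self.table = dict(table)
        self.up = up

    def lookup(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.table[key]


def lookup_any(replicas, key):
    """Try each replica in turn and return the first answer obtained."""
    for replica in replicas:
        try:
            return replica.lookup(key), replica.name
        except ReplicaDown:
            continue  # this copy is unavailable; try the next one
    raise ReplicaDown("all replicas unavailable")


table = {"www.example.org": "192.0.2.10"}  # documentation-only address
replicas = [
    Replica("ns1", table, up=False),  # simulate a crashed server
    Replica("ns2", table),
    Replica("ns3", table),
]

print(lookup_any(replicas, "www.example.org"))
```

The lookup succeeds as long as at least one copy is reachable. Keeping the copies consistent when the table changes is a separate problem that this sketch does not address.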

Concurrency
- In a distributed system, multiple machines, processes or users may try to access shared data or resources concurrently.
- This can lead to incorrect results and/or deadlocks.
- The operations must be synchronized (serialized) so that the end result is correct.
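The sketch below shows the single-machine version of this problem with threads; it is an analogy, not a distributed protocol, and the `Account` class and `run_deposits` helper are invented for the example. The read-modify-write in `deposit` is the kind of operation that can lose updates if two callers interleave, so it is serialized with a lock.

```python
import threading


class Account:
    """Shared state updated by several threads."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Read-modify-write must not interleave with another deposit,
        # so the whole update runs while holding the lock.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_deposits(account, threads=4, per_thread=10_000):
    """Have several threads deposit 1 unit at a time; return the balance."""

    def work():
        for _ in range(per_thread):
            account.deposit(1)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return account.balance


print(run_deposits(Account()))
```

With the lock in place the final balance always equals the total number of deposits. In a distributed system the same guarantee needs a mechanism that works across machines, such as distributed mutual exclusion or transactions.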

Transparency
- Concealing the heterogeneous and distributed nature of the system so that it appears to the user as one system.
- Making the user believe that there is only a single, undivided system, i.e., hiding the notion of distribution completely.
- What are the challenges of transparency?

Transparency Categories
- Access transparency: access local and remote resources using identical operations.
- E.g., users of UNIX NFS can use the same commands and parameters for file system operations regardless of whether the accessed files are on a local or remote disk.

Transparency Categories
- Location transparency: access a resource without knowledge of its location.
- E.g., URLs and email addresses: the hostname's IP address is not required. The part of a URL that identifies a web server refers to a computer name in a domain rather than to an Internet address.

Transparency Categories
- Concurrency transparency: allow several processes to operate concurrently on shared resources in a consistent fashion, without interference between them. That is, users and programmers are unaware that components request services concurrently.
- Replication transparency: use a replicated resource as if there were just one instance. Reliability and performance increase without users or application programmers knowing about the replicas.

Failure transparency
- Enables the concealment of faults, allowing users and application programs to complete their tasks despite failures of hardware or software components.
- E.g., retransmission of email messages: they are eventually delivered even when servers or communication links fail, although it may take several days.

Failure transparency
- Failure transparency depends on concurrency and replication transparency.
- Replication can be employed to achieve failure transparency.
- Message transmission governed by TCP is a mechanism for providing failure transparency.

Mobility Transparency
- Mobility transparency: allow resources to move around without affecting the operation of users or programs.
- E.g., a 700 phone number.
- URLs are not mobility-transparent: someone's personal web page cannot move to their new place of work in a different domain, because all of the links in other pages will still point to the original page.

Transparency Categories
- Performance transparency: adaptation of the system to varying load situations without the user noticing it.
- Scaling transparency: allow the system and applications to expand without the need to change the system structure or the application algorithms.

Degree of transparency
- Attempting to blindly hide all distribution aspects from users is not always a good idea.
- Example: requesting that your electronic newspaper appear in your mailbox before 7 am local time while you are at the other end of the world, in a different time zone. Your morning paper will not be the morning paper you are used to.

Degree of transparency
- There is a trade-off between a high degree of transparency and the performance of a system.
- Masking a transient server failure by retransmitting the request may slow down the system.
- If several replicas must be kept consistent at all times, a single update may take a long time, which cannot be hidden from the user.