Reliable Distributed Systems

Presentation transcript:

Reliable Distributed Systems Fundamentals Chapter 1 1/13/2019

Some terminology
- A program is the code you type in.
- A process is what you get when you run it.
- A message is used to communicate between processes. It can be of arbitrary size.
- A packet is a fragment of a message that might travel on the wire. Its size is variable but limited, usually to 1400 bytes or less.
- A protocol is an algorithm by which processes cooperate to do something using message exchanges.
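The relationship between a message and its packets can be sketched in a few lines of Python. This is only an illustration: the 1400-byte limit is the figure quoted on the slide, while the packet layout (sequence number, packet count, payload) and the function names are assumptions made for the example, not part of any real protocol.

```python
from typing import List, Tuple

MAX_PAYLOAD = 1400  # bytes of message data per packet (figure taken from the slide)

# Hypothetical packet layout: (sequence number, total packets, payload)
Packet = Tuple[int, int, bytes]


def fragment(message: bytes, max_payload: int = MAX_PAYLOAD) -> List[Packet]:
    """Split an arbitrary-size message into packets of at most max_payload bytes."""
    chunks = [message[i:i + max_payload] for i in range(0, len(message), max_payload)]
    if not chunks:
        chunks = [b""]  # an empty message still travels as one empty packet
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message, tolerating packets that arrive out of order."""
    total = packets[0][1]
    by_seq = {seq: payload for seq, _, payload in packets}
    if len(by_seq) != total:
        raise ValueError("missing packets; cannot reassemble")
    return b"".join(by_seq[seq] for seq in range(total))


message = b"x" * 5000
packets = fragment(message)
```

Here a 5000-byte message becomes four packets, and `reassemble` puts them back in order by sequence number regardless of arrival order.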

More terminology
- A network is the infrastructure that links the computers, workstations, terminals, servers, etc.
  - It consists of routers.
  - They are connected by communication links.
- A network application is one that fetches needed data from servers over the network.
- A distributed system is a more complex application designed to run on a network. Such a system has multiple processes that cooperate to do something.

A network is like a “mostly reliable” post office

Why isn’t it totally reliable?
- Links can corrupt messages.
  - Rare in the high-quality ones on the Internet “backbone”.
  - More common with wireless connections and cable modems.
- Routers can get overloaded.
  - When this happens they drop messages.
  - This is very common.
- But protocols that retransmit lost packets can increase reliability.
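The last point can be illustrated with a small simulation. The sketch below assumes a hypothetical `LossyLink` that drops each packet with a fixed probability, and a simple stop-and-wait sender that resends until the packet gets through. It is a simplification: a real protocol such as TCP also has to cope with lost acknowledgements, timeouts and reordering.

```python
import random
from typing import List, Optional


class LossyLink:
    """Hypothetical link that drops each packet with probability loss_rate."""

    def __init__(self, loss_rate: float, seed: int = 0) -> None:
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)  # seeded so the run is repeatable
        self.sent = 0

    def send(self, packet: bytes) -> Optional[bytes]:
        self.sent += 1
        if self.rng.random() < self.loss_rate:
            return None  # dropped, e.g. by an overloaded router
        return packet


def send_reliably(link: LossyLink, packets: List[bytes], max_tries: int = 50) -> List[bytes]:
    """Stop-and-wait: resend each packet until the link delivers it."""
    delivered = []
    for packet in packets:
        for _ in range(max_tries):
            result = link.send(packet)
            if result is not None:  # stands in for receiving an acknowledgement
                delivered.append(result)
                break
        else:
            raise RuntimeError("gave up after max_tries transmissions")
    return delivered


link = LossyLink(loss_rate=0.3, seed=42)
data = [f"packet-{i}".encode() for i in range(20)]
received = send_reliably(link, data)
```

Every packet arrives in order, at the cost of extra transmissions: `link.sent` counts how many sends were needed in total.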

How do distributed systems differ from network applications?
- Distributed systems may have many components but are often designed to mimic a single, non-distributed process running at a single place.
- “State” is spread around in a distributed system.
- A networked application is free-standing and centered around the user or computer where it runs (e.g. a “web browser”).
- A distributed system is spread out and decentralized (e.g. an “air traffic control system”).

Web connectivity
- The browser is independent: it fetches data you request when you ask for it.
- Web servers do not keep track of who is using them. Each request is self-contained and treated independently of all others.
  - Cookies do not count: they are stored on the client machine.
  - And the database of account info doesn’t count either… this is “ancient” history, nothing recent.
- So the web has two network applications that talk to each other:
  - The browser on your machine.
  - The web server it happens to connect with… which has a database “behind” it.

Web (contd.)
[Figure: a web browser with stashed cookies sends an HTTP request to web servers, which are backed by a database.]
- The cookie identifies this user and encodes past preferences.
- Web servers are kept current by the database but usually don’t talk to it when your request comes in.

Web (contd.)
- Web servers immediately forget the interaction.
- The reply updates the cookie.
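The cookie round trip described in these Web slides can be sketched as follows. The handler is written as a pure function to make the point that the server keeps nothing between requests; the names `handle_request` and `Browser`, and the `visits` and `name` cookie fields, are invented for this illustration and do not correspond to any real web framework.

```python
from typing import Dict, Tuple

Request = Dict[str, str]
Cookie = Dict[str, str]


def handle_request(request: Request, cookie: Cookie) -> Tuple[str, Cookie]:
    """Stateless server: the reply depends only on the request and the client's cookie."""
    visits = int(cookie.get("visits", "0")) + 1
    name = cookie.get("name", "stranger")
    body = f"Hello {name}, this is visit {visits} to {request['path']}"
    new_cookie = dict(cookie, visits=str(visits))  # the reply carries the updated cookie
    return body, new_cookie


class Browser:
    """The client stores the cookie and presents it again on every request."""

    def __init__(self) -> None:
        self.cookie: Cookie = {}

    def get(self, path: str) -> str:
        body, self.cookie = handle_request({"path": path}, self.cookie)
        return body


browser = Browser()
first = browser.get("/index.html")
second = browser.get("/index.html")
```

All memory of the interaction lives in `browser.cookie`; a second `Browser` instance would start again from visit 1.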

Web (contd.)
- Web servers have no memory of the interaction.
- A purchase is a “transaction” on the database.

Web (contd.)
- The data center that serves your request may be a complex distributed system:
  - Many servers and perhaps multiple physical sites.
  - Opinions about which clients should talk to which servers.
  - Data replicated for load balancing and high availability.
  - Complex security and administration policies.
- So: we have a “networked application” talking to a “distributed system”.

Other examples of distributed systems
- Air traffic control system with workstations for the controllers.
- Banking/brokerage trading system that coordinates trading (risk management) at multiple locations.
- Factory floor control system that monitors devices and re-plans work as they go on/offline.

Is the Web “reliable”?
- We want to build distributed systems that can be relied upon to do the correct thing and to provide services according to the user’s expectations.
- Not all systems need reliability.
  - If a web site doesn’t respond, you just try again later.
- Reliability is a growing requirement in “critical” settings, but these remain a small percentage of the overall market for networked computers.

Reliability is a broad term
- Fault-tolerance: remains correct despite failures.
- High or continuous availability: resumes service after failures, doesn’t wait for repairs.
- Performance: provides desired responsiveness.
- Recoverability: can restart failed components.
- Consistency: coordinates actions by multiple components, so they mimic a single one.
- Security: authenticates access to data, services.
- Privacy: protects identity, locations of users.

“Failure” also has many meanings
- Halting failures: component simply stops.
- Fail-stop: halting failures with notifications.
- Omission failures: failure to send or receive a message.
- Network failures: network link breaks.
- Network partition: network fragments into two or more disjoint subnetworks.
- Timing failures: action occurs early or late; clock fails, etc.
- Byzantine failures: arbitrary, possibly malicious, behavior.

Examples of failures
- My PC suddenly freezes up while running a text processing program. No damage is done. This is a halting failure.
- A network file server tells its clients that it is about to shut down, then goes offline. This is a fail-stop failure. (The notification can be trusted.)
- An intruder hacks the network and replaces some parts with fakes. This is a Byzantine failure.

More terminology
- A real-world network is what we work on. It has computers, links that can fail, and some problems synchronizing time. But this is hard to model in a formal way.
- An asynchronous distributed system is a theoretical model of a network with no notion of time.
- A synchronous distributed system, in contrast, has perfect clocks and known bounds on the duration of all events, such as message passing.

ISO protocol layers: oft-cited standard
- The ISO (OSI) model is tied to a TCP style of connection.
- The match with modern protocols is poor.
- We are mostly at “layer 4”, the transport layer.

Internet protocol suite
- Can be understood in terms of the ISO model.
- Defines an “addressing” standard, a basic network layer (IP packets, in practice usually limited to about 1400 bytes by link-level packet sizes), and transport protocols (TCP, UDP, UDP-multicast).
- For example, TCP is a “transport” protocol (layer 4 in ISO terms).
- Includes a standard “domain name service” (DNS) that maps host names to IP addresses.
- DNS itself is tree-structured and caches data.
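The two properties of DNS mentioned on this slide, a tree-structured name space and caching, can be illustrated with a toy resolver. This is a sketch under simplifying assumptions: the name tree, host names and addresses are made up (the addresses come from the 192.0.2.0/24 documentation range), the whole tree lives in one process, and cached entries never expire, whereas real DNS spreads the tree across many servers and attaches a time-to-live to each record.

```python
from typing import Dict, Optional

# Toy name tree, keyed from the root downward. All entries are hypothetical.
NAME_TREE = {
    "edu": {"example": {"cs": {"www": "192.0.2.10"}, "mail": "192.0.2.20"}},
    "org": {"example": {"www": "192.0.2.30"}},
}


class CachingResolver:
    def __init__(self, tree: dict) -> None:
        self.tree = tree
        self.cache: Dict[str, str] = {}
        self.tree_walks = 0  # how often we had to walk the tree instead of using the cache

    def resolve(self, host: str) -> Optional[str]:
        if host in self.cache:
            return self.cache[host]
        self.tree_walks += 1
        node = self.tree
        # Names are read right to left: "www.cs.example.edu" starts at "edu".
        for label in reversed(host.split(".")):
            if not isinstance(node, dict) or label not in node:
                return None  # no such name
            node = node[label]
        if not isinstance(node, str):
            return None  # an interior node of the tree, not a host
        self.cache[host] = node
        return node


resolver = CachingResolver(NAME_TREE)
address = resolver.resolve("www.cs.example.edu")
again = resolver.resolve("www.cs.example.edu")  # answered from the cache
```

The second lookup of the same name does not touch the tree, which is the effect caching is meant to have.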

Typical hardware options
- Ethernet: 10 Mbit/s CSMA/CD technology, limited to packets of roughly 1400 to 1500 bytes. Uses a single coax cable.
- FDDI: fiber-optic ring (a twisted-pair variant also exists), self-repairing if a cable breaks.
- Bridged Ethernet: common in big LANs, ring with multiple Ethernet segments.
- Fast Ethernet: 100 Mbit/s version of Ethernet.
- ATM: switching technology for fiber-optic paths. Can run at 155 Mbit/s or more. Very reliable, but mostly used in telephone systems.
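To get a feel for these link speeds, the short calculation below estimates how long it takes just to put one 1400-byte packet on the wire at each of the quoted rates. It is a back-of-the-envelope sketch: it ignores framing overhead, ATM's division of data into small cells, propagation delay and any operating system cost, and the function name is chosen for the example.

```python
def transmission_time_ms(packet_bytes: int, link_mbit_per_s: float) -> float:
    """Milliseconds needed to clock packet_bytes onto a link of the given speed."""
    bits = packet_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000) * 1000


# Speeds are the ones quoted on the slide, in Mbit/s.
for name, speed in [("Ethernet", 10), ("Fast Ethernet", 100), ("ATM", 155)]:
    print(f"{name:14s} {transmission_time_ms(1400, speed):.3f} ms")
```

The results are about 1.12 ms at 10 Mbit/s, 0.112 ms at 100 Mbit/s and 0.072 ms at 155 Mbit/s, so on the faster links the raw transmission time of a packet is well under a millisecond.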

Implications for reliability?
- Protocol designers have problems predicting the properties of local-area networks.
- Latencies and throughput may vary widely even in a single installation.
- Hardware properties differ widely; often, one must assume the least common denominator.
- Packet loss is a minor problem in the hardware itself.

Technology trends
- Did the sudden growth in LAN speed give us the Web?
Source: Scientific American, Sept. 1995

Typical latencies (milliseconds)
- WAN and disk latencies are fairly constant due to physical limitations.
- Note the dramatic drop in LAN latencies over ATM. This is the hardware used in telephone systems.

O/S latency: the most expensive overhead on LAN communication!

Broad observations
- A discontinuity is currently occurring in WAN communication speeds!
  - Especially in military systems, where ATM networking hardware has been deployed widely.
- Other performance curves are all similar.
- Disks have “maxed out” and hence are looking slower and slower.
- Memory of remote computers looks “closer and closer”.
- O/S-imposed communication latencies have risen in relative terms over the past decade!

Implications?
- The revolution in WAN communication we are now seeing is not surprising and will continue.
- Look for a shift from disk storage towards more use of access to remote objects “over the network”.
- O/S overhead is already by far the main obstacle to low latency, and this problem will seem worse and worse unless O/S communication architectures evolve in major ways.

More implications
- Look for full-motion video to the workstation by around 2010 or 2015… today we already see this in bits and pieces but not as a routine option.
- Low LAN latencies: an unexploited “niche”.
- One puzzle: what to do with extremely high data throughput but relatively high WAN latencies.
- O/S architecture and the whole concept of an O/S must change to better exploit the “pool of memory” of a cluster of machines; otherwise, disk latencies will loom higher and higher.

Discovery
- Consider the problem of discovering the right server to connect with.
- Your computer needs current map data for some place, perhaps an amusement park.
- Can think of it in terms of layers: the basic park layout, overlaid with extra data from various services, such as “length of the line for the Cyclone Coaster” or “options for vegetarian dining near here”.

Why is discovery hard?
- Client has opinions.
  - You happen to like vegetarian food, but not spicy food.
  - So your search is partly controlled by client goals.
- But a given service might have multiple servers (e.g. Amazon might have data centers in Europe and in the US…) and may want your request to go to a particular one.
- Once we find the server name we need to map it to an IP address.
- And the Internet itself has routing “opinions” too.
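The first three steps in this list (client goals, the service's own choice of server, and name-to-address mapping) can be lined up in a small sketch. Everything in it is hypothetical: the service names, tags, regions, server names and addresses are invented for illustration, and Internet routing, the fourth source of “opinions”, is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Service:
    name: str
    tags: FrozenSet[str]
    servers: Dict[str, str]  # region -> server name preferred by the service
    default_region: str


SERVICES: List[Service] = [
    Service("burger-barn", frozenset(), {"us": "us.burger.example"}, "us"),
    Service("green-garden", frozenset({"vegetarian"}),
            {"eu": "eu.green.example", "us": "us.green.example"}, "us"),
    Service("chili-house", frozenset({"vegetarian", "spicy"}),
            {"us": "us.chili.example"}, "us"),
]

# Stand-in for a name service; addresses use the 192.0.2.0/24 documentation range.
ADDRESSES = {
    "us.burger.example": "192.0.2.1",
    "eu.green.example": "192.0.2.2",
    "us.green.example": "192.0.2.3",
    "us.chili.example": "192.0.2.4",
}


def discover(wanted: Set[str], unwanted: Set[str], region: str) -> Optional[str]:
    """Return the IP address of a suitable server, or None if nothing matches."""
    for service in SERVICES:
        # Step 1: client opinions filter the candidate services.
        if wanted <= service.tags and not (unwanted & service.tags):
            # Step 2: the service decides which of its servers this client should use.
            server = service.servers.get(region, service.servers[service.default_region])
            # Step 3: the server name is mapped to an IP address.
            return ADDRESSES[server]
    return None


address = discover(wanted={"vegetarian"}, unwanted={"spicy"}, region="eu")
```

A client in Europe that wants vegetarian but not spicy food ends up at the European server of the one matching service; a client in a region the service does not cover is sent to that service's default region.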

Fundamental terms: Protocol
- A protocol is a set of rules that end points in a telecommunication system use when exchanging information.
- IP: Internet Protocol defines an unreliable packet transfer protocol.
- TCP: Transmission Control Protocol builds on IP to define a reliable data delivery protocol.
- LDAP: Lightweight Directory Access Protocol builds on TCP to define a query-response protocol for querying the state of a remote database.
- HTTP: Hypertext Transfer Protocol builds on TCP to facilitate hypertext document exchange.

Fundamental terms: Service
- A service is a network-enabled entity that provides a specific capability.
- Service = Protocol + Behavior.
- A service definition permits many implementations.
- Examples: ability to move files, create processes, verify access rights.
- An FTP server speaks the File Transfer Protocol and supports remote read and write access to a collection of files.
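The statement that a service definition permits many implementations can be illustrated with an abstract interface and two classes that satisfy it. The `FileService` interface below is a made-up stand-in for a file service definition (it is not FTP and performs no networking); the point is only that a client written against the definition works with either implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FileService(ABC):
    """Hypothetical service definition: what any implementation must offer."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None:
        ...


class InMemoryFileService(FileService):
    """One implementation: keeps files in a dictionary."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self.files[name]

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data


class AuditedFileService(FileService):
    """Another implementation: same interface, but records every operation."""

    def __init__(self, backend: FileService) -> None:
        self.backend = backend
        self.log: List[str] = []

    def read(self, name: str) -> bytes:
        self.log.append(f"read {name}")
        return self.backend.read(name)

    def write(self, name: str, data: bytes) -> None:
        self.log.append(f"write {name}")
        self.backend.write(name, data)


def copy_file(service: FileService, src: str, dst: str) -> None:
    """A client written against the definition works with any implementation."""
    service.write(dst, service.read(src))


plain = InMemoryFileService()
audited = AuditedFileService(InMemoryFileService())
for service in (plain, audited):
    service.write("a.txt", b"hello")
    copy_file(service, "a.txt", "b.txt")
```

Both services end up holding the same two files; only the audited one also has a log of the operations performed.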

Fundamental terms: API
- An Application Program Interface (API) defines a standard interface for invoking a specified set of functionality.
- Example: the Generic Security Service (GSS) API defines standard functions for verifying the identity of communicating parties, encrypting messages and so forth.

Client/Server
- Server: a process on a networked computer that accepts requests from other (local or remote) processes to perform a service and responds appropriately.
- Client: the requesting process in the above.
- Request and response are in the form of messages.
- The client is said to invoke an operation on the server.
- Many distributed systems today are constructed out of interacting clients and servers.
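A minimal request/response exchange can be sketched with Python's standard `socket` module. The example assumes the loopback interface is available and runs the server in a thread of the same program so that it is self-contained; the "echo" behavior and the function names are invented for the illustration. Reading with a single `recv` call is a shortcut that is only reasonable for very short messages like these.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server: accept one client, read one request message, send one response."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(f"echo: {request}".encode())


def call_server(port: int, request: str) -> str:
    """Client: send a request message and wait for the response message."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(request.encode())
        return client.recv(1024).decode()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = call_server(port, "hello")
server.join()
listener.close()
```

The client invokes an operation by sending the message `hello`, and the server's response comes back as the message `echo: hello`.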

Middleware (as defined by NSF)
- Middleware refers to the software which is common to multiple applications and builds on the network transport services to enable ready development of new applications and network services.
- Middleware typically includes a set of components such as resources and services that can be utilized by applications either individually or in various subsets.
  - Examples of services: security, directory and naming, end-to-end quality of service, support for mobile code.
- OMG’s CORBA defines a middleware standard.
  - OMG: Object Management Group
  - CORBA: Common Object Request Broker Architecture

Middleware
[Figure: two “desktop” machines, each running clients and servers on top of a middleware layer, connected through the “network”.]

Summary
- In this course, we will study distributed systems at the middleware level: how to define, design and implement services, and how to use the middleware services in a distributed application.
- We will study Java RMI as a case study of a simple distributed system.
- We will also introduce the “web services” standard.