Fall 2007cs4251 Distributed Computing Umar Kalim Dept. of Communication Systems Engineering 19/09/2007
Fall 2007cs4252 Agenda Course details Introduction
Fall 2007cs4253 Reading material Text book: –Andrew S. Tanenbaum, Maarten van Steen “Distributed systems: principles and paradigms”. Reference books: –Jean Dollimore, Tim Kindberg, George Coulouris “Distributed Systems: Concepts and Design” –Joe Wigglesworth and Paula McMillan “Java Programming: Advanced Topics”, 3rd edition –Bruce Eckel “Thinking in Java” 3rd edition Availabe online from Bruce Eckel's web site Reference material –Selected publications available online
Fall 2007cs4254 Course details Lectures & Handouts –Will be available online Office hours: –Usually one (1) hour after the lecture
Fall 2007cs4255 Grading policy Assignments 5% Quizzes 10% Survey report/Project 15% OHT 30% End-term 40% Assignments –Individual –No late submission Quizzes –Mostly unannounced –Occasionally announced
Fall 2007cs4256 Objective of this course? Provide an understanding of the technical issues involved in the design of modern distributed systems Present some of the major current paradigms Outcomes –Appreciation of the main principles underlying distributed systems: processes, communication, naming, synchronization, consistency, fault tolerance, and security. –Familiarity with some of the main paradigms in distributed systems: object-based systems, file systems, and coordination-based systems
Fall 2007cs4257 Lets begin! Ref: csc.liv.ac.uk
Fall 2007cs4258 Definition of a Distributed System A distributed system is: A collection of independent computers that appears to its users as a single coherent system.
Fall 2007cs4259 Examples of Distributed Systems Computer world: –University computer network Search for ExtraTerrestrial Intelligence –GRID (distributed computing facilities) Ordinary life: –WWW, P2P systems (such as Emule, Azureus etc) –Banks (Cash machines) –Ticket reservation
Fall 2007cs42510 Goals of a Distributed Systems User-side goals (why distributed system) –Easily connect users/resources –Exhibit transparency Technical goals (how to achieve) –Be open –Be scalable Looking at these goals helps us answer the question: “is building a distributed system worth the effort?”
Fall 2007cs42511 Connecting Users and Resources Typical resources –Printers, computers, computing power, data Why sharing –Economics –Collaboration, Information Exchange (groupware) Problems with sharing –Security –Unwanted collaboration
Fall 2007cs42512 Transparency in distributed systems Transparent distributed system: –Appears to its users as if it were only a single computer system
Fall 2007cs42513 Forms of transparency TransparencyDescription Access Hide differences in data representation and how a resource is accessed LocationHide where a resource is located MigrationHide that a resource may move to another location Relocation Hide that a resource may be moved to another location while in use Replication Hide that a resource may be shared by several competitive users Concurrency Hide that a resource may be shared by several competitive users FailureHide the failure and recovery of a resource Persistence Hide whether a (software) resource is in memory or on disk
Fall 2007cs42514 Degree of transparency Transparency is –Not always desirable Users located in different continents (context-aware) –Not always possible Hiding failures (you can distinguish a slow computer from a failing one) Trade-off between a high degree of transparency and the performance of the system
Fall 2007cs42515 Openness Open systems –Offer services according to standard rules that describe the syntax and semantics of these services –Enjoy neutral and complete specifications Network protocols Advantages of being open: –Interoperability Open systems can work together –Portability Ability to transform an application from one software or hardware platform to another Open ≠ Free!
Fall 2007cs42516 Scalability Along three different dimensions –Size (the number of users and/or processes) –Geographical (maximum distance between participants) –Administrative (number of administrative domains)
Fall 2007cs42517 Scalability problems ConceptExample Centralized servicesA single server for all users Centralized dataA single on-line telephone book Centralized algorithmsDoing routing based on complete information
Fall 2007cs42518 Scaling Techniques Three techniques: –Hiding communication latencies Try to avoid waiting for responses to remote service requests as much as possible –Distribution –Replication
Fall 2007cs42519 Scaling Techniques (1) 1.4 The difference between letting: a)a server or b)a client check forms as they are being filled
Fall 2007cs42520 Scaling Techniques (2) 1.5 An example of dividing the DNS name space into zones.
Fall 2007cs42521 Scaling Techniques (3) Replication –Increases availability –Balances the load –Reduces communication latency –But causes consistency problems Caching (client-driven)
Fall 2007cs42522 Modelling distributed systems When building distributed applications, system builders have often looked to the non- distributed systems world for models to follow (… inspiration?) Consequently, distributed systems tend to exhibit certain characteristics that are already familiar to us This applies equally to hardware concepts as it does to software concepts
Fall 2007cs42523 Classification of Multiple CPU Computer Systems Two groups: Multiprocessors –Shared memory –Several processors Multicomputers –Each machine has its own memory –Network communication
Fall 2007cs42524 Bus based multiprocessor systems Simplified representation of a shared-memory supercomputer Initially for supercomputers only, but –AMD and Intel provide 2-, 4-, and 8-processor workstations –Intel Hyper-Thread technology
Fall 2007cs42525 Multicomputer Systems a) Grid b) Hypercube How to run multiprocessor software on multicomputer systems? –Distributed operating systems
Fall 2007cs42526 Multicomputer Operating Systems General structure of a (DOS) multicomputer operating system
Fall 2007cs42527 Modeling on Multicomputers Excessively tightly connected –Such systems tend to be homogeneous (same hardware and software in all nodes) –Compromises openness and scalability
Fall 2007cs42528 Network Operating System General structure of a network operating system – all the systems are of different types: heterogeneous
Fall 2007cs42529 Network Operating System Two clients and a server in a network operating system – relatively primitive set of services provided
Fall 2007cs42530 Distributed Systems on top of Network Operating Systems Too primitive, low level communication –Compromises transparency What to do???
Fall 2007cs42531 Best of both worlds “Middleware” – best possible compromise? Middleware = NOS + additional software layer
Fall 2007cs42532 Distributed Systems as Middleware A distributed system organized as middleware. Note that the middleware layer extends over multiple machines.
Fall 2007cs42533 Middleware and Openness In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications. This is a much higher level of abstraction than (for instance) the NOS Socket API.
Fall 2007cs42534 Modelling Software Concepts SystemDescriptionMain Goal DOS Tightly-coupled operating system for multi- processors and homogeneous multicomputers Hide and manage hardware resources NOS Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) Offer local services to remote clients Middleware Additional layer atop of NOS implementing general-purpose services Provide distribution transparency
Fall 2007cs42535 Comparing DOS/NOS/Middleware A comparison between multiprocessor operating systems, multicomputer operating systems, network operating systems, and middleware based distributed systems. Item Distributed OS Network OS Middleware- based OS Multiproc.Multicomp. Degree of transparencyVery HighHighLowHigh Same OS on all nodesYes No Number of copies of OS1NNN Basis for communication Shared memory MessagesFilesModel specific Resource management Global, central Global, distributed Per node ScalabilityNoModeratelyYesVaries OpennessClosedMostly ClosedOpen
Fall 2007cs42536 Summary Distributed Systems … autonomous computers working together to give the appearance of a single, coherent system. They are transparent, scalable and open. Unfortunately, they also tend to be complex.
Fall 2007cs42537 Questions? That’s all for today!