Introduction to Distributed Computing
Tran, Van Hoai
Department of Systems & Networking, Faculty of Computer Science & Engineering, HCMC University of Technology
Outline
– Why is distributed computing needed? (It is performed by distributed systems.)
– Examples
– Definitions
– Goals in building distributed systems
Why are distributed systems needed? (1)
– Functional distribution: computers have different functional capabilities, so resources with specific functionalities are shared
  – Client/server
  – Host/terminal
  – Data gathering/data processing
– Inherent distribution: stemming from the application domain, e.g.,
  – cash register and inventory systems for supermarket chains
  – computer-supported collaborative work
Why are distributed systems needed? (2)
– Load distribution/balancing: assign tasks to computers such that overall performance is optimized
– Replication of processing power: independent computers working on the same task
  – a collection of microcomputers may have processing power that no supercomputer will ever achieve
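Load distribution can be illustrated with a small sketch. The function name `assign_tasks` and the numeric task costs are assumptions made for illustration only. The greedy "least-loaded machine first" rule used here is one simple heuristic, not the method of any particular system, and it does not guarantee an optimal assignment.

```python
import heapq


def assign_tasks(task_costs, n_machines):
    """Greedy sketch: give each task to the machine with the smallest load so far."""
    # Heap entries are (current_load, machine_index); the lightest machine is on top.
    heap = [(0, m) for m in range(n_machines)]
    heapq.heapify(heap)
    assignment = {m: [] for m in range(n_machines)}
    # Placing the largest tasks first usually evens out the final loads.
    for cost in sorted(task_costs, reverse=True):
        load, m = heapq.heappop(heap)
        assignment[m].append(cost)
        heapq.heappush(heap, (load + cost, m))
    return assignment


if __name__ == "__main__":
    # Hypothetical task costs (e.g., estimated seconds of CPU time).
    result = assign_tasks([7, 5, 4, 3, 3, 2], n_machines=3)
    for machine, tasks in result.items():
        print(machine, tasks, sum(tasks))
```

A real load balancer would also have to cope with unknown task costs, machines of different speeds, and machines that fail, none of which this sketch models.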
Why are distributed systems needed? (3)
– Physical separation: relying on the fact that computers are physically separated (e.g., to satisfy reliability requirements)
– Economics: collections of microprocessors offer a better price/performance ratio than large mainframes
  – mainframes: 10 times faster, but 1000 times as expensive
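The economics argument can be checked with a few lines of arithmetic. The sketch below takes the slide's relative figures (10 times faster, 1000 times as expensive) at face value and uses a single microprocessor as the baseline; the function name `performance_per_cost` is an assumption for illustration.

```python
from fractions import Fraction


def performance_per_cost(speed, price):
    """Performance delivered per unit of money (higher is better)."""
    return Fraction(speed, price)


# Relative figures from the slide; the microprocessor is the baseline (1, 1).
micro = performance_per_cost(speed=1, price=1)
mainframe = performance_per_cost(speed=10, price=1000)

# How many times better the microprocessor's price/performance ratio is.
advantage = micro / mainframe
print(advantage)  # 100
```

Under these figures the microprocessor delivers 100 times more performance per unit of cost. Turning that into 100 times more useful work additionally assumes that the workload can be split across many machines, which is not always the case.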
Examples (1)
– Network of workstations
  – all files are accessible from all machines in the same way and using the same path name
  – the system looks for the best place to execute a command
  => distributed system
– Workflow information system: automatic order processing
  – people from several departments at different locations
  – users are unaware of how an order is processed
  => distributed system
Examples (2)
– World Wide Web: offers a uniform model of distributed documents
  – in theory, there is no need to know where a document is fetched from
  – in practice, users must be aware of the location
Examples (3)
– Internet: an interconnected collection of computer networks of many different types
– computers interact by passing messages using a common means of communication
Examples (4)
– Intranet: resources shared among different computers
Definitions (1) “A system in which hardware or software located at networked computers communicate and coordinate their actions only by message passing”. [Coulouris] “A system that consists of a collection of two or more independent computers which coordinate their processing through exchange of synchronous or asynchronous message passing”
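The idea of coordinating "only by message passing" can be sketched as follows. This is a toy model under stated assumptions: two threads in one process stand in for networked computers, `queue.Queue` stands in for the network, and the message format (`("add", n)`, `("read",)`, `("stop",)`) is invented for illustration.

```python
import queue
import threading


def counter_node(inbox, outbox):
    """A node with private state; it reacts only to the messages it receives."""
    total = 0  # local memory, never accessed directly by any other node
    while True:
        msg = inbox.get()
        if msg[0] == "stop":
            break
        if msg[0] == "add":
            total += msg[1]
        elif msg[0] == "read":
            outbox.put(("value", total))


def run_demo():
    """Drive the node purely by sending messages and waiting for a reply."""
    to_node, from_node = queue.Queue(), queue.Queue()
    node = threading.Thread(target=counter_node, args=(to_node, from_node))
    node.start()
    to_node.put(("add", 3))
    to_node.put(("add", 4))
    to_node.put(("read",))
    reply = from_node.get(timeout=5)  # blocks until the node answers
    to_node.put(("stop",))
    node.join()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

The caller never reads `total` directly; the only way to learn its value is to send a `read` message and wait for the answer, which mirrors the definitions above.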
Definitions (2) “A distributed system is a collection of independent computers that appear to the users of the system as a single computer”. [Tanenbaum] “A distributed system is a collection of autonomous computers linked by a network with software designed to produce an integrated computing facility”
Definitions (3) There are several autonomous computational entities, each of which has its own local memory. [Andrews et al]
Computer networks vs. distributed systems
– Computer network: autonomous computers are explicitly visible (they have to be explicitly addressed)
– Distributed system: the existence of multiple computers is transparent
– However,
  – many problems are common to both
  – in some sense networks (or parts of them, e.g., name services) are also distributed systems
  – normally, every distributed system relies on services provided by a computer network
Which examples are distributed systems?
– Network of workstations: distributed system
– Workflow information system (automatic order processing): distributed system
– World Wide Web: not fully qualified as a distributed system (Tanenbaum); a distributed system (Coulouris)
[Figure: distributed applications run on top of a middleware service layer that spans Machine A, Machine B, and Machine C, each with its own local OS]
The middleware service is there to guarantee
– support for heterogeneous computers
– a single view for users
Goals in building a distributed system (1)
– Connecting users and resources
  – sharing resources
  – easier collaboration and exchange of information
  – disadvantages: security (intrusion), privacy violation (communication tracking)
Goals in building a distributed system (2)
– Transparency
  Transparency   Description
  Access         Hide differences in data representation and how a resource is accessed
  Location       Hide where a resource is located
  Migration      Hide that a resource may move to another location
  Relocation     Hide that a resource may be moved to another location while in use
  Replication    Hide that a resource may have many copies
  Concurrency    Hide that a resource may be shared by several competing users
  Failure        Hide the failure and recovery of a resource
  Persistence    Hide whether a (software) resource is in memory or on disk
– Trade-off between a high degree of transparency and the performance of the system
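Location and migration transparency can be sketched with a name service that maps a logical name to the machine currently holding the resource. Everything here is hypothetical and simplified: `NameService`, `Client`, and `migrate` are invented names, and a dictionary of dictionaries stands in for the remote machines.

```python
class NameService:
    """Maps a logical resource name to its current host (hypothetical sketch)."""

    def __init__(self):
        self._locations = {}

    def register(self, name, host):
        self._locations[name] = host

    def resolve(self, name):
        return self._locations[name]


class Client:
    """Reads resources by logical name only; it never mentions a host."""

    def __init__(self, names, stores):
        self._names = names
        self._stores = stores  # host -> {name: content}, stand-in for remote machines

    def read(self, name):
        host = self._names.resolve(name)
        return self._stores[host][name]


def migrate(names, stores, name, src, dst):
    """Move a resource to another host and update the name service."""
    stores[dst][name] = stores[src].pop(name)
    names.register(name, dst)


if __name__ == "__main__":
    stores = {"hostA": {"report.txt": "Q3 figures"}, "hostB": {}}
    names = NameService()
    names.register("report.txt", "hostA")
    client = Client(names, stores)
    print(client.read("report.txt"))
    migrate(names, stores, "report.txt", "hostA", "hostB")
    print(client.read("report.txt"))  # same call, resource now lives on hostB
```

The extra lookup on every read is a small instance of the trade-off between transparency and performance noted on the slide.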
Goals in building a distributed system (3)
– Openness
  – offering services according to standard rules that describe the syntax and semantics of those services
    – syntax specification: in an interface definition language
    – semantic specification: in natural language
  – interoperability and portability
  – flexibility: using different components from different developers
Goals in building a distributed system (4)
– Scalability
  – measured in three dimensions
    – size: more users and resources can be added easily
    – geography: users and resources may lie far apart
    – administration: still easy to manage even when spanning many independent administrative organizations
  – some problems must be solved
    – size: centralization
      – centralized service: a single server for all users
      – centralized data: a single online telephone book
      – centralized algorithm: routing based on complete information
Goals in building a distributed system (5)
– Scalability problems (continued)
  – geography: synchronous and unreliable communication
    – some systems are designed only for LANs (blocking communication depends strongly on quick responses)
  – administration: conflicting policies with respect to resource usage, management, and security
Scaling techniques
– Asynchronous communication
– Distribution
– Replication, caching
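Of the techniques listed, caching is the easiest to sketch. The classes `SlowServer` and `CachingClient` below are hypothetical; a request counter stands in for the cost of a network round trip, and invalidation of stale entries is deliberately left out.

```python
class SlowServer:
    """Stand-in for a remote server; counts how many requests reach it."""

    def __init__(self, data):
        self._data = dict(data)
        self.requests = 0

    def fetch(self, key):
        self.requests += 1  # each call models one network round trip
        return self._data[key]


class CachingClient:
    """Keeps a local copy of every value it has already fetched."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            # Cache miss: go to the server once and remember the answer.
            self._cache[key] = self._server.fetch(key)
        return self._cache[key]


if __name__ == "__main__":
    server = SlowServer({"phone:alice": "555-0100"})
    client = CachingClient(server)
    for _ in range(3):
        client.get("phone:alice")
    print(server.requests)  # 1
```

Three reads cost only one request to the server. The price is consistency: if the server's copy changes, the cached copy is stale until it is refreshed, which is why replication and caching always raise the question of how copies are kept consistent.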
Typical properties
– The system has to tolerate failures in individual computers
– The structure of the system (network topology, network latency, number of computers) is not known in advance
– Each computer has only a limited, incomplete view of the system
Architectures
– Client-server
  – permanent data on the server
– 3-tier architecture
  – stateless client
  – N-tier: web applications
– Tightly coupled (clustered)
  – NOW (network of workstations), cluster of machines
– Peer-to-peer
  – Grid computing (VO level)
– Space-based
  – virtualization as one single address space
(Source: wikipedia.org)
Some numbers (1)
Computers in the Internet ([...] marks digits lost in extraction)
  Date          Computers      Web servers
  1979, Dec.    [...]          [...]
  [...], July   130,[...]      [...]
  [...], July   56,218,000     5,560,[...]
  [...], Jan.   171,638,297    35,424,956
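Some numbers (2)
Computers vs. Web servers in the Internet ([...] marks digits lost in extraction)
  Date          Computers      Web servers    Percentage
  1993, July    1,776,[...]    [...]          [...]
  [...], July   6,642,000      23,[...]       [...]
  [...], July   19,540,000     1,203,[...]    [...]
  [...], July   56,218,000     6,598,[...]    [...]
  [...], July   125,888,197    31,299,592     25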
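The Percentage column appears to be the number of Web servers as a share of all computers. That reading is an assumption, but it can be checked against the last row, whose figures survive in full. The function name below is invented for illustration.

```python
def web_server_percentage(computers, web_servers):
    """Share of Internet hosts that are Web servers, in percent."""
    return 100 * web_servers / computers


# Last row of the table: 125,888,197 computers and 31,299,592 Web servers.
pct = web_server_percentage(125_888_197, 31_299_592)
print(round(pct))  # 25
```

The result rounds to 25, matching the table, so roughly one in four hosts counted in that row was a Web server.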
Textbooks & materials
– Andrew S. Tanenbaum, Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, Second Edition, 2007
– George Coulouris, Jean Dollimore, Tim Kindberg, Distributed Systems: Concepts and Design, Addison Wesley, Fourth Edition, 2005
– Google