Meng Han Presentation 09/11/2013 CS8320 – Advanced Operating Systems Fall 2013 – Section 2.6 Presentation
Outline
- Introduction
- Distributed System Design Issues
  - Object Models and Naming Schemes
  - Distributed Coordination
  - Interprocess Communication
  - Distributed Resources
  - Fault Tolerance and Security
- Design for big-data
- Summary
- References
Introduction
A distributed system mainly involves [1]:
- Coordination of concurrent distributed processes
- Management of distributed resources
- Functioning of distributed algorithms
However:
- The network may be UNRELIABLE
- Components may be UNTRUSTED
These raise design and implementation issues, in particular how to support transparency.
Introduction
Design and implementation issues:
- How to model and identify objects in the system
- How to coordinate the interaction among objects
- How objects communicate with each other
- How to manage shared or replicated objects in a controlled fashion
- How to protect objects and keep the system secure
Object Models and Naming Schemes
- Objects in a computer system: processes, data files, memory, devices, processors, and networks.
- Objects are encapsulated in servers: process servers, file servers, memory servers, etc.
- A client is a null server that accesses object servers.
Object Models and Naming Schemes
A server can be identified [2]:
- by name (name server)
- by physical or logical address (network server)
- by the service that the server provides
The following all depend on the naming scheme for system objects: structure of the system, management of the name space, name resolution, and access methods.
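The three ways of identifying a server can be illustrated with a toy name server. This is only a sketch: the `NameServer` class, the server names, and the addresses are hypothetical and are not taken from [2].

```python
from dataclasses import dataclass, field


@dataclass
class NameServer:
    """Toy name server (hypothetical): maps names and services to addresses."""
    by_name: dict = field(default_factory=dict)      # logical name -> address
    by_service: dict = field(default_factory=dict)   # service -> list of names

    def register(self, name, address, services=()):
        self.by_name[name] = address
        for service in services:
            self.by_service.setdefault(service, []).append(name)

    def resolve(self, name):
        """Name resolution: logical name -> (host, port) address."""
        return self.by_name[name]

    def find(self, service):
        """Identify servers by the service they provide."""
        return [self.by_name[n] for n in self.by_service.get(service, [])]


# Made-up example entries.
ns = NameServer()
ns.register("fs1", ("10.0.0.5", 2049), services=["file"])
ns.register("fs2", ("10.0.0.6", 2049), services=["file"])
ns.register("ps1", ("10.0.0.7", 7000), services=["process"])
```

A client that knows only the name calls `ns.resolve("fs1")`. A client that knows only what it needs calls `ns.find("file")` and picks one of the returned addresses.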
Distributed Coordination
Processes coordinate to achieve synchronization. Types of synchronization:
- Barrier synchronization: processes must reach a common synchronization point before they can continue.
- Condition coordination: a process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution.
- Mutual exclusion: concurrent processes must have mutually exclusive access to a critical shared resource.
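The three synchronization types can be seen side by side in a small example. As an assumption for illustration, it uses threads on one machine and Python's `threading` primitives. In a distributed system the same behaviour has to be built from messages, which this sketch does not attempt.

```python
import threading

N = 3
barrier = threading.Barrier(N)      # barrier synchronization
cond = threading.Condition()        # condition coordination
lock = threading.Lock()             # mutual exclusion
state = {"ready": False, "counter": 0}
passed_barrier = []


def worker(i):
    # Condition coordination: block until another thread sets state["ready"].
    with cond:
        cond.wait_for(lambda: state["ready"])
    # Barrier synchronization: no worker continues until all N have arrived.
    barrier.wait()
    # Mutual exclusion: only one worker at a time updates the shared state.
    with lock:
        passed_barrier.append(i)
    for _ in range(1000):
        with lock:
            state["counter"] += 1


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
with cond:                          # set the condition asynchronously
    state["ready"] = True
    cond.notify_all()
for t in threads:
    t.join()
```

Without the lock, the increments could interleave and the final counter could be lower than `N * 1000`.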
Synchronization Issues
State information is sent by messages:
- Typically only partial state information is known about other processes, which makes synchronization difficult.
- Information is not current because of transfer delay.
- The decision whether a process may continue must rely on a message resolution protocol.
Centralized coordinator:
- Single point of failure
Deadlocks [3]:
- Processes wait circularly for one another
- Deadlock detection and recovery strategies
Synchronization Issues
Deadlocks: four conditions must all hold for a deadlock to occur:
- Exclusive use (mutual exclusion)
- Hold and wait
- No preemption
- Cyclical (circular) wait
Deadlocks can be handled by prevention, avoidance, or detection.
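Detection amounts to looking for a cycle in the wait-for graph, as the sketch below shows. It assumes that a single site already holds the whole graph. That is a simplification: in a distributed system the graph is spread across sites, and this code is not the distributed algorithm of [3]. The function name and the process names are hypothetical.

```python
def find_cycle(wait_for):
    """Return the processes on one cycle of the wait-for graph, or None.

    wait_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {p: WHITE for p in wait_for}
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GREY:       # back edge: circular wait
                return path[path.index(q):]
            if color.get(q, WHITE) == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


deadlocked = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})
healthy = find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []})
```

A recovery strategy would then pick one process on the returned cycle as a victim and abort or roll it back.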
Deadlock Prevention
- Prevention schemes guarantee that deadlocks can never happen because of the way the system is structured.
- At least one of the four conditions is made impossible, which prevents deadlock.
- Example: impose a total order on the resources and require processes to request resources in increasing order. This prevents cyclical wait and thus makes deadlocks impossible.
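The resource-ordering example can be sketched with locks that carry a rank. The `OrderedResource` class, the ranks, and the resource names are assumptions made for illustration. The point is only that every process acquires in increasing rank, whatever order it asks in.

```python
import threading
from contextlib import contextmanager


class OrderedResource:
    """A resource with a fixed position in the global order (hypothetical)."""
    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self.lock = threading.Lock()


@contextmanager
def acquire_in_order(*resources):
    # Always lock in increasing rank, so a cyclical wait cannot form.
    ordered = sorted(resources, key=lambda r: r.rank)
    for r in ordered:
        r.lock.acquire()
    try:
        yield ordered
    finally:
        for r in reversed(ordered):
            r.lock.release()


disk = OrderedResource(1, "disk")
printer = OrderedResource(2, "printer")
log = []


def job(name, first, second):
    for _ in range(200):
        with acquire_in_order(first, second) as held:
            log.append((name, [r.name for r in held]))


# The two jobs name the resources in opposite orders, the classic deadlock setup.
a = threading.Thread(target=job, args=("A", disk, printer))
b = threading.Thread(target=job, args=("B", printer, disk))
a.start(); b.start()
a.join(); b.join()
```

If each job locked the resources in the order it named them, job A could hold the disk while job B holds the printer, and both would wait forever.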
Interprocess Communication
- Lower level: interprocess communication can be accomplished with simple message passing primitives.
- Higher level: logical communication methods provide transparency by hiding the physical details of message passing.
Two important concepts:
- The client/server model
- Remote Procedure Call (RPC)
The Client/Server Model
The client/server model is a programming paradigm for structuring processes in distributed systems [4].
[Figure: client/server model. Labels: logical communication (request, reply) between client and server; actual communication through the kernel and the network.]
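The request/reply exchange can be sketched with two message queues standing in for the kernel and network path. This is an assumption made so that the example runs in one process. The names `requests`, `server`, and `client_call` are hypothetical.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the kernel/network path to the server


def server():
    """Receive a request, compute a reply, send it back; repeat until shut down."""
    while True:
        msg = requests.get()
        if msg is None:                    # shutdown marker for this demo only
            break
        reply_box, payload = msg
        reply_box.put(payload.upper())     # the "service" here is upper-casing


def client_call(payload):
    """Send one request and block until the reply arrives."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((reply_box, payload))     # request
    return reply_box.get(timeout=5)        # reply


t = threading.Thread(target=server)
t.start()
answer = client_call("read file.txt")
requests.put(None)
t.join()
```

The client sees only the logical request and reply. How the message travels is hidden behind `client_call`.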
The RPC Model
The remote procedure call model is similar to the local procedure call model, in which:
1. The caller places arguments to a procedure in a specific location (such as a register).
2. The caller temporarily transfers control to the procedure.
3. When the caller regains control, it obtains the results of the procedure from the specified location.
4. The caller then continues program execution.
The RPC Model
On the server side:
- A process is dormant (inactive, sleeping), awaiting the arrival of a call message.
- When one arrives, the server process computes a reply and sends it back to the requesting client.
- After this, the server process becomes dormant again.
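The call message and reply can be made concrete with a client stub and a server dispatcher. This is a sketch under stated assumptions: the JSON message format, the `PROCEDURES` table, and the `ClientStub` class are hypothetical, and the transport is a direct function call rather than a network.

```python
import json

# Procedures the server exports (hypothetical example).
PROCEDURES = {"add": lambda a, b: a + b}


def server_dispatch(call_message: bytes) -> bytes:
    """Server side: unpack a call message, run the procedure, build the reply."""
    call = json.loads(call_message)
    try:
        result = PROCEDURES[call["proc"]](*call["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:
        reply = {"ok": False, "error": repr(exc)}
    return json.dumps(reply).encode()


class ClientStub:
    """Client side: makes a remote procedure look like a local method."""
    def __init__(self, transport):
        self.transport = transport         # callable: call bytes -> reply bytes

    def __getattr__(self, proc):
        def remote(*args):
            message = json.dumps({"proc": proc, "args": args}).encode()
            reply = json.loads(self.transport(message))  # caller waits here
            if not reply["ok"]:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return remote


stub = ClientStub(server_dispatch)
total = stub.add(2, 3)                     # reads like a local call
```

Replacing `server_dispatch` with a function that sends the bytes over a socket would not change the caller's code, which is the transparency RPC aims for.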
Distributed Resources
- Load distribution
  - Multiprocessor scheduling (static)
  - Load sharing (dynamic)
- Distributed shared memory
- Distributed file systems
Load Distribution
- Multiprocessor scheduling [5]: minimize communication overhead with efficient scheduling.
- Load sharing: process migration strategy and mechanism.
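One possible migration strategy for load sharing is a simple threshold policy: move a process from the busiest node to the idlest one while the gap is large enough. The policy, the `rebalance` function, and the node names are assumptions for illustration. The slides do not specify a strategy.

```python
def rebalance(loads, threshold=2):
    """Threshold-based load sharing sketch.

    loads maps node name -> number of runnable processes.
    Returns (new_loads, migrations), each migration being (source, destination).
    """
    if threshold < 2:
        # With a gap of 1, a migration only swaps the roles of the two nodes.
        raise ValueError("threshold must be at least 2")
    loads = dict(loads)
    migrations = []
    while True:
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] - loads[idlest] < threshold:
            return loads, migrations
        loads[busiest] -= 1                 # migrate one process
        loads[idlest] += 1
        migrations.append((busiest, idlest))


final_loads, moves = rebalance({"n1": 6, "n2": 0, "n3": 3})
```

This covers only the strategy. The migration mechanism, which transfers a process's state between machines, is a separate and harder problem.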
Distributed File Systems and Distributed Shared Memory
- Distributed file systems: issues are viewed from the file's point of view.
- Distributed shared memory: issues are viewed from a process's perception of the system.
- The issue central to both: sharing and replication of data.
Fault Tolerance and Security
- Security threats and failures are both system faults.
- Failures can be alleviated if there is redundancy in the system.
- The system should transparently handle failure or removal of machines, network links, and other resources without loss of data or functionality.
- This should hold for both the system itself and its applications.
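Redundancy can hide a failure from the client, as a replicated key-value store shows. This is a minimal sketch: the `Replica` and `ReplicatedStore` classes are hypothetical, failures are simulated with a flag, and recovery of a stale replica is deliberately left out.

```python
class Replica:
    """One copy of the data; `alive` simulates a machine failure."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def put(self, key, value):
        if not self.alive:
            raise ConnectionError(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError(self.name)
        return self.data[key]


class ReplicatedStore:
    """Writes go to every reachable replica; reads use the first live one."""
    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acks += 1
            except ConnectionError:
                pass                        # skip the failed replica
        if acks == 0:
            raise ConnectionError("no replica reachable")

    def get(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ConnectionError:
                continue                    # fail over to the next replica
        raise ConnectionError("no replica reachable")


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
store = ReplicatedStore([r1, r2, r3])
store.put("x", 42)
r1.alive = False                            # simulate a machine failure
value_after_failure = store.get("x")
```

The caller of `store.get` never learns that `r1` failed. A replica that missed writes while it was down would return stale or missing data after recovery, so a real system also needs a resynchronization step.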
Fault Tolerance and Security
Security [6]:
- Authentication: clients, servers, and messages must all be authenticated.
- Authorization: access control has to be performed across a physical network with heterogeneous components, under different administrative units that use different security models.
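Message authentication followed by an authorization check might look like the sketch below. It assumes a pre-shared key and a flat access control list. Both are illustrative choices, not the protocols discussed in [6], and `SHARED_KEY`, `ACL`, and `handle` are hypothetical names.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-not-for-production"   # hypothetical pre-shared secret

# Hypothetical access control list: principal -> permitted operations.
ACL = {"alice": {"read", "write"}, "bob": {"read"}}


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that authenticates the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


def authorized(principal: str, operation: str) -> bool:
    return operation in ACL.get(principal, set())


def handle(principal, operation, message, tag):
    """Authenticate the message first, then check authorization."""
    if not verify(message, tag):
        return "rejected: bad authentication tag"
    if not authorized(principal, operation):
        return "rejected: not authorized"
    return "accepted"


msg = b"write /shared/report.txt"
tag = sign(msg)
```

With these definitions, `handle("alice", "write", msg, tag)` is accepted, the same request from `bob` is refused by the access check, and any change to `msg` invalidates the tag.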
Design for BIG-DATA
Emergence of big data:
- Big data is a foundational element of social networking and Web 2.0-based information companies.
- An enormous amount of data is generated as a result of democratization and ecosystem factors such as:
  - Mobility trends
  - Data access and consumption
  - Ecosystem capabilities
Design for BIG-DATA
- Mobility trends: mobile devices, mobile events and sharing, and sensory integration
- Data access and consumption: Internet, interconnected systems, social networking, and convergent interfaces and access models
- Ecosystem capabilities: major changes in the information processing model, the availability of an open source framework, general-purpose computing, and unified network integration
Summary
- Given the system architectures, we summarized the important design and implementation issues.
- These issues include object models and naming schemes, interprocess communication and synchronization, data sharing and replication, and failure and recovery.
- These problems are unique to distributed systems.
References
[1] Randy Chow and Theodore Johnson, 1997, “Distributed Operating Systems & Algorithms”, Addison-Wesley, pp. 45-50, 61-63.
[2] Suresh Sridharan, 2006, “Distributed Operating Systems”, University of Wisconsin, Madison. 9/Writeups/Survey.pdf
[3] Chandy, K. Mani, Jayadev Misra, and Laura M. Haas. “Distributed deadlock detection.” ACM Transactions on Computer Systems (TOCS) 1.2 (1983):
[4] Holliday, J., and Amr El Abbadi. “Distributed deadlock detection.” Encyclopedia of Distributed Computing. Kluwer Academic Publishers, Dordrecht (accepted for publication) (2005).
[5] Babaoglu, Ozalp, and Keith Marzullo. “Consistent global states of distributed systems: Fundamental concepts and mechanisms.” Distributed Systems 2 (1993): 12.
[6] Krishna Sankar, Andrew Balinsky, Darrin Miller, Sri Sundaralingam. (Feb 18, 2005) “EAP Authentication Protocols for WLANs”.
[7] Bohlouli, Mahdi, et al. “Towards an Integrated Platform for Big Data Analysis.” Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives. Springer Berlin Heidelberg,
[8] Wolf, Marilyn. “Computers as components: principles of embedded computing system design.” Access Online via Elsevier,
[9] Provost, Foster, and Tom Fawcett. “Data Science and its Relationship to Big Data and Data-Driven Decision Making.” Big Data 1.1 (2013):