CS 843 - Distributed Computing Systems Chapter 6: Operating System Support Chin-Chih Chang, chang@cs.twsu.edu From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edition 3, © Addison-Wesley 2001
Introduction An important aspect of distributed systems is resource sharing. Applications (clients) and services (resource managers) use middleware for their interaction. Middleware provides remote invocation between objects or processes at the nodes of a distributed system.
Software and hardware service layers in distributed systems
Operating System Layer How well can the operating system meet the requirements of middleware? Those requirements include efficient and robust access to physical resources, and the flexibility to implement a variety of resource-management policies. An operating system is software that controls access to the underlying resources: processors, memory, communications, and storage media.
Two Concepts of Operating Systems Network operating system (there are multiple system images): It has networking capability built in and can be used to access remote resources; for example, a user uses rlogin or telnet to log in to another computer. It does not schedule processes across nodes. Examples: UNIX, MacOS, Windows. Distributed operating system (there is a single system image): It controls all nodes and transparently places new processes at suitable nodes.
Middleware and Network Operating System (NOS) There are no distributed operating systems in general use, for two reasons: Users already rely on NOS-based software that meets their current problem-solving needs. Users prefer to have a degree of autonomy over their machines. The combination of middleware and NOS provides some autonomy together with network-transparent resource access. The NOS enables users to run word processors and other stand-alone applications. Middleware enables them to take advantage of services available in their distributed system.
Operating System Layer Users will only be satisfied if their middleware-OS combination has good performance. Middleware runs on a variety of OS-hardware combinations at the nodes. The OS running at a node provides its own abstractions of the local hardware. Middleware utilizes these local resources to implement remote invocation between objects or processes at the nodes.
Figure 6.1 System layers
OS layer supports the middleware Figure 6.1 shows how the operating system layer at each of two nodes supports a common middleware layer. Kernels and server processes are the chief components that manage resources and present clients with an interface to them. They require: Encapsulation: They provide a useful interface to their resources. Protection: Resources are protected from illegitimate access; for example, files are protected from being read by users without read permission. Concurrent processing: Clients may share resources and access them concurrently.
OS layer supports the middleware Clients access resources by making: An RMI to a server object. A system call to a kernel. The following invocation-related tasks are performed: Communication: Operation parameters and results are passed over a network or within a computer. Scheduling: When an operation is invoked, its processing must be scheduled within the kernel or server. Figure 6.2 shows the core OS functionality that we shall be concerned with.
Figure 6.2 Core OS functionality
Core OS Functionality Process manager: Handles the creation of and operations upon processes. Thread manager: Thread creation, synchronization and scheduling. Communication manager: Communication between threads attached to different processes on the same computer, or to remote processes. Memory manager: Management of physical and virtual memory. Supervisor: Dispatching of interrupts, system call traps and other exceptions.
Protection Shared resources require protection from illegitimate accesses. The threat to a system’s integrity comes not only from malicious code but also from benign code with errors. For example, suppose that open files have only two operations: read and write. Protection involves: Ensuring that file operations can be performed only by clients with the right to perform them. Preventing a misbehaving client from performing an operation that is not specified, for example accessing the file pointer directly.
Protection We can protect resources from illegitimate invocations such as setFilePointerRandomly by the following methods: We can use a type-safe programming language, such as Java or Modula-3. No module may access a target module unless it has a reference to it. We can employ hardware support to protect modules from one another at the level of individual invocations. This protection mechanism needs to be built into a kernel.
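The language-level form of this protection can be sketched in Java. This is a minimal illustration, not code from the course: the class name ProtectedFile and its two-pointer layout are assumptions. Clients hold a reference to the object and can call only read and write; no operation such as setFilePointerRandomly is exported, and the pointers are private.

```java
// Hypothetical open file that exports only read and write.
final class ProtectedFile {
    private final byte[] data;      // file contents, hidden from clients
    private int readPointer = 0;    // private: no client can reposition it
    private int writePointer = 0;   // private: no client can reposition it

    ProtectedFile(int capacity) {
        data = new byte[capacity];
    }

    /** Appends one byte at the write pointer. */
    void write(byte value) {
        if (writePointer == data.length) {
            throw new IllegalStateException("file is full");
        }
        data[writePointer++] = value;
    }

    /** Returns the next unread byte (0..255), or -1 if none is left. */
    int read() {
        if (readPointer == writePointer) {
            return -1;
        }
        return data[readPointer++] & 0xFF;
    }

    public static void main(String[] args) {
        ProtectedFile file = new ProtectedFile(16);
        file.write((byte) 7);
        System.out.println(file.read());
        System.out.println(file.read());
        // In any other class, "file.readPointer = 5;" is a compile-time error.
    }
}
```

The compiler enforces the restriction here; the hardware-supported alternative enforces it at run time through the kernel.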
Kernel and Protection The kernel sets up address spaces to protect itself and other processes. A process cannot access memory outside its address space. A process can safely transfer (switch) from a user-level address space to the kernel’s address space via an interrupt or a system call trap. Programs pay a price for protection: switching between address spaces may take many processor cycles.
Processes and Threads Nowadays, a process consists of an execution environment together with one or more threads. A thread is the operating system abstraction of an activity. An execution environment is the unit of resource management which primarily consists of: An address space Thread synchronization and communication resources Open files and windows
Processes and Threads Execution environments are expensive to create and manage, but several threads can share them. The use of threads can be very helpful within servers, where concurrent processing of clients’ requests can reduce the tendency for servers to become bottlenecks. For example, one thread can process a client’s request while a second thread waits for a disk access to complete.
Address Spaces An address space is a unit of management of a process’ virtual memory. It is large and can be up to 2^64 bytes. It consists of one or more regions. A region (Figure 6.3) is an area of contiguous virtual memory and has the following properties: Its extent (lowest virtual address and size). Read/write/execute permissions for the process’ threads. Whether it can be grown upwards or downwards.
Figure 6.3 Address space
Address Spaces This model is page-oriented rather than segment-oriented. This address space model has three general regions: A fixed text region contains code. A heap contains initialized values and dynamically allocated variables and extends towards higher addresses. A stack holds the activation records of routine calls (parameters and local variables) and extends towards lower addresses.
Address Spaces A shared memory region is a region that can be accessed by several processes. The uses of shared regions include the following: Libraries: Library code can be very large and would waste considerable memory if it were loaded separately into every process that uses it. Kernel: Often the kernel code and data are mapped into every address space at the same location. Then there is no need to switch address-space mappings during a system call or an exception. Data sharing and communication: Two processes might need to share data in order to cooperate on some task.
Creating new Processes Process creation in UNIX: The fork system call creates a new process with an execution environment inherited from the caller (the parent process). The exec system call replaces the process’s memory space with a new program. The wait system call moves the parent process off the ready queue until the termination of the child. Creating a new process has two aspects in a distributed system: Choosing a target host. Creating the execution environment.
Choosing the Target Host On which node will the process be created? It depends on the policies used: Transfer policy: Determines whether to place the new process locally or remotely. Location policy: Determines which node should host the process. Location policies are of two types: Static: The policy is based on a mathematical analysis, without regard to the current state of the system. Adaptive: The policy applies heuristics based on unpredictable run-time factors, such as the current load, to make the location decision.
Load-Sharing Systems Load-sharing systems may be: Centralized: There is one centralized load manager. Hierarchical: Load managers are arranged in a tree structure. Decentralized: Load managers exchange information with one another. Algorithms may also be: Sender-initiated: The node that requests a new process initiates the transfer decision. Receiver-initiated: A node whose load is low advertises its availability. Transferring a process from one node to another is known as process migration.
Creating New Execution Environments Two methods can determine the content of a new environment. Statically defined: The address space regions are initialized from an executable file or filled with zeros. Dynamically defined: The address space is defined with respect to an existing execution environment. In UNIX fork semantics, the newly created child process physically shares the parent’s text region, and has heap and stack regions that are copies of the parent’s.
Creating New Execution Environments When based on an existing execution environment, an inherited region may be: Shared with the parent. Copied from the parent’s region. Mach and Chorus apply an optimization called copy-on-write when an inherited region is copied from the parent (Figure 6.4). The inherited region is initially shared between the parent and child address spaces. A page is physically copied only when one or other process attempts to modify it. Copy-on-write can also be used when copying large messages.
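The difference between static and adaptive location policies can be sketched as follows. The class and method names are hypothetical, and round-robin is used only as a stand-in for a static rule that ignores system state; the adaptive rule assumes that each load manager has already collected a load figure per host.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of two location policies.
final class LocationPolicies {
    /** Static policy: a fixed round-robin rule that ignores current load. */
    static String staticChoice(List<String> hosts, int processNumber) {
        return hosts.get(Math.floorMod(processNumber, hosts.size()));
    }

    /** Adaptive policy: choose the host with the lowest reported load. */
    static String adaptiveChoice(Map<String, Double> loadByHost) {
        String best = null;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> entry : loadByHost.entrySet()) {
            if (entry.getValue() < bestLoad) {
                bestLoad = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(staticChoice(List.of("a", "b", "c"), 4));
        System.out.println(adaptiveChoice(Map.of("a", 0.9, "b", 0.2, "c", 0.5)));
    }
}
```

In a sender-initiated scheme the node creating the process would call such a function; in a receiver-initiated scheme the load map would instead be filled by advertisements from lightly loaded nodes.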
Figure 6.4 Copy-on-write: (a) before write, (b) after write. Region RB in process B’s address space is copied from region RA in process A’s address space. The kernel holds A’s page table and B’s page table, which initially refer to the same shared frame.
Thread vs. Process A thread, sometimes called a lightweight process (LWP), is a basic unit of CPU utilization. It comprises a thread ID, a program counter, a register set, and a stack. A traditional (heavyweight) process has a single thread of control. If a process has multiple threads of control, it can do more than one task at a time.
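Copy-on-write can be imitated in a few lines of Java. This is only a user-level simulation under stated assumptions: the names CowRegion and Frame are hypothetical, a reference count stands in for the kernel's bookkeeping, and an explicit check in write() plays the part of the page fault taken on a write-protected page. It is not thread-safe.

```java
// Hypothetical simulation of a copy-on-write region.
final class CowRegion {
    /** A physical frame plus the number of page tables that map it. */
    private static final class Frame {
        final byte[] bytes;
        int refs = 1;
        Frame(byte[] bytes) { this.bytes = bytes; }
    }

    private final Frame[] pageTable;

    CowRegion(int pages, int pageSize) {
        pageTable = new Frame[pages];
        for (int i = 0; i < pages; i++) {
            pageTable[i] = new Frame(new byte[pageSize]);
        }
    }

    private CowRegion(Frame[] inherited) {
        pageTable = inherited;
    }

    /** Inherit the region: the page table is copied, the frames are shared. */
    CowRegion inherit() {
        Frame[] childTable = pageTable.clone();
        for (Frame frame : childTable) {
            frame.refs++;
        }
        return new CowRegion(childTable);
    }

    byte read(int page, int offset) {
        return pageTable[page].bytes[offset];
    }

    /** Write: a frame that is still shared is physically copied first. */
    void write(int page, int offset, byte value) {
        Frame frame = pageTable[page];
        if (frame.refs > 1) {
            frame.refs--;
            frame = new Frame(frame.bytes.clone());
            pageTable[page] = frame;
        }
        frame.bytes[offset] = value;
    }

    /** True if both regions still map the same frame for this page. */
    boolean sharesFrameWith(CowRegion other, int page) {
        return pageTable[page] == other.pageTable[page];
    }

    public static void main(String[] args) {
        CowRegion ra = new CowRegion(2, 4);
        CowRegion rb = ra.inherit();
        rb.write(0, 0, (byte) 9);
        System.out.println(ra.sharesFrameWith(rb, 0));
        System.out.println(ra.sharesFrameWith(rb, 1));
    }
}
```

Only the page that was written is copied; the untouched page remains shared, which is the saving the optimization aims for.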
Single and Multithreaded Processes
Threads concept and implementation Figure labels: a process contains thread activations, each with an activation stack (parameters, local variables); a heap (dynamic storage, objects, global variables); 'text' (program code); and system-provided resources (sockets, windows, open files). Question: what represents the activation of each thread? Answer: See Figure 6.7 for full details, but essentially: the current instruction pointer and other registers, and a pointer to its activation stack. Box on p. 215: An analogy for threads and processes. The following memorable, if slightly unsavoury, way to think of the concepts of threads and execution environments was seen on the comp.os.mach USENET group and is due to Chris Lloyd. An execution environment consists of a stoppered jar and the air and food within it. Initially, there is one fly – a thread – in the jar. This fly can produce other flies and kill them, as can its progeny. Any fly can consume any resource (air or food) in the jar. Flies can be programmed to queue up in an orderly manner to consume resources. If they lack this discipline, they might bump into one another within the jar – that is, collide and produce unpredictable results when attempting to consume the same resources in an unconstrained manner. Flies can communicate with (send messages to) flies in other jars, but none may escape from the jar, and no fly from outside may enter it. In this view, a standard UNIX process is a single jar with a single sterile fly within it.
Motivation An application is typically implemented as a separate process with several threads of control. For example, a web browser might have one thread display images or text while another thread retrieves data from the network. A busy web server is another case: rather than creating a separate process for each request, it is more efficient for one process that contains multiple threads to serve the same purpose. This approach multithreads the web-server process.
Figure 6.5 Client and server with threads. Client: thread 1 generates results; thread 2 makes requests to the server. Server: an input-output thread handles receipt and queuing of requests, which are served by N threads.
Benefits Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user. Resource Sharing: Threads share the memory and the resources of the process to which they belong. Economy: Allocating memory and resources for process creation is costly; because threads share the resources of their process, they are more economical to create. Utilization of MP (multiprocessor) Architectures: Each thread may run in parallel on a different processor.
Threading Architectures Worker Pool Architecture (Figure 6.5): The server creates a fixed pool of worker threads. Benefit: It is easy to implement. Drawback: It is inflexible. The pool may hold too few workers for the current request rate, and the shared queue causes a high level of switching between the I/O and worker threads. Thread-Per-Request Architecture (Figure 6.6a): The I/O thread spawns a new worker thread for each request. Once the worker finishes the request, it destroys itself. Benefit: The threads do not contend for a shared queue. Drawback: The overhead of thread creation and destruction.
Threading Architectures Thread-Per-Connection Architecture (Figure 6.6b): The server creates a new worker thread for each connection. Once the client closes the connection, the server destroys the thread. Thread-Per-Object Architecture (Figure 6.6c): The server associates a worker thread with each remote object. Benefit (of both): Lower thread-management overhead than thread-per-request. Drawback (of both): Clients may be delayed while one worker thread has several outstanding requests but another thread has no work to do.
Figure 6.6 Alternative server threading architectures: (a) thread-per-request, with an I/O thread and workers; (b) thread-per-connection, with per-connection threads; (c) thread-per-object, with an I/O thread and per-object threads. In each case the server process contains the remote objects. Notes: These threads are implemented by the server-side ORB in CORBA. (a) would be useful for a UDP-based service, e.g. the Network Time Protocol (NTP). (b) is the most commonly used; it matches the TCP connection model. (c) is used where the service is encapsulated as an object, e.g. multiple shared whiteboards with one thread each. Each object has only one thread, avoiding the need for thread synchronization within objects.
Threads within Clients Threads can be useful for clients as well as servers. Figure 6.5 shows a client process with two threads. The first thread generates values to be passed to a server and puts them in a buffer. The second thread reads the values from the buffer and performs the remote invocations. Multi-threaded clients are used in Web browsers: users would otherwise experience long delays while pages are fetched, so the browser handles multiple concurrent requests.
Threads vs. Multiple Processes Why are threads better than multiple processes? Threads are cheaper to create. Resource sharing can be achieved more efficiently. Figure 6.7 shows some of the main state components that must be maintained for execution environments and threads. A software interrupt is an event that causes a thread to be interrupted and control to be transferred to an event handler.
Figure 6.7 State associated with execution environments and threads
Execution environment: address space tables; communication interfaces, open files; semaphores, other synchronization objects; list of thread identifiers.
Thread: saved processor registers; priority and execution state (such as BLOCKED); software interrupt handling information; execution environment identifier.
Both: pages of address space resident in memory; hardware cache entries.
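The worker-pool architecture can be sketched with the standard java.util.concurrent executor, which keeps a shared queue in front of a fixed set of threads. This is a minimal sketch, not the server from the course: WorkerPoolServer and its handle method are hypothetical, and the submitting thread plays the role of the I/O thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical worker-pool server built on a fixed thread pool.
final class WorkerPoolServer {
    /** Queues every request for the pool and returns replies in request order. */
    static List<String> serve(List<String> requests, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                // The "I/O" thread only queues work; a worker picks it up.
                Callable<String> task = () -> handle(request);
                pending.add(pool.submit(task));
            }
            List<String> replies = new ArrayList<>();
            for (Future<String> future : pending) {
                replies.add(future.get());
            }
            return replies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical request handler: upper-cases the request text. */
    static String handle(String request) {
        return request.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(serve(List.of("get a", "get b", "get c"), 2));
    }
}
```

A thread-per-request variant would replace the pool with a new Thread per request, trading the shared queue for thread creation and destruction costs.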
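The two-thread client can be sketched as follows, assuming a bounded buffer between the threads. The names TwoThreadClient, remoteSquare and the END marker are hypothetical; remoteSquare merely stands in for a blocking remote invocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical client: one thread generates values, another invokes the server.
final class TwoThreadClient {
    private static final int END = -1;   // hypothetical end-of-stream marker

    /** Stand-in for a blocking remote invocation. */
    static int remoteSquare(int x) {
        return x * x;
    }

    static List<Integer> run(int count) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(4);
        List<Integer> results = new ArrayList<>();

        Thread generator = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    buffer.put(i);            // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread invoker = new Thread(() -> {
            try {
                for (int v = buffer.take(); v != END; v = buffer.take()) {
                    results.add(remoteSquare(v));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        generator.start();
        invoker.start();
        try {
            generator.join();
            invoker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

The generator is never held up by a slow invocation unless the buffer fills, which is the point of separating the two activities.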
Threads vs. Multiple Processes Summary: Threads are cheaper to create. Resource sharing can be achieved more efficiently. Context Switching to a different thread within the same process is cheaper than context switching between threads belonging to different processes. Threads may share data more conveniently and efficiently. However, threads within the same process are not protected from each other.
Thread Programming Thread programming is concurrent programming, as traditionally studied in the field of operating systems. It involves the concepts of race condition, critical section, monitor, condition variable and semaphore. Thread programming can be done: With a threads library: the Mach operating system; IEEE POSIX 1003.4a (the pthreads library). In a programming language: Ada95, Modula-3, Java.
Java Thread Programming Java provides methods for creating threads, destroying them and synchronizing them. The Java Thread class includes the constructor and management methods listed in Figure 6.8. The Thread and Object synchronization methods are in Figure 6.9. Thread lifetimes: new: the thread is created in the SUSPENDED state. start / run: the thread is made RUNNABLE and executes its run() method. destroy: the thread is terminated.
Figure 6.8 Java thread constructor and management methods
Thread(ThreadGroup group, Runnable target, String name): Creates a new thread in the SUSPENDED state, which will belong to group and be identified as name; the thread will execute the run() method of target.
setPriority(int newPriority), getPriority(): Set and return the thread’s priority.
run(): A thread executes the run() method of its target object, if it has one, and otherwise its own run() method (Thread implements Runnable).
start(): Change the state of the thread from SUSPENDED to RUNNABLE.
sleep(int millisecs): Cause the thread to enter the SUSPENDED state for the specified time.
yield(): Enter the READY state and invoke the scheduler.
destroy(): Destroy the thread.
Figure 6.9 Java thread synchronization calls
thread.join(int millisecs): Blocks the calling thread for up to the specified time until thread has terminated.
thread.interrupt(): Interrupts thread: causes it to return from a blocking method call such as sleep().
object.wait(long millisecs, int nanosecs): Blocks the calling thread until a call made to notify() or notifyAll() on object wakes the thread, or the thread is interrupted, or the specified time has elapsed.
object.notify(), object.notifyAll(): Wakes, respectively, one or all of any threads that have called wait() on object.
Java Thread Programming Programs can manage threads in groups, which is useful when several applications coexist on the same JVM. Groups provide a security mechanism: by default, a thread in one group cannot perform management operations on threads in another group. Thread groups also facilitate control of the relative priorities of threads. This is useful for browsers running applets, and for servers running servlets. A servlet is a program running on a server that creates dynamic Web pages. A thread within an applet or servlet can only create a new thread within its own group.
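The constructor and management calls of Figures 6.8 and 6.9 fit together as in the sketch below. The class name ThreadLifecycleDemo and the thread name are assumptions for illustration. Note that the destroy() call listed in Figure 6.8 is not available in current JDKs; in this sketch the thread simply ends when its run() method returns.

```java
// Hypothetical walk through a thread's lifetime: create, start, join.
final class ThreadLifecycleDemo {
    /** Runs one named thread to completion and returns what it logged. */
    static String runWorker(String name) {
        StringBuilder log = new StringBuilder();
        Runnable target = () ->
                log.append("run by ").append(Thread.currentThread().getName());

        // A null group places the thread in its creator's group.
        Thread worker = new Thread(null, target, name);
        worker.setPriority(Thread.NORM_PRIORITY);
        worker.start();                       // the thread becomes runnable
        try {
            worker.join();                    // a timed join(millis) also exists
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.toString();                // safe to read after join()
    }

    public static void main(String[] args) {
        System.out.println(runWorker("worker-1"));
    }
}
```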
Thread Synchronization Programming a multi-threaded process raises the following difficulties: the sharing of objects, and the techniques used for thread coordination and cooperation. A race condition can occur when threads manipulate shared data; thread synchronization prevents it. Java provides the synchronized keyword for thread coordination using monitors. A monitor guarantees that at most one thread can execute within it at any time. We can serialize the actions on an object by declaring its methods, or blocks of code within them, as synchronized.
Thread Synchronization Java allows threads to be blocked and woken up via arbitrary objects that act as condition variables. A thread that needs to block to wait for an event calls the wait method on such an object. Another thread calls the notify method on the same object to unblock it.
Java Thread Programming Thread Scheduling Preemptive scheduling: A thread may be suspended at any point to make way for another thread. Non-preemptive (coroutine) scheduling: A thread runs until it makes a call to the threading system, which may then deschedule it and schedule another thread. Benefit: Any section of code that does not contain a call to the threading system is automatically a critical section. Drawback: Multiprocessors cannot be utilized. The yield method allows other threads to be scheduled. Java does not by default provide real-time scheduling, but real-time implementations exist.
Thread Implementations Kernel-level threading: Windows 2000/XP, Solaris, Mach and Chorus. User-level threading: POSIX. User-level threads have the following drawbacks: The threads cannot take advantage of a multiprocessor. A thread that takes a page fault blocks the entire process. Threads within different processes cannot be scheduled according to a single scheme of relative prioritization.
Thread Implementations User-level threads have the following advantages: Thread creation is less costly. Thread scheduling can be customized or changed to suit specific applications. Many more user-level threads can be supported than the kernel could provide. Hybrid approaches can gain some advantages of both: user-level hints to the kernel scheduler (Mach); hierarchic threads (Solaris 2); event-based scheduling (SPIN, FastThreads; Figure 6.10).
Solaris 2 Threads Solaris 2 is a version of UNIX with support for threads at the kernel and user levels, SMP, and real-time scheduling. Solaris 2 implements the Pthread API and UI threads. Between user-level and kernel-level threads are lightweight processes (LWPs). The user-level threads in a process are multiplexed onto its LWPs. Each LWP corresponds to a kernel thread. A bound user-level thread is permanently attached to an LWP; an unbound thread is not.
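The monitor and condition-variable facilities can be sketched with a small queue. The class name RequestQueue is hypothetical. Both methods are synchronized, so at most one thread executes inside the object at a time; take() waits on the object while the queue is empty, and put() notifies it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical monitor: a queue guarded by synchronized, wait and notify.
final class RequestQueue {
    private final Deque<String> items = new ArrayDeque<>();

    /** Adds an item and wakes one thread blocked in take(), if any. */
    synchronized void put(String item) {
        items.addLast(item);
        notify();
    }

    /** Blocks until an item is available, then removes and returns it. */
    synchronized String take() {
        try {
            while (items.isEmpty()) {     // re-check after every wake-up
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return items.removeFirst();
    }

    /** One consumer thread blocks in take() until the caller puts an item. */
    static String demo(String item) {
        RequestQueue queue = new RequestQueue();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> received[0] = queue.take());
        consumer.start();
        queue.put(item);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
    }
}
```

The wait() call sits in a loop because a woken thread must re-test its condition before proceeding; without synchronized, two threads updating the queue at once would be a race condition.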
Solaris 2 Threads
Solaris Process
Figure 6.10 Scheduler activations
Communication and Invocation The design issues of concern are: Communication primitives. Protocols and openness. Measures to make communication efficient. Support for high-latency and disconnected operation. Communication primitives are found in some research kernels. For example, Amoeba provides doOperation, getRequest and sendReply.
Communication and Invocation Middleware provides most high-level communication facilities, including: RPC/RMI. Event notification. Group communication. Middleware is developed over sockets, which are found in all common operating systems. The principal reasons for using sockets are portability and interoperability.
Communication and Invocation An operating system should provide standard protocols that enable internetworking between middleware implementations on different platforms. Protocols are normally arranged in a stack of layers. Most operating systems integrate their protocol layers statically. Dynamic protocol composition is a technique whereby a protocol stack can be composed on the fly to meet the requirements of a particular application.
Support for Communication and Invocation The performance of RPC and RMI mechanisms is critical for effective distributed systems. Figure 6.11 shows a case of a system call and a remote invocation. A null procedure call measures the fixed overhead (the latency) of an invocation. Typical times: a local null procedure call takes less than 1 microsecond; a remote null procedure call takes about 10 milliseconds. Of that time, the network transfer of about 100 bytes at 100 megabits/sec accounts for only about 0.01 millisecond. The remaining delay must therefore lie in the OS and middleware, not in communication time.
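The network-time figure on this slide can be checked with a short calculation, using only the numbers quoted above (about 100 bytes transferred, 100 megabits/sec, about 10 ms for the whole null RPC):

```latex
\[
t_{\text{network}}
  = \frac{100\ \text{bytes} \times 8\ \text{bits/byte}}{100 \times 10^{6}\ \text{bits/s}}
  = 8 \times 10^{-6}\ \text{s}
  = 0.008\ \text{ms} \approx 0.01\ \text{ms}
\]
\[
\frac{t_{\text{network}}}{t_{\text{null RPC}}}
  \approx \frac{0.008\ \text{ms}}{10\ \text{ms}}
  = 0.0008 < 0.1\%
\]
```

On these figures the wire time is well under a thousandth of the total delay, which is why the slide attributes the rest to OS and middleware latency.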
Figure 6.11 Invocations between Address Spaces
Support for Communication and Invocation Figure 6.12 shows client delay against requested data size. The delay is roughly proportional to the size until the size reaches a threshold at about the network packet size. Factors affecting RPC/RMI performance: marshalling and unmarshalling; data copying (from application to kernel space, across protocol layers, to communication buffers); thread scheduling and context switching (including kernel entry); protocol processing (for each protocol layer); network access delays (connection setup and network latency).
Figure 6.12 RPC Delay against Parameter Size
Implementation of invocation mechanisms Shared memory may be used for rapid communication between a user process and the kernel, or between user processes. Most invocation middleware (CORBA, Java RMI, HTTP) is implemented over TCP. TCP is chosen for its universal availability, unlimited message size and reliable transfer. Sun RPC (used in NFS) is implemented over both UDP and TCP and generally works faster over UDP.
Implementation of Invocation Mechanisms Research-based systems have implemented much more efficient invocation protocols, e.g.: Firefly RPC (see www.cdk3.net/oss); Amoeba's doOperation, getRequest and sendReply primitives (www.cdk3.net/oss); Lightweight RPC [Bershad et al. 1990], described on pp. 237-9 and shown in Figure 6.13.
Bershad's LRPC It uses shared memory for interprocess communication while maintaining protection of the two processes. Arguments are copied only once (versus four times for conventional RPC). Client threads can execute server code only via protected entry points (using capabilities). It is up to 3 times faster for local invocations.
Figure 6.13 A lightweight remote procedure call
Implementation of Invocation Mechanisms High latency is common in a wireless environment. A technique for defeating high latencies is asynchronous operation, which includes concurrent and asynchronous invocations. Concurrent invocations: The middleware provides only blocking invocations, but the application spawns multiple threads to perform blocking invocations concurrently. A Web browser is an example: the browser fetches images concurrently. Figure 6.14 shows the potential benefits of interleaving invocations between a client and a single server.
Figure 6.14 Times for serialized and concurrent invocations
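Concurrent invocations over a blocking interface can be sketched as below. The class ConcurrentInvocations and its invoke method are hypothetical; invoke sleeps briefly to mimic a blocking remote call, and the application spawns one thread per invocation so the waits overlap instead of adding up.

```java
// Hypothetical client that overlaps several blocking invocations.
final class ConcurrentInvocations {
    /** Stand-in for a blocking remote invocation with some latency. */
    static String invoke(String request) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply:" + request;
    }

    /** Issues every request on its own thread and collects the replies. */
    static String[] invokeAll(String... requests) {
        String[] replies = new String[requests.length];
        Thread[] threads = new Thread[requests.length];
        for (int i = 0; i < requests.length; i++) {
            final int slot = i;               // each thread fills its own slot
            threads[i] = new Thread(() -> replies[slot] = invoke(requests[slot]));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return replies;
    }

    public static void main(String[] args) {
        for (String reply : invokeAll("image1", "image2", "image3")) {
            System.out.println(reply);
        }
    }
}
```

With serialized calls the total delay would be roughly the sum of the individual latencies; here it is closer to the longest single one, subject to scheduling.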
Implementation of Invocation Mechanisms Asynchronous invocations: An asynchronous invocation is performed asynchronously with respect to the caller; the middleware or application does not block waiting for the reply to each invocation. Persistent asynchronous invocations: The system tries indefinitely to perform the invocation until it is known to have succeeded or failed, or until the application cancels it. An example is QRPC (Queued RPC), which queues outgoing invocation requests in a stable log while there is no network connection and schedules their dispatch over the network to servers when there is a connection.
Operating System Architecture An open distributed system should make it possible to plug and play modules, that is, to: Run only the necessary software modules at each node. Allow a software module to be changed without affecting other facilities. Allow alternative modules providing the same service to be used. Introduce new services without harming the integrity of existing ones.
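The queuing idea behind QRPC can be sketched as follows. This is not the QRPC implementation: the class QueuedInvoker is hypothetical, an in-memory deque stands in for the stable log, a Function stands in for the remote server, and the sketch is single-threaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical queued invoker: requests wait in a log until connected.
final class QueuedInvoker {
    private final Deque<String> log = new ArrayDeque<>();   // stands in for the stable log
    private final List<String> replies = new ArrayList<>();
    private final Function<String, String> server;          // stands in for the remote server
    private boolean connected = false;

    QueuedInvoker(Function<String, String> server) {
        this.server = server;
    }

    /** Returns at once; the request is logged and sent when possible. */
    void invoke(String request) {
        log.addLast(request);
        dispatch();
    }

    /** Called when the network connection comes up or goes down. */
    void setConnected(boolean up) {
        connected = up;
        dispatch();
    }

    /** Sends logged requests in order while there is a connection. */
    private void dispatch() {
        while (connected && !log.isEmpty()) {
            replies.add(server.apply(log.removeFirst()));
        }
    }

    List<String> replies() {
        return replies;
    }

    int pending() {
        return log.size();
    }

    public static void main(String[] args) {
        QueuedInvoker invoker = new QueuedInvoker(request -> "ok:" + request);
        invoker.invoke("update 1");
        invoker.invoke("update 2");
        System.out.println(invoker.pending());
        invoker.setConnected(true);
        System.out.println(invoker.replies());
    }
}
```

A real system would write the log to stable storage so that queued requests survive a crash, and would retry requests whose outcome is unknown.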
Operating System Architecture The separation of fixed resource management mechanisms from resource management policies has been a guiding principle in operating system design for a long time: The kernel would provide the most basic mechanisms upon which the general resource management tasks at a node are carried out. Server modules would be dynamically loaded as required. There are two key examples of kernel design: monolithic and microkernel approaches.
Operating System Architecture The UNIX operating system kernel has been called monolithic. This term suggests all basic operating system functions are coded in a non-modular way. Microkernel design: The kernel provides only the basic address spaces, threads, and local interprocess communication. Other services are dynamically loaded. Clients access these system services using the kernel’s message-based invocation mechanisms (Figure 6.15). The place of the microkernel is shown in Figure 6.16.
Figure 6.15 Monolithic Kernel and Microkernel
Figure 6.16 The Role of the Microkernel
Advantages and Disadvantages of Microkernel Advantages: Flexibility and extensibility: services can be added, modified and debugged. A small kernel has fewer bugs. Protection of services and resources is still maintained. Disadvantages: Service invocation is expensive unless LRPC is used. Services need extra system calls to access protected resources. Different approaches and improvements have been made in microkernel design.
Summary The OS provides local support for the implementation of distributed applications and middleware: It manages and protects system resources (memory, processing, communication). It provides relevant local abstractions: files, processes, threads, communication ports.
Summary Middleware provides general-purpose distributed abstractions: RPC, DSM (Distributed Shared Memory), event notification, streaming. Invocation performance is important and can be optimized, for example by Firefly RPC and LRPC. The microkernel architecture offers flexibility. The KISS principle ('Keep it simple, stupid!') has resulted in the migration of many functions out of the OS.