1
1 Distributed Scheduling In Sombrero, A Single Address Space Distributed Operating System Milind Patil
2
2 Contents Distributed Scheduling Features of Sombrero Goals Related Work Platform for Distributed Scheduling Distributed Scheduling Algorithm (Simulation) Scaling of the Algorithm (Simulation) Initiation of Porting to Sombrero Prototype Testing Conclusion Future Work
3
3 Distributed Scheduling A distributed scheduling algorithm provides for sharing as well as better usage of resources across the system. The algorithm will allow threads in the distributed system to be scheduled among the different processors in such a manner that CPU usage is balanced.
4
4 Features of Sombrero Distributed scheduling in Sombrero takes advantage of the distributed SASOS features: The shared memory inherent to a distributed SASOS provides an excellent mechanism to distribute load information of the nodes in the system (information policy). The ability of threads to migrate in a simple manner across machines has a potentially far-reaching effect on the performance of the distributed scheduling mechanism.
5
5 Features of Sombrero (contd.) The granularity of migration is a thread, not a process. This allows the distributed scheduling algorithm to have a flexible selection policy (which determines which thread is to be transferred to achieve load balancing). This feature also reduces the software complexity of the algorithm.
6
6 Goals Platform for Distributed Scheduling Simulation of Distributed Scheduling Algorithm Scaling of the Algorithm (Simulation) Initiation of Porting to Sombrero Prototype
7
7 Related Work Load-Balancing Algorithms for Sprite, PVM, Condor, UNIX
8
8 Requirements A working prototype of Sombrero is needed that has the ability to manage extremely large data sets across a network in a distributed single address space. A functional prototype is needed which implements essential features such as protection domains, Sombrero thread support, token tracking support, etc. The prototype is under construction and not available as a development platform. Windows NT is used since the prototype is being developed on it.
9
9 Architecture of Sombrero Nodes [Figure labels: two Sombrero nodes, each with Local Thread Information, Selection Policy, Communication Thread, and Distributed Scheduler; Load Table; Thread Migration]
10
10 Sombrero Clusters The Sombrero system is organized into hierarchies of clusters for scalable distributed scheduling. [Figure labels: Cluster I, Cluster II, Cluster III; Router 0x1, Router 0x11; RMOCB 0x1000 through RMOCB 0x7000; Load Table 0x1000, Load Table 0x2000, Load Table 0x5000]
11
11 Architecture of Sombrero Routers [Figure labels: Sombrero Router; I/O Completion Port; Service Threads; Socket]
12
12 Inter-node Communication Sombrero nodes communicate with each other through the routers. [Figure labels: RMOCB 0x1000, RMOCB 0x2000, RMOCB 0x3000; Router 0x1, Router 0x11; Cluster I, Cluster II]
13
13 Router Tables Router 0x1 CD R3 AB R1 A : B : R3:
14
14 Router Tables(contd.) Router 0x3 C : D : R1: CD R3 AB R1
15
15 Address Space Allocation This project implements an address space allocation mechanism to distribute the 2^64-byte address space amongst the nodes in the system. Example: Consider a system of four Sombrero nodes (A, B, C and D). The nodes come online for the very first time in the order A, B, C, D. [Figure labels: AB R1; CD R3]
16
16 Address Space Allocation (contd.) The address space allocated for the nodes when A is initialized will be:
A: 0x0000000000000000 – 0xffffffffffffffff
The address space allocated for the nodes when B is initialized will be:
A: 0x0000000000000000 – 0x7fffffffffffffff
B: 0x8000000000000000 – 0xffffffffffffffff
17
17 Address Space Allocation (contd.) The address space allocated for the nodes when C is initialized will be:
A: 0x0000000000000000 – 0x3fffffffffffffff
B: 0x8000000000000000 – 0xffffffffffffffff
C: 0x4000000000000000 – 0x7fffffffffffffff
The address space allocated for the nodes when D is initialized will be:
A: 0x0000000000000000 – 0x3fffffffffffffff
B: 0x8000000000000000 – 0xffffffffffffffff
C: 0x4000000000000000 – 0x5fffffffffffffff
D: 0x6000000000000000 – 0x7fffffffffffffff
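The four-node example can be reproduced with a minimal Python sketch, assuming that a joining node takes the upper half of an existing node's range. The slides do not say how the donor node is chosen, so the `donor` argument and the names `AddressSpace`, `join`, and `split_range` are hypothetical.

```python
def split_range(lo, hi):
    """Split the inclusive range [lo, hi] into a lower and an upper half."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


class AddressSpace:
    """Tracks which inclusive address range each node owns (illustrative only)."""

    def __init__(self, first_node):
        # The first node to come online owns the whole 2^64-byte space.
        self.ranges = {first_node: (0, 2**64 - 1)}

    def join(self, new_node, donor):
        # Assumption: the donor keeps the lower half, the new node gets the upper half.
        lower, upper = split_range(*self.ranges[donor])
        self.ranges[donor] = lower
        self.ranges[new_node] = upper


space = AddressSpace("A")
space.join("B", donor="A")
space.join("C", donor="A")  # donor choice inferred from the example ranges
space.join("D", donor="C")
for node, (lo, hi) in sorted(space.ranges.items()):
    print(f"{node}: 0x{lo:016x} - 0x{hi:016x}")
```

With donors A, A, and C for nodes B, C, and D respectively, the resulting ranges match the ones listed on the slides.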
18
18 Load Measurement A node’s workload can be estimated based on some measurable parameters: Total number of threads on the node at the time of load measurement. Instruction mixes of these threads (I/O bound or CPU bound).
19
19 Load Measurement (contd.) Work Load = $\sum_i (p_i f_i)$, where $p_i$ is the processor utilization of thread $i$ and $f_i$ is a heuristic factor (adjusts the importance of a thread depending on how it is being used). The heuristic factor $f$ should have a large value for I/O-intensive threads and a small value for CPU-intensive threads. The values of the heuristic factor can be determined empirically by using a fully functional Sombrero prototype.
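Reading the work load formula as a sum over the node's threads of utilization times heuristic factor, the computation can be sketched in Python as below. The function name `work_load` and the sample values are hypothetical; the slides note that real values of f would have to be determined empirically.

```python
def work_load(threads):
    """Return the sum of p_i * f_i over (p, f) pairs, one pair per thread.

    p is the thread's processor utilization and f its heuristic factor.
    """
    return sum(p * f for p, f in threads)


# Made-up sample: two threads as (utilization, heuristic factor) pairs.
sample = [(0.5, 2.0), (0.25, 4.0)]
print(work_load(sample))
```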
20
20 Load Measurement - Simulation In the simulation we assume that the processor utilization of all threads is the same; this is sufficient to prove the correctness of the algorithm. The measure of load at the node level is the number of Sombrero threads. A threshold policy has been defined:
high: number of Sombrero threads ≥ HIGHLOAD
medium: number of Sombrero threads < HIGHLOAD and number of Sombrero threads ≥ MEDIUMLOAD
low: number of Sombrero threads < MEDIUMLOAD
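The threshold policy can be sketched as a small Python function. This assumes the comparisons lost from the slide text are "greater than or equal to", which is what makes the three ranges cover every thread count without overlap. The numeric values given to `MEDIUMLOAD` and `HIGHLOAD` are placeholders, since the slides do not state them.

```python
MEDIUMLOAD = 4  # placeholder value, not from the slides
HIGHLOAD = 8    # placeholder value, not from the slides


def classify(thread_count, medium=MEDIUMLOAD, high=HIGHLOAD):
    """Map a node's Sombrero thread count to 'low', 'medium', or 'high'."""
    if thread_count >= high:
        return "high"
    if thread_count < medium:
        return "low"
    return "medium"


for count in (3, 4, 7, 8):
    print(count, classify(count))
```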
21
21 Load Tables Shared memory is used to distribute load information. (In Sombrero the shared memory consistency is managed by the token tracking mechanism) One load table is needed for each cluster. Thresholds of load have been established to minimize the exchange of load information in the network. Only threshold crossings are recorded in the load table.
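The idea of recording only threshold crossings can be illustrated with a minimal Python sketch. The class `LoadTable` and its `report` method are hypothetical names; in Sombrero the table would live in shared memory kept consistent by token tracking, which this in-process sketch does not model.

```python
class LoadTable:
    """One table per cluster; stores each node's current load level."""

    def __init__(self):
        self.levels = {}  # node name -> "low" | "medium" | "high"
        self.writes = 0   # counts updates that actually touched the table

    def report(self, node, level):
        """Record the level only if it differs from the stored one.

        Returns True when a threshold crossing was written.
        """
        if self.levels.get(node) == level:
            return False  # same band as before: nothing to propagate
        self.levels[node] = level
        self.writes += 1
        return True


table = LoadTable()
table.report("A", "low")
table.report("A", "low")   # no crossing, no write
table.report("A", "high")  # crossing, written
print(table.levels, table.writes)
```

Skipping writes when the load band is unchanged is what keeps the exchange of load information small.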
22
22 Distributed Scheduling Algorithm Highly loaded nodes in minority: sender-initiated algorithm. Lightly loaded nodes in minority: receiver-initiated algorithm. Medium loaded nodes are not considered.
23
23 Distributed Scheduling Algorithm The algorithm used is dynamic, i.e., sender initiated at lower loads and receiver initiated at higher loads. 1. Nodes loaded in the medium range do not participate in load balancing. 2. Load balancing is not done if the node belongs to the majority (the larger of the groups of highly or lightly loaded nodes).
24
24 Distributed Scheduling Algorithm 3. Load balancing is done if the node belongs to the minority (the smaller of the groups of highly or lightly loaded nodes). If the node is heavily loaded, the algorithm is sender initiated: a lightly loaded node is chosen at random and the RGETTHREADS message protocol is followed for thread migration. If the node is lightly loaded, the algorithm is receiver initiated: a highly loaded node is chosen at random and the GETTHREADS message protocol is followed for thread migration.
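Rules 1 to 3 can be sketched in Python as a per-node decision function. The function name `plan_migration`, the dictionary form of the load table, and the tie-breaking rule (when both groups are the same size, the highly loaded nodes act) are assumptions; the slides do not cover ties. Only the protocol names RGETTHREADS and GETTHREADS come from the slides, and the message exchange itself is not modeled.

```python
import random


def plan_migration(node, load_table, rng=random):
    """Return (protocol, target) if `node` should start load balancing, else None.

    load_table maps node name -> "low" | "medium" | "high".
    """
    level = load_table[node]
    if level == "medium":
        return None  # rule 1: medium nodes do not participate

    high = [n for n, lvl in load_table.items() if lvl == "high"]
    low = [n for n, lvl in load_table.items() if lvl == "low"]

    if level == "high":
        # Sender initiated. Assumption: a tie counts as minority for senders.
        if low and len(high) <= len(low):
            return ("RGETTHREADS", rng.choice(low))
    else:
        # Receiver initiated: only when lightly loaded nodes are the strict minority.
        if high and len(low) < len(high):
            return ("GETTHREADS", rng.choice(high))
    return None  # rule 2: majority nodes stay passive


table = {"A": "high", "B": "low", "C": "low", "D": "medium"}
print(plan_migration("A", table))  # the sender picks B or C at random
print(plan_migration("B", table))  # None: B is in the majority
```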
25
25 Scaling the Algorithm Aggregating the clusters provides scalability. Thresholds for clusters are defined as follows:
high: no cluster members are lightly loaded and at least one member is highly loaded
low: no cluster members are highly loaded and at least one member is lightly loaded
medium: all other cases, where load balancing can occur within the cluster members or where all members of the cluster are medium loaded
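The cluster-level thresholds translate directly into a short Python sketch. The function name `cluster_level` is hypothetical, and member levels are assumed to be given as the strings "low", "medium", and "high".

```python
def cluster_level(member_levels):
    """Aggregate member load levels into a single level for the cluster."""
    members = list(member_levels)
    has_high = "high" in members
    has_low = "low" in members
    if has_high and not has_low:
        return "high"
    if has_low and not has_high:
        return "low"
    # Either both kinds are present (balancing can happen inside the cluster)
    # or every member is medium loaded.
    return "medium"


print(cluster_level(["high", "medium"]))
print(cluster_level(["low", "low"]))
print(cluster_level(["high", "low"]))
```

Because the result has the same three values as a node's level, a cluster can be treated like a single node at the next level up.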
26
26 Scaling the Algorithm 1. At any level of cluster, only the nodes belonging to the minority group at that level will be active. 2. Load balancing at an nth-level cluster will be attempted every (n × SOMECONSTANT) times the number of unsuccessful attempts at the node level. 3. A suitable nth-level target cluster is found through the corresponding load table and the TRANSFERREQUEST message protocol is followed for thread migration. [Figure labels: cluster levels n=1, n=2, n=3]
27
27 Testing Eight Nodes Cluster: [# of highly loaded nodes, # of medium loaded nodes, # of lightly loaded nodes]
28
28 Testing Three Clusters [Figure labels: cluster levels n=1, n=2]
29
29 Testing Six Clusters at Two Levels [Figure labels: cluster levels n=1, n=2, n=3]
30
30 Conclusion The testing of distributed scheduling using the simulator verifies that the algorithm functions correctly. It is observed that the increase in the number of messages is proportional to the increase in the number of heavily loaded nodes. The number of messages required for load balancing at the first level and above is the same if the ratio of heavily to lightly loaded nodes is kept constant at both levels.
31
31 Conclusion (contd.) Only one additional load table is required per additional cluster. Hence, the required number of messages is expected to increase by a small constant factor as the level of clustering increases. It can be concluded that the algorithm’s complexity is O(n) where n is the number of highly loaded nodes.
32
32 Future Work Porting of code from NT to Sombrero for the Sombrero node - communication code. Changing definition of load measurement to the more general formula. Reuse code from the Sombrero router. Adaptive cluster forming algorithm.
33
33 Acknowledgements Dr. Donald Miller Dr. Rida Bazzi Dr. Bruce Millard Mr. Alan Skousen Mr. Raghavendra Hebbalalu Mr. Ravikanth Nasika Mr. Tom Boyd