COMP7500 Advanced Operating Systems
I/O-Aware Load Balancing Techniques (2)
Dr. Xiao Qin, Auburn University
http://www.eng.auburn.edu/~xqin
xqin@auburn.edu
Spring, 2012
Current Solutions: Disk I/O Systems
  Techniques: caching, prefetching, parallel I/O
  Limitations: low level, not portable
Current Solutions (Cont.): Scheduling/Load Balancing
  Space-sharing (PBS, Backfilling)
  Time-sharing
  Centralized control (PBS)
  Distributed control
  Coordinated scheduling (Gang)
  Non-I/O-aware (Condor, Mosix, DQS, LSF)
  Disk-I/O-aware
  Network-I/O-aware load balancing
  Support for sequential jobs
  Support for parallel jobs
  Disk-I/O buffer management
  Support for homogeneous clusters
  Support for heterogeneous clusters
System Architecture
  [Figure: I/O-intensive jobs enter through client services. Workstations 1 to n each run a load manager and have their own memory and disk. Tasks t1 to t7 are spread across the workstations, which are connected by a high-bandwidth network.]
Methodology
  I/O-intensive applications
  User-specified access pattern
  Data storage pattern
  Measure I/O load
  Predict response time
  Estimate overhead
  Make decisions
  Load balancing schemes
  Dispatch and migration
Outline
  Motivations
  A Disk-I/O-Aware Load Balancing Policy with Remote Execution
  A Disk-I/O-Aware Load Balancing Policy with Preemptive Migration
  Evaluation of the Two Disk-I/O-Aware Policies
  Load Balancing for Heterogeneous Clusters
  Contributions and Conclusions
Load Balancing with Remote Execution
  [Figure: a newly arrived job at node i is either executed locally on node i or executed remotely on node j. Both nodes already have running jobs and are connected by a high-bandwidth network.]
The IOCM-RE Scheme
  [Flowchart] For a new parallel job, the resources are checked in this order:
  1. I/O overloaded? If yes, select candidate nodes to balance the I/O load.
  2. If not, memory overloaded? If yes, select candidate nodes to balance the memory load.
  3. If not, CPU overloaded? If yes, select candidate nodes to balance the CPU load.
  If candidate remote nodes are found, the job is executed remotely; otherwise it is executed locally.
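The decision order in the IOCM-RE flowchart can be sketched in Python as follows. This is only an illustration under stated assumptions, not the scheme's actual implementation: the `Node` class, the `choose_node` function, the per-resource limits, and the rule "pick the least-loaded remote node that is lighter than the local node" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    # Hypothetical node record; loads are in arbitrary comparable units.
    name: str
    io_load: float
    mem_load: float
    cpu_load: float


def choose_node(local: Node, remotes: List[Node],
                io_limit: float, mem_limit: float, cpu_limit: float) -> Node:
    """Return the node on which a newly arrived job should run.

    Resources are checked in the flowchart's order: I/O, then memory, then CPU.
    The limits and the candidate-selection rule are assumptions of this sketch.
    """
    checks = [("io_load", io_limit), ("mem_load", mem_limit), ("cpu_load", cpu_limit)]
    for attr, limit in checks:
        local_load = getattr(local, attr)
        if local_load > limit:
            # Candidate remote nodes: lighter than the local node on this resource.
            candidates = [n for n in remotes if getattr(n, attr) < local_load]
            if candidates:
                # Remote execution on the least-loaded candidate.
                return min(candidates, key=lambda n: getattr(n, attr))
            # No candidate found: fall back to local execution.
            return local
    # No resource is overloaded: execute locally.
    return local
```

For example, a local node whose I/O load exceeds the limit hands the job to the remote node with the lightest I/O load, while a node with no overloaded resource keeps the job.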
Explicit I/O load
  Explicit I/O load = I/O access rate × (1 − buffer hit rate)
  (1 − buffer hit rate) is the probability that the data is NOT in the disk I/O buffer of node i.
  [Figure: an application's data request is served from the disk I/O buffer when the data is in the buffer, and goes to the disk when the data is NOT in the buffer.]
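A minimal Python sketch of the explicit I/O load formula is shown below. The function name `explicit_io_load` and the choice of requests per second as the unit are assumptions made for illustration.

```python
def explicit_io_load(access_rate: float, buffer_hit_rate: float) -> float:
    """Explicit I/O load = I/O access rate * (1 - buffer hit rate).

    access_rate: I/O requests issued per second (unit assumed for this sketch).
    buffer_hit_rate: fraction of requests found in the disk I/O buffer, in [0, 1].
    """
    if not 0.0 <= buffer_hit_rate <= 1.0:
        raise ValueError("buffer_hit_rate must be between 0 and 1")
    # Only buffer misses reach the disk, so scale by the miss probability.
    return access_rate * (1.0 - buffer_hit_rate)
```

With an access rate of 100 requests per second and a buffer hit rate of 0.75, only a quarter of the requests miss the buffer, so the explicit load is 25 requests per second.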
Implicit I/O load
  Given a task s running on node i, the implicit I/O load is the I/O load induced by page faults.
  [Equation: a piecewise definition with an "if" case and an "otherwise" case, expressed in terms of:
    the page fault rate of task s,
    the memory space requested by the running tasks,
    the available user memory space.]
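The piecewise structure of the implicit I/O load can be illustrated with the Python sketch below. The exact expression is not recoverable from the slide text, so the form used here is an assumption: the load is taken to be zero while the requested memory fits in the available user memory, and otherwise the task's page fault rate scaled by the ratio of requested to available memory. The function name and parameters are hypothetical.

```python
from typing import Sequence


def implicit_io_load(page_fault_rate: float,
                     requested_mem: Sequence[float],
                     available_mem: float) -> float:
    """Implicit (page-fault-induced) I/O load of one task on a node.

    ASSUMED form, not taken verbatim from the slide:
      0                                    if total requested <= available
      page_fault_rate * total / available  otherwise
    requested_mem lists the memory requested by each running task on the node.
    """
    if available_mem <= 0:
        raise ValueError("available_mem must be positive")
    total_requested = sum(requested_mem)
    if total_requested <= available_mem:
        # Everything fits in memory, so paging adds no I/O load.
        return 0.0
    # Memory is overcommitted: paging traffic grows with the overcommit ratio.
    return page_fault_rate * total_requested / available_mem
```

Under this assumed form, two tasks requesting 400 MB each on a node with 400 MB of user memory double the page fault rate's contribution, while the same tasks on a node with 1024 MB contribute nothing.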
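Overall I/O load
  The I/O load index of node i combines two parts:
    the implicit I/O load induced by page faults,
    the explicit I/O load resulting from tasks accessing disks.
  $\mathrm{load}_{IO}(i) = \sum_{s \in \text{tasks on } i} \mathrm{implicit}(i, s) + \sum_{s \in \text{tasks on } i} \mathrm{explicit}(s)$
  where load_IO(i) is the I/O load index of node i, implicit(i, s) is the implicit I/O load of task s running on node i, and explicit(s) is the explicit I/O requirement of task s.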
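As a small illustration, the I/O load index of a node can be computed by summing the two per-task components. The sketch below assumes the per-task implicit and explicit loads have already been measured or estimated; the function name `io_load_index` and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def io_load_index(task_loads: Iterable[Tuple[float, float]]) -> float:
    """I/O load index of one node.

    task_loads holds one (implicit_load, explicit_load) pair per task
    running on the node; the index is the sum of both components
    over all tasks.
    """
    total = 0.0
    for implicit_load, explicit_load in task_loads:
        total += implicit_load + explicit_load
    return total
```

For a node running two tasks with loads (0.0, 25.0) and (4.0, 10.0), the index is 39.0. A node with no running tasks has an index of 0.0.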