Load Balancing

The author of these slides is Dr. Arun Sood of George Mason University. Students registered in Computer Science networking courses at GMU may make a single machine-readable copy and print a single copy of each slide for their own reference, so long as each slide contains the copyright statement and GMU facilities are not used to produce paper copies. Permission for any other use, in either machine-readable or printed form, must be obtained from the author in writing.

References

[1] K. P. Chow and Y. K. Kwok, "On Load Balancing for Distributed Multiagent Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 8, August 2002.

Issues and Motivation

Multiagent systems must be scalable
Some agents are persistent, while others are not
There is a potential for many agents distributed across several processors
– This leads to load imbalance
– Agent autonomy contributes to the imbalance
This highlights the need for a load balancing service – perhaps another set of agents?

Load Balancing Approaches

Static load distribution policies
Dynamic load distribution policies
How do we measure the goodness of load balancing?
– Variance of the load distribution
– Processor and memory utilization

Typical Issues Addressed in Load Balancing

Single processor vs. multiple processors; cluster vs. widely distributed systems
Jobs are submitted independently by users – arrival times are unpredictable
Job characteristics are unknown at submission time – run time, memory requirements, communication requirements
Distributing a single task vs. several independent tasks
Task persistence – short-lived, long-lived, or persistent
Task mobility – what state information to carry, where to locate the task, which task to move

Special Considerations in Distributed Multiagent Systems

Agent lifetimes vary widely (problem dependent)
– In AIGA, some agents are deployed for specific tasks while others are persistent
– In e-commerce, agents stay alive for the duration; these agents are all launched at the same time – at start-up
Agent communication can be heavy, but the pattern changes over time – there is no single static good solution to the load balancing problem

Graphical Representation – How to Assign Tasks?

(Figure: agent interaction graph mapped onto the computer network.)

Credit-Based Load Balancing Model

Ref [1] focuses on the selection and location policies
– Selection policy: which task to move
– Location policy: where the task is executed
Choose policies such that overall performance improves
– Account for the cost of the move
– The new distribution must yield better load balance than the old one

CBLB Model

Assign a credit to every agent
– The higher the credit, the lower the probability of migration when a load-balancing migration is planned
– Credit is based on agent system loads and interagent communication; e.g., if there is heavy interaction between two agents on the same processor, it is unlikely that either will move
This approach excludes a secondary possibility: moving both agents together to another processor.

Factors Influencing Agent Credit Value

Increase:
– Agent workload is decreasing
– High communication load with other agents on the same processor
– Facilities on the processor are required by the agent (special I/O, special hardware, etc.)
Decrease:
– Workload is increasing, and further increases are expected because of recent events/messages (an assumption about agent behavior)
– Extensive communication with agents on other processors
– High mobility – not dependent on local resources

Heuristics Used

When an agent's workload increases, it is likely to continue at the higher level for some time; hence such an agent is a good candidate for relocation.
Interagent communication is used to assign the location – move the agent to the agent with which it communicates most.

Load Balancing Issues

It is not essential to maintain balance at all times – short periods of imbalance can be tolerated
Finding and loading an underutilized processor has to be weighed against the cost of disseminating state information

Comet Algorithm: Key Assumptions

p machines in the cluster – every agent can run on any machine
An undirected graph is used to model the workload (agents are persistent)
The level and type of interagent communication is query- or task-dependent – hence the graph weights are not specified a priori
The interagent communication pattern is known

Figure 1(a): the variation in load – compute and communication. Figure 1(b): typical agent processing structure.

Load Model

In agent systems, the computation load may be of the same order as (or even smaller than) the communication load.

Load on machine $k$:

$$L_k = \sum_{i \,:\, M(a_i) = k} (w_i + u_i)$$

where
– $M(a_i)$ = machine hosting agent $a_i$
– $w_i$ = compute load of $a_i$ in clock cycles
– $u_i$ = communication load of $a_i$ in clock cycles, split into intra-machine ($h_i$) and inter-machine ($g_i$) parts: $u_i = h_i + g_i$

$$h_i = \sum_{j \,:\, M(a_j) = M(a_i)} c(a_i, a_j), \qquad g_i = 0.5\, f \sum_{j \,:\, M(a_j) \neq M(a_i)} c(a_i, a_j)$$

– $c(a_i, a_j)$ = communication cost between agents, dependent on message size estimation
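As a minimal sketch of this load model (not the paper's code; the function name, argument layout, and example values are assumptions), the per-machine loads can be computed from an agent-to-machine assignment and a pairwise communication-cost estimate:

```python
def machine_loads(M, w, c, f, p):
    """L[k] = sum of (w_i + u_i) over agents i assigned to machine k."""
    n = len(M)
    L = [0.0] * p
    for i in range(n):
        # h_i: communication with agents on the same machine
        h = sum(c[i][j] for j in range(n) if j != i and M[j] == M[i])
        # g_i: communication with agents elsewhere; each endpoint is
        # charged half, scaled by the inter-machine factor f
        g = 0.5 * f * sum(c[i][j] for j in range(n) if M[j] != M[i])
        L[M[i]] += w[i] + h + g
    return L

# Example: 4 agents on 2 machines (hypothetical numbers)
M = [0, 0, 1, 1]
w = [5.0, 3.0, 4.0, 2.0]
c = [[0, 2, 1, 0], [2, 0, 0, 3], [1, 0, 0, 1], [0, 3, 1, 0]]
print(machine_loads(M, w, c, f=1.5, p=2))
```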

Comet Algorithm Components

Information policy
– Each machine compares its load to a specified threshold; the central host decides whether agents need to migrate.
Selection policy
– A credit is computed for each agent on a machine: $C_i = -x_1 w_i + x_2 h_i - x_3 g_i$, where the $x_i$ are constants.
1. Intra-machine communication ($h_i$) increases credit (keep the agent on the machine).
2. Inter-machine communication ($g_i$) decreases credit (move the agent).
3. The agent with the smallest credit is the candidate for a move.
4. All agents are equally likely to move; to express a machine preference, add a constant to the credit.
Location policy
– Determine the target machine.
– Each agent maintains a p-vector of its communication with each processor; the largest element indicates the target machine. (Typical load balancing algorithms use processor load as the location determinant. IGS vs. e-commerce agents.)
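The selection and location policies condense into a few lines. Here is a hedged sketch (a reading of the slide, not the authors' implementation; `comm_to` is an assumed name for the per-agent p-vector):

```python
def pick_migration(w, h, g, comm_to, x=(1.0, 1.0, 1.0)):
    """Return (agent index, target machine) for one balancing round."""
    x1, x2, x3 = x
    # Selection policy: C_i = -x1*w_i + x2*h_i - x3*g_i;
    # the agent with the smallest credit is the migration candidate.
    credits = [-x1 * w[i] + x2 * h[i] - x3 * g[i] for i in range(len(w))]
    i = min(range(len(w)), key=credits.__getitem__)
    # Location policy: the machine this agent communicates with most.
    target = max(range(len(comm_to[i])), key=comm_to[i].__getitem__)
    return i, target
```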

Comet Algorithm – Location Policy (contd)

– Using the lowest-load processor as the receiver reduces thrashing. Note that each move reduces the overall load, the variance in the load, and the average utilization.
– Figure 3 illustrates this: agent $a_i$ is selected for migration from machine $m_k$ to $m_l$. The load on $m_k$ drops by some amount $\delta$, but the load on $m_l$ increases by less than $\delta$ – the communication load has been reduced.

(Figure 3: see the notes on p. 791 of [1].)

System Overview

Agent Message Router (AMR)
– Each agent registers its name and location.
– All (complete) messages are routed through the AMR, from sender to receiver.
– What if the receiving agent is on the same processor as the sending agent? Significant overhead.
Compare with an Agent Name Server (ANS)
– The sender queries the ANS to translate the logical address into a physical address, then makes a second connection to the physical address and sends the message.
AMR vs. ANS: consider the impact of message size.
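A toy sketch of the two addressing schemes being contrasted (all class and method names here are illustrative assumptions, not the system's API): with an AMR the full message body crosses the router even for co-located agents, while with an ANS the sender pays one lookup and then connects directly, so message size matters far less.

```python
class AgentMessageRouter:
    """AMR: every message flows through the central router."""
    def __init__(self):
        self.mailbox = {}                     # agent name -> message queue
    def register(self, name):
        self.mailbox[name] = []
    def send(self, sender, receiver, payload):
        # The whole payload crosses the router, even when sender and
        # receiver live on the same processor.
        self.mailbox[receiver].append((sender, payload))

class AgentNameServer:
    """ANS: translate the logical name once, then connect directly."""
    def __init__(self):
        self.address = {}                     # agent name -> physical address
    def register(self, name, physical_address):
        self.address[name] = physical_address
    def resolve(self, name):
        return self.address[name]             # sender opens its own connection
```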

System Overview (contd)

Central host
– Startup, suspension, termination, query of the current system state, and agent distribution
– Assesses the need for migration
– Location policy
– Agent communication
– Figure 4
Compute hosts: communication agent + work agents
– Work agents all perform the same task
– Interaction shown in Figure 5

System Overview (contd)

Load information distribution
– Each work agent computes its credit
– The communication agent gathers the work-agent credits
– The central agent gathers the credit information
The central agent makes the load balancing decisions
Migration
– The central agent selects the agent to be migrated and the source and destination hosts
– Migration only requires a state-information transfer – the same work agents exist on all processors

Workload

Synthetic workload (trace workloads were not available)
– Artificial workload: loops with numeric calculations; a random number determines the load
– A random number generator determines the communication requirements
– A realistic implementation, but an artificial load
Workload parameters for the simulation
– Agent computation load: random
– Message size: random
– Intermessage duration: 10 s
– Computation-communication correlation: the same random number drives both the workload and the message size – an agent with a larger workload sends bigger messages
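A sketch of such a generator (the ranges and seed are assumptions; only the 10 s intermessage duration and the shared random draw come from the slide):

```python
import random

def synthetic_agent(rng, max_iters=1_000_000, max_msg_bytes=10_000):
    r = rng.random()  # one draw correlates computation with communication
    return {
        "loop_iterations": int(r * max_iters),    # numeric-loop compute load
        "message_bytes": int(r * max_msg_bytes),  # bigger workload -> bigger messages
        "intermessage_s": 10,                     # fixed intermessage duration
    }

rng = random.Random(42)
agents = [synthetic_agent(rng) for _ in range(12)]  # e.g., 12 agents per 128 MB host
```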

Experimental Parameters

Credit coefficients
– Usually all three coefficients are set to 1
Number of agents
– Constrained by main memory – one Java VM per agent (12 agents on a 128 MB machine)
Number of hosts
– Homogeneous or heterogeneous
Period of the load balancing decision
– A smaller period yields better balance but more overhead (60 s used)
Communication pattern
– A pair ratio defines the interagent communication (1/4 to 1/8)
– A lower ratio for more agents, a higher ratio for fewer agents

Measured Variables

Workload data normalization
– To account for different background and kernel workloads, normalize with respect to workload
– Normalized standard deviation: the SD divided by the average workload
– Average normalized standard deviation (ANSD): averaged over the runs conducted
Performance metric
– The typical load balancing objective is the overall execution time of a set of jobs; in a persistent-agent environment this measure is meaningless
– Execution time of a query: requires a standard (typical) query – difficult to find one with universal acceptance
– Workload distribution: use ANSD
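The distribution metric reduces to a few lines. A sketch (using the population SD, an assumption, since the slide does not specify which SD):

```python
from statistics import mean, pstdev

def normalized_sd(loads):
    """SD of per-machine workloads divided by the average workload."""
    return pstdev(loads) / mean(loads)

def ansd(runs):
    """Average Normalized Standard Deviation over several runs."""
    return mean(normalized_sd(loads) for loads in runs)

# Example: two runs on three hosts (hypothetical numbers)
print(ansd([[10.0, 12.0, 11.0], [9.0, 15.0, 12.0]]))
```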