Ning Li Jordan Parker Scalable Cluster Resource Management 1 Cluster Resource Management: Scalable Approaches Ning Li Jordan Parker Mid-semester Status.

Presentation transcript:

Slide 1: Cluster Resource Management: Scalable Approaches
Ning Li, Jordan Parker
Mid-semester Status Report
CS 736 – Fall 2000

Slide 2: Why Study Cluster Resource Management?
Clusters have become increasingly popular for large-scale parallel computing.
  – Web servers
Clusters are growing to the order of thousands of nodes.
Clusters are providing multiple services.

Slide 3: Multiple Services: Example
An Internet Service Provider hosts many different websites for clients.
  – How do you schedule according to the amount of bandwidth each client is paying for?
Proportional Share
Cluster Reserves
Our technique: more scalable

Slide 4: Overview
Introduction / Reason for Research
Related Work
Infrastructure
Evaluation

Slide 5: Related Work
Andrea C. Arpaci-Dusseau, David E. Culler, Alan Mainwaring. Scheduling with Implicit Information in Distributed Systems. SIGMETRICS '98 Conference on the Measurement and Modeling of Computer Systems.
Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer. Cluster-Based Scalable Network Services. Proc. Symposium on Operating Systems Principles (SOSP-16), St-Malo, France, Oct
M. Aron, P. Druschel, and W. Zwaenepoel. Cluster Reserves: A Mechanism for Resource Management in Cluster-based Network Servers. In Proceedings of ACM SIGMETRICS 2000, June
C. A. Waldspurger and W. E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management. Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, CA, November 1994, pp
NS – Network Simulator Manual,

Slide 6: What makes us different?
Goal: to provide a scalable solution for resource management.
Other papers focused primarily on just having good management.
  – This often meant one manager for all the nodes.
  – Clearly this could become a scalability bottleneck.
Effectiveness: other solutions are probably better for smaller clusters; we hope to be better for large (>1000 nodes) clusters.

Slide 7: The Management Scheme
Cluster Reserves with multiple managers
  – Mainly a comparison
A new Lottery-like algorithm (Banks)
A hierarchical management network

Slide 8: Infrastructure
The hierarchical algorithms
Use NS to simulate our algorithms

Slide 9: Hierarchical View

Slide 10: A Problem and a Solution
Problem: not scalable
Solution: Hierarchy! + Fault Tolerance
(a nice little example, perhaps with 2-level managers)

Slide 11: Approach 1: modify the "Cluster Reserves" optimization algorithm
  – use it when a manager manages nodes
  – AND when a level_n+1 manager manages level_n managers

Slide 12: Approach 2: introduce a bank account mechanism
  – use the bank algorithm for a manager managing nodes
  – use the transfer strategy for a level_n+1 manager managing level_n managers

Slide 13: Problem Specification
N: number of nodes in the cluster
S: number of service classes
T: a vector of N elements; T_i is the resource (number of tickets) on node i
T_total: total resource in the cluster (not in the "cluster" paper)
r and u: N×S matrices; r_ij and u_ij are the percentage resource allocation and resource usage, respectively, at node i for service class j
D: a vector of S elements; D_j is the desired percentage resource allocation for service class j over the cluster
Input: r, u, and the vectors T and D
Output: an N×S matrix R; R_ij is the new percentage resource allocation for service class j on node i

Slide 14: Solution
Step 1: Compute the least feasible deviation between desired and actual allocations.

$$\text{Minimize} \quad \sum_{j=1}^{S} \left| \sum_{i=1}^{N} R_{ij} T_i - T_{\text{total}} D_j \right| \qquad (1)$$

Resource allocations on any cluster node should sum to no more than 100:

$$\forall i \in \{1, \dots, N\}: \quad \sum_{j=1}^{S} R_{ij} \le 100$$

On any node, the new allocation should be no more than the usage if the node is not a resource sink, i.e. if the previous allocation exceeds the usage:

$$\forall i, j \text{ with } r_{ij} > u_{ij}: \quad R_{ij} \le u_{ij}$$

Slide 15: Solution
Step 2: Compute the new resource allocations such that
1) the deviation computed in the first step is achieved, and
2) the computed resource allocations are close to the ideal allocation (D) (this differs from the paper; we want to see which is better).

$$\text{Minimize} \quad \sum_{i=1}^{N} \sum_{j=1}^{S} (R_{ij} - D_j)^2 \qquad (2)$$
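As a rough illustration, the two objectives and the two constraints from the Solution slides can be evaluated for a candidate allocation matrix R. This is only a sketch for checking a candidate by hand, not an optimizer and not the authors' implementation; the function names are hypothetical, and the usage constraint is read as "R_ij <= u_ij whenever r_ij > u_ij", following the prose on the slide.

```python
def step1_deviation(R, T, D):
    """Objective (1): sum over classes j of |sum_i R[i][j]*T[i] - T_total*D[j]|."""
    t_total = sum(T)
    return sum(
        abs(sum(R[i][j] * T[i] for i in range(len(T))) - t_total * D[j])
        for j in range(len(D))
    )


def step2_distance(R, D):
    """Objective (2): squared distance of every R[i][j] from the ideal share D[j]."""
    return sum((R[i][j] - D[j]) ** 2 for i in range(len(R)) for j in range(len(D)))


def satisfies_constraints(R, r, u):
    """Check the per-node sum limit and the usage cap (assumed reading, see lead-in)."""
    for i in range(len(R)):
        if sum(R[i]) > 100:
            return False
        for j in range(len(R[i])):
            # If the previous allocation exceeded the usage, cap at the usage.
            if r[i][j] > u[i][j] and R[i][j] > u[i][j]:
                return False
    return True


# Two nodes with 100 and 300 tickets, two classes that each want 50%.
T = [100, 300]
D = [50, 50]
print(step1_deviation([[50, 50], [50, 50]], T, D))
print(step1_deviation([[100, 0], [50, 50]], T, D))
```

The values are percentages multiplied by ticket counts, matching the units of objective (1) as written on the slide.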

Slide 16: A New Idea/Addition
Distribute unassigned cluster resources to the service classes that need them.
Since the manager knows when and how much resource each service class contributed before, it can give appropriate priority to those classes when assigning unused resources.

Slide 17: Approach 2: Bank Account Mechanism
Each manager has a bank.
Each bank has an account for each service class.
Each account records the number of tickets saved and when they were deposited.
Depositing, drawing, and transferring tickets are used together to achieve both performance isolation and resource utilization.

Slide 18: Bank Algorithm: part 1
For each service class j on each node i, compare the previous ticket usage u_ij, the allocation r_ij, and the desired allocation D_j:
1. u_ij [...] D_j: R_ij = min(u_ij, D_j); deposit D_j - R_ij into the class's bank account if it is greater than 0
3. u_ij = r_ij and r_ij < D_j: R_ij = D_j (or R_ij = u_ij + k, where k is a small number)
4. u_ij = r_ij and r_ij >= D_j: R_ij = D_j
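The per-class decision in part 1 might be sketched as follows. The conditions for the first cases did not survive in this transcript, so the `u < r` branch is an assumption (the class used less than it was allocated); only cases 3 and 4 follow the slide text directly. The names `Decision` and `bank_part1` are hypothetical, and all three quantities are treated as ticket counts on one node.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    allocation: float  # new allocation R_ij
    deposit: float     # tickets to put into class j's bank account
    wants_more: bool   # True for case 4, the candidates examined in part 2


def bank_part1(u: float, r: float, d: float) -> Decision:
    """Decide the new allocation for one class on one node.

    u: previous usage u_ij, r: previous allocation r_ij, d: desired allocation D_j.
    """
    if u < r:
        # ASSUMPTION: under-used allocation; the exact condition is lost in the source.
        alloc = min(u, d)
        return Decision(alloc, max(d - alloc, 0), False)
    if r < d:
        # Case 3: allocation fully used and below the desired share.
        return Decision(d, 0, False)
    # Case 4: allocation fully used and at or above the desired share.
    return Decision(d, 0, True)


print(bank_part1(u=3, r=10, d=8))
print(bank_part1(u=10, r=10, d=8))
```

The slide's alternative for case 3 (R_ij = u_ij + k) is not shown; it would replace the `Decision(d, 0, False)` line.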

Slide 19: Bank Algorithm: part 2
Let t_i be the number of tickets currently allocated on node i.
IF t_i >= T_i: normalize the tickets so that t_i = T_i
ELSE: check the balance B_ij in the bank account of each class j in case 4 above

Slide 20: Bank Algorithm: part 2 (continued)
Option 1: check classes in decreasing balance order
  – let b_ij = min(B_ij, h), where h is a relatively small number
  – R_ij += b_ij, and draw b_ij from j's bank account
  – t_i += b_ij
  – until t_i >= T_i
Option 2: check all classes in case 4 above with balance >= 0
  – allocate T_i - t_i tickets to these classes in proportion to their bank balances, and draw from the bank accounts accordingly
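A minimal sketch of part 2 for a single node, under several assumptions: allocations and balances are plain dictionaries keyed by class name, option 1 makes a single pass and never hands out more than the node's remaining capacity, and option 2 never draws more than a class holds. None of these details are stated on the slides, and the function names are hypothetical.

```python
def normalize(alloc, capacity):
    """Scale allocations down so that they sum to the node's capacity T_i."""
    total = sum(alloc.values())
    if total <= capacity:
        return dict(alloc)
    return {j: a * capacity / total for j, a in alloc.items()}


def draw_option1(alloc, balance, candidates, capacity, h):
    """Option 1: richest accounts first, at most h tickets per class."""
    alloc, balance = dict(alloc), dict(balance)
    used = sum(alloc.values())
    for j in sorted(candidates, key=lambda c: balance[c], reverse=True):
        if used >= capacity:
            break
        # ASSUMPTION: also cap by the remaining capacity to avoid overshooting T_i.
        b = min(balance[j], h, capacity - used)
        alloc[j] += b
        balance[j] -= b
        used += b
    return alloc, balance


def draw_option2(alloc, balance, candidates, capacity):
    """Option 2: split the spare tickets in proportion to the bank balances."""
    alloc, balance = dict(alloc), dict(balance)
    spare = capacity - sum(alloc.values())
    total_balance = sum(balance[j] for j in candidates)
    if spare <= 0 or total_balance <= 0:
        return alloc, balance
    for j in candidates:
        # ASSUMPTION: a class cannot draw more than it has saved.
        share = min(spare * balance[j] / total_balance, balance[j])
        alloc[j] += share
        balance[j] -= share
    return alloc, balance


alloc = {"a": 40, "b": 30}
balance = {"a": 30, "b": 10}
print(draw_option1(alloc, balance, ["a", "b"], capacity=100, h=5))
print(draw_option2(alloc, balance, ["a", "b"], capacity=100))
```

Option 1 hands out tickets in small steps and so reacts gradually; option 2 fills the node in one step.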

Slide 21: Bank Algorithm: part 3
If there are still unassigned tickets, assign them to the classes in case 4 above in proportion to their share or their need.

Slide 22: Notes and Other Strategies
Note: Tickets in a bank account have a time stamp associated with them and expire after reaching a certain age.
Strategy: The manager could force some compensation if t_i >= T_i on all the nodes before adjustment and some classes have a high balance in their accounts. The manager could allocate a reasonable number of tickets as in option 2 above, then normalize so that t_i equals T_i.
Strategy: Some class on some node may choose to reserve some tickets for its own use on the same node in the near future instead of depositing them in the bank. We'll check this option.
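One way to picture the time-stamped, expiring tickets is an account that keeps its deposits in arrival order. This is a sketch under assumptions the slides do not state: the class name `Account` and the `max_age` parameter are hypothetical, deposits arrive in non-decreasing time order, and draws consume the oldest tickets first.

```python
from collections import deque


class Account:
    """Bank account for one service class; tickets expire after max_age time units."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.deposits = deque()  # (timestamp, tickets) pairs, oldest first

    def deposit(self, now, tickets):
        if tickets > 0:
            self.deposits.append((now, tickets))

    def expire(self, now):
        # Drop every deposit that is older than max_age.
        while self.deposits and now - self.deposits[0][0] > self.max_age:
            self.deposits.popleft()

    def balance(self, now):
        self.expire(now)
        return sum(tickets for _, tickets in self.deposits)

    def draw(self, now, tickets):
        """Withdraw up to `tickets`, oldest deposits first; return the amount drawn."""
        self.expire(now)
        drawn = 0
        while self.deposits and drawn < tickets:
            stamp, amount = self.deposits.popleft()
            take = min(amount, tickets - drawn)
            drawn += take
            if amount > take:
                # Put the unused remainder back with its original time stamp.
                self.deposits.appendleft((stamp, amount - take))
        return drawn


acct = Account(max_age=10)
acct.deposit(0, 5)
acct.deposit(8, 3)
print(acct.balance(9), acct.draw(9, 6), acct.balance(9), acct.balance(19))
```

Drawing the oldest tickets first means fewer tickets are lost to expiry, which is one plausible policy among several.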

Slide 23: Transfer Strategy: Very simple
Based on the previous usage reports from the lower-level managers, the current manager transfers tickets from one account to another where they are badly needed.

Slide 24: Transfer Strategy: More detailed (if needed)
Check class-manager pairs in decreasing usage/share order, i.e. check first the classes that need tickets most.
For class j on manager i, check j's account on the other managers l where usage/share is low.
Transfer min(B_lj, b) tickets from j's account on manager l to j's account on manager i, where b is a constant.
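The detailed transfer strategy could look roughly like the sketch below. The slides do not say what counts as a "high" or "low" usage/share ratio or how many donors serve one needy pair, so the `high` and `low` thresholds, the one-donor-per-pair rule, and the function name are all assumptions made for illustration.

```python
def transfer(ratio, balance, b, high=1.0, low=0.5):
    """Move up to b tickets per needy (manager, class) pair from a low-usage manager.

    ratio[(manager, cls)]   -> usage/share reported by that lower-level manager
    balance[(manager, cls)] -> tickets in that class's account on that manager
    Returns the updated balances and a list of (donor, receiver, cls, amount) moves.
    """
    balance = dict(balance)
    moves = []
    # Neediest pairs first: decreasing usage/share order.
    needy = sorted((p for p in ratio if ratio[p] >= high),
                   key=lambda p: ratio[p], reverse=True)
    for i, j in needy:
        donors = [l for (l, c) in ratio if c == j and l != i and ratio[(l, c)] <= low]
        for l in sorted(donors, key=lambda m: ratio[(m, j)]):
            amount = min(balance.get((l, j), 0), b)
            if amount > 0:
                balance[(l, j)] -= amount
                balance[(i, j)] = balance.get((i, j), 0) + amount
                moves.append((l, i, j, amount))
                break  # ASSUMPTION: one donor per needy pair per round
    return balance, moves


ratio = {("m1", "web"): 1.0, ("m2", "web"): 0.2,
         ("m1", "mail"): 0.6, ("m2", "mail"): 0.7}
balance = {("m1", "web"): 0, ("m2", "web"): 12,
           ("m1", "mail"): 4, ("m2", "mail"): 4}
print(transfer(ratio, balance, b=5))
```

Keeping b small limits how much a single round can shift, at the cost of slower convergence when demand moves quickly.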

Slide 25: Thinking of better strategies. :-)
Any ideas?

Slide 26: Network View

Slide 27: Full Network Overview
WAN

Slide 28: Failure Design
We essentially tried to create a structure similar to a tree.
Thus we treat a failure as deleting a node and handle the recovery much like removing a node from a tree.

Slide 29: Minor Node (6) Failure

Slide 30: 1st Level Manager (2) Failure

Slide 31: 2nd Level Manager (1) Failure

Slide 32: Node Insertion
Simply find a manager that still has room for more nodes.
If there is no space, simply make a leaf node into a manager.
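The insertion rule can be sketched on a simple tree. The slides do not say how the manager or the leaf is chosen, so the breadth-first search, the `fanout` limit (nodes per manager), and the choice of the shallowest leaf for promotion are assumptions; the `Node` class and `insert` function are hypothetical names.

```python
from collections import deque


class Node:
    def __init__(self, name, is_manager=False):
        self.name = name
        self.is_manager = is_manager
        self.children = []


def insert(root, new_node, fanout):
    """Attach new_node under the first manager with room; otherwise promote a leaf.

    Returns the node that became the parent of new_node.
    """
    queue = deque([root])
    first_leaf = None
    while queue:
        node = queue.popleft()
        if node.is_manager:
            if len(node.children) < fanout:
                node.children.append(new_node)
                return node
            queue.extend(node.children)
        elif first_leaf is None:
            first_leaf = node  # shallowest leaf, kept as a promotion candidate
    # No manager has space: make a leaf node into a manager.
    first_leaf.is_manager = True
    first_leaf.children.append(new_node)
    return first_leaf


root = Node("root", is_manager=True)
for name in "abcde":
    parent = insert(root, Node(name), fanout=2)
    print(name, "->", parent.name)
```

Promoting the shallowest leaf keeps the tree as flat as possible, which fits the slides' later remark about the tree balancing itself.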

Slide 33: Why discuss failure?
It is not relevant to the performance of our scheduler, and we don't even plan to simulate it (unless we have lots of free time), but ...
It does show that the network layout we've designed could easily handle failures.
Making the tree balance itself and handling failures could be relatively straightforward.

Slide 34: Network Simulator - NS
Our components
  – A new Agent class: RsrcAgent
    Agents are servers running on a node
  – A script to create the ns input file
    Specifies the network layout
      – Number of nodes
      – Nodes per manager
    Specifies the request trace

Slide 35: NS implementation status
Look at code

Slide 36: Evaluation
NS should make it easy.
Just extract information from the nodes about load balance.
More importantly, look at the rate at which queries get handled by the nodes.