
Load Sharing for Cluster-Based Network Service Jiani Guo and Laxmi Bhuyan Architecture Lab Department of Computer Science and Engineering University of California, Riverside

2 Courtesy: “Cluster-Based Scalable Network Services”, Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer and Paul Gauthier.

3 Video on Demand System
Diagram labels: Transcoding Service, Internet.
A large number of clients are heterogeneous in inbound network bandwidth, CPU/memory capacity, and display resolution.
Storing multiple copies of each video on the server gives rise to server overload and a scalability problem.

4 Cluster-Based Transcoding Service
Process each stream on the fly according to the client's requirements, so the service itself can make money.
A wide range of needs in video rates, sizes, and bandwidths can be met by a real-time transcoding service, which needs parallel processing.
Diagram label: Transcoding Service.

5 Existing Load Balancing Schemes
There is a plethora of research on load balancing, but most of it was evaluated only through simulation.
Random or round-robin dispatching is what is implemented in practice.
Adaptive load balancing is desirable, but the overhead of collecting statistics is very high; we found no real implementation.
How does one maintain QoS while doing load balancing? Example: to reduce out-of-order departures of multimedia units, the GOPs of a stream should be assigned to one processor, whereas good load balancing requires distributing the workload.

6 Example: Round Robin
Diagram: a Manager with a Receiver, a Unit Buffer, and a single Dispatcher serving Worker 1 through Worker N. The Dispatcher fetches a unit, finds an available Worker, and sends the unit.
Drawback: high communication protocol (UDP) overhead.

7 Round Robin: A Multithreaded Model to Reduce Communication Cost
Diagram: the same Manager, Receiver, and Unit Buffer, but with multiple Dispatchers (Dispatcher 1 through Dispatcher M) serving Worker 1 through Worker N. Each Dispatcher fetches a unit, finds an available Worker, and sends the unit.

8 Load Sharing Schemes: Round Robin, First Fit
Method
Search for an available Worker in round-robin order; the first available Worker is chosen and dispatched the GOP.
How the Manager detects whether a Worker is available is implementation-dependent.
Properties
Load is naturally balanced among all the Workers.
Processing rate is fast because no extra load analyzer is needed to guide scheduling.
May incur severe delay jitter for each stream, because the GOPs of the same stream are most likely distributed to different Workers.

9 Round Robin: First Fit
Diagram: in the Manager, the Receiver fills a GOP Queue, and a Node Scheduler moves GOPs into per-Worker dispatch queues, each served by its own Dispatcher (Dispatcher 1 through Dispatcher N) for Worker 1 through Worker N.
Is the Worker available? Equivalently, is there a vacancy in its dispatch queue? This depends on the power of the Worker.
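A minimal Python sketch of round-robin first fit, assuming (as slide 9 suggests) that "available" means the Worker's dispatch queue has a vacancy. The class name `FirstFitScheduler`, the per-Worker `capacities`, and the `fetch` method are hypothetical; giving a more powerful Worker a longer dispatch queue is one possible way to reflect "depends on the power of the Worker".

```python
from collections import deque


class FirstFitScheduler:
    """Round-robin first fit: scan Workers starting after the last one chosen
    and dispatch the GOP to the first Worker whose dispatch queue has a vacancy."""

    def __init__(self, capacities):
        # capacities[i]: assumed dispatch-queue length of Worker i
        # (a more powerful Worker could be given a longer queue).
        self.capacities = list(capacities)
        self.queues = [deque() for _ in self.capacities]
        self.next_idx = 0  # where the next round-robin scan starts

    def dispatch(self, gop):
        """Return the index of the Worker that received the GOP, or None."""
        n = len(self.queues)
        for step in range(n):
            i = (self.next_idx + step) % n
            if len(self.queues[i]) < self.capacities[i]:  # vacancy => available
                self.queues[i].append(gop)
                self.next_idx = (i + 1) % n  # resume the scan after this Worker
                return i
        return None  # no Worker available; the GOP stays in the GOP queue

    def fetch(self, worker):
        """Worker `worker` takes the oldest GOP from its dispatch queue."""
        return self.queues[worker].popleft()
```

Because the scan resumes after the last chosen Worker, consecutive GOPs of one stream usually land on different Workers in this sketch, which is the source of the per-stream delay jitter noted on slide 8.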

10 Stream-based Mapping
Method
A media unit is mapped to a Worker by the function $f(C) = C \bmod N$, where $C$ is the number of the stream to which the unit belongs and $N$ is the total number of Workers in the cluster. All media units belonging to one stream are sent to the same Worker.
Properties
Preserves the order of computation among the media units of a stream.
Simple algorithm.
Most efficient for specific input patterns in a homogeneous cluster, namely when $M$, the total number of streams, is a multiple of $N$.
What if $M < N$? Some Workers then receive no streams at all.
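The following Python sketch applies $f(C) = C \bmod N$ and counts how many streams land on each Worker. Numbering streams from 1 to $M$ follows the $C \in \{1, \dots, S\}$ convention of slide 12; the function names are hypothetical.

```python
from collections import Counter


def stream_to_worker(stream, num_workers):
    """Stream-based mapping f(C) = C mod N (Workers indexed 0..N-1)."""
    return stream % num_workers


def streams_per_worker(num_streams, num_workers):
    """How many of the streams 1..M each Worker receives."""
    counts = Counter(stream_to_worker(c, num_workers) for c in range(1, num_streams + 1))
    return [counts[j] for j in range(num_workers)]


print(streams_per_worker(8, 4))  # M a multiple of N: [2, 2, 2, 2]
print(streams_per_worker(5, 4))  # 4 Workers, 5 streams: [1, 2, 1, 1]
print(streams_per_worker(2, 4))  # M < N: [0, 1, 1, 0]
```

In this sketch, 5 streams on 4 Workers leave one Worker with twice the streams of the others, and with $M < N$ some Workers sit idle.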

11 Adaptive Load Balancing: Least Load First
Feedback-based scheme
An efficient load test mechanism is needed for the Manager to monitor the load distribution in the cluster.
Workers periodically report their load statistics to the Manager, and the Worker with the least load is chosen to be dispatched the job.
May incur substantial overhead to implement the feedback mechanism.
Load information reported by each Worker $i$ during each epoch $\Delta t$:
CPU utilization $AU_i(t)$
Maximal possible throughput $A_i(t)$
Actual throughput $A_i(t) - N_i$, where $N_i$ is the number of outstanding requests, i.e., the number of GOPs already dispatched to Worker $i$ but not yet completed
The Manager chooses the least loaded Worker, i.e., the Worker with the maximal actual throughput.
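A minimal Python sketch of the Manager's choice. The `LoadReport` fields mirror the reported quantities above; treating $A_i(t)$ as GOPs per epoch and incrementing $N_i$ locally after each dispatch (so that several GOPs dispatched within one epoch do not all go to the same Worker) are assumptions, not details from the slides.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    worker: int
    cpu_utilization: float  # AU_i(t); reported, but not used for selection here
    max_throughput: float   # A_i(t), assumed to be in GOPs per epoch
    outstanding: int        # N_i, GOPs dispatched but not yet completed


def least_loaded(reports):
    """Pick the Worker with the maximal actual throughput A_i(t) - N_i."""
    return max(reports, key=lambda r: r.max_throughput - r.outstanding).worker


def dispatch_batch(reports, num_gops):
    """Assign num_gops GOPs within one epoch, counting each dispatch as a new
    outstanding request (an assumption) until the next report arrives."""
    capacity = {r.worker: r.max_throughput for r in reports}
    outstanding = {r.worker: r.outstanding for r in reports}
    choices = []
    for _ in range(num_gops):
        # max() returns the first Worker among ties, in report order.
        w = max(capacity, key=lambda k: capacity[k] - outstanding[k])
        outstanding[w] += 1
        choices.append(w)
    return choices
```

Without the local increment, every GOP in an epoch would go to the same Worker until its next report, which is one reason the feedback interval matters.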

12 Adaptive Load Sharing
Unit-to-Computing-PC mapping (done by the dispatcher)
Robust hashing mapping: the unit identifier (in our experiment, the stream number of the unit) and the Computing PC number are used together to assign a random value to each Computing PC, and the unit is mapped to the Computing PC with the highest random value.
If the Computing PCs have unequal capacity, the random value assigned to each Computing PC may be scaled by a weight that guarantees that a Computing PC with higher capacity receives a proportionately higher portion of the load.
Thus the mapping is calculated from three values: the stream number of the unit $C \in \{1, 2, \dots, S\}$, the Computing PC number $J \in \{1, 2, \dots, N\}$, and the weight vector $(x_1, x_2, \dots, x_N)$.
This minimizes the probability that units belonging to the same stream are dispatched to different nodes, and it does so without keeping per-stream state information.
Dynamic weight adaptation (done by the manager)
The workload on the Computing PCs, $(\rho_1(t), \rho_2(t), \dots, \rho_N(t))$, is collected periodically, and the weight vector $(x_1, x_2, \dots, x_N)$ is adapted so that load balance is achieved while the number of stream re-mappings is minimized. The adapted weight vector is fed to the dispatchers.
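The robust hashing mapping resembles highest-random-weight (rendezvous) hashing. The Python sketch below assumes SHA-256 of the pair $(C, J)$ as the random value and plain multiplicative scaling by $x_J$; both are illustrative choices, and the function names are hypothetical. With multiplicative scaling a Computing PC's share grows with its weight but is not simply proportional to it: with weights 2 and 1 the split is about 3:1 rather than 2:1, so the weights have to be derived from the desired shares.

```python
import hashlib


def unit_hash(stream, node):
    """Pseudo-random value in [0, 1) determined by the pair (stream, node)."""
    digest = hashlib.sha256(f"{stream}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def robust_hash_map(stream, weights):
    """Map stream C to the Computing PC J (1-based) maximizing x_J * g(C, J)."""
    best_node, best_score = None, -1.0
    for j, x in enumerate(weights, start=1):
        score = x * unit_hash(stream, j)
        if score > best_score:
            best_node, best_score = j, score
    return best_node
```

Setting one Computing PC's weight to zero moves only the streams that were mapped to it; every other stream keeps its node, which is how the mapping limits re-mappings without per-stream state.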

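13 Adaptive Load Sharing
Diagram labels: Routing PC (Receiver, Unit Buffer, Dispatcher 1 through Dispatcher M), Computing PC 1 through Computing PC N, Manager. The Computing PCs report their loads $\rho_1(t), \dots, \rho_N(t)$ to the Manager, which feeds the weight vector $(x_1, x_2, \dots, x_N)$ to the dispatchers.
Dispatcher flowchart: Start → fetch a unit → compute $F(C) = J$ → is node $J$ available? Yes: send the unit to node $J$. No: (branch target not recoverable from the transcript). → End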

14 Experimental Set-Up
Diagram labels: Media Server, Manager, Computing PCs, 100M Ethernet, Gigabit Ethernet, unprocessed packets, processed packets.

15 Transcoding Service
What is transcoding? Transforming video/audio streams, for example changing the bit rate, resizing video frames, or adjusting the frame resolution.
How to transcode? MPEG stream → MPEG Decoder → raw stream → Video/Audio Frame Manipulator → manipulated stream → MPEG Encoder → MPEG stream.

16 Transcoding Workload
A media unit is a Group of Pictures (GOP) of an MPEG stream.
A media unit can be transcoded independently by any Worker in the cluster, so transcoding one media unit is treated as an independent job.
No communication is required among jobs.
Each job consumes a similar amount of processing time.
Consecutive media units in a stream should preferably be processed in order.

17 Design Goals of the Load Sharing Schemes
Balance the transcoding workload among all Workers.
High system throughput.
Low overhead from the load balancing algorithm itself.
Good tradeoff between computation and communication.
Provide good Quality of Service (new):
In-order departure of media units.
Even output time intervals between successive media units of a media stream.

18 Computation Model of the Transcoding Cluster

19 Manager Node
Receiver Thread: accepts incoming media units into the GOP Queue.
Scheduler Thread: fetches GOPs from the GOP Queue and puts each into the appropriate dispatch queue according to the load sharing scheme in use.
Dispatcher Thread (one per Worker): each Dispatcher maintains a dispatch queue and, when requested by its Worker, dispatches one GOP to that Worker.
Manager Thread (Least Load First scheme only): collects the load statistics from the Workers during each epoch and feeds the load information to the scheduler.
Collector Thread: collects processed video units from the Workers and sends them out.

20 Worker Node
Receiver Thread: receives packets from the Manager Node and assembles them into a complete GOP. Once a complete GOP is received, it hands the GOP to the Transcoder Thread and then requests another GOP from the Manager Node.
Transcoder Thread: transcodes a GOP.
Sender Thread: delivers the transcoded GOPs to the clients.
Monitor Thread: collects load statistics on the Worker node and reports them to the Manager Node periodically.
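The thread structure of slides 19 and 20 can be sketched in a single Python process, with threads standing in for nodes and `queue.Queue` objects standing in for the GOP Queue, the per-Worker dispatch queues, and the Collector. This is a simplification under stated assumptions: a Worker's blocking `get()` on its dispatch queue plays the role of "requesting" a GOP, the scheduler uses plain round robin, `transcode` is a caller-supplied stand-in for real MPEG transcoding, and the name `run_cluster` is hypothetical.

```python
import queue
import threading


def run_cluster(gops, num_workers, transcode):
    """Pull-based Manager/Worker pipeline; returns (worker, result) pairs."""
    stop = object()                    # sentinel marking the end of the input
    gop_queue = queue.Queue()          # Manager's GOP Queue
    dispatch_queues = [queue.Queue(maxsize=2) for _ in range(num_workers)]
    collected = queue.Queue()          # stands in for the Collector Thread

    def receiver():
        # Receiver Thread: accept incoming media units into the GOP Queue.
        for gop in gops:
            gop_queue.put(gop)
        gop_queue.put(stop)

    def scheduler():
        # Scheduler Thread: move GOPs into dispatch queues (round robin here).
        i = 0
        while True:
            gop = gop_queue.get()
            if gop is stop:
                for q in dispatch_queues:
                    q.put(stop)
                return
            dispatch_queues[i].put(gop)  # blocks while this queue is full
            i = (i + 1) % num_workers

    def worker(idx):
        # Worker: request the next GOP, transcode it, hand back the result.
        while True:
            gop = dispatch_queues[idx].get()
            if gop is stop:
                return
            collected.put((idx, transcode(gop)))

    threads = [threading.Thread(target=receiver), threading.Thread(target=scheduler)]
    threads += [threading.Thread(target=worker, args=(k,)) for k in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while not collected.empty():
        results.append(collected.get())
    return results
```

Swapping the round-robin line in `scheduler` for a first-fit, stream-based, or least-load choice is where the load sharing schemes above would plug in.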

21 Scalability of the System (5 media streams)

22 Scalability of the System
Throughput
System throughput scales well with First Fit and Least Load First.
The load test overhead of the Least Load First scheme does not affect system throughput much, because it is small compared with the time taken to transcode one GOP.
Stream-based Mapping cannot spread the media units of one stream across different Workers even when a Worker is free, which wastes resources, causes occasional load imbalance, and reduces throughput.

23 Out-of-Order Rate per Stream

24 Out-of-Order Rate per Stream
Out-of-order (OFO) departure of media units occurs when consecutive GOPs of a stream are transcoded on different Workers, because:
the workload on different Workers differs, and
different media units consume different amounts of computation time.
Stream-based Mapping eliminates out-of-order departure of media units.
First Fit has the largest OFO rate.
Least Load First improves on First Fit by 50%.
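The slides do not give a formula for the out-of-order rate. One plausible definition, used in the Python sketch below as an assumption, counts a GOP as out of order if a GOP of the same stream with a larger sequence number has already departed; `out_of_order_rate` is a hypothetical name.

```python
def out_of_order_rate(departures):
    """departures: one stream's GOP sequence numbers in departure order.

    A GOP counts as out of order if a later GOP of the stream departed first.
    """
    if not departures:
        return 0.0
    highest = float("-inf")
    late = 0
    for seq in departures:
        if seq < highest:
            late += 1      # a higher-numbered GOP already left
        else:
            highest = seq
    return late / len(departures)


print(out_of_order_rate([0, 1, 2, 3]))  # 0.0
print(out_of_order_rate([0, 2, 1, 3]))  # 0.25
```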

25 Output Time Interval (OTI) per Stream

26 Output Time Interval (OTI) per Stream
Experiment setting: 4 homogeneous Workers, 5 media streams.
First Fit achieves the best performance, and Least Load First approaches First Fit.
Stream-based Mapping shows longer delay because one stream can only be processed by one Worker.
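A short Python sketch of how the output time interval of one stream can be measured from its GOP departure times. Using the population standard deviation of the intervals as the measure of evenness is an assumption; the function names are hypothetical.

```python
from statistics import mean, pstdev


def output_time_intervals(departure_times):
    """Intervals between successive departures of one stream's GOPs."""
    return [b - a for a, b in zip(departure_times, departure_times[1:])]


def oti_summary(departure_times):
    """Mean and population standard deviation of the OTI (needs >= 2 departures).

    A lower deviation means a more even output.
    """
    gaps = output_time_intervals(departure_times)
    return mean(gaps), pstdev(gaps)


print(output_time_intervals([0, 2, 4, 7]))  # [2, 2, 3]
```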

27 Load Sharing Overhead with LLF
Load test overhead: the average time the Manager node takes to poll all Workers and collect their load statistics.
Load remapping overhead: the time used to set the current load of each Worker.
Table: Load Test Overhead (ms) for cluster sizes 1, 2, 3, 4 (values not recoverable from the transcript).
Table: Load Remapping Overhead (µs) for cluster sizes 2, 3, 4 (values not recoverable from the transcript).

28 Load Sharing Overhead with LLF
Load test overhead increases roughly in proportion to the cluster size.
Load remapping overhead is much smaller than load test overhead, almost negligible, because the operations involved in load remapping cost much less than the network communication involved in the load test.

29

30 Load Sharing Schemes
How to take QoS into consideration?
Diagram: a Receiver fills a Unit Buffer; a Scheduler fetches a unit, finds an available Computing PC, and sends the unit to one of Transcoding PC 1 through Transcoding PC N. Diagram label: SchedulePC.

31 Differentiated Service (Fair Scheduling)
A system is said to be capable of affording differentiated service among service classes if:
the system permits its resources to be proportioned among the service classes;
given sufficient request load, a service class receives at least as much resource as was assigned to it, irrespective of the load on other service classes; and
resources not used by some service class may be distributed among other service classes.

32 Framework of Fair Scheduling

33 Fair Scheduling
Fairly distribute resources among streams: streams make reservations, and the service each stream receives is proportional to its reservation.
Unit scheduler: Weighted Round Robin (WRR)
Provides differentiated service rates to multiple streams.
The weights in each round-robin cycle are dynamically adapted to achieve the best performance.
Weight of stream $i$, $W_i(t)$: the number of GOPs scheduled for stream $i$ during one round-robin cycle.
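A minimal Python sketch of one WRR cycle in which stream $i$ may have up to $W_i(t)$ GOPs scheduled. Skipping a stream whose queue runs short simply shortens the cycle, so capacity it does not use passes to the other streams, in the spirit of the differentiated-service definition on slide 31. How $W_i(t)$ is adapted is not specified here; the weights are passed in, and `wrr_cycle` is a hypothetical name.

```python
from collections import deque


def wrr_cycle(queues, weights):
    """One weighted round-robin cycle.

    queues[i] is a deque of pending GOPs for stream i; weights[i] is W_i(t),
    the maximum number of GOPs stream i may have scheduled in this cycle.
    Returns the (stream, gop) pairs in scheduling order.
    """
    schedule = []
    for stream, (q, w) in enumerate(zip(queues, weights)):
        for _ in range(min(w, len(q))):
            schedule.append((stream, q.popleft()))
    return schedule


queues = [deque(["a1", "a2", "a3"]), deque(["b1"])]
print(wrr_cycle(queues, [2, 1]))  # [(0, 'a1'), (0, 'a2'), (1, 'b1')]
print(wrr_cycle(queues, [2, 1]))  # [(0, 'a3')]
```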