Request Distribution in Server Clusters Krithi Ramamritham Indian Institute of Technology Bombay

Web site infrastructure
Clustered, multi-tiered architectures
e-Shopping example:
– Open the portal home page
– Login
– View items, prices, availability
– Select an item type
– Specify the number of items
– Confirm by entering the credit card number
– Logout

WS vs. AS
Web servers
– Do well-defined and quantifiable local work, e.g., processing HTTP headers, serving static content
Application servers
– Run multi-layer programs, e.g., scripts involving calls to backends

ReDAL
In clustered, multi-tiered architectures, there are two request distribution points:
– Web Server Request Distribution (WSRD): the web switch distributes requests to the web server cluster
– Application Server Request Distribution (ASRD): the web server distributes requests requiring business logic to the application server cluster
ReDAL (Request Distribution for the Application Layer): an approach for efficient distribution of requests across a cluster of application servers

Web Server Request Distribution
Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections (a minimal WRR sketch follows below)
– Several of these policies are commercially implemented (e.g., Cisco's LocalDirector and F5's BIG-IP)
Two improvements:
1. Session Affinity: consecutive requests in a given user session are served faster if they are handled by the same server
2. Locality-Aware Request Distribution (LARD): attempts to exploit the locality of working sets on different servers – not applicable to dynamically generated content
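
To make the distribution policies concrete, here is a minimal Python sketch of Weighted Round Robin, the WRR policy named above; the server names and weights are invented for illustration and are not from the original deck.

```python
from itertools import cycle

class WeightedRoundRobin:
    """Illustrative WRR dispatcher: each server appears in the rotation in
    proportion to its weight, so more capable servers receive more requests."""

    def __init__(self, weights):
        # weights: dict mapping server name -> positive integer weight
        rotation = [server for server, w in weights.items() for _ in range(w)]
        self._rotation = cycle(rotation)

    def next_server(self):
        return next(self._rotation)


# Hypothetical usage: "ws1" is assumed to be three times as powerful as "ws2".
wrr = WeightedRoundRobin({"ws1": 3, "ws2": 1})
print([wrr.next_server() for _ in range(8)])  # ws1 chosen 3x as often as ws2
```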

Application Server Request Distribution
Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)
– In ASRD, both tasks and resources are highly dynamic, so techniques are adaptations of WSRD techniques
Most common technique: a combination of RR and Session Affinity (a sketch of this baseline follows below)
– Requests starting new sessions are dispatched according to RR
– Subsequent requests in a session are routed to the server where the session's previous request was served, i.e., where the session object resides
=> This frequently results in load imbalances
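
A minimal sketch of this baseline RR-plus-affinity policy, assuming an in-memory session-to-server map; class and server names are illustrative only, not the deck's notation.

```python
from itertools import cycle

class RRWithAffinity:
    """Baseline dispatcher: new sessions go round-robin; subsequent requests
    stick to the server that served the session's first request."""

    def __init__(self, servers):
        self._rr = cycle(servers)
        self._affinity = {}  # session_id -> server holding the session object

    def dispatch(self, session_id):
        server = self._affinity.get(session_id)
        if server is None:                  # new session: round robin
            server = next(self._rr)
            self._affinity[session_id] = server
        return server                       # existing session: affined server


# Illustrative use: long-lived sessions keep returning to the same server,
# which is how the load imbalances noted above build up.
d = RRWithAffinity(["as1", "as2", "as3"])
print(d.dispatch("u1"), d.dispatch("u2"), d.dispatch("u1"))  # as1 as2 as1
```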

ReDAL: Motivation
Consider request distribution combining RR and Session Affinity, with short (S) and long (L) sessions arriving at one-minute intervals, e.g.:
S S L S S L S L L S
Servers that happen to be assigned the long sessions stay loaded long after the short sessions complete, producing the imbalance noted above.

ReDAL Objective
Distribute requests across a cluster of application servers such that:
– Load on each application server is kept below a certain threshold
– Session affinity is preserved where possible
[Figure: throughput (transactions per second) vs. number of users, showing a lightly loaded region, the peak, and a heavily loaded region beyond the peak load]

ReDAL Components
Application Analyzer
– Characterizes the behavior of an application server
– Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher
Request Dispatcher
– Routes requests to a set of application servers
– Monitors the expected and actual load on each application server
– Routes a given request to the affined server if it is lightly loaded, else to the application server with the lowest expected load
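
A minimal sketch of the offline Application Analyzer step, under the assumption that it is given (number of users, throughput) measurements from a load test and records the load level at which throughput peaks; the sample data points are invented for illustration.

```python
def find_peak_load(measurements):
    """measurements: list of (num_users, throughput_tps) pairs from an offline
    load test. Returns the (load, throughput) point at which throughput tops
    out, before the server enters the heavily loaded region."""
    return max(measurements, key=lambda point: point[1])


# Invented sample curve: throughput rises with load, peaks, then degrades.
curve = [(50, 120.0), (100, 230.0), (150, 310.0), (200, 295.0), (250, 240.0)]
peak_load, peak_tps = find_peak_load(curve)
print(peak_load, peak_tps)   # 150 users at ~310 transactions/sec
```

At runtime the Request Dispatcher would compare a server's current load against this recorded peak to decide whether it is still "lightly loaded".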

ReDAL Algorithm
Based on a key observation: think time (view time) on a page is predictable from past behavior
Jeffrey Heer and Ed H. Chi (Xerox Palo Alto Research Center), "Mining the Structure of User Activity using Cluster Stability", Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)

ReDAL: Capacity Reservation
Consider a finite lookahead period partitioned into discrete time periods, or slices
[Figure: timeline divided into slices 0, 1, and 2 starting from the current time; request r1 arrives at time t1 in slice 0, and after the session's think time the next request r2 is expected at time t2 in slice 2]
Load metrics:
– Actual Load = number of requests in a time slice
– Expected Load = number of requests expected in a time slice, based on think time, i.e., the time between subsequent requests in a session
  e.g., capacity is reserved for request r2 on this application server during time slice 2
– Modified Load = Actual Load + α × Expected Load, where 0 ≤ α ≤ 1
  α accounts for prediction errors
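
A minimal Python sketch of these load metrics and the capacity-reservation step, assuming per-slice counters kept in dictionaries and a fixed slice length; the class and method names are my own, not from the ReDAL implementation.

```python
class ServerLoad:
    """Illustrative per-server load bookkeeping over discrete time slices."""

    def __init__(self, slice_seconds=60):
        self.slice_seconds = slice_seconds
        self.actual = {}    # slice index -> requests actually received
        self.expected = {}  # slice index -> requests expected from think times

    def slice_of(self, t):
        return int(t // self.slice_seconds)

    def record_request(self, now, think_time):
        """Count the request in the current slice and reserve capacity in the
        future slice where the session's next request is predicted to land."""
        cur = self.slice_of(now)
        self.actual[cur] = self.actual.get(cur, 0) + 1
        nxt = self.slice_of(now + think_time)
        self.expected[nxt] = self.expected.get(nxt, 0) + 1

    def modified_load(self, now, alpha):
        """Modified Load = Actual Load + alpha * Expected Load (0 <= alpha <= 1)."""
        cur = self.slice_of(now)
        return self.actual.get(cur, 0) + alpha * self.expected.get(cur, 0)
```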

ReDAL: Algorithm Overview
Inputs: a request in a session, think time, time slice duration, α
Output: assignment of the request to application server A

Main:
  A = NULL
  A = SessionAffinity()
  If A is NULL: A = LeastLoaded()
  UpdateLoadMetrics()
  AdvanceTimeSlice()
  Return A

SessionAffinity:
  If ActualLoad() < PeakLoad(): Return AffinedServer()

LeastLoaded:
  If the request is part of a new session: A = LeastLoaded(modified)
  Else: A = LeastLoaded(actual)
  Return A
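
Putting the pieces together, here is a hedged, runnable Python sketch of the dispatch decision. It reuses the illustrative ServerLoad class from the capacity-reservation slide above; the per-server peak loads stand in for what the Application Analyzer would have recorded offline, and none of the identifiers come from the authors' implementation.

```python
class ReDALDispatcher:
    """Illustrative ReDAL-style dispatcher (a sketch, not the authors' code).

    Routes to the affined server while its actual load stays below its recorded
    peak; otherwise falls back to the least-loaded server, using modified load
    for new sessions and actual load for existing ones."""

    def __init__(self, servers, peak_loads, alpha=0.9, slice_seconds=60):
        self.loads = {s: ServerLoad(slice_seconds) for s in servers}
        self.peak_loads = peak_loads   # server -> peak load from offline analysis
        self.alpha = alpha
        self.affinity = {}             # session_id -> affined server

    def dispatch(self, session_id, now, think_time):
        server = self._session_affinity(session_id, now)
        if server is None:
            server = self._least_loaded(session_id, now)
        self.affinity[session_id] = server
        self.loads[server].record_request(now, think_time)  # update load metrics
        return server

    def _session_affinity(self, session_id, now):
        server = self.affinity.get(session_id)
        if server is None:
            return None                # new session: no affined server yet
        cur = self.loads[server].slice_of(now)
        if self.loads[server].actual.get(cur, 0) < self.peak_loads[server]:
            return server              # affined server is lightly loaded
        return None                    # affined server too busy: fall through

    def _least_loaded(self, session_id, now):
        new_session = session_id not in self.affinity

        def load(s):
            if new_session:            # new sessions compare modified load
                return self.loads[s].modified_load(now, self.alpha)
            cur = self.loads[s].slice_of(now)
            return self.loads[s].actual.get(cur, 0)

        return min(self.loads, key=load)
```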

Consistent Global View of Metadata
– Multicasting of changed load information by the WS request dispatcher
– Session objects virtualized in a shared DB
– The web server records the time of the response in a cookie, which is useful for estimating think times in web server clusters
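
For the cookie-based think-time estimate, a minimal sketch, assuming the server simply stamps each response with the current time and reads the stamp back on the session's next request; the cookie name and helper functions are hypothetical.

```python
import time

LAST_RESPONSE_COOKIE = "last_response_time"   # hypothetical cookie name

def stamp_response(response_cookies):
    """Record, in a cookie, the time at which this response was sent."""
    response_cookies[LAST_RESPONSE_COOKIE] = str(time.time())

def estimate_think_time(request_cookies, now=None):
    """Think time ~ elapsed time between the previous response and this request."""
    now = time.time() if now is None else now
    last = request_cookies.get(LAST_RESPONSE_COOKIE)
    return None if last is None else now - float(last)


# Illustrative round trip: stamp on the way out, estimate on the next request.
cookies = {}
stamp_response(cookies)
time.sleep(0.1)
print(estimate_think_time(cookies))   # roughly 0.1 seconds of "think time"
```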

ReDAL: Evaluation
– ReDAL, RR, and HJ implemented as Apache web server plug-ins
– HJ (Hwang and Jung, 2002) uses a "least-active-requests" routing policy that is not applicable to stateful applications
– The load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests
– Each request is chosen from a uniform distribution across the high- and low-load transaction requests
– Load generator (LoadRunner 6), web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware
– Machine configuration: single CPU (900 MHz), 1 GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)

ReDAL: Experimental Results
Performance metrics:
– Average Throughput per Application Server (ATAS): average number of transactions per second provided by an application server in the cluster
– Average Response Time (ART): average response time provided by the application servers, measured from the end-user perspective
– Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities
– Peak % CPU on the Application Servers: peak percentage CPU usage across the cluster of application servers, measured by OS utilities
– Scaling with Application Servers: percentage CPU usage on the web server for varying numbers of application servers in the cluster

Throughput Performance
– ReDAL (0.9) is the ReDAL algorithm with α = 0.9; ReDAL (0.5) is the ReDAL algorithm with α = 0.5
– ReDAL with α = 0.9 achieves the highest throughput

Response Time Performance
ReDAL with α = 0.9 achieves the best response time

CPU Overhead on the Web Server
The additional overhead of the ReDAL algorithm is 1.5% or less

Peak CPU Utilization on Application Servers
Highest in the RR case and lowest in the ReDAL (α = 0.9) case

Scaling with Application Servers
The overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions
[Figure: WSCU (%) vs. number of simultaneous sessions for clusters of 5, 10, and 20 application servers]

Real-World Evaluation
– Online credit card application
– 30 WebLogic application servers on Red Hat Linux 9.0; Apache web server on Red Hat Linux 9.0
– Machine hardware configuration: 1 GB RAM, dual 2.2 GHz processors
– Load was simulated by re-tracing web logs collected at various times over a day
– At a peak load of 1000 simultaneous sessions, ReDAL improved response time over RR by 100%
[Figure: ART (ms) vs. number of simultaneous sessions for ReDAL (α = 0.8), HJ, and RR]

Summary
ReDAL: application server load distribution that
– Maximizes affinity
– Exploits application characteristics
– Is practical and scalable