Request Distribution in Server Clusters Krithi Ramamritham Indian Institute of Technology Bombay.

Web site infrastructure
Clustered, multi-tiered architectures
Example: e-Shopping
– Open the portal home page
– Login
– View items, prices, availability
– Select an item type
– Specify the no. of items
– Confirm by entering the credit card number
– Logout

WS vs. AS
Web servers
– Do well-defined and quantifiable local work, e.g., processing HTTP headers, serving static content
Application servers
– Run multi-layer programs, e.g., scripts involving calls to backends

ReDAL
In clustered, multi-tiered architectures, there are two request distribution points:
– Web Server Request Distribution (WSRD): the web switch distributes requests to the web server cluster
– Application Server Request Distribution (ASRD): the web server distributes requests requiring business logic to the application server cluster
ReDAL: Request Distribution for the Application Layer
An approach for efficient distribution of requests across a cluster of application servers

Web Server Request Distribution
Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections
– Several of these policies are commercially implemented (e.g., Cisco’s LocalDirector and F5’s BIG-IP)
Two improvements:
1. Session Affinity: consecutive requests in a given user session will be served faster if they are handled by the same server
2. Locality-Aware Request Distribution (LARD): attempts to exploit locality of working sets on different servers; not applicable to dynamically generated content
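
The basic policies named on this slide can be sketched in a few lines. This is an illustrative sketch only, not how any commercial web switch implements them; the server names, weights, and connection counts are made up for the example.

```python
import itertools

def round_robin(servers):
    """Return an iterator that yields servers in a fixed cyclic order."""
    return itertools.cycle(servers)

def weighted_round_robin(weights):
    """Cycle through servers, repeating each one `weight` times per round."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Pick the server with the fewest open connections
    (ties go to the first server in dict order)."""
    return min(active, key=active.get)

# Hypothetical three-server web cluster.
rr = round_robin(["ws1", "ws2", "ws3"])
rr_picks = [next(rr) for _ in range(5)]

# ws1 is assumed to have twice the capacity of ws2.
wrr = weighted_round_robin({"ws1": 2, "ws2": 1})
wrr_picks = [next(wrr) for _ in range(6)]

# Assumed current connection counts per server.
lc_pick = least_connections({"ws1": 4, "ws2": 1, "ws3": 3})
```

None of these policies looks at which session a request belongs to, which is what Session Affinity adds.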

Application Server Request Distribution
Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)
– In ASRD, both tasks and resources are highly dynamic
So, techniques are adaptations of WSRD techniques
Most common technique: combination of RR and Session Affinity
– Requests starting new sessions are dispatched according to RR
– Subsequent requests in a session are routed to the server where the session’s previous request was served, i.e., where the session object resides
=> frequently results in load imbalances

ReDAL: Motivation
Request distribution combining RR and Session Affinity
Short (S) and long (L) sessions arrive at one-minute intervals: S S L S S L S L L S
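
A small sketch of why RR combined with Session Affinity can become unbalanced for the arrival sequence on this slide. The cluster size (three servers) and the session lengths (2 requests for a short session, 10 for a long one) are assumptions chosen for illustration; the slide does not give them. The sketch counts total requests per server and ignores timing.

```python
from collections import defaultdict

ARRIVALS = "S S L S S L S L L S".split()   # sequence from the slide
REQUESTS_PER_SESSION = {"S": 2, "L": 10}   # assumed session lengths
SERVERS = ["as1", "as2", "as3"]            # assumed cluster size

def rr_with_affinity(arrivals, servers, requests_per_session):
    """Assign each new session by round robin; every later request of the
    session stays on that server (session affinity), so the whole session's
    request count lands on one server."""
    load = defaultdict(int)
    for i, kind in enumerate(arrivals):
        server = servers[i % len(servers)]
        load[server] += requests_per_session[kind]
    return dict(load)

load = rr_with_affinity(ARRIVALS, SERVERS, REQUESTS_PER_SESSION)
print(load)
```

Under these assumptions one server receives only short sessions while another receives three of the four long ones, even though each server was handed roughly the same number of sessions.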

ReDAL Objective
Distribute requests across a cluster of application servers such that:
– Load on each application server is kept below a certain threshold
– Session affinity is preserved where possible
[Figure: throughput (transactions per second) versus number of users, marking the lightly loaded region, the throughput peak at peak load, and the heavily loaded region]

ReDAL Components
Application Analyzer: characterizes the behavior of the application server
– Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher
Request Dispatcher: routes requests to a set of application servers
– Monitors expected and actual load on each application server
– Routes a given request to the affined server if it is lightly loaded, else to the application server having the lowest expected load

ReDAL Algorithm based on key observation: think-time or view-time on a page is predictable based on past behavior Jeffrey Heer and Ed H. Chi (Palo Alto Xerox Research Center), “Mining the Structure of User Activity using Cluster Stability”, Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)

ReDAL: Capacity Reservation
Consider a finite lookahead period partitioned into discrete time periods or slices
[Figure: timeline starting at the current time and divided into Slice 0, Slice 1, and Slice 2; request r1 arrives at time t1 and, after the think time, request r2 is expected at time t2]
Load metrics:
– Actual Load = number of requests in a time slice
– Expected Load = number of requests expected in a time slice based on think time, i.e., the time between subsequent requests in a session
  e.g., capacity is reserved for request r2 on this application server during time slice 2
– Modified Load = Actual Load + α × Expected Load (0 ≤ α ≤ 1)
  α accounts for prediction errors
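
The three load metrics can be sketched as per-slice counters. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, `alpha` stands for the weighting factor on Expected Load, and the 10-second slice and 20-second think time are made-up values.

```python
class ServerLoad:
    """Per-server load bookkeeping over discrete time slices."""

    def __init__(self, slice_seconds, alpha):
        self.slice_seconds = slice_seconds
        self.alpha = alpha      # weight on expected load, between 0 and 1
        self.actual = {}        # slice index -> requests observed
        self.expected = {}      # slice index -> requests reserved

    def slice_of(self, t):
        """Map a time (seconds from the start of the lookahead) to a slice index."""
        return int(t // self.slice_seconds)

    def record_request(self, t, think_time):
        """Count a request arriving at time t and reserve capacity for the
        session's next request, expected at t + think_time."""
        now = self.slice_of(t)
        later = self.slice_of(t + think_time)
        self.actual[now] = self.actual.get(now, 0) + 1
        self.expected[later] = self.expected.get(later, 0) + 1

    def modified_load(self, slice_index):
        """Actual load plus alpha times expected load for one slice."""
        return (self.actual.get(slice_index, 0)
                + self.alpha * self.expected.get(slice_index, 0))

load = ServerLoad(slice_seconds=10, alpha=0.5)
# r1 arrives in slice 0 (assumed); r2 is then expected in slice 2.
load.record_request(t=3, think_time=20)
print(load.modified_load(0), load.modified_load(2))
```

A value of `alpha` below 1 discounts the reservation, since the predicted request may arrive in a different slice or not at all.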

ReDAL: Algorithm Overview
Inputs: request in a session, think time, time slice duration, α
Output: assignment of request to application server A
Main:
    A = NULL
    A = SessionAffinity()
    If A is NULL
        A = LeastLoaded()
    UpdateLoadMetrics()
    AdvanceTimeSlice()
    Return A
SessionAffinity:
    If ActualLoad() < PeakLoad()
        Return AffinedServer()
LeastLoaded:
    If request is part of new session
        A = LeastLoaded(modified)
    Else
        A = LeastLoaded(actual)
    Return A
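
The routing decision in this pseudocode can be sketched as follows. This is a simplified illustration, not the authors' code: it models a single time slice, so the AdvanceTimeSlice step and think-time reservations are left out, and the `Server` class, the `dispatch` function, the `affinity` dictionary, and the peak load of 2 are all hypothetical.

```python
class Server:
    def __init__(self, name, peak_load):
        self.name = name
        self.peak_load = peak_load  # recorded offline by the analyzer
        self.actual = 0             # requests in the current time slice
        self.expected = 0           # requests reserved for the current slice

    def modified(self, alpha):
        """Actual load plus alpha times expected load."""
        return self.actual + alpha * self.expected

def dispatch(session_id, servers, affinity, alpha):
    """Route one request and return the chosen server."""
    affined = affinity.get(session_id)
    if affined is not None and affined.actual < affined.peak_load:
        # SessionAffinity: the affined server is below its peak load.
        chosen = affined
    elif affined is None:
        # New session: least loaded by modified load.
        chosen = min(servers, key=lambda s: s.modified(alpha))
    else:
        # Existing session whose affined server is full: least actual load.
        chosen = min(servers, key=lambda s: s.actual)
    chosen.actual += 1              # UpdateLoadMetrics (actual load only)
    affinity[session_id] = chosen
    return chosen

a, b = Server("as1", peak_load=2), Server("as2", peak_load=2)
affinity = {}
first = dispatch("u1", [a, b], affinity, alpha=0.9)   # new session
second = dispatch("u1", [a, b], affinity, alpha=0.9)  # stays on affined server
third = dispatch("u1", [a, b], affinity, alpha=0.9)   # affined server at peak
print(first.name, second.name, third.name)
```

The third request shows the trade-off in the objective: affinity is given up only when the affined server has reached its peak load.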

Consistent global view of metadata
– Multicasting of changed load info by the WS request dispatcher
– Session objects virtualized in a shared db
– Web server records the time of response in a cookie; useful for estimating think times in web server clusters

ReDAL: Evaluation
– ReDAL, RR, and HJ implemented as Apache Web Server plug-ins
– Load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests
– Each request is chosen from a uniform distribution across the high- and low-load transaction requests
– Load generator (LoadRunner 6), web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware
– Machine configuration: single CPU (900 MHz), 1 GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)
– HJ (Hwang and Jung, 2002) uses a “least-active-requests” routing policy; not applicable to stateful applications

ReDAL: Experimental Results
Performance metrics:
– Average Throughput per Application Server (ATAS): average number of transactions per second that an application server in the cluster provides
– Average Response Time (ART): average response time provided by the application servers, measured from the end-user perspective
– Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities
– Peak % CPU on the Application Servers: peak percentage CPU usage among a cluster of application servers, measured by OS utilities
– Scaling with Application Servers: percentage CPU usage on the web server for various numbers of application servers in the application server cluster

Throughput Performance
ReDAL (0.9) is the ReDAL algorithm with α = 0.9
ReDAL (0.5) is the ReDAL algorithm with α = 0.5
ReDAL with α = 0.9 has the highest throughput

Response Time Performance
ReDAL with α = 0.9 has the best response time

CPU Overhead on the Web Server
Additional overhead of the ReDAL algorithm is 1.5% or less

Peak CPU Utilization on Application Servers
Highest in the RR case and lowest in the ReDAL (α = 0.9) case

Scaling with Application Servers
Overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions
[Figure: WSCU (%) versus number of simultaneous sessions (0 to 100), with curves for 5, 10, and 20 application servers]

Real World Evaluation
– Online credit card application
– 30 WebLogic application servers on Red Hat Linux 9.0
– Apache Web Server on Red Hat Linux 9.0
– Machine hardware configuration: 1 GB RAM, 2.2 GHz dual processors
– Load was simulated by replaying web logs collected at various times over a day
– At a peak load of 1000 simultaneous sessions, ReDAL improved the response time of RR by 100%
[Figure: ART (ms) versus number of simultaneous sessions (0 to 1000) for ReDAL-0.8, HJ, and RR]

Summary
ReDAL: application server load distribution
– Maximizes affinity
– Exploits application characteristics
– Practical and scalable
