Internet Applications: Performance Metrics and performance-related concepts E0397 – Lecture 2 10/8/2010.

Presentation transcript:

Internet Applications: Performance Metrics and performance-related concepts E0397 – Lecture 2 10/8/2010

Generic Internet Service
- A user (client) sends requests to a service provider (server systems)
- Assume no “cloud”

Performance Concerns
- User/Client:
  - Response time
  - Blocking
- Service Provider/Server:
  - Assuring good performance to client
  - High volume of usage (many clients)
  - Keeping costs low

Service Provider’s concerns
- What should be the configuration of the Web server? (Number of threads, buffer size, ...)
- On which machines should the IMAP server be deployed? The Web server?
- How will the network affect the performance? (LAN vs WAN)
- How many machines? Machine configuration? (How many CPUs, what speed, how many disks?)
- Determining server, host and network architecture
- Resource sizing to meet user performance expectations

Where’s the conflict? (In other words, where is the engineering problem?)
- Clients’ performance needs and the service provider’s needs work against each other
- Universal Law of Resource Usage: heavy usage by multiple entities leads to degraded performance
- Queuing theory allows looking, in a formal manner, at systems whose resources are under contention by multiple entities
  - Allows prediction of performance under various system parameters, by using mathematical models

What/Why is a Queue?
- The systems whose performance we study are those that have some contention for resources
  - If there is no contention, performance is in most cases not an issue
  - When multiple “users/jobs/customers/tasks” require the same resource, use of the resource has to be regulated by some discipline

...What/Why is a Queue?
- When a customer finds a resource busy, the customer may
  - Wait in a “queue” (if there is a waiting room)
  - Or go away (if there is no waiting room, or if the waiting room is full)
- Hence the word “queue” or “queuing system”
  - Can represent any resource in front of which a queue can form
- In some cases an actual queue may not form, but it is called a “queue” anyway

Examples of Queuing Systems
- CPU: customers are processes/threads
- Disk: customers are processes/threads
- Network link: customers are packets
- IP router: customers are packets
- ATM switch: customers are ATM cells
- Web server threads: customers are HTTP requests
- Telephone lines: customers are telephone calls

Elements of a Queue
(Figure labels: customer, inter-arrival time, waiting room/buffer/queue, queueing discipline, server, service time)

Elements of a Queue
- Number of servers
- Size of waiting room/buffer
- Service time distribution
- Nature of arrival “process”
  - Inter-arrival time distribution
  - Correlated arrivals, etc.
- Number of “users” issuing jobs (population)
- Queuing discipline: FCFS, priority, LCFS, processor sharing (round-robin)

Elements of a Queue
- Number of servers: 1, 2, 3, ...
- Size of buffer: 0, 1, 2, 3, ...
- Service time distribution and inter-arrival time distribution
  - Deterministic (constant)
  - Exponential
  - General (any)
- Population: 1, 2, 3, ...

Queue Performance Measures
- Queue length: number of jobs in the system (or in the queue)
- Waiting time (average, distribution): time spent in queue before service
- Response time: waiting time + service time
- Utilization: fraction of time the server is busy, or probability that the server is busy
- Throughput: job completion rate

Queue Performance Measures
Let the observation time be T
- A = number of arrivals during time T
- C = number of completions during time T
- B = total time the system was busy during time T
Then:
- Arrival rate: λ = A/T
- Throughput: Λ = C/T
- Utilization: ρ = B/T
- Average service time: τ = B/C
- Service rate: μ = 1/τ
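The measured quantities above translate directly into a few divisions. The following is a minimal sketch; the function name and the sample counts (a 60-second window with 180 arrivals, 180 completions and 45 seconds of busy time) are hypothetical and not taken from the lecture.

```python
def operational_metrics(arrivals, completions, busy_time, obs_time):
    """Derive basic queue metrics from counts observed over obs_time seconds."""
    service_time = busy_time / completions        # average service time = B/C
    return {
        "arrival_rate": arrivals / obs_time,      # A/T
        "throughput": completions / obs_time,     # C/T
        "utilization": busy_time / obs_time,      # B/T
        "service_time": service_time,
        "service_rate": 1.0 / service_time,       # reciprocal of service time
    }


# Hypothetical measurement window (illustrative numbers only).
m = operational_metrics(arrivals=180, completions=180, busy_time=45.0, obs_time=60.0)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that utilization equals throughput times average service time for these numbers, which is the relationship derived on the next slide.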

Basic Relationships
- In “steady state”, for a stable system without loss (i.e. an infinite-buffer system):
  - Throughput Λ = arrival rate λ (completion rate = arrival rate, since “in-flow = out-flow”)
- If arrival rate > service rate (λ > μ), the system is no longer stable and throughput is capped at the service rate: Λ = μ
- Utilization ρ = B/T = (B/C) x (C/T) = average service time x completion rate = τΛ
  - = λτ for a loss-less system
  - For lossy systems, if p = fraction of requests lost: ρ = λ(1 - p)τ
- Throughput of a system: Λ = utilization x service rate = ρμ, since (C/T) = (B/T) x (C/B)

Little’s Law: N = ΛR
- Average number of customers in a queuing system = throughput x average response time
- Applicable to any “closed boundary” that contains queuing systems
  - Some other assumptions apply
- Also, if L is the number in queue (not in service) and W is the waiting time: L = ΛW
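Little's Law is often used in reverse, to estimate response time from quantities that are easy to measure. A small sketch, with hypothetical function names and made-up measurements (12 requests in the system on average, 40 requests/second completed):

```python
def littles_law_population(throughput, response_time):
    """Average number in the system = throughput x average response time."""
    return throughput * response_time


def littles_law_response_time(population, throughput):
    """Average response time implied by a measured population and throughput."""
    return population / throughput


# Hypothetical measurements, for illustration only.
avg_in_system = 12.0      # average number of requests inside the boundary
throughput = 40.0         # completed requests per second
print(littles_law_response_time(avg_in_system, throughput), "seconds")
```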

Simple Example (Server)
- Assume just one server (single thread)
- Requests arrive at 3 requests/second
- Request processing time = 250 ms
- Utilization of the server? 3 x 0.25 = 75%
- Throughput of the server? 3 requests/second
- What if requests arrive at 5 requests/second? The server can complete at most 1/0.25 = 4 requests/second, so utilization = 100% and throughput = 4 requests/second
- Waiting time (at 3 requests/second)? L/3, where L is the queue length. But what is L? That needs a queuing model
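The arithmetic of this example can be sketched as follows. The function name is hypothetical; the only modelling assumption is that a single server cannot be more than 100% busy, so throughput is capped at the service rate.

```python
def single_server(arrival_rate, service_time):
    """Utilization and throughput of one server; both saturate under overload."""
    service_rate = 1.0 / service_time
    utilization = min(1.0, arrival_rate * service_time)
    throughput = min(arrival_rate, service_rate)
    return utilization, throughput


for rate in (3.0, 5.0):
    u, x = single_server(rate, 0.25)
    print(f"arrival rate {rate}: utilization {u:.0%}, throughput {x} reqs/second")
```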

Classic Single Server Queuing Model: “M/M/1”
- Exponentially distributed service time
- Exponentially distributed inter-arrival time
- Single server
- Infinite buffer (waiting room)
- FCFS discipline
- Can be solved very easily, using the theory of Markov chains

Exponential Distribution
- Memory-less distribution
  - Distribution of remaining time does not depend on elapsed time
- Mathematically convenient
- Realistic in many situations (e.g. inter-arrival times of calls)
- X is EXP(λ):
  - P[X < t] = 1 - e^(-λt)
  - Average value of X = 1/λ
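The memory-less property can be checked numerically from the distribution function alone: the probability of lasting a further t, given that s has already elapsed, equals the unconditional probability of lasting t. A short sketch with an arbitrary, hypothetical rate and time values:

```python
import math


def exp_cdf(rate, t):
    """P[X < t] for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * t)


def exp_survival(rate, t):
    """P[X > t] = 1 - P[X < t]."""
    return 1.0 - exp_cdf(rate, t)


rate = 2.0          # hypothetical rate; mean is 1/rate = 0.5
s, t = 0.7, 0.4     # elapsed time and additional time

conditional = exp_survival(rate, s + t) / exp_survival(rate, s)
print("P[X > s+t | X > s] =", conditional)
print("P[X > t]           =", exp_survival(rate, t))
```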

Important Result! M/M/1 queue results
Let λ be the arrival rate, τ the service time, and μ = 1/τ the service rate
- Utilization: ρ = λτ = λ/μ
- Mean number of jobs in the system: N = ρ/(1 - ρ)
- Throughput: Λ = λ
- Average response time (Little’s law): R = N/λ = τ/(1 - ρ)
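These results are easy to turn into a calculator. The sketch below uses a hypothetical function name and applies the formulas to the earlier single-server example (3 requests/second, 250 ms service time), which also answers the open question about waiting time there.

```python
def mm1(arrival_rate, service_time):
    """Steady-state M/M/1 metrics; valid only when utilization is below 1."""
    rho = arrival_rate * service_time            # utilization
    if rho >= 1.0:
        raise ValueError("unstable: utilization must be below 1")
    n = rho / (1.0 - rho)                        # mean number in system
    r = n / arrival_rate                         # response time via Little's law
    w = r - service_time                         # waiting time before service
    return {"utilization": rho, "jobs_in_system": n,
            "response_time": r, "waiting_time": w}


result = mm1(arrival_rate=3.0, service_time=0.25)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```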

Response Time Graph
- Graph illustrates the typical behaviour of the response time curve
- The region after which there is sharp growth is often termed the “knee of the curve”. Note that it is not a very well defined point

Multiple server queue: M/M/c
- One queue, c servers
- Utilization: ρ = λτ/c
- a = λτ = average number of busy servers
- Queue length equation exists (not shown here)
  - For c = 2, queue length is N = 2ρ/(1 - ρ²)
- Average response time? Little’s Law!
  - For c = 2, R = N/λ = τ/(1 - ρ²)
- Important quantity: a = λτ, termed traffic intensity or offered load
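The general queue length equation that the slide leaves out is commonly written in terms of the Erlang C formula (the probability that an arrival has to wait). The sketch below is one standard way to compute it, not necessarily the form used in the lecture; the function name and sample numbers are hypothetical. For c = 2 it reproduces the closed form quoted on the slide.

```python
import math


def mmc(arrival_rate, service_time, c):
    """Steady-state M/M/c metrics using the Erlang C formula."""
    a = arrival_rate * service_time              # offered load (busy servers)
    rho = a / c                                  # per-server utilization
    if rho >= 1.0:
        raise ValueError("unstable: need offered load below c")
    head = sum(a**k / math.factorial(k) for k in range(c))
    tail = a**c / (math.factorial(c) * (1.0 - rho))
    p_wait = tail / (head + tail)                # Erlang C: arrival must wait
    n_waiting = p_wait * rho / (1.0 - rho)       # mean number waiting
    n = n_waiting + a                            # mean number in system
    r = n / arrival_rate                         # Little's law
    return rho, n, r


rho, n, r = mmc(arrival_rate=6.0, service_time=0.25, c=2)
print(f"utilization={rho:.2f}  jobs in system={n:.4f}  response time={r:.4f}")
print("closed form for c=2:", 2 * rho / (1 - rho**2))
```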

Finite Buffer Models: M/M/c/K
- c servers, buffer size K (total in the system can be c + K)
  - If a request arrives when the system is full, the request is dropped
- For this system, there is a notion of loss, or blocking, in addition to response time etc.
- Blocking probability (p) is the probability that an arriving request finds the system full
  - Response time is relevant only for requests that are “accepted”

...Finite Buffer Queues
- Arrival rate: λ
- Service rate: μ
- Throughput? Λ = λ(1 - p)
- Utilization? ρ = Λ/(cμ)
- Blocking probability? Queue length (N)?
  - Formula exists (from queuing model)
- Waiting time? (Little's law: N/Λ, the time an accepted request spends in the system)

Finite Buffer Queue: Asymptotic Behavior
As offered load increases (λτ → infinity):
- Utilization: ρ → 1
- Throughput: Λ → cμ
- Blocking probability: p → 1
- Queue length: N → c + K
- Waiting time → K/(cμ) + 1/μ
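The finite-buffer measures, and the asymptotic behaviour listed above, can be checked numerically from the standard birth-death steady-state probabilities of an M/M/c queue with room for c + K requests. This is a sketch under that modelling assumption; the function name and the parameter values (2 servers, buffer of 3, 100 ms service time) are hypothetical.

```python
import math


def mmck(arrival_rate, service_time, c, k):
    """M/M/c queue with buffer size k, i.e. at most c + k requests in system."""
    a = arrival_rate * service_time
    weights = []
    for n in range(c + k + 1):
        if n <= c:
            weights.append(a**n / math.factorial(n))
        else:
            weights.append(a**n / (math.factorial(c) * c**(n - c)))
    total = sum(weights)
    probs = [w / total for w in weights]         # steady-state P[n in system]
    p_block = probs[-1]                          # arrival finds system full
    throughput = arrival_rate * (1.0 - p_block)
    utilization = throughput * service_time / c
    n_mean = sum(n * p for n, p in enumerate(probs))
    time_in_system = n_mean / throughput         # Little's law, accepted requests
    return p_block, throughput, utilization, n_mean, time_in_system


# Moderate load, then a very heavy load to illustrate the limits.
for rate in (15.0, 10000.0):
    p, x, u, n, t = mmck(rate, service_time=0.1, c=2, k=3)
    print(f"rate={rate}: block={p:.4f} throughput={x:.3f} "
          f"util={u:.4f} N={n:.4f} time={t:.4f}")
```

With 2 servers, a buffer of 3 and a service rate of 10 per second, the heavy-load row should approach throughput 20, N = 5 and a time in system of 3/20 + 1/10 = 0.25 seconds.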

Examples

Example-1
You are developing an application server where the performance requirements are:
- Average response time < 3 seconds
- Forecasted arrival rate = 0.5 requests/second
What should be the budget for the service time of requests in the server?
Answer: τ < 1.2 seconds. (Assuming an M/M/1 model, R = τ/(1 - λτ) < 3 with λ = 0.5 gives τ < 3/2.5 = 1.2.)
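The answer can be reproduced by inverting the M/M/1 response time formula, assuming (as the answer implies) that the server is modelled as an M/M/1 queue. Solving τ/(1 - λτ) < R for τ gives τ < R/(1 + λR). The function name below is hypothetical.

```python
def max_service_time(target_response_time, arrival_rate):
    """Largest M/M/1 service time that still meets the response time target."""
    return target_response_time / (1.0 + arrival_rate * target_response_time)


budget = max_service_time(target_response_time=3.0, arrival_rate=0.5)
print(f"service time budget: {budget:.2f} seconds")

# Sanity check: at exactly this service time the response time hits the target.
rho = 0.5 * budget
print("response time at the budget:", budget / (1.0 - rho))
```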

Example-4: Multi-threaded Server
Assume a multi-threaded server. Arriving requests are put into a buffer. When a thread is free, it picks up the next request from the buffer.
- Execution time: mean = 200 ms
- Traffic = 30 requests/sec
- How many threads should we configure? (Assume enough hardware.)
  - Traffic intensity = 30 x 0.2 = 6 = average number of busy servers
  - So at least 6 threads (more than 6 if utilization is to stay below 100%)
- Response time = (Which formula?)
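One possible answer to "which formula?" is the M/M/c model with c equal to the number of threads, assuming Poisson arrivals, exponential execution times and an unbounded buffer. The sketch below computes the Erlang C waiting probability through the numerically stable Erlang B recursion and searches for the smallest thread count that meets a response time target. The 250 ms target and all function names are hypothetical.

```python
def erlang_c(servers, offered_load):
    """Probability that an arrival must wait, via the Erlang B recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers
    return b / (1.0 - rho * (1.0 - b))


def response_time(arrival_rate, service_time, threads):
    """Mean M/M/c response time: queueing delay plus service time."""
    a = arrival_rate * service_time
    if a >= threads:
        raise ValueError("unstable: need more threads than offered load")
    wait = erlang_c(threads, a) * service_time / (threads - a)
    return wait + service_time


def min_threads(arrival_rate, service_time, target):
    """Smallest thread count whose mean response time meets the target."""
    threads = int(arrival_rate * service_time) + 1
    while response_time(arrival_rate, service_time, threads) > target:
        threads += 1
    return threads


for c in (7, 8, 10):
    print(c, "threads ->", round(response_time(30.0, 0.2, c), 4), "seconds")
print("threads for a 250 ms target:", min_threads(30.0, 0.2, 0.25))
```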

...Example-4
Related question: estimate the average memory requirement of the server.
- Likely to have: constant component + dynamic component
- Dynamic component is related to the number of active threads
- Suppose the memory requirement of one active thread = M
- Average memory requirement = constant + M x 6

Example-5: Hardware Sizing
Consider the following scenario:
- An application server runs on a 24-CPU machine
- Server seems to peak at 320 transactions per second
- We need to scale to 400
- Hardware vendor recommends going to a 32-CPU machine. Should you?

Example-5: Hardware Sizing
- First do bottleneck analysis!
- Suppose logs show that at 320 transactions per second, CPU utilization is 67%. What is the problem? What is the solution?

Example-5: Hardware Sizing
- Most likely explanation: number of threads in the server is less than the number of CPUs
- Possible diagnosis:
  - Server has 16 threads configured
  - Each thread has a capacity of 20 transactions per second
  - Total capacity: 320 requests/second. At this load, all 16 threads will be 100% busy, so average CPU utilization will be 16/24 = 67%
- Solution: increase the number of threads; no additional CPUs are needed
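The diagnosis is simple arithmetic, sketched below. It assumes, as the diagnosis does, that each fully busy thread keeps exactly one CPU busy; the function name is hypothetical.

```python
import math


def thread_sizing(threads, per_thread_capacity, cpus, target_rate):
    """Capacity with the current threads, and threads needed for the target."""
    capacity = threads * per_thread_capacity
    # Assumption: one fully busy thread occupies one CPU.
    cpu_utilization_at_capacity = min(threads, cpus) / cpus
    threads_needed = math.ceil(target_rate / per_thread_capacity)
    return capacity, cpu_utilization_at_capacity, threads_needed


capacity, cpu_util, needed = thread_sizing(
    threads=16, per_thread_capacity=20.0, cpus=24, target_rate=400.0)
print(f"capacity: {capacity} tps, CPU utilization at capacity: {cpu_util:.0%}")
print(f"threads needed for 400 tps: {needed} (machine has 24 CPUs)")
```

Under these assumptions the thread count needed for 400 transactions per second is still below the 24 CPUs already available, which is why the slide concludes that a larger machine is unnecessary.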

Closed Queuing Systems

Typical Scenario
- M clients issue requests, wait for the response, then “think”, then issue a request again
- Let 1/λ be the mean think time and 1/μ the mean service time
- The scenario is actually that of a “closed” queuing network
(Figure: Clients, Server)

Observations
- Throughput? Arrival rate?
  - At steady state, “flow” through both queues is equal
  - Server throughput = server utilization x service rate = Uμ
  - A request is generated by each user once every [think time + response time], i.e. at rate 1/(1/λ + R)
  - Overall request arrival rate = M/(1/λ + R) = Uμ

Observations
- M/(1/λ + R) = Uμ = throughput
- Response time = number of clients / throughput - think time
- As the number of clients increases, U → 1 and the response time can be shown to tend to M/μ - 1/λ
  - Linear function of M
  - Increase M by 1 → response time increases by 1/μ

Metrics from model
- Saturation number (number of clients after which the system is “saturated”): M* = 1 + μ/λ
(Figure: response time R versus number of clients M)
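The closed-system behaviour can be explored with exact Mean Value Analysis for one queueing server plus a think-time stage, assuming exponential service times. This is a sketch of that standard recursion rather than the lecture's own derivation; the function name and the parameter values (1 second think time, 100 ms service time) are hypothetical.

```python
def closed_system(num_clients, think_time, service_time):
    """Exact MVA: one queueing server plus a pure-delay think stage."""
    n_at_server = 0.0
    response_time = throughput = 0.0
    for m in range(1, num_clients + 1):
        # An arriving request sees the queue left by a system with m-1 clients.
        response_time = service_time * (1.0 + n_at_server)
        throughput = m / (think_time + response_time)
        n_at_server = throughput * response_time      # Little's law
    return response_time, throughput


think_time, service_time = 1.0, 0.1
saturation = 1.0 + think_time / service_time          # saturation number
print("saturation number:", saturation)
for m in (1, 5, 11, 20, 50):
    r, x = closed_system(m, think_time, service_time)
    asymptote = m * service_time - think_time          # heavy-load limit
    print(f"M={m:2d}  R={r:.4f}  throughput={x:.3f}  asymptote={asymptote:.2f}")
```

Well beyond the saturation number the computed response time tracks the linear asymptote closely, growing by one service time per added client.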

Queuing networks

Queuing Networks
- Jobs arrive at certain queues (open queuing networks)
- After receiving service at one queue (i), they proceed to another server (j) with some probability p_ij, or exit

Open Queuing Network - measures
- Maximum throughput
- Bottleneck server
- Throughput

Open Queuing Network - measures
- Total time spent in the system before completion (overall response time, from the point of view of the user)
(Figure: path of a request through the network, from start to finish)

Open Queuing Network: Example 1
- W: Web Server
- A1, A2: Application Server 1, 2
- D1, D2: Database Server 1, 2
(Figure: network of W, A1, A2, D1, D2 with routing probabilities p_w0, p_wA1, p_wA2, p_A1w, p_A2w, p_A1D1, p_A2D2)

...Open Queuing Network - Example 1
- Each server has a different service time
- But what is the request rate arriving at each server?
  - Need to calculate this using flow equations

...Open Queuing Network - Example 1
Equations for the average number of visits to each server before leaving the server network:
- v_A1 = p_wA1 · v_w + v_D1, v_D1 = p_A1D1 · v_A1
- v_A2 = p_wA2 · v_w + v_D2, v_D2 = p_A2D2 · v_A2
- v_w = 1 + p_A1w · v_A1 + p_A2w · v_A2
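The visit-count equations form a small linear system that can be solved by substitution: eliminating v_D1 and v_D2 expresses v_A1 and v_A2 in terms of v_w, and the last equation then gives v_w. The sketch below does exactly that. All probability values are hypothetical, chosen only so that the outgoing probabilities at each server sum to 1.

```python
def visit_counts(p_wA1, p_wA2, p_A1w, p_A2w, p_A1D1, p_A2D2):
    """Solve the Example 1 visit-count equations by substitution."""
    # From v_A = p_wA * v_w + p_AD * v_A: v_A = p_wA * v_w / (1 - p_AD).
    k1 = p_wA1 / (1.0 - p_A1D1)
    k2 = p_wA2 / (1.0 - p_A2D2)
    v_w = 1.0 / (1.0 - p_A1w * k1 - p_A2w * k2)
    v_A1, v_A2 = k1 * v_w, k2 * v_w
    return {"W": v_w, "A1": v_A1, "D1": p_A1D1 * v_A1,
            "A2": v_A2, "D2": p_A2D2 * v_A2}


# Hypothetical routing probabilities (p_w0 = 1 - 0.3 - 0.2 = 0.5).
visits = visit_counts(p_wA1=0.3, p_wA2=0.2, p_A1w=0.6, p_A2w=0.5,
                      p_A1D1=0.4, p_A2D2=0.5)
for server, v in visits.items():
    print(f"{server}: {v:.3f} visits per request")
```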

...Open Queuing Network - Example 1
- Spreadsheet calculation shows
  - How the bottleneck server changes
  - How throughput changes
  - How response time changes
- Results are accurate for only some types of networks; for others, they are approximate
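The kind of calculation a spreadsheet would perform here can be sketched as follows. The sketch assumes each server behaves like an independent M/M/1 queue (the case in which the results are exact rather than approximate); the visit counts and service times are hypothetical. The service demand of a server is its visit count times its service time, the bottleneck is the server with the largest demand, and the maximum throughput is the reciprocal of that demand.

```python
def open_network_metrics(visits, service_times, arrival_rate):
    """Bottleneck, maximum throughput and response time of an open network."""
    demands = {s: visits[s] * service_times[s] for s in visits}
    bottleneck = max(demands, key=demands.get)
    max_throughput = 1.0 / demands[bottleneck]
    if arrival_rate >= max_throughput:
        raise ValueError("arrival rate exceeds maximum throughput")
    utilizations = {s: arrival_rate * d for s, d in demands.items()}
    # Assumption: every server is an independent M/M/1 queue.
    response_time = sum(d / (1.0 - arrival_rate * d) for d in demands.values())
    return bottleneck, max_throughput, utilizations, response_time


# Hypothetical inputs: visits per request and mean service times in seconds.
visits = {"W": 2.0, "A1": 1.0, "D1": 0.4, "A2": 0.8, "D2": 0.4}
service_times = {"W": 0.01, "A1": 0.05, "D1": 0.1, "A2": 0.04, "D2": 0.08}

bottleneck, max_tp, utils, resp = open_network_metrics(visits, service_times, 10.0)
print("bottleneck:", bottleneck, " max throughput:", max_tp)
print("utilizations:", {s: round(u, 3) for s, u in utils.items()})
print("overall response time:", round(resp, 4), "seconds")
```

Changing a service time or a routing probability and rerunning shows how the bottleneck, throughput limit and response time shift, which is the point the slide makes.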