Review Session Jehan-François Pâris. Agenda Statistical Analysis of Outputs Operational Analysis Case Studies Linear Regression.

How to use this presentation Most problems have  One slide stating the problem  One slide explaining how to solve the problem  One slide allowing you to check your answer You will learn more by trying first to do the problems on your own than by reading their solutions Do not forget either to review the problems in the original notes

Statistical Analysis of Outputs

The big picture The problems  Constructing confidence intervals  Handling autocorrelated data The tools  Central-Limit Theorem  Wilson’s formula  Batch means (and regeneration)  RNG tricks

Confidence Intervals Distinguish between  CIs for means CSIM does it for you  CIs for proportions We are on our own Major issue is independence of data points CSIM uses batch means

Central Limit Theorem If the n mutually independent random variables $x_1, x_2, \ldots, x_n$ have the same distribution, and if their mean $\mu$ and their variance $\sigma^2$ exist then …

Central Limit Theorem The random variable $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $\bar{x}$ is the mean of the $x_i$, is, for large n, approximately distributed according to the standard normal distribution (zero mean and unit variance).

CI for means (I) For large values of n, the $100(1-\alpha)$% confidence interval for $\mu$ is given by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with $F(z_{\alpha/2}) = 1 - \alpha/2$

CI for means (II) F(z) is taken from a table of the normal distribution  $z_{0.025} = 1.96$ because $F(1.96) = 0.975$ For smaller values of n, we have to use Student’s t random variable  Wider CIs We replace $\sigma$ by the sample standard deviation s

Example We have  100 observations for the waiting time  $\bar{x}$ = 4.25 minutes  $s^2$ = 25

Example We have  100 observations for the waiting time  $\bar{x}$ = 4.25 minutes  $s^2$ = 25 Answer is  $4.25 \pm 1.96\sqrt{25/100} = 4.25 \pm 0.98$
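
The computation above can be checked with a few lines of Python. This is a minimal sketch assuming the large-sample normal approximation; the function name `mean_confidence_interval` is an illustrative choice, not something from the course material.

```python
import math

def mean_confidence_interval(xbar, s2, n, z=1.96):
    """Large-sample CI for the mean: xbar +/- z * sqrt(s2 / n).

    z = 1.96 corresponds to a 95% confidence level.
    """
    half_width = z * math.sqrt(s2 / n)
    return xbar - half_width, xbar + half_width

# Numbers from the waiting-time example: 100 observations, mean 4.25, variance 25
low, high = mean_confidence_interval(xbar=4.25, s2=25.0, n=100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

For small n, the constant 1.96 would have to be replaced by the matching quantile of Student's t distribution.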

CI for proportions A proportion represents the probability that X falls below (or above) some fixed threshold  97% of our customers have to wait less than one minute Distributed according to a binomial law  Use Wilson’s formula

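Wilson’s formula When n > 29, we can use Wilson’s interval $\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}}$ where $\hat{p}$ is the observed proportion and $z_{\alpha/2} = 1.96$ for a 95% C.I.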

Example We want to estimate the proportion of packets that wait more than four slots  400 observations  40 packets waited more than four slots

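Answer With $\hat{p} = 40/400 = 0.1$ and $z_{\alpha/2} \approx 2$ Divisor:  $1 + 4/400 = 1.01$ (instead of 1.0096) Central term  $0.1 + 4/(2 \times 400) = 0.105$ (instead of 0.1048) Half width  $2\sqrt{(0.1 \times 0.9)/400 + 4/(4 \times 400^2)} = (2/20)\sqrt{0.09 + 0.0025} \approx 0.6/20 = 0.030$ Result is  $(0.105 \pm 0.030)/1.01 \approx 0.104 \pm 0.030$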
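
The same interval can be computed without rounding z to 2. The sketch below assumes the standard Wilson score interval; `wilson_interval` is an illustrative name.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for a 95% level)."""
    p = successes / n
    divisor = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / divisor
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / divisor
    return centre - half_width, centre + half_width

# Packet example: 40 of 400 packets waited more than four slots
low, high = wilson_interval(40, 400)
print(f"95% Wilson interval: [{low:.4f}, {high:.4f}]")
```

Unlike the simple normal interval, the result is not centred on the observed proportion 0.1; it is pulled slightly towards 0.5.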

Batch means (I) Simulation data are often autocorrelated  Packet delays in ALOHA  Waiting times in queues  … Batch means reduce (but do not completely eliminate) that effect

Batch means (II) Group measurements into fixed-size batches of consecutive data Compute mean of each batch If batches are large enough, these means will be independent  Can use the central-limit theorem, … In case of doubt, compute autocorrelation function for successive batch means
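
The batch-means procedure can be sketched as follows. The data are a hypothetical autocorrelated series generated only for illustration, and all function names are assumptions rather than CSIM calls.

```python
import math
import random
import statistics

def batch_means(data, batch_size):
    """Split data into consecutive fixed-size batches and return each batch's mean."""
    n_batches = len(data) // batch_size  # a trailing partial batch is dropped
    return [statistics.fmean(data[i * batch_size:(i + 1) * batch_size])
            for i in range(n_batches)]

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence."""
    m = statistics.fmean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den

def confidence_interval(means, z=1.96):
    """Normal-approximation CI from batch means (Student's t would be wider for few batches)."""
    centre = statistics.fmean(means)
    half_width = z * statistics.stdev(means) / math.sqrt(len(means))
    return centre - half_width, centre + half_width

# Hypothetical autocorrelated series: x[t] = 0.9 * x[t-1] + noise
rng = random.Random(42)
series, x = [], 0.0
for _ in range(20_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

means = batch_means(series, batch_size=1000)
print("lag-1 autocorrelation, raw data:   ", round(lag1_autocorrelation(series), 3))
print("lag-1 autocorrelation, batch means:", round(lag1_autocorrelation(means), 3))
print("95% CI for the mean:", confidence_interval(means))
```

Comparing the two autocorrelation values is the "in case of doubt" check: if the batch means still look correlated, the batches are too small.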

Regeneration (I) The idea  Partition simulation data into intervals such that Data measured inside the same interval might be correlated Data measured in different intervals are independent

Regeneration (II) How?  System goes to a regeneration point each time Its queues become empty All the disk drives are operational …  Criterion is system specific

Streams When you want to evaluate two different configurations of a system, it is often a good idea to use separate random number streams for arrivals and service times  Arrival times remain unchanged when we change other parameters of the system
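
A minimal sketch of the idea, using one `random.Random` instance per stream in plain Python rather than any particular simulation package. The single-server FIFO queue, the seeds and the rates are illustrative assumptions.

```python
import random

def simulate_queue(arrival_seed, service_seed, arrival_rate, service_rate, n):
    """Single-server FIFO queue with separate streams for arrivals and services.

    Returns the list of arrival times and the mean waiting time in the queue.
    """
    arrivals = random.Random(arrival_seed)   # stream used only for interarrival times
    services = random.Random(service_seed)   # stream used only for service times
    clock = 0.0
    server_free_at = 0.0
    total_wait = 0.0
    arrival_times = []
    for _ in range(n):
        clock += arrivals.expovariate(arrival_rate)
        arrival_times.append(clock)
        start = max(clock, server_free_at)   # wait if the server is still busy
        total_wait += start - clock
        server_free_at = start + services.expovariate(service_rate)
    return arrival_times, total_wait / n

# Two configurations that differ only in the speed of the server
slow_arrivals, slow_wait = simulate_queue(1, 2, 0.8, 1.0, 5000)
fast_arrivals, fast_wait = simulate_queue(1, 2, 0.8, 2.0, 5000)
print("same arrival times:", slow_arrivals == fast_arrivals)
print("mean wait, slow server:", round(slow_wait, 3))
print("mean wait, fast server:", round(fast_wait, 3))
```

Because the arrival stream is never consumed by service-time draws, both runs see exactly the same customers, so the difference in waiting times is due to the server speed alone.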

Operational Analysis

Single server (I) We can measure  T the length of the observation period  A the number of arrivals during the observation period  B the total amount of busy times during the observation period  C the number of completions during the observation period

Single server (II) We can compute  $\lambda$ = A/T, the arrival rate  X = C/T, the output rate  U = B/T, the utilization  S = B/C, the mean service time There are two ways to compute U  U = B/T = (C/T)(B/C) = XS In general $A \neq C$ and $\lambda \neq X$ over a finite observation period

Little’s law If W is the total time spent by all tasks inside the system over the observation period, then  N = W/T  R = W/C Since W/T = (C/T)(W/C) = XR, N = XR This is important

A problem An ice-cream parlor  Observed during 6 hours  Visited by 120 customers  Customers spend an average of 24 minutes inside What is the average number of customers inside the parlor?

Answer We compute X and apply Little’s Law

Answer We compute X and apply Little’s Law  X = 120/6 = 20 customers/hour  R = 24 minutes = 0.4 hours  N = XR = 8 customers

If you did not get it The 120 customers spent a total of 120×24 customer×minutes or 48 customer×hours in the parlor  48 customer×hours/6 hours = 8 customers Same as having 8 customers spending six hours each inside the parlor
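
Both ways of reaching the answer can be written down directly. This is a small sketch with illustrative variable names; the numbers are those of the ice-cream parlor problem.

```python
def littles_law_population(throughput, response_time):
    """Little's law: N = X * R (both arguments must use the same time unit)."""
    return throughput * response_time

T = 6.0                 # observation period, hours
C = 120                 # customers served during the period
R = 24 / 60             # mean time spent inside, hours

X = C / T               # throughput, customers per hour
N = littles_law_population(X, R)

# Alternative route: total customer-hours accumulated, divided by the period
W = C * R               # customer-hours spent in the parlor
N_from_W = W / T

print("X =", X, "customers/hour")
print("N =", round(N, 6), "customers")
print("N from W/T =", round(N_from_W, 6), "customers")
```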

Network of servers (I) Arrivals Departures Open network

Network of servers (II) Arrivals Departures Closed network

Operational Quantities Keep same quantities as before but add indices  0 for whole system  k for individual servers Two changes  We never care about the utilization of the whole system  We add number of visits $V_k$ of each server

Operational quantities Over the observation period, we measure  C = the number of job completions  $C_k$ = the number of tasks completed by device k We define  $X_0 = C/T$ = the system throughput  $X_k = C_k/T$ = the output rate at server k  $V_k = C_k/C$ = the visit count at server k

Important relationships $C_k = V_k C$  Since each job requires $V_k$ visits, there are $V_k$ times as many server completions as job completions $X_k = V_k X_0$  Same property applies to throughputs

System response time (I) We define  $\bar{N}$ = average number of jobs in the system  $\bar{n}_i$ = average number of jobs at device i $\bar{N} = \sum_i \bar{n}_i$

System response time (II) Applying Little’s law, we have $R = \bar{N}/X_0$ and $\bar{n}_i = R_i X_i = R_i V_i X_0$ Hence $R = \sum_i V_i R_i$
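
The step hidden behind "Hence" can be spelled out as follows. It uses only the quantities already defined on these two slides.

```latex
\begin{align*}
\bar{N} &= \sum_i \bar{n}_i
  && \text{(every job in the system is at some device)} \\
R\,X_0 &= \sum_i R_i V_i X_0
  && \text{(Little's law: } \bar{N} = R X_0,\ \bar{n}_i = R_i X_i = R_i V_i X_0\text{)} \\
R &= \sum_i V_i R_i
  && \text{(divide both sides by } X_0\text{)}
\end{align*}
```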

Note This result is trivial  The total time spent by a job in the system is the sum of the times spent at each server This includes the time spent waiting in the server queues

Problem 1 A job requires  100 ms of CPU time  9 disk accesses Each disk access takes 7 ms We want  $V_{CPU}$ and $S_{CPU}$

Answer We know that jobs get CPU first and last  $V_{CPU}$ = 10 Then  $S_{CPU}$ = 100/10 = 10 ms

Bottleneck analysis (I) A system has one CPU and one disk drive It processes transactions such that  $V_{CPU}$ = 12 and $S_{CPU}$ = 5 ms  $V_{disk}$ = 11 and $S_{disk}$ = 8 ms What is the maximum system throughput?

Bottleneck analysis (II) We compute first the maximum device throughputs Maximum $X_{CPU}$ = 1/0.005 = 200 requests/s Maximum $X_{disk}$ = 1/0.008 = 125 requests/s Since $X_i = V_i X_0$  Maximum throughput compatible with CPU workload is 200/12 = 16.7 transactions/s  Maximum throughput compatible with disk workload is 125/11 = 11.4 transactions/s

Bottleneck analysis (III) The disk is thus the bottleneck  It has the highest $V_i S_i$ product Identifying feature of any bottleneck device Increasing the system throughput might require  Sharing disk requests with a second disk  Increasing the efficiency of the system I/O buffer
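
The bottleneck computation can be sketched in a few lines. The dictionary layout and function names are illustrative assumptions; the numbers are those of the CPU and disk example.

```python
def service_demands(devices):
    """Map each device to its V_i * S_i product (seconds of service per transaction)."""
    return {name: visits * service_time
            for name, (visits, service_time) in devices.items()}

def bottleneck(devices):
    """Return the device with the largest V_i * S_i and the throughput bound 1 / (V_i * S_i)."""
    demands = service_demands(devices)
    name = max(demands, key=demands.get)
    return name, 1.0 / demands[name]

# device -> (visit count V, mean service time S in seconds)
devices = {"CPU": (12, 0.005), "disk": (11, 0.008)}

name, max_throughput = bottleneck(devices)
print("bottleneck:", name)
print("maximum throughput:", round(max_throughput, 1), "transactions/s")
```

Ranking devices by their V_i S_i product gives the same answer as comparing the per-device throughput limits, because the limit imposed by device i is 1/(V_i S_i).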

Problem 2 In the system of Problem 1 (100 ms of CPU time and 9 disk accesses of 7 ms each per job), which device is the bottleneck? What would be the throughput of the system if the bottleneck utilization was 80%?

Answer We compare  $V_{CPU} S_{CPU}$  $V_{disk} S_{disk}$

Answer We compare  $V_{CPU} S_{CPU}$ = 100 ms  $V_{disk} S_{disk}$ = 9×7 = 63 ms The CPU is the bottleneck

Answer If the bottleneck was operating at 100% utilization,  It could process one job each $V_{CPU} S_{CPU}$ time units  Or $1/(V_{CPU} S_{CPU})$ job per time unit At $U_{CPU}$ utilization,  It will process $U_{CPU}/(V_{CPU} S_{CPU})$ job per time unit

Answer $X_0 = U_{CPU}/(V_{CPU} S_{CPU})$ = 0.80/(0.10 s) = 8 jobs/second

Systems with terminals M Terminals Whole system

Interactive response time formula We have  M terminals  Think time Z between the completion of a job and the submission of the next job Applying Little’s law to the whole system $M = (R + Z) X_0$ then $R = M/X_0 - Z$ Very Important

Problem 3 We have  M = 50 users  Z = 20 s  $X_0$ = 2 transactions/s What is the system response time?

Answer We apply $R = M/X_0 - Z$

Answer We apply $R = M/X_0 - Z$ and obtain R = 50/2 - 20 = 5 seconds

Problem 4 A system  Processes 5 transactions/second  Has 60 users  Achieves a response time of 4 seconds What is the think time?

Answer We apply $R = M/X_0 - Z$,  $Z = M/X_0 - R$

Answer We apply $R = M/X_0 - Z$,  $Z = M/X_0 - R$ = 60/5 - 4 = 8 seconds

Problem 5 We have  M = 50 users  Z = 20 s  R = 4 s What is the system throughput?

Answer From $R = M/X_0 - Z$, we have $X_0 = M/(R + Z)$ Hence $X_0 = 50/(4 + 20) \approx 2.08$ tasks/s

Problem 6 A system  Can process up to 4 transactions/second  Has 60 users  User think time is 12 seconds Can the system achieve a response time of 2 seconds?

Answer Applying $R = M/X_0 - Z$, we compute a lower bound for the response time  $R_{min} = M/X_{0,max} - Z$

Answer Applying $R = M/X_0 - Z$, we compute a lower bound for the response time  $R_{min} = M/X_{0,max} - Z$ = 60/4 - 12 = 3 seconds Answer is no

Problem 7 Compute the response time of a system knowing the following parameters  M = 50 users  Z = 15 s  $V_{CPU} S_{CPU}$ = 200 ms  $U_{CPU}$ = 50%

Answer Since $X_k = U_k/S_k$ and $X_k = V_k X_0$, $X_0 = U_k/(V_k S_k)$ The response time is then given by $R = M/X_0 - Z$

Answer Let us compute first the throughput $X_0$  Applying $X_0 = U_k/(V_k S_k)$ $X_0$ = 0.50/0.200 = 2.5 interactions/s The response time is then $R = M/X_0 - Z$ = 50/2.5 - 15 = 5 s
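
The interactive response time formula and its rearrangements fit in three small helpers. This is a sketch with illustrative function names, checked against the numbers of Problems 3, 4 and 7.

```python
def response_time(M, X0, Z):
    """R = M / X0 - Z for M users, throughput X0 and think time Z."""
    return M / X0 - Z

def think_time(M, X0, R):
    """Z = M / X0 - R, the same formula solved for the think time."""
    return M / X0 - R

def throughput(M, R, Z):
    """X0 = M / (R + Z), the same formula solved for the throughput."""
    return M / (R + Z)

def throughput_from_utilization(U, demand):
    """X0 = U_k / (V_k * S_k), where demand is the V_k * S_k product in seconds."""
    return U / demand

print("Problem 3: R =", response_time(M=50, X0=2, Z=20), "s")
print("Problem 4: Z =", think_time(M=60, X0=5, R=4), "s")

x0 = throughput_from_utilization(U=0.50, demand=0.200)
print("Problem 7: X0 =", x0, "interactions/s, R =", response_time(M=50, X0=x0, Z=15), "s")
```

All three helpers are the same identity, M = (R + Z) X0, solved for a different unknown, so any two of them can be used to cross-check the third.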

Simulation Case Studies

A simple reminder If interarrival times are  Independent identically distributed (i.i.d.)  According to an exponential law then the number of arrivals during a fixed interval is distributed according to a Poisson law

Explanation (I) Assume that  The probability of one arrival during a small interval $\Delta t$ is $\lambda \Delta t$  The probability of two arrivals during the same small time interval is negligible [Figure: time line divided into slots of length $\Delta t$]

Explanation (II) The probability of having exactly k arrivals during n slots is $\binom{n}{k} (\lambda \Delta t)^k (1 - \lambda \Delta t)^{n-k}$ What would happen if the number of time intervals goes to infinity while their total duration $T = n \Delta t$ remains constant?

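Explanation (III) We rewrite the previous expression as $\frac{n!}{(n-k)!\,n^k} \cdot \frac{(\lambda T)^k}{k!} \cdot \left(1 - \frac{\lambda T}{n}\right)^{n} \cdot \left(1 - \frac{\lambda T}{n}\right)^{-k}$ and compute separately the limits of its four factors

Explanation (IV) As n goes to infinity, $\frac{n!}{(n-k)!\,n^k} \to 1$, $\frac{(\lambda T)^k}{k!}$ does not depend on n, $\left(1 - \frac{\lambda T}{n}\right)^{n} \to e^{-\lambda T}$ and $\left(1 - \frac{\lambda T}{n}\right)^{-k} \to 1$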

Explanation (V) We obtain the Poisson distribution $P(k) = \frac{(\lambda T)^k}{k!} e^{-\lambda T}$ The probability that there are no arrivals in the same time interval T (or in any time interval T ) is $P(0) = e^{-\lambda T}$

Explanation (VI) This last expression is the probability that the time interval between two consecutive arrivals is greater than T The probability that the time interval between two consecutive arrivals is less than or equal to T is $1 - e^{-\lambda T}$ which is the cdf of the exponential distribution

A final observation Use the Poisson distribution to generate number of arrivals during a time interval Use the exponential distribution to generate interarrival times
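
The link between the two distributions can be checked empirically. The sketch below generates exponential interarrival times and counts how many arrivals fall in an interval of length T; the rate, the interval length and the number of repetitions are arbitrary illustrative values.

```python
import random
import statistics

def count_arrivals(rng, rate, T):
    """Count arrivals in [0, T) by summing exponential interarrival times."""
    t = rng.expovariate(rate)
    count = 0
    while t < T:
        count += 1
        t += rng.expovariate(rate)
    return count

rng = random.Random(2024)
rate, T = 2.0, 3.0            # hypothetical arrival rate and interval length
counts = [count_arrivals(rng, rate, T) for _ in range(20_000)]

# A Poisson law with parameter rate * T has mean and variance both equal to rate * T
print("expected mean and variance:", rate * T)
print("observed mean:    ", round(statistics.fmean(counts), 3))
print("observed variance:", round(statistics.variance(counts), 3))
```

The observed mean and variance should both be close to rate × T, which is what a Poisson law with that parameter predicts.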

Linear Regression

Most important point Compute a regression line Compute regression coefficient

Example

Linear Regression We have  One independent variable  One dependent variable We must find $Y = \alpha + \beta X$ minimizing the sum of squares of errors $\sum_i (y_i - \alpha - \beta x_i)^2$
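
A minimal sketch of the least-squares fit, assuming the usual closed-form solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄. The names `alpha` and `beta` for the intercept and slope, and the data points, are illustrative assumptions, not the values of the example in the slides.

```python
def fit_line(xs, ys):
    """Least-squares line y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                  # slope
    alpha = y_mean - beta * x_mean    # intercept: the line passes through the means
    return alpha, beta

def sum_of_squared_errors(xs, ys, alpha, beta):
    """The quantity that the fit minimizes."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
alpha, beta = fit_line(xs, ys)
print("alpha =", alpha, "beta =", beta)
print("SSE =", sum_of_squared_errors(xs, ys, alpha, beta))
```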

Formulas

Calculations (I)

Calculations (II)

Outcome

More notations

More notations (II) Solution can be rewritten

Coefficient of correlation r = 1 would indicate a perfect fit r = 0 would indicate no linear dependency
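
The two extreme cases can be reproduced numerically. This sketch assumes the standard sample (Pearson) coefficient of correlation; the data sets are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Sample coefficient of correlation r between xs and ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_a_line = [3, 5, 7, 9, 11]      # exactly y = 1 + 2x
no_trend = [2, 1, 3, 1, 2]        # symmetric pattern with no linear trend

print("r for points on a line:", round(correlation(xs, on_a_line), 6))
print("r for no linear trend: ", round(correlation(xs, no_trend), 6))
```

A value of r = -1 would also indicate a perfect fit, with a line of negative slope.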

Calculations