Performance Engineering Prof. Jerry Breecher Looking at Random Data & A Simulation Example.


Goals:
1. Look at the nature of random data. What happens as random data is used in multiple operations?
2. Look at how network arrivals really work: are arrivals random or do they follow some other pattern?
3. Use our simulation techniques to study these patterns (so this is really an example of simulation usage).
4. Determine the difference in behavior as a result of network arrival patterns.

Random Data
1. Suppose we have a random number generator, and suppose we run a program using that data multiple times.
2. Do the results of those multiple program executions converge or diverge?
3. There is no simple intuitive answer to this question, so let's try it.

Random Data
1. Let's take a very simple piece of code:
    if ( random() >= 0.5 )
        HeadsGreaterThanTails++;
    else
        HeadsGreaterThanTails--;
2. When we run the program, we collect the value of the variable every 100 million iterations, for a total of 1 billion iterations.
3. Here's a sample run.
Iterations        Proc 0
  100,000,000     -10299
  200,000,000      -4245
  300,000,000       5141
  400,000,000       3197
  500,000,000      -1313
  600,000,000     -25941
  700,000,000     -24093
  800,000,000     -24661
  900,000,000     -27123
1,000,000,000     -23997
After 400 million iterations, there were 3197 more "heads" than "tails".
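The experiment above can be sketched as a small C routine. This is only an illustration, not the code behind the sample run: `RunCoinFlips`, `SeedRandom`, and `NextRandom` are hypothetical names, and the xorshift generator is an assumed stand-in for whatever `random()` the slide used.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the slide's random(): xorshift64 mapped to [0, 1). */
static uint64_t RngState = 88172645463325252ULL;

static void SeedRandom( uint64_t Seed )
{
    RngState = Seed ? Seed : 1;      /* xorshift state must be nonzero */
}

static double NextRandom( void )
{
    RngState ^= RngState << 13;
    RngState ^= RngState >> 7;
    RngState ^= RngState << 17;
    return (double)( RngState >> 11 ) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Flip Flips coins and return (heads - tails) at the end.
   If Snapshots is non-NULL, also record the running value every Interval flips. */
long RunCoinFlips( uint64_t Seed, long Flips, long Interval, long *Snapshots )
{
    long HeadsGreaterThanTails = 0;
    long i, Slot = 0;

    SeedRandom( Seed );
    for ( i = 1; i <= Flips; i++ ) {
        if ( NextRandom() >= 0.5 )
            HeadsGreaterThanTails++;
        else
            HeadsGreaterThanTails--;
        if ( Snapshots != NULL && Interval > 0 && i % Interval == 0 )
            Snapshots[Slot++] = HeadsGreaterThanTails;
    }
    return HeadsGreaterThanTails;
}
```

Calling `RunCoinFlips( Seed, 1000000000L, 100000000L, Snapshots )` with a 10-element array would mirror the sampling schedule of the run above (on platforms where `long` is 64 bits); different seeds play the role of different processes.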

Random Data
1. Now let's do that same thing for 8 processes.
2. What do you think will happen to the numbers?
   - Will some process always have more heads than tails?
   - Will the difference between results for processes depend on how many iterations have been done?
3. Here's the result for 8 processes:
Iterations Proc 0 Proc 1 Proc 2 Proc 3 Proc 4 Proc 5 Proc 6 Proc 7
100,000,000-10299-9319-106367438633-44218123-1367
200,000,000-4245-102273657-2305924885-2665525865-5871
300,000,0005141-6819255-2017514469-3338927077-7299
400,000,0003197-8155-5379-663327387-50509245312339
500,000,000-1313-10547-153-1467929335-5196323097-3705
600,000,000-25941-29847-26371502732857-4950527089-1659
700,000,000-24093-26331-434011315324471-268994561-47
800,000,000-24661-35315-312334120425-1186113837-4217
900,000,000-27123-33049-44461-11769-3283-1247715865-2107
1,000,000,000-23997-15483-4453522889-8447-13671157436023

Random Data And here’s the graph for those 8 processes – note there’s been a constant amount added to each value to get all the outputs positive.

Random Data As you can see in the last graph, the statistics are terrible – it’s hard to determine the pattern for multiple runs. So the program was run 10,000 times. And the minimum and maximum count was taken at each time interval for those 10,000 runs.
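A rough way to see why the spread keeps widening with more iterations: each iteration adds +1 or -1, so, assuming independent and fair flips (an idealization of the generator, not something the slides state), the difference $D_N$ after $N$ iterations behaves like a simple random walk.

```latex
D_N = \sum_{i=1}^{N} X_i, \qquad X_i \in \{+1, -1\}, \quad P(X_i = +1) = \tfrac{1}{2}

\mathrm{E}[D_N] = 0, \qquad \mathrm{Var}[D_N] = N, \qquad \sigma_{D_N} = \sqrt{N}

N = 10^{8}: \ \sigma \approx 10{,}000 \qquad\qquad N = 10^{9}: \ \sigma \approx 31{,}623
```

On that reading, the absolute difference tends to grow like $\sqrt{N}$ while the relative difference $D_N / N$ shrinks like $1/\sqrt{N}$. Values from the sample run such as -10299 at 100 million iterations and -23997 at 1 billion are about one standard deviation from zero.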

Random Data
But what happens if the processes doing random events interact with each other? This is the case if the programs are all accessing the same disk: we randomly choose which block in a large file is being written to, but each process must compete for the file lock and for disk access.
Here's the behavior of 10 disk-writing processes for 10,000 seconds. The numbers are the cumulative disk writes by each process up to that time.
 Secs  Proc 0  Proc 1  Proc 2  Proc 3  Proc 4  Proc 5  Proc 6  Proc 7  Proc 8  Proc 9
 1000   21660   21650   21810   21800   21790   21720   21850   21740   21640   21730
 2000   43000   42960   43080   43120   43220   42960   43190   43110   42900   43080
 3000   64790   64650   64850   64930   65060   64680   64900   64860   64770   64940
 4000   86610   86450   86620   86680   86750   86530   86640   86660   86560   86690
 5000  108450  108280  108370  108450  108520  108410  108480  108380  108400  108580
 6000  130010  129860  129990  129950  129980  130050  130090  130010  129910  130080
 7000  151730  151600  151710  151730          151770  151750  151820  151750  151800
 8000  173340          173400  173640  173480  173400  173520  173660  173470  173500
 9000  194950  195050  195010  195300  195090  195000  195230  195440  195130  195150
10000  216760  216880  216780  217140  216860  216740  216990  217240  216880  216960

Random Data The accesses are clearly very close to each other

Random Data Comparing the 10 processes. This is the spread (difference) of the maximum less the minimum accesses for the process.
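The spread described here is simple to compute. The helper below is a minimal sketch with a hypothetical name (`SpreadOfCounts`), not the code used to produce the graph.

```c
#include <stddef.h>

/* Spread = maximum count minus minimum count across processes at one time step. */
long SpreadOfCounts( const long *Counts, size_t NumberOfProcesses )
{
    long   Min, Max;
    size_t i;

    if ( NumberOfProcesses == 0 )
        return 0;
    Min = Max = Counts[0];
    for ( i = 1; i < NumberOfProcesses; i++ ) {
        if ( Counts[i] < Min ) Min = Counts[i];
        if ( Counts[i] > Max ) Max = Counts[i];
    }
    return Max - Min;
}
```

Applied to the counts reported at 1000 seconds (minimum 21640, maximum 21850) this gives 210, about 1% of the count. At 10,000 seconds (minimum 216740, maximum 217240) it gives 500, about 0.2% of the count.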

Random Data Comparing the 10 processes. Here’s how their relative performance varies over time. Note that no one process is always the minimum or the maximum performer.

Another Numerical Example I have two virtual cats, who share a single can of food at each meal. My cats are very finicky and get angry if their portions are unequal. I am finicky too, and I don't like dirtying dishes when I divvy it up. To split the food, then, I upend the open can of food onto a flat plate, then carefully lift the can off, leaving a perfectly formed virtual cylinder of food. Then I use the vanishingly small circular edge of the can to carefully cut the food into two exactly equal portions, one of which is shaped like a crescent moon, the other a cat's eye, or mandorla.

Another Numerical Example
(Figure: the two overlapping circles, with points labeled X, A, and B.)

//////////////////////////////////////////////////////////////////////////
// We're trying to solve the following problem.
// Given two circles, how close should the centers of the circles be such
// that the area subtended by the arcs of the two circles is exactly one
// half the total area of the circle.
//
// See example 2.3.8 in Leemis & Park.
// We use the book's definition for Uniform - see 2.3.3
// Here's how this works.  Try a number of different distances between
// the two circle centers.  Then for the ones that are most successful,
// zoom in to do them in more detail.
//////////////////////////////////////////////////////////////////////////
#include <stdio.h>      // printf
#include <math.h>       // sqrt

#define PI     3.1415927
#define TRUE   1
#define FALSE  0

// Prototypes (the two random-number helpers are not shown on these slides)
double  GetRandomNumber( void );
void    InitializeRandomNumber( );
double  ModelTwoCircles( double, int );

double  Uniform( double min, double max)  {
    return( min + (max - min)*GetRandomNumber() );
}

int main( int argc, char *argv[] )  {
    double  Distance, Result = 0;
    double  FirstSample = 0.1, LastSample = 1.9;
    double  Increment, NewFirstSample;
    double  BestDistance;
    int     NumberOfSamples = 5000;
    int     AnswerIsFound = FALSE;

    InitializeRandomNumber();
    while ( !AnswerIsFound )  {
        printf( "\nNext Iteration starts at %f\n", FirstSample );
        Increment = (LastSample - FirstSample)/10;
        NumberOfSamples = 2 * NumberOfSamples;
        for ( Distance = FirstSample; Distance <= LastSample; Distance += Increment ){
            Result = ModelTwoCircles( Distance, NumberOfSamples );
            if ( Result - 0.5000 > 0 )
                NewFirstSample = Distance;
            if ( (0.5 - Result) < 0.0001 && (Result - 0.5) < 0.0001 )  {
                AnswerIsFound = TRUE;
                BestDistance = Distance;
            }
            printf( "Distance = %8.6f, Fraction = %8.6f\n", Distance, Result );
        }
        FirstSample = NewFirstSample - 2 * Increment;
        LastSample  = FirstSample + 4 * Increment;
    }
    printf( "\nThe best Distance is at %f using %d samples\n",
            BestDistance, NumberOfSamples );
}

double ModelTwoCircles( double Distance, int NumberOfSamples )  {
    double  HitsInOneCircle = 0, HitsInTwoCircles = 0;
    double  x, y, SecondDistance;
    int     Samples;

    for ( Samples = 0; Samples < NumberOfSamples; Samples++ )  {
        do {
            x = Uniform( -1, 1 );
            y = Uniform( -1, 1 );
        } while ( (x * x) + (y * y) >= 1 );     // Loop until value in circle
        HitsInOneCircle++;
        SecondDistance = sqrt( ( x - Distance ) * (x - Distance ) + (y * y) );
        if ( SecondDistance < 1.0 )  {
            HitsInTwoCircles++;
            // printf( "Samples: Second Distance = %8.6f\n", SecondDistance );
        }
    }       // End of for
    return( HitsInTwoCircles / HitsInOneCircle );
}

Network Arrivals
1. In our queueing analysis, we've assumed random arrivals (Poisson distribution, with exponentially distributed inter-arrival times).
2. This leads to our analysis of M/M/1 queues with
   - Utilization U = (mean) Service Time / (mean) Inter-arrival Time, and with
   - Queue Length = U / ( 1 - U ).
3. We generated uniformly distributed random numbers and, based on those, were able to derive the exponential arrival times and Poisson distributions.
But is this how networks behave?
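A quick worked example of the two M/M/1 formulas, using illustrative numbers that are not from the slides. Here $S$ is the mean service time, $A$ the mean inter-arrival time, and $N$ the mean queue length; all three symbols are assumptions introduced for this example.

```latex
S = 0.8 \text{ ms}, \quad A = 1.0 \text{ ms}: \qquad
U = \frac{S}{A} = 0.8, \qquad
N = \frac{U}{1 - U} = \frac{0.8}{0.2} = 4

U = 0.9 \ \Rightarrow\ N = 9, \qquad U = 0.95 \ \Rightarrow\ N = 19
```

The queue length grows very quickly as the utilization approaches 1, which is why small changes in the arrival pattern can matter so much.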

Network Arrivals
"On the Self-Similar Nature of Ethernet Traffic", Leland, Taqqu, Willinger, Wilson. IEEE/ACM ToN, Vol. 2, pp 1-15, 1994
1. Establish the self-similar nature of Ethernet traffic.
2. Illustrate the differences between self-similar and standard models.
3. Show serious implications of self-similar traffic for design, control and performance analysis of packet-based communication systems.
Self-Similar Arrivals: is this how networks really behave?

What Did Leland et al. Measure? Millions of packets from many workstations, as recorded on Bellcore internal networks.

What Did Leland et al. Measure?
Significance of self-similarity
- Nature of traffic generated by individual Ethernet users: the aggregate traffic study provides insights into traffic generated by individual users.
- Nature of congestion produced by self-similar models differs drastically from that predicted by standard formal models. We will show this by the simulation we perform here.
Why is Ethernet traffic self-similar?
- Plausible physical explanation of self-similarity in Ethernet traffic: people don't generate traffic randomly. They come to work at the same time, get tired at the same time, etc.
- Mathematical result: superposition of many ON/OFF sources whose ON-periods and OFF-periods have high variability or infinite variance produces aggregate network traffic that is self-similar or long-range dependent. (Infinite variance here means that there are some samples with a very long inter-arrival time; lunch hour is a very long time!)

What Did Leland et al. Measure? So are these bursts "random"? Can you tell by looking at the data? The answer is that the data is bunched together, not spread uniformly, and to be self-similar, the "bunches" themselves form "super-bunches".

Where does “Self-Similar” Data Occur? It occurs throughout nature. Also called Pareto Distribution, Bradford, Zipf, and various other names. Distribution of books checked out of a library. Distribution of lengths of rivers in the world. It’s NOT the same as an exponential distribution! (But it can look fairly close.) Fractals are an example of self-similarity.

Exponential and Self-Similar Data
Exponential Cumulative Function: $F(x) = 1 - e^{-ax}$
Exponential Probability Density Function (PDF): $f(x) = a e^{-ax}$
Pareto Cumulative Function: $F(x) = 1 - \left( \frac{X_0}{X_0 + x} \right)^{b}$
Pareto Probability Density Function (PDF): $f(x) = \frac{b X_0^{b}}{(X_0 + x)^{b+1}}$
In these equations:
- a = 1 (the exponential falls to 1/e when x = 1). The mean of these values is 1. It turns out the variance is also 1; the exponential is special that way.
- $X_0 = 2$. Then b was adjusted so that it gave a mean of 1.
Arrivals for both distributions therefore have the same mean value.
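The slide does not say which value of b resulted. If the mean is matched analytically (an assumption about how b was adjusted), the Pareto cumulative function above gives:

```latex
\mathrm{E}[X] = \int_0^{\infty} \bigl( 1 - F(x) \bigr)\, dx
             = \int_0^{\infty} \left( \frac{X_0}{X_0 + x} \right)^{b} dx
             = \frac{X_0}{b - 1} \qquad (b > 1)

X_0 = 2, \quad \mathrm{E}[X] = 1 \ \Rightarrow\ b = 3

\mathrm{Var}[X] = \frac{b\, X_0^{2}}{(b - 1)^{2} (b - 2)} = \frac{3 \cdot 4}{4 \cdot 1} = 3 \qquad (b > 2)
```

With those values the Pareto arrivals have the same mean as the exponential arrivals but three times the variance. As b falls toward 2 the variance grows without bound.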

Exponential and Self-Similar Data
(Graph: Exponential PDF in black, Pareto PDF in purple.)
Note that the Pareto data has a higher value at the limits. This is what leads to it being self-similar and to the data having a large variance.

Simulation Example
Simulation
So I wrote a simulator. There are two parts I especially want to show you:
1. The "guts" of the simulator: how events are taken off a queue and are processed; that processing generates new events.
2. How data is generated: starting with a random number in the range 0 to 1, how do we get an exponential distribution?
Here's the code I used for the simulation. It's not beautiful, but the price is right.
http://www.cs.wpi.edu/~jb/CS533/Lectures/ArrivalSimulation.c

Simulation
SCHEMATIC OF EVENT DRIVEN SIMULATION OF A NETWORK
1. Initialize.
2. Determine Next Event: take it from the Event Queue and set current time to the time of this event.
3. Is it arrival or completion?
   - Arrival (packet approaches network): put packet on network; if queue WAS empty, determine when the packet will finish and generate an event for "Service Completed". Determine future time for next packet arriving and generate an event for "Packet arrives at Q".
   - Completion (network service completed): take packet off queue; if queue still has a packet, determine when next packet will finish and generate an event for "Service Completed".
4. Update Statistics, then go back to step 2.

The Guts of the Simulation
while( Iterations < RequestedArrivals ) {
    RemoveEvent( &CurrentSimulationTime, &EventType );
    if ( EventType == ARRIVAL ) {
        if ( ArrivalDiscipline == EXPONENTIAL )
            NextEventTimeInterval = GetExponentialArrival( ExponentialArrivalValue );
        if ( ArrivalDiscipline == PARETO )
            NextEventTimeInterval = GetParetoArrival( ParetoArrivalValue );
        StoreStats( NextEventTimeInterval );
        AddEvent( CurrentSimulationTime + NextEventTimeInterval, ARRIVAL );
        if ( QueueLength == 0 ) {
            // Schedule completion event for this request
            NextEventTimeInterval = GetExponentialArrival( ServiceRate );
            AddEvent( CurrentSimulationTime + NextEventTimeInterval, COMPLETION );
        }
        // Do counting of state for stats purposes
        QueueLength++;
    }   // End of EventType == ARRIVAL
    if ( EventType == COMPLETION ) {
        QueueLength--;
        if ( QueueLength > 0 ) {
            // Something else needs service
            NextEventTimeInterval = GetExponentialArrival( ServiceRate );
            AddEvent( CurrentSimulationTime + NextEventTimeInterval, COMPLETION );
        }
    }   // End of EventType == COMPLETION
}   // End of while iterations
// Print out the statistics:
PrintStats();

Data Generation
Here's the question we want to answer: given a distribution, how do we find the value of x that corresponds to a particular random number?
For instance, take the Exponential distribution, with PDF $f(x) = a e^{-ax}$ and cumulative function $F(x) = 1 - e^{-ax}$.
We generate random numbers R in the range 0 to 1. Strictly speaking, these are values of the cumulative function F(x), not of the PDF. So what values of x will give us this range of F(x)? For x = 0, F(x) = 0; as x goes to infinity, F(x) goes to 1.
This inverse mapping is most easily accomplished by taking the inverse function. Solving $R = 1 - e^{-ax}$ for x gives
x = -ln( 1 - R ) / a
and, because 1 - R is just as uniformly distributed as R, x = -ln( rand() ) / a works equally well.
Here's the essence of this code:
double GetExponentialArrival( double Argument )  {
    return( -log( 1.0 - GetRandomNumber() )/ Argument );
}   // End of GetExponentialArrival

Data Generation
So having an inverse function is very nice. It's one reason that using the exponential function is so handy, and so universal.
But for the Pareto distribution, with PDF $f(x) = \frac{b X_0^{b}}{(X_0 + x)^{b+1}}$, the inverse function is less obvious. I solved this by doing a search on the cumulative function F(x). The binary search algorithm goes like this:
1. Pick a random number in the range 0 to 1; R = random();
2. Choose y and z such that one of F(y) and F(z) is larger than R and one is smaller than R.
3. Calculate F( (y + z)/2 ), for a value halfway between y and z.
4. Replace y or z with that midpoint so that F(y) and F(z) again straddle R.
5. Loop to Step 3 until the value of ( R - F(y) ) is arbitrarily small.
All this is messy and compute intensive, but that's the way it is when there's no inverse function at hand.

Simulation Results Results look very similar to the analytical functions.

Simulation Results The Q lengths are larger for Pareto Data. Does this make sense?

Graphs The Utilization is larger for Pareto Data. Does this make sense?

Marriage & Divorce Simulation
The goal of this exercise is to show the simulation of a "society". In the larger context, it's an example of how students might perform a simulation. Given a body of data, how do we arrange that data in order to represent how the society is behaving? This is essentially a "model" using the data.
There are three ways we go about putting numerical values on this model:
1. Given a series of equations, can we simply solve the equations?
2. If the equations don't have a closed form solution, can we solve them recursively? There are no statistics involved here; all we do is solve each equation over and over again and hope that it converges. This method gives us no details about the population since we're simply solving equations.
3. We can try for a "real" simulation. In this case, we use the probabilities and a random generator to try to simulate good years and bad years. This allows us to answer much more complex questions. We could now track characteristics for each individual in our society. We could, possibly, see how long a person in our society stays married, for instance.

Marriage & Divorce Simulation
There's lots of stuff on the web, confusing and maybe contradictory. All data is for the US.
- In 2007, there were 2,200,000 marriages. This represents a rate of 7.5 per 1000 total population. Note this is 2.2M / 296M, or roughly 7.5 per 1000. (Total US population is higher but some states don't report.) Another metric which may be saying the same thing is that there are 39.9 marriages per 1000 single women. We're going to use the first number here.
- In 2007, there were 856,000 divorces. This is 3.6 per 1000 total population.
- Interesting numbers, but not used here: 41% of 1st marriages end in divorce. 60% of 2nd marriages end in divorce. 74% of 3rd marriages end in divorce. The average remarriage occurs 3.3 years after a divorce.
- In 2007 there were 2,400,000 deaths, representing a rate of 8.2 per 1000. Details of this are on the next page.
- 60% of all marriages last until 1 partner dies.
- Birth rate is 13.8 per 1,000 population.
- Recent statistics say that 51% of the adult population is married. This is important because we don't use it directly as one of our equations; we use it to test if our model gives approximately this answer.

Marriage & Divorce Simulation
In 2007 there were 2,400,000 deaths, representing a rate of 8.2 per thousand. Details on this mortality data are for men and women 65+:
- Death rate for a married man is defined as 1.00.
- Death rate for a widowed man is 1.06 times that of a married man.
- Death rate for a divorced or separated man is 1.14 times that of a married man.
- Death rate for a never-married man is 1.05 times that of a married man.
- Death rate for a married woman is defined as 1.00.
- Death rate for a widowed woman is 1.15 times that of a married woman.
- Death rate for a divorced or separated woman is 1.26 times that of a married woman.
- Death rate for a never-married woman is 1.18 times that of a married woman.
This information is from "US Mortality by Economic, Demographic, and Social Characteristics: The National Longitudinal Mortality Study", Sorlie, Backlund, and Keller, 1995.
We use rates above and below the national average of 8.2 per 1000 to take into account single and married rates.
DeathMarriedRate = 7.6 per 1000
DeathSingleRate = 8.7 per 1000

Marriage & Divorce Simulation
(State diagram: states Zombie, Single, and Married. Transitions: Birth Rate, with Reincarnation = 100%; Marriage Rate; Divorce Rate; Widowed; Death while Single; Death while Married.)

Marriage & Divorce Simulation
Leaving Zombie: $\Delta Z = -R_{\text{birth}} (S + M)$
Entering Zombie: $\Delta Z = +R_{\text{death-single}} S + R_{\text{death-married}} M$
Leaving Single: $\Delta S = -2 R_{\text{marriage}} (S + M) - R_{\text{death-single}} S$
Entering Single: $\Delta S = +R_{\text{birth}} (S + M) + 2 R_{\text{divorce}} (S + M) + R_{\text{death-married}} M$
Leaving Married: $\Delta M = -2 R_{\text{divorce}} (S + M) - R_{\text{death-married}} M$
Entering Married: $\Delta M = +2 R_{\text{marriage}} (S + M)$
In Steady State, leaving equals entering:
$R_{\text{death-single}} S + R_{\text{death-married}} M - R_{\text{birth}} (S + M) = 0$
$R_{\text{birth}} (S + M) + 2 R_{\text{divorce}} (S + M) + R_{\text{death-married}} M - 2 R_{\text{marriage}} (S + M) - R_{\text{death-single}} S = 0$
$2 R_{\text{marriage}} (S + M) - 2 R_{\text{divorce}} (S + M) - R_{\text{death-married}} M = 0$

Marriage & Divorce Simulation
In Steady State, leaving equals entering:
$R_{\text{death-single}} S + R_{\text{death-married}} M - R_{\text{birth}} (S + M) = 0$
$R_{\text{birth}} (S + M) + 2 R_{\text{divorce}} (S + M) + R_{\text{death-married}} M - 2 R_{\text{marriage}} (S + M) - R_{\text{death-single}} S = 0$
$2 R_{\text{marriage}} (S + M) - 2 R_{\text{divorce}} (S + M) - R_{\text{death-married}} M = 0$
Rearranging these equations gives:
$-R_{\text{birth}} (S + M) + R_{\text{death-single}} S + R_{\text{death-married}} M = 0$
$R_{\text{birth}} (S + M) - 2 R_{\text{marriage}} (S + M) + 2 R_{\text{divorce}} (S + M) - R_{\text{death-single}} S + R_{\text{death-married}} M = 0$
$2 R_{\text{marriage}} (S + M) - 2 R_{\text{divorce}} (S + M) - R_{\text{death-married}} M = 0$
Maybe there's a solution, but they seem redundant to me.
Here are links to the code and executables for this simulation:
MarriageAndDivorceSimulation1.c // Recursively solves the equations
MarriageAndDivorceSimulation1.exe
MarriageAndDivorceSimulation2.c // Does a statistical simulation
MarriageAndDivorceSimulation2.exe
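To make the "solve the equations over and over" idea concrete, here is a minimal sketch that applies the entering and leaving equations once per simulated year. It is not the linked MarriageAndDivorceSimulation1.c. The function name and the yearly time step are assumptions, the rates are the per-1000 figures quoted in the deck, and the Zombie pool is left out because it does not feed back into S or M in the equations as written.

```c
/* Rates per person per year, taken from the deck (per-1000 figures / 1000). */
#define RATE_BIRTH          0.0138
#define RATE_MARRIAGE       0.0075
#define RATE_DIVORCE        0.0036
#define RATE_DEATH_SINGLE   0.0087
#define RATE_DEATH_MARRIED  0.0076

/* Apply the entering/leaving equations once per simulated year and return
   the fraction of the living population (S + M) that ends up married. */
double MarriedFractionAfter( int Years, double Single, double Married )
{
    int Year;

    for ( Year = 0; Year < Years; Year++ ) {
        double Living       = Single + Married;
        double DeltaSingle  = RATE_BIRTH * Living
                            + 2 * RATE_DIVORCE * Living
                            + RATE_DEATH_MARRIED * Married
                            - 2 * RATE_MARRIAGE * Living
                            - RATE_DEATH_SINGLE * Single;
        double DeltaMarried = 2 * RATE_MARRIAGE * Living
                            - 2 * RATE_DIVORCE * Living
                            - RATE_DEATH_MARRIED * Married;
        Single  += DeltaSingle;
        Married += DeltaMarried;
    }
    return Married / ( Single + Married );
}
```

Under these assumptions the total population keeps growing, so S and M never reach a steady state themselves, but the married fraction settles near 0.47 whether the run starts with everyone single or everyone married. That is in the neighborhood of the 51% figure, although that statistic is for adults rather than the whole population.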

WRAPUP This section has shown the result of a simulation. It's gone through the coding, the data generation, and the interpretation of results. If network arrivals are Self-Similar, what about all kinds of other data generated by computers? What about requests arriving at a disk? What about processes arriving at a ready queue? Is there any computer data that REALLY is random, or is it all self-similar?
