Information Analysis

Gaussian or Normal Distribution
$\mu$ = mean, estimated by $\bar{x}$
$\bar{x}$ = observed sample mean, $\bar{x} = \frac{\sum x}{n}$
$\sigma$ = standard deviation, estimated by $s$
$n$ = sample size
$s$ = observed standard deviation
The total area under the curve is 1.
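The estimators above can be computed directly. This is a minimal sketch; the slide gives no formula for $s$, so the usual sample form with an $n - 1$ denominator is assumed here, and the data values are hypothetical.

```python
import math
import statistics

def sample_mean(xs):
    """x-bar = (sum of x) / n, the estimate of the mean mu."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """s, the estimate of sigma, using the (assumed) n - 1 denominator."""
    x_bar = sample_mean(xs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical measurements, for illustration only.
data = [12.0, 15.0, 11.0, 14.0, 13.0]
# Cross-check against the standard library's sample standard deviation.
print(sample_mean(data), sample_std(data), statistics.stdev(data))
```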
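Coefficient of Variation: $C_v = \sigma / \mu$
For two distributions with the same mean, $\mu = 150$, and standard deviations of 20 and 60: $C_v = 20/150 \approx 0.13$ and $C_v = 60/150 = 0.40$. The wider distribution has the larger coefficient of variation.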
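A small sketch of the coefficient of variation as $\sigma / \mu$, using the mean of 150 and the two standard deviations from the slide; the function name is illustrative only.

```python
def coefficient_of_variation(mean, std_dev):
    """C_v = sigma / mu (dimensionless; multiply by 100 for percent)."""
    if mean == 0:
        raise ValueError("C_v is undefined when the mean is zero")
    return std_dev / mean

# Same mean (150), two different spreads.
for sigma in (20, 60):
    print(sigma, round(coefficient_of_variation(150, sigma), 3))
```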
Example
100 kg of glass is recovered from municipal refuse and processed. The glass is crushed and sieved. Plot the cumulative distribution of particle size from the data below:
4 mm openings: 10 kg remained on the sieve (90 kg went through)
3 mm openings: 25 kg remained on the sieve
2 mm openings: 35 kg remained on the sieve
1 mm openings: 20 kg remained on the sieve
No openings (pan): 10 kg went all the way through

Sieve size (mm)    Fraction retained
4                  10/100 = 0.10
3                  25/100 = 0.25
2                  35/100 = 0.35
1                  20/100 = 0.20
< 1                10/100 = 0.10
Cumulative Distribution

Sieve size (mm)    Fraction smaller than sieve size
4                  1 - 0.10 = 0.90
3                  1 - (0.10 + 0.25) = 0.65
2                  1 - (0.10 + 0.25 + 0.35) = 0.30
1                  1 - (0.10 + 0.25 + 0.35 + 0.20) = 0.10
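The two tables can be generated from the sieve masses. This sketch assumes the sieves are processed from coarsest to finest and that the pan mass is passed separately; `sieve_analysis` is a hypothetical helper name.

```python
def sieve_analysis(sieves, pan_mass):
    """sieves: list of (opening in mm, mass retained in kg).
    Returns (opening, fraction retained, cumulative fraction smaller than opening)."""
    total = sum(mass for _, mass in sieves) + pan_mass
    rows = []
    cumulative_retained = 0.0
    for opening, mass in sorted(sieves, reverse=True):  # coarsest sieve first
        fraction = mass / total
        cumulative_retained += fraction
        rows.append((opening, fraction, 1.0 - cumulative_retained))
    return rows

# Glass example: masses retained on the 4, 3, 2 and 1 mm sieves; 10 kg in the pan.
sieves = [(4, 10), (3, 25), (2, 35), (1, 20)]
rows = sieve_analysis(sieves, pan_mass=10)
for opening, fraction, smaller in rows:
    print(f"{opening} mm: retained {fraction:.2f}, smaller than {smaller:.2f}")
```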
Graphs
Independent variable: abscissa (x-axis)
Dependent variable: ordinate (y-axis)
A variable is independent if its value is chosen, like the sieve size in the previous example. A variable is dependent if its value is determined by experiment, like the fraction of glass smaller than each sieve size.
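Probability Paper
The x-axis is linear. The y-axis is scaled so that, if the data are normally (Gaussian) distributed, the cumulative probability plots as a straight line. In that case the mean is the x value at a cumulative probability of 0.5 (50%), and one standard deviation on either side of the mean corresponds to cumulative probabilities of about 0.16 and 0.84. The standard deviation can also be estimated as
$s \approx \frac{2}{5}(x_{90} - x_{10})$
where $x_{90}$ and $x_{10}$ are the x values at cumulative probabilities of 0.90 and 0.10.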
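Where the 2/5 factor comes from: for a normal distribution, $x_{90} - x_{10} = 2 z_{0.90} \sigma$, where $z_{0.90}$ is the standard normal value at the 90th percentile. The sketch below uses Python's `statistics.NormalDist` to check that $1/(2 z_{0.90})$ is close to 2/5, and that $\mu \pm \sigma$ sits at about 0.16 and 0.84.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z at the 90th percentile (about 1.28); x90 - x10 = 2 * z90 * sigma.
z90 = std_normal.inv_cdf(0.90)
factor = 1 / (2 * z90)  # sigma = factor * (x90 - x10), roughly 0.39, i.e. about 2/5

# Cumulative probabilities at one standard deviation below and above the mean.
below, above = std_normal.cdf(-1.0), std_normal.cdf(1.0)
print(round(factor, 3), round(below, 3), round(above, 3))
```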
Example
Consider the recycled glass data from the previous example. What are the mean, the standard deviation, and the 95% interval?
The mean is the value on the x-axis where the y-axis value is 0.5: 2.4 mm.
The standard deviation describes the spread around the mean: about 68% of the data fall within one standard deviation of the mean (about 34% on either side). On the y-axis this is 0.50 + 0.34 = 0.84, which corresponds to 3.5 mm, so s = 3.5 - 2.4 = 1.1 mm. Alternatively, $s = \frac{2}{5}(x_{90} - x_{10}) = 1.16$ mm.
The 95% interval contains 95% of the data, so it lies between cumulative probabilities of 0.025 and 0.975: 0.2 mm to 4.8 mm.
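Drawing a straight line on probability paper can be approximated numerically by converting each cumulative fraction to its standard normal z value and fitting $x = \mu + \sigma z$ by least squares. This is a swapped-in technique, not the slide's graphical method, so the results (a mean of about 2.54 mm, $s \approx 1.16$ mm, and a 95% interval of roughly 0.26 to 4.82 mm) differ slightly from the hand-read values.

```python
from statistics import NormalDist

def fit_probability_paper(points):
    """Least-squares line x = mu + sigma * z through (x, cumulative fraction) points,
    where z is the standard normal value of each cumulative fraction."""
    std = NormalDist()
    xs = [x for x, _ in points]
    zs = [std.inv_cdf(p) for _, p in points]  # fractions must lie strictly between 0 and 1
    n = len(points)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxz = sum((z - z_bar) * (x - x_bar) for x, z in zip(xs, zs))
    szz = sum((z - z_bar) ** 2 for z in zs)
    sigma = sxz / szz           # slope: mm per standard deviation
    mu = x_bar - sigma * z_bar  # x at z = 0, the 50% point
    return mu, sigma

# (sieve size in mm, fraction smaller than that size) from the glass example
glass = [(4, 0.90), (3, 0.65), (2, 0.30), (1, 0.10)]
mu, sigma = fit_probability_paper(glass)
fitted = NormalDist(mu, sigma)
low, high = fitted.inv_cdf(0.025), fitted.inv_cdf(0.975)  # central 95% interval
print(f"mean {mu:.2f} mm, s {sigma:.2f} mm, 95% interval {low:.2f} to {high:.2f} mm")
```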
Return Period
The return period is the average interval between occurrences of an event. If the annual probability of an event is 5%, the event can be expected to occur on average once every 20 years, so its return period is 20 years:
Return period $T = 1/p$, where $p$ is the fractional probability per time interval.
To determine return periods, first rank the time-series data (smallest to largest or largest to smallest), then calculate the probabilities and plot the data.
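The relation $T = 1/p$ and its inverse, as a minimal sketch; the function names are illustrative, and the time unit of $T$ is whatever interval $p$ refers to (years here, days in the BOD example).

```python
def return_period(probability):
    """T = 1 / p, where p is the fractional probability per time interval."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1 / probability

def probability_from_return_period(period):
    """Inverse relation: p = 1 / T."""
    return 1 / period

# A 5% annual probability corresponds to a 20-year return period.
print(return_period(0.05))
print(probability_from_return_period(30))  # once in 30 intervals
```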
Return Period Example
The data below are from a wastewater treatment plant. BOD (biochemical oxygen demand) is a measure of organic pollution in water. The BOD is measured daily. Do these data fit the normal distribution? Can they be used to calculate the mean and standard deviation? What is the worst quality expected in 30 days?
First, rank the data. Then plot m/n (the cumulative probability) versus the BOD, where m is the rank and n is the number of observations.
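Ranking and computing m/n can be sketched as below. The BOD values are hypothetical (the plant data are not reproduced here), and ranking from smallest to largest is assumed so that m/n is the fraction of days at or below each value. Note that with m/n the largest value gets a probability of exactly 1, which cannot be placed on a probability scale.

```python
def plotting_positions(values):
    """Rank values smallest to largest and pair each with m/n,
    where m is the rank (1 = smallest) and n is the number of values."""
    ranked = sorted(values)
    n = len(ranked)
    return [(m / n, value) for m, value in enumerate(ranked, start=1)]

# Hypothetical daily BOD values (mg/L), NOT the plant data from the example.
bod = [30, 42, 25, 38, 51, 33, 29, 45]
pts = plotting_positions(bod)
for probability, value in pts:
    print(f"{probability:.3f}  {value}")
```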
The data fit the normal distribution fairly well. The mean is about 35 mg/L BOD.
To find the worst quality in a 30-day period, calculate 29/30 = 0.967. This is the fraction of days on which the quality is better than on the worst day out of 30. Enter the graph at 0.967 and read the answer: 67 mg/L BOD.
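Reading the graph at 29/30 is equivalent to taking the $(n-1)/n$ quantile of the fitted normal distribution. In this sketch the mean of 35 mg/L comes from the example, but the standard deviation passed in is a hypothetical value, since the slide does not state one.

```python
from statistics import NormalDist

def worst_in_n_days(mean, std_dev, n_days):
    """Value exceeded on about one day in n_days under a normal fit:
    the (n_days - 1) / n_days quantile, e.g. 29/30 for a 30-day period."""
    p = (n_days - 1) / n_days
    return NormalDist(mean, std_dev).inv_cdf(p)

# mean = 35 mg/L from the example; std_dev = 17 mg/L is HYPOTHETICAL.
worst = worst_in_n_days(35.0, 17.0, 30)
print(round(worst, 1))
```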
Sometimes data are analyzed after they are grouped. Often the mean is used to analyze the data.
Example: Using the data from the previous problem, estimate the highest BOD expected to occur once every 30 days, using grouped data analysis. First, define groups of BOD values.
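Grouping can be sketched as counting the values in each class and accumulating the fraction at or below each class's upper limit. Plotting against the upper class limit is one common convention, assumed here because the slide's grouping table is not shown; the BOD values and class limits are hypothetical.

```python
def grouped_cumulative(values, upper_limits):
    """Group values by upper class limit and return
    (upper limit, count in class, cumulative fraction <= upper limit).
    Values above the largest limit are not counted in any class."""
    n = len(values)
    rows = []
    lower = float("-inf")
    cumulative = 0
    for upper in sorted(upper_limits):
        count = sum(1 for v in values if lower < v <= upper)
        cumulative += count
        rows.append((upper, count, cumulative / n))
        lower = upper
    return rows

# Hypothetical daily BOD values (mg/L) and class limits, for illustration only.
bod = [30, 42, 25, 38, 51, 33, 29, 45]
rows = grouped_cumulative(bod, [30, 40, 50, 60])
for upper, count, fraction in rows:
    print(f"<= {upper} mg/L: {count} days, cumulative {fraction:.3f}")
```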
Now plot these data. Notice that the points form a curve rather than a straight line. This means the data do not really fit the normal distribution, but we proceed anyway. Again P = 29/30 = 0.967, and we read 67 mg/L BOD from the graph.
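When the points form a curve, a fitted straight line no longer applies, and the value is read from the curve itself. A simple stand-in for reading a hand-drawn curve is linear interpolation between neighbouring points in standard normal z units, which is how distances appear on probability paper. The points below are hypothetical, and the sketch does not extrapolate beyond the plotted range.

```python
from statistics import NormalDist

def read_curve(points, p):
    """Interpolate x at cumulative probability p between plotted points,
    working in standard normal z units as probability paper does.
    points: list of (x, cumulative fraction), fractions strictly between 0 and 1."""
    std = NormalDist()
    target = std.inv_cdf(p)
    pts = sorted((std.inv_cdf(q), x) for x, q in points)
    for (z0, x0), (z1, x1) in zip(pts, pts[1:]):
        if z0 <= target <= z1:
            return x0 + (x1 - x0) * (target - z0) / (z1 - z0)
    raise ValueError("p lies outside the plotted points; extrapolation not attempted")

# Hypothetical (upper class limit in mg/L, cumulative fraction) points.
points = [(30, 0.375), (40, 0.625), (50, 0.875), (60, 0.98)]
estimate = read_curve(points, 29 / 30)
print(round(estimate, 1))
```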