STATISTICS AND PROBABILITY IN CIVIL ENGINEERING TS4512 Doddy Prayogo, Ph.D.
2. Using Numerical Measures to Describe Data

2.1. Measures of Central Tendency
Measures of central tendency provide numerical information about a typical observation in the data. The measures covered here are the mean, median, mode, range, and geometric mean.

2.1.1. Mean (Arithmetic mean)
The arithmetic mean is the sum of the data values divided by the number of observations. If the data set is the entire population of data, then the population mean, µ, is a parameter. If the data set is from a sample, then the sample mean, x̄, is a statistic.
Mean formula:
\[ \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{(population)}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{(sample)} \]

2.1.2. Median
The median is the middle observation of a set of observations that are arranged in increasing (or decreasing) order.
* If n is odd, the sample median is the number in position (n + 1)/2.
* If n is even, the sample median is the average of the numbers in positions n/2 and (n/2 + 1).
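The position rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the two small data sets are made up for the example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle observation of the ordered data.

    Positions in the slides are 1-based; Python indexes are 0-based,
    so position (n + 1)/2 becomes index n // 2 when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # n even: average the observations in positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data sets, one with odd n and one with even n
sample_odd = [7, 3, 9, 5, 1]
sample_even = [7, 3, 9, 5, 1, 11]

print(mean(sample_odd), median(sample_odd))
print(mean(sample_even), median(sample_even))
```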
2.1.3. Mode and the range
The mode, if one exists, is the most frequently occurring value. If several values occur with equal frequency, each one is a mode.
The range is the difference between the largest and smallest values in a sample. It is a measure of spread, but it is rarely used, because it depends only on the two extreme values and provides no information about the rest of the sample.
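As a small sketch of these two definitions, the Python below counts frequencies to find every mode and subtracts the extremes to get the range. The helper names and the data are hypothetical, and treating "no value repeats" as "no mode" is an assumption about how to read "if one exists".

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: every value occurs once, so no mode exists
    return sorted(v for v, c in counts.items() if c == highest)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical sample: 4 and 7 both occur twice, so both are modes
sample = [2, 4, 4, 5, 7, 7, 9]
print(modes(sample))
print(value_range(sample))
```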
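2.1.4. Geometric mean
The geometric mean is the n-th root of the product of n numbers:
\[ \bar{x}_g = (x_1 x_2 \cdots x_n)^{1/n} \]
The geometric mean rate of return is the geometric mean of the growth factors minus 1:
\[ \bar{r}_g = (x_1 x_2 \cdots x_n)^{1/n} - 1 \]
where each \( x_i \) is 1 plus the rate of return in period i.

2.1.5. The trimmed mean
The trimmed mean is a measure of center that is designed to be unaffected by outliers. It is computed by arranging the sample values in order, 'trimming' an equal number of them from each end, and computing the mean of those remaining.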
Example
An investor invests $100 and receives the following returns:
Year 1: 3%
Year 2: 5%
Year 3: 8%
Year 4: -1%
Year 5: 10%
Find the annual growth rate of his investment.

The $100 grew each year as follows:
Year 1: $100 x 1.03 = $103.00
Year 2: $103 x 1.05 = $108.15
Year 3: $108.15 x 1.08 = $116.80
Year 4: $116.80 x 0.99 = $115.63
Year 5: $115.63 x 1.10 = $127.20

The geometric mean is: [(1.03*1.05*1.08*0.99*1.10)^(1/5)] - 1 = 4.93%.
The average return per year is 4.93%, slightly less than the 5% computed using the arithmetic mean. As a mathematical rule, the geometric mean of positive numbers is always equal to or less than the arithmetic mean.

Read more: Breaking Down The Geometric Mean | Investopedia http://www.investopedia.com/articles/investing/071113/breaking-down-geometric-mean.asp#ixzz4I8KjuBl3
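The calculation in this example can be checked with a short Python sketch. The function name is hypothetical; the returns are the five yearly figures from the example, written as decimals. `math.prod` needs Python 3.8 or later.

```python
import math


def geometric_mean_return(returns):
    """Geometric mean rate of return for a list of periodic returns.

    Each return r is converted to a growth factor (1 + r); the n-th root
    of the product of the factors, minus 1, is the average rate per period.
    """
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [0.03, 0.05, 0.08, -0.01, 0.10]  # years 1 to 5 from the example

geometric = geometric_mean_return(returns)
arithmetic = sum(returns) / len(returns)

# Compounding the geometric mean rate reproduces the final balance
final_value = 100 * (1 + geometric) ** len(returns)

print(f"geometric mean return:  {geometric:.2%}")
print(f"arithmetic mean return: {arithmetic:.2%}")
print(f"value after 5 years:    ${final_value:.2f}")
```

Compounding $100 at the geometric mean rate for five years gives back the year 5 balance, which is why this rate, and not the arithmetic mean, describes the growth of the investment.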
Exercise
An investor holds a volatile stock whose returns varied significantly from year to year. His initial investment was $100 in stock A, and it returned the following:
Year 1: 10%
Year 2: 150%
Year 3: -30%
Year 4: 10%
Find the annual growth rate of his investment.
Solution
Year 1: $100 x 1.10 = $110.00
Year 2: $110 x 2.5 = $275.00
Year 3: $275 x 0.7 = $192.50
Year 4: $192.50 x 1.10 = $211.75
The resulting geometric mean, or compound annual growth rate (CAGR), is (211.75/100)^(1/4) - 1 = 20.6%, much lower than the 35% calculated using the arithmetic mean.
Exercise
The values of fracture stress (in MPa) were measured for a sample of 24 mixtures of hot-mixed asphalt: 30; 75; 79; 80; 80; 105; 126; 138; 149; 179; 179; 191; 223; 232; 232; 236; 240; 242; 245; 247; 254; 274; 384; 470. Compute the mean, the median, and the 5%, 10%, and 20% trimmed means.
2.1.6. Shape of a distribution
The shape of a distribution reveals whether the data are evenly spread about its middle or center.
Symmetry. The shape of a distribution is said to be symmetric if the observations are balanced, or approximately evenly distributed, about its middle.
Skewness. A distribution is skewed, or asymmetric, if the observations are not symmetrically distributed on either side of the middle. A positively skewed distribution has a tail that extends to the right. A negatively skewed distribution has a tail that extends to the left.
Negatively skewed distribution: mean < median
Symmetric distribution: mean = median
Positively skewed distribution: mean > median
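The comparison above can be turned into a quick check in Python. This is only a rough sketch of the rule of thumb: the function name, the tolerance used to decide that mean and median are "equal", and the three data sets are all assumptions made for the illustration.

```python
import math
import statistics


def skew_direction(values, rel_tol=1e-9):
    """Classify the shape by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if math.isclose(mean, median, rel_tol=rel_tol):
        return "symmetric"
    return "positively skewed" if mean > median else "negatively skewed"


# Hypothetical samples
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value pulls the mean up
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]   # one small value pulls the mean down
balanced = [1, 2, 3, 4, 5]

print(skew_direction(right_tail))
print(skew_direction(left_tail))
print(skew_direction(balanced))
```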
Solution (fracture stress exercise):
* The mean is found by averaging all 24 numbers, which produces a value of 195.42.
* The median is the average of the 12th and 13th numbers, which is (191 + 223)/2 = 207.00.
* To compute the 5% trimmed mean, 5% of the data must be dropped from each end. This comes to (0.05)(24) = 1.2 observations. Round 1.2 to 1 and trim one observation off each end. The 5% trimmed mean is the average of the remaining 22 numbers: (75 + 79 + ... + 274 + 384)/22 = 190.45
* To compute the 10% trimmed mean, round off (0.1)(24) = 2.4 to 2. Drop 2 observations from each end, and then average the remaining 20: (79 + 80 + ... + 254 + 274)/20 = 186.55
* To compute the 20% trimmed mean, round off (0.2)(24) = 4.8 to 5. Drop 5 observations from each end, and then average the remaining 14: (105 + 126 + ... + 242 + 245)/14 = 194.07
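The trimming procedure used in this solution can be sketched in Python as follows. The function name is hypothetical, and rounding the trim count to the nearest integer with halves rounded up is an assumption; the solution only shows 1.2, 2.4, and 4.8 being rounded to 1, 2, and 5.

```python
import math


def trimmed_mean(values, proportion):
    """Mean after dropping round(proportion * n) values from each end."""
    ordered = sorted(values)
    n = len(ordered)
    # Round to nearest; floor(x + 0.5) avoids Python's round-half-to-even
    k = math.floor(proportion * n + 0.5)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Fracture stress (MPa) of the 24 hot-mixed asphalt mixtures
stress = [30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
          223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470]

print(f"mean:             {sum(stress) / len(stress):.2f}")
for p in (0.05, 0.10, 0.20):
    print(f"{p:.0%} trimmed mean: {trimmed_mean(stress, p):.2f}")
```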
2.2.4. Variance and standard deviation
* The population variance, \( \sigma^2 \), is the sum of the squared differences between each observation and the population mean, divided by the population size, N:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
* The sample variance, \( s^2 \), is the sum of the squared differences between each observation and the sample mean, divided by the sample size n minus 1:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
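* The population standard deviation, \( \sigma \), is the (positive) square root of the population variance and is defined as follows:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
* The sample standard deviation, \( s \), is likewise the (positive) square root of the sample variance:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]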
Example: Compute the variance and standard deviation of the ten observations of executive exercise time (in minutes) listed here: 20; 35; 28; 22; 10; 40; 23; 32; 28; 30.
Exercise
Executive exercise time (sample mean x̄ = 26.8 minutes):

Time (minutes) xi    Deviation about the mean (xi - x̄)    Squared deviation about the mean (xi - x̄)^2
20                   -6.8                                  46.24
35                    8.2                                  67.24
28                    1.2                                   1.44
22                   -4.8                                  23.04
10                  -16.8                                 282.24
40                   13.2                                 174.24
23                   -3.8                                  14.44
32                    5.2                                  27.04
28                    1.2                                   1.44
30                    3.2                                  10.24
Sum: 268              0.0                                 647.60
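The remaining steps of this example, dividing the sum of squared deviations by n - 1 and taking the square root, can be sketched in Python. The data are the ten exercise times from the example; treating them as a sample (divisor n - 1) is an assumption based on the use of x̄ in the table, and the variable names are illustrative.

```python
import math
import statistics

# Executive exercise times in minutes (ten observations)
times = [20, 35, 28, 22, 10, 40, 23, 32, 28, 30]

x_bar = statistics.mean(times)

# Squared deviation of each observation about the sample mean
squared_deviations = [(x - x_bar) ** 2 for x in times]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1; the standard deviation is its square root
sample_variance = sum_of_squares / (len(times) - 1)
sample_std = math.sqrt(sample_variance)

print(f"mean:            {x_bar:.1f}")
print(f"sum of squares:  {sum_of_squares:.2f}")
print(f"sample variance: {sample_variance:.2f}")
print(f"sample std dev:  {sample_std:.2f}")
```

The standard library gives the same figures directly through `statistics.variance(times)` and `statistics.stdev(times)`; `statistics.pvariance` and `statistics.pstdev` are the population versions with divisor N.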
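* Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, provided the mean is positive:
\[ CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)}, \qquad CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)} \]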
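As an illustration, the sketch below computes the sample coefficient of variation for the ten executive exercise times used in the variance example. The function name is hypothetical, and using the sample standard deviation is an assumption about which version is wanted.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Executive exercise times in minutes
times = [20, 35, 28, 22, 10, 40, 23, 32, 28, 30]

cv = coefficient_of_variation(times)
print(f"CV = {cv:.2f}%")
```

Because the CV is unit-free, it allows the spread of data sets with different means or different units to be compared.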
* Weighted Mean and Measures of Grouped Data
2.2.5. Measures of Relationships Between Variables
* Covariance
Covariance (Cov) is a measure of the linear relationship between two variables. A positive value indicates a direct or increasing linear relationship, and a negative value indicates a decreasing linear relationship.
* A population covariance is
\[ \mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]
where \( x_i \) and \( y_i \) are the observed values, \( \mu_x \) and \( \mu_y \) are the population means, and N is the population size.
* A sample covariance is
\[ \mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
where \( x_i \) and \( y_i \) are the observed values, \( \bar{x} \) and \( \bar{y} \) are the sample means, and n is the sample size.
* Correlation coefficient
The correlation coefficient is computed by dividing the covariance by the product of the standard deviations of the two variables.
* A population correlation coefficient, ρ, is
\[ \rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \]
where \( \sigma_x \) and \( \sigma_y \) are the population standard deviations of the two variables, and Cov(x, y) is the population covariance.
* A sample correlation coefficient, r, is
\[ r = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \]
where \( s_x \) and \( s_y \) are the sample standard deviations of the two variables, and Cov(x, y) is the sample covariance.
A useful rule to remember is that a relationship exists if
\[ |r| \ge \frac{2}{\sqrt{n}} \]
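A minimal Python sketch of the sample covariance and sample correlation coefficient follows. The function names and the five (x, y) pairs are hypothetical and are not the aptitude test data. The threshold 2/sqrt(n) used for the rule of thumb is the common textbook form and is taken here as an assumption.

```python
import math


def sample_covariance(x, y):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # Cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = sample_correlation(x, y)

# Rule of thumb (assumed form): a relationship exists if |r| >= 2 / sqrt(n)
threshold = 2 / math.sqrt(len(x))
relationship = abs(r) >= threshold

print(f"Cov(x, y) = {cov_xy:.2f}")
print(f"r = {r:.4f}, threshold = {threshold:.4f}, relationship: {relationship}")
```

With only five pairs the threshold is high, so even a fairly strong positive r in this made-up sample does not meet the rule.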
Example: Aptitude test score and sales