Data Analysis: Statistical Measures
Industrial Engineering
Aside: Mean, Variance
Mean: $\mu = \sum_x x\,p(x)$ (discrete)
Variance: $\sigma^2 = \sum_x (x-\mu)^2\,p(x)$ (discrete)
Example: Consider the discrete uniform die example:
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
$\mu = E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5$
Example (continued): For the same die,
$\sigma^2 = E[(X-\mu)^2] = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + (3-3.5)^2(1/6) + (4-3.5)^2(1/6) + (5-3.5)^2(1/6) + (6-3.5)^2(1/6) = 2.92$
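Both die calculations can be checked by evaluating the definitions with exact fractions. This is a minimal Python sketch; the dictionary `pmf` is simply an assumed way of holding the table above.

```python
from fractions import Fraction

# Discrete uniform die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = E[X] = sum of x * p(x)
mu = sum(x * p for x, p in pmf.items())

# Variance: sigma^2 = E[(X - mu)^2] = sum of (x - mu)^2 * p(x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(mu, float(mu))               # 7/2 3.5
print(var, round(float(var), 2))   # 35/12 2.92
```

Exact fractions show that the rounded 2.92 is really 35/12.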
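Binomial Mean
$\mu = \sum_{x=0}^{n} x\,p(x) = 1p(1) + 2p(2) + 3p(3) + \dots + np(n) = \sum_{x=0}^{n} x\,\frac{n!}{x!(n-x)!}\,p^x(1-p)^{n-x} = np$
Miracle 1 occurs: factoring $np$ out of the sum leaves the Binomial(n-1, p) probabilities, which sum to 1.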
Binomial Measures
Mean: $\mu = \sum_x x\,p(x) = np$
Variance: $\sigma^2 = \sum_x (x-\mu)^2\,p(x) = np(1-p)$
Binomial Distribution
[Figure: P(x) versus x for four cases: n = 5, p = .3; n = 8, p = .5; n = 20, p = .5; n = 4, p = .8]
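The closed forms $np$ and $np(1-p)$ can be checked numerically by summing directly over the probability function, here for the four (n, p) pairs in the binomial plots. A minimal Python sketch; the helper names `binomial_pmf` and `binomial_moments` are illustrative, not from the slides.

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): n!/(x!(n-x)!) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Mean and variance computed straight from their definitions."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, var

# The four (n, p) pairs shown in the binomial plots
for n, p in [(5, 0.3), (8, 0.5), (20, 0.5), (4, 0.8)]:
    mean, var = binomial_moments(n, p)
    # Compare the brute-force sums with the closed forms np and np(1-p)
    assert isclose(mean, n * p) and isclose(var, n * p * (1 - p))
    print(f"n={n}, p={p}: mean={mean:.3f}, var={var:.3f}")
```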
Measures of Centrality: Mean, Median, Mode
Measures of Centrality: Mean
$\mu = \sum_x x\,p(x)$ (discrete), $\mu = \int x\,f(x)\,dx$ (continuous)
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Measures of Centrality
Exercise: Compute the sample mean for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2.4 + 2.7 + 2.8 + 2.9 + 3.0 + 3.0 + 3.1 + 3.3 + 3.5 + 3.9}{10} = \frac{30.6}{10} = 3.06$
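The same arithmetic in Python, as a minimal sketch; `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean

# Student GPA data from the exercise
gpa = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]

# Sample mean: X-bar = (1/n) * sum of x_i
n = len(gpa)
x_bar = sum(gpa) / n

print(round(x_bar, 2))       # 3.06
print(round(mean(gpa), 2))   # standard-library cross-check
```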
Measures of Centrality
Failure data: $\bar{X} = 19.1$
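Measures of Centrality: Median
Compute the median for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
With n = 10 (even), the median is the average of the 5th and 6th ordered values: $\tilde{X} = \frac{3.0 + 3.0}{2} = 3.0$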
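The median rule (middle value for odd n, average of the two middle values for even n) can be sketched as follows; `sample_median` is a hypothetical helper, cross-checked against `statistics.median`.

```python
from statistics import median

def sample_median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

gpa = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]
print(sample_median(gpa))   # 3.0
print(median(gpa))          # 3.0
```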
Measures of Centrality: Mode
The mode is the class mark of the most frequently occurring interval.
For the failure data, the mode is the class mark of the first interval: mode = 5
Measures of Centrality
Measure   Student GPA   Failure Data
Mean      3.06          19.10
Median    3.00          14.40
Mode      ---           5.00
Measures of Dispersion: Range, Sample Variance
Measures of Dispersion: Range
Compute the range for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Min = 2.4, Max = 3.9, Range = 3.9 - 2.4 = 1.5
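Measures of Dispersion: Variance
$\sigma^2 = \sum_x (x-\mu)^2\,p(x)$ (discrete), $\sigma^2 = \int (x-\mu)^2 f(x)\,dx$ (continuous)
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$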
Measures of Dispersion
Exercise: Compute the sample variance for the student GPA data 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9, using $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$
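Measures of Dispersion: Sample Variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n-1} = \frac{95.26 - 10(3.06)^2}{10 - 1} = \frac{1.624}{9} \approx 0.180$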
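Both dispersion measures for the GPA data can be verified with a short Python sketch. It evaluates the range, the definitional form of $s^2$, and the computational form $(\sum x_i^2 - n\bar{X}^2)/(n-1)$, and compares them with `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

gpa = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]
n = len(gpa)
x_bar = sum(gpa) / n

# Range: largest minus smallest observation
data_range = max(gpa) - min(gpa)

# Definitional form: s^2 = sum of (x_i - x_bar)^2, divided by n - 1
s2_def = sum((x - x_bar) ** 2 for x in gpa) / (n - 1)

# Computational form: s^2 = (sum of x_i^2 - n * x_bar^2) / (n - 1)
s2_comp = (sum(x * x for x in gpa) - n * x_bar ** 2) / (n - 1)

print(round(data_range, 2))                                           # 1.5
print(round(s2_def, 4), round(s2_comp, 4), round(variance(gpa), 4))   # 0.1804 0.1804 0.1804
```

The computational form is convenient by hand, but it subtracts two nearly equal numbers, so the definitional form is usually preferred in code.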
Measures of Dispersion
Exercise: Compute the variance for the failure time data.
$s^2 = 302.76$
An Aside
For the failure time data, we now have three measures for the data: $\bar{X} = 19.1$, $s^2 = 302.76$. Exponential?
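An Aside
Recall that for the exponential distribution $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$.
If $E[\bar{X}] = \mu$ and $E[s^2] = \sigma^2$, then with $\bar{X} = 19.1$ and $s^2 = 302.76$:
$1/\lambda = 19.1 \Rightarrow \hat{\lambda} = 0.0524$, or $1/\lambda^2 = 302.76 \Rightarrow \hat{\lambda} = 0.0575$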
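The two moment-matching estimates can be reproduced directly from the summary statistics. This minimal sketch assumes only the values $\bar{X} = 19.1$ and $s^2 = 302.76$ given for the failure time data.

```python
from math import sqrt

# Summary statistics for the failure time data
x_bar = 19.1    # sample mean
s2 = 302.76     # sample variance

# Exponential: mu = 1/lambda, so matching the mean gives lambda = 1/x_bar
lam_from_mean = 1 / x_bar
# Exponential: sigma^2 = 1/lambda^2, so matching the variance gives lambda = 1/sqrt(s2)
lam_from_var = 1 / sqrt(s2)

print(round(lam_from_mean, 4))   # 0.0524
print(round(lam_from_var, 4))    # 0.0575
```

For an exponential distribution the mean equals the standard deviation, so comparing 19.1 with $\sqrt{302.76} = 17.4$ is one informal check of the exponential model.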
Introduction to Probability & Statistics
The Central Limit Theorem
The Sample Mean
For our die example,
x      1    2    3    4    5    6
p(x)   1/6  1/6  1/6  1/6  1/6  1/6
the true mean is $\mu = \sum_x x\,p(x) = 3.5$. Suppose we wish to estimate $\mu$ from the throw of 2 dice by computing the average of the two throws: $\bar{X} = \frac{X_1 + X_2}{2}$
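Joint Distributions
[Figure: joint distribution of the two throws $X_1$ and $X_2$, each uniform on 1 to 6 with p(x) = 1/6, showing the resulting value of $\bar{X}$ for each pair]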
Distribution of $\bar{X}$
x̄       1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
p(x̄)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
[Figure: bar charts of the distribution of $\bar{X}$]
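The table of $\bar{X}$ values can be generated by enumerating all 36 equally likely pairs $(X_1, X_2)$. A minimal sketch using only the standard library:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely (x1, x2) pairs give each average
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in product(faces, repeat=2))

# Probability of each value of X-bar = count / 36
dist = {xbar: Fraction(counts[xbar], 36) for xbar in sorted(counts)}

for xbar in sorted(counts):
    print(float(xbar), f"{counts[xbar]}/36")

# The distribution of X-bar is centred on the die mean 3.5
print(sum(xbar * p for xbar, p in dist.items()))   # 7/2
```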
Distribution of $\bar{X}$
[Figure: distribution of $\bar{X}$ for n = 2, n = 10, and n = 15]
As $n \to \infty$, the distribution of $\bar{X}$ approaches a normal distribution.
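The approach to normality can also be checked without plotting. This sketch assumes fair, independent dice: it builds the exact distribution of the dice total by repeated convolution and compares the exact $P\{\bar{X} \le 3\}$ with a normal approximation that uses $\mu = 3.5$ and $\sigma^2 = 35/12$ (the die variance, about 2.92) plus a continuity correction of 0.5 on the total. The function names are hypothetical.

```python
import math

def total_pmf(n):
    """Exact pmf of the total of n fair dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, p in pmf.items():
            for face in range(1, 7):
                nxt[s + face] = nxt.get(s + face, 0.0) + p / 6
        pmf = nxt
    return pmf

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

MU, SIGMA = 3.5, math.sqrt(35 / 12)   # mean and standard deviation of one die

for n in (2, 10, 15):
    pmf = total_pmf(n)
    # X-bar <= 3 is the same event as total <= 3n
    exact = sum(p for s, p in pmf.items() if s <= 3 * n)
    # Normal approximation for the total (mean n*MU, sd SIGMA*sqrt(n)), with continuity correction
    z = (3 * n + 0.5 - n * MU) / (SIGMA * math.sqrt(n))
    approx = normal_cdf(z)
    print(f"n={n:2d}  exact={exact:.4f}  normal approx={approx:.4f}")
```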
Expected Value of $\bar{X}$
$E[\bar{X}] = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n}\left(E[X_1] + E[X_2] + \dots + E[X_n]\right) = \frac{1}{n}(\mu + \mu + \dots + \mu) = \frac{1}{n}(n\mu) = \mu$
Variance of $\bar{X}$
Since $X_1, X_2, \dots, X_n$ are independent,
$\sigma^2_{\bar{X}} = \mathrm{Var}\left(\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right) = \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \dots + \sigma^2\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
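A small Monte Carlo simulation gives an empirical check that the mean of $\bar{X}$ is $\mu = 3.5$ and its variance is $\sigma^2/n$ for the die, where $\sigma^2 = 35/12 \approx 2.92$. The seed, the number of trials, and the helper `simulate_xbar` are arbitrary choices for this sketch, so the printed values are only close to the theoretical ones.

```python
import random

random.seed(1)        # fixed seed so the run is reproducible
SIGMA2 = 35 / 12      # variance of one fair die (about 2.92)
TRIALS = 20_000       # simulated samples for each sample size n

def simulate_xbar(n, trials=TRIALS):
    """Return `trials` simulated values of X-bar, the average of n fair-die throws."""
    return [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]

results = {}
for n in (2, 10, 15):
    xbars = simulate_xbar(n)
    m = sum(xbars) / TRIALS                                # estimates E[X-bar]
    v = sum((x - m) ** 2 for x in xbars) / (TRIALS - 1)    # estimates Var(X-bar)
    results[n] = (m, v)
    print(f"n={n:2d}  mean={m:.3f}  var={v:.4f}  sigma^2/n={SIGMA2 / n:.4f}")
```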
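Distribution of $\bar{x}$
Recall that $\bar{x}$ is a function of random variables, so it is also a random variable with its own distribution. By the central limit theorem, we know that for large n, $\bar{x}$ is approximately normally distributed, $\bar{x} \sim N(\mu_{\bar{x}}, \sigma^2_{\bar{x}})$, where $\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \sigma^2/n$.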
Example
Suppose that breakeven analysis indicates we must have average daily revenues of $500. A random sample of 10 days yields an average of only $450. What is the probability that we will not break even this year?
Example
$P\{\text{not breakeven} \mid \bar{x} = 450\} = P\{\mu < 500\} = P\{\bar{x} - \mu > 450 - 500\}$
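Example
$P\{\text{not breakeven}\} = P\{\bar{x} - \mu > 450 - 500\}$
Recall that $\bar{x} \sim N(\mu, \sigma^2/n)$. Using the standard normal transformation $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$,
$P\{\text{not breakeven}\} = P\left\{Z > \frac{-50}{\sigma/\sqrt{n}}\right\} = P\left\{Z < \frac{50}{\sigma/\sqrt{n}}\right\}$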
Example
In order to solve this problem, we need to know the true but unknown standard deviation $\sigma$. Let us assume we have enough past data that a reasonable estimate is s = 25. Then
$P\{\text{not breakeven}\} = P\left\{Z < \frac{50}{25/\sqrt{10}}\right\} = P\{Z < 6.32\} > 0.9999$
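The breakeven calculation can be sketched in Python, following the example's approach of treating $(\bar{x} - \mu)/(s/\sqrt{n})$ as standard normal with the estimate s = 25 in place of $\sigma$. `normal_cdf` is a helper written here with `math.erf`; the variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

target = 500.0   # required average daily revenue ($)
x_bar = 450.0    # observed sample average ($)
s = 25.0         # assumed estimate of sigma ($)
n = 10           # number of days sampled

std_err = s / math.sqrt(n)            # sigma / sqrt(n), with s standing in for sigma
z = (target - x_bar) / std_err        # P{mu < 500} = P{Z < (500 - 450) / (s / sqrt(n))}
p_not_breakeven = normal_cdf(z)

print(round(z, 2), p_not_breakeven)
```

Dividing by the standard error $s/\sqrt{n} \approx 7.91$, rather than by $s$ itself, is what makes the $50 shortfall so many standard errors wide.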