Presentation transcript: "Chapter 2. Variation"

3 Chapter 2. Variation
How to: summarize/display random data; appreciate variation due to randomness.
Data summaries: single observation y (number, curve, image, ...); sample y1, ..., yn; statistic s(y1, ..., yn).

4 Features: location, scale (spread).
Sample moments: average ȳ = (y1 + ... + yn)/n; sample variance s² = Σ (yi - ȳ)²/(n - 1).
Order statistics y(1) ≤ y(2) ≤ ... ≤ y(n): minimum, maximum, median, range; quartiles, quantiles.
p·100% trimmed average; IQR; MAD = median{|yi - median(yj)|}.
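
A minimal sketch of these summaries in Python, assuming NumPy and SciPy are available; the sample y below is made up for illustration.
import numpy as np
from scipy import stats

y = np.array([1.5, 3.2, 4.7, 7.5, 7.6, 9.8, 12.0, 19.0])  # hypothetical data

ybar = y.mean()                                # sample average
s2 = y.var(ddof=1)                             # sample variance, divisor n - 1
y_sorted = np.sort(y)                          # order statistics y(1) <= ... <= y(n)
q1, med, q3 = np.percentile(y, [25, 50, 75])   # quartiles and median
iqr = q3 - q1                                  # interquartile range
mad = np.median(np.abs(y - np.median(y)))      # MAD = median{|yi - median(yj)|}
trimmed = stats.trim_mean(y, 0.1)              # p·100% trimmed average with p = 0.1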

5 Bad data. Outlier: an observation unusual compared to the others. Resistance; trimmed average.
Example (midwife birth data). Hours in labor by day: n = 95, ȳ = 7.57 hr, s² = … hr²; min, median, max = 1.5, 7.5, 19 hr; quartiles 4.95, 9.75 hr.
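
A short illustration of resistance, assuming NumPy/SciPy; the numbers are hypothetical, not the midwife data. One gross outlier moves the average far more than the median or a trimmed average.
import numpy as np
from scipy import stats

y = np.array([5.1, 6.3, 7.0, 7.5, 8.2, 9.0, 9.7])
y_bad = np.append(y, 95.0)                                    # one bad observation

print(y.mean(), y_bad.mean())                                 # average shifts a lot
print(np.median(y), np.median(y_bad))                         # median barely moves
print(stats.trim_mean(y, 0.2), stats.trim_mean(y_bad, 0.2))   # trimmed average resists too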

7 Graphs. Indispensable in data analysis.
Histogram: disjoint bins [L + (k-1)δ, L + kδ); plot the count nk or the proportion nk/n.
EDF: #{yj ≤ y}/n; estimates the CDF, Prob{Y ≤ y}.
Scatter plot of (uj, vj).
Parallel boxplots: location, scale, shape, outliers, comparisons; median, quartiles, whiskers at 1.5 IQR.
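
A sketch of the four displays, assuming NumPy and matplotlib; the data are simulated stand-ins, not the birth data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
y = rng.gamma(shape=3.0, scale=2.5, size=95)      # stand-in for hours in labor
u, v = rng.normal(size=(2, 95))                   # stand-in pairs for a scatter plot

fig, ax = plt.subplots(2, 2)
ax[0, 0].hist(y, bins=10)                         # histogram: counts n_k per bin
ax[0, 1].step(np.sort(y), np.arange(1, 96) / 95)  # EDF: #{y_j <= y}/n, estimates the CDF
ax[1, 0].scatter(u, v)                            # scatter plot of (u_j, v_j)
ax[1, 1].boxplot([y, y + 2])                      # parallel boxplots: median, quartiles, 1.5 IQR whiskers
plt.show()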

9 Random sample: Y1, ..., Yn independent with CDF F.
Mean E(Y) = ∫ y dF(y) (= ∫ y f(y) dy if there is a density f); p quantile yp = F⁻¹(p).
Laplace (continuous): f(y) = exp{-|y - η|/τ}/(2τ), -∞ < y < ∞.
Poisson (discrete): Prob(Y = y) = f(y) = θ^y exp{-θ}/y!, y = 0, 1, 2, ...
Count of daily arrivals ~ Poisson; hours of labor ~ gamma.
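
A sketch of these two distributions with SciPy; the parameter values (η = 0, τ = 1, θ = 4) are arbitrary choices for illustration, and SciPy's loc/scale play the roles of η/τ.
import numpy as np
from scipy import stats

eta, tau = 0.0, 1.0
ygrid = np.linspace(-5, 5, 11)
laplace_pdf = stats.laplace.pdf(ygrid, loc=eta, scale=tau)  # exp{-|y-η|/τ}/(2τ)

theta = 4.0                                                 # Poisson mean
k = np.arange(10)
poisson_pmf = stats.poisson.pmf(k, theta)                   # θ^y exp{-θ}/y!
print(laplace_pdf, poisson_pmf)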

11 Gamma: f(y) = λ^κ y^(κ-1) exp{-λy}/Γ(κ), y > 0 (shape κ > 0, rate λ > 0).
These beginning chapters will provide many examples of useful distributions, some discrete, some continuous.
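
Since the labor-hours data were described as roughly gamma, here is a sketch of maximum-likelihood fitting with SciPy; the data below are simulated stand-ins, and SciPy's scale corresponds to 1/λ in the rate parametrization above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.gamma(shape=3.0, scale=2.5, size=95)                # simulated "hours in labor"
kappa_hat, loc_hat, scale_hat = stats.gamma.fit(y, floc=0)  # fix the location at 0
lambda_hat = 1.0 / scale_hat                                # rate = 1/scale
print(kappa_hat, lambda_hat)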

12 SF Chron 01/26/09

13 Sampling variation. "The data y1, ..., yn will be regarded as the observed values of random variables" (probabilities defined); "ask how we would expect s(y1, ..., yn) to behave on average, ..., understand the properties of S = S(Y1, ..., Yn)".
Y1, ..., Yn a sample from a distribution with mean μ and variance σ².
Sample moment Ȳ: E(Ȳ) = n E(Yj)/n = μ, so Ȳ is unbiased. (Uses E(X + Y) = E(X) + E(Y).)
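
A quick simulation check of E(Ȳ) = μ, assuming NumPy; the exponential population with μ = 2 and the sample size n = 20 are arbitrary.
import numpy as np

rng = np.random.default_rng(3)
mu, n, reps = 2.0, 20, 10_000
ybars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
print(ybars.mean())   # close to μ = 2: the average is unbiased for μ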

14 var(Ȳ) = σ²/n. (Uses var(X + Y) = var(X) + var(Y) if uncorrelated, and var(aX) = a² var(X).)
Σ (Yj - μ)² = Σ (Yj - Ȳ)² + n(Ȳ - μ)², so nσ² = E[Σ (Yj - Ȳ)²] + σ² and E(S²) = σ²: S² is unbiased.
Birth data: n = 95, ȳ = 7.57 hr, standard error s/√n = … hr.
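
A simulation check of var(Ȳ) = σ²/n and E(S²) = σ², assuming NumPy; the N(0, σ² = 4) population and n = 25 are arbitrary.
import numpy as np

rng = np.random.default_rng(4)
sigma2, n, reps = 4.0, 25, 20_000
Y = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
print(Y.mean(axis=1).var(), sigma2 / n)       # sampling variance of Ȳ versus σ²/n
print(Y.var(axis=1, ddof=1).mean(), sigma2)   # average of S² versus σ²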

15 Probability plot. Checking a probability model:
plot y(j) versus F⁻¹(j/(n+1)). For the normal take F = Φ, from a table or a statistical package.
The normal probability plot "works" even if μ, σ are unknown: for N(μ, σ²), E(Y(j)) = μ + σ E(Z(j)).
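
A sketch of a normal probability plot by hand, assuming NumPy/SciPy/matplotlib; the data are simulated, and scipy.stats.probplot would give essentially the same picture with slightly different plotting positions.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(loc=10, scale=3, size=50)
n = y.size
q = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))  # Φ^{-1}(j/(n+1))
plt.plot(q, np.sort(y), "o")                       # roughly a line: slope ≈ σ, intercept ≈ μ
plt.show()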

17 Tools for approximation
Weak law of large numbers: Ȳ → μ in probability as n → ∞, so Ȳ is a consistent estimate of μ.
Definition: Sn → S in probability if for any ε > 0, Pr(|Sn - S| > ε) → 0 as n → ∞.
If S = s0, a constant, and h is continuous at s0, then h(Sn) → h(s0) in probability.
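
A simulation illustration of the weak law, assuming NumPy; uniform(0, 1) samples with μ = 1/2 and ε = 0.05 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(6)
mu, eps = 0.5, 0.05
for n in (10, 100, 1000, 10000):
    ybars = rng.uniform(size=(1000, n)).mean(axis=1)
    print(n, np.mean(np.abs(ybars - mu) > eps))   # estimate of Pr(|Ȳ - μ| > ε) shrinks toward 0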

19 Central limit theorem: √n(Ȳ - μ)/σ → Z ~ N(0,1) in distribution as n → ∞.
Definition: Zn converges in distribution to Z if Pr(Zn ≤ z) → Pr(Z ≤ z) as n → ∞ at every z for which Pr(Z ≤ z) is continuous.
The CLT provides an approximation for "large" n.
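
A simulation illustration of the CLT, assuming NumPy/SciPy: standardized averages of skewed exponential(1) data (μ = σ = 1) are close to N(0,1) already at n = 30, an arbitrary choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu = sigma = 1.0                          # exponential(1): mean 1, standard deviation 1
n = 30
z = np.sqrt(n) * (rng.exponential(size=(10_000, n)).mean(axis=1) - mu) / sigma
print(stats.kstest(z, "norm").statistic)  # small Kolmogorov distance from the standard normal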

21 Average as an estimate of μ.
If X is N(μ, σ²) then (X - μ)/σ is N(0,1). Writing Zn = √n(Ȳ - μ)/σ, we have Ȳ = μ + n^(-1/2) σ Zn.
This indicates how the efficiency of Ȳ depends on n and σ.
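
A small numerical check of Ȳ = μ + n^(-1/2) σ Zn, assuming NumPy: quadrupling n halves the typical error. The N(5, 2²) population is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(8)
for n in (25, 100, 400):
    ybars = rng.normal(loc=5, scale=2, size=(10_000, n)).mean(axis=1)
    print(n, ybars.std())   # roughly σ/√n = 2/√n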

22 Covariance and correlation.
cov(X, Y) = σxy = E[{X - E(X)}{Y - E(Y)}].
Sample covariance Cxy = Σj=1,...,n (Xj - X̄)(Yj - Ȳ)/(n - 1); Cxy → σxy in probability.
Correlation ρ = cov(X, Y)/√[var(X) var(Y)], |ρ| ≤ 1.
R = Cxy/√[Cxx Cyy]; R → ρ in probability.
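
A sketch of the sample covariance and correlation with NumPy; the correlated pairs below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.8, size=200)
Cxy = np.cov(x, y, ddof=1)[0, 1]   # Σ (Xj - Xbar)(Yj - Ybar)/(n - 1)
R = np.corrcoef(x, y)[0, 1]        # Cxy / sqrt(Cxx Cyy), consistent for ρ
print(Cxy, R)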

23 R = -0.340

24 Some more distributions.
Cauchy: f(y) = 1/[π{1 + (y - θ)²}], -∞ < y < ∞; the distribution of Ȳ is the same as that of Y1; no moments, long tails.
Uniform: F(u) = 0, u ≤ 0; = u, 0 < u ≤ 1; = 1, 1 < u. E(U) = 1/2, the center of gravity.
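
A simulation contrast, assuming NumPy: running averages of standard Cauchy data never settle down (Ȳ has the same distribution as a single Y), while uniform(0, 1) averages converge to 1/2.
import numpy as np

rng = np.random.default_rng(10)
cauchy = rng.standard_cauchy(100_000)
unif = rng.uniform(size=100_000)
for n in (100, 10_000, 100_000):
    print(n, cauchy[:n].mean(), unif[:n].mean())   # Cauchy averages keep jumping; uniform → 1/2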

25 Exponential: f(y) = 0, y < 0; = λ exp{-λy}, y ≥ 0.
Pareto: F(y) = 0, y < a; = 1 - (y/a)^(-λ), y ≥ a; a, λ > 0.
Poisson process: times of events Y(1), Y(2), Y(3), ...; the gaps Y(1), Y(2) - Y(1), Y(3) - Y(2), ... are i.i.d. exponential.
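
A sketch of the exponential-gaps view of a Poisson process, assuming NumPy; the rate λ = 2 is arbitrary. Cumulating i.i.d. exponential gaps gives the event times, and the count in a unit interval is approximately Poisson(λ).
import numpy as np

rng = np.random.default_rng(11)
lam = 2.0
gaps = rng.exponential(scale=1.0 / lam, size=1000)   # i.i.d. exponential inter-event times
times = np.cumsum(gaps)                              # event times y(1) < y(2) < ...
print(np.sum(times <= 1.0))                          # count of events in [0, 1]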

26 Chi-squared distribution.
Z1, Z2, ..., Zν IN(0,1); W = Σj=1,...,ν Zj² has E(W) = ν, var(W) = 2ν.
Multinomial (page 47): p classes with probabilities π1, ..., πp adding to 1.
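
A simulation check that W = Z1² + ... + Zν² has mean ν and variance 2ν, assuming NumPy; ν = 5 is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(12)
nu = 5
W = (rng.normal(size=(50_000, nu)) ** 2).sum(axis=1)
print(W.mean(), W.var())   # close to ν = 5 and 2ν = 10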

27 Linear combination L = a + Σ bj Yj: E(L) = a + Σ bj μj; if the Yj are independent, var(L) = Σ bj² σj².
If the Yj are IN(μj, σj²), then L is N(a + Σ bj μj, Σ bj² σj²).
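
A simulation check of the mean and variance of a linear combination of independent normals, assuming NumPy; a, the bj, μj and σj below are made-up values.
import numpy as np

rng = np.random.default_rng(13)
a, b = 1.0, np.array([2.0, -1.0, 0.5])
mu, sig = np.array([0.0, 3.0, -2.0]), np.array([1.0, 2.0, 0.5])
Y = rng.normal(loc=mu, scale=sig, size=(100_000, 3))
L = a + Y @ b
print(L.mean(), a + b @ mu)             # E(L) = a + Σ bj μj
print(L.var(), (b ** 2) @ (sig ** 2))   # var(L) = Σ bj² σj²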

28 Moment-generating function.
MY(t) = E(exp{tY}), t real. For X, Y independent, MX+Y(t) = MX(t) MY(t).
For N(μ, σ²), M(t) = exp{tμ + t²σ²/2}. The normal is determined by its moments.
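
A Monte Carlo check of M(t) = exp{tμ + t²σ²/2} for the normal and of the factorization for independent X and Y, assuming NumPy; μ = 1, σ = 2, t = 0.3 are arbitrary.
import numpy as np

rng = np.random.default_rng(14)
mu, sigma, t = 1.0, 2.0, 0.3
Y = rng.normal(mu, sigma, size=200_000)
print(np.exp(t * Y).mean(), np.exp(t * mu + t ** 2 * sigma ** 2 / 2))   # E(e^{tY}) versus the formula

X = rng.normal(0.0, 1.0, size=200_000)                                  # independent of Y
print(np.exp(t * (X + Y)).mean(), np.exp(t * X).mean() * np.exp(t * Y).mean())  # M_{X+Y} = M_X M_Y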

