Published by Sharleen Stevenson. Modified over 6 years ago.
3
Chapter 2. Variation
How to:
  summarize/display random data
  appreciate variation due to randomness
Data summaries.
  single observation $y$ (number, curve, image, ...)
  sample $y_1, \ldots, y_n$
  statistic $s(y_1, \ldots, y_n)$
4
Features: location, scale (spread)
Sample moments
  $\bar{y} = (y_1 + \cdots + y_n)/n$, average
  $s^2 = \sum_j (y_j - \bar{y})^2/(n-1)$, sample variance
Order statistics $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$
  minimum, maximum, median, range
  quartiles, quantiles
  $p \times 100\%$ trimmed average
  IQR, MAD $= \mathrm{median}\{|y_i - \mathrm{median}(y_i)|\}$
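The summaries listed on this slide can be computed directly. The sketch below is only illustrative: the function name `summaries` and the small data set are made up for the example, and the quartiles use the default ("exclusive") convention of Python's `statistics.quantiles`, which is one of several conventions in use.

```python
import statistics

def summaries(y):
    """Location and scale summaries of a sample y (hypothetical helper)."""
    n = len(y)
    ybar = sum(y) / n                                # sample average
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)   # sample variance, divisor n - 1
    ordered = sorted(y)                              # order statistics y_(1) <= ... <= y_(n)
    med = statistics.median(ordered)
    q1, _, q3 = statistics.quantiles(ordered, n=4)   # quartiles (assumed convention)
    mad = statistics.median([abs(v - med) for v in y])
    return {
        "mean": ybar,
        "variance": s2,
        "min": ordered[0],
        "max": ordered[-1],
        "median": med,
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "mad": mad,
    }

# Made-up sample, for illustration only
print(summaries([2.0, 4.0, 4.0, 5.0, 7.0, 8.0, 12.0]))
```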
5
Bad data
Outlier: observation unusual compared to the others
Resistance
Trimmed average
Example (Midwife birth data). Hours in labor by day
  $n = 95$, $\bar{y} = 7.57$ hr, $s^2 =$ [value missing] hr$^2$
  min, med, max = 1.5, 7.5, 19 hr
  quartiles 4.95, 9.75 hr
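Resistance can be seen by corrupting one observation and comparing the plain average with a trimmed average. This is a minimal sketch: the data are invented (they are not the midwife birth data), and `trimmed_average` assumes the convention of dropping the `int(p * n)` smallest and the same number of largest values, which is one common choice.

```python
def trimmed_average(y, p):
    """Average after dropping int(p * n) values from each end (assumed convention, p < 0.5)."""
    ordered = sorted(y)
    k = int(p * len(ordered))                 # number trimmed from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Invented data: the second sample replaces 13.0 by an outlier 130.0
clean = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
dirty = clean[:-1] + [130.0]

for name, sample in (("clean", clean), ("with outlier", dirty)):
    average = sum(sample) / len(sample)
    print(name, average, trimmed_average(sample, 0.1))
```

The plain average moves a long way when the outlier is introduced, while the 10% trimmed average is unchanged for this sample.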
7
Graphs. Indispensable in data analysis
Histogram
  disjoint bins $[L + (k-1)\delta,\ L + k\delta)$
  Plot count, $n_k$, or proportion $n_k/n$
EDF
  $\#\{y_j \le y\}/n$
  Estimates CDF, $\mathrm{Prob}\{Y \le y\}$
Scatter plot $(u_j, v_j)$
Parallel boxplots: location, scale, shape, outliers, comparative
  median, quartiles, 1.5 IQR
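The counts behind a histogram and the EDF are simple to compute by hand. The sketch below assumes bins of equal width `delta` starting at `L`, indexed k = 1, 2, ...; the function names and data are hypothetical.

```python
import math

def histogram_counts(y, L, delta):
    """Counts n_k for bins [L + (k-1)*delta, L + k*delta), k = 1, 2, ..."""
    counts = {}
    for v in y:
        k = math.floor((v - L) / delta) + 1   # index of the bin containing v
        counts[k] = counts.get(k, 0) + 1
    return counts

def edf(y, x):
    """Empirical distribution function: proportion of observations <= x."""
    return sum(1 for v in y if v <= x) / len(y)

# Invented data
y = [1.5, 2.0, 3.2, 3.9, 4.0, 7.5]
print(histogram_counts(y, L=0.0, delta=2.0))
print(edf(y, 4.0))
```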
9
Random sample $Y_1, \ldots, Y_n$ independent, CDF $F$
Mean $E(Y) = \int y\, dF(y)$ $(= \int y f(y)\, dy$ if density $f)$
$p$ quantile $y_p = F^{-1}(p)$
Laplace (continuous)
  $f(y) = \exp\{-|y - \eta|/\tau\}/(2\tau)$, $-\infty < y < \infty$
Poisson (discrete)
  $\mathrm{Prob}(Y = y) = f(y) = \theta^y \exp\{-\theta\}/y!$, $y = 0, 1, 2, \ldots$
Count of daily arrivals: Poisson
Hours of labor: gamma
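The two distributions on this slide can be explored numerically. In this sketch the parameter names `theta` (Poisson mean), `eta` (Laplace location) and `tau` (Laplace scale) are assumptions made for the example, and the Laplace quantile is obtained by inverting its CDF in closed form.

```python
import math

def poisson_pmf(y, theta):
    """Prob(Y = y) for a Poisson variable with mean theta."""
    return theta ** y * math.exp(-theta) / math.factorial(y)

def laplace_cdf(y, eta, tau):
    """CDF of the Laplace density exp(-|y - eta|/tau) / (2 tau)."""
    if y < eta:
        return 0.5 * math.exp((y - eta) / tau)
    return 1.0 - 0.5 * math.exp(-(y - eta) / tau)

def laplace_quantile(p, eta, tau):
    """p quantile y_p = F^{-1}(p) of the Laplace distribution, 0 < p < 1."""
    if p < 0.5:
        return eta + tau * math.log(2.0 * p)
    return eta - tau * math.log(2.0 * (1.0 - p))

theta = 3.0
total = sum(poisson_pmf(y, theta) for y in range(60))       # probabilities add to about 1
mean = sum(y * poisson_pmf(y, theta) for y in range(60))    # discrete version of E(Y)
print(total, mean)
print(laplace_quantile(0.9, eta=1.0, tau=2.0))
```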
11
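Gamma, in shape-rate form with shape $\kappa$ and rate $\lambda$
  $f(y) = \lambda^{\kappa} y^{\kappa - 1} \exp\{-\lambda y\}/\Gamma(\kappa)$, $y > 0$, $\lambda, \kappa > 0$
Many examples of useful distributions will be given in these beginning chapters
Some discrete, some continuous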
12
SF Chron 01/26/09
13
Sampling variation.
"the data $y_1, \ldots, y_n$ will be regarded as the observed values of random variables" - probabilities defined
"ask how we would expect $s(y_1, \ldots, y_n)$ to behave on average, ..., understand the properties of $S = S(Y_1, \ldots, Y_n)$"
$Y_1, \ldots, Y_n$ sample from distribution with mean $\mu$, variance $\sigma^2$
Sample moment $\bar{Y}$; $E(\bar{Y}) = nE(Y_j)/n = \mu$, unbiased
  $E(X + Y) = E(X) + E(Y)$
14
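$\mathrm{var}(\bar{Y}) = \sigma^2/n$
  $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$, if uncorrelated
  $\mathrm{var}(aX) = a^2\, \mathrm{var}(X)$
$\sum_j (Y_j - \mu)^2 = \sum_j (Y_j - \bar{Y} + \bar{Y} - \mu)^2 = \sum_j (Y_j - \bar{Y})^2 + n(\bar{Y} - \mu)^2$
$n\sigma^2 = E\{\sum_j (Y_j - \bar{Y})^2\} + \sigma^2$
$E(S^2) = \sigma^2$, unbiased
Birth data. $n = 95$, $\bar{y} = 7.57$ hr, $s/\sqrt{n} =$ [value missing] hr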
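The sum-of-squares decomposition and the two unbiasedness results can be checked numerically. This is a simulation sketch under assumed settings (normal samples, a fixed seed, hypothetical function names), not a proof; the simulated values should only be close to the theoretical ones.

```python
import random
import statistics

def decomposition(y, mu):
    """Both sides of sum (y_j - mu)^2 = sum (y_j - ybar)^2 + n (ybar - mu)^2."""
    n = len(y)
    ybar = sum(y) / n
    lhs = sum((v - mu) ** 2 for v in y)
    rhs = sum((v - ybar) ** 2 for v in y) + n * (ybar - mu) ** 2
    return lhs, rhs

def simulate_moments(n, reps, mu, sigma, seed=1):
    """Simulate reps samples of size n; return mean of Ybar, var of Ybar, mean of S^2."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(y))
        variances.append(statistics.variance(y))   # divisor n - 1
    return statistics.fmean(means), statistics.variance(means), statistics.fmean(variances)

print(decomposition([1.0, 4.0, 7.0], mu=3.0))
# Theory: E(Ybar) = 5, var(Ybar) = 4/10 = 0.4, E(S^2) = 4
print(simulate_moments(n=10, reps=4000, mu=5.0, sigma=2.0))
```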
15
Probability plot. Checking probability model
plot $y_{(j)}$ versus $F^{-1}(j/(n+1))$
For normal take $F = \Phi$, from table or statistical package
Normal prob plot "works" if $\mu$, $\sigma$ unknown
For $N(\mu, \sigma^2)$, $E(Y_{(j)}) = \mu + \sigma E(Z_{(j)})$
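The coordinates of a normal probability plot can be produced with the standard library. In this sketch `normal_plot_points` is a hypothetical helper that pairs each order statistic with the standard normal quantile at j/(n+1); plotting the pairs is left to whatever graphics package is at hand.

```python
from statistics import NormalDist

def normal_plot_points(y):
    """Pairs (Phi^{-1}(j/(n+1)), y_(j)) for j = 1, ..., n."""
    n = len(y)
    ordered = sorted(y)                      # order statistics
    std_normal = NormalDist()                # N(0, 1)
    return [(std_normal.inv_cdf(j / (n + 1)), ordered[j - 1])
            for j in range(1, n + 1)]

# Invented data
for x, v in normal_plot_points([7.1, 3.4, 9.8, 5.0, 6.2]):
    print(round(x, 3), v)
```

If the data are roughly normal the points fall near a straight line whose intercept and slope estimate the mean and standard deviation, which is why the plot can be used without knowing either.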
17
Tools for approximation
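Weak law of large numbers.
  $\bar{Y} \to \mu$ in probability as $n \to \infty$
  $\bar{Y}$ is a consistent estimate of $\mu$
Definition. $\{S_n\} \to S$ in probability if for any $\varepsilon > 0$
  $\Pr(|S_n - S| > \varepsilon) \to 0$ as $n \to \infty$
If $S = s_0$, constant, and $h(s)$ continuous at $s_0$ then
  $h(S_n) \to h(s_0)$ in probability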
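The definition of convergence in probability can be illustrated by simulation. The sketch below assumes uniform(0, 1) observations (so the mean is 1/2), a tolerance of 0.05 and a fixed seed; these choices and the function name are made up for the example.

```python
import random

def exceed_fraction(n, eps, reps=500, seed=0):
    """Estimate Pr(|Ybar - mu| > eps) for averages of n uniform(0, 1) values."""
    rng = random.Random(seed)
    mu = 0.5                                  # mean of uniform(0, 1)
    count = 0
    for _ in range(reps):
        ybar = sum(rng.random() for _ in range(n)) / n
        if abs(ybar - mu) > eps:
            count += 1
    return count / reps

# The estimated probability should shrink toward 0 as n grows
for n in (10, 100, 1000):
    print(n, exceed_fraction(n, eps=0.05))
```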
19
Central limit theorem.
  $\sqrt{n}(\bar{Y} - \mu)/\sigma \to Z = N(0,1)$ in distribution as $n \to \infty$
Definition. $\{Z_n\}$ converges in distribution to $Z$ if
  $\Pr(Z_n \le z) \to \Pr(Z \le z)$ as $n \to \infty$
  at every $z$ for which $\Pr(Z \le z)$ is continuous
The CLT provides an approximation for "large" $n$
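How good the normal approximation is for a given n can be examined by simulation. This sketch assumes uniform(0, 1) observations, for which the mean is 1/2 and the variance 1/12, and compares the simulated probability for the standardized average with the standard normal CDF; the function name, sample size and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def standardized_average_cdf(n, z, reps=2000, seed=0):
    """Estimate Pr(Z_n <= z), Z_n = sqrt(n) (Ybar - mu) / sigma, uniform(0, 1) data."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.random() for _ in range(n)) / n
        zn = math.sqrt(n) * (ybar - mu) / sigma
        if zn <= z:
            hits += 1
    return hits / reps

for z in (-1.0, 0.0, 1.0):
    print(z, standardized_average_cdf(30, z), NormalDist().cdf(z))
```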
21
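Average as an estimate of $\mu$.
If $X$ is $N(\mu, \sigma^2)$ then $(X - \mu)/\sigma$ is $N(0,1)$
Writing $Z_n = \sqrt{n}(\bar{Y} - \mu)/\sigma$,
  $\bar{Y} = \mu + n^{-1/2} \sigma Z_n$
Indicates how efficiency of $\bar{Y}$ depends on $n$ and $\sigma$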
22
Covariance and correlation.
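$\mathrm{cov}(X,Y) = \sigma_{xy} = E[\{X - E(X)\}\{Y - E(Y)\}]$
sample covariance
  $C_{xy} = \sum_{j=1}^{n} (X_j - \bar{X})(Y_j - \bar{Y})/(n-1)$
  $C_{xy} \to \sigma_{xy}$ in probability
correlation
  $\rho = \mathrm{cov}(X,Y)/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$, $-1 \le \rho \le 1$
  $R = C_{xy}/\sqrt{C_{xx} C_{yy}}$
  $R \to \rho$ in probability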
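The sample covariance and correlation translate directly into code. This is a small sketch with hypothetical function names and invented data; it uses the divisor n - 1 as on the slide.

```python
import math

def sample_cov(x, y):
    """Sample covariance C_xy with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation R = C_xy / sqrt(C_xx C_yy)."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Invented data
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [2.1, 1.9, 3.5, 3.0, 4.8]
print(sample_cov(u, v), sample_corr(u, v))
```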
23
R = -.340
24
Some more distributions.
Cauchy
  $f(y) = 1/[\pi\{1 + (y - \theta)^2\}]$, $-\infty < y < \infty$
  distribution of $\bar{Y}$ same as that of $Y_1$
  no moments, long tails
Uniform
  $F(u) = 0$, $u \le 0$
  $\quad\ = u$, $0 < u \le 1$
  $\quad\ = 1$, $1 < u$
  $E(U) = 1/2$, center of gravity
25
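Exponential
  $f(y) = 0$, $y < 0$
  $\quad\ = \lambda \exp\{-\lambda y\}$, $y \ge 0$
Pareto
  $F(y) = 0$, $y < a$
  $\quad\ = 1 - (y/a)^{-\gamma}$, $y \ge a$
  $a, \gamma > 0$
Poisson process
  Times of events $y_{(1)}, y_{(2)}, y_{(3)}, \ldots$
  $y_{(1)},\ y_{(2)} - y_{(1)},\ y_{(3)} - y_{(2)}, \ldots$ i.i.d. exponential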
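The link between the exponential distribution and the Poisson process suggests a simple way to simulate event times: add up independent exponential gaps. The sketch below assumes a constant rate and a finite time horizon; the function name, rate, horizon and seed are illustrative choices.

```python
import random

def poisson_process_times(rate, horizon, seed=0):
    """Event times in (0, horizon], built from i.i.d. exponential gaps with the given rate."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)          # time of the first event
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)     # add the next exponential gap
    return times

events = poisson_process_times(rate=2.0, horizon=500.0)
# The number of events should be near rate * horizon = 1000
print(len(events))
```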
26
Chi-squared distribution
$Z_1, Z_2, \ldots, Z_\nu$ IN(0,1)
$W = \sum_{j=1}^{\nu} Z_j^2$
$E(W) = \nu$, $\mathrm{var}(W) = 2\nu$
Multinomial, page 47
  $p$ classes with probs $\pi_1, \ldots, \pi_p$ adding to 1
27
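Linear combination $L = a + \sum_j b_j Y_j$
$E(L) = a + \sum_j b_j \mu_j$
If independent, $\mathrm{var}(L) = \sum_j b_j^2 \sigma_j^2$
If $\{Y_j\}$ are IN$(\mu_j, \sigma_j^2)$, then $L$ is $N(a + \sum_j b_j \mu_j,\ \sum_j b_j^2 \sigma_j^2)$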
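The mean and variance of a linear combination of independent variables follow from the coefficients alone. A minimal sketch, with a hypothetical function name and invented inputs; the second call recovers the familiar results for an average of n variables with common mean and variance.

```python
def linear_combination_moments(a, b, mu, sigma2):
    """Mean and variance of L = a + sum b_j Y_j for independent Y_j."""
    mean = a + sum(bj * mj for bj, mj in zip(b, mu))
    variance = sum(bj ** 2 * s2 for bj, s2 in zip(b, sigma2))   # needs independence
    return mean, variance

print(linear_combination_moments(1, [2, -1], [3, 4], [1, 9]))

# Average of n = 4 variables, each with mean 7.5 and variance 8:
# the mean stays at 7.5 and the variance becomes 8/4
n = 4
print(linear_combination_moments(0.0, [1 / n] * n, [7.5] * n, [8.0] * n))
```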
28
Moment-generating function
$M_Y(t) = E(\exp\{tY\})$, $t$ real
$X$, $Y$ independent: $M_{X+Y}(t) = M_X(t) M_Y(t)$
For $N(\mu, \sigma^2)$, $M(t) = \exp\{t\mu + t^2\sigma^2/2\}$
The normal is determined by its moments
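The product rule for moment-generating functions can be checked numerically for two independent normal variables, whose sum is normal with the means and variances added. This is only a numerical illustration with a hypothetical function name and arbitrary parameter values; it also recovers the mean as the derivative of the MGF at zero by a central difference.

```python
import math

def normal_mgf(t, mu, sigma2):
    """MGF of N(mu, sigma2): exp(t mu + t^2 sigma2 / 2)."""
    return math.exp(t * mu + t * t * sigma2 / 2.0)

t = 0.3
product = normal_mgf(t, 1.0, 4.0) * normal_mgf(t, -2.0, 9.0)   # M_X(t) M_Y(t)
direct = normal_mgf(t, 1.0 + (-2.0), 4.0 + 9.0)                 # MGF of X + Y
print(product, direct)

# M'(0) = E(Y): central difference approximation of the derivative at t = 0
h = 1e-5
print((normal_mgf(h, 1.0, 4.0) - normal_mgf(-h, 1.0, 4.0)) / (2 * h))
```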