Sufficient statistics.
The Poisson and the exponential can be summarized by $(n, \bar{y})$. So too can the normal with known variance.
Consider a statistic $S(Y)$. Suppose that the conditional distribution of $Y$ given $S$ does not depend on $\theta$; then $S$ is a sufficient statistic for $\theta$ based on $Y$.
This occurs iff the density of $Y$ factors into a function of $s(y)$ and $\theta$ and a function of $y$ that doesn't depend on $\theta$.
More Chapter 4
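Example. Exponential
IExp($\theta$) ~ $Y$
$E(Y) = \theta$, $\mathrm{Var}(Y) = \theta^2$
Data $y_1, \ldots, y_n$
$L(\theta) = \prod_j \theta^{-1} \exp(-y_j/\theta)$
$l(\theta) = -n \log\theta - \sum_j y_j/\theta$
$\sum_j y_j / n$ is sufficient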
Maximum at $\hat{\theta} = \sum_j y_j / n = \bar{y}$
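A minimal sketch of the exponential example, assuming the mean parametrisation $l(\theta) = -n\log\theta - \sum_j y_j/\theta$. The two data sets are hypothetical and were chosen only so that they share $n$ and $\bar{y}$. The sketch illustrates that the log-likelihood depends on the data only through $(n, \bar{y})$, and that it is largest at $\theta = \bar{y}$.

```python
import math

def exp_loglik(theta, ys):
    """l(theta) = -n log(theta) - sum(y_j) / theta, exponential with mean theta."""
    n = len(ys)
    return -n * math.log(theta) - sum(ys) / theta

# Hypothetical data sets (not from the text) with the same n and the same mean.
ys_a = [1.0, 2.0, 6.0]
ys_b = [3.0, 3.0, 3.0]
ybar = sum(ys_a) / len(ys_a)

# Same (n, ybar) gives the same log-likelihood at every theta tried.
same_everywhere = all(
    math.isclose(exp_loglik(t, ys_a), exp_loglik(t, ys_b))
    for t in (0.5, 1.0, 3.0, 10.0)
)

# The log-likelihood at ybar is at least as large as at nearby values.
is_max = all(
    exp_loglik(ybar, ys_a) >= exp_loglik(ybar + d, ys_a)
    for d in (-0.5, -0.1, 0.1, 0.5)
)
```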
Approximate $100(1-2\alpha)\%$ CI for $\theta_0$
Example. Spring data
Weibull.
Note. Expected information
Gamma.
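Example. Bernoulli
$\Pr\{Y = 1\} = \pi = 1 - \Pr\{Y = 0\}$, $0 \le \pi \le 1$
$L(\pi) = \prod_i \pi^{y_i} (1-\pi)^{1-y_i} = \pi^r (1-\pi)^{n-r}$
$l(\pi) = r \log\pi + (n-r) \log(1-\pi)$, $r = \sum_j y_j$
$R = \sum_j Y_j$ is sufficient for $\pi$, as is $R/n$
$L(\pi)$ factors into a function of $r$ and $\pi$, and a constant
Score vector $[\sum_j y_j/\pi - (n - \sum_j y_j)/(1-\pi)]$
Observed information $[\sum_j y_j/\pi^2 + (n - \sum_j y_j)/(1-\pi)^2]$
M.l.e. $\hat{\pi} = r/n$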
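The score and observed information for the Bernoulli example can be checked numerically. This is a sketch only: the symbol `pi` for the success probability and the 0/1 data are assumptions made for illustration. It confirms that the score vanishes at $r/n$ and that the observed information there equals $n/\{\hat\pi(1-\hat\pi)\}$.

```python
def bernoulli_score(pi, r, n):
    """Score: r/pi - (n - r)/(1 - pi), with r the number of ones."""
    return r / pi - (n - r) / (1 - pi)

def bernoulli_obs_info(pi, r, n):
    """Observed information: r/pi^2 + (n - r)/(1 - pi)^2."""
    return r / pi**2 + (n - r) / (1 - pi) ** 2

ys = [1, 0, 0, 1, 1, 0, 1, 1]      # hypothetical 0/1 outcomes
n, r = len(ys), sum(ys)
pi_hat = r / n                      # maximum likelihood estimate

score_at_mle = bernoulli_score(pi_hat, r, n)
info_at_mle = bernoulli_obs_info(pi_hat, r, n)
```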
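Cauchy. ICau($\theta$)
$f(y;\theta) = 1/\{\pi(1+(y-\theta)^2)\}$
$E|Y| = \infty$, $\mathrm{Var}(Y) = \infty$
$L(\theta) = \prod_j 1/\{\pi(1+(y_j-\theta)^2)\}$
Many local maxima
$l(\theta) = -\sum_j \log(1+(y_j-\theta)^2)$ (dropping the constant $-n\log\pi$)
$J(\theta) = 2 \sum_j (1-(y_j-\theta)^2)/(1+(y_j-\theta)^2)^2$
$I(\theta) = n/2$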
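To see the "many local maxima" remark in action, the sketch below evaluates the Cauchy log-likelihood on a grid and counts interior local maxima. The three observations are hypothetical and deliberately well separated. The grid search is just an illustrative device, not a recommended fitting method.

```python
import math

def cauchy_loglik(theta, ys):
    """l(theta) = -sum log(1 + (y_j - theta)^2), constant dropped."""
    return -sum(math.log(1 + (y - theta) ** 2) for y in ys)

def cauchy_obs_info(theta, ys):
    """J(theta) = 2 sum (1 - (y_j - theta)^2) / (1 + (y_j - theta)^2)^2."""
    return 2 * sum(
        (1 - (y - theta) ** 2) / (1 + (y - theta) ** 2) ** 2 for y in ys
    )

ys = [-10.0, 0.0, 10.0]                      # hypothetical, well-separated data
grid = [i / 100 for i in range(-1500, 1501)]
values = [cauchy_loglik(t, ys) for t in grid]

# Interior grid points that beat the left neighbour and are not beaten by the right.
local_maxima = [
    grid[i]
    for i in range(1, len(grid) - 1)
    if values[i] > values[i - 1] and values[i] >= values[i + 1]
]
```

With data this spread out, each observation produces its own bump in $l(\theta)$, so a derivative-based search started near the wrong bump can stop at a local rather than the global maximum.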
Uniform.
$f(u;\theta) = 1/\theta$ for $0 < u < \theta$, $= 0$ otherwise
$L(\theta) = 1/\theta^n$ for $0 < y_1, \ldots, y_n < \theta$, $= 0$ otherwise
$l(\theta)$ becomes increasingly spiky
$E\,u(\theta) = -n/\theta \ne 0$, $i(\theta) = -n/\theta^2$
Logistic regression. Challenger data
Ibinomials $R_j$, $m_j$, $\pi_j$
Likelihood ratio.
Model has parameter $\theta$ with $\dim(\theta) = p$; true (unknown) value $\theta_0$
Likelihood ratio statistic $W(\theta_0) = 2\{l(\hat{\theta}) - l(\theta_0)\}$
Justification. Multinormal result
If $Y \sim N_p(\mu, \Sigma)$ then $(Y-\mu)^T \Sigma^{-1} (Y-\mu) \sim \chi^2_p$
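Uses.
$\Pr[W(\theta_0) \le c_p(1-2\alpha)] \approx 1-2\alpha$, where $c_p(\cdot)$ is a quantile of $\chi^2_p$
Approx $100(1-2\alpha)\%$ confidence region: $\{\theta : W(\theta) \le c_p(1-2\alpha)\}$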
Example. Exponential
Spring data: likelihood ratio interval $96 < \theta < 335$ vs. asymptotic normal approximation $64 < \theta < 273$ kcycles
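The two kinds of interval can be compared in a short sketch. The summary values `n = 10` and `ybar = 168.3` are assumptions, not taken from the text; they were picked because they roughly reproduce the quoted intervals. The code assumes a 95% level, the exponential likelihood ratio statistic $W(\theta) = 2n\{\log(\theta/\bar{y}) + \bar{y}/\theta - 1\}$, and observed information $n/\bar{y}^2$ at the m.l.e.

```python
import math

Z_975 = 1.959964     # standard normal 0.975 quantile
C1_95 = 3.841459     # chi-squared (1 df) 0.95 quantile

def lr_stat(theta, n, ybar):
    """W(theta) = 2{l(ybar) - l(theta)} for the exponential with mean theta."""
    return 2 * n * (math.log(theta / ybar) + ybar / theta - 1)

def bisect(f, lo, hi, tol=1e-8):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

n, ybar = 10, 168.3  # hypothetical summary statistics (assumption)

# Normal-approximation interval: ybar +/- z * ybar / sqrt(n).
half_width = Z_975 * ybar / math.sqrt(n)
wald = (ybar - half_width, ybar + half_width)

# Likelihood ratio interval: all theta with W(theta) <= c_1(0.95).
g = lambda t: lr_stat(t, n, ybar) - C1_95
lr = (bisect(g, 1.0, ybar), bisect(g, ybar, 2000.0))
```

The likelihood ratio interval is asymmetric about $\bar{y}$ and cannot include non-positive values, whereas the normal approximation is symmetric by construction.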
Prob-value/P-value. See (7.28)
Choose $T$ whose large values cast doubt on $H_0$
P-value: $\Pr_0(T \ge t_{obs})$
Example. Spring data
Exponential, $E(Y) = \theta$
$H_0: \theta = 100$?
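One possible choice of $T$ for the hypothesis above is the likelihood ratio statistic, referred to $\chi^2_1$. The text does not say which statistic was actually used, so this is an assumption, as are the summary values `n = 10` and `ybar = 168.3`.

```python
import math

def lr_stat(theta0, n, ybar):
    """W(theta0) = 2{l(ybar) - l(theta0)} for the exponential with mean theta."""
    return 2 * n * (math.log(theta0 / ybar) + ybar / theta0 - 1)

def chi2_1_sf(w):
    """Pr(chi-squared_1 > w) = Pr(|Z| > sqrt(w)) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2))

n, ybar = 10, 168.3            # hypothetical summary statistics (assumption)
w_obs = lr_stat(100.0, n, ybar)
p_value = chi2_1_sf(w_obs)     # approximate P-value for H0: theta = 100
```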
Nesting
$\psi$: $p$ by 1 parameter of interest
$\lambda$: $q$ by 1 nuisance parameter
Model with params $(\psi_0, \lambda)$ nested within $(\psi, \lambda)$
Second model reduces to first when $\psi = \psi_0$
Example. Weibull
params $(\theta, \alpha)$, exponential when $\alpha = 1$
How to examine $H_0: \alpha = 1$?
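One way to examine $H_0$ is a likelihood ratio test of the exponential nested within the Weibull. The sketch below assumes the parametrisation $f(y;\theta,\alpha) = (\alpha/\theta)(y/\theta)^{\alpha-1}\exp\{-(y/\theta)^\alpha\}$, which the text does not spell out. It profiles out $\theta$ using $\hat\theta(\alpha)^\alpha = n^{-1}\sum_j y_j^\alpha$ and uses hypothetical data.

```python
import math

def weibull_profile_loglik(alpha, ys):
    """Weibull log-likelihood maximised over theta for a fixed shape alpha."""
    n = len(ys)
    mean_pow = sum(y ** alpha for y in ys) / n      # equals theta_hat(alpha)^alpha
    return (n * math.log(alpha) - n * math.log(mean_pow)
            + (alpha - 1) * sum(math.log(y) for y in ys) - n)

def maximise(f, lo, hi, iters=200):
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

ys = [1.2, 0.8, 1.9, 1.1, 1.5, 0.9, 1.3, 1.7]       # hypothetical failure times

alpha_hat = maximise(lambda a: weibull_profile_loglik(a, ys), 0.05, 20.0)

# W = 2{l(Weibull fit) - l(exponential fit)}; alpha = 1 gives the exponential.
w = 2 * (weibull_profile_loglik(alpha_hat, ys) - weibull_profile_loglik(1.0, ys))
p_value = math.erfc(math.sqrt(max(w, 0.0) / 2))     # Pr(chi-squared_1 > w)
```

At $\alpha = 1$ the profile log-likelihood reduces to $-n\log\bar{y} - n$, the maximised exponential log-likelihood, which is what makes the two models nested.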
Spring failure times. Weibull
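Challenger data. Logistic regression
temperature $x_1$, pressure $x_2$
$\pi(\beta_0, \beta_1, \beta_2) = \exp\{\eta\}/(1+\exp\{\eta\})$
$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ linear predictor
loglike $l(\beta_0, \beta_1, \beta_2) = \beta_0 \sum_j r_j + \beta_1 \sum_j r_j x_{1j} + \beta_2 \sum_j r_j x_{2j} - m \sum_j \log(1+\exp\{\eta_j\})$
Does pressure matter?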
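The expanded log-likelihood above is the binomial log-likelihood $\sum_j \{r_j \log\pi_j + (m - r_j)\log(1-\pi_j)\}$ rewritten in terms of $\eta_j$, with the binomial coefficients dropped. The sketch below checks that identity numerically. The counts, covariates and coefficient values are hypothetical, not the Challenger data.

```python
import math

def eta(beta, x1, x2):
    """Linear predictor beta0 + beta1 * x1 + beta2 * x2."""
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x2

def loglik_expanded(beta, r, x1, x2, m):
    """Expanded form: linear terms in beta minus m * sum log(1 + exp(eta_j))."""
    b0, b1, b2 = beta
    linear = (b0 * sum(r)
              + b1 * sum(rj * a for rj, a in zip(r, x1))
              + b2 * sum(rj * b for rj, b in zip(r, x2)))
    penalty = m * sum(math.log(1 + math.exp(eta(beta, a, b)))
                      for a, b in zip(x1, x2))
    return linear - penalty

def loglik_binomial(beta, r, x1, x2, m):
    """Sum of r_j log(pi_j) + (m - r_j) log(1 - pi_j)."""
    total = 0.0
    for rj, a, b in zip(r, x1, x2):
        e = eta(beta, a, b)
        pi = math.exp(e) / (1 + math.exp(e))
        total += rj * math.log(pi) + (m - rj) * math.log(1 - pi)
    return total

# Hypothetical inputs, chosen only to exercise the formulas.
m = 6
r = [2, 1, 0, 0]
x1 = [50.0, 60.0, 70.0, 80.0]
x2 = [50.0, 100.0, 200.0, 200.0]
beta = (5.0, -0.1, 0.001)

l_expanded = loglik_expanded(beta, r, x1, x2, m)
l_binomial = loglik_binomial(beta, r, x1, x2, m)
```

Whether pressure matters can then be framed as a nested comparison: maximise the log-likelihood with and without $\beta_2$ and refer twice the difference to $\chi^2_1$.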
Model fit. Are labor times Weibull?
Nest its model in a more general one
Generalized gamma, with parameters including $\alpha$ and $\kappa$
Gamma for $\alpha = 1$
Weibull for $\kappa = 1$
Exponential for $\alpha = \kappa = 1$
Likelihood results.
max log likelihood: generalized gamma, gamma, Weibull
gamma vs. generalized gamma: twice the difference in max log likelihoods $= .94$
P-value $\Pr_0(\chi^2_1 > .94) = \Pr(|Z| > .969) = 2(.166) = .332$
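Chi-squared statistics. Pearson's chi-squared
categories $1, \ldots, k$
count of cases in category $i$: $Y_i$
$\Pr(\text{case in } i) = \pi_i$, $0 < \pi_i < 1$, $\sum_{i=1}^{k} \pi_i = 1$
$E(Y_i) = n\pi_i$, $\mathrm{var}(Y_i) = \pi_i(1-\pi_i)n$, $\mathrm{cov}(Y_i, Y_j) = -\pi_i\pi_j n$ for $i \ne j$
E.g. $k = 2$ case: $\mathrm{cov}(Y, n-Y) = -\mathrm{var}(Y) = -n\pi_1\pi_2$
Parameter space $\{(\pi_1, \ldots, \pi_k) : \sum_{i=1}^{k} \pi_i = 1,\ 0 < \pi_1, \ldots, \pi_k < 1\}$, dimension $k-1$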
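Reduced dimension possible?
model $\pi_i(\theta)$, $\dim(\theta) = p$
log like, general model: $\sum_{i=1}^{k-1} y_i \log\pi_i + y_k \log[1 - \sum_{i=1}^{k-1} \pi_i]$, with $\sum_{i=1}^{k} y_i = n$
log like, restricted model: $l(\theta) = \sum_{i=1}^{k-1} y_i \log\pi_i(\theta) + y_k \log[1 - \pi_1(\theta) - \cdots - \pi_{k-1}(\theta)]$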
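likelihood ratio statistic: $W = 2 \sum_{i=1}^{k} y_i \log\{y_i / (n\pi_i(\hat{\theta}))\} \sim \chi^2_{k-1-p}$ (approximately) if restricted model true
The statistic is sometimes written $W = 2 \sum O_i \log(O_i/E_i) \approx \sum (O_i - E_i)^2/E_i$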
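The closeness of the two forms can be seen in a small sketch. The observed counts are hypothetical, and the expected counts come from an assumed equal-probability model with no fitted parameters, so that $k - 1 - p = 3$ here.

```python
import math

def lr_chi2(observed, expected):
    """W = 2 * sum O_i log(O_i / E_i); cells with O_i = 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

def pearson_chi2(observed, expected):
    """P = sum (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 31, 29]            # hypothetical counts, n = 100
expected = [25.0, 25.0, 25.0, 25.0]    # assumed equal-probability model

w = lr_chi2(observed, expected)
p = pearson_chi2(observed, expected)
```

The two statistics agree to first order when the observed counts are close to the expected ones, and drift apart as the fit worsens.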
Pearson's chi-squared.
Example. Birth data. Poisson?
Split into $k = 13$ categories $[0, 7.5), [7.5, 8.5), \ldots, [18.5, 24]$ hours
O(bserved), E(xpected)
$P = 4.39$
P-value $\Pr(\chi^2_{11} > 4.39) = .96$
Two way contingency table.
$r$ rows and $c$ columns, $n$ individuals
Blood groups A, B, AB, O
A, B antigens: substances causing the body to produce antibodies
Table columns: group, count, model I, model II
$\lambda_O = 1 - \lambda_A - \lambda_B$
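Question. Rows and columns independent?
$W = 2 \sum_{i,j} y_{ij} \log\{n y_{ij} / (y_{i\cdot} y_{\cdot j})\}$ with $y_{i\cdot} = \sum_j y_{ij}$ and $y_{\cdot j} = \sum_i y_{ij}$
$W \sim \chi^2_{k-1-p} = \chi^2_{(r-1)(c-1)}$ with $k = rc$, $p = (r-1)+(c-1)$
$P = \sum_{i,j} (y_{ij} - y_{i\cdot} y_{\cdot j}/n)^2 / (y_{i\cdot} y_{\cdot j}/n) \sim \chi^2_{(r-1)(c-1)}$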
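The independence statistics can be computed directly from a table of counts. This sketch uses a hypothetical 2 by 2 table and takes the fitted value for cell $(i,j)$ to be $y_{i\cdot} y_{\cdot j}/n$.

```python
import math

def independence_stats(table):
    """Return (W, P, df) for testing row/column independence in a count table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    w = 0.0
    p = 0.0
    for i, row in enumerate(table):
        for j, y in enumerate(row):
            e = row_tot[i] * col_tot[j] / n      # fitted count under independence
            if y > 0:
                w += 2 * y * math.log(y / e)     # likelihood ratio contribution
            p += (y - e) ** 2 / e                # Pearson contribution
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return w, p, df

table = [[10, 20],
         [20, 10]]                               # hypothetical counts
w, p, df = independence_stats(table)
```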
Model 1
$W = 17.66$, $\Pr(\chi^2_1 > 17.66) = \Pr(|Z| > 4.202) = 2.646 \times 10^{-5}$
$P = 15.73$, $\Pr(\chi^2_1 > 15.73) = \Pr(|Z| > 3.966) = 7.309 \times 10^{-5}$
$k - 1 - p = 4 - 1 - 2 = 1$
Model 2
$W = 3.17$, $\Pr(|Z| > 1.780) = .075$
$P = 2.82$, $\Pr(|Z| > 1.679) = .093$
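The chi-squared tail probabilities quoted in these examples can be checked without a statistics library. The sketch below uses the standard closed-form finite sums for integer degrees of freedom. The function name `chi2_sf` is an illustrative choice.

```python
import math

def chi2_sf(x, df):
    """Pr(chi-squared_df > x) for a positive integer df, via finite sums."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{r=0}^{m-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + 2 phi(sqrt(x)) * sum_{r=1}^{m} x^(r - 1/2) / (1*3*...*(2r-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = math.sqrt(x)                  # r = 1 term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total

p_gamma = chi2_sf(0.94, 1)       # gamma vs. generalized gamma comparison
p_birth = chi2_sf(4.39, 11)      # birth data, Pearson statistic
p_model1_w = chi2_sf(17.66, 1)   # blood groups, model 1, W
p_model2_w = chi2_sf(3.17, 1)    # blood groups, model 2, W
```

For one degree of freedom the function reduces to $\Pr(|Z| > \sqrt{x})$, which is the conversion used in the slides.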
Incorrect model. True model $g(y)$, fit $f(y;\theta)$
Example 1. Quadratic, fit linear
Example 2. True lognormal, but fit exponential
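A small simulation shows what fitting the wrong model does in Example 2. When an exponential with mean $\theta$ is fitted, the m.l.e. is the sample mean, so with lognormal data it settles near the lognormal mean $\exp(\mu + \sigma^2/2)$ as $n$ grows. The values of `mu`, `sigma`, `n` and the seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)                 # fixed seed so the run is repeatable

mu, sigma = 0.0, 1.0           # hypothetical lognormal parameters
n = 100_000

# Data generated from the true model g: lognormal(mu, sigma).
ys = [random.lognormvariate(mu, sigma) for _ in range(n)]

# Exponential m.l.e. of the mean is the sample average.
theta_hat = sum(ys) / n

# Value the exponential fit converges to: the mean of the true distribution.
target = math.exp(mu + sigma**2 / 2)
```

The fitted exponential matches the mean of the data but not its shape, which is why large-sample results for the m.l.e. need adjusting when the model is incorrect.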
Large sample distribution.
Model selection.
Various models: non-nested
Ockham's razor. Prefer the simplest model
Formal criteria. Look for minimum
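A sketch of how such criteria are applied, assuming the usual definitions $\mathrm{AIC} = 2(p - \hat{l})$ and $\mathrm{BIC} = -2\hat{l} + p\log n$, where $\hat{l}$ is the maximised log-likelihood. The formulas themselves are not legible in the text. The model labels, log-likelihood values and sample size are hypothetical.

```python
import math

def aic(loglik, p):
    """AIC = 2 * (p - maximised log-likelihood)."""
    return 2 * (p - loglik)

def bic(loglik, p, n):
    """BIC = -2 * maximised log-likelihood + p * log(n)."""
    return -2 * loglik + p * math.log(n)

# Hypothetical fits: label -> (maximised log-likelihood, number of parameters).
fits = {"A": (-380.0, 2), "B": (-378.5, 4), "C": (-378.0, 12)}
n = 60                                     # hypothetical sample size

aic_values = {name: aic(l, p) for name, (l, p) in fits.items()}
bic_values = {name: bic(l, p, n) for name, (l, p) in fits.items()}

best_by_aic = min(aic_values, key=aic_values.get)
best_by_bic = min(bic_values, key=bic_values.get)
```

Both criteria trade fit against the number of parameters. BIC penalises each extra parameter by $\log n$ rather than 2, so it tends to choose smaller models once $n$ exceeds about 7.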
Example. Spring failure
Table columns: Model, p, AIC, BIC (only the starred entry *769.9* is legible)
stress levels
$M_1$: Weibull, unconnected $\alpha$, $\theta$ at each stress level