Published by Donald McBride. Modified over 9 years ago.
1
THE WEIGHTING GAME Ciprian M. Crainiceanu Thomas A. Louis Department of Biostatistics http://commprojects.jhsph.edu/faculty/bio.cfm?F=Ciprian&L=Crainiceanu
2
2 Oh formulas, where art thou?
3
3 Why does the point of view make all the difference?
4
4 Getting rid of the superfluous information
5
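5 How the presentation could have started, but didn’t: Proof that statisticians can speak alien languages
Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $(\Omega, \mathcal{F})$ is a measurable space and $P: \mathcal{F} \mapsto [0,1]$ is a probability measure function from the $\sigma$-algebra $\mathcal{F}$.
It is perfectly natural to ask oneself what a $\sigma$-algebra or $\sigma$-field is.
Definition. A $\sigma$-field $\mathcal{F}$ is a collection of subsets of the sample space $\Omega$ with
– $\Omega \in \mathcal{F}$
– $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$
– $A_1, A_2, \ldots \in \mathcal{F}$ implies $\bigcup_i A_i \in \mathcal{F}$
Of course, once we have mastered the $\sigma$-algebra or $\sigma$-field concept, it is only reasonable to wonder what a probability measure is.
Definition. A probability measure $P$ has the following properties
– $P(A) \ge 0$ for every $A \in \mathcal{F}$
– $P(\Omega) = 1$
– $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ for pairwise disjoint $A_1, A_2, \ldots \in \mathcal{F}$
Where do all these fit in the big picture? Every sample space is a particular case of probability space and weighting is intrinsically related to sampling.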
6
6 Why can simple questions have complex answers?
Question: What is the average length of in-hospital stay for patients?
Complexity: The original question is imprecise.
New question: What is the average length of stay for:
– Several hospitals of interest?
– Maryland hospitals?
– Blue State hospitals? …
7
7 “Data” Collection & Goal
Survey, conducted in 5 hospitals
Hospitals are selected
$n_{hosp}$ patients are sampled at random in each hospital
Length of stay (LOS) is recorded
Goal: Estimate the population mean
8
8 Procedure
Compute hospital-specific means
“Average” them
– For simplicity, assume that the population variance $\sigma^2$ is known and the same for all hospitals
How should we compute the average?
Need a (good, best?) way to combine information
9
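9 DATA

| Hospital | # sampled, $n_{hosp}$ | Hospital size | % of total size, $100\,\pi_{hosp}$ | Mean LOS | Sampling variance |
|----------|------|------|-----|----|----------------|
| 1        | 30   | 100  | 10  | 25 | $\sigma^2/30$  |
| 2        | 60   | 150  | 15  | 35 | $\sigma^2/60$  |
| 3        | 15   | 200  | 20  | 15 | $\sigma^2/15$  |
| 4        | 30   | 250  | 25  | 40 | $\sigma^2/30$  |
| 5        | 15   | 300  | 30  | 10 | $\sigma^2/15$  |
| Total    | 150  | 1000 | 100 |    |                |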
10
10 Weighted averages
Examples of various weighted averages:

| Weighting strategy | Weights x100   | Mean | Variance ratio (inverse variance = 100) |
|--------------------|----------------|------|-----|
| Equal              | 20 20 20 20 20 | 25.0 | 130 |
| Inverse variance   | 20 40 10 20 10 | 29.5 | 100 |
| Population         | 10 15 20 25 30 | 23.8 | 172 |

Variance using inverse variance weights is smallest
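The table entries can be reproduced from the DATA slide. The following is a minimal sketch, not the authors' code: the helper names (`normalize`, `weighted_mean`, `variance_over_sigma2`) are made up for illustration, and variances are kept in units of $\sigma^2$ so that no value of $\sigma^2$ has to be assumed.

```python
# Hospital data from the DATA slide.
n = [30, 60, 15, 30, 15]            # patients sampled per hospital
size = [100, 150, 200, 250, 300]    # hospital sizes
mean_los = [25, 35, 15, 40, 10]     # hospital-specific mean LOS


def normalize(raw):
    """Rescale raw weights so that they add to 1."""
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(w):
    """Weighted average of the hospital-specific means."""
    return sum(wi * m for wi, m in zip(w, mean_los))


def variance_over_sigma2(w):
    """Var(sum_i w_i * mean_i) / sigma^2 = sum_i w_i^2 / n_i."""
    return sum(wi ** 2 / ni for wi, ni in zip(w, n))


schemes = {
    "equal": normalize([1] * 5),
    # 1 / (sigma^2 / n_i) is proportional to n_i
    "inverse variance": normalize(n),
    "population": normalize(size),
}

# Express each variance relative to the smallest one (= 100).
best = min(variance_over_sigma2(w) for w in schemes.values())
summary = {
    name: (round(weighted_mean(w), 2),
           round(100 * variance_over_sigma2(w) / best))
    for name, w in schemes.items()
}

for name, (mean, ratio) in summary.items():
    print(f"{name:17s} mean={mean:6.2f}  variance ratio={ratio}")
```

The population-weighted mean comes out as 23.75, which the slide rounds to 23.8.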
11
11 What is weighting? (via Constantine)
12
12 What is weighting?
The Essence: a general (fancier?) way of computing averages
There are multiple weighting schemes
– Minimize variance by using inverse variance weights
– Minimize bias for the population mean by using population weights (“survey weights”)
– Policy weights
– “My weights,” ...
13
13 Weights and their properties
Let $(\mu_1, \mu_2, \mu_3, \mu_4, \mu_5)$ be the TRUE hospital-specific mean LOS
Then $\sum_i w_i \bar{Y}_i$, the weighted average of the hospital sample means $\bar{Y}_i$, estimates $\sum_i w_i \mu_i$
If $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$, ANY set of weights that add to 1 estimates $\mu$. So, it’s best to minimize the variance
But, if the TRUE hospital-specific E(LOS) are not equal
– Each set of weights estimates a different target
– Minimizing variance might not be “best”
– An unbiased estimate of the population mean $\mu^* = \sum_i \pi_i \mu_i$ sets $w_i = \pi_i$
General idea: Trade-off variance inflation & bias reduction
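The claims on this slide can be checked with a short derivation. The sketch below assumes, as the Procedure slide does, independent hospital sample means with a common known variance $\sigma^2$; the notation $\bar{Y}_i$ for the sample mean in hospital $i$ and $\pi_i$ for the hospital's share of the total population is introduced here for illustration.

```latex
\begin{align*}
E\Big[\sum_i w_i \bar{Y}_i\Big] &= \sum_i w_i \mu_i, \\
\operatorname{Var}\Big[\sum_i w_i \bar{Y}_i\Big] &= \sigma^2 \sum_i \frac{w_i^2}{n_i}, \\
\text{Bias for } \mu^* = \sum_i \pi_i \mu_i &: \quad \sum_i (w_i - \pi_i)\,\mu_i .
\end{align*}
```

If every $\mu_i$ equals a common $\mu$, the bias is zero for any weights that add to 1, and the variance is smallest at $w_i = n_i / \sum_j n_j$, the inverse variance weights. If the $\mu_i$ differ, the bias is zero for every configuration of means only when $w_i = \pi_i$.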
14
14 Mean Squared Error
General idea: Trade-off variance inflation & bias reduction
$\mathrm{MSE} = E(\text{Estimate} - \text{True})^2 = \text{Variance} + \text{Bias}^2$
Bias is unknown unless we know the $\mu_i$ (the true hospital-specific mean LOS)
But, we can study MSE($\mu$, $w$, $\pi$)
Consider a true value of the variance of the between hospital means
Study BIAS, Variance, MSE for various assumed values of this variance
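To make the decomposition concrete, here is a small sketch that evaluates bias, variance, and MSE for the three weighting schemes from the earlier table. Two things are assumptions made purely for illustration and do not come from the slides: the observed hospital means are treated as if they were the true $\mu_i$, and $\sigma^2 = 100$ is an invented within-hospital variance. The name `mse_parts` is likewise hypothetical.

```python
# ASSUMPTIONS (illustration only, not from the slides):
#   - the observed hospital means stand in for the true means mu_i
#   - sigma2 = 100.0 is a made-up within-hospital variance
n = [30, 60, 15, 30, 15]                 # patients sampled per hospital
pi = [0.10, 0.15, 0.20, 0.25, 0.30]      # population shares
mu = [25.0, 35.0, 15.0, 40.0, 10.0]      # stand-in "true" means
sigma2 = 100.0

# Target of estimation: the population mean.
target = sum(p * m for p, m in zip(pi, mu))


def mse_parts(w):
    """Return (bias, variance, MSE) of the weighted average with weights w."""
    bias = sum(wi * m for wi, m in zip(w, mu)) - target
    variance = sigma2 * sum(wi ** 2 / ni for wi, ni in zip(w, n))
    return bias, variance, variance + bias ** 2


schemes = {
    "equal": [0.2] * 5,
    "inverse variance": [ni / sum(n) for ni in n],
    "population": pi,
}

results = {name: mse_parts(w) for name, w in schemes.items()}
for name, (bias, variance, mse) in results.items():
    print(f"{name:17s} bias={bias:6.2f}  variance={variance:6.3f}  MSE={mse:7.3f}")
```

Under these invented inputs the inverse variance weights have the smallest variance but the largest MSE, because their bias for the population mean dominates. With a larger $\sigma^2$ or more similar hospital means the ranking could reverse.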
15
15 Mean Squared Error
Consider a true value of the variance of the between hospital means: $T = \sum_i (\mu_i - \mu^*)^2$
Study BIAS, Variance, MSE for optimal weights based on assumed values (A) of this variance
When A = T, MSE is minimized
Convert T and A to fraction of total variance
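The slides do not give a formula for the optimal weights, so the following is only one possible sketch. It assumes a simple model that is not stated in the source: hospital means scatter independently around a common value with between-hospital variance $A$, so that the expected squared bias of a weighted average is $A \sum_i (w_i - \pi_i)^2$. Minimizing variance plus expected squared bias subject to $\sum_i w_i = 1$ then has a closed form. The function names and $\sigma^2 = 1$ are hypothetical, and $T$ and $A$ are treated here as variances on the same scale.

```python
# ASSUMPTIONS (illustration only, not from the slides):
#   - expected squared bias = A * sum((w_i - pi_i)^2)
#   - sigma2 = 1.0 is a made-up within-hospital variance
n = [30, 60, 15, 30, 15]
pi = [0.10, 0.15, 0.20, 0.25, 0.30]
sigma2 = 1.0
v = [sigma2 / ni for ni in n]            # sampling variance of each hospital mean


def compromise_weights(A):
    """Minimize sum(w^2 * v) + A * sum((w - pi)^2) subject to sum(w) = 1.

    Lagrange solution: w_i = (A * pi_i + lam) / (v_i + A).
    """
    inv = [1.0 / (vi + A) for vi in v]
    shrunk = [A * p * k for p, k in zip(pi, inv)]
    lam = (1.0 - sum(shrunk)) / sum(inv)
    return [s + lam * k for s, k in zip(shrunk, inv)]


def expected_mse(w, T):
    """Variance plus expected squared bias when the true variance is T."""
    variance = sum(wi ** 2 * vi for wi, vi in zip(w, v))
    return variance + T * sum((wi - p) ** 2 for wi, p in zip(w, pi))


T = 0.02                                  # hypothetical true between-hospital variance
for A in [0.0, 0.005, 0.02, 0.1, 1.0]:    # assumed values
    w = compromise_weights(A)
    print(f"A={A:5.3f}  weights x100={[round(100 * wi, 1) for wi in w]}  "
          f"MSE under T={expected_mse(w, T):.5f}")
```

In this sketch $A = 0$ returns the inverse variance weights, a very large $A$ approaches the population weights, and the MSE evaluated under $T$ is smallest when $A = T$, which matches the slide's statement.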
16
16 The bias-variance trade-off
[Figure: x-axis is the assumed variance fraction; y-axis is performance computed under the true fraction]
17
17 Summary
Much of statistics depends on weighted averages
Choice of weights should depend on assumptions and goals
If you trust your (regression) model,
– Then, minimize the variance, using “optimal” weights
– This generalizes the equal $\mu$s case
If you worry about model validity (bias for $\mu^*$),
– Buy full insurance, by using population weights
– You pay in variance (efficiency)
– Consider purchasing only what you need, using compromise weights
18
18 Statistics is/are everywhere!
19
19 EURO: our short wish list