Presentation is loading. Please wait.

Presentation is loading. Please wait.

THE WEIGHTING GAME Ciprian M. Crainiceanu Thomas A. Louis Department of Biostatistics

Similar presentations


Presentation on theme: "THE WEIGHTING GAME Ciprian M. Crainiceanu Thomas A. Louis Department of Biostatistics"— Presentation transcript:

1 THE WEIGHTING GAME Ciprian M. Crainiceanu Thomas A. Louis Department of Biostatistics http://commprojects.jhsph.edu/faculty/bio.cfm?F=Ciprian&L=Crainiceanu

2 2 Oh formulas, where art thou?

3 3 Why does the point of view make all the difference?

4 4 Getting rid of the superfluous information

5 5 How the presentation could have started, but didn’t: Proof that statisticians can speak alien languages Let ,P) be a probability space, where  is a measurable space and P:  ↦ [0,1] is a probability measure function from the  - algebra  It is perfectly natural to ask oneself what a  -algebra or  -field is. Definition. A  -field is a collection of subsets  of the sample space  with Of course, once we mastered the  -algebra or  -field concept it is only reasonable to wonder what a probability measure is Definition. A probability measure P has the following properties Where do all these fit in the big picture? Every sample space is a particular case of probability space and weighting is intrinsically related to sampling

6 6 Why simple questions can have complex answers? Question: Question: What is the average length of in-hospital stay for patients? Complexity: Complexity: The original question is imprecise. New question: New question: What is the average length of stay for: –Several hospitals of interest? –Maryland hospitals? –Blue State hospitals? …

7 7 “Data” Collection & Goal Survey, conducted in 5 hospitals Hospitals are selected n hospital patients are sampled at random Length of stay (LOS) is recorded Goal: Estimate the population mean

8 8 Procedure Compute hospital specific means “Average” them –For simplicity assume that the population variance is known and the same for all hospitals How should we compute the average? Need a (good, best?) way to combine information

9 9 DATA Hospital # sampled n hosp Hospital size % of Total size: 100  hosp Mean LOS Sampling variance 130 100 1025  2 /30 260 150 1535  2 /60 315 200 2015  2 /15 430 250 2540  2 /30 515 300 3010  2 /15 Total1501000100

10 10 Weighted averages Weighting strategy Weights x100 MeanVariance Ratio Equal20 20 20 20 2025.0130 Inverse variance 20 40 10 20 1029.5100 Population10 15 20 25 3023.8172 Examples of various weighted averages: Variance using inverse variance weights is smallest

11 11 What is weighting? (via Constantine)            

12 12 What is weighting? The EssenceThe Essence: a general (fancier?) way of computing averages There are multiple weighting schemes Minimize variance by using inverse variance weights Minimize bias for the population mean by using population weights (“survey weights”) Policy weights “My weights,”...

13 13 Weights and their properties Let (  1,  2,  3,  4,  5 ) be the TRUE hospital-specific LOS Then estimates If  1 =  2 =  3 =  4 =  5 =    i  i ANY set of weights that add to 1 estimate   . So, it’s best to minimize the variance But, if the TRUE hospital-specific E(LOS) are not equal –Each set of weights estimates a different target –Minimizing variance might not be “best” –An unbiased estimate of    sets  w i =  i General idea Trade-off variance inflation & bias reduction

14 14 Mean Squared Error General idea Trade-off variance inflation & bias reduction (Estimate - True) 2 MSE = Expected(Estimate - True) 2 Variance + Bias 2 = Variance + Bias 2 Bias is unknown unless we know the  i (the true hospital-specific mean LOS) But, we can study MSE ( , w,  ) Consider a true value of the variance of the between hospital means Study BIAS, Variance, MSE for various assumed values of this variance

15 15 Mean Squared Error Consider a true value of the variance of the between hospital means T =  (  i -  * ) 2 Study BIAS, Variance, MSE for optimal weights based on assumed values (A) of this variance When A = T, MSE is minimized Convert T and A to fraction of total variance

16 16 The bias-variance trade-off X is assumed variance fraction Y is performance computed under the true fraction

17 17 Summary Much of statistics depends on weighted averages Choice of weights should depend on assumptions and goals trustIf you trust your (regression) model, –Then, minimize the variance, using “optimal” weights –This generalizes the equal  s case worryIf you worry about model validity (bias for    –Buy full insurance, by using population weights –You pay in variance (efficiency) –Consider purchasing only what you need Using compromise weights

18 18 Statistics is/are everywhere!

19 19 EURO: our short wish list


Download ppt "THE WEIGHTING GAME Ciprian M. Crainiceanu Thomas A. Louis Department of Biostatistics"

Similar presentations


Ads by Google