Biostatistics 760 Random Thoughts.

1 Biostatistics 760 Random Thoughts

2 Upcoming Classes
- Bios 761: Advanced Probability and Statistical Inference
- Bios 767: Longitudinal Data Analysis
- Bios 780: Theory and Methods for Survival Analysis
- Bios 841: Statistical Consulting

3 Bios 761
- Frequentist and Bayesian decision theory
- Hypothesis testing: UMP tests, etc.
- Bootstrap and other methods of inference
- Stochastic processes: Poisson processes, Markov chains, martingales, Brownian motion
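As a small illustration of the bootstrap topic listed above, here is a minimal sketch of a nonparametric bootstrap percentile interval. The function name, the choice of the median as the statistic, the number of resamples, and the data are all hypothetical choices for illustration, not material from the course.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile interval for stat(data) (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the observed data with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates.
    lo = reps[int(round(alpha / 2 * n_boot))]
    hi = reps[int(round((1 - alpha / 2) * n_boot)) - 1]
    return stat(data), (lo, hi)


# Made-up illustrative data, not from any study mentioned in the talk.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
est, (lo, hi) = bootstrap_percentile_ci(sample)
print(f"median = {est:.2f}, 95% percentile interval = ({lo:.2f}, {hi:.2f})")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the simplest to state.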

4 Bios 780
- Time-to-event data
- Right censoring
- Counting processes; martingales
- Semiparametric approaches
- Kaplan-Meier estimator
- Log-rank statistic
- Cox model
- Data analysis
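To make the Kaplan-Meier estimator and right censoring concrete, here is a minimal product-limit sketch. The helper name `kaplan_meier` and the six-subject toy data are hypothetical; ties between an event and a censoring time are handled with the common convention that censored subjects are still at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct observed event time.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the time is right-censored
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # At risk at t: everyone still under observation at t. By the usual convention,
        # subjects censored at exactly t are counted as at risk at t.
        at_risk = sum(1 for s in times if s >= t)
        # Product-limit step: multiply by the conditional probability of surviving past t.
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Toy data (hypothetical): 6 subjects, two of them right-censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
# prints S(3) = 0.833, S(5) = 0.667, S(8) = 0.444, S(12) = 0.000
```

The censored subject at time 5 still contributes to the risk set at 5 but drops out afterwards, which is how censoring enters the estimate.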

5 Bios 841
- Consulting versus collaboration
- Bringing it all together to solve problems
- Communicating about statistics
- Three real problems
- Three journal style reports
- One final oral presentation
- Real time problem solving
- What is the role of statistical theory?

6 A Few War Stories
- As a student: thesis on surrogates
- As a postdoc: infectious diseases
- As a new professor: cystic fibrosis (CF)*
- Working on tenure: empirical processes
- Empirical processes and cancer*
- Chair of the DSMC for NICHD
- Artificial intelligence and NSCLC

7 CF Neonatal Screening
- 1992: Joined Phil Farrell’s CF study team
- 1997: Farrell, Kosorok, Laxova, et al. published in NEJM
- 2004 (Oct. 15): CDC recommended CF newborn screening; the 1997 article was judged the only valid randomized trial
- States offering CF newborn screening: 3 in 1997, 12 in 2004, 45 today

8 What Role Did “Theory” Play?
- Used state-of-the-art statistical methods that were robust (GEE)
- In other CF research we have used:
  - Current status methods (parametric, robust)
  - Constrained regression estimation
  - Semiparametric bootstrap inference
  - Martingale-based survival analysis
  - New work using artificial intelligence

9 Empirical Processes and Cancer
- Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM)
- Cox proportional hazards model employed to ascertain the risks of 5 prognostic factors: age, performance status, serum lactate dehydrogenase level, number of extranodal disease sites, tumor stage
- Diagnostics show the model fits poorly

10 What is the Problem?
- Poor survival function prediction
- Possibly incorrect interpretation of risk factor effects
- A model that adds a single parameter to the Cox model was developed and fit
- This new model fits well (Kosorok, Lee and Fine, 2004)
- Inference for the new model is complicated

11 What Does Theory Tell Us?
- We can derive valid inferential tools for the new model: estimation and bootstrap
- Robustness was also studied: theory shows that the Cox model is robust to this kind of model misspecification:
  - The direction of the regression coefficients is preserved
  - The robust variance estimator should be used for the Cox model

12 Theory Versus Applications
- The title implies there is a conflict between theory and applications
- This isn’t true!
- Theory provides a basis for correct thinking and problem solving in applications
- Applications drive new theoretical development

13 Theory Can Be Impractical
- Law of the iterated logarithm: needs a sample size of 10^8 (“asymptopia”).
- Sometimes higher-order approximations are needed before a result becomes useful.
- Sometimes the computational properties of asymptotically optimal estimators are poor.
- Some hard problems take years to solve.
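For reference, here is a standard textbook form of the law of the iterated logarithm. The slide does not state the setting, so i.i.d. summands with mean 0 and variance 1 are assumed here. The slow growth of the log log n normalizer gives one way to see why results of this kind can require the very large sample sizes mentioned on the slide before they describe what is actually observed.

```latex
% Assumed setting: X_1, X_2, ... i.i.d. with E[X_i] = 0 and Var(X_i) = 1
S_n = \sum_{i=1}^{n} X_i, \qquad
\limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log \log n}} = 1 \quad \text{almost surely.}

% The normalizer grows through \log\log n (natural logarithms), which changes very slowly:
\log \log (10^{4}) \approx 2.22, \qquad \log \log (10^{8}) \approx 2.91
```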

14 Why Theory is Needed
- Theory often does work for practical sample sizes.
- It can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated.
- Theory can lead toward methodological solutions (Cook and Kosorok, 2004 JASA).
- Theory can drive scientific discovery.
- Some results are beautiful.

15 Data Mining Versus Inference
- Data mining is summarizing and representing data, no matter how complicated
- Inference is determining valid measures of uncertainty
- Patterns obtained from data mining can be misleading
- Inference without data mining may miss important structure

16 The Core of Statistics
- Statistics is the science of science
- How do we learn from our world and draw meaningful and valid conclusions from it?
- Need both data mining and valid inference
- Requires a unique kind of intuition
- Needs many different intellectual perspectives
- One of the most challenging of all fields

17 Everyone Needs Core Literacy
- All statisticians need to know enough theory to have core literacy about statistics and to be able to solve problems
- All statisticians need to know enough about applications to know what is important
- All biostatisticians need to know enough statistical methods to be useful in practice
- The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology

18 Semiparametric Inference
- The study of statistical models with parametric and/or nonparametric parts
- Can achieve a trade-off between scientific meaning and model “robustness”
- Estimation and inference are often hard
- There exists an efficiency bound for parametric and some nonparametric parts
- NPMLE, testing and estimating equations

19 Empirical Processes
- Tools for complex model inference and high-dimensional data
- Can determine universal properties of semiparametric methods:
  - Consistency
  - Rate of convergence
  - Limiting distributions
  - Valid inference (empirical process bootstrap)
- Empirical processes are everywhere
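A classical example of an empirical process result about limiting distributions is the Kolmogorov-Smirnov statistic: by Donsker's theorem, sqrt(n) times the sup-distance between the empirical CDF and the true CDF converges to the supremum of the absolute value of a Brownian bridge. The sketch below checks this by simulation. It assumes a Uniform(0, 1) null, and the sample size, number of replications, and function name are arbitrary choices for illustration.

```python
import math
import random


def ks_statistic_uniform(sample):
    """sqrt(n) * sup_x |F_n(x) - x|, assuming the sample is from Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps by 1/n at each order statistic, so the supremum is attained just
    # before or just after a jump: compare x_(i) with i/n and (i + 1)/n.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d


# Monte Carlo check of the limiting distribution under the null hypothesis.
rng = random.Random(1)
stats = [ks_statistic_uniform([rng.random() for _ in range(200)]) for _ in range(2000)]
# sqrt(n) * D_n converges in distribution to the supremum of |Brownian bridge|
# (the Kolmogorov distribution), whose 95th percentile is about 1.36.
frac_above = sum(s > 1.36 for s in stats) / len(stats)
print(f"fraction of replicates above 1.36: {frac_above:.3f}")  # should be close to 0.05
```

The simulation only approximates the limit at one sample size; the point of the theory is that the limiting statement holds without having to simulate every scenario.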

20 The Road Ahead
- Whatever you choose to do, the core statistical theory classes will help you.
- Be patient as you learn.
- Be willing to work hard (struggle is good).
- It takes many different kinds of thinkers with different learning styles.
- There are important discoveries to be made in both applications and theory.

