Biostatistics 760 Random Thoughts


Upcoming Classes
Bios 761: Advanced Probability and Statistical Inference
Bios 763: Generalized Linear Model Theory and Applications
Bios 767: Longitudinal Data Analysis
Bios 780: Theory and Methods for Survival Analysis
Bios 841: Statistical Consulting

Bios 761
Frequentist and Bayesian decision theory
Hypothesis testing: UMP tests, etc.
Bootstrap and other methods of inference
Stochastic processes:
  – Poisson processes
  – Markov chains
  – Martingales
  – Brownian motion

Bios 780
Time-to-event data
Right censoring
Counting processes; martingales
Semiparametric approaches:
  – Kaplan-Meier estimator
  – Log-rank statistic
  – Cox model
Data analysis
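
As an illustration of the Kaplan-Meier estimator listed above, here is a minimal Python sketch for right-censored data. The function name `kaplan_meier` and the input layout (a list of follow-up times and a parallel list of 0/1 event indicators) are assumptions made for this example, not something from the course. Ties between an event and a censoring at the same time are handled with the usual convention that the censored subject is still at risk when the event occurs.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone who leaves the risk set at t
    at_risk = len(times)            # size of the risk set just before t
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:                       # the estimate only jumps at event times
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]         # remove events and censorings at t
    return curve

if __name__ == "__main__":
    # Hypothetical toy data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

On the toy data the estimate drops to about 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored observations shrink the risk set without causing a drop.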

Bios 841
Consulting versus collaboration
Bringing it all together to solve problems
Communicating about statistics:
  – Three real problems
  – Three journal-style reports
  – One final oral presentation
Real-time problem solving
What is the role of statistical theory?

A Few War Stories
As a student: thesis on surrogates
As a postdoc: infectious diseases
As a new professor: cystic fibrosis (CF)*
Working on tenure: empirical processes
Empirical processes and cancer*
Chair of the DSMC for NICHD
Artificial intelligence and NSCLC

CF Neonatal Screening
1992: Joined Phil Farrell’s CF study team
1997: Farrell, Kosorok, Laxova, et al. published in NEJM
2004 (Oct. 15): CDC recommended CF newborn screening; the 1997 article was judged the only valid randomized trial
States offering CF newborn screening: 3 in 1997, 12 in 2004, 45 today

What Role Did “Theory” Play?
Used state-of-the-art statistical methods that were robust (GEE)
In other CF research we have used:
  – Current status methods (parametric, robust)
  – Constrained regression estimation
  – Semiparametric bootstrap inference
  – Martingale based survival analysis
  – New work using artificial intelligence

Empirical Processes and Cancer
Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM)
Cox proportional hazards model employed to ascertain risks of 5 prognostic factors: Age, performance Status, serum lactate dehydrogenase Level, number of extranodal disease Sites, tumor Stage
Diagnostics show the model fits poorly
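
For reference, the Cox proportional hazards model named above has the following standard form. The symbols are notation introduced here for illustration and do not appear on the slides: $Z$ is the covariate vector (here, the five prognostic factors), $\beta$ the regression coefficients, and $\lambda_0$ an unspecified baseline hazard.

```latex
\lambda(t \mid Z) = \lambda_0(t)\, \exp\!\left(\beta^\top Z\right)
```

Under this model the hazard ratio between two patients with covariates $Z_1$ and $Z_2$ is $\exp\{\beta^\top (Z_1 - Z_2)\}$, which does not depend on $t$. A poor diagnostic fit suggests that this constant-ratio assumption, or the form of the covariate effects, may not hold for the data.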

What is the Problem?
Poor survival function prediction
Possibly incorrect interpretation of risk factor effects
A model that adds a single parameter to the Cox model was developed and fit
This new model fits well (Kosorok, Lee and Fine, 2004)
Inference for the new model is complicated

What Does Theory Tell Us?
We can derive valid inferential tools for the new model: estimation and bootstrap
Robustness was also studied: we learn theoretically that the Cox model is robust to this kind of model misspecification:
  – The direction of the regression coefficients is preserved
  – Should use robust variance for Cox model

Theory Versus Applications
The title implies there is conflict between theory and applications
This isn’t true!
Theory provides a basis for correct thinking and problem solving for applications
Applications drive new theoretical development

Theory Can Be Impractical
Law of iterated logarithm: needs sample size of 10^8 (“asymptopia”).
Sometimes higher order approximations are needed before it becomes useful.
Sometimes computational properties of asymptotically optimal estimators are poor.
Some hard problems take years to solve.
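
For readers who have not seen it, the law of the iterated logarithm mentioned above can be stated as follows. The notation is introduced here and is not from the slides: $X_1, X_2, \ldots$ are i.i.d. with mean $0$ and finite variance $\sigma^2$, and $S_n = X_1 + \cdots + X_n$.

```latex
\limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log\log n}} = \sigma
\quad \text{almost surely.}
```

The factor $\log\log n$ grows extremely slowly (it is only about $2.9$ at $n = 10^8$), which is consistent with the point that this result describes behavior far beyond practical sample sizes.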

Why Theory is Needed
Often it does work for practical sample sizes.
Can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated.
Theory can lead toward methodological solutions (Cook and Kosorok, 2004 JASA).
Theory can drive scientific discovery.
Some results are beautiful.

Data Mining Versus Inference
Data mining is summarizing and representing data no matter how complicated
Inference is determining valid measures of uncertainty
Patterns obtained from data mining can be misleading
Inference without data mining may miss important structure

The Core of Statistics
Statistics is the science of science
How do we learn from our world and draw meaningful and valid conclusions from it?
Need both data mining and valid inference
Requires a unique kind of intuition
Needs many different intellectual perspectives
One of the most challenging of all fields

Everyone Needs Core Literacy
All statisticians need to know enough theory to have core literacy about statistics and to be able to problem solve
All statisticians need to know enough about applications to know what is important
All biostatisticians need to know enough statistical methods to be useful in practice
The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology

Semiparametric Inference
The study of statistical models with parametric and/or nonparametric parts
Can achieve trade-off between scientific meaning and model “robustness”
Estimation and inference are often hard
There exists an efficiency bound for parametric and some nonparametric parts
NPMLE, testing and estimating equations

Empirical Processes
Tools for complex model inference and high dimensional data
Can determine universal properties of semiparametric methods:
  – Consistency
  – Rate of convergence
  – Limiting distributions
  – Valid inference (empirical process bootstrap)
Empirical processes are everywhere

The Road Ahead
Whatever you choose to do, the core statistical theory classes will help you.
Be patient as you learn.
Be willing to work hard (struggle is good).
It takes many different kinds of thinkers with different learning styles.
There are important discoveries to be made in both applications and theory.