R Programming/Binomial Models Shinichiro Suna. Binomial Models: In a binomial model, we have one binary outcome and a set of explanatory variables.

Presentation transcript:


Contents
1 Logit model
1.1 Fake data simulations
1.2 Maximum likelihood estimation
1.3 Bayesian estimation
2 Probit model
2.1 Fake data simulations
2.2 Maximum likelihood estimation
2.3 Bayesian estimation

1. Logit model (Logistic Regression Analysis)

The logit model (logistic regression analysis) uses the logistic function. When there are several explanatory variables:

F(x) = 1 / (1 + exp(-(B0 + B1*X1 + B2*X2 + ...)))
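The logistic function itself is easy to verify in base R. A minimal sketch; the coefficient values b0 = -1, b1 = 2 below are illustrative, not taken from the slides:

```r
# Logistic function: maps any real-valued linear predictor into (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# Illustrative coefficients (assumed for this example)
b0 <- -1
b1 <- 2
x  <- c(-2, 0, 2)
p  <- logistic(b0 + b1 * x)
p   # strictly increasing in x, always between 0 and 1
```

Whatever the linear predictor, the output stays strictly inside (0, 1), which is what makes the function usable as a probability model.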

1.1. Fake data simulations

# Simulate data from a logit model
x <- 1 + rnorm(1000, 1)
xbeta <- -1 + (x * 1)    # linear predictor; intercept value lost in extraction, -1 assumed
proba <- exp(xbeta) / (1 + exp(xbeta))
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)
table(y)
df <- data.frame(y, x)

1.2. Maximum likelihood estimation

The standard way to estimate a logit model is the glm() function with family binomial and link logit (Fitting Generalized Linear Models).

1.2. Maximum likelihood estimation

# Fitting Generalized Linear Models
res <- glm(y ~ x, family = binomial(link = logit))
names(res)
summary(res)             # results
confint(res)             # confidence intervals (profile likelihood)
exp(res$coefficients)    # odds ratios
exp(confint(res))        # confidence intervals for the odds ratios
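Why exp() of a logit coefficient is an odds ratio can be checked numerically: raising a covariate by one unit multiplies the odds p/(1-p) by exp(B1), whatever the starting value of x. The values of b0, b1 and x below are arbitrary, chosen only for illustration:

```r
logistic <- function(z) 1 / (1 + exp(-z))
odds <- function(p) p / (1 - p)

b0 <- -1; b1 <- 0.7; x <- 1.3   # arbitrary illustrative values
or <- odds(logistic(b0 + b1 * (x + 1))) / odds(logistic(b0 + b1 * x))
or                               # equals exp(b1), independent of x
```

This is why the transcript exponentiates the coefficients and their confidence limits above: on the odds scale they have a direct multiplicative interpretation.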

1.2. Maximum likelihood estimation

> summary(res)   # results
Call:
glm(formula = y ~ x, family = binomial(link = logit))

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                    *
x                                            ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 99 degrees of freedom
Residual deviance: on 98 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 4

1.3. Bayesian estimation

# Data generating process
x <- 1 + rnorm(1000, 1)
xbeta <- -1 + (x * 1)    # linear predictor; intercept value lost in extraction, -1 assumed
proba <- exp(xbeta) / (1 + exp(xbeta))
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)
table(y)

# Markov chain Monte Carlo for logistic regression
library(MCMCpack)
res <- MCMClogit(y ~ x)
summary(res)
plot(res)
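MCMClogit() samples the posterior with a random-walk Metropolis algorithm. The same idea can be sketched in a few lines of base R; the flat priors, the hand-picked proposal scale, and the assumed intercept of -1 in the data-generating step are all simplifications of this sketch:

```r
set.seed(1)

# Fake data from a logit model (intercept -1 assumed for illustration)
x <- 1 + rnorm(1000, 1)
proba <- 1 / (1 + exp(-(-1 + x)))
y <- ifelse(runif(1000) < proba, 1, 0)

# Log-posterior with flat priors = the logit log-likelihood
logpost <- function(b) {
  eta <- b[1] + b[2] * x
  sum(y * eta - log(1 + exp(eta)))
}

# Random-walk Metropolis
n_iter <- 5000
draws  <- matrix(NA_real_, n_iter, 2)
b  <- c(0, 0)
lp <- logpost(b)
for (i in 1:n_iter) {
  cand    <- b + rnorm(2, sd = 0.1)     # proposal scale chosen by hand
  lp_cand <- logpost(cand)
  if (log(runif(1)) < lp_cand - lp) {   # accept with prob min(1, ratio)
    b  <- cand
    lp <- lp_cand
  }
  draws[i, ] <- b
}
post_mean <- colMeans(draws[-(1:1000), ])  # posterior means after burn-in
post_mean
```

MCMCpack does the same thing with a tuned multivariate proposal and proper priors, which is why its defaults mix far better than this toy sampler.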

1.3. Bayesian estimation

> summary(res)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

Empirical mean and standard deviation for each variable, plus standard error of the mean:
            Mean  SD  Naive SE  Time-series SE
(Intercept)
x

Quantiles for each variable:
            2.5%  25%  50%  75%  97.5%
(Intercept)
x

2. Probit model

The probit model is a type of regression where the dependent variable can take only two values. The name comes from probability + unit.

2. Probit model

The probit model uses the cumulative distribution function of the standard normal distribution.
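pnorm() is the standard normal CDF that the probit link plugs in where the logit model uses the logistic function; the two curves can be compared directly:

```r
z <- seq(-3, 3, by = 0.5)
p_probit <- pnorm(z)            # standard normal CDF
p_logit  <- 1 / (1 + exp(-z))   # logistic function
# Both are increasing and stay inside (0, 1); the probit curve
# approaches 0 and 1 faster in the tails.
cbind(z, p_probit, p_logit)
```

In practice the two links give similar fitted probabilities in the middle of the range and differ mainly in the tails, so the choice is often a matter of convention.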

2.1. Fake data simulations

# Generating fake data
x1 <- 1 + rnorm(1000)
x2 <- x1 + rnorm(1000)
xbeta <- -1 + x1 + x2    # linear predictor; intercept value lost in extraction, -1 assumed
proba <- pnorm(xbeta)
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)
mydat <- data.frame(y, x1, x2)
table(y)

2.2. Maximum likelihood estimation

# Fitting Generalized Linear Models
res <- glm(y ~ x1 + x2, family = binomial(link = probit), data = mydat)
names(res)
summary(res)
exp(res$coefficients)   # note: exponentiated coefficients are odds ratios only under the logit link
exp(confint(res))

2.2. Maximum likelihood estimation

> summary(res)
Call:
glm(formula = y ~ x1 + x2, family = binomial(link = probit), data = mydat)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                   **
x1                                            **
x2                                  e-05     ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 99 degrees of freedom
Residual deviance: on 97 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 9

2.2. Maximum likelihood estimation

An alternative is the probit() function from the sampleSelection package:

library("sampleSelection")
res <- probit(y ~ x1 + x2, data = mydat)
summary(res)
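glm() and sampleSelection::probit() maximise the same likelihood; as a cross-check, it can also be maximised directly with optim(). The simulated coefficients (-1, 1, 1) below are assumptions of this sketch, not values from the slides:

```r
set.seed(2)

# Simulated probit data with assumed coefficients (-1, 1, 1)
x1 <- 1 + rnorm(1000)
x2 <- x1 + rnorm(1000)
y  <- ifelse(runif(1000) < pnorm(-1 + x1 + x2), 1, 0)

# Negative probit log-likelihood (log.p = TRUE avoids underflow in the tails)
negll <- function(b) {
  eta <- b[1] + b[2] * x1 + b[3] * x2
  -sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}

fit <- optim(c(0, 0, 0), negll, method = "BFGS")
fit$par   # close to the true (-1, 1, 1) and to the glm() estimates
```

Because the probit log-likelihood is globally concave, BFGS from any reasonable starting point reaches the same optimum that glm() finds by Fisher scoring.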

2.2. Maximum likelihood estimation

> summary(res)
Probit binary choice model / Maximum Likelihood estimation
Newton-Raphson maximisation, 7 iterations
Return code 1: gradient close to zero
Log-Likelihood:
Model: Y == '1' in contrary to '0'
100 observations (59 'negative' and 41 'positive') and 3 free parameters (df = 97)
Estimates:
            Estimate Std. error t value Pr(> t)
(Intercept)                                  **
x1                                           **
x2                                  e-05    ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Significance test: chi2(2) = (p = e-21)

2.3. Bayesian estimation

# Markov chain Monte Carlo for probit regression
library("MCMCpack")
post <- MCMCprobit(y ~ x1 + x2, data = mydat)
summary(post)
plot(post)

2.3. Bayesian estimation

> summary(post)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

Empirical mean and standard deviation for each variable, plus standard error of the mean:
            Mean  SD  Naive SE  Time-series SE
(Intercept)
x1
x2

Quantiles for each variable:
            2.5%  25%  50%  75%  97.5%
(Intercept)
x1
x2