A generalized bivariate Bernoulli model with covariate dependence Fan Zhang.


Outline: 1) Introduction; 2) Model proposed; 3) Simulation; 4) Remarks

Introduction Dependence in outcome variables arises in many fields, including epidemiology, time series analysis, environmental studies, public health, economics, and anthropology. Examples: 1) pre-post tests; 2) proposed diagnostic tests vs. the standard procedure on selected individuals; 3) twin studies. Such dependence poses a formidable difficulty when analyzing data from longitudinal studies.

Previous Methods Most common approach: marginal models. Most past studies addressed this problem with marginal models. Examples: 1) marginal odds ratios by Lipsitz et al.; 2) a marginal model based on a binary Markov chain by Azzalini.

Previous Methods Less common approach: conditional models. Examples: 1) Markov models for covariate dependence of binary sequences by Muenz et al.; 2) the logistic model by Bonney et al. Other attempts: 1) the quadratic exponential form model; 2) the multivariate Plackett distribution.

Previous Methods Limitations: with marginal models alone, it is difficult to specify measures of dependence in the outcomes, because that dependence arises from association between the outcomes as well as between the outcomes and the explanatory variables. The conditional approach alone cannot resolve the problem either.

Model proposed Bivariate Bernoulli distribution, a joint model. Model setting: Joint probability:
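The formulas on this slide did not survive in the transcript. As a hedged reconstruction, the standard bivariate Bernoulli mass function can be written with cell probabilities P_jk = P(Y1 = j, Y2 = k); the index convention (first index for Y1, second for Y2) is an assumption chosen to agree with the P00, P01, P10, P11 notation used on the later slides:

```latex
P(Y_1 = y_1, Y_2 = y_2)
  = P_{00}^{(1-y_1)(1-y_2)}\, P_{01}^{(1-y_1)\,y_2}\,
    P_{10}^{y_1(1-y_2)}\, P_{11}^{y_1 y_2},
\qquad y_1, y_2 \in \{0, 1\},
\qquad P_{00} + P_{01} + P_{10} + P_{11} = 1 .
```

Under this convention the marginals are P(Y1 = 1) = P10 + P11 and P(Y2 = 1) = P01 + P11.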

Model proposed The bivariate probabilities as a function of covariates X are as follows: In terms of the exponential family for the generalized linear model:

Model proposed Log-likelihood for a sample of size n: Link functions: where η0 is the baseline link function, η2 is the link function for Y1, η1 is the link function for Y2, and η3 is the link function for the dependence between Y1 and Y2.
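A minimal sketch of how the four link functions relate to the cell probabilities, assuming the usual exponential-family form ln P(y1, y2) = η0 + η2·y1 + η1·y2 + η3·y1·y2 with P_jk = P(Y1 = j, Y2 = k). The function names are hypothetical and the exact expressions on the slide are not preserved, so treat this as an illustration of the stated roles of η0..η3 rather than the presenter's own formulas:

```python
import math


def natural_parameters(p00, p01, p10, p11):
    """Map cell probabilities P_jk = P(Y1=j, Y2=k) to (eta0, eta1, eta2, eta3)."""
    eta0 = math.log(p00)                      # baseline
    eta1 = math.log(p01 / p00)                # attached to Y2
    eta2 = math.log(p10 / p00)                # attached to Y1
    eta3 = math.log(p00 * p11 / (p01 * p10))  # log odds ratio: dependence
    return eta0, eta1, eta2, eta3


def cell_probabilities(eta1, eta2, eta3):
    """Inverse map; eta0 is determined by the probabilities summing to one."""
    weights = [1.0, math.exp(eta1), math.exp(eta2), math.exp(eta1 + eta2 + eta3)]
    total = sum(weights)
    return tuple(w / total for w in weights)  # (p00, p01, p10, p11)


def log_pmf(y1, y2, etas):
    """ln P(Y1=y1, Y2=y2) in exponential-family form."""
    eta0, eta1, eta2, eta3 = etas
    return eta0 + eta2 * y1 + eta1 * y2 + eta3 * y1 * y2
```

For an independent table, for example P(Y1 = 1) = 0.3 and P(Y2 = 1) = 0.6, the cross-product ratio is 1, so `natural_parameters(0.28, 0.42, 0.12, 0.18)` gives η3 equal to zero up to rounding.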

Model proposed Express the conditional probabilities in terms of the logit link function as:

Model proposed The marginal probabilities are as: Assume:

Model proposed Now write:

Model proposed Thus,

Model proposed Hence, if there is no association between Y1 and Y2, then P00(x)P11(x) / [P01(x)P10(x)] = 1 and ln(1) = 0, which implies β11 = β01. This is a new formulation that measures the dependence in terms of the parameters of the conditional models obtained from the joint mass function.
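The link between the odds ratio and the conditional parameters can be checked numerically. The sketch below assumes logit models for the two conditionals, P(Y2 = 1 | Y1 = 0, x) with parameters β01 and P(Y2 = 1 | Y1 = 1, x) with parameters β11, plus a logit model for the marginal of Y1; the marginal parameter `beta_y1` and all function names are hypothetical. Under these assumptions the log odds ratio equals x·(β11 − β01), whatever the marginal of Y1 is:

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def joint_from_conditionals(x, beta_y1, beta01, beta11):
    """Cell probabilities (p00, p01, p10, p11) at covariate vector x."""
    x = np.asarray(x, dtype=float)
    p_y1 = expit(x @ np.asarray(beta_y1, dtype=float))  # P(Y1=1 | x), assumed model
    q0 = expit(x @ np.asarray(beta01, dtype=float))     # P(Y2=1 | Y1=0, x)
    q1 = expit(x @ np.asarray(beta11, dtype=float))     # P(Y2=1 | Y1=1, x)
    p00 = (1.0 - p_y1) * (1.0 - q0)
    p01 = (1.0 - p_y1) * q0
    p10 = p_y1 * (1.0 - q1)
    p11 = p_y1 * q1
    return p00, p01, p10, p11


def log_odds_ratio(p00, p01, p10, p11):
    """ln[P00 P11 / (P01 P10)], the dependence term."""
    return np.log(p00 * p11 / (p01 * p10))
```

With x = (1, 0.5), β01 = (−0.2, 0.7) and β11 = (0.4, −0.3), the log odds ratio is 1·0.6 + 0.5·(−1.0) = 0.1; setting β11 = β01 makes it zero at every x.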

Model proposed In the case of no dependence, we expect η3 = 0, which holds when β11 = β01. We can therefore test the equality of the two sets of regression parameters, β11 and β01, using the statistic: which is asymptotically chi-square distributed with p+1 degrees of freedom.
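The statistic itself is missing from the transcript. One natural candidate, assumed here and not confirmed by the source, is a Wald-type statistic W = (b11 − b01)' [V01 + V11]^(-1) (b11 − b01), where b01 and b11 are logistic-regression estimates for Y2 fitted separately on the Y1 = 0 and Y1 = 1 subsamples and V01, V11 are their estimated covariance matrices. A minimal sketch with hypothetical function names, using plain Newton-Raphson so that only NumPy is needed:

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    Returns (beta_hat, cov), where cov is the inverse Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = expit(X @ beta)
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.linalg.inv(info)


def wald_equality_test(X, y1, y2):
    """Wald-type statistic for H0: beta01 = beta11, and its degrees of freedom.

    X must already contain the intercept column, so df = p + 1 = X.shape[1].
    """
    X = np.asarray(X, dtype=float)
    y1 = np.asarray(y1)
    y2 = np.asarray(y2)
    b0, v0 = fit_logit(X[y1 == 0], y2[y1 == 0])
    b1, v1 = fit_logit(X[y1 == 1], y2[y1 == 1])
    diff = b1 - b0
    # The two subsamples are disjoint, so the covariance matrices add.
    stat = float(diff @ np.linalg.solve(v0 + v1, diff))
    return stat, X.shape[1]


def simulate(n, beta_y1, beta01, beta11, seed=0):
    """Draw (X, y1, y2) with one standard-normal covariate (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y1 = rng.binomial(1, expit(X @ np.asarray(beta_y1, dtype=float)))
    beta = np.where(y1[:, None] == 1, np.asarray(beta11, dtype=float),
                    np.asarray(beta01, dtype=float))
    y2 = rng.binomial(1, expit(np.sum(X * beta, axis=1)))
    return X, y1, y2
```

With a single covariate (p = 1) the reference distribution has 2 degrees of freedom, so at the 5% level the statistic would be compared with the chi-square critical value of about 5.991.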

Model proposed Comparison with the regressive model, another widely used technique. (Slide annotation: maybe a typo.) Regressive model: Note that γ is the parameter associated with the outcome variable Y1, so that H0: γ = 0 indicates a lack of dependence between Y1 and Y2.
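The regressive model formula is not preserved in the transcript. A common form of the regressive logistic model that matches the description of γ above is sketched below; the coefficient names β0, ..., βp are assumptions, and the slide's own formula may differ in detail:

```latex
\operatorname{logit} P(Y_2 = 1 \mid Y_1 = y_1, x)
  = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \gamma\, y_1 .
```

Under this form the two conditional models for Y2 share every slope and differ only in the intercept, by γ. The test of H0: γ = 0 therefore has one degree of freedom, whereas the test of β11 = β01 compares all p+1 parameters.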

Model proposed Comparison with the regressive model. Regressive model: However, a major limitation is that the dependence between Y1 and Y2 also depends on the dependence between the outcome variables and the covariates. Hence, in many instances, the regressive model may fail to recognize the true nature of the relationship between Y1 and Y2 in the presence of the covariates X1, X2, ..., Xp.

Simulation df = 2; H0: independence

Simulation “It is clearly evident that the true correlations between Y1 and Y2 are zero and the average conditional correlations between Y2 and X for given Y1 = 0 and Y1 = 1 are similar or closely indicating a lack of dependence in the outcome variables as revealed by the proposed test (17). However, the regressive model (18) fails to reveal that due to the non-zero correlation between the previous outcome variable (Y1) and explanatory variable (X). This is indicative of the fact that the proposed test can reveal the nature of dependence in a wider range of situations in reality.”

Conclusion “The problem of dependence in the repeated measures outcomes is one of the formidable challenges to the researchers. In the past, the problem had been resolved on the basis of marginal models with very strict assumptions. The models based on GEE with various correlation structures have been employed in most of the cases. Another widely used technique is the regressive logistic regression model. However, both these approaches provide either inadequate or, in some instances, misleading results due to use of only marginal or conditional approaches, instead of joint models. We need to specify the bivariate or multivariate outcomes specifying the underlying correlations for a more detailed and more meaningful models. This paper shows the model for bivariate binary data using the conditional and marginal models to specify the joint bivariate probability functions. A test procedure is suggested for testing the dependence.”

For me, the key insight of this paper is that it parameterizes something that is hard to measure, turning the problem into a parameter-testing problem.