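Generalized Linear Models
All the regression models treated so far have a common structure. This structure can be split into two parts:
The random part: the responses $Y_1, \dots, Y_n$ are independent, and $Y_i$ follows a specified distribution with mean $\mu_i = E(Y_i)$.
The systematic part: the mean $\mu_i$ depends on the covariates $x_i$ through a linear combination of unknown parameters $\beta$.
These two elements are the basic building blocks of generalized linear models.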

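The systematic part
Generalized linear model, systematic part: the covariates influence the distribution of the response through the linear predictor
$$\eta_i = x_i^\top \beta = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}.$$
There is a link function $g$ that links the expectation to the linear predictor:
$$g(\mu_i) = \eta_i, \qquad \text{equivalently} \qquad \mu_i = g^{-1}(\eta_i).$$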
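A minimal R sketch of the systematic part, assuming a logit link and a small hypothetical design matrix `X` and coefficient vector `beta` (neither comes from the slides). It computes the linear predictor, maps it to the mean with the inverse link, and checks that applying the link to the mean recovers the linear predictor.

```r
# Hypothetical design: intercept plus two covariates for three observations
X <- cbind(1, x1 = c(0.5, 1.0, 2.0), x2 = c(1, 0, 1))
beta <- c(-1, 0.8, 0.3)            # hypothetical coefficients

eta <- drop(X %*% beta)            # linear predictor eta_i = x_i' beta
fam <- binomial(link = "logit")    # family object carrying the link g
mu  <- fam$linkinv(eta)            # mean mu_i = g^{-1}(eta_i)

# g(mu_i) gives back eta_i
print(cbind(eta, mu, g_mu = fam$linkfun(mu)))
```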

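The generalization from linear models to GLMs
GLMs generalize linear normal models in two directions: the response may follow any distribution from the exponential family (e.g. binomial, Poisson or Gamma), not only the normal distribution; and the mean may be related to the linear predictor through a link function $g$, not only through the identity $\mu_i = \eta_i$.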

Example: binomial distribution
Definition: the binomial distribution is the discrete probability distribution of the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $p$:
$$P(Y = y) = \binom{n}{y} p^{y} (1-p)^{n-y}, \qquad y = 0, 1, \dots, n.$$
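As a quick check of the formula, this R sketch evaluates the binomial probability function directly and compares it with R's built-in `dbinom`; the values `n = 10` and `p = 0.3` are arbitrary illustrative choices.

```r
n <- 10; p <- 0.3                 # illustrative values
y <- 0:n

# P(Y = y) = choose(n, y) p^y (1 - p)^(n - y)
pmf_formula <- choose(n, y) * p^y * (1 - p)^(n - y)

all.equal(pmf_formula, dbinom(y, size = n, prob = p))
sum(pmf_formula)                  # the probabilities sum to 1
```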

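Example
For the binomial distribution $E(Y) = \mu = np$, and the variance is a function of the mean:
$$\operatorname{Var}(Y) = np(1-p) = \mu\left(1 - \frac{\mu}{n}\right).$$
The linear model for the logit,
$$\operatorname{logit}(p) = \log\frac{p}{1-p} = \eta = x^\top\beta,$$
is a non-linear model for the probability:
$$p = \frac{\exp(x^\top\beta)}{1 + \exp(x^\top\beta)}.$$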
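The two facts on this slide can be checked numerically. In the sketch below (with illustrative values `n = 20`, `p = 0.25`), the variance is computed once from $n$ and $p$ and once as a function of the mean only. R's `qlogis` and `plogis` serve as the logit and its inverse.

```r
n <- 20; p <- 0.25                # illustrative values
mu <- n * p                       # mean of Bin(n, p)

var_direct    <- n * p * (1 - p)  # np(1 - p)
var_from_mean <- mu * (1 - mu / n)  # same variance written through mu only

eta    <- qlogis(p)               # logit: log(p / (1 - p))
p_back <- plogis(eta)             # inverse logit: exp(eta) / (1 + exp(eta))

c(var_direct = var_direct, var_from_mean = var_from_mean, eta = eta, p_back = p_back)
```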

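The exponential family
Many distributions encountered in practice (e.g. the normal, binomial, Poisson and Gamma distributions) share a common structure: their density or probability function can be written as
$$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\},$$
where $\theta$ is the canonical parameter, $\phi$ is the dispersion parameter, and $a$, $b$ and $c$ are known functions.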

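Example of the exponential family: Normal distribution
$$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\} = \exp\left\{ \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\},$$
so $\theta = \mu$, $b(\theta) = \theta^2/2$, $\phi = \sigma^2$, $a(\phi) = \phi$ and $c(y, \phi) = -y^2/(2\phi) - \tfrac{1}{2}\log(2\pi\phi)$.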
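A small R check that the exponential-family form of the normal density reproduces `dnorm`. The helper `expfam_normal` is a hypothetical name introduced only for this illustration, and the choices $\mu = 1$, $\sigma^2 = 2$ are arbitrary.

```r
# Hypothetical helper: normal density written in exponential-family form
# theta = mu, phi = sigma^2, a(phi) = phi, b(theta) = theta^2 / 2,
# c(y, phi) = -y^2 / (2 phi) - log(2 pi phi) / 2
expfam_normal <- function(y, mu, sigma2) {
  theta  <- mu
  phi    <- sigma2
  b      <- theta^2 / 2
  c_term <- -y^2 / (2 * phi) - log(2 * pi * phi) / 2
  exp((y * theta - b) / phi + c_term)
}

y <- seq(-3, 5, by = 0.5)
all.equal(expfam_normal(y, mu = 1, sigma2 = 2), dnorm(y, mean = 1, sd = sqrt(2)))
```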

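Example of the exponential family: Binomial
$$f(y; n, p) = \binom{n}{y} p^{y} (1-p)^{n-y} = \exp\left\{ y \log\frac{p}{1-p} + n\log(1-p) + \log\binom{n}{y} \right\},$$
so $\theta = \log\{p/(1-p)\}$, $b(\theta) = n\log(1 + e^{\theta})$, $a(\phi) = 1$ and $c(y) = \log\binom{n}{y}$.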
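The same kind of check for the binomial case: a hypothetical helper `expfam_binom` evaluates $\exp\{y\theta - b(\theta) + c(y)\}$ with the $\theta$, $b$ and $c$ identified above and compares the result with `dbinom`. The values $n = 12$, $p = 0.35$ are illustrative.

```r
# Hypothetical helper: binomial probabilities in exponential-family form
expfam_binom <- function(y, n, p) {
  theta  <- log(p / (1 - p))       # canonical parameter (the logit)
  b      <- n * log(1 + exp(theta))  # b(theta)
  c_term <- lchoose(n, y)          # c(y) = log choose(n, y)
  exp(y * theta - b + c_term)      # a(phi) = 1
}

y <- 0:12
all.equal(expfam_binom(y, n = 12, p = 0.35), dbinom(y, size = 12, prob = 0.35))
```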

Example of the exponential family
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed period of time, if these events occur with a known average rate $\lambda$ and independently of the time since the last event:
$$P(Y = y) = \frac{\lambda^{y} e^{-\lambda}}{y!}, \qquad y = 0, 1, 2, \dots$$
Examples: the number of phone calls received by a telephone operator in a 10-minute period; the number of typos per page made by a secretary.
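For the phone-call example, the R sketch below assumes a hypothetical average rate of 3 calls per 10-minute period. It evaluates the probability function directly, compares it with `dpois`, and computes the probability of at most 2 calls.

```r
lambda <- 3                       # hypothetical average calls per 10-minute period
y <- 0:10

pmf <- lambda^y * exp(-lambda) / factorial(y)   # lambda^y e^{-lambda} / y!
all.equal(pmf, dpois(y, lambda))

# P(Y <= 2) = e^{-3} (1 + 3 + 9/2)
p_at_most_2 <- ppois(2, lambda)
p_at_most_2
```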

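Poisson distribution
The Poisson distribution belongs to the exponential family:
$$f(y; \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!} = \exp\{ y\log\lambda - \lambda - \log y! \},$$
so $\theta = \log\lambda$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $c(y) = -\log y!$.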
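A short R check of this representation: the hypothetical helper `expfam_pois` uses $\theta = \log\lambda$, $b(\theta) = e^\theta$ and $c(y) = -\log y!$ (via `lfactorial`), and its output is compared with `dpois` for an illustrative $\lambda = 2.5$.

```r
# Hypothetical helper: Poisson probabilities in exponential-family form
expfam_pois <- function(y, lambda) {
  theta  <- log(lambda)            # canonical parameter
  b      <- exp(theta)             # b(theta) = e^theta
  c_term <- -lfactorial(y)         # c(y) = -log(y!)
  exp(y * theta - b + c_term)      # a(phi) = 1
}

y <- 0:15
all.equal(expfam_pois(y, lambda = 2.5), dpois(y, 2.5))
```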

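Mean and variance in the exponential family
It can be shown that the mean and variance in the exponential family are
$$E(Y) = \mu = b'(\theta), \qquad \operatorname{Var}(Y) = b''(\theta)\, a(\phi).$$
Since $\mu = b'(\theta)$ determines $\theta$, $b''(\theta)$ can be written as a function of the mean, the variance function $V(\mu)$, so that $\operatorname{Var}(Y) = V(\mu)\, a(\phi)$.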
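The relations $E(Y) = b'(\theta)$ and $\operatorname{Var}(Y) = b''(\theta)$ (with $a(\phi) = 1$) can be illustrated by differentiating $b$ numerically. This sketch uses central finite differences with ad hoc step sizes, so the results are approximate. It covers the Bernoulli case ($b(\theta) = \log(1 + e^\theta)$, $p = 0.3$) and the Poisson case ($b(\theta) = e^\theta$, $\lambda = 4$).

```r
# Central finite differences (step sizes are ad hoc choices)
d1 <- function(f, x, h = 1e-5) (f(x + h) - f(x - h)) / (2 * h)
d2 <- function(f, x, h = 1e-4) (f(x + h) - 2 * f(x) + f(x - h)) / h^2

# Bernoulli: b(theta) = log(1 + e^theta), theta = logit(p)
b_bern <- function(theta) log(1 + exp(theta))
theta_bern <- qlogis(0.3)
c(mean = d1(b_bern, theta_bern), var = d2(b_bern, theta_bern))  # about p and p(1 - p)

# Poisson: b(theta) = e^theta, theta = log(lambda)
theta_pois <- log(4)
c(mean = d1(exp, theta_pois), var = d2(exp, theta_pois))        # both about lambda
```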

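Mean and variance example: Poisson
For the Poisson model, $b(\theta) = e^{\theta}$ and $a(\phi) = 1$, so mean and variance are
$$E(Y) = b'(\theta) = e^{\theta} = \lambda, \qquad \operatorname{Var}(Y) = b''(\theta) = e^{\theta} = \lambda,$$
that is, $V(\mu) = \mu$. To summarize, for any given distribution we obtain a specific form of $b$, which in turn determines the variance function. The converse is also true: within the exponential family, the variance function determines $b$ and hence the distribution. Hence specifying a distribution and specifying a variance function are two sides of the same coin, as long as we work with exponential families.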

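Various variance functions
Normal: $V(\mu) = 1$
Poisson: $V(\mu) = \mu$
Binomial (for the proportion $Y/n$): $V(\mu) = \mu(1-\mu)$
Gamma: $V(\mu) = \mu^2$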
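In R, each family object stores its variance function as the component `variance`, so the list above can be checked directly. The mean values in `mu` are arbitrary.

```r
mu <- c(0.2, 0.5, 0.8)           # arbitrary mean values

gaussian()$variance(mu)          # 1 for every mean
poisson()$variance(mu)           # mu
binomial()$variance(mu)          # mu * (1 - mu), binomial on the proportion scale
Gamma()$variance(mu)             # mu^2
```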

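The link function
The link function $g$ relates the mean to the linear predictor: $g(\mu_i) = \eta_i$. Various link functions have been illustrated so far: the identity link $g(\mu) = \mu$ (linear normal model), the logit link $g(\mu) = \log\{\mu/(1-\mu)\}$ (logistic regression) and the log link $g(\mu) = \log\mu$ (Poisson regression).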
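R's `make.link` returns, for a named link, the link function `linkfun` ($g$) and its inverse `linkinv` ($g^{-1}$). The sketch loops over the three links named above and checks that $g^{-1}(g(\mu)) = \mu$ for a few arbitrary means in $(0, 1)$.

```r
mu <- c(0.1, 0.4, 0.9)            # arbitrary means in (0, 1)

for (name in c("identity", "logit", "log")) {
  lk  <- make.link(name)
  eta <- lk$linkfun(mu)                               # eta = g(mu)
  stopifnot(isTRUE(all.equal(lk$linkinv(eta), mu)))   # g^{-1}(g(mu)) = mu
  cat(name, ":", round(eta, 3), "\n")
}
```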

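Canonical link
For each distribution there is a specific link function that yields "nice" mathematical and numerical properties in connection with the estimation process. This link function is called the canonical link: it makes the canonical parameter equal to the linear predictor, $\theta_i = \eta_i$, i.e. $g = (b')^{-1}$. Examples: the identity for the normal, the logit for the binomial, the log for the Poisson and (up to sign) the inverse $1/\mu$ for the Gamma distribution.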
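R's default link for each of these families is the canonical one (for the Gamma, R's `"inverse"` link is $1/\mu$, which differs from the canonical $-1/\mu$ only in sign). The sketch prints the defaults. It also checks $g = (b')^{-1}$ for the Bernoulli case, where $b'(\theta) = e^\theta/(1+e^\theta)$ is `plogis`, and for the Poisson case, where $b'(\theta) = e^\theta$.

```r
fams <- list(gaussian(), binomial(), poisson(), Gamma())
sapply(fams, function(f) f$link)  # "identity" "logit" "log" "inverse"

mu <- c(0.2, 0.5, 0.9)
# Bernoulli: b'(theta) = plogis(theta), so (b')^{-1} = qlogis = logit link
all.equal(binomial()$linkfun(mu), qlogis(mu))
# Poisson: b'(theta) = exp(theta), so (b')^{-1} = log = log link
all.equal(poisson()$linkfun(mu), log(mu))
```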

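Specification of GLM
In practice, a GLM is specified in three steps: choose the distribution of the response (equivalently, the variance function), choose the link function, and specify the linear predictor, i.e. which covariates enter the model. In this connection it is important to be aware of the following: most statistical packages will by default use the canonical link function unless another one is explicitly provided.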
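A sketch of the three specification steps in R, using simulated Poisson counts (the seed and true coefficients 0.5 and 1.2 are hypothetical choices). Because the log link is canonical for the Poisson family, leaving the link out and stating it explicitly give the same fit.

```r
set.seed(1)
x <- runif(50)
y <- rpois(50, lambda = exp(0.5 + 1.2 * x))   # hypothetical simulated counts

# Step 1: distribution  -> Poisson (variance function V(mu) = mu)
# Step 2: link function -> log (the canonical link for the Poisson)
# Step 3: linear predictor -> eta = beta0 + beta1 * x
fit_default  <- glm(y ~ x, family = poisson)                # canonical link by default
fit_explicit <- glm(y ~ x, family = poisson(link = "log"))  # same link, stated explicitly

all.equal(coef(fit_default), coef(fit_explicit))
```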

R code
The glm function in R is used for fitting generalized linear models. The linear predictor is specified by a model formula, e.g. y ~ x1 + x2. The distribution and the link function are specified by the family argument, e.g. family = Gamma(link = log).
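A minimal sketch of the `family = Gamma(link = log)` example on simulated data. The sample size, seed, Gamma shape 5 and true coefficients (1, 0.5) are hypothetical choices. The Gamma draws are parameterized so that their mean is $\mu_i = \exp(1 + 0.5 x_i)$, so the fitted coefficients should be close to (1, 0.5).

```r
set.seed(2)
n  <- 200
x  <- runif(n)
mu <- exp(1 + 0.5 * x)                     # log link: log(mu) = 1 + 0.5 x
y  <- rgamma(n, shape = 5, rate = 5 / mu)  # mean = shape / rate = mu (shape 5 is hypothetical)

fit <- glm(y ~ x, family = Gamma(link = log))
coef(fit)                                  # close to (1, 0.5)
```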

Remember that the specification of a distribution yields a specific variance function. Not all possible combinations of a distribution and a link function are allowed in R.

Special aspects for binomial data
Simulate artificial Bernoulli observations with different event probabilities for two groups (the number of trials N is equal to 1):
R code
group <- rep(c("A", "B"), c(30, 45))
logit.pi <- ifelse(group == "B", 0.7, 0)  # value for group A was lost in the original; 0 (pi = 0.5) is assumed
group <- factor(group)
pi <- plogis(logit.pi)  # event probabilities on the probability scale
N <- rep(1, length(group))  # one trial per observation (Bernoulli)
events <- rbinom(length(group), size = N, prob = pi)
dat <- data.frame(group, N, events)

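Analysis of simulated data
Model: $\text{events}_i \sim \text{Bin}(N_i, \pi_i)$ with $\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 \cdot 1\{\text{group}_i = \text{B}\}$.
The response is a two-column matrix containing events and non-events:
f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat)
Alternatively, define proportions
dat$prop <- with(dat, events / N)
and use these as the response, with the number of trials N as weights in the fit:
f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat)
Since N = 1 here, the number of events (0 or 1) can also be used directly as the response:
f3 <- glm(events ~ group, family = binomial, data = dat)
All three specifications give the same parameter estimates.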
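A self-contained version of the simulation and the three fits, so the equivalence can be checked in one run. The seed `123` is arbitrary, and the logit value 0 for group A is an assumption (the original value was lost). The probability vector is named `prob` here to avoid masking R's constant `pi`.

```r
set.seed(123)
group <- factor(rep(c("A", "B"), c(30, 45)))
prob  <- plogis(ifelse(group == "B", 0.7, 0))  # logit 0 for group A is an assumption
N      <- rep(1, length(group))
events <- rbinom(length(group), size = N, prob = prob)
dat <- data.frame(group, N, events)
dat$prop <- with(dat, events / N)

f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat)
f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat)
f3 <- glm(events ~ group, family = binomial, data = dat)

rbind(f1 = coef(f1), f2 = coef(f2), f3 = coef(f3))  # identical rows
```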

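Fitting GLMs: logistic regression
Consider a data set where the response variable takes only the values 0 or 1 and the single covariate is continuous numerical. If we fit the simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ to such data, there are several problems: the fitted values $\hat\beta_0 + \hat\beta_1 x_i$ estimate probabilities but can fall outside $[0, 1]$; the errors cannot be normally distributed, since $Y_i$ takes only two values; and the variance $\pi_i(1-\pi_i)$ depends on the mean, so it is not constant. Conclusion: simple linear regression is not appropriate for regression data with binary responses.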
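The first problem is easy to reproduce. With the small hypothetical binary data set below, the least-squares line gives fitted "probabilities" below 0 and above 1, while the logistic fit stays inside $(0, 1)$.

```r
x <- 1:20
y <- c(rep(0, 8), 1, 0, 1, 0, rep(1, 8))  # hypothetical binary responses

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = binomial)

range(fitted(fit_lm))    # extends below 0 and above 1
range(fitted(fit_glm))   # stays inside (0, 1)
```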

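Logistic regression
The solution is to use the logistic function $F(z) = e^{z}/(1 + e^{z})$, which maps the real line into $(0, 1)$. The formal definition of the logistic model for a binary response with $p$ covariates is
$$P(Y_i = 1 \mid x_i) = \pi_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})},$$
or equivalently $\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$.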
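A sketch of the model with $p = 2$ covariates and hypothetical coefficients. It computes $\pi_i$ with the logistic function written out and checks it against R's `plogis`. It also confirms that the logit of $\pi_i$ returns the linear predictor. For large $|z|$ the written-out form can overflow, and `plogis` is the numerically safer choice.

```r
logistic <- function(z) exp(z) / (1 + exp(z))     # F(z) as on the slide

beta <- c(-0.5, 1.2, -0.7)                         # hypothetical (beta0, beta1, beta2)
X    <- cbind(1, c(0, 1, 2, 3), c(1, 0, 2, 1))     # hypothetical covariate values
eta  <- drop(X %*% beta)                           # linear predictor
prob <- logistic(eta)                              # pi_i in (0, 1)

all.equal(prob, plogis(eta))                       # same as R's built-in
all.equal(qlogis(prob), eta)                       # logit(pi_i) = eta_i
```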

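Logistic regression
How to interpret the model? In the logistic model, the odds of "success" are
$$\frac{\pi_i}{1-\pi_i} = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}),$$
so increasing $x_{ij}$ by one unit, with the other covariates held fixed, multiplies the odds by $\exp(\beta_j)$. The logistic model for binary data can be slightly modified to cover binomial data.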
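A numeric illustration of the odds-ratio interpretation, assuming hypothetical coefficients $\beta_0 = -1$, $\beta_1 = 0.6$. The ratio of the odds at $x + 1$ and at $x$ equals $\exp(\beta_1)$, whatever the value of $x$.

```r
beta0 <- -1; beta1 <- 0.6                 # hypothetical coefficients

odds <- function(x) {
  p <- plogis(beta0 + beta1 * x)          # pi(x)
  p / (1 - p)                             # odds of success
}

odds(2.5) / odds(1.5)                     # one-unit increase in x ...
exp(beta1)                                # ... multiplies the odds by exp(beta1)
```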

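Modified to cover binomial data
If $Y_i$ is the number of successes in $n_i$ independent trials with common success probability $\pi_i$, then $Y_i \sim \text{Bin}(n_i, \pi_i)$ with $\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$. Binary data are the special case $n_i = 1$.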
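One way to see that the binomial model is just the binary model applied to grouped data: fit the grouped counts with `cbind(successes, failures)`, then expand them to one 0/1 record per trial and fit again. The counts below are hypothetical. The coefficients agree up to the convergence tolerance of `glm`, hence the looser `tolerance`.

```r
x <- c(1, 2, 3, 4)
n <- c(10, 12, 8, 15)                     # hypothetical numbers of trials
y <- c(2, 5, 5, 12)                       # hypothetical numbers of successes

fit_binom <- glm(cbind(y, n - y) ~ x, family = binomial)

# Expand to one Bernoulli (0/1) record per trial
x_long <- rep(x, n)
y_long <- unlist(mapply(function(s, t) c(rep(1, s), rep(0, t - s)), y, n,
                        SIMPLIFY = FALSE))
fit_bern <- glm(y_long ~ x_long, family = binomial)

all.equal(unname(coef(fit_binom)), unname(coef(fit_bern)), tolerance = 1e-6)
```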

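Bernoulli and Poisson distribution
Likelihood for independent observations $y_1, \dots, y_n$:
$$L(p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} \quad \text{(Bernoulli)}, \qquad L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} \quad \text{(Poisson)}.$$
MLE estimates: setting the derivative of the log-likelihood to zero gives $\hat p = \bar y$ and $\hat\lambda = \bar y$.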
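The closed-form MLEs can be confirmed by maximizing the log-likelihoods numerically with `optimize`. The data vectors are hypothetical, with means 0.6 and 2.5. `optimize` has a default tolerance of about 1e-4, so the agreement is approximate.

```r
y_bern <- c(1, 0, 0, 1, 1, 0, 1, 1, 0, 1)   # hypothetical 0/1 data, mean 0.6
y_pois <- c(2, 4, 1, 0, 3, 5, 2, 3)         # hypothetical counts, mean 2.5

loglik_bern <- function(p) sum(dbinom(y_bern, size = 1, prob = p, log = TRUE))
loglik_pois <- function(lambda) sum(dpois(y_pois, lambda, log = TRUE))

p_hat      <- optimize(loglik_bern, c(0.001, 0.999), maximum = TRUE)$maximum
lambda_hat <- optimize(loglik_pois, c(0.001, 20), maximum = TRUE)$maximum

c(p_hat = p_hat, mean = mean(y_bern))
c(lambda_hat = lambda_hat, mean = mean(y_pois))
```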

Parameter estimation in GLMs

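IWLS algorithm
Iteratively weighted least squares algorithm: starting from initial values, repeat until convergence:
1. Compute the linear predictor $\eta_i = x_i^\top\hat\beta$ and the mean $\mu_i = g^{-1}(\eta_i)$.
2. Form the working response $z_i = \eta_i + (y_i - \mu_i)\, g'(\mu_i)$.
3. Form the weights $w_i = 1 / \{V(\mu_i)\, g'(\mu_i)^2\}$.
4. Update $\hat\beta$ by weighted least squares: $\hat\beta = (X^\top W X)^{-1} X^\top W z$, with $W = \operatorname{diag}(w_1, \dots, w_n)$.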
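A minimal sketch of the IWLS steps for logistic regression (logit link, $V(\mu) = \mu(1-\mu)$, so $g'(\mu) = 1/\{\mu(1-\mu)\}$ and $w_i = \mu_i(1-\mu_i)$). The function name `irls_logistic`, the zero starting values and the stopping rule are illustrative choices, not how `glm` is implemented internally. On the hypothetical data, the result is compared with `glm`.

```r
# Hypothetical helper: IWLS for logistic regression
irls_logistic <- function(X, y, tol = 1e-10, max_iter = 50) {
  beta <- rep(0, ncol(X))                      # starting values
  for (iter in seq_len(max_iter)) {
    eta     <- drop(X %*% beta)                # step 1: linear predictor
    mu      <- plogis(eta)                     #         mean g^{-1}(eta)
    g_prime <- 1 / (mu * (1 - mu))             # g'(mu) for the logit link
    z       <- eta + (y - mu) * g_prime        # step 2: working response
    w       <- 1 / (mu * (1 - mu) * g_prime^2) # step 3: weights (= mu (1 - mu))
    # step 4: weighted least squares, (X' W X)^{-1} X' W z
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    if (max(abs(beta_new - beta)) < tol) return(beta_new)
    beta <- beta_new
  }
  warning("IWLS did not converge")
  beta
}

x <- 1:20
y <- c(rep(0, 8), 1, 0, 1, 0, rep(1, 8))       # hypothetical binary data
beta_irls <- irls_logistic(cbind(1, x), y)
beta_glm  <- coef(glm(y ~ x, family = binomial))
rbind(irls = unname(beta_irls), glm = unname(beta_glm))
```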