Generalized Linear Models (GLMs) and Their Applications.


Motivation for GLMs
- Predict values of a single dependent or response variable (Y) using several independent or explanatory variables
- Treat Y as a random variable

Random Variables
- A random variable Z maps outcomes of an experiment to the real numbers
- Random variables can be discrete or continuous, and accordingly have a pmf or a pdf
- A pmf always sums to 1 over all possible values, and a pdf always integrates to 1 over the real line
- Pmfs and pdfs are always nonnegative
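As a quick check of the "sums to 1" property, here is a minimal Python sketch; the binomial distribution and its parameters below are chosen purely for illustration:

```python
from math import comb

# Illustrative example: verify that a Binomial(n=10, p=0.3) pmf sums to 1
# over its support {0, 1, ..., n}.
n, p = 10, 0.3

def binom_pmf(k, n, p):
    """P(Z = k) for a binomial random variable Z with parameters n, p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(round(total, 10))  # → 1.0, as required of a valid pmf
```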

Random Variables
- If two random variables R and S have the same pdf/pmf p, we say that R and S have the same distribution
- R and S are said to be independent if the probability of any event involving only R is unaffected by the occurrence of any event involving only S

Simple Linear Regression (SLR)
- Models a dependent or response variable (Y) based on an independent or explanatory variable (X)
- Fits a line running through a plot of the data points
- Y is random with a Normal distribution; X is fixed
- Draw a sample of size n from a population and construct the model from this sample
- The ith observation has an X value of Xᵢ and a Y value of Yᵢ
- The Yᵢ's are independent

Least Squares
- Yᵢ = αXᵢ + β + eᵢ, where eᵢ is a normal random variable representing error; thus Yᵢ is normally distributed by a property of normal random variables
- The SLR model has the form Ŷᵢ = αXᵢ + β
- Ŷᵢ is the predicted or fitted value of Yᵢ, the actual observation
- Least squares criterion: choose α and β so that ∑(Yᵢ − Ŷᵢ)² = ∑(Yᵢ − αXᵢ − β)² is minimized
- To find α and β, differentiate ∑(Yᵢ − Ŷᵢ)² with respect to α and with respect to β, and set both equations equal to 0
- The α and β that satisfy the least squares criterion are unique
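Setting those two derivatives to zero yields closed-form estimates, which can be sketched in a few lines of Python. The data points below are made up and lie exactly on y = 2x + 1, so the fit recovers the true slope and intercept:

```python
# Minimal sketch of the least-squares solution for Y_i = alpha*X_i + beta + e_i.
# The data are illustrative: points lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Setting the partial derivatives of sum((Y_i - alpha*X_i - beta)^2) to zero
# gives the familiar closed-form estimates:
alpha = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
beta = y_bar - alpha * x_bar

print(alpha, beta)  # → 2.0 1.0
```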

Example of SLR: Okun's Law
- Every 1% rise in unemployment causes GDP to fall about 2% below potential GDP (the level when all resources are fully utilized)
- Change in GDP is Y; change in the unemployment rate is X
- The following graph shows the change in US GDP and the US unemployment rate for every quarter of the sample period, with a regression line fit through the points
- Every quarter is a data point, so the sample size is 220

Example of SLR: Okun's Law
- We can write Okun's law as Ŷᵢ = 0.03 − 2Xᵢ for the US
- The intercept 0.03 means that at full employment (when X = 0), GDP increases by 3% a year
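A minimal sketch of using the fitted line to compute predicted values Ŷᵢ; the inputs below are illustrative, not the slide's data:

```python
# Sketch of the fitted Okun's-law line Yhat_i = 0.03 - 2*X_i from the slide,
# used to predict GDP change from a change in unemployment.
def predicted_gdp_change(delta_unemployment):
    """Fitted value Yhat_i = 0.03 - 2 * X_i."""
    return 0.03 - 2.0 * delta_unemployment

print(predicted_gdp_change(0.0))   # full employment: GDP grows 3% a year (0.03)
print(predicted_gdp_change(0.01))  # 1-point rise in unemployment: about 0.01
```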

Generalized Linear Models (GLM)
- The Yᵢ's are independent and come from the same type of distribution, but do NOT have the same parameters
- Each Yᵢ is from the exponential family of distributions
- μ is the vector of means of the Yᵢ's and has size n × 1
- There is an invertible function g such that g(μ) = Xβ
- We use g to transform μ in such a way that we can estimate it using a linear combination of the explanatory variables

Generalized Linear Models (GLM)
- g is called the link function, and it depends on what we assume the response distribution is
  – If the response distribution is Normal, g is the identity function and our model is μ = Xβ
  – This is the model for Multiple Linear Regression (MLR)
- X is called the design matrix and has size n × p, so the sample size is n and there are p explanatory variables
- β is the vector of parameters and has size p × 1
- β can be estimated using maximum likelihood
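For the Normal response with identity link, the maximum-likelihood estimate of β coincides with least squares. A minimal NumPy sketch on synthetic data; the design matrix and the true β below are assumptions for illustration:

```python
import numpy as np

# With a Normal response and identity link, the GLM reduces to multiple
# linear regression mu = X @ beta. Estimate beta on made-up data.
rng = np.random.default_rng(0)
n, p = 100, 3
# Design matrix: a column of ones (intercept) plus p-1 random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])  # assumed for this illustration
y = X @ beta_true  # noiseless here for a clean check; real data adds error

# Least-squares / maximum-likelihood estimate under the Normal model:
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))  # recovers beta_true on noiseless data
```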

Example of GLM: Binary Variables and Logistic Regression
- n independent response variables Y₁, …, Yₙ, each with a binomial distribution
- Binomial distribution: Yᵢ represents the number of successes in nᵢ independent trials, where the probability of success in each trial is πᵢ
- Since πᵢ is a probability, it has to be between 0 and 1
- We can model πᵢ by using a function whose range is between 0 and 1

Example of GLM: Binary Variables and Logistic Regression
- Example: the proportion of insects killed at varying dosages of a pesticide
- The link function g is called the logit function: g(πᵢ) = log(πᵢ / (1 − πᵢ))
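A sketch of fitting this model by Newton-Raphson, which for the logit link is equivalent to the iteratively reweighted least squares used for GLMs. The dose-response data below are made up for illustration, not the bioassay data from the slide:

```python
import numpy as np

# Logistic regression fit by Newton-Raphson on illustrative data:
# one binary outcome (killed or not) per insect at each dose.
dose = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
killed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

X = np.column_stack([np.ones_like(dose), dose])  # intercept + dose
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta                    # linear predictor X beta
    pi = 1.0 / (1.0 + np.exp(-eta))   # inverse logit: pi = e^eta / (1 + e^eta)
    W = pi * (1 - pi)                 # binomial variance weights
    grad = X.T @ (killed - pi)        # score (gradient of log-likelihood)
    hess = X.T @ (X * W[:, None])     # Fisher information
    beta = beta + np.linalg.solve(hess, grad)

# Fitted probabilities come back through the inverse link,
# so each pi_hat lies between 0 and 1 as required.
pi_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
print(np.round(beta, 3))
```

At convergence the score is zero, which is the defining property of the maximum-likelihood estimate the slides describe.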