Binary logistic regression

Characteristics
- Regression model for a categorical target variable
- Explanatory variables can be continuous or categorical
- Estimates the probability that the dependent variable falls into a given category
- The solution can be interpreted
- Sensitive to multicollinearity
- Demanding in terms of data preparation

Applications
- In general: a response model that predicts the probability of a response
- Predicting the probability of losing a certain type of client
- Predicting fraud
- Predicting the purchase of certain goods
- …

Logistic regression model I
Binary dependent variable Y:
- 1 … the event occurs
- 0 … the event does not occur
How does P(Y=1) depend on the values of the independent variables?

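Logistic regression model II
- In the classic linear regression model, the predicted value lies in the interval $(-\infty, \infty)$.
- For a binary dependent variable, the predicted value is interpreted as the probability $P = P(Y=1)$, which lies between 0 and 1.
- Therefore the probability cannot be expressed as a simple linear combination of the inputs, whose values are unbounded.
- Odds ("chance"): $\frac{P}{1-P}$, interval $(0, \infty)$
- Logit: $\ln\frac{P}{1-P}$, interval $(-\infty, \infty)$, the same interval as a linear combination of the inputs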
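A minimal Python sketch of the two transformations above, showing how a probability in (0, 1) is mapped to odds in (0, ∞) and then to a logit in (-∞, ∞). The function names `odds` and `logit` are illustrative, not taken from any particular library.

```python
import math

def odds(p: float) -> float:
    """Odds ("chance") P/(1-P); maps (0, 1) onto (0, inf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Logit ln(P/(1-P)); maps (0, 1) onto (-inf, inf)."""
    return math.log(odds(p))

# Probabilities below 0.5 give negative logits, above 0.5 positive ones.
for p in (0.1, 0.5, 0.9):
    print(f"P={p:.1f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

Note the symmetry: P and 1-P have logits of equal size and opposite sign.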

Logistic regression model III
The logit of $P$ is expressed as a weighted sum of the values of the independent variables:
$\ln\frac{P}{1-P} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
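The slides do not say how the weights are estimated; the usual approach is maximum likelihood. The sketch below assumes a single continuous input and fits $b_0$ and $b_1$ by Newton-Raphson in plain Python. The data and the helper name `fit_logistic_1d` are hypothetical, and the method assumes the two classes are not perfectly separable (otherwise the estimates diverge).

```python
import math

def fit_logistic_1d(x, y, iterations=25):
    """Maximum-likelihood fit of logit(P) = b0 + b1*x by Newton-Raphson.

    Sketch for a single continuous predictor; assumes the classes are
    not perfectly separable by x.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and information matrix (h = negative Hessian)
        # of the log-likelihood at the current coefficients.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: b += h^{-1} g, using the explicit 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = some score, y = 1 if the event occurred.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
print(f"logit(P) = {b0:.3f} + {b1:.3f} * x")
```

At the maximum, the score equations hold: the residuals $y_i - P_i$ sum to zero and are uncorrelated with $x$. With several inputs the same idea is usually implemented with matrix algebra (iteratively reweighted least squares).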

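Regression relation: probability, odds (chance) and logit
Solving the logit equation for $P$ gives the logistic function:
$P = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_k x_k)}}$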
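The logistic function turns the weighted sum back into a probability. This is a small sketch with hypothetical coefficients; the helper names `logistic` and `predict_probability` are illustrative. The sign branch is a common way to avoid overflow in `math.exp` for large negative inputs.

```python
import math

def logistic(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)); maps (-inf, inf) onto (0, 1)."""
    # Branch on the sign of z so math.exp is only called with z <= 0.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(coefs, x):
    """P(Y=1) for inputs x, given coefficients [b0, b1, ..., bk]."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return logistic(z)

# Hypothetical coefficients for two inputs.
coefs = [-1.0, 0.8, -0.5]
print(predict_probability(coefs, [2.0, 1.0]))
```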

Categorical input variable: contrasts (contrast type: Indicator)

              X1  X2  X3
Category 1     1   0   0
Category 2     0   1   0
Category 3     0   0   1
Category 4     0   0   0   reference category
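The indicator table above can be generated mechanically. In this sketch, `indicator_coding` is a hypothetical helper: it creates one 0-1 column per non-reference category, so the reference category is coded as all zeros.

```python
def indicator_coding(categories, reference):
    """Map each category to a 0-1 vector; the reference category is all zeros.

    Returns {category: [x1, x2, ...]} with one column per non-reference
    category, in the order the categories are given.
    """
    non_ref = [c for c in categories if c != reference]
    return {c: [1 if c == other else 0 for other in non_ref] for c in categories}

coding = indicator_coding(
    ["Category 1", "Category 2", "Category 3", "Category 4"],
    reference="Category 4",
)
for category, row in coding.items():
    print(category, row)
```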

Contrasts I
- Convert a categorical variable into several numerical variables (for example, 0-1 variables)
- Choose the contrasts with respect to the desired interpretation:
  - ordinal vs. nominal variables
  - a reference category is not needed in all cases
- The contrast specification does not influence the predictions
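The last point can be checked on the simplest case. Assuming a single categorical input with indicator coding (one coefficient per category), the maximum-likelihood fit reproduces the observed event rate in each category, so $b_0 = \operatorname{logit}(p_{ref})$ and $b_j = \operatorname{logit}(p_j) - \operatorname{logit}(p_{ref})$. The sketch below, with hypothetical rates and helper names, shows that changing the reference category changes the coefficients but not the predicted probabilities. It assumes every category contains both outcomes (0 < rate < 1).

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def indicator_fit(rates, reference):
    """Closed-form ML coefficients for one categorical input, indicator coding.

    With one coefficient per category the fit reproduces the observed
    rate in each category. Assumes 0 < rate < 1 for every category.
    """
    b0 = logit(rates[reference])
    effects = {c: logit(p) - b0 for c, p in rates.items() if c != reference}
    return b0, effects

def predict(b0, effects, category):
    # The reference category has no dummy, so only the intercept remains.
    z = b0 + effects.get(category, 0.0)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical observed event rates per category.
rates = {"A": 0.2, "B": 0.5, "C": 0.7}
for ref in ("A", "C"):
    b0, effects = indicator_fit(rates, ref)
    preds = {c: round(predict(b0, effects, c), 6) for c in rates}
    print(f"reference={ref}  b0={b0:+.3f}  predictions={preds}")
```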

Contrasts II
a) Indicator: each category is a 0-1 variable; the last or the first category is omitted (reference category)
b) Simple: each category (except the reference category) is compared with the reference category
c) Repeated: each category (except the first) is compared with the previous category
d) Difference: each category (except the first) is compared with the average effect of the previous categories
Contrasts c) and d) are suited to ordinal variables.
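The comparisons b) to d) can be written directly in terms of per-category effects, for example the log-odds of the event in each category. This sketch only computes the comparisons the slide describes; it is not the internal design-matrix coding of any particular statistics package, and the function names and log-odds values are hypothetical.

```python
def simple_contrasts(effects, reference_index=-1):
    """Each category (except the reference) minus the reference category."""
    ref_pos = reference_index % len(effects)
    ref = effects[ref_pos]
    return [e - ref for i, e in enumerate(effects) if i != ref_pos]

def repeated_contrasts(effects):
    """Each category (except the first) minus the previous category."""
    return [effects[i] - effects[i - 1] for i in range(1, len(effects))]

def difference_contrasts(effects):
    """Each category (except the first) minus the mean of all previous ones."""
    return [effects[i] - sum(effects[:i]) / i for i in range(1, len(effects))]

# Hypothetical log-odds of the event for four ordered categories.
log_odds = [-1.2, -0.6, 0.0, 0.9]
print(simple_contrasts(log_odds))  # reference = last category
print(repeated_contrasts(log_odds))
print(difference_contrasts(log_odds))
```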

Data preparation
- LR is sensitive to multicollinearity, so it is necessary to reduce the number of variables
- Pay special attention to:
  - missing values
  - extreme values
- In practice, the input variables are often (all) categorized
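A rough screening step for the two issues above might look like this: count missing values per input and flag pairs of inputs with high pairwise correlation as multicollinearity candidates. This is a minimal sketch. Pairwise Pearson correlation is only one simple check, the 0.8 threshold is an illustrative assumption, and the column names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equally long numeric lists (nan if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)

def screen_inputs(columns, threshold=0.8):
    """Missing-value counts and highly correlated pairs of inputs.

    columns: dict name -> list of values, with None marking a missing value.
    Each correlation uses only rows where both inputs are present.
    """
    missing = {name: sum(v is None for v in vals) for name, vals in columns.items()}
    flagged = []
    for (n1, v1), (n2, v2) in combinations(columns.items(), 2):
        pairs = [(a, b) for a, b in zip(v1, v2) if a is not None and b is not None]
        if len(pairs) < 3:
            continue  # too few complete rows to judge
        r = pearson([a for a, _ in pairs], [b for _, b in pairs])
        if abs(r) >= threshold:  # False for nan
            flagged.append((n1, n2, round(r, 3)))
    return missing, flagged

# Hypothetical input columns.
columns = {
    "income": [20, 35, 50, None, 80, 95],
    "spending": [18, 30, 47, 60, 75, 90],
    "age": [45, 23, 38, 51, None, 29],
}
missing, flagged = screen_inputs(columns)
print(missing)
print(flagged)
```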

Categorization of variables
- A possible way to smooth extreme values
- Categorization:
  - based on experts
  - based on quantiles
  - optimal categorization with respect to the target variable
- A categorized variable should not have too many categories, because that causes mutual relations between the variables
- Merging of categories
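Quantile-based categorization and merging of sparse categories can be sketched with the standard library alone. In this sketch an extreme value simply falls into the top category, which is how categorization smooths extremes. The greedy rule in `merge_sparse_bins` (accumulate ordered categories until each group has at least `min_count` members, then fold any leftover tail into the last group) is an illustrative assumption, not the method from the slides. The values are hypothetical.

```python
import bisect
import statistics
from collections import Counter

def quantile_cut_points(values, n_bins=4):
    """Cut points that split values into n_bins groups of roughly equal size."""
    return statistics.quantiles(values, n=n_bins)

def categorize(values, cuts):
    """Category index 0..len(cuts) for each value; extremes land in end bins."""
    return [bisect.bisect_right(cuts, v) for v in values]

def merge_sparse_bins(bins, min_count):
    """Greedily merge adjacent ordered categories until each has >= min_count."""
    counts = Counter(bins)
    groups, current, size = [], [], 0
    for label in sorted(counts):
        current.append(label)
        size += counts[label]
        if size >= min_count:
            groups.append(current)
            current, size = [], 0
    if current:  # leftover sparse tail: fold it into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    mapping = {label: g for g, members in enumerate(groups) for label in members}
    return [mapping[b] for b in bins]

# Hypothetical values with one extreme observation (950).
values = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35, 40, 950]
cuts = quantile_cut_points(values)
print(cuts)
print(categorize(values, cuts))
print(merge_sparse_bins([0, 0, 0, 1, 2, 2, 2, 3], min_count=3))
```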