Logistic Regression using Excel OLS with Nudge

Slides:



Advertisements
Similar presentations
Linear Regression.
Advertisements

Brief introduction on Logistic Regression
Correlation and regression
Logistic Regression Psy 524 Ainsworth.
Logistic Regression.
Logistic Regression STA302 F 2014 See last slide for copyright information 1.
x – independent variable (input)
Decision Tree Rong Jin. Determine Milage Per Gallon.
Introduction to Logistic Regression. Simple linear regression Table 1 Age and systolic blood pressure (SBP) among 33 adult women.
Maximum Likelihood We have studied the OLS estimator. It only applies under certain assumptions In particular,  ~ N(0, 2 ) But what if the sampling distribution.
An Introduction to Logistic Regression
2015 USCOTS V1 1 Milo Schield, Augsburg College Member: International Statistical Institute US Rep: International Statistical Literacy Project Director,
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.
Second Stats Course: What should it be? Marc Isaacson Augsburg College Dept of Business Administration MILO SCHIELD Augsburg College Dept of Business Administration.
Chapter 6 Regression Algorithms in Data Mining
2014-Schield-DSI-Stats-Curricullum V0A 1 by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project.
Name: Angelica F. White WEMBA10. Teach students how to make sound decisions and recommendations that are based on reliable quantitative information During.
01/20151 EPI 5344: Survival Analysis in Epidemiology Maximum Likelihood Estimation: An Introduction March 10, 2015 Dr. N. Birkett, School of Epidemiology,
Chap 12-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 12 Introduction to Linear.
Chapter 9 Analyzing Data Multiple Variables. Basic Directions Review page 180 for basic directions on which way to proceed with your analysis Provides.
LOGISTIC REGRESSION A statistical procedure to relate the probability of an event to explanatory variables Used in epidemiology to describe and evaluate.
Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith.
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez.
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Jennifer Lewis Priestley Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” co-authored.
2015 ASA V2A 1 by Milo Schield, Augsburg College Member: International Statistical Institute US Rep: International Statistical Literacy Project Director,
Issues in Estimation Data Generating Process:
Model Regress Linear 3Factor Excel 2013 V0F 1 by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project.
2015 ASA V4 1 by Milo Schield, Augsburg College Member: International Statistical Institute US Rep: International Statistical Literacy Project Director,
Logistic Regression Saed Sayad 1www.ismartsoft.com.
Model Trendline Linear 3Factor Excel by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
LECTURE 06: CLASSIFICATION PT. 2 February 10, 2016 SDS 293 Machine Learning.
Logistic Regression An Introduction. Uses Designed for survival analysis- binary response For predicting a chance, probability, proportion or percentage.
Lecture 3 (Chapter 4). Linear Models for Longitudinal Data Linear Regression Model (Review) Ordinary Least Squares (OLS) Maximum Likelihood Estimation.
Chapter 13 LOGISTIC REGRESSION. Set of independent variables Categorical outcome measure, generally dichotomous.
Statistics Critical Thinking in Intro Stats Roger Woodard.
Advanced Data Analytics
Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4
32931 Technology Research Methods Autumn 2017 Quantitative Research Component Topic 4: Bivariate Analysis (Contingency Analysis and Regression Analysis)
Statistics 200 Lecture #9 Tuesday, September 20, 2016
BINARY LOGISTIC REGRESSION
Logistic Regression When and why do we use logistic regression?
PDF, Normal Distribution and Linear Regression
Classification Methods
Statistical Data Analysis - Lecture /04/03
Experimental Psychology
Probability & Statistics
Chapter 3: Describing Relationships
Logistic regression One of the most common types of modeling in the biomedical literature Especially case-control studies Used when the outcome is binary.
QM222 Class 14 Today’s New topic: What if the Dependent Variable is a Dummy Variable? QM222 Fall 2017 Section A1.
David M. Levine, Baruch College (CUNY)
Statistical Fallacies Catastrophes & Contributions,
Hot Hand: Better than Chance
Scatter Plots of Data with Various Correlation Coefficients
Using real world media examples to teach statistical literacy
Linear Model Selection and regularization
Why Did Clinton Lose? Why Did Trump Win?
Introduction to Logistic Regression
Lexico-grammar: From simple counts to complex models
11C Line of Best Fit By Eye, 11D Linear Regression
CORRELATION AND MULTIPLE REGRESSION ANALYSIS
Count Models 2 Sociology 8811 Lecture 13
US Infant Mortality Rate: National Embarrassment?
Why 25% of Voter Polls are Wrong
Chapter 6 Logistic Regression: Regression with a Binary Dependent Variable Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall.
Logistic Regression.
Maximum Likelihood We have studied the OLS estimator. It only applies under certain assumptions In particular,  ~ N(0, 2 ) But what if the sampling distribution.
Statistically-Significant Correlations
Classification Methods
Presentation transcript:

Logistic Regression using Excel OLS with Nudge Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Logistic Regression using Excel OLS with Nudge Milo Schield, Augsburg College Elected Member: International Statistical Institute US Rep: International Statistical Literacy Project VP. National Numeracy Network JSM Philadelphia July 31, 2017 www.StatLit.org/pdf/2017-Schield-ASA-Slides.pdf 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 1 1 1

Logistic Regression (LR) is Common and Important Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Logistic Regression (LR) is Common and Important Yes/No decisions (binary outcomes) are common in Marketing: Predicting whether someone will buy Finance: Deciding whether to grant a loan Medicine: Determining whether one has a condition Epidemiology: Identifying related factors to an outcome Logistic regression is the most common way of modelling binary outcomes. It is one of the main topics in Stat 200. It is almost never taught in Stat 100. But it should be!!! 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 2 2 2

Why Isn’t Logistic Regression Taught in Intro Course? Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Why Isn’t Logistic Regression Taught in Intro Course? LR isn’t taught in Stat 100 for several reasons: Complexity: Maximum likelihood estimation is complex as are odds, log-odds and quality measures. Availability: Not available in Excel or on calculators. Infinity: |Log(Odds)| goes to infinity when p=0 or p=1 Non-analytic: Requires trial & error to find best solution. Time: No extra time for extra topics in Intro Statistics. 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 3 3 3

The Data: Height and Gender Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 The Data: Height and Gender . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 4 4 4

Simple Model #1: Connect the Mean Heights Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Simple Model #1: Connect the Mean Heights . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 5 5 5

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Simple Model #2: Linear . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 6 6 6

Simple Model #3: Logistic Curve Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Simple Model #3: Logistic Curve . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 7 7 7

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Simple Solution #4 This simple solution involves two shortcuts: Use the logistic function, but nudge the zero-one data to be epsilon and one minus epsilon. This ‘nudge’ eliminates the infinities in Ln[Odds(p)]. Use the Ordinary Least Squares (OLS) in place of Maximum Likelihood Estimation (MLE). This eliminates the need for industrial-strength software. Benefits: This allows more attention to the results and to subsequent topics such as confounding and classification. 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 8 8 8

Ln[Odds(Nudged Prob)] Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Ln[Odds(Nudged Prob)] . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 9 9 9

OLS Results: Regress Gender on Height Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 OLS Results: Regress Gender on Height . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 10 10 10

Translate back to P-space; Plot Probability vs. Height Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Translate back to P-space; Plot Probability vs. Height . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 11 11 11

How close is OLS to MLE? Height: Fairly close… Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 How close is OLS to MLE? Height: Fairly close… . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 12 12 12

MLE vs. OLS+Nudge: Significant Difference? No! Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 MLE vs. OLS+Nudge: Significant Difference? No! . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 13 13 13

How close is OLS to MLE? Weight: Fairly close… Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 How close is OLS to MLE? Weight: Fairly close… . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 14 14 14

MLE vs. OLS+Nudge: No Significant Difference Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 MLE vs. OLS+Nudge: No Significant Difference . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 15 15 15

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Quotes “the maximized log likelihood method has always impressed me as an exercise in excessive fine-tuning, reminiscent on some occasions of what Alfred North Whitehead identified as the fallacy of misplaced concreteness, and on others of what Freud described as the narcissism of small differences.” Comparing exact MLE with OLS regression of Ln[Odds(p)] where p is for grouped data: “The second reason is that in most real-world cases there is little if any practical difference between the results of the two methods.” Richard Lowry, Vassar. http://vassarstats.net/logreg1.html 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 16 16 16

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News StatLit for Managers Statistical Literacy for ManagersStatLit for Managers 15 May 2008 2013 1 March 20132013 Recommendation Those teaching intro statistics needs to think broadly. Going deeper is good for those who plan to continue on. But almost none of those taking Stat 101 will take Stat 201. Introducing logistic regression using OLS is simple. The difference between MLE and OLS may not be significant . Introducing logistic regression in STAT 101 opens the door for other multivariate items such as confounding, classification analysis and discriminant analysis. 2008SchieldNNN6up.pdf 2013Schield-MBAA www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 17 17 17

So Why Won’t It Be Taught? Analyzing Numbers in the News StatLit for Managers Statistical Literacy for ManagersStatLit for Managers 15 May 2008 2013 1 March 20132013 So Why Won’t It Be Taught? OLS is not right in this case. We don’t want to teach our students bad methods. This OLS+nudge shortcut has a serious lack of rigor. This is unprofessional; we shouldn’t allow it. Reply: Lack of rigor vs. rigor mortis? Can the perfect be the enemy of the good? What is our goal? For students to understand some important ideas or be taught correctly even if they don’t understand? 2008SchieldNNN6up.pdf 2013Schield-MBAA www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 18 18 18

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Conclusion Focus on GAISE 2017 goals. Multivariate thinking More focus on confounding See Schield (2016) Offering Stat 102: Social Statistics for Decision Makers. http://www.statlit.org/pdf/2016-Schield-IASE.pdf 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 19 19 19

Much More Important Issues Un-Scientific American (2017) Statistical Literacy for ManagersStatLit for Managers StatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Much More Important Issues Un-Scientific American (2017) . 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 20 20 20

Much More Important Issues Un-Scientific American Analyzing Numbers in the News StatLit for Managers Statistical Literacy for ManagersStatLit for Managers 15 May 2008 2013 1 March 20132013 Much More Important Issues Un-Scientific American Three strikes and you are out! Association is not statistically significant Association is not materially significant Author knows that both of these are true, yet puts the association in the headline to the story Moral: Statistical educators need to put more attention on misuses of statistics in the everyday media. To do less is professional negligence. 2008SchieldNNN6up.pdf 2013Schield-MBAA www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 21 21 21

Statistical Literacy for ManagersStatLit for Managers Analyzing Numbers in the News 15 May 2008 2013 1 March 20132013 Bibliography Carlberg, Conrad (2012). Decision Analytics: Microsoft Excel. Que Publishing. Lowry, R. (2017). E-mail http://vassarstats.net/logreg1.html Moore, David (2001). Statistical Literacy and Statistical Competence in the New Century. IASE Proceedings. http://iase-web.org/documents/papers/sat2001/Moore.pdf Schield, Milo (2017). Tools at www.StatLit.org/tools.htm Schield, Milo (2016). Logistic Regression using Minitab and Pulse dataset. http://www.statlit.org/pdf/2016- Minitab-MLE1-Test1.pdf 2008SchieldNNN6up.pdf www.StatLit.org/pdf/2013-Schield-MBAA-6up.pdf2013Schield-MBAA 2013Schield-MBAA 22 22 22