

Understanding regression

2 A regression is an average. Experiment: Imagine that you are looking at people coming through a door. Imagine also that you had “metric eyes” (rather like Superman’s x-ray vision) and could accurately estimate the height of each person as they passed through. After 10 people had gone through the door, what would be the best prediction for the height of the eleventh person? Answer: the average. This is why the “average” is also called the “expected value.”

3 The expected value of the height of the 11th person is the average of the previous 10.
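
A minimal sketch of the door experiment, assuming ten made-up heights (the slides give no data). It computes the average and checks that nearby guesses do worse on squared error, which is the sense in which the average is the "best" prediction here.

```python
from statistics import mean

# Hypothetical heights (cm) of the first ten people through the door.
heights = [158, 162, 165, 167, 168, 170, 171, 173, 176, 180]

# The prediction for the eleventh person is the average of the first ten.
prediction = mean(heights)

def sum_squared_error(guess, observed):
    """Total squared distance between one guess and every observed height."""
    return sum((h - guess) ** 2 for h in observed)

# Guesses 5 cm below or above the average fit the observed heights worse.
best = sum_squared_error(prediction, heights)
worse_low = sum_squared_error(prediction - 5, heights)
worse_high = sum_squared_error(prediction + 5, heights)

print(prediction)  # 169
```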

4 Imagine that as you are estimating the height of the persons coming through the door, you also note their gender. Information on gender improves our ability to predict height.
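
To illustrate how gender information can sharpen the prediction, here is a sketch with six invented observations. It compares the squared error of predicting everyone with the overall average against predicting each person with the average for their gender; the data and names are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical observations: (gender, height in cm).
observations = [
    ("woman", 160), ("man", 172), ("woman", 166),
    ("man", 168), ("woman", 169), ("man", 170),
]

# One prediction for everyone: the overall average.
overall = mean(h for _, h in observations)

# One prediction per gender: the average within each group.
by_gender = {
    g: mean(h for gender, h in observations if gender == g)
    for g in ("woman", "man")
}

# Squared prediction error with and without the gender information.
sse_overall = sum((h - overall) ** 2 for _, h in observations)
sse_by_gender = sum((h - by_gender[g]) ** 2 for g, h in observations)

print(overall, by_gender)          # 167.5 {'woman': 165, 'man': 170}
print(sse_overall, sse_by_gender)  # 87.5 50
```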

5 Regression
Two basic purposes:
– Explanation
– Prediction
Regression is an efficient way to analyze the structure of the data. A regression model is a sentence that connects the average or expected value of something (a person’s height) to several variables at once (multivariate analysis).

6 The regression sentence. The regression equation may be read as a sentence that summarizes the simultaneous influence of independent variables (causes or drivers) on a single dependent variable (effect or outcome). Here is a simple, single-variable model: Height = 165 + 5D (D = 1 for a man and 0 for a woman). The regression sentence: The predicted (expected) height for people coming through the door is 165 cm plus 5 cm if that person is a man. In other words: Women have an expected height of 165 cm and men have an expected height of 170 cm. (5 is the regression coefficient on D.)
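
The link between a dummy-variable regression and group averages can be sketched as follows. The six data points are invented so that women average 165 cm and men 170 cm; `fit_line` is a hypothetical helper implementing the textbook one-variable least-squares formulas, not anything taken from the slides.

```python
# Hypothetical sample: D = 1 for a man, 0 for a woman; heights in cm.
d = [0, 1, 0, 1, 0, 1]
height = [160, 172, 166, 168, 169, 170]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(d, height)

# The intercept is the women's average; the slope is the men's average minus it.
print(f"Height = {intercept:.0f} + {slope:.0f}D")  # Height = 165 + 5D
```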

7 Adding variables
Adding more variables conditions our prediction (expectation) for the height of people. Typical variables could include:
– number of litres of milk consumed per week
– income of parents ($’000s)
– kilometres above sea level at birth

8 [Figure: scatter plot of HEIGHT (cm) against number of litres of milk consumed each week.] Height = 100 + 15L. For every litre consumed, expected height increases by 15 cm. No milk consumption implies an expected height of 100 cm. Someone who drinks 20 litres of milk each week has an expected height of 400 cm.
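
The arithmetic of the milk model can be checked with a short sketch; the function name is an assumption. The 400 cm result follows mechanically from the equation even though it is physically implausible, which suggests caution when a fitted line is applied far outside the range of the data.

```python
def expected_height_cm(litres_per_week):
    """Expected height from the milk model: 100 cm plus 15 cm per litre."""
    return 100 + 15 * litres_per_week

print(expected_height_cm(0))   # 100
print(expected_height_cm(20))  # 400
```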

9 Regression sentences. An earnings regression simply relates expected earnings to several variables. Y = 6,000 + […] AGE + […] YEARS_ED (Y = annual income). “Expected annual income for the sample is $6,000 plus […] times AGE plus […] times years of education.” A 30-year-old with 12 years of education can expect to earn: $6,000 + […](30) + […](12) = $24,021. For every year of education, annual salary increases by $[…] (the regression coefficient on YEARS_ED).

10 Example - LMAPD impact analysis. Wanted to associate labour market programming with outcome. Wanted to assess the presence and intensity of programming. Built a regression sentence that expressed this relationship: Hours = a_1 + a_2 Female + a_3 Aboriginal + … + a_(k-1) Employ + a_k # Employ Worked Inter. Inter. Output appears more complicated, but follows the same principles.

11 Ex. LMAPD: Estimating VR counselling hours (LMAPD VRhours) Admin data includes total cost of services spent by the VR program on a particular client, but it does not include the cost of VR counselling. To estimate VR counselling costs per client, 281 VR clients with currently active VR counsellors were selected. VR counsellors were provided a short questionnaire including the following question to be answered for each VR client: On average, over the entire time that you have been this client’s counsellor, how many hours per month did you spend on this client’s case?

12 Ex. LMAPD VRhours Surveys for 270 clients were returned. Information from the surveys was merged with the administrative data. The next step was to run a regression using the sample of 270 VR clients to calculate the coefficients for the independent variables (from the admin data) to estimate VR counselling costs for the entire sample of VR clients (n=1,062).
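
The fit-on-the-surveyed-subsample, predict-for-everyone workflow can be sketched as below. This is a deliberately simplified illustration: it uses a single predictor (age) and invented numbers, whereas the slides describe a regression on many administrative variables. `fit_line` is a hypothetical helper, not part of the original analysis.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical surveyed clients: (age from admin data, hours/month from survey).
surveyed = [(25, 3.0), (35, 2.5), (45, 2.0), (55, 1.5)]

# Hypothetical full client list: admin data only, no survey answer.
all_client_ages = [20, 25, 30, 35, 45, 55, 60]

# Step 1: estimate the coefficients on the surveyed subsample.
intercept, slope = fit_line([a for a, _ in surveyed], [h for _, h in surveyed])

# Step 2: apply those coefficients to every client's admin data.
estimated_hours = [intercept + slope * age for age in all_client_ages]
```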

13 Ex. LMAPD VRhours
Dependent variable: Average monthly time in hours spent by VR counsellors on the clients’ files (survey question)
Independent variables:
– Demographic: gender, Aboriginal status, minority status, age, disability type
– Service data: urban/rural service delivery region, organization that delivered services

14 Ex. LMAPD VRhours: Independent variables
Variable                           Type          Mean
(Male gender)                      M.E. dummy    0.61
Female gender                      M.E. dummy    0.39
(Non-Aboriginal)                   M.E. dummy    0.98
Aboriginal                         M.E. dummy    0.02
(Non-minority)                     M.E. dummy    0.99
Minority                           M.E. dummy    0.01
Age                                Continuous    35.09
Cognitive disability               N.E. dummy    0.17
Physical disability                N.E. dummy    0.30
Psychiatric disability             N.E. dummy    0.28
Hearing disability                 N.E. dummy    0.09
Vision disability                  N.E. dummy    0.13
Learning disability                N.E. dummy    0.14
(Urban service delivery region)    M.E. dummy    0.69
Rural service delivery region      M.E. dummy    0.31
(Provincial service delivery)      M.E. dummy    0.52
SMD service delivery               M.E. dummy    0.31
CPA service delivery               M.E. dummy    0.06
CNIB service delivery              M.E. dummy    0.12

15 Ex. LMAPD VRhours: Independent variables
Variables in parentheses (X) are the dummy variables excluded from the regression.
Types of variables:
– Continuous
– Mutually exclusive (M.E.) dummy variable
– Not mutually exclusive (N.E.) dummy variable
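
A sketch of how the three variable types might be coded for one client. The function and key names are assumptions; only the category labels come from the slides. Among mutually exclusive categories one is left out as the reference (here provincial service delivery), while the disability flags are not mutually exclusive, so a client can have several set to 1.

```python
# Disability types are not mutually exclusive: each gets its own 0/1 flag.
DISABILITY_TYPES = ["cognitive", "physical", "psychiatric",
                    "hearing", "vision", "learning"]

# Delivery organizations are mutually exclusive; "Provincial" is the excluded
# reference category, so it gets no column of its own.
DELIVERY_DUMMIES = ["SMD", "CPA", "CNIB"]

def encode_client(age, delivery, disabilities):
    """Turn one client's attributes into regression inputs (hypothetical layout)."""
    row = {"age": age}  # continuous variable, used as-is
    for org in DELIVERY_DUMMIES:
        row[org] = 1 if delivery == org else 0
    for kind in DISABILITY_TYPES:
        row[kind] = 1 if kind in disabilities else 0
    return row

# A provincially served client with two disabilities: all delivery dummies are 0.
reference_client = encode_client(35, "Provincial", {"physical", "hearing"})
cnib_client = encode_client(52, "CNIB", {"vision"})
```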

16 Ex. LMAPD VRhours: Regression results
Independent variable                 Coefficient    P-value
Constant
Female gender (fg)
Aboriginal (ab)
Minority (m)
Age (ag)
Cognitive disability (cd)
Physical disability (phd)
Psychiatric disability (psd)
Hearing disability (hd)
Vision disability (vd)
Learning disability (ld)
Rural service delivery region (r)
SMD service delivery (smd)
CPA service delivery (cpa)
CNIB service delivery (cnib)
Sample: 270
Adj. R²:

17 Ex. LMAPD VRhours: Coefficients Aboriginal status is associated with fewer hours per month (-1.14). Minority status required 3.98 hours more of VR counselling. Rural clients logged slightly more hours in counselling than urban clients (0.17, not statistically significant). Those with physical and hearing disabilities require substantial support.

18 Ex. LMAPD VRhours: Regression sentence. VRhours = […] + […]fg + (-1.14)ab + 3.98m + (-0.01)ag + 0.2cd + […]phd + […]psd + […]hd + […]vd + (-0.61)ld + 0.17r + (-6.06)smd + (-5.16)cpa + […]cnib. We can now use the estimated coefficients and the independent variable values for all 1,062 VR participants to calculate the estimated number of VR hours required for each client.

19 Assessing the quality of a regression 1. Goodness of fit (R²) measures the proportion of variation in Y explained by the model. The R² varies between 0 (low) and 1 (high).
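
As a rough illustration of goodness of fit, the sketch below computes R² as one minus the ratio of unexplained to total variation. The heights and predictions are invented; the predictions are the group averages from a gender-only model (165 cm for women, 170 cm for men).

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical heights (cm) and the predictions of a gender-only model.
observed = [160, 172, 166, 168, 169, 170]
predicted = [165, 170, 165, 170, 165, 170]

r2_model = r_squared(observed, predicted)

# Predicting the overall average for everyone explains nothing: R² is 0.
overall_mean = sum(observed) / len(observed)
r2_mean_only = r_squared(observed, [overall_mean] * len(observed))

print(round(r2_model, 3))  # 0.429
```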

20 Assessing the quality of a regression 2. Statistical significance. The larger the coefficient (in absolute value), the more confident we are that it is not zero. The lower its standard error, the more confident we are that we have measured the effect reliably. The coefficient divided by its standard error is the t value. The rule of 2 is applied again as a “t” test. Y = 6,000 + […] AGE + […] YEARS_ED, with t values (2.5), (3.8) and (1.2) shown beneath the terms. Computer output reports t values (as above), standard errors, p values and a host of other diagnostics.
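
A small sketch of the t value and the rule of 2. The helper names are assumptions, and so is the pairing of the three reported t values with the constant, AGE and YEARS_ED in that order; the coefficient and standard error passed to `t_value` are purely illustrative.

```python
def t_value(coefficient, standard_error):
    """t value: the coefficient measured in units of its standard error."""
    return coefficient / standard_error

def passes_rule_of_two(t):
    """Rule of thumb: treat a coefficient as distinguishable from zero if |t| > 2."""
    return abs(t) > 2

# Illustrative numbers only: a coefficient of 5.0 with a standard error of 2.0.
example_t = t_value(5.0, 2.0)

# t values as reported on the slide; the term-by-term mapping is assumed.
reported_t = {"constant": 2.5, "AGE": 3.8, "YEARS_ED": 1.2}
verdicts = {name: passes_rule_of_two(t) for name, t in reported_t.items()}

print(example_t)  # 2.5
print(verdicts)   # {'constant': True, 'AGE': True, 'YEARS_ED': False}
```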

21 Photo radar and traffic safety
Model 1: Deaths = A + B (Number of installations) (The test is whether B is positive.)
Model 2: Deaths = A + B (Year) + C (D), D = 0 (year 2001) (The test is whether C is negative.)
[Figure: Traffic accidents and photo radar for Canada’s largest cities. Scatter plot of number of deaths from traffic accidents against number of photo radar installations.]

22 Regression variables
Dependent (Outcome)
Independent (Causal)
– Context (age, gender, ethnicity)
– Driver (policy)
Policy can be measured directly ($, person years) or as a change in state (dummy variable).

23 Building a regression model Identify the dependent (effect or outcome) variable(s). What are the independent (causal) variables? Are there policy impacts? How are these to be measured?