Intermediate Data Collection & Analysis Steven A. Allshouse Coordinator of Research and Analysis November 5, 2008.

Organization of the Class Part I – Discussion of Correlation and Causation. Part II – Quantitative Examples of Correlation and Causation. Part III – How to Measure Correlation (OLS Method). Part IV – Common Pitfalls of the OLS Method. Part V – MS Excel Exercise.

Part I – Qualitative Examples of Correlation and Causation

Correlation A situation in which one variable or set of variables tends to be associated with a second variable or set of variables, but is not thought to bring about that second variable or set of variables. Examples: The size of a person’s left foot and the size of his or her right foot; women’s hemlines and the performance of the stock market; and the number of cavities in elementary school children and the size of their vocabulary. Note: Correlation can be positive or negative; positive means as X increases, so does Y; negative means as X increases, Y decreases.

Causation A situation in which one variable or set of variables is thought to bring about, or help bring about, a second variable or set of variables. Examples: Alcohol consumption/traffic accidents; average daily temperatures/heating oil consumption. Notes: Causation usually implies correlation; if X causes Y, where we see X we would expect to see Y. Causation can be positive or negative; an increase in X can cause an increase or a decrease in Y. The direction of causation can run one or both ways; X causes Y, but Y might or might not cause X.

A Case of Causation? There is a strong positive correlation between the number of fire engines that respond to a fire and the number of fatalities in that fire, i.e., the greater the number of fire engines, the greater the number of deaths. Question: Does this fact mean that Albemarle County could save lives by decreasing the number of fire engines sent to a given fire?

Additional Notes about Correlation & Causation Direction of causation usually determines what we identify as “independent” and “dependent” variables; the independent variable X causes the dependent variable Y. X and Y are correlated, but Y does not cause X. Identification problem: Smoke actually does not cause the fire alarm to be pulled; fire is the underlying cause. Similarly, an increase in, say, education can be seen as causing an increase in income, but educational attainment might just be a “signal” of some underlying ability.

Part II – Quantitative Examples of Correlation and Causation

Part III – How to Estimate Correlation

Ordinary Least Squares (OLS) Method OLS is a mathematical technique that estimates the correlation between two or more variables. Usually, however, if we are measuring correlation, we already are assuming causation. The OLS technique renders two items: (1) A formula whose graphical representation (a “regression” or “trend” line) best “fits” the observed data; and (2) A number (R²) whose value describes how “tightly” the data fits around the regression line.

The “Regression” or “Trend” Line Data is plotted in a “scatter” diagram. The horizontal axis contains “x” values (independent variable) and the vertical axis contains “y” values (dependent variable). The regression or trend line is expressed in the form y = mx + b, where m is the slope and b is the y-intercept. The terms “regression” line and “trend” line frequently are used interchangeably but, usually, a “trend” line pertains to data where the value of the dependent variable changes with time.
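The slope and intercept of a line y = mx + b can be computed directly from the data. The following Python sketch uses the standard closed-form least-squares formulas; the function name `fit_line` and the sample data are illustrative assumptions, not part of the original presentation.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

Because the sample points sit exactly on a line, the fit recovers that line; with real data the points scatter around the fitted line instead.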

The R² Number Has a value anywhere from zero to 1. An R² value of zero means that there is no linear correlation between the independent and dependent variables. An R² value of 1 means that there is a perfectly deterministic linear correlation between the independent and dependent variables. The R² number tells us how much of the variation in the dependent variable is “explained” by the independent variable. Example: If R² equals 0.70, that means that 70% of the variation in the dependent variable is “explained” by the independent variable.
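One common way to compute this number is one minus the ratio of the unexplained (residual) variation to the total variation in the dependent variable. The Python sketch below follows that definition; the function name `r_squared`, the three data points, and the line y = 0.5x + 1 (which happens to be the least-squares line for those points) are assumptions chosen for illustration.

```python
def r_squared(ys, predicted):
    """Share of the variation in ys accounted for by the predictions."""
    y_bar = sum(ys) / len(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    if ss_total == 0:
        raise ValueError("ys are constant; R^2 is undefined")
    return 1 - ss_residual / ss_total


# Hypothetical observations and their least-squares line y = 0.5x + 1.
xs = [1, 2, 3]
ys = [1, 3, 2]
predicted = [0.5 * x + 1 for x in xs]
r2 = r_squared(ys, predicted)
print(f"R^2 = {r2:.2f}")
```

Here the line accounts for a quarter of the variation in y, so the data fits only loosely around it.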

Example of a Trend Line Analysis

Part IV – Some Common Pitfalls of Regression / Trend Line Analysis

Pitfall #1: The Regression or Trend Line that is derived from the OLS method might be meaningful only for a limited range of numbers. Pitfall #2: The most valid Regression or Trend Line for a particular set of data might not necessarily be linear. Pitfall #3: Usually, a dependent variable is a function of several independent variables, not just one independent variable.
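Pitfall #2 can be illustrated with a small Python sketch. The data below is hypothetical: y is determined exactly by x through y = x², yet a straight line fitted by least squares is flat and explains none of the variation.

```python
# Hypothetical data: y is determined exactly by x, but not linearly (y = x^2).
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept for y = m*x + b.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
m = s_xy / s_xx
b = y_bar - m * x_bar

# R^2 of the straight-line fit.
predicted = [m * x + b for x in xs]
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_total = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - ss_residual / ss_total

print(f"slope = {m:.2f}, intercept = {b:.2f}, R^2 = {r2:.2f}")
```

A low R² for a linear fit therefore does not rule out a strong relationship of some other shape; plotting the scatter diagram first helps to catch this case.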

Questions?

Part V – MS Excel Exercise

Background You work in the Planning Department; your boss comes to you with historical development data showing growth in the square footage of non-residential space. An intern has compiled the data, and has calculated the square footage, by type of non-residential space, that was added during a twenty-year period. The intern has taken the twenty-year increase and divided that number by twenty in order to derive an average annual increase in each type of square footage. Your boss has used this average annual increase to estimate the number of square feet, by non-residential type, that the County can expect over the course of the next ten years.
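The intern's and the boss's arithmetic can be sketched in Python as follows. The square footage figures are hypothetical placeholders, not the class data, and the function name is an assumption made for illustration.

```python
def average_annual_increase(start_sqft, end_sqft, years):
    """Total increase over the period divided by the number of years."""
    return (end_sqft - start_sqft) / years


# Hypothetical totals at the start and end of the twenty-year period.
start_sqft = 1_000_000
end_sqft = 1_400_000

avg_per_year = average_annual_increase(start_sqft, end_sqft, 20)
# The boss's ten-year projection: ten times the average annual increase.
boss_projection = avg_per_year * 10
print(f"average annual increase: {avg_per_year:,.0f} sq ft")
print(f"ten-year projection: {boss_projection:,.0f} sq ft")
```

Note that this method uses only the first and last observations, so it cannot show whether growth was steady, slowing, or reversing in the years between.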

Background (Cont.) You are somewhat suspicious of the ten-year projection for industrial space, since the County had a net loss of jobs in the manufacturing sector during the course of the twenty years. Assignment: (a) Take the historical data for the industrial square footage and use MS Excel to derive an OLS trend line that fits this data; (b) Graph the trend line, the trend line equation, and the R² value; and (c) Using the trend line equation, project the total new industrial square footage that the County can expect during the course of the next ten years.
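Step (c) can be sketched in Python once the trend line equation is known. The sketch assumes that x is a year index running from 1 to 20 and that y is the cumulative industrial square footage in that year; the slope and intercept below are hypothetical stand-ins for the values Excel would display, not results from the class data.

```python
def trend_value(m, b, x):
    """Value of the trend line y = m*x + b at x."""
    return m * x + b


def projected_increase(m, b, last_x, horizon):
    """New square footage implied by the trend line over the next `horizon` years."""
    return trend_value(m, b, last_x + horizon) - trend_value(m, b, last_x)


# Hypothetical trend line equation: y = 12,000x + 1,050,000.
m = 12_000.0
b = 1_050_000.0

# Year 20 is the last historical year; project years 21 through 30.
trend_projection = projected_increase(m, b, last_x=20, horizon=10)
print(f"projected new square footage: {trend_projection:,.0f} sq ft")
```

Under these assumptions the projected increase is simply the slope multiplied by the number of years, since the intercept cancels out of the difference.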

Assignment (Cont.) Question: Is your estimate different from the estimate that your boss derived? If so, how large is the gap (both in absolute square footage and percentage terms)? How “tightly” does the data fit around the trend line that you have derived? Do you have much confidence in your trend line?
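The gap between the two estimates can be computed as in the Python sketch below. Both estimates are hypothetical, and measuring the percentage gap relative to the boss's estimate is an assumption; the gap could equally be expressed relative to the trend line estimate.

```python
def estimate_gap(boss_estimate, trend_estimate):
    """Return the gap in square feet and as a percentage of the boss's estimate."""
    absolute = trend_estimate - boss_estimate
    percent = absolute / boss_estimate * 100
    return absolute, percent


# Hypothetical ten-year estimates in square feet.
boss_estimate = 200_000
trend_estimate = 120_000

gap_sqft, gap_pct = estimate_gap(boss_estimate, trend_estimate)
print(f"gap: {gap_sqft:,.0f} sq ft ({gap_pct:.1f}% of the boss's estimate)")
```

A negative gap means that the trend line projects less new square footage than the average annual increase method does.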

Conclusion