1
Analysis of Individual Variables
Descriptive
–Measures of Central Tendency
  Mean: average score of the distribution (1st moment)
  Median: middle score (50th percentile) of the distribution
–Measures of Variation (used to measure the spread of the distribution relative to the measures of central tendency)
  Range: distance between the lowest and highest data points
  Mean Deviation: average absolute distance between the mean and the data points
  Variance: average squared distance from the mean (2nd central moment)
  Standard Deviation: square root of the variance
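The descriptive measures listed on this slide can be sketched in a few lines of Python. This is an illustration only: the helper name `describe` and the data are made up, and the variance here divides by n (the population form); a sample variance would divide by n - 1 instead.

```python
from statistics import mean, median

def describe(scores):
    """Return the descriptive measures for a list of numbers (hypothetical helper)."""
    n = len(scores)
    m = mean(scores)
    # Assumption: population variance, i.e. the average squared distance from the mean.
    variance = sum((x - m) ** 2 for x in scores) / n
    return {
        "mean": m,
        "median": median(scores),
        "range": max(scores) - min(scores),
        # Mean deviation uses absolute distances so positives and negatives do not cancel.
        "mean_deviation": sum(abs(x - m) for x in scores) / n,
        "variance": variance,
        "std_dev": variance ** 0.5,
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
summary = describe(scores)
for name, value in summary.items():
    print(name, value)
```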
2
Analysis of Individual Variables
3
Analysis of Relationship among Variables
Correlation
Regression
–Two Variable Models
–Multiple Variable Models
–Discrete Dependent Variable Models
4
Scatter Plot of Money Supply Growth and Inflation
5
Correlation
A scatter plot is a graph that shows the relationship between the observations for two data series in two dimensions
Correlation analysis expresses this relationship numerically
–In contrast to a scatter plot, which depicts the relationship graphically, correlation analysis summarizes it in a single number
–The correlation coefficient is a measure of how closely related two data series are
–Specifically, it measures the linear association between two variables
6
Correlation
Determines the association between 2 variables
Measured on a scale from -1 to +1
–values close to +1.0 indicate a strong positive relationship
–values close to -1.0 indicate a strong negative relationship
–values close to 0 indicate little or no linear relationship
7
Variables with Perfect Positive Correlation
8
Variables with Perfect Negative Correlation
9
Variables with a Correlation of 0
10
Variables with a Non-Linear Association
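A correlation near 0 does not rule out a strong non-linear association, because the coefficient only captures linear association. A minimal sketch, assuming made-up points on the parabola y = x² and a hypothetical helper `pearson_r`:

```python
def pearson_r(xs, ys):
    """Correlation coefficient of two equally long lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# y is fully determined by x, but the relationship is not a straight line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_nonlinear = pearson_r(xs, ys)
print(r_nonlinear)  # the positive and negative halves cancel out
```

Here the left and right halves of the parabola contribute equal and opposite cross-products, so the coefficient comes out at zero even though y depends entirely on x.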
11
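Calculating correlations
The sample correlation coefficient ‘r’ is
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$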
12
Calculating correlations
E.g.: Is it true that higher education leads to higher compensation?
–To answer this question, we need to look at the data and calculate the correlation
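As a rough sketch of that calculation, the snippet below computes the sample correlation as the sample covariance divided by the product of the sample standard deviations. The function name `sample_correlation` and both data series are invented for illustration; they are not the data behind the slides.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Made-up data: years of education and annual compensation (in thousands).
years_of_education = [10, 12, 14, 16, 18]
compensation = [30, 38, 45, 60, 72]

r = sample_correlation(years_of_education, compensation)
print(round(r, 3))
```

A coefficient close to +1 on data like this says the two series move together; on its own it does not show that education causes the higher compensation.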
13
Calculating correlations
The sample correlation coefficient ‘r’ is the sample covariance of the two series divided by the product of their sample standard deviations
14
Calculating correlations
15
Calculating correlations (EXCEL)
16
Correlation Matrix
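A correlation matrix collects the pairwise coefficients for several series in one table: the diagonal is always 1 and the table is symmetric. A minimal sketch, assuming three made-up return series and hypothetical helpers `pearson_r` and `correlation_matrix`:

```python
def pearson_r(xs, ys):
    """Correlation coefficient of two equally long lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(series):
    """Map every pair of series names to their correlation coefficient."""
    names = list(series)
    return {a: {b: pearson_r(series[a], series[b]) for b in names} for a in names}

# Made-up return series (in percent), chosen so the pattern is easy to see.
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],  # moves exactly with A
    "C": [4.0, 3.0, 2.0, 1.0],  # moves exactly against A
}

matrix = correlation_matrix(returns)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```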
17
Correlations Among Stock Return Series
18
Regression
Most times it's not enough to just say whether 2 variables are correlated
–we would like to define a relationship between the two variables
–E.g. when the economy grows 1%, how much will the S&P 500 increase?
To do this, we use a technique called Regression
19
Regression
How the term Regression came to be applied to the subject of statistical models:
The 19th century scientist Sir Francis Galton, studying human subjects, found in all things "regression toward mediocrity"
–E.g. if your parents are very smart, you are likely to be closer to average than they are, so it's really not your fault!!
20
Regression
In modern times, when we talk of Regression analysis, we make an implicit assumption of a ‘mean’ relationship between variables and we try to determine that relationship.
Regression analysis is concerned with
–the study of the dependence of one variable (the dependent variable)
–on one or more other variables (the explanatory variables)
–with a view to estimating and/or predicting the mean or average value of the former
–in terms of the fixed values of the latter.
21
Two Variable Regression Model
Regression analysis is concerned with the relationship between 2 variables, say ‘y’ and ‘x’, which can be written as y = f(x)
–All this means is that the value of ‘y’ is a function of the value of ‘x’
–Another way of saying it is that ‘y’ doesn’t get its value independently, but somehow depends on ‘x’ to get its value
–Thus ‘y’ can somehow be derived from ‘x’
–Thus ‘y’ is a dependent variable and ‘x’ is an independent variable
Regression is thus the study of a relationship between the dependent and independent variables
22
Regression
25
Two Variable Regression Model
Regression analysis is concerned with
–the study of a relationship between the dependent and independent variables
–In reality, we are estimating a relationship, so that we can calculate the value of a random variable
26
Two Variable Regression Model
Real data from which we estimate the relationship is never very clean, because we deal with random variables
–What we end up having is something like this
–What we try to do in regression is estimate the “Line of Best Fit”, so that we can come up with this equation: y = β0 + β1 x
–This is also the equation of a line, so this form of regression is called “Linear Regression”
27
Two Variable Regression Model
28
Regression Model – Equation of a Line
Terminology – ‘y’
–Dependent Variable, or
–Left-Hand Side Variable, or
–Explained Variable
29
Two Variable Regression Model
Terminology – ‘x’
–Independent Variable, or
–Right-Hand Side Variable, or
–Explanatory Variable, or
–Regressor, Covariate, Control Variable
Terminology – ‘ε’
–Error
–Disturbance
30
Terminology
–‘β0’ - Intercept
–‘β1’ - Slope
–‘ε’ - Error
31
Assumptions of the Linear Regression Model
The relationship between the dependent variable, Y, and the independent variable, X, is linear
The independent variable, X, is not random
About the error
–The expected value (remember, the average) of the error term is 0
–The error term is normally distributed
–The variance of the error term is the same for all observations
–The error term is uncorrelated across observations
32
Regression
Relationship estimation
The model is estimated by the “Least Squares Estimation” method
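For the two variable model, least squares picks the intercept and slope that minimize the sum of squared vertical distances between the data points and the line, and both have a closed-form solution. The sketch below is illustrative only: the function name `least_squares_fit` and the data are made up.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least squares line for two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data scattered around the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_fit(xs, ys)
print(round(intercept, 2), round(slope, 2))
```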
33
Two Variable Regression Model
34
Inferences from Regression
Inferences from Regression can be made about
–Model: how well does the specified model perform, i.e., are the specified independent variables, taken together, a good predictor of the dependent variable (R²)
–Independent Variables: the contribution of each independent variable in predicting the dependent variable (hypothesis test)
35
Model power
36
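Inference about Model
Coeff. of Determination (R²)
So, the higher the R², the better the model (Yes? That would be too easy!)
[Figure: regression line through a data point (x1, y1) and the mean point (xm, ym), with axis labels x1, xm, y1, yp, ym and the distances marked SST, SSE, and SSR]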
37
Inference about Model
If the model is correctly specified, R² is an ideal measure
Adding a variable to a regression will never decrease the R² (by construction)
This fact can be exploited to get regressions with R² ~ 100% by adding variables, but this doesn’t mean that the model is any good
The adjusted R² (Adj-R²) should be reported
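To make the point about adjusted R² concrete, here is a small sketch. It assumes the common conventions that SSE is the sum of squared errors around the fitted values, SST is the total sum of squares around the mean, n is the number of observations, and k is the number of independent variables; the function names and numbers are invented for illustration.

```python
def r_squared(ys, predictions):
    """R² = 1 - SSE / SST (assumed convention: SSE is the error sum of squares)."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Penalize R² for the number of independent variables k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R² and sample size, but a different number of independent variables.
few_variables = adjusted_r_squared(0.90, n=20, k=3)
many_variables = adjusted_r_squared(0.90, n=20, k=10)
print(round(few_variables, 4), round(many_variables, 4))
```

With R² held fixed, the model that needed more variables to get there ends up with the lower adjusted R², which is why padding a regression with variables does not pay off under this measure.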
38
Inference about Parameters
Coefficients are estimated with a confidence interval
To know if a specific independent variable (xi) is influential in predicting the dependent variable (y), we test whether the corresponding coefficient is statistically different from 0 (i.e. we test the null hypothesis βi = 0)
We do so by calculating the t-statistic for the coefficient
If the t-stat is sufficiently large in absolute value, it indicates that bi is significantly different from 0, indicating that βi * xi plays a role in determining y
39
Inference about parameters We can test to see if the slope coefficient is significant by using a t-test.
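A minimal sketch of that t-test for the two variable model, assuming the usual formula in which the t-statistic is the estimated slope divided by its standard error and the residual variance uses n - 2 degrees of freedom. The function name `slope_t_statistic` and the data are made up for illustration.

```python
from math import sqrt

def slope_t_statistic(xs, ys):
    """Return (slope, standard error, t-statistic) for a two variable regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 because two coefficients were estimated.
    std_error = sqrt(sse / (n - 2) / sxx)
    return slope, std_error, slope / std_error

# Made-up data scattered around the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, std_error, t_stat = slope_t_statistic(xs, ys)
# With n - 2 = 3 degrees of freedom, the two-sided 5% critical value is about 3.18.
print(round(slope, 2), abs(t_stat) > 3.18)
```

If the absolute t-statistic exceeds the critical value for the chosen significance level, the null hypothesis of a zero slope is rejected.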
40
In Excel