Decision and Risk Analysis Regression analysis Kiriakos Vlahos Spring 99.


1 Decision and Risk Analysis Regression analysis Kiriakos Vlahos Spring 99

2 Session overview Why understanding relationships is important Visual tools for analysing relationships Correlation –Interpretation –Pitfalls Regression –Building models –Interpreting and evaluating models –Assessing model validity –Data transformations –Use of dummy variables

3 Why analysing relationships is important Development of theory in the social sciences and empirical testing Finance e.g. –How are stock prices affected by market movements? –What is the impact of mergers on stockholder value? Marketing e.g. –How effective are different types of advertising? –Do promotions simply shift sales without affecting overall volume? Economics e.g. –How do interest rates affect consumer behaviour? –How do exchange rates influence imports and exports?

4 Sales vs advertising [Scatter plot of Sales (units) against Advertising (£000)]

5 Estimating betas The slope of this line is called the beta of the stock and is an estimate of its market risk.

6 Scatter plots What are they? A graphical tool for examining the relationship between variables. What are they good for? For determining –whether variables are related –the direction of the relationship –the type of relationship –the strength of the relationship

7 Correlation What is it? A measure of the strength of linear relationships between variables. How to calculate? a) Calculate standard deviations s_x, s_y b) Calculate the correlation using the formula r = Σ(x_i - x̄)(y_i - ȳ) / ((n - 1) s_x s_y) Possible values: from -1 to 1
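
The two-step calculation on this slide can be sketched in Python as follows. This is a minimal illustration rather than the course's own material: the function name `correlation` and the data are made up, and the sample (n - 1) convention is assumed for both the covariance and the standard deviations.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson correlation: sample covariance divided by s_x * s_y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance, using the same (n - 1) divisor as stdev().
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))

# Made-up illustrative data, not taken from the slides.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(advertising, sales)
print(round(r, 3))
```

The result always lies between -1 and 1; a value close to either end indicates a strong linear relationship.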

8 Interpreting the correlation

9 Correlation Pitfalls Correlation measures only linear relationships Existence of a relationship does not imply causality Even if there exists a causal relationship, the direction may not be obvious
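
The first pitfall can be illustrated with a small sketch. The data are made up: y is completely determined by x, yet the relationship is not linear, so the correlation comes out as zero.

```python
from statistics import mean

def correlation(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up data: a perfect quadratic relationship, symmetric around x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(r)
```

A scatter plot would reveal the pattern immediately, which is one reason to inspect the data visually before relying on the correlation alone.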

10 Correlation and Causality Many nations see improving communications as vital to boost overall economy. A 1% increment in telephone density yields an increment of about 0.1% in per-capita GNP, according to a 1983 OECD-ITU study. AT&T advertisement in Fortune Dec 97

11 Ferric Processing What are the factors influencing production costs? Candidate factors: capacity, plant age, plant location, other plant features. Predicting production cost is important for the negotiation of 5-year contracts with steel companies.

12 Visual inspection a) Construct scatter plot b) Calculate correlation (Excel function CORREL). The correlation between cost and capacity is -0.84 c) Candidate model: Cost = a + b Capacity

13 Simple Linear Regression Simple regression estimates a linear equation which corresponds to a straight line that passes through the data. Regression model: Cost = 25.2 - 4.4 Capacity, where Cost is the dependent variable, 25.2 is the constant or intercept, -4.4 is the coefficient or slope, and Capacity is the independent or explanatory variable.

14 Least squares Residuals are the vertical distances of the points from the regression line. In least squares regression –the sum of squared residuals is minimised –the mean of residuals is zero –residuals are assumed to be randomly distributed around the mean according to the normal distribution
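
As a minimal sketch of least squares for one explanatory variable, the closed-form slope and intercept can be computed directly. The helper name `fit_line` and the data are made up for illustration; they are not the Ferric data.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept a, slope b) that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data.
capacity = [1.0, 2.0, 3.0, 4.0, 5.0]
cost = [20.0, 17.0, 11.0, 9.0, 3.0]

a, b = fit_line(capacity, cost)
residuals = [yi - (a + b * xi) for xi, yi in zip(capacity, cost)]

print(a, b)
print(sum(residuals))  # zero up to floating-point rounding
```

Because the model includes an intercept, the residuals sum to zero, which is the second property listed on the slide.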

15 Excel output Read equation. Observe adjusted R². Observe statistics s_b and s. The standard error s is simply the st. deviation of the residuals (a measure of variability). R² is the most widely used measure of goodness of fit. It can be interpreted as the proportion of the variance of the dependent variable explained by the model. Use the adjusted R², which accounts for the number of observations and explanatory variables.
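
The statistics named on this slide can be reproduced by hand. The sketch below assumes the usual conventions (n - k - 1 degrees of freedom, where k is the number of explanatory variables); the function name `fit_statistics` and the numbers are made up for illustration.

```python
def fit_statistics(y, fitted, k):
    """Return (R^2, adjusted R^2, standard error s).

    k is the number of explanatory variables, excluding the intercept.
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    s = (sse / (n - k - 1)) ** 0.5
    return r2, adj_r2, s

# Made-up observations and the fitted values from a straight-line model.
y = [20.0, 17.0, 11.0, 9.0, 3.0]
fitted = [20.4, 16.2, 12.0, 7.8, 3.6]

r2, adj_r2, s = fit_statistics(y, fitted, k=1)
print(round(r2, 4), round(adj_r2, 4), round(s, 4))
```

Adjusted R² is never larger than R², and the gap widens as explanatory variables are added without a matching gain in fit.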

16 Hypothesis testing Does a relationship between capacity and cost really exist? If we draw a different sample, would we still see the same relationship? Or in stats jargon: is the slope significantly different from zero? b=0 implies no relationship between x and y. Hypothesis testing: test whether b=0

17 t-values and p-values [Figure: distribution of the estimate of the slope if b=0, marking the estimate b, its t-value and the p-value] s_b is the st. deviation of the slope estimate b. t-value = b/s_b. The p-value is the probability of getting an estimate of slope at least as large in absolute value as b if the true slope were zero. Approximately equivalent tests (5% significance level): |t-value| > 2, or p-value < 0.05
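
A minimal sketch of the t-value calculation for a simple regression follows. It assumes the standard formula s_b = s / sqrt(Σ(x_i - x̄)²), which the transcript does not spell out; the function name `slope_t_value` and the data are made up.

```python
def slope_t_value(x, y):
    """Return (slope b, its standard deviation s_b, t-value b / s_b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5   # standard error of the regression
    s_b = s / sxx ** 0.5         # standard deviation of the slope estimate
    return b, s_b, b / s_b

# Made-up illustrative data.
capacity = [1.0, 2.0, 3.0, 4.0, 5.0]
cost = [20.0, 17.0, 11.0, 9.0, 3.0]

b, s_b, t = slope_t_value(capacity, cost)
significant = abs(t) > 2  # rule of thumb from the slide (5% level)
print(round(b, 2), round(s_b, 3), round(t, 2), significant)
```

The rule |t| > 2 is only an approximation; with few observations, the exact threshold from the t distribution is noticeably larger.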

18 Checking residuals Residuals should be random. Any systematic pattern indicates that our model is incomplete. Problematic patterns: autocorrelated residuals, heteroscedasticity
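
Residual plots are the main tool here. One common numerical check for autocorrelation, not mentioned in the slides, is the Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation. The sketch below uses made-up residuals.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residuals that drift smoothly up and then down (a systematic pattern).
trending = [-2.0, -1.0, 0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0]
# Made-up residuals that flip sign at every observation.
alternating = [1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

Heteroscedasticity needs a different check, for example comparing the spread of residuals across low and high fitted values.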

19 Ferric - Residuals Are residuals random? Can you see any pattern?

20 Combining theory and judgement The relationship appears to be non-linear. We can fit non-linear relationships by introducing suitable transformations, e.g. y = a e^(bx) becomes linear after taking logs: ln(y) = ln(a) + bx. What transformation is appropriate for the Ferric data? Use judgement, e.g. Total Cost (TC) = Fixed Cost (FC) + Variable Cost, so TC = FC + Unit Cost (UC) * Quantity (Q) and TC/Q = FC/Q + UC, i.e. Average Cost = b/Q + a. This suggests that average cost is linear in the inverse of capacity, 1/Capacity.
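
The log transformation can be sketched as follows: regress ln(y) on x, then recover a from the intercept. The data are generated from an exact exponential curve chosen purely for illustration (a = 2, b = 0.5), and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

# Made-up data lying exactly on y = 2 * exp(0.5 * x).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Fit the transformed model ln(y) = ln(a) + b * x.
ln_a, b = fit_line(x, [math.log(yi) for yi in y])
a = math.exp(ln_a)
print(round(a, 6), round(b, 6))
```

The reciprocal transformation suggested for the Ferric data works the same way: regress cost on 1/Capacity instead of on Capacity.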

21 Transforming the data

22 Model comparison Criteria: –high adjusted R² –all coefficients significant (t-values or p-values) –low standard error –no pattern in residuals –is the model supported by theory? –does the model make sense? The transformed model is better: Cost = 11.75 + 7.93 * (1/Capacity)

23 Forecasting & confidence intervals If capacity is 2 what is the forecast for cost? –Cost = 11.75 + 7.93 (1/2) = 15.71 Approximate 95% confidence interval: 15.71 ± 2 * s, where s=0.98 is the standard error. The greater the number of observations the better the approximation. More accurate intervals can be calculated using statistical packages
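
The forecast and the approximate interval can be reproduced in a few lines. The coefficients and s come from the slides; the function name `forecast_cost` is made up for illustration.

```python
def forecast_cost(capacity, intercept=11.75, slope=7.93):
    """Forecast from the transformed model Cost = 11.75 + 7.93 * (1/Capacity)."""
    return intercept + slope * (1.0 / capacity)

s = 0.98  # standard error of the regression, as quoted on the slide

point = forecast_cost(2)
lower, upper = point - 2 * s, point + 2 * s
print(round(point, 3), round(lower, 3), round(upper, 3))
```

The interval is 2 * s wide on each side of the point forecast, so roughly 1.96 units here; it ignores the uncertainty in the estimated coefficients, which is why the slide calls it approximate.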

24 Confidence intervals Statgraphics gives two sets of intervals. Outer bands are prediction intervals for an individual plant. Inner bands are confidence intervals for the average cost from all plants. They can be viewed as the confidence intervals for the regression line.

25 Is plant age important? Multiple regression: Cost = a + b (1/Capacity) + c Year + e [Correlation matrix and regression output] Is this a good model?

26 Multicollinearity Multicollinearity appears when explanatory variables are highly correlated. Effects: including Year adds little information, hence fit does not improve much; parameter estimates become unreliable. Remedial action: remove one of the correlated variables. Moral: check for correlations between explanatory variables
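
The moral can be turned into a quick check before fitting: compute the correlation between the explanatory variables. The sketch below also reports the variance inflation factor (VIF), which is not mentioned in the slides; with exactly two explanatory variables it equals 1 / (1 - r²). The data and variable names are made up.

```python
def correlation(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up explanatory variables that move almost in lockstep.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 8.0, 10.0]

r = correlation(x1, x2)
vif = 1.0 / (1.0 - r ** 2)  # valid as written only for two explanatory variables
print(round(r, 3), round(vif, 1))
```

A correlation this close to 1 means the second variable carries almost no information beyond the first, which matches the effects listed on the slide.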

27 Other inappropriate models Influential observations and outliers Clustering of data

28 Dummy variables Bond purchases (B) and national income (Y), with a dummy variable W for war years (W = 1 in a war year, 0 otherwise). Regression equation: B = 1.29 + 0.68 Y + 2.3 W
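
A short sketch shows how the dummy variable works in the fitted equation. The coefficients are from the slide; the function name `bond_purchases` and the income value are made up, and W is assumed to be coded 1 for war years and 0 otherwise.

```python
def bond_purchases(income, war_year):
    """Evaluate B = 1.29 + 0.68 * Y + 2.3 * W for one observation."""
    w = 1 if war_year else 0  # dummy variable
    return 1.29 + 0.68 * income + 2.3 * w

income = 10.0  # hypothetical national income, in the units used by the model

peace = bond_purchases(income, war_year=False)
war = bond_purchases(income, war_year=True)
print(round(peace, 2), round(war, 2), round(war - peace, 2))
```

The dummy shifts the intercept only: at any income level, the predicted purchases in a war year exceed those in a peace year by the coefficient on W.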

29 Regression checklist Visually inspect the data (scatter plots). Calculate correlations. Develop and fit sensible model(s). Assess and compare the model(s) –significance of variables (t-values, p-values) –adjusted R² –standard error (s) –residual plots (autocorrelation, heteroscedasticity, normality, outliers and influential observations) –does the model make sense? If you are satisfied, use the model for –developing business insights –forecasting

30 Preparation for Regression workshop Work on Excel regression tutorial Revise Ferric case Read note on Regression Analysis Select your workshop partner In preparation for the exam work on regression exercises

