Ms. Khatijahhusna Abd Rani School of Electrical System Engineering Sem II 2014/2015
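Regression analysis explores the relationship between a quantitative response variable and one or more explanatory variables. With one explanatory (independent) variable, the model is simple linear regression (SLR); with more than one, it is multiple linear regression (MLR).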
Three major objectives:
i. Description: to describe the effect of income on expenditure.
ii. Control: to increase the export of rubber by controlling other factors such as price.
iii. Prediction: to predict the price of houses based on lot size and location.
1) A nutritionist studying weight-loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging a new product will affect the sales volume.
a) X is the cost of packaging.
b) Y is the sales volume.
$(X_1, Y_1), \ldots, (X_8, Y_8)$
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable X and the dependent variable Y.
Can we use a known value of temperature (X) to help predict the number of pairs of gloves (Y)?
USING THIS LINE FOR PREDICTION
1. Is the line a good fit?
2. Is a line a reasonable summary of the relationship between the variables?
Negative relationship: as the number of absences increases, the final grade decreases.
Positive relationship: as the number of cars rented increases, revenue tends to increase.
No relationship
Linear regression: we assume a linear relationship between X and Y, $E(Y \mid X) = \beta_0 + \beta_1 X_i$, where $E(Y \mid X)$ is the expectation of Y for a given value of X, $\beta_0$ is the intercept, and $\beta_1$ is the slope.
The observed values of Y vary about the line: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The parameters $\beta_0$ and $\beta_1$ are unknown, so they must be estimated.
We use sample data to obtain the estimated regression line: $\hat{Y}_i = b_0 + b_1 X_i$.
Why is there no error term? Because the predicted value of Y falls precisely on this line.
How do we estimate the values of the two parameters? We usually use the method of least squares.
Recall the assumed relationship between Y and X: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. We use data to find the estimated regression line: $\hat{Y}_i = b_0 + b_1 X_i$. How do we choose $b_0$ and $b_1$ wisely, so that we obtain a good regression line?
[Figure: fitted line with a positive residual (error) and a negative residual (error) marked.]
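A residual is the observed value minus the value predicted by the line, so its sign says whether the point lies above or below the line. The sketch below illustrates this with made-up numbers and an arbitrary candidate line; neither the data nor the helper name `residuals` comes from the lecture.

```python
def residuals(xs, ys, b0, b1):
    """Residual e_i = observed y_i minus the value predicted by the line b0 + b1*x_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up data and an arbitrary candidate line (assumptions for illustration only).
xs = [1, 2, 3]
ys = [3.5, 4.0, 7.0]
b0, b1 = 1.0, 2.0  # candidate line: y-hat = 1 + 2x

for x, e in zip(xs, residuals(xs, ys, b0, b1)):
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}: residual {e:+.1f} (point is {side} the line)")
```

A positive residual means the line under-predicts that observation; a negative residual means it over-predicts.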
What is the best line? The one that minimizes the sum of the squared residuals.
$b_0$ and $b_1$ are chosen to minimize the sum of the squared residuals: $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$. This is called the method of least squares.
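For simple linear regression the least-squares minimization has a well-known closed-form solution: the slope is the ratio of the cross-deviation sum to the squared-deviation sum of X, and the intercept makes the line pass through the point of means. The following is a minimal sketch of that calculation; the function name `fit_line` and the data are made up for illustration and are not the lecture's example.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Made-up data (assumption, not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 5, 4, 2]
b0, b1 = fit_line(xs, ys)
print(f"estimated line: y-hat = {b0:.1f} + ({b1:.1f})x")
```

Any other choice of intercept and slope gives a larger sum of squared residuals on the same data.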
Assumptions About the Error Term
1. The error $\varepsilon$ is a random variable with a mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.
Solution:
USING THIS LINE FOR PREDICTION
When X increases by 1 unit, the predicted value of Y decreases by $|b_1|$ units.
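The slope interpretation can be checked numerically: the difference between the predictions at x + 1 and at x equals the slope, whatever x is. The coefficients below are hypothetical placeholders, since the fitted values for the lecture's example are not shown here.

```python
def predict(x, b0, b1):
    """Predicted Y on the estimated regression line."""
    return b0 + b1 * x


# Hypothetical coefficients with a negative slope (assumption for illustration).
b0, b1 = 5.8, -0.6

change = predict(4, b0, b1) - predict(3, b0, b1)
print(f"change in predicted Y per unit of X: {change:.1f}")
```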
The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X). The symbol for the coefficient of determination is $r^2$ or $R^2$.
If $r = 0.90$, then $r^2 = 0.81$. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X). The rest of the variation, 0.19 or 19%, is unexplained and is called the coefficient of nondetermination. The formula for the coefficient of nondetermination is $1 - r^2$.
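The arithmetic in this example can be reproduced in a few lines. The variable names are illustrative only.

```python
r = 0.90
r_squared = r ** 2                 # coefficient of determination
nondetermination = 1 - r_squared   # coefficient of nondetermination

print(f"explained: {r_squared:.2f}, unexplained: {nondetermination:.2f}")
```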
Relationship among SST, SSR, and SSE: SST = SSR + SSE
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The coefficient of determination is $r^2 = \mathrm{SSR} / \mathrm{SST}$.
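The decomposition can be verified numerically. The sketch below fits a least-squares line to made-up data (not the lecture's example), computes the three sums of squares from their usual definitions, and checks that SST = SSR + SSE. The function name `sums_of_squares` is an assumption for illustration.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left unexplained
    return sst, ssr, sse


# Made-up data (assumption, not from the lecture).
sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])
r_squared = ssr / sst
print(f"SST={sst:.1f} SSR={ssr:.1f} SSE={sse:.1f} r^2={r_squared:.2f}")
```

The identity SST = SSR + SSE holds for a least-squares fit with an intercept; it is not guaranteed for an arbitrary line.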
This means that 85.5% of the variation in the dependent variable (Y: number of pairs of gloves) is explained by the variation in the independent variable (X: temperature).
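Correlation measures the strength of a linear relationship between two variables. The measure is also known as Pearson's product moment coefficient of correlation. The symbol for the sample coefficient of correlation is $r$; the symbol for the population coefficient of correlation is $\rho$. Formula: $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$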
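As a rough illustration, the sample correlation coefficient can be computed directly from the deviations about the means. The function name `pearson_r` and the data are made up for this sketch and are not the lecture's example.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data (assumption): y tends to fall as x rises, so r is negative.
r = pearson_r([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])
print(f"r = {r:.3f}")
```

The result always lies between -1 and 1, and its sign matches the direction of the linear relationship.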
Properties of r:
Values of r close to 1 imply a strong positive linear relationship between x and y.
Values of r close to -1 imply a strong negative linear relationship between x and y.
Values of r close to 0 imply little or no linear relationship between x and y.
(d) Refer to Example 4.2: number of pairs of gloves.
Solution: There is a strong negative linear relationship between temperature (x) and the number of pairs of gloves (y).
Alternatively, refer to the estimated regression equation: the relationship is negative, since the sign of $b_1$ is negative.
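In simple linear regression, r can be recovered from the coefficient of determination by taking the square root and giving it the sign of the slope $b_1$. The sketch below applies this to the 85.5% figure reported for the gloves example; the slope value passed in is a placeholder (only its sign is used), and the helper name is hypothetical.

```python
import math


def r_from_r_squared(r_squared, b1):
    """Correlation coefficient: square root of r^2 carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r_squared), b1)


# r^2 = 0.855 is the figure quoted for the gloves example.
# The slope -1.0 is a placeholder (assumption); only its negative sign matters here.
r = r_from_r_squared(0.855, -1.0)
print(f"r = {r:.3f}")
```

A value this close to -1 is consistent with describing the relationship as strong and negative.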
To determine whether X provides information for predicting Y, we test a hypothesis. Two tests are commonly used: the F-test and the t-test.
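1. Hypotheses: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$
2. Significance level, $\alpha$
3. Rejection region:
P-value approach: reject $H_0$ if the p-value $\le \alpha$.
Critical-value approach: reject $H_0$ if $|t| > t_{\alpha/2,\, n-2}$.
4. Test statistic: $t = b_1 / s_{b_1}$, where $s_{b_1}$ is the standard error of $b_1$.
5. Decision rule: reject $H_0$ if the test statistic falls in the rejection region.
6. Conclusion: if $H_0$ is rejected, there is a significant relationship between variables X and Y.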
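The six steps can be sketched for the t-test on the slope as follows. This assumes the usual simple-linear-regression formulas (MSE = SSE / (n - 2) and standard error of the slope = sqrt(MSE / Sxx)); the data, the function name, and the choice of $\alpha = 0.05$ are made up for illustration. The critical value is taken from a t-table rather than computed, to keep the example free of third-party libraries.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta_1 = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)             # estimate of the error variance
    se_b1 = math.sqrt(mse / s_xx)   # standard error of the slope
    return b1 / se_b1, n - 2


# Made-up data (assumption, not from the lecture).
t, df = slope_t_statistic([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])

# Critical value from a t-table: two-tailed, alpha = 0.05, df = 3.
t_critical = 3.182
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

With only five made-up points the test does not reject $H_0$, even though the sample slope is negative; a small sample gives a wide rejection threshold.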
(e) Significance level; rejection region; test statistic; decision rule.
Conclusion: We conclude that the temperature is linearly related to the number of pairs of gloves produced.
We may also use the analysis of variance (ANOVA) approach to test the significance of regression. The ANOVA approach partitions the total variability in the response variable Y.
SST (total sum of squares): if SST = 0, all observations are the same. The greater SST is, the greater the variation among the Y observations.
SSE (error sum of squares): if SSE = 0, all observations fall on the fitted regression line. The larger SSE is, the greater the variation of the Y observations around the regression line.
SSR (regression sum of squares): a measure of the variability of the Y values associated with the regression line. The larger SSR is in relation to SST, the greater the effect of the regression relation in accounting for the total variation in the Y observations.
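The three quantities are usually defined as below. The formulas are not in the text above, so the notation here is an assumption: $\hat{Y}_i$ denotes the fitted value and $\bar{Y}$ the sample mean of Y.

```latex
\begin{align*}
\mathrm{SST} &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2       && \text{(total variation)} \\
\mathrm{SSE} &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2     && \text{(variation around the fitted line)} \\
\mathrm{SSR} &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 && \text{(variation explained by the line)} \\
\mathrm{SST} &= \mathrm{SSR} + \mathrm{SSE}
\end{align*}
```

These definitions match the verbal descriptions: SST = 0 only when every $Y_i$ equals $\bar{Y}$, and SSE = 0 only when every $Y_i$ equals its fitted value.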
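1. Hypotheses: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$
2. Significance level, $\alpha$
3. Rejection region (critical-value approach): reject $H_0$ if $F > F_{\alpha;\, 1,\, n-2}$.
4. Test statistic: $F = \mathrm{MSR} / \mathrm{MSE}$. To calculate MSR and MSE, first compute the regression sum of squares (SSR) and the error sum of squares (SSE).
5. Decision rule: reject $H_0$ if the test statistic falls in the rejection region.
6. Conclusion: if $H_0$ is rejected, we can conclude that there is a significant relationship between variables X and Y. Alternatively, we conclude that the regression model is significant.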
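A minimal sketch of the ANOVA F-test follows, assuming the usual degrees of freedom for simple linear regression (1 for regression, n - 2 for error). The data and the function name are made up, and the critical value is read from an F-table instead of being computed.

```python
def anova_f_statistic(xs, ys):
    """Return (F, df_regression, df_error) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    msr = ssr / 1          # mean square due to regression (1 df)
    mse = sse / (n - 2)    # mean square error (n - 2 df)
    return msr / mse, 1, n - 2


# Made-up data (assumption, not from the lecture).
f_stat, df1, df2 = anova_f_statistic([1, 2, 3, 4, 5], [5, 4, 5, 4, 2])

# Critical value from an F-table: alpha = 0.05, df = (1, 3).
f_critical = 10.13
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, df = ({df1}, {df2}), reject H0: {reject_h0}")
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same decision.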
Chapter 4