Published by Domenic Austin Holland. Modified over 8 years ago.
1
INFERENTIAL STATISTICS: HYPOTHESIS TESTING FOR REGRESSION ANALYSIS AND CHI-SQUARE
Rohani Ahmad Tarmizi - EDU5950
2
REGRESSION ANALYSIS
Regression analysis is an extension of correlation analysis, carried out once a relationship has been found. It is performed after a linear pattern of relationship is expected and a coefficient has been determined showing that a linear relationship exists between two variables. We can then predict one variable (the criterion variable) once the second variable (the predictor variable) is known.
3
The procedure for SIMPLE REGRESSION ANALYSIS consists of:
1. Plotting a scatter diagram of the distribution of the paired scores.
2. Determining the equation of the regression line. This equation is also called the regression model. The equation/model for the line is Y' = a + bx.
3. Using the equation to determine the value of y for any given value of x, and vice versa.
4
EQUATION OF THE REGRESSION LINE (LEAST-SQUARES REGRESSION LINE)
Y' = a + bx
Y' = estimated value of y
b = slope of the line
a = intercept on the y-axis
5
SLOPE OF THE REGRESSION LINE
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
n = number of score pairs
$\sum xy$ = sum of each x score multiplied by its paired y score
$\sum x$ = sum of the x scores
$\sum y$ = sum of the y scores
6
INTERCEPT ON THE Y-AXIS
$$a = \bar{y} - b\bar{x}$$
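The slope and intercept formulas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `least_squares_line` and the four sample points are assumptions chosen so that the answer is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired scores")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x scores are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * (sum_x / n)            # intercept: mean(y) - b * mean(x)
    return a, b


# Hypothetical points that lie exactly on y = 1 + 2x
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points fall exactly on a line, the function should recover that line's intercept and slope.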
7
Data: Principals' leadership level and teachers' perception of the principals' leadership level
 X    Y
12    8
 2    3
 1    4
 6    6
 5    9
 8    6
 4    6
15   22
11   14
13    6
8
REGRESSION ANALYSIS CALCULATION
Table columns: X, Y, XY, X², Y². On this slide only the X and Y columns are filled in, with the same ten pairs as in the data slide.
9
REGRESSION ANALYSIS CALCULATION
  X    Y    XY    X²    Y²
 12    8    96   144    64
  2    3     6     4     9
  1    4     4     1    16
  6    6    36    36    36
  5    9    45    25    81
  8    6    48    64    36
  4    6    24    16    36
 15   22   330   225   484
 11   14   154   121   196
 13    6    78   169    36
 77   84   821   805   994   (column totals)
10
EQUATION OF THE REGRESSION LINE (LEAST-SQUARES REGRESSION LINE)
Y' = bx + a
Y' = estimated value of y
b = slope of the line
a = intercept on the y-axis
12
r = 0.70. This shows that 49% of the variation in y is accounted for by x.
The slope is 0.82.
The mean of x is 7.7.
The mean of y is 8.4.
a = 2.1 (intercept on the y-axis)
The regression model is Y' = 0.82x + 2.1
If x = 7, then Y' = 7.84
If x = 10, then Y' = 10.3
If x = 14, then Y' = 13.58
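These figures can be re-derived from the ten score pairs. The sketch below assumes the data slide is read as the pairs (12, 8), (2, 3), (1, 4), (6, 6), (5, 9), (8, 6), (4, 6), (15, 22), (11, 14), (13, 6); the variable names are illustrative only.

```python
import math

# Paired scores, read from the data slide as (X, Y) pairs (an assumption
# about how the flattened table splits into columns)
x = [12, 2, 1, 6, 5, 8, 4, 15, 11, 13]
y = [8, 3, 4, 6, 9, 6, 6, 22, 14, 6]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                    # 77 and 84
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 821
sum_x2 = sum(xi ** 2 for xi in x)                # 805
sum_y2 = sum(yi ** 2 for yi in y)                # 994

numerator = n * sum_xy - sum_x * sum_y
b = numerator / (n * sum_x2 - sum_x ** 2)        # slope
a = sum_y / n - b * (sum_x / n)                  # intercept
r = numerator / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)                                                # Pearson correlation

print(f"b = {b:.2f}, a = {a:.2f}, r = {r:.2f}")

# Predictions using the coefficients as rounded on the slide
b_rounded, a_rounded = round(b, 2), round(a, 1)
for x_new in (7, 10, 14):
    print(x_new, round(b_rounded * x_new + a_rounded, 2))
```

The predictions use the rounded coefficients (0.82 and 2.1), as the slide does; carrying full precision through would shift them slightly.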
13
Regression & Correlation
A correlation measures the “degree of association” between two variables, which may be interval (50, 100, 150, ...) or ordinal (1, 2, 3, ...).
Associations can be positive (an increase in one variable is associated with an increase in the other) or negative (an increase in one variable is associated with a decrease in the other).
14
Example: Height vs. Weight
Strong positive correlation between height and weight.
We can see how the relationship works, but cannot predict one from the other.
If 120 cm tall, then how heavy?
15
Example: Symptom Index vs Drug A
Strong negative correlation.
We can see how the relationship works, but cannot make predictions.
What Symptom Index might we predict for a standard dose of 150 mg?
16
Correlation examples
17
Regression
Regression analysis procedures have as their primary purpose the development of an equation that can be used for predicting values on some DV for all members of a population. A secondary purpose is to use regression analysis as a means of explaining causal relationships among variables.
18
The most basic application of regression analysis is the bivariate situation, which is referred to as simple linear regression, or just simple regression. Simple regression involves a single IV and a single DV. Goal: to obtain a linear equation so that we can predict the value of the DV if we have the value of the IV. Simple regression capitalizes on the correlation between the DV and IV in order to make specific predictions about the DV.
19
The correlation tells us how much information about the DV is contained in the IV. If the correlation is perfect (i.e. r = ±1.00), the IV contains everything we need to know about the DV, and we will be able to perfectly predict one from the other. Regression analysis is the means by which we determine the best-fitting line, called the regression line. The regression line is the straight line that lies closest to all points in a given scatterplot. The least-squares line always passes through the centroid of the scatterplot, the point (mean of X, mean of Y).
20
“Best fit line”
Allows us to describe the relationship between variables more accurately.
We can now predict specific values of one variable from knowledge of the other.
All points are close to the line.
Example: Symptom Index vs Drug A
21
Example: Symptom Index vs Drug B
We can still predict specific values of one variable from knowledge of the other.
Will predictions be as accurate? Why not?
“Residuals”
22
Three important facts about the regression line must be known:
1. The extent to which points are scattered around the line
2. The slope of the regression line
3. The point at which the line crosses the Y-axis
The extent to which the points are scattered around the line is typically indicated by the degree of relationship between the IV (X) and DV (Y). This relationship is measured by a correlation coefficient: the stronger the relationship, the higher the degree of predictability between X and Y.
23
The degree of slope is determined by the amount of change in Y that accompanies a unit change in X. It is the slope that largely determines the predicted values of Y from known values for X. It is important to determine exactly where the regression line crosses the Y-axis (this value is known as the Y-intercept).
24
The regression line is essentially an equation that expresses Y as a function of X. The basic equation for simple regression is:
Y = a + bX
where
Y is the predicted value for the DV,
X is the known raw score value on the IV,
b is the slope of the regression line,
a is the Y-intercept.
25
Simple Linear Regression
♠ Purpose:
To determine the relationship between two metric variables
To predict the value of the dependent variable (Y) based on the value of the independent variable (X)
♠ Requirement:
DV: interval / ratio
IV: interval / ratio
♠ Requirement:
The independent and dependent variables are normally distributed in the population
The cases represent a random sample from the population
26
Simple Regression
How best to summarise the data?
Adding a best-fit line allows us to describe the data simply.
27
How best to summarise the data?
Establish the equation for the best-fit line: Y = a + bX
General Linear Model (GLM)
Where:
a = y-intercept (constant)
b = slope of the best-fit line
Y = dependent variable
X = independent variable
28
Simple Regression: R² - “Goodness of fit”
For simple regression, R² is the square of the correlation coefficient.
Reflects the variance accounted for in the data by the best-fit line.
Takes values between 0 (0%) and 1 (100%).
Frequently expressed as a percentage, rather than a decimal.
High values show good fit, low values show poor fit.
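A short check of the claim that R² equals the square of the correlation coefficient in simple regression. This sketch reuses the ten leadership score pairs from earlier in the deck (read as (X, Y) pairs, which is an assumption about the flattened table) and computes R² from sums of squares as 1 - SS_residual / SS_total, a standard definition that the slides do not spell out.

```python
import math

x = [12, 2, 1, 6, 5, 8, 4, 15, 11, 13]
y = [8, 3, 4, 6, 9, 6, 6, 22, 14, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)          # total variation in Y
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                 # slope of the best-fit line
a = mean_y - b * mean_x       # intercept

# Variation left unexplained by the line
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - ss_residual / syy     # proportion of variance accounted for
r = sxy / math.sqrt(sxx * syy)        # Pearson correlation

print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
```

The two printed values should agree up to floating-point error, which is what "R² is the square of the correlation coefficient" means in practice.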
29
Simple Regression: low values of R²
R² = 0 (0%: randomly scattered points, no apparent relationship between X and Y).
Implies that a best-fit line will be a very poor description of the data.
30
Simple Regression: high values of R²
R² = 1 (100%: points lie directly on the line, a perfect relationship between X and Y).
Implies that a best-fit line will be a very good description of the data.
31
Simple Regression: R² - “Goodness of fit”
Good fit: R² high, high variance explained.
Moderate fit: R² lower, less variance explained.
32
Problem: to draw a straight line through the points that best explains the variance.
The line can then be used to predict Y from X.
33
“Best fit line”
Allows us to describe the relationship between variables more accurately.
We can now predict specific values of one variable from knowledge of the other.
All points are close to the line.
Example: Symptom Index vs Drug A
34
Regression
Establish the equation for the best-fit line: Y = a + bX
The best-fit line is the same as the regression line.
b is the regression coefficient for x.
x is the predictor or regressor variable for y.
35
Step: Descriptive analysis
Derive the regression / prediction equation
● Calculate a and b
$a = \bar{y} - b\bar{x}$
$\hat{Y} = a + bX$
36
Example on regression analysis
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The distribution of the data is presented in the table below.
1. Calculate the coefficient of determination and the correlation coefficient.
2. Determine the prediction equation.
3. Test the hypothesis for the slope at the 0.05 level of significance.
Data set: Scores
ID   Assign   Test
 1     8.5     88
 2     6       66
 3     9       94
 4    10       98
 5     8       87
 6     7       72
 7     5       45
 8     6       63
 9     7.5     85
10     5       77
37
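1. Derive the regression / prediction equation
ID    X     Y
 1    8.5   88
 2    6     66
 3    9     94
 4   10     98
 5    8     87
 6    7     72
 7    5     45
 8    6     63
 9    7.5   85
10    5     77
Summary statistics: n = 10, ΣX = 72, ΣY = 775, ΣX² = 544.5, ΣY² = 62,441, ΣXY = 5,795.5
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{2155}{261} = 8.257$$
$$a = \bar{y} - b\bar{x} = 77.5 - 8.257(7.2) = 18.050$$
Prediction equation: Ŷ = 18.05 + 8.257X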
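The arithmetic on this slide can be reproduced directly from the summary statistics. A minimal sketch, assuming the slope formula given earlier in the deck; the variable names are illustrative.

```python
# Summary statistics as listed on the slide
n = 10
sum_x, sum_y = 72, 775
sum_x2, sum_xy = 544.5, 5795.5

numerator = n * sum_xy - sum_x * sum_y       # n*SumXY - SumX*SumY
denominator = n * sum_x2 - sum_x ** 2        # n*SumX^2 - (SumX)^2

b = numerator / denominator                  # slope
a = sum_y / n - b * (sum_x / n)              # intercept: mean(Y) - b * mean(X)

print(numerator, denominator)
print(round(b, 3), round(a, 2))
```

The numerator and denominator correspond to the 2155 and 261 shown on the slide. The intercept here is computed with the unrounded slope, so it may differ from the slide's value in the third decimal place.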
38
Interpretation of the regression equation
Ŷ = 18.05 + 8.257x
For every 1-unit increase in X, the predicted value of Y increases by 8.257 units.
[Figure: regression line with Y-intercept 18.05 and slope ΔY/ΔX = 8.257]
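The third task in this example, testing the slope at the 0.05 level of significance, is not worked on the slides. The sketch below is one common way to do it, under stated assumptions: a two-tailed t test of H0: slope = 0 with n - 2 degrees of freedom, using the standard t-table critical value 2.306 for 8 degrees of freedom. The variable names are illustrative, and the slides may have intended a different procedure.

```python
import math

# Assignment (X) and test (Y) scores from the example data set
x = [8.5, 6, 9, 10, 8, 7, 5, 6, 7.5, 5]
y = [88, 66, 94, 98, 87, 72, 45, 63, 85, 77]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                 # slope
a = mean_y - b * mean_x       # intercept

# Residual sum of squares and the standard error of the slope
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2
se_b = math.sqrt((ss_residual / df) / sxx)

t_stat = b / se_b

# Assumption: two-tailed test at alpha = 0.05; 2.306 is the tabulated
# critical t value for df = 8
T_CRITICAL = 2.306
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

If the computed t exceeds the critical value in absolute terms, the slope is judged significantly different from zero at the 0.05 level.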
39
Example on regression analysis: MARITAL SATISFACTION
Parents: X   Children: Y
    1            3
    3            2
    7            6
    9            7
    8            8
    4            6
    5            3
Summary quantities listed on the slide (values not shown): mean of X, mean of Y, number of pairs, X, Y, X squared, standard deviation, XY.
40
1. Derive the regression / prediction equation
$$a = \bar{y} - b\bar{x} = 5.00 - 0.65(5.29) = 1.56$$
Prediction equation: Ŷ = 1.56 + 0.65x
41
Interpretation of the regression equation
Ŷ = 1.56 + 0.65x
For every 1-unit increase in X, the predicted value of Y increases by 0.65 units.
[Figure: regression line with Y-intercept 1.56 and slope ΔY/ΔX = 0.65]
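The marital satisfaction coefficients can be recomputed from the seven pairs. This sketch assumes the flattened table reads as the (parents, children) pairs (1, 3), (3, 2), (7, 6), (9, 7), (8, 8), (4, 6), (5, 3), which reproduces the means 5.29 and 5.00 used in the calculation. It works at full precision rather than with the rounded slope and mean, so the intercept can differ in the second decimal place.

```python
# Marital satisfaction scores, read as (parents, children) pairs
parents = [1, 3, 7, 9, 8, 4, 5]     # X
children = [3, 2, 6, 7, 8, 6, 3]    # Y
n = len(parents)

mean_x = sum(parents) / n
mean_y = sum(children) / n

sum_xy = sum(p * c for p, c in zip(parents, children))
sum_x2 = sum(p * p for p in parents)

# Slope from the raw-score formula, then intercept a = mean(Y) - b * mean(X)
b = (n * sum_xy - sum(parents) * sum(children)) / (n * sum_x2 - sum(parents) ** 2)
a = mean_y - b * mean_x

print(f"mean X = {mean_x:.2f}, mean Y = {mean_y:.2f}")
print(f"b = {b:.2f}, a = {a:.2f}")
```

Note that the intercept subtracts b times the mean of X from the mean of Y, as the formula a = ȳ - b x̄ requires.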