2 REGRESSION ANALYSIS Regression analysis is a continuation of correlation analysis, carried out once a relationship has been found. It is performed after a linear pattern is expected and a coefficient has been computed showing that the two variables are linearly related. We can then predict one variable (the criterion variable) once the second variable (the predictor variable) is known.

3 The procedure for SIMPLE REGRESSION ANALYSIS consists of:
1. Drawing a scatter diagram of the distribution of the paired scores.
2. Determining the equation of the regression line. This equation is also called the regression model. The equation/model of this line is Y' = a + bx.
3. Using this equation to determine the value of y for any given value of x, and vice versa.

Y' = a + bx, where:
Y' = estimated value of y
b = slope of the line
a = intercept on the y-axis

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
n = number of paired scores
Σxy = sum of each x score multiplied by its paired y score
Σx = sum of the x scores
Σy = sum of the y scores

$$ a = \bar{y} - b\bar{x} $$
where ȳ and x̄ are the means of y and x.
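The slope and intercept formulas can be sketched in Python as follows. This is a minimal illustration only: the function name `fit_line` and the sample scores are assumptions made for the example, not part of the original slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired scores")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x scores are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * (sum_x / n)            # intercept: mean(y) - b * mean(x)
    return a, b


# Hypothetical scores that lie exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Because the hypothetical points fall exactly on a line, the function should recover an intercept of 1 and a slope of 2.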

7 Data: Principals' leadership level (X) and teachers' perception of the principals' leadership level (Y)
 X    Y
12    8
 2    3
 1    4
 6    6
 5    9
 8    6
 4    6
15   22
11   14
13    6

Computation table:
    X    Y    XY    X²    Y²
   12    8    96   144    64
    2    3     6     4     9
    1    4     4     1    16
    6    6    36    36    36
    5    9    45    25    81
    8    6    48    64    36
    4    6    24    16    36
   15   22   330   225   484
   11   14   154   121   196
   13    6    78   169    36
Σ  77   84   821   805   994

Y' = bx + a, where:
Y' = estimated value of y
b = slope of the line
a = intercept on the y-axis


12 r = 0.70. This indicates that 49% of the variation in y is accounted for by x.
The slope is 0.82.
The mean of x is 7.7.
The mean of y is 8.4.
a = 2.1 (intercept on the y-axis).
The regression model is Y' = 0.82x + 2.1.
If x = 7, then Y' = 7.84.
If x = 10, then Y' = 10.3.
If x = 14, then Y' = 13.58.
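These results can be checked from the column totals of the computation table. The sketch below assumes n = 10, which is inferred from the stated means (77 / 7.7 = 10); the variable and function names are illustrative.

```python
import math

# Column totals from the worked table; n is inferred from the stated means
n = 10
sum_x, sum_y = 77, 84
sum_xy, sum_x2, sum_y2 = 821, 805, 994

sxy = n * sum_xy - sum_x * sum_y    # 1742
sxx = n * sum_x2 - sum_x ** 2       # 2121
syy = n * sum_y2 - sum_y ** 2       # 2884

b = sxy / sxx                        # slope
a = sum_y / n - b * sum_x / n        # intercept
r = sxy / math.sqrt(sxx * syy)       # correlation coefficient


def predict(x, slope=0.82, intercept=2.1):
    """Prediction using the rounded model Y' = 0.82x + 2.1 from the slide."""
    return slope * x + intercept


print(round(b, 2), round(a, 1), round(r, 2))
print([round(predict(x), 2) for x in (7, 10, 14)])
```

Rounded to the precision used on the slide, the slope, intercept and correlation agree with 0.82, 2.1 and 0.70, and the three predictions match the listed values.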

13 Regression & Correlation
A correlation measures the “degree of association” between two variables (interval (50, 100, 150, …) or ordinal (1, 2, 3, …)). Associations can be positive (an increase in one variable is associated with an increase in the other) or negative (an increase in one variable is associated with a decrease in the other).

14 Example: Height vs. Weight
Strong positive correlation between height and weight. We can see how the relationship works, but cannot yet predict one from the other. If 120 cm tall, then how heavy?

15 Example: Symptom Index vs Drug A
Strong negative correlation. We can see how the relationship works, but cannot yet make predictions. What Symptom Index might we predict for a standard dose of 150 mg?

16 Correlation examples

17 Regression Regression analysis procedures have as their primary purpose the development of an equation that can be used for predicting values on some DV for all members of a population. A secondary purpose is to use regression analysis as a means of explaining causal relationships among variables.

18 The most basic application of regression analysis is the bivariate situation, which is referred to as simple linear regression, or just simple regression. Simple regression involves a single IV and a single DV. Goal: to obtain a linear equation so that we can predict the value of the DV if we have the value of the IV. Simple regression capitalizes on the correlation between the DV and IV in order to make specific predictions about the DV.

19 The correlation tells us how much information about the DV is contained in the IV.
If the correlation is perfect (i.e. r = ±1.00), the IV contains everything we need to know about the DV, and we will be able to perfectly predict one from the other. Regression analysis is the means by which we determine the best-fitting line, called the regression line. The regression line is the straight line that lies closest to all points in a given scatterplot. A least-squares regression line always passes through the centroid of the scatterplot, the point (mean of X, mean of Y).
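For a line fitted by least squares, the centroid property follows from the intercept formula. A short derivation, assuming the usual least-squares intercept $a = \bar{Y} - b\bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the sample means:

```latex
\hat{Y}(\bar{X}) = a + b\bar{X} = (\bar{Y} - b\bar{X}) + b\bar{X} = \bar{Y}
```

So the predicted value at the mean of X is the mean of Y, which places the centroid on the fitted line.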

20 Example: Symptom Index vs Drug A
“Best fit line” Allows us to describe relationship between variables more accurately. We can now predict specific values of one variable from knowledge of the other All points are close to the line

21 Example: Symptom Index vs Drug B
We can still predict specific values of one variable from knowledge of the other Will predictions be as accurate? Why not? “Residuals”
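A residual is the difference between an observed Y and the value the line predicts for it. The sketch below uses hypothetical doses and symptom scores and an assumed line Y = 100 - 0.5X (not fitted from real data) to contrast tightly clustered and widely scattered points.

```python
def residuals(xs, ys, a, b):
    """Observed minus predicted Y for each point, given the line Y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data; the line Y = 100 - 0.5x is assumed, not estimated
doses = [40, 80, 120, 160]
tight = [80, 61, 39, 20]    # points close to the line
loose = [90, 50, 55, 10]    # points scattered widely around the line

res_tight = residuals(doses, tight, a=100, b=-0.5)
res_loose = residuals(doses, loose, a=100, b=-0.5)

# Largest absolute miss in each case
worst_tight = max(abs(e) for e in res_tight)
worst_loose = max(abs(e) for e in res_loose)
```

The same line gives much larger residuals for the scattered data, which is why predictions from it are less accurate.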

22 3 important facts about the regression line must be known:
1. The extent to which points are scattered around the line
2. The slope of the regression line
3. The point at which the line crosses the Y-axis
The extent to which the points are scattered around the line is typically indicated by the degree of relationship between the IV (X) and DV (Y). This relationship is measured by a correlation coefficient: the stronger the relationship, the higher the degree of predictability between X and Y.

23 The degree of slope is determined by the amount of change in Y that accompanies a unit change in X.
It is the slope that largely determines the predicted values of Y from known values for X. It is important to determine exactly where the regression line crosses the Y-axis (this value is known as the Y-intercept).

24 The basic equation for simple regression is: Y = a + bX
The regression line is essentially an equation that expresses Y as a function of X: Y = a + bX, where Y is the predicted value for the DV, X is the known raw score value on the IV, b is the slope of the regression line, and a is the Y-intercept.

25 Simple Linear Regression
♠ Purpose: To determine the relationship between two metric variables; to predict the value of the dependent variable (Y) based on the value of the independent variable (X)
♠ Requirement: DV Interval / Ratio; IV Interval / Ratio
♠ Requirement: The independent and dependent variables are normally distributed in the population; the cases represent a random sample from the population

26 Simple Regression How best to summarise the data?
Adding a best-fit line allows us to describe data simply

27 General Linear Model (GLM) How best to summarise the data?
Establish equation for the best-fit line: Y = a + bX
Where:
a = y intercept (constant)
b = slope of best-fit line
Y = dependent variable
X = independent variable

28 Simple Regression R² - “Goodness of fit”
For simple regression, R² is the square of the correlation coefficient.
It reflects the variance accounted for in the data by the best-fit line.
It takes values between 0 (0%) and 1 (100%).
It is frequently expressed as a percentage rather than a decimal.
High values show good fit; low values show poor fit.
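The link between R² and the correlation coefficient can be illustrated by computing it two ways: as the square of r, and as the share of variance in Y explained by the fitted line. The data and the function name below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SS_residual / SS_total) for a simple regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                 # slope of the best-fit line
    a = mean_y - b * mean_x       # intercept of the best-fit line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)
    return r ** 2, 1 - ss_res / syy


# Hypothetical scores
from_r, from_variance = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Both routes give the same value for these scores (0.6, or 60% of the variance explained).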

29 Simple Regression Low values of R²
(0%: randomly scattered points, no apparent relationship between X and Y.) Implies that a best-fit line will be a very poor description of the data.

30 Simple Regression High values of R²
(100%: points lie directly on the line, a perfect relationship between X and Y.) Implies that a best-fit line will be a very good description of the data.

31 Simple Regression R² - “Goodness of fit”
Good fit → R² high: high variance explained
Moderate fit → R² lower: less variance explained

32 Problem: to draw a straight line through the points that best explains the variance
Line can then be used to predict Y from X

33 Example: Symptom Index vs Drug A
“Best fit line” allows us to describe relationship between variables more accurately. We can now predict specific values of one variable from knowledge of the other All points are close to the line

34 Regression Establish equation for the best-fit line:
Y = a + bX Best-fit line same as regression line b is the regression coefficient for x x is the predictor or regressor variable for y

35 Step: Descriptive Analysis
Derive Regression / Prediction equation: Ŷ = a + bX
● Calculate a and b
$$ a = \bar{y} - b\bar{X} $$

36 Example on regression analysis
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The distribution of the data is presented in the table below.
1. Calculate the coefficient of determination and the correlation coefficient.
2. Determine the prediction equation.
3. Test the hypothesis for the slope at the 0.05 level of significance.
Data set: Scores
ID   Assign   Test
1    8.5      88
2    6        66
3    9        94
4    10       98
5    8        87

37 Prediction equation: Ŷ = 18.05 + 8.257X
Derive Regression / Prediction equation
ID   X   Y
2    6   66
3    9   94
5    8   87
6    7   72
7    5   45
8    6   63
…
$$ b = \frac{215.5}{26.1} = 8.257 $$
$$ a = \bar{y} - b\bar{x} = 77.5 - (8.257)(7.2) = 18.05 $$
Summary stat: n, ΣX, ΣY, ΣX², ΣY² = …,441, ΣXY = …,795.5
Prediction equation: Ŷ = 18.05 + 8.257X

38 Interpretation of regression equation
Ŷ = 18.05 + 8.257X. For every 1-unit change in X, Y will change by 8.257 units.
[Figure: regression line with intercept 18.05 and slope ΔY/ΔX = 8.257]
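The interpretation of the slope can be illustrated with the prediction equation Ŷ = 18.05 + 8.257X from the preceding slide. The function name `predict_test_score` is an assumption made for this sketch.

```python
def predict_test_score(assignment_score, intercept=18.05, slope=8.257):
    """Predicted test score from Y-hat = 18.05 + 8.257 * X."""
    return intercept + slope * assignment_score


# Moving from X = 7 to X = 8 changes the prediction by one slope
gain = predict_test_score(8) - predict_test_score(7)

# At X = 0 the prediction equals the intercept
at_zero = predict_test_score(0)
```

Any one-unit increase in the assignment score raises the predicted test score by the slope, regardless of the starting value of X.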

39 Example on regression analysis:
MARITAL SATISFACTION (Parents: X, Children: Y)
Score values surviving in the table: 1 3 2 7 6 9 8 4 5
Summary rows: Mean of X, Mean of Y, No. of pairs, ΣX, ΣY, ΣX², Standard deviation, ΣXY

40 Prediction equation: Ŷ = 8.44 + 0.65x
Derive Regression / Prediction equation
a = ȳ − b x̄ = … (5.29) = 8.438
Prediction equation: Ŷ = 8.44 + 0.65x

41 Interpretation of regression equation
Ŷ = 8.44 + 0.65x. For every 1-unit change in X, Y will change by 0.65 units.
[Figure: regression line with intercept 8.44 and slope ΔY/ΔX = 0.65]

This is also an analysis of relationship, but it is better known as an analysis of association. It is used to determine the association between pairs of variables measured on a nominal or ordinal scale, or where one of them is paired with interval or ratio data. Examples of such variables include race, gender, liking/disliking a food, high/low achievement, and high/moderate/low anxiety. Frequency data are obtained by counting the occurrences of each event. The analysis is well suited to survey research. From the observed frequencies, the chi-square analysis tells us whether or not there is an association between the two variables.

SUPPOSE a researcher collects information on respondents' race and on each respondent's category of dietary practice, OR surveys students in several schools on gender and interest/no interest in the science stream, OR surveys fathers, collects information on their level of education (high/moderate/low) and relates it to salary category. For all three examples the appropriate analysis is a nonparametric one (chi-square analysis), for which a contingency table, or crosstabulation, is then constructed. From the observed frequencies, the chi-square analysis tells us whether or not there is an association between the two variables.

There are two types: the CHI-SQUARE TEST OF GOODNESS OF FIT and the TEST OF INDEPENDENCE/DEPENDENCE.
TEST OF GOODNESS OF FIT: answers the question “is there a difference in the rate of some item/event/agreement?”
TEST OF INDEPENDENCE/DEPENDENCE: answers the question “is there an association/dependence/relationship between two things?”
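A goodness-of-fit statistic compares observed frequencies with the frequencies expected under some hypothesis. The sketch below is a minimal illustration with hypothetical survey counts; it assumes equal expected frequencies when none are supplied, and compares the statistic with the standard critical value for α = 0.05 and 2 degrees of freedom.

```python
def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic and its degrees of freedom.

    If `expected` is omitted, equal frequencies across categories are assumed.
    """
    total = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [total / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


# Hypothetical survey: 120 respondents choosing among three options
stat, df = chi_square_gof([30, 50, 40])

CRITICAL_05_DF2 = 5.991   # chi-square critical value, alpha = 0.05, df = 2
significant = stat > CRITICAL_05_DF2
```

For these counts the statistic is 5.0, which falls below the critical value, so the differences in frequency would not be judged significant at the 0.05 level.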

The results of this analysis are usually presented as a frequency table called a contingency table or crosstabulation. From the observed frequencies, the chi-square analysis tells us whether or not there is a significant association between the two variables studied, or whether or not there is a significant difference in frequency among the categories studied.

46 From the table we can examine whether there is a relationship or association between the two variables. A hypothesis test must then be carried out to determine whether the association between the two variables is significant. This hypothesis test is the chi-square test. If there is a significant association, the next step is to determine the degree or magnitude of the relationship.
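The test of independence can be sketched for a contingency table as follows. The counts are hypothetical (gender by interest in the science stream) and the function name is an assumption. Expected frequencies are computed as row total × column total / grand total, and the statistic is compared with the standard critical value for α = 0.05 and 1 degree of freedom.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows = gender, columns = interested / not interested
observed = [
    [30, 20],
    [15, 35],
]
stat, df = chi_square_independence(observed)

CRITICAL_05_DF1 = 3.841   # chi-square critical value, alpha = 0.05, df = 1
significant = stat > CRITICAL_05_DF1
```

Here the statistic (about 9.09) exceeds the critical value, so the two variables would be judged significantly associated at the 0.05 level.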

47 For this analysis the data are in the form of frequencies, so the scores are not normally distributed. Such tests are therefore called distribution-free. They are also called nonparametric tests because they do not assume a normal distribution. As a rule of thumb, parametric tests are preferred because of their power; however, if the data are nominal or are not normally distributed, nonparametric tests are acceptable. Nonparametric tests include the sign test, Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test, Kruskal-Wallis test, and chi-square test.


