Statistics for Political Science, Levin and Fox, Chapter 11: Regression Analysis


1 Statistics for Political Science, Levin and Fox, Chapter 11: Regression Analysis

2 Regression Analysis
Regression analysis makes the importance of the variance clearer. Goal of research: explain variation. Example: Judge A versus Judge B. Why do some defendants get longer sentences than others?

3 Regression Analysis Judge Example: Sentences
What if a specific judge handed down the following sentences (in months): 12, 13, 15, 19, 26, 27, 29, 31, 40, 48. How do we explain the variation? What factors contributed to some defendants getting 48 months while others got only 12 months? Mean = 26 months.

4 Regression Analysis Judge Example: Sentences
The mean sentence tells us something about the judge’s sentencing pattern, but it does not help explain the wide variation in sentences. Variance (S²) is measured by calculating “the mean of the squared deviations”: S² = Σ(X − X̄)² / N. For these sentences, S² = 125.
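To make the arithmetic concrete, here is a minimal Python sketch (the slides themselves do everything by hand) that reproduces the variance of the ten sentences using the slide's formula, dividing by N rather than N − 1:

```python
# Variance of the ten sentences, S^2 = sum((Y - mean)^2) / N.
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]  # months

n = len(sentences)
mean = sum(sentences) / n                      # 26.0
squared_deviations = [(y - mean) ** 2 for y in sentences]
variance = sum(squared_deviations) / n         # 125.0

print(mean, variance)  # 26.0 125.0
```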

5 Regression Analysis Judge Example: Sentences: Prior Convictions?
How much of this variance is the consequence of a defendant’s prior convictions? Regression enables us to quantify the relative importance of any proposed factor or variable, in this case prior convictions. Cause (IV): prior convictions. Effect (DV): sentence length.

6 Regression Analysis Regression Model: Y = a + bX + e
Y = DV: sentence length (response variable).
X = IV: prior convictions (predictor variable).
a = Y-intercept (baseline, no priors): what Y (sentence length) is when X (priors) = zero.
b = slope (regression coefficient) for X: the amount that Y (sentence length) changes for each one-unit change in X (priors).
e = error term (what is unpredictable).

7 Regression Analysis
Regression Model: How much is the sentence (DV) affected by the number of a defendant's prior convictions (IV: cause)? Y = a + bX + e, where the terms map to the example as follows. Y (DV, effect): sentence length. a (Y-intercept, baseline): no priors (Y when X = 0). b (slope, regression coefficient): amount Y changes for each change in X. X (IV, cause): number of priors. e (error term): unpredictable.

8 Regression Analysis Regression Model: Y = a + bX + e: Calculating each variable:
Priors (IV, X)    Sentence length in months (DV, Y)
0                 12
3                 13
1                 15
0                 19
6                 26
5                 27
3                 29
4                 31
10                40
8                 48
N = 10, ΣX = 40, X̄ = 4        ΣY = 260, Ȳ = 26 (months)
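The transcript drops the zero-prior entries from the table above, so the priors list below is a reconstruction: it is the set of X values that reproduces the slide's totals (N = 10, ΣX = 40, X̄ = 4) and the SP = 300 and SSx = 100 used on the later slides. A short Python sketch to check the means:

```python
# Judge example data; the priors list is reconstructed from the slide totals.
priors    = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]            # X (IV)
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]   # Y (DV), months

n = len(priors)
mean_x = sum(priors) / n      # 4.0
mean_y = sum(sentences) / n   # 26.0
print(n, sum(priors), mean_x, sum(sentences), mean_y)  # 10 40 4.0 260 26.0
```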

9 Research Questions: Regression Model: Y = a + bX + e
Plug into the regression formula, adding the mean for X and the mean for Y: X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences, the mean of Y). Y = DV: sentence length. X = IV: prior convictions. b (regression coefficient) = ? a (Y-intercept) = ?

10 Research Questions: Regression Model: Y = a + bX + e
Plug into the regression formula, adding the mean for X and the mean for Y: X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences). Y = DV: sentence length. X = IV: prior convictions. b (regression coefficient) = ? a (Y-intercept) = ? Figure: the regression formula.

11 Research Questions: Regression Model: Y = a + bX + e
Plug into the regression formula, adding the mean for X and the mean for Y: X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences). Y = DV: sentence length. X = IV: prior convictions. b (regression coefficient) = ? a (Y-intercept) = ? Figure: adding the means for X (priors) and Y (sentence).

12 Research Questions: Regression Model: Y = a + bX + e
Calculate b (regression coefficient): X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences). Y = DV: sentence length. X = IV: prior convictions. b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 300 / 100 = 3. a (Y-intercept) = ?
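A minimal Python sketch of the same calculation, using the reconstructed data from slide 8; it reproduces SP = 300, SSx = 100 and b = 3:

```python
# b = SP / SSx = sum((X - mean_x)(Y - mean_y)) / sum((X - mean_x)^2)
priors    = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]
mean_x = sum(priors) / len(priors)        # 4.0
mean_y = sum(sentences) / len(sentences)  # 26.0

sp  = sum((x - mean_x) * (y - mean_y) for x, y in zip(priors, sentences))
ssx = sum((x - mean_x) ** 2 for x in priors)
print(sp, ssx, sp / ssx)  # 300.0 100.0 3.0
```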

13 Regression Analysis Calculating b (regression coefficient) for Y = a + bX + e: with X̄ = 4 (mean of priors) and Ȳ = 26 (mean of sentences), b = 300 / 100 = 3.

14 Regression Model Regression Model: Y = a + bX + e
Calculating a (Y-intercept): X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences). Y = DV: sentence length. X = IV: prior convictions. b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 300 / 100 = 3. a = Ȳ − bX̄ = 26 − (3)(4) = 14. The model Y = a + bX + e therefore becomes Ŷ = 14 + 3X (Y = DV: sentence, X = prior convictions).
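As a cross-check on a = 14 and b = 3, the sketch below computes the intercept from the means and compares it with an ordinary least-squares fit from NumPy (assuming NumPy is available; the slides do all of this by hand):

```python
import numpy as np

priors    = np.array([0, 3, 1, 0, 6, 5, 3, 4, 10, 8])
sentences = np.array([12, 13, 15, 19, 26, 27, 29, 31, 40, 48])

b = 3.0                                    # regression coefficient from the slides
a = sentences.mean() - b * priors.mean()   # 26 - (3)(4) = 14.0

slope, intercept = np.polyfit(priors, sentences, deg=1)  # least-squares fit
print(a)                 # 14.0
print(slope, intercept)  # approximately 3.0 and 14.0
```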

15 Regression Analysis: Alternative Method
Regression Model: Y = a + bX + e. Alternative way to calculate b (regression coefficient): X̄ = 4 (mean of priors), Ȳ = 26 (mean of sentences). Y = DV: sentence length. X = IV: prior convictions. b = SP / SSx, where SP is the sum of products, Σ(X − X̄)(Y − Ȳ), and SSx is the sum of squares for X, Σ(X − X̄)².

16 Regression Analysis Calculating b (regression coefficient) for Y = a + bX + e: with X̄ = 4 (mean of priors) and Ȳ = 26 (mean of sentences), b = SP / SSx = 300 / 100 = 3. (This approach is implied in the long-hand calculations.)

17 Regression Analysis Calculating the regression coefficient: the sum of products (SP) and the sum of squares for X (SSx).

18 Regression Analysis Calculating a (Y-intercept) with the SP and SSx results:
a = Ȳ − bX̄ = 26 − (3)(4) = 14, so Y = a + bX becomes Ŷ = 14 + 3X.

19 Regression Analysis Regression Line:
It is the line that “falls closest to all the points in a scatter plot.” It crosses the Y axis at the Y-intercept and traces the slope (b) for the independent variable (X: priors). Y = a + bX + e becomes Ŷ = 14 + 3X.
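One possible way to draw the same picture, assuming matplotlib is available; the scatter points use the reconstructed judge data and the line is Ŷ = 14 + 3X:

```python
import matplotlib.pyplot as plt
import numpy as np

priors    = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

xs = np.linspace(0, 10, 50)
plt.scatter(priors, sentences, label="observed sentences")
plt.plot(xs, 14 + 3 * xs, color="red", label="Y-hat = 14 + 3X")
plt.xlabel("Prior convictions (X)")
plt.ylabel("Sentence in months (Y)")
plt.legend()
plt.show()
```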

20 Regression Analysis Predicted and Actual Values (262)
A regression line represents a “predicted rather than an actual value.”

21 Regression Analysis Defining Error: Residual
X (IV) values will give you a predicted value for Y (DV), which may in fact be different from the actual value of Y. Ŷ = predicted Y (Ŷ = a + bX); Y = observed Y. The residual is the difference between Y and Ŷ: e = Y − Ŷ.
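A small sketch of the residuals for the judge data, again assuming the reconstructed priors; note that the residuals of a least-squares line sum to zero:

```python
# e = Y - Y-hat, with Y-hat = 14 + 3X.
priors    = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

predicted = [14 + 3 * x for x in priors]
residuals = [y - y_hat for y, y_hat in zip(sentences, predicted)]
print(predicted)       # e.g. the first defendant: 14 + 3(0) = 14
print(residuals)       # e.g. the first defendant: 12 - 14 = -2
print(sum(residuals))  # 0: residuals around the least-squares line cancel out
```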

22 Regression Analysis Plotting a Regression Line:
To plot a regression line you need to locate and then connect at least two points. Easiest line: the Y-intercept and the point at the X and Y means. The easiest way to do this is to draw a line from the Y-intercept (a) (X = 0, Y = a) through the point given by the mean of X (IV: priors) and the mean of Y (average sentence, DV). a (Y-intercept, baseline: no priors) = 14; X̄ (mean of IV: prior convictions) = 4; Ȳ (mean of DV: sentence in months) = 26.

23 Regression Analysis Plotting a Regression Line:
To plot the regression line, draw it from the Y-intercept (a = 14 at X = 0) through the point given by the means: X̄ = 4 (prior convictions) and Ȳ = 26 (sentence in months). Figure: the means for X and Y.

24 Regression Analysis

25 Regression Analysis Figure: the regression line passes through the Y-intercept (a = 14) and the point where the mean for X (priors) = 4 meets the mean for Y (sentence) = 26.

26 Regression Analysis Plotting the Regression Line: Figure 11.2
If the Y-intercept and the X and Y means are too close together to plot a line, you can insert a larger value for X and plug it into the equation. Example: 10 priors. Ŷ = DV: sentence length: ? X = IV: prior convictions: 10. a = Y-intercept (baseline, no priors): 14. b = slope (regression coefficient) for X: 3. Y = a + bX, so Ŷ = 14 + 3X and Ŷ = 14 + 3(10) = 44.

27 Regression Analysis Plotting the Regression Line: Figure 11.2
If the Y-intercept and the X and Y means are too close together to plot a line, you can insert a larger value for X and plug it into the equation. Example: 13 priors. Ŷ = DV: sentence length: ? X = IV: prior convictions: 13. a = Y-intercept (baseline, no priors): 14. b = slope (regression coefficient) for X: 3. Y = a + bX, so Ŷ = 14 + 3X and Ŷ = 14 + 3(13) = 53.
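The two worked predictions above, and the later 5-priors example (slide 31), can be wrapped in a small helper; the function name is illustrative, not from the textbook:

```python
def predicted_sentence(priors, a=14, b=3):
    """Y-hat = a + b * X for the judge example."""
    return a + b * priors

print(predicted_sentence(10))  # 44 months
print(predicted_sentence(13))  # 53 months
print(predicted_sentence(5))   # 29 months (slide 31)
```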

28 Regression Analysis The chart itself can predict how changes in X (priors) will affect Y (sentence): 13 priors = 53 months.

29 Regression Analysis Requirements of Regression:
It is assumed that both variables are measured at the interval level. Regression assumes a straight-line relationship. Extremely deviant cases in the scatter plot are removed from the analysis. Sample members must be chosen randomly in order to employ tests of significance. To test the significance of the regression line, one must also assume normality for both variables or else have a large sample.

30 Regression Analysis: Review
Interpreting the Regression Line: Regression analysis allows us to make predictions about how one variable (IV: cause, X) will affect another (DV: effect, Y). Example: prior convictions and sentence length. The Y-intercept tells us what Y (DV) is when X (IV) is zero: if you have no priors, the predicted sentence is 14 months. The regression coefficient b tells us how much Y (DV: sentence length) will increase or decrease for each one-unit change in X (IV: priors). As such, we can also predict the sentence length for a defendant based on their number of prior convictions.

31 Regression Analysis Interpreting the Regression Line:
Regression analysis allows us to make predictions about how one variable (IV: cause, X) will affect another (DV: effect, Y). Example: 5 priors. Y = a + bX, so Ŷ = 14 + 3X and Ŷ = 14 + 3(5) = 29.

32 Extra: Regression Analysis
Scatterplot (or Scatter Diagram): The scatterplot provides a visual means of “displaying the relationship between two interval-ratio variables.” Example: GNP and % willingness to pay for environmental protection. Hypothesis: the higher a country’s GNP (IV), the more willing it will be to pay higher prices for environmental protection (DV). IV: GNP. DV: willingness to pay for environmental protection. Direction: positive.

33 Extra: Regression Analysis
GNP and Willingness to Pay for Environmental Protection: Figure 8.1

34 Extra: Regression Analysis
GNP and % Willingness to Pay for Environmental Protection: It appears as though there is a positive relationship between GNP and % willingness to pay for environmental protection.

35 Extra: Regression Analysis
GNP and “Environment Is Sacred”: Negative Relationship: Figure 8.2

36 Extra: Regression Analysis
GNP and “Environment Is Sacred”: Negative Relationship: Figure 8.7

37 Extra: Regression Analysis
Linear Relations and Prediction Rules: Though the relationship between GNP and willingness to pay for environmental protection is not perfect, there is a clear trend. Linear relationship: it “allows us to approximate the observations displayed in a scatter diagram with a straight line.” Perfect linear relationship (deterministic relationship): it “provides a predicted value of Y (vertical axis) for any value of X (horizontal axis).”

38 Extra: Regression Analysis
Example of a perfect linear relationship: take teachers’ salaries and seniority, where seniority determines salary. Y = a + bX + e. Y = DV: salary. X = IV: seniority. a = Y-intercept (baseline): starting salary (what Y is when X = zero). b = slope (regression coefficient) for X (the amount that Y changes for each one-unit change in X).

39 Extra: Regression Analysis
Example of a perfect linear relationship: using this formula we can determine what an individual teacher’s salary (DV: effect) will be, starting with a baseline (a) of $12,000 and an extra $2,000 for each year on the job (X: IV: cause). Y = a + bX becomes Y = 12,000 + 2,000X (because this is a deterministic relationship, where seniority determines salary, there is no error term).
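A minimal sketch of the deterministic salary rule; because there is no error term, the function returns the exact salary for any level of seniority:

```python
def salary(years_of_seniority):
    """Y = 12,000 + 2,000 * X: starting salary plus $2,000 per year on the job."""
    return 12_000 + 2_000 * years_of_seniority

print(salary(0))  # 12000, the Y-intercept (starting salary)
print(salary(7))  # 26000, the worked example on the next slide
```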

40 Extra: Regression Analysis
Seniority and Salary: Figure. Y = 12,000 + 2,000(7) = $26,000

41 STOP: Material beyond this point is NOT required for Exam 5.

42 Regression Analysis Constructing the Straight-Line Graphs
We can demonstrate a linear relationship by drawing a straight line on a scatterplot. How do we know where to draw the line on a scatterplot?

43 Regression Analysis Example: GNP and Environmental Protection (Figure 8.4)

44 Regression Analysis Best-Fitting Line:
The best-fitting line is the line with the least error. (270) Defining Error: Residual. X (IV) values will give you a predicted value for Y (DV), which may in fact be different from the actual value of Y.

45 Regression Analysis Defining Error: Residual
Ŷ = predicted Y (Ŷ = a + bX); Y = observed Y. The residual is the difference between Y and Ŷ: e = Y − Ŷ.

46 Regression Analysis Canada: GNP and Environmental Protection
Predicted value from the regression line: Ŷ = 40. Observed value: Y = 41.8. The residual is the difference between Y and Ŷ: e = 41.8 − 40 = 1.8.

47 Regression Analysis Canada: GNP and Environmental Protection (Figure 8.3)

48 Regression Analysis Error and Residuals
How do we draw a line that minimizes e across all individual observations? Residual sum of squares: Σe² = Σ(Y − Ŷ)². Least-squares line: “The best-fitting line is that line where the sum of the squared residuals, Σe², is at a minimum.”
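To illustrate the least-squares idea with the judge data (reconstructed as before), the sketch below compares the residual sum of squares for the fitted line Ŷ = 14 + 3X with an arbitrarily chosen alternative line; the fitted line gives the smaller value:

```python
priors    = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
sentences = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

def residual_sum_of_squares(a, b):
    """Sum of e^2 = (Y - (a + bX))^2 over all observations."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(priors, sentences))

print(residual_sum_of_squares(14, 3))    # 350, the least-squares line
print(residual_sum_of_squares(14, 2.5))  # 415.0, a worse-fitting alternative
```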

