Data Handling & Analysis BD7054 Scatter Plots Andrew Jackson


1 Data Handling & Analysis BD7054 Scatter Plots Andrew Jackson a.jackson@tcd.ie

2 Scatter plot data How are two measures related? Are they correlated? Does one cause an effect in the other? What is the relationship?

3 Develop a hypothesis What is the hypothesis about these data? What is the null hypothesis?

4 Covariance and Correlation Both the x and y data vary in some way. The question is: do they co-vary? – Are large x values associated with large y values? (positive covariance) – Or large x values with small y values? (negative covariance) Calculate a statistic called the “correlation coefficient” (r), which takes values -1 <= r <= +1. Test r against a statistical distribution.
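As a rough illustration of the correlation coefficient on slide 4, the sketch below computes r from the sums of squares and cross-products, then converts it to a t statistic that would be compared against a t distribution with n - 2 degrees of freedom. The slide does not say which distribution is used, so the t test is an assumption here, and the function names and data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Return the sample correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products: proportional to the covariance of x and y.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squares: proportional to the variances of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical illustrative data (not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.2f} on {len(x) - 2} degrees of freedom")
```

A positive r here reflects large x values paired with large y values; negating y flips the sign of r without changing its size.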

5 Let's ask a different question Instead of... Is there a relationship between x and y? I want to know... What is the relationship between x and y? – Fitting a mathematical line to the data will tell us what the relationship likely is

6 The equation of a line Mathematicians use – Y = mX + c Statisticians use – y = b_1 x + b_0 – b_1 is the slope of the line – b_0 is the intercept (the value of y when x = 0) To calculate the coefficients use – b_1 = (y_2 - y_1)/(x_2 - x_1) – b_0 = y - b_1 x

7 To calculate the coefficients: b_1 = (y_2 - y_1)/(x_2 - x_1); b_0 = y - b_1 x. NB: b_0 can often be estimated visually from the graph
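The two formulas above can be sketched in a few lines of code. This assumes (x1, y1) and (x2, y2) are any two distinct points on the line, and that the intercept is found by substituting one of those points into b0 = y - b1*x. The function name and the example points are hypothetical.

```python
def line_from_two_points(x1, y1, x2, y2):
    """Return (b1, b0) for the straight line through two points."""
    if x2 == x1:
        # A vertical line has no finite slope.
        raise ValueError("the two points must have different x values")
    b1 = (y2 - y1) / (x2 - x1)  # slope: change in y per unit change in x
    b0 = y1 - b1 * x1           # intercept: the value of y when x = 0
    return b1, b0


# Hypothetical points on a line, chosen only for illustration.
b1, b0 = line_from_two_points(1.0, 3.0, 3.0, 7.0)
print(f"y = {b1} x + {b0}")
```

Using the second point instead of the first to compute b0 gives the same intercept, which is a quick check on the arithmetic.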

8 Different slopes Y = b_1 X + b_0: b_1 > 0, b_1 = 0, b_1 < 0

9 Different intercepts Y = b_1 X + b_0: Parallel lines

10 Sample data

11 Return to Interaction Strengths [Figure labels: the predicted y value; the observed y value]

12 Residuals Informative as they tell us which data are larger than predicted and which are smaller. Should ideally be normally distributed around the line – Test this with visual plots like histograms or q-q plots. Should be evenly spread around the line with no obvious trend.
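A minimal sketch of how residuals might be computed, assuming the line is fitted by ordinary least squares (the slides do not name the fitting method). Each residual is the observed y value minus the predicted y value. The function names and data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (b1, b0) for the ordinary least-squares line through the data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b1, b0


def residuals(x, y, b1, b0):
    """Observed y minus predicted y for each data point."""
    return [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]


# Hypothetical illustrative data (not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(x, y)
res = residuals(x, y, slope, intercept)

# Positive residuals lie above the fitted line, negative ones below it.
for xi, ri in zip(x, res):
    side = "above" if ri > 0 else "below"
    print(f"x = {xi}: residual = {ri:+.2f} ({side} the line)")
```

With a least-squares fit the residuals sum to zero, so the useful checks are their shape (a histogram or q-q plot) and whether their signs or sizes change systematically along x.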

13 Regression model assumptions Inherently assume a straight-line relationship. The residuals, or errors, are assumed to be normally distributed – Need to test this – And make sure they are evenly spread above and below the line along its length

14 Computer Session

