Published by Gwendoline Montgomery. Modified over 6 years ago.
1
MATH1005 STATISTICS Tutorial 3: Bivariate Data
2
In statistics we usually want to analyse a population, but collecting data for the whole population is usually impractical, expensive or impossible. That is why we collect a sample from the population (sampling) and make inferences about the population parameters using the statistics of the sample (inference), with a stated level of confidence (confidence level). A population is the collection of all individuals, objects, or measurements of interest. A sample is a subset of the population of interest.
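The sampling idea above can be sketched in R. This is a minimal illustration on a simulated population, not the tutorial's data: the names `population` and `smp`, and the chosen mean and standard deviation, are assumptions made up for the example.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical population: 10,000 simulated measurements
population <- rnorm(10000, mean = 50, sd = 10)
mu <- mean(population)  # population parameter (unknown in practice)

# Simple random sample of 100 individuals, drawn without replacement
smp <- sample(population, size = 100)
xbar <- mean(smp)       # sample statistic used to estimate mu

# 95% confidence interval for the population mean, based on the sample only
ci <- t.test(smp, conf.level = 0.95)$conf.int

print(c(parameter = mu, statistic = xbar))
print(ci)
```

The interval is centred on the sample mean; repeating the sampling step gives a different sample and a different interval each time.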
3
Regression The linear regression line characterises the relationship between two numerical variables. Regression analysis helps us understand how one variable changes with the other: it examines the relationship between one independent variable (predictor/explanatory) and one dependent variable (response/outcome). The regression line is based on the equation of a straight line in mathematics: Y = β0 + β1X, where β0 is the intercept and β1 is the slope.
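As a rough sketch of how the line is fitted in R, the example below uses `lm()` on a small made-up data set (the vectors `x` and `y` are hypothetical, chosen only for illustration) and checks the result against the usual least squares formulas for the slope and intercept.

```r
# Hypothetical data: x is the predictor, y is the response
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit the least squares line y = b0 + b1 * x
fit <- lm(y ~ x)
b <- coef(fit)  # b[1] is the intercept, b[2] is the slope

# The same estimates computed by hand
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

print(b)
print(c(b0 = b0, b1 = b1))

# Predicted response at a new x value
pred <- predict(fit, newdata = data.frame(x = 6))
print(pred)
```

The slope is read as the average change in Y for a one-unit increase in X.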
4
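Y: Outcome variable (also called the response variable or dependent variable). The outcome to be measured or predicted.
X: Predictor variable (also called the explanatory variable or independent variable). The variable used to explain or predict Y; in an experiment, the one the investigator can control.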
5
Correlation Correlation measures the association between two numerical variables, with the strength of the linear relationship measured by the correlation coefficient r. It is a statistic that quantifies a linear relation between two variables. It falls between -1.00 and 1.00. The sign of the number indicates the direction of the relationship. The magnitude of the number indicates the strength of the relationship. NOTE: Regression examines how one dependent variable changes with one independent variable; that change is given by the slope of the regression line. Correlation indicates the association between two numerical variables, with the strength and direction of the relationship measured by the correlation coefficient.
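A short R sketch of the correlation coefficient, again on made-up vectors `x` and `y` (hypothetical, for illustration only). It computes r with `cor()`, repeats the calculation by hand, and shows how r relates to the regression slope through the two standard deviations.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Correlation coefficient from the built-in function
r <- cor(x, y)

# The same quantity by hand
r_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Link to regression: slope = r * sd(y) / sd(x)
slope <- coef(lm(y ~ x))[["x"]]
slope_from_r <- r * sd(y) / sd(x)

print(c(r = r, r_hand = r_hand))
print(c(slope = slope, slope_from_r = slope_from_r))
```

Because the standard deviations are positive, the slope and r always share the same sign.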
6
Strength & Direction of Correlation
DIRECTION: POSITIVE NEGATIVE STRENGTH: PERFECT STRONG MODERATE WEAK
7
R² Coefficient of Determination
R-squared gives the proportion of the total variability in the response variable (Y) that is “explained” by the least squares regression line based on the predictor variable (X). It is usually stated as a percentage. Interpretation: (100 × R²)% of the variation in the dependent variable can be explained by the independent variable through the regression model.
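To make the definition concrete, here is a minimal R sketch on hypothetical data (the vectors `x` and `y` are made up). It reads R² from the model summary, recomputes it as one minus the ratio of residual to total sum of squares, and confirms that for a single predictor it equals the squared correlation coefficient.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)

# R-squared as reported by R
r2 <- summary(fit)$r.squared

# R-squared by hand: 1 - (unexplained variation / total variation)
ss_res <- sum(residuals(fit)^2)   # variation left over around the line
ss_tot <- sum((y - mean(y))^2)    # total variation of y around its mean
r2_hand <- 1 - ss_res / ss_tot

# With one predictor, R-squared is the square of r
r2_from_r <- cor(x, y)^2

print(c(r2 = r2, r2_hand = r2_hand, r2_from_r = r2_from_r))
```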
8
> Result <- Olympics100mW$Result
> Olympics100mW[order(Result), ]
Year  Athlete                    Medal  Country  Result
      Florence Griffith-Joyner   GOLD   USA
      Shelly-Ann Fraser-Pryce    GOLD   JAM
# The fastest winning time in the data belongs to Florence Griffith-Joyner from the USA, set at the 1988 Seoul Olympics.
9
# The scatter plot on the right suggests that a linear regression might be appropriate. This is supported by the correlation coefficient r, and 76% of the variability in Result is explained by Year.
10
# The boxplot shows one outlier (9502 mins in 1945 by Rani).
# Take a logarithm transformation of Time to pull the outlier in towards the rest of the data, and use log(Time) as the y variable from here on.
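The effect of the transformation can be sketched in R as follows. The `time` vector is hypothetical: only the value 9502 echoes the slide, and the other values are invented to give a right-skewed variable. `boxplot.stats()` applies the same 1.5 × IQR rule that `boxplot()` uses to flag points.

```r
# Hypothetical right-skewed times in minutes (only 9502 comes from the slide)
time <- c(10, 30, 80, 200, 500, 1200, 9502)

# Points flagged as outliers on the original scale
out_raw <- boxplot.stats(time)$out

# Log-transform, then check again on the log scale
log_time <- log(time)
out_log <- boxplot.stats(log_time)$out

print(out_raw)
print(out_log)

# Side-by-side boxplots to compare the two scales
par(mfrow = c(1, 2))
boxplot(time, main = "Time")
boxplot(log_time, main = "log(Time)")
```

Whether a point is still flagged after the transformation depends on the data, so it is worth re-drawing the boxplot of log(Time) before fitting the regression.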