UNIT 2 BIVARIATE DATA
BIVARIATE DATA – THIS TOPIC INVOLVES….
[Figure: set of axes with the DEPENDENT VARIABLE on the y-axis and the INDEPENDENT VARIABLE on the x-axis]
INDEPENDENT VS DEPENDENT VARIABLES
The value of the DEPENDENT variable depends on the INDEPENDENT variable.
eg. Identify the INDEPENDENT and DEPENDENT variable in each case:
a. The time spent filling a swimming pool with water compared with the size of the pool.
b. The hours per week spent doing laundry compared with the number of children living in the house.
c. The time since you last went to the hairdresser and the length of your hair.
d. A child’s height compared with their age.
INDEPENDENT VS DEPENDENT VARIABLES If you need more help with this, view the following video tutorial: dependent-variables/
SCATTERPLOTS If you need more help with this, view the following video tutorial
SCATTERPLOTS
[Table: Study (hrs), Result (%)]
The following scatterplot is produced.
[Figure: scatterplot of Result (%) against Study Time (hrs)]
Comparing this scatterplot to the graphs on the worksheet given last lesson, we can say that there is a strong positive relationship between the number of hours spent studying and the score achieved on the test.
NOW TRY WORKBOOK SECTION ONE
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
What is it?
o Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation.
o It shows the strength and direction of the linear relationship between two sets of data.
o The correlation coefficient is represented by r.
o It is calculated from an equation, but luckily our calculator can solve this for us!
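For anyone curious about what the calculator is doing, here is a minimal Python sketch. It assumes the standard form of Pearson's formula (the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations), which may be laid out differently from the equation on the slide. The function name `pearson_r` and the data values are made up for illustration; they are not the values from the study example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of the squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, NOT the table from the slides.
study_hours = [1, 2, 3, 4, 5]    # independent variable (x)
results = [52, 60, 61, 70, 77]   # dependent variable (y)

r = pearson_r(study_hours, results)
print(round(r, 3))
```

Points that lie exactly on a rising straight line give r = 1, and points exactly on a falling straight line give r = -1.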
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
What does it look like? What values can we expect?
o The value of r will be between -1 and 1.
o r = 1 tells us there is a perfect positive linear relationship.
o r = -1 tells us there is a perfect negative linear relationship.
o Relationships for r values in between are given in the following table.
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT If you need more help with this, view the following video tutorial: coefficient/
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
Finding the Correlation Coefficient ‘r’ using the calculator
eg. Consider the problem given earlier: results on a Maths test were compared with the time spent studying the day before the test. The following data was obtained from a sample of 10 students.
[Table: Study (hrs), Result (%)]
This data gave the scatterplot:
[Figure: scatterplot of Result against Study Time]
Using this data, we can use the calculator to find r. First, using the Statistics function on your calculator, place the data into List 1 and List 2.
Using what we know about ‘r’, view the scatterplot and predict: what approximate value will r have?
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
Finding the Correlation Coefficient ‘r’ using the calculator
[Table: Study (hrs), Result (%)]
Xlist: the list containing the independent variable
Ylist: the list containing the dependent variable
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
Finding the Correlation Coefficient ‘r’ using the calculator
[Table: Study (hrs), Result (%)]
Click OK. The following information is revealed. Reading r off this screen gives the correlation coefficient. This confirms that there is a strong positive relationship between the amount of study done and the score achieved on the Maths test.
THE COEFFICIENT OF DETERMINATION
NOW TRY WORKBOOK SECTION TWO
THE COEFFICIENT OF DETERMINATION
o It is a measure of how certain we can be when making predictions from a particular model/graph.
o The coefficient of determination, r², tells us how much of the variation in the dependent variable can be explained by the variation in the independent variable (note: not whether one causes the other).
o We find r² simply by squaring our r value. Our calculator also generates this value for us.
o Because r values have to be between -1 and +1, r² values always fall between 0 and 1.
o When answering questions about the coefficient of determination, the way we word our response is important. We do not say that variations in the dependent variable are caused by variations in the independent variable. Rather, we say that the variations in the dependent variable can be explained by variations in the independent variable.
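The squaring step and the recommended wording can be sketched in a few lines of Python. The function names and the value r = -0.8 are made up for illustration; the point is that a negative r still gives a positive r², and that the sentence says "explained by", not "caused by".

```python
def coefficient_of_determination(r):
    """Square r to get r^2; r must be a valid correlation coefficient."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2

def interpret(r, dependent, independent):
    """Word the result in terms of 'explained by', never 'caused by'."""
    percent = round(coefficient_of_determination(r) * 100, 1)
    return (f"{percent}% of the variation in {dependent} can be explained "
            f"by the variation in {independent}.")

# Hypothetical r value, not a result from the slides.
print(interpret(-0.8, "resting heart rate", "weekly exercise hours"))
```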
THE COEFFICIENT OF DETERMINATION If you need more help with this, view the following video tutorial
THE COEFFICIENT OF DETERMINATION
Click STATISTICS.
Insert data into lists.
Choose CALC, Linear Reg.
Xlist: choose the list which has the INDEPENDENT variable.
Ylist: choose the list which has the DEPENDENT variable.
THE COEFFICIENT OF DETERMINATION
Again, looking at our prior example: results on a Maths test were compared with the time spent studying the day before the test. This data was obtained from a sample of 10 students.
[Table: Study (hrs), Result (%)]
THE COEFFICIENT OF DETERMINATION
Let’s try another example: the resting heart rates of 10 individuals were compared with the number of hours per week they spent exercising, to see if one influences the other.
[Table: Exercise (hrs), Heart Rate]
Click STATISTICS.
Insert data into lists.
Choose CALC, Linear Reg.
NOW TRY WORKBOOK SECTION THREE
LEAST SQUARES REGRESSION LINE
o The least squares regression line is a line of best fit through your scatterplot data: the straight line that minimises the sum of the squared vertical distances between the data points and the line.
o Our calculator can easily be used to plot this line and give us the equation that represents it.
What does it look like?
y=
LEAST SQUARES REGRESSION LINE
Let’s find the linear regression line for our earlier example: the resting heart rates of 10 individuals were compared with the number of hours per week they spent exercising.
[Table: Exercise (hrs), Heart Rate]
Click STATISTICS.
Insert data into lists.
Choose CALC, Linear Reg.
LEAST SQUARES REGRESSION LINE
[Table: Exercise (hrs), Heart Rate]
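As a rough sketch of what the calculator's Linear Reg function works out, the Python below fits a least squares line to paired data. It uses the standard least squares formulas: the slope is the sum of the products of the deviations divided by the sum of the squared x deviations, and the intercept makes the line pass through the point of means. The names `slope` and `intercept` are used here because calculators and textbooks label the two coefficients differently. The data values are made up and are not the table from the slides.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The regression line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, NOT the table from the slides.
exercise_hours = [0, 2, 4, 6, 8]      # independent variable (x)
heart_rate = [80, 76, 70, 66, 62]     # dependent variable (y)

slope, intercept = least_squares_line(exercise_hours, heart_rate)
print(f"heart rate = {intercept:.1f} + ({slope:.1f}) x exercise hours")
```

A negative slope here would mean that resting heart rate tends to fall as weekly exercise hours rise.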
NOW TRY WORKBOOK SECTION FOUR
LEAST SQUARES REGRESSION EQUATIONS { WITHOUT THE LISTS OF DATA }
There may be times when you are given some information about a set of bivariate data, but you have no access to the original table of data. You may be asked to use this information to find r, r², or the equation of the least squares regression line. Where a and b can be found using:
LEAST SQUARES REGRESSION EQUATIONS { WITHOUT THE LISTS OF DATA }
eg2. The following values were found when comparing the width of strawberries with the hours of daylight the plant was exposed to:
Mean width of strawberries = 25 mm
Mean hours of daylight = 8 hours
Standard deviation of width of strawberries = 5 mm
Standard deviation of hours of daylight = 3 hours
Pearson’s correlation coefficient = 0.92
Find the equation of the least squares (linear) regression line (follow the given steps….)
1) Decide which is the IV (x-data) and which is the DV (y-data).
2) Define your variables based on which is your IV and DV.
3) Sub values into your “b” equation to find “b”.
4) Sub values into your “a” equation to find “a”.
5) Sub your “a” and “b” values into the “y” equation to give your answer.
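The steps above can be checked with a short Python sketch. Two assumptions are made here. First, hours of daylight is taken as the IV (x) and strawberry width as the DV (y). Second, the standard summary-statistics formulas are used: slope = r × (standard deviation of y) ÷ (standard deviation of x), and intercept = (mean of y) − slope × (mean of x). The code uses the names `slope` and `intercept` rather than "a" and "b", so match them to the letters used in your own formula sheet.

```python
def line_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) of the regression line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Values from eg2, with daylight hours assumed to be x and width assumed to be y.
slope, intercept = line_from_summary(mean_x=8, mean_y=25, sd_x=3, sd_y=5, r=0.92)
print(f"width = {intercept:.2f} + {slope:.2f} x daylight hours")
```

Under these assumptions the slope is about 1.53 mm per hour of daylight and the intercept is about 12.73 mm.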
USING THE CALCULATOR TO FIND VALUES {IF WE HAVE DATA}
Enter your data into lists.
Choose CALC, Two-Variable.
Xlist: Independent Data
Ylist: Dependent Data
NOW TRY WORKBOOK SECTION FIVE
MAKING PREDICTIONS {INTERPOLATION & EXTRAPOLATION}
More on Interpolation vs Extrapolation….
Interpolation: an estimation of a value within the range of the given data set.
Extrapolation: an estimation of a value outside the range of the given data set.
Predicting a value by interpolation is much more reliable than predicting a value by extrapolation.
Example 2: The heights of plants were measured and compared with the amount of water they received weekly.
[Table: Water (mL), Height (cm)]
A) Is predicting the height of a plant which receives 600 mL of water reliable?
600 mL: this value is within our data set, so it is an INTERPOLATION. Predictions are generally reliable.
B) Is predicting the height of a plant which receives 2000 mL of water reliable?
2000 mL: this value is outside of our data set, so it is an EXTRAPOLATION. Predictions are not as reliable.
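The check in Example 2 comes down to comparing the value with the smallest and largest x-values in the data. A minimal Python sketch is below. The function name and the water amounts are made up for illustration, since the slide's table values are not reproduced here.

```python
def classify_prediction(x_value, x_data):
    """Label a prediction by whether x_value lies inside the range of the x data."""
    if min(x_data) <= x_value <= max(x_data):
        return "interpolation"
    return "extrapolation"

# Hypothetical water amounts in mL, NOT the table from the slides.
water_ml = [200, 400, 600, 800, 1000]

print(classify_prediction(600, water_ml))
print(classify_prediction(2000, water_ml))
```

With this made-up data, 600 mL falls inside the range and 2000 mL falls outside it, matching the reasoning in parts A and B.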
MAKING PREDICTIONS {INTERPOLATION & EXTRAPOLATION}
Example 3: A study of the weekly supermarket shopping cost of various household income groups produced the results shown in the table below.
[Table: Weekly Income ($), Weekly Spend ($)]
a) Determine the least squares regression line.
b) Predict the weekly spend for an income of $720.
c) Predict the weekly income if the supermarket spend is $146.
d) Predict the weekly spend for an income of $1500.
e) How reliable are your predictions in b and d?
b: Interpolation (lies within our given data set). RELIABLE
d: Extrapolation (lies outside of the data set). NOT OVERLY RELIABLE, INDICATION ONLY
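Parts b and d substitute an x-value into the regression equation, while part c works backwards from a y-value by rearranging the same equation. The Python sketch below shows both directions. The slope and intercept are made-up numbers chosen only to illustrate the method; they are not the answer to part a, which depends on the table data.

```python
def predict_y(x, slope, intercept):
    """Substitute x into y = intercept + slope * x."""
    return intercept + slope * x

def solve_for_x(y, slope, intercept):
    """Rearrange y = intercept + slope * x to find x for a given y."""
    if slope == 0:
        raise ValueError("cannot solve for x when the slope is zero")
    return (y - intercept) / slope

# Hypothetical regression line, NOT the answer to Example 3 part a.
slope, intercept = 0.15, 40.0

spend = predict_y(720, slope, intercept)      # method for parts b and d
income = solve_for_x(146, slope, intercept)   # method for part c
print(round(spend, 2), round(income, 2))
```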
NOW TRY WORKBOOK SECTION SIX