Introduction to bivariate data

Why Two Variables instead of just one?

Let’s collect some data. (Except we don’t have time, so you will just read about collecting it.) Imagine that everyone in class counts the number of text messages they received yesterday. I then select a sample of 10 students and record the number of messages each one received. That’s one-variable data, so we could make a dotplot or some other kind of graphical display.

I want to use this data to PREDICT the number of text messages received by the next randomly chosen student. How the heck do we do that? The best number we can use as a predictor is the mean from the 10 students in our sample. But that still might not be a very good prediction. Can we make our prediction better? Is there some other variable that influences the number of text messages a student might receive?
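
A minimal sketch of the "use the mean as the predictor" idea. The slides give no actual counts, so the ten values below are made up purely for illustration, and "best" is read here as "smallest total squared error", which is one common interpretation and an assumption on my part.

```python
from statistics import mean

# HYPOTHETICAL counts of messages received by 10 sampled students.
# These numbers are invented for illustration; the slides give no data here.
received = [12, 30, 7, 45, 22, 18, 60, 3, 25, 28]


def sum_squared_error(guess, data):
    """Total squared error if we predict the same value for every student."""
    return sum((value - guess) ** 2 for value in data)


prediction = mean(received)
print("prediction (sample mean):", prediction)

# Guessing a bit below or above the mean gives a larger total squared error.
for guess in (prediction - 5, prediction, prediction + 5):
    print(guess, sum_squared_error(guess, received))
```

Even the best single number leaves a lot of error here, which is the motivation for bringing in a second variable.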

How about the number of messages sent? Could we collect data on messages sent and received, and then use it to predict the number of messages a student received from the number that student sent?

This is what we mean by Bivariate (Two Variable) Data. Instead of looking at one variable at a time, we look at two variables that are related. Perhaps one even depends on the other. We are able to use the value of one of the variables to make a prediction about the other.

Instead of having one axis or scale, we’re going to have two axes: the same ones we call “x” and “y” in algebra. But instead of just having equations like we usually did in algebra, we’re going to start with data, which we can display as a scatterplot.

Correlation and the correlation coefficient: How close are we to a line?

In AP Stats, when it comes to Bivariate Data, we like lines. Computers and graphing calculators can create all sorts of equations from data, but we’re only going to be interested in creating lines. So we need to judge how close our data is to being linear. In fact, we’re going to calculate a number that quantifies how linear our data is.

Let’s start with this set of data:

x    y
1    1
2    3
3    2
5    4

We can also compute some summary statistics, the mean and standard deviation of the x’s and the y’s:

$\bar{x} = 2.75$ and $s_x = 1.708$
$\bar{y} = 2.5$ and $s_y = 1.291$
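
The summary statistics can be checked with a short sketch. It assumes the four points are (1, 1), (2, 3), (3, 2) and (5, 4), which is the pairing implied by the z-score table on the later slide, and it uses the sample standard deviation (dividing by n − 1), which matches the quoted values.

```python
from statistics import mean, stdev

# The four data points, as read from the slides (pairing is an assumption
# reconstructed from the z-score table).
x = [1, 2, 3, 5]
y = [1, 3, 2, 4]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # stdev() is the sample SD (divides by n - 1)

print(f"x-bar = {x_bar}, s_x = {s_x:.3f}")
print(f"y-bar = {y_bar}, s_y = {s_y:.3f}")
```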

Let’s look at the plot again with the mean x and mean y values, $\bar{x}$ and $\bar{y}$, added. Using z-scores, we can see how far from the mean each of our points is, relative to the others. A z-score can be positive (above the mean) or negative (below the mean). The z-scores for the x-value and y-value of each point are given on the next slide.

$\bar{x} = 2.75$ and $s_x = 1.708$; $\bar{y} = 2.5$ and $s_y = 1.291$

$z_x = \frac{x - \bar{x}}{s_x}$ or $z_y = \frac{y - \bar{y}}{s_y}$

x-coord.   y-coord.   z-score of x                     z-score of y
1          1          (1 − 2.75)/1.708 = -1.0246       (1 − 2.5)/1.291 = -1.1619
2          3          (2 − 2.75)/1.708 = -0.4391       (3 − 2.5)/1.291 = 0.3873
3          2          (3 − 2.75)/1.708 = 0.1464        (2 − 2.5)/1.291 = -0.3873
5          4          (5 − 2.75)/1.708 = 1.3173        (4 − 2.5)/1.291 = 1.1619
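
As a rough check on the table, the z-scores can be computed in a few lines. The helper name `z_scores` is just an illustrative choice, and the data pairing is the one read from the table. Results may differ from the slide in the fourth decimal place because the code does not round the mean and standard deviation first.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (value - mean) / sample SD for every value in the list."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


x = [1, 2, 3, 5]
y = [1, 3, 2, 4]

zx = z_scores(x)
zy = z_scores(y)

for xi, yi, a, b in zip(x, y, zx, zy):
    print(f"({xi}, {yi})  z_x = {a:+.4f}  z_y = {b:+.4f}")
```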

And now, just for kicks, let’s multiply the x z-score and y z-score for each point together! OK, it’s not just for kicks. This is how we combine the effects of the x-coordinate and the y-coordinate. The fact that we multiply is also not arbitrary. It’s how we get positive or negative slope, or positive or negative correlation (remember that from Algebra I?).

Points below $\bar{y}$ have negative z-scores, and points above are positive. Points to the left of $\bar{x}$ have negative z-scores, and points to the right are positive.

Multiplication rules from algebra: if the signs are the same, the product is positive. If the signs are different, the product is negative.

No matter where the x- and y-axes are, the mean of the x-values and the mean of the y-values split the graph into four quadrants. If there are more data points in the blue quadrants (upper right and lower left, where the two z-scores have the same sign, so their product is positive), you have a positive relationship, or positive correlation. If there are more points in the white quadrants (upper left and lower right, where the signs differ, so the product is negative), you have a negative relationship, or negative correlation.

Upper left: z-score of x negative, z-score of y positive, product = negative.
Lower right: z-score of x positive, z-score of y negative, product = negative.
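
The quadrant idea can be sketched by counting points whose deviations from the two means share a sign. This is only an illustration on the four-point data set from the slides; the variable names are my own.

```python
from statistics import mean

x = [1, 2, 3, 5]
y = [1, 3, 2, 4]
x_bar, y_bar = mean(x), mean(y)

# Same sign: upper-right or lower-left quadrant (product of z-scores > 0).
same_sign = sum(1 for a, b in zip(x, y) if (a - x_bar) * (b - y_bar) > 0)
# Opposite sign: upper-left or lower-right quadrant (product of z-scores < 0).
opposite_sign = sum(1 for a, b in zip(x, y) if (a - x_bar) * (b - y_bar) < 0)

print("same-sign quadrants:", same_sign)
print("opposite-sign quadrants:", opposite_sign)
```

For this small data set the counts tie at two each, so counting points is only a rough guide: the two same-sign points sit far from the means and the two opposite-sign points sit close to them, which is why the sizes of the products matter and not just their signs.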

So, back to this multiplication thing…

x-coord.   y-coord.   z-score of x   z-score of y   product
1          1          -1.0246        -1.1619        (-1.0246)(-1.1619) = 1.1905
2          3          -0.4391        0.3873         (-0.4391)(0.3873) = -0.1701
3          2          0.1464         -0.3873        (0.1464)(-0.3873) = -0.0567
5          4          1.3173         1.1619         (1.3173)(1.1619) = 1.531

And while we’re at it, why don’t we go ahead and find an average of these products of x z-scores and y z-scores. (Notice we’ve been using x-bar and y-bar, not µ or σ, so when we take the average, we’re going to divide by how many we have minus 1.)

Average = $\frac{1.1905 - 0.1701 - 0.0567 + 1.531}{3} \approx 0.8316$

This number, the average of the products of the z-scores for the x- and y-coordinate of each point, is our magical number that tells us how close our points are to being a perfect line.
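
The steps above can be sketched in code as follows. It assumes the same four points as the table. Carrying full precision gives roughly 0.8315, which agrees with the hand computation to about three decimal places; the small difference comes from rounding the intermediate values by hand.

```python
from statistics import mean, stdev

x = [1, 2, 3, 5]
y = [1, 3, 2, 4]

x_bar, s_x = mean(x), stdev(x)
y_bar, s_y = mean(y), stdev(y)

# One product of z-scores per data point.
products = [((a - x_bar) / s_x) * ((b - y_bar) / s_y) for a, b in zip(x, y)]

# "Average" the products, dividing by n - 1 because we used sample statistics.
r = sum(products) / (len(x) - 1)

for p in products:
    print(f"{p:+.4f}")
print(f"r = {r:.4f}")
```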

It’s not really magic. It’s how mathematicians and statisticians decided to measure “how close are we to a line?” This number is called the CORRELATION COEFFICIENT, and the symbol we use is r. The formula, based on the steps we just did, is

$r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$

The sign of r tells us if our data points have a positive correlation or negative correlation.

If the value calculated for r is exactly 1 or negative 1, our points lie exactly on a line.
If the value of r (regardless of sign) is greater than .8, we say that we have a strong relationship.
If the value of r (regardless of sign) is between .5 and .8, we say that we have a moderate relationship.
If the value of r (regardless of sign) is less than .5, we say we have a weak relationship.
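
A hedged sketch of the formula as a reusable function. The name `correlation` and the test data for the exact lines are my own illustrative choices; the four-point data set is the one from the slides.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = 1/(n - 1) * sum of (z-score of x) * (z-score of y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((a - x_bar) / s_x) * ((b - y_bar) / s_y) for a, b in zip(xs, ys))
    return total / (len(xs) - 1)


# Points exactly on a rising line give r = 1; on a falling line, r = -1.
print(correlation([1, 2, 3], [2, 4, 6]))
print(correlation([1, 2, 3], [6, 4, 2]))

# The four-point data set from the slides.
print(correlation([1, 2, 3, 5], [1, 3, 2, 4]))
```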

r is a quantitative value that tells us the strength and direction of the linear relationship that exists in any set of data points. r only tells us what kind of linear relationship we have.

Interpretation of r: When you are asked to interpret the meaning of the correlation coefficient, r, you always do so using the following sentence:

There is a [weak/moderate/strong (depending on what the value of r is)], [positive/negative (depending on the sign of r)] LINEAR relationship between [x variable in context] and [y variable in context].

MEMORIZE this sentence. You will change the bracketed words (blue on the slide) based on the data set that you have.

Luckily, we don’t have to use that awful formula. The calculator calculates this value for us in a couple of different ways. Your first handout has all the steps you need.
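
The interpretation sentence can be filled in mechanically, as in this sketch. The function names are hypothetical, the cutoffs are the .5 and .8 values from the slides, and two details are my own assumptions: values of exactly .5 or .8 are treated as moderate, and r = 0 is rejected because the slides do not say how to word that case.

```python
def strength(r):
    """Label the size of r using the .5 and .8 cutoffs from the slides."""
    size = abs(r)
    if size > 0.8:
        return "strong"
    if size >= 0.5:  # assumption: the boundary values count as moderate
        return "moderate"
    return "weak"


def interpret(r, x_name, y_name):
    """Fill in the memorized sentence for a given r and variable names."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        # Assumption: the slides give no wording for r = 0, so refuse.
        raise ValueError("r = 0 has no direction to report")
    direction = "positive" if r > 0 else "negative"
    return (f"There is a {strength(r)}, {direction} linear relationship "
            f"between {x_name} and {y_name}.")


# r of about 0.83 is the value from the worked four-point example.
sentence = interpret(0.83, "x", "y")
print(sentence)
```

In a real answer, "x" and "y" would be replaced by the variables in context, for example messages sent and messages received.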