Scatterplots & Correlations Chapter 4

What we are going to cover Explanatory (Independent) and Response (Dependent) variables Displaying relationships with scatterplots Interpreting scatterplots Adding categorical variables to scatterplots Measuring linear associations with correlation Important facts and issues with correlations

Starting with some terminology
Response (dependent) variables = Ys
Explanatory (independent) variables = Xs
When stating a relationship, we generally state the dependent variable first. When depicting a relationship graphically, we generally place the dependent variable on the y-axis. Most stats software follows this convention: the dialogue boxes ask you to enter the dependent (response) variable first.

Here is the scatterplot dialogue box for Excel with the publisher’s plug-in.

A scatterplot displays the relationship between two quantitative variables measured on the same individual or event, etc.

Just as we began our discussion of the distribution of individual variables by depicting them graphically, so we do when we are interested in relationships between variables. Scatterplots are a great way to do this.

Adjusting your graph (art and science)
This is the original Excel scatterplot.
This is my adjusted Excel scatterplot.

Once again, let’s look for patterns of regularity and outliers, using the four-step method:
– State the problem: in this case, does the percent of students taking the SAT influence math scores?
– Plan: we can try to observe this with a scatterplot.
– Solve (interpret the plot): notice there is something of a line sloping downward from left to right, and some clustering.
– Conclude: there does appear to be a negative association between the variables. As the percent of students taking the SAT in a state increases, the average math score of the state declines.

We can also group data in a scatterplot
As can be seen, the data in the previous chart has been grouped by region (a nominal variable) in this example. In the last class I did the same thing when I divided my data on income into two separate sets for men and women and made side-by-side box plots.
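To make the grouping idea concrete, here is a minimal Python sketch. It assumes made-up rows of (region, percent taking the SAT, average math score); the numbers, the region names, and the helper names `group_by_category` and `text_scatter` are all hypothetical, not taken from the slides. It splits the points by the nominal variable and draws a rough character-grid scatterplot with one marker per group.

```python
from collections import defaultdict

# Made-up illustrative rows: (region, percent taking SAT, average math score).
rows = [
    ("West", 10, 590), ("West", 20, 570), ("West", 35, 545),
    ("South", 55, 520), ("South", 65, 510),
    ("Northeast", 75, 505), ("Northeast", 85, 500),
]

def group_by_category(rows):
    """Split (category, x, y) rows into {category: [(x, y), ...]}."""
    groups = defaultdict(list)
    for category, x, y in rows:
        groups[category].append((x, y))
    return dict(groups)

def text_scatter(groups, width=40, height=10):
    """Draw a rough text scatterplot, one marker per category (up to six)."""
    markers = "ox+*#@"
    points = [p for pts in groups.values() for p in pts]
    x_min = min(x for x, _ in points)
    x_max = max(x for x, _ in points)
    y_min = min(y for _, y in points)
    y_max = max(y for _, y in points)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    legend = []
    for marker, (category, pts) in zip(markers, sorted(groups.items())):
        legend.append(f"{marker} = {category}")
        for x, y in pts:
            col = round((x - x_min) / x_span * (width - 1))
            row = round((y_max - y) / y_span * (height - 1))  # large y at the top
            grid[row][col] = marker
    lines = ["|" + "".join(r).rstrip() for r in grid]
    lines.append("+" + "-" * width)
    lines.append("  ".join(legend))
    return "\n".join(lines)

groups = group_by_category(rows)
plot = text_scatter(groups)
print(plot)
```

The response variable (math score) goes on the vertical axis, following the convention described earlier. In practice you would hand each group to your plotting software with its own colour or symbol rather than draw text by hand.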

Measuring Linear Correlations
Just as in the past lesson, where we moved from depicting data in graphs to summarizing them with numbers, so we can do the same with associations. A statistic commonly used to measure the strength of an association when the data are measured at the interval or ratio level is “r” (Pearson’s r).

Pearson’s r really just builds on what we did with descriptive statistics. For each point, we take the distance of x from the mean of x divided by the standard deviation of x, and multiply it by the distance of y from the mean of y divided by the standard deviation of y. We then add up these products and divide by n - 1. In other words, r is based on standardized values.
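The calculation can be sketched in a few lines of Python. This is only an illustration: the data are made up (loosely echoing the SAT example), and the function names are my own. It standardizes each variable using the sample standard deviation, multiplies the paired standardized values, and averages the products over n - 1.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """Pearson's r as the average product of standardized values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    zx = [(x - mx) / sx for x in xs]  # standardized x values
    zy = [(y - my) / sy for y in ys]  # standardized y values
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up illustrative data, not the values behind the slides.
pct_taking = [5, 10, 20, 40, 60, 80]
math_score = [600, 580, 560, 520, 510, 500]

r = pearson_r(pct_taking, math_score)
print(round(r, 3))  # strongly negative for this made-up data
```

With these numbers the result is close to -1, matching the downward-sloping pattern you would see in the scatterplot.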

Some important points about “r”
Correlations are symmetrical statistics: they produce the same result whichever variable you tag as explanatory and whichever as response.
Because “r” uses standardized values, it does not change if you change the units of measurement of either variable.
A negative “r” indicates a negative association; a positive “r” indicates a positive association.
r varies between -1 and 1.
– Values approaching 0 indicate no linear association.
– Values approaching -1 indicate a near perfect negative linear relationship.
– Values approaching 1 indicate a near perfect positive linear relationship.
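These properties are easy to check numerically. The sketch below uses made-up height and weight data and the sum-of-products form of Pearson’s r, which is algebraically the same as averaging products of standardized values. All variable names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via sums of products of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data.
heights_cm = [150, 160, 165, 172, 180, 191]
weights_kg = [52, 58, 63, 70, 74, 88]

# Symmetry: swapping explanatory and response gives the same r.
r_xy = pearson_r(heights_cm, weights_kg)
r_yx = pearson_r(weights_kg, heights_cm)

# Rescaling: changing units (cm to inches, kg to pounds) leaves r alone.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_rescaled = pearson_r(heights_in, weights_lb)

# Sign: reversing the direction of one variable flips the sign of r.
r_flipped = pearson_r([-h for h in heights_cm], weights_kg)

print(r_xy, r_yx, r_rescaled, r_flipped)
```

The first three printed values agree (up to floating-point rounding), and the last is the same size with the opposite sign.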

Some warnings
As noted, Pearson’s r only works if both variables are measured at least at the interval level.
Do a scatterplot first.
– r only works with linear (or nearly linear) relationships. As curvature enters the picture, r becomes less useful.
– Outliers (extreme high and low values) will distort r.
Correlations do not provide a total summary of a relationship. You should usually also provide the means and standard deviations of x and y so people can evaluate the usefulness of the correlation.
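Two small made-up examples show why the scatterplot should come first. In the first, y depends perfectly on x but along a curve, and r comes out as 0. In the second, a weak association looks almost perfect once a single extreme point is added. The data and names here are illustrative assumptions only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via sums of products of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Curvature: y is completely determined by x (y = x squared),
# yet there is no linear trend for r to pick up.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Outlier: a weak positive association among six made-up points...
base_x = [1, 2, 3, 4, 5, 6]
base_y = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(base_x, base_y)

# ...looks nearly perfect after adding one extreme point at (30, 30).
r_with = pearson_r(base_x + [30], base_y + [30])

print(r_curve, round(r_without, 2), round(r_with, 2))
```

Neither problem is visible from r alone, but both are obvious at a glance in a scatterplot.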

Spearman’s rho (a correlation for ordinal data)
Spearman’s rho (or rank order correlation) is a correlation you can use with ordinal data. As with “r”, it varies between -1 and 1, and a value approaching 0 indicates no meaningful relationship between the variables. It is very handy and is used in a number of situations. For example, in sports very elaborate computer programs are used to rank players and/or teams. We could use rho to analyze whether the rankings reliably predict who wins (for example in tennis). Another common use is when you are looking for associations among opinion data that are collected at the ordinal level. We won’t calculate this. Enough to say that most programs that do “r” will have a nearby function for rho.
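Although the course does not calculate rho by hand, a short sketch may help show what the software is doing. Spearman’s rho is Pearson’s r computed on the ranks of the data, with tied values sharing the average of their ranks. The tournament seedings and finishing positions below are made up, and the function names are my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via sums of products of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up example: pre-tournament seeding vs. final finishing position.
seeding = [1, 2, 3, 4, 5, 6, 7, 8]
finish = [2, 1, 3, 5, 4, 8, 6, 7]
rho = spearman_rho(seeding, finish)
print(round(rho, 3))
```

Because only the ordering matters, any relationship that always increases (even along a curve) gives a rho of 1, which is one reason rho suits ordinal data.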

The following table is from Cohn, CJPS 38:2 (2005),

Some things you will note: in the previous table, beside “rho”, there was a number titled “significance”. As with most statistics, “r” and “rho” have known distributions for a given data set size (degrees of freedom, N-2 for a correlation). Significance answers the question: given the degrees of freedom, how likely are we to see a score this strong for the statistic if there were really no association between the variables?

A score of 0.05 or less would mean there is a 5% or less chance that results this strong could occur if there were no real association and the pairing of values were purely random. This is not the same as a 95% chance that the association is genuine, but the smaller the score, the harder the result is to explain by chance alone. The score in the table was small enough that there is almost no chance a rho of this strength could occur with this many cases by simple random chance. We can therefore be confident that the association reported between the variables is genuine. You will hear more about significance as the course proceeds.
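One way to see what a significance score measures is a permutation test: shuffle the y values in every possible order, recompute the correlation each time, and count how often pure chance pairing gives a correlation at least as strong as the one observed. Note that this is a swapped-in illustration, not what most statistics packages do (they typically use a known reference distribution instead), and the six data points are made up.

```python
from itertools import permutations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via sums of products of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def exact_permutation_p(xs, ys):
    """Share of all re-pairings whose |r| is at least the observed |r|."""
    observed = abs(pearson_r(xs, ys))
    tolerance = 1e-12  # guards against floating-point ties being missed
    total = 0
    as_extreme = 0
    for shuffled in permutations(ys):
        total += 1
        if abs(pearson_r(xs, shuffled)) >= observed - tolerance:
            as_extreme += 1
    return as_extreme / total

# Made-up illustrative data: six cases, so 720 possible pairings.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 7, 9, 12]

r_obs = pearson_r(xs, ys)
p_value = exact_permutation_p(xs, ys)
print(round(r_obs, 3), round(p_value, 4))
```

Here only a tiny share of the 720 pairings is as extreme as the observed one, so the significance score is far below 0.05. Trying every ordering is only practical for very small data sets, since the number of orderings grows very quickly with the number of cases.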