Welcome to the Unit 5 Seminar Kristin Webster


MM207 Statistics: Welcome to the Unit 5 Seminar, Kristin Webster

Correlation and Scatter Diagrams
A correlation exists between two variables when higher values of one variable consistently go with higher values of the other, or when higher values of one variable consistently go with lower values of the other.
A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables.
The x variable is on the horizontal axis.
The y variable is on the vertical axis.
Each point on the scatter diagram marks the location of one (x, y) pair.
Here are a few examples of correlations:
There is a correlation between the variables amount of smoking and likelihood of lung cancer; that is, heavier smokers are more likely to get lung cancer.
There is a correlation between the variables height and weight for people; that is, taller people tend to weigh more than shorter people.
There is a correlation between the variables demand for apples and price of apples; that is, demand tends to decrease as price increases.
There is a correlation between practice time and skill among piano players; that is, those who practice more tend to be more skilled.

Types of Correlations
Positive: x and y move in the same direction.
Negative: x and y move in opposite directions.
Zero: no pattern of movement in x and y.
Nonlinear relationship: the two variables are related, but the relationship results in a scatter diagram that does not follow a straight-line pattern.
See page 289.

Strength of the Correlation
Statisticians measure the strength of a correlation with a number called the correlation coefficient, represented by the letter r.
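The slide does not give a formula for r. As a reference sketch, assuming r here means the usual Pearson correlation coefficient for n paired values (x_i, y_i) with means x-bar and y-bar, the standard definition is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to be above or below their means together and negative when they tend to move in opposite directions; the denominator scales the result so that it always falls between -1 and 1.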

Properties of the Correlation Coefficient, r
The correlation coefficient, r, is a measure of the strength of a correlation. Its value can range only from -1 to 1.
If there is no correlation, the points do not follow any ascending or descending straight-line pattern, and the value of r is close to 0.
If there is a positive correlation, the correlation coefficient is positive (0 < r ≤ 1): both variables increase together. A perfect positive correlation (in which all the points on a scatter diagram lie on an ascending straight line) has a correlation coefficient r = 1. Values of r close to 1 mean a strong positive correlation, and positive values closer to 0 mean a weak positive correlation.
If there is a negative correlation, the correlation coefficient is negative (-1 ≤ r < 0): when one variable increases, the other decreases. A perfect negative correlation (in which all the points lie on a descending straight line) has a correlation coefficient r = -1. Values of r close to -1 mean a strong negative correlation, and negative values closer to 0 mean a weak negative correlation.
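These properties can be checked on small made-up data sets. The sketch below assumes r is the Pearson correlation coefficient; the function name pearson_r and all of the numbers are illustrative and do not come from the seminar. It also includes a curved pattern to show that r measures only straight-line association, so a nonlinear relationship can produce r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

# All points on an ascending straight line: perfect positive correlation.
r_ascending = pearson_r(xs, [2 * x + 1 for x in xs])

# All points on a descending straight line: perfect negative correlation.
r_descending = pearson_r(xs, [-x for x in xs])

# Points on a parabola: clearly related, but not along a straight line.
r_curved = pearson_r(xs, [x * x for x in xs])

# Made-up scattered data that rises overall: positive r, but less than 1.
r_scattered = pearson_r(xs, [-5, -5, 0, -1, 3, 2, 6])

print(r_ascending, r_descending, r_curved, r_scattered)
```

The curved case is a reminder to look at the scatter diagram as well as the number: a value of r near 0 rules out a straight-line pattern, not every kind of relationship.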

Beware of Outliers
If you calculate the correlation coefficient for these data, you’ll find that it is a relatively high r = 0.880, suggesting a very strong correlation. However, if you cover the data point in the upper right corner, the apparent correlation disappears. In fact, without this data point, the correlation coefficient is r = 0.
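The slide's own data set is not reproduced in this transcript, so the sketch below uses a different, made-up data set to show the same effect: four points with no correlation at all, plus one far-away point in the upper right. The function name pearson_r and the values are assumptions for illustration, and the resulting r will not match the 0.880 quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Four points at the corners of a square: no ascending or descending pattern.
cluster_x = [1, 1, 2, 2]
cluster_y = [1, 2, 1, 2]

# Hypothetical outlier far off in the upper right corner.
outlier = (10, 10)

r_without_outlier = pearson_r(cluster_x, cluster_y)
r_with_outlier = pearson_r(cluster_x + [outlier[0]], cluster_y + [outlier[1]])

print("without outlier:", r_without_outlier)
print("with outlier:   ", round(r_with_outlier, 3))
```

A single extreme point is enough to move r from 0 to a value close to 1 here, which is why it helps to inspect the scatter diagram before trusting a high coefficient.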

Correlation Does Not Imply Causality
Possible Explanations for a Correlation
The correlation may be a coincidence.
Both correlated variables might be directly influenced by some common underlying cause.
One of the correlated variables may actually be a cause of the other. But note that, even in this case, it may be just one of several causes.

Best-Fit Line
The best-fit line (or regression line) on a scatter diagram is a line that lies closer to the data points than any other possible line (according to a standard statistical measure of closeness).
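The slide does not name the measure of closeness. The sketch below assumes the usual one, least squares, where the best-fit line minimizes the sum of squared vertical distances between the points and the line. The study-hours data and the function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: hours studied and test scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 70, 74, 83]

slope, intercept = fit_line(hours, scores)
best_sse = sum_squared_errors(hours, scores, slope, intercept)

# Any other line, such as one with a slightly different slope, fits worse.
other_sse = sum_squared_errors(hours, scores, slope + 0.5, intercept)

print(f"score = {slope:.3f} * hours + {intercept:.3f}")
print(f"SSE best-fit: {best_sse:.3f}, SSE other line: {other_sse:.3f}")
```

Comparing the two error totals is a direct way to see what "closer than any other possible line" means under this assumption.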

Cautions in Making Predictions from Best-Fit Lines
Don’t expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points. If the sample points lie very close to the best-fit line, the correlation is very strong and the prediction is more likely to be accurate. If the sample points lie away from the best-fit line by substantial amounts, the correlation is weak and predictions tend to be much less accurate.
Don’t use a best-fit line to make predictions beyond the bounds of the data points to which the line was fit.
A best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future.
Don’t make predictions about a population that is different from the population from which the sample data were drawn.
Remember that a best-fit line is meaningless when there is no significant correlation or when the relationship is nonlinear.
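The caution about predicting beyond the bounds of the data can be built into the prediction step itself. The sketch below is one possible way to do that, assuming a least-squares line; the data, the function names, and the choice to raise an error rather than warn are all illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sample: hours studied (1 to 6) and test scores.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 70, 74, 83]
slope, intercept = fit_line(hours, scores)


def predict_score(study_hours):
    """Predict a score, refusing to extrapolate outside the sampled hours."""
    low, high = min(hours), max(hours)
    if not low <= study_hours <= high:
        raise ValueError(
            f"{study_hours} hours is outside the fitted range {low} to {high}"
        )
    return slope * study_hours + intercept


print(round(predict_score(4.5), 1))  # inside the data range, so allowed

try:
    predict_score(10)  # beyond the data, so the line should not be trusted
except ValueError as err:
    print("refused:", err)
```

Without the range check, the same line would predict a score above 100 for 10 hours of study, which is the kind of result the caution is warning about.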

Coefficient of Determination
The square of the correlation coefficient, or r², is the proportion of the variation in a variable that is accounted for by the best-fit line.
Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election?
Solution: The square of the correlation coefficient is r² = (-0.1)² = 0.01, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.
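The interpretation of r² as a proportion of variation can be checked numerically. The sketch below assumes a least-squares best-fit line and the Pearson coefficient, and uses made-up study-hours data (the voter turnout data are not included in the slides). It computes r² directly and also as 1 minus the ratio of unexplained variation to total variation; for a least-squares line the two agree.

```python
import math

# Hypothetical data: hours studied and test scores for six students.
xs = [1, 2, 3, 4, 5, 6]
ys = [52, 58, 65, 70, 74, 83]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# Correlation coefficient and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares best-fit line and the variation it leaves unexplained.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
proportion_explained = 1 - unexplained / syy

print(f"r^2 = {r_squared:.4f}, proportion explained = {proportion_explained:.4f}")

# The arithmetic from the voter turnout example, rounded for display.
turnout_r_squared = round((-0.1) ** 2, 4)
print("turnout example r^2:", turnout_r_squared)
```

In the made-up study data nearly all of the variation is accounted for by the line, whereas the turnout example's value of 0.01 leaves about 99% unexplained.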

Multiple Regression
The use of multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color). The coefficient of determination, R², tells us the proportion of the scatter in the data accounted for by the best-fit equation.
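As a rough sketch of what such a calculation involves, the code below fits price against weight and a numeric color grade by least squares, solving the normal equations with a small Gaussian elimination routine so that no external library is needed. The data, the numeric color scale, and every function name are assumptions for illustration; statistical software (such as StatCrunch, mentioned later in the seminar) would normally do this work.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimizing squared prediction error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coef):
    """Proportion of the variation in y accounted for by the fitted equation."""
    predicted = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    unexplained = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - unexplained / total


# Hypothetical data: (weight, color grade) for six items and their prices.
predictors = [(1.0, 1), (1.5, 3), (2.0, 2), (2.5, 5), (3.0, 4), (3.5, 6)]
prices = [6.1, 12.9, 11.1, 20.9, 19.1, 25.9]

coefficients = fit_multiple_regression(predictors, prices)
big_r_squared = r_squared(predictors, prices, coefficients)

print("intercept, weight, color:", [round(c, 3) for c in coefficients])
print("R^2:", round(big_r_squared, 4))
```

The made-up prices were chosen to follow the two predictors closely, so R² comes out near 1; real data would usually leave more of the scatter unexplained.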

The Search for Causality
A correlation may suggest causality, but by itself a correlation never establishes causality. Much more evidence is required to establish that one factor causes another.
Earlier, we found that a correlation between two variables may be the result of (1) coincidence, (2) a common underlying cause, or (3) one variable actually having a direct influence on the other. The process of establishing causality is essentially a process of ruling out the first two explanations.

Determining Causality
We can rule out coincidence by repeating the experiment many times or using a large number of subjects in the experiment. Because coincidences occur randomly, they should not occur consistently in many subjects or experiments.
We can rule out a common underlying cause by controlling the experiment for confounding variables. If the controls rule out confounding variables, any remaining effects must be caused by the variables being studied.

Hidden Causality
Sometimes correlations, or the lack of a correlation, can hide an underlying causality. For example, early studies suggested that patients who had heart bypass surgery fared no better than those who didn’t. But researchers later found confounding variables that the early studies had not considered, such as amount of blockage and surgical techniques. These confounding variables prevented the studies from finding a real correlation between the surgery and prolonged life.

Using StatCrunch to calculate correlation

Questions?