Bivariate Relationships Chapter 5 SHARON LAWNER WEINBERG SARAH KNAPP ABRAMOWITZ Statistics Using SPSS: An Integrative Approach SECOND EDITION



Summarizing the Relationship Between Two Variables: An Overview

Variable Types | Summary Graphic | Summary Statistic
Both Scale | Scatterplot | Pearson Correlation
Both Ordinal | Scatterplot | Spearman Correlation
Ordinal & Scale | Scatterplot | Spearman Correlation
Scale & Dichotomy | Scatterplot or Boxplot | Pearson (point biserial)
Both Dichotomies | Clustered Bar Graph | Pearson (phi-coefficient) or Contingency Table
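The overview above can be read as a lookup from a pair of variable types to a suggested graphic and statistic. A minimal Python sketch of that lookup follows; the names `SUMMARY_BY_TYPES` and `suggest_summary` are hypothetical and are not part of SPSS or the textbook.

```python
# Hypothetical lookup that encodes the overview table:
# (type, type) in alphabetical order -> (summary graphic, summary statistic)
SUMMARY_BY_TYPES = {
    ("scale", "scale"): ("Scatterplot", "Pearson correlation"),
    ("ordinal", "ordinal"): ("Scatterplot", "Spearman correlation"),
    ("ordinal", "scale"): ("Scatterplot", "Spearman correlation"),
    ("dichotomy", "scale"): ("Scatterplot or boxplot", "Pearson (point biserial)"),
    ("dichotomy", "dichotomy"): (
        "Clustered bar graph",
        "Pearson (phi coefficient) or contingency table",
    ),
}


def suggest_summary(type_a, type_b):
    """Return (graphic, statistic) for two variable types, in either order."""
    key = tuple(sorted((type_a.lower(), type_b.lower())))
    if key not in SUMMARY_BY_TYPES:
        raise KeyError(f"combination not covered by the overview table: {key}")
    return SUMMARY_BY_TYPES[key]


print(suggest_summary("scale", "ordinal"))
```

Combinations that the table does not list, such as a dichotomy paired with an ordinal variable, raise a `KeyError` rather than guessing.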

The Relationship Between Two Scale Variables: What the Scatterplot Tells Us The scatterplot shows whether the relationship appears linear. If it is linear, the scatterplot also shows the direction (or nature) of the linear relationship and its relative strength.

A Scatterplot Example: Hamburger Data Set

Interpreting the Fat vs. Calories Scatterplot A line, as opposed to a simple curve, appears to fit the data well; i.e., a linear model appears appropriate for these data. The direction of the linear relationship is positive because the line has a positive slope; i.e., burgers that are relatively high in fat tend also to be relatively high in calories. The strength of the linear relationship appears to be strong because the points cluster tightly around the line.

Creating the Scatterplot (for FAT and CALORIES) in SPSS Go to Graphs on the main menu bar, Legacy Dialogs, and then Scatter. Click Define. Put CALORIES into the box labeled y-axis and FAT into the box labeled x-axis and click OK.

Labeling the Points of the FAT vs. CALORIES Scatterplot Go to Graphs on main menu bar, Legacy Dialogs, then Scatter. Click Define. Put CALORIES into box labeled y-axis, FAT into box labeled x-axis, and NAME in box labeled label cases by. Click OK. Double click on the graph to put it into Chart Editor. Click Elements, Show Data Labels. Move Name to Displayed box, Eliminate count. Click Apply, Close.

Labeled Scatterplot: Hamburger Data Set

Scatterplot Example: States Data Set (Percentage of eligible students in the State taking the SAT (PERTAK) vs. the average State verbal SAT score (SATV))

Interpreting the SATV vs. PERTAK Scatterplot Although the points have a curvilinear shape, a line would appear to represent these points reasonably well, and so we will use it in this case. The direction of the linear relationship is negative because the slope of the line is negative; i.e., states with a relatively low percentage of students taking the SAT tend to have higher average SAT Verbal scores. The strength of the linear relationship is more moderate than in the hamburger example because the points in this case do not cluster as tightly around the line.

Scatterplot Example: Currency Data Set (Denomination (BILLVALUE) vs. Number of bills in circulation (NUMBER)). Note: Need to combine variables to create a variable for the number of bills in circulation.

Interpreting the Denomination vs. Number Scatterplot The points have a “cloud like” formation, and so neither a simple curve nor a line provides a good fit to these data. It appears, therefore, that there is little or no relationship between the value of a bill and its number in circulation.

Scatterplot Example: Marijuana Data Set (Year vs. Percentage of students reporting ever having used marijuana from ) Note: Use Select Cases to restrict cases to the appropriate years.

Interpreting the Year vs. Marijuana Scatterplot A simple curve (or two lines) provides a better fit to the data than a single line and is therefore more appropriate than a line for modeling the data. We conclude that the relationship between marijuana use and year is non-linear.

Quantifying a Linear Relationship between Two Scale Variables: Pearson Product Moment Correlation Coefficient Called simply the correlation and symbolized by the letter r. Before calculating it, use a scatterplot to verify that the relationship between the variables appears to be linear. It measures the direction, nature (the interpretation of the direction), and strength of the linear relationship. Direction (and Nature): Measured by the sign of r (positive or negative). Strength: Measured by the absolute magnitude of r. Rule of thumb (Cohen’s scale): r = .1 indicates a weak relationship, r = .3 a moderate relationship, and r = .5 a strong relationship.
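For readers who want to see the arithmetic behind r, here is a minimal Python sketch of the Pearson correlation computed from deviations about the means. The function name `pearson_r` and the data values are hypothetical; the values are not taken from the Hamburger data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only
fat = [10, 15, 20, 25, 30]
calories = [250, 310, 380, 430, 500]
r = pearson_r(fat, calories)
print(round(r, 3))
```

The sign of the returned value gives the direction of the linear relationship and its absolute value gives the strength.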

Obtaining the Pearson Correlation Using SPSS Click Analyze on the Main Menu Bar, Correlate, and Bivariate. Move the two relevant variables into the Variables Box. Click OK.

Interpreting Pearson Correlation Coefficients The correlation between FAT and CALORIES is .997, representing a very strong positive linear relationship between these two variables. => Burgers that are relatively high in fat tend also to be relatively high in calories, and burgers that are relatively low in fat tend also to be relatively low in calories. The correlation between the average SATV of states and the percentage of students in these states who take the SAT is -.86, representing a strong negative linear relationship between these two variables. => States with a relatively high average SATV tend to have a relatively low percentage of students taking the SAT; states with a relatively low average SATV tend to have a relatively high percentage of students taking the SAT.

Other Properties of the Correlation Correlation values are on an ordinal scale. Correlation does not imply causation; i.e., when two variables are correlated, it is not necessarily true that changing one will result in a predictable change in the other. A linear transformation applied to one variable does not change the magnitude of the correlation. The sign of the correlation will change, however, if the transformation involves multiplication by a negative number. Restricting the range of one of the variables can either increase or decrease the magnitude of the correlation.
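The linear-transformation property can be checked numerically. The sketch below, written under the assumption of small made-up data, rescales one variable with a positive and then a negative multiplier and compares the resulting correlations; `pearson_r` is a hypothetical helper, not an SPSS function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_original = pearson_r(x, y)
# Positive multiplier: magnitude and sign are unchanged
r_rescaled = pearson_r([2 * v + 10 for v in x], y)
# Negative multiplier: magnitude is unchanged, sign flips
r_flipped = pearson_r([-3 * v + 1 for v in x], y)

print(r_original, r_rescaled, r_flipped)
```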

Quantifying the Relationship between Two Ordinal Variables or Between One Ordinal and One Scale Variable: The Spearman Rank Correlation Coefficient The Spearman correlation, called Spearman’s rho, is a special case of the Pearson correlation computed on ranked data. Example: Quantify the relationship between the amount of time spent in school on homework (HWKIN12) and the amount of time spent out of school on homework (HWKOUT12) in twelfth grade for students in the NELS data set.

Beginning with a Scatterplot: HWKIN12 and HWKOUT12

Obtaining the Spearman Rank Correlation Coefficient using SPSS Click Analyze, Correlate, Bivariate. Move the variables HWKIN12 and HWKOUT12 into the Variables box. Click Spearman and remove the check from Pearson in the Correlation Coefficients box. Click OK. Note: When using SPSS, there is no need to transform the data to rankings to obtain the Spearman correlation because SPSS automatically will do this transformation for us.
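To make the ranking step that SPSS performs behind the scenes concrete, here is a minimal Python sketch that ranks each variable (tied values share the average of their ranks) and then computes the Pearson correlation of the ranks. The helper names and data are hypothetical, and the sketch is not a description of how SPSS implements the calculation internally.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: monotone but clearly non-linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because the made-up y values increase whenever x increases, the ranks agree perfectly and rho is 1, while the Pearson correlation of the raw values is lower.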

Interpreting the Spearman Rank Correlation Coefficient The Spearman correlation is interpreted in the same way as the Pearson correlation. In this example, Spearman’s rho = .40. => Twelfth grade students in the NELS data set who spend a relatively large amount of time doing homework in school tend also to spend a relatively large amount of time doing homework outside of school, and students who spend a relatively small amount of time doing homework in school tend also to spend a relatively small amount of time doing homework outside of school.

Quantifying the relationship between One Scale and One Dichotomous Variable – The Point Biserial Correlation Coefficient Example using the Hamburger data set: Calories vs. Cheese.

Interpreting the relationship between Calories and Cheese r = .51 between CALORIES and CHEESE. In this case r is called a point biserial correlation, a special case of the Pearson correlation coefficient. Note: The sign of the correlation is positive. CHEESE is coded 0 (a relatively low score on the cheese metric) to represent the absence of cheese and 1 (a relatively high score on the cheese metric) to represent the presence of cheese. => Burgers with cheese tend to be higher in calories than those without cheese.
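A short Python sketch can illustrate why the point biserial is a special case of the Pearson correlation: with the dichotomy coded 0/1, the ordinary Pearson formula gives the same value as the textbook point biserial formula based on the two group means. The data and helper names are hypothetical and are not the Hamburger data set.

```python
from math import sqrt


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(codes, scores):
    """(M1 - M0) / s * sqrt(p * q), with s the population SD of the scores."""
    n = len(scores)
    group1 = [s for c, s in zip(codes, scores) if c == 1]
    group0 = [s for c, s in zip(codes, scores) if c == 0]
    p = len(group1) / n  # proportion coded 1
    q = len(group0) / n  # proportion coded 0
    mean_all = sum(scores) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * sqrt(p * q)


# Hypothetical data: 0 = no cheese, 1 = cheese
cheese = [0, 0, 0, 1, 1, 1]
calories = [400, 450, 500, 520, 560, 600]

r_pearson = pearson_r(cheese, calories)
r_pb = point_biserial(cheese, calories)
print(round(r_pearson, 4), round(r_pb, 4))
```

The positive sign reflects the coding: the group coded 1 has the higher mean score in these made-up data.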

Another Example of the Point Biserial Correlation: Impeach Data Set On February 12, 1999, for only the second time in the nation’s history, the U.S. Senate voted on whether to remove a President, based on impeachment articles passed by the U.S. House. Dozens of political talk shows featured analyses of why senators may have voted the way they did, but such discourse was rarely (if ever) informed by systematic statistical analysis of the votes. Professor Alan Reifman of Texas Tech University created a relevant data set about the senators’ voting to be used for such an analysis.

The Impeach Data Set

Conservatism Score vs. Vote on Perjury Scatterplot

Interpreting the relationship between Senators’ Conservatism and Their Vote on Perjury r = .87 between Conservatism and VOTE1. Note: The sign of the correlation is positive. VOTE1 is coded 0 (a relatively low score on the voting metric) to represent not guilty of perjury and 1 (a relatively high score on the voting metric) to represent guilty of perjury. => Senators who are more conservative tended to vote guilty on perjury.

Conservatism Score vs. Vote on Obstruction of Justice Scatterplot

Interpreting the Relationship between Senators’ Conservatism and their Vote on Obstruction of Justice r = .94 between Conservatism and VOTE2. Note: The sign of the correlation is positive. VOTE2 is coded 0 (a relatively low score on the voting metric) to represent not guilty of obstruction of justice and 1 (a relatively high score on the voting metric) to represent guilty of obstruction of justice. => Senators who are more conservative tended to vote guilty on obstruction of justice.

Quantifying a Relationship between Two Dichotomous Variables – The Phi Coefficient Example: Is there a relationship between the political party of a senator (Democrat or Republican) and his/her vote on obstruction of justice? Note: the Phi Coefficient is also a special case of the Pearson. Rather than use a scatterplot to represent the data graphically, we use a clustered bar graph.

Using SPSS to Obtain a Clustered Bar Graph Click Graphs on the main menu bar, Legacy Dialogs, and Bar. Change from Simple to Clustered and click Define. Put VOTE1 in the Category Axis box and NEWBIE in the Define Clusters By box. Click OK.

Graphically Representing First-Term and Vote on Perjury: The Clustered Bar Graph

Tabulating the Relationship between First-Term and Vote on Perjury: The Contingency Table To obtain the frequencies of each of the four cells (a contingency table or cross-tabulation), click Analyze on the main menu bar, Descriptive Statistics, Crosstabs. Put VOTE1 in the Row(s) box and NEWBIE in the Column(s) box. Click OK.

Analyzing and Interpreting the Contingency Table First term senators tended to vote guilty and more established senators tended to vote not guilty. Any of the following alternatives may be used to provide statistical support: Approximately 62.9 percent (39/62*100) of the non-first term senators voted not guilty whereas 42.1 percent (16/38*100) of the first term senators voted not guilty. Approximately 37.1 percent (23/62*100) of the non-first term senators voted guilty whereas 57.9 percent (22/38*100) of the first term senators voted guilty. Approximately 70.9 percent (39/55*100) of the not guilty votes came from non-first term senators whereas 51.1 percent (23/45*100) of the guilty votes came from non-first term senators. Approximately 29.1 percent (16/55*100) of the not guilty votes came from first term senators whereas 48.9 percent (22/45*100) of the guilty votes came from first term senators.
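The percentages quoted above can be reproduced from the four cell counts. The following Python sketch uses the counts stated in the slide (39, 16, 23, 22); the dictionary layout and function names are hypothetical choices for illustration.

```python
# Cell counts from the contingency table: (term status, perjury vote) -> count
counts = {
    ("non-first term", "not guilty"): 39,
    ("first term", "not guilty"): 16,
    ("non-first term", "guilty"): 23,
    ("first term", "guilty"): 22,
}


def pct_within_term(term, vote):
    """Percent of senators with this term status who cast this vote."""
    total = sum(n for (t, _), n in counts.items() if t == term)
    return 100 * counts[(term, vote)] / total


def pct_within_vote(vote, term):
    """Percent of this vote that came from senators with this term status."""
    total = sum(n for (_, v), n in counts.items() if v == vote)
    return 100 * counts[(term, vote)] / total


print(round(pct_within_term("non-first term", "not guilty"), 1))
print(round(pct_within_term("first term", "guilty"), 1))
print(round(pct_within_vote("not guilty", "non-first term"), 1))
print(round(pct_within_vote("guilty", "first term"), 1))
```

The first function conditions on the column variable (term status) and the second on the row variable (vote), which is the difference between the first two and the last two alternatives listed above.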

Quantifying and Interpreting the Relationship between First-Term and Vote on Perjury: The Phi Coefficient The correlation between VOTE1 and NEWBIE is r = .20. VOTE1 is coded with 0 (a relatively low score) representing not guilty and 1 (a relatively high score) representing guilty. NEWBIE is coded with 0 representing non-first term and 1 representing first term. The sign of the correlation is positive, so high scores on one variable are associated with high scores on the other. First term senators tended to vote guilty on perjury and more established senators tended to vote not guilty.
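As a check on the value r = .20, the phi coefficient can be computed directly from the four cell counts of the contingency table and compared with the Pearson correlation of the same data expanded into 0/1 codes. The sketch below assumes the cell counts reported for the table (39, 16, 23, 22); the function names are hypothetical.

```python
from math import sqrt


def phi_from_counts(n00, n01, n10, n11):
    """Phi for a 2x2 table; nXY = count with first variable X, second variable Y."""
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    return (n11 * n00 - n10 * n01) / sqrt(row0 * row1 * col0 * col1)


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# First variable: NEWBIE (0 = non-first term, 1 = first term)
# Second variable: VOTE1 (0 = not guilty, 1 = guilty)
n00, n01, n10, n11 = 39, 23, 16, 22
phi = phi_from_counts(n00, n01, n10, n11)

# Expand the table into one 0/1 pair per senator and apply the Pearson formula
newbie = [0] * (n00 + n01) + [1] * (n10 + n11)
vote1 = [0] * n00 + [1] * n01 + [0] * n10 + [1] * n11
r = pearson_r(newbie, vote1)

print(round(phi, 2), round(r, 2))
```

Both routes give the same number, which is what it means for phi to be a special case of the Pearson correlation.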

Relationships between Other Variable Types Dichotomous vs. Non-dichotomous Nominal or Ordinal (with limited categories): Vote on obstruction of justice vs. census region Scale vs. Non-dichotomous Nominal or Ordinal (with limited categories): Conservatism vs. census region.

Graphically Representing Vote on Obstruction of Justice vs. Census Region: The Clustered Bar Graph

Tabulating Vote on Obstruction of Justice vs. Census Region: The Contingency Table

Analyzing and Interpreting the Contingency Table Senators from the Northeast tended to vote Not Guilty; those from the South and West tended to vote Guilty; and those from the Midwest were equally likely to vote Guilty or Not Guilty. In particular, approximately 83.3 percent (15/18*100) of the senators from the Northeast voted Not Guilty whereas 50.0 percent (12/24*100) from the Midwest, 40.6 percent (13/32*100) from the South, and 38.5 percent (10/26*100) from the West voted Not Guilty. Alternatively, in terms of voting Guilty, approximately 16.7 percent (3/18*100) of the senators from the Northeast voted Guilty whereas 50.0 percent (12/24*100) from the Midwest, 59.4 percent (19/32*100) from the South, and 61.5 percent (16/26*100) from the West voted Guilty.

Graphically Representing Conservatism Score vs. Census Region: The Boxplot

Summary Tabulation of Conservatism Scores by Census Region: A Comparison by Means & Medians

Analyzing and Interpreting these Summary Data: A Preference for Medians Because the data are noticeably skewed for the Northeast region, a more appropriate comparison of conservatism across regions is via the median, although in this example the means yield the same ordering. According to the median values, the most conservative senators come from the South (Median=72), followed by the West (Median=64), the Midwest (Median=50), and finally, the Northeast (Median=19.5).
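The preference for the median under skew can be illustrated with a few made-up scores. In the Python sketch below, the values are hypothetical and are not the senators' conservatism scores; a single large value pulls the mean upward while the median stays near the bulk of the data.

```python
from statistics import mean, median

# Hypothetical right-skewed scores: most values are low, one is very high
scores = [5, 10, 15, 20, 25, 90]

score_mean = mean(scores)      # pulled toward the extreme value
score_median = median(scores)  # middle of the ordered values

print(score_mean, score_median)
```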

Summarizing The Possibilities