10. Introduction to Multivariate Relationships Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory.

Slides:



Advertisements
Similar presentations
Three or more categorical variables
Advertisements

Introduction to Regression with Measurement Error STA431: Spring 2015.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
Correlation & Regression Chapter 10. Outline Section 10-1Introduction Section 10-2Scatter Plots Section 10-3Correlation Section 10-4Regression Section.
Agresti/Franklin Statistics, 1 of 52 Chapter 3 Association: Contingency, Correlation, and Regression Learn …. How to examine links between two variables.
Measures of Association Quiz
Aim: How do we establish causation?
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. Relationships Between Quantitative Variables Chapter 5.
AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables.
Correlation AND EXPERIMENTAL DESIGN
Chapter 6: Correlational Research Examine whether variables are related to one another (whether they vary together). Correlation coefficient: statistic.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 3 Association: Contingency, Correlation, and Regression Section 3.4 Cautions in Analyzing.
Chapter 2: Looking at Data - Relationships /true-fact-the-lack-of-pirates-is-causing-global-warming/
C82MCP Diploma Statistics School of Psychology University of Nottingham 1 Designing Experiments In designing experiments we: Manipulate the independent.
Chapter 10 Relationships between variables
BPSChapter 61 Two-Way Tables. BPSChapter 62 To study associations between quantitative variables  correlation & regression (Ch 4 & Ch 5) To study associations.
BPS - 5th Ed. Chapter 51 Regression. BPS - 5th Ed. Chapter 52 u Objective: To quantify the linear relationship between an explanatory variable (x) and.
Dr. Mario MazzocchiResearch Methods & Data Analysis1 Correlation and regression analysis Week 8 Research Methods & Data Analysis.
Ch 2 and 9.1 Relationships Between 2 Variables
Introduction to Regression with Measurement Error STA431: Spring 2013.
Multiple Regression III 4/16/12 More on categorical variables Missing data Variable Selection Stepwise Regression Confounding variables Not in book Professor.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Basic Practice of Statistics - 3rd Edition
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
Chapter 15 – Elaborating Bivariate Tables
 Pg : 3b, 6b (form and strength)  Page : 10b, 12a, 16c, 16e.
Chapter 9 Comparing Two Groups
Hypothesis testing – mean differences between populations
Soc 3306a Lecture 10: Multivariate 3 Types of Relationships in Multiple Regression.
Bivariate Relationships Analyzing two variables at a time, usually the Independent & Dependent Variables Like one variable at a time, this can be done.
Exploratory Data Analysis: Two Variables
Simple Covariation Focus is still on ‘Understanding the Variability” With Group Difference approaches, issue has been: Can group membership (based on ‘levels.
1 10. Causality and Correlation ECON 251 Research Methods.
 Correlation and regression are closely connected; however correlation does not require you to choose an explanatory variable and regression does. 
Correlation and Regression
September In Chapter 14: 14.1 Data 14.2 Scatterplots 14.3 Correlation 14.4 Regression.
Today's topics ● Causal thinking, theories, hypotheses ● Independent and dependent variables; forms of relationships ● Formulating hypothesis; hypothesis.
Soc 3306a Multiple Regression Testing a Model and Interpreting Coefficients.
Correlation and Linear Regression. Evaluating Relations Between Interval Level Variables Up to now you have learned to evaluate differences between the.
Correlation Chapter 15. Correlation Sir Francis Galton (Uncle to Darwin –Development of behavioral statistics –Father of Eugenics –Science of fingerprints.
1 Further Maths Chapter 4 Displaying and describing relationships between two variables.
Statistics: Introduction Healey Ch. 1. Outline The role of statistics in the research process Statistical applications Types of variables.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 10 Comparing Two Groups Section 10.5 Adjusting for the Effects of Other Variables.
BPS - 3rd Ed. Chapter 51 Regression. BPS - 3rd Ed. Chapter 52 u Objective: To quantify the linear relationship between an explanatory variable (x) and.
Chapter 5 Regression BPS - 5th Ed. Chapter 51. Linear Regression  Objective: To quantify the linear relationship between an explanatory variable (x)
October 15. In Chapter 19: 19.1 Preventing Confounding 19.2 Simpson’s Paradox 19.3 Mantel-Haenszel Methods 19.4 Interaction.
7. Comparing Two Groups Goal: Use CI and/or significance test to compare means (quantitative variable) proportions (categorical variable) Group 1 Group.
Section 7.4 ~ The Search for Causality Introduction to Probability and Statistics Ms. Young.
MGS3100_04.ppt/Sep 29, 2015/Page 1 Georgia State University - Confidential MGS 3100 Business Analysis Regression Sep 29 and 30, 2015.
1 Introduction to Research Methods How we come to know about crime.
Professor B. Jones University of California, Davis.
Chapter 5 Regression. u Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). u We can then predict.
Reasoning in Psychology Using Statistics Psychology
10. Introduction to Multivariate Relationships Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory.
Introduction. The Role of Statistics in Science Research can be qualitative or quantitative Research can be qualitative or quantitative Where the research.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 10 Comparing Two Groups Section 10.1 Categorical Response: Comparing Two Proportions.
APPLIED DATA ANALYSIS IN CRIMINAL JUSTICE CJ 525 MONMOUTH UNIVERSITY Juan P. Rodriguez.
10. Introduction to Multivariate Relationships Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory.
Stat 1510: Statistical Thinking and Concepts REGRESSION.
Jump to first page Inferring Sample Findings to the Population and Testing for Differences.
Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. Statistical Significance for 2 x 2 Tables Chapter 13.
2.7 The Question of Causation
Essential Statistics Regression
Two-Way Tables and The Chi-Square Test*
Establishing Causation
Chapter 2 Looking at Data— Relationships
Basic Practice of Statistics - 3rd Edition Regression
Section 6.2 Establishing Causation
Basic Practice of Statistics - 3rd Edition Lecture Powerpoint
Chapter 4: More on Two-Variable Data
Presentation transcript:

10. Introduction to Multivariate Relationships Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory variables have an influence on any particular response variable. The effect of an explanatory variable on a response variable may change when we take into account other variables. (Picture such as on p. 305 for X = height, Y = achievement test score, taking into account grade level)

Example: Y = whether admitted into grad school at U. California, Berkeley (for the 6 largest departments) X = gender Whether admitted Gender Yes No Total %yes Female % Male % Difference of sample proportions = 0.44 – 0.30 = 0.14 has se = 0.014, Pearson  2 = 90.8 (df = 1), P-value = …. There is very strong evidence of a higher probability of admission for men than for women.

Now let X 1 = gender and X 2 = department to which the person applied. e.g., for Department A, Whether admitted Gender Yes No Total %yes Female % Male % Now,  2 = 17.4 (df = 1), but difference is 0.62 – 0.82 = T he strong evidence is that there is a higher probability of being admitted for women than men. What happens with other departments?

Female Male Difference of Dept. Total %admitted Total %admitted proportions  2 A % % B 25 68% % C % % D % % E % % F 341 7% 273 6% Total % % There are 6 “partial tables,” which summed give the original “bivariate” table. How can the partial table results be so different from the bivariate table?

Partial tables – display association between two variables at fixed levels of a “control variable.” Example: Previous page shows results from partial tables relating gender to whether admitted, controlling for (i.e., keeping constant) the level of department. When control variable X 2 is kept constant, changes in Y when X 1 changes are not due to changes in X 2 Note: When each pair of variables is associated, then a bivariate association for two variables may differ from its partial association, controlling for the other variable.

Example: Y = whether admitted is associated with X 1 = gender, but each of these itself associated with X 2 = department. Department associated with gender: Males tend to apply more to departments A, B, females to C, D, E, F Department associated with whether admitted: % admitted higher for dept. A, B, lower for C, D, E, F Moral: Association does not imply causation! This is true for quantitative and categorical variables. e.g., a strong correlation between quantitative var’s X and Y does not mean that changes in X cause changes in Y.

Why does association not imply causation? There may be some “alternative explanation” for the association. Example: Suppose there is a negative association between X = whether use marijuana regularly and Y = student GPA. Could the association be explained by some other variables that have an effect on each of these, such as achievement motivation or degree of interest in school or parental education? With observational data, effect of X on Y may be partly due to association of X and Y with lurking variables – variables that were not observed in the study but that influence the association of interest.

Unless there is appropriate time order, association is consistent with X causing Y or with Y causing X, or something else causing both. Even when the time order is appropriate, there could still be some alternative explanation, such as a variable Z that has causal influence on both X and Y. Especially tricky to measure cause and effect when both variables measured over time; e.g., annual data for a nation shows a negative association between the fertility level and the percentage of the nation’s population using the Internet.

Causation difficult to assess with observational studies, unlike experimental studies that can control potential lurking variables (by randomization, keeping different groups “balanced” on other variables). In an observational study, when X 1 and X 2 both have effects on Y but are also associated with each other, there is said to be confounding. It’s difficult to determine whether either truly causes Y, because a variable’s effect could be partly due to its association with the other variable. (Example in Exercise for X 1 = amount of exercise, Y = number of serious illnesses in past year, X 2 = age is a possible confounding variable)

Simpson’s paradox It is possible for the (bivariate) association between two variables to be positive, yet be negative at each fixed level of a third variable. (see scatterplot) Example: Florida countywide data (Ch.11, pp ) There is a positive correlation between crime rate and education (% residents of county with at least a high school education)! There is a negative correlation between crime rate and education at each level of urbanization (% living in an urban environment) (see scatterplot)

Types of Multivariate Relationships Spurious association: Y and X 1 both depend on X 2 and association disappears after controlling X 2 (Karl Pearson 1897, one year after developing sample estimate of Galton’s correlation, now called “Pearson correlation”) Example: For nations, percent owning TV negatively correlated with birth rate, but association disappears after control per capita gross domestic product (GDP). Example: College GPA and income later in life? Example: Math test score for child and whether family has Internet connection?

Chain relationship – Association disappears when control for intervening variable. Example: Gender  Department  Whether admitted (at least, for Departments B, C, D, E, F) Example (text, p. 309): For nations, educational attainment associated with life length. Perhaps Education  Income  Life length

Multiple causes – A variety of factors have influences on the response (most common in practice) In observational studies, usually all (or nearly all) explanatory variables have associations among themselves as well as with response var. Effect of any one changes depending on which other var’s are controlled (statistically), often because it has a direct effect and also indirect effects through other variables. Example: What causes Y = juvenile delinquency? X 1 = Being from poor family? X 2 = Being from a single-parent family? Perhaps X 2 has a direct effect on Y and an indirect effect through its effect on X 1.

Statistical interaction – Effect of X 1 on Y changes as the level of X 2 changes. Example: Effect of whether a smoker (yes, no) on whether have lung cancer (yes, no) changes as value of age changes (essentially no effect for young people, stronger effect for old people) Example: U.S. median annual income by race and gender Race Gender Black White Female $25,700 $29,700 Male $30,900 $40,400

The difference in median income between whites and blacks is: $4000 for females, $9500 for males i.e., the effect of race on income depends on gender (and the effect of gender on income depends on race), so there is interaction between race and gender in their effects on income. Example (p. 311): X = number of years of education Y = annual income (1000’s of dollars) Suppose E(Y) = x for men E(Y) = x for women The effect of education on income differs for men and women, so there is interaction between education and gender in their effects on income.

“No interaction” does not mean “no association” Example: Y = violent crime rate (no. violent crimes per 10,000 population), measured statewide X 1 = poverty rate (% with income below poverty level) X 2 = % single-parent families Based on actual data in text, it is plausible that E(Y) = X 1 (bivariate), but E(Y) = X 1 when X 2 < 10 E(Y) = X 1 when X 2 between 10 and 15 E(Y) = X 1 when X 2 > 15 Controlling for X 2, X 1 has a weaker effect on Y, but that effect is same at each level of X 2. The variables are all associated (positive correlation between each pair).

Some review questions What does it mean to “control for a variable”? When can we expect a bivariate association to change when we control for another variable? Give an example of an association that you would expect to be spurious. Draw a scatterplot showing a positive correlation for county-wide data on education and crime rate, but a negative association between those variables when we control for level of urbanization. Why is it that association does not imply causation?