Stratification Matters: Analysis of 3 Variables

Presentation transcript:

Stratification Matters: Analysis of 3 Variables
Thursday, August 04, 2016. Farrokh Alemi, PhD. Based on the work of CF Jeff Lin, PhD.
This lecture focuses on stratification. It is based on slides prepared by Dr. Lin and modified by Dr. Alemi.

Stratification: Ceteris Paribus (divide into subgroups)
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. Within a stratum, members share the same features, so the impact of other variables can be assessed without the influence of the shared features. Because the shared features are held constant within each stratum, this is sometimes referred to as ceteris paribus, or holding all other things constant.

Natural Stratification: subgroups that are observed in the sample
If the data have fallen into several subgroups without pre-planning, these subgroups are called natural strata.

Natural Stratification: mutually exclusive and exhaustive
The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded.
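The two requirements can be checked mechanically. The following is a minimal sketch, assuming each stratum is available as a set of member identifiers; the function name `check_strata` and the patient IDs are hypothetical and are not part of the lecture.

```python
def check_strata(population, strata):
    """Return (mutually_exclusive, exhaustive) for a dict of stratum -> set of members."""
    seen = set()
    exclusive = True
    for members in strata.values():
        # Any overlap with members already seen means someone sits in two strata.
        if seen & members:
            exclusive = False
        seen |= members
    # Exhaustive means every population element landed in some stratum.
    exhaustive = seen >= set(population)
    return exclusive, exhaustive


# Hypothetical patient IDs, stratified by physician.
population = {"p1", "p2", "p3", "p4"}
strata = {"George, MD": {"p1", "p2"}, "Smith, MD": {"p3", "p4"}}
result = check_strata(population, strata)
print(result)
```

A stratification that places one patient in two strata, or leaves a patient out, returns False in the corresponding position.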

Three-Variable Data: impact of one variable on another, ceteris paribus
This lecture focuses on how three-variable data can be analyzed using stratification. This is the simplest model for examining the impact of one variable on another while holding a third variable constant.

Remove Impact of Other Variables
An important part of any research study is the ability to remove competing explanations, so that the impact can reasonably be attributed to the explanatory variable. In studying the relationship between a response variable and an explanatory variable, we should control for covariates that can influence that relationship, comparing cases at the same level of the covariates. © Jeff Lin, MD, PhD. Three-Way Table.

Control of Alternative Explanations: why are patients dissatisfied?
For example, we may want to examine whether patients’ dissatisfaction is caused by the physician or by the various nurses who work with him. To do so, we should control for the impact of the nurse on the team. We would stratify the data by nurse and examine the impact of the physician within each stratum. Then we can report the impact of the physician on patient experience.

Satisfaction Example
Table 1: Satisfaction Across Teams

| Physician  | Nurse    | Complained: Yes | Complained: No | Percent Complained |
|------------|----------|-----------------|----------------|--------------------|
| George, MD | Jim, RN  | 53              | 424            | 11.11%             |
| George, MD | Jill, RN | 11              | 37             | 22.92%             |
| Smith, MD  | Jim, RN  | 0               | 16             | 0.00%              |
| Smith, MD  | Jill, RN | 4               | 139            | 2.80%              |
| Total      | Jim, RN  | 53              | 440            | 10.75%             |
| Total      | Jill, RN | 15              | 176            | 7.85%              |

This is a 2 × 2 × 2 contingency table: two rows, two columns, and two layers. It shows how two physicians work with two nurses and whether their patients complained. The data are hypothetical, but similar data are readily available in complaint registries at most hospitals. The 684 patients classified in Table 1 were patients at this hypothetical clinic.
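The percentages in Table 1 can be reproduced from the counts. This is a minimal sketch, not part of the original slides; the names `counts` and `percent_complained` are illustrative, and the zero complaints for the Smith and Jim team is an assumption inferred from that team's 0.00% rate.

```python
# Counts from Table 1: (physician, nurse) -> (complained yes, complained no).
# The 0 for Smith/Jim is inferred from the reported 0.00% rate.
counts = {
    ("George, MD", "Jim, RN"): (53, 424),
    ("George, MD", "Jill, RN"): (11, 37),
    ("Smith, MD", "Jim, RN"): (0, 16),
    ("Smith, MD", "Jill, RN"): (4, 139),
}


def percent_complained(yes, no):
    """Percent of patients in a cell who complained."""
    return 100.0 * yes / (yes + no)


rates = {team: round(percent_complained(yes, no), 2)
         for team, (yes, no) in counts.items()}
total_patients = sum(yes + no for yes, no in counts.values())

for (physician, nurse), pct in rates.items():
    print(f"{physician} with {nurse}: {pct:.2f}% complained")
print("Total patients:", total_patients)
```

The total agrees with the 684 patients mentioned in the notes.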

Satisfaction Example: the variables Y, X, and Z
The variables in Table 1 are as follows. Y is whether the patient complained, with the categories yes and no. X is the physician on the team and Z is the nurse, each with two possible levels: George and Smith for the physicians, and Jim and Jill for the nurses. We study the effect of physicians on complaints, treating nurses as a control variable. We want to estimate the impact of physicians after removing the contribution of the nurses. Table 1 contains a 2 × 2 partial table for each physician, relating complaints to the nurse who worked with that physician. These 2 × 2 tables are shown in color on the slide and are called partial tables because each shows part of the larger table. The whole table lists the percent of complaints for each combination of physician and nurse.

Satisfaction Example: the partial table for George
In the partial table for George, the complaint rate was 11.81 percentage points higher when Jill was the nurse than when Jim was the nurse (22.92% versus 11.11%). When Jill and George team up, they do worse than when Jim and George team up.

Preferential Dependence
When Smith was the doctor, the complaint rate was 2.80 percentage points higher when Jill was the nurse (2.80% versus 0.00%). Jim and Smith are better as a team than Jill and Smith. The impact of the nurses seems to depend on which physician they are working with: the gap between Jill and Jim is 11.81 percentage points with George but only 2.80 with Smith. The idea that the difference between Jim and Jill depends on who they team up with is called preferential dependence. In preferential dependence, the fixed level of one variable affects preferences for other variables. Violations of preferential independence are rare, but they do occur and point to an unusual data set. When this happens, a separate analysis is needed for each of the partial tables.
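The within-physician comparisons can be computed directly from the partial tables. The sketch below is illustrative only; `partial_tables` and `nurse_gap` are hypothetical names, and the zero complaints for Smith with Jim is inferred from the 0.00% rate in Table 1.

```python
# One partial table per physician: nurse -> (complained yes, complained no).
partial_tables = {
    "George, MD": {"Jim, RN": (53, 424), "Jill, RN": (11, 37)},
    "Smith, MD": {"Jim, RN": (0, 16), "Jill, RN": (4, 139)},
}


def rate(yes, no):
    """Percent of patients who complained."""
    return 100.0 * yes / (yes + no)


def nurse_gap(partial):
    """Jill's complaint rate minus Jim's, in percentage points, within one stratum."""
    return rate(*partial["Jill, RN"]) - rate(*partial["Jim, RN"])


gaps = {md: round(nurse_gap(partial), 2) for md, partial in partial_tables.items()}
for md, gap in gaps.items():
    print(f"{md}: Jill minus Jim = {gap:.2f} percentage points")
```

Both gaps are positive, so Jill has the higher complaint rate with either physician; what changes between strata is the size of the gap.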

Real Example: adding a disease reduces mortality rate
Examples of violations of preferential independence are given in the literature. While rare, they do occur.

Satisfaction Example: the marginal table
The two rows at the bottom of Table 1 display the marginal table. It results from summing the cell counts in Table 1 over physicians, thus combining the two partial tables. For example, we get the 15 in the marginal table by adding 11 and 4. Overall, 10.75% of Jim’s patients and 7.85% of Jill’s patients complained. If we ignore the physician on the team, the complaint rate was 2.90 percentage points higher when Jim was the nurse. So overall Jim seems to have a higher percent of complaints, even though Jim and Smith together are the best team, with no complaints.
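Summing over physicians can be sketched as follows. This is an illustration under the same assumptions as Table 1 (including the inferred zero for Smith with Jim); the variable names are hypothetical.

```python
# One partial table per physician: nurse -> (complained yes, complained no).
partial_tables = {
    "George, MD": {"Jim, RN": (53, 424), "Jill, RN": (11, 37)},
    "Smith, MD": {"Jim, RN": (0, 16), "Jill, RN": (4, 139)},
}

# Collapse over physicians: add the yes and no counts for each nurse.
marginal = {}
for partial in partial_tables.values():
    for nurse, (yes, no) in partial.items():
        prev_yes, prev_no = marginal.get(nurse, (0, 0))
        marginal[nurse] = (prev_yes + yes, prev_no + no)

marginal_rates = {nurse: round(100.0 * yes / (yes + no), 2)
                  for nurse, (yes, no) in marginal.items()}
difference = round(marginal_rates["Jim, RN"] - marginal_rates["Jill, RN"], 2)

print(marginal)
print(marginal_rates)
print("Jim minus Jill:", difference, "percentage points")
```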

Simpson’s Paradox
The fact that an association in the marginal table can have a different direction than the association in the subgroups is called Simpson’s paradox. Here, Jim has the higher complaint rate in the marginal table (by 2.90 percentage points), yet Jill has the higher rate within each physician’s partial table (by 11.81 points with George and 2.80 points with Smith). Simpson’s paradox can occur and should caution against blanket statements based on marginal tables without examining subgroups. The true impact is understood only in the stratified subgroups, where we can control for team differences.
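A reversal of this kind can be detected by comparing the sign of the difference in every stratum with the sign of the difference after collapsing. The function below is a minimal sketch under that definition; `simpson_reversal` is a hypothetical helper, not a library routine, and the counts repeat Table 1 with the inferred zero for Smith with Jim.

```python
def rate(yes, no):
    """Percent of patients who complained."""
    return 100.0 * yes / (yes + no)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def simpson_reversal(partial_tables, a, b):
    """True when all strata order a versus b one way and the marginal table the other."""
    totals = {a: [0, 0], b: [0, 0]}
    stratum_signs = set()
    for partial in partial_tables.values():
        stratum_signs.add(sign(rate(*partial[a]) - rate(*partial[b])))
        for level in (a, b):
            totals[level][0] += partial[level][0]
            totals[level][1] += partial[level][1]
    marginal_sign = sign(rate(*totals[a]) - rate(*totals[b]))
    # Strata must agree with each other, and the marginal sign must be the opposite.
    return (len(stratum_signs) == 1
            and marginal_sign != 0
            and marginal_sign == -next(iter(stratum_signs)))


partial_tables = {
    "George, MD": {"Jim, RN": (53, 424), "Jill, RN": (11, 37)},
    "Smith, MD": {"Jim, RN": (0, 16), "Jill, RN": (4, 139)},
}
reversed_direction = simpson_reversal(partial_tables, "Jim, RN", "Jill, RN")
print(reversed_direction)
```

The reversal arises here because most of Jim's patients were seen with George, whose teams have higher complaint rates, while most of Jill's were seen with Smith.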

Stratification: compares impact in like situations
In stratified analysis, the impact of a variable is assessed by comparing its presence and absence in like situations. Then apples cancel apples and oranges cancel oranges, so the impact of the variable is assessed without the influence of covariates.