Conditioning, Stratification & Backdoor Criterion
Farrokh Alemi, PhD. Sunday, August 28, 2016. This lecture introduces the concepts of conditioning and stratification by conditions. It is based on Pearl's work on causality and on the articles and book by Morgan and Winship on counterfactuals and causal inference. Morgan SL, Winship C. Counterfactual and Causal Inference. Cambridge University Press, 2007.
Causal Analysis of Observational Data
Misleading Comparisons. People argue that in observational data an outcome can occur for many reasons, so comparing the outcomes of two patients is like comparing apples with oranges: an unequal and misleading comparison. We do not really know if the observed differences are due to the cause or to some alternative explanation. Here we do not know if the observed outcome is due to color differences or to the fact that one fruit is an apple and the other an orange.
Causal Analysis of Observational Data
Misleading Comparisons, Potential Outcomes. Scientists have proposed different ways to solve this problem. Sociologists suggest that we need to examine potential outcomes in order to remove alternative explanations. Propensity scoring is one of the best known approaches: the analyst identifies potential outcomes and compares observed and potential outcomes. In this approach, every orange is replaced with what it would have been had it been an apple; we then compare apples to apples and can determine whether color is really what makes the difference.
Simple, Easy, Fast, Graphical
But the easiest way to examine cause and effect, without introducing difficult counterfactual or potential outcomes, is through stratification and conditioning. In these approaches, causal probability networks provide the logic for which covariates should be used in the stratification. This lecture focuses on stratification and the use of Bayes nets for causal analysis.
Conditioning on Strata
Ceteris Paribus. Stratification is a process of comparing two situations that differ in one respect: in one the cause is present and in the other it is absent. In stratification of observational data, cases are matched on all aspects except one. The matches are made not just on several conditions but on all combinations of those conditions. Comparing the two strata then lets us detect the difference the cause has made. Stratification sets up the environment so that we are not comparing apples to oranges but apples to apples and oranges to oranges. Here we see two situations. We match every fruit in the first basket with a fruit in the second basket. The difference in outcomes between the two situations is then attributed to the cause, present in the first but absent in the second. This is sometimes referred to as holding all other things constant, or in economic terms "ceteris paribus."
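The matching described above can be sketched in a few lines. A minimal illustration, with invented records and hypothetical covariates (age group and sex): group records by every combination of covariates, then compare outcome rates with and without the cause inside each stratum.

```python
from collections import defaultdict

# Sketch of stratified comparison. Records are invented for illustration:
# (age_group, sex, cause_present, outcome)
records = [
    ("young", "f", 1, 1), ("young", "f", 0, 0),
    ("young", "m", 1, 1), ("young", "m", 0, 1),
    ("old",   "f", 1, 0), ("old",   "f", 0, 0),
]

# One stratum per combination of covariates; within each stratum,
# outcomes are split by whether the cause was present (1) or absent (0).
strata = defaultdict(lambda: {0: [], 1: []})
for age, sex, cause, outcome in records:
    strata[(age, sex)][cause].append(outcome)

for key, groups in sorted(strata.items()):
    if groups[0] and groups[1]:          # need both arms in the stratum
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))
        print(key, round(diff, 2))       # within-stratum outcome difference
```

Strata that lack either arm are skipped, which mirrors the data-sparsity problem discussed later.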
Different Names for Same Thing
The conditioning or stratification approach is not new. In statistics, these ideas underlie the stratified case-control study design. Stratification, conditioning, and ceteris paribus are different names for the same thing: in all of them we look at the data within a subgroup that shares a specific feature. The idea of stratification to detect causal impact is simple and has been in the statistical literature since the 1950s. Most policy makers and clinicians understand it. The methods of analysis are simple and do not require conjecturing potential outcomes that have not occurred. Unlike other approaches, it does not require the creation of potential outcomes, a statistical feat that is easy to do but hard to imagine. Finally, conditioning can be expressed in graphical and network terms, which, as we will see shortly, significantly reduces computational issues.
Different Names for Same Thing
The fact that stratification and conditioning are one and the same concept is best illustrated by the procedure for calculating conditional probability. The formal way of testing conditional independence is through stratification. The relationship between X and Y is examined in two partial tables, one where the condition is always absent and one where it is always present. Independence holds when X and Y are independent in both situations. In essence two strata are created, one in which the condition is present and one in which it is absent. Within these two strata, there should not be any statistically significant relationship between X and Y. Thus the method by which tests of independence are carried out relies on stratification.
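As a sketch of this test, the snippet below computes a chi-square statistic for each partial table of X by Y; the counts are hypothetical, chosen so that X and Y look independent in one stratum and strongly related in the other. (In practice one would use a library routine such as scipy.stats.chi2_contingency.)

```python
# Sketch: testing whether X and Y are independent within each stratum of C.
# The 2x2 counts below are invented purely for illustration.

def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in [(a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)]:
        exp = row * col / n              # expected count under independence
        stat += (obs - exp) ** 2 / exp
    return stat

# Partial tables of X by Y, one per stratum of C (hypothetical counts):
strata_tables = {
    "C absent":  [[40, 40], [40, 40]],   # X and Y look independent here...
    "C present": [[30, 10], [10, 30]],   # ...but strongly related here
}

for name, table in strata_tables.items():
    print(name, round(chi_square_2x2(table), 2))
```

A statistic near zero in both strata would support conditional independence; here only the "C absent" stratum gives zero.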
Combinatorial Explosion
Stratification was around in the 1950s, before the creation of potential outcome methods. Some scientists abandoned stratification because it was problematic: in most realistic problems there are too many covariates and combinations of covariates, so stratification was not practical. Sufficient data may not be available to accurately assess the impact of the cause within each stratum. The situation changed when Bayesian networks came about and the concepts of d-separation and the back door were clarified. In this lecture, we define these concepts and see how Bayesian networks help make stratification more practical.
Terminology: Nodes, Arcs
To understand how stratification can be done efficiently, we first need to introduce terminology for causal graphs and discuss the back-door criterion. Pearl's work provides a language and terminology for discussing and analyzing causality. We are going to skip the mathematical proofs of his work and instead focus on the practical implications for stratification. We want to find the causal effect of X on Y, and C is a variable that confounds this causal effect. This is a simple network of three nodes. In our discussion, all networks are assumed to be directed acyclic graphs. To begin with, we start with three events, shown here as X, Y, and C. In this visualization, events are represented by circles. The arcs show the direction of influence, and the network depicted is acyclic because, starting from any node and following the direction of influence, you cannot come back to the same node. If colored, the event in the network is observed in the study.
Terminology: Hidden Nodes
If hollow, the event cannot be observed. Here C is a hidden confounder for X and Y. It is unknown to the study. It affects both X and Y and these effects are not measured as the event C is not observed. Under certain circumstances, we can infer that the event C must exist even though it is hidden. Even though we were not clever enough to include it in our data collection efforts, we can know that it must exist because contradictory relationships are seen between X and Y.
Terminology: Fidelity
Network graphs and assumptions of independence are interchangeable. This is called the fidelity principle. Each independence assumption has an equivalent graphical representation and vice versa. Every arc in the network indicates a dependence. Two nodes in a network are independent if they cannot be connected to each other in the network. Note that in a network both missing and drawn arcs are informative and may indicate independence or dependence relationships. One of Pearl's contributions to the field was to point out how networks and independence assumptions are interchangeable.
Causal Chains, Common Causes, Common Effects. Pearl bases his discussion of causality on three prototype networks that have different independence assumptions. The entire world of possible relationships is explained in terms of three prototypes: causal chains, common causes, and common effects. All networks contain one or more of these three prototypes. The causal chain and common cause are relatively straightforward concepts that we will describe briefly. But the common effect is a problematic concept in stratification, and we will spend some time explaining why this prototype is so important in conditioning and stratification.
Mediator. The first prototype suggested by Pearl is the mediator prototype. Here C is a mediator between X and Y: starting from X, you run into C before you can reach Y. The effect of X on Y is entirely through its effect on C. This is also known as a chain of causality: X causes C and C causes Y. Many examples of causal chains exist. For example, a program may raise costs, which then can reduce demand for the program.
Mediator: RN Turnover, Increased Cost, Reduced Demand
Many examples of causal chains exist. For example, RN turnover may increase the cost of a nursing home, which then can reduce demand for its services.
Mediator: 1. Y Depends on X; 2. Y Independent of X Given C
In graphs, a direct relationship among variables is shown by a directed arc. Even though X and Y have no direct arc connecting them, this does not mean they are independent. As X changes it causes changes in C, and C in turn causes changes in Y, so the two variables are clearly related, albeit through a third variable. Two conditions must be met for this prototype to hold. First, X and Y must be dependent; second, the two variables should be conditionally independent of each other given C. In data, we can verify whether X and Y are related but conditionally independent. If so, then C is a mediator located between X and Y.
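These two conditions can be checked exactly on a small made-up example. The sketch below specifies a binary chain X to C to Y by its factorization, enumerates the joint distribution, and verifies that Y depends on X marginally but not within either stratum of C. All probabilities are invented.

```python
from itertools import product

# A fully specified binary chain X -> C -> Y (probabilities are illustrative).
p_x1 = 0.5
p_c1_given_x = {0: 0.2, 1: 0.9}
p_y1_given_c = {0: 0.3, 1: 0.8}

# Joint distribution P(X, C, Y) from the chain factorization.
joint = {}
for x, c, y in product([0, 1], repeat=3):
    pr = p_x1 if x else 1 - p_x1
    pr *= p_c1_given_x[x] if c else 1 - p_c1_given_x[x]
    pr *= p_y1_given_c[c] if y else 1 - p_y1_given_c[c]
    joint[(x, c, y)] = pr

def prob(pred):
    """Total probability of outcomes (x, c, y) satisfying pred."""
    return sum(pr for k, pr in joint.items() if pred(*k))

# Condition 1: Y depends on X unconditionally.
p_y1_x1 = prob(lambda x, c, y: x and y) / prob(lambda x, c, y: x)
p_y1_x0 = prob(lambda x, c, y: not x and y) / prob(lambda x, c, y: not x)
print(round(p_y1_x1, 2), round(p_y1_x0, 2))   # the two differ: dependent

# Condition 2: Y is independent of X once we condition on the mediator C.
for cv in (0, 1):
    a = prob(lambda x, c, y: x and c == cv and y) / prob(lambda x, c, y: x and c == cv)
    b = prob(lambda x, c, y: not x and c == cv and y) / prob(lambda x, c, y: not x and c == cv)
    assert abs(a - b) < 1e-9   # same within every stratum of C
```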
Common Cause. The second type of causal relationship among three variables is called common cause, also known as mutual dependence. Here C is the cause of both X and Y. X and Y are unconditionally associated, but conditioning on C they are independent. Notice that, conditioning on C, there is no way to travel from X to Y, so the two variables are independent. Many examples of common causes exist; one of the simpler examples is the fact that a medication may cure a disease but also have a side effect.
Common Cause: 1. Y Depends on X; 2. Y Independent of X Given C
Again, X and Y are unconditionally associated, but they are independent if we condition on C.
Common Cause: Medication, Cure, Side Effect
Many examples of the common cause prototype exist; one of the simpler examples is the fact that a medication may both cure a disease and have a side effect.
Common Effect. The third type of causation among three variables is called common effect. Now both arrows, from X and from Y, point to C.
Common Effect: Severity, Treatment, Outcomes
A good example of this prototype is that both the patient's severity of illness and the clinical intervention may affect the observed outcomes.
Common Effect: 1. Y Independent of X; 2. Y Depends on X Given C
The set of independence assumptions that must hold for a common effect is quite surprising and is the main reason why stratification can be problematic in these situations. Pearl has shown that two conditions must hold. First, Y and X should be independent; second, given the common effect, Y and X should be dependent. These conditional statements may seem strange at first, and examples can clarify them.
No Relationship in Population
Here we see the relationship between time with the physician and waiting time in the clinic. There is no relationship between these two factors in these data; in fact the correlation is zero. You can change the waiting time from 10 minutes to 30 minutes and it does not seem to change the time with the physician at all. But what will happen if we separate out the satisfied and dissatisfied patients? The two variables are not related in the population, but are they unrelated in the various subgroups as well? Correlation = 0
Relationship in Subgroups
Suppose the green points show satisfied patients and the red ones dissatisfied patients. Now, among the satisfied patients, there is a relationship between time with the MD and waiting time: the correlation is -0.56. The relationship exists only in the subgroups; it did not exist in the total sample. If we do not know whether the patient is satisfied, we see no relationship, but among satisfied patients we do see one. When a relationship does not exist in the entire population but exists in a subgroup, a common effect is suspected. In these circumstances, both time with the MD and waiting time affect patient satisfaction. Correlation = -0.56
Patient’s Satisfaction
Example of Common Effect. Here is another example: both the physician (MD) and the nurse (RN) are part of a multidisciplinary team that affects the patient's satisfaction with the visit. If the physician or the nurse has a poor bedside manner, the patient will be dissatisfied. Each has an independent impact on patient satisfaction. We do not need to know what the physician did to know that the nurse could affect patient satisfaction. This independence is shown in the graph by the fact that there is no way to start from either the nurse or the physician and end up at the other. The arrows make sense, and the independence in the graph fits our intuition. But what about the conditional dependence? How could knowing that the patient is dissatisfied make these two independent causes of dissatisfaction all of a sudden dependent?
Example of Common Effect
When the patient is dissatisfied, two causes are possible. If we know anything about one cause, for example that the physician did not cause the problem, then we can be pretty sure that the other party caused it. In essence, conditioning on the common effect opens a new link between the two causes, because knowing one cause tells us something about the other.
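This "explaining away" can be verified on a tiny made-up model: two independent binary causes (poor MD or RN bedside manner, each with an assumed probability of 0.2) and dissatisfaction as their common effect.

```python
from itertools import product

# Two independent binary causes of a common effect; 0.2 is invented.
p_bad = 0.2
joint = {}
for md_bad, rn_bad in product([0, 1], repeat=2):
    pr = (p_bad if md_bad else 1 - p_bad) * (p_bad if rn_bad else 1 - p_bad)
    dissatisfied = int(md_bad or rn_bad)   # common effect of the two causes
    joint[(md_bad, rn_bad, dissatisfied)] = pr

def prob(pred):
    """Total probability of outcomes (md_bad, rn_bad, dissatisfied)."""
    return sum(pr for k, pr in joint.items() if pred(*k))

# Unconditionally the two causes are independent:
lhs = prob(lambda m, r, d: m and r)
rhs = prob(lambda m, r, d: m) * prob(lambda m, r, d: r)
assert abs(lhs - rhs) < 1e-12

# Condition on the effect: a dissatisfied patient whose MD behaved well.
p_rn = prob(lambda m, r, d: d and not m and r) / prob(lambda m, r, d: d and not m)
print(p_rn)   # ruling out one cause implicates the other
```

Given dissatisfaction and a well-behaved MD, the probability that the RN caused the problem rises to certainty in this toy model.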
If Not This Then The Other
In general, when two or more causes have a common effect, conditioning on the effect, i.e., stratifying the data by the presence or absence of the effect, creates a new dependency that was not previously shown in the network. This new relationship says that, because we have seen the effect occur, knowledge that one of the causes has not occurred changes the probability that the other causes have occurred: we can infer that the other cause is more probable.
Back Door. This phenomenon is known in network analysis as opening a back door, and it is the main reason why, in complex networks, stratification can lead to problematic conclusions. The analogy is that a back door has been opened that allows X to influence Y, while they were independent of each other before. Notice that the back door enables X to affect Y or vice versa.
Return on IT Investment
Suppose we want to understand the relationship among IT investment, use of IT, and revenue in our organization. Alemi and colleagues used Pearl's method to understand the relationship among these three variables. To start with, all three variables are highly correlated. IT investment is correlated with revenue, perhaps because the more money we have the more we can invest in computers, or vice versa because automating sales, as in e-commerce, leads to more revenue for the organization. No matter how these two variables cause each other, they are clearly associated in the data. IT investment and use are also logically related, as the more computers we make available, the more employees may use them. Finally, computer use and revenue are also highly correlated, again perhaps because employees work on computers and fulfill orders that bring in revenue. So far this is an undirected graph. Taken from Alemi F, Zargoush M, Oakes JL Jr, Edrees H. A Simple Method for Causal Analysis of Return on IT Investment. J Healthc Eng 2(1):43-54.
Return on IT Investment
To direct this graph, the first thing we notice is that it is not possible to use computers unless we have invested in them and they are available. So the direction of causality starts from IT investment and goes to use of IT. We do not know about the rest, and we need to check the assumptions of independence to verify what might be occurring.
Return on IT Investment
Suppose we find that use of IT makes the organization's revenue independent of IT investment. Then clearly use of IT is a mediator between the other two variables: we have a causal chain. In fact, this is the causal chain behind the assumption underlying return-on-investment calculations.
Return on IT Investment
When we examined the data, suppose we did not find support for the independence assumption of a causal chain. Suppose instead we found support for a common effect, i.e., we found that (a) IT investment is not related to the organization's revenue, but (b) among users of IT, IT investment and revenue are interdependent. Then we have two causes leading to more use of IT. The logic for why these two relationships exist can be imagined as follows. Revenue increases use of IT because the more we sell, for example in e-commerce, the more we want to use the tools we have to sell. IT investment leads to more use because the better the computers we have, the more we use them. The logic for why revenue may be unrelated to IT investment might be that e-commerce is a small part of our company at this point. Among users of IT there is a strong relationship, but across everyone there is no relationship. In the end, of course, the graph is based not on what we think is happening but on what the data show. For the time being, assume the data show that the assumptions of the common effect have been met. Investment not related to revenue. Among users of IT, revenue depends on investment.
Opening a Back Door
If this is true, then among users of IT there must be a backdoor relationship between IT investment and revenue. These two variables were independent before, but conditional on extensive use they are dependent. Why? Think about it. Two things can cause more use: more revenue and more IT investment. If we know that there is extensive use, then at least one of these two causes must have occurred. If we know that we have not put more money into IT, then it must be that the additional use was driven by revenue. This is the sort of thing that happens when an organization is doing well and employees want to repeat their success by using the IT infrastructure more extensively: success in sales leads to more effort. The point of this example is that conditioning on the common effect has made two independent causes dependent, and a new, previously unshown relationship has emerged, changing the very network of how the variables are related. There is now a back door from the effect to the cause. Such a back door distorts the reported impact of the cause on the effect, and we need to close it to get an accurate measure.
Conditioning Simplifies Networks
Causal Chain. In general, conditioning on the mediator in a causal chain breaks the graph into two disconnected pieces. It simplifies the network of relationships.
Conditioning Simplifies Networks
Common Cause. Conditioning on the common cause does the same: it breaks up the graph. Stratifying or conditioning on C in these two situations allows us to calculate the independent effects of X and Y.
Conditioning Simplifies Networks
Common Effect. But conditioning on the common effect makes the graph more complex by opening a back door. It makes previously unrelated variables depend on each other and makes the estimation of the effects of either X or Y more difficult.
Obesogenic Anti-Depressant
Let us look at a more complex example. This network shows six relationships, or arcs. The question we want to answer is whether antibiotic treatment leads to diabetes. We know that obesogenic antidepressants affect the emergence of diabetes as well as obesity. We also know that infections alter the microbiome and eventually could lead to weight gain and obesity. So we have a complex set of relationships. To examine the effect of antibiotics on diabetes, we have to close any path from diabetes back to antibiotics, from the effect back to the cause. A possible candidate is obesity, as it blocks both the causal chain from antidepressants and the one from infection. But obesity is the common effect of two causes, and conditioning on it may therefore open a back door. Does antibiotic use increase diabetes?
Obesogenic Anti-Depressant
If we condition on obesity, we look only at the subgroup of patients who are obese (or not obese). Here we show the relationships among obese patients. Among obese patients, a back door opens between antidepressants and infections. Now there is a path from diabetes to antidepressants, to infection, and back to antibiotics. So there is a back door, and the effect of antibiotics on diabetes is distorted.
Obesogenic Anti-Depressant
Note that in finding the back door we do not need to follow the direction of influence in the network. We can move in any direction; all we are concerned with is the associations between the various events.
Definition of Backward Path: Any Association Path that Starts from the Effect and Ends in the Cause
In measuring the causal impact of two variables, a backdoor path is any path from the effect to the cause. Note that movement along the backdoor path does not depend on causal direction. If such a path exists, then variables on the path are associated with both the cause and its effect and therefore confound the calculation of the causal effect.
Definition of Backward Path
Must Point to the Cause. Note that there is also a requirement that the path must end at, not start from, the cause. When the path originates from the cause it is called a forward path. A backdoor path starts from the effect and goes back to the cause, hence the name. It is fine to have a forward path, and in fact it helps in the calculation of causal impact, but backward paths distort causal impact because every point on the path is associated with both the cause and the effect.
Definition of Backward Path
To estimate the impact of the cause on the effect, there should be no backward path that starts from the effect and ends at the cause. Here there is a path from Y back to X, so the causal impact of X on Y cannot be calculated without the distortion caused by this path. Another way to say this is that the variable C confounds the impact of X on Y because it is associated with both of them.
Block All Backward Paths
One way is to condition on the minimum set of nodes that blocks all backward paths. If all backward paths are blocked, then the impact of the cause on the outcome can be assessed accurately without stratifying on the other events in the network. This reduces the number of combinations that must be examined, since the focus is on the events that block the backward paths rather than on all possible events. This strategy makes stratification practical without losing accuracy.
Mediator, Common Cause, but not Common Effect
The procedures for identifying events that block all back doors are laid out by Pearl. One examines each path from the outcome to the cause, and one of three types of events can be used to block the path: a mediator or a common cause, but not a common effect or a descendant of a common effect.
What Is the Causal Impact of D on Y?
Can You Find the Backdoor Paths? Suppose we are interested in the relationship between D and Y: we want to estimate the causal effect of D on Y. In this network there are two hidden nodes, U and V, which are not measured or observed but are hypothesized to affect the observed variables. Both U and V are common causes. The observed variables include several common effects: Y is a common effect of G and F, and D is a common effect of A, B, and C. There are also a number of causal chains, such as A to D to Y, or B to D to Y. To effectively stratify these data to discover the relationship between D and Y, we need to find the back doors.
First Backdoor Path
There are two backdoor paths between Y and D. The first is shown here. Note that the path connects to Y and ends at D, so the two conditions for a backdoor path are met: there is a backward association path between Y and D, and the path points to the cause. If we stratify on A, the path is blocked, since A is not part of a collider in this path; within this path it is a mediator. If we stratify on F, this path is also blocked, as F too is a mediator in this path.
Second Backdoor Path
This is the second pathway. It goes through both hidden events. Now conditioning on A is not sufficient, because A is a common effect in this path; conditioning on A is therefore not productive and does not block the path. But conditioning on F does: F is a mediator that blocks both the first backdoor path and this one. Therefore F is the most reasonable event to condition on.
Minimal Block Set
Stratifying on F is sufficient to close both backward paths. Instead of conditioning on five possible covariates of D and Y, we have found one event, F, that blocks all backward paths, so we can use just one variable for stratification. In this network, controlling for F is sufficient to remove all distortions of the causal impact of D on Y. After F is stratified, no node is associated with both D and Y; there is no longer any confounding. The two hidden nodes, even though not measured, are still controlled for, as they can no longer affect both D and Y.
No Need to Condition on All Alternative Explanations
The blocking of backward paths shows that we do not need to statistically control for all alternative explanations of the outcome. This allows us to focus on some, but not necessarily all, covariates in the study. It provides a minimal set of events that should be statistically controlled.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. This pseudo-algorithm shows how we can identify a backdoor block set without constructing the full directed acyclic graph. We begin by connecting all nodes so that in the next step we can remove all links between independent nodes.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent nodes (statistical test; clinically meaningful; cross-validate). Test for independence and remove all arcs between independent nodes. Testing independence is a statistical procedure. Keep in mind that with massive data you want the test to detect at least a clinically meaningful difference; in massive data, small differences can be statistically significant but not clinically relevant. When dealing with a large number of variables, use one of the variables to split the data into two sets and cross-validate the test of independence. For example, use gender to test the independence of two variables once among men and then among women. Accept dependence only if the relationship exists in both sets.
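A sketch of this cross-validated check on synthetic data, with an invented split variable and an arbitrary 0.3 cutoff standing in for "clinically meaningful":

```python
import random

# Declare X and Y dependent only if the association replicates in both
# halves of a split. Data, split variable, and threshold are illustrative.

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
rows = []
for _ in range(400):
    gender = rng.choice(["m", "f"])      # the variable used to split
    x = rng.gauss(0, 1)
    y = x + rng.gauss(0, 1)              # X and Y genuinely dependent
    rows.append((gender, x, y))

dependent = all(
    abs(corr([x for g, x, y in rows if g == grp],
             [y for g, x, y in rows if g == grp])) > 0.3   # meaningful cutoff
    for grp in ("m", "f")
)
print(dependent)   # the relationship shows up in both halves
```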
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent events. Step 3: Identify paths from outcome to treatment (start from the outcome; connect arcs that link to existing nodes in the path; for nodes with multiple arcs, start new paths; stop when no arcs are left or the treatment is reached; drop paths that do not reach the treatment). In step 3 we iteratively identify the paths that go from the outcome to the treatment. First we add all arcs that connect to the outcome node, then all arcs that connect to nodes already in a path. Each time multiple arcs connect to the same node, a separate path is started. Proceed in this fashion until all paths that end in the treatment are identified or there are no arcs left to process. Drop all paths that were started but did not end in the treatment. Note that identifying a path does not require knowing the orientation of the acyclic graph.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent events. Step 3: Identify paths from outcome to treatment. Step 4: Exclude common effects (for each triplet, test the assumptions of a common effect; exclude common effect nodes from the blocks). In step 4, we take any triplet inside a path whose nodes are directly connected to each other and test for the common effect, or collider, condition. If the test of independence indicates a common effect, the middle event is excluded from the potential blocks.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent events. Step 3: Identify paths from outcome to treatment. Step 4: Exclude common effects. Step 5: Check that paths end in the cause; drop a path if its last node merely mediates between cause and effect. In step 5, we verify that each identified path actually ends in the cause and drop the paths that do not.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent events. Step 3: Identify paths from outcome to treatment. Step 4: Exclude common effects. Step 5: Check that paths end in the cause. Step 6: Select the smallest set that blocks all paths. In step 6, we find blocks that are shared across paths. We select the node that is shared across the most paths and thus blocks the highest number of paths. Additional nodes are added until all paths are blocked.
Algorithm for Minimal Block Set
Step 1: Connect all nodes. Step 2: Remove arcs between independent events. Step 3: Identify paths from outcome to treatment. Step 4: Exclude common effects. Step 5: Check that all paths end in the cause. Step 6: Select the smallest set of blocks. In these six steps, the computer can find the minimum set of blocks for all paths in the network without completely orienting the directed acyclic graph.
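The six steps can be sketched as code on a small hypothetical skeleton (X the cause, Y the outcome, C a mediator, Z a collider). In practice the edge list and collider set would come from the independence tests of steps 2 and 4; step 5 holds by construction because only simple paths from outcome to cause are enumerated. One simplification relative to the slides: a path whose interior contains an unconditioned collider is treated as already blocked, so it needs no covariate.

```python
from itertools import combinations

# Steps 1-2: undirected skeleton left after removing independent pairs.
# Step 4: Z has been flagged as a common effect. Both are hypothetical.
edges = {("X", "C"), ("C", "Y"), ("X", "Z"), ("Z", "Y")}
colliders = {"Z"}

def neighbors(node):
    return ({b for a, b in edges if a == node}
            | {a for a, b in edges if b == node})

def simple_paths(start, goal, path=None):
    """Step 3: enumerate association paths from outcome back to cause,
    ignoring arc direction."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for nxt in neighbors(start) - set(path):
        yield from simple_paths(nxt, goal, path + [nxt])

def minimal_block_set(outcome="Y", cause="X"):
    paths = list(simple_paths(outcome, cause))
    # Paths with an unconditioned collider inside are blocked already;
    # conditioning on the collider would reopen them, so we never do.
    open_paths = [p for p in paths if not set(p[1:-1]) & colliders]
    candidates = sorted({n for p in open_paths for n in p[1:-1]})
    # Step 6: smallest candidate set that touches every open path.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(set(subset) & set(p[1:-1]) for p in open_paths):
                return set(subset)

print(minimal_block_set())
```

On this toy graph the collider path Y-Z-X is blocked by default, so conditioning on the mediator C alone suffices.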
Blocking Back Door Path
Take Home Lessons. This lecture has shown how blocking backdoor paths can remove confounding and provides a way of thinking about which exogenous variables, whether observed or hidden, must be controlled for. Blocking back doors provides a clear criterion for making stratification computationally practical. It solves the Achilles' heel of stratification by reducing the explosion of combinations of covariates.