1
Decision Analysis, Lecture 12
Tony Cox. My course web site:
2
Agenda
Correction to notes for Lecture 9, page 36
Problem set 10 solution
Recommended readings
Causal analytics
3
Bayesian estimation of a proportion: example with n = 20 trials, x = 5 successes
Choose a uniform prior, Beta(1, 1): in R, w = c(0:100)/100; y = dbeta(w, 1, 1)
The result is a posterior beta distribution with updated parameters x + 1 = 6 and n - x + 1 = 16, i.e., Beta(6, 16): z = dbeta(w, 6, 16)
Posterior mean = (x + 1)/(n + 2) = 6/22 = 0.27
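For concreteness, a minimal R sketch of this update (base R only; it reproduces the numbers above):
  # Beta-binomial updating: uniform Beta(1,1) prior, x = 5 successes in n = 20 trials
  n <- 20; x <- 5
  a_post <- x + 1                    # posterior shape 1 = 6
  b_post <- n - x + 1                # posterior shape 2 = 16
  a_post / (a_post + b_post)         # posterior mean = 6/22 = 0.27
  w <- c(0:100)/100
  plot(w, dbeta(w, a_post, b_post), type = "l", xlab = "p", ylab = "posterior density")
  lines(w, dbeta(w, 1, 1), lty = 2)  # uniform prior, for comparison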
4
Homework #10 (optional) (Due by 4:00 PM, April 18 if you want it graded)
A deep sea oil drilling platform has a normally distributed lifetime (until failure) with a mean of 30 years and a standard deviation of 4 years. While it is operating, the platform produces oil worth $10M per year. Voluntarily stopping operations and closing down the platform costs $0. Having the platform fail while still in use leads to involuntary closure and a cost of $50M. At what age should the platform be voluntarily shut down (if it has not yet failed)?
Hint: Continue until marginal benefit < expected marginal cost of continuing.
Hint: Use a hazard function calculator for normally distributed lifetimes, e.g.,
5
Oil platform solution
At time t, the expected marginal cost of continuing for an additional interval of length dt years is 50*h(t)*dt million dollars. The expected marginal benefit of operating for another dt years is 10*dt million dollars.
Marginal benefit = marginal cost when 50*h(t) = 10, i.e., h(t) = 0.2.
For T ~ normal with mean 30 years and s.d. 4 years, h(30) = 0.2 (using a hazard function calculator, or in R: h(t) = f(t)/(1 - F(t)), so dnorm(30, 30, 4)/(1 - pnorm(30, 30, 4)) = 0.2).
So, voluntarily shut down the platform at age 30 years.
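A short R sketch of the same calculation (base R; the uniroot call simply locates the age at which the hazard reaches 0.2):
  # Hazard function of a normal lifetime: h(t) = f(t)/(1 - F(t))
  hazard <- function(t, mean = 30, sd = 4) dnorm(t, mean, sd) / (1 - pnorm(t, mean, sd))
  hazard(30)                                    # about 0.20, so 50*h(30) equals the marginal benefit of 10
  # Break-even age where 50*h(t) = 10, i.e. h(t) = 0.2:
  uniroot(function(t) hazard(t) - 0.2, interval = c(10, 50))$root   # about 30 years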
6
Causal analytics library
BayesianNetworksWithoutTearsCharniak1991.pdf
OverviewPearl2009.pdf
CausalDiscoverySystemsBiologyLagani2016.pdf
CausalGraphicalModelsElwert2013
CausalOrderingDashDruzdzel2008.pdf
CausalParametersHeckman1999
DoesXcauseY_Mooij2016.pdf
GrangerCausalityEconomicPolicyWhite2010.pdf
HealthPolicyAnalysisTerza2011.pdf
LiNGAMShimuzi2014.pdf
7
Recommended readings
Statistics and causality, Pearl (2009)
Graphical causal models (Elwert, 2013)
Probabilistic computational causal discovery for systems biology (Lagani et al., 2016)
8
Causal Analytics: Agenda
Technical background:
  Probabilistic causation
  Netica software and Bayesian network (BN) technology
  CAT software and information-based algorithms: CART trees, random forests, partial dependence plots, BN learning
Causal analytics for risk management: Vision
Learning causal DAG models from data: information-based algorithms for causal discovery and inference
Making the algorithms useful: R packages, the Causal Analytics Toolkit (CAT), dagitty software
Applications to causal inference, prediction, attribution, optimization, explanation
Summary, conclusions, and perspectives on causal analytics for risk analysis
9
Causal analytics in context
Analytics goal: use data to learn to act more effectively
Descriptive analytics: What's happening? What's new? What's changed, why? What to worry about?
Predictive analytics: What's likely to happen next? What will (probably) happen if we don't change what we're doing?
Causal analytics: What can we do about it? What will (probably) happen next if we do things differently?
Prescriptive analytics: What should we do?
Evaluation analytics: How well is that working?
Learning analytics: How might we do better?
Collaborative analytics: How to do better together?
10
Technical background
11
Interpreting X → Y: What does "cause" mean?
"X is a cause of Y," denoted X → Y, has been used to mean many different things, not all of them consistent:
Probabilistic causation
Associational causation
Attributive causation
Counterfactual/potential outcomes causation
Computational (simulation, calculation order) causation
Predictive causation
Manipulative causation
Mechanistic/explanatory causation
12
Probabilistic causality
Key idea: causes make their effects more likely, or change their probability distributions.
"X changes the probability of Y" is necessary but not sufficient for causation.
Usual meaning: observing X changes the probability distribution for observed Y values.
  Example: a high X value for an individual makes a high Y value for that individual more probable.
Desired meaning: changing X changes the probability distribution of Y values.
  Example: reducing X makes high Y less probable.
  This is manipulative causation, not probabilistic inference.
13
All agree on what X → Y means for how to calculate probabilities of Y values
In the model X → Y, suppose that input X is a random variable and that the model gives Pr(outputs | inputs) = Pr(y | x) = Pr(Y = y | X = x). Then:
Pr(Y = y) = Σ_x Pr(Y = y | X = x)*Pr(X = x)
This is the prediction formula for the value of Y given uncertain input X.
Pr(y | x) and Pr(x) are stored in tables for Y and X in a Bayesian Network (BN) model.
This is the basis for "forward Monte Carlo" risk assessment.
An extension allows composition of probabilistic relations: X → Y = (X → Z)(Z → Y) in X → Z → Y
Conditional: Pr(Y = y | X = x) = Σ_z Pr(Z = z | X = x)*Pr(Y = y | Z = z)
Unconditional: Pr(y) = Σ_x Pr(y | x)*Pr(x)
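A small R sketch of the prediction formula, using made-up tables for binary X and Y (the probabilities are illustrative, not from any real model):
  # Marginal table for X and CPT for Y given X
  p_x    <- c(x0 = 0.7, x1 = 0.3)
  p_y_gx <- rbind(x0 = c(y0 = 0.9, y1 = 0.1),
                  x1 = c(y0 = 0.4, y1 = 0.6))
  # Pr(Y = y) = sum over x of Pr(Y = y | X = x)*Pr(X = x)
  p_y <- as.vector(p_x %*% p_y_gx)
  names(p_y) <- colnames(p_y_gx)
  p_y                                # y0 = 0.75, y1 = 0.25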
14
Inference and prediction via a Bayesian Network (BN) solver
Netica DAG model: True state → Observation
Store "marginal probabilities" at input nodes (having outgoing arrows only).
Store "conditional probability tables" (CPTs) at all other nodes.
Make observations.
View results (or make a query, then view).
The solver calculates the conditional probabilities.
15
Charniak’s BN (1991)
16
Example: Family Out BN P(family out | lights-on = T, hear-bark = F) = ?
17
P(y | x), x = evidence/observations (or what-if assumptions)
P(family out | lights-on = T, hear-bark = F) = 0.501
18
Netica gives Bayesian update of all other probabilities, given evidence
P(family out | hear-bark = T, bowel problem = T) = 0.153
19
Lessons from the example: all probabilities of events are conditional!
There is no such thing as "the" probability that the family is out; it depends on what one observes.
All uncertain quantities have conditional probabilities for their values.
It is easy to break a problem into parts, specify each part, and propagate evidence to draw inferences or make probabilistic predictions using a BN solver.
20
Dissatisfactions with probabilistic causality
By Bayes' Rule, if observing X makes Y more likely, then observing Y makes X more likely.
  Example: Disease X → Symptom Y
  Proof: by conditional probability, Pr(X | Y) = Pr(Y | X)*Pr(X)/Pr(Y). This implies Pr(X | Y)/Pr(X) = Pr(Y | X)/Pr(Y).
So probabilistic causality does not establish the direction of a causal link, and gives no guarantee of manipulative causality.
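A quick numeric check of this symmetry in R, using made-up probabilities (Pr(X) = 0.1, Pr(Y | X) = 0.8, Pr(Y | not X) = 0.2):
  p_x <- 0.1; p_y_x <- 0.8; p_y_notx <- 0.2
  p_y   <- p_y_x * p_x + p_y_notx * (1 - p_x)   # total probability: 0.26
  p_x_y <- p_y_x * p_x / p_y                    # Bayes' rule: about 0.31
  p_y_x / p_y                                   # factor by which observing X raises Pr(Y): about 3.1
  p_x_y / p_x                                   # same factor in the reverse direction: about 3.1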
21
Wrap-up on probabilistic causation
Useful as a screening rule: effects should depend on their causes; probability distributions of effects should depend on the values of their causes.
Conversely, if Y is conditionally independent of X given the values of other variables, then X is not a direct cause of Y.
This is the key information shown in DAG models such as X → Y → Z: Z is conditionally independent of X given Y.
In a DAG model, effects are never conditionally independent of their direct causes (an arrow indicates dependence).
Challenge: how to develop better screens for manipulative causality?
22
Essential machine-learning (ML) algorithms
Classification and Regression Trees (CART trees)
Random Forest (RF) ensembles
Partial dependence plots
Bayesian Networks (BNs)
Some key themes and principles:
  Conditional independence, information, network structure
  Conditional probabilities for quantifying dependencies
  Non-parametric methods
  Model ensembles, model averaging
23
CART trees
A powerful, popular method for data mining, machine learning, and CPT estimation.
In R, the party (ctree), rpart, and other packages provide CART (Classification and Regression Tree) algorithms. An applet can also be downloaded from:
Basic idea: "Always ask the most informative question next" for reducing uncertainty (conditional entropy) about the dependent variable.
The tree partitions a set of cases into groups or clusters (the leaf nodes) with similar conditional probabilities for the dependent variable.
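A minimal rpart sketch in R (the built-in airquality data set is used purely for illustration; any data frame with a chosen dependent variable works the same way):
  library(rpart)
  fit <- rpart(Ozone ~ ., data = airquality)   # regression tree; use method = "class" for a categorical y
  printcp(fit)                                 # cross-validated error by tree size (for pruning)
  plot(fit); text(fit)                         # draw the tree with its splits and leaf values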
24
How to read a CART tree
The tips of the tree ("leaf nodes") give the conditional expected value (for continuous variables) or conditional distribution (for discrete variables) of the dependent variable, given the value ranges of the variables along the path from the top of the tree (the "root node") to the leaf node. The dependent variable is called "y" by default in the tree.
Branches ("splits") show the value ranges being conditioned on.
The leaf nodes also show how many cases in the data set are described by the combination of values leading to that node; these are the "n" values at the leaf nodes.
Example: n = 92 days had tmin < 43 degrees and PM2.5 < 14 µg/m3. An average of y = elderly people per day died on days with this description.
25
How classification and regression (CART) trees work
Basic idea: always ask the most informative question next, given the answers so far.
Questions are represented by splits in the tree.
Leaf nodes show conditional means (or conditional distributions) of the dependent variable.
Internal nodes show the significance level for each split: how significant the differences between the resulting conditional distributions are.
Trees can handle continuous, categorical, ordered categorical, and binary variables, as well as missing values.
26
How tree-growing works: recursive partitioning and stopping rules
Add new questions/splits if doing so improves predictions. Goal: reduce prediction errors by conditioning on relevant information.
Stop this "recursive partitioning" when further questions (splits in the tree) do not significantly improve prediction.
This is the Classification & Regression Tree (CART) algorithm: recursive partitioning = tree-growing by successive splitting.
Which variables are informative for predicting the dependent variable can depend on what other predictors are included; for example, omitting month makes PM2.5 informative.
27
Improving a tree: some refinements in advanced algorithms
Grow a large tree and prune it back to minimize cross-validation error.
Fit multiple trees to random (bootstrap) subsets of the data and combine their predictions by voting or averaging ("bagging").
Over-train on mis-predicted cases ("boosting").
Select random subsets of columns to use as predictors, to diversify and de-correlate predictions.
Average predictions from many trees ("Random Forest" ensemble prediction).
Join prediction "patches" together smoothly (MARS).
Search for best subsets of predictors instead of myopically choosing the next split.
28
From concepts to software: creating a CART tree in CAT
Select the dependent variable first, by clicking on a column heading in the Data sheet.
Select predictor columns using Ctrl + click (in any order).
Click on the desired analysis (Tree).
Output appears on a new tab. (If unsure what analyses to do after columns are selected, click on Analyze.)
29
Classification/decision tree algorithms in practice
Different decision tree (= classification tree) algorithms use different splitting, stopping, and pruning criteria. Examples of tree algorithms:
CHAID: chi-square automatic interaction detection
CART: classification and regression trees; allows continuous as well as discrete variables
MARS: multivariate adaptive regression splines; smooths the relations at the tree tips so they fit together smoothly
KnowledgeSeeker: allows multiple splits and multiple variable types
ID3, C5.0, etc.
30
Interpreting a tree causally
The measured direct causes of a variable in a DAG model (its "parents" in the DAG) should appear in most/all CART trees for that dependent variable; indirect causes (its more remote ancestors) should not.
Direct children should also appear; more remote descendants and siblings should not.
Manipulative causation: how would changing the answers to some of the questions change the frequency distributions of cases at the leaf nodes?
(Li et al.; Athey and Imbens, 2015)
31
Classification-tree tests of conditional independence
In a classification tree, the dependent variable is conditionally independent of all variables not in the tree, given the variables that are in the tree (at least as far as the tree-growing heuristic can discover). Starting with a childless node (output node), we can recursively seek direct parents of all nodes.
32
Using trees to identify potential parents for each node
(Figure: example trees relating Smoking, Heart disease, sex, and age.)
33
Generating CPTs with trees
Classification trees grown using the parents of a node provide exactly the information needed to fill in its conditional probability table (CPT).
This is more efficient than exhaustive tables because of "don't care" conditions (fewer leaves than logical combinations of values of the variables).
Each leaf node gives the conditional probabilities, or a conditional expected value, of the dependent variable given the combination of values of its parents.
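A sketch of the idea in R, assuming discretized stand-in variables built from the airquality data (the "node" and its "parents" here are invented for illustration, not taken from a real causal model):
  library(rpart)
  aq <- na.omit(airquality)                       # built-in data, used only for illustration
  aq$HighOzone <- factor(aq$Ozone > 60)           # the "node" whose CPT we want
  aq$HotDay <- factor(aq$Temp > 80)               # discretized "parent" 1
  aq$Windy  <- factor(aq$Wind > 10)               # discretized "parent" 2
  fit <- rpart(HighOzone ~ HotDay + Windy, data = aq, method = "class")
  configs <- unique(aq[, c("HotDay", "Windy")])   # parent value combinations
  cbind(configs, predict(fit, newdata = configs, type = "prob"))   # CPT rows: Pr(HighOzone | parents)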
34
Decision psychology and practice: "fast, frugal heuristics"
Classification trees provide a basis for quick, effective decision-making that often compares favorably to more computationally intensive statistical procedures.
In the real world, simpler decision rules are often better and more robust than more complex prediction and optimization models and algorithms (Todd and Gigerenzer).
35
Generalizations: ensembles of trees (Random Forest)
Averaging over hundreds of trees gives more robust results and reduces prediction errors compared to any single tree.
The Random Forest ensemble is a "go-to" black-box method for most predictive analytics tasks.
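A minimal randomForest sketch in R (airquality again serves as a stand-in data set):
  library(randomForest)
  aq <- na.omit(airquality)                       # randomForest does not accept missing values
  rf <- randomForest(Ozone ~ ., data = aq, importance = TRUE)
  rf                                              # out-of-bag (OOB) error summary
  importance(rf)                                  # variable importance averaged over the ensemble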
36
Partial dependence plots
A partial dependence plot summarizes how the dependent variable is predicted to change as one variable is changed, keeping all other variables at their observed values.
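Continuing the random forest sketch above, the randomForest package's partialPlot function draws this kind of plot directly:
  # Average predicted Ozone as Temp is varied, with the other columns kept at their observed values
  partialPlot(rf, pred.data = aq, x.var = "Temp")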
37
Bayesian Networks (BNs) show dependence relations among variables
A BN provides a high-level roadmap for descriptive, predictive, and causal analytics: "a tree at every node."
Each node has a conditional probability table (CPT), or a regression model, CART tree, etc., describing how the conditional probabilities of its values depend on other variables.
38
Bayesian Networks (BNs) also show independence relations among variables
If no arrow connects two variables, they are conditionally independent of each other, given the other variables in the BN.
Omitted variables can create statistical dependencies; conditioning on variables can also sometimes create dependencies.
Information principle for causality: direct causes are not conditionally independent of their effects.
39
How to create BNs, CART trees, random forests, and partial dependence plots in CAT
Select the dependent variable by clicking on its column heading.
Select the main predictor (used in the partial dependence plot and the dagitty software).
Select other predictors.
In advanced BN modeling, constraints can be inserted on allowed, forbidden, and required arrows.
Click on "Analyze" or on the names of the analyses (Tree, Bayesian Network, Importance Plot, Sensitivity Plot).
Outputs appear on new tabs.
40
Advanced features in CAT
Simulation-based power calculations for Bayesian Networks (BNs) and CART trees: how large would an effect of X on the dependent variable have to be to be detected with high probability?
Choice of algorithms for BN learning
Predictive analytics toolkit with a library of prediction algorithms and out-of-sample performance assessment
41
Information-based algorithms
Information-based algorithms automatically find dependencies and conditional independence relationships among variables in data (example: a Bayesian network DAG).
An arrow between two variables shows that they are mutually informative about each other, i.e., statistically dependent.
The directions of the arrows indicate a way to compute joint probabilities of the variables from marginal distributions at input nodes and conditional probability tables at other nodes.
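A minimal BN structure-learning sketch in R with the bnlearn package (learning.test is a small discrete data set shipped with bnlearn; other score- or constraint-based algorithms in the package, such as tabu or gs, can be substituted for hc):
  library(bnlearn)
  data(learning.test)
  dag <- hc(learning.test)               # hill-climbing search for a DAG structure
  arcs(dag)                              # the learned arrows (statistical dependencies)
  fit <- bn.fit(dag, learning.test)      # conditional probability tables at each node, given the DAG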
42
Wrap-up: information-based ML algorithms discover relationships in data
CART trees (Classification & Regression Trees) predict one variable from others.
Random Forest ensembles average predicted effects over many CART trees; partial dependence plots summarize the results.
BN learning algorithms relate multiple variables at once; BN interpretation can be checked with dagitty.
43
Causal analytics for improving decisions: Vision
44
Causal model: Pr(y | do(x), z)
Basic causal model:
The decision-maker (DM) chooses an action or policy from a choice set of possible alternatives.
Outcome probabilities are determined both by the DM's choice and by covariates and events not controlled by the decision-maker.
The DM's choice affects the probabilities of outcomes; causal models reveal how, with uncertainty bands.
(Diagram: action X and other variables Z both point into outcomes Y; causal model: Pr(y | do(x), z).)
A dependence plot of E(Y | do(x)) shows the total effect of do(x) on E(Y), allowing other variables to adjust.
45
Vision for causal analytics
Represent understanding of how the world works by a causal model.
Use the causal model to quantify how probabilities of consequences and expected utilities change as decision variables or policies are changed.
Given preferences for consequences and risk attitude (social utility function), solve for the best policy.
46
Benefits of causal analytics: use data to answer…
What works? (policy evaluation)
How well? (effect size estimation)
For whom, under what conditions? (effects of covariates)
How do changes in inputs affect output probabilities? How have they in the past? (causal explanation, mediation analysis, path analysis)
How to cause desired changes? (decision analysis)
What might work better? (trials, learning)
What is the best achievable result? (optimization)
What will happen if we make changes? (causal prediction)
How sure can we be? (uncertainty analysis)
What information would improve answers? (value of information (VOI) analysis)
47
Representing understanding via a causal graph (DAG)
Modular structure: each variable is conditionally independent of its more remote ancestors, given the values of its parents in the DAG (directed acyclic graph).
Dependencies are quantified by conditional probability tables (CPTs) or trees.
48
Components for causal DAG modeling: representing knowledge via networks
A causal graph model is represented as a set of variables in a directed acyclic graph (DAG):
Choice nodes (green rectangles) = decision (policy, controlled) variables
Chance nodes (yellow) = random (state, uncontrolled) variables
Value nodes (pink hexagons) = objective function, value or utility function
Deterministic functions, constants, conditional probability tables
49
Interpreting DAG structure
A variable's probability distribution of values depends on the values of the variables pointing into it (its "direct parents").
Structure: each variable is conditionally independent of its more remote ancestors, given the values of its parents.
Example: Health Damage depends on Emissions Reduction only through Concentration, and is conditionally independent of Emissions Reduction given Concentration.
The DAG provides a modular organization of knowledge and expertise.
50
Causal graph modeling with DAGs
Quantification: dependencies are quantified by CPTs or regression models.
A conditional probability table (CPT), regression model, CART tree, randomForest ensemble, etc. at each node calculates conditional probabilities for its values, given the values of its parents.
Nonlinear CPTs and interactions are allowed.
51
Using a causal DAG: solve for the best policy by probabilistic simulation of outcomes
Use the causal model to quantify how probabilities of consequences change as decision variables or policies are changed.
Given preferences for consequences, solve for the best policy.
Simulation-based (total) dependence plot: the decision variable is varied over a range of alternative (counterfactual) values; other variables are drawn from their conditional distributions (CPTs) for each value of the decision variable.
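A toy R sketch of this simulation loop (the outcome model below is entirely made up, standing in for a real DAG/CPT model of Pr(y | do(x), z)):
  set.seed(1)
  candidate_x <- seq(0, 1, by = 0.1)                 # candidate settings of the decision variable
  expected_outcome <- sapply(candidate_x, function(x) {
    z <- rnorm(10000)                                # uncontrolled variables drawn from their distributions
    y <- 6*x - 5*x^2 + 0.5*z                         # hypothetical outcome model for Pr(y | do(x), z)
    mean(y)                                          # simulated E[Y | do(X = x)]
  })
  plot(candidate_x, expected_outcome, type = "b")    # simulation-based (total) dependence plot
  candidate_x[which.max(expected_outcome)]           # best policy on the grid (0.6 for this toy model)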
52
Selecting “Probability Bands” in Analytica yields:
If the correct causal model is known, then Analytica's probabilistic simulation-based uncertainty analysis can be used to inform risk management decisions.
53
Wrap-up on the example Analytica model
Run time ≈ 1 sec. Method: Monte Carlo propagation of input probability distributions through the influence diagram.
Conclusion: an emissions reduction factor of 0.7 is a robust optimum for this decision problem, given the probabilistic model (DAG influence diagram) mapping inputs to output probabilities.
54
Analytica wrap-up
If the causal model is right and the values are agreed to, then this policy is approximately optimal with high confidence.
Q1: But how sure can we be that the causal model is (approximately) right? What happens if it isn't?
Q2: How can we learn a useful causal model from data?
55
Summary: vision for causal analytics
Represent understanding of how the world works by a causal model; learn, validate, and document models with data.
Use the causal model to quantify how probabilities of consequences and expected utilities change as decision variables or policies are changed. Policies can be actions/interventions that change inputs, or decision rules that map data to actions.
Given preferences for consequences and risk attitude (social utility function), solve for the best policy: set policy variables to maximize expected utility (decision analysis).
Perform sensitivity analyses and value-of-information (VOI) analyses; optimize the timing of interventions.
Evaluate results; adaptively learn and improve policies.