Decision Analysis Lecture 4 Tony Cox My e-mail: tcoxdenver@aol.com Course web site: http://cox-associates.com/DA/
Agenda Problem set 3 solutions Simulation-optimization for Joe’s pills Assignment 4: Bayesian inference Introduction to Netica for Bayesian inference Wrap-up on decision trees Binomial distribution
Homework #4 (Due by 4:00 PM, February 14) Problems Machines Fair coin Readings Required: Clinical vs. statistical predictions, http://emilkirkegaard.dk/en/?p=6085 Recommended: Important probability distributions Binomial https://www.utdallas.edu/~scniu/OPRE-6301/documents/Important_Probability_Distributions.pdf Recommended: Binomial distribution in R http://www.r-tutor.com/elementary-statistics/probability-distributions/binomial-distribution http://www.stats.uwo.ca/faculty/braun/RTricks/basics/BasicRIV.pdf
Assignment 3, Problem 1 (ungraded) A fair coin is tossed once. Draw the risk profile (cumulative distribution function) for the number of heads. Purpose: Be able to draw, interpret risk profiles Practice! You do not have to turn this in, but we will go over the solution next class Helpful background on discrete CDFs: www.probabilitycourse.com/chapter3/3_2_1_cdf.php
Assignment 3, Solution 1 A fair coin is tossed once. Draw the risk profile (cumulative distribution function) for the number of heads. Solution: http://www.gaussianwaves.com/2008/04/probability/
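For reference, a minimal R sketch that reproduces this step-function CDF (the plot labels and the use of stepfun are my own illustrative choices): the number of heads is 0 with probability 0.5 and 1 with probability 0.5, so the CDF jumps to 0.5 at x = 0 and to 1 at x = 1.
# Risk profile (CDF) for the number of heads in one toss of a fair coin:
# P(X = 0) = P(X = 1) = 0.5, so the CDF steps from 0 to 0.5 at x = 0
# and from 0.5 to 1 at x = 1
F <- stepfun(c(0, 1), c(0, 0.5, 1))   # right-continuous step function
plot(F, verticals = FALSE, pch = 16,
     xlab = "Number of heads", ylab = "P(X <= x)",
     main = "Risk profile: one toss of a fair coin")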
Assignment 3, Problem 2 Joe’s medicine Joe takes pills to reduce his risk of heart attack Pharmacist can prescribe for him either 1 pill per day at full strength, or 2 pills per day, each at half strength The probability that Joe forgets to take any given pill on any occasion is p. Its value is uncertain. Here is how pills affect daily heart attack risk: If he takes the full-strength pill, multiply his risk by 0.5 (it is cut in half) If he takes 1 half-strength pill, multiply risk by 0.7 If he takes both half-strength pills, multiply his risk by 0.5 If he takes no pill, multiply risk by 1 What should the pharmacist prescribe? Please submit answer as two ranges (intervals) of p values for which the best choice is (A) Prescribe 1 full-strength pill; (B) Prescribe 2 half-strength pills
Binomial distribution with parameters p and N = 2 The probability that Joe takes 0 pills is p² Pr(takes 2 pills) = (1 - p)² Pr(takes 1 pill) = p(1 - p) + (1 - p)p = 2p(1 - p)
Assignment 3, Solution 2 Joe’s medicine The probability that Joe forgets to take any given pill is p. Here is how pills affect daily heart attack risk: If he takes the full-strength pill, multiply his risk by 0.5 (it is cut in half) If he takes 1 half-strength pill, multiply risk by 0.7 If he takes both half-strength pills, multiply his risk by 0.5 If he takes no pill, multiply risk by 1 What should the pharmacist prescribe? 1 full pill per day multiplies Joe’s expected daily risk by p*1 + (1 - p)*0.5 = 0.5 + 0.5p 2 half pills per day multiply it by p²*1 + 2p(1 - p)*0.7 + (1 - p)²*0.5 = p² + 1.4p - 1.4p² + (1 - 2p + p²)*0.5 = 0.1p² + 0.4p + 0.5 2 pills are better than 1 (i.e., give lower expected risk for Joe) if 0.5 + 0.5p > 0.1p² + 0.4p + 0.5, i.e., 0.5p > 0.1p² + 0.4p, i.e., 0.1p > 0.1p², i.e., p > p². But p > p² for all 0 < p < 1. If p = 0 or 1, they are equally good. Otherwise, 2 pills are always better than 1, for all p in (0, 1).
Graphical solution The 2-pill option deterministically dominates the 1-pill option over the whole (infinite) set of states p in (0, 1). If the EU(a) lines crossed, then we would have to assess probabilities for the values of p. Simulation-optimization: Could solve using a decision tree and simulation of EU(a), given model {u(c), Pr(s), and Pr(c | a, s)}. (Here, state s is p.) For each a: Draw s from Pr(s) Draw c from Pr(c | a, s) Evaluate u(c) Repeat and average u(c) values Select the a with greatest mean(u(c)) (see the R sketch below)
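A minimal simulation-optimization sketch in R for Joe’s pills, following the procedure above. The uniform prior on p and the number of draws are illustrative assumptions; consequences are the daily risk multiplier, so the better act has the lower mean (equivalently, u(c) = -c).
set.seed(1)
n_sims <- 100000
p <- runif(n_sims)                     # illustrative prior assumption: p ~ Uniform(0, 1)

# Act "1 full-strength pill": forgets (prob p) -> risk x1.0; takes it -> x0.5
risk_full <- ifelse(runif(n_sims) < p, 1.0, 0.5)

# Act "2 half-strength pills": pills taken ~ Binomial(2, 1 - p);
# 0 taken -> x1.0, 1 taken -> x0.7, 2 taken -> x0.5
taken <- rbinom(n_sims, size = 2, prob = 1 - p)
risk_half <- c(1.0, 0.7, 0.5)[taken + 1]

# Lower mean risk multiplier = higher expected utility (u(c) = -c)
c(full = mean(risk_full), half = mean(risk_half))
With this prior the two means come out near 0.75 and 0.73, consistent with the algebraic result that the 2-pill prescription is at least as good for every p.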
Assignment 3, Problem 3 Certainty Equivalent calculation If you buy a raffle ticket for $2.00 and win, you will get $19.00; else, you will receive nothing from the ticket. The probability of winning is 1/3 Your utility function for final wealth x is u(x) = log(x) Your initial wealth (before deciding whether to buy the ticket) is $10 What is your certainty equivalent (selling price) for the opportunity to buy this raffle ticket? Please submit one number Should you buy it? Please answer Yes or No
Assignment 3, Solution 3 Certainty Equivalent calculation If you buy a raffle ticket for $2 and win, you will get $19.00; else, you will receive nothing from the ticket. The probability of winning is 1/3 X = random variable for final wealth if you buy ticket = 10 - 2 +19 = $27 with probability 1/3, else 10 - 2 = $8. Your utility function for final wealth x is u(x) = log(x) Your initial wealth is $10 Let CE = CE(X) = CE of final wealth if you buy ticket u(CE) = EU(X) = (1/3)*log(10 - 2 + 19) + (2/3)*log(10 - 2) = 2.4849. CE = exp(2.4849) = $12. So, deciding to buy the ticket increases your CE(wealth) from $10 to $12. This transaction is worth $2 to you.
Assignment 3, Solution 3 Certainty Equivalent calculation u(CE) = EU(X) = (1/3)*log(10 - 2 + 19) + (2/3)*log(10 - 2) = 2.4849. CE = exp(2.4849) = $12. So, ticket increases your CE(wealth) from $10 to $12 and is worth $2 to you Note: EMV(X) = (1/3)*(10 - 2 + 19) + (2/3)*(10 - 2) = $14.33, so your risk premium is $14.33 - $12.00 =$2.33 Note: Suppose initial wealth is 1000: Then CE(final wealth) is exp((1/3)*log(1000 - 2 + 19) + (2/3)*log(1000 - 2)) = 1004.29 (compared to EMV = 1000 + (1/3)*17 + (2/3)*(-2) = 1004.33). Risk premium is $0.04.
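A quick R check of these numbers (a sketch; log is the natural logarithm, matching the slide’s use of exp to invert it):
u  <- function(x) log(x)              # utility of final wealth
ce <- function(eu) exp(eu)            # inverse utility: certainty equivalent

# Initial wealth $10: buy ticket for $2, win $19 with probability 1/3
eu10 <- (1/3) * u(10 - 2 + 19) + (2/3) * u(10 - 2)
ce(eu10)                                             # CE of final wealth = 12
(1/3) * (10 - 2 + 19) + (2/3) * (10 - 2) - ce(eu10)  # risk premium, about 2.33

# Same gamble with initial wealth $1000
eu1000 <- (1/3) * u(1000 - 2 + 19) + (2/3) * u(1000 - 2)
ce(eu1000)                            # about 1004.29; risk premium shrinks to about 0.04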
Assignment 4,Problem 1: Fair Coin Problem (due 2-14-17) A box contains two coins: (a) A fair coin; and (b) A coin with a head on each side. One coin is selected at random (we don’t know which) and tossed once. It comes up heads. Q1: What is the probability that the coin is the fair coin? Q2: If the same coin is tossed again and shows heads again, then what is the new (posterior) probability that it is the fair coin? Solve manually and/or using Netica.
Assignment 4, Problem 2: Defective Items (due 2-14-17) Machines 1, 2, and 3 produced (20%, 30%, 50%) of items in a large batch, respectively. The defect rates for items produced by these machines are (1%, 2%, 3%), respectively. A randomly sampled item is found to be defective. What is the probability that it was produced by Machine 2? Exercise: (a) Solve using Netica (b) Solve manually E-mail answer (a single number) to tcoxdenver@aol.com
Introduction to Bayesian inference with Netica®
Example: HIV screening Pr(s) = 0.01 = fraction of population with HIV s = has HIV, s′ = does not have HIV y = test is positive Pr(test positive | HIV) = 0.99 Pr(test positive | no HIV) = 0.02 Find: Pr(HIV | test positive) = Pr(s | y) Subjective probability estimates?
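For comparison with the Netica solution below, the same posterior can be computed directly from Bayes’ rule; a quick R sketch of the arithmetic, using the numbers on this slide:
prior_hiv <- 0.01     # Pr(s): fraction of population with HIV
sens      <- 0.99     # Pr(test positive | HIV)
false_pos <- 0.02     # Pr(test positive | no HIV)

# Bayes' rule: Pr(HIV | positive) = Pr(positive | HIV) * Pr(HIV) / Pr(positive)
p_pos <- sens * prior_hiv + false_pos * (1 - prior_hiv)   # total probability of a positive test
sens * prior_hiv / p_pos                                  # 0.0099 / 0.0297 = 1/3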
Solution via Bayesian Network (BN) Solver DAG model: “True state → Observation” DAG = “directed acyclic graph”: Nodes and arrows, no cycles allowed Store “marginal probabilities” at input nodes (having output arrows only) Store “conditional probability tables” at all other nodes. Make observations Enter query Solver calculates conditional probabilities
Solution in Netica Step 1: Build model, compile network
Solution in Netica Step 1: Build model, compile network Step 2: Condition on observation (right-click, choose “Enter findings”), view conditional probabilities
Wrap-up on Netica introduction User just needs to enter model and observations (“findings”) Netica uses Bayesian Network algorithms to update all probabilities (conditioning them on findings) We will learn to do this manually for small problems Algorithms and software are essential for large, complex inference problems
Review and wrap-up on decision trees and probabilities
Decision tree ingredients Three types of nodes Choice nodes (squares) Chance nodes (circles) Terminal nodes / value nodes Arcs show how decisions and chance events can unfold over time Uncertainties are resolved as time passes and choices are made
Solving decision trees “Backward induction” “Stochastic dynamic programming” “Average out and roll back” Implicitly, the tree determines Pr(c | a) Procedure: Start at tips of tree, work backward Compute expected value at each chance node (“averaging out”) Choose maximum expected value at each choice node
Obtaining Pr(s) from decision trees (example tree from eogogics) Decision 1: Develop or Do Not Develop. Averaging out at the chance node: (70% x $172,000) + (30% x (-$500,000)) = $120,400 + (-$150,000) = -$29,600
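A minimal R sketch of the averaging-out and roll-back steps for this node. The probabilities and payoffs come from the slide; the helper-function name and the treatment of the “Do Not Develop” branch as worth $0 are my own assumptions.
# Average out at a chance node: expected value = sum of probability x value
chance_node <- function(probs, values) sum(probs * values)

# Development node from the example tree: 70% successful, 30% unsuccessful
emv_develop <- chance_node(c(0.70, 0.30), c(172000, -500000))
emv_develop                  # 120400 - 150000 = -29600

# Roll back at the choice node: pick the branch with the higher expected value
# (assuming "Do Not Develop" is worth $0, it would be preferred here)
max(emv_develop, 0)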
What happened to act a and state s? (Same example tree, from eogogics.) Decision 1: Develop or Do Not Develop. At the chance node: (70% x $172,000) + (30% x (-$500,000)) = $120,400 + (-$150,000)
What happened to act a and state s? (Same example tree.) Optimize decisions! What are the 3 possible acts in this tree? (a) Don’t develop; (b) Develop, then rebuild if successful; (c) Develop, then new line if successful.
Key points Solving decision trees (with decisions) requires embedded optimization Make future decisions optimally, given the information available when they are made Event trees = decision trees with no decisions Can be solved, to find outcome probabilities, by forward Monte-Carlo simulation, or by multiplication and addition In general, sequential decision-making cannot be modeled well using event trees. Must include (optimal choice | information)
What happened to state s? (Same example tree.) What are the 4 possible states? C1 can succeed or not, and C2 demand can be high or low, giving 2 x 2 = 4 combinations
Acts and states cause consequences (example tree from eogogics)
Key theoretical insight A complex decision model can be viewed as a (possibly large) simple Pr(c | a) model. s = selection of branch at each chance node a = selection of branch at each choice node c = outcome at terminal node for (a, s) Pr(c | a) = Σs Pr(c | a, s)*Pr(s) (sum over states s) Other complex decision models can also be interpreted as c(a, s), Pr(c | a, s), or Pr(c | s) models s = system state & information signal a = decision rule (information act) c may include changes in s and in possible a.
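The marginalization Pr(c | a) = Σs Pr(c | a, s)*Pr(s) is just a matrix-vector product. A small R sketch using the Joe’s-pills model for the 1-full-pill act; the fixed value p = 0.2 and the object names are purely illustrative assumptions.
p <- 0.2                                   # illustrative forget probability
prior_s <- c(forget = p, take = 1 - p)     # Pr(s) for one full-strength pill

# Pr(c | a, s): rows = states, columns = consequences (daily risk multiplier)
pr_c_given_as <- rbind(forget = c(risk1.0 = 1, risk0.5 = 0),
                       take   = c(risk1.0 = 0, risk0.5 = 1))

# Pr(c | a) = sum over s of Pr(c | a, s) * Pr(s): a row vector times a matrix
pr_c_given_a <- drop(prior_s %*% pr_c_given_as)
pr_c_given_a                               # Pr(risk x1.0) = 0.2, Pr(risk x0.5) = 0.8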
Real decision trees can quickly become “bushy messes” (Raiffa, 1968) with many duplicated sub-trees
Influence Diagrams help to avoid large trees (example diagram from en.wikipedia.org) Often much more compact than decision trees
Limitations of decision trees Combinatorial explosion Example: Searching for a prize in one of N boxes or locations involves a tree with N! = N(N – 1)…*2*1 possible inspection orders (strategies) Infinite trees Continuous variables When to stop growing a tree? How to evaluate utilities and probabilities?
Optimization formulations of decision problems Example: Prize is in location j with prior probability p(j), j = 1, 2, …, N It costs c(j) to inspect location j What search strategy minimizes expected cost of finding prize? What is a strategy? Order in which to inspect How many are there? N!
With two locations, 1 and 2 Strategy 1: Inspect 1, then 2 if needed: Expected cost: c1 + (1 – p1)c2 = c1 + c2 – p1c2 Strategy 2: Inspect 2, then 1 if needed: Expected cost: c2 + (1 – p2)c1 = c1 + c2 – p2c1 Strategy 1 has lower expected cost if: p1c2 > p2c1, or p1/c1 > p2/c2 So, look first at location with highest success probability per unit cost
With N locations Optimal decision rule: Always inspect next the (as-yet uninspected) location with the greatest success probability-to-cost ratio Example of an “index policy,” “Gittins index” If M players take turns, competing to find prize, each should still use this rule. A decision table or tree can be unwieldy even for such simple optimization problems
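A minimal R sketch of this index policy and its expected cost. The prior probabilities p and inspection costs below are made-up illustration data, and the helper functions are my own.
# Illustrative data: prior probability that the prize is in each location,
# and the cost of inspecting that location
p    <- c(0.5, 0.3, 0.2)
cost <- c(4, 1, 2)

# Expected cost of inspecting in the order 'ord': you pay cost[j] for
# location j whenever the prize was not found at any earlier location
expected_search_cost <- function(ord, p, cost) {
  p_not_found_yet <- c(1, 1 - cumsum(p[ord]))[seq_along(ord)]
  sum(p_not_found_yet * cost[ord])
}

# Index policy: inspect in decreasing order of success probability per unit cost
index_order <- order(p / cost, decreasing = TRUE)
index_order                                        # here: 2, 1, 3
expected_search_cost(index_order, p, cost)         # 4.2

# Brute-force check over all N! orderings (only feasible for small N)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (rest in all_perms(v[-i])) out <- c(out, list(c(v[i], rest)))
  out
}
min(sapply(all_perms(seq_along(p)), expected_search_cost, p = p, cost = cost))  # also 4.2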
Other optimization formulations max_{a ∈ A} EU(a) Typically, a is a vector and A is the feasible set More generally, a is a strategy/policy/decision rule and A is the set of feasible strategies In the previous example, A = the set of permutations (inspection orders) EU(a) = Σc Pr(c | a)*u(c) (sum over consequences c) Pr(c | a) = Σs Pr(c | a, s)*p(s) (sum over states s) Constraints g(a) ≤ 0 define the feasible set A
Advanced decision tree analysis Game trees Different decision-makers Monte Carlo tree search (MCTS) in games with risk and uncertainty https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/, http://www.cameronius.com/research/mcts/about/index.html http://www.cameronius.com/cv/mcts-survey-master.pdf Generating trees Apply rules to expand and evaluate nodes Learning trees from data Sequential testing http://stackoverflow.com/questions/23803186/monte-carlo-tree-search-implementation-for-tic-tac-toe
Summary on decision trees Decision trees show sequences of choices, chance nodes, observations, and final consequences. Mix observations, acts, optimization, causality Good for very small problems; less good for medium-sized problems; unwieldy for large problems (use influence diagrams instead) Can view decision trees and other decision models as simple c(a, s) models But need good optimization solvers!
Road map: Filling in the normal form matrix Assessing probabilities Eliciting well-calibrated probabilities Deriving probabilities from models Estimating probabilities from data Assessing utilities Utility elicitation Single-attribute utility theory Multi-attribute utility theory
Binomial probability model
Some useful probability models and their R name roots Uniform = unif Binomial (n trials, 2 outcomes) = binom Poisson (“rare events” law) = pois Exponential (waiting time) = exp Normal (sums, random errors) = norm Beta (proportions) = beta R prefixes: d = density, p = distribution function (cdf), q = quantile, r = random sample (simulation)
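In R, each distribution root combines with the prefixes d, p, q, r; a short sketch, using the binomial for illustration:
dbinom(2, size = 4, prob = 0.5)    # d = density/mass: Pr(X = 2) = 0.375
pbinom(2, size = 4, prob = 0.5)    # p = distribution function: Pr(X <= 2) = 0.6875
qbinom(0.5, size = 4, prob = 0.5)  # q = quantile: median number of successes = 2
rbinom(5, size = 4, prob = 0.5)    # r = random sample of 5 simulated values

# The same prefixes work with the other roots (unif, pois, exp, norm, beta), e.g.:
punif(0.3)      # Pr(U <= 0.3) for U ~ Uniform(0, 1)
qnorm(0.975)    # 97.5th percentile of the standard normal, about 1.96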
Binomial model pbinom(x, n, p) Two outcomes on each of n independent trials, “success” and “failure” Probability of success = p for each trial independently Expected number of successes in n trials with success probability p = ? Probability of no more than x successes in n trials with success probability p = pbinom(x, n, p)
pbinom(x, n, p) includes the probability of exactly x successes Expected number of successes in n trials with success probability p = np Probability of no more than x successes in n trials with success probability p = pbinom(x, n, p) Note that pbinom gives the probability of less than or equal to x successes in n trials
Binomial model pbinom(x, n, p) 2 outcomes on each of n independent trials P(success) = p for each trial independently E(successes in n trials) = np = mean Pr(x successes in n trials) = C(n, x)*p^x*(1 – p)^(n – x) = dbinom(x, n, p) C(n, x) = “n choose x” = number of combinations of n things taken x at a time = n(n – 1)…(n – x + 1)/x! Example: Pr(1 or 2 heads in 4 tosses of a fair coin) = ?
Binomial model pbinom(x, n, p), dbinom(x, n, p) Pr(1 or 2 heads in 4 tosses of a fair coin) = Pr(1 head) + Pr(2 heads) = C(4, 1)*p^1*(1 – p)^3 + C(4, 2)*p^2*(1 – p)^2 = (4 + 6)*0.5^4 = 10/16 = 5/8 = 0.625 = dbinom(1, 4, 0.5) + dbinom(2, 4, 0.5) = pbinom(2, 4, 0.5) – pbinom(0, 4, 0.5)
Example of binomial model pbinom(x, n, p) Susan goes skiing each weekend if the weather is good n = 12 weekends in ski season Probability of good weather = 0.65 for each weekend independently What is the probability that she will ski for 8 or more weekends? (Use pbinom) Find her expected number of ski weekends
Do it!
Example of binomial model pbinom(x, n = 12, p = 0.65) Expected number of weekends she skis is np = ? Probability of skiing for 8 or more weekends = 1 – Pr(no more than 7 ski trips in 12 weekends, with p = 0.65 for each) = ?
Example of binomial model pbinom(x, n = 12, p = 0.65) Expected number of weekends she skis is np = 12*0.65 = 7.8 Probability of skiing for 8 or more weekends = 1 – Pr(no more than 7 ski trips in 12 weekends, with p = 0.65 for each) = 1- pbinom(7, 12, 0.65) > 1- pbinom(7, 12, 0.65) [1] 0.583345
Optional practice problems on probability calculations (binomial and related distributions): Do using R Ten percent of computer parts produced by a certain supplier are defective. What is the probability that a sample of 10 parts contains more than 3 defective ones? On average, two tornadoes hit major U.S. metropolitan areas every year. What is the probability that more than five tornadoes occur in major U.S. metropolitan areas next year? A lab network consisting of 20 computers was attacked by a computer virus. This virus enters each computer with probability 0.4, independently of other computers. a) Find the probability that the virus enters at least 10 computers. b) A computer manager checks the lab computers, one after another, to see if they were infected by the virus. What is the probability that she has to test at least 6 computers to find the first infected one? Check answers at www.utdallas.edu/~mbaron/3341/Practice4.pdf E-mail any questions on R solutions to tcoxdenver@aol.com (see the R sketch below for one way to set these up)
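One way to set these up in R (a sketch; run the calls and compare with the linked answers). Note that the tornado problem uses the Poisson model and part (b) of the virus problem uses the geometric model rather than the binomial.
# 1. Defective parts: X ~ Binomial(10, 0.1); want Pr(X > 3)
1 - pbinom(3, size = 10, prob = 0.1)

# 2. Tornadoes: X ~ Poisson(2) per year; want Pr(X > 5)
1 - ppois(5, lambda = 2)

# 3a. Virus: X ~ Binomial(20, 0.4); want Pr(X >= 10)
1 - pbinom(9, size = 20, prob = 0.4)

# 3b. "At least 6 computers tested to find the first infected one" means the
#     first 5 checked are clean; geometric with success probability 0.4
1 - pgeom(4, prob = 0.4)   # equals 0.6^5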
Plotting a binomial distribution (probability density function) > x = c(0:12) > y = dbinom(x, 12, 0.65); plot(x,y) Probability distribution for number of ski weekends This “probability density” or “probability mass” function lets us calculate expected utility of season pass if its utility is determined by the number of ski weekends.
Plotting a binomial distribution > barplot(dbinom(x, 12, 0.65))
Risk profile (CDF) for binomial In R: x <- c(0:12); y <- pbinom(x, 12, 0.65); plot(x, y)
Using the binomial model to calculate probabilities A company will remain solvent if at least 3 of its 8 markets are profitable. The probability that each market is profitable is 25%. What is the probability that the company remains solvent?
Using the binomial model to calculate probabilities A company will remain solvent if at least 3 of its 8 markets are profitable. The probability that each market is profitable is 25%. What is the probability that the company remains solvent? The company is solvent iff at most 5 of the 8 markets are unprofitable, and each market is unprofitable with probability 0.75 Pr(no more than 5 failures) = pbinom(5, 8, 0.75) [1] 0.3214569 (equivalently, 1 - pbinom(2, 8, 0.25))
Bayesian analysis and probability basics
How to get needed probabilities? Derive from other probabilities and models; condition on data Bayes’ rule, decomposition and logic, event trees, fault trees, probability theory & models Monte Carlo simulation models Make them up (subjective probabilities), ask others (elicitation) Calibration, biases (e.g., over-confidence) Estimate them from data