Modeling Decision Nur Aini Masruroh

Outline
- Introduction
- Probabilistic thinking
- Decision tree
- Introduction to Bayesian Network and Influence Diagram

Introduction
Why are decisions hard to make?
- Complexity: there are many alternatives or possible solutions; there are many factors to be considered, and many of these factors are interdependent.
- Uncertainty: the possible future outcomes are uncertain or difficult to predict; information may be vague, incomplete, or unavailable.
- Multiple conflicting objectives: the decision maker(s) may have many goals and objectives, and many of these may be conflicting in nature.

Good decision versus good outcome
A good decision does not guarantee a good outcome; it only enhances the chance of one. A good decision can still be followed by a bad outcome, and a bad decision by a good outcome.

Probabilistic thinking
An event is a distinction about some state of the world. Examples: whether the next person entering the room is a beer drinker; whether it will be raining tonight; etc. When we identify an event, we have in mind what we mean, but will other people know precisely what we mean? Even we may not have a precise definition of what we have in mind. To avoid ambiguity, every event should pass the clarity test.
Clarity test: to ensure that we are absolutely clear and precise about the definition of every event we are dealing with in a decision problem.

Possibility tree
Single-event tree. Example: the event "the next person entering this room is a businessman". Suppose B represents "businessman" and B' represents otherwise.

Possibility tree
Two-event trees simultaneously consider several events. Example: the event "the next person entering this room is a businessman" and the event "the next person entering this room is a graduate" can be jointly considered.

Reversing the order of events in a tree In the previous example, we have considered the distinctions in the order of “businessman” then “graduate”, i.e., B to G. The same information can be expressed with the events in the reverse order, i.e., G to B.

Multiple event trees
We can jointly consider three events: businessman, graduate, and gender.
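As a small illustration (not on the slide), the leaves of a multiple-event possibility tree can be enumerated mechanically; with three binary distinctions there are 2 x 2 x 2 = 8 joint possibilities. The outcome labels below are illustrative placeholders.

```python
from itertools import product

# Three binary distinctions, mirroring the slide's example (labels are illustrative)
events = {
    "businessman": ("B", "B'"),
    "graduate":    ("G", "G'"),
    "gender":      ("male", "female"),
}

# Each leaf of the possibility tree is one combination of outcomes
for leaf in product(*events.values()):
    print(leaf)   # 2 x 2 x 2 = 8 leaves in total
```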

Assigning probabilities to events
The probabilities we assign depend on our state of information about the event. Example: information relevant to assessing the likelihood that the next person entering the room is a businessman might include the following: there is an alumni meeting outside the room and most of the attendees are businessmen; you have made an arrangement to meet a friend here, she is going to show up any moment, and to your knowledge she is not a businesswoman; etc. After considering all relevant background information, we assign the likelihood that the next person entering the room is a businessman by assigning a probability value to each of the possibilities or outcomes.

Marginal and conditional probabilities
In general, given information about the outcome of some events, we may revise our probabilities of other events. We do this through the use of conditional probabilities. The probability of an event X given specific outcomes of another event Y is called the conditional probability of X given Y. The conditional probability of event X given event Y and other background information ξ is denoted by p(X|Y, ξ) and is given by p(X|Y, ξ) = p(X, Y|ξ) / p(Y|ξ).
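A minimal sketch of the definition, using a made-up joint distribution over the businessman (B) and graduate (G) distinctions; the numbers are illustrative assumptions only.

```python
# Assumed joint probabilities p(B, G | ξ); they must sum to 1
joint = {
    ("B",  "G"):  0.10,
    ("B",  "G'"): 0.20,
    ("B'", "G"):  0.30,
    ("B'", "G'"): 0.40,
}

def conditional(x, y):
    """p(X = x | Y = y, ξ) = p(x, y | ξ) / p(y | ξ)."""
    p_y = sum(p for (bx, gy), p in joint.items() if gy == y)  # marginal p(y)
    return joint[(x, y)] / p_y

print(conditional("B", "G"))   # p(B | G) = 0.10 / 0.40 = 0.25
```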

Factorization rule for joint probability
The joint probability of two events can be factored as p(X, Y|ξ) = p(X|ξ) p(Y|X, ξ) = p(Y|ξ) p(X|Y, ξ).

Changing the order of conditioning
Suppose in the previous tree we have assessed the probabilities in the order B then G. There is no reason why we should always condition G on B. Suppose we want to draw the tree in the order G to B: we need to flip the tree!

Flipping the tree
Graphical approach:
1. Change the ordering of the underlying possibility tree.
2. Transfer the elemental (joint) probabilities from the original tree to the new tree.
3. Compute the marginal probability for the first variable in the new tree, i.e., G, by adding the elemental probabilities related to G1 and G2 respectively.
4. Compute the conditional probabilities for B given G.
Bayes' theorem: carrying out the above tree flipping is already an application of Bayes' theorem.
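A minimal numerical sketch of the flipping procedure. The slide's actual numbers are in a figure not reproduced in the transcript, so the values of p(B) and p(G|B) below are assumptions for illustration.

```python
# Assumed assessment-order probabilities: p(B) and p(G | B)
p_B = {"B": 0.3, "B'": 0.7}
p_G_given_B = {
    "B":  {"G": 0.8, "G'": 0.2},
    "B'": {"G": 0.4, "G'": 0.6},
}

# Steps 1-2: elemental (joint) probabilities p(B, G) transfer unchanged to the new tree
joint = {(b, g): p_B[b] * p_G_given_B[b][g]
         for b in p_B for g in ("G", "G'")}

# Step 3: marginal of the new first variable, p(G), by summing elemental probabilities
p_G = {g: sum(joint[(b, g)] for b in p_B) for g in ("G", "G'")}

# Step 4: conditional probabilities of B given G
p_B_given_G = {g: {b: joint[(b, g)] / p_G[g] for b in p_B} for g in p_G}

print(p_G)          # e.g. p(G) = 0.3*0.8 + 0.7*0.4 = 0.52
print(p_B_given_G)  # e.g. p(B | G) = 0.24 / 0.52 ≈ 0.462
```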

Bayes' Theorem
Given two uncertain events X and Y, suppose the probabilities p(X|ξ) and p(Y|X, ξ) are known. Then p(X|Y, ξ) = p(Y|X, ξ) p(X|ξ) / p(Y|ξ), where p(Y|ξ) = Σ_x p(Y|x, ξ) p(x|ξ).

Application of conditional probability
Direct conditioning: relevance of smoking to lung cancer. Suppose:
S: a person is a heavy smoker, defined as having smoked at least two packs of cigarettes per day for a period of at least 10 years during a lifetime.
L: a person has lung cancer according to the standard medical definition.
A doctor not associated with lung cancer treatment assigned the following probabilities:

Relevance of smoking to lung cancer (cont’d) A lung cancer specialist remarked: “The probability p(L1|S1, ξ) = 0.1 is too low” When asked to explain why, he said: “Because in all these years as a lung cancer specialist, whenever I visited my lung cancer ward, it is always full of smokers.” What’s wrong with the above statement? The answer can be found by flipping the tree:

Relevance of smoking to lung cancer (cont'd)
What the specialist referred to as "high" is actually the probability of a person being a smoker given that he has lung cancer, i.e., p(S1|L1, ξ) = 0.769. He has confused p(S1|L1, ξ) with p(L1|S1, ξ). Notice that p(L1|S1, ξ) << p(S1|L1, ξ). Hence even a highly trained professional can fall victim to wrong reasoning.
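A sketch of the tree flip behind the 0.769 figure. The slide's prior on heavy smoking and the cancer rate for non-smokers are not reproduced in the transcript, so the values p(S1) = 0.5 and p(L1|S2) = 0.03 below are assumptions, chosen only because they reproduce the quoted result.

```python
# Assumed inputs (not given in the transcript): prior on heavy smoking and
# lung-cancer rates for smokers / non-smokers
p_S1 = 0.5            # p(S1 | ξ): person is a heavy smoker (assumed)
p_L1_given_S1 = 0.10  # p(L1 | S1, ξ), as quoted on the slide
p_L1_given_S2 = 0.03  # p(L1 | S2, ξ): assumed value for non-smokers

# Bayes' theorem: p(S1 | L1, ξ)
p_L1 = p_L1_given_S1 * p_S1 + p_L1_given_S2 * (1 - p_S1)
p_S1_given_L1 = p_L1_given_S1 * p_S1 / p_L1

print(round(p_S1_given_L1, 3))  # 0.769 -- most cancer-ward patients are smokers,
                                # even though p(L1 | S1) itself is only 0.1
```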

Expected value criterion
Suppose you face a situation where you must choose between alternatives A and B as follows:
Alternative A: $10,000 for sure.
Alternative B: 70% chance of receiving $18,000 and 30% chance of losing $4,000.
What is your personal choice? Now compare Alternative B with:
Alternative C: 70% chance of winning $24,600 and 30% chance of losing $19,400.
Note that EMV(B) = EMV(C), but are they "equivalent"? Alternative C seems to be "more risky" than Alternative B even though they have the same EMV. Conclusion: EMV does not take risk into account.
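A quick check (a minimal Python sketch; the payoffs and probabilities are taken from the slide) that alternatives B and C indeed have the same expected monetary value:

```python
def emv(outcomes):
    """Expected monetary value of a list of (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

A = [(1.0, 10_000)]
B = [(0.7, 18_000), (0.3, -4_000)]
C = [(0.7, 24_600), (0.3, -19_400)]

print(emv(A), round(emv(B)), round(emv(C)))  # 10000.0 11400 11400
# B and C have identical EMV but very different spreads of outcomes (risk)
```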

The Petersburg Paradox
In 1713 Nicolas Bernoulli suggested playing the following game: an unbiased coin is tossed until it lands tails. The player is paid $2 if tails comes up on the opening toss, $4 if tails first appears on the second toss, $8 if tails first appears on the third toss, $16 if tails first appears on the fourth toss, and so forth. What is the maximum you would pay to play this game? If we follow the EMV criterion, EMV = Σ_k (1/2)^k · $2^k = $1 + $1 + $1 + … = ∞. This means that you should be willing to pay up to an infinite amount of money to play the game, so why are people unwilling to pay more than a few dollars?
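A minimal sketch of why the EMV criterion breaks down here: every possible stopping toss contributes exactly $1 of expected value, so the EMV of the game truncated at n tosses is n dollars and grows without bound.

```python
def truncated_emv(max_tosses):
    """EMV of the St. Petersburg game if it is cut off after max_tosses tosses."""
    # Tails first appears on toss k with probability (1/2)**k and pays 2**k dollars
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_tosses + 1))

for n in (10, 100, 1000):
    print(n, truncated_emv(n))   # 10.0, 100.0, 1000.0 -- diverges as n grows
```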

The Petersburg Paradox
25 years later, Nicolas's cousin, Daniel Bernoulli, arrived at a solution that contained the first seeds of contemporary decision theory. Daniel reasoned that the marginal increase in the value or "utility" of money declines with the amount already possessed: a gain of $1,000 is more significant to a poor person than to a rich man, though both gain the same amount. Specifically, Daniel Bernoulli argued that the value or utility of money should exhibit some form of diminishing marginal return with increase in wealth. The measure to use to value the game is then the expected utility Σ_k (1/2)^k u($2^k), where u is an increasing concave function; the sum converges to a finite number.
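A sketch of Daniel Bernoulli's resolution under an assumed logarithmic utility u(x) = log2(x) (the slide does not fix a particular u): the expected utility converges, and the corresponding certainty equivalent of the game is only about $4.

```python
from math import log2

def expected_log_utility(max_tosses):
    """Expected utility of the St. Petersburg game with u(x) = log2(x)."""
    return sum((0.5 ** k) * log2(2 ** k) for k in range(1, max_tosses + 1))

eu = expected_log_utility(60)
print(eu)        # approaches 2.0 as more tosses are included
print(2 ** eu)   # certainty equivalent ≈ $4, far from the infinite EMV
```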

The rules of actional thought
How should a person act or decide rationally under uncertainty? Answer: by following these rules or axioms:
1. The ordering rule
2. The equivalence or continuity rule
3. The substitution or independence rule
4. The decomposition rule
5. The choice rule
These five rules form the axioms of decision theory.

The ordering rule
The decision maker must be able to state his preference among the prospects, outcomes, or prizes of any deal. Furthermore, the transitivity property must be satisfied: if he prefers X to Y, and Y to Z, then he must prefer X to Z. Mathematically, if X ≻ Y and Y ≻ Z, then X ≻ Z. The ordering rule implies that the decision maker can provide a complete preference ordering of all the outcomes from the best to the worst. Suppose a person does not follow the transitivity property: he can be exploited via the money pump argument.

The equivalence or continuity rule
Given prospects A, B, and C such that A ≻ B ≻ C, there exists p with 0 < p < 1 such that the decision maker will be indifferent between receiving prospect B for sure and receiving a deal with probability p for prospect A and probability 1 – p for prospect C. Here B is the certain equivalent of the uncertain deal, and p is the preference probability of prospect B with respect to prospects A and C.

The substitution rule
We can always substitute a deal with its certain equivalent without affecting preference. For example, suppose the decision maker is indifferent between B and the A – C deal below; then he must be indifferent between the two deals below, where B is substituted for the A – C deal.

The decomposition rule
We can reduce compound deals to simple ones using the rules of probability. For example, a decision maker should be indifferent between the following two deals:

The choice or monotonicity rule
Suppose that a decision maker can choose between two deals L1 and L2, where L1 gives outcome A with probability p1 and outcome B otherwise, and L2 gives A with probability p2 and B otherwise. If the decision maker prefers A to B, then he must prefer L1 to L2 if and only if p1 > p2. In other words, the decision maker must prefer the deal that offers the greater chance of receiving the better outcome.

Maximum expected utility principle
Let a decision maker face the choice between two uncertain deals or lotteries L1 and L2 with outcomes A1, A2, …, An. There is no loss of generality in assuming that L1 and L2 have the same set of outcomes A1, A2, …, An, because we can always assign zero probability to any outcome that does not appear in one of the lotteries. It is not clear at first whether L1 or L2 is preferred. By the ordering rule, let the outcomes be labeled so that A1 ≽ A2 ≽ … ≽ An.

Maximum expected utility principle
Again, there is no loss of generality, as we can always renumber the subscripts according to the preference ordering. We note that A1 is the most preferred outcome, while An is the least preferred outcome. By the equivalence rule, for each outcome Ai (i = 1, …, n) there is a number ui with 0 ≤ ui ≤ 1 such that Ai is equivalent to a lottery giving A1 with probability ui and An with probability 1 – ui. Note that u1 = 1 and un = 0. Why?

Maximum expected utility principle
By the substitution rule, we replace each Ai (i = 1, …, n) in L1 and L2 with the equivalent lottery constructed above.

Maximum expected utility principle
By the decomposition rule, L1 and L2 may be reduced to equivalent deals with only two outcomes (A1 and An), each having a different probability of yielding A1. Finally, by the choice rule, since A1 ≻ An, the decision maker should prefer lottery L1 to lottery L2 if and only if Σ_i p1i ui > Σ_i p2i ui, where p1i and p2i are the probabilities of outcome Ai in L1 and L2 respectively.

Utilities and utility functions
We define the quantity ui (i = 1, …, n) as the utility of outcome Ai, and the function that returns the value ui given Ai as a utility function, i.e., u(Ai) = ui. The quantities EU(L1) = Σ_i p1i u(Ai) and EU(L2) = Σ_i p2i u(Ai) are known as the expected utilities of lotteries L1 and L2 respectively. Hence the decision maker must prefer the lottery with the higher expected utility.

Case for more than two alternatives
The previous result may be generalized to the case where a decision maker is faced with more than two uncertain alternatives: he should choose the one with maximum expected utility. Hence he should choose the alternative j that maximizes EU(Lj) = Σ_i pij u(Ai), where pij is the probability of outcome Ai in alternative j.
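A minimal sketch of the maximum expected utility rule for several alternatives. The utilities u(Ai) and the probability table pij below are made-up values for illustration only.

```python
# Utilities u(A_i) for outcomes A1..A4 (u1 = 1 and un = 0 by construction)
u = [1.0, 0.7, 0.4, 0.0]

# Assumed probabilities p_ij of outcome A_i under alternative j
p = {
    "alt 1": [0.2, 0.5, 0.2, 0.1],
    "alt 2": [0.4, 0.1, 0.1, 0.4],
    "alt 3": [0.0, 0.8, 0.2, 0.0],
}

def expected_utility(probs):
    """EU = sum_i p_i * u(A_i)."""
    return sum(pi * ui for pi, ui in zip(probs, u))

for alt, probs in p.items():
    print(alt, expected_utility(probs))
print("choose:", max(p, key=lambda alt: expected_utility(p[alt])))  # max expected utility
```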

Comparing the expected utility criterion with the expected monetary value criterion
The expected utility criterion takes into account both return and risk, whereas the expected monetary value criterion does not consider risk. The alternative with the maximum expected utility is the best, taking into account the trade-off between return and risk. The best preference trade-off depends on a person's risk attitude. Different types of utility function represent different attitudes and degrees of aversion to risk taking.
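To make the return/risk trade-off concrete, the sketch below re-evaluates alternatives B and C from the expected value slide under an assumed exponential utility u(x) = 1 - exp(-x/R) with risk tolerance R = $20,000 (an assumption, not from the slides). Both have EMV $11,400, but the less risky B receives the higher expected utility.

```python
from math import exp

R = 20_000  # assumed risk tolerance for the exponential utility u(x) = 1 - exp(-x / R)

def u(x):
    return 1 - exp(-x / R)

def expected_utility(outcomes):
    """EU of a list of (probability, payoff) pairs under the utility u."""
    return sum(p * u(x) for p, x in outcomes)

B = [(0.7, 18_000), (0.3, -4_000)]
C = [(0.7, 24_600), (0.3, -19_400)]

print(expected_utility(B))  # higher: B is less risky
print(expected_utility(C))  # lower, despite the identical EMV of 11,400
```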

Decision tree
Consider the following party problem. Problem: decide the party location to maximize total satisfaction. Note: decisions are represented by squares; uncertainties are represented by circles.

Preference
Suppose we have the following preference. Note: the best case, O–S, is set to 1; the worst case, O–R, is set to 0; the other outcomes are assigned preference values relative to these two.

Assigning probability to the decision tree
Suppose we believe that the probability it will rain is 0.6.

Applying substitution rule

Using utility values
We may interpret the preference probabilities as utility values.
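A sketch of rolling back the party decision tree. The transcript gives only p(rain) = 0.6 and the endpoints O–S = 1 and O–R = 0 from the preference slide; the location alternatives (Outdoors, Porch, Indoors) and the intermediate preference values below are assumptions for illustration.

```python
p_rain = 0.6          # from the slide
p_sun = 1 - p_rain

# Assumed preference probabilities (utilities) for each location/weather outcome;
# only the endpoints Outdoors-Sun = 1 and Outdoors-Rain = 0 come from the transcript
utility = {
    "Outdoors": {"Sun": 1.0, "Rain": 0.0},
    "Porch":    {"Sun": 0.9, "Rain": 0.2},
    "Indoors":  {"Sun": 0.4, "Rain": 0.5},
}

# Roll back: expected utility of each alternative, then pick the maximum
expected = {loc: p_sun * u["Sun"] + p_rain * u["Rain"] for loc, u in utility.items()}
for loc, eu in expected.items():
    print(loc, eu)
print("best location:", max(expected, key=expected.get))
```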

Introduction to Bayesian Network and Influence Diagram

“A good representation is the key to good problem solving”

Probabilistic modeling using BN
Suppose we have the following problem (represented as a decision tree). It can also be represented using a Bayesian Network (BN): a Conditional Probability Table (CPT) is stored at each node.

Probabilistic modeling using BN
The network can be extended … Can you imagine the size of the decision tree for these?

Bayesian Network: definition
Also called relevance diagrams, probabilistic networks, causal networks, causal graphs, etc. A BN represents the probabilistic relations between uncertain variables. It is a directed acyclic graph: the nodes indicate the variables of concern, while the arcs between nodes indicate the probabilistic relations among them. At each node, we store a conditional probability distribution of the variable represented by that node, conditioned on the outcomes of all the uncertain variables that are parents of that node.

Two layers of representation of knowledge
Qualitative level: the graphical structure represents the probabilistic dependence or relevance between variables.
Quantitative level: the conditional probabilities represent the local "strength" of the dependence relationships.

Where do the numbers in a BN come from?
- Direct assessment by domain experts.
- Learning from a sufficient amount of data using statistical estimation methods or machine learning and data mining algorithms.
- Output from other mathematical models: simulation models, stochastic models, system dynamics models, etc.
- A combination of the above: experts assess the graphical structure and learning algorithms or other models fill in the numbers, or both structure and numbers are learned and the experts fine-tune the results.

Properties of BN
The presence of an arc indicates possible relevance. Arc reversal: if we are interested in the probability that a specific person is a smoker given that he has lung cancer, the operation computes and replaces the probabilities at the two nodes. An arc can be drawn in either direction.

Arc reversal operation
Suppose initially we have the network X → Y, with p(X) and p(Y|X). Then we want the reversed network Y → X. The probability distributions p(Y) and p(X|Y) for the new network can be computed using Bayes' theorem as follows: p(Y|ξ) = Σ_x p(Y|x, ξ) p(x|ξ) and p(X|Y, ξ) = p(Y|X, ξ) p(X|ξ) / p(Y|ξ).

Arc reversal: example
Note: in arc reversal, we must sometimes add arc(s) to preserve the joint distribution (so that Bayes' theorem still holds). However, if possible, avoid arc reversals that introduce additional arcs, as they imply a loss of conditional independence information.

If an arc can be drawn in either direction, which shall I use?
During network construction, draw arcs in the directions in which you know the conditional probabilities, or in which you know there are data that you can use to determine these values later. Arcs drawn in these directions are said to be in assessment order. During inference, if the arcs are not in the desired directions, reverse them. Arcs in the directions required for inference are said to be in inference order. Example: the network with the arc from "smoking" to "lung cancer" is in assessment order; the network with the arc from "lung cancer" to "smoking" is in inference order.

BN represents a joint probability distribution
A BN can help simplify the joint probability distribution (JPD). Consider the following BN. Without the BN, the full chain rule gives p(A,B,C,D,E,F) = p(A) p(B|A) p(C|A,B) p(D|A,B,C) p(E|A,B,C,D) p(F|A,B,C,D,E). With the BN, the conditional independencies encoded in the graph reduce this to p(A,B,C,D,E,F) = p(A) p(B|A) p(C) p(D|B,C) p(E|B) p(F|B,E).
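A minimal sketch of the BN factorization above with made-up CPT values: the full joint table over six binary variables has 2^6 = 64 entries, but under the BN any entry can be computed from the much smaller set of local factors shown on the slide.

```python
# Placeholder CPTs for the BN on the slide: A -> B, (B, C) -> D, B -> E, (B, E) -> F
# Each entry gives p(variable = True | parent values); False gets the complement.
p_A = 0.3
p_C = 0.6
p_B_given_A = {True: 0.8, False: 0.1}
p_D_given_BC = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.05}
p_E_given_B = {True: 0.7, False: 0.2}
p_F_given_BE = {(True, True): 0.95, (True, False): 0.6,
                (False, True): 0.3, (False, False): 0.1}

def bern(p, value):
    """Probability of a binary variable taking `value` when p = P(True)."""
    return p if value else 1 - p

def joint(a, b, c, d, e, f):
    """p(A,B,C,D,E,F) = p(A) p(B|A) p(C) p(D|B,C) p(E|B) p(F|B,E)."""
    return (bern(p_A, a) * bern(p_B_given_A[a], b) * bern(p_C, c) *
            bern(p_D_given_BC[(b, c)], d) * bern(p_E_given_B[b], e) *
            bern(p_F_given_BE[(b, e)], f))

print(joint(True, True, False, True, True, False))  # one cell of the 64-entry JPD
```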

Example of BN: car starting system

Example of BN: cause of dyspnea

Example of BN: ink jet printer trouble shooting

Example of BN: patient monitoring in an ICU (alarm project)

Decision modeling using Influence Diagram
A BN represents probabilistic relationships among uncertain variables and is useful for pure probabilistic reasoning and inference. A BN can be extended to an Influence Diagram (ID) to represent a decision problem by adding decision nodes and value nodes. This is analogous to extending a probability tree to a decision tree by adding decision branches and attaching values or utilities to the end points of the tree.

Decision node
Decision variable: a variable within the control of the decision maker. It is represented by a rectangular node in an ID. In each decision node, we store the list of possible alternatives associated with the decision variable.

Arcs
Information arcs: arcs from a chance node into a decision node. Influence arcs: arcs from a decision node to a chance node.

Arcs (cont'd)
Chronological arcs: an arc from one decision node to another indicates the chronological order in which the decisions are carried out.

Value node and value arc
A value node is used to represent the utility or value function of the decision maker and is denoted by a diamond. A value node must be a sink node, i.e., it has only incoming arcs (known as value arcs) but no outgoing arcs. Value arcs indicate the variables whose outcomes the decision maker cares about, or that have an impact on his utility. Only one value node is allowed in a standard ID.

Deterministic node
A special type of chance node. It represents a variable whose outcome is deterministic (i.e., has probability 1) once the outcomes of the other conditioning nodes are known. Denoted by a double oval.

ID vs decision tree
1. Compact vs. combinatory: the size of an ID is equal to the total number of variables, whereas the size of a decision tree grows exponentially with the total number of variables (a binary tree over n variables has 2^n leaf nodes).
2. Graphical vs. numerical representation of independence: in an ID, conditional independence relations among the variables are represented by the graphical structure of the network, and no numerical computation is needed to determine them; in a decision tree, conditional independence relations can only be determined through numerical computation using the probabilities.
3. Non-directional vs. unidirectional: the nodes and arcs of an ID may be added or deleted in any order, which makes the modeling process flexible; a decision tree can only be built in the direction from the root to the leaf nodes, so the exact sequence of the nodes or events must be known in advance.
4. Symmetric model only vs. asymmetric model possible: in an ID, the outcomes of every node must be conditioned on all outcomes of its parents, which implies that the equivalent tree must be symmetric; in a decision tree, the outcomes of some nodes may be omitted for certain outcomes of their parents, leading to an asymmetric tree.

Example 1

Example 2

Example 3

Decision model : example 1 The party problem Basic risky decision problem

Decision model : example 2 Decision problem with imperfect information

Decision model : example 3 Production/sale problem

Decision model : example 4 Maintenance decision for space shuttle tiles

Decision model : example 5 Basic model for electricity generation investment evaluation

Evaluating ID
The goal is to find the optimal decision policy of a problem represented by an ID. Methods: (1) convert the ID into an equivalent decision tree and perform tree roll-back; (2) perform operations directly on the network to obtain the optimal decision policy; the first such algorithm is that of Shachter (1986).

Readings
Clemen, R.T. and Reilly, T. (2001). Making Hard Decisions with Decision Tools. California: Duxbury Thomson Learning.
Howard, R.A. (1988). Decision Analysis: Practice and Promise. Management Science, 34(6), pp. 679–695.
Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd ed. Prentice-Hall, Inc.
Shachter, R.D. (1986). Evaluating Influence Diagrams. Operations Research, 34(6), pp. 871–882.