
Value of Information (VOI) Theory Advisor: Dr Sushil K Prasad By: DM Rasanjalee Himali

Introduction Value of information (VoI) in decision analysis is the amount a decision maker would be willing to pay for information prior to making a decision. Ex: Consider a decision situation with one decision, Vacation Activity, and one uncertainty, Weather Condition, which will be resolved only after the Vacation Activity decision has been made. Value of information on Weather Condition –captures the value of being able to know Weather Condition before making the Vacation Activity decision. –It is quantified as the highest price the decision maker is willing to pay to know Weather Condition before making the Vacation Activity decision.
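As a quick numeric illustration of this definition, the sketch below computes the expected value of perfect information for the vacation example; the payoffs and the weather probabilities are made-up illustrative numbers, not values from the slides.

```python
# Hedged sketch: expected value of perfect information (EVPI) for the
# Vacation Activity / Weather Condition example. Payoffs and probabilities
# are illustrative assumptions.

payoff = {                       # utility of each (activity, weather) pair
    "beach":  {"sunny": 100, "rainy": 10},
    "museum": {"sunny": 40,  "rainy": 60},
}
p_weather = {"sunny": 0.6, "rainy": 0.4}

# Best expected payoff when deciding BEFORE knowing the weather.
ev_without_info = max(
    sum(p_weather[w] * payoff[a][w] for w in p_weather) for a in payoff
)

# Expected payoff when the weather is revealed BEFORE the decision:
# pick the best activity for each weather outcome, then average.
ev_with_info = sum(
    p_weather[w] * max(payoff[a][w] for a in payoff) for w in p_weather
)

evpi = ev_with_info - ev_without_info   # highest price worth paying for the forecast
print(ev_without_info, ev_with_info, evpi)   # 64.0, 84.0, 20.0
```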

Uncertainty The concept of uncertainty is closely connected with the concept of information. The uncertainty involved in any problem-solving situation is a result of some information deficiency. There are many forms of information deficiency: –The information may be, for example, incomplete, imprecise, fragmentary, unreliable, vague, or contradictory.

Uncertainty The amount of uncertainty is reduced by obtaining relevant information as a result of some action –Ex: performing a relevant experiment and observing its outcome, searching for and discovering a relevant historical record, or requesting and receiving a relevant document from an archive. The amount of information obtained by the action can then be measured by the amount of uncertainty it removes. That is, the amount of information pertaining to a given problem-solving situation that is obtained by taking some action is measured by the difference between the a priori uncertainty and the a posteriori uncertainty.

Entropy Entropy: –a measure of the uncertainty of a random variable. –Also called Shannon entropy, it quantifies the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication".

Entropy Let X be a discrete random variable with alphabet X and probability mass function p(x) = Pr{X = x}, x ∈ X. The entropy H(X) of a discrete random variable X is defined by: H(X) = − Σ_{x∈X} p(x) log p(x) The log is to the base 2 and entropy is expressed in bits.
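A minimal Python sketch of this definition (the distributions are illustrative examples):

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))   # ~0.469 bits (a biased coin is less uncertain)
```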

Relative Entropy and Mutual Information The entropy of a random variable is a measure of: –the uncertainty of the random variable; –the amount of information required on the average to describe the random variable. Relative Entropy –is a measure of the distance between two distributions. –The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. –The relative entropy or Kullback–Leibler distance between two probability mass functions p(x) and q(x) is defined as: D(p||q) = Σ_x p(x) log ( p(x) / q(x) )
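A corresponding sketch of the Kullback–Leibler distance, using the same plain-list representation of a pmf as above:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p||q) = sum p(x) log2(p(x)/q(x)), in bits."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # > 0: cost of assuming q when the truth is p
print(kl_divergence(p, p))   # 0.0: no penalty when the assumed distribution is correct
```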

Relative Entropy and Mutual Information Mutual Information: –a measure of the amount of information that one random variable contains about another random variable; –the reduction in the uncertainty of one random variable due to knowledge of the other. Consider two random variables X and Y with joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y): I(X; Y) = Σ_{x,y} p(x, y) log ( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) )
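A small sketch computing I(X; Y) directly from a joint pmf (the joint tables are illustrative):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits.
    `joint` is a 2-D list of joint probabilities p(x, y)."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# X and Y perfectly correlated: knowing Y removes all uncertainty about X.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))        # 1.0 bit
# X and Y independent: knowing Y tells us nothing about X.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))    # 0.0 bits
```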

Relative Entropy and Mutual Information Relationship between Relative Entropy and Mutual Information We can rewrite the definition of mutual information I(X; Y) as: I(X; Y) = H(X) − H(X | Y) Thus, the mutual information I(X; Y) is the reduction in the uncertainty of X due to the knowledge of Y.

Relative Entropy and Mutual Information Since H(X, Y) = H(X) + H(Y|X), we have I(X; Y) = H(X) + H(Y) − H(X, Y) Finally, we note that I(X; X) = H(X) − H(X | X) = H(X) Thus, the mutual information of a random variable with itself is the entropy of the random variable. This is the reason that entropy is sometimes referred to as self-information.
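A quick numeric check of these identities on a small made-up joint pmf:

```python
import math

def H(p):
    """Shannon entropy of a probability list, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Joint pmf p(x, y) for a small example (rows: x, columns: y).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]              # p(x) = [0.5, 0.5]
py = [sum(col) for col in zip(*joint)]        # p(y) = [0.5, 0.5]
pxy = [v for row in joint for v in row]       # flattened joint

# I(X;Y) = H(X) + H(Y) - H(X,Y)
print(H(px) + H(py) - H(pxy))   # ~0.278 bits

# I(X;X) = H(X): the mutual information of X with itself is its entropy.
print(H(px))                    # 1.0 bit
```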

Relative Entropy and Mutual Information Relationship between entropy and mutual information

Value of Information (VOI) The information theory developed by Shannon was designed to place a quantitative measure on the amount of information involved in any communication. The early developers stressed that the information measure was dependent only on the probabilistic structure of the communication process. Attempts to apply Shannon's information theory to problems beyond communications have, by and large, come to grief. The failure of these attempts could have been predicted because no theory that involves just the probabilities of outcomes without considering their consequences could possibly be adequate in describing the importance of uncertainty to a decision maker. It is necessary to be concerned not only with the probabilistic nature of the uncertainties that surround us, but also with the economic impact that these uncertainties will have on us.

Value of Information Developing a fully operational theory for dealing with uncertainty requires that issues be addressed at each of the following four levels: Level 1— –We need to find an appropriate mathematical formalization of the conceived type of uncertainty. Level 2— –We need to develop a calculus by which this type of uncertainty can be properly manipulated. Level 3— –We need to find a meaningful way of measuring the amount of relevant uncertainty in any situation that is formalizable in the theory. Level 4— –We need to develop methodological aspects of the theory, including procedures for making the various uncertainty principles operational within the theory.

VOI in the field of Computer Science VOI has been successfully applied in the past in a variety of fields such as robotics and sensor networks. Ex: Scalable information-driven sensor querying and routing for ad-hoc heterogeneous sensor networks by Maurice Chu, Horst Haussecker, Feng Zhao –Application: localization and tracking –Objective: maximize information utility, minimize detection latency and bandwidth Human-Robot Communication for Collaborative Decision Making by Tobias Kaupp, Alexei Makarenko and Hugh Durrant-Whyte –Application: human-robot cooperative decision making –Objective: adjustable autonomy

Represent Belief in some probabilistic representation –Ex: Probability function, Bayesian Network, Influence diagrams

VOI and P2P Search We model search in an unstructured P2P network using Value of Information theory. The main idea of the model is to improve the quality of search by selecting the peers to query based on the utility of the information they have to offer, while minimizing the cost of search.

A good search mechanism should aim to achieve several goals: –high-quality search results, –load balance, –minimum state maintained per node, –efficient object lookup in terms of speed and bandwidth consumption, –relevance of results, and –effectiveness of the search mechanism. Informed search mechanisms perform better at achieving these goals than many blind search methods.

P2P search model The P2P search model defines and updates a belief state regarding the location of the requested data object. This belief state is incrementally updated by incorporating the next best peer that has not yet been incorporated into the current belief state. This next best peer is the one that provides the maximum information utility while minimizing the cost of search.

P2P search model The current belief state needs to be held by some peer in the network. Let us call this peer the leader node l. The leader node can be a persistent one, where the belief resides in the leader node for a longer period of time, or it can be a dynamic one, where the belief travels through the network and the node holding the current belief state is assigned the leader position dynamically.

P2P search model Assuming the leader node l holds the current belief state, the objective function can be defined as follows: Mc(l, j, Pr(x | {z_i}_{i∈S})) = α Mu(Pr(x | {z_i}_{i∈S}), j) − (1 − α) Ma(l, j) The composite objective function Mc is a linear combination of the information utility function Mu and the search cost function Ma. The parameter α is a constant in the range 0 to 1. The selected peer ĵ is the peer chosen from the remaining set of peers not in S that maximizes the composite objective function: ĵ = arg max_{j∉S} Mc(l, j, Pr(x | {z_i}_{i∈S}))
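A hedged sketch of this peer-selection rule; the utility and cost values and the choice of α are illustrative stand-ins, since the slides leave the exact forms of Mu and Ma open:

```python
# Hedged sketch of the composite objective M_c = alpha * M_u - (1 - alpha) * M_a.
# The utility and cost numbers below are illustrative, not part of the model.

def composite_objective(utility, cost, alpha=0.7):
    """M_c = alpha * M_u - (1 - alpha) * M_a, with alpha in [0, 1]."""
    return alpha * utility - (1 - alpha) * cost

def select_next_peer(candidates, alpha=0.7):
    """Pick the peer j (not yet in S) that maximizes the composite objective.
    `candidates` maps peer id -> (information utility M_u, search cost M_a)."""
    return max(
        candidates,
        key=lambda j: composite_objective(*candidates[j], alpha=alpha),
    )

# Example: peer 'B' offers slightly less utility than 'A' but is much cheaper to reach.
candidates = {"A": (0.9, 0.8), "B": (0.8, 0.2), "C": (0.4, 0.1)}
print(select_next_peer(candidates, alpha=0.7))   # 'B' wins the utility/cost trade-off
```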

P2P search model Each node estimates a time-dependent measurement of the location of the peer containing the target information. The time-dependent measurement of peer i, z_i, is given as follows: z_i = f(x, k_i(t), λ_i(t)) x: the unknown location of the target information, k_i: time-dependent knowledge of peer i on the queried data location, and λ_i: peer characteristics. The function f depends on x, k_i and λ_i.

The time-dependent knowledge k_i of a peer i depends on the state maintained per node. This knowledge includes factors such as: –the peer’s search success history, –the peer’s global knowledge of other peers, etc. Peer characteristics λ_i include factors such as: –the peer’s storage capacity, –the peer’s processing power, and –the peer node type (regular node or super node). λ_i and k_i are explicitly represented because these characteristics affect the peer’s estimate or measurement of the target peer.

Efficiency Metrics Objective: low latency, low consumed bandwidth, load balance – Peer average response time r_i – Peer average bandwidth consumption per query b_i – Peer node type t_i – Peer storage space / processing power p_i – Number of neighbors maintained per peer n_i – Location of peer x_i
Relevance Metrics Objective: highly relevant results – Peer search success history s_i – Peer global knowledge g_i
Cost Metrics Objective: low latency, low consumed bandwidth, minimum routing state maintained per node – Number of overlay hops per query o_i – Number of messages per query

Peer i’s measurement z_i can be defined by the following equation, where z_i is the measurement of peer i, and r_i, t_i, w_i, s_i, o_i and b_i stand for peer i’s average response time per query, peer node type, processing power, peer search success history rate, overlay hops per query, and average bandwidth consumption per query, respectively.
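The slide's exact equation for z_i is not reproduced in this transcript, so the sketch below only illustrates one plausible form, a weighted linear combination of the listed quantities; the weights and the function name are assumptions, not the authors' model:

```python
# Hedged sketch: combine the listed per-peer quantities into a single score z_i.
# The weighted-linear form and the weight values are illustrative assumptions.

def peer_measurement(r_i, t_i, w_i, s_i, o_i, b_i, weights=None):
    """Illustrative z_i from peer i's metrics:
    r_i: average response time per query (lower is better)
    t_i: node type (e.g. 1.0 for super node, 0.0 for regular node)
    w_i: processing power (higher is better)
    s_i: search success history rate in [0, 1]
    o_i: overlay hops per query (lower is better)
    b_i: average bandwidth consumption per query (lower is better)
    """
    if weights is None:
        weights = {"r": -0.2, "t": 0.1, "w": 0.2, "s": 0.4, "o": -0.05, "b": -0.05}
    return (weights["r"] * r_i + weights["t"] * t_i + weights["w"] * w_i +
            weights["s"] * s_i + weights["o"] * o_i + weights["b"] * b_i)

print(peer_measurement(r_i=0.3, t_i=1.0, w_i=0.8, s_i=0.9, o_i=3, b_i=0.5))
```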

Belief representation We define ‘belief’ to be the posterior probability distribution of x given the measurements z_1, …, z_n: Pr(x | z_1, …, z_n) The estimate is taken to be the mean value of the probability distribution: x̄ = ∫ x Pr(x | z_1, …, z_n) dx The uncertainty Σ of the estimate is given by the covariance of the estimate: Σ = ∫ (x − x̄)(x − x̄)^T Pr(x | z_1, …, z_n) dx
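A minimal sketch of these quantities for a discrete belief over a handful of candidate locations; the candidate points and probabilities are illustrative values:

```python
import numpy as np

# Discrete belief over candidate peer locations (2-D points), with its mean
# (the estimate) and covariance (the uncertainty).

locations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # candidate x
belief    = np.array([0.1, 0.2, 0.3, 0.4])        # Pr(x | z_1, ..., z_n), sums to 1

mean = belief @ locations                          # estimate: probability-weighted mean
diff = locations - mean
cov  = (belief[:, None] * diff).T @ diff           # uncertainty: weighted covariance

print(mean)   # e.g. [0.6 0.7]
print(cov)    # smaller entries mean a tighter, less uncertain belief
```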

Peer Selection Process Calculating the belief state requires that the measurements z_1, …, z_n be known prior to calculation. However, in a distributed environment like a P2P network, the measurement z_i and peer characteristics λ_i reside only within peer i. Thus we need to communicate this information across peers. Communication among peers incurs cost. Thus we need to intelligently choose the best subset of peers that provides the best information utility at minimum cost.

Peer Selection Process Peer selection is an incremental process: –The best subset of peers is selected one peer at a time from the not-yet-considered peer set. The current belief state must be incrementally updated based on the measurements of these newly considered peers. The useful information a peer may provide varies based on the relevance of the peer’s information content to finding x. Also, there may exist useful but redundant information. Therefore, incremental update of the belief state requires both selection of the optimal set of peers and the optimal order of incorporating these peers into the current belief state. Each step of incorporating a new peer into the belief state should reduce the uncertainty of the belief state.

Information Utility The peer selection task is to choose a peer that has not yet been incorporated into the belief state yet provides the most useful information. The information utility of a peer can be formally defined as follows: ψ : P(R_x) → R ψ maps the probability distributions on R_x (the space of possible target locations x) to a real number that indicates how spread out or uncertain the distribution is. Our goal is to obtain a larger value of ψ, indicating a tighter distribution.

Information Utility Assume there are N peers in the network labeled 1 to N and their corresponding measurements are {z_i}, 1 ≤ i ≤ N. Let U = {1, …, N} be the set of peers in the network, and let S ⊂ {1, …, N} be the set of peers whose measurements are already incorporated into the belief state. Clearly, S ⊆ U. The current belief is represented as: Pr(x | {z_i}_{i∈S}) The next best peer, say peer j, is selected from the set U − S. Incorporating a measurement z_j from peer j maps the current probability distribution of x to a new probability distribution that minimizes uncertainty. The new belief state is represented as: Pr(x | {z_i}_{i∈S} ∪ {z_j})

Information Utility The best peer ĵ to choose is: ĵ = arg max_{j ∈ U−S} ψ(Pr(x | {z_i}_{i∈S} ∪ {z_j})) The peer ĵ is the peer that provides the minimum uncertainty, i.e., the maximum information utility.
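A sketch of the resulting greedy selection loop, using negative Shannon entropy as ψ; the per-peer likelihoods and the multiplicative belief-update rule are illustrative assumptions, not the model defined on the slides:

```python
import math

# Greedy, incremental peer selection: at each step pick the peer (not yet in S)
# whose measurement gives the tightest updated belief, with psi = -entropy.

def entropy(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

def normalize(p):
    s = sum(p)
    return [v / s for v in p]

def update_belief(belief, likelihood):
    """Fold one peer's measurement (as a likelihood over locations) into the belief."""
    return normalize([b * l for b, l in zip(belief, likelihood)])

def select_peers(belief, peer_likelihoods, k):
    """Greedily incorporate k peers, each time maximizing psi = -entropy."""
    S, remaining = [], dict(peer_likelihoods)
    for _ in range(k):
        j = max(remaining, key=lambda p: -entropy(update_belief(belief, remaining[p])))
        belief = update_belief(belief, remaining.pop(j))
        S.append(j)
    return S, belief

prior = [0.25, 0.25, 0.25, 0.25]                 # uniform belief over 4 locations
peer_likelihoods = {                             # illustrative per-peer evidence
    "p1": [0.7, 0.1, 0.1, 0.1],
    "p2": [0.4, 0.4, 0.1, 0.1],
    "p3": [0.25, 0.25, 0.25, 0.25],              # uninformative peer
}
print(select_peers(prior, peer_likelihoods, k=2))   # picks 'p1' then 'p2'
```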

Information Utility Measurement The information utility can be quantified in many ways. These measurements should exploit the inverse relationship between the uncertainty of the belief state and the information utility: –Shannon entropy –Fisher information matrix –etc.
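Two candidate ψ functions along these lines, assuming either a discrete belief (a plain probability list) or a Gaussian belief summarized by its covariance; both are sketches, not the presentation's prescribed measure:

```python
import math
import numpy as np

# Two candidate information-utility functions psi: both grow as the belief
# becomes tighter (less uncertain).

def psi_neg_entropy(belief):
    """psi = -H(belief): higher when the discrete belief is more concentrated."""
    return sum(p * math.log2(p) for p in belief if p > 0)

def psi_neg_log_det_cov(cov):
    """psi = -log det(Sigma): higher when a Gaussian belief's covariance is smaller
    (a Gaussian's entropy grows with log det(Sigma), so this is the same idea)."""
    return -math.log(np.linalg.det(np.asarray(cov)))

print(psi_neg_entropy([0.7, 0.1, 0.1, 0.1]) > psi_neg_entropy([0.25] * 4))             # True
print(psi_neg_log_det_cov([[0.1, 0], [0, 0.1]]) > psi_neg_log_det_cov([[1, 0], [0, 1]]))  # True
```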

END

A Bayesian network (or belief network) is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables and whose missing edges encode conditional independencies between the variables. Nodes can represent any kind of variable, be it a measured parameter, a latent variable, or a hypothesis; they are not restricted to representing random variables, which represents another "Bayesian" aspect of a Bayesian network. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.