Analysis of uncertain data: Selection of probes for information gathering

Eugene Fink
May 27, 2009

Outline

High-level part:
- Research interests and dreams
- Proactive learning under uncertainty
- Military intelligence applications

Technical part:
- Evaluation of given hypotheses
- Choice of relevant observations
- Selection of effective probes

Research interests and dreams

Semi-automated representation changes:
- Problem reformulation and simplification
- Selection of search and learning algorithms
- Trade-offs among completeness, accuracy, and speed of these algorithms

Research interests and dreams

Semi-automated representation changes
Semi-automated reasoning under uncertainty:
- Conclusions from incomplete and imprecise data
- Passive and active learning
- Targeted information gathering

Research interests and dreams

Semi-automated representation changes
Semi-automated reasoning under uncertainty

Recent projects:
- Scheduling based on uncertain resources and constraints
- Excel tools for uncertain numeric and nominal data
- Analysis of military intelligence and targeted data gathering

Research interests and dreams

Semi-automated representation changes
Semi-automated reasoning under uncertainty
Theoretical foundations of AI:
- Formalizing "messy" AI techniques
- AI-complexity and AI-completeness

Research interests and dreams

Semi-automated representation changes
Semi-automated reasoning under uncertainty
Theoretical foundations of AI
Algorithm theory:
- Generalized convexity
- Indexing of approximate data
- Compression of time series
- Smoothing of probability densities

Subject of the talk

Semi-automated representation changes
Semi-automated reasoning under uncertainty:
- Analysis of military intelligence
- Targeted information gathering
Theoretical foundations of AI
Algorithm theory

Learning under uncertainty

Learning is almost always a response to uncertainty. If we knew everything, we would not need to learn.

Learning under uncertainty

Passive learning: construction of predictive models, response mechanisms, etc. based on available data.

Learning under uncertainty

Passive learning
Active learning: targeted requests for additional data, based on simplifying assumptions:
- The oracle can answer any question.
- The answers are always correct.
- All questions have the same cost.

Learning under uncertainty

Passive learning
Active learning
Proactive learning: extensions to active learning aimed at removing these assumptions:
- Different questions incur different costs.
- We may not receive an answer.
- An answer may be incorrect.
- The information value depends on the intended use of the learned knowledge.

Proactive learning architecture

[Architecture diagram: a Top-Level Control component coordinates Model Construction, Model Evaluation, Question Selection, Data Collection, and Reasoning or Optimization, passing the current model, model utility and limitations, questions, and answers among them.]

Military intelligence applications

We have studied proactive learning in the context of military intelligence and homeland security. The purpose is to develop tools for:
- Drawing conclusions from available intelligence.
- Planning additional intelligence gathering.

Modern military intelligence: "gather and analyze"

Front end: massive data collection, including satellite and aerial imaging, interviews, human intelligence, etc.
Back end: sifting through massive data sets, both public and classified.
There is almost no feedback loop; back-end analysts are "passive learners" who do not give tasks to front-end data collectors.

Traditional goals

- Gather and analyze massive data.
- Draw (semi-)reliable conclusions.
- Propose actions that are likely to accomplish given objectives.

Novel goals

Identify critical missing intelligence and plan effective information gathering:
- Targeted observations (expensive).
- Active probing (very expensive).

Analysis of leadership and pathways

We can evaluate the intent and possible future actions of an adversary through the analysis of its leadership and pathways.
Leadership: social networks, goals, and pet projects of decision makers.
Example: if Sauron and Saruman are friends, and Saruman has experience with building armies of enhanced orcs, Sauron may decide to use such orcs.

Analysis of leadership and pathways

We can evaluate the intent and possible future actions of an adversary through the analysis of its leadership and pathways.
Leadership: social networks, goals, and pet projects of decision makers.
Pathways: typical projects and their sequences in research, development, and production.

[Pathway diagram: research on enhanced orcs -> secret orc development -> mass orc production -> military orc deployment; some stages are observable, others hidden.]

Analysis of leadership and pathways

[Diagram: an example network of leadership and pathway nodes.]

- Construct models of social networks and production pathways.
- For each set of reasonable assumptions about the adversary's intent, use these models to predict observable events.
- Check which of the predictions match actual observations.

Example

Model predictions:
- If Sauron were secretly forging a new ring: 80% chance we would observe deliveries of black-magic materials to Mordor; 60% chance we would observe an unusual concentration of orcs.
- If Sauron were conducting harmless white-magic research and development: 20% chance of black-magic deliveries; 10% chance of orc concentration.

Intelligence: aerial imaging by eagles shows black-magic deliveries but no orcs.

General problem

We have to distinguish among n mutually exclusive hypotheses, denoted H_1, H_2, …, H_n. We base the analysis on m observable features, denoted obs_1, obs_2, …, obs_m. Each observation is a variable that takes one of several discrete values.

Input

Prior probabilities: For every hypothesis, we know its prior; thus, we have an array of n priors, prior[1..n].

Possible observations: For every observation obs_a, we know the number of its possible values, num[a]. Thus, we have the array num[1..m] with the number of values for each observation.

Observation distributions: For every hypothesis, we know the related probability distribution of each observation. Thus, we have a matrix chance[1..n, 1..m], where each element is a probability distribution. Every element chance[i, a] is itself a one-dimensional array with num[a] elements, which represent the probabilities of the possible values of obs_a.

Actual observations: We know a specific value of each observation, which represents the available intelligence. Thus, we have an array of m observed values, val[1..m].
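
To make these structures concrete, here is one possible Python encoding of the inputs for the ring-forging example; the prior values are invented for illustration, and indices are 0-based rather than the 1-based notation of the slides.

# Hypothetical encoding of the inputs for the ring-forging example.
n, m = 2, 2                # two hypotheses, two observations
prior = [0.3, 0.5]         # made-up priors for H_1 (forging a ring) and
                           # H_2 (white-magic R&D); 0.2 is left for "neither"
num = [2, 2]               # both observations are binary: absent (0), present (1)
# chance[i][a][v]: probability that obs_a takes value v under hypothesis H_i
chance = [
    [[0.2, 0.8], [0.4, 0.6]],  # H_1: 80% black-magic deliveries, 60% orcs
    [[0.8, 0.2], [0.9, 0.1]],  # H_2: 20% deliveries, 10% orcs
]
val = [1, 0]               # intelligence: deliveries observed, no orc concentration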

Output

We have to evaluate the posterior probabilities of the n given hypotheses, denoted post[1..n].

Approach

We can apply Bayes' rule, but we have to address two "complications":
- The hypotheses may not cover all possibilities: Sauron may be neither working on a new ring nor doing white-magic research.
- The observations may not be independent, and we usually do not know the dependencies: the concentration of orcs may or may not be directly related to the black-magic deliveries.

Simple Bayesian case

We have one observed value, val[a], and the sum of the prior[1..n] probabilities is exactly 1.0.

Integrated likelihood of observing val[a]:
likelihood(val[a]) = chance[1, a][val[a]] ∙ prior[1] + … + chance[n, a][val[a]] ∙ prior[n].

Posterior probability of H_i:
post[i] = prob(H_i | val[a]) = chance[i, a][val[a]] ∙ prior[i] / likelihood(val[a]).
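
A minimal Python sketch of this computation (the function and argument names are ours, not from the talk):

def posteriors(prior, chance_a, val_a):
    # prior:    n prior probabilities, summing to 1.0
    # chance_a: chance_a[i][v] is the probability of value v of obs_a under H_i
    # val_a:    the observed value of obs_a
    likelihood = sum(chance_a[i][val_a] * prior[i] for i in range(len(prior)))
    return [chance_a[i][val_a] * prior[i] / likelihood for i in range(len(prior))]

With made-up priors [0.4, 0.6] and the delivery distributions from the example, posteriors([0.4, 0.6], [[0.2, 0.8], [0.8, 0.2]], 1) gives roughly [0.73, 0.27], shifting the odds toward the ring-forging hypothesis.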

Rejection of all hypotheses

We have one observed value, val[a], and the sum of the prior[1..n] probabilities is less than 1.0. We consider the hypothesis H_0, representing the belief that all n given hypotheses are incorrect:
prior[0] = 1.0 − prior[1] − … − prior[n].

Posterior probability of H_0:
post[0] = prior[0] ∙ prob(val[a] | H_0) / prob(val[a])
        = prior[0] ∙ prob(val[a] | H_0) / (prior[0] ∙ prob(val[a] | H_0) + likelihood(val[a])).

Rejection of all hypotheses

Posterior probability of H_0:
post[0] = prior[0] ∙ prob(val[a] | H_0) / prob(val[a])
        = prior[0] ∙ prob(val[a] | H_0) / (prior[0] ∙ prob(val[a] | H_0) + likelihood(val[a])).

Bad news: we do not know prob(val[a] | H_0).
Good news: post[0] monotonically depends on prob(val[a] | H_0); thus, if we obtain lower and upper bounds for prob(val[a] | H_0), we also get bounds for post[0].

Plausibility principle

Unlikely events normally do not happen; thus, if we have observed val[a], its likelihood must not be too small.

Plausibility threshold: We use a global constant plaus, between 0.0 and 1.0. If we have observed val[a], we assume that prob(val[a]) ≥ plaus / num[a].

Since prob(val[a]) = prior[0] ∙ prob(val[a] | H_0) + likelihood(val[a]), this assumption gives bounds for prob(val[a] | H_0):
- Lower: (plaus / num[a] − likelihood(val[a])) / prior[0].
- Upper: 1.0.

Plausibility principle

Bounds for prob(val[a] | H_0):
- Lower: (plaus / num[a] − likelihood(val[a])) / prior[0].
- Upper: 1.0.

We substitute these bounds into the dependency of post[0] on prob(val[a] | H_0), obtaining bounds for post[0]:
- Lower: 1.0 − likelihood(val[a]) ∙ num[a] / plaus.
- Upper: prior[0] / (prior[0] + likelihood(val[a])).

We have thus derived bounds for the probability that none of the given hypotheses is correct.
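
A small Python sketch of these bounds (the names are ours; clamping the lower bound at zero is left implicit in the slides):

def post0_bounds(prior0, likelihood_a, num_a, plaus=0.1):
    # prior0:       1.0 minus the sum of the n given priors
    # likelihood_a: integrated likelihood of the observed value val[a]
    # num_a:        number of possible values of obs_a
    # plaus:        plausibility threshold; 0.1 is the value used in practice
    lower = max(0.0, 1.0 - likelihood_a * num_a / plaus)
    upper = prior0 / (prior0 + likelihood_a)
    return lower, upper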

Judgment calls

A human has to specify the plausibility threshold and decide between the use of the lower and the upper bounds.

Plausibility threshold: Reducing it leads to more reliable conclusions at the expense of a looser lower bound. We have used 0.1, which tends to give good practical results.

Lower vs. upper bound: We should err on the pessimistic side: if H_0 is a pleasant surprise, use the lower bound; else, use the upper bound.

Multiple observations

We have multiple observed values, val[1..m]. We have tried several approaches:
- Joint distributions: We usually cannot obtain joint distributions or information about dependencies.
- Independence assumption: We usually get terrible practical results, no better (and sometimes worse) than random guessing.
- Use of the single most relevant observation: We usually get surprisingly good practical results.

Most relevant observation

We identify the highest-utility observation and do not use other observations to corroborate it. For example, pay attention only to black-magic deliveries and ignore observations of orc armies.

Advantage: a conservative approach, which never leads to excessive over-confidence.
Drawback: we may significantly underestimate the value of the available observations.

Most relevant observation

We identify the highest-utility observation and do not use other observations to corroborate it.

Selection procedure: For each of the m observed values:
- Compute the posteriors based on this value.
- Evaluate their information utility.
Select the observed value that gives the highest information utility of the posteriors.
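
A sketch of this procedure in Python, reusing the posteriors function above; for simplicity it assumes the priors sum to 1.0 (ignoring the H_0 bounds), and utility may be any of the measures discussed next:

def most_relevant(prior, chance, val, utility):
    # chance[i][a][v]: probability of value v of obs_a under H_i
    # val[a]:          observed value of obs_a
    # utility:         function mapping a posterior vector to a number
    best_a, best_utility, best_post = None, float('-inf'), None
    for a in range(len(val)):
        post = posteriors(prior, [row[a] for row in chance], val[a])
        u = utility(post)
        if u > best_utility:
            best_a, best_utility, best_post = a, u, post
    return best_a, best_post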

Alternative utility measures

Negation of Shannon's entropy:
post[0] ∙ log post[0] + … + post[n] ∙ log post[n].

It rewards "high certainty," that is, situations in which the posteriors clearly favor one hypothesis over all others. It is high when the probability of some hypothesis is close to 1.0, and low when all hypotheses are about equally likely.
Drawback: it may reward unwarranted certainty.

Alternative utility measures

Negation of Shannon's entropy:
post[0] ∙ log post[0] + … + post[n] ∙ log post[n].

Kullback-Leibler divergence:
post[0] ∙ log (post[0] / prior[0]) + … + post[n] ∙ log (post[n] / prior[n]).

It rewards situations in which the posteriors are very different from the priors, and tends to give preference to observations that have the potential for "paradigm shifts."
Drawback: it may encourage unwarranted departure from the right conclusions.
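
The two measures as Python functions (direct transcriptions of the formulas; the guard against zero probabilities is ours):

from math import log

def neg_entropy(post):
    # High when one hypothesis dominates, low when all are about equally likely.
    return sum(p * log(p) for p in post if p > 0.0)

def kl_divergence(post, prior):
    # High when the posteriors differ sharply from the priors.
    return sum(p * log(p / q) for p, q in zip(post, prior) if p > 0.0)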

Alternative utility measures

Negation of Shannon's entropy:
post[0] ∙ log post[0] + … + post[n] ∙ log post[n].

Kullback-Leibler divergence:
post[0] ∙ log (post[0] / prior[0]) + … + post[n] ∙ log (post[n] / prior[n]).

Task-specific utilities: We may construct better utility measures by analyzing the impact of the posterior estimates on our future actions and evaluating the related rewards and penalties, but this involves lengthier formulas.

Probe selection

We may obtain additional intelligence by probing the adversary, that is, affecting it by external actions and observing its response. For example, increase the cost of black-magic materials through market manipulation and observe whether Sauron continues purchasing them.

We have to select among k available probes.

Additional input

Probe costs: For every probe, we know its expected cost; thus, we have an array of k numeric costs, cost[1..k].

Observation distributions: The likelihood of specific observed values depends on (1) which hypothesis is correct and (2) which probe has been applied. For every hypothesis and every probe, we know the related probability distribution of each observation. Thus, we have an array with n ∙ m ∙ k elements, chance[1..n, 1..m, 1..k], where each element is a probability distribution. Every element chance[i, a, j] is itself a one-dimensional array with num[a] elements, which represent the probabilities of the possible values of obs_a.

Selection procedure

For each of the k probes:
- Consider the related observation distributions.
- Select the most relevant observation.
- Compute the expected gain as the difference between the expected utility of the posterior probabilities and the probe cost.
Select the probe with the highest gain; if this gain is positive, recommend applying the probe.
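
A Python sketch of this loop, under the same simplifying assumptions as before; expected_utility averages the utility of the posteriors over the possible values of an observation, weighted by their integrated likelihoods:

def expected_utility(prior, chance_a, num_a, utility):
    # Expectation of utility(posteriors) over the possible values of obs_a.
    total = 0.0
    for v in range(num_a):
        lik = sum(chance_a[i][v] * prior[i] for i in range(len(prior)))
        if lik > 0.0:
            total += lik * utility(posteriors(prior, chance_a, v))
    return total

def select_probe(prior, chance, num, cost, utility):
    # chance[i][a][j][v]: probability of value v of obs_a under H_i after probe j.
    # Returns (probe index, expected gain), or (None, 0.0) if no gain is positive.
    best_j, best_gain = None, 0.0
    for j in range(len(cost)):
        best_obs = max(expected_utility(prior, [row[a][j] for row in chance],
                                        num[a], utility)
                       for a in range(len(num)))
        gain = best_obs - cost[j]
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain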

Extensions

- Task-specific utility functions.
- Accounting for the probabilities of observation and probe failures.
- Selection of multiple observations based on their independence or joint distributions.
- Application of parameterized probes.