
PAINT RAPID: Representation and Analysis of Probabilistic Intelligence Data. Carnegie Mellon University and DYNAMiX Technologies. PI: Jaime Carbonell. Eugene Fink, Anatole Gershman, Dwight Dietrich, Ganesh Mani. January 8, 2009

PAINT Slide 2 RAPID: Carnegie Mellon and DYNAMiX / Jaime Carbonell
Objectives and Performance Goals
Long-term objectives: A suite of general-purpose tools for the analysis of uncertain information and the planning of proactive data gathering. Integration with spreadsheets, databases, data-stream processing, and other standard data-analysis tools.
Short-term objectives: Classification of given hypotheses based on uncertain data. Evaluation of available information-gathering probes.
Performance goals: Accurate ranking of hypotheses by their likelihood, and of probes by their information value; see Slide 4 for the performance metrics.
Accomplishments
A novel mechanism for representing uncertain data, dependencies among them, and related inferences.
Heuristic algorithms for evaluating hypotheses based on uncertain evidence with complex interdependencies.
Game-theoretic algorithms for evaluating probes and trade-offs between their information value and cost.
Integration with the PAINT architecture and with the general-purpose Excel spreadsheet functionality.
Technical Approach
[Diagram labels: Assessment of uncertain intelligence; Excel-based analyst interface; Reasoning under uncertainty; Pre-processing of observations; Evaluation of hypotheses; Evaluation of probes; Hypotheses; Observables and probes; Observations; Hypotheses and probes; Uncertain data, inference rules; Likelihood of specific hypotheses; Info value of observations and probes]

PAINT Slide 3 Objectives and Performance Goals Objectives Long term A customizable general-purpose system for the analysis of incomplete and uncertain information, and planning of proactive data gathering. Integration with spreadsheets, relational databases, data-stream processing, and other standard data-processing tools. Short term Classification of given hypotheses based on available incomplete and uncertain data, represented by probability density functions. Evaluation of available information-gathering operations, which include passive observations and active probes, and analysis of the related trade-offs between their information value and data-collection costs.

PAINT Slide 4 Objectives and Performance Goals Performance metrics Hypothesis likelihood: We compare RAPID’s ranking of hypotheses with the ground-truth ranking. Specifically, we determine the number inv of inversions required to convert RAPID’s ranking of the hyp given hypotheses into their ground-truth ranking, and then normalize the number of inversions to the [−1.0, 1.0] interval using the following expression: 1 − inv / (hyp ∙ (hyp − 1) / 4). A ranking identical to the ground truth scores 1.0, and a fully reversed ranking, which needs hyp ∙ (hyp − 1) / 2 inversions, scores −1.0. Probe information value: We compare RAPID’s ranking of probes by their information value with the ground-truth ranking, and use the above expression as the normalized metric of the ranking accuracy. Speed and scalability: We measure the dependency of the computational time on the number of hypotheses and observable variables.
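
The ranking metric on Slide 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the RAPID implementation: the function names are hypothetical, and it assumes both rankings are lists of the same distinct items. The inversion count is taken as the number of item pairs that appear in opposite order in the two rankings, which equals the number of adjacent swaps needed to turn one ranking into the other.

```python
from itertools import combinations


def count_inversions(ranking, truth):
    """Count item pairs ordered differently in `ranking` and `truth`."""
    position = {item: i for i, item in enumerate(truth)}
    mapped = [position[item] for item in ranking]
    return sum(1 for a, b in combinations(mapped, 2) if a > b)


def ranking_accuracy(ranking, truth):
    """Normalized accuracy in [-1.0, 1.0]: 1 - inv / (hyp * (hyp - 1) / 4)."""
    hyp = len(truth)
    if hyp < 2:
        raise ValueError("need at least two items to compare rankings")
    if sorted(ranking) != sorted(truth):
        raise ValueError("both rankings must contain the same items")
    inv = count_inversions(ranking, truth)
    return 1 - inv / (hyp * (hyp - 1) / 4)


if __name__ == "__main__":
    truth = ["H1", "H2", "H3"]
    print(ranking_accuracy(["H1", "H2", "H3"], truth))  # identical ranking
    print(ranking_accuracy(["H2", "H1", "H3"], truth))  # one swapped pair
    print(ranking_accuracy(["H3", "H2", "H1"], truth))  # fully reversed
```

The pairwise count is quadratic in the number of items, which is adequate for a handful of hypotheses or probes; a merge-sort-based count would be the usual choice for long rankings.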

PAINT Slide 5 Objectives and Performance Goals Capabilities and integration
Platform: Microsoft Excel; Analyst GUI
Uncertainty analysis: Representation of probability distributions and qualitative uncertainty; Uncertainty arithmetic
Situation assessment: Representation of data utility; Tracking utility changes during data collection; Identification of critical uncertainties
Proactive data collection: Representation of probes; Evaluation of probe utility; Automated selection of critical probes
Contingency planning: What-if analysis of alternative future developments and data-collection plans based on an extension of Excel “scenarios”
Optional RAPID tools: Processing of data streams; Uncertainty database

PAINT Slide 6 Objectives and Performance Goals Capabilities and integration Specific capabilities Evaluating the likelihood of each given hypothesis; accounting for the prior probabilities and new uncertain evidence. Evaluating the likelihood that none of the given hypotheses is correct, which represents a “surprise” situation. Evaluating the information value of specific passive observations, as well as active probes that affect the observed system. Integration We have integrated the RAPID system with the overall PAINT architecture. We have also developed a stand-alone version integrated with Excel.
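
As an illustration of the first two capabilities on Slide 6, the following sketch updates hypothesis probabilities from prior probabilities and new evidence, and carries an explicit "surprise" entry for the case that none of the given hypotheses is correct. It is not RAPID's algorithm: it is a plain Bayesian update that assumes the observations are conditionally independent given each hypothesis, and the names `SURPRISE` and `posterior`, as well as all numbers, are made up for the example.

```python
SURPRISE = "<none of the given hypotheses>"  # hypothetical label


def posterior(priors, likelihoods):
    """Return normalized posterior probabilities.

    priors:      dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> list of
                 P(observation | hypothesis), one per observation
    Assumes conditional independence of observations (an assumption
    of this sketch, not a statement about RAPID).
    """
    scores = {}
    for hypothesis, prior in priors.items():
        score = prior
        for p in likelihoods[hypothesis]:  # one step per observation
            score *= p
        scores[hypothesis] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: s / total for h, s in scores.items()}


if __name__ == "__main__":
    priors = {"H1": 0.5, "H2": 0.4, SURPRISE: 0.1}
    likelihoods = {
        "H1": [0.8, 0.5],
        "H2": [0.2, 0.5],
        SURPRISE: [0.5, 0.5],  # uninformative likelihoods for the surprise case
    }
    result = posterior(priors, likelihoods)
    for h in sorted(result, key=result.get, reverse=True):
        print(f"{h}: {result[h]:.3f}")
```

The nested loop touches each hypothesis and observation pair once, so the running time of this sketch grows with the product of the number of hypotheses and the number of observations.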

PAINT Slide 7 Objectives and Performance Goals Impact The RAPID system provides a means for the automated or semi-automated analysis of available incomplete and uncertain national intelligence data, and planning of proactive collection of critical additional intelligence. To our knowledge, it is the first general-purpose tool for the planning of information gathering. It may help military analysts draw conclusions from gathered intelligence, reduce the mental load involved in deriving such conclusions, and improve their accuracy.

PAINT Slide 8 Technical Approach Key ideas Explicit representation of uncertain and partially missing data, along with basic dependencies among these data. Representation of uncertain inference rules and propagation of available knowledge through a network of inferences. Automated evaluation of hypotheses based on limited, uncertain, and partially unreliable evidence. Heuristic and game-theoretic techniques for construction and evaluation of information-gathering plans. Integration of these tools with the spreadsheet functionality.

PAINT Slide 9 Technical Approach Role in the PAINT architecture
[Diagram: PAINT architecture. Components: Decision Model; Relational Probabilistic Pathway Model; Analysis of Hypothesis Likelihood and Probe Information Value; Adaptive Search and Probe Strategy Generation, Evolutionary Methods (e.g. genetic algorithms); Model Composition Framework; System Dynamics; Resource Models; Leadership; Dynamic Social Network Model; Probe Strategy Development; Dynamic Target System Model; Evaluation of Hypotheses and Probes; Target System. Teams: BAH, MIT, Lockheed Martin, NSI, BAE, Berkeley, CMU. Data flows: Observations, O; Probes; Probe; Response; Assess; P0 = {P(Hi)}; P1 = {P }]

PAINT Slide 10 Technical Approach Inputs and outputs
Inputs: Probabilistic models; External observations (may include uncertainty); Given hypotheses; Observables and probes
Processing: Evaluation of hypotheses; Analysis of observations and related probes
Outputs: Ranked hypotheses with their probabilities; Ranked observations and probes with their information values
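
One common way to attach an information value to a probe is the expected reduction in entropy over the hypotheses. The sketch below uses that standard measure as a stand-in; the slides describe RAPID's probe evaluation as game-theoretic and do not give its formula, so this is not the authors' method. The function names, the probe models, the costs, and the choice to rank by gain per unit cost are all assumptions of the example.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, outcome_model):
    """Expected entropy reduction over hypotheses from one probe.

    prior:         dict hypothesis -> P(hypothesis)
    outcome_model: dict hypothesis -> dict outcome -> P(outcome | hypothesis)
    """
    outcomes = {o for model in outcome_model.values() for o in model}
    gain = entropy(prior)
    for outcome in outcomes:
        p_outcome = sum(
            prior[h] * outcome_model[h].get(outcome, 0.0) for h in prior
        )
        if p_outcome == 0:
            continue
        post = {
            h: prior[h] * outcome_model[h].get(outcome, 0.0) / p_outcome
            for h in prior
        }
        gain -= p_outcome * entropy(post)
    return gain


def rank_probes(prior, probes, costs):
    """Order probe names by expected gain per unit cost (an assumed trade-off)."""
    value = {
        name: expected_information_gain(prior, model) / costs[name]
        for name, model in probes.items()
    }
    return sorted(value, key=value.get, reverse=True)


if __name__ == "__main__":
    prior = {"A": 0.5, "B": 0.5}
    probes = {
        "perfect": {"A": {"yes": 1.0}, "B": {"no": 1.0}},
        "useless": {"A": {"yes": 0.5, "no": 0.5}, "B": {"yes": 0.5, "no": 0.5}},
    }
    costs = {"perfect": 2.0, "useless": 1.0}
    print(rank_probes(prior, probes, costs))
```

A probe whose outcome is equally likely under every hypothesis has zero expected gain regardless of its cost, so it ranks last whenever any other probe is informative.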

PAINT Slide 11 Technical Approach Uncertainty representation Uncertain nominals, numeric values, and dependencies, represented by piecewise-linear probability density functions. Indexing of uncertain data, and fast retrieval of exact and approximate matches. [Figure: probability density over possible values]
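
A piecewise-linear probability density function can be stored as a list of breakpoints with a density value at each one. The sketch below is one possible encoding, not the RAPID data structure; the class name, its methods, and the triangular example are hypothetical. The densities are rescaled so that the total area is 1, and interval probabilities are computed exactly with the trapezoid rule because the density is linear between breakpoints.

```python
from bisect import bisect_right


class PiecewiseLinearPDF:
    """Density given by linear interpolation between (x, y) breakpoints."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two matching breakpoints")
        if any(x1 <= x0 for x0, x1 in zip(xs, xs[1:])):
            raise ValueError("x values must be strictly increasing")
        if any(y < 0 for y in ys):
            raise ValueError("densities must be non-negative")
        area = sum(
            (x1 - x0) * (y0 + y1) / 2
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        )
        if area <= 0:
            raise ValueError("density must have positive area")
        self.xs = list(xs)
        self.ys = [y / area for y in ys]  # normalize to total area 1

    def density(self, x):
        """Density at x; zero outside the range of the breakpoints."""
        if x < self.xs[0] or x > self.xs[-1]:
            return 0.0
        i = min(bisect_right(self.xs, x) - 1, len(self.xs) - 2)
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    def probability(self, a, b):
        """Probability that the value lies in [a, b]."""
        total = 0.0
        for x0, x1 in zip(self.xs, self.xs[1:]):
            lo, hi = max(a, x0), min(b, x1)
            if lo < hi:  # the interval overlaps this linear segment
                total += (hi - lo) * (self.density(lo) + self.density(hi)) / 2
        return total


if __name__ == "__main__":
    # A triangular distribution over possible values 0..2, peaking at 1.
    pdf = PiecewiseLinearPDF([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
    print(pdf.density(0.5), pdf.probability(0.0, 1.0))
```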

PAINT Slide 12 Technical Approach Scalability The computational time of evaluating the likelihood of given hypotheses and the information value of a given probe is proportional to the product of the number hyp of hypotheses and the number obs of observable variables: O(hyp ∙ obs). In practice, the evaluation takes less than a second in the PAINT architecture, which is negligible compared to the running time of the other PAINT components.

PAINT Slide 13 Technical Approach Generality The RAPID system does not use any domain-specific assumptions. It can be applied to hypothesis and probe evaluation in any domain without changes to its algorithms or implementation. It relies on the modeling components of PAINT as its source of domain expertise and learned knowledge, and receives the related knowledge from the models in a domain-independent format.

PAINT Slide 14 Accomplishments: Technical Technical advances Main research results A novel mechanism for representing uncertain data, dependencies among them, and related inferences. Heuristic algorithms for evaluating hypotheses based on uncertain evidence with complex interdependencies. Game-theoretic algorithms for evaluating probes and trade-offs between their information value and cost. Key missing work We have not yet tested RAPID with realistic large-scale domains, and have not done user studies to determine if it addresses the needs of the intelligence community.

PAINT Slide 15 Accomplishments: Technical Publications Bin Fu, Eugene Fink, and Jaime G. Carbonell. Analysis of uncertain data: Tools for representation and processing. In Proceedings of the IEEE Conference on Systems, Man and Cybernetics, 2008. We are currently working on two more papers based on the PAINT work, and we aim to submit them in March 2009.

PAINT Slide 16 Accomplishments: Products Developed software Main components Hypothesis evaluation (completed): Evaluating the likelihood of given hypotheses based on their prior probabilities and uncertain evidence. Probe evaluation (completed): Evaluating the information value of given passive observations and active probes. Uncertainty arithmetic (initial version): General-purpose tools for the representation and analysis of uncertain data, dependencies among them, and data-gathering operations; integrated with Excel. Availability We have delivered the software for the hypothesis and probe evaluation to BAE Systems. All software is also available directly from CMU; to get a copy, e-mail e.fink@cs.cmu.edu.

PAINT Slide 17 Accomplishments: Products Developed software Requirements All developed software is for Windows 2000/XP/Vista. Hypothesis and probe evaluation requires Java. These components are designed for use as part of the overall PAINT architecture. We have also delivered a stand-alone version of them for testing purposes, but it has no GUI and only a limited API. Uncertainty arithmetic requires Excel. It is a stand-alone component, not integrated with the PAINT architecture. Limitations The current representation of uncertain data is limited, and does not allow modeling complex dependencies among data items. As a result, RAPID does not account for such complex dependencies in the hypothesis evaluation. It uses conservative estimates of the evidence strength, and may underestimate the certainty of given hypotheses.

PAINT Slide 18 Accomplishments: Insights Main lessons Explicit representation of uncertainty is often essential for correctly interpreting real-world data and avoiding impractical over-simplifications. In practice, we can often reliably distinguish among competing hypotheses based on limited and partially inaccurate data, even when such data are insufficient for a rigorous statistical analysis. When gathering additional data is not free, targeted proactive data collection is much cheaper than “scavenging” for all available data. Intelligent construction of information-gathering plans can greatly reduce the cost and time of data collection.

PAINT Slide 19 Accomplishments: Insights Remaining research issues Generality: Development of more general mechanisms for representation and analysis of uncertain data; in particular, modeling and fast processing of complex dependencies. Scalability: Extension of the developed techniques to the processing of massive data sets; in particular, investigation of related in-memory and on-disk indexing, and mechanisms for distributing the computation among multiple machines. Integration: Design of a general-purpose API and integration with standard data-processing tools, such as databases and data-stream processing.
