PAINT RAPID : Representation and Analysis of Probabilistic Intelligence Data Carnegie Mellon University DYNAM i X Technologies PI: Jaime Carbonell Eugene.

Presentation transcript:

PAINT RAPID : Representation and Analysis of Probabilistic Intelligence Data Carnegie Mellon University DYNAM i X Technologies PI: Jaime Carbonell Eugene Fink, Anatole Gershman, Dwight Dietrich, Ganesh Mani January 8, 2009

PAINT Slide 2 RAPID Carnegie Mellon and DYNAM i X / Jaime Carbonell
Objectives and Performance Goals
Long-term objectives: A suite of general-purpose tools for the analysis of uncertain information, and planning of proactive data gathering. Integration with spreadsheets, databases, data-stream processing, and other standard data-analysis tools.
Short-term objectives: Classification of given hypotheses based on uncertain data. Evaluation of available information-gathering probes.
Performance goals: Accurate ranking of hypotheses by their likelihood, and probes by their information value; see Slide 4 for the performance metrics.
Accomplishments
A novel mechanism for representing uncertain data, dependencies among them, and related inferences. Heuristic algorithms for evaluating hypotheses based on uncertain evidence with complex interdependencies. Game-theoretic algorithms for evaluating probes and trade-offs between their information value and cost. Integration with the PAINT architecture and with the general-purpose Excel spreadsheet functionality.
Technical Approach
[Diagram labels: Info value of observations and probes; Evaluation of hypotheses; Evaluation of probes; Pre-processing of observations; Reasoning under uncertainty; Excel-based analyst interface; Assessment of uncertain intelligence; Hypotheses; Observables and probes; Observations; Likelihood of specific hypotheses; Hypotheses and probes; Uncertain data, inference rules]

PAINT Slide 3 Objectives and Performance Goals Objectives Long term A customizable general-purpose system for the analysis of incomplete and uncertain information, and planning of proactive data gathering. Integration with spreadsheets, relational databases, data-stream processing, and other standard data-processing tools. Short term Classification of given hypotheses based on available incomplete and uncertain data, represented by probability density functions. Evaluation of available information-gathering operations, which include passive observations and active probes, and analysis of the related trade-offs between their information value and data-collection costs.

PAINT Slide 4 Objectives and Performance Goals Performance metrics
Hypotheses likelihood: We compare RAPID's ranking of hypotheses with the ground-truth ranking. Specifically, we determine the number inv of inversions required to convert RAPID's ranking of the hyp given hypotheses into their ground-truth ranking, and then normalize the number of inversions to the [−1.0, 1.0] interval using the following expression: 1 − inv / (hyp ∙ (hyp − 1) / 4).
Probe information value: We compare RAPID's ranking of probes by their information value with the ground-truth ranking, and use the above expression as the normalized metric of the ranking accuracy.
Speed and scalability: We measure the dependency of the computational time on the number of hypotheses and observable variables.
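The ranking metric on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code: the function names are hypothetical, and "inversions" is taken to mean the number of item pairs that appear in opposite order in the two rankings, which equals the number of adjacent swaps needed to turn one ranking into the other.

```python
from itertools import combinations


def count_inversions(predicted, truth):
    """Count item pairs that the two rankings order differently."""
    # Position of every item in the ground-truth ranking.
    position = {item: rank for rank, item in enumerate(truth)}
    ranks = [position[item] for item in predicted]
    # A pair is inverted when the predicted order contradicts the truth.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


def ranking_accuracy(predicted, truth):
    """Normalized accuracy: 1 - inv / (hyp * (hyp - 1) / 4)."""
    hyp = len(truth)
    if hyp < 2:
        raise ValueError("need at least two items to rank")
    inv = count_inversions(predicted, truth)
    return 1 - inv / (hyp * (hyp - 1) / 4)


if __name__ == "__main__":
    truth = ["H1", "H2", "H3"]
    print(ranking_accuracy(["H1", "H2", "H3"], truth))  # identical ranking
    print(ranking_accuracy(["H3", "H2", "H1"], truth))  # fully reversed ranking
```

An identical ranking has no inversions and scores 1.0. A fully reversed ranking has hyp ∙ (hyp − 1) / 2 inversions and scores −1.0, which is why the denominator uses a quarter of hyp ∙ (hyp − 1).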

PAINT Slide 5 Objectives and Performance Goals Capabilities and integration
[Diagram labels: Microsoft Excel; Analyst GUI; Representation of probability distributions and qualitative uncertainty; Uncertainty arithmetic; Uncertainty analysis; Representation of data utility; Tracking utility changes during data collection; Identification of critical uncertainties; Situation assessment; Representation of probes; Evaluation of probe utility; Automated selection of critical probes; Proactive data collection; What-if analysis of alternative future developments and data-collection plans based on an extension of Excel “scenarios”; Contingency planning; Processing of data streams; Optional RAPID tools; Uncertainty database]

PAINT Slide 6 Objectives and Performance Goals Capabilities and integration Specific capabilities Evaluating the likelihood of each given hypothesis; accounting for the prior probabilities and new uncertain evidence. Evaluating the likelihood that none of the given hypotheses is correct, which represents a “surprise” situation. Evaluating the information value of specific passive observations, as well as active probes that affect the observed system. Integration We have integrated the RAPID system with the overall PAINT architecture. We have also developed a stand-alone version integrated with Excel.
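As a rough illustration of the first two capabilities, the sketch below applies plain Bayes' rule with an explicit "none of the above" hypothesis that absorbs the prior probability not assigned to the given hypotheses. This is an assumption for illustration only: the slides describe RAPID's hypothesis evaluation as heuristic and do not give its formulas, and the function name, argument names, and the surprise label are hypothetical.

```python
SURPRISE = "<none of the given hypotheses>"


def update_hypotheses(priors, likelihoods, surprise_likelihood):
    """Return posterior probabilities after observing one piece of evidence.

    priors: hypothesis name -> prior probability (may sum to less than 1;
        the remainder is the prior of the "surprise" situation).
    likelihoods: hypothesis name -> P(evidence | hypothesis).
    surprise_likelihood: P(evidence | no given hypothesis is correct).
    """
    surprise_prior = 1.0 - sum(priors.values())
    if surprise_prior < -1e-9:
        raise ValueError("prior probabilities sum to more than 1")
    # Unnormalized posterior weight of each hypothesis: prior * likelihood.
    weights = {h: priors[h] * likelihoods[h] for h in priors}
    weights[SURPRISE] = max(surprise_prior, 0.0) * surprise_likelihood
    total = sum(weights.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: w / total for h, w in weights.items()}


if __name__ == "__main__":
    posterior = update_hypotheses(
        priors={"H1": 0.5, "H2": 0.3},
        likelihoods={"H1": 0.2, "H2": 0.6},
        surprise_likelihood=0.1,
    )
    for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

Sorting the result by probability gives a hypothesis ranking of the kind scored by the metric on Slide 4.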

PAINT Slide 7 Objectives and Performance Goals Impact The RAPID system provides a means for the automated or semi-automated analysis of available incomplete and uncertain national intelligence data, and planning of proactive collection of critical additional intelligence. To our knowledge, it is the first general-purpose tool for the planning of information gathering. It may help military analysts draw conclusions from gathered intelligence, reduce the mental load involved in deriving such conclusions, and improve their accuracy.

PAINT Slide 8 Technical Approach Key ideas Explicit representation of uncertain and partially missing data, along with basic dependencies among these data. Representation of uncertain inference rules and propagation of available knowledge through a network of inferences. Automated evaluation of hypotheses based on limited, uncertain, and partially unreliable evidence. Heuristic and game-theoretic techniques for construction and evaluation of information-gathering plans. Integration of these tools with the spreadsheet functionality.
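One simple way to make "information value" concrete is the expected reduction in entropy over the hypotheses. The sketch below uses that information-theoretic stand-in and ranks probes by gain per unit cost. It is not the game-theoretic technique the slides refer to, whose details are not given here, and all names and the data layout are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(priors, outcome_likelihoods):
    """Expected entropy reduction over hypotheses from running one probe.

    outcome_likelihoods: probe outcome -> {hypothesis -> P(outcome | hypothesis)}.
    """
    expected_posterior_entropy = 0.0
    for like in outcome_likelihoods.values():
        p_outcome = sum(priors[h] * like[h] for h in priors)
        if p_outcome == 0:
            continue  # this outcome cannot occur under the current priors
        posterior = {h: priors[h] * like[h] / p_outcome for h in priors}
        expected_posterior_entropy += p_outcome * entropy(posterior)
    return entropy(priors) - expected_posterior_entropy


def rank_probes(priors, probes):
    """Rank probes by expected gain per unit cost (costs must be positive).

    probes: probe name -> (outcome_likelihoods, cost).
    """
    scored = [
        (expected_information_gain(priors, outcomes) / cost, name)
        for name, (outcomes, cost) in probes.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    priors = {"H1": 0.5, "H2": 0.5}
    decisive = {"yes": {"H1": 1.0, "H2": 0.0}, "no": {"H1": 0.0, "H2": 1.0}}
    useless = {"yes": {"H1": 0.5, "H2": 0.5}, "no": {"H1": 0.5, "H2": 0.5}}
    print(rank_probes(priors, {"decisive": (decisive, 2.0), "useless": (useless, 1.0)}))
```

Dividing gain by cost is only one possible trade-off rule; a budgeted or sequential plan would need a different objective.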

PAINT Slide 9 Technical Approach Role in the PAINT architecture
[Architecture diagram. Labels: Observations, O; Probes; Decision Model; Relational Probabilistic Pathway Model; Analysis of Hypothesis Likelihood and Probe Information Value; Response; Probe; Target System; Adaptive Search and Probe Strategy Generation: Evolutionary Methods (e.g., genetic algorithms); Model Composition Framework; System Dynamics; Resource Models; Probe Strategy Development; Dynamic Target System Model; Evaluation of Hypotheses and Probes; Leadership; Dynamic Social Network Model; P0 = {P(Hi)}; P1 = {P }; Response; Assess. Organizations: BAH, MIT, Lockheed Martin, NSI, BAE, Berkeley, CMU.]

PAINT Slide 10 Technical Approach Inputs and outputs
Inputs: Probabilistic models. External observations; may include uncertainty. Given hypotheses. Observables and probes.
Processing: Evaluation of hypotheses. Analysis of observations and related probes.
Outputs: Ranked hypotheses with their probabilities. Ranked observations and probes with their information values.

PAINT Slide 11 Technical Approach Uncertainty representation Uncertain nominals, numeric values, and dependencies, represented by piecewise-linear probability density functions. Indexing of uncertain data, and fast retrieval of exact and approximate matches. [Figure axis label: Possible values]
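A piecewise-linear probability density function can be stored as a sorted list of breakpoints with a density value at each one. The class below is a minimal sketch of that idea for a single numeric value; the class and method names are hypothetical, and it does not cover nominals, dependencies, or the indexing mentioned on the slide.

```python
from bisect import bisect_right


class PiecewiseLinearPDF:
    """Density defined by linear interpolation between (x, y) breakpoints."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two matching breakpoints")
        if any(b <= a for a, b in zip(xs, xs[1:])):
            raise ValueError("x values must be strictly increasing")
        if any(y < 0 for y in ys):
            raise ValueError("densities must be non-negative")
        area = self._area(xs, ys)
        if area <= 0:
            raise ValueError("density must have positive total area")
        self.xs = list(xs)
        # Normalize so that the density integrates to 1.
        self.ys = [y / area for y in ys]

    @staticmethod
    def _area(xs, ys):
        # Trapezoid rule is exact for a piecewise-linear function.
        return sum(
            (x1 - x0) * (y0 + y1) / 2
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        )

    def density(self, x):
        """Density at x; zero outside the range of the breakpoints."""
        if x < self.xs[0] or x > self.xs[-1]:
            return 0.0
        i = min(bisect_right(self.xs, x), len(self.xs) - 1)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    def probability(self, lo, hi):
        """Probability that the value lies between lo and hi."""
        lo, hi = max(lo, self.xs[0]), min(hi, self.xs[-1])
        if hi <= lo:
            return 0.0
        points = [lo] + [x for x in self.xs if lo < x < hi] + [hi]
        return self._area(points, [self.density(x) for x in points])


if __name__ == "__main__":
    # Triangular density: the value is most likely near 1, and lies in [0, 2].
    pdf = PiecewiseLinearPDF([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
    print(pdf.density(0.5), pdf.probability(0.5, 1.5))
```

Keeping the breakpoints sorted lets a binary search find the segment that contains a queried value.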

PAINT Slide 12 Technical Approach Scalability The computational time of evaluating the likelihood of given hypotheses and the information value of a given probe is proportional to the number hyp of hypotheses and the number obs of observable variables: O(hyp ∙ obs). In practice, the evaluation takes less than a second in the PAINT architecture, which is negligible compared to the running time of the other PAINT components.

PAINT Slide 13 Technical Approach Generality The RAPID system does not use any domain-specific assumptions. It can be applied to hypothesis and probe evaluation in any domain without changes to its algorithms or implementation. It relies on the modeling components of PAINT as its source of domain expertise and learned knowledge, and receives the related knowledge from the models in a domain-independent format.

PAINT Slide 14 Accomplishments: Technical Technical advances Main research results A novel mechanism for representing uncertain data, dependencies among them, and related inferences. Heuristic algorithms for evaluating hypotheses based on uncertain evidence with complex interdependencies. Game-theoretic algorithms for evaluating probes and trade-offs between their information value and cost. Key missing work We have not yet tested RAPID with realistic large-scale domains, and have not done user studies to determine if it addresses the needs of the intelligence community.

PAINT Slide 15 Accomplishments: Technical Publications Bin Fu, Eugene Fink, and Jaime G. Carbonell. Analysis of uncertain data: Tools for representation and processing. In Proceedings of the IEEE Conference on Systems, Man and Cybernetics, We are currently working on two more papers based on the PAINT work, and we aim to submit them in March 2009.

PAINT Slide 16 Accomplishments: Products Developed software Main components Hypothesis evaluation (completed): Evaluating the likelihood of given hypotheses based on their prior probabilities and uncertain evidence. Probe evaluation (completed): Evaluating the information value of given passive observations and active probes. Uncertainty arithmetic (initial version): General-purpose tools for the representation and analysis of uncertain data, dependencies among them, and data-gathering operations; integrated with Excel. Availability We have delivered the software for the hypothesis and probe evaluation to BAE Systems. Also, all software is available directly from CMU ; to get a copy, to

PAINT Slide 17 Accomplishments: Products Developed software Requirements All developed software is for Windows 2000/XP/Vista. Hypothesis and probe evaluation requires Java. These components are designed for use as part of the overall PAINT architecture. We have also delivered a stand-alone version of them for testing purposes, but it has no GUI and only a limited API. Uncertainty arithmetic requires Excel. It is a stand-alone component, not integrated with the PAINT architecture. Limitations The current representation of uncertain data is limited, and does not allow modeling complex dependencies among data items. As a result, RAPID does not account for such complex dependencies in the hypothesis evaluation. It uses conservative estimates of the evidence strength, and may underestimate the certainty of given hypotheses.

PAINT Slide 18 Accomplishments: Insights Main lessons Explicit representation of uncertainty is often essential for correctly interpreting real-world data and avoiding impractical over-simplifications. In practice, we can often reliably distinguish among competing hypotheses based on limited and partially inaccurate data, even when such data is insufficient for a rigorous statistical analysis. When gathering additional data is not free, targeted proactive data collection is much cheaper than “scavenging” for all available data. Intelligent construction of information-gathering plans can greatly reduce the cost and time of data collection.

PAINT Slide 19 Accomplishments: Insights Remaining research issues Generality: Development of more general mechanisms for representation and analysis of uncertain data; in particular, modeling and fast processing of complex dependencies. Scalability: Extension of the developed techniques to the processing of massive data sets; in particular, investigation of related in-memory and on-disk indexing, and mechanisms for distributing the computation among multiple machines. Integration: Design of a general-purpose API and integration with standard data-processing tools, such as databases and data-stream processing.