RAPID: Representation and Analysis of Probabilistic Intelligence Data. Carnegie Mellon University. PI: Prof. Jaime G. Carbonell

Presentation transcript:

1 RAPID: Representation and Analysis of Probabilistic Intelligence Data. Carnegie Mellon University PI: Prof. Jaime G. Carbonell; Dr. Eugene Fink; Dr. Anatole Gershman. DYNAMiX Technologies POC: Dr. Ganesh Mani; Mr. Dwight Dietrich. PAINT

2 People. Carnegie Mellon faculty: Jaime G. Carbonell, Eugene Fink, Anatole Gershman. Carnegie Mellon students: Bin Fu, Diwakar Punjani, Andrew Yeager. DYNAMiX principals: Dwight Dietrich, Ganesh Mani. DYNAMiX engineers: Atul Bhandari, Jeremy Hermann, Veera Manda.

3 Outline of the presentation RAPID functionality Preliminary demo Architecture and main components Integration with REALISM Current results and work plan

4 Analysis of uncertain intelligence. RAPID is a probabilistic reasoning engine for the analysis of dynamically evolving intelligence data. Jigsaw analogy: RAPID will help identify important holes, locate the most crucial missing pieces, and insert these pieces. Knowledge sources: public domain, intelligence, inferences. [Diagram labels: initial knowledge, available knowledge, observable facts, hidden facts, intelligence results.]

5 Analysis of uncertain intelligence. RAPID will help intelligence analysts to accomplish the following tasks: draw probabilistic conclusions from available intelligence, including uncertain and missing data; identify potentially surprising developments; formulate and assess hypotheses; identify critical uncertainties; develop strategies for proactive collection of additional intelligence to resolve uncertainties, based on the analysis of cost / benefit trade-offs. [Diagram labels: massive new intelligence; filtering and processing of new intelligence; propagation of inferences; analysis of key indicators; development of intelligence-collection plans; intelligence collection; analysts.]

6 Underlying functionality. Representation of uncertainty: novel representation of massive uncertain data, which supports fast matching and inferences. Inferences from uncertain data: scalable inference mechanism for reasoning about uncertain intelligence. Analysis of critical uncertainties: assessment of uncertain situations, evaluation of data utility, and identification of important missing data. Proactive intelligence planning: evaluation of available probes and construction of optimized intelligence-collection plans.
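
The probe evaluation mentioned on this slide can be illustrated with a minimal Python sketch. The slides do not say how RAPID scores probes, so everything here is an assumption: a hypothesis is a single probability, a probe returns a yes/no answer with known true-positive and false-positive rates, and a probe's value is the expected reduction in Shannon entropy per unit of cost. The names `Probe`, `expected_entropy_after`, and `rank_probes` are hypothetical.

```python
import math
from dataclasses import dataclass


def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no hypothesis with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


@dataclass
class Probe:
    """Hypothetical probe model; not taken from the slides."""
    name: str
    cost: float  # assumed to be positive
    tpr: float   # P(probe says "yes" | hypothesis is true)
    fpr: float   # P(probe says "yes" | hypothesis is false)


def expected_entropy_after(prior: float, probe: Probe) -> float:
    """Expected remaining entropy after observing the probe result."""
    p_yes = prior * probe.tpr + (1 - prior) * probe.fpr
    p_no = 1 - p_yes
    total = 0.0
    if p_yes > 0:
        posterior_yes = prior * probe.tpr / p_yes        # Bayes' rule
        total += p_yes * entropy(posterior_yes)
    if p_no > 0:
        posterior_no = prior * (1 - probe.tpr) / p_no    # Bayes' rule
        total += p_no * entropy(posterior_no)
    return total


def rank_probes(prior: float, probes: list) -> list:
    """Sort probes by expected entropy reduction per unit cost, best first."""
    def gain_per_cost(probe: Probe) -> float:
        gain = entropy(prior) - expected_entropy_after(prior, probe)
        return gain / probe.cost
    return sorted(probes, key=gain_per_cost, reverse=True)
```

Under these assumptions, a probe whose answer does not depend on the hypothesis (equal `tpr` and `fpr`) has zero gain and ranks last, while a cheap and reliable probe ranks first.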

7 Outline of the presentation RAPID functionality Preliminary demo Architecture and main components Integration with REALISM Current results and work plan

8 Preliminary demo Uncertainty analysis and probe evaluation, integrated into Excel.

9 Outline of the presentation RAPID functionality Preliminary demo Architecture and main components Integration with REALISM Current results and work plan

10 Architecture Advanced analysis of incomplete data, identification of critical uncertainties, evaluation and selection of probes, what-if analysis, and visualization. Excel extension for the analysis of uncertainty, probes, and proactive data collection Uncertainty calculus and proactive probe planning A large-scale database of incomplete and uncertain facts, uncertain inference rules, and hypotheses, which allows scalable planning of proactive data collection. Scalable assessment of uncertain intelligence Relational database of uncertain data and inference rules Uncertain situation assessment and data-collection planning An advanced API for integration with other systems. Optional user interface for the integrated access to all system components, which extends the standard Excel interface. Analyst interface

11 Architecture Proactive intelligence collection General intelligence collection Massive new intelligence Processing of data streams Real-time matching of queries and inference rules against a massive stream of new data Approved plans for proactive data collection Fast database operations on a stream of newly incoming data, and integration of this stream with the static database. Scalable assessment of uncertain intelligence Relational database of uncertain data and inference rules Uncertainty calculus and proactive probe planning Excel extension for the analysis of uncertainty, probes, and proactive data collection Uncertain situation assessment and data-collection planning Analyst interface Hypotheses, conclusions, and data-collection plans

12 Architecture Proactive intelligence collection General intelligence collection Massive new intelligence Scalable assessment of uncertain intelligence Relational database of uncertain data and inference rules Uncertainty calculus and proactive probe planning Excel extension for the analysis of uncertainty, probes, and proactive data collection Uncertain situation assessment and data-collection planning Analyst interface Processing of data streams Real-time matching of queries and inference rules against a massive stream of new data Value-added reasoning tools Hypotheses, conclusions, and data-collection plans Approved plans for proactive data collection

13 Uncertainty calculus and proactive probe planning. Uncertainty analysis: representation of probability distributions and qualitative uncertainty; uncertainty arithmetic. Situation assessment: representation of data utility; tracking utility changes during data collection; identification of critical uncertainties. Proactive probe planning: representation of probes; evaluation of probe utility; automated selection and launching of critical probes. Contingency planning: what-if analysis of alternative future developments and data-collection plans based on an extension of Excel “scenarios”. [Connected components in the diagram: processing of data streams, value-added reasoning tools, uncertainty database, Microsoft Excel, analyst interface.]
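
As a rough illustration of what "uncertainty arithmetic" over probability distributions could look like, the sketch below adds two uncertain quantities in Python. It is not RAPID's calculus: it assumes discrete distributions stored as value-to-probability dictionaries and assumes that the two quantities are independent. The helper names `combine`, `add`, and `mean` are hypothetical.

```python
import operator
from collections import defaultdict
from itertools import product


def combine(a: dict, b: dict, op) -> dict:
    """Distribution of op(X, Y) for independent discrete X ~ a and Y ~ b.

    Each distribution maps a value to its probability.
    """
    result = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        result[op(x, y)] += px * py  # independence: joint = product
    return dict(result)


def add(a: dict, b: dict) -> dict:
    """Distribution of the sum of two independent uncertain quantities."""
    return combine(a, b, operator.add)


def mean(dist: dict) -> float:
    """Expected value of a discrete distribution."""
    return sum(value * prob for value, prob in dist.items())
```

Passing `operator.mul` or `max` to `combine` gives the distribution of a product or a maximum in the same way; dependent quantities would need a joint distribution instead.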

14 Scalable assessment of uncertain intelligence Uncertain facts Goals, queries, and hypotheses Prioritized plans for proactive data collection Uncertain inference rules Semantic network Critical uncertainties Query matches Evaluation of hypotheses Inferred facts Learned inference rules Conflict detection Manual entry, selection, and editing of knowledge Analyst interface
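
The step from uncertain facts and uncertain inference rules to inferred facts can be sketched in Python as follows. The slides do not specify the inference mechanism, so this is only one possible reading: certainties are treated as probabilities, premises are assumed independent, a rule contributes its strength times the product of its premise probabilities, and several derivations of the same conclusion are merged with a noisy-OR. `Rule`, `noisy_or`, and `infer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical uncertain rule: premises imply conclusion with a strength."""
    premises: tuple
    conclusion: str
    strength: float  # P(conclusion | all premises hold), assumed


def noisy_or(p: float, q: float) -> float:
    """Merge two independent sources of support for the same fact."""
    return 1 - (1 - p) * (1 - q)


def infer(facts: dict, rules: list) -> dict:
    """One forward pass over the rules.

    Rules are assumed to be acyclic and listed so that every premise is
    derived before it is used.
    """
    beliefs = dict(facts)
    for rule in rules:
        support = rule.strength
        for premise in rule.premises:
            support *= beliefs.get(premise, 0.0)  # unknown premise: no support
        previous = beliefs.get(rule.conclusion, 0.0)
        beliefs[rule.conclusion] = noisy_or(previous, support)
    return beliefs
```

A single ordered pass keeps the sketch short; a scalable engine would also need to handle cyclic rule sets and correlated evidence, which this version ignores.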

15 Value-added reasoning tools Part of uncertainty database Known patterns Identification of patterns and their gradual changes in massive data streams ARGUS data explorer Contingency analysis What-if analysis of alternative hypotheses, data-collection plans, and possible future developments Alternative scenarios and their implications Markov reasoning Selection of most likely hypotheses and possible future developments Markov models Adversarial search Analysis of possible concealment and disinformation, and plans to prevent them Adversarial goals and resources Identification of syntactically different words that refer to the same objects Entity co-reference These tools are not essential for the core functionality. Uncertainty calculus and proactive probe planning Excel extension for the analysis of uncertainty, probes, and proactive data collection The available intelligence data and inference rules are in Excel tables, and in the uncertainty database integrated with Excel.
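
For the entity co-reference tool, a very simple baseline can be sketched in Python. It only catches spelling variants, assuming that two mentions refer to the same object when their character-level similarity exceeds a threshold; it is not the prototype described in the presentation. The function names and the 0.85 threshold are assumptions, and `difflib.SequenceMatcher` is from the Python standard library.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if two mentions are close enough to be treated as one entity."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def group_mentions(mentions: list, threshold: float = 0.85) -> list:
    """Greedily group mentions; each group is a list of co-referent strings."""
    groups = []
    for mention in mentions:
        for group in groups:
            if any(similar(mention, other, threshold) for other in group):
                group.append(mention)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([mention])
    return groups
```

The greedy grouping depends on input order and cannot link aliases or abbreviations that share few characters, so it should be read as a starting point only.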

16 Analyst interface Optional extension of the Excel interface Visualization and explanation of intelligence data, inferences, and data-collection plans

17 Outline of the presentation RAPID functionality Preliminary demo Architecture and main components Integration with REALISM Current results and work plan

18 Integration goals. We will integrate the text-extraction system developed by HNC / Fair Isaac with the uncertainty-analysis system developed by CMU / DYNAMiX. The integrated system will support the following capabilities: extraction of facts, relations, and causal links from natural-language documents; evaluation of given hypotheses; proactive information gathering; application to the analysis of Iranian nano-technology plans and capabilities.

19 Inputs and outputs. REALISM input: requirements and filters for the information extraction; natural-language documents; World-wide web. REALISM output: large structured tables of relevant facts and entities, which include uncertainty; inference-rule representation of relations and causal links, also including uncertainty. RAPID input: tables of uncertain facts; uncertain inference rules; queries for specific data; analyst hypotheses. RAPID output: inferences from uncertain data; exact and approximate matches for given queries; hypothesis assessment; proactive plans for collecting additional data.

20 Architecture Hypotheses, conclusions, and data-collection plans Information requests REALISM HNC / Fair Isaac Structured relations and causal links Structured facts and entities Topic filters RAPID CMU / DYNAM i X Analyst interface Scalable assessment of uncertain intelligence Uncertainty calculus and proactive probe planning Uncertain situation assessment and data-collection planning

21 Outline of the presentation RAPID functionality Preliminary demo Architecture and main components Integration with REALISM Current results and work plan

22 Initial results. Detailed technical plan of uncertain situation assessment and proactive probe planning: architecture, functionality, and algorithms. Uncertain intelligence scenario based on public data about Iranian nano-technology. Preliminary prototype of situation assessment tools integrated with a relational database. Preliminary prototype of a tool for the resolution of entity co-references. Application of DYNAMiX Data Explorer to the nano-tech conference data provided by PAINT.

23 Current work Uncertainty calculus, integrated with Excel Proactive probe planning Scalable uncertainty assessment, integrated with a relational database Integration with REALISM Initial analyst interface

24 Short-term plan. Prototype of uncertainty calculus: March. Prototype of probe-planning tools: March. Initial RAPID / REALISM integration: May. Initial analyst interface (extended Excel): June. Prototype of uncertainty database: July.

25 Long-term plan. Uncertain situation assessment and proactive probe planning: July 2008. Discrimination among competing hypotheses and identification of critical uncertainties: July 2009. Fully integrated deployable prototype: July 2009. Advanced proactive-intelligence planning and learning of inference rules: July 2010. Value-added tools, which may include data-stream processing, entity co-reference, adversarial search, and Markov reasoning: July 2011. Fully integrated deliverable system: Jan 2012. All versions of RAPID will demonstrate all main capabilities, with increasing functionality over time.

26 Evaluation We expect that RAPID will provide significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems. To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools. Experimental group: Use of RAPID Control group: Use of standard tools

27 Evaluation We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems. To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools. We will view RAPID as a success if it consistently outperforms the standard tools and the analysts report an overall positive experience of using it.

28 Adjustment of the earlier plan. We need to adjust the plan to the new budget. We will deliver the full core functionality, but we propose to reduce the work on value-added tools. Reduced work: processing of data streams; advanced contingency analysis; analyst interface. Suspended work: predictive Markov models; analysis of adversarial actions.


30 Appendices Previous work Empirical evaluation PAINT contributions

31 ARGUS. ARGUS project sponsored by DTO/ARDA: Identification and tracking of novel patterns in massive databases and data streams. [Diagram, processing steps: create background model; detect novel events; generate profiles; re-cluster; update profiles; match historical data. Diagram, data and outputs: data; background model; novel events; novel clusters; tracked events; new profiles; alerts; analysts.]

32 ARGUS. (1) Estimate the density function at t0. (2) Grow the cluster for a period of Δt while reducing the weight of old records. (3) Estimate the new density function at t0 + Δt. (4) Compare the two estimates.
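
The four steps on this slide can be sketched in Python as follows. The slides do not state which density estimator, decay schedule, or comparison measure ARGUS uses, so this sketch assumes one-dimensional values, a fixed-bin histogram as the density estimate, exponential down-weighting of old records with a given half-life, and total variation distance for the comparison. `weighted_histogram` and `density_change` are hypothetical names.

```python
def weighted_histogram(records, now, half_life, lo, hi, bins):
    """Normalized histogram of values seen up to time `now`.

    records: iterable of (timestamp, value) pairs.
    Older records count less: the weight halves every `half_life` time units.
    """
    counts = [0.0] * bins
    width = (hi - lo) / bins
    for timestamp, value in records:
        if timestamp > now or not (lo <= value < hi):
            continue  # not yet observed, or outside the histogram range
        weight = 0.5 ** ((now - timestamp) / half_life)
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += weight
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def density_change(records, t0, dt, **hist_args):
    """Total variation distance between the estimates at t0 and t0 + dt."""
    before = weighted_histogram(records, t0, **hist_args)
    after = weighted_histogram(records, t0 + dt, **hist_args)
    return 0.5 * sum(abs(a - b) for a, b in zip(before, after))
```

A result near 0 means the cluster looks the same after Δt, and a result near 1 means its mass has moved to different bins; the threshold for reporting a change would have to be chosen empirically.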

33 ARGUS. [Figure: re-clustering between t0 and t0 + Δt, showing the density change; cluster labels: Respiratory Diseases, SARS.]

34 RADAR. RADAR project sponsored by DARPA: Analysis and management of volatile crisis situations based on uncertain data. [Diagram components: data elicitor; parser; optimizer; top-level control and learning; analysts. Diagram functions: process new data; update crisis-management plans; suggest data-collection strategies.]

35 RADAR. We have applied the system to repair a conference schedule after a crisis loss of rooms. Schedule quality for manual and auto repair: after crisis 0.50; manual repair 0.61; auto without elicitation 0.72; auto with elicitation 0.93. [Figure: dependency of the schedule quality on the number of questions (out of 1100).]

36 RAPID. Unlike ARGUS, RAPID: represents and analyzes uncertainty; supports complex inferences. Unlike RADAR, RAPID: scales to massive intelligence datasets; analyzes complex “external” situations; develops intelligence-collection plans.

37 Appendices Previous work Empirical evaluation PAINT contributions

38 Evaluation goals We expect that RAPID will provide significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems. To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools. Experimental group: Use of RAPID Control group: Use of standard tools

39 Experimental setup We expect to recruit retired intelligence analysts for the system evaluation, and ask them to perform several tasks based on given uncertain data. Identify the data most relevant to given tasks Evaluate the validity of given hypotheses Find relevant hidden patterns Identify critical missing data and propose a cost-effective plan for collecting this data

40 Performance measurements We will measure the following main factors to evaluate the performance of analysts: Number of high-level tasks completed within the experiment time frame Accuracy of hypothesis evaluation Number and relevance of identified patterns Effectiveness and costs of data-collection plans We will also ask analysts to complete a questionnaire on their overall experience.

41 Expected results. We will view the proposed work as a success if RAPID consistently outperforms the off-the-shelf tools in all four performance factors, the performance difference for each factor is statistically significant, and analysts report an overall positive experience of using the system.
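
The requirement that the performance difference be statistically significant can be checked in several ways; the slides do not name a test. One option that makes few distributional assumptions is a one-sided permutation test on the difference of group means, sketched below in Python. The function name, the number of trials, and the fixed seed are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, trials=10_000, seed=0):
    """One-sided permutation test: is the experimental mean higher?

    Returns the estimated probability of seeing a difference of means at
    least as large as the observed one if group labels were assigned at random.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n = len(experimental)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

The test would be run once per performance factor, with the scores of the RAPID group as `experimental` and those of the standard-tools group as `control`. Because four factors are tested, a correction for multiple comparisons may also be appropriate.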

42 RAPID / REALISM evaluation Component utility: We will also evaluate the utility of REALISM and RAPID by comparing the productivity of subjects under the following three conditions: Use of the integrated system Use of REALISM without RAPID Use of RAPID without REALISM Component evaluation: We will measure the following performance factors: Accuracy and completeness of text extraction Accuracy of hypothesis evaluation Effectiveness of data-collection plans Speed of each system component

43 Appendices Previous work Empirical evaluation PAINT contributions

44 Main contributions Feedback Strategy Generation and Exploration Dynamic Simulation Models Response Options Representation of massive uncertain knowledge Automated discovery of causal relationships Fast probabilistic integration of all evidence Analysis of possible future developments 1 Identification of critical uncertainties Planning of proactive intelligence gathering Data

45 Inputs and outputs Uncertain intelligence and analyst opinions: Massive stream of structured records Specific hypotheses New learned rules Data-search queries Query matches Evaluation of hypotheses Plans for proactive intelligence collection Uncertain situation assessment Inference rules Domain knowledge RAPID General intelligence collection Proactive intelligence collection

46 Inputs. From other PAINT components: available intelligence data and its certainty; hypotheses about unknown factors; available domain knowledge. From analysts: intelligence-analysis tasks and priorities; hypotheses and related opinions; responses to RAPID-generated probes; additional domain knowledge. From other sources: databases with available intelligence; public databases with relevant data.

47 Outputs Inferences from available uncertain data Evaluation of given hypotheses New hypotheses and their certainties Plans for proactive intelligence collection Learned inference rules