
Analysis of Uncertain Data in Text Documents (Carnegie Mellon University and DYNAMiX Technologies)


1 Unclassified//For Official Use Only
Analysis of Uncertain Data in Text Documents
PAINT

Carnegie Mellon University and DYNAMiX Technologies
PI: Jaime G. Carbonell / jgc@cs.cmu.edu / (412) 268-7279
Co-PI: Eugene Fink / e.fink@cs.cmu.edu / (412) 268-6593

HNC and Fair Isaac
Co-PI: Dayne Freitag / daynefreitag@fairisaac.com / (858) 369-8191
Co-PI: Richard Rohwer / richardrohwer@fairisaac.com / (858) 369-8318

2 Proposed functionality
We will integrate the text-extraction system developed by HNC / Fair Isaac with the uncertainty-analysis system developed by CMU / DYNAMiX. The integrated system will support the following main capabilities:
- Extraction of relevant facts, relations, and causal links from natural-language text documents
- Automated intent inferences and identification of surprising developments based on uncertain data
- Evaluation of given hypotheses
- Proactive information gathering
- Application to the analysis of Iranian nanotechnology plans and capabilities
We will also build an external API for future integration with other PAINT systems, and evaluate its effectiveness by implementing an optional “loose” integration with the predictive-analysis system developed by Berkeley / LLC.

3 HNC / Fair Isaac: REALISM System
[Architecture diagram. Components: data acquisition (real-time IR); unstructured text archive by genre (academic, newswire, blog, ...); genre detection; basic IE model learning (background); abstract IE model learning (background); IE models; information extraction (entities and relations); knowledge base (entities, relations, implication pool). Data paths: background / model-learning paths and real-time / modeling paths. Outputs: extracted facts and entities (structured tables); extracted relations and causal links (structured rules).]

4 HNC / Fair Isaac: REALISM System
Output:
- Large structured tables of relevant facts and entities, which include uncertainty
- Inference-rule representation of relations and causal links, also including uncertainty
Input:
- Requirements and filters for the information extraction
- Natural-language documents
- World-wide web
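
The slide does not specify how the structured tables or rules are encoded. As a minimal sketch only, assuming each fact is a subject/relation/object row with a confidence value in [0, 1], the outputs might be modeled as follows. All class names, field names, and sample rows here are hypothetical placeholders, not part of REALISM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    """One row of a hypothetical table of extracted facts."""
    subject: str
    relation: str
    obj: str
    confidence: float  # assumption: a probability-like score in [0, 1]
    genre: str         # source genre, e.g. "academic", "newswire", "blog"


@dataclass(frozen=True)
class ExtractedRule:
    """A hypothetical 'premise implies conclusion' link with uncertainty."""
    premise: str
    conclusion: str
    confidence: float


def filter_facts(facts, relation=None, min_confidence=0.0):
    """Return the facts that match a relation filter and a confidence floor."""
    return [
        f for f in facts
        if (relation is None or f.relation == relation)
        and f.confidence >= min_confidence
    ]


# Placeholder rows, invented purely for illustration.
facts = [
    ExtractedFact("Lab A", "publishes_on", "carbon nanotubes", 0.9, "academic"),
    ExtractedFact("Lab A", "funded_by", "Agency B", 0.4, "blog"),
    ExtractedFact("Lab C", "publishes_on", "nanoparticles", 0.7, "newswire"),
]
rule = ExtractedRule("publishes_on", "has_research_program", 0.8)

strong = filter_facts(facts, relation="publishes_on", min_confidence=0.8)
print([f.subject for f in strong])
```

Keeping the confidence on every row, rather than thresholding at extraction time, would let a downstream consumer choose its own cutoff.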

5 CMU / DYNAMiX: RAPID System
[Architecture diagram. Inputs: reality interpretation (tables of uncertain facts); uncertain inference rules; goals, queries, and hypotheses; manual entry, selection, and editing of knowledge. Outputs: inferred facts; learned inference rules; query matches; evaluation of hypotheses; critical uncertainties; prioritized plans for proactive data collection.]

6 CMU / DYNAMiX: RAPID System
Output:
- Inferences from uncertain data
- New learned inference rules
- Exact and approximate matches for given queries
- Hypothesis assessment
- Proactive plans for collecting additional data
Input:
- “Reality interpretation tables,” which represent uncertain facts
- Uncertain inference rules
- Queries for specific relevant data
- Analyst’s hypotheses
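
The slides do not describe how RAPID combines uncertain facts with uncertain rules. As one possible illustration, the sketch below multiplies a fact's probability by a rule's probability (which assumes the two are independent) and keeps the strongest derivation of each conclusion. This is a simplification chosen for the example, not RAPID's actual method, and every name and number in it is a hypothetical placeholder.

```python
def apply_rules(facts, rules):
    """Run one pass of forward inference over uncertain facts.

    facts: dict mapping (subject, relation) -> probability
    rules: list of (premise_relation, conclusion_relation, rule_probability)
    Assumption: fact and rule are independent, so probabilities multiply;
    when several derivations reach one conclusion, the maximum is kept.
    """
    inferred = {}
    for (subject, relation), p_fact in facts.items():
        for premise, conclusion, p_rule in rules:
            if relation != premise:
                continue
            key = (subject, conclusion)
            if key in facts:
                continue  # do not override a directly stated fact
            p = p_fact * p_rule
            inferred[key] = max(inferred.get(key, 0.0), p)
    return inferred


def assess(hypothesis, facts, inferred):
    """Score a hypothesis by its stated or inferred probability (0.0 if unknown)."""
    if hypothesis in facts:
        return facts[hypothesis]
    return inferred.get(hypothesis, 0.0)


# Placeholder "reality interpretation" table and rules, invented for illustration.
facts = {
    ("Lab A", "publishes_on_nanotubes"): 0.9,
    ("Lab A", "imports_equipment"): 0.5,
}
rules = [
    ("publishes_on_nanotubes", "has_nanotube_program", 0.8),
    ("imports_equipment", "has_nanotube_program", 0.6),
]

inferred = apply_rules(facts, rules)
score = assess(("Lab A", "has_nanotube_program"), facts, inferred)
print(round(score, 2))
```

A hypothesis whose score depends heavily on one low-probability fact is a natural candidate for a "critical uncertainty", which is where a data-collection plan could focus.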

7 Integrated system
[Architecture diagram. HNC / Fair Isaac passes structured facts and entities and structured relations and causal links to CMU / DYNAMiX, where they form the reality interpretation (tables of uncertain facts) and the uncertain inference rules. CMU / DYNAMiX sends information requests and topic filters back to HNC / Fair Isaac. Analyst inputs: goals, queries, and hypotheses; manual entry, selection, and editing of knowledge. Outputs: inferred facts; learned inference rules; query matches; evaluation of hypotheses; plans for proactive data collection. Testing with the Berkeley / LLC system.]

8 Empirical evaluation
Data: We will use public data about Iranian nanotechnology in the system evaluation. When the PAINT challenge-problem data about Iran becomes available, we will combine it with the public data.
Component evaluation: We will measure the following performance factors:
- Accuracy and completeness of text extraction
- Accuracy of hypothesis evaluation
- Effectiveness of data-collection plans
- Speed of each system component
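
The slide does not define how accuracy and completeness of text extraction will be scored. One common reading, assumed here, is precision (the share of extracted items that are correct) and recall (the share of reference items that were found). The function name and the sample items below are hypothetical.

```python
def extraction_scores(extracted, gold):
    """Return (precision, recall) of extracted items against a reference set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder items: four extracted, three of them in a six-item reference set.
extracted = ["f1", "f2", "f3", "x9"]
gold = ["f1", "f2", "f3", "f4", "f5", "f6"]

precision, recall = extraction_scores(extracted, gold)
print(precision, recall)  # 0.75 0.5
```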

9 Empirical evaluation
Evaluation of the integrated system: We will compare the productivity of subjects using the developed system with that of subjects who perform the same tasks using off-the-shelf tools.
- Experimental group: Use of REALISM / RAPID
- Control group: Use of standard tools
Specific tasks:
- Find data relevant to given hypotheses
- Evaluate the validity of these hypotheses
- Identify critical uncertainties and propose a plan for collecting additional relevant data
Performance measurements:
- Number of tasks completed during the experiment
- Accuracy of hypothesis evaluation
- Effectiveness of proactive data-collection plans

10 Empirical evaluation
Component utility: We will also evaluate the utility of REALISM and RAPID by comparing the productivity of subjects under the following three conditions:
- Use of the integrated system
- Use of REALISM without RAPID
- Use of RAPID without REALISM

11 Schedule
- Nov. 2007: Data models and APIs
- Feb. 2008: Initial versions of separate REALISM and RAPID systems; feasibility demo
- May 2008: Integrated REALISM / RAPID system; initial integrated demo
- Aug. 2008: Evaluation of the integrated system; extended demo

