USE OF BACKGROUND KNOWLEDGE IN NATURAL LANGUAGE UNDERSTANDING FOR INFORMATION FUSION Stuart C. Shapiro Department of Computer Science and Engineering Center.

Slides:



Advertisements
Similar presentations
SWG Strategy (C) Copyright IBM Corp. 2006, All Rights Reserved. P4 Task 2 Fact Extraction using a CNL Current Status David Mott, Dave Braines, ETS,
Advertisements

Dr. David A Ferrucci -- Logic Programming and AI Lecture Notes Knowledge Structures Building the Perfect Object.
Referring Expressions: Definition Referring expressions are words or phrases, the semantic interpretation of which is a discourse entity (also called referent)
An F-Measure for Context-Based Information Retrieval Michael Kandefer and Stuart C. Shapiro University at Buffalo Department of Computer Science and Engineering.
CS570 Artificial Intelligence Semantic Web & Ontology 2
RDF Tutorial.
Using Link Grammar and WordNet on Fact Extraction for the Travel Domain.
So What Does it All Mean? Geospatial Semantics and Ontologies Dr Kristin Stock.
Discourse Martin Hassel KTH NADA Royal Institute of Technology Stockholm
Graph Data Management Lab, School of Computer Science Put conference information here.
Extended Response Questions
S.C. Shapiro Development of a Cognitive Agent Stuart C. Shapiro Department of Computer Science and Engineering and Center for Cognitive Science.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Sensemaking and Ground Truth Ontology Development Chinua Umoja William M. Pottenger Jason Perry Christopher Janneck.
Case-based Reasoning System (CBR)
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dörre, Peter Gerstl, and Roland Seiffert Presented By: Jake Happs,
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
S.C. Shapiro An Introduction to SNePS 3 Stuart C. Shapiro Department of Computer Science and Engineering and Center for Cognitive Science State.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Intelligent Tutoring Systems Traditional CAI Fully specified presentation text Canned questions and associated answers Lack the ability to adapt to students.
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
® IBM Software Group © 2006 IBM Corporation Writing Good Use Cases Module 4: Detailing a Use Case.
Helpdesk video  bhtRU bhtRU.
1SSPD Back to Jackson Consider how JSP views programming- –Describe structure I/O datastreams –Combine to produce a program structure –List operations.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Domain Modeling 中国科学技术大学软件学院 孟宁 2012 年 10 月 Domain Modeling ♦What: A process performed by the development teams to acquire domain knowledge. ♦Why: –Because.
Inference Graphs: A Roadmap Daniel R. Schlegel and Stuart C. Department of Computer Science and Engineering L A – Logic of Arbitrary.
Understanding PML Paulo Pinheiro da Silva. PML PML is a provenance language (a language used to encode provenance knowledge) that has been proudly derived.
Knowledge Representation and Indexing Using the Unified Medical Language System Kenneth Baclawski* Joseph “Jay” Cigna* Mieczyslaw M. Kokar* Peter Major.
Recognizing Activities of Daily Living from Sensor Data Henry Kautz Department of Computer Science University of Rochester.
XML and Digital Libraries M. Zubair Department of Computer Science Old Dominion University.
CSC 395 – Software Engineering Lecture 13: Object-Oriented Analysis –or– Let the Pain Begin (At Least I’m Honest!)
 Architecture and Description Of Module Architecture and Description Of Module  KNOWLEDGE BASE KNOWLEDGE BASE  PRODUCTION RULES PRODUCTION RULES 
NATURAL LANGUAGE UNDERSTANDING FOR SOFT INFORMATION FUSION Stuart C. Shapiro and Daniel R. Schlegel Department of Computer Science and Engineering Center.
Concurrent Inference Graphs Daniel R. Schlegel Department of Computer Science and Engineering Problem Summary Inference graphs 2 in their current form.
Modeling system requirements. Purpose of Models Models help an analyst clarify and refine a design. Models help simplify the complexity of information.
1 Knowledge Representation CS 171/CS How to represent reality? Use an ontology (a formal representation of reality) General/abstract domain Specific.
Concurrent Reasoning with Inference Graphs Daniel R. Schlegel and Stuart C. Shapiro Department of Computer Science and Engineering University at Buffalo,
BioRAT: Extracting Biological Information from Full-length Papers David P.A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones Bioinformatics.
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
Introduction to Dialogue Systems. User Input System Output ?
1 COSC 4406 Software Engineering COSC 4406 Software Engineering Haibin Zhu, Ph.D. Dept. of Computer Science and mathematics, Nipissing University, 100.
Christoph F. Eick University of Houston Organization 1. What are Ontologies? 2. What are they good for? 3. Ontologies and.
Creating a Resume Career Education Harrison Center.
Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.
UI's for inputting and presenting the metadata of hypermedia documents Kai Kuikkaniemi HUT T
Knowledge Structure Vijay Meena ( ) Gaurav Meena ( )
March, 2007RCO LLC, RCO Text Analysis Technologies for information extraction and business intelligence We can tell you everything about.
Understanding Naturally Conveyed Explanations of Device Behavior Michael Oltmans and Randall Davis MIT Artificial Intelligence Lab.
Semantic Wiki: Automating the Read, Write, and Reporting functions Chuck Rehberg, Semantic Insights.
Concurrent Reasoning with Inference Graphs Daniel R. Schlegel Stuart C. Shapiro Department of Computer Science and Engineering Problem Summary Rise of.
1 UNIT 13 The World Wide Web. Introduction 2 Agenda The World Wide Web Search Engines Video Streaming 3.
1 UNIT 13 The World Wide Web. Introduction 2 The World Wide Web: ▫ Commonly referred to as WWW or the Web. ▫ Is a service on the Internet. It consists.
New York State Center of Excellence in Bioinformatics & Life Sciences R T U New York State Center of Excellence in Bioinformatics & Life Sciences R T U.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
Enhanced Entity-Relationship and Object Modeling Objectives
Lecture #1 Introduction
INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Logical Agents.
Referring Expressions: Definition
Key Linguistic DEVICES Concepts
Social Knowledge Mining
CSE 635 Multimedia Information Retrieval
MIS2502: Data Analytics Relational Data Modeling 2
CGS 2545: Database Concepts Summer 2006
Presentation transcript:

USE OF BACKGROUND KNOWLEDGE IN NATURAL LANGUAGE UNDERSTANDING FOR INFORMATION FUSION Stuart C. Shapiro Department of Computer Science and Engineering Center for Multisource Information Fusion and Center for Cognitive Science University at Buffalo, Buffalo, New York Daniel R. Schlegel Department of Biomedical Informatics Center for Multisource Information Fusion and Center for Cognitive Science University at Buffalo, Buffalo, New York

Outline Introduction to Tractor Background Knowledge in: Text processing Propositionalizing Knowledge Base Enhancement Syntax-Semantics Mapping Evaluation Acknowledgments 7/8/ S. C. Shapiro & D. R. Schlegel Fusion 2015

Tractor Input: Short English intelligence message. Output: Semantic knowledge base (KB) representing contents of message. 3 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Context: Hard & Soft Information Fusion Information from multiple Soft information sources English messages Hard information sources RADAR, SONAR, LIDAR, … are fused for situation assessment. => Requirement: Capture semantic content of each message as completely and correctly as possible. 4 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Approach Not: Information Extraction Look for prespecified classes of Entities Events Properties. Instead: Natural Language Understanding (Semantic Analysis) Translate Entities, Events, Properties, Relations, … Expressed in the text Into a formal Knowledge Representation (KR) language that supports reasoning. 5 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Tractor Architecture 6 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Background Knowledge for NLU 7 Background knowledge includes: Knowledge of how the language is used Knowledge of the world Knowledge of the domain Knowledge of the axioms of the relations used in the text Tractor makes use of all of these to identify and represent: Entities and events Attributes of entities and events Relations between entities and events 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Background Knowledge in Text Processing 8 List-based background knowledge 138 lists – people, places, job titles, companies, religious groups, etc… Rule-based background knowledge Use results from list-based NER + syntactic properties Allows for identification of NEs not in lists. 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Background Knowledge in Propositionalizing 9 Uses knowledge about formats of dates, times, weights, and heights Converted into standardized formats “10/24/2010” -> Domain knowledge to process structured portions of messages Headers containing date/time of message Messages with pre-defined structures for special purposes, e.g., communication intercepts 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Background Knowledge in Propositionalizing /16/10 - RT: 0700hrs - |C:| || satellite phone| |P1:| caller| Dhanun Ahmad|||| |P2:| receiver| Dhanun Ahmad’s handler’s voice drop-box|||| |A:| “Ahmad said he is in al-Kut after driving through the night. He could not stay on the phone long, as the man traveling with him was watching closely.” Message Number 248 Date Normalized to Report time was 0700 Mode of communication was satellite phone. Caller was Dhanun Ahmad Receiver was Ahmad’s handler’s voice drop box. Analysts notes. 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Enhancing the Knowledge Base 11 Ontological Information CBIR – Context Based Information Retrieval Geographical Information 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015 “East Dora”: Is a Section of populated place MGRS: 38SMB Lat/Long: , vehicle%1:06:00::’s are conveyance%1:06:00::s

Background Knowledge in Syntax-Semantics Mapping 12 Makes use of linguistic, categorical, and domain-specific knowledge Data from all prior processing steps Curated databases of linguistic properties Hand-crafted mapping rules using the above 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Semantic Databases 13 Database of adjective X category -> attribute “a young man”: a man whose age is young “a large gathering”: a group whose cardinality is large Mereological database of parts and wholes “the man’s arm”: the arm is part of the man, not owned by the man List of mass nouns “a man with dark hair”: a man who has as a part something which is made of hair whose color is dark 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

General Knowledge 14 Plural roles/job titles “BCT analysts”: a group of analysts Noun phrases naming vehicles “his black 2010 Ford Escape SUV.” Paths of movement “Dillinger was last seen driving his black 2010 Ford Escape SUV westward down Indianapolis Road at 1:20pm on 3/17/2013”: Indianapolis road forms a path, with the direction of movement being westward. Searching a place and finding an object indicates the place of the object. “a search of his car netted IED devices”: the IED devices were located in the car. 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Understanding Noun-Noun Modification 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion Noun-Noun Modification Express a wide variety of semantic relations. Mapping rules recognize several It is an office chair. Head noun Modifying noun

Understanding Noun-Noun Modification 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion Inner locations If both nouns are locations, then the modifying noun is located in the head noun. “Rashid, Baghdad” : Rashid is a neighborhood within Baghdad Facilities and buildings in locations If the head noun is a facility, then it’s in the location of the modifying noun. “Second District Courthouse” : a courthouse located in the Second District.

Understanding Noun-Noun Modification 17 Headquarters in a location If the modifying noun is a location, but the head noun is not, the head noun is headquartered in the location of the modifying noun “A Baghdad company” : a company headquartered in Baghdad. Entity names If neither noun is a location, but both are proper nouns, they can both be assumed to be names of an entity. “Ahmad Mahmud”: a person with “Ahmad” and “Mahmud” as names, as well as the full name “Ahmad Mahmud.” Religious Membership If the head noun is a person, and the modifying name is a religious group, the person is a member of the religious group and has that religion. “a Sunni munitions trafficker” : a munitions trafficker whose religion is Sunni and is a member of the religious group named “Sunni.” 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Understanding Noun-Noun Modification 18 Subgroups If both nouns are groups, then the head noun denotes a group which is a subgroup of the group denoted by the modifying noun. “BCT analysts” : A group of analysts, all members of the organization “BCT” Organization Membership If the modifying noun is an Organization, and the head noun is not, then the head noun is a member of the organization. “the ISG affiliate” : someone filling the role of affiliate within the “ISG” organization. 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Understanding Copulas 19 Copulas link subjects to predicates. Subject is a Noun If there is a copula between a subject and noun, then the subject is co-referential with an instance of the category of the noun. “the rented vehicle is a white van” : one entity is both a rented vehicle, and a white van. Dimensional scale Predicate adjectives which imply a dimension can be used to say the subject has a value on that dimensional scale. “Dillinger is old” : Dillenger’s age has the linguistic value “old” Simple properties Predicate adjectives which don’t imply a dimension, are simple properties of the subject. “he is secretive” : “he” has the property “secretive” 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Making Inferences 20 Symmetric relations “The trigger devices netted in the arrest of Dhanun Ahmad Mahmud Ahmad on 01/27/10 match materials found in the truck of arrested ISG affiliate Abdul Wahied.” : “the devices match the materials” and “the materials match the devices” are represented. Location of act participants If a person participates in an act at a location, that person was at that location at the time of the act. “Ahmad Mahmud was arrested at Expressway on ” : Ahmad was located at the Expressway on /8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Making Inferences 21 Persons in vehicles If a person drives a vehicle, then they are located in that vehicle at that time. “Dillinger was last seen driving his black 2010 Ford Escape SUV westward down Indianapolis Road at 1:20pm on 3/17/2013,” Dillinger is understood both to be the driver of the SUV and to be located in the SUV at 1320 on Location and subgroup membership are transitive. Dillinger is in a car driving down Indianapolis Road, therefore Dillinger’s location is Indianapolis Road. 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Evaluation Mapping Rules developed using training messages Evaluated using test messages Previous results have shown: Rules are general Rules are very thorough Rules are not too general Now: Grading rubric Measures correctness and completeness against manually produced gold standard Grades correctness of attributes, entities, and relations “Answer key” created by one or more humans (“gold standard”) “Submission” compared with answer key 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion

Grading Rubric Results SetCountRPF AvgStDevAvgStDevAvgStDev Development Test Tractor generalizes well across datasets within the domain. Most of the semantic content of the messages is understood 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

How General are the rules? 7/10/2013 S. C. Shapiro & D. R. Schlegel Fusion TypeCount# Fired% FiredTimes FiredTimes Per Msg CBIR11100% SYN %1, SEM55100% SYNSEM %2, INFER9888.9% CLEAN10880%6, TOTAL %11, Rule firings on test messages Conclusion: Reasonably general.

How thorough are the rules? 7/10/2013 S. C. Shapiro & D. R. Schlegel Fusion KBSyntactic Assertions Semantic Asertions % Semantic Syntactic24691, % Semantic w/ CBIR53848, % Semantic w/o CBIR5385, % Syntactic vs. Semantic Assertions Observation: Rules convert from 68% syntactic to 91% semantic w/o CBIR assertions Conclusion: Very thorough.

Are the rules too general? 7/10/2013 S. C. Shapiro & D. R. Schlegel Fusion Type# FiredTimes Fired# Correct% Correct CBIR % SYN131,5671, % SEM % SYNSEM562,6512, % INFER % CLEAN56, % TOTAL8811,59711, % Rule firings on correct parses in test messages Conclusion: Not too general.

Conclusions Natural Language Understanding requires the use of background knowledge Background knowledge includes: Knowledge of how the language is used Knowledge of the world Knowledge of the domain Knowledge of the axioms of the relations used in the text Tractor makes use of all of these to identify and represent: Entities and events Attributes of entities and events Relations between entities and events Evaluation shows that Tractor does this identification very well with few mistakes. 27 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015

Acknowledgments This work has been supported by a Multidisciplinary University Research Initiative (MURI) grant (Number W911NF ) for "Unified Research on Network-based Hard/Soft Information Fusion", issued by the US Army Research Office (ARO) under the program management of Dr. John Lavery. 28 7/8/2015 S. C. Shapiro & D. R. Schlegel Fusion 2015