


1 SWG Strategy (C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.
P4 Task 2
Fact Extraction using a CNL
Current Status
David Mott, Dave Braines, ETS, Hursley, IBM UK
Steve Poteet, Ping Xue, Anne Kao, Boeing

2 SWG Strategy – Emerging Technology Services, Hursley (C) Copyright IBM Corp. 2006, 2011. All Rights Reserved.
Project 4 Task 2 Research Objectives
Improve extraction of facts (in CE) from documents (in Natural Language)
  –unambiguous "semantics" of the document
  –machine can assist analyst in inference of new conclusions
Provide rationale for linguistic and analytic processing
  –allow the human to be part of the NL processing
  –reasoning, argumentation about ambiguities, incomplete parsing...
Define a model of linguistics, grammar, semantics
  –facilitate configuration of NLP tools in a CNL
  –human analyst can better understand the processing
Improve expressibility of CE
  –interest in CE, but needs a more "stylistic" grammar
How is the Natural Language Processing related to the "Analysts Conceptual Model" (ACM)?

3 Example processing (1)
Input (NL):
BCT patrol in East Rashid discover a bomb-making facility on Abu Tajara Street //MGRSCOORD: 38S MB 43655 78909//
Output (CE):
the patrol unit '|BCT patrol|' finds the facility '|p6|' and is contained in the place '|East Rashid|' and is located in the place '|East Rashid|' and is a NATO military unit....
ISSUES:
  –names are a bit strange
  –unnecessary "contained in"
  –missed the bomb-making and the "on..."
  –ignored the MGRS information
  –"a NATO military unit" is unnecessary?
BUT:
  –this is CE, fully conformant to the ACM
  –this is machine-processable
  –this has a defined semantics
  –rationale for processing is available
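One of the issues listed above is that the MGRS information was ignored. A minimal sketch of how a pre-processing step might pull that tag out of the report before parsing is given here. It is not the project's implementation: the function name `split_mgrs` is hypothetical, and the tag layout is an assumption inferred from the single example on the slide.

```python
import re

# Assumed tag layout, inferred from the one example on the slide:
# //MGRSCOORD: <grid zone> <100 km square> <easting> <northing>//
MGRS_TAG = re.compile(
    r"//MGRSCOORD:\s*(\d{1,2}[C-HJ-NP-X])\s+([A-HJ-NP-Z]{2})\s+(\d{1,5})\s+(\d{1,5})//"
)


def split_mgrs(report):
    """Return (report text without the tag, coordinate dict or None)."""
    match = MGRS_TAG.search(report)
    if match is None:
        return report.strip(), None
    zone, square, easting, northing = match.groups()
    coord = {
        "zone": zone,
        "square": square,
        "easting": easting,
        "northing": northing,
    }
    # Remove the tag so the remaining text is plain natural language.
    text = (report[:match.start()] + report[match.end():]).strip()
    return text, coord


if __name__ == "__main__":
    report = ("BCT patrol in East Rashid discover a bomb-making facility "
              "on Abu Tajara Street //MGRSCOORD: 38S MB 43655 78909//")
    print(split_mgrs(report))
```

The coordinate parts are kept as strings so that leading zeros in the easting and northing survive; how the result would be attached to the extracted facility in CE is left open.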

4 Current NL Processing
[Diagram. Components shown: SYNCOIN Reports, Message PreProcessor, Stanford Parser, Entity Extractor, Situation Extractor, Names, CE Aggregator, CEStore, "Stylistic" CE, Conceptual Model (concepts, logical rules, linguistic expression), Proper Nouns (places, units), For Analysis]
Just exploratory steps
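To make the flow of such a pipeline concrete, here is a toy sketch in Python. It is an illustration only: the stage order, every function name, and the proper-noun table are assumptions; the Stanford Parser is not called at all (a substring match stands in for parsing); and the output merely copies the quoting style of the CE shown on the example-processing slide.

```python
import re

# Hypothetical proper-noun list (places, units); the two entries are
# taken from the example report on the example-processing slide.
PROPER_NOUNS = {
    "BCT patrol": "patrol unit",
    "East Rashid": "place",
}


def preprocess(message):
    """Drop //...// metadata tags and collapse whitespace."""
    without_tags = re.sub(r"//.*?//", " ", message)
    return " ".join(without_tags.split())


def extract_entities(text):
    """Return known proper nouns as (name, concept), in order of appearance."""
    found = [(text.find(name), name, concept)
             for name, concept in PROPER_NOUNS.items() if name in text]
    return [(name, concept) for _, name, concept in sorted(found)]


def to_ce(subject, relation, obj):
    """Render one relation in the quoting style of the slide's CE example."""
    (s_name, s_concept), (o_name, o_concept) = subject, obj
    return f"the {s_concept} '|{s_name}|' {relation} the {o_concept} '|{o_name}|'"


def run_pipeline(message):
    """Toy situation extractor: '<entity> in <entity>' yields 'is located in'."""
    text = preprocess(message)
    entities = extract_entities(text)
    clauses = []
    for subject in entities:
        for obj in entities:
            if subject != obj and f"{subject[0]} in {obj[0]}" in text:
                clauses.append(to_ce(subject, "is located in", obj))
    return clauses


if __name__ == "__main__":
    report = ("BCT patrol in East Rashid discover a bomb-making facility "
              "on Abu Tajara Street //MGRSCOORD: 38S MB 43655 78909//")
    for clause in run_pipeline(report):
        print(clause)
```

A real situation extractor would work from the parse tree and the conceptual model rather than from string matching; the sketch only shows how stages can be chained so that each one hands a simpler structure to the next.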

5 Conceptual Model(s)
Meta Model
  –concepts: Concept, Entity Concept, Relation Concept, Conceptual Model
  –relations: belongs to, has as domain
Semiotic Triangle
  –concepts: Thing, Meaning, Symbol
  –relations: stands for, expresses
General
  –concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container
  –relations: has as agent role, is contained in
Linguistic
  –concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame
  –relations: has as modifier, is parsed from
ACM
  –concepts: Place, Church, Person, Village, IED, Facility,....
  –relations: is located in
[Diagram: triangle with corners meaning, symbol, thing and edges labelled conceptualises, stands for, expresses]
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923)]
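The semiotic triangle can be sketched as a small data structure. The direction of each edge is an assumption here (symbol expresses meaning, meaning conceptualises thing, symbol stands for thing), since the flattened diagram does not show arrowheads; the class and function names are hypothetical, and the example values reuse the facility '|p6|' from the example-processing slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    identifier: str          # e.g. the instance name used in CE


@dataclass(frozen=True)
class Meaning:
    concept: str             # a concept from the conceptual model
    conceptualises: Thing    # assumed direction: meaning -> thing


@dataclass(frozen=True)
class Symbol:
    text: str                # the word or phrase found in the document
    expresses: Meaning       # assumed direction: symbol -> meaning
    stands_for: Thing        # assumed direction: symbol -> thing


def triangle_closes(symbol):
    """True when both paths from the symbol reach the same thing."""
    return symbol.stands_for == symbol.expresses.conceptualises


if __name__ == "__main__":
    p6 = Thing("p6")
    facility = Meaning("facility", conceptualises=p6)
    phrase = Symbol("a bomb-making facility", expresses=facility, stands_for=p6)
    print(triangle_closes(phrase))
```

Keeping the three corners as separate types makes it possible to record, for any extracted fact, which words in the text gave rise to it, which is one way to read the "rationale for processing" goal.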

6 Lots of stuff we didn't talk about!

7 Extending ITA Controlled English

8 Strong need to improve stylistic expressiveness
Allow "common name" identity handling
  –the person John...
Prepositional phrases
  –in, at, on
Adjectives
Reduce need to state the type explicitly
  –John...
Collections
  –the group of...
Tense and aspect inflection in verbs...
Example: John met the group of US soldiers in East Rashid

9 Parallel NL and CNL parsers
[Diagram. Components shown: NL Parser, CNL Parser, lexicon, conceptual model, Reference English Grammar, Semantic Theory. Labels: NLP; stylistically expressive CE; basic CE or predicate logic or CE-in-Java]
Increase stylistic expressibility of CE
Better understanding of linguistics

10 Discussion
Is SYNCOIN representative?
  –similar to style of intelligence reports
  –use of CE from reports to allow analysis was a key requirement in Pathfinder
  –but we should check again with Gavin Pearson
Should we be analysing "chat"?
  –it was felt that a possible application for NL processing was in extracting information from "chat"
  –there are other US groups that are analysing chats
  –is chat more or less complex than fuller NL?
  –should we swap out the Stanford parser with something simpler?
What about slang and acronyms?
  –is this just a question of using the same techniques with a different mapping of language to concept?
  –or looking for predefined patterns at a pre-parsing stage?
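The second option for slang and acronyms, looking for predefined patterns at a pre-parsing stage, could be sketched as a simple table-driven rewrite. This is an illustration under assumptions: the function name `expand` and the table contents are hypothetical ("IED" appears among the ACM concepts in this deck; the other two entries are invented for the example), and matching is case-sensitive on whole words.

```python
import re

# Hypothetical expansion table. Only "IED" comes from the slides;
# "veh" and "loc" are made-up chat abbreviations for illustration.
EXPANSIONS = {
    "IED": "improvised explosive device",
    "veh": "vehicle",
    "loc": "location",
}


def expand(text, table=EXPANSIONS):
    """Replace whole-word abbreviations before the text reaches the parser."""
    # Longest keys first so a longer abbreviation wins over its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


if __name__ == "__main__":
    print(expand("IED found near veh at unknown loc"))
```

The alternative raised on the slide, mapping the abbreviation straight to a concept in the conceptual model, would skip the text rewrite and instead add the abbreviation to the lexicon as another linguistic expression of that concept.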

11 Conclusions
We should continue with SYNCOIN
  –analysis of intelligence reports is our chosen path
  –fundamentally the same problem (try to communicate concepts via language) so we expect principles to be relevant to chat
We should position chat and reports on a "space"
  –we will find examples of chat, and review whether it is similar to or fundamentally different from more formal reporting, e.g. is it the degree to which information is explicit in the text, or the degree of grammaticality?
  –maybe analysis of chat is a separate transition?
We should compare our work with that of other US groups
  –we believe that the use of CE to facilitate the linguistic processing is different to other work

