1
International Technology Alliance in Network & Information Sciences
Using the English Resource Grammar to extend fact extraction capabilities (v1.1)
David Mott, IBM UK
Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology
Ann Copestake, University of Cambridge
ITA Fall Meeting, October 2013
2
Research Objectives
- Extraction of facts in Controlled English (CE) from Natural Language (NL) documents
  - express the document in a formal but still readable way
  - extracted facts can be used to infer new information
- Facilitate configuration of NL processing tools in CE
  - the human analyst can be more involved in the NL processing
  - a common model of linguistics, grammar, and semantics
- Provide rationale for linguistic and analytic processing
  - the human can better understand and review the reasoning
  - facilitate evaluation of the quality of the reasoning
We are not tasked with creating fundamental breakthroughs in the theory of NL processing.
3
Supporting the analyst
The analyst does not have time to read all the reports.
[Diagram labels: other data; reference data; doc27; NLP; CE facts; inference; rationale; argumentation; query; Analysts Conceptual Model; assumptions; uncertainty; CE tools; requirements; product; linked data web; structured data]
4
Working Scenario
Imagine you are an analyst in a team, being asked to provide high value information about events on the ground.
Based upon reports and background reference material:
- You want to extract basic facts from these reports and to infer new information
- You want to have “new ideas” and implement them quickly without IT involvement
- You want to understand and review the collaborative reasoning of the team, which may contain differing skills
Example report:
    02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
Source: SYNCOIN simulated reports. Graham, Rimland, & Hall (2011). A COIN-inspired Synthetic Dataset for Qualitative Evaluation of Hard and Soft Fusion Systems. Proc. 14th International Conference on Information Fusion, Chicago, IL.
5
The state of the BPP11 research
We are using CE:
- as the target language for expressing facts
- as the shared model of the concepts being expressed
- as the language to configure NL systems
  - detecting structures in phrases
  - mapping language expressions to concepts
- as the way to reveal reasoning performed by a collaborative team
[Diagram labels: Text; Phrase structures; Facts; Generic Semantics; Domain Semantics; Controlled English; Analysts Reasoning; High Value Facts]
6
Motivation for using DELPH-IN linguistics
- Collaborate with the DELPH-IN consortium to extend our NL and fact extraction capabilities
  - ERG is a high-coverage, high-precision English grammar, developed over 20 years
  - MRS is the representation of semantics
  - PET is an efficient parser
- Explore Controlled English as a possible facilitator for the use of DELPH-IN linguistic resources
- Provide an opportunity to research deeper semantic processing
  - contribute to the NL research community
[Diagram labels: Typed Feature Structures; English Resource Grammar, Stanford; Linguistic Knowledge Builder, Cambridge; PET parser; Minimal Recursion Semantics, Cambridge; Japanese, German, Norwegian, Thai, Chinese, Spanish, ...; Translation]
7
Integrating CE and the ERG
- Use ERG (and PET) to parse sentences and provide phrase structures
- Use MRS to express generic semantics
- Represent domain semantics in MRS, by extending generic semantics
- Research into the integration of domain semantics and linguistic processing
[Diagram labels: Text; Phrase structures; Facts; Generic Semantics; Domain Semantics; Controlled English; Analyst’s Reasoning; High Value Facts; ERG; MRS; ?]
8
Raw ERG system output
[Figure: PARSE TREE (syntax) and MRS (semantics)]
We will turn this into CE.
9
Defining the ERG lexicon in CE
Transformation between the ERG structures (Typed Feature Structures) and CE.
ERG lexical entry:
    checkpoint_n1 := n_-_c_le &
      [ ORTH < "checkpoint" >,
        SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel",
                 PHON.ONSET con ] ].
The same entry in CE (is this easier to understand?):
    there is a count noun named checkpoint_n1 that
      is written as the word |checkpoint| and
      is a form of the noun sense ‘_checkpoint_n_1_rel’.
Mapping between generic semantics and specific semantics (the user has to define this link):
    the noun sense ‘_checkpoint_n_1_rel’ expresses the entity concept ‘checkpoint’.
    the noun sense ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.
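The lexicon transformation can be illustrated with a minimal Python sketch that renders the two kinds of CE sentence shown on this slide. The `LexicalEntry` class, its field names, and both functions are hypothetical names introduced here for illustration; the slides do not describe how the actual transformation is implemented, and this sketch does not parse TDL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical container for the fields of an ERG lexical entry used on the slide."""
    name: str         # entry identifier, e.g. "checkpoint_n1"
    ce_category: str  # CE concept chosen for the lexical type, e.g. "count noun"
    orth: str         # written form of the word
    pred: str         # MRS predicate (the noun sense)


def lexical_entry_to_ce(entry: LexicalEntry) -> str:
    """Render the entry as a CE sentence in the pattern shown on the slide."""
    return (
        f"there is a {entry.ce_category} named {entry.name} that "
        f"is written as the word |{entry.orth}| and "
        f"is a form of the noun sense '{entry.pred}'."
    )


def sense_link_to_ce(pred: str, concept: str) -> str:
    """Render the user-defined link between a noun sense and a domain concept."""
    return f"the noun sense '{pred}' expresses the entity concept '{concept}'."


entry = LexicalEntry("checkpoint_n1", "count noun", "checkpoint", "_checkpoint_n_1_rel")
print(lexical_entry_to_ce(entry))
print(sense_link_to_ce(entry.pred, "checkpoint"))
```

Keeping the sense-to-concept link as a separate sentence mirrors the slide: the lexical entry is derived from the ERG, while the link to the domain concept is supplied by the user.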
10
Defining ERG grammar rules in CE
Subcomponents of the phrase are the “head daughter” followed by the “non-head daughter”.
ERG rule:
    basic_head_initial := basic_binary_headed_phrase &
      [ HD-DTR #head,
        NH-DTR #non-head,
        ARGS < #head, #non-head > ].
The same rule in CE:
    there is a linguistic frame named f1 that
      defines the basic-head-initial PH and
      has the sequence ( the sign A0, and the sign A1 ) as subcomponents and
      has the statement that
        ( the basic-head-initial PH has the sign A0 as HD-DTR and has the sign A1 as NH-DTR )
        as semantics.
[Diagram: a basic-head-initial has ARGS (a list whose 0TH and 1ST elements are each a sign), HD-DTR (a thing) and NH-DTR (a thing)]
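The constraint expressed by this frame, that the first subcomponent is the head daughter and the second is the non-head daughter, can be sketched in Python as below. The `Sign` and `BasicHeadInitial` classes are hypothetical illustrations, not part of the ERG, PET or the CE tooling, and the phrase "in Bayaa" is used only as a convenient head-initial example taken from the report text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sign:
    """Hypothetical stand-in for a linguistic sign (a word or a phrase)."""
    label: str


@dataclass(frozen=True)
class BasicHeadInitial:
    """A binary phrase whose subcomponents are ( A0, A1 ) in that order."""
    args: Tuple[Sign, Sign]

    @property
    def hd_dtr(self) -> Sign:
        # HD-DTR is the same object as the first subcomponent (A0).
        return self.args[0]

    @property
    def nh_dtr(self) -> Sign:
        # NH-DTR is the same object as the second subcomponent (A1).
        return self.args[1]


phrase = BasicHeadInitial((Sign("in"), Sign("Bayaa")))
print(phrase.hd_dtr.label, "|", phrase.nh_dtr.label)
```

Deriving the daughters from the sequence, rather than storing them separately, is one way to keep the two views of the phrase from drifting apart.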
11
Three stage approach to defining MRS in CE
1. Generate raw representation of:
   - elementary predications (EPs) as objects with predicate and arguments
   - scope information between EPs
   - features of the entities involved
2. Extract intermediate, but generic, concepts describing the raw MRS:
   - patterns of quantification
   - …
3. Transform into domain specific CE concepts using links between the predicate and the CE concept.
   - …
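Stage 1 can be pictured with a small Python sketch that holds an elementary predication as an object with a predicate and arguments and renders it in the raw CE sentence form used elsewhere in this deck. The `ElementaryPredication` class and `ep_to_raw_ce` function are hypothetical names introduced for illustration; scope information and entity features are left out.

```python
from dataclasses import dataclass
from typing import Tuple

ORDINALS = ("zeroth", "first", "second", "third")


@dataclass(frozen=True)
class ElementaryPredication:
    """Hypothetical object for one EP: an identifier, a predicate and its arguments."""
    ep_id: str
    predicate: str
    args: Tuple[str, ...]  # argument variables, zeroth argument first


def ep_to_raw_ce(ep: ElementaryPredication) -> str:
    """Render one EP as a raw CE sentence (stage 1 only, no interpretation)."""
    clauses = [f"is an instance of the mrs predicate '{ep.predicate}'"]
    for position, variable in enumerate(ep.args):
        clauses.append(f"has the thing {variable} as {ORDINALS[position]} argument")
    return f"the mrs elementary predication #{ep.ep_id} " + " and ".join(clauses) + "."


ep = ElementaryPredication("ep7_5", "_carpet_n_1_rel", ("x9_8",))
print(ep_to_raw_ce(ep))
```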
12
Step 1 - CE version of raw MRS
- x5 – “I”
- x9 – “new carpet”
- x5 “needs” x9
Still needs to be turned into more understandable concepts …
13
3 Steps to Domain Semantics
Stages: Raw, Intermediate, Domain
    if ( there is an indefinite quantification Q that is on the thing T and has the mrs predicate MRS as sense ) and
       ( the mrs predicate MRS expresses the entity concept EC )
    then ( the thing T is an EC ).
    the mrs elementary predication #ep7_3 is an instance of the mrs predicate ‘_udef_q_rel’ and has the thing x9_8 as zeroth argument.
    there is an indefinite quantification named q2 that is on the thing x9_8 and has the mrs predicate ‘_carpet_n_1_rel’ as sense.
    the mrs elementary predication #ep7_5 is an instance of the mrs predicate ‘_carpet_n_1_rel’ and has the thing x9_8 as zeroth argument.
    the mrs predicate ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.
    the thing x9_8 is a carpet.
    the mrs elementary predication #ep7_3 equals modulo quantifiers the mrs elementary predication #ep7_5.
[Slide annotation: rule to detect quantifier pattern in MRS]
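The if/then rule on this slide can be sketched in Python as follows. This is only an illustration under simplifying assumptions: indefinite quantifications are given as (thing, predicate) pairs, the "expresses" links as a dictionary, and the function name `infer_entity_types` is hypothetical. It is not the CE reasoner itself.

```python
from typing import Dict, Iterable, List, Tuple


def infer_entity_types(
    quantifications: Iterable[Tuple[str, str]],
    expresses: Dict[str, str],
) -> List[str]:
    """Apply the rule: if an indefinite quantification is on thing T with
    predicate MRS as sense, and MRS expresses entity concept EC, then T is an EC.

    quantifications: (thing, mrs predicate) pairs, one per indefinite quantification
    expresses: mapping from mrs predicate to entity concept
    """
    facts = []
    for thing, predicate in quantifications:
        concept = expresses.get(predicate)
        if concept is None:
            continue  # no user-defined link for this predicate, so no domain fact
        article = "an" if concept[0].lower() in "aeiou" else "a"
        facts.append(f"the thing {thing} is {article} {concept}.")
    return facts


print(infer_entity_types([("x9_8", "_carpet_n_1_rel")], {"_carpet_n_1_rel": "carpet"}))
```

A predicate without an "expresses" link yields no domain fact, which matches the earlier point that the user has to define the link between a noun sense and an entity concept.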
14
Facts extracted from example sentence
Report:
    02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
If other reports can add to the information on the man x5_8, then we may know who is requiring new carpets, and could predict future events?
This requires a number of linguistic and domain specific steps.
15
Discussion
- The DELPH-IN community has developed excellent Natural Language capabilities
- We are integrating the “ERG system” and expressing lexicon, grammar rules and semantics in CE
- However, in the ERG system the semantics are not completely separated from the linguistic structures
  - we propose intermediate semantic structures in CE, to bridge the gap between generic and domain semantics
- We are introducing domain semantics to represent facts in CE
  - provides a “target” for the output of the ERG system
  - opportunity to explore how this can affect the parsing of sentences
- Much needs to be done
  - improve integration
  - extend intermediate MRS
  - obtain rationale
  - feedback of semantic reasoning into the parsing mechanisms
  - help with adding and understanding rules
16
Extra
17
Information Flow
[Diagram labels: ERG rules & types; ERG lexicon; PET parser; Text; MRS; CE lexicon; Conceptual model; shallow processing; CE facts; PET parse tree; Parse tree as CE; Stanford Parser; Raw MRS as CE; CE linguistic frames]
Use the same transformation to be consistent.
Red links have been partially implemented.
18
Rationale
“the group of things x10 has the entity concept survey as categorisation.”
The rationale from the elementary predicates is: [figure]
How do we get the rationale FOR the elementary predicates?
- We could follow the parse tree plus the TFS definitions, but we need a link between the parse tree and the MRS, which is so far not available
19
A layered Conceptual Model
- Meta Model. Concepts: Concept, Entity Concept, Relation Concept, Conceptual Model. Relations: belongs to, has as domain.
- Semiotics. Concepts: Thing, Meaning, Symbol. Relations: stands for, expresses.
- General Semantics. Concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container. Relations: has as agent role, is contained in.
- Linguistic. Concepts: Sentence, Phrase, Word, Noun, Fragment, Linguistic Frame. Relations: has as dependent, is parsed from, expresses.
- Analysts Domain Model. Concepts: Place, Person, Village, Communication, IED, Facility, .... Relations: is located in, monitors.
Our Semiotic Triangle, based on [Ogden, C. K. and Richards, I. A. (1923). ]
20
The ERG system architecture
- PET is run under Linux (Debian) in an Oracle VirtualBox image
- A Prolog program provides a web service for parsing sentences and turning the result into CE
- Aiming to integrate with our CE Store
[Diagram labels: sentence; PROLOG web service; PET parser with ERG; parse tree and MRS; PROLOG CE generator; CE parse tree and MRS]
21
Feedback of domain reasoning to the parsing?
- We want the domain to affect the parse, e.g.:
  - creating new lexical entries and grammar rules prior to parsing
- But we also want arbitrary domain reasoning to affect the parse at runtime
- Could this:
  - rule out inconsistent parses
  - provide disambiguations, and dialog context?
[Diagram labels: ERG/PET; DOMAIN REASONER; facts; constraints on linguistic phenomena; ERG; DOMAIN MODEL; lexical entries, grammar rules]
22
Linking text to domain situations
23
Working out the “requirer”
This can only be done by analysis of the communications as a whole (including anaphoric reference).
Report (annotated on the slide with STEP C and STEP A):
    02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds.
- Step C needs knowledge of the structure of the report and of communications
- Step A needs linguistic knowledge
24
Example CE rules
DOMAIN RULE:
    if ( the communication C has the agent A as initiator ) and
       ( the agent A is located in the place P )
    then ( the communication C is from the place P ).
LINGUISTIC RULE:
    if ( the mrs elementary predication EP is an instance of the mrs predicate ‘_in_p_rel’ and
         has the thing T as first argument and
         has the thing C as second argument )
    then ( the thing T is contained in the container C ).
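To make the two rules concrete, here is a minimal Python sketch that applies each of them to facts held as (relation, subject, object) tuples. The tuple encoding, the function names, and the identifiers `call1` and `caller1` are assumptions made for illustration; the place name Bayaa comes from the example report. The actual rules are executed by CE tooling, not by code like this.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)


def linguistic_rule(eps: Iterable[Tuple[str, Tuple[str, ...]]]) -> Set[Fact]:
    """For each '_in_p_rel' EP, the first argument is contained in the second.

    eps: (predicate, args) pairs, where args[0] is the zeroth argument.
    """
    facts = set()
    for predicate, args in eps:
        if predicate == "_in_p_rel" and len(args) >= 3:
            facts.add(("is contained in", args[1], args[2]))
    return facts


def domain_rule(facts: Set[Fact]) -> Set[Fact]:
    """If communication C has agent A as initiator and A is located in place P,
    then C is from P."""
    derived = set()
    for relation, communication, agent in facts:
        if relation != "has as initiator":
            continue
        for other_relation, who, place in facts:
            if other_relation == "is located in" and who == agent:
                derived.add(("is from", communication, place))
    return derived


known = {
    ("has as initiator", "call1", "caller1"),  # hypothetical identifiers
    ("is located in", "caller1", "Bayaa"),
}
print(domain_rule(known))
print(linguistic_rule([("_in_p_rel", ("e1", "x3", "x7"))]))
```

Note that the linguistic rule produces "is contained in" while the domain rule consumes "is located in"; the slides do not state how the two relations are connected, so the sketch keeps the rules independent.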
25
Domain Situations
[Diagram. Situations: a requirement, a production, a delivery, a usage. Participants: an agent, the material. Relations: has as material, is requested by, is requested from, is produced by, is delivered by, is delivered to, is performed by, needs]
Are these the same agent?
26
CE representation for parse tree
27
Defining ERG grammar rules in CE
Analysis of the rules for hd_cmp_u_c
    basic_head_initial := basic_binary_headed_phrase &
      [ HD-DTR #head,
        NH-DTR #non-head,
        ARGS < #head, #non-head > ].
Ordered sequence of subcomponents: head daughter followed by non-head daughter.
    headed_phrase := phrase &
      [ SYNSEM.LOCAL [ CAT [ HEAD head & #head,
                             HC-LEX #hclex ],
                       AGR #agr,
                       CONJ #conj ],
        HD-DTR.SYNSEM.LOCAL local &
                      [ CAT [ HEAD #head,
                              HC-LEX #hclex ],
                        AGR #agr,
                        CONJ #conj ] ].
Some info is passed up from the head daughter to “this” phrase.
29
Calling ERG system from Word