
1 Extracting Rich Knowledge from Text. John D. Prange, President. 410-964-0179, john.prange@languagecomputer.com, www.languagecomputer.com

2 Our Company  Language Computer Corporation (LCC) –Human Language Understanding Research and Development –Founded 11 years ago in Dallas, Texas; Established a second office in Columbia, MD in mid-2006 –~70 research scientists and engineers –Research funding primarily from DTO, NSF, AFRL, DARPA and several individual Government Agencies –Technology has been transferred to individual Government Organizations, Defense contractors and more recently to Commercial Customers

3 Outline of Talk Three Lines of Research & Development within LCC that impact Semantic-Level Understanding –Information Extraction  CiceroLite and other Cicero Products –Extracting Rich Knowledge from Text  Polaris: Semantic Parser  XWN KB: Extended WordNet Knowledge Base  Jaguar: Knowledge Extraction from Text  Context and Events: Detection, Recognition & Extraction –Cogex: Reasoning and Inferencing over Extracted Knowledge  Semantic Parsing & Logical Forms  Lexical Chains & On-Demand Axioms  Logic Prover

4 LCC’s Areas of Research  Information Extraction –Given an entire corpus of documents, extract every instance of some particular kind of information  Named Entity Recognition – extraction of entities such as person, location and organization names  Event-based Extraction – extraction of real world events such as bombings, deaths, court cases, etc.

5 CiceroLite & CiceroLite-ML: Named Entity Recognition Systems

6 Two High-Performance NER Systems
CiceroLite:  Accurate and customizable NE recognition for English  Classifies 8 high-frequency NE classes with over 90% precision and recall  Currently extended to detect over 150 different NE classes  A non-deterministic Finite-State Automata (FSA) framework resolves ambiguities in text and performs precise classification
CiceroLite-ML:  Machine learning-based NER for multiple languages  A statistical machine learning-based framework makes for rapid extension to new languages  Currently deployed for Arabic, German, English, and Spanish  Arabic: classifies 18 NE classes with an average of nearly 90% F-measure

7 CiceroLite  Designed specifically for English, CiceroLite categorizes 8 high-frequency NE classes with over 90% precision and recall.  But it is capable of much, much more: as currently deployed, CiceroLite can categorize up to 150 different NE classes.

8 CiceroLite-ML (Arabic)  CiceroLite-ML currently detects a total of 18 different classes of named entities for Arabic, with between 80% and 90% F-measure.

9 Other Cicero Products  CiceroLite-ML (Mandarin Chinese) Similar scope and depth to the Arabic version shown on the previous slide  CiceroCustom User-customizable event extraction system using a variant of supervised learning called “active learning”  TASER (Temporal & Spatial Normalization System) Recognizes 8 different types of time expressions and over 50 types of spatial expressions; normalizes time using ISO 8601; exact Lat/Long for ~8M place names  Under Contractual Development (With Deliveries in 2007) –CiceroRelation Relation detection based upon ACE 2007 specifications –CiceroCoref Entity coreference utilizing CiceroLite NER; to include cross-document entity tracking –CiceroDiscourse Extract discourse structure & topic semantics

10 LCC’s Areas of Research  Extracting Rich Knowledge From Text –Explicit knowledge –Implicit knowledge: implicatures, humor, sarcasm, deceptions, etc. –Other textual phenomena: negation, modality, quantification, coreference resolution (Representation layers: Lexical Level & Syntax; Semantic Relations; Contexts; Events & Event Properties; Event Relations; Meta-Events)

11 Extracting Rich Knowledge from Text Innovations –A rich and flexible representation of textual semantics –Extract concepts, semantic relations between concepts, and rich event structures –Extract event properties; extend events using event relations –Handle textual phenomena such as negation and modality –Mark implicit knowledge and capture the meaning it suggests whenever possible

12 Four-Layered Representation  Syntax Representation –Syntactically link words in sentences; Apply Word Sense Disambiguation (WSD)  Semantic Relations –Provide deeper semantic understanding of relations between words  Context Representation –Place boundaries around knowledge that is not universal  Event Representation –Detect events, extract their properties, extend using event relations

13 Hierarchical Representation
Input Text: Gilda Flores’s kidnapping occurred on January 13, 1990. A week before, he had fired the kidnappers.
Lexical Level & Syntax: Gilda_Flores_NN(x1) & _human_NE(x1) & _s_POS(x1,x2) & kidnapping_NN(x2) & occur_VB(e1,x2,x3) & on_IN(e1,x4) & _date_NE(x4) & time_TMP(BeginFn(x4),1990,1,13,0,0,0) & time_TMP(EndFn(x4),1990,1,13,23,59,59) | he_PRP(x1) & fire_VB(e3,x1,x5) & kidnapper_NN(x5) & _date_NE(x6) & time_TMP(BeginFn(x6),1990,1,6,0,0,0) & time_TMP(EndFn(x6),1990,1,6,23,59,59)
Semantic Relations: THM_SR(x1,x2) & AGT_SR(x2,e1) & TMP_SR(x4,e1) | AGT_SR(x1,e3) & THM_SR(x5,e3) & TMP_SR(x6,e3)
Contexts: during_TMP(e1,x4) | during_TMP(e3,x6)
Events & Event Properties: event(e2,x2) & THM_EV(x1,e2) & TMP_EV(x4,e2) | event(e4,e3) & AGT_EV(x5,e2) & AGT_EV(x1,e4) & THM_EV(x5,e4) & TMP_SR(x6,e4)
Event Relations: CAUSE_EV(e4,e2), earlier_TMP(e4,e2)
Meta-Events: REVENGE
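
To make the four-layer output concrete, here is a minimal Python sketch of one way such a representation could be held in memory. The class and field names are illustrative assumptions (not LCC's actual data model); the literal strings are copied from the example above.

    from dataclasses import dataclass, field

    @dataclass
    class LayeredRepresentation:
        """Illustrative container for the layered analysis (not LCC's actual schema)."""
        text: str
        lexical_syntax: list        # parse + WSD predicates, e.g. "kidnapping_NN(x2)"
        semantic_relations: list    # e.g. "THM_SR(x1,x2)"
        contexts: list              # e.g. "during_TMP(e1,x4)"
        events: list                # e.g. "event(e2,x2)"
        event_relations: list = field(default_factory=list)   # e.g. "CAUSE_EV(e4,e2)"
        meta_events: list = field(default_factory=list)       # e.g. "REVENGE"

    example = LayeredRepresentation(
        text="Gilda Flores's kidnapping occurred on January 13, 1990. "
             "A week before, he had fired the kidnappers.",
        lexical_syntax=["Gilda_Flores_NN(x1)", "kidnapping_NN(x2)", "occur_VB(e1,x2,x3)"],
        semantic_relations=["THM_SR(x1,x2)", "AGT_SR(x2,e1)", "TMP_SR(x4,e1)"],
        contexts=["during_TMP(e1,x4)", "during_TMP(e3,x6)"],
        events=["event(e2,x2)", "THM_EV(x1,e2)", "TMP_EV(x4,e2)"],
        event_relations=["CAUSE_EV(e4,e2)", "earlier_TMP(e4,e2)"],
        meta_events=["REVENGE"],
    )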

14 Polaris: Semantic Parser

15 Polaris Semantic Relations (#, semantic relation, abbreviation):
1 POSSESSION (POS); 2 KINSHIP (KIN); 3 PROPERTY-ATTRIBUTE HOLDER (PAH); 4 AGENT (AGT); 5 TEMPORAL (TMP); 6 DEPICTION (DPC); 7 PART-WHOLE (PW); 8 HYPONYMY (ISA); 9 ENTAIL (ENT); 10 CAUSE (CAU);
11 MAKE-PRODUCE (MAK); 12 INSTRUMENT (INS); 13 LOCATION-SPACE (LOC); 14 PURPOSE (PRP); 15 SOURCE-FROM (SRC); 16 TOPIC (TPC); 17 MANNER (MNR); 18 MEANS (MNS); 19 ACCOMPANIMENT-COMPANION (ACC); 20 EXPERIENCER (EXP);
21 RECIPIENT (REC); 22 FREQUENCY (FRQ); 23 INFLUENCE (IFL); 24 ASSOCIATED-WITH / OTHER (OTH); 25 MEASURE (MEA); 26 SYNONYMY-NAME (SYN); 27 ANTONYMY (ANT); 28 PROBABILITY-OF-EXISTENCE (PRB); 29 POSSIBILITY (PSB); 30 CERTAINTY (CRT);
31 THEME-PATIENT (THM); 32 RESULT (RSL); 33 STIMULUS (STI); 34 EXTENT (EXT); 35 PREDICATE (PRD); 36 BELIEF (BLF); 37 GOAL (GOL); 38 MEANING (MNG); 39 JUSTIFICATION (JST); 40 EXPLANATION (EXN)

16 Propbank vs. Polaris Relations (question type: Propbank relations vs. Polaris relations)
Who? Propbank: AGENT, PATIENT, RECIPROCAL, BENEFICIARY. Polaris: AGENT, EXPERIENCER, THEME, POSSESSION, RECIPIENT, KINSHIP, ACCOMPANIMENT-COMPANION, MAKE-PRODUCE, SYNONYMY, BELIEF
What? Propbank: AGENT, THEME, TOPIC. Polaris: AGENT, THEME, TOPIC, POSSESSION, STIMULUS, MAKE-PRODUCE, HYPONYMY, RESULT, BELIEF, PART-WHOLE, …
Where? Propbank: LOCATION, DIRECTION. Polaris: LOCATION, SOURCE-FROM, PART-WHOLE
When? Propbank: TEMPORAL, CONDITION. Polaris: TEMPORAL, FREQUENCY
Why? Propbank: PURPOSE, CAUSE, PURPOSE-NOT-CAUSE. Polaris: PURPOSE, CAUSE, INFLUENCE, JUSTIFICATION, GOAL, RESULT, MEANING, EXPLANATION, …
How? Propbank: MANNER, INSTRUMENT. Polaris: MANNER, INSTRUMENT, MEANS, …
How much? Propbank: EXTENT, DEGREE. Polaris: EXTENT, MEASURE
Possible? Propbank: CONDITIONAL (?). Polaris: POSSIBILITY, CERTAINTY, PROBABILITY
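
As a hedged illustration of how a QA system might consult this kind of mapping to decide which relations could hold an answer, here is a small Python sketch. The dictionary mirrors the Polaris column of the table (using the abbreviations from slide 15); the lookup function itself is an assumption, not LCC code.

    # Which Polaris relations can answer a given question type (mirrors the table above).
    EXPECTED_POLARIS_RELATIONS = {
        "who":      ["AGT", "EXP", "THM", "POS", "REC", "KIN", "ACC", "MAK", "SYN", "BLF"],
        "what":     ["AGT", "THM", "TPC", "POS", "STI", "MAK", "ISA", "RSL", "BLF", "PW"],
        "where":    ["LOC", "SRC", "PW"],
        "when":     ["TMP", "FRQ"],
        "why":      ["PRP", "CAU", "IFL", "JST", "GOL", "RSL", "MNG", "EXN"],
        "how much": ["EXT", "MEA"],
        "how":      ["MNR", "INS", "MNS"],
        "possible": ["PSB", "CRT", "PRB"],
    }

    def candidate_relations(question: str) -> list:
        """Return the Polaris relation types that could hold the answer (sketch only)."""
        q = question.lower()
        # Check longer question-type prefixes first so "how much" is not matched as "how".
        for qtype in sorted(EXPECTED_POLARIS_RELATIONS, key=len, reverse=True):
            if q.startswith(qtype):
                return EXPECTED_POLARIS_RELATIONS[qtype]
        return []

    print(candidate_relations("Where was the trial held?"))  # ['LOC', 'SRC', 'PW']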

17 Example: Polaris on Treebank
Sentence: We're talking about years ago before anyone heard of asbestos having any questionable properties.
Treebank relations (hand tagged): TMP(talking, years ago before anyone heard of asbestos having any questionable properties)
Propbank relations (hand tagged): AGT(hear, anyone); THM(hear, asbestos having any questionable properties); AGT(talking, we); THM(talking, years ago before anyone heard of asbestos having any questionable properties)
Polaris relations (automatically generated from the Treebank tree): AGT(talking, We); TPC(talking, about years ago before anyone heard of asbestos having any questionable properties); EXP(heard, anyone); STI(heard, of asbestos having any questionable properties); AGT(having, asbestos); THM(having, any questionable properties); PW(asbestos, any questionable properties); PAH(properties, questionable)

18 XWN KB: Extended WordNet Knowledge Base

19 XWN Knowledge Base (1/2)  WordNet® - free from Princeton University –A large lexical database of English, developed by Professor George Miller, Princeton Univ; now under the direction of Christiane Fellbaum. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.  eXtended WordNet - free from UTD –Glosses: parsed; word sense disambiguated; transformed into logic forms  XWN Knowledge Base - done at LCC –Glosses: converted into semantic relations (using the Polaris semantic parser) –Represented in a Knowledge Base  Reasoning tool  Axiom generator  Lexical chain facilitator

20 XWN Knowledge Base (2/2)  Summary: The rich definitional glosses from WordNet are processed through LCC’s Knowledge Acquisition System (Jaguar) to produce a semantically rich upper ontology  The Clusters: Noun glosses are transformed into sets of semantic relations, which are then arranged into individual semantic units called clusters, with one cluster per gloss  The Hierarchy: The clusters (representing one noun synset each) are arranged in a hierarchy similar to that of WordNet  The Knowledge Base: The generated KB has not only the hierarchy of WordNet, but also a rich semantic representation of each entry in the hierarchy (based on the definitional gloss)

21 Example: WordNet Gloss  Tennis is a game played with rackets by two or four players who hit a ball back and forth over a net that divides the court  ISA (Tennis, game)  AGT (two or four players, play)  THM (game, play)  INS (rackets, play)  MEA (two or four, players)  AGT (two or four players, hit)  THM (a ball, hit)  MNR (back and forth, hit)  LOC (over a net that divides the court, hit)  AGT (a net, divides)  THM (the court, divides)

22 Semantic Cluster of a WordNet Gloss (Synset ID: 00457626; Name: tennis, lawn_tennis)
ISA(tennis, game); MEA(two or four, player); AGT(player, play); THM(game, play); INS(racket, play); AGT(player, hit); THM(ball, hit); MNR(back and forth, hit); LOC(over a net, hit); AGT(net, divide); THM(court, divide)
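
A minimal sketch of how one such gloss cluster could be stored, using plain Python tuples. The structure is an assumption made for illustration; the triples are the ones listed above.

    # One cluster per gloss: the synset plus the semantic-relation triples
    # extracted from its definition. Illustrative structure only.
    tennis_cluster = {
        "synset_id": "00457626",
        "name": "tennis, lawn_tennis",
        "isa": "game",
        "relations": [  # (relation, filler, predicate), as in the gloss decomposition
            ("MEA", "two or four", "player"),
            ("AGT", "player", "play"),
            ("THM", "game", "play"),
            ("INS", "racket", "play"),
            ("AGT", "player", "hit"),
            ("THM", "ball", "hit"),
            ("MNR", "back and forth", "hit"),
            ("LOC", "over a net", "hit"),
            ("AGT", "net", "divide"),
            ("THM", "court", "divide"),
        ],
    }

    # Example query: everything a 'player' does in this cluster.
    player_actions = [pred for rel, filler, pred in tennis_cluster["relations"]
                      if rel == "AGT" and filler == "player"]
    print(player_actions)  # ['play', 'hit']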

23 Hierarchy (as in WordNet): game → athletic game → court game → tennis, basketball, squash; game → outdoor game → golf, croquet

24 Jaguar: Knowledge Extraction From Text

25 Jaguar: Knowledge Extraction  Automatically generate ontologies and structured knowledge bases from text –Ontologies form the framework or “skeleton” of the knowledge base –Rich set of semantic relations form the “muscle” that connects concepts in the knowledge base (Diagram: the ontology side shows “train” with IS-A children “passenger train” and “freight train”; the semantic-relations side shows “transport” with AGENT “X Corp.”, THEME “products”, and MEANS “freight train”)

26 Jaguar: Knowledge Extraction  Automatically generate ontologies and structured knowledge bases from text –Ontologies form the framework or “skeleton” of the knowledge base –Rich set of semantic relations form the “muscle” that connects concepts in the knowledge base (Diagram: the joined result, in which the “train” IS-A hierarchy with “passenger train” and “freight train” is connected by AGENT, THEME and MEANS relations to verbs such as carry, conduct, board, ship, transport, arrive, run and stop)

27 Automatically Building the Ontology Jaguar builds an ontology using the following steps (a schematic sketch of this loop appears below)  Seed words selected either manually or automatically  Find sentences in the input documents that contain seed words  Parse those sentences and extract semantic relations, focusing on selected relations such as IS-A, Part-Whole, Kinship, Locative and Temporal  Integrate the selected semantic relations into the ontology being produced  Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology  If desired, revisit the unprocessed sentences to see if they contain concepts related to the seed words through other semantic relations  Finally, use the hyponymy information found in Extended WordNet to classify all concepts against one another – detecting and correcting classification errors – building an IS-A hierarchy in the process
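
Below is a minimal Python sketch of the loop just described. The helper functions are hypothetical stand-ins for the semantic parser, compound-noun discovery, and the Extended WordNet classification step; the sketch shows the control flow, not Jaguar's actual implementation.

    # Schematic sketch of Jaguar's ontology-building loop (illustrative only).
    STRUCTURAL_RELATIONS = {"ISA", "PW", "KIN", "LOC", "TMP"}

    def extract_relations(sentence):
        """Placeholder for the semantic parser; would return (relation, arg1, arg2) triples."""
        return []

    def find_compound_nouns(sentence):
        """Placeholder for compound-noun discovery, e.g. 'SCUD missile'."""
        return []

    def build_ontology(sentences, seed_words):
        concepts, edges, skipped = set(seed_words), [], []
        for sentence in sentences:
            if not any(seed in sentence for seed in seed_words):
                skipped.append(sentence)          # revisit later if desired
                continue
            for rel, a1, a2 in extract_relations(sentence):
                if rel in STRUCTURAL_RELATIONS:   # IS-A, Part-Whole, Kinship, Locative, Temporal
                    concepts.update({a1, a2})
                    edges.append((rel, a1, a2))
            concepts.update(find_compound_nouns(sentence))
        for sentence in skipped:                  # optional second pass over remaining sentences
            for rel, a1, a2 in extract_relations(sentence):
                if a1 in concepts or a2 in concepts:
                    edges.append((rel, a1, a2))
        # A final step would classify all concepts against Extended WordNet hyponymy
        # to build and error-check the IS-A hierarchy.
        return {"concepts": concepts, "edges": edges}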

28 Result: Jaguar Knowledge Base (screenshot of an automatically built knowledge base fragment relating “anthrax” and “biological weapon”)

29 Context & Events: Detection, Classification & Extraction

30 Types of Context  Temporal –It rained on July 7th  Spatial –It rained in Dallas  Report –John said “It rains”  Belief –John thinks that it rains  Volitional –John wants it to rain  Planning –It is scheduled to rain  Conditional –If it’s cloudy, it will rain  Possibility –It might rain

31 Events in Text  Basic Definition: –X is an Event, if X is a possible answer to the question:  What happened?  Applying Definition to Verbs and Nouns –Verb V is an Event if the sentence:  Someone/something V-ed (someone/something)  is an answer to the question “What happened”? –Noun N is an Event if the sentence:  There was/were (a/an) N  is an answer to the question “What happened”?

32 Events in Text  Most Adjectives are not potential Events –Verbal “adjectives” are treated as verbs, e.g. “lost”, “admired”  Factatives (“Light” Verbs) are not separate events –Suffer a Loss; Take a Test; Perform an Operation  Aspectual Markers Can Combine with a Wide Range of Events –e.g., Stop, Completion, Start, Continue, Fail, Succeed, Try  Modalities are not separate events –Possibility, Necessity, Prescription, Suggestion, Optative

33 Event Detection  Approach for Event Detection –Annotate WordNet synsets that are Event concepts  Annotation completed for Noun and Verb hierarchies –Detect events by lexical lookup for concepts in annotated WordNet  Project Status –Prototype implemented for Event detection –Run Benchmarks  Precision: 93%, Recall: 79% –Currently Tuning Performance
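
A minimal sketch of event detection by lexical lookup, assuming NLTK's WordNet interface as a stand-in for LCC's WordNet access and a toy set of synsets marked as Events (the real annotation covers the full noun and verb hierarchies):

    from nltk.corpus import wordnet as wn

    # Tiny stand-in for the annotated Event portion of the WordNet hierarchies (assumption).
    EVENT_SYNSETS = {
        wn.synset('kidnapping.n.01'),
        wn.synset('murder.n.01'),
        wn.synset('explosion.n.01'),
    }

    def is_event_mention(word, pos=wn.NOUN):
        """Lexical lookup: does any synset of `word` fall in the annotated Event set?"""
        return any(s in EVENT_SYNSETS for s in wn.synsets(word, pos=pos))

    print(is_event_mention("kidnapping"))  # True with the sample annotation
    print(is_event_mention("table"))       # False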

34 Event Extraction – Future  Event Structures for Modelling Discourse –Aspect (Start, Complete, Continue, Succeed, Fail, Try) –Modality (Possibility, Necessity, Optativity) –Event Participants (Actors, Undergoers, Instruments) –Context (Spatial, Temporal, Intensional)  Event Relations (Causation, Partonomy, Similarity, Contrast ) –Event Taxonomy/Classification –Event Composition

35 LCC’s Areas of Research  Cogex: Reasoning & Inferencing over Extracted Knowledge (Architecture diagram: the Question/Text and Answer/Hypothesis pair is converted by the Semantic Parser into Logic Forms; Axiom Building supplies lexical chains, the XWN KBase, Semantic Calculus, world knowledge axioms, linguistic axioms, temporal axioms and context; the Logic Prover, with Relaxation and Answer/Entailment Ranking, produces the Answer or Entailment together with an NL Justification)

36 Reasoning & Inferences: Example Tasks that Require Both

37 TREC Question Answering Track  The TREC Question Answering Track has been held annually since its inception at TREC-8 (1999)  Main Task, TREC-2006 QA Track –AQUAINT Corpus of English News Text  http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31  Newswire text data in English, drawn from three sources:  Xinhua News Service (People's Republic of China),  New York Times News Service,  Associated Press Worldstream News Service.  Roughly 3 GBytes of text; over a million documents –Test Set: 75 sets of questions, each organized around a common target, where the target is a Person, Organization, Event or Thing –Each series of questions contains 6-9 questions: 4-7 Factoid, 1-2 List, and 1 Other –Total: 403 Factoid Questions; 89 List Questions; 75 Other Questions

38 TREC-2006 Question Answering Track
145. Target (Event): John Williams convicted of Murder
145.1 Factoid: How many non-white members of the jury were there?
145.2 Factoid: Who was the foreman for the jury?
145.3 Factoid: Where was the trial held?
145.4 Factoid: When was King convicted?
145.5 Factoid: Who was the victim of the murder?
145.6 List: What defense and prosecution attorneys participated in the trial?
145.7 Other

39 Textual Entailment  Textual Entailment –Textual Entailment Recognition is a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi-)document summarization. –Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T  PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) RTE (Recognizing Textual Entailment) Challenge –RTE-1 (2004-05); RTE-2 (2005-06) and RTE-3 (2006-07) –http://www.pascal-network.org/Challenges/RTE/  The Question Answering Task can be interpreted as a Textual Entailment task as follows: –Given a Question Q and a possible Answer Text Passage A, the QA task is then one of applying semantic inference to the pair (Q, A) to infer whether or not A contains the Answer to Q.
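
As a sketch of the reduction described in the last bullet, the following illustrative interface treats a candidate answer passage and a question-derived hypothesis as a (T, H) pair; the entailment function is a stand-in for a full RTE system such as COGEX, not an implementation of it.

    # Illustrative interface only: casting QA as textual entailment.
    def recognizes_entailment(text: str, hypothesis: str) -> bool:
        """Return True if the meaning of `hypothesis` can be inferred from `text`."""
        raise NotImplementedError("stand-in for a full RTE system")

    def answers_question(question: str, passage: str) -> bool:
        # Treat the candidate answer passage as T and the question (rewritten
        # declaratively in practice) as H: the passage answers the question iff T entails H.
        hypothesis = question
        return recognizes_entailment(text=passage, hypothesis=hypothesis)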

40 RTE-2: Example T → H Pairs
Entailment? “Yes”. T: Tibone estimated diamond production at four mines operated by Debswana – Botswana’s 50-50 joint venture with DeBeers – could reach 33 million carats this year. H: Botswana is a business partner of DeBeers.
Entailment? “Yes”. T: The EZLN differs from most revolutionary groups by having stopped military action after the initial uprising in the first two weeks of 1994. H: EZLN is a revolutionary group.
Entailment? “No”. T: Two persons were injured in dynamite attacks perpetrated this evening against two bank branches in this Northwestern Colombian city. H: Two persons perpetrated dynamite attacks in a Northwestern Colombian city.
Entailment? “No”. T: Such a margin of victory would give Abbas a clear mandate to renew peace talks with Israel, rein in militants and reform the corruption-riddled Palestinian Authority. H: The new Palestinian president combated corruption and revived the Palestinian economy.

41 Cogex: Logic Prover

42 Semantically Enhanced COGEX (Architecture diagram, as on slide 35: Q/T and A/H are converted by the Semantic Parser into Logic Forms; Axiom Building draws on lexical chains, the XWN KBase, Semantic Calculus, linguistic, world knowledge, temporal and context axioms; the Logic Prover, with Relaxation and Answer/Entailment Ranking, produces the Answer or Entailment together with an NL Justification)

43 Output of Semantic Parser
Question: What is the Muslim Brotherhood's goal?
The output of the semantic parser: PURPOSE(x, Muslim Brotherhood)
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
The output of the semantic parser: AGENT(Muslim Brotherhood, advocate) PURPOSE(turning Egypt into a strict Muslim state, advocate) TEMPORAL(1928, establish) TEMPORAL(1992, took up arms) PROPERTY(strict, Muslim state) MEANS(political means, turning Egypt into a strict Muslim state) SYNONYMY(Muslim Brotherhood, Egypt's biggest fundamentalist group)

44 Generation of Logical Forms
Question: What is the Muslim Brotherhood's goal?
Question Logical Form (QLF): (exists x0 x1 x2 x3 (Muslim_NN(x0) & Brotherhood_NN(x1) & nn_NNC(x2,x0,x1) & PURPOSE_SR(x3,x2))).
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
Answer Logical Form (ALF): (exists e1 e2 e3 e4 e5 e6 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 (Muslim_NN(x1) & Brotherhood_NN(x2) & nn_NNC(x3,x1,x2) & Egypt_NN(x4) & _s_POS(x5,x4) & biggest_JJ(x5) & fundamentalist_JJ(x5) & group_NN(x5) & SYNONYMY_SR(x3,x5) & establish_VB(e1,x20,x5) & in_IN(e1,x6) & 1928_CD(x6) & TEMPORAL_SR(x6,e1) & advocate_VB(e2,x5,x21) & AGENT_SR(x5,e2) & PURPOSE_SR(e3,e2) & turn_VB(e3,x5,x7) & Egypt_NN(x7) & into_IN(e3,x8) & strict_JJ(x15,x14) & Muslim_NN(x8) & state_NN(x13) & nn_NNC(x14,x8,x13) & PROPERTY_SR(x15,x14) & by_IN(e3,x9) & political_JJ(x9) & means_NN(x9) & MEANS_SR(x9,e3) & set_VB(e5,x5,x5) & itself_PRP(x5) & apart_RB(e5) & from_IN(e5,x10) & militant_JJ(x10) & group_NN(x10) & take_VB(e6,x10,x12) & up_IN(e6,x11) & arms_NN(x11) & in_IN(e6,x12) & 1992_CD(x12) & TEMPORAL_SR(x12,e6))).
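
A minimal sketch of how semantic-relation triples could be flattened into a conjunctive logic form in the style shown above. The variable-assignment scheme is simplistic and purely illustrative; it is not LCC's logic-form generator.

    # Sketch: turn (RELATION, argument, predicate) triples into a conjunctive logic form string.
    def to_logic_form(relations):
        var_of = {}
        def var(term):
            if term not in var_of:
                var_of[term] = f"x{len(var_of)}"   # assign fresh variables in order of appearance
            return var_of[term]
        literals = [f"{rel}_SR({var(arg)},{var(pred)})" for rel, arg, pred in relations]
        bound = " ".join(var_of.values())
        return f"(exists {bound} ({' & '.join(literals)}))."

    print(to_logic_form([
        ("AGENT", "Muslim Brotherhood", "advocate"),
        ("PURPOSE", "turning Egypt into a strict Muslim state", "advocate"),
    ]))
    # (exists x0 x1 x2 (AGENT_SR(x0,x1) & PURPOSE_SR(x2,x1))).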

45 Lexical Chains & Axioms: On Demand Input into Cogex

46 Lexical Chains from XWN  Lexical chains –Lexical Chains establish connections between semantically related concepts, i.e. WordNet synsets (note: concepts, not words, which means Word Sense Disambiguation is necessary) –Concepts and relations along the lexical chain explain the semantic connectivity of the end concepts –Lexical chains start by using WordNet relations (ISA, Part-Whole) and gloss co-occurrence (weak relation) –The XWN Knowledge Base then adds more meaningful (precise) relations  “Tennis: a game played with rackets by two or four players…”  Prior to XWN-KB: ‘tennis’ → ‘two or four’ (gloss co-occurrence)  With XWN-KB: ‘tennis’ → ‘game’ → ‘play’ → ‘player’ → ‘two or four’, linked by the ISA, AGT, THM and MEA relations from the gloss
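
A minimal sketch of a lexical-chain search over WordNet relations, using NLTK as a stand-in for LCC's WordNet access. The XWN gloss relations themselves are not available in NLTK, so this only covers the weaker WordNet part of the chain; the breadth-first strategy is an assumption.

    from collections import deque
    from nltk.corpus import wordnet as wn

    def neighbors(synset):
        """Synsets reachable in one step via ISA and Part-Whole links."""
        return (synset.hypernyms() + synset.hyponyms()
                + synset.part_meronyms() + synset.part_holonyms())

    def lexical_chain(src, dst, max_depth=4):
        """Breadth-first search for a chain of synsets connecting src to dst."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            if len(path) > max_depth:
                continue
            for nxt in neighbors(path[-1]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    chain = lexical_chain(wn.synset('tennis.n.01'), wn.synset('game.n.01'))
    print(chain)  # a short hypernym chain from tennis up toward game, if one exists within the depth bound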

47 Examples of Lexical Chains
Question: How were biological agents acquired by bin Laden?
Answer: On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in
Lexical Chain: ( V - buy#1, purchase#1 ) – HYPERNYM → ( V - get#1, acquire#1 )
Question: How did Adolf Hitler die?
Answer: … Adolf Hitler committed suicide …
Lexical Chain: ( N - suicide#1, self-destruction#1, self-annihilation#1 ) – GLOSS → ( V - kill#1 ) – GLOSS → ( V - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )

48 Propagating syntactic structures along the chain The goal is to filter out unacceptable chains, and to improve the ranking of chains when multiple chains can be established
Example 1: Q: Who did Floyd Patterson (AGENT) beat to win the title? WA: He saw Ingemar Johanson knock down Floyd Patterson (PATIENT) seven times there in winning the title. Chain: V - beat#2 – entail → V - hit#4 – derivation → N - hitting#1, striking#2 – derivation → V - strike#2 – hyponym → V - knock-down#2
Example 2: S1: John (AGENT) bought a cowboy hat (THEME) for $50 (MEASURE). S2: John (AGENT) paid $50 (MEASURE) for a cowboy hat (THEME). Chain: V - buy#1 – entail → V - pay#1

49 Axioms on Demand (1/3)  Extract world knowledge, in the form of axioms, from text or other resources automatically and “on demand” –When the logic prover runs out of rules to use, it can request one from external knowledge sources  Will ask for a rule connecting two concepts –Generate axioms on the fly from multiple knowledge sources  WordNet and eXtended WordNet: glosses and lexical chains  Instantiation of NLP rules  Open text from a trusted source (dictionary, encyclopedia, textbook on a relevant topic, etc.)  An automatically-built knowledge base
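
A small sketch of the "on demand" control flow described above: when the prover cannot connect two concepts, it asks registered knowledge sources for a bridging axiom. The class and method names are assumptions made for illustration, not COGEX's interfaces.

    # Illustrative only: asking external sources for an axiom linking two concepts
    # when the prover's own rule set is exhausted.
    class OnDemandAxioms:
        def __init__(self, sources):
            self.sources = sources          # e.g. [xwn_gloss_source, nlp_rule_source, open_text_source]

        def request(self, concept_a, concept_b):
            """Ask each knowledge source, in order, for axioms connecting the two concepts."""
            for source in self.sources:
                axioms = source.axioms_for(concept_a, concept_b)
                if axioms:
                    return axioms
            return []                       # nothing found; the prover may then fall back to relaxation

    class StaticSource:
        """Toy source backed by a fixed table (stand-in for gloss-derived axioms)."""
        def __init__(self, table):
            self.table = table
        def axioms_for(self, a, b):
            return self.table.get((a, b), [])

    source = StaticSource({("kidnap_VB", "kidnapper_NN"): ["kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)"]})
    broker = OnDemandAxioms([source])
    print(broker.request("kidnap_VB", "kidnapper_NN"))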

50 Axioms on Demand (2/3)  eXtended WordNet axiom generator –Question: What all can a ‘player’ do?  Look at all contexts with ‘player’ as AGT  Gloss of ‘tennis’: a ‘player’ can ‘hit’ (a ball), ‘play’ (a game)  Gloss of ‘squash’: a ‘player’ can ‘strike’ (a ball), etc. –Connect related concepts  kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)  asian_JJ(x1,x2) -> asia_NN(x1) & _continent_NE(x1)  World Knowledge axioms –WordNet glosses –jungle_cat_NN(x1) -> small_JJ(x2,x1) & Asiatic_JJ(x3,x1) & wildcat_NN(x1)  NLP axioms –Linguistic rewriting rules –Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) -> Flores_NN(x3)

51 Axioms on Demand (3/3)  Semantic Relation Calculus –Combine two or more local semantic relations to establish broader semantic relations –Increase the semantic connectivity –Mike is a rich man → Mike is rich  ISA_SR(Mike,man) & PAH_SR(man,rich) -> PAH_SR(Mike,rich) –John lives in Dallas, Texas → John lives in Texas  LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)  Temporal Axioms –Time Transitivity of Events  during_CTMP(e1,e2) & during_CTMP(e2,e3) -> during_CTMP(e1,e3) –Dates entail more general times  October 2000 → year 2000
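
A minimal sketch of the semantic-relation calculus as a rule table over relation pairs. The three composition rules shown are taken from the examples above; the code around them is an illustrative assumption, not LCC's calculus.

    # Composing two local relations that share a middle argument into a broader relation.
    COMPOSITION_RULES = {
        ("ISA", "PAH"): "PAH",          # ISA(Mike, man) & PAH(man, rich)       -> PAH(Mike, rich)
        ("LOC", "PW"):  "LOC",          # LOC(John, Dallas) & PW(Dallas, Texas) -> LOC(John, Texas)
        ("during_CTMP", "during_CTMP"): "during_CTMP",   # temporal transitivity
    }

    def compose(fact1, fact2):
        """fact = (relation, arg1, arg2); returns a derived fact or None."""
        (r1, a, b1), (r2, b2, c) = fact1, fact2
        if b1 == b2 and (r1, r2) in COMPOSITION_RULES:
            return (COMPOSITION_RULES[(r1, r2)], a, c)
        return None

    print(compose(("LOC", "John", "Dallas"), ("PW", "Dallas", "Texas")))
    # ('LOC', 'John', 'Texas')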

52 Contextual Knowledge Axioms Examples  If someone boards a plane and the flight takes 3 hours, then that person travels for 3 hours  The person leaves at the same time and arrives at the same time as the traveling plane  If the departure of a vehicle has a destination and the vehicle arrives at the destination, then the arrival is located at the destination  If something is exactly located somewhere, then nothing else is exactly located in the same place  If a Process is located in an area, then all sub-Processes of the Process are located in the same area

53 Logic Prover: The Heart of Cogex

54 Logic Prover (1/2)  A first order logic resolution style theorem prover  Inference rule sets are based on hyperresolution and paramodulation  Transform the two text fragments into 4-layered logic forms based upon LCC’s Syntactic, Semantic, Contextual and Event Processing and Analysis  Automatically create “Axioms on Demand” to be used during the proof –Lexical Chains axioms –World Knowledge axioms –Linguistic transformation axioms –Contextual / Temporal axioms

55 Logic Prover (2/2)  Load COGEX’s SOS (Set of Support) with the Candidate Answer Passage(s) A and the Question Q, and its USABLE list of clauses with the generated axioms, including the semantic and temporal axioms  Search for a proof by iteratively removing clauses from the SOS and searching the USABLE list for possible inferences until a refutation is found –If no contradiction is detected  Relax arguments  Drop entire predicates from H  Compute a “Proof Score” for each Candidate  Select the best Result & Generate an NL Justification (a schematic of this loop is sketched below)
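
Here is a schematic Python sketch of the prove-then-relax loop described above. The prover search, relaxation steps and penalty values are placeholders and assumptions; this is not COGEX itself.

    # Schematic prove-with-relaxation loop (illustrative only).
    def identity(hypothesis):
        return hypothesis

    def relax_arguments(hypothesis):
        return hypothesis                      # placeholder: would unbind mismatched arguments

    def drop_weakest_predicate(hypothesis):
        return hypothesis[:-1] if len(hypothesis) > 1 else hypothesis

    def refutation_found(sos, usable):
        return False                           # placeholder for the resolution/paramodulation search

    RELAXATION_STEPS = [                       # (description, relaxation, score penalty)
        ("exact proof", identity, 0.00),
        ("relax arguments", relax_arguments, 0.15),
        ("drop a predicate from H", drop_weakest_predicate, 0.30),
    ]

    def proof_score(candidate_answer, question_clauses, axioms):
        hypothesis = list(question_clauses)
        for name, relax, penalty in RELAXATION_STEPS:
            hypothesis = relax(hypothesis)
            if refutation_found(sos=[candidate_answer] + hypothesis, usable=axioms):
                return 1.0 - penalty           # proof found at this relaxation level
        return 0.0                             # no proof even after relaxation

    # The candidate passage with the best proof score would then be selected and an
    # NL justification generated from the proof trace.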

56 Reasoning & Inference: How Well Does LCC Do?

57 Evaluations: QA (TREC-06)  LCC’s PowerAnswer Question Answering (QA) system finished 1st on Factoid Questions and Overall Combined Score. A second LCC QA system, Chaucer, finished 2nd in both categories in the TREC QA 2006 evaluation.  An LCC QA system has finished 1st every year that the TREC QA Evaluation has been conducted (annually since TREC-8 in 1999)  (TREC-2006 factoid scores: Mean 18.5%; Top Score 57.8%)

58 Evaluations: PASCAL RTE-2  LCC’s Groundhog system finished 1st overall at the Second PASCAL Recognizing Textual Entailment Challenge (RTE-2) and LCC’s COGEX system finished 2nd. (http://www.pascal-network.org/Challenges/RTE/)  (RTE-2 accuracy: Mean 57.5%; Best 75.4%)

59 Contact Information  Home Office 1701 N. Collins Boulevard Suite 2000 Richardson, TX 75080 972-231-0052 (Voice) 972-231-0012 (Fax)  Maryland Office 6179 Campfire Columbia, MD 21045 410-715-0777 (Voice) 410-715-0774 (Fax) 443-878-8894 (Cell)

60 Your Questions & Comments (photo: June sunrise over Kirkwall Bay in the Orkney Islands of Scotland)

