QUIRK: QUestion Answering = Information Retrieval + Knowledge Cycorp IBM Presenter: Stefano Bertolo (Cycorp)


1 QUIRK: QUestion Answering = Information Retrieval + Knowledge Cycorp IBM Presenter: Stefano Bertolo (Cycorp)

2 Project Goals Break answer-by-retrieval bottleneck Deep (semantic) understanding of queries and answers Integration of heterogeneous sources Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

3 Answer by retrieval Q: Who was the first president of Zambia? ……………………………………… … Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations… …………………………..

4 Answer by reasoning Q: Who sponsored Kai’s attack against Pamina? …On February 13, Kai detonated the truck in front of Pamina’s HQ… …On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank… … On January 15, Vitas Bayo deposited $50,000 on account 9999 at MegaBank…

9 QUIRK strategy Use Formalized knowledge for: –Semantic understanding of queries; –Justification of answers; Use Formalized knowledge as: –Format for data normalization –‘Glue’ for data integration of: information extracted from unstructured data SQL queries against structured DBs Cyc’s knowledge

10 Blackboard Query Manager Answer Manager Inference Agent IR Agent Cyc KB GuruQA (IBM) DB1 DB2 DB-N Preemptive annotations Unstructured Documents

11 Q-Eng A-Eng Q-CycL A-CycL Q-Guru A-Guru Query Interpreter GuruQA Assistant GuruQA (IBM) Cyc English Generator Cyc Inference Engine Answer Manager Query Refiner Blackboard

12 Blackboard architecture Add/remove agents without disrupting existing architecture Test performance/speed with several combinations of agents Operate asynchronously.
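
The blackboard idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Blackboard` class, the two toy agents, and the entry format are hypothetical, not QUIRK's actual components, and the agents run in simple synchronous rounds rather than asynchronously as the slide describes.

```python
from typing import Callable, Dict, List

# Hypothetical entry format: a kind tag (e.g. "Q-Eng", "Q-CycL") plus text.
Entry = Dict[str, str]
# An agent reads the current entries and returns new entries to post.
Agent = Callable[[List[Entry]], List[Entry]]


class Blackboard:
    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.agents: Dict[str, Agent] = {}

    def add_agent(self, name: str, agent: Agent) -> None:
        self.agents[name] = agent

    def remove_agent(self, name: str) -> None:
        self.agents.pop(name, None)

    def post(self, entry: Entry) -> None:
        # Ignore duplicates so that repeated rounds eventually settle.
        if entry not in self.entries:
            self.entries.append(entry)

    def run(self, max_rounds: int = 10) -> List[Entry]:
        # Simplification: synchronous rounds instead of asynchronous agents.
        for _ in range(max_rounds):
            before = len(self.entries)
            for agent in list(self.agents.values()):
                for entry in agent(list(self.entries)):
                    self.post(entry)
            if len(self.entries) == before:
                break
        return self.entries


def query_interpreter(entries: List[Entry]) -> List[Entry]:
    # Stand-in for English-to-CycL translation; it only wraps the text.
    return [{"kind": "Q-CycL", "text": "CycL(" + e["text"] + ")"}
            for e in entries if e["kind"] == "Q-Eng"]


def inference_agent(entries: List[Entry]) -> List[Entry]:
    # Stand-in for inference; it only tags the query as answered.
    return [{"kind": "A-CycL", "text": "answer to " + e["text"]}
            for e in entries if e["kind"] == "Q-CycL"]
```

Because agents only communicate through posted entries, dropping `inference_agent` leaves the rest of the pipeline untouched; the blackboard simply never receives an `A-CycL` entry.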

13 Query Interpreter Q: “Who opposes the WTO?” (and (isa ?WHO Person) (thereExists ?EVENT (and (isa ?EVENT ActOfDissent) (performedBy ?EVENT ?WHO) (maleficiary ?EVENT WorldTradeOrganization))))

14 GuruQA Assistant CycL query => PERSON$ oppose(s/d) the WTO denounce(s/d) the World Trade Organization attack(s/ed) …

15 Cyc Inference Engine CycL Query => [(PersonNamedFn “Kai”) JUSTIFICATION-1] [(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2] … [(PersonNamedFn “Kai”) JUSTIFICATION-N] …

16 Cyc Justifications A? A from [B and C] (source 6743); B from source 67430; C from source 78539
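
A justification of this shape can be modeled as a small tree in which every node records its own source. The sketch below is hypothetical (the `Justification` class and helper names are not Cyc's API); it only reproduces the A/B/C example from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Justification:
    conclusion: str
    source: str  # identifier of the rule or document supporting this step
    premises: List["Justification"] = field(default_factory=list)


def explain(j: Justification, depth: int = 0) -> List[str]:
    """Render the justification tree as indented lines."""
    if j.premises:
        support = " and ".join(p.conclusion for p in j.premises)
        head = f"{j.conclusion} from [{support}] (source {j.source})"
    else:
        head = f"{j.conclusion} from source {j.source}"
    lines = ["  " * depth + head]
    for p in j.premises:
        lines.extend(explain(p, depth + 1))
    return lines


def sources(j: Justification) -> List[str]:
    """Collect every source cited anywhere in the tree, top-down."""
    out = [j.source]
    for p in j.premises:
        out.extend(sources(p))
    return out


# The example from the slide: A follows from B and C.
tree = Justification("A", "6743", [
    Justification("B", "67430"),
    Justification("C", "78539"),
])
```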

17 Sources for Cyc Inference 1.4M+ CycL assertions already in Cyc’s Knowledge Base Virtual Assertions in DataBases Unsupervised Textract / CycL annotation of unstructured documents

18 Data Source Integration Data Normalization Data Fusion

19 Data Normalization Interpretation Search cat chat Katze gato gatto “felis felis” cat OR chat OR Katze OR gato OR gatto OR “felis felis”
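
The search side of this slide amounts to expanding one concept into a disjunction of its surface forms. A minimal sketch, assuming a hypothetical lexicon keyed by a concept name (`"Cat"` is an illustrative key, not a confirmed Cyc constant):

```python
from typing import Dict, List

# Hypothetical concept-to-surface-form lexicon, using the terms on the slide.
LEXICON: Dict[str, List[str]] = {
    "Cat": ["cat", "chat", "Katze", "gato", "gatto", "felis felis"],
}


def expand_concept(concept: str, lexicon: Dict[str, List[str]]) -> str:
    """Build an OR query from all known surface forms of a concept."""
    terms = lexicon[concept]
    # Multi-word terms are quoted so they are searched as phrases.
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Calling `expand_concept("Cat", LEXICON)` yields the OR query shown on the slide, with straight rather than typographic quotes around the phrase.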

20 Data Normalization …Zhang Mei Li, was born on January 1, 1927… Table (Name | DOB): Zhang Mei Li | 01-01-1927; … | … (birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
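
The point of this slide is that a sentence and a database row end up as the same assertion. A rough sketch of that normalization follows; the regular expression, the function names, and the reading of the DOB column as MM-DD-YYYY are all assumptions made for illustration.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical pattern for the single sentence shape shown on the slide.
BORN = re.compile(
    r"(?P<name>[A-Z][\w. ]*?), was born on "
    r"(?P<month>\w+) (?P<day>\d{1,2}), (?P<year>\d{4})"
)


def to_cycl(name: str, d: date) -> str:
    """Format a birth date in the CycL shape used on the slide."""
    return (f'(birthDate (PersonNamedFn "{name}") '
            f"(DayFn {d.day:02d} (MonthFn {MONTHS[d.month - 1]} "
            f"(YearFn {d.year}))))")


def from_text(sentence: str) -> str:
    m = BORN.search(sentence)
    if m is None:
        raise ValueError("no birth date found")
    d = date(int(m["year"]), MONTHS.index(m["month"]) + 1, int(m["day"]))
    return to_cycl(m["name"], d)


def from_row(name: str, dob: str) -> str:
    # Assumption: the DOB column is MM-DD-YYYY.
    mm, dd, yyyy = (int(x) for x in dob.split("-"))
    return to_cycl(name, date(yyyy, mm, dd))
```

Both `from_text` and `from_row` return the same string for the Zhang Mei Li example, which is what lets later stages treat the two sources uniformly.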

21 Data Normalization language independent representation of - entities - concepts - relationships CycL contains 100K+ primitives, can compositionally define infinitely many non-atomic terms.

22 Data Fusion Dr. Chen lives in Fresno Zhang Mei Li lives in Oakland Kai lives in Los Angeles California is in the Pacific Time Zone Dr. Chen/Zhang Mei Li/Kai and Dr. Chen/Zhang Mei Li/Kai live in the same time zone
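
The inference on this slide needs one link that no source states explicitly: that Fresno, Oakland and Los Angeles are in California. A minimal sketch, with that background knowledge written out as a plain dictionary (all names here are illustrative, not Cyc constants):

```python
from itertools import combinations

# Facts extracted from the data sources.
LIVES_IN = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
STATE_TIME_ZONE = {"California": "Pacific Time Zone"}

# Background knowledge supplying the missing link (assumed to come from the KB).
CITY_IN_STATE = {"Fresno": "California", "Oakland": "California",
                 "Los Angeles": "California"}


def time_zone_of(person: str) -> str:
    """Chain person -> city -> state -> time zone."""
    return STATE_TIME_ZONE[CITY_IN_STATE[LIVES_IN[person]]]


def same_time_zone(a: str, b: str) -> bool:
    return time_zone_of(a) == time_zone_of(b)


# Every pair of people for which the fused facts entail a shared time zone.
pairs = [(a, b) for a, b in combinations(LIVES_IN, 2) if same_time_zone(a, b)]
```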

23 Heterogeneous Sources Q: How old is Dr. Chen’s mother? …Zhang Mei Li, mother of Pamina’s Dr. Chen… Table (Name | DOB): Zhang Mei Li | 01-01-1927; … | …
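
Answering this question means joining a relation found in text with a value found in a database. A minimal sketch, assuming the two sources have already been reduced to dictionaries and that the DOB column is MM-DD-YYYY (both are assumptions, as are the function names):

```python
from datetime import date

MOTHER_OF = {"Dr. Chen": "Zhang Mei Li"}   # extracted from the text snippet
DOB = {"Zhang Mei Li": "01-01-1927"}       # row from the structured table


def age_on(dob: date, today: date) -> int:
    """Completed years between dob and today."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def mothers_age(person: str, today: date) -> int:
    mother = MOTHER_OF[person]
    mm, dd, yyyy = (int(x) for x in DOB[mother].split("-"))
    return age_on(date(yyyy, mm, dd), today)
```

The reference date is passed in explicitly so the answer is reproducible; for example, `mothers_age("Dr. Chen", date(2002, 6, 1))` gives 75.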

24 Data Fusion Requires language independent connections/inferential links among - Entities - Concepts - Propositions (Facts, Rules) Cyc’s Ontology Cyc’s Knowledge Base

25 Consensus Reality Formalized Knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”

26 DBs as ‘virtual assertions’ stores (birthDate (PersonNamedFn “Zhang Mei Li”) ?WHEN) SELECT DOB FROM PERSONAL_DATA WHERE NAME = 'Zhang Mei Li'
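
One way to picture a virtual assertion is as a query pattern that is answered by SQL instead of by a stored fact. The sketch below uses Python's standard `sqlite3` module with an in-memory table; the pattern matcher and function names are hypothetical and handle only the `birthDate` query shown on the slide.

```python
import re
import sqlite3
from typing import Dict, List

# Matches only queries of the form (birthDate (PersonNamedFn "NAME") ?VAR).
PATTERN = re.compile(r'\(birthDate \(PersonNamedFn "([^"]+)"\) \?(\w+)\)')


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the PERSONAL_DATA table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
    conn.execute("INSERT INTO PERSONAL_DATA VALUES (?, ?)",
                 ("Zhang Mei Li", "01-01-1927"))
    return conn


def answer_virtual(query: str, conn: sqlite3.Connection) -> List[Dict[str, str]]:
    """Answer a birthDate query by translating it to a parameterized SELECT."""
    m = PATTERN.fullmatch(query.strip())
    if m is None:
        raise ValueError("unsupported query shape")
    name, var = m.groups()
    rows = conn.execute(
        "SELECT DOB FROM PERSONAL_DATA WHERE NAME = ?", (name,)
    ).fetchall()
    # Each row becomes one binding for the query variable.
    return [{"?" + var: dob} for (dob,) in rows]
```

The result is a list of variable bindings, so the caller cannot tell whether the answer came from a stored assertion or from the database.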

27 Unsupervised Textract / CycL Annotations IBM Textract relations: [Cycorp, Inc. : located-in : Austin, TX] mapped to CycL Assertions: (objectFoundInLocation Cycorp CityOfAustinTX)
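
The mapping on this slide can be sketched as two lookup tables, one for relation names and one for entity names. The tables below contain only the slide's single example, and the parsing of the bracketed triple is an assumption about the Textract output format.

```python
# Hypothetical lookup tables, filled in from the slide's one example.
RELATIONS = {"located-in": "objectFoundInLocation"}
ENTITIES = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}


def textract_to_cycl(relation: str) -> str:
    """Turn '[subject : relation : object]' into a CycL assertion string."""
    subj, rel, obj = (p.strip() for p in relation.strip("[] ").split(" : "))
    return f"({RELATIONS[rel]} {ENTITIES[subj]} {ENTITIES[obj]})"
```

Unknown relations or entities raise `KeyError` here; a fuller system would presumably need a fallback for terms not yet mapped.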

28 Augmenting Textract Annotations Concept Annotation “Boston” => { CityOfBostonMA, BostonTheBand, … } Word Sense Disambiguation “I went to Boston” => CityOfBostonMA Analysis of nominal compounds “leather jacket” => (SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)

29 Unsupervised CycL Annotations IBM’s Nominator and Parsers to extract Named Entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ) Map dependencies to CycL event structures.

30 Cyc-to-English generator (PersonNamedFn “Dr. Chen”) JUSTIFICATION-N “Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”

31 Year 1 Tasks Get entire system to run robustly with integration of all the IBM and Cycorp components described Improve question understanding and refinement Broaden coverage of English to CycL mapping enabling annotation of large collection of documents

32 Year 2 Tasks Add new agents to the blackboard to represent the user and session context Improve integration of answers obtained from GuruQA and Cyc Improve integrated IBM and Cycorp modules for unstructured document annotation

