
1 Question Answering & Linked Data
Wang Yong

2 Content
Overview of QA System
Template-based Question Answering
Open Question Answering Over Multiple Knowledge Bases
Structured data and inference in DeepQA
Conclusion

3 General Structure of a QA System [1]
Pipeline: Question Analysis -> Matching with Data -> Query Construction -> Answer Retrieval -> Scoring
Input: natural language question; output: answer(s)
Resources used along the way: linguistic tools and resources, KBs and corpora, data sources, ontologies, indexes
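A minimal sketch of how these stages might be wired together in Python; every function name and the division of labour here are illustrative assumptions, not the architecture from [1]:

# Hypothetical skeleton of the QA pipeline sketched above.
# Every function is a placeholder standing in for a real component.

def analyze_question(question: str) -> dict:
    # Question analysis: e.g. tokenize, tag, detect the expected answer type.
    return {"tokens": question.rstrip("?").split(), "answer_type": None}

def match_with_data(analysis: dict, ontology: dict) -> dict:
    # Matching with data: map words/phrases to ontology classes and properties.
    return {t: ontology[t] for t in analysis["tokens"] if t in ontology}

def construct_query(matches: dict) -> str:
    # Query construction: build a formal (e.g. SPARQL) query from the matches.
    patterns = " . ".join(f"?x {p} ?o{i}" for i, p in enumerate(matches.values()))
    return "SELECT ?x WHERE { " + patterns + " }"

def retrieve_and_score(query: str, data_source) -> list:
    # Answer retrieval and scoring against the data source / index.
    return sorted(data_source(query), key=lambda pair: pair[1], reverse=True)

def answer(question: str, ontology: dict, data_source) -> list:
    analysis = analyze_question(question)
    matches = match_with_data(analysis, ontology)
    query = construct_query(matches)
    return retrieve_and_score(query, data_source)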

4 Main challenges
Variability of natural language:
  How can you tell if you have the flu? vs. What are signs of the flu?
Complexity of natural language:
  Of current U.N. member countries with 4-letter names, the one that is first alphabetically.
  Who produced the most films?

5 Main challenges
Gap between natural language and data sources:
  String differences: "wife of", "husband of" vs. dbo:spouse
  Structure differences: "Who are the great-grandchildren of Bruce Lee?" has no direct property and must be expressed by chaining dbo:child, as sketched in the query below
Quality and heterogeneity of data sources:
  Completeness and accuracy (e.g. Open Information Extraction output)
  Different schemas: dbo:location vs. dbo:headquarter vs. dbo:locationCity
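As an illustration of the structure difference, a query like the following could answer the great-grandchildren question by chaining three dbo:child edges. This is only a sketch: it assumes the SPARQLWrapper package, the public DBpedia endpoint, and the dbr:Bruce_Lee resource URI, any of which may differ in practice.

from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# One natural-language relation ("great-grandchild") maps to a chain of three dbo:child edges.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?ggc WHERE {
  dbr:Bruce_Lee dbo:child ?c .
  ?c dbo:child ?gc .
  ?gc dbo:child ?ggc .
}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["ggc"]["value"])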

6 Template-based Question Answering [2]

7 Motivation
Traditional methods map a natural language question to a triple-based representation:
  Who wrote The Neverending Story? -> <person; wrote; The Neverending Story>
Not every question can be represented this way:
  Which cities have more than three universities? -> <cities; more than; three universities> loses the counting condition; the intended query needs aggregation:
  SELECT ?y WHERE { ?x rdf:type onto:University . ?x onto:city ?y . } GROUP BY ?y HAVING (COUNT(?x) > 3)

8 Solution: SPARQL templates
A SPARQL template mirrors the syntactic structure of the natural language question, including its domain-independent expressions:
  Which y p more than N x?
  SELECT ?y WHERE { ?x rdf:type ?c . ?x ?p ?y . } GROUP BY ?y HAVING (COUNT(?x) > N)
The slots (the class ?c, the property ?p, and the number N) are then instantiated with vocabulary elements, as sketched below.
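A minimal sketch of how such a template might be instantiated once the slots have been mapped to vocabulary elements; the slot names and the mapping step are illustrative assumptions, not the paper's actual algorithm:

# Hypothetical instantiation of the "Which y p more than N x?" template.
TEMPLATE = (
    "SELECT ?y WHERE {{ ?x rdf:type {cls} . ?x {prop} ?y . }} "
    "GROUP BY ?y HAVING (COUNT(?x) > {n})"
)

def instantiate(cls_uri: str, prop_uri: str, n: int) -> str:
    # cls_uri  <- the class that x was mapped to (e.g. onto:University)
    # prop_uri <- the property that p was mapped to (e.g. onto:city)
    # n        <- the number N from the question
    return TEMPLATE.format(cls=cls_uri, prop=prop_uri, n=n)

# "Which cities have more than three universities?"
print(instantiate("onto:University", "onto:city", 3))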

9 Implementation
Lexicalized Tree Adjoining Grammar (LTAG) for syntactic construction
Discourse Representation Structures (DRS) for semantic construction
Based on manually compiled grammars and rules
Processing chain: natural language input -> parser (with the LTAG grammar) -> LTAG derivation tree (syntactic construction) -> DRS (semantic construction, with scope resolution) -> formal query
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol: initial trees ('α') and auxiliary trees ('β') are combined by substitution or adjunction.
DRT uses discourse representation structures (DRS) to represent a hearer's mental representation of a discourse as it unfolds over time. A DRS has two critical components: (1) a set of discourse referents representing entities under discussion, and (2) a set of DRS conditions representing information that has been given about those referents. For the sentence "A farmer owns a donkey.", the DRS can be notated as [x, y: farmer(x), donkey(y), owns(x, y)].
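The farmer/donkey DRS can be written down as a tiny data structure; this is just an illustrative sketch of the notation, not the system's implementation:

from dataclasses import dataclass, field

@dataclass
class DRS:
    referents: list = field(default_factory=list)   # discourse referents (entities under discussion)
    conditions: list = field(default_factory=list)  # conditions stated about those referents

    def __str__(self):
        return f"[{', '.join(self.referents)}: {', '.join(self.conditions)}]"

# "A farmer owns a donkey."
drs = DRS(referents=["x", "y"],
          conditions=["farmer(x)", "donkey(y)", "owns(x, y)"])
print(drs)  # [x, y: farmer(x), donkey(y), owns(x, y)]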

10 Experiment
50 questions from the QALD benchmark:
  11 questions are not in the analysis scope
  5 questions cannot be parsed (unknown syntactic constructions, uncovered domain-independent expressions), e.g. Who has been the 5th president of the United States of America?
  19 have a correct answer, 2 are almost correct
  13 are wrong or under the threshold
Main problems: entity identification (Give me all movies with Tom Cruise.), query selection

11 Open Question Answering Over Multiple Knowledge Bases [3]

12 Motivation
One knowledge base cannot answer all questions
Open question answering needs information from different knowledge bases
Natural language has high variability
Different knowledge bases use different knowledge representations

13 Solution
Scope: simple factoid questions
Paraphrase the question to overcome natural language variability
Rewrite the query to match each KB's schema
Express the question as triples to utilize all KBs
Example: What fruits are a source of vitamin C?
  ?x : (?x, is-a, fruit) (?x, source of, vitamin c)
  SELECT t0.arg1 FROM triples AS t0, triples AS t1
  WHERE keyword-match(t0.rel, "is-a") AND keyword-match(t0.arg2, "fruit")
    AND keyword-match(t1.rel, "source of") AND keyword-match(t1.arg2, "vitamin c")
    AND string-similarity(t0.arg1, t1.arg1) > 0.9
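A toy, in-memory rendering of that conceptual query in Python; the triple store, the keyword matcher, and the similarity threshold are all illustrative assumptions rather than the system described in [3]:

from difflib import SequenceMatcher

# A tiny extracted-triple "store": (arg1, rel, arg2) assertions from different KBs.
triples = [
    ("orange", "is a", "fruit"),
    ("oranges", "are a source of", "vitamin c"),
    ("kiwi", "is-a", "fruit"),
    ("kiwi", "source of", "vitamin c"),
    ("spinach", "source of", "vitamin c"),
]

def normalize(s: str) -> str:
    return s.replace("-", " ").lower()

def keyword_match(field: str, keyword: str) -> bool:
    # Loose matching: every keyword token must occur in the (normalized) field.
    return all(tok in normalize(field).split() for tok in normalize(keyword).split())

def string_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

# ?x : (?x, is-a, fruit) AND (?x, source of, vitamin c), joined on a fuzzy match of ?x.
answers = {
    t0[0]
    for t0 in triples if keyword_match(t0[1], "is-a") and keyword_match(t0[2], "fruit")
    for t1 in triples if keyword_match(t1[1], "source of") and keyword_match(t1[2], "vitamin c")
    if string_similarity(t0[0], t1[0]) > 0.9
}
print(answers)  # {'orange', 'kiwi'}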

14 Implementation
Pipeline: Question -> Paraphrase -> Parse -> Query -> Rewrite -> Execute -> Answer
  Question: How can you tell if you have the flu?
  Paraphrase: What are signs of the flu?
  Query: ?x: (?x, sign of, the flu)
  Rewrite: ?x: (the flu, symptoms, ?x)
  Answer: (the flu, symptoms include, chills)
Resources per step:
  Paraphrase: 5 million operators mined from WikiAnswers
  Parse: 10 manually created high-precision templates
  Rewrite: 74 million operators mined from corpora
  Execute: 1 billion assertions
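A heavily simplified sketch of how string-to-string paraphrase and rewrite operators might be applied around a template parse; the specific operators and templates here are made up for illustration and are not the mined operators from [3]:

import re

# Toy operator tables (the real system mines millions of these).
paraphrase_ops = {
    r"^how can you tell if you have (?P<x>.+)\?$": r"what are signs of \g<x>?",
}
parse_templates = {
    r"^what are signs of (?P<x>.+)\?$": ("?x", "sign of", r"\g<x>"),
}
rewrite_ops = {
    "sign of": ["sign of", "symptoms", "symptoms include"],
}

def apply_paraphrase(question: str) -> str:
    for pattern, repl in paraphrase_ops.items():
        if re.match(pattern, question):
            return re.sub(pattern, repl, question)
    return question

def parse(question: str):
    for pattern, (subj, rel, obj) in parse_templates.items():
        m = re.match(pattern, question)
        if m:
            return (subj, rel, m.expand(obj))
    return None

def rewrites(query):
    # Real rewrite operators can also swap argument order,
    # e.g. (?x, sign of, the flu) -> (the flu, symptoms, ?x).
    subj, rel, obj = query
    return [(subj, r, obj) for r in rewrite_ops.get(rel, [rel])]

q = "how can you tell if you have the flu?"
query = parse(apply_paraphrase(q))
print(query)            # ('?x', 'sign of', 'the flu')
print(rewrites(query))  # relation variants matching different KB schemas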

15 Experiment
KBs: Freebase, Open IE, Probase, and NELL
Training over question and answer pairs:
  Linear scoring function, trained with the latent-variable structured perceptron algorithm
  Question–answer pairs from WebQuestions, TREC, and WikiAnswers
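A minimal sketch of the idea behind a linear scoring function trained with a latent-variable structured-perceptron-style update; the feature names, the derivation search, and the update schedule are illustrative assumptions:

from collections import defaultdict

weights = defaultdict(float)

def score(features: dict) -> float:
    # Linear scoring: dot product of the weights and the derivation's features.
    return sum(weights[f] * v for f, v in features.items())

def perceptron_update(candidate_derivations, gold_answers, lr=1.0):
    # candidate_derivations: list of (features, answer) pairs for one question,
    # produced by the (hidden) paraphrase/parse/rewrite derivation search.
    predicted = max(candidate_derivations, key=lambda d: score(d[0]))
    correct = [d for d in candidate_derivations if d[1] in gold_answers]
    if not correct or predicted[1] in gold_answers:
        return  # nothing to learn from this question
    # Latent-variable trick: treat the best-scoring *correct* derivation as the target.
    target = max(correct, key=lambda d: score(d[0]))
    for f, v in target[0].items():
        weights[f] += lr * v
    for f, v in predicted[0].items():
        weights[f] -= lr * v

# One toy training step with made-up features:
perceptron_update(
    [({"op:paraphrase": 1.0, "kb:freebase": 1.0}, "chills"),
     ({"op:rewrite": 1.0, "kb:openie": 1.0}, "fever")],
    gold_answers={"chills"},
)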

16 Experiment

17 Structured data and inference in DeepQA [4]

18 Motivation
Unstructured data: broad coverage, low precision
Structured data: incomplete, high precision, has formal semantics, enables logical reasoning (common-sense reasoning / implicit evidence)

19 Temporal and geospatial reasoning
Temporal: detect time relations (TLink, birthDate, deathDate) and check temporal compatibility: birthDate < TLink < deathDate
Geospatial: detect spatial relations (relative direction, border, containment, near, far); convert to geo-coordinates from DBpedia to compute distance and other geospatial relations; exploit the symmetry of the borders relation and the transitivity of the containment relation
Evaluation: 1% to 2% improvement in accuracy
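A small sketch of the two checks in Python: a temporal-compatibility test on dates and a great-circle distance from DBpedia-style coordinates. The function names and the haversine formulation are illustrative assumptions, not DeepQA's implementation:

from datetime import date
from math import radians, sin, cos, asin, sqrt

def temporally_compatible(event: date, birth: date, death: date) -> bool:
    # A candidate can only take part in an event during their lifetime:
    # birthDate < TLink (event time) < deathDate.
    return birth < event < death

def haversine_km(lat1, lon1, lat2, lon2) -> float:
    # Great-circle distance between two geo-coordinates (e.g. taken from DBpedia).
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Did the candidate live when the event happened?
print(temporally_compatible(date(1950, 6, 15), birth=date(1900, 1, 1), death=date(1980, 1, 1)))
# Roughly how far apart are Berlin and Paris?
print(round(haversine_km(52.52, 13.405, 48.8566, 2.3522)))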

20 Taxonomic reasoning
Check the candidate answer's type
Data sources: DBpedia, YAGO
Candidate answer -> an entity resource; the question's lexical answer type (LAT) -> a class in the type system (via WordNet, a domain-specific type-mapping file, statistical relatedness)
Scoring based on the relation between the two types: equivalent/subclass, disjoint, sibling, superclass, ...
Evaluation: 3%–4% improvement in accuracy
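An illustrative sketch of type-coercion scoring over a toy class hierarchy; the hierarchy, the relation labels, and the scores are invented for the example and are not DeepQA's actual values:

# Toy type system: child class -> parent class.
SUPERCLASS = {
    "City": "PopulatedPlace",
    "Country": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Person": "Agent",
}

def ancestors(cls):
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        yield cls

def type_relation(candidate_type: str, lat: str) -> str:
    if candidate_type == lat:
        return "equivalent"
    if lat in ancestors(candidate_type):
        return "subclass"        # candidate is more specific than the LAT -> strong match
    if candidate_type in ancestors(lat):
        return "superclass"      # candidate is more general -> weak evidence
    if set(ancestors(candidate_type)) & set(ancestors(lat)):
        return "sibling"         # the two types share an ancestor
    return "disjoint"            # no taxonomic connection -> likely wrong type

SCORES = {"equivalent": 1.0, "subclass": 1.0, "superclass": 0.5, "sibling": 0.3, "disjoint": -1.0}

# LAT of "Which city ...?" is City; candidate typed as City vs. typed as Person.
print(SCORES[type_relation("City", "City")], SCORES[type_relation("Person", "City")])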

21 Conclusion
Analyzing complex questions is a nontrivial problem (manually compiled grammars and rules)
The mapping between natural language and KBs has a significant impact on accuracy: semantically light expressions (e.g. "in"), structure differences
Structured data is incomplete and needs help from unstructured data

22 References
[1] Unger, C., Freitas, A., Cimiano, P.: An Introduction to Question Answering over Linked Data. In: Reasoning Web. Reasoning on the Web in the Big Data Era, LNCS. Springer (2014)
[2] Unger, C., Bühmann, L., Lehmann, J., et al.: Template-based question answering over RDF data. In: Proceedings of the 21st International Conference on World Wide Web (WWW), pp. 639–648. ACM (2012)
[3] Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD). ACM (2014)
[4] Kalyanpur, A., et al.: Structured data and inference in DeepQA. IBM Journal of Research and Development 56(3/4) (2012)

