Question Answering & Linked Data


Question Answering & Linked Data Wang Yong

Content
  Overview of QA System
  Template-based Question Answering
  Open Question Answering Over Multiple Knowledge Bases
  Structured data and inference in DeepQA
  Conclusion

General Structure of QA System [1]
  [Diagram] Input: Natural Language Question. Output: Answer(s).
  Components, as labelled in the diagram: Question Analysis, Matching with Data, Query Construction, Answer Retrieval, Scoring
  Supporting resources: Linguistic Tools and Resources; Data Sources (KBs, corpora, ontology, index)

Main challenges
  Variability of natural language
    How can you tell if you have the flu?
    What are signs of the flu?
  Complexity of natural language
    Of current U.N. member countries with 4-letter names, the one that is first alphabetically.
    Who produced the most films?

Main challenges
  Gap between natural language and data sources
    String differences: "wife of", "husband of" vs. dbo:spouse
    Structure differences: "Who are the great-grandchildren of Bruce Lee?" vs. dbo:child
  Quality and heterogeneity of data sources
    Completeness and accuracy
      Open Information Extraction
    Different schemas: dbo:location, dbo:headquarter, dbo:locationCity
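The structure difference can be made concrete with a small sketch: a knowledge base that only stores a direct child relation has no "great-grandchild" property, so the question has to be answered by following the child relation three times. The dictionary below is invented toy data with placeholder names, and `descendants_at_depth` is a hypothetical helper, not part of any system discussed in the slides.

```python
# Toy child relation (placeholder names, invented for illustration only).
CHILD_OF = {
    "A": ["B1", "B2"],
    "B1": ["C1"],
    "B2": ["C2"],
    "C1": ["D1", "D2"],
    "C2": [],
}

def descendants_at_depth(child_of, person, depth):
    """Follow the direct child relation `depth` times from `person`."""
    frontier = {person}
    for _ in range(depth):
        frontier = {c for p in frontier for c in child_of.get(p, ())}
    return frontier

# Great-grandchildren are three child hops away.
great_grandchildren = descendants_at_depth(CHILD_OF, "A", 3)
print(sorted(great_grandchildren))
```

The single natural language word corresponds to a path of length three in the data, which is exactly the kind of mismatch a QA system has to bridge.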

Template-based Question Answering[2]

Motivation
  Traditional methods map a natural language question to a triple-based representation
    Who wrote The Neverending Story?
    <person; wrote; Neverending Story>
  Some questions cannot be represented this way
    Which cities have more than three universities?
    <cities; more than; three universities>
  The intended query needs grouping and aggregation:
    SELECT ?y WHERE {
      ?x rdf:type onto:University .
      ?x onto:city ?y .
    }
    GROUP BY ?y
    HAVING (COUNT(?x) > 3)

Solution
  SPARQL templates that capture
    the syntactic structure of the natural language question
    domain-independent expressions
  Example: Which y p more than N x?
    SELECT ?y WHERE {
      ?x rdf:type ?c .
      ?x ?p ?y .
    }
    GROUP BY ?y
    HAVING (COUNT(?x) > N)
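A minimal sketch of how such a template could be instantiated once the slots have been resolved against the knowledge base. The slot names (`cls`, `prop`, `n`) and the `instantiate` helper are assumptions made for this illustration; the sketch also writes `GROUP BY ?y` explicitly, which SPARQL 1.1 requires when `?y` is selected next to an aggregate filter.

```python
from string import Template

# Hypothetical template for "Which y p more than N x?" with three open slots.
COUNT_TEMPLATE = Template(
    "SELECT ?y WHERE {\n"
    "  ?x rdf:type $cls .\n"
    "  ?x $prop ?y .\n"
    "}\n"
    "GROUP BY ?y\n"
    "HAVING (COUNT(?x) > $n)"
)

def instantiate(cls: str, prop: str, n: int) -> str:
    """Fill the class, property and number slots of the template."""
    return COUNT_TEMPLATE.substitute(cls=cls, prop=prop, n=n)

# "Which cities have more than three universities?"
query = instantiate("onto:University", "onto:city", 3)
print(query)
```

Finding the right URIs for the slots is the hard part in practice; the string substitution shown here is only the last, mechanical step.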

Implementation
  Lexicalized Tree Adjoining Grammar (LTAG)
  Discourse Representation Structure (DRS)
  Based on manually compiled grammars and rules
  [Diagram] Labels: natural language input, grammar, parser, LTAG derivation tree (syntactic construction), DRS (semantic construction), scope resolution, formal query

  Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol.
    Elementary trees: initial trees ('α') and auxiliary trees ('β')
    Operations: substitution or adjunction

  Discourse representation theory (DRT) uses discourse representation structures (DRS) to represent a hearer's mental representation of a discourse as it unfolds over time. There are two critical components to a DRS:
    1. A set of discourse referents representing entities which are under discussion.
    2. A set of DRS conditions representing information that has been given about discourse referents.
  Consider sentence (1) below:
    (1) A farmer owns a donkey.
  The DRS of (1) can be notated as (2) below:
    (2) [x,y: farmer(x), donkey(y), owns(x,y)]
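The two components of a DRS map naturally onto a small data structure. The sketch below is only an illustration of the notation in (2); the class and field names are assumptions, and it does not cover nested DRSs or scope.

```python
from dataclasses import dataclass, field

@dataclass
class Condition:
    """One DRS condition: a predicate applied to discourse referents."""
    predicate: str
    args: tuple

@dataclass
class DRS:
    """A flat DRS: discourse referents plus conditions over them."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

    def notation(self) -> str:
        refs = ",".join(self.referents)
        conds = ", ".join(
            f"{c.predicate}({','.join(c.args)})" for c in self.conditions
        )
        return f"[{refs}: {conds}]"

# "A farmer owns a donkey."
drs = DRS(
    referents=["x", "y"],
    conditions=[
        Condition("farmer", ("x",)),
        Condition("donkey", ("y",)),
        Condition("owns", ("x", "y")),
    ],
)
print(drs.notation())  # [x,y: farmer(x), donkey(y), owns(x,y)]
```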

Experiment
  50 questions from the QALD benchmark
    11 questions are outside the scope of the analysis
    5 questions cannot be parsed
      unknown syntactic constructions
      uncovered domain-independent expressions
      Who has been the 5th president of the United States of America?
    19 have a correct answer, 2 are almost correct
    13 are wrong or under the threshold
  Main problems
    entity identification (Give me all movies with Tom Cruise.)
    query selection

Open Question Answering Over Multiple Knowledge Bases[3]

Motivation
  A single knowledge base cannot answer all questions
  Open question answering needs information from different knowledge bases
  Natural language has high variability
  Different knowledge bases express the same knowledge in different ways

Solution
  Scope: simple factoid questions
  Paraphrase to overcome natural language variability
  Rewrite to match the KB schema
  Express the question as triples to utilize all KBs
  Example: What fruits are a source of vitamin C?
    ?x : (?x, is-a, fruit) (?x, source of, vitamin c)
    SELECT t0.arg1
    FROM triples AS t0, triples AS t1
    WHERE keyword-match(t0.rel, "is-a")
      AND keyword-match(t0.arg2, "fruit")
      AND keyword-match(t1.rel, "source of")
      AND keyword-match(t1.arg2, "vitamin c")
      AND string-similarity(t0.arg1, t1.arg1) > 0.9
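The join in this query can be sketched over an in-memory list of triples. Everything below is an assumption made for illustration: the toy triples are invented, `keyword_match` is a simple all-words test, and `string_similarity` uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever similarity measure the original system uses.

```python
from difflib import SequenceMatcher

# Toy (arg1, rel, arg2) triples, invented for illustration only.
TRIPLES = [
    ("lemon", "is-a", "fruit"),
    ("Lemons", "are a good source of", "vitamin C"),
    ("broccoli", "is-a", "vegetable"),
    ("broccoli", "source of", "vitamin c"),
]

def keyword_match(text, keywords):
    """True if every keyword occurs as a word in text (case-insensitive)."""
    words = text.lower().split()
    return all(k in words for k in keywords.lower().split())

def string_similarity(a, b):
    """Similarity ratio in [0, 1]; stand-in for the query's string-similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fruits_with_vitamin_c(triples, threshold=0.9):
    """Self-join of the triple store, mirroring the SQL-like query."""
    answers = []
    for arg1_a, rel_a, arg2_a in triples:
        if not (keyword_match(rel_a, "is-a") and keyword_match(arg2_a, "fruit")):
            continue
        for arg1_b, rel_b, arg2_b in triples:
            if (keyword_match(rel_b, "source of")
                    and keyword_match(arg2_b, "vitamin c")
                    and string_similarity(arg1_a, arg1_b) > threshold):
                answers.append(arg1_a)
    return answers

print(fruits_with_vitamin_c(TRIPLES))
```

The fuzzy join on `arg1` is what lets "lemon" from one source line up with "Lemons" from another, while "broccoli" is filtered out because it is not typed as a fruit.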

Implementation
  Pipeline: Question -> Paraphrase -> Parse -> Rewrite -> Execute -> Answer
  Example
    Question: How can you tell if you have the flu?
    After paraphrase: What are signs of the flu?
    After parse (query): ?x: (?x, sign of, the flu)
    After rewrite (query): ?x: (the flu, symptoms, ?x)
    After execute (answer): (the flu, symptoms include, chills)
  Resources per step
    Paraphrase: 5 million mined operators, from WikiAnswers
    Parse: 10 high-precision templates, manually created
    Rewrite: 74 million mined operators, mined from corpora
    Execute: 1 billion assertions
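The four steps can be sketched end to end on the flu example. This is a toy under stated assumptions: one hand-written paraphrase, one regular-expression template, one rewrite rule and a two-triple knowledge base stand in for the mined operators and the assertion store, and all names are hypothetical.

```python
import re

# One hand-written entry per step; the real system mines these at scale.
PARAPHRASES = {
    "how can you tell if you have the flu?": "what are signs of the flu?",
}
TEMPLATE = re.compile(r"what are (?P<rel>\w+?)s? of (?P<arg>.+)\?$")
REWRITES = {"sign of": "symptoms"}  # rewritten relation takes the known argument as subject
KB = [
    ("the flu", "symptoms include", "chills"),
    ("the flu", "is caused by", "influenza viruses"),
]

def paraphrase(question):
    q = question.lower()
    return PARAPHRASES.get(q, q)

def parse(question):
    """Map a question to a (?x, relation, argument) triple query, or None."""
    m = TEMPLATE.match(question)
    if m is None:
        return None
    return ("?x", m.group("rel") + " of", m.group("arg"))

def rewrite(query):
    """Rewrite the relation into KB vocabulary, inverting argument order."""
    _, rel, arg = query
    if rel in REWRITES:
        return (arg, REWRITES[rel], "?x")
    return query

def execute(query):
    subj, rel, obj = query
    answers = []
    for s, r, o in KB:
        if rel not in r:
            continue
        if subj == "?x" and o == obj:
            answers.append(s)
        elif obj == "?x" and s == subj:
            answers.append(o)
    return answers

def answer(question):
    query = parse(paraphrase(question))
    return [] if query is None else execute(rewrite(query))

print(answer("How can you tell if you have the flu?"))
```

Each step narrows the gap between the user's wording and the knowledge base's wording, which is why the original question, which shares no content word with "symptoms include", can still be answered.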

Experiment
  KBs: Freebase, Open IE, Probase and NELL
  Training over question and answer pairs
    Linear scoring function
    Latent-variable structured perceptron algorithm
  Question and answer pairs: WebQuestions, TREC, WikiAnswers
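A generic sketch of a linear scoring function with a latent-variable perceptron update, assuming each candidate derivation is summarized by a sparse feature dictionary. The feature names and the toy candidates are invented, and this is the textbook form of the update rather than the exact training procedure of the cited system.

```python
from collections import defaultdict

def score(weights, features):
    """Linear score: dot product of the weight vector and sparse features."""
    return sum(weights[name] * value for name, value in features.items())

def perceptron_update(weights, derivations, gold_answers):
    """One update for one question.

    derivations: list of (answer, features) candidates. The derivation is
    latent: only the answer is supervised, so the update target is the
    highest-scoring derivation that yields a correct answer.
    Returns True if the weights changed.
    """
    best = max(derivations, key=lambda d: score(weights, d[1]))
    if best[0] in gold_answers:
        return False
    correct = [d for d in derivations if d[0] in gold_answers]
    if not correct:
        return False
    target = max(correct, key=lambda d: score(weights, d[1]))
    for name, value in target[1].items():
        weights[name] += value
    for name, value in best[1].items():
        weights[name] -= value
    return True

weights = defaultdict(float)
derivations = [
    ("rest", {"rewrite:sign of->treatment": 1.0}),
    ("chills", {"rewrite:sign of->symptoms": 1.0}),
]
updated = perceptron_update(weights, derivations, {"chills"})
print(updated, dict(weights))
```

After the update the derivation that uses the "symptoms" rewrite outscores the other one, so a second pass over the same example leaves the weights unchanged.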


Structured data and inference in DeepQA[4]

Motivation
  Unstructured data: broad coverage, low precision
  Structured data: incomplete, high precision
    Has formal semantics, which enables logical reasoning (common-sense reasoning / implicit evidence)

Temporal and geospatial reasoning
  Detect time relations: TLink, birthDate, deathDate
  Check temporal compatibility: birthDate < TLink < deathDate
  Detect spatial relations: relative direction, border, containment, near, far
  Convert to geo-coordinates from DBpedia to compute distance or other geospatial relations
  Use the symmetry of the borders relation and the transitivity of the containment relation
  Evaluation: 1% to 2% improvement in accuracy
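The three checks named above can be sketched as small predicates. This is an illustration under simplifying assumptions: the TLink is reduced to a single date, containment is stored as direct parent-to-children edges, and the dates, regions and function names are toy values chosen for the example.

```python
from datetime import date

def temporally_compatible(birth, event, death=None):
    """birthDate < event date < deathDate (death may be unknown)."""
    return birth < event and (death is None or event < death)

def contains(region, place, containment):
    """Containment is transitive: search all direct and indirect parts."""
    stack, seen = [region], set()
    while stack:
        current = stack.pop()
        for part in containment.get(current, ()):
            if part == place:
                return True
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return False

def borders(a, b, border_pairs):
    """The borders relation is symmetric: check both orders."""
    return (a, b) in border_pairs or (b, a) in border_pairs

# Toy data for illustration.
CONTAINMENT = {"Europe": ["France"], "France": ["Paris"]}
BORDERS = {("France", "Spain")}

print(temporally_compatible(date(1900, 1, 1), date(1930, 6, 1), date(1950, 1, 1)))
print(contains("Europe", "Paris", CONTAINMENT))
print(borders("Spain", "France", BORDERS))
```

A candidate answer that fails such a check, for example a person who died before the event in the question, can be scored down even though the text evidence mentions them.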

Taxonomic reasoning
  Check the candidate answer's type
  Data sources: DBpedia, YAGO
  Candidate answer: mapped to an entity resource
  Question lexical answer type (LAT): mapped to a class in the type system
    WordNet, domain-specific type-mapping file, statistical relatedness
  Scoring: equivalent/subclass, disjoint, sibling, superclass, ...
  Evaluation: 3%–4% improvement in accuracy
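A sketch of the type check over a toy taxonomy: given the type of a candidate answer and the class the LAT was mapped to, it names the relation between them and looks up a score. The taxonomy, the disjointness declaration and the score values are all invented for illustration and are not the values used in DeepQA.

```python
# Toy single-inheritance taxonomy and one explicit disjointness axiom.
PARENT = {
    "Scientist": "Person",
    "Politician": "Person",
    "Person": "Agent",
    "City": "Place",
}
DISJOINT = {frozenset(("Person", "Place"))}

# Hypothetical scores per relation; a real system would learn these.
SCORES = {
    "equivalent": 1.0, "subclass": 1.0, "superclass": 0.3,
    "sibling": 0.2, "unrelated": 0.0, "disjoint": -1.0,
}

def ancestors(type_name):
    chain = []
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain

def relation(candidate_type, lat_type):
    """Relation of the candidate's type to the lexical answer type."""
    if candidate_type == lat_type:
        return "equivalent"
    cand_anc, lat_anc = ancestors(candidate_type), ancestors(lat_type)
    if lat_type in cand_anc:
        return "subclass"
    if candidate_type in lat_anc:
        return "superclass"
    for a in [candidate_type] + cand_anc:
        for b in [lat_type] + lat_anc:
            if frozenset((a, b)) in DISJOINT:
                return "disjoint"
    if candidate_type in PARENT and PARENT[candidate_type] == PARENT.get(lat_type):
        return "sibling"
    return "unrelated"

def type_score(candidate_type, lat_type):
    return SCORES[relation(candidate_type, lat_type)]

print(relation("Scientist", "Person"), type_score("City", "Scientist"))
```

Disjointness is the strongest signal here: a candidate typed as a city can be ruled out for a question asking for a scientist, whereas a sibling or superclass match is only weak evidence.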

Conclusion
  Analyzing complex questions is a nontrivial problem
    It relies on manually compiled grammars and rules
  The mapping between natural language and KBs has a significant impact on accuracy
    Semantically light expressions ("in"), structure differences ("gf")
  Structured data is incomplete and needs help from unstructured data

References
  [1] Unger, C., Freitas, A., Cimiano, P.: An Introduction to Question Answering over Linked Data. In: Reasoning Web. Reasoning on the Web in the Big Data Era, LNCS, pp. 100-140 (2014)
  [2] Unger, C., Bühmann, L., Lehmann, et al.: Template-based question answering over RDF data. In: Proceedings of the 21st International Conference on World Wide Web, pp. 639-648. ACM (2012)
  [3] Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, KDD (2014)
  [4] Kalyanpur, A., et al.: Structured data and inference in DeepQA. IBM Journal of Research & Development 56(3/4) (2012)