Template-based Question Answering over RDF Data


Template-based Question Answering over RDF Data
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Philipp Cimiano
Yanan Zhang

Background
Intuitive ways of accessing RDF data are becoming increasingly important. Question answering approaches have been proposed as a good compromise between intuitiveness and expressivity.
The common approach maps the question to a triple-based representation.
Example (PowerAqua): Who wrote The Neverending Story?
  <[person, organization], wrote, Neverending Story>
  <Writer, IS_A, Person>
  <Writer, author, The Neverending Story>

1. (a) Which cities have more than three universities?
   (b) <[cities], more than, universities three>
   (c) SELECT ?y WHERE { ?x rdf:type onto:University . ?x onto:city ?y . } HAVING (COUNT(?x) > 3)
2. (a) Who produced the most films?
   (b) <[person, organization], produced, most films>
   (c) SELECT ?y WHERE { ?x rdf:type onto:Film . ?x onto:producer ?y . } ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
In both cases the original semantic structure of the question cannot be faithfully captured using triples.

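Contribution
A domain-independent question answering approach in two steps:
  1. Parse the question to produce a SPARQL template that mirrors its structure.
  2. Identify the domain-specific entities that fill the slots of the template.
Example template: SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0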

POS tagger

Who produced the most films?
POS tagging
  (a) who/WP produced/VBD the/DT most/JJS films/NNS
Parsing and template generation
Domain-independent lexicon: 107 entries covering light verbs, question words, determiners, negation words, coordination and the like.
  (b) Covered tokens: who, the most, the, most
Domain-dependent lexicon: built on the fly; the POS tag of each token determines the syntactic and semantic properties of its entry.
  (c) Building entries for: produced/VBD, films/NNS
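The split between covered tokens and tokens that need on-the-fly entries can be sketched as follows. This is only an illustration: `DOMAIN_INDEPENDENT` is a tiny assumed stand-in for the 107-entry lexicon, and the function names are hypothetical.

```python
# POS-tagged question from the slide.
TAGGED = "who/WP produced/VBD the/DT most/JJS films/NNS"

# Assumed stand-in for the domain-independent lexicon (the real one has 107 entries).
DOMAIN_INDEPENDENT = {"who", "the", "most"}


def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


def split_by_coverage(tagged, lexicon):
    """Separate tokens covered by the lexicon from those needing new entries."""
    covered = [(w, t) for w, t in tagged if w.lower() in lexicon]
    uncovered = [(w, t) for w, t in tagged if w.lower() not in lexicon]
    return covered, uncovered


covered, uncovered = split_by_coverage(parse_tagged(TAGGED), DOMAIN_INDEPENDENT)
print("covered:", covered)
print("build entries for:", uncovered)
```

Multi-word entries such as "the most" would need phrase matching, which this sketch leaves out.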

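POS tag -> syntactic and semantic properties
Heuristics:
  Named entities are mapped to resources.
  Nouns are mapped to classes or properties.
  Verbs are mapped to properties. If the verb contributes nothing to the query structure, the property is expressed by a noun instead (e.g. "have" in "Which cities have more than 2 million inhabitants?").
Each entry pairs a syntactic representation with a semantic representation.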
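A minimal sketch of these heuristics, assuming Penn Treebank tags as in the tagged example ("VBD", "NNS"). The function name and the returned labels are illustrative, not the authors' implementation.

```python
def slot_types(pos_tag, is_named_entity=False):
    """Map a POS tag to the slot types it may introduce (heuristic sketch)."""
    if is_named_entity:
        return ["resource"]           # named entities -> resources
    if pos_tag.startswith("NN"):
        return ["class", "property"]  # nouns -> classes or properties
    if pos_tag.startswith("VB"):
        return ["property"]           # verbs -> properties
    return []                         # other tags introduce no slot here


for word, tag in [("produced", "VBD"), ("films", "NNS"), ("the", "DT")]:
    print(word, tag, slot_types(tag))
```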

SPARQL templates
Who produced the most films?
  (a) SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0
      Slots: <?c, class, films> <?p, property, produced>
  (b) SELECT ?x WHERE {
      Slot: <?p, property, films>
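To illustrate how slots turn a template into concrete queries, here is a small sketch using template (a). The candidate URIs in `CANDIDATES` are assumptions chosen for the example; a real system would obtain them from entity identification.

```python
from itertools import product

TEMPLATE = (
    "SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } "
    "ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0"
)

# Slots from the slide: (variable, slot type, natural-language token).
SLOTS = [("?c", "class", "films"), ("?p", "property", "produced")]

# Hypothetical candidate URIs per slot variable.
CANDIDATES = {
    "?c": ["onto:Film"],
    "?p": ["onto:producer", "prop:producer"],
}


def instantiate(template, slots, candidates):
    """Return one query string per combination of candidate URIs."""
    variables = [var for var, _kind, _token in slots]
    queries = []
    for combo in product(*(candidates[v] for v in variables)):
        query = template
        for var, uri in zip(variables, combo):
            # Replace the slot variable (followed by a space) with the URI.
            query = query.replace(var + " ", uri + " ")
        queries.append(query)
    return queries


queries = instantiate(TEMPLATE, SLOTS, CANDIDATES)
for q in queries:
    print(q)
```

Each combination of candidates yields one query, which is why a ranking step is needed afterwards.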

Entity identification
Task: given a string s, find similar entities in the knowledge base K.
Generic approach: WordNet provides a set S(s) of synonyms for s; entities e are then retrieved by comparing label(e) with the strings in S(s).
Property detection: a large number of expressions can denote the same predicate (e.g. "X, the creator of Y" and "Y is a book by X"), so property slots additionally use the BOA pattern library.
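The generic approach can be sketched with plain string similarity. Everything concrete here is an assumption for illustration: `SYNONYMS` stands in for WordNet, `LABELS` for the knowledge base, and `difflib.SequenceMatcher` for whichever similarity measure the system actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for WordNet: s -> S(s).
SYNONYMS = {"films": ["films", "movies", "pictures"]}

# Hypothetical knowledge base excerpt: entity URI -> label(e).
LABELS = {
    "onto:Film": "film",
    "onto:City": "city",
    "onto:FilmFestival": "film festival",
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def identify(s, threshold=0.7):
    """Return (entity, score) pairs whose label is similar to a string in S(s)."""
    expanded = SYNONYMS.get(s, [s])
    scored = []
    for entity, label in LABELS.items():
        score = max(similarity(candidate, label) for candidate in expanded)
        if score >= threshold:
            scored.append((entity, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


matches = identify("films")
print(matches)
```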

BOA patterns
Pairs: for each property p, I(p) = {(x, y) : (x p y) ∈ K}.
Sentences: text is searched for occurrences of "label(x) * label(y)" or "label(y) * label(x)".
NLE θ: an expression of the form "?D? representation ?R?" or "?R? representation ?D?".
Scoring: support, typicity and specificity distinguish the patterns that are specific to property p.
Result: pairs (p, θ), the BOA patterns.
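A rough sketch of the extraction step: find sentences containing both labels of a pair and keep the text between them as a pattern. The example pair and sentences are made up, the mapping of x to ?D? and y to ?R? is an assumption, and the scoring by support, typicity and specificity is not shown.

```python
import re
from collections import Counter


def extract_patterns(pairs, sentences):
    """Count patterns '?D? ... ?R?' / '?R? ... ?D?' found between label pairs."""
    counts = Counter()
    for x, y in pairs:
        # Assumption: x is rendered as ?D? and y as ?R?.
        orders = ((x, y, "?D? {} ?R?"), (y, x, "?R? {} ?D?"))
        for sentence in sentences:
            for first, second, form in orders:
                regex = re.escape(first) + r"\s+(.+?)\s+" + re.escape(second)
                match = re.search(regex, sentence)
                if match:
                    counts[form.format(match.group(1))] += 1
    return counts


# Made-up label pair and sentences for illustration.
PAIRS = [("Michael Ende", "The Neverending Story")]
SENTENCES = [
    "Michael Ende is the author of The Neverending Story.",
    "The Neverending Story is a book by Michael Ende.",
]

patterns = extract_patterns(PAIRS, SENTENCES)
print(patterns)
```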

Query ranking and selection
Scoring: string similarity, prominence of entities and the schema of the knowledge base are used to score a query.
Entity scores are refined by type checks on query triples of the form ?x p e and e p ?x.
Return: the highest scored query with a non-empty result.
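The selection rule can be written down directly. In this sketch the scores are given, and `execute` is a hypothetical callable that runs a query against the knowledge base and returns a list of results.

```python
def select_query(scored_queries, execute):
    """Return (query, results) for the best-scored query with a non-empty result."""
    ranked = sorted(scored_queries, key=lambda pair: pair[0], reverse=True)
    for _score, query in ranked:
        results = execute(query)
        if results:
            return query, results
    return None  # no candidate query produced an answer


# Toy stand-in for a SPARQL endpoint: query string -> results.
FAKE_ENDPOINT = {"q_best": [], "q_second": ["res:A"], "q_third": ["res:B"]}

chosen = select_query(
    [(0.4, "q_third"), (0.9, "q_best"), (0.7, "q_second")],
    lambda q: FAKE_ENDPOINT.get(q, []),
)
print(chosen)
```

Here the top-scored query returns nothing, so the second-best query is chosen.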

Evaluation and discussion
Benchmark: QALD on DBpedia, 50 questions annotated with SPARQL queries and answers.
Metrics: precision and recall.
Preliminary remark: erroneous POS tags in seven questions were corrected manually.
11 questions rely on namespaces that were not incorporated for predicate detection: FOAF and YAGO.

Unknown domain-independent expressions
Results:
  19 questions: precision 1.0, recall 1.0
  2 questions: precision > 0.8, recall > 0.8
  Mean precision: 0.61; mean recall: 0.63
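For reference, per-question precision and recall and their means can be computed as below. The handling of empty answer sets (both scores set to 0) is an assumption of this sketch, and the sample answers are toy data, not benchmark results.

```python
def precision_recall(returned, gold):
    """Precision and recall of a returned answer set against the gold answers."""
    returned, gold = set(returned), set(gold)
    if not returned or not gold:
        return 0.0, 0.0  # assumption: an empty side scores zero
    correct = len(returned & gold)
    return correct / len(returned), correct / len(gold)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


# Toy data: (returned answers, gold answers) per question.
toy = [(["a", "b"], ["a", "c"]), (["x"], ["x"])]
scores = [precision_recall(r, g) for r, g in toy]
print("mean precision:", mean(p for p, _ in scores))
print("mean recall:", mean(r for _, r in scores))
```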

The key advantage
The semantic structure of the natural language input is faithfully captured, e.g. for complex questions containing quantifiers, comparatives or superlatives.
No user feedback is needed.
Incorrect templates
No sensible template is constructed. Example: "Is there a video game called Battle Chess?" Property slot: title or name; appropriate property: rdfs:label.
The structure of the templates is sometimes too rigid. Example: "join the EU" corresponds to prop:accessioneudate.
Named entity recognition sporadically fails. Example: Battle of Gettysburg.

Entity identification
The class or property cannot be found on the basis of the slot:
  Give me all soccer clubs in the Premier League. (onto:league)
  Give me all movies with Tom Cruise. (onto:starring)
Hard to match:
  Which cities have more than 2000000 inhabitants? (prop:populationTotal)
  Who owns Aldi? (onto:keyPerson)
  Which mountains are higher than the Nanga Parbat? (prop:elevation)

Query selection
A query with the wrong entity instantiating the slot is picked.
The slot contains too little information to decide among candidates.
Example: "founded" has the candidates prop:foundation, prop:foundingYear, prop:foundingDate, onto:foundationPerson, onto:foundationPlace.
  Which organizations were founded in 1950?
  When was Capcom founded?
  Which software has been developed by organizations founded in California?
Others
Namespaces overlap, and choosing one over the other often leads to results of different quality.

Future work
Rigid templates: add a preprocessing step or a more flexible fallback strategy.
Goal: provide robust question answering for large-scale heterogeneous knowledge bases.

Thank you for listening!