Porting Natural Language Interfaces between Domains - An Experimental User Study with the ORAKEL System - Philipp Cimiano, Peter Haase, Jörg Heizmann Institute.


Porting Natural Language Interfaces between Domains - An Experimental User Study with the ORAKEL System - Philipp Cimiano, Peter Haase, Jörg Heizmann Institute AIFB, University of Karlsruhe (TH) Intelligent User Interfaces (IUI) January 28-31, 2007, Hawaii

Agenda
- Motivation
- Natural Language Interfaces
- The ORAKEL System
- Adaptation Methodology
- Experiments and Results
- Conclusion and Outlook

Motivation
- Electronic devices get smaller and smaller:
  - limited I/O functionality
  - need for intuitive ways of interacting with devices
  - natural language might be an interesting option for querying knowledge
- Problems of using natural language:
  - ambiguity at all levels of interpretation
  - large coverage (grammar)
  - robustness and precision
  - adaptability

Natural Language Interfaces (NLIs)
- Definition: a tool allowing users to query/update a database or knowledge base using (restricted/unrestricted) natural language
- Fashionable research topic in the 70s and 80s
  - Problem too complex?
  - No business models?
- Renewed interest in the new millennium:
  - Database technology (mature)
  - Semantic Web
  - More and more data
  - Electronic devices get smaller …

Is this a complex task?
- Intuitively easier than natural language understanding as a whole:
  - Very focused on a particular domain and KB
  - Relatively short sentences (compared to newspaper text)
  - No discourse phenomena (no anaphora, no ellipsis)
- More complex the more you move to 'real' dialogue …

Challenge & Research Question
- Challenge: domain-specific interpretation of a question
- Research Question: can we develop a model allowing non-NLP experts to easily port the system across domains?
- Example: Which river flows through more cities than the Rhein?

The ORAKEL System
[Figure: system architecture with the components FrameMapper, Domain Lexicon, General Lexicon, Query Interpreter, Query Converter, Answer Generation and KB]

Query Interpreter - Compositional Semantics -
- Standard compositional semantics approach, i.e. the meaning of a question is composed of the meanings of the words and the way they are connected
- Parse tree is used to guide the incremental semantics composition
- Meaning is captured through lambda expressions
- Three composition operators:
  - Functional application (beta reduction)
  - Renaming of bound variables (alpha reduction)
  - Marked substitution
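The composition step can be illustrated with a toy sketch. It is not ORAKEL's implementation: the knowledge base, the word meanings and all names below are assumptions made for illustration, and only functional application is shown (no renaming of bound variables, no marked substitution).

```python
# Hypothetical toy knowledge base of (river, city) pairs, for illustration only.
FLOWS_THROUGH = {("Rhein", "Karlsruhe"), ("Neckar", "Heidelberg")}
RIVERS = {"Rhein", "Neckar"}

# Assumed word meanings written as lambda expressions.
# "flows through" takes the city first, then the river (curried form).
flows_through = lambda city: lambda river: (river, city) in FLOWS_THROUGH
# "Which river" takes a predicate over rivers and returns the rivers satisfying it.
which_river = lambda predicate: sorted(r for r in RIVERS if predicate(r))

# Composition follows the parse tree bottom-up by functional application:
# the verb phrase "flows through Karlsruhe" becomes a predicate over rivers,
vp = flows_through("Karlsruhe")
# and the sentence meaning applies "Which river" to that predicate.
answer = which_river(vp)
print(answer)
```

Each application here corresponds to one beta reduction: the argument is substituted for the bound variable of the function it is applied to.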

Query Interpreter (4) - Meaning Construction -
[Figure: parse tree for the question "Which river flows through Karlsruhe", with node labels S, DP, VP, V, PP, P and DP]

Lexicon (1-5)
- Tripartite structure of the lexicon:
  - Domain-specific lexicon (created by user)
  - Domain-independent lexicon (pre-encoded)
  - Ontology lexicon (generated from ontology)
- All lexicons are actually lexicalized grammars (consisting of trees) used for parsing and construction of the query.

Ontology Lexicon
- Contains lexical representations of instances and concepts.
- Generated automatically from the ontology, relying on its labels
- A lexicon is used to generate inflected variants, e.g. plural forms
- No manual work needed by user (!)
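A rough sketch of how such a lexicon could be derived from labels follows. The slides say only that inflected variants come from a lexicon, so the rule-based `plural` function, the concept names and the labels below are all stand-in assumptions, not ORAKEL's actual resources.

```python
def plural(noun):
    """Naive English pluralization; a stand-in for a real morphological lexicon."""
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def build_ontology_lexicon(labels):
    """Map every label and its plural form back to the ontology concept."""
    lexicon = {}
    for concept, label in labels.items():
        lexicon[label] = concept
        lexicon[plural(label)] = concept
    return lexicon


# Hypothetical concept names and labels.
labels = {"River": "river", "City": "city", "Highway": "highway"}
lexicon = build_ontology_lexicon(labels)
print(lexicon["cities"])
```

Because every entry is derived from the labels, the user does no manual work for this part of the lexicon.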

Domain-independent Lexicon
- Contains closed-class words with constant meaning across domains:
  - Determiners: every, most, the most, a, the only, the, all, no, …
  - Prepositions: after, before, in (spatial), in (temporal), …
  - Question pronouns: who, what, which, when, where, …
- Meaning is captured with respect to foundational categories, e.g. as provided by DOLCE
- No manual work by user (!)

Domain-specific lexicon: Adaptation Mechanism
- Subcategorization frames: linguistic predicate-argument structures, e.g. flow(subj,pcomp(through))
- Relations in the knowledge base: flowThrough(river,city)
- Basic idea: the user maps the arguments of a subcategorization frame to the arguments of a relation in the knowledge base
- The domain-specific lexicon is generated in the background as a byproduct of the mappings performed by a lexicon engineer
- Research question: Can naive users (in the sense of being unfamiliar with computational linguistics) customize the system to work with a specific knowledge base?
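The mapping idea can be sketched as a small data structure. The class names, the `map_frame` helper and the shape of the resulting entry are hypothetical and do not reflect FrameMapper's internal format; the sketch also covers only the simple case where frame and relation have the same arity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubcatFrame:
    predicate: str    # e.g. "flow"
    arguments: tuple  # e.g. ("subj", "pcomp(through)")


@dataclass(frozen=True)
class Relation:
    name: str              # e.g. "flowThrough"
    argument_types: tuple  # e.g. ("river", "city")


def map_frame(frame, relation, alignment):
    """Build a lexicon entry from a user-supplied alignment.

    alignment maps each frame argument to a position in the relation.
    """
    if set(alignment) != set(frame.arguments):
        raise ValueError("every frame argument must be mapped exactly once")
    if sorted(alignment.values()) != list(range(len(relation.argument_types))):
        raise ValueError("every relation argument must be used exactly once")
    return {
        "word": frame.predicate,
        "relation": relation.name,
        "arguments": {arg: relation.argument_types[pos]
                      for arg, pos in alignment.items()},
    }


flow = SubcatFrame("flow", ("subj", "pcomp(through)"))
flow_through = Relation("flowThrough", ("river", "city"))
entry = map_frame(flow, flow_through, {"subj": 0, "pcomp(through)": 1})
print(entry)
```

The user only supplies the alignment; the lexicon entry falls out of it, which is the sense in which the lexicon is a byproduct of the mapping.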

FrameMapper GUI

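Type hierarchies
[Figure: two type hierarchies, one for subcategorization frames and one for relations, each organized by arity (2, 3, 4). Frame types shown: Transitive, Intransitive+PP, Noun+PP, Transitive+PP, Noun+PP+PP. Relation types shown: Binary Relation, 2x2 Join, Ternary Relation, 2x2 Join´, 3x2 Join.]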

FrameMapper GUI

Adaptation Methodology
[Figure: iterative cycle with the elements FrameMapper, Lexicon, ORAKEL, Questions and Failed Questions]

Evaluation: Goals
- First claim: Users not trained in NLP are able to create domain-specific lexica comparable to those created by NLP experts.
- Second claim: The coverage of the lexicon will improve proportionally to the number of iterations performed.

Evaluation: Measures
- Claims:
  - Precision / Recall should be comparable for different users (NLP and non-NLP experts)
  - Recall should increase over iterations
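The slides do not define the two measures. A minimal sketch, assuming precision is the share of answered questions that were answered correctly and recall is the share of all asked questions that were answered correctly, could look like this (the outcome labels and the sample session are invented for illustration):

```python
def precision_recall(outcomes):
    """Compute (precision, recall) for one user's question session.

    outcomes: list of "correct", "wrong" or "unanswered", one per question.
    The definitions used here are assumptions, not taken from the slides.
    """
    answered = [o for o in outcomes if o != "unanswered"]
    correct = sum(1 for o in answered if o == "correct")
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(outcomes) if outcomes else 0.0
    return precision, recall


# Invented session: 10 questions, 5 correct, 1 wrong, 4 unanswered.
session = ["correct"] * 5 + ["wrong"] + ["unanswered"] * 4
precision, recall = precision_recall(session)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Under these definitions, extending the lexicon so that previously failed questions get answered raises recall, while precision depends only on the questions the system actually answers.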

Experimental Settings
- Lexicon engineers:
  - NLP expert (no training needed)
  - Master's student (self-training, no NLP knowledge)
  - Other users:
    - short explanation of types supported by ORAKEL (10 min.)
    - short training on FrameMapper (10 min.)
- End users:
  - Academic (researchers and students), industrial
  - Received a handout describing the task, the knowledge base and some restrictions on the allowed questions
  - Were supposed to ask the system at least 10 questions
  - Were asked to confirm whether the answer provided by the system was correct (yes/no)
- Lexicon engineers developed the lexicon in several iterations, refining the lexica after being presented with the end users' questions that the system had not answered

German Geography Knowledge Base
- Created by students at our department in
- Contains information about cities (+ the states where they are located), rivers (+ the cities they pass), highways (+ the cities they pass), states, capitals of states etc.
- KB represented in F-Logic

Type        #
Cities      106
States      16
Rivers      18
Highways    108
Countries   9
Seas        2

British Telecom's digital library
- A digital library created and maintained by British Telecom
- Used as a case study within the SEKT Project
- Metadata stored in a database which was mapped to the Proton ontology (and thus accessible through KAON2) for querying
- OWL/SPARQL instead of F-Logic

Type        #
Authors
Documents
Topics      17.174

Geography Knowledge Base
- Goal: compare lexica engineered by NLP and non-NLP experts with respect to ORAKEL performance in terms of precision and recall
- Setting:
  - NLP expert (A) constructed lexicon from scratch
  - two non-NLP experts (B + C) over two rounds (30 min training, 2x30 min)
  - 24 end users (8 + 2*4 + 2*4), asking at least 10 questions
- Conclusions:
  - Comparable results for A as well as B and C (after 2 iterations)
  - Results (in terms of recall) clearly improve after lexicon modification

Lexicon    #Users   Rec. (avg)   Prec. (avg)
A          8        53.67%       84.23%
B (1st)    4        44.39%       74.53%
B (2nd)    4        45.15%       80.95%
C (1st)    4        35.41%       82.25%
C (2nd)    4        47.66%       80.60%

BT's digital library
- Master's student as lexicon engineer constructed the lexicon in three iterations (6h + 2*30m.)
- 12 users (three querying rounds with 4 users)
- Conclusions:
  - Average recall shows clear improvement over the three rounds
  - ORAKEL can in principle scale to much larger knowledge bases

Iteration   Rec. (avg.)   Prec. (avg.)
1           42%           52%
2           49%           71%
3           61%           73%

Related Work
- No domain adaptation needed: exploit lexical matches
  - PRECISE [Popescu et al. 2003]
  - Aqualog [Lopez et al. 2004]
  - principled limits: relation modeled as authorOf(x,y), question asked as "Who wrote what?"
- Engineering expertise required:
  - Quetal [Frank et al. 2006]
  - ACE [Fuchs et al. 2006]

Conclusion
- The adaptation mechanism and iterative methodology seem suitable for end users not familiar with natural language processing
- This has been corroborated by experimental validation showing that:
  - Precision/Recall are comparable for NLP and non-NLP experts
  - Recall improves proportionally to the iterations
  - Precision is quite reasonable (73-82%)

The longer-term vision
- Benefits:
  - Lexica are reused and only minor changes are performed
  - Not everybody has to develop a lexicon
[Figure: an ontology file portal.owl with an accompanying lexicon file portal.lex.owl, used by ORAKEL]