NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.

Authors
• Yunyao Li, IBM Almaden Research Center
• Huahai Yang, University at Albany, State University of New York
• H. V. Jagadish, University of Michigan

Overview
• Goal is to provide a generic natural language query interface to an XML database
• English is the natural language used
• The natural language query is translated into a meaningful query for an XML database

Schema-Free XQuery
• Not necessary to map the query onto the database schema
• Target language of the translation from natural language
• Starts from a given collection of keywords, each of which identifies a candidate XML element
• The MQF (meaningful query focus) of these elements, if one exists, automatically finds the relationships between them, including additional related elements as appropriate
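To make the idea of "structurally related elements" concrete, here is a minimal Python sketch. It is not the MQF algorithm from Schema-Free XQuery; it is a simplified lowest-common-ancestor heuristic, and the function names and the sample document are assumptions made for illustration. Two elements are treated as related when neither has a partner of the other's tag with a deeper common ancestor.

```python
import xml.etree.ElementTree as ET

# Illustrative document; not taken from the presentation.
SAMPLE = """
<movies>
  <movie><title>Movie A</title><director>Director X</director></movie>
  <movie><title>Movie B</title><director>Director Y</director></movie>
</movies>
"""

def ancestor_chain(node, parent):
    """Return [root, ..., node] using a child -> parent map."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain[::-1]

def lca_depth(a, b, parent):
    """Number of ancestors shared by a and b, counted from the root."""
    depth = 0
    for x, y in zip(ancestor_chain(a, parent), ancestor_chain(b, parent)):
        if x is not y:
            break
        depth += 1
    return depth

def related_pairs(root, tag_a, tag_b):
    """Pairs (a, b) whose common ancestor is as deep as possible for both."""
    parent = {child: p for p in root.iter() for child in p}
    nodes_a = list(root.iter(tag_a))
    nodes_b = list(root.iter(tag_b))
    pairs = []
    for a in nodes_a:
        for b in nodes_b:
            depth = lca_depth(a, b, parent)
            best_for_a = max(lca_depth(a, other, parent) for other in nodes_b)
            best_for_b = max(lca_depth(other, b, parent) for other in nodes_a)
            if depth == best_for_a and depth == best_for_b:
                pairs.append((a, b))
    return pairs

root = ET.fromstring(SAMPLE)
for title, director in related_pairs(root, "title", "director"):
    print(title.text, "<->", director.text)
```

On the sample document this pairs each title only with the director inside the same movie element, without the caller ever naming the movie element or knowing the schema.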

Three Step Process
1. Token classification
2. Parse tree validation
3. Translation into XQuery
• These three key steps are independent of one another.
• Improvements can be made to any one without impacting the other two.
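The independence of the three steps can be sketched as a pipeline whose stages are passed in as plain functions. This is an assumed structure for illustration only; run_pipeline and the toy_* stand-ins are hypothetical names, and the toy stages do not perform real parsing or XQuery generation.

```python
from typing import Callable, List

def run_pipeline(
    query: str,
    classify: Callable[[str], List[str]],
    validate: Callable[[List[str]], List[str]],
    translate: Callable[[List[str]], str],
) -> str:
    """Run the three steps in order; any stage can be swapped independently."""
    classified = classify(query)        # step 1: token classification
    problems = validate(classified)     # step 2: validation returns feedback messages
    if problems:
        raise ValueError("; ".join(problems))
    return translate(classified)        # step 3: translation

# Toy stand-ins for the real stages (hypothetical, for illustration).
def toy_classify(query: str) -> List[str]:
    return query.lower().rstrip("?.").split()

def toy_validate(words: List[str]) -> List[str]:
    if words and words[0] in {"find", "return"}:
        return []
    return ["Query should start with a command such as 'find' or 'return'."]

def toy_translate(words: List[str]) -> str:
    return "(: placeholder for generated XQuery :) " + " ".join(words[1:])

print(run_pipeline("Find all movies", toy_classify, toy_validate, toy_translate))
```

Because each stage only depends on the output type of the previous one, replacing toy_classify with a better classifier would not require touching the validator or the translator.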

Token Classification
• MINIPAR
  • State-of-the-art dependency parser
  • Free, off-the-shelf software
• Identify words or phrases in the query that can be mapped to components of XQuery
• Token: matches an XQuery component
• Marker: has no matching XQuery component
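The token/marker distinction can be illustrated with a small Python sketch. The lexicon below is hypothetical and far smaller than anything a real system would use, and the sketch classifies a flat word list rather than a MINIPAR parse tree.

```python
# Hypothetical lexicon: word -> the XQuery component it might express.
XQUERY_LEXICON = {
    "return": "RETURN clause",
    "find": "RETURN clause",
    "sorted": "ORDER BY clause",
    "maximum": "max() function",
    "equals": "= operator",
    "not": "negation",
    "every": "quantifier",
}

def classify(words, element_names):
    """Label each word as a token (maps to XQuery) or a marker (does not)."""
    result = []
    for word in words:
        key = word.lower()
        if key in XQUERY_LEXICON:
            result.append((word, "token", XQUERY_LEXICON[key]))
        elif key in element_names:
            # Words naming an element or attribute in the database are tokens too.
            result.append((word, "token", "element or attribute name"))
        else:
            result.append((word, "marker", None))
    return result

for row in classify(["Return", "the", "title", "of", "every", "movie"], {"title", "movie"}):
    print(row)
```

Here "the" and "of" come out as markers because nothing in XQuery corresponds to them, while "title" and "movie" are tokens because they name elements in the assumed database.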

Tokens

Markers

Translation into XQuery
• Parse tree must be valid (Step #2)
• Any reasonably designed XML document should reflect certain semantic structure isomorphous to human conceptual structure, and hence expressible by human natural language
• A natural language query may contain multiple name tokens, each corresponding to an element or an attribute in the database
• Name tokens "related" to each other should be mapped into the same mqf function in Schema-Free XQuery and hence found in structurally related elements in the database
• Elements/attributes specified in the same XQuery statement are related either via structural join or value join.
• Determining grouping and nesting for aggregation functions is difficult, because the scope of an aggregation function is not always obvious from the token it directly attaches to.
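As a rough illustration of mapping related name tokens into one mqf call, the Python sketch below assembles a query string. The function build_query, the variable naming scheme, and the exact query layout are assumptions for illustration; the slides do not show the generated syntax, and grouping, nesting and aggregation are not handled.

```python
def build_query(doc, name_tokens, return_name, conditions=()):
    """Assemble a Schema-Free-XQuery-style string from name tokens.

    doc          -- document name (illustrative)
    name_tokens  -- element/attribute names mentioned in the query
    return_name  -- the name token whose variable is returned
    conditions   -- (name, operator, value) triples from value tokens
    """
    variables = {name: f"$v{i}" for i, name in enumerate(name_tokens, start=1)}
    for_clause = ", ".join(
        f'{var} in doc("{doc}")//{name}' for name, var in variables.items()
    )
    where_parts = []
    if len(variables) > 1:
        # Related name tokens share a single mqf(...) call.
        where_parts.append("mqf(" + ", ".join(variables.values()) + ")")
    for name, op, value in conditions:
        where_parts.append(f'{variables[name]} {op} "{value}"')
    lines = [f"for {for_clause}"]
    if where_parts:
        lines.append("where " + " and ".join(where_parts))
    lines.append(f"return {variables[return_name]}")
    return "\n".join(lines)

print(build_query("movies.xml", ["title", "director"], "title",
                  [("director", "=", "Director X")]))
```

The point of the sketch is that the generated query never spells out the path between title and director; that relationship is left to mqf.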

Interactive Query Formulation
• Makes no attempt at superior understanding of natural language
• Approach is to get the user to rephrase the query in terms that the system can understand
• The linguistic capability of the system is constrained by the expressiveness of XQuery
• Semantics extracted by NaLIX
  • Tokens that can be directly mapped to XQuery
  • Semantic relationships between tokens
• Error messages are dynamically generated and suggest how to fix the error
• No additional rules are needed to deal with incorrect English. If the parse tree of a grammatically incorrect English sentence is valid, the sentence can still be translated into XQuery

Iterative Search Theory
• Users often iteratively modify their queries based on the results obtained
• Forcing the user to fully specify each follow-up query gets in the way of normal iterative search
• In NaLIX, the basic unit of iterative search is a query tree
• The user can easily return to any point of recent search history to take a different search direction
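A query tree of this kind can be sketched as a simple parent/child structure in Python. The class and method names are hypothetical; the slides describe the concept but not a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class QueryNode:
    """One query in a query tree; the root is a self-contained query."""
    text: str
    parent: Optional["QueryNode"] = None
    children: List["QueryNode"] = field(default_factory=list)

    def follow_up(self, text: str) -> "QueryNode":
        """Attach a follow-up query to this node and return it."""
        child = QueryNode(text, parent=self)
        self.children.append(child)
        return child

    def history(self) -> List[str]:
        """Queries from the root down to this node."""
        node, path = self, []
        while node is not None:
            path.append(node.text)
            node = node.parent
        return path[::-1]

root = QueryNode("Find all movies")
first = root.follow_up("directed by Director X")
# Returning to the root and branching in a different direction:
second = root.follow_up("released after 2000")
print(second.history())
```

Branching from an earlier node rather than from the most recent query is what lets a user go back to any point of the search history and take a different direction.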

Iterative Search Issues
• Identification of equivalent objects between a follow-up query and its prior queries
• Reference resolution to determine the semantic meaning of references to prior queries in the follow-up query
• Limited linguistic capability remains an issue when handling follow-up queries; we thus need to design interactive facilities to guide users to formulate follow-up queries

Query Context
• Consists of the tokens in the query, the patterns of tokens that correspond to XQuery fragments, and the query context of the parent query
• Context center: the lowest name token among those whose corresponding basic variables are not included in a WHERE clause. If no such name token exists, the context center is a name token whose corresponding basic variable is included in a RETURN clause
• A query tree is limited to one context center at any time
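The context center rule can be written down as a short Python sketch. It assumes that "lowest" means deepest in the parse tree and that each name token already records whether its basic variable appears in a WHERE or RETURN clause; the NameToken fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NameToken:
    name: str
    depth: int        # depth in the parse tree; larger means lower (assumption)
    in_where: bool    # basic variable appears in a WHERE clause
    in_return: bool   # basic variable appears in a RETURN clause

def context_center(tokens: List[NameToken]) -> Optional[NameToken]:
    """Pick the context center following the rule on the slide."""
    outside_where = [t for t in tokens if not t.in_where]
    if outside_where:
        return max(outside_where, key=lambda t: t.depth)
    # Fallback: a name token whose variable is in a RETURN clause.
    returned = [t for t in tokens if t.in_return]
    return returned[0] if returned else None

tokens = [
    NameToken("movie", depth=1, in_where=False, in_return=False),
    NameToken("title", depth=2, in_where=False, in_return=True),
    NameToken("director", depth=3, in_where=True, in_return=False),
]
print(context_center(tokens).name)
```

In this example "director" is the deepest token but is excluded because it is constrained in the WHERE clause, so "title" becomes the context center.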

Reference Resolution
• Reference resolution in NaLIX is equivalent to the task of finding the corresponding name tokens in the parent query context for a reference token.
• The concept of query context inheritance allows the system to be relatively robust against errors in reference resolution
• Reference resolution errors have no negative impact on query translation results, unless the reference token is involved in situations that cause changes to the query context

Follow-up Query Formulation
• A follow-up query is often an incomplete sentence
• Since only a valid query is allowed to have follow-up queries, the parent query of any follow-up query is valid.
• Therefore, a valid follow-up query is still confined by XQuery syntax but only needs to provide valid XQuery fragments, instead of a valid full XQuery expression

Query Tree Detection
• Users often simply type in a query as a follow-up query even though the query is self-contained and essentially starts a new query tree
• If a given query entered as a follow-up query does not need any information provided by its prior queries, then it can be regarded as a new root query
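One crude way to approximate "does not need any information provided by its prior queries" is sketched below in Python. The word lists and the rule itself are assumptions for illustration; the slides do not say how NaLIX makes this decision.

```python
# Hypothetical word lists; a real system would work on the parse tree.
REFERENCE_WORDS = {"it", "its", "them", "their", "those", "these", "that"}
COMMAND_WORDS = {"find", "return", "list", "show"}

def starts_new_tree(follow_up: str) -> bool:
    """Guess whether a query entered as a follow-up is really a new root query."""
    words = [w.strip(".,?!").lower() for w in follow_up.split()]
    # A reference word suggests the query depends on a prior query.
    has_reference = any(w in REFERENCE_WORDS for w in words)
    # A leading command word suggests a complete, self-contained sentence.
    has_command = bool(words) and words[0] in COMMAND_WORDS
    return has_command and not has_reference

print(starts_new_tree("Find all books published by Springer"))
print(starts_new_tree("Sort them by year"))
```

A query that refers back with "them", or that is only a fragment, stays attached to its parent; a complete sentence with no back-references is promoted to a new root.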

Experiment
• NaLIX vs. keyword search interface
• Measurements
  • Ease of use
  • Search quality
• Search time was considered, but results were not of sufficient interest to be included in the paper

Results - Time

Results - Iterations

Results - Precision

Results - Recall
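For readers unfamiliar with the two search-quality measures, the standard definitions can be sketched in Python. The result sets below are made up for illustration and are not data from the experiment.

```python
def precision(returned: set, relevant: set) -> float:
    """Fraction of returned results that are relevant."""
    return len(returned & relevant) / len(returned) if returned else 0.0

def recall(returned: set, relevant: set) -> float:
    """Fraction of relevant results that were returned."""
    return len(returned & relevant) / len(relevant) if relevant else 0.0

# Illustrative sets only.
returned = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
print(precision(returned, relevant), recall(returned, relevant))
```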

Related Works
• Schema-Free XQuery
• Natural Language Interface to Databases
• Dependency Parsers
• Support for Iterative Database Search
• Communication Models
• Automatic Topic Discovery
• Reference Resolution

Conclusion
• Working implementation, not just a theory
• Supports comparison predicates, conjunctions, simple negation, quantification, nesting, aggregation, value joins, and sorting
• Future work: support for disjunction, multi-sentence queries, complex negation, and composite result construction
• A group outside of computer science has requested a production deployment
• Expected to lead to a whole new generation of query interfaces for databases

Screenshot