
Inexact Querying of XML

XML Data May Be Irregular
Relational data is regular and organized. XML may be very different.
– Data is incomplete: values of attributes are missing in some elements
– Data has structural variations: relationships between elements are represented differently in different parts of the document
– Data has ontology variations: different labels are used to describe nodes of the same type
(Note: In some of the upcoming slides, labels appear on edges instead of on nodes.)

Incomplete Data
[Figure: Movie Database graph. Labels: Movie, Film, T.V. Series, Actor, Title, Name, Year. Values: Dune, Star Wars, Twin Peaks, Léon, Magnolia, Kyle MacLachlan, Natalie Portman, Harrison Ford, Mark Hamill, 1977.]
One movie has a year attribute; the year of another movie is missing.

Variations in Structure
[Figure: the Movie Database graph.]
In one part of the graph, Movie is below Actor; in another part, Actor is below Movie.

Ontology Variations
[Figure: the Movie Database graph.]
Some nodes carry a Movie label, while others of the same type carry a Film label.

The description of the schema is large (e.g., a DTD of XML), so it is difficult to use the schema when formulating queries.
Data is contributed by many users in a variety of designs, so the query should deal with different structures of data.
The structure of the database is changed frequently, so queries have to be rewritten frequently.
We need to allow the user to write an “approximate query” and have the query processor deal with it.

The Problem
In many different domains, we are given the option to query some source of information.
Usually, the user gets results only if the query can be completely answered (satisfied).
In many domains, this is not appropriate, e.g.,
– The user is not familiar with the database
– The database does not contain complete information
– There is a mismatch between the ontology of the user and that of the database

Example 1
Locality: Be'er Sheva
Area code: 03

The selected locality does not appear in the selected area code!

Boarding: Haifa – Technion
Drop-off: Eilat

There is no direct line connecting the selected points

Boarding: (empty)
Drop-off: Eilat

Course details: Databases

No matching courses were found

What Do Users Need?
Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist.
These partial answers should contain maximal information.
Problem:
– It is easy to define when an answer satisfies a query
– It is hard to say when an answer that does not satisfy a query is of interest
– It is hard to say which incomplete answers are better than others

Modeling a Database and a Query
It is useful to model both databases and queries as labeled directed graphs.
– Clean mathematical modeling!
– Captures the essentials of XPath and XQuery
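As a rough illustration of this modeling step, the sketch below stores a database fragment and a query in the same labeled-directed-graph structure. The class name `LabeledGraph`, the integer and string node ids, and the choice to put labels on nodes (rather than on edges) are assumptions made for this example only.

```python
class LabeledGraph:
    """Directed graph with one string label per node (an assumed encoding)."""

    def __init__(self):
        self.labels = {}  # node id -> label
        self.edges = {}   # node id -> list of child node ids

    def add_node(self, node_id, label):
        self.labels[node_id] = label
        self.edges.setdefault(node_id, [])

    def add_edge(self, parent, child):
        self.edges[parent].append(child)


# A small database fragment: University -> Dept -> Faculty -> Name.
database = LabeledGraph()
for node_id, label in [(1, "University"), (2, "Dept"), (3, "Faculty"), (4, "Name")]:
    database.add_node(node_id, label)
for parent, child in [(1, 2), (2, 3), (3, 4)]:
    database.add_edge(parent, child)

# A query is a graph of exactly the same kind, only smaller.
query = LabeledGraph()
query.add_node("f", "Faculty")
query.add_node("n", "Name")
query.add_edge("f", "n")
```

Because both sides use one representation, answering a query reduces to finding mappings from the query graph into the database graph.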

University Database
[Figure: university database graph. Labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]

Query
[Figure: query graph with labels University, Dept, Faculty, Name.]
Exact answers are defined by exact matchings, i.e., subgraph homomorphisms.
This query asks for the names of all faculty members (of any type).
How would you write this in XPath?
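A minimal sketch of exact matching, assuming a tree-shaped query and labels on nodes: a matching maps every query node to a database node with the same label, and every query edge to a database edge. The function name `exact_matchings`, the node ids, and the simplified database (the Professor/Lecturer distinction is omitted) are illustrative assumptions, not taken from the slides.

```python
def exact_matchings(q_labels, q_edges, q_root, d_labels, d_edges):
    """Yield each mapping (query node -> database node) that preserves labels and edges."""

    def extend(q_node, d_node):
        # Labels must agree exactly.
        if q_labels[q_node] != d_labels[d_node]:
            return
        partials = [{q_node: d_node}]
        # Every query child must map to some database child of d_node.
        for q_child in q_edges.get(q_node, []):
            partials = [
                {**partial, **sub}
                for partial in partials
                for d_child in d_edges.get(d_node, [])
                for sub in extend(q_child, d_child)
            ]
        yield from partials

    for d_node in d_labels:
        yield from extend(q_root, d_node)


# Simplified database: one department with two faculty members.
d_labels = {1: "University", 2: "Dept", 3: "Faculty", 4: "Name",
            5: "Faculty", 6: "Name", 7: "Teaches"}
d_edges = {1: [2], 2: [3, 5], 3: [4], 5: [6, 7]}
d_values = {4: "Chana Israeli", 6: "Avi Levy"}

# Query: University -> Dept -> Faculty -> Name.
q_labels = {"u": "University", "d": "Dept", "f": "Faculty", "n": "Name"}
q_edges = {"u": ["d"], "d": ["f"], "f": ["n"]}

matchings = list(exact_matchings(q_labels, q_edges, "u", d_labels, d_edges))
names = sorted(d_values[m["n"]] for m in matchings)
```

Each matching is one exact answer; a database node that lacks any required child simply produces no matching at all, which is the behavior the later slides try to relax.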

Exact Answers
[Figure: the query graph (University, Dept, Faculty, Name) matched against the university database graph.]


Slightly More Complex Query
[Figure: query graph with labels University, Dept, Faculty, Name and the value Biology.]
Returns faculty members only from the Biology Department.
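One possible way to write the two queries (all faculty names, and faculty names restricted to the Biology department) is with path expressions evaluated by Python's `xml.etree.ElementTree`, which supports a subset of XPath. The XML layout below, including which teacher belongs to which department, is an assumption made for this sketch; the slides do not give the document text.

```python
import xml.etree.ElementTree as ET

# Assumed XML encoding of a simplified university database.
DOC = """
<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty><Name>Chana Israeli</Name><Teaches>Databases</Teaches></Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty><Name>Avi Levy</Name><Teaches>Molecular Biology</Teaches></Faculty>
  </Dept>
</University>
"""

root = ET.fromstring(DOC)  # root is the University element

# Names of all faculty members, in any department.
all_names = [e.text for e in root.findall("./Dept/Faculty/Name")]

# Names of faculty members in the department whose Name child is "Biology".
biology_names = [e.text for e in root.findall("./Dept[Name='Biology']/Faculty/Name")]
```

Both expressions are exact: a Faculty element nested one level deeper, or labeled differently, would silently be missed.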

Exact Answers Are Not Always Useful
Problems with exact answers:
– Labels are not always known
– Content may be unknown, misspelled, etc.
– Structure may be unknown, or may vary from one representation to another
– We may actually want to perform a search, since the query is a vague hypothesis
– Exact answers do not let users get partial or vague answers when nothing better exists

Manually Adding Inexactness
One can use language constructs in order to get more flexible queries.
Example: Suppose we want to find courses together with the teachers that teach them, but we don’t know which hierarchy exists in the database:
– for each teacher, there is a list of courses, or
– for each course, there is a list of teachers,
– or both…

[Figure: university database graph in which Course nodes appear below Teacher nodes. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]
Query needed: Teacher above Course.

[Figure: university database graph in which Teacher nodes appear below Course nodes. Values: Technion, Computer Science, Biology, Bioinformatics, Molecular Biology, Chana Israeli, Avi Levy.]
Query needed: Course above Teacher.

Manually Adding Inexactness (cont.)
If we don’t know the hierarchy, we need the union of two queries: Teacher above Course, and Course above Teacher.

Manually Adding Inexactness (cont.)
If we don’t know the hierarchy, we need the union of two queries: Teacher above Course, and Course above Teacher.
If we don’t know exactly what the labels are, we might need the union of two larger queries: (Teacher or Lecturer or Professor) above (Course or Seminar or Lab), and (Course or Seminar or Lab) above (Teacher or Lecturer or Professor).
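To make the cost of this manual approach concrete, here is a sketch that spells out the union by hand: three teacher labels, three course labels, and two nesting orders give 18 path patterns. The XML document, the `Name` child elements, and the helper name `teacher_course_pairs` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

TEACHER_LABELS = ["Teacher", "Lecturer", "Professor"]
COURSE_LABELS = ["Course", "Seminar", "Lab"]

# Assumed document that mixes both hierarchies and several labels.
DOC = """
<University>
  <Dept>
    <Professor><Name>Chana Israeli</Name>
      <Course><Name>Databases</Name></Course>
    </Professor>
  </Dept>
  <Dept>
    <Seminar><Name>Molecular Biology</Name>
      <Lecturer><Name>Avi Levy</Name></Lecturer>
    </Seminar>
  </Dept>
</University>
"""


def teacher_course_pairs(root):
    """Union of every (teacher label, course label, nesting order) combination."""
    pairs = set()
    for t_label, c_label in product(TEACHER_LABELS, COURSE_LABELS):
        # Hierarchy 1: the course is a child of the teacher.
        for teacher in root.iter(t_label):
            for course in teacher.findall(c_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
        # Hierarchy 2: the teacher is a child of the course.
        for course in root.iter(c_label):
            for teacher in course.findall(t_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
    return pairs


pairs = teacher_course_pairs(ET.fromstring(DOC))
```

Every new label or nesting variant enlarges the union, and the user has to anticipate each one in advance.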

Help!

Intuition
Users write regular queries, stating what they are looking for.
The query processor uses a built-in strategy to find answers that satisfy the query either exactly or inexactly.
The burden is on the query processor, not on the user.

Inexact Answers
Many different definitions have been given.
– For each definition, query processing algorithms have been defined
Examples:
– Allow some of the nodes of the query to be unmatched
– Allow edges in the query to be matched to paths in the database
– Allow nodes to be matched to nodes with labels that have a similar meaning
Be careful so that answers are meaningful!
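Two of these relaxations can be sketched together, assuming a tree-shaped query with labels on nodes: a query edge may be matched to any downward path in the database, and a query label may be matched to any label in a synonym set. The `SYNONYMS` table, the `Cast` label, and all node ids are hypothetical; this is one possible reading of the relaxations, not a specific published algorithm.

```python
# Hypothetical synonym table: query label -> acceptable database labels.
SYNONYMS = {"Movie": {"Movie", "Film"}}


def labels_match(q_label, d_label):
    """Similar-meaning labels: accept any synonym of the query label."""
    return d_label in SYNONYMS.get(q_label, {q_label})


def descendants(d_edges, node):
    """Yield every node reachable from `node` by a path of one or more edges."""
    stack, seen = list(d_edges.get(node, [])), set()
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            yield current
            stack.extend(d_edges.get(current, []))


def relaxed_matchings(q_labels, q_edges, q_root, d_labels, d_edges):
    """Like exact matching, but query edges map to paths and labels to synonyms."""

    def extend(q_node, d_node):
        if not labels_match(q_labels[q_node], d_labels[d_node]):
            return
        partials = [{q_node: d_node}]
        for q_child in q_edges.get(q_node, []):
            partials = [
                {**partial, **sub}
                for partial in partials
                for d_desc in descendants(d_edges, d_node)
                for sub in extend(q_child, d_desc)
            ]
        yield from partials

    for d_node in d_labels:
        yield from extend(q_root, d_node)


# Database: Movie -> Actor directly, and Film -> Cast -> Actor.
d_labels = {1: "Movie Database", 2: "Movie", 3: "Actor", 4: "Film", 5: "Cast", 6: "Actor"}
d_edges = {1: [2, 4], 2: [3], 4: [5], 5: [6]}

# Query: Movie -> Actor.
q_labels = {"m": "Movie", "a": "Actor"}
q_edges = {"m": ["a"]}

found = {(m["m"], m["a"]) for m in relaxed_matchings(q_labels, q_edges, "m", d_labels, d_edges)}
```

An exact matcher would return only the pair (2, 3); the relaxed one also returns (4, 6). The same freedom can pair nodes that are only loosely related through a long path, which is why the answers need to be checked for meaningfulness.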

Allow Unmatched Nodes: Bezeq Query
[Figure: query graph with nodes Phone Number, Name, City, and Area Code. Values: Shmulevitz, Be'er Sheva, 03.]
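A small sketch of the unmatched-nodes idea, under the assumption that a query is a flat set of field conditions: each record is scored by the set of conditions it satisfies, and only records whose satisfied set is not strictly contained in another record's set are kept. The directory records and phone numbers are invented placeholders, and `partial_answers` is a hypothetical helper name.

```python
def partial_answers(query, records):
    """Return (record, matched_fields) pairs whose matched set is maximal."""
    scored = []
    for record in records:
        matched = frozenset(k for k, v in query.items() if record.get(k) == v)
        if matched:  # ignore records that satisfy nothing
            scored.append((record, matched))
    # Drop an answer if another answer satisfies a strict superset of its conditions.
    return [(r, m) for r, m in scored if not any(m < other for _, other in scored)]


query = {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "03"}

# Invented directory records (placeholder phone numbers).
records = [
    {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "08", "Phone": "placeholder-1"},
    {"Name": "Cohen", "City": "Tel Aviv", "Area Code": "03", "Phone": "placeholder-2"},
    {"Name": "Levi", "City": "Be'er Sheva", "Area Code": "08", "Phone": "placeholder-3"},
]

answers = partial_answers(query, records)
answer_names = [record["Name"] for record, _ in answers]
```

The first record survives because it matches Name and City, and the second because its single match (Area Code) is not covered by any other answer; the third is dropped since it matches only City. Whether the second answer is actually interesting to the user is exactly the kind of ranking question that set-inclusion maximality alone does not settle.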

Matching Edges to Paths: Egged Query
[Figure: query graph with nodes Source and Destination. Values: Technion-Haifa, Eilat.]

Similar Meaning Labels
[Figure: query graph with nodes Course, Name, and Details. Value: Databases.]

Other Types of Inexactness
Many other definitions have been given, e.g.,
– allow permutations of nodes in the query
– allow child nodes to be promoted
– interconnection
Summary: Inexactness basically means that we relax some of the query requirements!