Published by Emerald Farmer. Modified over 9 years ago.
1
Inexact Querying of XML
2
XML Data May Be Irregular
Relational data is regular and organized. XML may be very different.
– Data is incomplete: values of attributes are missing in some elements
– Data has structural variations: relationships between elements are represented differently in different parts of the document
– Data has ontology variations: different labels are used to describe nodes of the same type
(Note: In some of the upcoming slides, we have labels on edges instead of on nodes.)
3
Incomplete Data
[Figure: Movie Database tree. Labels: Movie, Film, T.V. Series, Actor, Title, Name, Year. Values: Dune, Star Wars, Twin Peaks, Léon, Magnolia, Kyle MacLachlan, Natalie Portman, Harrison Ford, Mark Hamill, 1977, 1984.]
One movie has a year attribute; the year of another movie is missing.
4
Variations in Structure
[Figure: Movie Database tree. Labels: Movie, Film, T.V. Series, Actor, Title, Name, Year.]
In one part of the document a Movie is below an Actor; in another part an Actor is below a Movie.
5
Ontology Variations
[Figure: Movie Database tree. Labels: Movie, Film, T.V. Series, Actor, Title, Name, Year.]
Some nodes of the same type carry a Movie label, others a Film label.
6
The description of the schema is large (e.g., a DTD of XML)
– It is difficult to use the schema when formulating queries
Data is contributed by many users in a variety of designs
– The query should deal with different structures of data
The structure of the database is changed frequently
– Queries should be rewritten frequently
Need to allow the user to write an “approximate query” and have the query processor deal with it
7
The Problem
In many different domains, we are given the option to query some source of information.
Usually, the user only gets results if the query can be completely answered (satisfied).
In many domains, this is not appropriate, e.g.,
– The user is not familiar with the database
– The database does not contain complete information
– There is a mismatch between the ontology of the user and that of the database
8
Example 1
Locality: Be'er Sheva
Area code: 03
9
The selected locality does not appear in the selected area code!
10
Boarding: Haifa – Technion
Alighting: Eilat
11
There is no direct line connecting the selected points
12
Boarding:
Alighting: Eilat
13
Course details: Databases
14
No matching courses were found
15
What Do Users Need?
Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist.
These partial answers should contain maximal information.
Problem:
– It is easy to define when an answer satisfies a query
– Hard to say when an answer that does not satisfy a query is of interest
– Hard to say which incomplete answers are better than others
16
Modeling a Database and a Query
It is useful to model both databases and queries as labeled directed graphs
– Clean mathematical modeling!
– Captures the essentials of XPath, XQuery
17
University Database
[Figure: graph of the university database. Labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]
18
Query
[Figure: query graph with labels University, Dept, Faculty, Name.]
Exact answers are defined by exact matchings, i.e., subgraph homomorphisms.
This query asks for the names of all faculty members (of any type).
How would you write this in XPath?
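To make "exact matching" concrete, here is a minimal Python sketch, not the presenters' algorithm. It assumes the database is stored as edge-labeled (source, label, target) triples and that the query is a tree whose edges are listed top-down from the root. The node ids (`u`, `d1`, `f1`, ...) and the assignment of faculty members to departments are guesses at the figure, used only for illustration.

```python
# Database as an edge-labeled directed graph: (source, label, target) triples.
# Node ids and the shape of the graph are assumptions, not taken from the slide.
DB_EDGES = [
    ("u", "Name", "Technion"),
    ("u", "Dept", "d1"),
    ("u", "Dept", "d2"),
    ("d1", "Name", "Computer Science"),
    ("d2", "Name", "Biology"),
    ("d1", "Faculty", "f1"),
    ("d2", "Faculty", "f2"),
    ("f1", "Name", "Chana Israeli"),
    ("f2", "Name", "Avi Levy"),
    ("f1", "Teaches", "Databases"),
    ("f2", "Teaches", "Molecular Biology"),
]

# Query: University -Dept-> ? -Faculty-> ? -Name-> ? (edges listed top-down).
QUERY_EDGES = [("q0", "Dept", "q1"), ("q1", "Faculty", "q2"), ("q2", "Name", "q3")]


def exact_matchings(db_edges, query_edges, query_root, db_root):
    """Yield every mapping of query nodes to database nodes that sends the
    query root to db_root and each query edge to a database edge with the
    same label."""
    successors = {}
    for source, label, target in db_edges:
        successors.setdefault((source, label), []).append(target)

    def extend(mapping, remaining):
        if not remaining:
            yield dict(mapping)
            return
        (q_source, label, q_target), rest = remaining[0], remaining[1:]
        # Try every database edge with the right label out of the image of q_source.
        for target in successors.get((mapping[q_source], label), []):
            mapping[q_target] = target
            yield from extend(mapping, rest)
            del mapping[q_target]

    yield from extend({query_root: db_root}, list(query_edges))


names = sorted(m["q3"] for m in exact_matchings(DB_EDGES, QUERY_EDGES, "q0", "u"))
print(names)  # ['Avi Levy', 'Chana Israeli']
```

The mapping is not required to be injective, which is what distinguishes a homomorphism from an isomorphism; for a tree-shaped query over a tree-shaped database the difference rarely shows.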
19
Exact Answers
[Figure: the university database graph next to the query graph (University, Dept, Faculty, Name).]
20
Exact Answers (cont.)
[Figure: the university database graph next to the query graph (University, Dept, Faculty, Name).]
21
Slightly More Complex Query
[Figure: query graph with labels University, Dept, Faculty, Name, plus the value Biology.]
Returns faculty members only from the Biology Department
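One possible XPath formulation of the two queries, sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML encoding below is hypothetical: the slides show only a graph, so the element nesting and which faculty member belongs to which department are assumptions. In full XPath the same paths would be written from the document root, e.g. `/University/Dept[Name='Biology']/Faculty/Name`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the university database (an assumption).
XML = """
<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty><Name>Chana Israeli</Name><Teaches>Databases</Teaches></Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty><Name>Avi Levy</Name><Teaches>Molecular Biology</Teaches></Faculty>
  </Dept>
</University>
""".strip()

root = ET.fromstring(XML)  # root is the <University> element

# Names of all faculty members, in any department.
all_names = [e.text for e in root.findall("./Dept/Faculty/Name")]

# Names of faculty members in the department whose Name child is "Biology".
biology_names = [e.text for e in root.findall("./Dept[Name='Biology']/Faculty/Name")]

print(all_names)      # ['Chana Israeli', 'Avi Levy']
print(biology_names)  # ['Avi Levy']
```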
22
Exact Answers Are Not Always Useful
Problems with exact answers:
– Labels are not always known
– Content may be unknown, misspelled, etc.
– Structure may be unknown, or may vary from one representation to another
– We may actually want to perform a search, since the query is a vague hypothesis
– Exact answers do not allow users to get partial/vague answers where none better exist
23
Manually Adding Inexactness
One can use language constructs in order to get more flexible queries.
Example: Suppose we want to find courses together with the teachers that teach them, but we don’t know which hierarchy exists in the database:
– for each teacher, there is a list of courses, or
– for each course, there is a list of teachers
– or both…
24
[Figure: university database in which each Teacher has a Name and a list of Courses. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]
Query needed: Teacher above Course
25
[Figure: university database in which each Course has a Name and a list of Teachers. Values: Technion, Computer Science, Biology, Bioinformatics, Molecular Biology, Chana Israeli, Avi Levy.]
Query needed: Course above Teacher
26
Manually Adding Inexactness (cont.)
If we don’t know the hierarchy, we need:
[Figure: union of two queries, Teacher above Course and Course above Teacher.]
27
Manually Adding Inexactness (cont.)
If we don’t know the hierarchy, we need:
[Figure: union of two queries, Teacher above Course and Course above Teacher.]
If we don’t know what exactly the labels are, we might need:
[Figure: union of two queries, one per hierarchy, over (Teacher or Lecturer or Professor) and (Course or Seminar or Lab).]
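The cost of adding inexactness by hand can be seen in a short Python sketch using the standard `xml.etree.ElementTree`. It tries both hierarchies for every combination of the alternative labels and unions the results. The XML document and the helper names (`label_text`, `teacher_course_pairs`) are hypothetical, and it assumes the upper element keeps its name in a `Name` child while the lower element holds its name as text.

```python
import xml.etree.ElementTree as ET
from itertools import product

TEACHER_LABELS = ("Teacher", "Lecturer", "Professor")
COURSE_LABELS = ("Course", "Seminar", "Lab")

# Hypothetical document that mixes both hierarchies and several labels.
XML = """
<Dept>
  <Teacher><Name>Chana Israeli</Name><Course>Databases</Course></Teacher>
  <Seminar><Name>Bioinformatics</Name><Lecturer>Avi Levy</Lecturer></Seminar>
</Dept>
""".strip()


def label_text(elem):
    """Name of an element: its Name child if present, otherwise its own text."""
    name = elem.find("Name")
    text = name.text if name is not None else elem.text
    return (text or "").strip()


def teacher_course_pairs(root):
    """Union of 'teacher above course' and 'course above teacher' matches
    over every combination of alternative labels."""
    pairs = set()
    for t_label, c_label in product(TEACHER_LABELS, COURSE_LABELS):
        for teacher in root.iter(t_label):        # hierarchy 1: teacher above course
            for course in teacher.findall(c_label):
                pairs.add((label_text(teacher), label_text(course)))
        for course in root.iter(c_label):         # hierarchy 2: course above teacher
            for teacher in course.findall(t_label):
                pairs.add((label_text(teacher), label_text(course)))
    return pairs


pairs = teacher_course_pairs(ET.fromstring(XML))
print(sorted(pairs))  # [('Avi Levy', 'Bioinformatics'), ('Chana Israeli', 'Databases')]
```

With three labels on each side and two hierarchies, the user is effectively writing 18 queries; every further unknown multiplies that number.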
28
Help!
29
Intuition
Users write regular queries, stating what they are looking for.
The query processor uses a built-in strategy to find answers that
– exactly satisfy the query, or
– inexactly satisfy the query
Burden is on the query processor, not on the user.
30
Inexact Answers
Many different definitions have been given
– For each definition, query processing algorithms have been defined
Examples:
– Allow some of the nodes of the query to be unmatched
– Allow edges in the query to be matched to paths in the database
– Allow nodes to be matched to nodes with labels that have a similar meaning
Be careful so that answers are meaningful!
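Two of these relaxations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not one of the published definitions: the database is a small node-labeled tree, the query is a simple path of labels, and the `SIMILAR` table of label alternatives is hand-written. The tree, the `Staff` level, and all function names are hypothetical.

```python
# Database as a node-labeled tree: node id -> (label, list of child ids).
DB = {
    0: ("University", [1]),
    1: ("Dept", [2]),
    2: ("Staff", [3]),      # an extra level the query author did not know about
    3: ("Lecturer", [4]),
    4: ("Name", []),
}

# Hand-written table of labels considered similar in meaning (an assumption).
SIMILAR = {"Teacher": {"Teacher", "Lecturer", "Professor"}}


def labels_match(query_label, db_label, similar):
    """A query label matches itself, or any label listed as similar to it."""
    return db_label in similar.get(query_label, {query_label})


def descendants(db, node):
    """Yield all proper descendants of node."""
    stack = list(db[node][1])
    while stack:
        current = stack.pop()
        yield current
        stack.extend(db[current][1])


def match_path(db, start, query_labels, similar=None, edges_as_paths=False):
    """Return the database nodes matched by the last label of a path query.

    similar: optional label-similarity table (relaxes labels).
    edges_as_paths: if True, a query edge may match any downward path.
    """
    similar = similar or {}
    if not labels_match(query_labels[0], db[start][0], similar):
        return []
    frontier = [start]
    for q_label in query_labels[1:]:
        next_frontier = []
        for node in frontier:
            candidates = descendants(db, node) if edges_as_paths else db[node][1]
            for cand in candidates:
                if labels_match(q_label, db[cand][0], similar) and cand not in next_frontier:
                    next_frontier.append(cand)
        frontier = next_frontier
    return frontier


query = ["University", "Teacher", "Name"]
exact = match_path(DB, 0, query)
relaxed = match_path(DB, 0, query, similar=SIMILAR, edges_as_paths=True)
print(exact)    # []
print(relaxed)  # [4]
```

Neither relaxation alone is enough on this data: the label table does not help while Teacher must be a direct child of University, and paths do not help while no node is labeled Teacher.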
31
Allow Unmatched Nodes: Bezeq Query
[Figure: query graph with nodes Name (Shmulevitz), City (Be'er Sheva), Area Code (03) and Phone Number.]
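One way to read "allow unmatched nodes" is to return the records that satisfy a maximal set of the query's conditions. The Python sketch below assumes flat records and uses set inclusion as the notion of "maximal"; the records, phone numbers, and the function name `maximal_partial_matches` are all made up for illustration.

```python
def maximal_partial_matches(records, query):
    """Return records whose set of satisfied query conditions is non-empty
    and not strictly contained in the set satisfied by another record."""
    scored = []
    for record in records:
        matched = frozenset(k for k, v in query.items() if record.get(k) == v)
        scored.append((matched, record))
    return [
        record
        for matched, record in scored
        if matched and not any(matched < other for other, _ in scored)
    ]


# Hypothetical directory entries; the phone numbers are placeholders.
RECORDS = [
    {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "08", "Phone Number": "555-0101"},
    {"Name": "Shmulevitz", "City": "Tel Aviv", "Area Code": "03", "Phone Number": "555-0102"},
    {"Name": "Cohen", "City": "Be'er Sheva", "Area Code": "08", "Phone Number": "555-0103"},
]

QUERY = {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "03"}

phones = [r["Phone Number"] for r in maximal_partial_matches(RECORDS, QUERY)]
print(phones)  # ['555-0101', '555-0102']
```

No record satisfies all three conditions, so an exact query returns nothing. The first two records each satisfy two conditions and are kept; the third satisfies only City, which the first record already covers, so it is dropped.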
32
Matching Edges to Paths: Egged Query
[Figure: query graph with nodes Source (Technion-Haifa) and Destination (Eilat).]
33
Similar Meaning Labels
[Figure: query graph with labels Course, Name, Details and the value Databases.]
34
Other Types of Inexactness
Many other definitions have been given, e.g.,
– allow permutations of nodes in the query
– allow child nodes to be promoted
– interconnection
Summary: Inexactness basically means that we relax some of the query requirements!