Published by Bailey Knop. Modified over 10 years ago.
1
Cooperative Query Answering for Semistructured Data
Speakers: Chuan Lin & Xi Zhang
By Michael Barg and Raymond K. Wong
2
Outline
– Motivations
– Overview
– Basic Concepts
– Cooperative Query Processing
– Experiment
3
Motivations
XML data
– same semantic content
– very different structures
4
Example: same semantics, different structures
User Query: insurance claims related to smoking for women
Court Transcript (node labels): insurance claim, plaintiff, woman, smoking
Insurance Record (node labels): insurance claim, insurer, woman, smoking
5
Motivations
No exact query result
User Query: phone number of Bob, who is the new sales manager
Data (node labels): personnel, sales manager, Joe, phone number, assistant sales manager, Bob, phone number, salesman
6
Overview
Goal:
– Return approximate answers for XML queries
– approximate: semantically and structurally similar
Solution:
– Return a set of results
– ranked by an overall score
score: indicates how well the subgraph containing the result satisfies the query criteria.
7
Basic Concepts: Query Tree
Query: /restaurant[.//Soho]/phone_number
Result term: phone_number
For each edge:
– head: the end that is closer to the nearest result term
– tail: the other end
– In case of a tie, the head is the end closer to the root
Query Tree: nodes restaurant, soho, phone_number (figure labels: r, h, t, h, t)
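The head/tail rule can be tried out on the example query. The following is a minimal sketch, not the authors' implementation: the names `QUERY_EDGES`, `distances_from`, and `orient` are hypothetical, and the query tree is assumed to have restaurant as root with soho and phone_number as its two children.

```python
from collections import deque

# Assumed query tree for /restaurant[.//Soho]/phone_number.
QUERY_EDGES = [("restaurant", "soho"), ("restaurant", "phone_number")]
ROOT = "restaurant"
RESULT_TERMS = {"phone_number"}


def distances_from(sources, edges):
    """Breadth-first distance from every node to the nearest node in `sources`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def orient(edges, root, result_terms):
    """Return a (head, tail) pair for every query-tree edge."""
    to_result = distances_from(result_terms, edges)
    to_root = distances_from({root}, edges)
    oriented = []
    for u, v in edges:
        # Head: the end closer to the nearest result term;
        # a tie is broken in favour of the end closer to the root.
        head, tail = sorted((u, v), key=lambda n: (to_result[n], to_root[n]))
        oriented.append((head, tail))
    return oriented


ORIENTED = orient(QUERY_EDGES, ROOT, RESULT_TERMS)
```

Under these assumptions, the edge between restaurant and soho gets restaurant as head, and the edge between restaurant and phone_number gets phone_number as head.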
8
Basic Concepts: Converging Order
– Order of edges considered in query processing
– Converge on a result term
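The slide does not define the ordering precisely. One plausible reading, offered here only as an assumption, is that edges farthest from the result term are handled first so that processing ends at the result term. The function name `converging_order` and the distance map are hypothetical.

```python
def converging_order(oriented_edges, dist_to_result):
    """Sort (head, tail) edges so that edges far from the result term come first.

    Assumption: "converging" means moving toward the result term; the slides
    do not spell out the exact rule.
    """
    return sorted(oriented_edges, key=lambda e: dist_to_result[e[0]], reverse=True)


# Query /restaurant[.//Soho]/phone_number with result term phone_number.
edges = [("phone_number", "restaurant"), ("restaurant", "soho")]
dist = {"phone_number": 0, "restaurant": 1, "soho": 2}
order = converging_order(edges, dist)
```

With this reading, the (restaurant, soho) edge is considered before the edge whose head is phone_number.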
9
Basic Concepts: Similarity
Semantically similar topologies
[Figure: five topologies (a) to (e) relating restaurant and soho; node labels: restaurant, soho, address, eating_places, shopping_center]
10
Basic Concepts: Similarity (cont.)
Deviation Proximity (DP)
– Measures how far one structure deviates from a desired structure
– Given:
  r_a: data node with value a
  r_b: data node with value b
  Q(a,b): query tree edge
– DP: the distance from the actual position of r_b to the nearest position for r_b that satisfies the topological relationship specified by Q(a,b)
Topological relationship: parent-child, ancestor-descendant
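As a small illustration of the two topological relationships, the sketch below checks whether r_b already satisfies the relationship required by Q(a,b), which is the case where no deviation is needed. It does not compute DP itself. The child-to-parent map and the names `ancestors` and `satisfies` are hypothetical.

```python
def ancestors(node, parent):
    """Yield the ancestors of `node`, nearest first, using a child -> parent map."""
    while node in parent:
        node = parent[node]
        yield node


def satisfies(r_a, r_b, parent, relationship):
    """True if r_b stands in the required relationship below r_a."""
    if relationship == "parent-child":
        return parent.get(r_b) == r_a
    if relationship == "ancestor-descendant":
        return r_a in ancestors(r_b, parent)
    raise ValueError(f"unknown relationship: {relationship}")


# Hypothetical document fragment: restaurant -> address -> soho
parent = {"address": "restaurant", "soho": "address"}

is_child = satisfies("restaurant", "soho", parent, "parent-child")
is_descendant = satisfies("restaurant", "soho", parent, "ancestor-descendant")
```

In this fragment soho is a grandchild of restaurant, so the ancestor-descendant requirement holds while the parent-child requirement does not.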
11
Deviation Proximity
Q(restaurant, soho) requires a parent-child relationship
[Figure: five restaurant/soho topologies; node labels: restaurant, address, soho, eating_places, shopping_center; candidate positions marked (soho)]
DP(restaurant, soho) for the five topologies: 0, 2, 3, 1, 3
12
Deviation Proximity
Q(restaurant, soho) requires an ancestor-descendant relationship
[Figure: five restaurant/soho topologies; node labels: restaurant, address, soho, eating_places, shopping_center; candidate positions marked (soho)]
DP(restaurant, soho) for the five topologies: 0, 2, 3, 0, 3
13
Cooperative Query Processing
Input: a Query Tree Q_T, an XML Document Tree D_T
Output: ordered list of scored results
Cooperative Query Processing involves:
– Structural proximity calculation
– Progressive score
14
Cooperative Query Processing (cont.)
Progressively matching edges in Q_T with D_T
– Consider edges in converging order
– For each edge Q_T(a,b), where a is head and b is tail, get a list of (r_a, score) pairs
  r_a is a node in D_T with value a
  score is the progressive score of r_a w.r.t. the nearest r_b
  use graph encoding to calculate the structural proximity of r_a and r_b
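The per-edge matching step can be sketched as follows. This is a simplified illustration under stated assumptions: plain tree distance stands in for the structural proximity that the slides compute with graph encodings, and the document tree, node ids, and function names are all hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(u, v, parent):
    """Number of edges between u and v in the tree."""
    steps_from_u = {n: i for i, n in enumerate(path_to_root(u, parent))}
    for j, n in enumerate(path_to_root(v, parent)):
        if n in steps_from_u:  # first common ancestor
            return steps_from_u[n] + j
    raise ValueError("nodes are not in the same tree")


def match_edge(a, b, value, parent):
    """Return (r_a, score) pairs, best first; score is the distance to the nearest r_b."""
    nodes_b = [n for n, v in value.items() if v == b]
    pairs = []
    for r_a, v in value.items():
        if v == a and nodes_b:
            score = min(tree_distance(r_a, r_b, parent) for r_b in nodes_b)
            pairs.append((r_a, score))
    return sorted(pairs, key=lambda p: p[1])


# Hypothetical document tree: node id -> value, and child id -> parent id.
value = {0: "guide", 1: "restaurant", 2: "soho", 3: "restaurant", 4: "address", 5: "soho"}
parent = {1: 0, 2: 1, 3: 0, 4: 3, 5: 4}

matches = match_edge("restaurant", "soho", value, parent)
```

Here restaurant node 1 has soho as a direct child and ranks first, while restaurant node 3 reaches soho only through address and ranks second.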
15
Structural Proximity Calculation
Encodings and Compressed Arrays
– Compact
– Preserve relationship to a larger graph
– Facilitate distance calculations
Proximity Searching
16
Encodings and Compressed Arrays
Basic Concepts:
– Common Node
– Terminal Node
– Annotated Node
Path representation
– Representing Single Path
– Representing Multiple Paths
– Representing Multiple Elements
Compressed Arrays
– Each encoding is a path/multi-path for a node/a set of nodes
17
Encodings and Compressed Arrays
18
Representing Single Path
[Figure: path encodings 1.1.1 (y_1) and 1.2.1.1.1.1 (y_2)]
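To show how such encodings could facilitate distance calculations, here is a minimal sketch. It assumes that each dotted number is a child ordinal along the path from the root, which the slides do not state explicitly, and it covers only single-path encodings in a tree. The names `parse` and `path_distance` are hypothetical.

```python
def parse(encoding):
    """Turn a dotted encoding such as '1.2.1' into a tuple of integers."""
    return tuple(int(part) for part in encoding.split("."))


def path_distance(enc_a, enc_b):
    """Edges between two nodes: the steps below their longest common prefix."""
    a, b = parse(enc_a), parse(enc_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


d = path_distance("1.1.1", "1.2.1.1.1.1")
```

Under this assumption the two encodings share only the root component, so the distance is 2 steps up from the first node plus 5 steps down to the second.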
19
Representing Multiple Paths
[Figure labels: 1.3 B.B.2.1.1 C.3 C.C.2 y_3]
20
Representing Multiple Elements
[Figure labels: 1 A.A.1.1 y_1 .2.1.1.1.1 y_2 .3 B.B.2.1.1 C.3 C.C.2 y_3]
21
Compressed Arrays
22
Drawback of Encoding
[Figure labels: 1 A.A.1 B.B.1 D.2 E. ?.2 C.C.1 F.2 G]
23
Proximity Searching
Multi-Element Comparison
– Input:
  A compressed array, caN, containing the multi-element encoding of the Near Set.
  A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
– Output: dist, the shortest path from EF to the closest element in the Near Set
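The input/output contract above can be illustrated with a simplified sketch. It uses plain lists of dotted path strings in place of the compressed arrays caN and caF, and assumes each dotted number is a child ordinal from the root. Both choices are assumptions made for illustration, and the names `common_prefix_len` and `min_dist` are hypothetical.

```python
def common_prefix_len(a, b):
    """Length of the shared leading part of two integer tuples."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def min_dist(near_paths, find_paths):
    """Shortest distance from the Find Set element to the closest Near Set element.

    `find_paths` holds every root-to-element path of EF;
    `near_paths` holds one path per Near Set element.
    """
    best = None
    for f_enc in find_paths:
        f = tuple(int(p) for p in f_enc.split("."))
        for n_enc in near_paths:
            n = tuple(int(p) for p in n_enc.split("."))
            shared = common_prefix_len(f, n)
            d = (len(f) - shared) + (len(n) - shared)
            if best is None or d < best:
                best = d  # keep the running minimum
    return best


near_set = ["1.1.1", "1.2.1.1.1.1"]
find_element_paths = ["1.2.1", "1.3.2"]
dist = min_dist(near_set, find_element_paths)
```

The running minimum shrinks as closer pairs are found, and the final value is reported as dist.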
24
Proximity Searching
[Figure: three panels labeled MinDist = 5, MinDist = 4, MinDist = 2]
25
Progressive Score
Accumulative Deviation Proximity (DP)
– Calculated from structural proximity
Boolean operator at Query Tree branches
[Figure: two query trees, each with node a branching to b and c]
prog(a) = prog(b) + prog(c)
prog(a) = min(prog(b), prog(c))
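The two branch formulas can be combined into one recursive sketch. The slide text does not say which formula goes with which Boolean operator; the code assumes the sum applies to AND branches and the minimum to OR branches. The leaf scores and the names `prog`, `children`, `operator`, and `leaf_score` are hypothetical.

```python
def prog(node, children, operator, leaf_score):
    """Progressive score of `node` in a query tree.

    Assumption: AND branches add their children's scores,
    OR branches take the minimum.
    """
    kids = children.get(node, [])
    if not kids:
        return leaf_score[node]
    scores = [prog(k, children, operator, leaf_score) for k in kids]
    if operator[node] == "and":
        return sum(scores)
    if operator[node] == "or":
        return min(scores)
    raise ValueError(f"unknown operator: {operator[node]}")


children = {"a": ["b", "c"]}
leaf_score = {"b": 2, "c": 3}  # hypothetical deviation scores

and_score = prog("a", children, {"a": "and"}, leaf_score)
or_score = prog("a", children, {"a": "or"}, leaf_score)
```

With these leaf scores, the AND branch yields 2 + 3 and the OR branch yields the smaller of the two.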
26
Experiment
Query: //restaurant/soho
XML: [figure]
Query Result: [figure]
27
Thank you!
28
Questions & Answers