Published by Elijah Hawkins. Modified over 9 years ago.
1
Merging Source Query Interfaces on Web Databases
Eduard C. Dragut (speaker), Wensheng Wu, Prasad Sistla, Clement Yu, Weiyi Meng
University of Illinois at Chicago, University of Illinois at Urbana-Champaign, SUNY at Binghamton
ICDE 2006, Atlanta, USA
2
A Motivating Scenario
Looking for a ticket Chicago – Atlanta, April 3rd – April 9th, on sites such as orbitz.com, aa.com, and delta.com.
A user looking for the “best” price for a ticket has to explore multiple sources.
This is tedious, frustrating, and time-consuming.
3
The goal
Provide a unified way to query multiple sources in the same domain.
The user formulates the query once on a unified query interface (Airfare.com), which stands in front of sources on the Web such as priceline.com, nwa.com, delta.com, and united.com.
4
Overview: Integrating Query Interfaces
Extract query interfaces from the (Deep) Web [He05, Zhang04]. Various formats, e.g. ASCII files.
Cluster query interfaces [Peng04]. Domains shown: Auto, Car Rental, Books, Airfare.
Match query interfaces [B.He03, Dhamankar04, Doan02, Madhavan05, Wu04].
Merge query interfaces [H.He03]. This is the topic of this presentation.
5
Merge Algorithm: The input
A set of query interfaces in the same domain, e.g. the airline domain: Delta, AA, NWA, Orbitz, Travelocity. Interface shown: vacations.net.
Each query interface is represented hierarchically [Wu04].
A mapping that globally characterizes the semantic correspondences between the fields in the query interfaces. It is organized in clusters (e.g. [Wu04 et al., B.He03 et al.]).
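The two inputs described on this slide can be sketched as plain data structures. This is a minimal illustration, not the authors' representation: the `Node` class, the `Mapping` alias, and the helper `interfaces_with` are hypothetical names, the tree fragment is invented, and the reading of the numbers as per-interface field identifiers is an assumption. The cluster entries are copied from the example mapping on the next slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a hierarchically represented query interface (hypothetical)."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


# Cluster name -> {interface name: field id, or None if the interface
# has no field in that cluster}. The id semantics are an assumption.
Mapping = dict[str, dict[str, Optional[int]]]

mapping: Mapping = {
    "c_DepCity": {"Travel": 3, "PriceLine": 2},
    "c_DepTime": {"Travel": 8, "PriceLine": None},
    "c_DepYear": {"Travel": None, "PriceLine": 7},
}


def interfaces_with(m: Mapping, cluster: str) -> list[str]:
    """Names of the interfaces that actually contain a field of `cluster`."""
    return sorted(name for name, fid in m[cluster].items() if fid is not None)


# An invented fragment of a schema tree, for illustration only.
travel = Node("Travel", [Node("When", [Node("DepMonth"), Node("DepDay")])])
```

For instance, `interfaces_with(mapping, "c_DepTime")` lists only the interfaces whose entry in that cluster is not null.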
6
An Example
Three fragments of query interfaces represented hierarchically, and the mapping between them, i.e. the set of clusters:

c_DepCity c_DestCity c_DepMonth c_DepDay c_DepTime c_DepYear
(Travel,3) (Travel,4) (Travel,7) (Travel,6) (Travel,8) (Travel,null)
(PriceLine,2) (PriceLine,3) (PriceLine,5) (PriceLine,6) (PriceLine,null) (PriceLine,7)
(British,2) (British,3) (British,9) (British,8) (British,null)

c_Adults c_Infants c_Children c_Seniors c_Airlines c_Class
(Travel,14) (Travel,null) (Travel,15) (Travel,16) (Travel,12) (Travel,null)
(PriceLine,12) (PriceLine,14) (PriceLine,13) (PriceLine,null)
(British,5) (British,null) (British,6) (British,null) (British,13)
7
Merge Algorithm: The output
A unified query interface that:
  consists of all the fields of the individual interfaces, i.e. it has a field for each of the clusters in the mapping definition;
  preserves all the constraints enforced by the interfaces being merged.
The constraints to be satisfied by the global interface are the grouping constraints (to be described) and the ancestor-descendant relationships among the elements within individual interfaces.
8
Grouping
Within a domain of discourse (e.g. Airfare) we observe a spatial locality property among the fields of query interfaces: designers tend to place related fields close to each other.
Hence, in the integrated interface these fields should be placed in adjacent positions, too.
9
Grouping Problem
The goal (requirement): groups of fields that occur together in the source query interfaces should appear together in the integrated interface. The actual order of elements is immaterial.
The problem: find a partition over the set of fields of a given domain that characterizes the way fields are grouped in the integrated interface.
10
Capture Grouping Constraints
Introduce the notion of potential groups.
Informally, a potential group is a maximal set of adjacent sibling leaves whose parent is not the root.
Potential groups capture the way fields are organized within source query interfaces.
They underline the designer's perspective that these fields should be together, so that users can easily understand what is required and fill in the desired information with ease.
Example: the set of all potential groups induced by query interface Travel.
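The informal definition above can be sketched as a small tree walk. This is an illustration under stated assumptions, not the authors' code: a tree is written as a `(label, children)` tuple, the function name `potential_groups` and the sample tree are hypothetical, and the choice to ignore runs of a single leaf (`min_size=2`) is an assumption the slide does not settle.

```python
def potential_groups(root, min_size=2):
    """Collect maximal runs of adjacent sibling leaves whose parent is not the root.

    A node is a (label, children) tuple; a leaf has an empty children list.
    """
    groups = []

    def visit(node, is_root):
        _, children = node
        run = []  # current run of adjacent leaf siblings
        for child in children:
            if not child[1]:
                run.append(child[0])
            else:
                # An internal child ends the current run of leaves.
                if not is_root and len(run) >= min_size:
                    groups.append(run)
                run = []
                visit(child, False)
        if not is_root and len(run) >= min_size:
            groups.append(run)

    visit(root, True)
    return groups


# Hypothetical interface fragment; the labels are for illustration only.
travel = ("Travel", [
    ("Where", [("DepCity", []), ("DestCity", [])]),
    ("When", [("DepMonth", []), ("DepDay", []), ("DepTime", [])]),
    ("Airline", []),
    ("Class", []),
])

groups = potential_groups(travel)
```

In this sample, `Airline` and `Class` are children of the root, so they do not form a potential group.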
11
Constructing Groups
Use this structural information, collected from multiple source interfaces, to infer the way fields are organized in the integrated interface.
Introduce the notion of a group of fields.
Informally, a group is a sequence of fields that preserves the adjacency constraints within related potential groups.
Two potential groups are related if their intersection is nonempty.
A group represents the desired organization of the fields in an integrated interface.
An example:
Set of related potential groups: {Depday, DepMonth, DepTime}, {Departure month, Departure day, Departure Year}, {depDay, depMonth}
The resulting group: [DepTime, Departure day, Departure month, Departure Year]
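One way to collect sets of related potential groups is to merge every potential group with all the groups it overlaps. The sketch below assumes that fields have already been replaced by their cluster names, and that relatedness is applied transitively (groups chained through shared fields end up in the same family); neither point is stated on the slide. The name `related_families` is hypothetical.

```python
def related_families(potential_groups):
    """Partition potential groups into families connected by nonempty intersections."""
    families = []  # list of (union of fields, member groups); unions stay disjoint
    for group in map(frozenset, potential_groups):
        merged_fields, merged_groups = set(group), [group]
        untouched = []
        for fields, groups in families:
            if fields & group:
                # The new group overlaps this family, so absorb the family.
                merged_fields |= fields
                merged_groups = groups + merged_groups
            else:
                untouched.append((fields, groups))
        untouched.append((merged_fields, merged_groups))
        families = untouched
    return [groups for _, groups in families]


example = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepCity", "c_DestCity"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]
families = related_families(example)
```

Here the three departure-date groups fall into one family and the city group stays on its own.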
12
Grouping Problem as C1P
The grouping problem can be cast into the Consecutive Ones Property (C1P) problem [Booth76 et al., Fulkerson65 et al.].
For a universal set U and a subset B of the power set of U, we want a permutation π of the elements of U such that all the elements in each set in B appear as a consecutive sequence in π.
In our grouping problem:
  potential groups correspond to the set B;
  U is the union of the fields in the potential groups;
  π is the desired permutation of the fields.
Several algorithms can obtain the groups in the integrated schema, e.g. the PQ-tree algorithm [Meidanis98 et al.], which is used in our implementation.
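The C1P condition itself is easy to check for a given permutation. The following sketch only verifies a candidate ordering; it is not the PQ-tree algorithm, and the function names are hypothetical. The sets are the related potential groups from the running example, written with cluster names.

```python
def is_consecutive(order, group):
    """True if the elements of `group` occupy consecutive positions in `order`."""
    positions = sorted(order.index(x) for x in group)
    if not positions:
        return True
    return positions[-1] - positions[0] + 1 == len(positions)


def satisfies_c1p(order, B):
    """True if every set in B appears as a consecutive run in the permutation."""
    return all(is_consecutive(order, group) for group in B)


B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]
good = ["c_DepTime", "c_DepDay", "c_DepMonth", "c_DepYear"]
bad = ["c_DepDay", "c_DepTime", "c_DepMonth", "c_DepYear"]
```

With these inputs, `satisfies_c1p(good, B)` holds, while `bad` separates c_DepDay from c_DepMonth and therefore fails.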
13
Grouping Problem as C1P
An example of applying the PQ-tree algorithm.
Set of related potential groups: B = {{c_DepDay, c_DepMonth, c_DepTime}, {c_DepMonth, c_DepDay, c_DepYear}, {c_DepDay, c_DepMonth}}
U = {c_DepDay, c_DepMonth, c_DepYear, c_DepTime}
The frontier of the resulting PQ-tree gives the group.
A permutation satisfying all related potential groups cannot always be derived. In that case, minimize the number of violations.
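For a set U this small, the ordering can also be found by exhaustive search. The sketch below deliberately swaps in brute force over all permutations in place of the PQ-tree algorithm, so it is only practical for a handful of fields. It counts a violation as one potential group that is not consecutive, which is an assumption about how violations are counted. The names `violations` and `best_order` are hypothetical.

```python
from itertools import permutations


def violations(order, B):
    """Number of sets in B whose elements are not consecutive in `order`."""
    pos = {x: i for i, x in enumerate(order)}
    count = 0
    for group in B:
        idx = [pos[x] for x in group]
        if max(idx) - min(idx) + 1 != len(idx):
            count += 1
    return count


def best_order(U, B):
    """Exhaustively pick a permutation of U with the fewest violations."""
    return min(permutations(sorted(U)), key=lambda p: violations(p, B))


U = {"c_DepDay", "c_DepMonth", "c_DepYear", "c_DepTime"}
B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]
order = best_order(U, B)
# order == ('c_DepTime', 'c_DepDay', 'c_DepMonth', 'c_DepYear'), with 0 violations
```

When no permutation satisfies every set, for example B = {{a, b}, {b, c}, {a, c}} over U = {a, b, c}, the same search returns an ordering with the minimum number of violated sets (one, in that case).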
14
Constructing Groups
On the running example, the set of all groups:
[c_DepCity, c_DestCity]
[c_DepTime, c_DepDay, c_DepMonth, c_DepYear]
[c_Seniors, c_Adults, c_Children, c_Infants]
15
Constructing Groups
On the running example, the set of all groups:
[c_DepCity, c_DestCity]
[c_DepTime, c_DepDay, c_DepMonth, c_DepYear]
[c_Seniors, c_Adults, c_Children, c_Infants]
Fields that are children of the root were not considered.
16
Pairwise merge
For a set of query interfaces, iteratively merge two at a time:
  traversing the schema trees bottom-up;
  placing the group elements;
  preserving the ancestor-descendant relationships in the source schemas.
On the running example: first iteration.
17
Pairwise merge
Second iteration.
Note that the fields are placed naturally in the merged interface.
18
Experiment Setup
Five real-world domains. The mapping consists of clusters [Wu04 et al.].

Domain        # interfaces   Avg. # fields per interface   Avg. # internal nodes per interface   Avg. depth of interfaces
Airfare       20             10.7                          5.1                                   3.6
Automobile    20             5.1                           1.7                                   2.4
Book          20             5.4                           1.3                                   2.3
Job           20             4.6                           1.1                                   2.1
Real Estate   20             6.5                           2.4                                   2.7
19
Experiment
The characteristics of the integrated interfaces:

Domain        # potential groups   # groups   # violations   # fields on the integ. interface   Depth of the integ. interface
Airfare       46                   8          2              24                                 5
Automobile    22                   4          0              18                                 3
Book          34                   4          0              19                                 3
Job           12                   1          0              19                                 2
Real Estate   47                   7          0              28                                 4

All group constraints are satisfied, with the exception of two potential groups in the airline domain: [Seniors, Adults, Children, Infants] and [Airline, Class, NonStop].
20
Example Integrated Interfaces
Airfare domain integrated interface.
Note that fields are placed naturally.
21
Example Integrated Interfaces
Auto domain integrated interface.
Note that fields are placed naturally.
22
End
Please visit the project web site: http://www.cs.uic.edu/~edragut/QIProject.html
Thank you for your time and patience!