Merging Source Query Interfaces on Web Databases. Eduard C. Dragut (speaker), Wensheng Wu, Prasad Sistla, Clement Yu, Weiyi Meng

Presentation transcript:

Merging Source Query Interfaces on Web Databases
Eduard C. Dragut (speaker), Wensheng Wu, Prasad Sistla, Clement Yu, Weiyi Meng
University of Illinois at Chicago; University of Illinois at Urbana-Champaign; University of Illinois at Chicago; SUNY at Binghamton
ICDE 2006, Atlanta, USA

Page 2 E. Dragut et al - Merging Source Query Interfaces on Web Databases
A Motivating Scenario
- Looking for a ticket
  - Chicago – Atlanta, April 3rd – April 9th
- A user looking for the "best" price for a ticket:
  - Has to explore multiple sources (e.g. orbitz.com, aa.com, delta.com)
  - It is tedious, frustrating and time-consuming

Page 3
The goal
- Provide a unified way to query multiple sources in the same domain
[Figure: the user formulates the query on a unified query interface (Airfare.com), which reaches priceline.com, nwa.com, delta.com and united.com over the Web]

Page 4
Overview: Integrating Query Interfaces
- Extract query interfaces from the (Deep) Web [He05, Zhang04]; various formats, e.g. ASCII files
- Cluster query interfaces (e.g. Auto, Car Rental, Books, Airfare) [Peng04]
- Match query interfaces [B.He03, Dhamankar04, Doan02, Madvan05, Wu04]
- Merge query interfaces [H.He03]: the topic of this presentation

Page 5
Merge Algorithm
- The input
  - A set of query interfaces in the same domain
    - E.g. Airline domain: Delta, AA, NWA, Orbitz, Travelocity
  - Each query interface is represented hierarchically [Wu04]
  - A mapping that globally characterizes the semantic correspondences between the fields in the query interfaces
    - Organized in clusters (e.g. [Wu04 et al., B.He03 et al.])
[Figure: vacations.net query interface]

Page 6
An Example
- Three fragments of query interfaces represented hierarchically
- The mapping between them, i.e. the set of clusters
Clusters: c_DepCity, c_DestCity, c_DepMonth, c_DepDay, c_DepTime, c_DepYear
  (Travel,3) (Travel,4) (Travel,7) (Travel,6) (Travel,8) (Travel,null)
  (PriceLine,2) (PriceLine,3) (PriceLine,5) (PriceLine,6) (PriceLine,null) (PriceLine,7)
  (British,2) (British,3) (British,9) (British,8) (British,null)
Clusters: c_Adults, c_Infants, c_Children, c_Seniors, c_Airlines, c_Class
  (Travel,14) (Travel,null) (Travel,15) (Travel,16) (Travel,12) (Travel,null)
  (PriceLine,12) (PriceLine,14) (PriceLine,13) (PriceLine,null)
  (British,5) (British,null) (British,6) (British,null) (British,13)
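
One way to hold such a mapping in code is a dictionary keyed by cluster name. The sketch below is only an illustration: the dictionary layout and the helper `fields_of` are assumptions, not part of the presentation. It encodes the first six clusters of the example, uses `None` for "null", and leaves out the British entry for c_DepYear because the slide text does not give it.

```python
# Hypothetical encoding of the mapping: cluster name -> {interface: field id}.
# None plays the role of "null" (the interface has no field in that cluster).
clusters = {
    "c_DepCity":  {"Travel": 3, "PriceLine": 2, "British": 2},
    "c_DestCity": {"Travel": 4, "PriceLine": 3, "British": 3},
    "c_DepMonth": {"Travel": 7, "PriceLine": 5, "British": 9},
    "c_DepDay":   {"Travel": 6, "PriceLine": 6, "British": 8},
    "c_DepTime":  {"Travel": 8, "PriceLine": None, "British": None},
    # The British entry for c_DepYear is absent from the slide text, so it is omitted.
    "c_DepYear":  {"Travel": None, "PriceLine": 7},
}


def fields_of(interface):
    """Return {cluster: field id} for the clusters in which the interface has a field."""
    return {
        cluster: ids[interface]
        for cluster, ids in clusters.items()
        if ids.get(interface) is not None
    }


print(fields_of("PriceLine"))
```

Since the unified interface has one field per cluster, the keys of `clusters` are exactly the fields the merged interface would need for this fragment.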

Page 7
Merge Algorithm
- The output
  - A unified query interface that
    - consists of all the fields of the individual interfaces, i.e. it has a field for each of the clusters in the mapping definition
    - preserves all the constraints enforced by the interfaces being merged
- The constraints to be satisfied by the global interface are:
  - the grouping constraints (to be described) and
  - the ancestor-descendant relationships among the elements within individual interfaces

Page 8
Grouping
- Within a domain of discourse (e.g. Airfare) we observe:
  - A spatial locality property among the fields of query interfaces
  - Designers tend to place related fields close to each other
- Hence, in the integrated interface these fields should be placed in adjacent positions, too

Page 9
Grouping Problem
- The goal (requirement)
  - Groups of fields that occur together in the source query interfaces should appear together in the integrated interface
  - The actual order of elements is immaterial
- The problem
  - Find a partition over the set of fields of a given domain that characterizes the way fields are grouped in the integrated interface

Page 10
Capture Grouping Constraints
- Introduce the notion of potential groups
  - Informally, a potential group is a maximal set of adjacent sibling leaves whose parent is not the root
  - Potential groups capture the way fields are organized within source query interfaces
  - They reflect the designer's view that these fields belong together, so that users can easily understand what is required and fill in the desired information
- Example
  - The set of all potential groups induced by query interface Travel
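
The informal definition above can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the `(label, children)` tuple encoding, the function name `potential_groups`, and the example tree, which is a hypothetical fragment and not the actual Travel interface.

```python
def potential_groups(tree):
    """Collect maximal runs of adjacent sibling leaves whose parent is not the root.

    A tree is (label, children); a leaf is a plain string.
    """
    groups = []

    def visit(node, is_root):
        _, children = node
        run = []  # current run of adjacent leaf siblings
        for child in children:
            if isinstance(child, str):
                run.append(child)
            else:
                # An internal node interrupts the run of adjacent leaves.
                if run and not is_root:
                    groups.append(run)
                run = []
                visit(child, False)
        if run and not is_root:
            groups.append(run)

    visit(tree, True)
    return groups


# Hypothetical interface fragment; "Airlines" hangs directly off the root.
example = ("Interface", [
    ("Where", ["DepCity", "DestCity"]),
    ("When", ["DepMonth", "DepDay", "DepTime"]),
    "Airlines",
    ("Passengers", ["Adults", "Children", "Seniors"]),
])

print(potential_groups(example))
```

In this sketch a leaf attached directly to the root, such as `Airlines`, contributes to no potential group, which follows the "parent is not the root" condition.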

Page 11
Constructing Groups
- Use this structural information, collected from multiple source interfaces, to infer the way fields are organized in the integrated interface
- Introduce the notion of a group of fields
  - Informally, a group is a sequence of fields that preserves the adjacency constraints within related potential groups
  - Two potential groups are related if their intersection is nonempty
  - A group represents the desired organization of the fields in an integrated interface
- An example:
  - Set of related potential groups:
    - {Depday, DepMonth, DepTime}, {Departure month, Departure day, Departure Year}, {depDay, depMonth}
  - The resulting group:
    - [DepTime, Departure day, Departure month, Departure Year]
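
Collecting the potential groups that are related to each other can be sketched as follows. The sketch assumes that field labels have already been replaced by their cluster names, and it treats "related" transitively (groups that are linked through a chain of overlaps end up in one family); both points are assumptions of this illustration, as is the function name `related_families`.

```python
def related_families(potential_groups):
    """Partition potential groups into families linked by nonempty intersections.

    Returns a list of (fields, groups) pairs, where fields is the union of the
    groups in that family.
    """
    families = []
    for group in potential_groups:
        group = frozenset(group)
        merged_fields, merged_groups = set(group), [group]
        untouched = []
        for fields, groups in families:
            if fields & group:
                # The new group overlaps this family: absorb the family.
                merged_fields |= fields
                merged_groups = groups + merged_groups
            else:
                untouched.append((fields, groups))
        untouched.append((merged_fields, merged_groups))
        families = untouched
    return families


groups = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepCity", "c_DestCity"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]

for fields, members in related_families(groups):
    print(sorted(fields), len(members))
```

Each family is then the input from which one group of the integrated interface is derived.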

Page 12
Grouping Problem as C1P
- The grouping problem can be cast into the Consecutive Ones Property (C1P) problem [Booth76 et al., Fulkerson65 et al.]
  - For a universal set U and a subset B of the power set of U, we want a permutation π of the elements of U such that all the elements of each set in B appear as a consecutive sequence in π
- In our grouping problem
  - Potential groups correspond to the set B
  - U is the union of the fields in the potential groups
  - π is the desired permutation of the fields
- Several algorithms can be used to obtain the groups in the integrated schema
  - E.g. the PQ-tree algorithm [Meidanis98 et al.]
    - Used in our implementation
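
The consecutiveness condition itself is easy to state in code. The following sketch only checks whether a given permutation satisfies a family B; it does not find one and is not the PQ-tree algorithm. The function names are hypothetical.

```python
def is_consecutive(perm, subset):
    """True if the elements of subset occupy adjacent positions in perm."""
    positions = sorted(perm.index(x) for x in subset)
    return positions[-1] - positions[0] + 1 == len(positions)


def satisfies_c1p(perm, B):
    """True if every set in B appears as a consecutive run in perm."""
    return all(is_consecutive(perm, s) for s in B)


B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]

good = ["c_DepTime", "c_DepDay", "c_DepMonth", "c_DepYear"]
bad = ["c_DepDay", "c_DepTime", "c_DepMonth", "c_DepYear"]

print(satisfies_c1p(good, B))
print(satisfies_c1p(bad, B))
```

In `bad`, c_DepDay and c_DepMonth are separated by c_DepTime, so the potential group {c_DepDay, c_DepMonth} is no longer consecutive.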

Page 13
Grouping Problem as C1P
- An example of applying the PQ-tree algorithm
  - Set of related potential groups:
    - B = {{c_DepDay, c_DepMonth, c_DepTime}, {c_DepMonth, c_DepDay, c_DepYear}, {c_DepDay, c_DepMonth}}
    - U = {c_DepDay, c_DepMonth, c_DepYear, c_DepTime}
  - The frontier of the resulting PQ-tree gives the group
- A permutation satisfying all related potential groups cannot always be derived
  - In that case, minimize the number of violations
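
For a set as small as this U, the effect of "minimize the number of violations" can be illustrated with an exhaustive search over permutations. This is a deliberately naive stand-in, not the PQ-tree algorithm used in the presentation, and it is only practical for a handful of fields; the function names are hypothetical.

```python
from itertools import permutations


def violations(perm, B):
    """Count the sets in B whose elements are not consecutive in perm."""
    pos = {x: i for i, x in enumerate(perm)}
    count = 0
    for s in B:
        idx = [pos[x] for x in s]
        if max(idx) - min(idx) + 1 != len(idx):
            count += 1
    return count


def best_order(U, B):
    """Brute force: return the first permutation of U with the fewest violations."""
    return min(permutations(sorted(U)), key=lambda p: violations(p, B))


U = {"c_DepDay", "c_DepMonth", "c_DepYear", "c_DepTime"}
B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]

order = best_order(U, B)
print(order, violations(order, B))
```

When the sets in B conflict, for example {a, b}, {b, c} and {a, c} over three fields, no ordering satisfies all of them and the search returns one that violates as few as possible.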

Page 15
Constructing Groups
- On the running example
  - The set of all groups
    - [c_DepCity, c_DestCity]
    - [c_DepTime, c_DepDay, c_DepMonth, c_DepYear]
    - [c_Seniors, c_Adults, c_Children, c_Infants]
  - Fields that are children of the root were not considered

Page 16
Pairwise merge
- For a set of query interfaces:
  - Iteratively merge two at a time
  - Traversing the schema trees bottom-up
  - Placing of group elements
  - Preserving ancestor-descendant relationships in the source schemas
- On the running example
  - First iteration

Page 17
Pairwise merge
- Second iteration
- Note that the fields are placed naturally in the merged interface

Page 18
Experiment
- Setup
  - Five real-world domains: Airfare, Automobile, Book, Job, Real Estate
  - Mapping consists of clusters [Wu04 et al.]
- Table columns (one row per domain): # interfaces, avg. # fields per interface, avg. # internal nodes per interface, avg. depth of interfaces

Page 19
Experiment
- The characteristics of the integrated interfaces
- Table columns (one row per domain: Airfare, Automobile, Book, Job, Real Estate): # potential groups, # groups, # violations, # fields on the integrated interface, depth of the integrated interface
- All group constraints are satisfied, with the exception of two potential groups in the airline domain:
  - [Seniors, Adults, Children, Infants] and [Airline, Class, NonStop]

Page 20
Example Integrated Interfaces
- Airfare domain integrated interface
- Note that fields are placed naturally

Page 21
Example Integrated Interfaces
- Auto domain integrated interface
- Note that fields are placed naturally

Page 22
End
- Please visit the project web site
- Thank you for your time and patience!