Querying Heterogeneous Information Sources Using Source Descriptions
Authors: Alon Y. Levy, Anand Rajaraman, Joann J. Ordille
Presenter: Yihong Ding



Challenges for Information Integration
– Interrelated data spread over multiple information sources
– Large number of sources
– Limited amount of data in many of the sources
– Widely varying details of interacting with each source

IM Architecture
[Figure: architecture diagram; surviving label: Bucket algorithm]

IM World View
Virtual relations:
Product(Model)
Automobile(Model, Year, Category)
Motorcycle(Model, Year)
Car(Model, Year, Category)
NewCar(Model, Year, Category)
UsedCar(Model, Year, Category)
CarForSale(Model, Year, Category, Price, SellerContact)
Classes (shown as a hierarchy diagram on the slide): Product, Automobile, Car, Motorcycle, NewCar, UsedCar, CarForSale

Source Descriptions
For each source:
– Content record
– Capability record
[Table: Web Sources for Automobile Application]

Content Records of Auto Sources

Capability Records of Auto Sources
Each record has three parts:
– desired input set
– possible output set
– capable selection set
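As a rough illustration of what a capability record holds, the three sets named above can be modelled as a small data structure. This is only a sketch, not the system's actual representation: the class name `CapabilityRecord`, its field names, and the helper `can_call` are hypothetical, and the rule that every desired input must be bound before a call is a simplifying assumption. The example values follow the V1 record that appears in the containment slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityRecord:
    """Hypothetical model of one source's capability record."""
    inputs: frozenset      # attributes the source wants bound before a call
    outputs: frozenset     # attributes the source can return
    selections: frozenset  # attributes the source can filter on itself

    def can_call(self, bound):
        """True if every desired input is already bound (assumed rule)."""
        return self.inputs <= frozenset(bound)


# Values follow the V1 record in the transcript; the layout is an assumption.
v1 = CapabilityRecord(
    inputs=frozenset({"Category"}),
    outputs=frozenset({"Price", "Model", "Year"}),
    selections=frozenset({"Year", "Category"}),
)

print(v1.can_call({"Category"}))  # True: the query supplies Category
print(v1.can_call({"Model"}))     # False: Category is still unbound
```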

Query Reformulation
Containment instead of equivalence (the plan's answers are a subset of the query's answers)
– Incomplete sources
– Useful subset
Utilizes the Plan Generator to:
– Prune irrelevant sources
– Split the query into subgoals
– Generate conjunctive query plans
– Find an executable ordering of subgoals

The Bucket Algorithm
Given: user query q, source descriptions {V_i}
1. Find relevant sources (fill buckets)
   For each relation g in query q:
   – Find each V_j that contains relation g
   – Check that the constraints in V_j are compatible with q
2. Combine source relations {V_j}, one from each bucket, into a conjunctive query q’ and check for containment (q’ ⊆ q)
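The two steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the presented implementation: each source description is reduced to the set of world-view relations it mentions, the constraint check and the containment check are passed in as plain functions, and the names `fill_buckets`, `candidate_plans`, and the source `V4` are hypothetical.

```python
from itertools import product


def fill_buckets(query_relations, sources, compatible):
    """Step 1: one bucket per query relation, holding the usable sources."""
    buckets = {}
    for g in query_relations:
        buckets[g] = [
            name for name, relations in sources.items()
            if g in relations and compatible(name)
        ]
    return buckets


def candidate_plans(buckets, is_contained):
    """Step 2: pick one source per bucket, keep the contained combinations."""
    relations = list(buckets)
    plans = []
    for choice in product(*(buckets[g] for g in relations)):
        plan = dict(zip(relations, choice))
        if is_contained(plan):
            plans.append(plan)
    return plans


# Toy data (assumed): the relations each source description mentions.
sources = {
    "V1": {"CarForSale", "Category", "Year", "Model", "Price"},
    "V4": {"CarForSale", "Year"},  # hypothetical source with conflicting years
    "V5": {"ProductReview"},
}
query = ["CarForSale", "Year", "ProductReview"]

# Assumed constraint check: V4's year range conflicts with y >= 1992.
buckets = fill_buckets(query, sources, compatible=lambda name: name != "V4")

# Stand-in containment test: accept plans that use one source for the car subgoals.
plans = candidate_plans(
    buckets, is_contained=lambda plan: plan["CarForSale"] == plan["Year"]
)
print(buckets)
print(plans)
```

Because step 2 enumerates the Cartesian product of the buckets, pruning incompatible sources in step 1 is what keeps the number of candidate plans manageable.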

The Bucket Algorithm: Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

1. Filling the Buckets
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Subgoal                          Bucket
CarForSale(c)                    V1(c1), V2(c2), V3(c3)
Category(c,t), t = sportscar     V1(c1,t1), V2(c2,t2), V3(c3,t3)
Year(c,y), y ≥ 1992              V1(c1,y1), V2(c2,y2), V3(c3,y3)
Model(c,m)                       V1(c1,m1), V2(c2,m2), V3(c3,m3)
Price(c,p)                       V1(c1,p1), V2(c2,p2), V3(c3,p3)
ProductReview(m,y,r)             V5(m5,y5,r5)

2. Checking Containment
User query:
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result query:
q’(m,p,r) ← V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {})
Is q’ ⊆ q? To check, expand the source relations in q’ into world-view relations.
Expanded query:
q’(m,p,r) ← CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992
Every subgoal of q appears in the expanded q’, so q’ ⊆ q holds.
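For conjunctive queries without comparison predicates, containment can be tested by searching for a containment mapping from the containing query into the contained one. The sketch below applies that standard technique to the example; it is not taken from the presentation. It deliberately drops the comparison y ≥ 1992 from both queries and folds t = sportscar into the Category subgoal, and the function names and the leading-quote convention for constants are assumptions made for illustration.

```python
def is_var(term):
    # Assumed convention: constants start with a quote mark, e.g. "'sportscar".
    return not term.startswith("'")


def extend(mapping, src_args, dst_args):
    """Extend a variable mapping so src_args maps onto dst_args, or return None."""
    new = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if new.setdefault(s, d) != d:
                return None  # variable already mapped to a different term
        elif s != d:
            return None      # a constant must map to itself
    return new


def contained_in(q_sub, q_sup):
    """True if q_sub is contained in q_sup (comparison predicates not handled).

    Each query is (head_args, [(relation, args), ...]).
    """
    head_sub, body_sub = q_sub
    head_sup, body_sup = q_sup
    if len(head_sub) != len(head_sup):
        return False
    start = extend({}, head_sup, head_sub)
    if start is None:
        return False

    def search(i, mapping):
        if i == len(body_sup):
            return True
        rel, args = body_sup[i]
        for rel2, args2 in body_sub:
            if rel2 == rel and len(args2) == len(args):
                nxt = extend(mapping, args, args2)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


body = [
    ("CarForSale", ("c",)),
    ("Category", ("c", "'sportscar")),
    ("Year", ("c", "y")),
    ("Model", ("c", "m")),
    ("Price", ("c", "p")),
    ("ProductReview", ("m", "y", "r")),
]
q = (("m", "p", "r"), body)
q_expanded = (("m", "p", "r"), body + [("UsedCar", ("c",))])

print(contained_in(q_expanded, q))  # True: the plan only returns answers of q
print(contained_in(q, q_expanded))  # False: q is not restricted to used cars
```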

Finding an Executable Ordering
Subgoal                          Chosen source
CarForSale(c)                    V1(c)
Category(c,t), t = sportscar     V1(c,t)
Year(c,y), y ≥ 1992              V1(c,y)
Model(c,m)                       V1(c,m)
Price(c,p)                       V1(c,p)
ProductReview(m,y,r)             V5(m,y,r)

The set of available bindings (BindAvail) grows as subgoals are added to the ordering:
BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
BindAvail = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
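One way to picture the ordering step is as a greedy loop that repeatedly picks a source whose required inputs are already bound and then adds that source's outputs to the available bindings. The sketch below is an illustration under assumptions, not the presented algorithm: the function name `executable_order` and the encoding of each source as a pair of variable sets are hypothetical, and the input and output sets for V1 and V5 are read off the capability records in the containment slide.

```python
def executable_order(subgoals, initially_bound):
    """Greedily order subgoals so each one's required inputs are bound first.

    subgoals: dict name -> (required_inputs, produced_outputs), both sets.
    Returns a list of names, or None if no executable ordering exists.
    """
    bound = set(initially_bound)
    remaining = dict(subgoals)
    order = []
    while remaining:
        ready = [n for n, (needs, _) in remaining.items() if needs <= bound]
        if not ready:
            return None  # every remaining subgoal is missing an input
        name = ready[0]
        bound |= remaining.pop(name)[1]
        order.append(name)
    return order


# Assumed encoding: V1 needs the category t and yields c, m, y, p;
# V5 needs model m and year y and yields the review r.
subgoals = {
    "V5": ({"m", "y"}, {"r"}),
    "V1": ({"t"}, {"c", "m", "y", "p"}),
}

print(executable_order(subgoals, initially_bound={"t"}))  # ['V1', 'V5']
print(executable_order(subgoals, initially_bound=set()))  # None
```

Because the set of bound variables only grows, a subgoal that is ready at one step stays ready later, so the greedy choice cannot rule out an ordering that would otherwise exist.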

Experimental Results
Query 1: Find titles and years of movies featuring Tom Hanks
Query 2: Find titles and reviews of movies featuring Tom Hanks
Query 3: Find telephone number(s) for Alaska Airlines

Conclusions
– Source descriptions consist of a content record and a capability record
– The bucket algorithm performs query reformulation