ITrails: Pay-as-you-go Information Integration in Dataspaces Presented By Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi.

iTrails: Pay-as-you-go Information Integration in Dataspaces
Presented By Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi, ETH Zurich
Summarized By Sungchan Park

Problem: Querying Several Sources
Copyright © 2008 by CEBT, Center for E-Business Technology

Solution #1: Use a Search Engine

Solution #2: Use an Information Integration System

iTrails Core Idea
• Is there an integration solution between these two extremes?
• Declaratively add lightweight ‘hints’ to a search engine, thus allowing gradual enrichment of loosely integrated data sources

Example Scenario
• Query: “pdf yesterday”
• Hints (trails):
  1. The date attribute is mapped to the modified attribute
  2. The date attribute is mapped to the received attribute
  3. The keyword yesterday is mapped to a query for values of the date attribute equal to yesterday's date
  4. The keyword pdf is mapped to a query for elements whose names end in pdf
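To make the four hints concrete, here is a minimal Python sketch of what “pdf yesterday” should return once trails 1 to 4 are applied. The `Item` class, its `modified`/`received` fields, and the helper `matches_pdf_yesterday` are hypothetical illustrations, not part of iTrails or iMeMex.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Item:
    """Hypothetical dataspace item: a file has `modified`, an email has `received`."""
    name: str
    modified: Optional[date] = None
    received: Optional[date] = None

def matches_pdf_yesterday(item: Item, today: date) -> bool:
    """Hypothetical semantics of "pdf yesterday" after applying trails 1-4."""
    yesterday = today - timedelta(days=1)
    # Trail 4: keyword "pdf" -> elements whose names end in "pdf"
    is_pdf = item.name.endswith("pdf")
    # Trail 3: keyword "yesterday" -> date attribute equals yesterday's date;
    # trails 1 and 2 map "date" onto both "modified" and "received"
    dated_yesterday = yesterday in (item.modified, item.received)
    return is_pdf and dated_yesterday

today = date(2007, 9, 26)
items = [
    Item("report.pdf", modified=date(2007, 9, 25)),
    Item("invoice.pdf", received=date(2007, 9, 25)),
    Item("notes.txt", modified=date(2007, 9, 25)),
    Item("old.pdf", modified=date(2007, 9, 1)),
]
print([i.name for i in items if matches_pdf_yesterday(i, today)])
# prints ['report.pdf', 'invoice.pdf']
```

Without trails 1 and 2, neither item would match, because no item carries an attribute literally named date.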

Where do hints come from?
• Given by the user
  – Explicitly
  – Via relevance feedback
• (Semi-)automatically
  – Information extraction techniques
  – Automatic schema matching
  – Ontologies and thesauri (e.g., WordNet)
  – User communities (e.g., trails on gene data, bookmarks)
• All these aspects are beyond the scope of this paper

Data and Query Model
• Data model: all data is assumed to be represented by a logical graph G
• Queries are also represented as graphs
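A minimal Python sketch of such a logical graph, assuming (for illustration only) that each vertex carries a name, an attribute tuple, text content, and outgoing edges. Because G is a graph rather than a tree, a descendant traversal must track visited vertices to terminate. `Node` and `descendants` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so nodes can be stored in a set
class Node:
    """Hypothetical vertex of the logical graph G."""
    name: str
    attributes: dict = field(default_factory=dict)  # tuple component, e.g. {"category": "project"}
    content: str = ""                               # text content
    children: list = field(default_factory=list)    # outgoing edges

def descendants(start: Node) -> list:
    """All nodes reachable from `start` via one or more edges.

    G may contain cycles (e.g. links back to a folder), so visited
    nodes are tracked to guarantee termination.
    """
    seen, order, stack = set(), [], list(start.children)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(node.children)
    return order

home, projects = Node("Home"), Node("projects")
paper = Node("paper.tex", content="draft by Mike")
home.children.append(projects)
projects.children.append(paper)
paper.children.append(projects)  # a back edge: G is not a tree
print(sorted(n.name for n in descendants(home)))  # ['paper.tex', 'projects']
```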

Query Syntax

Query Example
• //Home/projects//*[“Mike”]
  – Finds all items at any depth below the projects folder of a Home folder that contain the keyword “Mike”
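The query above combines path steps with a keyword predicate. The following Python sketch evaluates a small fragment of this syntax over an in-memory graph, assuming that / selects children, // selects descendants, * is a name wildcard, and ["Mike"] keeps nodes whose name or content contains the keyword. `Node`, `STEP`, and `evaluate` are hypothetical names; this is not the iMeMex query processor.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass(eq=False)
class Node:
    """Hypothetical graph vertex with a name, text content, and child edges."""
    name: str
    content: str = ""
    children: list = field(default_factory=list)

# One step: axis ("/" child, "//" descendant), name pattern, optional ["keyword"]
STEP = re.compile(r'(//|/)([^/\[]+)(?:\["([^"]*)"\])?')

def descendants(node):
    # Assumes tree-shaped data; a real graph would also need a visited set
    for child in node.children:
        yield child
        yield from descendants(child)

def evaluate(query, roots):
    """Evaluate a path query such as //Home/projects//*["Mike"]."""
    context = [Node("<root>", children=roots)]  # virtual root above the data
    for axis, pattern, keyword in STEP.findall(query):
        result = []
        for ctx in context:
            candidates = ctx.children if axis == "/" else descendants(ctx)
            for node in candidates:
                if not fnmatchcase(node.name, pattern):
                    continue  # name test failed
                if keyword and keyword not in f"{node.name} {node.content}":
                    continue  # keyword predicate failed
                if node not in result:
                    result.append(node)
        context = result
    return context

mike = Node("meeting.txt", content="Notes from Mike")
other = Node("todo.txt", content="buy milk")
home = Node("Home", children=[
    Node("projects", children=[Node("itrails", children=[mike]), other]),
])
print([n.name for n in evaluate('//Home/projects//*["Mike"]', [home])])  # ['meeting.txt']
```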

Basic Form of a Trail
• A unidirectional trail
• A bidirectional trail
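As a hedged sketch, a trail can be modeled as a pair of queries, each optionally restricted to a component such as `.name` or `.tuple.category` (the components that appear in trail Ψ8 in the query processing example). In this sketch a bidirectional trail is simply two unidirectional trails pointing in opposite directions. The class `Trail` and the function `bidirectional` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Trail:
    """Hypothetical unidirectional trail: a query matching `left` should
    also be answered with `right`."""
    left: str
    right: str
    left_component: Optional[str] = None   # e.g. "name"
    right_component: Optional[str] = None  # e.g. "tuple.category"

    def __str__(self) -> str:
        lhs = self.left + (f".{self.left_component}" if self.left_component else "")
        rhs = self.right + (f".{self.right_component}" if self.right_component else "")
        return f"{lhs} -> {rhs}"

def bidirectional(left: str, right: str,
                  left_component: Optional[str] = None,
                  right_component: Optional[str] = None) -> List[Trail]:
    """In this sketch, a bidirectional trail is two unidirectional trails."""
    return [Trail(left, right, left_component, right_component),
            Trail(right, left, right_component, left_component)]

psi8 = Trail("//home/*", "//calendar//*", "name", "tuple.category")
print(psi8)  # //home/*.name -> //calendar//*.tuple.category
for t in bidirectional("//*", "//*", "tuple.date", "tuple.modified"):
    print(t)
```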

Trail Example
• Trails in the example scenario
• Given query: “pdf yesterday”
• Transformed query: //*.pdf[modified=yesterday() OR received=yesterday()]
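The rewrite on this slide can be approximated with a small string-level sketch. It assumes hypothetical lookup tables for the four trails and a simplified syntax in which several predicates are joined with AND; iTrails itself rewrites logical query plans rather than strings.

```python
# Hypothetical encodings of the four trails from the example scenario
KEYWORD_TRAILS = {
    "pdf": ("path", "//*.pdf"),                      # trail 4: names ending in pdf
    "yesterday": ("predicate", "date=yesterday()"),  # trail 3: date equals yesterday
}
ATTRIBUTE_TRAILS = {"date": ["modified", "received"]}  # trails 1 and 2

def expand_attributes(predicate: str) -> str:
    """Replace a mapped attribute by an OR over its target attributes."""
    attr, value = predicate.split("=", 1)
    targets = ATTRIBUTE_TRAILS.get(attr, [attr])
    return " OR ".join(f"{t}={value}" for t in targets)

def rewrite(keyword_query: str) -> str:
    path, predicates = "//*", []
    for kw in keyword_query.split():
        kind, expr = KEYWORD_TRAILS.get(kw, ("keyword", f'"{kw}"'))
        if kind == "path":
            path = expr
        elif kind == "predicate":
            predicates.append(expand_attributes(expr))
        else:
            predicates.append(expr)  # unmapped keywords stay keyword predicates
    if not predicates:
        return path
    if len(predicates) > 1:
        # parenthesize disjunctions so that AND binds as intended
        predicates = [f"({p})" if " OR " in p else p for p in predicates]
    return f"{path}[{' AND '.join(predicates)}]"

print(rewrite("pdf yesterday"))
# //*.pdf[modified=yesterday() OR received=yesterday()]
```

As on the slide, the sketch replaces the date attribute by its targets instead of keeping it alongside them.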

iTrails Query Processing
1. Matching
2. Transforming
3. Merging
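The three phases can be sketched over a toy logical plan. This is a simplified illustration under stated assumptions: plan operators are hypothetical frozen dataclasses, matching uses structural equality instead of a query containment test, and the trail is literal (no component binding). The matched subplan is merged by a union with the trail's right-hand side.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Op:
    """Hypothetical logical-plan operator; frozen, so equality is structural."""
    kind: str
    args: tuple = ()
    children: Tuple["Op", ...] = ()

@dataclass(frozen=True)
class Trail:
    left: Op   # subplan to look for
    right: Op  # subplan that should also be evaluated

def match(plan: Op, trail: Trail) -> bool:
    # 1. Matching (simplified): structural equality instead of query containment
    return plan == trail.left

def transform(trail: Trail) -> Op:
    # 2. Transforming: build the plan for the trail's right-hand side
    return trail.right

def merge(original: Op, transformed: Op) -> Op:
    # 3. Merging: union the original subplan with the transformed one
    return Op("union", children=(original, transformed))

def apply_trail(plan: Op, trail: Trail) -> Op:
    """Rewrite every matching subplan, top-down."""
    if match(plan, trail):
        return merge(plan, transform(trail))
    return Op(plan.kind, plan.args,
              tuple(apply_trail(child, trail) for child in plan.children))

projects = Op("path", ("//home/projects//*",))
calendar = Op("path", ('//calendar//*[category="project"]//*',))
query = Op("keyword", ("Mike",), (projects,))
print(apply_trail(query, Trail(projects, calendar)))
```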

iTrails Query Processing Example
• Given query Q1 = //home/projects//*[“Mike”]
• Trail Ψ8 := //home/*.name → //calendar//*.tuple.category
• Resulting query Q1{Ψ8} = //home/projects//*[“Mike”] ∪ //calendar//*[category=“project”]//*[“Mike”]
• Utilizes: G. Miklau and D. Suciu. Containment and Equivalence for an XPath Fragment. In PODS, 2002.
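Trail Ψ8 binds the name of each item under //home to the category attribute of calendar entries. A hypothetical string-level sketch of that binding follows; it assumes that Ψ8 matches only a leading //home/<name> step and that merging is a textual union. This sketch copies the matched name verbatim, so it yields category="projects" where the slide shows "project"; any normalization between the two is not modeled here.

```python
import re

# Hypothetical string-level encoding of
# Ψ8 := //home/*.name -> //calendar//*.tuple.category
PSI8_LEFT = re.compile(r"^//home/([^/\[]+)")

def apply_psi8(query: str) -> str:
    """Return Q ∪ Q', where Q' replaces the //home/<name> step by calendar
    entries whose category equals <name>; unchanged if Ψ8 does not match."""
    m = PSI8_LEFT.match(query)
    if m is None:
        return query
    name, rest = m.group(1), query[m.end():]
    transformed = f'//calendar//*[category="{name}"]{rest}'
    return f"{query} ∪ {transformed}"

print(apply_psi8('//home/projects//*["Mike"]'))
# //home/projects//*["Mike"] ∪ //calendar//*[category="projects"]//*["Mike"]
```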

Applying Multiple Trails
• MMCA (Multiple Match Colouring Algorithm)
  – Without restrictions, trails could be applied infinitely often
  – To prevent infinite recursion, a trail is not rematched to nodes of the logical plan that it generated itself
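A minimal sketch of the colouring idea, not the MMCA itself: plan nodes are reduced to (term, colours) pairs, trails are hypothetical term-to-term rewrites, and each generated node inherits its parent's colours plus the trail that produced it (an assumption of this sketch). With the mutually recursive trails date → modified and modified → date, rewriting that keeps applying trails to their own output would never stop; the colours make it terminate.

```python
from collections import deque

def expand(start_term, trails):
    """Apply trails repeatedly; each generated node is coloured with the
    trails that (transitively) produced it, so a trail never rematches
    its own output."""
    start = (start_term, frozenset())
    seen = {start}
    queue = deque([start])
    while queue:
        term, colours = queue.popleft()
        for name, (left, right) in trails.items():
            if left != term or name in colours:
                continue  # no match, or node was generated by this trail
            node = (right, colours | {name})
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen

trails = {"t1": ("date", "modified"), "t2": ("modified", "date")}
result = expand("date", trails)
print(sorted((t, sorted(c)) for t, c in result))
# [('date', []), ('date', ['t1', 't2']), ('modified', ['t1'])]
```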

Other Issues
• Trail pruning
  – Problem: MMCA is exponential in the number of levels
  – Solution: trail pruning
    – Prune by number of levels
    – Prune by top-K trails matched in each level (trails are given weights and probabilities)
    – Prune by both top-K trails and number of levels
• Trail indexing
  – Precompute trail expressions in order to speed up query processing
  – Trail materialization
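The combined pruning strategy can be sketched as a level-wise expansion that applies at most K trails per level and stops after a fixed number of levels, so at most K × max_levels trail applications occur. Trails are assumed here to be hypothetical (weight, left, right) triples over terms; `prune_trails` is an illustrative name, not the paper's algorithm.

```python
import heapq

def prune_trails(query_terms, trails, k, max_levels):
    """Level-wise trail application with top-K and level pruning (hypothetical).

    trails: list of (weight, left_term, right_term).
    Returns the terms reachable within max_levels levels when only the
    k highest-weighted matching trails are applied at each level.
    """
    frontier = set(query_terms)
    reached = set(query_terms)
    for _ in range(max_levels):                      # prune by number of levels
        matching = [t for t in trails if t[1] in frontier]
        best = heapq.nlargest(k, matching)           # prune by top-K per level
        frontier = {right for _, _, right in best} - reached
        if not frontier:
            break
        reached |= frontier
    return reached

trails = [(0.9, "date", "modified"), (0.8, "date", "received"),
          (0.1, "date", "created"), (0.7, "modified", "mtime")]
print(sorted(prune_trails({"date"}, trails, k=2, max_levels=1)))
# ['date', 'modified', 'received']
```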

Experiments
• Setting: iMeMex configured to act in three modes
  – Baseline: graph/IR search engine
  – iTrails: search queries rewritten with trails
  – Perfect query: semantics-aware query
• Data

Experiment: Quality
• Comparison with the baseline

Experiment: Overhead
• Comparison with the perfect query
  – The overhead is not negligible
  – However, it can be addressed by exploiting trail materialization

Experiment: Scalability #1
• Rewrite time
  – Query-rewrite time can be controlled with pruning

Experiment: Scalability #2
• Quality
  – Pruning improves precision

Conclusion
• Our contributions
  – iTrails: a generic method to model semantic relationships (e.g., implicit meaning, bookmarks, dictionaries, thesauri, attribute matches, ...)
  – A framework and algorithms for pay-as-you-go information integration
  – A smooth transition between search and data integration
• Future work
  – Trail creation
    – Use collections (ontologies, thesauri, Wikipedia)
    – Work on automatic mining of trails from the dataspace
  – Other types of trails