iTrails: Pay-as-you-go Information Integration in Dataspaces. Presented by Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi.

1 iTrails: Pay-as-you-go Information Integration in Dataspaces
Presented by Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi (ETH Zurich)
2008-02-22, summarized by Sungchan Park

2 Problem: Querying Several Sources
Copyright © 2008 by CEBT, Center for E-Business Technology

3 Solution #1: Use a Search Engine

4 Solution #2: Use an Information Integration System

5 iTrails Core Idea
- Is there an integration solution in between these two extremes?

6 iTrails Core Idea
- Is there an integration solution in between these two extremes?
- Declaratively add lightweight ‘hints’ to a search engine, thus allowing gradual enrichment of loosely integrated data sources

7 Example Scenario
- Query: “pdf yesterday”
- Hints (trails):
  1. The date attribute is mapped to the modified attribute
  2. The date attribute is mapped to the received attribute
  3. The yesterday keyword is mapped to a query for values of the date attribute equal to yesterday's date
  4. The pdf keyword is mapped to a query for elements whose names end in pdf

8 Where do hints come from?
- Given by the user
  - Explicitly
  - Via relevance feedback
- (Semi-)automatically
  - Information extraction techniques
  - Automatic schema matching
  - Ontologies and thesauri (e.g., WordNet)
  - User communities (e.g., trails on gene data, bookmarks)
- All these aspects are beyond the scope of this paper

9 Data and Query Model
- Data model
  - Assume that all data is represented by a logical graph G
  - A query is also represented by a graph

10 Query Syntax

11 Query Example
- //Home/projects//*[“Mike”]
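
To make the query example concrete, here is a minimal sketch of how such a path query could be evaluated over a small tree. It assumes that `//` selects descendants, `/` selects children, `*` matches any name, and that the `["Mike"]` predicate keeps nodes whose name or content contains the keyword. The `Node` class, the `evaluate` function, the case-insensitive matching and all file names are hypothetical, not taken from the slides.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Node:
    name: str
    content: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node strictly below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def evaluate(root, path, keyword=None):
    """Return the nodes selected by `path`, optionally filtered by a keyword."""
    context = [root]
    # Split the path into (axis, name pattern) steps, e.g. ("//", "home").
    for axis, pattern in re.findall(r"(//|/)([^/]+)", path):
        selected, seen = [], set()
        for node in context:
            candidates = descendants(node) if axis == "//" else node.children
            for cand in candidates:
                # Assumption: names are compared case-insensitively.
                if id(cand) not in seen and fnmatchcase(cand.name.lower(), pattern.lower()):
                    seen.add(id(cand))
                    selected.append(cand)
        context = selected
    if keyword is not None:
        needle = keyword.lower()
        context = [n for n in context if needle in f"{n.name} {n.content}".lower()]
    return context


# Hypothetical data: a tiny file-system-like tree.
root = Node("", children=[
    Node("home", children=[
        Node("projects", children=[
            Node("PIM", children=[Node("notes.txt", "Meeting with Mike")]),
            Node("Mike-report.pdf"),
        ]),
        Node("music", children=[Node("mike.mp3")]),
    ]),
])

hits = evaluate(root, "//Home/projects//*", "Mike")
print([n.name for n in hits])
```

In this sketch `mike.mp3` is not returned, because it sits under `music` rather than under `projects`.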

12 Basic Form of a Trail
- A unidirectional trail
- A bidirectional trail
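
The trail formulas did not survive in this transcript, so the following is only a sketch of one way to represent the two forms. It assumes that a trail relates a left-hand query to a right-hand query, and that a bidirectional trail behaves like two unidirectional ones, one per direction. The `Trail` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    left: str                    # query on the left-hand side
    right: str                   # query on the right-hand side
    bidirectional: bool = False  # assumption: also usable right-to-left

    def directions(self):
        """Return the (from, to) rewrites this trail permits."""
        rewrites = [(self.left, self.right)]
        if self.bidirectional:
            rewrites.append((self.right, self.left))
        return rewrites


one_way = Trail("date", "modified")
two_way = Trail("date", "modified", bidirectional=True)
print(one_way.directions())
print(two_way.directions())
```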

13 Trail Example
- Trails in an example scenario
  - Given query: “pdf yesterday”
  - Transformed query: //*.pdf[modified=yesterday() OR received=yesterday()]
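
The rewrite of “pdf yesterday” can be imitated with a toy function. This is a simplified illustration that reproduces the transformed query string above, not the iTrails rewriting algorithm itself. The three dictionaries are a hypothetical encoding of the four hints from the example scenario.

```python
# Hypothetical encoding of the four hints from the example scenario.
KEYWORD_TO_NAME = {"pdf": "//*.pdf"}                           # hint 4
KEYWORD_TO_PREDICATE = {"yesterday": ("date", "yesterday()")}  # hint 3
ATTRIBUTE_MAP = {"date": ["modified", "received"]}             # hints 1 and 2


def rewrite(keyword_query):
    """Turn a keyword query into a path query using the hints above."""
    path = "//*"
    predicates = []
    for word in keyword_query.split():
        if word in KEYWORD_TO_NAME:
            # The keyword becomes a constraint on element names.
            path = KEYWORD_TO_NAME[word]
        elif word in KEYWORD_TO_PREDICATE:
            # The keyword becomes an attribute predicate; the attribute is
            # then replaced by every attribute it is mapped to.
            attribute, value = KEYWORD_TO_PREDICATE[word]
            for target in ATTRIBUTE_MAP.get(attribute, [attribute]):
                predicates.append(f"{target}={value}")
        else:
            # No hint applies: keep the word as a plain keyword predicate.
            predicates.append(f'"{word}"')
    if not predicates:
        return path
    return f"{path}[{' OR '.join(predicates)}]"


print(rewrite("pdf yesterday"))
# prints: //*.pdf[modified=yesterday() OR received=yesterday()]
```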

14 iTrails Query Processing
1. Matching
2. Transforming
3. Merging

15 iTrails Query Processing Example
- Given query: Q1 = //home/projects//*[“Mike”]
- Trail: Ψ8 := //home/*.name -> //calendar//*.tuple.category
- Resulting query: Q1{Ψ8} = //home/projects//*[“Mike”] U //calendar//*[category=“project”]//*[“Mike”]
- Utilizing: G. Miklau and D. Suciu. Containment and Equivalence for an XPath Fragment. In PODS, 2002.
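
The three processing steps (matching, transforming, merging) can be sketched as follows. Two loud assumptions: matching is reduced to plain string equality on the path, which is only a stand-in for a real query containment test, and the trail used here is a hypothetical, already instantiated simplification rather than Ψ8 as written on the slide. `Query`, `Trail` and `apply_trail` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    path: str
    keyword: str

    def __str__(self):
        return f'{self.path}["{self.keyword}"]'


@dataclass(frozen=True)
class Trail:
    left: str
    right: str


def matches(trail, query):
    # Step 1, matching. Stand-in for a containment test: exact equality only.
    return trail.left == query.path


def transform(trail, query):
    # Step 2, transforming: swap in the right-hand side, keep the keyword.
    return Query(trail.right, query.keyword)


def apply_trail(trail, query):
    # Step 3, merging: the result is the union of original and transformed.
    if not matches(trail, query):
        return [query]
    return [query, transform(trail, query)]


q1 = Query("//home/projects//*", "Mike")
# Hypothetical trail, not the slide's Ψ8.
trail = Trail("//home/projects//*", '//calendar//*[category="project"]//*')
print(" U ".join(str(q) for q in apply_trail(trail, q1)))
```

The original query is kept as one branch of the union, so applying a trail can only add results, never remove them.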

16 Applying Multiple Trails
- MMCA (Multiple Match Colouring Algorithm)
  - A trail can be applied infinitely many times
  - To prevent infinite recursion, a trail should not be rematched to nodes in a logical plan generated by itself
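
The colouring idea can be illustrated with a deliberately small model: each generated term remembers the set of trails ("colours") that produced it, and a trail is never rematched to a term carrying its own colour. This is a sketch of the termination argument only, assuming terms are plain strings and matching is string equality. It is not the MMCA algorithm as specified by the authors, and the trail names are hypothetical.

```python
def expand(term, trails):
    """Return every term reachable from `term` without reusing a trail
    on a term that the same trail helped to generate."""
    results = [term]
    frontier = [(term, frozenset())]  # (term, colours that generated it)
    while frontier:
        next_frontier = []
        for current, colours in frontier:
            for name, (left, right) in trails.items():
                # Colour check: skip trails that already contributed here.
                if left == current and name not in colours:
                    next_frontier.append((right, colours | {name}))
                    if right not in results:
                        results.append(right)
        frontier = next_frontier
    return results


# Hypothetical trails. t1 and t2 form a cycle: date -> modified -> date.
trails = {
    "t1": ("date", "modified"),
    "t2": ("modified", "date"),
    "t3": ("date", "received"),
}
print(expand("date", trails))
# prints: ['date', 'modified', 'received']
```

Without the colour check, the cycle between `t1` and `t2` would keep the frontier non-empty forever. With it, every derivation uses each trail at most once, so the loop stops after at most as many levels as there are trails.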

17 Other Issues
- Trail Pruning
  - Problem: MMCA is exponential in the number of levels
  - Solution: trail pruning
    - Prune by number of levels
    - Prune by top-K trails matched in each level (give weights and probabilities to trails)
    - Prune by both top-K trails and number of levels
- Trail Indexing
  - Precompute trail expressions in order to speed up query processing
  - Trail materialization
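
The pruning strategies listed above can be sketched in a few lines. The slides say only that trails are given a weight and a probability. Ranking by their product is an assumption made for this illustration, and `WeightedTrail`, `prune_top_k` and `prune` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedTrail:
    name: str
    weight: float
    probability: float


def score(trail):
    # Assumption: rank trails by weight times probability.
    return trail.weight * trail.probability


def prune_top_k(matched, k):
    """Keep the k highest-scoring trails matched in one level."""
    return heapq.nlargest(k, matched, key=score)


def prune(matches_per_level, k, max_levels):
    """Combine both strategies: cap the number of levels, then take top-K."""
    return [prune_top_k(level, k) for level in matches_per_level[:max_levels]]


a = WeightedTrail("a", weight=1.0, probability=0.9)
b = WeightedTrail("b", weight=2.0, probability=0.2)
c = WeightedTrail("c", weight=1.0, probability=0.5)

levels = [[a, b, c], [b, c], [a]]
pruned = prune(levels, k=2, max_levels=2)
print([[t.name for t in level] for level in pruned])
# prints: [['a', 'c'], ['c', 'b']]
```

Capping both the number of levels and the matches per level bounds the number of trail applications by `k * max_levels` in this simplified model, which is the point of combining the two strategies.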

18 Experiments
- Setting
  - Configured iMeMex to act in three modes
    - Baseline: graph / IR search engine
    - iTrails: rewrite search queries with trails
    - Perfect Query: semantics-aware query
  - Data

19 Experiment, Quality
- Compare with baseline

20 Experiment, Overhead
- Compare with perfect query
  - Overhead is not negligible
  - However, this can be fixed by exploiting trail materializations

21 Experiment, Scalability #1
- Rewrite time
  - Query-rewrite time can be controlled with pruning

22 Experiment, Scalability #2
- Quality
  - Pruning improves precision

23 Conclusion
- Our contributions
  - iTrails: a generic method to model semantic relationships (e.g., implicit meaning, bookmarks, dictionaries, thesauri, attribute matches, ...)
  - We propose a framework and algorithms for pay-as-you-go information integration
  - Smooth transition between search and data integration
- Future work
  - Trail creation
    - Use collections (ontologies, thesauri, Wikipedia)
    - Work on automatic mining of trails from the dataspace
  - Other types of trails

