1
March 8, 2007
From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System
Jens Dittrich, Lukas Blunschi, Markus Färber, Olivier Girard, Shant Karakashian, Marcos Vaz Salles
BTW 2007
2
March 8, 2007 Marcos Vaz Salles/ETH Zurich/marcos.vazsalles@inf.ethz.ch
A World of Data Silos
- > 80% of data outside of relational databases
  - Documents, spreadsheets, presentations
  - Web pages
  - Email, instant messages, news feeds
  - Images, audio, video
- Specialized systems for many of the data types (filesystems, web/email servers, DBMSs)
- Lack of unified services over ALL the data
3
Dataspace
- The complete set of information (documents, emails, images, etc.) belonging to one organization or task
- Examples:
  - Personal dataspaces: your messages, your family photos
  - Enterprise dataspaces: all information about a key customer
  - Scientific dataspaces: all information about one given research project
- Includes a set of data sources and relationships among pieces of information in the sources
4
Dataspace Management System
- New system abstraction
- A hybrid of:
  - Search Engine
  - Database Management System
  - Information Integration System
  - Data Sharing System
- Offers services on ALL the data
  - Keyword and structural search to start with (baseline)
- Provides pay-as-you-go information integration
  - Model data relationships and their evolution
- However, does not acquire full control of data
  - System does not "own" the data
5
Projects on Dataspaces
- Vision Paper on Dataspaces
  - Mike Franklin (UC Berkeley), Alon Halevy (U Wash / Google), David Maier (U Portland). From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, December 2005.
- ETH Zürich: iMeMex
- UC Berkeley (Shawn Jeffrey) and Google (Alon Halevy)
- U Portland (David Maier)
- Purdue U (Nehme, Elke Rundensteiner, et al.)
6
Our Focus: Personal Dataspaces
[Figure: User, Applications, and Data Sources (PC, Email Server, Web Server, iPod), with the iMeMex System as PDSMS]
- Great applications, but information integration is done by the user
7
So far...
- Vision: Dataspaces (VLDB 2005, SIGIR PIM 2006)
To come...
- Data model: single framework for different types of data (VLDB 2006)
- System Architecture: Mediation / Warehousing (CIDR 2007, BTW 2007)
- Pay-as-you-go information integration (ongoing work)
8
Characteristics of Personal Data
- Non-schematic
  - Heterogeneous collections, no formally defined schema
- Several possible serializations
  - Hundreds of file formats, different encodings
- Contains arbitrary graphs
  - References within documents (LaTeX/Word), filesystem links
- Distributed among different data sources
  - Filesystem, email servers, web servers, databases, iPod
- Infinite
  - RSS, ATOM, email streams
9
Data Model Options
[Table: Support for Personal Data, by data model]
- Data models (columns): Bag of Words, Relational, XML, iDM
- Support for personal data (rows):
  - Non-schematic data
  - Serialization independent
  - Support for graph data
  - Support for lazy computation
  - Support for infinite data
- Cell annotations: Specific schema; Extension: XLink/XPointer; View mechanism; Extension: ActiveXML; Extension: Document streams; Extension: Relational streams; Extension: XML streams
10
Data Models for Personal Information
[Figure: data models placed along an abstraction-level axis (lower to higher), from the Physical Level up to Personal Information. Labels: Relational, XML, Document / Bag of Words, iDM]
11
iDM: iMeMex Data Model
- Our approach: get the data model closer to personal information, not the other way around
- Supports:
  - Unstructured, semi-structured and structured data, e.g., files & folders, XML, relations
  - Clear separation of logical and physical representation of data
  - Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc.
  - Lazily computed data, e.g., ActiveXML (Abiteboul et al.)
  - Infinite data, e.g., media and data streams
- See VLDB 2006
12
iDM: Lazily Computed Graph
- Nodes and edges are lazily computed
- Each node is a Resource View
13
iDM: Lazily Computed Graph
- Behind the scenes, obtaining the content may:
  - Read a file on the filesystem
  - Access a page on the web
  - Fetch the data from an index structure
- Behind the scenes, obtaining the group may:
  - Get the children of a folder in the filesystem
  - Look up an edge replica
  - Obtain the sections of a document
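To make the lazy-graph idea concrete, here is a minimal Python sketch of a node whose content and group are only computed when first requested. It is an illustration under assumptions, not the iMeMex API: the class name `ResourceView` is borrowed from the slide, but `content_fn`, `group_fn`, `view_of_path`, `demo`, and the cache-on-first-access policy are all hypothetical. Only the filesystem case from the slide is shown.

```python
import os
import tempfile
from typing import Callable, List, Optional


class ResourceView:
    """Hypothetical lazily computed graph node (illustration only)."""

    def __init__(self, name: str,
                 content_fn: Callable[[], bytes],
                 group_fn: Callable[[], List["ResourceView"]]):
        self.name = name
        self._content_fn = content_fn  # how to obtain the content on demand
        self._group_fn = group_fn      # how to obtain the outgoing edges on demand
        self._content: Optional[bytes] = None
        self._group: Optional[List["ResourceView"]] = None

    @property
    def content(self) -> bytes:
        # Assumption: compute on first access, then cache the result.
        if self._content is None:
            self._content = self._content_fn()
        return self._content

    @property
    def group(self) -> List["ResourceView"]:
        if self._group is None:
            self._group = self._group_fn()
        return self._group


def _read_bytes(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


def view_of_path(path: str) -> ResourceView:
    """Wrap a filesystem path; nothing is read until content/group is used."""
    name = os.path.basename(path)
    if os.path.isdir(path):
        # Group of a folder: its children, each wrapped lazily in turn.
        return ResourceView(
            name,
            content_fn=lambda: b"",
            group_fn=lambda: [view_of_path(os.path.join(path, child))
                              for child in sorted(os.listdir(path))],
        )
    # Content of a file: its bytes; a plain file has no children here.
    return ResourceView(name, content_fn=lambda: _read_bytes(path),
                        group_fn=lambda: [])


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "notes.txt"), "wb") as f:
            f.write(b"hello dataspace")
        folder = view_of_path(root)   # no directory listing yet
        children = folder.group       # listing happens here
        return [c.name for c in children], children[0].content
```

The other cases on the slide (web pages, index structures, edge replicas, document sections) would plug in as different `content_fn` and `group_fn` callables without changing the node class.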
14
How to implement iDM: Architectural Perspective
- Indexes & Replicas access (warehousing)
- Data source access (mediation)
- Complex operators (query algebra)
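One way to picture how the warehousing and mediation layers could cooperate is a lookup that consults a local replica first and falls back to the data source. The slide only names the layers; the fallback order, the replica-on-miss policy, and every name below (`ContentAccess`, `fetch_from_source`, `demo_access`, the `mail://` URI) are assumptions made for this Python sketch.

```python
from typing import Callable, Dict, List


class ContentAccess:
    """Hypothetical access layer combining replicas and source access."""

    def __init__(self, fetch_from_source: Callable[[str], str]):
        self._fetch = fetch_from_source       # mediation path: ask the source
        self._replicas: Dict[str, str] = {}   # warehousing path: local copies
        self.log: List[str] = []              # records which path served each call

    def get(self, uri: str) -> str:
        if uri in self._replicas:
            self.log.append("replica")
            return self._replicas[uri]
        self.log.append("source")
        value = self._fetch(uri)
        self._replicas[uri] = value           # assumed policy: replicate on miss
        return value


def demo_access() -> tuple:
    sources = {"mail://inbox/1": "Meeting at 10"}
    access = ContentAccess(lambda uri: sources[uri])
    first = access.get("mail://inbox/1")    # served by the data source
    second = access.get("mail://inbox/1")   # served by the replica
    return first, second, access.log
```

Complex operators of a query algebra would sit above such a layer and call `get` without knowing which path answered.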
15
Further Research Challenges in Dataspace Management Systems
- Pay-as-you-go information integration
  - Model relationships in the dataspace
  - Examples: semantic equivalences, lineage relationships
- Distributed Dataspaces
- Query language specification (iQL)
16
iMeMex Prototype Implementation
- iMeMex Prototype
  - ~ 780 classes
  - ~ 70,900 LOC
  - Java-based: supported on Linux, Mac and Windows
  - OSGi-based: Everything is a Plug-in (~ 52 bundles)
  - Open-source (Apache 2.0): http://www.imemex.org
- Team
  - Advisor
  - Two Ph.D. students
  - Three M.Sc. students
  - Thirteen Semester Project students
17
Conclusions
- Dataspace Management Systems are a new system abstraction
- iMeMex is among the first implementations of this new breed of systems (our focus: Personal Dataspaces)
- Dataspace Management Systems call for:
  - New data model
  - New system architecture
  - New capabilities for pay-as-you-go information integration
- More information: http://www.imemex.org
18
Questions?
Thanks in Advance for your Feedback!
19
Backup Slides
20
Personal Dataspaces Literature
- Dittrich, Vaz Salles, Kossmann, Blunschi. iMeMex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September 2005.
- Dittrich, Vaz Salles. iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006.
- Dittrich. iMeMex: A Platform for Personal Dataspace Management. SIGIR PIM, August 2006.
- Blunschi, Dittrich, Girard, Karakashian, Vaz Salles. A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper). CIDR, January 2007.
- Dittrich, Blunschi, Färber, Girard, Karakashian, Vaz Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System. BTW, March 2007.