March 8, 2007 From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System Jens Dittrich Lukas Blunschi.

1 From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System
Jens Dittrich, Lukas Blunschi, Markus Färber, Olivier Girard, Shant Karakashian, Marcos Vaz Salles
BTW 2007, March 8, 2007

2 A World of Data Silos
March 8, 2007, Marcos Vaz Salles / ETH Zurich
- > 80% of data outside of relational databases
  - Documents, spreadsheets, presentations
  - Web pages
  - Email, instant messages, news feeds
  - Images, audio, video
- Specialized systems for many of the data types (filesystems, web/email servers, DBMSs)
- Lack of unified services over ALL the data

3 Dataspace
- The complete set of information (documents, emails, images, etc.) belonging to one organization or task
- Examples:
  - Personal dataspaces: your messages, your family photos
  - Enterprise dataspaces: all information about a key customer
  - Scientific dataspaces: all information about one given research project
- Includes a set of data sources and relationships among pieces of information in the sources

4 Dataspace Management System
- New system abstraction
- A hybrid of
  - Search Engine
  - Database Management System
  - Information Integration System
  - Data Sharing System
- Offers services on ALL the data
  - Keyword and structural search to start with (baseline)
- Provides pay-as-you-go information integration
  - Model data relationships and their evolution
- However, does not acquire full control of data
  - System does not “own” the data

5 Projects on Dataspaces
- Vision Paper on Dataspaces: Mike Franklin (UC Berkeley), Alon Halevy (U Wash / Google), David Maier (U Portland). From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, December 2005.
- ETH Zürich: iMeMex
- UC Berkeley (Shawn Jeffrey) and Google (Alon Halevy)
- U Portland (David Maier)
- Purdue U (Nehme, Elke Rundensteiner, et al.)

6 Our Focus: Personal Dataspaces
- Great applications, but information integration is done by the user
Figure labels: User; Applications; Data Sources (PC, Email Server, Web Server, iPod); PDSMS: iMeMex System

7 So far...
- Vision: Dataspaces (VLDB 2005, SIGIR PIM 2006)
- To come...
  - Data model: single framework for different types of data (VLDB 2006)
  - System Architecture: Mediation / Warehousing (CIDR 2007, BTW 2007)
  - Pay-as-you-go information integration (ongoing work)

8 Characteristics of Personal Data
- Non-schematic
  - Heterogeneous collections, no formally defined schema
- Several possible serializations
  - Hundreds of file formats, different encodings
- Contains arbitrary graphs
  - References within documents (LaTeX/Word), filesystem links
- Distributed among different data sources
  - Filesystem, email servers, web servers, databases, iPod
- Infinite
  - RSS, ATOM, email streams

9 Data Model Options
Table: Support for Personal Data, by data model
- Data models (columns): Bag of Words, Relational, XML, iDM
- Criteria (rows): Non-schematic data; Serialization independent; Support for Graph data; Support for Lazy Computation; Support for Infinite data
- Cell annotations, in slide order: Specific schema; Extension: XLink/XPointer; View mechanism; Extension: ActiveXML; Extension: Document streams; Extension: Relational streams; Extension: XML streams

10 Data Models for Personal Information
Figure labels: Physical Level; Relational; XML; Document / Bag of Words; Personal Information; iDM. Axis: Abstraction Level (lower to higher)

11 iDM: iMeMex Data Model
- Our approach: get the data model closer to personal information, not the other way around
- Supports:
  - Unstructured, semi-structured and structured data, e.g., files & folders, XML, relations
  - Clear separation of logical and physical representation of data
  - Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc.
  - Lazily computed data, e.g., ActiveXML (Abiteboul et al.)
  - Infinite data, e.g., media and data streams
- See VLDB 2006
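As a rough illustration of two of the properties listed on this slide (arbitrary directed graphs and infinite data), the sketch below uses hypothetical names (`Node`, `reachable`, `feed_items`) that are not part of iDM or the iMeMex code base. It only shows why a traversal over such data needs protection against cycles, and why an unbounded component has to be consumed incrementally rather than materialized.

```python
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical graph node; not the iDM resource view definition."""
    name: str
    children: List["Node"] = field(default_factory=list)


def reachable(start: Node) -> List[str]:
    """Depth-first traversal that tolerates cycles (e.g. filesystem links)."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already visited: a link led back to a known node
        seen.add(id(node))
        order.append(node.name)
        # Push in reverse so children are visited in their listed order.
        stack.extend(reversed(node.children))
    return order


def feed_items(feed_name: str) -> Iterator[str]:
    """Stand-in for an unbounded source such as an RSS or email stream."""
    for i in count(1):
        yield f"{feed_name}-item-{i}"


# A folder whose child links back to it forms a cycle.
home = Node("home")
papers = Node("papers")
home.children.append(papers)
papers.children.append(home)

print(reachable(home))
# Only a finite prefix of the stream is ever requested.
print(list(islice(feed_items("news"), 3)))
```

The generator never terminates on its own, so any consumer has to decide how much of the stream to pull, here with `islice`.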

12 iDM: Lazily Computed Graph
- Nodes and edges are lazily computed
- Each node is a Resource View

13 iDM: Lazily Computed Graph
- Behind the scenes, obtaining the content may:
  - Read a file on the filesystem
  - Access a page on the web
  - Fetch the data from an index structure
- Behind the scenes, obtaining the group may:
  - Get the children of a folder in the filesystem
  - Look up an edge replica
  - Obtain the sections of a document
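The idea of a node whose content and group are only computed on request can be sketched as follows. This is a minimal illustration under stated assumptions, not the iMeMex implementation (which is Java-based): the class layout, the `content_fn` / `group_fn` callables, the in-memory `FAKE_FILES` dictionary, and the rule "one section per line" are all invented for the example.

```python
from typing import Callable, Iterator, List, Optional

# In-memory stand-in for a filesystem so the sketch is self-contained.
FAKE_FILES = {"notes.txt": "Intro\nMethods\nResults"}
calls: List[str] = []  # records every simulated data source access


class ResourceView:
    """Hypothetical lazily evaluated node with a content and a group."""

    def __init__(self, name: str,
                 content_fn: Optional[Callable[[], str]] = None,
                 group_fn: Optional[Callable[[], List["ResourceView"]]] = None):
        self.name = name
        self._content_fn = content_fn
        self._group_fn = group_fn
        self._content: Optional[str] = None
        self._content_loaded = False

    @property
    def content(self) -> Optional[str]:
        # Computed on first access only, then served from the cached value.
        if not self._content_loaded:
            self._content = self._content_fn() if self._content_fn else None
            self._content_loaded = True
        return self._content

    def group(self) -> Iterator["ResourceView"]:
        # Related views are produced only when a caller asks for them.
        return iter(self._group_fn()) if self._group_fn else iter(())


def read_file(path: str) -> str:
    calls.append(f"read {path}")
    return FAKE_FILES[path]


def file_view(path: str) -> ResourceView:
    def sections() -> List[ResourceView]:
        # "Obtain the sections of a document": here, one section per line.
        return [ResourceView(line) for line in view.content.splitlines()]

    view = ResourceView(path, content_fn=lambda: read_file(path), group_fn=sections)
    return view


folder = ResourceView("docs", group_fn=lambda: [file_view(p) for p in FAKE_FILES])

doc = next(folder.group())                     # listing the folder reads no file
accesses_after_listing = list(calls)
section_names = [s.name for s in doc.group()]  # first content access happens here
_ = doc.content                                # served from the cached value
print(accesses_after_listing, section_names, calls)
```

In this sketch the file is read exactly once, and only when the sections are requested. Whether a real system caches content, re-reads it, or serves it from an index is a separate design decision that the slide leaves open.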

14 How to implement iDM: Architectural Perspective
Figure labels: Indexes & Replicas access (warehousing); Data source access (mediation); Complex operators (query algebra)
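One way to picture how these three pieces could relate is sketched below. Everything here is an assumption made for illustration: the class names `SourceAccess` and `ReplicaAccess`, the on-demand filling of the replica, and the `keyword_filter` operator are hypothetical and are not taken from the iMeMex architecture.

```python
from typing import Callable, Dict, Iterable, Iterator


class SourceAccess:
    """Mediation: every request goes to the underlying data source."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch
        self.fetches = 0  # counts round trips to the source

    def get(self, key: str) -> str:
        self.fetches += 1
        return self._fetch(key)


class ReplicaAccess:
    """Warehousing: serve from a local replica, filled from the source on a miss."""

    def __init__(self, source: SourceAccess):
        self._source = source
        self._replica: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._replica:
            self._replica[key] = self._source.get(key)
        return self._replica[key]


def keyword_filter(access, keys: Iterable[str], keyword: str) -> Iterator[str]:
    """Operator layered on top of an access path; it does not care which one."""
    for key in keys:
        if keyword.lower() in access.get(key).lower():
            yield key


# Hypothetical mailbox standing in for a remote email server.
MAILBOX = {
    "msg1": "Dataspace meeting on Friday",
    "msg2": "Lunch?",
    "msg3": "iMeMex dataspace demo",
}

source = SourceAccess(MAILBOX.__getitem__)
replica = ReplicaAccess(source)

first = list(keyword_filter(replica, MAILBOX, "dataspace"))
second = list(keyword_filter(replica, MAILBOX, "dataspace"))
print(first, second, source.fetches)
```

The second query is answered entirely from the replica, so the source is contacted once per message rather than once per query. The sketch ignores replica invalidation, which a system that does not own its data would have to handle when sources change underneath it.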

15 Further Research Challenges in Dataspace Management Systems
- Pay-as-you-go information integration
  - Model relationships in the dataspace
  - Examples: semantic equivalences, lineage relationships
- Distributed Dataspaces
- Query language specification (iQL)

16 iMeMex Prototype Implementation
- iMeMex Prototype
  - ~ 780 classes
  - ~ 70,900 LOC
  - Java-based: supported on Linux, Mac and Windows
  - OSGi-based: Everything is a Plug-in (~ 52 bundles)
  - Open-source (Apache 2.0): http://www.imemex.org
- Team
  - Advisor
  - Two Ph.D. students
  - Three M.Sc. students
  - Thirteen Semester Project students

17 Conclusions
- Dataspace Management Systems are a new system abstraction
- iMeMex is among the first implementations of this new breed of systems; our focus: Personal Dataspaces
- Dataspace Management Systems call for:
  - New data model
  - New system architecture
  - New capabilities for pay-as-you-go information integration
- More information: http://www.imemex.org

18 Questions?
Thanks in Advance for your Feedback!

19 Backup Slides

20 Personal Dataspaces Literature
- Dittrich, Vaz Salles, Kossmann, Blunschi. iMeMex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September 2005.
- Dittrich, Vaz Salles. iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006.
- Dittrich. iMeMex: A Platform for Personal Dataspace Management. SIGIR PIM, August 2006.
- Blunschi, Dittrich, Girard, Karakashian, Vaz Salles. A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper). CIDR, January 2007.
- Dittrich, Blunschi, Färber, Girard, Karakashian, Vaz Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System. BTW, March 2007.
