March 8, 2007 From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System Jens Dittrich Lukas Blunschi.

Presentation transcript:

March 8, 2007
From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System
Jens Dittrich, Lukas Blunschi, Markus Färber, Olivier Girard, Shant Karakashian, Marcos Vaz Salles
BTW 2007

March 8, 2007 Marcos Vaz Salles/ETH
A World of Data Silos
• > 80% of data outside of relational databases
  - Documents, spreadsheets, presentations
  - Web pages
  - Email, instant messages, news feeds
  - Images, audio, video
• Specialized systems for many of the data types (filesystems, web/email servers, DBMSs)
• Lack of unified services over ALL the data

Dataspace
• The complete set of information (documents, emails, images, etc.) belonging to one organization or task
• Examples:
  - Personal dataspaces: your messages, your family photos
  - Enterprise dataspaces: all information about a key customer
  - Scientific dataspaces: all information about one given research project
• Includes a set of data sources and relationships among pieces of information in the sources

Dataspace Management System
• New system abstraction
• A hybrid of
  - Search Engine
  - Database Management System
  - Information Integration System
  - Data Sharing System
• Offers services on ALL the data
  - Keyword and structural search to start with (baseline)
• Provides pay-as-you-go information integration
  - Model data relationships and their evolution
• However, does not acquire full control of data
  - System does not "own" the data

Projects on Dataspaces
• Vision Paper on Dataspaces
  - Mike Franklin (UC Berkeley), Alon Halevy (U Wash / Google), David Maier (U Portland). From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, December
• ETH Zürich: iMeMex
• UC Berkeley (Shawn Jeffrey) and Google (Alon Halevy)
• U Portland (David Maier)
• Purdue U (Nehme, Elke Rundensteiner, et al.)

Our Focus: Personal Dataspaces
• Great applications, but information integration is done by the user
[Figure labels: User; Applications; Data Sources (PC, Server, Web Server, iPod); PDSMS (iMeMex System)]

So far...
• Vision: Dataspaces (VLDB 2005, SIGIR PIM 2006)
• To come...
  - Data model: single framework for different types of data (VLDB 2006)
  - System Architecture: Mediation / Warehousing (CIDR 2007, BTW 2007)
  - Pay-as-you-go information integration (ongoing work)

Characteristics of Personal Data
• Non-schematic
  - Heterogeneous collections, no formally defined schema
• Several possible serializations
  - Hundreds of file formats, different encodings
• Contains arbitrary graphs
  - References within documents (LaTeX/Word), filesystem links
• Distributed among different data sources
  - Filesystem, email servers, web servers, databases, iPod
• Infinite
  - RSS, ATOM, streams

Data Model Options
Data models compared: Bag of Words, Relational, XML, iDM
Support for personal data (criteria):
• Non-schematic data
• Serialization independent
• Support for graph data
• Support for lazy computation
• Support for infinite data
Cell notes: Specific schema; Extension: XLink/XPointer; View mechanism; Extension: ActiveXML; Extension: Document streams; Extension: Relational streams; Extension: XML streams

Data Models for Personal Information
[Figure labels: Abstraction Level axis (lower to higher); Physical Level; Relational; XML; Document / Bag of Words; iDM; Personal Information]

iDM: iMeMex Data Model
• Our approach: get the data model closer to personal information, not the other way around
• Supports:
  - Unstructured, semi-structured and structured data, e.g., files and folders, XML, relations
  - Clear separation of logical and physical representation of data
  - Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc.
  - Lazily computed data, e.g., ActiveXML (Abiteboul et al.)
  - Infinite data, e.g., media and data streams
See VLDB 2006
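Two of the properties listed for iDM, arbitrary directed graphs and infinite data, can be illustrated with a small sketch. This is not the iMeMex API: the `Node` class, the `reachable` helper and the file names are hypothetical, chosen only to show why a traversal needs cycle handling and why infinite content has to be consumed a slice at a time.

```python
from itertools import count, islice


class Node:
    """Hypothetical graph node; not an iMeMex class."""

    def __init__(self, name, content=()):
        self.name = name
        self.content = content  # any iterable, possibly infinite
        self.edges = []         # directed edges to other nodes


def reachable(start):
    """Return names of nodes reachable from start, visiting each node once."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue  # already visited: this is what makes cycles safe
        seen.add(node.name)
        order.append(node.name)
        stack.extend(reversed(node.edges))
    return order


intro = Node("intro.tex", content=["See the results section."])
results = Node("results.tex", content=["As argued in the introduction."])
feed = Node("news-feed", content=(f"item {i}" for i in count(1)))  # never ends
intro.edges = [results, feed]
results.edges = [intro]  # cross-reference back to intro: a cycle

print(reachable(intro))               # ['intro.tex', 'results.tex', 'news-feed']
print(list(islice(feed.content, 3)))  # ['item 1', 'item 2', 'item 3']
```

A tree-shaped model such as plain XML would need an extension to express the back reference, and a model that materializes content eagerly could not represent the feed at all.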

iDM: Lazily Computed Graph
• Nodes and edges are lazily computed
• Each node is a Resource View

iDM: Lazily Computed Graph
• Behind the scenes, obtaining the content may:
  - Read a file on the filesystem
  - Access a page on the web
  - Fetch the data from an index structure
• Behind the scenes, obtaining the group may:
  - Get the children of a folder in the filesystem
  - Look up an edge replica
  - Obtain the sections of a document
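The idea that content and group are only obtained when asked for can be sketched as follows. This is a minimal illustration for the filesystem case only; the `ResourceView` class, its `content()` and `group()` methods and the `view_for_path` helper are assumed names for this sketch, not the actual iMeMex interfaces.

```python
import os
import tempfile


class ResourceView:
    """Hypothetical stand-in for a resource view; names are assumptions."""

    def __init__(self, name, content_fn, group_fn):
        self.name = name
        self._content_fn = content_fn  # called only when content is requested
        self._group_fn = group_fn      # called only when the group is requested

    def content(self):
        return self._content_fn()

    def group(self):
        return self._group_fn()


def view_for_path(path):
    """Build a view over a file or folder without reading any data yet."""

    def content():
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        return ""

    def group():
        if os.path.isdir(path):
            names = sorted(os.listdir(path))
            return [view_for_path(os.path.join(path, n)) for n in names]
        return []

    return ResourceView(os.path.basename(path), content, group)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("hello dataspace")
    folder = view_for_path(root)       # nothing is listed or read here
    children = folder.group()          # the folder is listed on demand
    print([c.name for c in children])  # ['notes.txt']
    print(children[0].content())       # hello dataspace
```

Other providers (a web page fetch, an index lookup, a document section parser) would plug in as different `content_fn` and `group_fn` callables behind the same two methods.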

How to implement iDM: Architectural Perspective
• Complex operators (query algebra)
• Indexes & Replicas access (warehousing)
• Data source access (mediation)
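One way to picture how the three layers relate is a read-through replica store placed in front of a data source, with an operator on top. The sketch below is an assumption made for illustration: the slides do not say how iMeMex keeps its indexes and replicas, and `DataSource`, `ReplicaStore` and `keyword_filter` are hypothetical names.

```python
class DataSource:
    """Mediation layer: fetches directly from the underlying source."""

    def __init__(self, items):
        self._items = items
        self.reads = 0  # counts how often the source is actually contacted

    def fetch(self, key):
        self.reads += 1
        return self._items[key]


class ReplicaStore:
    """Warehousing layer: serves local replicas, falling back to the source."""

    def __init__(self, source):
        self._source = source
        self._replicas = {}

    def get(self, key):
        if key not in self._replicas:
            self._replicas[key] = self._source.fetch(key)
        return self._replicas[key]


def keyword_filter(store, keys, word):
    """A toy operator built on top of the access layers."""
    return [k for k in keys if word in store.get(k)]


source = DataSource({"a.txt": "dataspace vision", "b.txt": "holiday photos"})
store = ReplicaStore(source)
print(keyword_filter(store, ["a.txt", "b.txt"], "dataspace"))  # ['a.txt']
print(keyword_filter(store, ["a.txt", "b.txt"], "photos"))     # ['b.txt']
print(source.reads)  # 2: the second query is answered from replicas
```

The operator never talks to the source directly, so the choice between mediation and warehousing stays hidden below it.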

Further Research Challenges in Dataspace Management Systems
• Pay-as-you-go information integration
  - Model relationships in the dataspace
  - Examples: semantic equivalences, lineage relationships
• Distributed Dataspaces
• Query language specification (iQL)

iMeMex Prototype Implementation
• iMeMex Prototype
  - ~ 780 classes
  - ~ 70,900 LOC
  - Java-based: supported on Linux, Mac and Windows
  - OSGi-based: Everything is a Plug-in (~ 52 bundles)
  - Open-source (Apache 2.0):
• Team
  - Advisor
  - Two Ph.D. students
  - Three M.Sc. students
  - Thirteen Semester Project students

Conclusions
• Dataspace Management Systems are a new system abstraction
• iMeMex is among the first implementations of this new breed of systems (our focus: Personal Dataspaces)
• Dataspace Management Systems call for:
  - New data model
  - New system architecture
  - New capabilities for pay-as-you-go information integration
• More information:

Questions?
Thanks in Advance for your Feedback!

Backup Slides

Personal Dataspaces Literature
• Dittrich, Vaz Salles, Kossmann, Blunschi. iMeMex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September
• Dittrich, Vaz Salles. iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006
• Dittrich. iMeMex: A Platform for Personal Dataspace Management. SIGIR PIM, August
• Blunschi, Dittrich, Girard, Karakashian, Vaz Salles. A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper). CIDR, January
• Dittrich, Blunschi, Färber, Girard, Karakashian, Vaz Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System. BTW, March 2007