The Data Ring: Community Content Sharing
Serge Abiteboul (INRIA), Alkis Polyzotis (UC Santa Cruz)

Motivation
Content sharing community: a group of users that share and query information within some domain
  – Examples: UCSC genome browser, Flickr
Interesting data management problem
  – Shared information is heterogeneous, distributed, and dynamic
  – Large body of previous research
Distinguishing point: users are not database-savvy
Challenge: enable non-experts to easily create and maintain content sharing communities

The Data Ring
P2P DBMS for content sharing communities
  – Each peer exports data or services
  – The ring supports declarative queries over the shared resources
Goal: build communities in a “declarative” fashion
The data ring is responsible for the indexing/replication/organization of the shared information
[Figure label: Happy user]

The Data Ring v0.1
Topological layer
  – Repository of XML views and services
  – Declarative queries
Physical layer
  – Physical structures
  – Distributed query plans
  – Autonomic administration

Outline
1. A formalism for distributed query optimization
2. Autonomic administration
Outlook on research problems
Outrageous statements

Problem #1: A formalism for distributed query optimization

Motivation
What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
We need the equivalent for trees and services in a distributed context
A logic for describing distributed XML data and services
An algebra for optimizing queries

Desiderata for description logic
Seamless transition between data and services
  – Example: what is the phone number of CIDR’s PC chair? Look up Gerhard Weikum in MPI’s phonebook
Support for streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion

Desiderata for algebra
Be amenable to rewrites
Capture the topology of distributed computation
Allow transition between logical and physical state
  – Re-optimization or partial optimization
  – Error recovery
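To make "amenable to rewrites" and "capture the topology" concrete, here is a minimal sketch of one possible rewrite rule. The operators `Scan`, `Ship`, and `Select`, the peer names, and the predicate are all hypothetical and are not taken from the slides; the rule simply moves a selection below a data transfer so that filtering happens at the peer that holds the data.

```python
from dataclasses import dataclass

# Hypothetical plan operators for illustration only.
@dataclass(frozen=True)
class Scan:
    peer: str        # peer that holds the resource
    resource: str

@dataclass(frozen=True)
class Ship:
    dest: str        # peer that receives the child's output
    child: object

@dataclass(frozen=True)
class Select:
    predicate: str
    child: object

def push_select_below_ship(plan):
    """Rewrite Select(Ship(x)) into Ship(Select(x)), applied bottom-up."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Ship):
        return Ship(plan.dest, push_select_below_ship(plan.child))
    # Remaining case: plan is a Select.
    child = push_select_below_ship(plan.child)
    if isinstance(child, Ship):
        pushed = push_select_below_ship(Select(plan.predicate, child.child))
        return Ship(child.dest, pushed)
    return Select(plan.predicate, child)

# Peer p1 asks for filtered data that lives on peer p2.
original = Select("topic = 'xml'", Ship("p1", Scan("p2", "papers")))
rewritten = push_select_below_ship(original)
```

After the rewrite, the selection sits directly above the scan on p2, so only matching items would be shipped to p1. Because the plan records where each step runs, the rewrite changes the topology of the computation and not just the operator order.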

Starting point: AXML
AXML: XML tree with embedded web service calls
AXML can serve as the description logic
  – It combines extensional (XML) with intensional (services) data
  – It supports (push and pull) streams as a core concept
AXML can also provide the foundation for the algebra
  – A distributed plan is a workflow of services => an AXML doc
  – Rewrite rules are transformations on AXML documents
Disclaimer: AXML is not a complete solution
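The idea of an XML tree with embedded service calls can be sketched as follows. This is an illustration only, not the actual AXML syntax or runtime: the `<sc>` element, its attributes, the `phonebook_lookup` function, and the phone number are assumptions, and the "service" is a local function standing in for a remote web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical local stand-in for a remote phonebook service.
def phonebook_lookup(name: str) -> str:
    directory = {"Gerhard Weikum": "555-0100"}  # placeholder data
    return directory[name]

SERVICES = {"phonebook_lookup": phonebook_lookup}

# Ordinary XML plus one embedded call. The <sc> element name and its
# attributes are assumptions made for this sketch.
DOC = """
<conference name="CIDR">
  <pc_chair>
    <name>Gerhard Weikum</name>
    <phone><sc service="phonebook_lookup" arg="Gerhard Weikum"/></phone>
  </pc_chair>
</conference>
"""

def materialize(root: ET.Element) -> None:
    """Replace every embedded service call with the data it returns."""
    for parent in list(root.iter()):          # snapshot before mutating
        for call in parent.findall("sc"):
            result = SERVICES[call.get("service")](call.get("arg"))
            parent.remove(call)
            parent.text = (parent.text or "") + result

root = ET.fromstring(DOC)
materialize(root)
phone = root.findtext("pc_chair/phone")
```

Before `materialize` runs, the phone number exists only as a call to be made; afterwards it is plain data in the tree. A query over the document does not need to know which of the two states it is looking at.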

Problem #2: Autonomic administration

Motivation
Users are not database experts
Users are averse to too many “knobs”
There is no central authority that can be responsible for administration
The data ring is self-administered

What should be automated
Monitoring
  – Logs and statistics on system operation
  – Models of system performance
Tuning
  – Enrichment of physical layer with access structures
  – Automatic maintenance of meta-data
Healing
  – Recovery from peer and network failures
  – Recovery from unexpected anomalies
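One way monitoring could feed tuning is sketched below for a single peer. The class name, the threshold policy, and the sample records are assumptions made for illustration; the slides do not prescribe a particular tuning policy.

```python
from collections import Counter, defaultdict

class SelfTuningStore:
    """Toy peer-local store that builds an index once an attribute gets hot."""

    def __init__(self, records, threshold=3):
        self.records = list(records)
        self.threshold = threshold   # assumed policy, not from the slides
        self.lookups = Counter()     # monitoring: per-attribute usage counts
        self.indexes = {}            # tuning: attribute -> {value: [records]}

    def find(self, attr, value):
        self.lookups[attr] += 1
        # Tuning decision driven by the monitored statistics.
        if attr not in self.indexes and self.lookups[attr] >= self.threshold:
            self._build_index(attr)
        if attr in self.indexes:
            return self.indexes[attr].get(value, [])
        # No access structure yet: fall back to a full scan.
        return [r for r in self.records if r.get(attr) == value]

    def _build_index(self, attr):
        index = defaultdict(list)
        for r in self.records:
            index[r.get(attr)].append(r)
        self.indexes[attr] = dict(index)

store = SelfTuningStore([
    {"id": 1, "tag": "sunset"},
    {"id": 2, "tag": "beach"},
    {"id": 3, "tag": "sunset"},
])
results = [store.find("tag", "sunset") for _ in range(3)]
```

The first two lookups scan all records; the third crosses the threshold and triggers index creation without any user involvement. In this sketch the user never sees a "knob".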

Some issues
System integration
Distribution
  – The tunable state is distributed
  – There is no central synchronization for the tuning
On-line tuning
Distributed vs. local tuning
Data activation for files
  – Data lives in its natural habitat
  – Meta-data and physical schema evolve in the DB

Is there any hope?
There is no alternative!
  – Self-administration is not a gadget but a necessity
Some technology already exists
  – E.g., self-tuning for relational databases, machine learning
The power of parallelism

Conclusions
Realizing the data ring involves several challenging and interesting problems
A lot of existing technology to leverage and lots of open issues to tackle
Some progress already being made
  – On-line tuning
  – Algebra for distributed queries
  – P2P indexing
We hope to find more help!

Questions?

Data abstraction in the data ring
[Figure labels: Physical Layer, Topological Layer, External Layer]

Data abstraction in the data ring: Topological Layer
Every peer exports a set of resources
  – A resource is a data item or a service
  – We use XML+WSDL to describe resources
Peers can issue declarative queries (one-shot and continuous) over the shared resources

Data abstraction in the data ring: Physical Layer
Physical structures for query processing
  – E.g., data catalog, indices, views, replicas
Support for distributed query plans

Data abstraction in the data ring: External Layer
Semantically richer data models and query languages
  – E.g., a la dataspaces [FHM05]

Data abstraction in the data ring
Motivation: data independence
Our initial focus is on topological plus physical
  – Necessary for a basic set of services
  – Essential for the external layer
We hope to leverage on-going research on the external layer
[Figure labels: Topological Layer, External Layer, Physical Layer]

Data activation for files
Scientists prefer to keep data on the file system
  – Convenience vs overhead of using a database
One approach: in-situ query processing
  – Data lives in the file system, processing logic lives in DBMS
Use data activation to speed up processing
  – E.g., instantiate indices or store contents in a relational DB
  – Similar to relational database tuning but more complex
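A minimal sketch of data activation, assuming a tab-separated file with one unique key per line (the file layout, keys, and function names are made up for this example): the data stays in the file, and "activation" adds an index of byte offsets that lets a lookup seek directly to the record instead of scanning.

```python
import os
import tempfile

def scan(path, key):
    """In-situ lookup: read the file line by line, no auxiliary structure."""
    with open(path, "rb") as f:
        for line in f:
            k, _, v = line.rstrip(b"\n").partition(b"\t")
            if k == key:
                return v
    return None

def build_offset_index(path):
    """Activation step: map each key to the byte offset of its line."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            index[line.partition(b"\t")[0]] = offset
            offset += len(line)
    return index

def indexed_lookup(path, index, key):
    """Seek straight to the record; the data itself never leaves the file."""
    if key not in index:
        return None
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().rstrip(b"\n").partition(b"\t")[2]

# Hypothetical sample file, written to a temporary location.
with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as tmp:
    tmp.write(b"chr1\tACGT\nchr2\tGGCA\nchr3\tTTAG\n")
    path = tmp.name

index = build_offset_index(path)
via_scan = scan(path, b"chr2")
via_index = indexed_lookup(path, index, b"chr2")
os.remove(path)
```

Both lookups return the same value; only the access path differs. The file is opened in binary mode so that the recorded offsets are exact byte positions.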

An algebraic rewrite

Algebraic plans