1
The Data Ring: Community Content Sharing
Serge Abiteboul (INRIA)
Alkis Polyzotis (UC Santa Cruz)
2
Data Sharing Communities
Examples: UCSC genome browser, SwissProt, Flickr
Interesting data management problem
  – Shared information is heterogeneous
  – Data is distributed and dynamic
  – Lack of central administration
  – Users are not database savvy
Data sharing community: a group of users that share and query information within some domain
3
The Data Ring
P2P middleware system that provides:
  – Monitoring
  – Querying
  – …and other database-like services over the distributed information
Main goal: simplicity of use
4
Data abstraction in the data ring
Topological layer
Physical layer
External layer
5
Data abstraction in the data ring
Topological Layer:
  – Declarative query services
  – Data and query model based on XML
6
Data abstraction in the data ring
Physical Layer:
  – Basic service is distributed query evaluation
  – Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog
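As an illustration of how a DHT-style overlay could locate catalog entries, here is a minimal consistent-hashing sketch. The slides do not say which DHT the Data Ring uses; the class name `CatalogRing`, the peer names, and the resource names are all hypothetical.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the hash ring (SHA-1 as an integer)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class CatalogRing:
    """Hypothetical catalog lookup: each resource is owned by the first
    peer whose ring position follows the resource's position."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, resource):
        i = bisect.bisect_right(self._positions, ring_position(resource))
        # Wrap around to the first peer when we run past the end of the ring.
        return self._ring[i % len(self._ring)][1]


peers = ["peer-a", "peer-b", "peer-c"]
ring = CatalogRing(peers)
bigger = CatalogRing(peers + ["peer-d"])
resources = ["proteins.xml", "genes.xml", "photos.xml", "tags.xml"]
```

With this placement rule, adding a peer only moves the resources that the new peer takes over; every other resource keeps its owner, which matters when peers join and leave frequently.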
7
Data abstraction in the data ring
External Layer:
  – Provides semantically richer data models
8
Data abstraction in the data ring
Our focus is on the topological and physical layers
The external layer is equally important and an active research area
9
Thesis #1: formalism for distributed XML data and queries
10
Distributed XML data and queries
What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
We need the equivalent for trees in a distributed context:
  – A logic for describing distributed XML data
  – An algebra for optimizing distributed XML queries
11
Desiderata for description logic
Seamless transition between data and services
  – Important for loose data integration
Support for XML streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion
12
Starting point: AXML
AXML: XML tree with embedded web service calls
  – Seamless transition between intensional and extensional data
  – Provides a simple mechanism for loose data integration
Core concept: XML streams
  – A web service call returns a stream of elements
  – Support for both push and pull semantics
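To make the idea of an XML tree with embedded service calls concrete, here is a minimal sketch of materializing intensional data under pull semantics. It is not the AXML implementation: the `sc` element name and its attributes, the `SERVICES` registry, and the `get_proteins` service are assumptions made for illustration, and the service is a local generator standing in for a remote web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical AXML-like document: one extensional <protein> element and
# one embedded service call (<sc>) that stands for further elements.
DOC = """
<directory>
  <protein id="P1">kinase</protein>
  <sc service="get_proteins" arg="human"/>
</directory>
"""


def get_proteins(arg):
    """Stand-in for a remote service: yields a stream of elements."""
    for pid, name in [("P2", "ligase"), ("P3", "helicase")]:
        elem = ET.Element("protein", {"id": pid, "species": arg})
        elem.text = name
        yield elem


SERVICES = {"get_proteins": get_proteins}


def materialize(node):
    """Replace each <sc> child with the elements its service returns."""
    new_children = []
    for child in list(node):
        if child.tag == "sc":
            service = SERVICES[child.get("service")]
            new_children.extend(service(child.get("arg")))
        else:
            materialize(child)
            new_children.append(child)
    for child in list(node):
        node.remove(child)
    node.extend(new_children)


root = ET.fromstring(DOC)
materialize(root)
protein_ids = [p.get("id") for p in root.findall("protein")]
```

After `materialize(root)`, the call has been replaced by its results, so a query over the tree sees extensional and formerly intensional data uniformly. The sketch does not recurse into the returned elements, which could themselves contain calls.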
13
Desiderata for algebra
Be amenable to rewrites
Capture the topology of distributed computation
Allow seamless transition between logical and physical state
  – Plans may need to be re-optimized in mid-flight
  – It may be necessary to perform partial optimization
  – Error recovery
14
A proposal based on AXML
A distributed plan is a workflow of web services … which is exactly an AXML tree
Components:
  – An encoding of distributed plans in AXML
  – Rewrite rules
A nice bonus: plans can be readily exchanged between nodes
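The two components above can be sketched as follows, assuming a hypothetical encoding in which every operator of a plan is an `sc` element annotated with the peer that runs it. The attribute names (`service`, `peer`, `pred`, `doc`) and the rule `push_selection` are illustrative only; the slides do not give the actual encoding or rule set.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan: peer p1 filters the result of a scan executed at p2.
PLAN = """
<sc service="select" peer="p1" pred="species='human'">
  <sc service="scan" peer="p2" doc="proteins.xml"/>
</sc>
"""


def push_selection(plan):
    """Rewrite rule: run a selection on the peer that hosts its input scan,
    so that only matching elements travel over the network.
    Returns True if the plan was changed."""
    rewritten = False
    for call in plan.iter("sc"):
        if call.get("service") != "select":
            continue
        inputs = list(call)
        if len(inputs) == 1 and inputs[0].get("service") == "scan":
            target = inputs[0].get("peer")
            if call.get("peer") != target:
                call.set("peer", target)
                rewritten = True
    return rewritten


plan = ET.fromstring(PLAN)
changed = push_selection(plan)
# The plan is plain XML, so it can be serialized and shipped to another peer.
serialized = ET.tostring(plan, encoding="unicode")
```

Because the plan is itself a tree, a rewrite is an ordinary tree transformation, and the serialized form is what one node would hand to another.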
15
Disclaimer
AXML is a starting point, not a panacea
Bottom line: we need formalisms for distributed XML queries
16
Thesis #2: autonomic administration
17
Autonomic administration
Users are not database experts
  – Typically, scientists with computer experience
Users are averse to too many “knobs”
No central authority that is responsible for administration
Autonomic administration is a necessity, not a gadget
18
Facets of autonomy
Self-monitoring
Self-tuning
Self-healing
19
Some issues
System integration
Distribution
On-line tuning
Pro-active tuning
20
Distributed vs. local tuning
Distributed tuning
  – Based on the global workload
  – Catalog organization, replication
Local tuning
  – Based on the local workload
  – Physical design tuning
21
Data activation for files
A large portion of the data is expected to be in files
We need to develop query processors for data residing in files
File activation: optimize access to the file based on the local workload
  – E.g., instantiate an index on file contents or materialize a relational view
Local tuning is essential in this context
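File activation can be sketched as a wrapper that watches the local workload and instantiates an index once a column is queried often enough. This is a toy under stated assumptions: the class `ActivatedFile`, the CSV format, the in-memory rows, and the lookup-count threshold are all hypothetical, and a real system would index file offsets and weigh the cost of building the index.

```python
import csv
import io
from collections import Counter, defaultdict


class ActivatedFile:
    """Hypothetical wrapper around a CSV file's contents that builds a
    per-column index after `threshold` lookups on that column."""

    def __init__(self, text, threshold=3):
        self.rows = list(csv.DictReader(io.StringIO(text)))
        self.threshold = threshold
        self.lookups = Counter()   # local workload statistics
        self.indexes = {}          # column -> {value: [rows]}

    def lookup(self, column, value):
        self.lookups[column] += 1
        if column not in self.indexes and self.lookups[column] >= self.threshold:
            self._build_index(column)
        if column in self.indexes:
            return self.indexes[column].get(value, [])
        # No index yet: fall back to a full scan of the file contents.
        return [row for row in self.rows if row[column] == value]

    def _build_index(self, column):
        index = defaultdict(list)
        for row in self.rows:
            index[row[column]].append(row)
        self.indexes[column] = dict(index)


DATA = "id,species\nP1,human\nP2,mouse\nP3,human\n"
activated = ActivatedFile(DATA, threshold=2)
first = activated.lookup("species", "human")    # answered by a scan
second = activated.lookup("species", "human")   # triggers and uses the index
```

The decision uses only statistics gathered at the peer that holds the file, which is why this kind of tuning is local.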