Research Interests
Georgia Koloniari
Computer Science Department, University of Ioannina, Greece
2 Peer-to-Peer Systems
Users issue queries to discover data or services of interest
Queries are routed through the system to find relevant data
Decentralization and autonomy
Scalability
Dynamicity
3 Motivation
Peer-to-peer systems have evolved into an effective way of sharing data
User data is unstructured and heterogeneous
XML is widely used for data representation and exchange on the Internet
Service and data descriptions are written in XML-based languages
4 The Problem
A peer-to-peer system in which each node stores XML documents
A query (path query) issued at a node may need results from multiple nodes in the system
Use data summaries (filters) at each node to assist query routing
Challenge: how to efficiently discover the appropriate data based on its content?
[Figure: nodes A, B, C with summaries SumB and SumC]
5 1. Selection of Appropriate Filters
Lossless: small probability of false positives
Scalability: scale to a large number of nodes and documents with a small space overhead
Incremental updates: support adding and removing documents
Merge: when a new node with documents D' joins, given the previously computed F(D) and F(D'), produce a filter that summarizes D ∪ D'; likewise handle the deletion of nodes
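The merge requirement can be illustrated with a plain Bloom filter, the structure the next slide builds on. This is only a minimal sketch: the class `BloomFilter`, the helper `summarize`, the sizes `m` and `k`, and the SHA-256 based hashing are all assumptions made for illustration, not details taken from the slides. With identical parameters, the bitwise OR of F(D) and F(D') equals the filter built directly from D ∪ D'.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter kept as an integer bit mask (illustrative only)."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # k bit positions derived from salted SHA-256 digests (an assumption)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def merge(self, other):
        # Filters can only be merged when they share size and hash count
        assert (self.m, self.k) == (other.m, other.k)
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


def summarize(elements):
    """Hypothetical F(.): build a filter over a collection of element names."""
    f = BloomFilter()
    for e in elements:
        f.add(e)
    return f


D = {"device", "printer", "color"}
D_new = {"camera", "digital"}
merged = summarize(D).merge(summarize(D_new))
```

A plain bit-vector filter of this kind cannot undo an insertion, so removing documents or nodes would need either a rebuild or per-position counters; which option the author uses is not stated on the slide.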
6 Multi-Level Bloom Filters
Bloom filters (hash-based structures designed to support membership queries) are suitable for distributed environments
Main drawback: unable to represent hierarchies
Extend to multi-level Bloom filters in order to support path queries
Two approaches:
Breadth Bloom Filters (one Bloom filter for each level of the XML tree)
Depth Bloom Filters (all subpaths of the same length inserted in the same Bloom filter)
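The Depth Bloom Filter idea can be sketched as follows, under several assumptions: the XML tree is given as a nested dictionary, queries are treated as partial paths that may start at any element, and the names `Bloom`, `DepthBloomFilter`, `may_match`, and the `levels` parameter are hypothetical. Filter i holds every parent-child chain of i elements; a query is accepted only if all of its subpaths are found in the filter of the matching length.

```python
import hashlib


class Bloom:
    """Plain Bloom filter kept as an integer bit mask (illustrative only)."""

    def __init__(self, m=4096, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def paths(tree, prefix=()):
    """Yield the root-to-node path of every element in a nested-dict tree."""
    for tag, children in tree.items():
        path = prefix + (tag,)
        yield path
        yield from paths(children, path)


class DepthBloomFilter:
    def __init__(self, tree, levels=3):
        # filters[i] stores all subpaths made of i + 1 elements
        self.filters = [Bloom() for _ in range(levels)]
        for path in paths(tree):
            for length in range(1, min(len(path), levels) + 1):
                # the subpath of this length that ends at the current node
                self.filters[length - 1].add("/".join(path[-length:]))

    def may_match(self, query):
        steps = query.strip("/").split("/")
        for length in range(1, min(len(steps), len(self.filters)) + 1):
            for start in range(len(steps) - length + 1):
                sub = "/".join(steps[start:start + length])
                if sub not in self.filters[length - 1]:
                    return False
        return True


tree = {"device": {"printer": {"color": {}, "postscript": {}},
                   "camera": {"digital": {}}}}
dbf = DepthBloomFilter(tree)
```

Here `dbf.may_match("/device/printer/color")` is true because every subpath was inserted, while a path such as `/camera/color` is expected to be rejected since the pair `camera/color` was never inserted (barring a Bloom false positive, which is very unlikely at this filter size).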
7 Breadth Bloom Filters
BBF0: (device printer camera color postscript digital)
BBF1: device
BBF2: printer camera
BBF3: (color postscript digital)
Queries: /device/printer/color, /printer/postscript
[Figure: XML tree with elements device, printer, camera, color, postscript, digital]
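The example above can be reproduced with a short sketch. It assumes the tree has `device` at the root, `printer` and `camera` below it, `color` and `postscript` under `printer`, and `digital` under `camera`; it also assumes queries are partial paths that may start at any level. The names `Bloom`, `BreadthBloomFilter`, and `may_match` are hypothetical.

```python
import hashlib


class Bloom:
    """Plain Bloom filter kept as an integer bit mask (illustrative only)."""

    def __init__(self, m=4096, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


class BreadthBloomFilter:
    def __init__(self, tree):
        self.all_elements = Bloom()  # plays the role of BBF0
        self.levels = []             # levels[0] is BBF1, levels[1] is BBF2, ...
        self._insert(tree, 0)

    def _insert(self, tree, depth):
        if not tree:
            return
        if depth == len(self.levels):
            self.levels.append(Bloom())
        for tag, children in tree.items():
            self.all_elements.add(tag)
            self.levels[depth].add(tag)
            self._insert(children, depth + 1)

    def may_match(self, query):
        steps = query.strip("/").split("/")
        # Cheap first check against BBF0: every element must exist somewhere
        if any(s not in self.all_elements for s in steps):
            return False
        # Then look for a starting level where consecutive steps sit on
        # consecutive levels
        for start in range(len(self.levels) - len(steps) + 1):
            if all(s in self.levels[start + i] for i, s in enumerate(steps)):
                return True
        return False


tree = {"device": {"printer": {"color": {}, "postscript": {}},
                   "camera": {"digital": {}}}}
bbf = BreadthBloomFilter(tree)
```

Both queries from the slide match. Because only the level of each element is recorded, `bbf.may_match("/camera/color")` also returns true even though `color` is not a child of `camera`; this kind of false positive is inherent to the per-level design, independent of hash collisions.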
8 2. Content-Based Clustering
Connect nodes based on the similarity of their content, so that each node in the overlay network is linked to its most similar node
Similar node: a node with "similar" documents
Motivation: cluster similar nodes to minimize the number of irrelevant queries that a node processes (maximize recall)
Content similarity is derived from the similarity of the corresponding filters (cost-efficient)
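The slide does not say how filter similarity is measured. As one possible reading, the sketch below scores two equally sized bit-vector filters by the fraction of bit positions on which they agree, and links a joining node to the candidate with the highest score. The measure itself, the names `make_filter`, `similarity`, and `most_similar`, and the parameters `M` and `K` are all assumptions.

```python
import hashlib

M, K = 1024, 3  # filter size in bits and number of hash functions (assumed)


def make_filter(elements):
    """Build a Bloom-style bit mask over a collection of element names."""
    bits = 0
    for e in elements:
        for i in range(K):
            digest = hashlib.sha256(f"{i}:{e}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % M)
    return bits


def similarity(a, b):
    """Fraction of the M bit positions on which the two filters agree."""
    differing = bin(a ^ b).count("1")
    return 1 - differing / M


def most_similar(new_filter, candidates):
    """Return the name of the candidate whose filter is closest."""
    return max(candidates,
               key=lambda name: similarity(new_filter, candidates[name]))


joining = make_filter({"device", "printer", "color"})
candidates = {
    "B": make_filter({"device", "printer", "postscript"}),
    "C": make_filter({"song", "artist", "album"}),
}
best = most_similar(joining, candidates)
```

Comparing filters instead of documents keeps the cost to one XOR and a bit count per candidate, which is the kind of saving the "cost-efficient" remark points to.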
9 Distribution: Hierarchical Organization
Node C: local filter; merged filter: E, F, G, H
Root filters: A, B, D
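A rough sketch of how routing over such a hierarchy might work: each node keeps a local summary and a merged summary of its subtree, and a root additionally consults the summaries of the other roots. To keep the routing logic visible, plain Python sets of element names stand in for the filters, so path structure and false positives are ignored. The node contents and the names `Node`, `search_subtree`, and `route` are hypothetical.

```python
class Node:
    def __init__(self, name, elements, children=()):
        self.name = name
        self.local = set(elements)          # stand-in for the local filter
        self.children = list(children)
        # stand-in for the merged filter: summary of the whole subtree
        self.merged = set(self.local)
        for child in self.children:
            self.merged |= child.merged

    def search_subtree(self, steps):
        """Return names of nodes in this subtree whose local summary matches."""
        hits = [self.name] if steps <= self.local else []
        for child in self.children:
            # descend only where the child's merged summary can match
            if steps <= child.merged:
                hits += child.search_subtree(steps)
        return hits


def route(query, origin_root, other_roots):
    """Search the origin tree, then only the other roots whose summary matches."""
    steps = set(query.strip("/").split("/"))
    hits = origin_root.search_subtree(steps)
    for root in other_roots:
        # root.merged models the copy of that root's filter held at origin_root
        if steps <= root.merged:
            hits += root.search_subtree(steps)
    return hits


E = Node("E", {"device", "printer", "color"})
F = Node("F", {"camera", "digital"})
C = Node("C", {"device", "camera"}, [E, F])
A = Node("A", {"song", "artist"})
B = Node("B", {"device", "printer", "postscript"})

hits = route("/device/printer", C, [A, B])
```

In this toy setup the query is answered by E inside C's tree and by root B, while root A is never contacted because its summary does not match.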
10 Interest in Mobility
Hierarchical organization is suitable for nodes with different stability properties and different storage and processing capabilities
Root: non-mobile peer; tree: mobile peers with limited responsibilities
Proximity-based organization, based on geographical proximity
Choose to attach to your "closest" neighbor
Updates affect only the local hierarchy