1
Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway
Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece
2
June 28, 2006, ICPS'2006
Outline
Motivation and example application
Taxonomies and taxonomy-based querying
Taxonomy-based query routing
Taxonomy caching: architecture and maintenance
Experimental results
Summary and further work
3
Motivation
Mobile devices have high storage capacity and wireless support
They contain multimedia documents that can be shared
Possibly also other data/services:
  – Temperature or other environmental data
Important challenge: find the files and services!
Problem:
  – Dynamic contents, location, and visibility
  – Limited bandwidth
Centralized indexing/search engines are not applicable
Hence: P2P network and search
4
Example application: MobiShare
Devices share resources by hosting web services
Each device is connected to a CAS
CASs are connected P2P
[More details in Valavanis et al., Web Intelligence’2003]
5
Outline of basic idea
1) Describe contents according to a taxonomy
2) Cache taxonomy information at remote peers
3) Use the cached knowledge to route queries to appropriate peers
Why?
1) Should reduce latency
2) Should increase recall at the same cost
6
Resource description
Taxonomy-based resource description
Also applicable to audio/video
More than one taxonomy may exist in the system
Resource description: taxonomy ID and set of categories
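As a minimal sketch, a resource description can be modelled as an immutable record holding a taxonomy ID and a set of categories. The class name `ResourceDescription`, the `belongs_to` method, and category strings such as `"Sports/Soccer"` are hypothetical illustrations, not names from the original system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceDescription:
    """Hypothetical record: a taxonomy ID plus the categories a resource is in."""
    resource_id: str
    taxonomy_id: str
    categories: frozenset = field(default_factory=frozenset)

    def belongs_to(self, taxonomy_id: str, category: str) -> bool:
        # Categories are only compared within the same taxonomy, since more
        # than one taxonomy may exist in the system.
        return self.taxonomy_id == taxonomy_id and category in self.categories


clip = ResourceDescription("clip-042.mp4", "dmoz", frozenset({"Sports/Soccer"}))
print(clip.belongs_to("dmoz", "Sports/Soccer"))   # True
print(clip.belongs_to("other", "Sports/Soccer"))  # False
```

Keeping the taxonomy ID next to the categories is what lets descriptions from different taxonomies coexist without accidental matches on equal category names.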
7
Taxonomy-based querying
Query:
1) Request for all resources belonging to category C_j, or
2) Request for all resources belonging to category C_j and satisfying some additional property
Example properties: text contents, metadata
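The two query types can be sketched as a category plus an optional predicate that stands for the additional property (text contents or metadata). All names here (`Resource`, `TaxonomyQuery`, `evaluate_locally`) are hypothetical, and the sketch assumes a resource matches only if it is tagged with exactly the queried category; whether subcategories also match is not specified on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Resource:
    resource_id: str
    categories: frozenset
    text: str = ""  # stands in for text contents or metadata


@dataclass(frozen=True)
class TaxonomyQuery:
    category: str
    # None -> query type 1; a predicate -> query type 2 (additional property).
    predicate: Optional[Callable[[Resource], bool]] = None


def evaluate_locally(query: TaxonomyQuery, resources: Iterable[Resource]) -> List[Resource]:
    """Return local resources in query.category that satisfy the optional predicate."""
    return [
        r for r in resources
        if query.category in r.categories
        and (query.predicate is None or query.predicate(r))
    ]
```

A type 2 query such as "jazz recordings whose text mentions live" would be `TaxonomyQuery("Jazz", lambda r: "live" in r.text)`.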
8
Searching in unstructured P2P networks
Basic search technique: execute the query locally, then forward it if TTL > 0
  – Naïve flooding (all neighbors)
  – Normalized flooding (only K neighbors)
  – Random walks (only one random neighbor per step, but W walks initiated)
Problem: only a limited number of peers can be searched (the query horizon)
Possible improvements:
  – Routing indices
  – Summary indexing (Bloom filters etc.)
  – Result caching
However: scalability and coverage are still limited
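The query horizon can be illustrated with a small TTL-limited flood over an adjacency list. This sketch covers naïve and normalized flooding only (random walks are omitted); the function name `flood`, the graph representation, and the choice to drop duplicate queries at peers that have already been reached are assumptions.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set


def flood(graph: Dict[str, List[str]], source: str, ttl: int,
          k: Optional[int] = None, seed: int = 0) -> Set[str]:
    """Peers reached by a TTL-limited flood from `source` (its query horizon).

    k=None gives naive flooding (all neighbours); an integer k gives
    normalized flooding (at most k randomly chosen neighbours per hop).
    """
    rng = random.Random(seed)
    reached = {source}
    queue = deque([(source, ttl, None)])
    while queue:
        peer, remaining, previous = queue.popleft()
        if remaining <= 0:          # query executed locally, but not forwarded
            continue
        candidates = [n for n in graph[peer] if n != previous]
        if k is not None and len(candidates) > k:
            candidates = rng.sample(candidates, k)
        for n in candidates:
            if n not in reached:    # duplicates of an already seen query are dropped
                reached.add(n)
                queue.append((n, remaining - 1, peer))
    return reached
```

However relevant a peer is, it is never contacted if it lies more than TTL hops away; this is the limitation the TCache is meant to work around.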
9
Taxonomy caching
Basic idea:
  – Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
  – Mapping from taxonomic concept to set of peers
Advantages:
  – Cheaper to maintain than a full-text index
  – More applicable to multimedia data
  – More robust with respect to changes in contents
Used to improve query routing: higher recall and reduced latency
10
Query routing using taxonomy cache (TCache)
1) Basis: one of the traditional routing strategies
2) Query forward peers: P_F
3) Starting point: P_F = neighbors = P_N = {P_N1, …, P_Nn}
4) Lookup in TCache: Lookup(category) → P_C = {P_C1, …, P_Cm}
5) P_F = P_N ∪ P_C
6) Query forwarded to (a subset of) P_F
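Steps 3 to 5 amount to a set union. In this sketch the TCache is assumed to be a dictionary from category to `{peer: number of resources}`, and the peer the query came from is excluded from P_N; `forward_candidates` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical TCache layout: category -> {peer: number of resources}.
TCacheTable = Dict[str, Dict[str, int]]


def forward_candidates(neighbours: Iterable[str], tcache: TCacheTable,
                       category: str, previous: Optional[str] = None) -> Set[str]:
    """Steps 3-5: P_F = P_N ∪ P_C for the query's category."""
    p_n = {p for p in neighbours if p != previous}   # step 3: start from neighbours
    p_c = set(tcache.get(category, {}))              # step 4: Lookup(category)
    return p_n | p_c                                 # step 5: union
```

Step 6, choosing which subset of P_F actually receives the query, is where the forwarding alternatives differ.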
11
Query forwarding alternatives (1)
Query forward peers: P_F
N_n: number of neighbors (excluding the previous peer)
N_c: number of matches from the TCache lookup
Ranking of peers in P_C:
  – Based on the number of resources within a category
  – Peers with a high number of resources are considered experts
TCB:
  – Forward to the highest-ranked peer in P_C plus the N_n neighbors in {P_N1, …, P_Nn}
  – Forwarding to a peer in P_C is called a jump
  – A jump can be to a peer beyond the query horizon!
TCA:
  – If N_c ≥ N_n: forward to the N_n highest-ranked peers in P_C
  – If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) randomly selected neighbors
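A possible rendering of the ranking, TCB, and TCA, assuming P_C is given as `{peer: number of resources in the queried category}`. Function names are hypothetical, ties in the ranking are broken by insertion order, and a seeded `random.Random` is used only to make the random neighbour selection reproducible.

```python
import random
from typing import Dict, List, Optional

# P_C is modelled as {peer: number of resources in the queried category}.
PC = Dict[str, int]


def rank_experts(p_c: PC) -> List[str]:
    """Rank TCache matches: more resources in the category = higher rank."""
    return sorted(p_c, key=lambda peer: p_c[peer], reverse=True)


def tcb(neighbours: List[str], p_c: PC) -> List[str]:
    """TCB: all N_n neighbours plus the highest-ranked peer in P_C (a jump)."""
    targets = list(neighbours)
    ranked = rank_experts(p_c)
    if ranked and ranked[0] not in targets:
        targets.append(ranked[0])
    return targets


def tca(neighbours: List[str], p_c: PC, rng: Optional[random.Random] = None) -> List[str]:
    """TCA: prefer TCache experts, topping up with random neighbours if N_c < N_n."""
    rng = rng or random.Random(0)
    n_n, n_c = len(neighbours), len(p_c)
    ranked = rank_experts(p_c)
    if n_c >= n_n:
        return ranked[:n_n]
    others = [p for p in neighbours if p not in p_c]
    return ranked + rng.sample(others, min(n_n - n_c, len(others)))
```

Note the cost difference: TCB sends one message more than plain forwarding to the neighbours, while TCA keeps the fan-out at N_n.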
12
Query forwarding alternatives (2)
TCCN:
  – If N_c ≥ N_n: forward to all N_c peers in P_C
  – If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) neighbors
TCDN:
  – If N_c ≥ N_n: forward to the N_n/2 highest-ranked peers in P_C + a random selection of N_n/2 other peers in P_C
  – If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) neighbors
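TCCN and TCDN can be sketched in the same style, again with P_C as `{peer: number of resources}` and hypothetical function names. Two details are assumptions not stated on the slide: the (N_n - N_c) extra neighbours are chosen at random, and for odd N_n the expert half is rounded down, so that TCDN always forwards to exactly N_n peers.

```python
import random
from typing import Dict, List, Optional

PC = Dict[str, int]  # peer -> number of resources in the queried category


def _ranked(p_c: PC) -> List[str]:
    return sorted(p_c, key=lambda peer: p_c[peer], reverse=True)


def _all_matches_plus_neighbours(neighbours: List[str], p_c: PC,
                                 rng: random.Random) -> List[str]:
    # Shared N_c < N_n branch: all N_c matches + (N_n - N_c) neighbours.
    # Assumption: the extra neighbours are picked at random, as in TCA.
    others = [p for p in neighbours if p not in p_c]
    extra = min(len(neighbours) - len(p_c), len(others))
    return _ranked(p_c) + rng.sample(others, extra)


def tccn(neighbours: List[str], p_c: PC, rng: Optional[random.Random] = None) -> List[str]:
    rng = rng or random.Random(0)
    if len(p_c) >= len(neighbours):
        return _ranked(p_c)                   # forward to all N_c TCache matches
    return _all_matches_plus_neighbours(neighbours, p_c, rng)


def tcdn(neighbours: List[str], p_c: PC, rng: Optional[random.Random] = None) -> List[str]:
    rng = rng or random.Random(0)
    n_n = len(neighbours)
    if len(p_c) < n_n:
        return _all_matches_plus_neighbours(neighbours, p_c, rng)
    ranked = _ranked(p_c)
    half = n_n // 2                           # assumption: round N_n/2 down for experts
    experts, rest = ranked[:half], ranked[half:]
    return experts + rng.sample(rest, n_n - half)  # random picks among the other matches
```

The random half in TCDN spreads queries beyond the top experts, which keeps a fixed fan-out of N_n while still following the TCache.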
13
Distributing taxonomic information
Basic mechanism: piggyback the matching category on the query result
  – Result returned through the original path, possibly involving jumps
  – Makes revalidation of the contents of intermediate TCaches possible
  – Coverage is gradually extended (beyond the query horizon)
Lazy distribution by gossiping is also possible
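The piggybacking mechanism can be sketched as follows: while the result travels back along the query path, every peer on that path stores (or refreshes) the mapping from the matched category to the responding peer. The nested-dictionary layout and the function name `piggyback_result` are hypothetical, and TTL handling is left out here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical layout: peer -> category -> {peer holding resources: count}.
TCaches = Dict[str, Dict[str, Dict[str, int]]]


def piggyback_result(path: List[str], responder: str, category: str,
                     n_resources: int, tcaches: TCaches) -> None:
    """Send a result back along the query path, caching its category on the way.

    `path` lists the peers the query visited, from the querying peer up to the
    peer that forwarded it to `responder` (jumps included).
    """
    for peer in reversed(path):  # the result travels back hop by hop
        # Inserts a new TCache entry, or revalidates (overwrites) an existing one.
        tcaches[peer][category][responder] = n_resources


caches: TCaches = defaultdict(lambda: defaultdict(dict))
piggyback_result(["Q", "A", "B"], "R", "Sports/Soccer", 12, caches)
print(caches["A"]["Sports/Soccer"])  # {'R': 12}
```

Because a path can contain jumps, peers learn about responders outside their own query horizon without any extra messages.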
14
TCache architecture and maintenance
Aim: provide an efficient mapping C → {P_C1, …, P_Cm}
For each category: peers, number of resources, and TTL
TTL:
  – Regularly decremented
  – Reset to start value at revalidation
Caching policy: aggressive vs. selective
Compacting techniques: peer upgrade and non-expert pruning
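A minimal TCache sketch under stated assumptions: each category maps to peers with their resource counts and a TTL, `tick` models the regular TTL decrement, and revalidation resets the TTL. The start TTL of 10, all names, and the pruning rule (keep the `keep` highest-ranked peers per category) are assumptions; peer upgrade is not modelled because the slide does not describe it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    n_resources: int
    ttl: int


@dataclass
class TCache:
    start_ttl: int = 10  # hypothetical start value for the TTL
    table: Dict[str, Dict[str, Entry]] = field(default_factory=dict)  # C -> {peer: Entry}

    def insert_or_revalidate(self, category: str, peer: str, n_resources: int) -> None:
        # New information, or revalidation of old: the TTL is reset to its start value.
        self.table.setdefault(category, {})[peer] = Entry(n_resources, self.start_ttl)

    def tick(self) -> None:
        """Called regularly: decrement every TTL and evict expired entries."""
        for category in list(self.table):
            peers = self.table[category]
            for peer in list(peers):
                peers[peer].ttl -= 1
                if peers[peer].ttl <= 0:
                    del peers[peer]
            if not peers:
                del self.table[category]

    def lookup(self, category: str) -> List[str]:
        """P_C for `category`, ranked so that experts (most resources) come first."""
        peers = self.table.get(category, {})
        return sorted(peers, key=lambda p: peers[p].n_resources, reverse=True)

    def prune_non_experts(self, keep: int) -> None:
        """Compaction sketch: keep only the `keep` highest-ranked peers per category."""
        for category in self.table:
            for peer in self.lookup(category)[keep:]:
                del self.table[category][peer]
```

Entries that are never revalidated simply age out, which bounds how stale the cached taxonomy information can become.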
15
Experimental setup
Simulations
Excerpts of the DMOZ taxonomy
Synthetic network topologies
Resource allocation: 80/20 rule
Queries are taxonomic categories
A number of peers act as querying peers
Measured: contacted peers, messages, recall, and latency
In this presentation: results using flooding and TCDN query routing
16
Improvements in recall
(F = flooding, TC = TCache with TCDN routing, N_M = number of messages)
TTL   N_M (F)   N_M (TC)   Recall (F)   Recall (TC)
1     7.8       7.0        0.0022       0.0019
3     166.7     166.0      0.0117       0.0149
5     524.7     523.9      0.0282       0.0717
7     1058.6    1057.7     0.0506       0.1835
9     1721.0    1719.6     0.0773       0.2930
11    2566.3    2566.0     0.1104       0.4012
13    3536.5    3535.8     0.1477       0.4891
15    4560.2    4558.7     0.1864       0.5755
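Reading the recall table as ratios makes the trend explicit: for essentially the same number of messages, recall with TCache-based routing starts slightly below flooding at TTL=1 and reaches roughly three times the flooding recall at TTL=15 (0.5755 vs. 0.1864). The snippet below only recomputes these ratios from the table values; `ROWS` and `recall_gain` are illustrative names.

```python
# (TTL, N_M flooding, N_M TCache, recall flooding, recall TCache), copied from the table above.
ROWS = [
    (1, 7.8, 7.0, 0.0022, 0.0019),
    (3, 166.7, 166.0, 0.0117, 0.0149),
    (5, 524.7, 523.9, 0.0282, 0.0717),
    (7, 1058.6, 1057.7, 0.0506, 0.1835),
    (9, 1721.0, 1719.6, 0.0773, 0.2930),
    (11, 2566.3, 2566.0, 0.1104, 0.4012),
    (13, 3536.5, 3535.8, 0.1477, 0.4891),
    (15, 4560.2, 4558.7, 0.1864, 0.5755),
]


def recall_gain(rows):
    """Per TTL: ratio of messages (TC/F) and ratio of recall (TC/F)."""
    return [(ttl, m_tc / m_f, r_tc / r_f) for ttl, m_f, m_tc, r_f, r_tc in rows]


for ttl, msg_ratio, gain in recall_gain(ROWS):
    print(f"TTL={ttl:2d}  messages TC/F = {msg_ratio:.3f}  recall TC/F = {gain:.2f}")
```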
17
Primary reason for improvement: more intelligent query forwarding
(F = flooding, TC = TCache with TCDN routing, N_C = number of contacted peers)
TTL   N_C (F)   N_C (TC)   Recall (F)   Recall (TC)
1     7.8       6.7        0.0022       0.0019
3     45.3      53.4       0.0117       0.0149
5     110.6     158.0      0.0282       0.0717
7     199.9     346.8      0.0506       0.1835
9     305.6     583.1      0.0773       0.2930
11    437.7     840.3      0.1104       0.4012
13    586.7     1120.6     0.1477       0.4891
15    741.6     1372.4     0.1864       0.5755
18
Improvement and scalability
19
Latency reduction
The TCache results in very fast retrieval of the first results
Finding all results: approximately similar performance, because flooding is used in both techniques
20
Summary and further work
Presented motivation and context
Taxonomy-based querying and query routing
TCache architecture and maintenance
Experimental results supporting our claims
Future/ongoing work:
  – Employing the techniques for XML/XPath querying in a P2P context (to appear at IEEE P2P’2006)
  – Integration of different taxonomies