Tagging with DHARMA A DHT-based Approach for Resource Mapping through Approximation Luca Maria Aiello, Marco Milanesio Giancarlo Ruffo, Rossano Schifanella.

Slides:



Advertisements
Similar presentations
Luca Maria Aiello. Università degli Studi di Torino – Dipartimento di Informatica – SecNet Group 1 Secure distributed applications: a case study Luca Maria.
Advertisements

Embedding identity in DHT systems: security, reputation and social networking management 1 Embedding Identity in DHT Systems: Security, Reputation and.
Luca Maria Aiello, Università degli Studi di Torino, Computer Science department 1 Tempering Kademlia with a robust identity based system.
BiG-Align: Fast Bipartite Graph Alignment
Efficient Keyword Search for Smallest LCAs in XML Database Yu Xu Department of Computer Science & Engineering University of California, San Diego Yannis.
The IEEE International Conference on Big Data 2013 Arash Fard M. Usman Nisar Lakshmish Ramaswamy John A. Miller Matthew Saltz Computer Science Department.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Secure and Flexible Framework for Decentralized Social Network Services Luca Maria Aiello, Giancarlo Ruffo Università degli Studi di Torino Computer Science.
6/2/ An Automatic Personalized Context- Aware Event Notification System for Mobile Users George Lee User Context-based Service Control Group Network.
Search Engines and Information Retrieval
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. 1 The Architecture of a Large-Scale Web Search and Query Engine.
Research Topics of Potential Interest to Geography COMPUTER SCIENCE Research Away Day, 29 th April 2010 Thomas Erlebach.
Midterm 2 Overview Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Memoplex Browser: Searching and Browsing in Semantic Networks CPSC 533C - Project Update Yoel Lanir.
| Computer Science Department | Ubiquitous Knowledge Processing Lab | © Prof. Dr. Iryna Gurevych | 1 del.icio.us Knowledge Management in Web.
1 UCB Digital Library Project An Experiment in Using Lexical Disambiguation to Enhance Information Access Robert Wilensky, Isaac Cheng, Timotius Tjahjadi,
Overview of Search Engines
A glimpse on social influence and link prediction in OSNs
“ The Initiative's focus is to dramatically advance the means to collect,store,and organize information in digital forms,and make it available for searching,retrieval,and.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
Tag-based Social Interest Discovery
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
09/07/2004Peer-to-Peer Systems in Mobile Ad-hoc Networks 1 Lookup Service for Peer-to-Peer Systems in Mobile Ad-hoc Networks M. Tech Project Presentation.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
Social scope: Enabling Information Discovery On Social Content Sites
Ahsanul Haque *, Swarup Chandra *, Latifur Khan * and Charu Aggarwal + * Department of Computer Science, University of Texas at Dallas + IBM T. J. Watson.
Ahsanul Haque *, Swarup Chandra *, Latifur Khan * and Michael Baron + * Department of Computer Science, University of Texas at Dallas + Department of Mathematical.
Information Retrieval in Folksonomies Nikos Sarkas Social Information Systems Seminar DCS, University of Toronto, Winter 2007.
Query Routing in Peer-to-Peer Web Search Engine Speaker: Pavel Serdyukov Supervisors: Gerhard Weikum Christian Zimmer Matthias Bender International Max.
Web Search. Structure of the Web n The Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure  The.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
Full-Text Search in P2P Networks Christof Leng Databases and Distributed Systems Group TU Darmstadt.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Research Ranked Recall: Efficient Classification by Learning Indices That Rank Omid Madani with Michael Connor (UIUC)
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
윤언근 DataMining lab.  The Web has grown exponentially in size but this growth has not been isolated to good-quality pages.  spamming and.
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
Efficient Route Computation on Road Networks Based on Hierarchical Communities Qing Song, Xiaofan Wang Department of Automation, Shanghai Jiao Tong University,
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
Visualizing Large Dynamic Digraphs Michael Burch.
1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie 2. Complex Queries in DHT-based Peer-to-Peer Networks Lintao Liu 5/21/2002.
1 One Table Stores All: Enabling Painless Free-and-Easy Data Publishing and Sharing Bei Yu 1, Guoliang Li 2, Beng Chin Ooi 1, Li-zhu Zhou 2 1 National.
Community-enhanced De-anonymization of Online Social Networks Shirin Nilizadeh, Apu Kapadia, Yong-Yeol Ahn Indiana University Bloomington CCS 2014.
Stefanos Antaris A Socio-Aware Decentralized Topology Construction Protocol Stefanos Antaris *, Despina Stasi *, Mikael Högqvist † George Pallis *, Marios.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
Ariel Fuxman, Panayiotis Tsaparas, Kannan Achan, Rakesh Agrawal (2008) - Akanksha Saxena 1.
Xifeng Yan Philip S. Yu Jiawei Han SIGMOD 2005 Substructure Similarity Search in Graph Databases.
Outline Introduction State-of-the-art solutions
Linguistic Graph Similarity for News Sentence Searching
Web News Sentence Searching Using Linguistic Graph Similarity
CHAPTER 3 Architectures for Distributed Systems
Structure learning with deep autoencoders
Spatial Online Sampling and Aggregation
Random Sampling over Joins Revisited
Sarthak Ahuja ( ) Saumya jain ( )
Efficient Subgraph Similarity All-Matching
Lecture 6: Counting triangles Dynamic graphs & sampling
Presentation transcript:

Tagging with DHARMA A DHT-based Approach for Resource Mapping through Approximation Luca Maria Aiello, Marco Milanesio Giancarlo Ruffo, Rossano Schifanella Università degli Studi di Torino Computer Science Department Keywords : navigational search, folksonomy, DHT, approximated graph mapping, last.fm Seventh International Workshop on Hot Topics in Peer-to-Peer Systems Speaker: Luca Maria Aiello, PhD student

Overview Goal: enrich the p2p layer with a tag-based navigational search engine Task: mapping a folksonomy on a DHT Problem: mapping dense graphs on distributed layer is very inefficient Solution: approximated mapping using a complexity-bounded algorithm Evaluation: the approximated solution does not upset the structure/semantic of the folksonomy 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino2

Motivations Direct search Navigational search ◦ Taxonomies (e.g. Yahoo! directory): not successful ◦ Folksonomies: successfully emerging thanks to the social web Tag-based search engine on DHTs is profitable ◦ Applications layered on DHTs use exact match search (e.g. eMule)  folksonomic search adds flexibility ◦ Recent research activity on P2P for privacy-aware online social networks  Folksonomic search engine is an important OSN feature 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino3

Folksonomies structure Represented as a tripartite hypergraph 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino4 Metallica Iron Maiden metal classic John

Tag-Resource Graph (TRG) Projection on user dimension 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino5 Metallica Iron Maiden metal classic John Bipartite graph Edges are weighted depending on number of tag assignments 3 5 1

Folksonomy graph (FG) Models tag-to-tag similarity with co-occurrence 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino6 Metallica Iron Maiden metal classic =8 1+2=3 8 3 Co-occurrence similarity is widely known and used Local calculation Directional

Folksonomic search 1. User selects a tag 2. Related resources and tags are displayed ◦ Ranking based on arc weights 3. User can shift to another tag or select a resource 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino7 metal thrash powergrind classic Iron Maiden MetallicaManowar

Folksonomy evolution The folskonomy grows quickly due to massive user activity ◦ Resource insertion ◦ Tag insertion 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino8 Both FG and TRG should be updated properly!

Folksonomy maintenance 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino9

Mapping on a DHT Map FG and TRG on the DHT and implement the update operations on the distributed layer Idea: ◦ Splitting the FG and TRG into small structural parts or modules ◦ Each module contains a node (tag or resource) and its outgoing edges/arcs ◦ Each module is mapped on a different p2p index node 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino10

Mapping FG on a DHT 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino11

Mapping TRG on a DHT 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino12

Folksonomy maintenance on the DHT When inserting a new tag or resource, proper modules have to be modified Complexity of resource insertion and search are linear with the input size  OK! Tagging operation is linear with |Tags(r)| ◦ Tags(r) is the set of tags labeling the resource r How many tags for a resource? ◦ Let’s see a real-world example… 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino13 Insert(r, t 1,…, t m )Tag(t,r)Search step #lookups2 + 2m4 +|Tags(r)|2

Last.fm folksonomy sample 99,405 users, 1,413,657 items and 285,182 different tags 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino14 |Tags(r)| Can be huge!

An approximate solution DHARMA ◦ DHT-based Approach for Resource Mapping through Approximation Idea: ◦ Put a constant upper bound to the number of lookups for tagging operation ◦ The resulting FG will be approximated… 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino15

Approximate tagging: k=1 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino16 metal thrashclassic cool Lookup count :0 Metallica 134 Now, select k tags at random in Tags(Metallica) and link them with the new tag 5

Approximate graph vs real graph With DHARMA approximation, the complexity is affordable for very small k The smaller k is, the more the structure of the Folksonomy Graph is upset! We need to compare the approximate and the real Folksonomy Graph 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino17 Insert(r, t 1,…, t m )Tag(t,r)Search step #lookups2 + 2m4 + k2

Approximation in Last.fm We simulate the evolution of FG with the approximated protocol ◦ Simulated resource insertion and tagging activity Goal: draw a comparison between the real Last.fm FG and the approximated FG, for different values of k ◦ Nodal degree ◦ Arcs weight 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino18

Nodal degree comparison 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino19

Arc weights comparison 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino20

Keeping proportions Simulated and real graph are different but… …they should not necessarily be equal Only proportions must be kept 1.Arcs weight ordering must be kept 2.Proportion between weights is not lost (no flattening) 3.Only the “less informative” arcs should be deleted Same proportions grant the preservation of the navigational semantic 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino21

Keeping proportions: k = 1 1.Arcs weight ordering is kept  Kendall’s tau measure between arcs weight rankings is very high (0.7)  OK! 2.Proportion between weights is not lost (no flattening)  Cosine similarity between arcs weight sets is very high (0.8)  OK! 3.Only “less informative” arcs are deleted  40% of arcs gets lost, but 99% of them has weight <= 3  OK! (we wipe out only noisy links) 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino22

Query convergence rapidity: results StepsLastRandFirst Original graph μ σ μ 1/ Approximated graph μ σ μ 1/ /04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino23 A query converges through subsequent filtering on the resource set Estimation of convergence with simulation Convergence is quick Simulated search has no semantic notion Approximation speeds up the search

23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino24

23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino25

23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino26

23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino27

Conclusions Approximated folksonomy mapping allows an efficient P2P implementation The information lost in approximation is prevalently noise (automatic filtering) Search navigation realizes vocabulary specialization, converging to narrow semantic categories Convergence is quick (and even quicker with approximation) DHARMA (Java implementation) is available at: 23/04/2010HOTP2P Luca Maria Aiello, Università degli Studi di Torino28

Related works Other attempts in mapping folksonomies on p2p systems: ◦ E.g. Tagster [1] Privacy-aware P2P online social networks ◦ Safebook [2], PeerSon[3], Likir[4,5] 29/03/2010SESOC Luca Maria Aiello, Università degli Studi di Torino29 [1] Görlitz, Sizov, Staab – ESWC 2008 [2] Cutillo, Molva, Strufe – WONS 2009 [3] Buchegger, Schöiberg, Vu, Datta – SocialNets 2009 [4] Aiello, Milanesio, Ruffo, Schifanella – P2P 2008 [5] Aiello, Ruffo – SESOC 2010

Speaker: Luca Maria Aiello, PhD student Thank you for your attention! Seventh International Workshop on Hot Topics in Peer-to-Peer Systems