P2P RECOMMENDER SYSTEMS: A (SMALL) SURVEY Giulio Rossetti.


Talk Outline
Problem Definition: What is a recommender system? Why recommender systems?
Centralized RS: Well-known families of approaches; Collaborative Filtering (idea)
Decentralized RS: Why P2P? A (small) survey

What are Recommender Systems?
RSs are a class of information filtering systems that seek to predict the rating or preference that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they have not yet considered, using a model built from the characteristics of items (content-based approaches) or from the user's social environment (collaborative filtering approaches).

Why recommender systems?
Nowadays the amount of information we retrieve has become enormous (Big Data).
What we really need is a technology that helps us find resources of interest among the overwhelming amount of data available.
“[…] a personalized information filtering used to either predict whether a particular user will like a particular item (prediction problem) or to identify a set of N items that will be of interest to a certain user.”

Well-known families of approaches
Random prediction: items are chosen at random from the set of available ones and recommended to the user.
Frequent sequences: if a customer frequently rates items, we can exploit their frequent rating pattern to recommend other items (with similar ratings) to them.
Collaborative filtering: requires recommendation seekers to express their preferences by rating items; the more users rate items (or categories), the more accurate the recommendations become.
Content-based algorithms: attempt to recommend items that are similar to items the user liked in the past.

Centralized Approaches
Two main families of methodologies have been studied in recent years:
User-based CF: CF algorithms that work on the assumption that each user belongs to a group of similarly behaving users. Recommendations are built from the items liked by users in that group, so items are recommended based on users' tastes. The algorithm assumes that users who are similar (have similar attributes) will be interested in the same items.
Item-based CF: CF algorithms that look at the similarity between items to make a prediction. The idea is that users are most likely to purchase items similar to the ones they have already bought; by analyzing a user's purchase history we can estimate what they may want in the future.
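As a rough illustration of the user-based idea, the sketch below predicts a missing rating as a similarity-weighted average of the ratings other users gave to the same item. The choice of cosine similarity, the function names, and the toy `ratings` data are assumptions made for this example; they are not taken from the slides.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        w = cosine(ratings[user], other_ratings)
        num += w * other_ratings[item]
        den += abs(w)
    return num / den if den else None

# Hypothetical toy data: alice rates like bob, so bob's opinion of "c" dominates.
ratings = {
    "alice": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 4, "c": 2},
    "carol": {"a": 1, "c": 5},
}
prediction = predict_user_based(ratings, "alice", "c")
```

An item-based variant would apply the same weighting to columns instead of rows: compare the rating vectors of items and average the user's own ratings on the items most similar to the target.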

P2P: Motivations
The need for efficient decentralized recommender systems has been appreciated for some time, both for the intrinsic advantages of decentralization and for the necessity of integrating recommender systems into P2P applications.
The two main advantages are:
1. the predictions can be distributed among all users, removing the need for a costly central server and enhancing scalability;
2. a decentralized recommender improves the privacy of the users, since there is no central entity storing and owning their private information.

P2P Recommender systems: a small survey
User-based CF: BuddyCast, kNN (Random Samples & T-MAN)
P2PRec: a social-based recommender system
SoCS: Social Graph Embedding
Random Walks

User-Based Collaborative Filtering
R. Ormándi, I. Hegedűs and M. Jelasity
Node balancing issue: overlay topologies defined by node similarity often have highly unbalanced degree distributions (e.g. power-law).
Overlay management: how can the best possible overlay for computing recommendation scores be built and maintained, while keeping bandwidth usage at the nodes in check?
Desiderata: a minimal, uniform load from overlay management, even when the in-degree distribution of the expected overlay graph is unbalanced.
Approaches: BuddyCast, kNN (Random Sampling & T-MAN)

BuddyCast
Each node's local view contains a full descriptor of the node's neighbors (i.e. their ratings).
Computing recommendations does not load the network (local information approach).
Load balancing:
Block list: when a node communicates with another peer, that peer is put on the block list for a few hours.
Candidate list: contains close peers for potential communication.
Random list: contains random samples from the network.
For overlay maintenance, each node connects to the best node from the candidate list with probability α, and to a node from the random list with probability 1−α, and exchanges its buddy list with the selected peer.
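The peer selection rule above can be sketched as follows. This is a minimal sketch under assumptions: the function name, the `(similarity, peer_id)` layout of the candidate list, and the fallback behaviour when one list is empty are hypothetical and not specified on the slide.

```python
import random

def select_peer(candidates, random_peers, blocked, alpha, rng=random):
    """Pick the next peer to exchange buddy lists with.

    candidates   -- list of (similarity, peer_id) for close peers
    random_peers -- list of peer ids sampled from the network
    blocked      -- set of peer ids contacted recently (the block list)
    alpha        -- probability of choosing from the candidate list
    """
    open_candidates = [(s, p) for s, p in candidates if p not in blocked]
    open_random = [p for p in random_peers if p not in blocked]
    # With probability alpha (or if no random peer is available),
    # exploit the candidate list and take its most similar entry.
    if open_candidates and (rng.random() < alpha or not open_random):
        return max(open_candidates)[1]
    # Otherwise explore: contact a peer from the random list.
    if open_random:
        return rng.choice(open_random)
    return None  # every known peer is currently blocked

chosen = select_peer([(0.9, "p1"), (0.5, "p2")], ["p3"], {"p1"}, alpha=1.0)
```

Filtering both lists against the block list is what spreads the load: a popular peer cannot be contacted again by the same node until its block entry expires.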

kNN: Random Samples
Every node has a local view of size k that contains node descriptors.
Each node is initialized with k random samples from the network, which iteratively approximate the kNN graph.
Convergence is based on an iterative random sampling process: random nodes are inserted into the view, which is implemented as a bounded priority queue.
The queue's priority is based on the similarity function provided by the recommender module.
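A bounded priority queue of this kind could look like the sketch below. The class name, the use of `heapq`, and the toy similarity function are assumptions for illustration; in a real node the samples would come from a peer sampling service rather than a fixed list.

```python
import heapq

class BoundedView:
    """Keeps the k most similar node descriptors seen so far."""

    def __init__(self, k, similarity):
        self.k = k
        self.similarity = similarity  # callable: descriptor -> float
        self._heap = []               # min-heap of (similarity, node_id)
        self._ids = set()

    def insert(self, node_id, descriptor):
        if node_id in self._ids:
            return
        entry = (self.similarity(descriptor), node_id)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            self._ids.add(node_id)
        elif entry > self._heap[0]:
            # The view is full: replace the least similar entry.
            evicted = heapq.heappushpop(self._heap, entry)
            self._ids.discard(evicted[1])
            self._ids.add(node_id)

    def neighbors(self):
        """Node ids in the view, most similar first."""
        return [node_id for _, node_id in sorted(self._heap, reverse=True)]

# Hypothetical example: descriptors are numbers, similarity is closeness to 10.2.
view = BoundedView(2, lambda d: -abs(d - 10.2))
for sample in [3, 15, 11, 9, 10, 20, 1, 12]:
    view.insert(sample, sample)
```

Each random sample costs O(log k) to process, and the view only ever improves, which is why repeated sampling drifts toward the true k nearest neighbors.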

kNN: T-MAN sampling
The overlay is managed with the T-MAN algorithm, which periodically updates the node's view (of size k) by:
1. selecting a peer node to communicate with
2. exchanging its view with the peer
3. merging the two views and keeping the closest k descriptors
Peer (communication) selection methods:
Global: selects the node uniformly at random from the whole network
View: selects the node from the view uniformly at random
Proportional: selects a node from the view, but with a different (non-uniform) probability distribution
Best: selects the most similar node without any restriction
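The merge in step 3 can be sketched as below. This is an illustrative sketch, not the T-MAN reference implementation: views are assumed to be dicts of `{node_id: descriptor}`, and `similarity` is a hypothetical callable scoring a descriptor against the local node.

```python
def tman_merge(own_id, own_view, peer_view, k, similarity):
    """Merge two views and keep the k descriptors closest to this node.

    Views are dicts {node_id: descriptor}; `similarity(descriptor)` is
    assumed to return a float, higher meaning closer to the local node.
    """
    merged = {**peer_view, **own_view}
    merged.pop(own_id, None)  # a node never lists itself as a neighbor
    ranked = sorted(merged, key=lambda n: similarity(merged[n]), reverse=True)
    return {n: merged[n] for n in ranked[:k]}

# Hypothetical example: the local node "me" has profile value 5.
own_view = {"a": 1, "b": 9}
peer_view = {"c": 4, "d": 7, "me": 5}
result = tman_merge("me", own_view, peer_view, k=2,
                    similarity=lambda d: -abs(d - 5))
```

The selection methods listed above only change which peer supplies `peer_view`; the merge itself stays the same.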

User-based CF: Observations
1. When the degree distribution is unbalanced, using the exact kNN view (T-MAN Best) is not optimal: a more relaxed view can give better recommendation performance.
2. Overlay construction converges reasonably fast, both with random updates and with T-MAN.
3. T-MAN with Global selection is a good choice: it combines a fully uniform load distribution with an acceptable convergence speed, which is better than that of the random view update.

P2PRec: a social-based P2P recommender system
F. Draidi and E. Pacitti
The idea: recommend high-quality documents related to the query topics and held by friends (or friends of friends, FOAF) who are experts on the topics related to the query.
Assumptions:
each node represents a peer labelled with the contents it stores and its topics of interest;
expertise is deduced from the contents stored by a user;
the topics each peer is interested in are calculated by analyzing the documents it holds;
information about experts is disseminated by a semantic-based gossip algorithm that provides scalability, robustness and load balancing.

How P2Prec works
1. Latent Dirichlet Allocation (LDA) is used to automatically model the topics in the system:
   1. Training (global level): identification of the complete set of topics
   2. Inference (local, node level): extraction of the topics of interest for the user
2. Local information is disseminated by a gossip algorithm:
   1. FOAF descriptor: topics of interest, trust level
   2. At each gossip exchange, each user u checks its local view for relevant, similar peers with respect to topics of interest and friendship networks; if one is found, a friendship request is issued.
3. Querying:
   1. A keyword query q is assigned a TTL and is routed recursively in a P2P top-k manner.

Social Graph Embedding
A.-M. Kermarrec, V. Leroy and G. Trédan
A proximity metric between users makes it possible to predict potentially relevant future relationships (Link Prediction).
SoCS (Social Coordinate System): a fully distributed algorithm that embeds a social graph in a Euclidean space. Nodes are assigned coordinates according to their social position, and community structure is preserved.
Force-based embedding (FBE): edges represent springs and nodes represent equally charged particles. Edges (springs) attract the vertices they link, whereas vertices (particles) repel each other. The embedding is achieved once the system reaches an equilibrium.
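To make the spring and charge analogy concrete, here is a toy, centralized update step in two dimensions. The force laws (a linear spring with zero rest length, inverse-square repulsion), the constants, and the function name are assumptions chosen for illustration; the slide does not state the exact forces SoCS uses, and the real algorithm runs in a distributed fashion.

```python
import math

def fbe_step(pos, edges, step=0.05, spring=1.0, charge=0.1):
    """One synchronous update of a toy force-based embedding in 2D.

    pos   -- {node: (x, y)}
    edges -- set of frozenset({u, v}) pairs
    """
    new_pos = {}
    for u, (ux, uy) in pos.items():
        fx = fy = 0.0
        for v, (vx, vy) in pos.items():
            if v == u:
                continue
            dx, dy = vx - ux, vy - uy
            dist = math.hypot(dx, dy) or 1e-9
            # Repulsion between every pair of nodes (equal charges).
            fx -= charge * dx / dist**3
            fy -= charge * dy / dist**3
            # Attraction along edges (assumed linear spring).
            if frozenset((u, v)) in edges:
                fx += spring * dx
                fy += spring * dy
        new_pos[u] = (ux + step * fx, uy + step * fy)
    return new_pos

# Hypothetical example: a and b are linked, c is linked to nobody.
pos = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (0.0, 3.0)}
edges = {frozenset(("a", "b"))}
after = fbe_step(pos, edges)
```

After one step the linked pair has moved closer together while the unlinked node has drifted away, which is the behaviour that lets community structure show up as spatial clusters.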

SoCS Algorithm
Social neighbors: nodes that have close social positions. The graph neighbors and the social neighbors of a node are not necessarily the same.
Each node regularly updates its position in the social space:
1. it first gathers the positions of its graph and social neighbors
2. using these positions, it computes the forces that are applied to it and derives its updated social position
3. a gossip protocol provides the node with a list of its new social neighbors
4. this list is then used to compute new positions
Similarity metrics: SoCS recommends to a node its closest social neighbors that are not already graph neighbors. Common Neighbors, Jaccard, Adamic/Adar, Path Length, Katz…
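The sketch below shows three of the neighborhood-based scores named on the slide, in their standard textbook form, next to a coordinate-based recommendation in the spirit of SoCS. The function names, the adjacency-set representation, and the toy graph and coordinates are assumptions for this example.

```python
import math

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared neighbors weighted so that low-degree ones count for more."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

def recommend_links(pos, adj, node, n=1):
    """Closest nodes in the social space that are not yet graph neighbors."""
    candidates = [v for v in pos if v != node and v not in adj[node]]
    return sorted(candidates, key=lambda v: math.dist(pos[node], pos[v]))[:n]

# Hypothetical graph: a and d share both neighbors but are not linked.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"},
       "c": {"a", "b", "d"}, "d": {"b", "c"}}
pos = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
suggestion = recommend_links(pos, adj, "a")
```

The first three functions need the neighbor sets of both endpoints, whereas the coordinate-based recommendation only compares positions, which is what makes it convenient in a decentralized setting.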

SoCS Algorithm (2)
SoCS relies on gossip to discover the social neighbors.
Each node runs a clustering algorithm (Neighbors Peer Sampling, NPS) in order to maintain and update its social neighbors list.
Gossip protocols have been shown to be cheap, robust against churn, and to converge quickly.

Decentralized Random Walks
A.-M. Kermarrec, V. Leroy, A. Moin and C. Thraves
The application of random walks to decentralized environments differs from the centralized version.
Centralized RS: random walks are used as a clustering mechanism (e.g. community discovery).
Decentralized RS: community discovery is infeasible, because the knowledge each peer has about the P2P network is limited to its neighborhood.
Proposed approach:
1. Each peer is provided with a neighborhood composed of a small set of similar peers by means of an epidemic (gossip) protocol;
2. Ratings for unknown items are estimated by a random walk on the neighborhood.
Once peers have stabilized their neighborhood, they can calculate recommendations independently.
Similarity measure: Pearson Correlation, Jaccard
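For reference, Pearson correlation between two users can be computed as in the sketch below. Restricting the means to co-rated items and returning 0.0 when the correlation is undefined are assumptions of this example (several variants exist), as are the function name and the toy ratings.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

# Hypothetical ratings: u2 rates everything twice as high as u1, u3 disagrees.
u1 = {"x": 1, "y": 2, "z": 3}
u2 = {"x": 2, "y": 4, "z": 6}
u3 = {"x": 3, "y": 2, "z": 1}
agree, disagree = pearson(u1, u2), pearson(u1, u3)
```

Because the means are subtracted, two users with different rating scales but the same ordering of items still come out as highly similar.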

Random Walks: observed properties
The users in the neighborhood are modeled as the vertices of a Markov chain graph, and a random walk is applied on this graph.
A Markov chain can be represented by a directed graph where vertices are the states of the chain and edges represent the transition probabilities from one state to another.
Results:
Random walks work well when the data is so sparse that classic similarity measures fail to detect meaningful relations between users;
accuracy increases with the neighborhood size;
decentralized user-based approaches perform better (low complexity, high precision) than their item-based counterparts in P2P recommender applications;
cosine similarity performed better in decentralized item-based algorithms, while Pearson correlation worked better for decentralized user-based algorithms.
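The sketch below illustrates why a walk helps with sparse data: a user with zero direct similarity can still be reached through an intermediate user. Row-normalising a similarity matrix to obtain transition probabilities, the fixed walk length, and the function names are simplifying assumptions; the slide does not give the exact chain used in the cited work.

```python
def transition_matrix(sim):
    """Row-normalise a non-negative similarity matrix into probabilities."""
    P = []
    for row in sim:
        total = sum(row)
        if total:
            P.append([x / total for x in row])
        else:
            # Isolated vertex: fall back to a uniform jump.
            P.append([1.0 / len(row)] * len(row))
    return P

def walk_distribution(P, start, steps):
    """Probability of being at each vertex after `steps` moves from `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

# Hypothetical neighborhood: users 0 and 2 share no rated items (similarity 0),
# but both are similar to user 1.
sim = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
P = transition_matrix(sim)
reach = walk_distribution(P, start=0, steps=2)
```

After two steps, user 2 carries positive probability mass for user 0 even though their direct similarity is zero, so user 2's ratings can contribute to the estimate.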

Conclusions
P2P recommender systems are needed in order to overcome scalability and privacy issues.
Several approaches were analyzed, each one relying (to some extent) on a gossip algorithm to maintain and update the overlay network.
Almost all the discussed approaches tackle the problem with a user-based similarity strategy that exploits classical network theory approaches:
Unsupervised Link Prediction
Community Discovery
Force-directed embedding

Bibliography
D. Almazro and G. Shahatah. A survey paper on recommender systems. (2010)
F. Draidi and E. Pacitti. Demo of P2Prec: a Social-based P2P Recommendation System. (2011)
A.-M. Kermarrec, V. Leroy, A. Moin and C. Thraves. Application of random walks to decentralized recommender systems. (2010)
A.-M. Kermarrec, V. Leroy and G. Trédan. Distributed social graph embedding. (2011)
R. Ormándi, I. Hegedűs and M. Jelasity. Overlay management for fully distributed user-based collaborative filtering. (2010)
…questions?