Preference-Aware Publish/Subscribe Delivery with Diversity Marina Drosou Master Thesis Supervisor: Evaggelia Pitoura.

Slides:



Advertisements
Similar presentations
AI Pathfinding Representing the Search Space
Advertisements

Design of the fast-pick area Based on Bartholdi & Hackman, Chpt. 7.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Fast Algorithms For Hierarchical Range Histogram Constructions
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.
Preference-Aware Publish/Subscribe Delivery with Diversity Marina Drosou Department of Computer Science University of Ioannina, Greece Joint work with.
Playback delay in p2p streaming systems with random packet forwarding Viktoria Fodor and Ilias Chatzidrossos Laboratory for Communication Networks School.
Trust and Profit Sensitive Ranking for Web Databases and On-line Advertisements Raju Balakrishnan (Arizona State University)
On Novelty in Publish/Subscribe Delivery Kostas Stefanidis * Dept. of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong *Work.
Identity and search in social networks Presented by Pooja Deodhar Duncan J. Watts, Peter Sheridan Dodds and M. E. J. Newman.
Preferential Publish/Subscribe Marina Drosou Department of Computer Science University of Ioannina, Greece Joint work with Evaggelia Pitoura and Kostas.
Small-world Overlay P2P Network
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Evaluating Search Engine
Selective Dissemination of Streaming XML By Hyun Jin Moon, Hetal Thakkar.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
©NEC Laboratories America 1 Hui Zhang Samrat Ganguly Sudeept Bhatnagar Rauf Izmailov NEC Labs America Abhishek Sharma University of Southern California.
Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan.
A Trust Based Assess Control Framework for P2P File-Sharing System Speaker : Jia-Hui Huang Adviser : Kai-Wei Ke Date : 2004 / 3 / 15.
Constraint Satisfaction Problems
Design and Evaluation of a Wide-Area Event Notification Service Antonio Carzaniga David S. Rosenblum Alexander L. Wolf.
1 AINA 2006 Wien, April th 2006 DiVES: A DISTRIBUTED SUPPORT FOR NETWORKED VIRTUAL ENVIRONMENTS The IEEE 20th International Conference on Advanced.
1 Efficient Retrieval of User Contents in MANETs Marco Fiore, Claudio Casetti, Carla-Fabiana Chiasserini Dipartimento di Elettronica, Politecnico di Torino,
Simulation.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
Hierarchical Constraint Satisfaction in Spatial Database Dimitris Papadias, Panos Kalnis And Nikos Mamoulis.
Online Data Gathering for Maximizing Network Lifetime in Sensor Networks IEEE transactions on Mobile Computing Weifa Liang, YuZhen Liu.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
P2P Course, Structured systems 1 Skip Net (9/11/05)
P2P Course, Structured systems 1 Introduction (26/10/05)
Background Notification services in LAN Provides Notification Selection Notification Delivery Done on a centralized server (hence not scalable) Challenge.
Marina Drosou Department of Computer Science University of Ioannina, Greece Thesis Advisor: Evaggelia Pitoura
Randomized Algorithms - Treaps
Achieving fast (approximate) event matching in large-scale content- based publish/subscribe networks Yaxiong Zhao and Jie Wu The speaker will be graduating.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Distributed Quality-of-Service Routing of Best Constrained Shortest Paths. Abdelhamid MELLOUK, Said HOCEINI, Farid BAGUENINE, Mustapha CHEURFA Computers.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2007 (TPDS 2007)
Developing Analytical Framework to Measure Robustness of Peer-to-Peer Networks Niloy Ganguly.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Quasar A Probabilistic Publish-Subscribe System for Social Networks over P2P Kademlia network David Arinzon Supervisor: Gil Einziger April
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
Diversified Top-k Graph Pattern Matching 1 Yinghui Wu UC Santa Barbara Wenfei Fan University of Edinburgh Southwest Jiaotong University Xin Wang.
Socially-aware pub-sub system for human networks Yaxiong Zhao Jie Wu Department of Computer and Information Sciences Temple University Philadelphia
Searching for Extremes Among Distributed Data Sources with Optimal Probing Zhenyu (Victor) Liu Computer Science Department, UCLA.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
User Cooperation via Rateless Coding Mahyar Shirvanimoghaddam, Yonghui Li, and Branka Vucetic The University of Sydney, Australia IEEE GLOBECOM 2012 &
Computer Algorithms Submitted by: Rishi Jethwa Suvarna Angal.
Distributed Maintenance of Cache Freshness in Opportunistic Mobile Networks Wei Gao and Guohong Cao Dept. of Computer Science and Engineering Pennsylvania.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
CP Summer School Modelling for Constraint Programming Barbara Smith 2. Implied Constraints, Optimization, Dominance Rules.
Dynamic Diversification of Continuous Data Marina Drosou and Evaggelia Pitoura Computer Science Department University of Ioannina, Greece
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Marina Drosou, Evaggelia Pitoura Computer Science Department
O PTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY G REGORY L EVITIN, Y UAN -S HUN D AI Adviser: Frank, Yeong-Sung Lin.
On The Cooperation of Web Clients and Proxy Caches Yiu Fai Sit, Francis C.M. Lau, Cho-Li Wang Department of Computer Science The University of Hong Kong.
Content-based Publish-Subscribe Over Structured P2P Networks Peter Triantafillou and Ioannis Aekaterinidis Presented by Jesse Chen 4/10/07.
Deadline-based Resource Management for Information- Centric Networks Somaya Arianfar, Pasi Sarolahti, Jörg Ott Aalto University, Department of Communications.
An Energy Efficient MAC Protocol for Wireless LANs, E.-S. Jung and N.H. Vaidya, INFOCOM 2002, June 2002 吳豐州.
Information-Centric Networks10b-1 Week 10 / Paper 2 Hermes: a distributed event-based middleware architecture –P.R. Pietzuch, J.M. Bacon –ICDCS 2002 Workshops.
Information-Centric Networks Section # 10.2: Publish/Subscribe Instructor: George Xylomenos Department: Informatics.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
1 Efficient Computation of Diverse Query Results Erik Vee joint work with Utkarsh Srivastava, Jayavel Shanmugasundaram, Prashant Bhat, Sihem Amer Yahia.
Performance Comparison of Ad Hoc Network Routing Protocols Presented by Venkata Suresh Tamminiedi Computer Science Department Georgia State University.
Database Management System
Information Retrieval in Practice
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Presentation transcript:

Preference-Aware Publish/Subscribe Delivery with Diversity Marina Drosou Master Thesis Supervisor: Evaggelia Pitoura

Publish/Subscribe Systems Publish/Subscribe is an alternative to typical searching Users do not need to repeatedly search for new interesting data They specify their interests once and the system automatically notifies them whenever relevant data is made available Example: ▫Google Alerts (Google) ▫Herald (Microsoft) 2 DMOD Laboratory, University of Ioannina 29/09/2008

Publish/Subscribe Systems Parts of a Publish/Subscribe system: ▫Subscribers: consumers of events ▫Publishers: generators of events ▫Event-notification service:  Store subscriptions (user interests)  Match events to subscriptions  Deliver event notifications DMOD Laboratory, University of Ioannina 29/09/2008 3

Typical Publish/Subscribe Example DMOD Laboratory, University of Ioannina 4 title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Ratatouille genre = comedy showing time = 21:15 title = Ratatouille genre = comedy showing time = 21:15 title = Fight Club genre = drama showing time = 23:00 title = Fight Club genre = drama showing time = 23:00 title = Casablanca genre = drama showing time = 23:10 title = Casablanca genre = drama showing time = 23:10 title = Vertigo genre = horror showing time = 23:20 title = Vertigo genre = horror showing time = 23:20 title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Fight Club genre = drama showing time = 23:00 title = Fight Club genre = drama showing time = 23:00 title = Casablanca genre = drama showing time = 23:10 title = Casablanca genre = drama showing time = 23:10 title = Vertigo genre = horror showing time = 23:20 title = Vertigo genre = horror showing time = 23:20 Published events Matching events User subscriptions genre = drama genre = horror

Motivation Typically, all subscriptions are considered equally important Users may receive overwhelming amounts of notifications In such cases, users would like to receive only a fraction of those notifications, the most interesting ones Say John is more interested in horror movies than dramas John would like to receive notifications about horror only if there are no (or just a few) notifications about drama movies Current publish/subscribe systems do not allow John to express this different degree of interest DMOD Laboratory, University of Ioannina 29/09/2008 5

Motivation To express some form of ranking among subscriptions, we define priorities among them To do this, we use preferences among subscriptions (preferential subscriptions) Based on preferential subscriptions, we deliver to users only the k most interesting events We also take into account the diversity of delivered events John probably wants to receive some horror movies once in a while and not just dramas DMOD Laboratory, University of Ioannina 29/09/2008 6

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/2008 7

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/2008 8

Flavors of Publish/Subscribe There are two kind of schemes for specifying interesting events: ▫Topic-based  Each event belongs to a number of topics (e.g. “music”, “sport”)  Users subscribe to topics and receive all relevant events ▫Content-based  Users subscribe to the actual content of the events  More expressive In this work we use the content-based scheme DMOD Laboratory, University of Ioannina 29/09/2008 9

Events & Subscriptions An event is a set of typed attributes consisting of: ▫A type ▫A name ▫A value DMOD Laboratory, University of Ioannina 29/09/ string title = The Return of the King string director = Peter Jackson time release date= 1 Dec 2003 string genre = fantasy integer oscars = 11 string title = The Return of the King string director = Peter Jackson time release date= 1 Dec 2003 string genre = fantasy integer oscars = 11 A subscription is a set of typed attribute constraints consisting of: ▫A type ▫A name ▫A binary operator ▫A value string director = Peter Jackson time release date > 1 Jan 2003 string director = Peter Jackson time release date > 1 Jan 2003

Cover Relation Given an event e and a subscription s, s covers e (or e matches s) if and only if every attribute constraint of s is satisfied by some attribute of e DMOD Laboratory, University of Ioannina 29/09/ string title = The Return of the King string director = Peter Jackson time release date= 1 Dec 2003 string genre = fantasy integer oscars = 11 string title = The Return of the King string director = Peter Jackson time release date= 1 Dec 2003 string genre = fantasy integer oscars = 11 Published Event string director = Steven Spielberg string genre = fantasy string release date > 1 Jan 2003 string director = Steven Spielberg string genre = fantasy string release date > 1 Jan 2003 Subscriptions string director = Peter Jackson time release date > 1 Jan 2003 string director = Peter Jackson time release date > 1 Jan 2003

Outline Publish/Subscribe Preliminaries Preference Model Preferential Subscriptions Computing Event Ranks Timing Policies Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

To define priorities among subscriptions: add preferences Two ways to express preferences: ▫Quantitative approach  Preferences are expressed by using scoring functions ▫Qualitative approach  Preferences are expressed by using preference relations between pairs of subscriptions Preferential Subscriptions DMOD Laboratory, University of Ioannina 13 genre = drama genre = horror ≻

Preferential Subscriptions DMOD Laboratory, University of Ioannina 29/09/ A preferential subscription is a subscription enhanced with a degree of interest, called prefrank, which is a real number within the range [0, 1] ps i X = (s i, prefrank i X ) Higher preference ranks denote more important subscriptions

Extracting Preference Ranks Quantitative approach: ▫Preference rank equal to user assigned score Qualitative approach: ▫We assume that priorities follow a strict partial order ▫We use a multilevel variation of the winnow operator to extract the most preferable subscriptions DMOD Laboratory, University of Ioannina 29/09/

Extracting Preference Ranks DMOD Laboratory, University of Ioannina 29/09/ genre = drama genre = comedy ≻ genre = horror genre = romance ≻ genre = action ≻ genre = drama genre = horror genre = romance genre = comedy genre = action User preferences Extracted preference ranks prefrank = 1 prefrank = 1/2 prefrank = 1/3

Computing Event Ranks To measure the importance (or relevance) of a published event e for a user X, we define the event’s rank as a function ℱ of the preference ranks of the subscriptions that cover it rank(e, X) = ℱ(prefrank 1 X, …, prefrank m X ) DMOD Laboratory, University of Ioannina 29/09/ string title = King Kong string director = Peter Jackson time release date = 14 Dec 2005 string genre = adventure string title = King Kong string director = Peter Jackson time release date = 14 Dec 2005 string genre = adventure ℱ = max 0.9

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Continuous Periodic Sliding Window Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

Top-k results Our goal is to deliver only the k highest-ranked events to each subscriber Since events are continuously published, we need to specify the timing period over which the top-k events are computed We examine three timing policies ▫Continuous ▫Periodic ▫Sliding Window The three policies differ in the number and freshness of delivered events DMOD Laboratory, University of Ioannina 29/09/

Continuous Timing Policy A newly published event e is delivered to a user X, if and only if: ▫It is covered by some subscription s issued by X and ▫X has not already received k events more preferable to e Drawback: It is possible for very old but highly preferable event notifications to prevent newer ones from reaching the user ▫Users end up with highly-ranked but stale information DMOD Laboratory, University of Ioannina 29/09/

Stale Information Example DMOD Laboratory, University of Ioannina 29/09/ title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Ratatouille genre = comedy showing time = 22:40 title = Ratatouille genre = comedy showing time = 22:40 title = Fight Club genre = drama showing time = 23:00 title = Fight Club genre = drama showing time = 23:00 title = Vertigo genre = horror showing time = 23:20 title = Vertigo genre = horror showing time = 23:20 title = Casablanca genre = drama showing time = 23:10 title = Casablanca genre = drama showing time = 23:10 Published events Top-2 events User preferential subscriptions title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Ratatouille genre = comedy showing time = 22:40 title = Ratatouille genre = comedy showing time = 22:40 20:00 20:10 20:15 22:00 22:10 22:

Time-Valid Events Old but highly preferable events may prevent newer ones from reaching the user To overcome this problem, we use the notion of time validity: ▫Events are associated with an expiration time e.exp ▫Only valid events can prevent others from reaching the user A newly published event e is delivered to a user X, if and only if: ▫It is covered by some subscription s issued by X and ▫X’s top-k results do not contain k valid events more preferable to e DMOD Laboratory, University of Ioannina 29/09/

Expiration Example DMOD Laboratory, University of Ioannina 23 title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Ratatouille genre = comedy showing time = 22:40 title = Ratatouille genre = comedy showing time = 22:40 title = Fight Club genre = drama showing time = 23:00 title = Fight Club genre = drama showing time = 23:00 title = Vertigo genre = horror showing time = 23:20 title = Vertigo genre = horror showing time = 23:20 title = Casablanca genre = drama showing time = 23:10 title = Casablanca genre = drama showing time = 23:10 Published events Top-2 events title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Apartment genre = comedy showing time = 22:15 title = The Godfather genre = drama showing time = 21:10 title = The Godfather genre = drama showing time = 21:10 title = Ratatouille genre = comedy showing time = 22:40 title = Ratatouille genre = comedy showing time = 22:40 20:00 20:10 20:15 22:00 22:10 22:25 title = Casablanca genre = drama showing time = 23:10 title = Casablanca genre = drama showing time = 23: Event notifications expire at the showing time User preferential subscriptions

Continuous Timing Policy Events are immediately forwarded to the users The number of delivered events depends on the relative order of their publication ▫e.g. if they are published in ascending order in regard to their rank, all of them will be delivered Events are either forwarded or not at the time of their publication DMOD Laboratory, University of Ioannina 29/09/

Periodic Timing Policy Time is divided into periods of duration T ▫Top-k events are computed at the end of each period ▫Ties are resolved by picking the most recently published events The number of delivered events is bounded ▫For a time interval c, the number of delivered events is k  ⌊c/T ⌋ The order of the published events within each period does not affect the results But, high-ranked events appearing in periods with a many other high- ranked ones may fail to be delivered DMOD Laboratory, University of Ioannina 29/09/

Periodic Sliding Policy DMOD Laboratory, University of Ioannina 29/09/ title = Seven genre = horror showing time = 21:00 title = Seven genre = horror showing time = 21:00 title = The Godfather genre = drama showing time = 21:25 title = The Godfather genre = drama showing time = 21:25 title = Jaws genre = horror showing time = 21:30 title = Jaws genre = horror showing time = 21:30 title = Vertigo genre = horror showing time = 21:45 title = Vertigo genre = horror showing time = 21:45 Published events title = The Apartment genre = comedy showing time = 21:10 title = The Apartment genre = comedy showing time = 21:10 20:00 20:20 20:25 20:30 20:45 20:50 title = Psycho genre = horror showing time = 21:50 title = Psycho genre = horror showing time = 21: title = The Apartment genre = comedy showing time = 21:10 title = The Apartment genre = comedy showing time = 21:10 title = Psycho genre = horror showing time = 21:50 title = Psycho genre = horror showing time = 21:50

Sliding Window Timing Policy Top-k events are computed over the events of sliding event-windows of length w ▫An event remains in the window until w newer events arrive ▫If at any time it belongs to the top-k results, it is forwarded to the user Between two consequent windows, at most one event is forwarded The number of delivered events depends on the order of their publication, as in the continuous timing policy DMOD Laboratory, University of Ioannina 29/09/

Sliding Window Timing Policy DMOD Laboratory, University of Ioannina 29/09/ title = Seven genre = horror showing time = 21:00 title = Seven genre = horror showing time = 21:00 title = The Godfather genre = drama showing time = 21:25 title = The Godfather genre = drama showing time = 21:25 title = Jaws genre = horror showing time = 21:30 title = Jaws genre = horror showing time = 21:30 title = Vertigo genre = horror showing time = 21:45 title = Vertigo genre = horror showing time = 21:45 Published events title = The Apartment genre = comedy showing time = 21:10 title = The Apartment genre = comedy showing time = 21:10 20:00 20:20 20:25 20:30 20:45 20:50 title = Psycho genre = horror showing time = 21:50 title = Psycho genre = horror showing time = 21: title = The Apartment genre = comedy showing time = 21:10 title = The Apartment genre = comedy showing time = 21:10 title = The Godfather genre = drama showing time = 21:25 title = The Godfather genre = drama showing time = 21:25 title = Psycho genre = horror showing time = 21:50 title = Psycho genre = horror showing time = 21:50 Top-1

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Event Distance Diverse top-k results Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

Diversity Top-k results are often very similar to each other Goal: Increase user satisfaction by diversifying forwarded events DMOD Laboratory, University of Ioannina 29/09/ title = Vertigo genre = horror showing time = 21:45 title = Vertigo genre = horror showing time = 21:45 title = The Godfather genre = drama showing time = 21:25 title = The Godfather genre = drama showing time = 21:25 title = Pulp Fiction genre = drama showing time = 21:50 title = Pulp Fiction genre = drama showing time = 21:50 title = Goodfellas genre = drama showing time = 21:20 title = Goodfellas genre = drama showing time = 21:20 Published events

Event Distance We measure the distance of two events based on the number of their common attributes: DMOD Laboratory, University of Ioannina 29/09/ Then, the diversity of a list L of m events is given as:

Choosing Diverse Events To identify the k most diverse ones out of a set of q events ▫Find all combinations of k events ▫Measure their diversity ▫Choose the one with the larger diversity This method has a high complexity, which makes it impractical to use We use a heuristic to lower the complexity of locating diverse events DMOD Laboratory, University of Ioannina 29/09/

Heuristic Given a set of events {e 1, …, e q ), we aim to produce a list of k diverse ones, q > k We use the Event-List Distance: First, we choose a random event to be added to L Then, until we have added k events ▫We measure the Event-List Distances of all candidate events ▫We add to L the event with the maximum one DMOD Laboratory, University of Ioannina 29/09/

Heuristic Evaluation To compare the quality of our heuristic to the optimal case, we generate sets of q random numbers in [0, 100] and select k of them using ▫The optimal method ▫The heuristic ▫A random choice DMOD Laboratory, University of Ioannina 29/09/ qkRandomHeuristicOptimal

Diverse Top-k Events We aim to deliver diverse top-k results, i.e. events that are both: ▫Highly-ranked ▫Diverse We modify our heuristic to choose events based on a combination of rank and distance from other chosen events: First, we choose the highest-ranked event to be added to L Then, until we have added k events ▫We measure the Event-List Distances of all candidate events ▫We add to L the event with the maximum value of divrank DMOD Laboratory, University of Ioannina 29/09/

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Ranking in Publish/Subscribe Preferential Subscription Graph Forwarding events Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

Matching Events to Subscriptions Whenever a new event e is published: ▫Locate users with matching subscriptions ▫Check whether e belongs to their top-k valid results Events need to be checked against user subscriptions Exploit cover relations between subscriptions DMOD Laboratory, University of Ioannina 29/09/ string director = Peter Jackson string genre = fantasy string director = Peter Jackson string genre = fantasy

Preferential Subscription Graph We organize preferential subscriptions using a directed acyclic graph Preferential subscription graph (PSG): ▫Nodes correspond to subscriptions ▫Edges correspond to cover relations between subscriptions ▫A PrefRank Set is assigned to each subscription  Pairs of the form (user, prefrank) DMOD Laboratory, University of Ioannina 29/09/

Forwarding Events (Continuous Timing Policy) A server maintains: ▫A Preferential Subscription Graph ▫A list of k elements of the form (rank, expiration) for each user X (list X ) Each such element represents an event previously forwarded to the user: ▫rank is the event’s rank ▫expiration is the event’s expiration time DMOD Laboratory, University of Ioannina 29/09/

Forwarding Events (Continuous Timing Policy) Initially, all lists are empty Upon receiving an event e Walk through PSG and locate all subscriptions that cover e For each user X associated with at least one such subscription Compute e’s event rank rank(e,X) for X Remove expired elements from list X If | list X | minimum rank in list X Add (rank(e,X), e.exp) to list X Forward e to X DMOD Laboratory, University of Ioannina 29/09/

Forwarding Events (Continuous Timing Policy) DMOD Laboratory, University of Ioannina 41 cinema = ster John, 0.5 genre = drama showing time > 21:00 John, 0.7 Anna, 0.6 cinema = ster genre = drama showing time > 21:00 John, 0.9 Anna, 0.7 genre = drama Anna, 0.5 John rankexp 0.910: :00 Anna rankexp 0.711: :00 Current time: 08:00 Published event e: title = The Godfather genre = drama cinema = odeon showing time = 21:10 expiration 21:10 rank(e, John) = 0.7 rank(e, Anna) = :10 Top-2 lists

Forwarding Events (Periodic & Sliding Window) In the Periodic Timing Policy a server maintains: ▫A Preferential Subscription Graph ▫A buffer of matching events published in the current period ▫At the end of the period, we compute the top-k events for each user and forward them In the Sliding Window Timing Policy a server maintains: ▫A Preferential Subscription Graph ▫A buffer of matching events per subscriber ▫Whenever a window slides, we compute the top-k events for the corresponding user and forward any of those that have not been forwarded before DMOD Laboratory, University of Ioannina 29/09/

Applying Diversity In the continuous policy, we do not maintain any information about the content of past event, therefore no diversity is applied In the periodic and sliding window policies, whenever we compute top- k results, we choose events based on their divranks DMOD Laboratory, University of Ioannina 29/09/

Topology of Servers The event-notification service can be implemented over a network of servers ▫Servers are organized in a hierarchy ▫Each server sends to its parent the subscriptions in the top level of its PSG  Only events matching those need to be propagated to the server ▫To reduce the number of those subscriptions, clients are not connected to a random server, but instead probe a number of servers during bootstraping and choose one based on  The number of new nodes that will be created in the top level of the PSG  The total number of nodes in the PSG (load balance) DMOD Laboratory, University of Ioannina 29/09/

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

Dataset We have fully implemented our approach as an extension of SIENA, a multi-threaded, distributed publish/subscribe system (PrefSIENA) We use a real movie dataset, derived from IMDB. Data about movies: ▫Title, year, budget, length, rating, MPAA-rating, genre Publishers generate events by randomly selecting movies. Subscriptions are generated by choosing a random number of attributes and attribute values based on a zipf distribution DMOD Laboratory, University of Ioannina 29/09/

Experiments We perform three sets of experiments: ▫Number of delivered events ▫Quality of delivered events  Average rank  Diversity  Freshness ▫Overhead evaluation  Time We examine a number of publication scenarios: ▫Best-First ▫Best-Last ▫Burst ▫Random DMOD Laboratory, University of Ioannina 29/09/

Number of Delivered Events DMOD Laboratory, University of Ioannina 29/09/ Matching/Published events: 800/2500 Continuous:e.g. Random: 9%-29% Periodic: 7% - 35% Sliding:e.g. Random: 25% Larger period/window: fewer delivered events C ONTINUOUS T IMING P OLICY P ERIODIC T IMING P OLICY S LIDING W INDOW T IMING P OLICY

Number of Delivered Events (Diversity) DMOD Laboratory, University of Ioannina 29/09/ S LIDING W INDOW T IMING P OLICY P ERIODIC T IMING P OLICY Periodic ▫Same number (bounded by k) Sliding ▫Number of delivered notifications increases slightly  more than one event may be delivered at each window

Quality of Delivered Events (Average Rank) DMOD Laboratory, University of Ioannina 29/09/ C ONTINUOUS T IMING P OLICY P ERIODIC T IMING P OLICY S LIDING W INDOW T IMING P OLICY Average rank of all matching events: 0.59 The average rank increases in all cases, even though in the presence of many high-ranked events some may be dismissed. When diversifying, average rank decreases slightly.

Quality of Delivered Events (Diversity) DMOD Laboratory, University of Ioannina 29/09/ S LIDING W INDOW T IMING P OLICY R ANDOM S CENARIO List-Distance of delivered events ▫ σ = 1 ▫ σ = 0.2 List-Distance indeed increases with our diversity heuristic

Quality of Delivered Events (Freshness) DMOD Laboratory, University of Ioannina 29/09/ C ONTINUOUS T IMING P OLICY P ERIODIC T IMING P OLICY S LIDING W INDOW T IMING P OLICY Freshness: Time between publication and delivery Continuous: independent from scenario Periodic:depends on scenario and T Sliding:depends on w

Quality of Delivered Events (Freshness) DMOD Laboratory, University of Ioannina 29/09/ S LIDING W INDOW T IMING P OLICY P ERIODIC T IMING P OLICY When diversity is applied, freshness is reduced ▫extra time needed for top-k computation

Performance DMOD Laboratory, University of Ioannina 29/09/ M ATCHING T IME (P ER E VENT ) D IVERSIFICATION T IME (P ER W INDOW ) Two substantial overheads: ▫To compute event relevance we have to locate all matching subscriptions instead of just one  On average, we have to check twice as many nodes of the PSG ▫Maintaining state and perform top-k computations, esp. when diversifying  Depends on buffer size and k

Outline Publish/Subscribe Preliminaries Preference Model Timing Policies Event Diversity Ranking in Publish/Subscribe Evaluation Summary DMOD Laboratory, University of Ioannina 29/09/

Summary We extend publish/subscribe systems to incorporate ranking ▫Increase quality of delivered events Events are ranked based on preferential subscriptions Diversity of delivered events is also taken into account We examine different timing policies for top-k computation We have fully implemented our approach (PrefSIENA) DMOD Laboratory, University of Ioannina 29/09/

Related Work Ranking in Publish/Subscribe: Ashwin Machanavajjhala, Erik Vee, Minos Garofalakis, Jayavel Shanmugasundaram “Scalable Ranked Publish/Subscribe” 34th International Conference on Very Large Data Bases (VLDB), Auckland, August 23-28, 2008 Kresimir Pripuzic, Ivana Podnar Zarko, Karl Aberer “Top-k/w Publish/Subscribe: Finding k Most Relevant Publications in Sliding Time Window w” 2nd International Conference on Distributed Event-Based Systems, (DEBS) Rome, July 1-4, 2008 Christian Zimmer, Christos Tryfonopoulos, Klaus Berberich, Gerhard Weikum, Manolis Koubarakis “Node Behavior Prediction for LargeScale Approximate Information Filtering” 1st Workshop on Large-Scale Distributed Systems for Information Retrieval, Amsterdam, July 27, 2007 Diversity: Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen “Improving Recommendation Lists Through Topic Diversification” 14th International World Wide Web Conference (WWW), Chiba, Japan, May 10-14, 2005 Erik Vee, Utkarsh Srivastava, Jayavel Shanmugasundaram,, Prashant Bhat, Sihem Amer Yahia “Efficient Computation of Diverse Query Results” IEEE 24th International Conference on Data Engineering (ICDE), Cancun, Mexico,, 7-12 April 2008 DMOD Laboratory, University of Ioannina 29/09/

Thank you! DMOD Laboratory, University of Ioannina 58

Average Rank (+ Diversity) Average rank of events when diversity is used ▫Average rank drops slightly, as diversity increases DMOD Laboratory, University of Ioannina 29/09/ S LIDING W INDOW T IMING P OLICY P ERIODIC T IMING P OLICY

Complexities (required DIS(e 1,e 2 ) operations) Optimal: Heuristic: DMOD Laboratory, University of Ioannina 29/09/