Preference and Diversity-based Ranking in Network-Centric Information Management Systems PhD defense Marina Drosou Computer Science & Engineering Dept.


1 Preference and Diversity-based Ranking in Network-Centric Information Management Systems PhD defense Marina Drosou Computer Science & Engineering Dept. University of Ioannina

2 Why diversify?
[Figure labels: Car, Animal, Sports Team, “Mr. Jaguar’’]

3 Thesis Goal
This PhD thesis concerns the development, implementation and evaluation of models, algorithms and techniques for ranking the information presented to users of network-centric information management systems.
This ranking is based on the importance of each piece of information. We consider that importance is influenced by both relevance to user information needs and diversity:
- Relevance is important so that users are presented only with the results most useful for their needs
- Diversity ensures that the received results do not all contain similar information

4 Outline
- Search Result Diversification: Introduction & Related Work
- Content Diversification using Indices
- DisC Diversity: Diversification based on Dissimilarity and Coverage
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary

5 Outline
- Search Result Diversification: Introduction & Related Work
  - Problem Definition
  - Variations
  - Algorithms
- Content Diversification using Indices
- DisC Diversity: Diversification based on Dissimilarity and Coverage
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary

6 Problem Definition
Given:
1. $P = \{p_1, \dots, p_n\}$
2. $k \le n$
3. d: a distance metric
4. f: a diversity function
Find: given the set P of items and the number k, select a subset S* of P with the k most diverse items of P

7 What it means
Given a set P of query results, we want to select a representative diverse subset S* of P.
What does diverse mean?
- Content: dissimilar items
  - e.g., distant locations on a map, different attribute values in tuples
- Coverage: different aspects, perspectives, concepts
  - e.g., different interpretations of a keyword in web search, different topics
- Novelty: items not seen in the past
  - e.g., novel results in a notification service

8 Content-based diversity

9 Coverage-based diversity
Basic idea: find a set of results that cover different interpretations of the query
Common assumptions:
- A taxonomy exists
- Both queries and results may belong to many categories
- Statistics on the distribution of user intents have been collected
- Result independence
Probabilistic view of the problem

10 Novelty-based diversity
Novelty: the need to avoid redundancy (vs. Diversity: the need to resolve ambiguity)
- Intuitively: an item should be returned in the i-th position of the list if
  - it is relevant
  - the previous (i-1) items do not contain the same information
Information is partitioned into “nuggets”
- Often, human judges decide what is relevant or not for each nugget (IR approach)

11 Adding relevance in the mix
We must not forget: relevance to the query is also important!
- Results must be both relevant and diverse
Two alternatives:
- Select the k most diverse results out of the top-m most relevant ones, m > k
- Include diversity in the ranking criterion
  - Augmenting the diversity function with relevance
  - Adapting IR criteria, e.g., discounted cumulative gain (DCG) at position i

12 Adding relevance in the mix

13 Problem Complexity
The problem of choosing diverse items is NP-hard
- This follows from the MAX COVERAGE/SET COVER problems
- Intuitively:
  - To find the most diverse subset S* of all items P, we have to compute all possible combinations of k items out of |P| and keep the one with the maximum diversity
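To make the combinatorial cost concrete, here is a minimal brute-force sketch. It assumes items are 2D points, d is the Euclidean distance, and f is the MaxMin function (the smallest pairwise distance in the set); the function names and the sample points are illustrative and not taken from the thesis.

```python
from itertools import combinations
from math import dist

def min_pairwise_distance(items):
    """MaxMin diversity of a set: the smallest distance between any two items."""
    return min(dist(a, b) for a, b in combinations(items, 2))

def brute_force_diverse(points, k):
    """Try every k-subset (k >= 2) and keep the one with the largest minimum distance."""
    return max(combinations(points, k), key=min_pairwise_distance)

# Hypothetical data: three points near the origin and two far away.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
best = brute_force_diverse(points, 3)
```

The number of subsets examined is "n choose k", which is why this approach is only feasible for very small inputs.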

14 Solving the problem
Thus, we use heuristics for approximate solutions
- Greedy heuristics:
  - Select items one by one until we have k of them
- Interchange heuristics:
  - Start with a random solution and interchange items that improve the objective function
- Also:
  - Neighborhood heuristics: disqualify items close to the ones already selected
  - Simulated Annealing: apply simulated annealing to avoid local maxima
  - and others
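A greedy heuristic of this kind can be sketched as follows. This is one common variant (seed with the farthest pair, then add the item farthest from the current selection), assuming 2D points, Euclidean distance and the MaxMin objective; it is not necessarily the exact variant used in the thesis, and the names are illustrative.

```python
from math import dist

def greedy_maxmin(points, k):
    """Greedy heuristic for 2 <= k <= len(points): grow the set one item at a time."""
    # Seed with the two items that are farthest apart (quadratic scan).
    first, second = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    while len(selected) < k:
        remaining = [p for p in points if p not in selected]
        # A candidate's distance to the set is its distance to the closest selected item.
        selected.append(max(remaining, key=lambda p: min(dist(p, s) for s in selected)))
    return selected

# Hypothetical data: three points near the origin and two far away.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
picked = greedy_maxmin(points, 3)
```

On this input the greedy set has minimum distance 1, while the best 3-subset reaches about 1.41, which illustrates that the heuristic is approximate rather than optimal.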

15 Related Work
Approaches grouped by algorithm type (the slide table additionally splits them by diversity type: content, coverage, novelty):
- Greedy: Jain, PAKDD 2004; Ziegler, WWW 2005; Gollapudi, WWW 2009; Drosou, DEBS 2009; Tao, ICDE 2009; Haritsa, IEEE Data Eng. Bull. 2009; Vieira, ICDE 2011; Bozzon, ICWE 2012; Santos, SSDBM 2013; Abbar, WWW 2013; Valkanas, EDBT 2013; Agrawal, WSDM 2009; Liu, SDM 2009; Zhu, WWW 2011; Li, CIKM 2012; Zhang, SIGIR 2002; Clarke, SIGIR 2008; Souravlias, 2010; Lathia, SIGIR 2010; Szpektor, WWW 2013
- Interchange: Yu, EDBT 2009; Vieira, ICDE 2011; Liu, PVLDB 2009; Minack, SIGIR 2011; Liu, TODS 2012
- Others: Vee, ICDE 2008; Zhang, RecSys 2008; Angel, SIGMOD 2011; Fraternali, SIGMOD 2012; Li, PVLDB 2013

16 Outline
- Search Result Diversification: Introduction & Related Work
- Content Diversification using Indices
  - Model
  - Diverse set computation
  - Combining diversity & relevance
- DisC Diversity: Diversification based on Dissimilarity and Coverage
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary

17 Introduction
We focus on content-based diversification
- MaxMin
Basic idea: employ indices for the efficient computation of diverse items
- Cover Trees
We also define the Continuous k-diversity problem

18 The Cover Tree
A leveled tree where each level is a “cover” for all levels beneath it.
Items at higher levels are farther apart from each other than items at lower levels.

19 Cover Tree Invariants - Nesting
[Figure: items p1, p2, p3 repeated across consecutive tree levels]

20 Cover Tree Invariants - Covering
[Figure: items p1, p2, p3 across tree levels, with a parent-child distance annotated as $\le b^{l-1}$]
b: the “base” of the tree; l: the level of $p_i$

21 Cover Tree Invariants - Separation
[Figure: items p1, p2, p3 across tree levels, with distances annotated as $> b^{l-2}$ (separation) and $\le b^{l-1}$ (covering)]
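The three invariants can be checked mechanically. The sketch below assumes the standard cover tree convention, which appears consistent with the figure annotations: level numbers grow towards the root, $C_l \subseteq C_{l-1}$ (nesting), every item of $C_{l-1}$ lies within $b^l$ of some item of $C_l$ (covering), and distinct items of $C_l$ are more than $b^l$ apart (separation). The data layout and names are illustrative.

```python
from itertools import combinations
from math import dist

def check_cover_tree_levels(levels, b):
    """levels maps a level number l to the set C_l of items (tuples) at that level."""
    for l in sorted(levels):
        c_l = levels[l]
        # Separation: distinct items of C_l are more than b**l apart.
        if any(dist(p, q) <= b ** l for p, q in combinations(c_l, 2)):
            return False
        if l - 1 in levels:
            below = levels[l - 1]
            # Nesting: every item of C_l also appears in C_{l-1}.
            if not c_l <= below:
                return False
            # Covering: every item of C_{l-1} has some item of C_l within b**l.
            if any(all(dist(p, q) > b ** l for q in c_l) for p in below):
                return False
    return True

# Hypothetical three-level tree over points on a line, with base b = 2.
good = {2: {(0, 0)}, 1: {(0, 0), (3, 0)}, 0: {(0, 0), (3, 0), (1.5, 0)}}
bad = {1: {(0, 0), (1, 0)}, 0: {(0, 0), (1, 0)}}
ok_good = check_cover_tree_levels(good, 2)
ok_bad = check_cover_tree_levels(bad, 2)
```

The second example fails separation because its two items are only 1 apart, which is not more than $b^0 = 1$.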

22 Example
Items indexed at the first ten levels of the same Cover Tree

23 Cover Tree Representations
- Implicit representation: space depending on P
- Explicit representation: O(n) space
[Figure: the implicit tree repeats p1, p2, p3 at successive levels; the explicit tree stores each of p1, p2, p3 once]

24 Dynamic Construction
Items can be inserted into and deleted from a Cover Tree dynamically
Insertion:
1. Starting from the root, descend towards the candidate nodes that can cover the new item p
2. Continue until a level $C_l$ is reached where p is separated from all other items
3. Select as parent a candidate node of $C_{l+1}$ that covers p
Deletion:
1. Descend the tree looking for p, keeping note of candidate nodes that can cover the children of p
2. Remove p and reassign its children to the candidate nodes

25 Level Family of Algorithms

26 Approximation Bound
Let P be a set of items, $k \ge 2$, $d_{OPT}(P,k)$ the optimal minimum distance for the MaxMin problem and $d_{CT}(P,k)$ the minimum distance of the diverse set computed by the Level-Basic algorithm. It holds that $d_{CT}(P,k) \ge \alpha \, d_{OPT}(P,k)$, where $\alpha = (b-1)/(2b^2)$.
(Proved by exploiting the covering invariant of the tree to bound the level where the least common ancestor of any two items of the optimal solution appears in the tree.)
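To get a feel for the size of this factor, the following small sketch evaluates $(b-1)/(2b^2)$ for a few bases. The values 1.3, 1.5 and 1.7 are the bases used in the experiments later in the talk; the function name is illustrative.

```python
def level_basic_factor(b):
    """Approximation factor (b - 1) / (2 * b**2) of the bound, for a tree base b > 1."""
    return (b - 1) / (2 * b ** 2)

# Bases used in the experimental slides, plus b = 2 for comparison.
factors = {b: round(level_basic_factor(b), 4) for b in (1.3, 1.5, 1.7, 2.0)}
```

Setting the derivative of the expression to zero shows that it peaks at b = 2, where it equals 1/8, so this worst-case guarantee is never better than 12.5% of the optimal minimum distance.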

27 Cover Tree implementation of Greedy
Any Cover Tree can be employed for implementing the greedy heuristic
- ½-approximation of the optimal solution
We perform k descents of the tree, using one of the following pruning rules:

28 Batch Construction
If all items of P are available, we can perform a batch construction of the Cover Tree
- We call such trees “Batch Cover Trees” (BCTs)
- As we descend a BCT, we get items in the order selected by Greedy
Algorithm:
1. The leaf level $C_l$ contains all items in P
2. We greedily select items from $C_l$ with distance larger than $b^{l+1}$ and promote them to $C_{l+1}$
3. The rest of the items in $C_l$ are distributed as children among the new nodes of $C_{l+1}$
4. Continue until we reach the root level
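The bottom-up procedure can be sketched as below. This is a simplified reading of the steps, not the thesis implementation: it assumes at least two distinct 2D points, Euclidean distance and a base b > 1, promotes items in farthest-first order, and only records which items sit at each level (the parent assignment of step 3 is omitted). All names are illustrative.

```python
from itertools import combinations
from math import dist, floor, log

def build_batch_levels(points, b):
    """Return {level: items}, from the leaf level (all items) up to a one-item root."""
    min_d = min(dist(p, q) for p, q in combinations(points, 2))
    level = floor(log(min_d, b))
    if b ** level >= min_d:          # leaf items must be more than b**level apart
        level -= 1
    levels = {level: list(points)}
    while len(levels[level]) > 1:
        current, threshold = levels[level], b ** (level + 1)
        promoted = [current[0]]
        while len(promoted) < len(current):
            # Farthest-first choice: candidate whose nearest promoted item is farthest.
            cand = max(
                (p for p in current if p not in promoted),
                key=lambda p: min(dist(p, s) for s in promoted),
            )
            if min(dist(cand, s) for s in promoted) <= threshold:
                break                # everything left is covered at this level
            promoted.append(cand)
        level += 1
        levels[level] = promoted
    return levels

# Hypothetical data, base b = 2.
tree_levels = build_batch_levels([(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)], 2)
```

Because promotion stops only when every remaining item is within the threshold of a promoted item, each level covers the one below it, and because levels are built from the leaves up, nesting holds by construction.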

29 Adding relevance

30 Continuous Model
We consider a streaming scenario, where new items arrive and older items expire.
We want to provide users with a continuously updated subset of the top-k most diverse recent items in the stream.
We consider a sliding-window model:
[Figure: consecutive windows $P_{i-1}$ and $P_i$ of length w, separated by a jump step]
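The window model can be sketched as a simple generator. It assumes count-based windows (w items per window, advancing by a fixed number of items); the slide does not say whether windows are count-based or time-based, so this is only one possible reading, and the names are illustrative.

```python
def sliding_windows(stream, w, jump):
    """Yield successive windows of w items, advancing by `jump` items each time."""
    items = list(stream)
    for start in range(0, len(items) - w + 1, jump):
        yield items[start:start + w]

# Ten items, windows of length 4, jump step 2: consecutive windows share two items.
windows = list(sliding_windows(range(10), w=4, jump=2))
```

With a jump step smaller than w, consecutive windows overlap, which is what makes carrying diverse items over from one window to the next meaningful.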

31 Continuous k-Diversity Problem

32 Continuity Requirements
Items in the tree are marked as valid or invalid:
- Freshness: non-diverse items that are older than the newest diverse item from the previous window are marked as invalid in the cover tree and are not considered further.
- Durability: let r be the number of diverse items from previous windows that have not yet expired. We select k-r new valid diverse items from the new window.

33 Building Batch Cover Trees
We measure the extra cost of building a BCT as compared to executing the greedy heuristic (GR) for k = n
- This extra cost corresponds to assigning nodes to suitable parents to form the tree levels
Extra cost:
b   | Clustered, non-np | Clustered, np | Faces, non-np | Faces, np
1.3 | 0.42% | 0.58% | 1.49% | 1.94%
1.5 | 0.42% | 0.56% | 1.47% | 1.92%
1.7 | 0.41% | 0.55% | 1.47% | 1.91%
np: nearest parent heuristic (choose the closest candidate parent).
The quality of the solution is the same for BCT and GR.

34 Building Incremental Cover Trees
Building ICTs requires a small fraction of the cost required for the corresponding BCTs.
Even so, the quality of the solutions provided by ICTs is comparable to that of BCTs (and, thus, GR).
Extra cost:
b   | Clustered | Faces
1.3 | 0.16% | 0.79%
1.5 | 0.08% | 0.41%
1.7 | 0.06% | 0.28%
For trees with 10,000 items:
- Insertion cost: ~2.6 msec
- Deletion cost: ~10 msec
The cost of inserting/removing items after a window jump depends on the size of the window and the jump step, but is much lower than re-building a BCT for the new set of items.

35 Pruning
Pruning is even better for non-uniform datasets, since each selection of a diverse item results in pruning a larger number of items around it.
Also, pruning is better for large values of λ.

36 Streaming Data
We compare ICTs against SGR, a streaming version of GR:
- At each window, we keep any remaining diverse items from the previous window (durability) and let GR select items from the new window satisfying freshness
Comparable achieved diversity, while ICTs are much faster:
- Retrieving the top-100 items from an ICT with 1,000-10,000 items requires ~1.5 msec
- Executing SGR requires 3.2 sec for 5,000 items and more than 15 sec for 10,000 items

37 Summary
- We proposed an index-based diversification approach based on Cover Trees
- We provided a new suite of algorithms along with theoretical results for the quality of our approach
- We studied the diversification problem in a dynamic setting, where items change over time, and defined continuity requirements that the diversified items must satisfy

38 Related Publications
1. M. Drosou and E. Pitoura, Diverse Set Selection over Dynamic Data, in IEEE TKDE (to appear)
2. M. Drosou and E. Pitoura, Dynamic Diversification of Continuous Data, EDBT 2012

39 Outline
- Search Result Diversification: Introduction & Related Work
- Content Diversification using Indices
- DisC Diversity: Diversification based on Dissimilarity and Coverage
  - DisC Diversity
  - Algorithms
  - Comparison with other models
  - Incremental DisC
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary

40 DisC Diversity
What is the right size for the diverse subset S? What is a good k?
What if, instead of k, we use a radius r?
Given a result set P and a radius r, we select a representative subset S ⊆ P such that:
1. For each item in P, there is at least one similar item in S (coverage)
2. No two items in S are similar to each other (dissimilarity)

41 DisC Diversity
- Small r: more and less dissimilar points (zoom in)
- Large r: fewer and more dissimilar points (zoom out)
- Local zooming at specific points
[Figure panels: Zoom-out, Zoom-in, Local zoom]

42 DisC Diversity
Formal definition: Let P be a set of objects and r, r ≥ 0, a real number. A subset S ⊆ P is an r-Dissimilar-and-Covering diverse subset, or r-DisC diverse subset, of P, if the following two conditions hold:
1. coverage condition: $\forall p_i \in P$, $\exists p_j \in N_r^+(p_i)$ such that $p_j \in S$
2. dissimilarity condition: $\forall p_i, p_j \in S$ with $p_i \ne p_j$, it holds that $d(p_i, p_j) > r$
Since a DisC set for a set P is not unique, we seek a concise representation: the minimum DisC set
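The two conditions translate directly into a checker. The sketch assumes 2D points with Euclidean distance and reads $N_r^+(p_i)$ as the set of items within distance r of $p_i$, including $p_i$ itself; that reading and the names are assumptions for illustration.

```python
from itertools import combinations
from math import dist

def is_disc_diverse(P, S, r):
    """True if S (a subset of P) satisfies both r-DisC conditions."""
    # Coverage: every item of P has an item of S within distance r (possibly itself).
    covered = all(any(dist(p, s) <= r for s in S) for p in P)
    # Dissimilarity: any two distinct items of S are more than r apart.
    dissimilar = all(dist(a, b) > r for a, b in combinations(S, 2))
    return covered and dissimilar

# Hypothetical points on a line, radius r = 1.
P = [(0, 0), (1, 0), (2, 0), (5, 0)]
valid = is_disc_diverse(P, [(1, 0), (5, 0)], 1)
not_dissimilar = is_disc_diverse(P, [(0, 0), (1, 0), (5, 0)], 1)
not_covering = is_disc_diverse(P, [(0, 0), (5, 0)], 1)
```

The first subset is valid, the second violates dissimilarity (two selected items are exactly r apart), and the third leaves the item at (2, 0) uncovered.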

43 Graph model
We use a graph to model the problem:
- Each item is a vertex
- There is an edge between two vertices if their distance is at most r

44 Graph model
Finding a minimum r-DisC diverse subset of a set P is equivalent to finding a minimum Independent Dominating set of the corresponding graph
- Independent: no edge between any two vertices in the set
- Dominating: all vertices outside the set are connected with at least one vertex inside it
This is an NP-hard problem
[Figure panels: Dominating, not independent; Dominating and independent]

45 Computing DisC subsets
A basic or greedy approach:
- select random items or items with large neighborhoods
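The greedy variant can be sketched as follows, assuming 2D points and Euclidean distance. Uncovered items are called "white" here, borrowing the coloring vocabulary of the implementation slides; the exact selection and tie-breaking rules of the thesis algorithms may differ, and the names are illustrative.

```python
from math import dist

def greedy_disc(P, r):
    """Repeatedly select the uncovered item that covers the most uncovered items."""
    white = list(P)                      # items not yet covered by the solution
    S = []
    while white:
        # Neighborhood size counts the item itself plus uncovered items within r.
        best = max(white, key=lambda p: sum(dist(p, q) <= r for q in white))
        S.append(best)
        # Everything within r of the selected item is now covered.
        white = [q for q in white if dist(best, q) > r]
    return S

# Hypothetical points on a line, radius r = 1.
solution = greedy_disc([(0, 0), (1, 0), (2, 0), (5, 0)], 1)
```

Every selected item was uncovered when chosen, so it is more than r away from all earlier selections, and the loop ends only when nothing is left uncovered; the result therefore satisfies both DisC conditions, although it is not guaranteed to be minimum. Replacing the `max` with an arbitrary pick gives the basic variant.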

46 How much smaller is the (optimal) minimum DisC set?
The size of any r-DisC diverse subset S of P is at most B times the size of any minimum r-DisC diverse subset S*, where B is the maximum number of independent neighbors of any item in P
- i.e., each item has at most B neighbors that are independent from each other
B depends on the distance metric and the dimensionality of the data
- We have proved that:
  - for the Euclidean distance in the 2D plane: B = 5
  - for the Manhattan distance in the 2D plane: B = 7
  - for the Euclidean distance in 3D space: B = 24

47 Raising the dissimilarity condition
When we consider only coverage:
Let Δ be the maximum number of neighbors of any item in P; the size of any covering (but not dissimilar) diverse subset S of P is at most $\ln \Delta$ times larger than any minimum covering subset S*

48 Adding weights

49 Multiple radii
We want to allow different areas of the data to contribute more or fewer items to the diverse set.
The problem now loses its symmetry.
Two interpretations:
1. $p_i$ can represent all items lying at a distance at most $r(p_i)$ around it (Covering problem)
2. $p_i$ can be represented only by items lying at a distance at most $r(p_i)$ around it (CoveredBy problem)

50 Multiple radii variations
The problem is now modeled via a directed graph
- Directed graphs do not always have an independent dominating set!
- We provide heuristic algorithms that always locate a valid DisC set
  - Covering: start with items with larger radii
  - CoveredBy: start with items with smaller radii
[Figure panels: A set P; Covering; CoveredBy]
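One way to read the "start with items with larger radii" rule for the Covering interpretation is sketched below. It assumes 2D points, Euclidean distance and a per-item radius; this is an illustrative reconstruction, not the algorithm from the thesis, and the names are hypothetical.

```python
from math import dist

def covering_disc(radii):
    """radii maps each item to its own radius r(p); returns the selected items."""
    S, covered = [], set()
    # Visit items from the largest radius to the smallest.
    for p in sorted(radii, key=radii.get, reverse=True):
        if p in covered:
            continue
        S.append(p)
        # Under Covering, p represents every item within its own radius r(p).
        covered |= {q for q in radii if dist(p, q) <= radii[p]}
    return S

# Hypothetical items on a line with individual radii.
selected = covering_disc({(0, 0): 2, (1, 0): 1, (3, 0): 1, (4, 0): 0.5})
```

Processing larger radii first is what keeps the selection independent: a later item q is only selected if it lies outside the radius of every earlier item p, and since r(q) is no larger than r(p), p in turn lies outside the radius of q.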

51 Comparison with other models
[Figure panels: r-DisC, MaxSum, MaxMin, k-medoids]

52 Comparison with MaxMin
For a set of items P, we have proved that:
1. Let S be an r-DisC set and S* be an optimal MaxMin set. Let $\lambda$ and $\lambda^*$ be the MaxMin distances of the two sets. Then $\lambda^* \le 3\lambda$.
2. Let S* be the optimal MaxMin set of size k with MaxMin distance equal to $\lambda^*$. Let S be an r-DisC set with $r = \lambda^*$. Then |S| < k′, where k′ is the first integer larger than k for which the corresponding optimal MaxMin set S*′ of P has MaxMin distance equal to $\lambda^{*\prime}$, with $\lambda^{*\prime} < \lambda^*$.

53 Zooming
We want to change the radius r to r’ interactively and compute a new diverse set
- r’ < r: zoom in
- r’ > r: zoom out
Two requirements:
1. Support an incremental mode of operation: the new set $S_{r'}$ should be as close as possible to the already seen result $S_r$. Ideally, $S_{r'} \supseteq S_r$ for r’ < r and $S_{r'} \subseteq S_r$ for r’ > r
2. The size of $S_{r'}$ should be as close as possible to the size of the minimum r’-DisC diverse subset
There is no monotonic property among the r-DisC diverse and the r’-DisC diverse subsets of a set of objects P (the two sets may be completely different)

54 Size when moving from r to r’
The change in size of the diverse set when moving from r to r’ depends on the number of independent neighbors (for r’) in the “ring” around an object between the two radii.

55 Zooming

56 Zooming-In

57 Zooming-Out
For zooming-out, we keep the independent items of $S_r$ and fill in the solution with items from uncovered areas.
It holds that:
1. There are at most N items in $S_r \setminus S_{r'}$
2. For each item in $S_r \setminus S_{r'}$, at most (B-1) items are added to $S_{r'}$
(proof and various algorithms for keeping the size small in the thesis)
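The "keep the independent items, then fill in" idea can be sketched as below, assuming 2D points and Euclidean distance. It is a plain first-fit version for illustration; the thesis describes several algorithms that also try to keep the result small, and the names here are hypothetical.

```python
from math import dist

def zoom_out(P, S_r, r_new):
    """Build an r_new-DisC set (r_new > r) that reuses as much of S_r as possible."""
    S_new = []
    for s in S_r:
        # Keep an old item only if it stays dissimilar to those already kept.
        if all(dist(s, t) > r_new for t in S_new):
            S_new.append(s)
    for p in P:
        # Fill in: any item still not covered under r_new joins the solution.
        if all(dist(p, t) > r_new for t in S_new):
            S_new.append(p)
    return S_new

# Hypothetical points on a line: S_r is a valid DisC set for r = 1; zoom out to r' = 3.
P = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (7, 0)]
S_r = [(1, 0), (4, 0), (7, 0)]
S_new = zoom_out(P, S_r, 3)
```

In this example the item at (4, 0) is dropped because it is no longer dissimilar to (1, 0) under the larger radius, and no fill-in is needed since the two kept items already cover all of P.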

58 Implementation
We base our implementation on a spatial data structure (central operation: compute neighbors)
We use an M-tree
- We link together all leaf nodes (we visit items in a single left-to-right traversal of the leaf level to exploit locality)
- We build trees using splitting policies that minimize overlap

59 Implementation
Two implementations for our greedy approach
- Grey-Greedy, White-Greedy
Lazy variations for updating neighborhoods
Pruning Rule: A leaf node that contains no white objects is colored grey. When all its children become grey, an internal node is colored grey and becomes inactive. We prune subtrees with only “grey nodes”.

60 Performance
Many real and synthetic datasets
- General trade-off: larger r → smaller diverse set → higher cost
- Lazy variations of our algorithms further reduce computational cost
- The cost also depends on the characteristics of the M-tree (fat-factor)
- Smaller sizes for clustered data
[Plots: Cost; Solution size]

61 Diversity and Relevance
- Similar diversity for the Basic and Greedy algorithms
- Greedy considers relevance and produces subsets of larger average weight
- Raising the dissimilarity condition improves average weight but minimum distance is decreased
- Also, we get larger subsets than in the diversity-only case

62 Zooming
Both requirements:
- incremental (much smaller cost) and
- small size (relative to computing it from scratch)
Larger overlap among $S_r$ and $S_{r'}$
[Plots: Solution size; Cost; Jaccard distance among solutions]

63 Related Publications
1. M. Drosou and E. Pitoura, Multiple Radii DisC Diversity: Result Diversification based on Dissimilarity and Coverage (submitted)
2. M. Drosou and E. Pitoura, DisC Diversity: Result Diversification based on Dissimilarity and Coverage, in PVLDB, vol. 6, no. 1, pp. 13-24, 2012, VLDB Endowment (Best Paper Award)

64 Outline
- Search Result Diversification: Introduction & Related Work
- Content Diversification using Indices
- DisC Diversity: Diversification based on Dissimilarity and Coverage
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary

65 Visualizing Diverse Items
[Screenshot labels: Selecting diversification parameters; Zooming and Streaming; Result Statistics]

66 Visualizing Diverse Items

67 Related Publications
1. M. Drosou and E. Pitoura, POIKILO: A Tool for Evaluating the Results of Diversification Models and Algorithms, VLDB 2013

68 Outline
- Search Result Diversification: Introduction & Related Work
- Content Diversification using Indices
- DisC Diversity: Diversification based on Dissimilarity and Coverage
- POIKILO: Evaluating the Results of Diversification Models and Algorithms
- Summary
  - Thesis contribution
  - Directions for future research

69 Thesis Contributions

70 Thesis Contributions
Diversification based on dissimilarity and coverage
- We introduced a novel diversity definition, called DisC diversity, based on using a radius r rather than a size limit k to select diverse items
- We presented both a spatial and a graph model for our definition
- We studied the weighted and multiple radii cases
- We introduced incremental diversification to a new radius through zooming-in and zooming-out
- We presented algorithms for locating DisC diverse subsets and derived bounds concerning the size of such subsets
- We provided efficient implementations of our algorithms based on spatial index structures, namely the M-Tree

71 Thesis Contributions
Visualizing and comparing diversification algorithms
- We developed a system prototype, called “Poikilo”, providing implementations of a wide variety of diversification approaches to assist users in locating, visualizing and comparing diverse results

72 Directions for future research
Short term plans
- Diversification in Database Exploration
  - Interesting suggestions in database exploration are often similar
  - Also: exploit external sources
- Diversification of Multiple Search Results
  - Exploit overlap among results of different queries
  - Use diversified results of past queries to answer new ones
- Diversification of Keyword Search Results in Databases
  - Moving diversification to the ranking phase
  - Apply coverage-based definitions
Long term plans
- Diversification in a distributed setting
  - Place “diversification filters” on the overlay network to reduce computational and communication costs

73 Thesis Publications
Journal Publications
1. M. Drosou and E. Pitoura, Multiple Radii DisC Diversity: Result Diversification based on Dissimilarity and Coverage (submitted)
2. M. Drosou and E. Pitoura, YmalDB: Exploring Relational Databases via Result Driven Recommendations, in VLDBJ (to appear)
3. M. Drosou and E. Pitoura, Diverse Set Selection over Dynamic Data, in IEEE TKDE (to appear)
4. M. Drosou and E. Pitoura, DisC Diversity: Result Diversification based on Dissimilarity and Coverage, in PVLDB, vol. 6, no. 1, pp. 13-24, 2012, VLDB Endowment (Best Paper Award)
5. M. Drosou and E. Pitoura, Search Result Diversification, in SIGMOD Record, vol. 39, no. 1, pp. 41-47, 2010, ACM
6. M. Drosou and E. Pitoura, Diversity over Continuous Data, in IEEE Data Engineering Bulletin, vol. 32, no. 4, pp. 49-56, 2009, IEEE

74 Thesis Publications
Conference Publications
1. M. Drosou and E. Pitoura, Dynamic Diversification of Continuous Data, EDBT 2012
2. M. Drosou and E. Pitoura, ReDRIVE: Result Driven Database Exploration through Recommendations, CIKM 2011
3. K. Stefanidis, M. Drosou and E. Pitoura, PerK: Personalized Keyword Search in Relational Databases through Preferences, EDBT 2010
Workshop Publications
1. D. Souravlias, M. Drosou, K. Stefanidis and E. Pitoura, On Novelty in Publish/Subscribe Delivery, DBRank 2010
2. K. Stefanidis, M. Drosou and E. Pitoura, ‘‘You May Also Like’’ Results in Relational Databases, PersDB 2009
Demos
1. M. Drosou and E. Pitoura, POIKILO: A Tool for Evaluating the Results of Diversification Models and Algorithms, VLDB 2013
2. M. Drosou and E. Pitoura, YmalDB: A Result Driven Recommendation System for Databases, EDBT 2013

75 Thank you!

