Group Recommendation: Semantics and Efficiency

Presentation transcript:

Group Recommendation: Semantics and Efficiency
Sihem Amer-Yahia (Yahoo! Labs), Senjuti Basu-Roy (Univ. of Texas at Arlington), Ashish Chawla (Yahoo! Inc), Gautam Das (Univ. of Texas at Arlington), and Cong Yu (Yahoo! Labs).

Recommendation
Individual user recommendation:
- "Which movie should I watch?"
- "What city should I visit?"
- "What book should I read?"
- "What web page has the information I need?"

Group Recommendation
Individual user recommendation: a user seeking intelligent ways to search through the enormous volume of information available to her.
How to recommend:
- Movies – for a family!
- Restaurants – for a work group lunch!
- Places to visit – through a travel agency!
Solution: group recommendation helps socially acquainted individuals find content of interest to all of them together.

Existing Solutions
No prior work exists on efficient processing for group recommendation.
Existing solutions aggregate ratings (referred to as relevance) among group members:
- Preference aggregation: aggregates group members' prior ratings into a single virtual user, then computes recommendations for that user.
- Rating aggregation: aggregates individual ratings on the fly using either Average or Least Misery (which takes the minimum rating).

Why Is Rating Aggregation Not Enough?
Task: recommend a movie to group G = {u1, u2, u3}
relevance(u1, "Godfather") = 5, relevance(u2, "Godfather") = 1, relevance(u3, "Godfather") = 1
relevance(u1, "Roman Holiday") = 3, relevance(u2, "Roman Holiday") = 3, relevance(u3, "Roman Holiday") = 1
Average Relevance and Least Misery fail to distinguish between "Godfather" and "Roman Holiday" (both average to 2.33 with a minimum of 1).
But group members agree far more on "Roman Holiday": two of the three rate it a 3, whereas "Godfather" splits the group with one 5 against two 1s.
The difference in opinion (disagreement) between members may therefore be important in group recommendation semantics.
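
To make the tie concrete, here is a small illustrative Python sketch (mine, not the paper's code) that computes Average, Least Misery, and average pair-wise disagreement for the two movies; the first two aggregations produce identical values, while disagreement separates them.

```python
from itertools import combinations

# Group ratings from the slide, in order (u1, u2, u3).
ratings = {
    "Godfather":     [5, 1, 1],
    "Roman Holiday": [3, 3, 1],
}

def average(rs):
    return sum(rs) / len(rs)

def least_misery(rs):
    return min(rs)

def avg_pairwise_disagreement(rs):
    pairs = list(combinations(rs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

for movie, rs in ratings.items():
    print(movie, round(average(rs), 2), least_misery(rs),
          round(avg_pairwise_disagreement(rs), 2))

# Both movies have average 2.33 and least misery 1,
# but disagreement is 2.67 for "Godfather" vs. 1.33 for "Roman Holiday".
```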

Outline: Motivation, Modeling & Problem Definition, Algorithms, Optimization Opportunities, Experimental Evaluation, Conclusion & Future Work

Semantics in Group Recommendation
Relevance (average or least misery) and disagreement are combined into a recommendation's score using a consensus function.
Group G = {u1, u2, u3}: relevance(u1, "Godfather") = 5, relevance(u2, "Godfather") = 1, relevance(u3, "Godfather") = 1
Average pair-wise disagreement(G, "Godfather") = (|5-1| + |1-1| + |1-5|) / 3
Disagreement variance(G, "Godfather") = [(5-2.33)^2 + (1-2.33)^2 + (1-2.33)^2] / 3
score(G, "Godfather") = w1 x .86 + w2 x (1 - .2) (after normalization)
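
As an illustration of the consensus function's structure, the sketch below combines average relevance with average pair-wise disagreement. The weights w1, w2, the 1-5 rating scale, and the division by the maximum rating for normalization are assumptions for this example (with w1 = w2 = 1 they appear to reproduce the item scores shown later in the RO/FM walkthroughs, e.g. 1.33 for i1); they are not necessarily the paper's exact normalization.

```python
from itertools import combinations

MAX_RATING = 5.0  # assumed 1-5 rating scale

def consensus_score(member_ratings, w1=1.0, w2=1.0):
    """Weighted sum of group relevance and (1 - group disagreement).

    member_ratings: one rating per group member for a single item.
    """
    avg_relevance = sum(member_ratings) / len(member_ratings)
    pairs = list(combinations(member_ratings, 2))
    avg_disagreement = sum(abs(a - b) for a, b in pairs) / len(pairs)
    # Normalize both components into [0, 1] before combining.
    return (w1 * (avg_relevance / MAX_RATING)
            + w2 * (1.0 - avg_disagreement / MAX_RATING))

# Item i1 from the walkthrough: ratings 4, 2, 3 from u1, u2, u3.
print(round(consensus_score([4, 2, 3]), 2))  # -> 1.33
```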

Problem Definition
Top-k Group Recommendation: given a user group G and a consensus function F, return a list of k items to the group such that each item is new to all users in G and the returned k items are sorted in decreasing value of F.

Efficient Recommendation Computation
Average and Least Misery are monotone.
Sort each user's relevance list in decreasing order and apply the Threshold Algorithm (TA).
[Figure: two sorted relevance lists, ILu1 and ILu2, of (item, relevance) entries]
For a user group G = {u1, u2}, 4 sorted accesses are required to compute the top-1 item for group recommendation.
Pruning operates only on the relevance component of the consensus function. How can pruning use the disagreement component as well?

Role of Disagreement Lists
Pair-wise disagreement lists are computed from the individual relevance lists and are sorted in increasing order of disagreement.
Items encountered in disagreement lists play a significant role in stopping early.
[Figure: relevance lists ILu1, ILu2 and the disagreement list DL(u1,u2) of (item, score) entries]
The top-1 item for u1, u2 can be obtained after 3 sorted accesses if DL(u1,u2) is present; without DL(u1,u2), 4 sorted accesses are required.
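
A minimal sketch (helper names assumed, not the paper's code) of how a pair-wise disagreement list could be materialized from two users' relevance lists: for every item rated by both users, store the absolute rating difference and sort in increasing order.

```python
def build_disagreement_list(rel_u, rel_v):
    """Build DL(u, v) from two relevance dicts {item: rating}.

    Returns (item, |difference|) pairs sorted by increasing disagreement,
    so items the pair agrees on are encountered first.
    """
    common = set(rel_u) & set(rel_v)
    dl = [(item, abs(rel_u[item] - rel_v[item])) for item in common]
    dl.sort(key=lambda pair: pair[1])
    return dl

# Relevance lists from the FM example (item: rating).
rel_u1 = {"i1": 4, "i3": 4, "i4": 4, "i2": 2}
rel_u2 = {"i2": 4, "i4": 4, "i1": 2, "i3": 2}
print(build_disagreement_list(rel_u1, rel_u2))
# i4 (difference 0) comes first, matching DLu1,u2 from the slides;
# the remaining items all have difference 2 and may appear in any order.
```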

Outline: Motivation, Modeling & Problem Definition, Algorithms, Optimization Opportunities, Experimental Evaluation, Conclusion & Future Work

Relevance Only (RO) Algorithm
Task: recommend the top-1 item for a group of 3 users, u1, u2, and u3.
Input: 3 relevance lists (ILu1, ILu2, ILu3), each sorted in decreasing relevance.
Lists are read in round-robin fashion during top-k computation; performance is measured by the number of sorted accesses (SAs).
ILu1: (i1, 4) (i3, 4) (i4, 4) (i2, 2)
ILu2: (i2, 4) (i4, 4) (i1, 2) (i3, 2)
ILu3: (i3, 3) (i1, 3) (i4, 3) (i2, 3)
Top-k buffer: (i1, 1.33)   Threshold: 1.93

Relevance Only (RO) Algorithm
Top-k buffer: (i1, 1.33) (i2, 1.33)   Threshold: 1.86

Relevance Only (RO) Algorithm
Top-k buffer: (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.73

Relevance Only (RO) Algorithm
Top-k buffer: (i4, 1.6) (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.73

Relevance Only (RO) Algorithm
After 8 sorted accesses (3 on ILu1, 3 on ILu2, and 2 on ILu3):
Top-k buffer: (i4, 1.6) (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.6
The threshold equals the best buffered score, so the algorithm stops. The top-1 item is i4.
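
The walkthrough above is essentially the Fagin-style Threshold Algorithm restricted to the relevance lists. Below is a hedged Python sketch of that idea (the structure and names are mine, not the paper's): round-robin sorted accesses, a random-access score computation for each newly seen item, and a stop when the top-k buffer matches the threshold for unseen items. The score normalization is the same assumed one as before.

```python
import heapq
from itertools import combinations

MAX_RATING = 5.0

def group_score(ratings, w1=1.0, w2=1.0):
    # Normalized average relevance plus (1 - normalized average disagreement).
    avg_rel = sum(ratings) / len(ratings)
    pairs = list(combinations(ratings, 2))
    avg_dis = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return w1 * avg_rel / MAX_RATING + w2 * (1.0 - avg_dis / MAX_RATING)

def ro_top_k(relevance_lists, ratings_by_user, k=1):
    """relevance_lists: per user, a list of (item, rating) sorted by rating desc.
    ratings_by_user: per user, a dict item -> rating (used for random access)."""
    n = len(relevance_lists)
    cursors = [0] * n              # next sorted-access position in each list
    frontier = [MAX_RATING] * n    # last relevance value seen in each list
    seen, buffer = set(), []       # buffer is a min-heap of (score, item)
    turn = 0
    while True:
        u = turn % n
        if cursors[u] < len(relevance_lists[u]):
            item, rating = relevance_lists[u][cursors[u]]
            cursors[u] += 1
            frontier[u] = rating
            if item not in seen:
                seen.add(item)
                score = group_score([r[item] for r in ratings_by_user])
                heapq.heappush(buffer, (score, item))
                if len(buffer) > k:
                    heapq.heappop(buffer)
        turn += 1
        # RO threshold: best possible average relevance of an unseen item,
        # plus the (unknown, hence best-case zero) disagreement component.
        threshold = sum(frontier) / n / MAX_RATING + 1.0
        if len(buffer) == k and buffer[0][0] >= threshold - 1e-9:
            return sorted(buffer, reverse=True)
        if all(c >= len(l) for c, l in zip(cursors, relevance_lists)):
            return sorted(buffer, reverse=True)

IL_u1 = [("i1", 4), ("i3", 4), ("i4", 4), ("i2", 2)]
IL_u2 = [("i2", 4), ("i4", 4), ("i1", 2), ("i3", 2)]
IL_u3 = [("i3", 3), ("i1", 3), ("i4", 3), ("i2", 3)]
ratings = [dict(IL_u1), dict(IL_u2), dict(IL_u3)]
print(ro_top_k([IL_u1, IL_u2, IL_u3], ratings, k=1))
# Top-1 item i4 with score 1.6, reached after 8 sorted accesses as in the walkthrough.
```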

Full Materialization (FM) Algorithm
Input: 3 relevance lists (ILu1, ILu2, ILu3) and 3 pair-wise materialized disagreement lists (DLu1,u2, DLu1,u3, DLu2,u3).
Relevance lists are sorted in decreasing relevance; disagreement lists are sorted in increasing disagreement.
ILu1: (i1, 4) (i3, 4) (i4, 4) (i2, 2)
ILu2: (i2, 4) (i4, 4) (i1, 2) (i3, 2)
ILu3: (i3, 3) (i1, 3) (i4, 3) (i2, 3)
DLu1,u2: (i4, 0) (i3, 2) (i2, 2) (i1, 2)
DLu1,u3: (i4, 1) (i3, 1) (i2, 1) (i1, 2)
DLu2,u3: (i4, 1) (i1, 1) (i3, 1) (i2, 1)
Top-k buffer: (i1, 1.33)   Threshold: 1.93

Full Materialization (FM) Algorithm
Top-k buffer: (i1, 1.33) (i2, 1.33)   Threshold: 1.86

Full Materialization (FM) Algorithm
Top-k buffer: (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.73

Full Materialization (FM) Algorithm
Top-k buffer: (i4, 1.6) (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.73

Full Materialization (FM) Algorithm
Top-k buffer: (i4, 1.6) (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.66

Full Materialization (FM) Algorithm
After 6 sorted accesses (1 on each list):
Top-k buffer: (i4, 1.6) (i1, 1.33) (i2, 1.33) (i3, 1.33)   Threshold: 1.6
Score(i4) = threshold = 1.6, so the algorithm stops. The top-1 item is i4.
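
For intuition on why FM can stop after only six accesses, here is a hedged sketch (my own formulation, under the same normalization assumption as before) of the unseen-item threshold FM can use: the relevance-list frontiers give an upper bound on an unseen item's relevance, and the disagreement-list frontiers give a lower bound on its disagreement.

```python
MAX_RATING = 5.0

def fm_threshold(il_frontiers, dl_frontiers, w1=1.0, w2=1.0):
    """Upper bound on the consensus score of any item not yet seen.

    il_frontiers: last relevance value read from each relevance list
                  (an unseen item cannot score higher in that list).
    dl_frontiers: last disagreement value read from each pair-wise
                  disagreement list (an unseen item cannot disagree less).
    """
    rel_upper = sum(il_frontiers) / len(il_frontiers) / MAX_RATING
    dis_lower = sum(dl_frontiers) / len(dl_frontiers) / MAX_RATING
    return w1 * rel_upper + w2 * (1.0 - dis_lower)

# After one sorted access on each of the six lists in the walkthrough:
# IL frontiers 4, 4, 3 and DL frontiers 0, 1, 1.
print(round(fm_threshold([4, 4, 3], [0, 1, 1]), 2))  # -> 1.6, the final FM threshold
```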

Outline: Motivation, Modeling & Problem Definition, Algorithms, Optimization Opportunities, Experimental Evaluation, Conclusion & Future Work

Optimization Opportunities
Partial materialization: given a space budget that allows only a subset of the n(n-1)/2 disagreement lists to be materialized, which m lists should be materialized?
Threshold sharpening: how can the threshold be sharpened during top-k computation?

Why Partial Materialization?
A set of 10,000 users has 49,995,000 pair-wise disagreement lists.
Suppose the space budget allows only 10% of the disagreement lists to be materialized.
Problem: which 4,999,500 lists should we choose so that they give the "maximum benefit" during query processing?
Intuition: materialize only those lists that significantly improve efficiency; the recommendation algorithm is adapted accordingly (referred to as PM in the paper).

Disagreement Lists Materialization Algorithm
Sort the table by decreasing difference in #SAs and materialize the first m rows.
User pair | #SAs without disagreement list | #SAs with disagreement list | Difference in #SAs
{u1,u2}   | 200 | 100 | 100
{u3,u4}   | 290 | 195 | 95
{u10,u9}  | 170 | 100 | 70
{u6,u7}   | 230 | 190 | 40
{u2,u3}   | 175 | 145 | 30
{u5,u6}   | 179 | 158 | 21
{u7,u8}   | 120 | 100 | 20
... (the first m rows are chosen)
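
A hedged sketch of the selection step this slide describes (names assumed; the paper's benefit estimation is more involved than this): rank candidate pairs by how many sorted accesses the materialized list saves and keep the top m within the space budget.

```python
def select_lists_to_materialize(benefit_table, m):
    """benefit_table: list of (user_pair, sa_without, sa_with) tuples, where
    sa_without / sa_with are the sorted accesses needed without and with the
    pair's materialized disagreement list. Returns the m pairs saving the most."""
    ranked = sorted(benefit_table,
                    key=lambda row: row[1] - row[2],  # difference in #SAs
                    reverse=True)
    return [pair for pair, _, _ in ranked[:m]]

table = [
    (("u1", "u2"), 200, 100),
    (("u3", "u4"), 290, 195),
    (("u6", "u7"), 230, 190),
    (("u2", "u3"), 175, 145),
]
print(select_lists_to_materialize(table, 2))  # -> [('u1', 'u2'), ('u3', 'u4')]
```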

Threshold Sharpening
Can we exploit the dependencies between relevance and disagreement lists to sharpen the thresholds in the FM, RO and PM algorithms?
ILu1: (i1, 0.5) (i3, 0.5) ...
ILu2: (i2, 0.5) (i3, 0.4) ...
DLu1,u2: (i3, 0.2) (i1, 0.3) ...
Threshold = 1.3
Sharpened bound: maximize (i_u1 + i_u2)/2 + (1 - |i_u1 - i_u2|) subject to 0 <= i_u1 <= 0.5 and 0.2 <= |i_u1 - i_u2| <= 1
New threshold = 1.2
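
A tiny, hedged check of the sharpening idea using the slide's numbers (bounding i_u2 by its list frontier as well is my assumption): a brute-force grid search over feasible unseen-item values shows the bound dropping from 1.3 to 1.2.

```python
def sharpened_threshold():
    """Brute force over a 0.01 grid, in integer hundredths to avoid
    floating-point issues at the constraint boundary."""
    best = 0.0
    for xi in range(0, 51):        # unseen-item relevance for u1, in hundredths
        for yi in range(0, 51):    # unseen-item relevance for u2, in hundredths
            if not (20 <= abs(xi - yi) <= 100):  # disagreement at least 0.2
                continue
            x, y = xi / 100, yi / 100
            best = max(best, (x + y) / 2 + (1 - abs(x - y)))
    return best

print(round(sharpened_threshold(), 2))  # -> 1.2 (vs. the naive threshold of 1.3)
```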

Outline: Motivation, Modeling & Problem Definition, Algorithms, Optimization Opportunities, Experimental Evaluation, Conclusion & Future Work

Experiments
Dataset: MovieLens – 71,567 users, 10,681 movies, 10,000,054 ratings.
User studies: compare the effectiveness of the proposed group recommendation algorithms with existing approaches using Amazon Mechanical Turk users; small and large groups of similar, dissimilar and random users are formed; the algorithms compared are Average Relevance Only (AR), Least Misery Only (LM), Consensus with Pair-wise Disagreements (RP), and Consensus with Disagreement Variance (RV).
Performance experiments: compare the performance (number of sorted accesses) of FM, RO and PM while varying group size, group similarity and the number of returned items; also evaluate the effectiveness of partial materialization and of threshold sharpening.

Disagreement Is Important for Dissimilar User Groups
Misery Only (MO) is the best model for similar user groups.
Disagreement is important for dissimilar users: Consensus with Disagreement Variance (RV80) is the best model there.

Summary of Performance Results
The presence of disagreement lists improves performance for dissimilar user groups.
Sometimes Partial Materialization (PM) is the best solution.
For the same query, different disagreement lists contribute differently to top-k processing.
Optimization during threshold calculation improves overall performance.

Performance Results
Fewer sorted accesses (SAs) are required for more similar user groups.
Disagreement lists are important for dissimilar user groups: FM is the best performer for very dissimilar groups, while RO is the best algorithm for very similar groups.
Sometimes only a few disagreement lists suffice for the best performance, so partial materialization is important.
Optimization during threshold calculation always achieves better performance (fewer SAs) than the unoptimized case.

Outline: Motivation, Modeling & Problem Definition, Algorithms, Optimization Opportunities, Experimental Evaluation, Conclusion & Future Work

Conclusion
Disagreement impacts both the quality and the efficiency of group recommendation.
The Threshold Algorithm (TA) can be adapted to compute group recommendations.
Novel optimization opportunities are present.

Ongoing and Future Work
Can disagreement lists be optimized so that they consume less space while containing the same information?
Can the group recommendation algorithms be adapted to work with those optimized lists?
(Note from the user study: we ask each user to rate 20-30 movies and compute user similarity from those ratings.)

Thank You!

Modeling Semantics in Group Recommendation
Distinguish relevance and disagreement in a recommendation's score:
- Combine average relevance with disagreement
- Combine least misery with disagreement
Disagreement = difference in relevance among group members.
Group G = {u1, u2, u3}: relevance(u1, "Godfather") = 5, relevance(u2, "Godfather") = 1, relevance(u3, "Godfather") = 1
Average pair-wise disagreement(G, "Godfather") = (|5-1| + |1-1| + |1-5|) / 3
Disagreement variance(G, "Godfather") = [(5-2.33)^2 + (1-2.33)^2 + (1-2.33)^2] / 3

Consensus Function and Problem Definition
The consensus function is a weighted sum of relevance and disagreement, so that for each item its group relevance is maximized and its group disagreement is minimized.
score(G, "Godfather") = w1 x .86 + w2 x (1 - .2) (after normalization)
Top-k Group Recommendation: given a user group G and a consensus function F, return a list of k items to the group such that each item is new to all users in G and the returned k items are sorted in decreasing value of F.