1 Contextual Bandits in a Collaborative Environment Qingyun Wu 1, Huazheng Wang 1, Quanquan Gu 2, Hongning Wang 1 1 Department of Computer Science 2 Department of Systems & Information Engineering University of Virginia

2 Running Example: News Recommendation
Exploitation-exploration dilemma:
-- Exploit: focus on content already known to raise user interest
-- Explore: try new content to improve user experience in the long run
CS@UVa | Collaborative Bandits

3 Exploitation vs. Exploration
[Figure: article selection timeline over rounds t]

4 Exploitation vs. Exploration
[Figure: at round t, Article 1 is selected; estimated reward 0.3]

5 Exploitation vs. Exploration
[Figure: Article 1 then Article 2 selected; estimated rewards 0.3 and 0.2]

6 Exploitation vs. Exploration
[Figure: after trying Articles 1, 2, and 3, estimated rewards are 0.3, 0.2, and 0.1]

7 Exploitation vs. Exploration
[Figure: which article should be selected next, given estimates 0.3, 0.2, 0.1?]

8 Exploitation vs. Exploration
[Figure: Article 1 is selected again; its estimated reward rises to 0.4]

9 Exploitation vs. Exploration
Exploit too early and too much!
[Figure: the player keeps selecting Article 1; its estimate climbs 0.4, 0.5, 0.55, 0.6 while Articles 2 and 3 stay unexplored]

10 Running Example: News Recommendation
Multi-armed bandit formulation:
-- Player: the recommendation system
-- Arms: news articles (with side information)
-- Reward: user click
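The exploit-explore tension illustrated above can be reproduced with a tiny simulation. Below is a minimal epsilon-greedy sketch (not the algorithm proposed in this talk; the article click probabilities and the epsilon value are made up for illustration):

```python
import random

def epsilon_greedy(click_probs, rounds=10000, epsilon=0.1, seed=7):
    """Simulate an epsilon-greedy news recommender.

    Arms are news articles; the reward is a simulated user click drawn
    with the hidden probability in click_probs.
    """
    rng = random.Random(seed)
    n = len(click_probs)
    counts = [0] * n        # times each article was shown
    estimates = [0.0] * n   # running mean click-through estimate per article
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:                       # explore: random article
            arm = rng.randrange(n)
        else:                                            # exploit: current best estimate
            arm = max(range(n), key=lambda a: estimates[a])
        reward = 1 if rng.random() < click_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Hypothetical click probabilities for three articles.
est, reward = epsilon_greedy([0.3, 0.2, 0.1])
```

With pure exploitation (epsilon = 0) the player can lock onto a suboptimal article, which is exactly the "exploit too early and too much" failure shown on slide 9.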

11 A Collaborative Environment
Motivation: the cold-start problem in personalized systems.
Bandits are not independent -- there is social influence among users, e.g., content and opinion sharing among friends in a social network.
Solution: model the dependency among users.

12 Related work
LinUCB [1,2]:
-- Uses ridge regression and an Upper Confidence Bound to balance exploitation and exploration.
-- Bandits/users are treated as independent.
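A minimal single-user LinUCB sketch, assuming numpy; the exploration weight alpha, ridge parameter lam, and the hidden preference vector theta_star are all illustrative values, not from the talk:

```python
import numpy as np

class LinUCB:
    """One independent LinUCB bandit: ridge regression + UCB exploration."""

    def __init__(self, dim, alpha=0.5, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(dim)   # regularized Gram matrix of seen contexts
        self.b = np.zeros(dim)       # accumulated reward-weighted contexts

    def select(self, contexts):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge regression estimate of preferences
        # score = estimated reward (exploit) + confidence width (explore)
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(0)
theta_star = np.array([0.6, -0.4, 0.2])              # hidden user preference
bandit = LinUCB(dim=3)
for _ in range(500):
    arms = rng.normal(size=(10, 3))                  # 10 candidate articles
    a = bandit.select(arms)
    r = arms[a] @ theta_star + 0.05 * rng.normal()   # noisy linear reward
    bandit.update(arms[a], r)
theta_hat = np.linalg.solve(bandit.A, bandit.b)
```

The confidence term shrinks as an arm's direction is observed more often, so exploration fades out naturally instead of being fixed as in epsilon-greedy.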

13 Related work
GOB.Lin [5]:
-- Uses ridge regression and an Upper Confidence Bound to balance exploitation and exploration (as in LinUCB).
-- Adds graph-Laplacian regularization on top of ridge regression to model dependency.
-- Neighboring users are assumed to share similar parameters.
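The network-based regularization idea can be sketched as follows, where $\theta_u$ is user $u$'s bandit parameter, $E$ is the set of user-user edges, and $\lambda$ is a regularization weight (a schematic form of the Laplacian penalty, not GOB.Lin's exact objective):

```latex
\min_{\{\theta_u\}} \; \sum_{u}\sum_{t}\big(r_{u,t}-\mathbf{x}_{u,t}^{\top}\theta_u\big)^2
 \;+\; \lambda \sum_{(u,v)\in E}\|\theta_u-\theta_v\|_2^2
```

The second term pulls connected users' parameters toward each other, which is how the graph Laplacian encodes "similar users share similar parameters."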

14 Related work
CLUB [4]:
-- Adaptively clusters users into groups according to the closeness of their estimated bandit models.
-- The threshold for deleting edges is based on the confidence balls of the users' models.

15 Model Assumption
A collaborative reward assumption; the observation noise is R-sub-Gaussian.

16 Model Assumption
A user's expected reward is a weighted average of the connected users' expected rewards.

17 Model Assumption
Context vectors let the model generalize across arms: the expected reward is linear in the arm's context features.

18 Exploitation
Objective function and parameter estimation: ridge regression.
Information propagates among users through the relational matrix W.
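Schematically, with $w_{uv}$ denoting user $v$'s influence on user $u$ (entries of $W$), the collaborative ridge regression regresses each observed reward on the $W$-weighted combination of all users' parameters, following the weighted-average assumption on slide 16 (a sketch of the idea, not the paper's exact objective):

```latex
\min_{\Theta}\;\sum_{u}\sum_{t}\Big(r_{u,t}-\mathbf{x}_{u,t}^{\top}\textstyle\sum_{v} w_{uv}\,\theta_v\Big)^2 \;+\; \lambda\|\Theta\|_F^2
```

Because every user's observation involves all $\theta_v$ with nonzero $w_{uv}$, one user's feedback updates the neighbors' estimates too: this is the information propagation through $W$.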

19 Exploration
UCB-type exploration strategy. At every iteration:
estimated reward (exploitation) + confidence interval (exploration) = Upper Confidence Bound

20 Exploration
Exploration parameter: an upper bound on the parameter estimation error.

21 Arm selection strategy
Exploitation vs. exploration: select the arm maximizing the estimated reward (exploitation) plus the confidence interval (exploration).

22 Algorithm: CoLin

23 Regret Analysis
Regret: a standard performance metric for bandit algorithms (not the optimization target).
Regret at time t: optimal reward minus received reward. Cumulated regret sums this over time.
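In symbols, with $a_t$ the selected arm, $a_t^{*}$ the optimal arm at time $t$, and $r$ the reward:

```latex
R(T)\;=\;\sum_{t=1}^{T}\big(r_{a_t^{*},t}-r_{a_t,t}\big)
```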

24 Regret Analysis

25 Regret Analysis

26 Regret Analysis
Comparison with GOB.Lin:
-- Uses graph-Laplacian regularization to model dependency.
-- Assumes neighboring bandit parameters are close to each other.
-- Only captures the connectivity of the network.

27 Regret Analysis

28 Experimental Results: Synthetic Dataset
Convergence of cumulated regret
o Simulated 1000 articles and 100 users, each with a 5-dimensional feature vector
o W is constructed from the cosine similarity of user features
[Figure: cumulated regret of CoLin vs. baselines]
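The relational matrix used in these simulations can be built roughly as follows (a sketch assuming numpy; row-normalizing so each user's weights sum to one is an assumption on my part, the slide only says W comes from cosine similarity):

```python
import numpy as np

def build_w(user_features):
    """Relational matrix from cosine similarity of user feature vectors."""
    F = np.asarray(user_features, dtype=float)
    U = F / np.linalg.norm(F, axis=1, keepdims=True)  # unit-normalize each user
    S = np.clip(U @ U.T, 0.0, None)                   # cosine similarity, negatives clipped
    return S / S.sum(axis=1, keepdims=True)           # each row sums to one

rng = np.random.default_rng(1)
W = build_w(rng.normal(size=(100, 5)))  # 100 users, 5-d features, as in the simulation
```

The diagonal (self-similarity 1) guarantees every row sum is positive, so the normalization is always well defined.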

29 Experimental Results: Synthetic Dataset
Accuracy of bandit parameter estimation
o Simulated 1000 articles and 100 users, each with a 5-dimensional feature vector
o W is constructed from the cosine similarity of user features
[Figure: parameter estimation accuracy of CoLin vs. baselines]

30 Experimental Results: Yahoo! Today Module
Normalized CTR on the Yahoo dataset
o 10 days of data from the Yahoo! Today Module
o 45,811,833 user visits
o Users are clustered into 160 groups
o The user relation matrix is constructed from the similarity of the user-group centroids
[Figure: normalized CTR of CoLin vs. baselines]

31 Experimental Results: LastFM & Delicious
LastFM dataset
o LastFM: extracted from the music streaming service Last.fm
o Delicious: extracted from the social bookmark sharing service Delicious
o Both contain user friendship information (around 2000 users)
o Users are clustered into 200 groups using graph-cut
[Figure: results on LastFM, CoLin vs. baselines]

32 Experimental Results: LastFM & Delicious
Delicious dataset
o LastFM: extracted from the music streaming service Last.fm
o Delicious: extracted from the social bookmark sharing service Delicious
o Both contain user friendship information (around 2000 users)
o Users are clustered into 200 groups using graph-cut
[Figure: results on Delicious, CoLin vs. baselines]

33 Experimental Results: Effectiveness of Collaboration
Rank user clusters by number of observations.
Group 1 (Learning Bucket): the top 50 user clusters.
Group 2 (Testing Bucket): the 50 user clusters, among the bottom 100, that are most connected to users in Group 1.
Warm-start setting:
--1. Run the algorithms on the learning bucket to estimate parameters for both groups of users.
--2. Run and evaluate the algorithms on the testing bucket.
Cold-start setting:
--Directly run and evaluate the bandit algorithms on the testing bucket.

34 Experimental Results: Effectiveness of Collaboration
LastFM dataset
o Compare the reward difference between the warm-start and cold-start settings: (warm - cold)
o For LinUCB, (warm - cold) is always 0: no collaboration
[Figure: (warm - cold) reward difference on LastFM, CoLin vs. baselines]

35 Experimental Results: Effectiveness of Collaboration
Delicious dataset
o Compare the reward difference between the warm-start and cold-start settings: (warm - cold)
o For LinUCB, (warm - cold) is always 0: no collaboration
[Figure: (warm - cold) reward difference on Delicious, CoLin vs. baselines]

36 Experimental Results: Effectiveness of Collaboration, User-based Analysis
Delicious dataset
Improved user: a user who receives better recommendations from the collaborative bandit algorithm than from isolated LinUCBs.

37 Experimental Results: Effectiveness of Collaboration, User-based Analysis
LastFM dataset
Improved user: a user who receives better recommendations from the collaborative bandit algorithm than from isolated LinUCBs.

38 Future Work
-- Dynamically estimate the graph structure in the collaborative bandit setting.
-- Decentralize the computation by exploiting the sparse graph structure.

39 Acknowledgement
Thanks to the conference for awarding the travel grant.
This work is supported by NSF grant IIS-1553568.

40 References
[1] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th WWW, pages 661–670. ACM, 2010.
[2] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
[3] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In NIPS, pages 2312–2320, 2011.
[4] C. Gentile, S. Li, and G. Zappella. Online clustering of bandits. In Proceedings of the 31st International Conference on Machine Learning, 2014.
[5] N. Cesa-Bianchi, C. Gentile, and G. Zappella. A gang of bandits. In Proc. NIPS, 2013.
[6] J. Kawale, H. H. Bui, B. Kveton, L. Tran-Thanh, and S. Chawla. Efficient Thompson sampling for online matrix-factorization recommendation. In NIPS, pages 1297–1305, 2015.

41 Contextual Bandits
[Backup slide: contextual bandit illustration]

