Intro to Content Optimization. Yury Lifshits, Yahoo! Research. Largely based on slides by Bee-Chung Chen, Deepak Agarwal & Pradheep Elango.

Presentation transcript:

1 Intro to Content Optimization. Yury Lifshits, Yahoo! Research. Largely based on slides by Bee-Chung Chen, Deepak Agarwal & Pradheep Elango

2 Outline
- High-level overview
- Explore/Exploit algorithm for Yahoo! Frontpage

3 High-Level Overview of Content Optimization

4 New Research Area
Content optimization = optimizing publishing choices for every user visit under a given objective function

5 Publishing Choices
- Top stories
- Related stories
- Tweets, updates, trending queries/topics
- Headlines text, pictures, text length
- Ads
- Layout
- Modules, order of modules
- Menu items
- Balance between types/topics of content

6 Opportunities (User Visits)
- Time
- Context
- Search query
- Referrer, user session
- User demographics
- User history
- User interest profile
- User social graph (social targeting)

7 Objective Function
- Clicks
- Time spent
- Engagement: comments, shares, sign-ups
- Actions on subsequent pages
- Ad revenue
- Off-line conversion
- Long-term objectives
+ Business rules constraints

8 Opportunity: users, queries, pages, …
Item inventory: articles, web pages, ads, …
- Use an automated algorithm to select item(s) to show
- Get feedback (click, time spent, ..)
- Refine the models
- Repeat (a large number of times)
- Measure metric(s) of interest (total clicks, total revenue, …)
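The loop on this slide (select, get feedback, refine, repeat, measure) can be sketched in a few lines. This is only an illustration under stated assumptions: the selection policy is a uniform-random placeholder, clicks are simulated from a made-up `true_ctr` table, and the names `serve_loop` and `true_ctr` are hypothetical, not from the slides.

```python
import random

def serve_loop(true_ctr, n_visits, seed=0):
    """Select an item, observe feedback, refine estimates, repeat."""
    rng = random.Random(seed)
    views = {item: 0 for item in true_ctr}
    clicks = {item: 0 for item in true_ctr}
    total_clicks = 0
    for _ in range(n_visits):
        # Select: placeholder policy (uniform random). A real system
        # would plug an explore/exploit scheme in here.
        item = rng.choice(sorted(true_ctr))
        # Feedback: simulated click drawn from the hypothetical true CTR.
        clicked = rng.random() < true_ctr[item]
        # Refine the model: here just running counts per item.
        views[item] += 1
        clicks[item] += int(clicked)
        # Measure the metric of interest: total clicks.
        total_clicks += int(clicked)
    estimates = {i: clicks[i] / views[i] if views[i] else 0.0 for i in true_ctr}
    return total_clicks, estimates
```

Swapping the placeholder policy for a smarter one is the whole subject of the explore/exploit part of the talk.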

9 Some examples
- Simple version: I have an important module on my page. Its content inventory comes from a third-party source and is further refined through editorial oversight. Can I algorithmically recommend content on this module? I want to drive up total CTR on this module.
- More advanced: I got X% lift in CTR, but I have additional information on other downstream utilities (e.g. dwell time). Can I increase downstream utility without losing too many clicks?
- Highly advanced: There are multiple modules running on my website. How do I take a holistic approach and perform a simultaneous optimization?

10 Modeling: Key Components
- Offline (logistic, GBDT, ..)
- Feature construction
  - Content: IR, clustering, taxonomy, entity, ..
  - User profiles: clicks, views, social, community, ..
- Online (fine-resolution corrections; item and user level; quick updates)
- Explore/Exploit (adaptive sampling)
- Initialize

11 Tasks of Content Optimization
- Understand content (Offline)
- Serve content to optimize our objectives (Online)
- Quickly learn from feedback obtained using ML/Statistics (Offline + Online)
- Constantly enhance our content inventory to improve future performance (Offline)
- Constantly enhance our user understanding to improve future performance (Offline + Online)
- Iterate

12 Science of Content Optimization
Large-scale Machine Learning & Statistics:
- Offline Models
- Online Models
- Collaborative Filtering
- Explore/Exploit

13 Explore/Exploit Algorithm for Yahoo! Frontpage Story Selection

14
- Recommend applications
- Recommend search queries
- Recommend news articles
- Recommend packages: image, title, summary, links to other pages
  - Pick 4 out of a pool of K (K = 20 ~ 40)
  - Dynamic
  - Routes traffic to other pages

15 Problems in this example
- Optimize CTR on different modules together in a holistic way
  - Today Module, Trending Now, Personal Assistant, News, Ads
  - Treat them as independent?
- For a given module
  - Optimize some combination of CTR, downstream engagement and perhaps revenue

16 Single Module CTR Optimization Problem
- Pick the top n items (stories) from an item pool of K items (stories) for each user visit to the Yahoo! homepage in order to maximize the number of clicks in the Today Module
- In general, items can be articles, ads, modules, configuration parameters of page layout
- One may also replace click with any performance metric observable with low latency

17 Simplified Single Module Problem
- Only consider the first position
  - ~2/3 of clicks happen at the first position
  - Pick the best one from K items (stories)
- The single best one for all users
  - No personalization in this talk
  - Best means having the highest click-through rate (CTR)
- How to solve this "simple" problem?
  - Can't we just show (exploit) the item having the highest CTR?
  - We need to explore every available item (using some fraction of traffic) to estimate its CTR
  - Explore too little: unreliable CTR estimates
  - Explore too much: little traffic to show the best item
- How much traffic should we allocate to each item now, in order to maximize the total number of clicks in the future (e.g., in a week)?

18 Example Scenario
- 5 min intervals, 100 visits per interval
- One new story arrives every interval; each story expires after 4 intervals
- Every story is either "strong" (100% CTR) or "weak" (0% CTR)
- A new story is strong with probability 75%, weak with probability 25%
- What is the optimal strategy to allocate 100 views for the next interval among the current 4 stories?
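One way to build intuition for this scenario is to simulate a simple policy. The sketch below is an assumption, not the answer given in the talk: because CTR is either 100% or 0%, a single view reveals a story's type, so the policy spends one view on the new story whenever a strong story is already known, and otherwise shows the new story on every visit. The function name `simulate` and all parameter names are illustrative.

```python
import random

def simulate(n_intervals=2000, visits=100, lifetime=4, p_strong=0.75, seed=0):
    """Average clicks per interval under a simple explore-one-view policy."""
    rng = random.Random(seed)
    live = []  # each story: {"strong": bool, "known": bool, "age": int}
    total = 0
    for _ in range(n_intervals):
        # Expire old stories, then add the newly arrived one.
        live = [s for s in live if s["age"] < lifetime]
        new = {"strong": rng.random() < p_strong, "known": False, "age": 0}
        live.append(new)
        known_strong = any(s["known"] and s["strong"] for s in live)
        if known_strong:
            # One exploratory view reveals the new story's type;
            # the remaining views go to a story known to be strong.
            total += (visits - 1) + (1 if new["strong"] else 0)
        else:
            # Nothing better is known, so show the new story on every visit.
            total += visits if new["strong"] else 0
        new["known"] = True
        for s in live:
            s["age"] += 1
    return total / n_intervals
```

Under this policy the average stays close to the 100-click ceiling, since the only costly intervals are those where every live story turns out to be weak.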

19 Sequential Decision Problem
[Timeline figure: intervals t–2, t–1, t (now), followed by "clicks in the future"; Item 1, Item 2, …, Item K receive x_1 %, x_2 %, …, x_K % of page views]
Determine (x_1, x_2, …, x_K) based on clicks and views observed before t in order to maximize the expected total number of clicks in the future

20 Modeling the Uncertainty, NOT just the Mean
Simplified setting: two items
[Figure: probability density over CTR for Item A and Item B]
- We know the CTR of Item A (say, shown 1 million times)
- We are uncertain about the CTR of Item B (shown only 100 times)
- If we only make a single decision, give 100% of page views to Item A
- If we make multiple decisions in the future, explore Item B since its CTR can potentially be higher
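The difference between the two items can be made concrete with a Beta posterior over CTR. This is a minimal sketch assuming a uniform Beta(1, 1) prior and made-up click counts (both items observed at 5% CTR); the slides do not specify these numbers, and `beta_posterior` is a hypothetical helper.

```python
import math

def beta_posterior(clicks, views, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of a Beta posterior over CTR."""
    a = prior_a + clicks
    b = prior_b + views - clicks
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Hypothetical counts: both items have an observed CTR of 5%.
mean_a, sd_a = beta_posterior(50_000, 1_000_000)  # Item A: 1 million views
mean_b, sd_b = beta_posterior(5, 100)             # Item B: only 100 views
```

The means are similar, but Item B's standard deviation is roughly two orders of magnitude larger, which is why its true CTR could plausibly exceed Item A's.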

21 CTR Curves of Some Items in Two Days
- Each curve is the 1st-position CTR of an item over time
- CTRs are estimated using 1% random data (see our WWW'09 paper for more information)

22 Characteristics of Our Application
- Non-stationary CTR
  - The CTR of each item changes over time
- Dynamic item pools
  - Items come and go with short lifetimes (~10hr)
- Batch serving
  - For scalability reasons, data is processed in batches (e.g., one batch per minute)
  - We need to provide a sampling plan for each batch (time interval)
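A simple way to track a drifting CTR from batched data is to discount old counts before adding each new batch. The sketch below assumes plain exponential discounting with a made-up factor of 0.95; it is an illustration of the idea, not necessarily the time-series model from the WWW'09 paper, and `update_discounted` is a hypothetical name.

```python
def update_discounted(state, batch_clicks, batch_views, discount=0.95):
    """Decay old evidence, add the newest batch, return (new_state, ctr)."""
    clicks, views = state
    # Older batches lose weight geometrically, so the estimate can
    # follow a CTR that changes over time.
    clicks = discount * clicks + batch_clicks
    views = discount * views + batch_views
    ctr = clicks / views if views > 0 else 0.0
    return (clicks, views), ctr
```

Calling it once per batch keeps only two numbers per item, which suits the batch-serving constraint on this slide.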

23 Bayesian Explore/Exploit
- Adaptation of Whittle (1988) to our problem setting
  - With approximations to ensure computational feasibility
  - With a time-series model to track the non-stationary CTR
- It provides an approximately Bayes optimal solution
- Development
  - Bayes optimal solution to a simplified case: two items, two intervals
  - Near-optimal solution to the general case by using the above solution as a building block

24 Bayesian Solution: Two Items, Two Intervals
- Two time intervals: t = 0 (now, N_0 views) and t = 1 (N_1 views)
  - Item P: we are uncertain about its CTR, p_0 at t = 0 and p_1 at t = 1
  - Item Q: we know its CTR exactly, q_0 at t = 0 and q_1 at t = 1
- Question: what fraction x of the N_0 views should go to item P? (The remaining fraction 1 – x goes to item Q.)
- To determine x, we need to estimate what would happen in the future
  - We obtain c clicks after serving x (not yet observed; a random variable)
  - Assume we observe c; we can then update p_1 to p_1(x, c)
- If x and c are given, optimal solution: give all views to Item P iff E[p_1(x, c) | x, c] > q_1
[Figure: CTR densities of Item P against the known CTR of Item Q, at t = 0 (p_0, q_0) and at t = 1 (p_1(x, c), q_1)]
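The decision rule for the second interval is easy to write down once a prior is chosen. The sketch below assumes a Beta(prior_a, prior_b) prior on item P's CTR and assumes the CTR does not drift between the two intervals, so the posterior mean after the first interval stands in for E[p_1(x, c) | x, c]. Both assumptions and both function names are illustrative.

```python
def posterior_mean_ctr(prior_a, prior_b, clicks, views):
    """Posterior mean of a Beta prior after observing clicks out of views."""
    return (prior_a + clicks) / (prior_a + prior_b + views)

def second_interval_choice(prior_a, prior_b, clicks, views, q1):
    """Return 'P' if item P's updated expected CTR beats item Q's known CTR q1."""
    updated = posterior_mean_ctr(prior_a, prior_b, clicks, views)
    return "P" if updated > q1 else "Q"
```

For example, with a Beta(1, 9) prior and 100 exploratory views, 30 clicks would favor item P against q1 = 0.2, while 10 clicks would favor item Q.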

25 The Two Item, Two Interval Case
- Expected total number of clicks in the two intervals = E[#clicks] at t = 0 + E[#clicks] at t = 1, where at t = 1 we show the item (P or Q) with the higher E[CTR]
- Gain(x, q_0, q_1) = expected number of additional clicks if we explore the uncertain item P with fraction x of views in interval 0, compared to a scheme that only shows the certain item Q in both intervals
  - Gain(x, q_0, q_1) is the gain of exploring the uncertain item P using x, measured against E[#clicks] if we always show item Q
- Solution: argmax_x Gain(x, q_0, q_1)
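Reading the slide's description literally, the gain can be written as Gain(x, q_0, q_1) = N_0 x (E[p_0] – q_0) + N_1 E_c[max(E[p_1 | x, c] – q_1, 0)]. That expression is reconstructed from the text here rather than copied from the slide. The sketch below evaluates it exactly under an assumed Beta(a, b) prior on item P's CTR, with a stationary CTR across the two intervals, and picks the best allocation by brute force; all function names are hypothetical.

```python
import math

def _log_beta(u, v):
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def beta_binomial_pmf(c, n, a, b):
    """P(c clicks in n views) when CTR ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(c + 1) - math.lgamma(n - c + 1)
    return math.exp(log_choose + _log_beta(a + c, b + n - c) - _log_beta(a, b))

def gain(n_explore, n1, a, b, q0, q1):
    """Expected extra clicks from giving n_explore first-interval views to item P."""
    p0_mean = a / (a + b)
    now = n_explore * (p0_mean - q0)  # cost or benefit in interval 0
    later = 0.0
    for c in range(n_explore + 1):
        p1_mean = (a + c) / (a + b + n_explore)  # updated belief after c clicks
        later += beta_binomial_pmf(c, n_explore, a, b) * max(p1_mean - q1, 0.0)
    return now + n1 * later

def best_allocation(n0, n1, a, b, q0, q1):
    """Brute-force argmax over the number of views (x * n0) given to item P."""
    return max(range(n0 + 1), key=lambda n: gain(n, n1, a, b, q0, q1))
```

Brute force costs O(N_0^2) here and only serves to make the objective concrete; it is not the fast method the talk refers to.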

26 Two Items, Two Intervals: Normal Approximation
- Approximate by the normal distribution
  - Reasonable approximation because of the central limit theorem
- Proposition: Using the approximation, the Bayes optimal solution x can be found in time O(log N_0)
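A normal approximation turns the expectation in the gain into a closed form, because E[max(X – q, 0)] = σ φ(z) + (μ – q) Φ(z) with z = (μ – q)/σ for X ~ Normal(μ, σ²). The sketch below assumes a Beta(a, b) prior on item P's CTR and matches the mean and variance of the updated posterior mean; for a Beta-Binomial model that variance is n/(a + b + n) times the prior variance, where n = x N_0. It illustrates the approximation only and does not implement the O(log N_0) search; the function names are hypothetical.

```python
import math

def expected_excess(mu, sigma, q):
    """E[max(X - q, 0)] for X ~ Normal(mu, sigma^2)."""
    if sigma == 0.0:
        return max(mu - q, 0.0)
    z = (mu - q) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * pdf + (mu - q) * cdf

def gain_normal(x, n0, n1, a, b, q0, q1):
    """Normal-approximation gain for exploring item P with fraction x of n0 views."""
    gamma = a + b
    mu = a / gamma                                   # prior mean of P's CTR
    prior_var = mu * (1.0 - mu) / (gamma + 1.0)      # prior variance of P's CTR
    n = x * n0                                       # exploratory views
    sigma1 = math.sqrt(n / (gamma + n) * prior_var)  # spread of the updated mean
    return n * (mu - q0) + n1 * expected_excess(mu, sigma1, q1)
```

With x = 0 the spread is zero and the gain collapses to 0 whenever the prior mean is below q_1, matching the intuition that no exploration means no new information.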

27 Bayesian Solution: General Case
- From two items to K items
  - Very difficult problem
  - Note: c = [c_1, …, c_K]; c_i is a random variable representing the # clicks on item i we may get
- Apply Whittle's Lagrange relaxation (1988) to our problem setting
  - Relax Σ_i z_i(c) = 1, for all c, to E_c[Σ_i z_i(c)] = 1
  - Apply Lagrange multipliers (q_1 and q_2) to enforce the constraints
- We essentially reduce the K-item case to K independent two-item sub-problems (which we have solved)

28 Bayesian Solution: General Case
- From two intervals to multiple intervals
  - Approximate multiple intervals by two stages
- Non-stationary CTR
  - Incorporate a time-series model (WWW'09) into our solution
- Coarse-grained personalization
  - Partition user-feature space into segments (e.g., decision tree)
  - Explore/exploit most popular items for each segment

29 Simulation Experiment: Different Traffic Volume
- Simulation with ground truth estimated based on real data (WWW'09)
- Setting: 16 live items per interval
- Scenarios: Web sites with different traffic volume (x-axis)

30 Simulation Experiment: Different Sizes of the Item Pool
- Simulation with ground truth estimated based on real data
- Setting: 1000 views per interval; average item lifetime = 20 intervals
- Scenarios: Different sizes of the item pool (x-axis)

31 Experimental Result: Controlled Bucket Test
Bayes2x2, B-UCB1 and ε-Greedy were implemented in production and used to serve 3 random samples of real users on a Yahoo! site
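For readers unfamiliar with the baselines, the textbook forms of ε-greedy and UCB1 look roughly like this. These are the standard per-visit versions, shown under the assumption that the production variants (in particular the batched B-UCB1) differ in details not given on the slide; the function names and the dictionary-based interface are illustrative.

```python
import math
import random

def epsilon_greedy(clicks, views, epsilon, rng):
    """With probability epsilon pick a random item, else the best estimated CTR."""
    items = sorted(views)
    if rng.random() < epsilon:
        return rng.choice(items)  # explore
    return max(items, key=lambda i: clicks[i] / views[i] if views[i] else 0.0)

def ucb1(clicks, views):
    """Pick the item with the highest UCB1 index; unseen items go first."""
    items = sorted(views)
    for i in items:
        if views[i] == 0:
            return i
    total = sum(views.values())
    # Index = estimated CTR + confidence bonus that shrinks with more views.
    return max(items, key=lambda i: clicks[i] / views[i]
               + math.sqrt(2.0 * math.log(total) / views[i]))
```

Neither baseline models how long an item will live or how much future traffic remains, which is one difference from the Bayesian scheme described in the talk.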

32 Characteristics of Different Schemes
Why the Bayesian solution has better performance
Characterize each scheme by three dimensions:
- Exploitation regret: the regret of a scheme when it is showing the item which it thinks is the best (may not actually be the best)
  - 0 means the scheme always picks the actual best
  - It quantifies the scheme's ability to find good items
- Exploration regret: the regret of a scheme when it is exploring the items which it feels uncertain about
  - It quantifies the price of exploration (lower is better)
- Fraction of exploitation (higher is better)
  - Fraction of exploration = 1 – fraction of exploitation
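The three dimensions can be computed from a serving log if each view is tagged as exploitation or exploration. The sketch below assumes regret is averaged per view as (CTR of the best item) minus (CTR of the item shown); the slide does not state the normalization, so that choice, the log format, and the name `characterize` are assumptions.

```python
def characterize(log):
    """log: list of (mode, shown_ctr, best_ctr); mode is 'exploit' or 'explore'."""
    sums = {"exploit": 0.0, "explore": 0.0}
    counts = {"exploit": 0, "explore": 0}
    for mode, shown_ctr, best_ctr in log:
        sums[mode] += best_ctr - shown_ctr  # per-view regret
        counts[mode] += 1

    def average(mode):
        return sums[mode] / counts[mode] if counts[mode] else 0.0

    return {
        "exploitation_regret": average("exploit"),
        "exploration_regret": average("explore"),
        "fraction_exploitation": counts["exploit"] / len(log) if log else 0.0,
    }
```

A scheme that exploits the true best item three times and explores a weaker item once would score zero exploitation regret and a 0.75 exploitation fraction.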

33 Characteristics of Different Schemes
- Exploitation regret: ability to find good items (lower is better)
- Exploration regret: price of exploration (lower is better)
- Fraction of exploitation (higher is better)
[Figure: axes labeled Exploration Regret, Exploitation Fraction and Exploitation Regret, with the "Good" direction marked]

34 Summary
- Explore/exploit is an effective strategy to maximize CTR in a content display system
- Ongoing research
  - Explore/exploit for personalized recommendation, page layout optimization, and in the presence of business constraints

35 Thank You!! Questions