TARGETED, NOT TRACKED: CLIENT-SIDE SOLUTIONS FOR PRIVACY-FRIENDLY BEHAVIORAL ADVERTISING
Janice Tsai, Misha Bilenko, Matt Richardson
Anonymous User Sees This Ad
Known User Sees A Different Ad
Personalized Advertising Today
- User is tracked: history of activity is stored
  - On the ad platform's server and/or in a cookie
- History is processed into a profile
  - Reduced representation for quick lookup
  - Can also be communicated or sold across parties
- Profile is used for ad targeting
  - Total targeting revenue expected to reach $2.6B by 2014 (eMarketer 2011)
  - Supported by all major ad platforms
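The pipeline on this slide (stored history, reduced profile, lookup for targeting) can be sketched in a few lines of Python. This is a minimal illustration under assumed simplifications: each activity is reduced to a keyword, the profile is simply the top-k keywords by frequency, and `history_log`, `log_activity`, and `build_profile` are hypothetical names, not any ad platform's API.

```python
from collections import Counter

# Hypothetical server-side store: user id -> every keyword the user was seen with.
# This full, server-held history is what the talk calls "tracking".
history_log: dict[str, list[str]] = {}


def log_activity(user_id: str, keyword: str) -> None:
    """Record one observed event (e.g., a query or visited URL mapped to a keyword)."""
    history_log.setdefault(user_id, []).append(keyword)


def build_profile(user_id: str, k: int = 3) -> set[str]:
    """Reduce the full history to a small profile: the top-k keywords by frequency."""
    counts = Counter(history_log.get(user_id, []))
    return {keyword for keyword, _ in counts.most_common(k)}


for kw in ["shoes", "shoes", "camera", "shoes", "hotel", "camera"]:
    log_activity("u42", kw)
print(sorted(build_profile("u42", k=2)))  # ['camera', 'shoes']
```

The profile is a compact derivative, but the full `history_log` remains on the server, which is the retained copy the rest of the talk tries to avoid.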
Talk Outline
- Client-side vs. server-side profiles
- Client-only Profiles (CoP): balancing privacy and personalization
- Experiments: client- vs. server-side revenue difference
Personalized Advertising is Ubiquitous
- Driven by economics
  - Publishers, platforms: CPM rates 2.7x higher [Beales ’10]
  - Advertisers: 6x gain in CTR [Yao et al. ’08]
- What about users?
  - “It’s a little creepy, especially if you don’t know what’s going on” [NYT ’11]
  - What’s going on is complex and misunderstood [McDonald ’10-11]
- Ad industry: self-regulation; users can opt out
- Browsers: Do Not Track (FF, IE, Safari), KeepMyOptOuts (Chrome)
- Privacy advocates: self-regulation is insufficient
  - W3C Tracking Protection Working Group
- Legislation: multiple bills/hearings in the US; European e-Privacy Directive
Personalized Advertising Mechanics
- User information drives market efficiency
- Users have no knowledge of or control over their information
- The first- vs. third-party distinction is increasingly hard to draw
[Diagram: user, publisher, ad platforms, advertiser, and aggregator]
Server-side User Profiles in Advertising [Diagram: query or URL sent to the ad platform; ad returned]
Problem: No User Control over Data
- Users do not know what is stored, where, and why
  - Use, retention, sharing
- Users cannot edit or delete their behavioral data
  - Deleting cookies is insufficient: re-identification, LSOs (Flash cookies), local storage
  - Opting out ≠ having your data purged
- Most users find tracking invasive when asked [McDonald-Cranor ’10]
  - But they don’t do much about it: Do Not Track adoption in Firefox is 4-6%
Current “Do Not Track” Proposals
- Provide a mechanism for users to prevent being tracked
- Existing browser implementations
  - HTTP headers, opt-out cookies
    - Browser contacts the server but notifies it that the user does not want to be tracked
    - User must trust service providers
  - Domain blocking / Tracking Protection Lists (TPLs)
    - Browser doesn’t send requests to certain domains
- Tracking vs. targeting: collection vs. usage
- “All or nothing” approach: privacy = no targeting
  - Undesirable extremes: inefficiency vs. loss of revenue
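The header-based mechanism amounts to a request that the server may or may not honor. The sketch below assumes a server that reads the `DNT: 1` request header and, if present, neither logs the event nor targets; `serve_ad` and its string results are hypothetical. It illustrates two points from the slide: nothing in the protocol enforces the `if` branch (the user must trust the provider), and honoring it produces the all-or-nothing outcome.

```python
def serve_ad(headers: dict[str, str], user_id: str, keyword: str,
             history_log: dict[str, list[str]]) -> str:
    """Hypothetical ad-server handler that voluntarily honors the DNT request header."""
    # HTTP header field names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    if normalized.get("dnt") == "1":
        # Honoring DNT: no collection, and therefore no behavioral targeting either.
        return "untargeted"
    # Without DNT, the event joins the server-side history and is used for targeting.
    history_log.setdefault(user_id, []).append(keyword)
    return "targeted"


log: dict[str, list[str]] = {}
print(serve_ad({"DNT": "1"}, "u1", "shoes", log), log)         # untargeted {}
print(serve_ad({"User-Agent": "x"}, "u1", "shoes", log), log)  # targeted {'u1': ['shoes']}
```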
Client-Side Tracking
- Tracking is performed solely on the client machine
- User retains control; targeting is still possible
  - User can delete or edit the profile
- Services don’t retain user history
  - No back-end sharing of user data between companies
  - Avoids issues around retention policies, deleting all copies, etc.
- Studies indicate users care more about being tracked than about being targeted
Existing Plugin-Based Approaches
- Privad, Adnostic, RePRIV
- User installs a client plugin that collects user data and communicates with the ad network
- Difficulties
  - Requires user to install a plugin
  - Requires significant changes to existing ad-serving infrastructure
  - Hard to manage click fraud, ad budgets
  - Bandwidth (e.g., 10x ads sent to client)
  - Targeting algorithms baked into the plugin may slow innovation
  - Targeting on client = less information than targeting on server
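To see where the bandwidth and innovation costs come from, here is a toy version of plugin-side selection: the network ships a batch of candidate ads, and the plugin picks one using a profile that never leaves the machine. The overlap-count scoring and the names `select_ad_locally` and `candidate_ads` are illustrative assumptions, not the actual Privad, Adnostic, or RePRIV algorithms.

```python
from typing import Optional


def select_ad_locally(candidate_ads: list[tuple[str, set[str]]],
                      local_profile: set[str]) -> Optional[str]:
    """Pick the candidate ad whose keywords overlap the local profile the most.

    Every candidate must be downloaded even though at most one is shown (the
    bandwidth cost), and this scoring rule stays fixed until the plugin is updated.
    """
    best_ad, best_overlap = None, 0
    for ad_id, ad_keywords in candidate_ads:
        overlap = len(ad_keywords & local_profile)
        if overlap > best_overlap:
            best_ad, best_overlap = ad_id, overlap
    return best_ad  # None: no candidate matched, so fall back to an untargeted ad


candidates = [("a1", {"camera"}), ("a2", {"shoes", "running"}), ("a3", {"hotel"})]
print(select_ad_locally(candidates, {"running", "shoes"}))  # a2
```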
Alternative: Client-Only Profiles (CoP)
- Profile stored in a cookie on the client machine
- Browser sends profile to server upon page request
- Server returns page and updated profile in cookie
- Server does not log user activity
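The CoP round trip can be sketched as a stateless handler: the profile arrives in the request cookie, the server uses it, and the only place the updated profile goes is back into the response cookie. All details here are hypothetical assumptions for illustration: the profile is a recency-ordered keyword list capped at `k`, it is JSON encoded with URL-safe base64 so it fits in a cookie value, and the ad selection is a placeholder.

```python
import base64
import json


def encode_profile(profile: list[str]) -> str:
    """Serialize the profile for a cookie value (URL-safe base64 avoids ';', ',', spaces, quotes)."""
    return base64.urlsafe_b64encode(json.dumps(profile).encode("utf-8")).decode("ascii")


def decode_profile(cookie_value: str) -> list[str]:
    """Inverse of encode_profile; a missing or empty cookie means an empty profile."""
    if not cookie_value:
        return []
    return json.loads(base64.urlsafe_b64decode(cookie_value.encode("ascii")))


def handle_request(keyword: str, cookie_value: str, k: int = 5) -> tuple[str, str]:
    """Stateless CoP handler returning (ad, updated cookie value).

    Nothing is written server-side: the updated profile exists only in the response.
    """
    profile = decode_profile(cookie_value)
    # Placeholder targeting: use the most recent interest in the profile, else the page context.
    ad = f"ad for {profile[0]}" if profile else f"ad for {keyword}"
    # Incremental update: move the current keyword to the front and cap the profile at k entries.
    updated = [keyword] + [kw for kw in profile if kw != keyword]
    return ad, encode_profile(updated[:k])


ad, cookie = handle_request("shoes", "")       # first visit: no cookie yet
ad, cookie = handle_request("camera", cookie)  # browser sends the cookie back
print(ad)                      # ad for shoes
print(decode_profile(cookie))  # ['camera', 'shoes']
```

Because the server keeps no copy, editing or deleting the cookie is enough to edit or delete the profile; the trade-off is that the server must still be trusted not to log what it receives.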
Client-only Profiles
+ No plugins (Adnostic, RePRIV, Privad: users install plugins)
+ No major changes to serving infrastructure
+ Targeting server-side (advanced features/algorithms)
+ Profile update server-side (advanced features/algorithms)
- Must trust ad platform to comply with policy and not retain user data
  Debatable proposition for security community…
  …but Do Not Track already makes the same assumption
What will it cost compared to server-side tracking?
Comparison of Tracking Approaches
Incremental Profile Updates: Task
- How much does incremental update hurt?
  - Compare to profiles constructed on server from full history
  - May depend on the task (personalizing ads, content, search results)
- Representative task: predicting future ad clicks
  - Discriminates long-term user interests
  - Can be used for ad selection, ranking, CTR prediction, auction
- Bid Increments
  - Advertiser specifies an increment to their bid if the user has the keyword in their profile
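Bid increments tie profile contents directly to revenue: an increment is earned only when the keyword is actually in the profile. A minimal sketch, assuming bids in integer cents and assuming (the slide does not say) that increments for several matching keywords add up; `effective_bid_cents` is a hypothetical name.

```python
def effective_bid_cents(base_bid_cents: int,
                        increments_cents: dict[str, int],
                        profile: set[str]) -> int:
    """Base bid plus the advertiser's increment for every keyword found in the user's profile."""
    # Assumption: increments for multiple matching keywords are summed.
    bonus = sum(inc for keyword, inc in increments_cents.items() if keyword in profile)
    return base_bid_cents + bonus


increments = {"running shoes": 20, "marathon": 10}
print(effective_bid_cents(50, increments, {"running shoes", "hotel"}))  # 70
print(effective_bid_cents(50, increments, set()))                       # 50
```

A keyword missing from the profile silently forfeits its increment, which is why the study measures the fraction of future clicks whose keyword is in the profile.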
Incremental Profile Updates: Method [Bilenko and Richardson KDD-2011]
- Algorithm based on machine learning
  - Features based on behavior frequency/recency, context, etc.
  - ML function predicts p(click|keyword) using these features
- Select top-k keywords for profile
  - Keyword value is the incremental utility of ads not yet covered by the profile
  - Leads to a submodular optimization problem
  - Solved by an efficient, accurate approximate algorithm
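The "incremental utility of ads not yet covered" objective is a weighted coverage function, and the textbook way to approximate it under a top-k budget is greedy selection, which guarantees at least (1 - 1/e) of the optimum for monotone submodular objectives. The sketch below is that standard greedy, not necessarily the approximate algorithm of the KDD-2011 paper; `keyword_ads` (the ads each keyword would trigger) and `ad_utility` (standing in for values derived from the predicted p(click|keyword)) are hypothetical inputs.

```python
def greedy_profile(keyword_ads: dict[str, set[str]],
                   ad_utility: dict[str, float],
                   k: int) -> list[str]:
    """Greedily pick up to k keywords maximizing the total utility of the ads they cover."""
    selected: list[str] = []
    covered: set[str] = set()
    for _ in range(min(k, len(keyword_ads))):
        best_keyword, best_gain = None, 0.0
        for keyword, ads in keyword_ads.items():
            if keyword in selected:
                continue
            # Marginal gain: utility of this keyword's ads that the profile does not cover yet.
            gain = sum(ad_utility[ad] for ad in ads - covered)
            if gain > best_gain:
                best_keyword, best_gain = keyword, gain
        if best_keyword is None:
            break  # no remaining keyword adds uncovered utility
        selected.append(best_keyword)
        covered |= keyword_ads[best_keyword]
    return selected


keyword_ads = {"running": {"a1", "a2"}, "shoes": {"a1", "a2", "a3"},
               "marathon": {"a4"}, "sneakers": {"a2", "a3"}}
ad_utility = {"a1": 1.0, "a2": 1.0, "a3": 0.5, "a4": 0.8}
print(greedy_profile(keyword_ads, ad_utility, k=2))  # ['shoes', 'marathon']
```

Here "running" is worth 2.0 on its own but adds nothing once "shoes" is chosen, so the second slot goes to "marathon"; ranking keywords by standalone value would have wasted that slot.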
Incremental Updates: Study
- Two months of activity on the Bing search engine
  - 2.4 million users (randomly sampled from the total population)
  - Train predictor using the first 6 weeks
- Cookie contains
  - Profile: top-k keywords by predicted value
  - Cache: LRU policy
- Metric
  - Fraction of future clicks in profile (proportional to revenue gain)
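The cookie's cache and the evaluation metric can be sketched as follows, under stated assumptions: the cache is a plain LRU over recently seen keywords (default capacity 50, matching the cache size on the results slide), and, since the metric is proportional to revenue gain, the share of gain retained is taken as the ratio of the client-only profile's coverage to the server-side profile's coverage. `KeywordCache`, `click_coverage`, and `retained_gain` are hypothetical names.

```python
from collections import OrderedDict


class KeywordCache:
    """Fixed-size LRU cache of recently seen keywords, stored in the cookie next to the profile."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # keyword -> None, ordered least to most recently used

    def touch(self, keyword: str) -> None:
        if keyword in self._items:
            self._items.move_to_end(keyword)  # mark as most recently used
        else:
            self._items[keyword] = None
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict the least recently used keyword

    def keywords(self) -> set[str]:
        return set(self._items)


def click_coverage(future_click_keywords: list[str], profile: set[str]) -> float:
    """Fraction of future clicks whose keyword is already in the profile."""
    if not future_click_keywords:
        return 0.0
    hits = sum(1 for kw in future_click_keywords if kw in profile)
    return hits / len(future_click_keywords)


def retained_gain(client_coverage: float, server_coverage: float) -> float:
    """Share of the server-side gain kept by the client-only profile (coverage ratio)."""
    return client_coverage / server_coverage if server_coverage else 0.0


cache = KeywordCache(capacity=2)
for kw in ["shoes", "camera", "shoes", "hotel"]:
    cache.touch(kw)
print(sorted(cache.keywords()))  # ['hotel', 'shoes']
print(click_coverage(["shoes", "tv", "shoes", "pizza"], {"shoes", "camera"}))  # 0.5
```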
Incremental Updates: Results
- Retains 97-99% of gain vs. server-side tracking
- Requires only keywords in profile (50 in cache)
Conclusions
- Client-side tracking balances privacy and market efficiency
- Possible approach: CoP, which
  - Ensures user control over tracking
  - Requires only minor changes to existing infrastructure
  - Retains 97+% of revenue gains from ad targeting
- Should Do Not Track distinguish client- and server-side tracking?
  - 1st and 3rd parties are increasingly difficult to differentiate
THANKS!
Backup Slide
If I have to trust the server anyway, why not trust it to store my profile as well?
- Trusting it not to store is a lower bar than trusting it to properly handle the profile
- Storing profile on server = trusting any team with access to your profile to:
  - Know the policies
  - Correctly implement things like opt-out, retention, publication
  - Either never copy your history, or ensure your edits/deletions are propagated to all copies
  - Not share it with any other team that might not know these things
- Storing profile on client = trusting just the team that receives the profile to use it and throw it away
1st vs. 3rd Party
- Distinction is getting increasingly muddled
  - 1st-party data collection is becoming pervasive
  - 3rd-party collection can be tightly controlled by the advertiser
Regulatory Interest in Behavioral Advertising
- United States
  - Federal Trade Commission has proposed a regulatory framework calling for Do Not Track solutions
  - Legislation calls for Do Not Track solutions: US Senate, US House of Representatives, California Legislature
- Europe
  - Notice and consent prior to depositing cookies
Do Not Track Solutions
Do Not Track Solutions
[Table: Do Not Track mechanisms (blocking traffic, opt-out cookie, HTTP header) across Apple Safari, Google Chrome, Microsoft IE, and Mozilla Firefox]
Do Not Track solutions are built into each browser, except Google Chrome, where opt-out cookies are provided by a browser extension.