1
TiVo Suggestions: Predicting Viewer Affinity Using Collaborative Filtering
Kamal Ali – TiVo, Yahoo
Wijnand van Stam – TiVo
2
Outline
- What is “TiVo”?
- Why Suggestions?
- Collaborative filtering background
- TiVo collaborative filtering data cycle
- Server-side learning
- Previous work
- Contributions
3
Contributions
- Large fielded system
  - Large number of users (3M)
  - Long-lived interaction with each user: >90 thumbs (ratings) per user
  - 10^8 ratings over 300K shows
  - Very large in user-hours
- Distributed architecture
  - Server: throttle-able
  - Clients do the bulk of the work
- Privacy preservation
  - Privacy and distribution goals are aligned
  - No persistent memory of the user on the server
4
What is “TiVo”?
TiVo = set-top TV box + program-guide service
- Pause & rewind live TV
- Linux OS
- Viewers can rate shows
- Suggestions: Q4 1999
5
Why Suggestions?
Connect users to shows they’ll like
- Predict the degree to which a viewer will like a TV show
- Produce a ranked list of upcoming shows
- Record shows if disk space is available
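The last two bullets describe a simple flow: sort upcoming shows by predicted affinity, then record from the top while space lasts. A minimal Python sketch, assuming a greedy policy, a per-show size estimate (`size_gb`) and a score cut-off (`min_score`); `UpcomingShow` and `plan_recordings` are hypothetical names, not TiVo's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class UpcomingShow:
    title: str
    predicted: float  # predicted affinity, e.g. on the -3..+3 thumbs scale
    size_gb: float    # hypothetical estimate of the disk space a recording needs


def plan_recordings(shows, free_gb, min_score=0.0):
    """Rank upcoming shows by predicted affinity, then greedily pick
    recordings from the top of the list while disk space remains.

    `min_score` is a hypothetical cut-off: shows predicted at or below it
    are ranked but never recorded.
    """
    ranked = sorted(shows, key=lambda s: s.predicted, reverse=True)
    to_record = []
    for show in ranked:
        if show.predicted > min_score and show.size_gb <= free_gb:
            to_record.append(show)
            free_gb -= show.size_gb
    return ranked, to_record
```

Greedy filling keeps the highest-ranked shows first; a show that does not fit is skipped rather than evicting anything already scheduled.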
6
Filtering Background
Recommendation systems:
- Content-based: use “intrinsic” features such as genre, cast, director, writers, age, channel type, …
- Collaborative filtering: use other people’s ratings
- Combined or cascaded: hybrids of the two
7
Content isn’t sufficient
- Genres are few
- Descriptive text is short
8
Data cycle
1. Collecting feedback: thumbs up/down and recorded shows form the Thumbs profile on the TiVo client box.
2. The TiVo calls the server and uploads its entire anonymized profile. A random ID is generated for the profile and stored on the server; no persistent state (even in randomized form) is kept on the server for each viewer.
3. Server-side learning produces correlation pairs <s1,s2,r> on the server.
4. The client downloads the pairs during some client-initiated calls, so the correlation pairs reside on the client.
5. The client uses the correlations and the Thumbs profile to rate shows, yielding rated shows in sorted order.
9
Collaborative Filtering Model
- k-nearest neighbor over other rated, correlated shows
- Pair-wise Pearson correlation
- Correlation adjusted for low support
- Weighted linear combination
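The model above (k nearest neighbors among the user's rated shows, combined linearly with correlation weights) can be sketched as follows. Assumptions: the correlations are already support-adjusted (for example the lower confidence bound described under Server-side Learning), only positively correlated neighbors are used, and `predict_rating` with its argument layout is hypothetical.

```python
def predict_rating(target, profile, correlations, k=3):
    """Predict a user's rating of `target` from their own thumbs.

    profile: dict show -> thumbs rating in -3..+3 for shows the user rated
    correlations: dict frozenset({s1, s2}) -> support-adjusted correlation
    Returns None when no positively correlated rated show exists.
    """
    neighbors = []
    for show, rating in profile.items():
        r = correlations.get(frozenset((target, show)))
        if r is not None and r > 0:  # assumption: ignore non-positive correlations
            neighbors.append((r, rating))
    if not neighbors:
        return None
    neighbors.sort(reverse=True)  # strongest correlations first
    top = neighbors[:k]
    # Weighted linear combination of the neighbors' ratings.
    return sum(r * rating for r, rating in top) / sum(r for r, _ in top)
```

With k = 2 and neighbors correlated at 0.8 (rated +3) and 0.5 (rated -2), the prediction is (2.4 - 1.0) / 1.3, about 1.08.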
10
1. Collecting Feedback
- Explicit: thumbs up/down, on a scale of -3 … +3
- Implicit: user-initiated recording
11
2. Privacy and Data Upload
- TiVo calls the server daily
- The entire profile is uploaded and given a temporary ID
- The server deletes old profiles: sliding window
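The upload policy (temporary random IDs, old profiles deleted over a sliding window) can be sketched as a small server-side store. This is an illustration only: `ProfileStore`, generating the ID on the server, and the window length are assumptions, not details from the slides.

```python
import secrets
import time


class ProfileStore:
    """Hypothetical server-side store of uploaded profiles.

    Each upload gets a fresh random ID that is not derived from the box,
    and profiles older than the sliding window are deleted.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._profiles = {}  # temp_id -> (upload_time, profile)

    def upload(self, profile, now=None):
        now = time.time() if now is None else now
        temp_id = secrets.token_hex(16)  # random, unlinkable temporary ID
        self._profiles[temp_id] = (now, dict(profile))
        return temp_id

    def expire(self, now=None):
        """Drop profiles that fell out of the window; return how many."""
        now = time.time() if now is None else now
        cutoff = now - self.window
        stale = [tid for tid, (t, _) in self._profiles.items() if t < cutoff]
        for tid in stale:
            del self._profiles[tid]
        return len(stale)

    def profiles(self):
        return [p for _, p in self._profiles.values()]
```

Because IDs are fresh per upload, the server can count co-ratings across the window without keeping a durable identity for any viewer.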
12
3.1 Server-side scaling
- 300,000 unique shows/week
- 10^11 pairs of shows
- 3M users
- Average of 90 thumbs/user: >10^8 thumbs (ratings)
- Ratings are sparse in the pair space
- No need to predict for very unpopular pairs
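The slide's orders of magnitude follow from simple arithmetic. A worked version, assuming “10^11 pairs” means the square of the weekly show count; the dense-counter line uses the 7 × 7 count array per pair from Server-side Learning and shows why sparsity matters.

```latex
\begin{aligned}
\text{ordered show pairs} &\approx \left(3\times10^{5}\right)^{2} = 9\times10^{10} \approx 10^{11}\\
\text{unordered show pairs} &= \binom{3\times10^{5}}{2} \approx 4.5\times10^{10}\\
\text{dense } 7\times7 \text{ counters} &\approx 4.5\times10^{10}\times 49 \approx 2.2\times10^{12}\\
\text{ratings} &\approx \left(3\times10^{6}\right)\times 90 = 2.7\times10^{8} > 10^{8}
\end{aligned}
```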
13
3.2 Server-side Learning
Building pair-wise item/item correlations on the server
- Use simple pair-wise Pearson correlation
- 7 rating levels per show [-3 … +3]
- Only a 7 × 7 array of counts needs to be maintained per pair
- Efficient in CPU and memory
- Compute the r-to-z transform to compute a confidence interval
- Support-penalized degree of correlation: lower bound of the confidence interval
- Distinguishes r = 0.8 at support S = 10 from r = 0.8 at S = 1000
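Because each show has only 7 rating levels, Pearson r for a pair can be recovered from its 7 × 7 count array alone, and the Fisher r-to-z transform then gives a support-penalized lower bound. A minimal sketch; the 95% critical value (`z_crit=1.96`) and the function names are assumptions.

```python
import math

LEVELS = range(-3, 4)  # the 7 thumbs levels, -3..+3


def pearson_from_counts(counts):
    """counts[i][j] = number of users who rated show s1 at LEVELS[i]
    and show s2 at LEVELS[j]. Returns (r, support)."""
    n = sx = sy = sxx = syy = sxy = 0
    for i, x in enumerate(LEVELS):
        for j, y in enumerate(LEVELS):
            c = counts[i][j]
            n += c
            sx += c * x
            sy += c * y
            sxx += c * x * x
            syy += c * y * y
            sxy += c * x * y
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if n < 2 or var_x == 0 or var_y == 0:
        return 0.0, n  # correlation undefined; treat as uncorrelated
    return cov / math.sqrt(var_x * var_y), n


def lower_bound(r, n, z_crit=1.96):
    """Lower end of the Fisher r-to-z confidence interval for r at support n."""
    if n <= 3:
        return -1.0  # too little support for any confidence
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r)
    return math.tanh(z - z_crit / math.sqrt(n - 3))
```

For r = 0.8 this gives a lower bound of roughly 0.34 at S = 10 but roughly 0.78 at S = 1000, which is the distinction the slide draws.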
14
3.3 Throttled Server-side Architecture
- Log collectors 1 … m: collector m receives uploads from boxes 100K(m-1) … 100Km
- By-series counters 1 … n: counter n handles series 30K(n-1) … 30Kn
- By-series-pair counters and correlation calculators 1 … P
- Transmit correlation pairs to TiVo clients
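The diagram partitions work by range: 100K boxes per log collector and 30K series per by-series counter. A sketch of the routing functions, assuming 0-based sequential box indices and series IDs; how pairs are spread over the P pair counters is not stated, so hashing the unordered pair is an assumption, as are all function names.

```python
import zlib

BOXES_PER_COLLECTOR = 100_000  # collector m handles boxes 100K(m-1)..100Km
SERIES_PER_COUNTER = 30_000    # counter n handles series 30K(n-1)..30Kn


def log_collector_for(box_index):
    """1-based log collector that receives uploads from this box."""
    return box_index // BOXES_PER_COLLECTOR + 1


def series_counter_for(series_id):
    """1-based by-series counter responsible for this series."""
    return series_id // SERIES_PER_COUNTER + 1


def pair_counter_for(s1, s2, num_pair_counters):
    """1-based by-series-pair counter (hypothetical hash partitioning).

    The pair is sorted so (s1, s2) and (s2, s1) land on the same counter;
    crc32 is used instead of hash() so routing is stable across processes.
    """
    a, b = sorted((s1, s2))
    return zlib.crc32(f"{a}:{b}".encode()) % num_pair_counters + 1
```

Range partitioning lets capacity grow by adding collectors or counters at the end of the range, which fits the throttle-able design.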
15
3.4 Server-side throttling
- Thresholds: min_single (150), min_pair (100)
- Throttle-able: as more hardware becomes available and the TiVo population grows, the server can go deeper into the tail of the distribution
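One reading of the two thresholds is as support cut-offs: a show needs at least `min_single` ratings, and a pair at least `min_pair` co-ratings, before it is counted. The sketch below assumes that interpretation; `eligible_pairs` and the profile layout are hypothetical. Lowering the thresholds is then the knob for going deeper into the tail.

```python
from collections import Counter
from itertools import combinations


def eligible_pairs(profiles, min_single=150, min_pair=100):
    """Return {(s1, s2): co-rating count} for pairs that pass both thresholds.

    profiles: iterable of dicts mapping show -> thumbs rating.
    """
    profiles = list(profiles)
    single = Counter()
    for profile in profiles:
        single.update(profile.keys())
    popular = {s for s, c in single.items() if c >= min_single}

    pair_counts = Counter()
    for profile in profiles:
        shows = sorted(s for s in profile if s in popular)
        pair_counts.update(combinations(shows, 2))  # each unordered pair once
    return {p: c for p, c in pair_counts.items() if c >= min_pair}
```

Filtering single shows first avoids ever enumerating pairs that involve unpopular shows, which is where most of the 10^11 candidate pairs live.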
16
Details
- Pearson r
- Weighted average
- r-to-z transform (Fisher)
- Standard:
- Lower bound of confidence interval:
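The slide lists these quantities by name only. For reference, the standard textbook forms are below; the notation is ours (x_u and y_u are user u's thumbs for the two shows, S is the support, N_k(t) the k rated shows most correlated with target t, and w_st a support-adjusted weight such as r_low). The normalization of the weighted average is an assumption, not necessarily the deck's exact formula.

```latex
\begin{aligned}
r &= \frac{\sum_{u}(x_u-\bar{x})(y_u-\bar{y})}{\sqrt{\sum_{u}(x_u-\bar{x})^{2}}\,\sqrt{\sum_{u}(y_u-\bar{y})^{2}}}
  && \text{(Pearson, over the $S$ users who rated both shows)}\\
z &= \operatorname{artanh} r = \tfrac{1}{2}\ln\frac{1+r}{1-r},
  \qquad \operatorname{SE}(z) = \frac{1}{\sqrt{S-3}} && \text{(Fisher $r$-to-$z$)}\\
r_{\text{low}} &= \tanh\!\left(z - \frac{z_{\alpha/2}}{\sqrt{S-3}}\right) && \text{(lower bound of the confidence interval)}\\
\hat{x}_t &= \frac{\sum_{s\in N_k(t)} w_{st}\,x_s}{\sum_{s\in N_k(t)} \lvert w_{st}\rvert} && \text{(weighted average over the $k$ nearest rated shows)}
\end{aligned}
```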
17
4. Download to clients
- 28K pairs sent to each client (320kb)
- Correlations between old shows don’t change quickly
- New shows: correlations should reach clients faster
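A compact fixed-width record keeps the download small. This sketch assumes a hypothetical wire format (two 32-bit show IDs plus the correlation quantized to a signed 16-bit integer, 10 bytes per pair); the slides do not describe the actual encoding.

```python
import struct

# Hypothetical wire format: show1 (uint32), show2 (uint32), r scaled to int16.
PAIR = struct.Struct("<IIh")  # 10 bytes per pair, little-endian, no padding
SCALE = 32767                 # maps r in [-1, 1] onto the int16 range


def encode_pairs(pairs):
    """pairs: iterable of (show1, show2, r) with -1 <= r <= 1."""
    out = bytearray()
    for s1, s2, r in pairs:
        out += PAIR.pack(s1, s2, round(r * SCALE))
    return bytes(out)


def decode_pairs(blob):
    """Inverse of encode_pairs, up to quantization error of about 1.5e-5."""
    return [(s1, s2, q / SCALE) for s1, s2, q in PAIR.iter_unpack(blob)]
```

Under this format 28K pairs take 280,000 bytes, the same order of magnitude as the slide's 320kb if that figure is in kilobytes.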
18
5. Client-side processing
- Rating computation must not cause video glitching!
- 2am: TiVo re-rates all shows
- Collaborative: k-nearest neighbor
- Content-based: naive Bayes
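The content-based side uses naive Bayes. A minimal Bernoulli naive Bayes sketch with Laplace smoothing, assuming shows are described by sets of content features (genre, cast, …) and thumbs are binarized into like (> 0) and dislike (< 0); the features, the binarization and `ContentNB` are assumptions, and how its score is combined with the collaborative prediction is not shown.

```python
import math
from collections import Counter, defaultdict


class ContentNB:
    """Bernoulli naive Bayes over content features, like vs dislike."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def fit(self, rated_shows):
        """rated_shows: iterable of (features: set of str, thumbs in -3..+3)."""
        for features, thumbs in rated_shows:
            if thumbs == 0:
                continue  # neutral ratings carry no like/dislike signal
            label = "like" if thumbs > 0 else "dislike"
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab |= set(features)
        return self

    def log_odds_like(self, features):
        """log P(like | features) - log P(dislike | features)."""
        total = sum(self.class_counts.values())
        score = {}
        for label in ("like", "dislike"):
            n = self.class_counts[label]
            s = math.log((n + 1) / (total + 2))  # smoothed prior
            for f in self.vocab:
                p = (self.feature_counts[label][f] + 1) / (n + 2)  # smoothed P(f | label)
                s += math.log(p if f in features else 1 - p)
            score[label] = s
        return score["like"] - score["dislike"]
```

Positive log-odds favor “like”; the model is cheap enough to re-score every upcoming show in a nightly batch.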
19
Previous Work
- User-user or item-item: Sarwar et al.
- Form of model: k-nearest neighbor, Bayes nets (Breese et al.), factor analysis (Canny)
- Similarity/distance function: Pearson (subsumes cosine), TFIDF corrections (Salton et al.), user amplification
- Combination functions: k-NN, Bayes nets, …
- Evaluation criteria: MAE, Spearman rank correlation
20
Contributions
- Large fielded system
  - Large number of users (3M)
  - Long-lived interaction with each user: >90 thumbs (ratings) per user
  - 10^8 ratings over 300K shows
  - Very large in user-hours
- Distributed architecture
  - Server: throttle-able
  - Clients do the actual suggestion calculations
- Privacy preservation
  - Privacy and distribution goals are aligned
  - No persistent memory of the user on the server