TiVo Suggestions: Predicting Viewer Affinity Using Collaborative Filtering
Kamal Ali (TiVo, Yahoo); Wijnand van Stam (TiVo)

Outline
- What is “TiVo”?
- Why Suggestions?
- Collaborative filtering background
- TiVo collaborative filtering data cycle
- Server-side learning
- Previous work
- Contributions

Contributions
- Large fielded system
  - Large number (3M) of users
  - Long-lived interaction with each user: >90 thumbs/user
  - 10^8 ratings over 300K shows
  - Very large in user-hours
- Distributed architecture
  - Server: throttle-able
  - Clients do the bulk of the work
- Privacy preservation
  - Privacy and distributed goals are aligned
  - No persistent memory of user on server

What is “TiVo”?
- TiVo = set-top TV box + program-guide service
- Pause & rewind live TV
- Linux OS
- Viewers can rate shows
- Suggestions (Q4 1999)

Why Suggestions?
- Connect users to shows they’ll like
- Predict the degree to which a viewer will like a TV show
- Produces a ranked list of upcoming shows
- Records shows if disk space is available

Filtering Background
Recommendation systems:
- Content-based: use “intrinsic” features such as genre, cast, director, writers, age, channel type, …
- Collaborative filtering: use other people’s ratings
- Combined, cascaded

Content isn’t sufficient
- Genres are few
- Text length is small

Data cycle
1. Collecting feedback (client box): thumbs up/down, recorded shows. Thumbs profile kept on TiVo.
2. TiVo calls server and uploads entire anonymized profile. Random ID generated for profile and stored on server. No persistent state (even in randomized form) on server for each viewer.
3. Server-side learning. Correlation pairs <s1,s2,r> on server.
4. Download pairs during some client-initiated calls. Correlation pairs on client.
5. Use correlations and thumbs profile to rate shows. Rated shows in sorted order.

Collaborative Filtering Model
- k-nearest neighbor over other rated, correlated shows
- Use pair-wise Pearson correlation
- Adjusted correlation for low support
- Use weighted linear combination

1. Collecting Feedback
- Explicit: thumbs up, down: -3 ... +3
- Implicit: user-initiated recording (treated as thumbs)

2. Privacy and Data Upload
- TiVo calls server daily
- Entire profile uploaded and given a temporary ID
- Server deletes old profiles: sliding window
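As a rough illustration of the upload step, the sketch below stores each uploaded profile under a fresh random ID and drops profiles older than a window. The class name `ProfileStore`, the ID scheme, and the window length are all hypothetical; the slides only state that profiles get a temporary ID and that old ones are deleted.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class ProfileStore:
    """Hypothetical server-side store for anonymized thumbs profiles."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # temp_id -> (upload time, thumbs profile)
        self._profiles: Dict[str, Tuple[float, Dict[str, int]]] = {}

    def upload(self, thumbs: Dict[str, int], now: Optional[float] = None) -> str:
        """Store a whole profile under a new random ID unrelated to the box."""
        now = time.time() if now is None else now
        temp_id = secrets.token_hex(16)
        self._profiles[temp_id] = (now, dict(thumbs))
        return temp_id

    def expire(self, now: Optional[float] = None) -> int:
        """Delete profiles that fell out of the sliding window."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        stale = [pid for pid, (ts, _) in self._profiles.items() if ts < cutoff]
        for pid in stale:
            del self._profiles[pid]
        return len(stale)

    def __len__(self) -> int:
        return len(self._profiles)


store = ProfileStore(window_seconds=100)
first_id = store.upload({"A": 3}, now=0)
second_id = store.upload({"A": 3}, now=90)
removed = store.expire(now=150)
```

Because every upload gets a new ID, two uploads from the same box cannot be linked through the ID alone.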

3.1 Server-side scaling
- 300,000 unique shows/week
- 10^11 pairs of shows
- 3M users
- Average of 90 thumbs/user: > 10^8 thumbs (ratings)
- Ratings are sparse in the pair space
- Don’t need to predict for very unpopular pairs
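The scale figures above can be checked with a few lines of arithmetic. Reading the 10^11 figure as the number of ordered show pairs is an assumption; the count of unordered pairs is about half of that.

```python
shows = 300_000
users = 3_000_000
thumbs_per_user = 90

# Ordered pairs of shows: 9 x 10^10, on the order of 10^11.
ordered_pairs = shows * shows

# Unordered pairs of distinct shows: roughly 4.5 x 10^10.
unordered_pairs = shows * (shows - 1) // 2

# Total ratings: 2.7 x 10^8, consistent with "> 10^8 thumbs".
total_thumbs = users * thumbs_per_user

# Ratings are a tiny fraction of the pair space, hence "sparse".
ratings_per_pair = total_thumbs / ordered_pairs
```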

3.2 Server-side Learning
- Build pair-wise item/item correlations on server
- Use simple Pearson pair-wise correlation
- 7 rating levels per show [-3 … +3]
  - Only need to maintain a 7 × 7 array of counts per pair
  - Efficient: CPU, memory
- Compute r-to-z transform to compute the confidence interval
- Support-penalized degree of correlation: lower bound of the confidence interval
  - Distinguishes r = 0.8 for S=10 versus S=1000
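The sketch below shows one way the 7 × 7 count array could be turned into a Pearson correlation and a support-penalized lower bound. The function names, the 95% level (`z_crit = 1.96`), the clamping of r, and the handling of very small support are assumptions, not details from the slides.

```python
import math
from typing import List, Tuple

LEVELS = list(range(-3, 4))  # the seven thumbs levels, -3 .. +3


def new_counts() -> List[List[int]]:
    """One 7 x 7 table of co-rating counts for a pair of shows."""
    return [[0] * 7 for _ in range(7)]


def add_observation(counts: List[List[int]], a: int, b: int) -> None:
    """Record one viewer who rated show 1 as `a` and show 2 as `b`."""
    counts[a + 3][b + 3] += 1


def pearson_from_counts(counts: List[List[int]]) -> Tuple[float, int]:
    """Return (Pearson r, support S) computed from the count table alone."""
    n = sx = sy = sxx = syy = sxy = 0
    for i, x in enumerate(LEVELS):
        for j, y in enumerate(LEVELS):
            c = counts[i][j]
            n += c
            sx += c * x
            sy += c * y
            sxx += c * x * x
            syy += c * y * y
            sxy += c * x * y
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x == 0 or var_y == 0:
        return 0.0, n  # correlation undefined; treated as 0 here (assumption)
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y), n


def lower_bound(r: float, support: int, z_crit: float = 1.96) -> float:
    """Lower end of the Fisher r-to-z confidence interval for r."""
    if support <= 3:
        return -1.0  # too little support to bound r (assumption)
    r = max(-0.999999, min(0.999999, r))  # keep atanh finite
    z = math.atanh(r)
    return math.tanh(z - z_crit / math.sqrt(support - 3))


counts = new_counts()
for a, b in [(3, 3), (-3, -3), (1, 1)]:
    add_observation(counts, a, b)
r, support = pearson_from_counts(counts)

weak = lower_bound(0.8, 10)      # about 0.34
strong = lower_bound(0.8, 1000)  # about 0.78
```

With these assumptions, the same r = 0.8 is penalized heavily at S=10 and only slightly at S=1000, which is the distinction the slide points to.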

3.3 Throttled Server-side Architecture
- Log Collectors 1..m
  - Log Collector 1: boxes 0..100K
  - Log Collector m: boxes 100K(m-1)..100Km
- By-series Counters 1..n
  - By-series Counter 1: series 0..30K
  - By-series Counter n: series 30K(n-1)..30Kn
- By-series-pair Counter and Correlations Calc. 1..P
- Transmit correlation pairs to TiVo clients
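The range-based partitioning in this diagram can be sketched as simple integer division. The half-open ranges, the 1-based numbering, and the function names are assumptions; the slides give only the range sizes.

```python
BOXES_PER_COLLECTOR = 100_000  # range size shown for log collectors
SERIES_PER_COUNTER = 30_000    # range size shown for by-series counters


def log_collector_for(box_index: int) -> int:
    """1-based log collector handling this box (assumes half-open ranges)."""
    return box_index // BOXES_PER_COLLECTOR + 1


def series_counter_for(series_index: int) -> int:
    """1-based by-series counter handling this series (same assumption)."""
    return series_index // SERIES_PER_COUNTER + 1
```

Adding hardware under this scheme means adding another collector or counter for the next range.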

3.4 Server-side throttling
- Thresholds: min_single (150), min_pair (100)
- Throttle-able:
  - More HW available
  - Increasing TiVo population
  - Go deeper into distribution tail
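A minimal sketch of how the two thresholds might prune work, assuming `min_single` is the minimum number of viewers who rated a show and `min_pair` is the minimum number of viewers who rated both shows of a pair. That reading, and the function name `count_pairs`, are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def count_pairs(
    profiles: List[Dict[str, int]],
    min_single: int = 150,
    min_pair: int = 100,
) -> Dict[Tuple[str, str], int]:
    """Count co-rated show pairs, skipping unpopular shows and pairs."""
    # Pass 1: how many profiles rated each show.
    single = Counter()
    for profile in profiles:
        single.update(profile.keys())
    popular = {show for show, n in single.items() if n >= min_single}

    # Pass 2: count pairs only among shows that passed min_single.
    pair = Counter()
    for profile in profiles:
        kept = sorted(show for show in profile if show in popular)
        pair.update(combinations(kept, 2))

    # Keep pairs with enough support to be worth correlating.
    return {p: n for p, n in pair.items() if n >= min_pair}


toy = [{"A": 3, "B": 2, "C": -1}, {"A": 1, "B": 3}, {"A": 2, "B": -3}]
kept_pairs = count_pairs(toy, min_single=2, min_pair=2)
```

Lowering either threshold admits rarer shows and pairs, which corresponds to going deeper into the distribution tail as more hardware becomes available.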

Details
- Pearson r
- Weighted average
- r-to-z transform (Fisher)
- Standard:
- Lower bound of confidence interval:
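The formulas for this slide did not survive in the transcript. For reference, the textbook forms of the named quantities are given below; they are standard definitions, not a recovery of the authors' exact notation. Here x_i and y_i are the thumbs the i-th co-rating viewer gave to the two shows, S is the support (number of such viewers), z_{α/2} is the normal critical value for the chosen confidence level, and the weighted-average form and the symbols \hat{t}_s, N_k(s) are assumptions.

```latex
% Pearson correlation between two shows over their S co-rating viewers
r = \frac{\sum_{i=1}^{S} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{S} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{S} (y_i - \bar{y})^2}}

% One common weighted-average predictor over the k nearest rated shows (assumed form)
\hat{t}_s = \frac{\sum_{j \in N_k(s)} r_{s,j}\, t_j}{\sum_{j \in N_k(s)} \lvert r_{s,j} \rvert}

% Fisher r-to-z transform
z = \tfrac{1}{2} \ln\frac{1 + r}{1 - r} = \operatorname{artanh}(r)

% Standard error of z
\sigma_z = \frac{1}{\sqrt{S - 3}}

% Lower bound of the confidence interval, mapped back to the r scale
r_L = \tanh\!\left(z - z_{\alpha/2}\,\sigma_z\right)
```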

4. Download to clients
- 28K pairs sent to client (320kb)
- Correlations between old shows don’t change fast
- New shows: want to do it faster

5. Client-side processing
- Ratings must not cause video glitching!
- 2am: TiVo re-rates all shows
- Collaborative: k-nearest neighbor
- Content-based: Naïve Bayes

Previous Work
- User-user or item-item: Sarwar et al.
- Form of model: k-nearest neighbor, Bayes nets (Breese et al.), Factor Analysis (Canny)
- Similarity/distance function
  - Pearson (subsumes cosine)
  - TFIDF corrections (Salton et al.)
  - User amplification
- Combination functions: k-NN, Bayes nets, ...
- Evaluation criteria: MAE, Spearman rank correlation

Contributions
- Large fielded system
  - Large number (3M) of users
  - Long-lived interaction with each user: >90 thumbs/user
  - 10^8 ratings over 300K shows
  - Very large in user-hours
- Distributed architecture
  - Server: throttle-able
  - Clients do the actual suggestion calculations
- Privacy preservation
  - Privacy and distributed goals are aligned
  - No persistent memory of user on server
