1
Learning Bit by Bit Collaborative Filtering/Recommendation Systems
2
Collaborative Filtering
3
Collaborative Filtering - Definition
Traversing a large body of information contributed “collaboratively” by many different people to find similarities between users or things, then bootstrapping those similarities to make recommendations to users.
7
Key Components
- tracking user behavior
- storing this data long term
- mining this data for patterns (similarity)
- predicting future behavior (recommendation)
8
Data!
9
Similarity
10
Similarity is a quantity that reflects the strength of relationship between two objects or two features.
11
Similarity
Ratings: movies, songs, restaurants…
The user did the hard part: quantifying their feelings
12
Similarity
- Use the info to inform others
- Notice trends between users
- Suggest new content, products …
13
Similarity Feature Space
15
Cull relevant information from data to create a limited portrait of a person, thing, event, or behavior.
17
Euclidean Distance
Distance between point A and point B: $d(A, B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2}$
18
Distance between Rose and Seymour: $\sqrt{(3 - 2)^2 + (4 - 2)^2} = \sqrt{5} \approx 2.236$
19
Euclidean Distance
$d(A, B) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + (A_3 - B_3)^2 + (A_4 - B_4)^2 + \dots + (A_n - B_n)^2}$
where n is the number of dimensions or features you are looking at
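The n-dimensional formula can be sketched in a few lines of Python. This is a minimal illustration, not code from the deck: the function name `euclidean_distance` is an assumption, and the two points reuse the Rose and Seymour coordinates from the earlier slide.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("both points need the same number of features")
    # Sum the squared differences over every dimension, then take one root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Coordinates from the Rose and Seymour slide.
rose = [3, 4]
seymour = [2, 2]
print(round(euclidean_distance(rose, seymour), 3))  # 2.236
```

The same function works for any number of features, as long as both points list them in the same order.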
21
Similarity as Distance
Somewhat reciprocal: distance is a measure of dissimilarity
naïve similarity = 1 / (1 + distance)
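One common way to turn a distance into a similarity is 1 / (1 + distance), which gives 1 for identical points and approaches 0 as points move apart. The sketch below assumes that form; the name `naive_similarity` is illustrative and is not taken from the iweb2 demo code.

```python
import math

def naive_similarity(a, b):
    """Map Euclidean distance into (0, 1]: identical points score 1."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Adding 1 in the denominator avoids dividing by zero when distance is 0.
    return 1.0 / (1.0 + distance)

print(naive_similarity([3, 4], [3, 4]))            # 1.0
print(round(naive_similarity([3, 4], [2, 2]), 3))  # 0.309
```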
22
Demo SimilarityMetrics in iweb2.ch3.collaborative.data
23
Pearson Correlation
24
[Figure: scatter plots illustrating positive correlation, negative correlation, and no correlation; panels labeled A, B, C, D]
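A Pearson correlation between two users' ratings of the same items can be sketched as follows. This is a plain-Python illustration under stated assumptions: the function name `pearson` is hypothetical, the sample ratings are made up, and returning 0.0 when one user's ratings have no variance is a convenience choice rather than part of the definition.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists, in [-1, 1]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a user who rates everything the same as uncorrelated.
        return 0.0
    return covariance / math.sqrt(var_x * var_y)

# Made-up ratings: the second user rates consistently higher but in step
# with the first, so the correlation is strongly positive.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
# Reversed tastes give a strongly negative correlation.
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```

Because the means are subtracted first, a user who rates everything one or two points higher than another still correlates strongly with them, which is the normalizing effect the summary slide attributes to Pearson.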
25
Jaccard Index
Ratio of the intersection to the union of 2 sets: points in agreement / total points
User1: liked Movie1, disliked Movie2, liked Movie3, liked Movie4, disliked Movie5
User2: liked Movie1, disliked Movie2, disliked Movie3, disliked Movie4, liked Movie5
Agreement: {Movie1, Movie2} / {Movie1, Movie2, Movie3, Movie4, Movie5} = 2/5
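The slide's example counts the movies on which both users gave the same verdict and divides by the total number of movies. A minimal sketch of that calculation follows, alongside the set-based Jaccard index applied to liked movies only, for comparison. The function names and the `True`/`False` encoding of liked/disliked are assumptions made for illustration.

```python
def agreement_ratio(ratings_a, ratings_b):
    """Share of all rated items on which both users gave the same verdict."""
    items = set(ratings_a) | set(ratings_b)
    if not items:
        return 0.0
    agreed = sum(
        1 for item in items
        if item in ratings_a and item in ratings_b
        and ratings_a[item] == ratings_b[item]
    )
    return agreed / len(items)

def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union of two sets."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# True = liked, False = disliked, as on the slide.
user1 = {"Movie1": True, "Movie2": False, "Movie3": True, "Movie4": True, "Movie5": False}
user2 = {"Movie1": True, "Movie2": False, "Movie3": False, "Movie4": False, "Movie5": True}

print(agreement_ratio(user1, user2))  # 0.4 (agree on Movie1 and Movie2 out of 5)

liked1 = {movie for movie, liked in user1.items() if liked}
liked2 = {movie for movie, liked in user2.items() if liked}
print(jaccard(liked1, liked2))  # 0.25 (only Movie1 is liked by both, out of 4 liked overall)
```

The two numbers differ because the agreement ratio counts shared dislikes as agreement, while the set-based index only looks at membership in the liked sets.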
26
Similarity Metrics Summary
Euclidean Distance: General purpose
Pearson: Normalizing data, finding a relationship
Jaccard: Categorical data
27
User-based vs. Item-based
28
Recommendations: User-based
- Find items similar users liked
- Weighted average to predict a rating for a user
29
Weights
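A user-based prediction can be sketched as a weighted average: each other user's rating of the item is weighted by how similar that user is to the target user. The code below is an illustration only. The names `predict_rating` and `shared_item_similarity`, the sample users, and the choice of a distance-based similarity over co-rated items are all assumptions, not the deck's demo code.

```python
import math

def shared_item_similarity(a, b):
    """1 / (1 + Euclidean distance) over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    return 1.0 / (1.0 + distance)

def predict_rating(target, item, ratings, similarity):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        weight = similarity(ratings[target], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    # No usable neighbours means no prediction.
    return weighted_sum / weight_total if weight_total > 0 else None

# Hypothetical ratings: ann has not rated m3 yet.
ratings = {
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "cat": {"m1": 1, "m2": 5, "m3": 2},
}
# bob agrees with ann exactly, so his rating of 4 dominates cat's rating of 2.
print(predict_rating("ann", "m3", ratings, shared_item_similarity))
```

Passing the similarity function as a parameter makes it easy to swap in a different metric. A metric that can go negative, such as Pearson, would need extra handling of the weights that this sketch does not attempt.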
31
Recommendations: Item-based Similarity between items is used instead of similarity between users
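To compare items instead of users, the same ratings can be flipped so that each item maps to the users who rated it, and the similarity metric is then applied to pairs of items. The following is a minimal sketch with hypothetical names (`invert`, `shared_similarity`, `most_similar_items`) and made-up ratings.

```python
import math
from collections import defaultdict

def invert(ratings):
    """Turn {user: {item: score}} into {item: {user: score}}."""
    by_item = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, score in user_ratings.items():
            by_item[item][user] = score
    return dict(by_item)

def shared_similarity(a, b):
    """1 / (1 + Euclidean distance) over the keys both dicts share."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)

def most_similar_items(item, ratings, top_n=3):
    """Rank the other items by how similarly the same users rated them."""
    by_item = invert(ratings)
    scores = [
        (shared_similarity(by_item[item], other_ratings), other)
        for other, other_ratings in by_item.items()
        if other != item
    ]
    scores.sort(reverse=True)
    return scores[:top_n]

# Hypothetical ratings: m1 and m2 are rated identically by every user.
ratings = {
    "ann": {"m1": 5, "m2": 5, "m3": 1},
    "bob": {"m1": 4, "m2": 4, "m3": 2},
    "cat": {"m1": 2, "m2": 2, "m3": 5},
}
print(most_similar_items("m1", ratings))  # m2 ranks first with similarity 1.0
```

In practice the item-to-item scores would be computed once and stored, which matches the stored index of item similarities described in the comparison slide.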
32
Recommendations: difference
33
User-based
34
Item-based
35
Types of Collaborative Filtering
User-Based
- index of user similarities is stored
- Other users similar to you liked X
Item-Based
- index of item similarities is stored
- Other people who liked X also liked Y
- faster for large data sets with less overlapping data between users (sparse)
36
Testing
- Almost impossible to intuit accuracy
- Save a portion of the known data for testing
- Ex.: if you have 100 users with 10 ratings each, hold out a random 10% of the ratings and spot-check the accuracy of the predicted ratings against them
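The hold-out idea can be sketched as follows: flatten the ratings, set aside a random 10%, and measure how far predictions fall from the held-out values. Everything here is illustrative. The function names, the synthetic random ratings, the use of root-mean-square error, and the global-mean baseline predictor are assumptions, not something the slides specify.

```python
import math
import random

def split_ratings(ratings, test_fraction=0.1, seed=0):
    """Randomly split {user: {item: score}} into train and test triples."""
    triples = [
        (user, item, score)
        for user, user_ratings in ratings.items()
        for item, score in user_ratings.items()
    ]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(triples)
    n_test = round(len(triples) * test_fraction)
    return triples[n_test:], triples[:n_test]

def rmse(predict, test):
    """Root-mean-square error of predict(user, item) on held-out triples."""
    squared = [(predict(user, item) - score) ** 2 for user, item, score in test]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic data matching the slide's example: 100 users, 10 ratings each.
rng = random.Random(42)
ratings = {
    f"user{u}": {f"item{i}": rng.randint(1, 5) for i in range(10)}
    for u in range(100)
}
train, test = split_ratings(ratings)
print(len(train), len(test))  # 900 100

# Baseline predictor: always guess the mean of the training ratings.
global_mean = sum(score for _, _, score in train) / len(train)
print(rmse(lambda user, item: global_mean, test))
```

A real recommender would replace the baseline lambda with its own prediction function, built only from the training triples, and its error could then be compared against this baseline.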