1
3/4: The Zombie Day
–The feedback thingie..
–Last bits of clustering
–Review of any questions etc. (assuming I actually remember..)
–Filtering
2
Filtering and Recommender Systems: Content-based and Collaborative. Some of the slides are based on Mooney's slides.
3
Personalization Recommenders are instances of personalization software. Personalization concerns adapting to the individual needs, interests, and preferences of each user. Includes: –Recommending –Filtering –Predicting (e.g. form or calendar appt. completion) From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
4
Feedback & Prediction/Recommendation
Traditional IR has a single user, probably working in single-shot mode
–Relevance feedback…
Web search engines have:
–Users working continually -> User profiling
–A profile is a "model" of the user (and also relevance feedback)
–Many users -> Collaborative filtering
–Propagate user preferences to other users… (You know this one)
5
Recommender Systems in Use Systems for recommending items (e.g. books, movies, CD’s, web pages, newsgroup messages) to users based on examples of their preferences. Many on-line stores provide recommendations (e.g. Amazon, CDNow). Recommenders have been shown to substantially increase sales at on-line stores.
6
Feedback Detection
Non-intrusive (observed behavior):
–Click certain pages in a certain order while ignoring most pages.
–Read some clicked pages longer than other clicked pages.
–Save/print certain clicked pages.
–Follow some links in clicked pages to reach more pages.
–Buy items / put them in wish-lists/shopping carts.
Intrusive:
–Explicitly ask users to rate items/pages.
7
3/11
–Midterm returned
–Two talks of interest tomorrow:
–Louiqa Raschid @ 10am
–Zaiqing Nie @ 2pm
8
Midterm (In-class)
Overall class: Max 61.5, Min 12, Mean 38.15, Stdev 13.4
494 section: Max 59.5, Min 12, Mean 32.17, Stdev 11.98
598 section: Max 61.5, Min 30, Mean 47.4, Stdev 10
Pick up your exam at the end of the class.
9
Midterm discussion
10
Content-based vs. Collaborative Recommendation
11
Collaborative Filtering
[Figure: a user database of ratings vectors (items A, B, C, …, Z, each rated per user); the active user's partial ratings vector is correlation-matched against the database, and recommendations are extracted from the best-matching users' ratings.]
The correlation analysis here is similar to the association-clusters analysis!
12
Collaborative Filtering Method Weight all users with respect to similarity with the active user. Select a subset of the users (neighbors) to use as predictors. Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings. Present items with highest predicted ratings as recommendations.
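To make these steps concrete, here is a minimal NumPy sketch of the method, assuming a users-by-items ratings array with NaN marking unrated items; the Pearson weight and mean-offset prediction it uses are spelled out on the next few slides, and all function and variable names here are ours, not from the original system:

```python
import numpy as np

def pearson(a, u, ratings):
    """Pearson correlation between users a and u over their co-rated items."""
    both = ~np.isnan(ratings[a]) & ~np.isnan(ratings[u])
    if both.sum() < 2:
        return 0.0
    ra, ru = ratings[a][both], ratings[u][both]
    da, du = ra - ra.mean(), ru - ru.mean()
    denom = np.sqrt((da ** 2).sum() * (du ** 2).sum())
    return float(da @ du / denom) if denom > 0 else 0.0

def predict(a, item, ratings, n=10):
    """Predict user a's rating for item from the n most similar neighbors,
    weighting each neighbor's offset from their own mean rating."""
    others = [u for u in range(len(ratings))
              if u != a and not np.isnan(ratings[u, item])]
    w = {u: pearson(a, u, ratings) for u in others}
    neighbors = sorted(w, key=lambda u: abs(w[u]), reverse=True)[:n]
    num = sum(w[u] * (ratings[u, item] - np.nanmean(ratings[u]))
              for u in neighbors)
    den = sum(abs(w[u]) for u in neighbors)
    base = np.nanmean(ratings[a])
    return base if den == 0 else base + num / den
```

Recommending is then just ranking the active user's unrated items by their predicted ratings.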
13
Neighbor Selection For a given active user, a, select correlated users to serve as source of predictions. Standard approach is to use the most similar n users, u, based on similarity weights w_{a,u}. Alternate approach is to include all users whose similarity weight is above a given threshold.
14
Rating Prediction Predict a rating p_{a,i} for each item i for the active user a, using the n selected neighbor users u ∈ {1, 2, …, n}. To account for users' different rating levels, base predictions on differences from each user's average rating. Weight each user's rating contribution by their similarity to the active user. (r_{i,j} is user i's rating for item j.)
15
Similarity Weighting Typically use the Pearson correlation coefficient between the ratings of the active user a and another user u. (r_a and r_u are the ratings vectors over the m items rated by both a and u; r_{i,j} is user i's rating for item j.)
16
Covariance and Standard Deviation
17
Significance Weighting Important not to trust correlations based on very few co-rated items. Include significance weights s_{a,u}, based on the number of co-rated items, m.
18
Problems with Collaborative Filtering
Cold Start: There need to be enough other users already in the system to find a match.
Sparsity: If there are many items to be recommended, then even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
First Rater: Cannot recommend an item that has not been previously rated.
–New items
–Esoteric items
Popularity Bias: Cannot recommend items to someone with unique tastes.
–Tends to recommend popular items.
(WHAT DO YOU MEAN YOU DON'T CARE FOR BRITNEY SPEARS, YOU DUNDERHEAD? #$%$%$&^)
19
Content-Based Recommending Recommendations are based on information on the content of items rather than on other users' opinions. Uses machine learning algorithms to induce a profile of the user's preferences from examples, based on a featural description of content. Lots of systems.
20
Advantages of Content-Based Approach No need for data on other users. –No cold-start or sparsity problems. Able to recommend to users with unique tastes. Able to recommend new and unpopular items – No first-rater problem. Can provide explanations of recommended items by listing content-features that caused an item to be recommended. Well-known technology The entire field of Classification Learning is at (y)our disposal!
21
Disadvantages of Content-Based Method Requires content that can be encoded as meaningful features. Users’ tastes must be represented as a learnable function of these content features. Unable to exploit quality judgments of other users. –Unless these are somehow included in the content features.
22
Primer on Classification Learning -- FAST (you can learn more about this in CSE 471 Intro to AI, CSE 575 Data Mining, EEE 511 Neural Networks)
23
Many uses of Classification Learning in IR/Web Search
–Learn user profiles
–Classify documents into categories based on their contents (useful in focused crawling)
–Spam mail filtering
–Relevance reasoning..
24
A classification learning example: predicting when Russell will wait for a table -- similar to learning book preferences, predicting credit card fraud, predicting when people are likely to respond to junk mail.
25
Uses different biases in predicting Russell's waiting habits
Naïve Bayes (Bayes-net learning) -- examples are used to learn the topology and learn the CPTs.
Neural Nets -- examples are used to learn the topology and learn the edge weights.
Decision Trees -- examples are used to learn the topology and the order of questions, e.g.:
If patrons=full and day=Friday then wait (0.3/0.7)
If wait>60 and Reservation=no then wait (0.4/0.9)
Association rules -- examples are used to learn the support and confidence of association rules.
SVMs
K-nearest neighbors
26
Mirror, Mirror, on the wall, which learning bias is the best of all?
Well, there is no such thing, silly!
--Each bias makes it easier to learn some patterns and harder (or impossible) to learn others:
–A line-fitter can fit the best line to the data very fast, but won't know what to do if the data doesn't fall on a line.
–A curve-fitter can fit lines as well as curves… but takes longer to fit lines than a line-fitter.
–Different types of bias classes (decision trees, NNs etc.) provide different ways of naturally carving up the space of all possible hypotheses.
So a more reasonable question is:
--What is the bias class that has a specialization corresponding to the type of patterns that underlie my data?
--Within that bias class, what is the most restrictive bias that can still capture the true pattern in the data?
For example:
--Decision trees can capture all boolean functions, but are faster at capturing conjunctive boolean functions.
--Neural nets can capture all boolean or real-valued functions, but are faster at capturing linearly separable functions.
--Bayesian learning can capture all probabilistic dependencies, but is faster at capturing single-level dependencies (naïve bayes classifier).
27
Fitting test cases vs. predicting future cases -- the BIG TENSION….
[Figure: three fits of increasing complexity, labeled 1, 2, 3, to the same training points. Why is simple better? Why not the 3rd?]
28
Naïve Bayesian Classification
Problem: Classify a given example E into one of the classes among [C1, C2, …, Cn]
–E has k attributes A1, A2, …, Ak, and each Ai can take d different values.
Bayes Classification: Assign E to the class Ci that maximizes P(Ci | E)
P(Ci | E) = P(E | Ci) P(Ci) / P(E)
P(Ci) and P(E) are a priori knowledge (or can be easily extracted from the set of data).
Estimating P(E | Ci) is harder
–Requires P(A1=v1 A2=v2 … Ak=vk | Ci)
–Assuming d values per attribute, we will need n·d^k probabilities.
Naïve Bayes Assumption: Assume all attributes are independent:
P(E | Ci) = Π_j P(Aj=vj | Ci)
–The assumption is BOGUS, but it seems to WORK (and needs only n·d·k probabilities).
29
NBC in terms of BAYES networks..
[Figure: two Bayes networks -- the NBC assumption (a class node whose children are the attributes, with no edges among them) vs. a more realistic assumption (with dependencies among the attributes).]
30
Estimating the probabilities for NBC
Given an example E described as A1=v1 A2=v2 … Ak=vk, we want to compute the class of E:
–Calculate P(Ci | A1=v1 A2=v2 … Ak=vk) for all classes Ci, and say that the class of E is the one for which P(.) is maximum.
–P(Ci | A1=v1 A2=v2 … Ak=vk) = Π_j P(vj | Ci) P(Ci) / P(A1=v1 A2=v2 … Ak=vk)
(The denominator is a common factor across all classes and can be ignored when comparing them.)
Given a set of N training examples that have already been classified into n classes Ci:
Let #(Ci) be the number of examples that are labeled as Ci.
Let #(Ci, Ai=vj) be the number of examples labeled as Ci that have attribute Ai set to value vj.
P(Ci) = #(Ci) / N
P(Ai=vj | Ci) = #(Ci, Ai=vj) / #(Ci)
(In the filtering setting, this learned table of probabilities is the USER PROFILE.)
31
Example
P(willwait=yes) = 6/12 = .5
P(Patrons="full" | willwait=yes) = 2/6 = 0.333
P(Patrons="some" | willwait=yes) = 4/6 = 0.666
Similarly, we can show that P(Patrons="full" | willwait=no) = 0.666
P(willwait=yes | Patrons=full)
= P(Patrons=full | willwait=yes) * P(willwait=yes) / P(Patrons=full)
= k * 0.333 * .5
P(willwait=no | Patrons=full) = k * 0.666 * .5
(where k = 1/P(Patrons=full) is the common factor)
32
Using M-estimates to improve probability estimates
The simple frequency-based estimation of P(Ai=vj | Ck) can be inaccurate, especially when the true value is close to zero and the number of training examples is small (so the probability that your examples don't contain rare cases is quite high).
Solution: Use the M-estimate
P(Ai=vj | Ci) = [#(Ci, Ai=vj) + m·p] / [#(Ci) + m]
–p is the prior probability of Ai taking the value vj. If we don't have any background information, assume uniform probability (that is, 1/d if Ai can take d values).
–m is a constant, called the "equivalent sample size". If we believe that our sample set is large enough, we can keep m small; otherwise, keep it large. Essentially we are augmenting the #(Ci) real samples with m more virtual samples drawn according to the prior probability of how Ai takes values.
Also, to avoid underflow errors, add logarithms of probabilities instead of multiplying the probabilities.
33
Applying NBC for Text Classification
Text classification is the task of classifying text documents into multiple classes:
–Is this mail spam?
–Is this article from comp.ai or misc.piano?
–Is this article likely to be relevant to user X?
–Is this page likely to lead me to pages relevant to my topic? (as in topic-specific crawling)
NBC has been applied a lot to text classification tasks.
The big question: How to represent text documents as feature vectors?
–Vector space variants (e.g. a binary version of the vector space rep). Used by Sahami et al. in SPAM filtering. A problem is that the vectors are likely to be as large as the size of the vocabulary.
–Use "feature selection" techniques to select only a subset of words as features (see the Sahami et al. paper).
–Unigram model [Mitchell paper]. Used by Joachims for newspaper article categorization. Document as a vector of positions, with values being the words.
34
25th March: Text Classification; Spam mail filtering
35
Extensions to the Naïve Bayes idea
Vector of Bags model
–E.g. books have several different fields that are all text: authors, description, …
–A word appearing in one field is different from the same word appearing in another.
–Want to keep each bag different -- a vector of m bags.
Additional useful terms:
Odds Ratio: P(rel | example) / P(~rel | example). An example is positive if the odds ratio is > 1.
Strength of a keyword: log[P(w | rel) / P(w | ~rel)]. We can summarize a user's profile in terms of the words that have strength above some threshold.
36
Sahami et al.'s Solution for SPAM detection
–Representation: use the standard "Term Vector Space" model developed by the Information Retrieval field (similar to AdEater). One e-mail message maps to a single fixed-width feature vector: have 1 bit in this vector for each term that occurs in some message in E (plus a bunch of domain-specific features -- e.g., when the message was sent).
–Learning algorithm: use the standard "Naive Bayes" algorithm.
37
Sahami et al. spam filtering
The above framework is completely general. We just need to encode each e-mail as a fixed-width vector X = (X1, X2, X3, ..., XN) of features. So... what features are used in Sahami's system?
–words (generated automatically)
–suggestive phrases ("free money", "must be over 21", ...) (hand-crafted!)
–sender's domain (.com, .edu, .gov, ...)
–peculiar punctuation ("!!!Get Rich Quick!!!")
–did the email contain an attachment?
–was the message sent during evening or daytime?
–? (We'll see a similar list for AdEater and other learning systems)
38
Feature Selection
A problem -- too many features -- each vector x contains "several thousand" features.
–Most come from "word" features -- include a word if any e-mail contains it (eg, every x contains an "opossum" feature even though this word occurs in only one message).
–Slows down learning and predictions.
–May cause lower performance.
The Naïve Bayes Classifier makes a huge assumption -- the "independence" assumption. A good strategy is to have few features, to minimize the chance that the assumption is violated. Ideally, discard all features that violate the assumption. (But if we knew these features, we wouldn't need to make the naive independence assumption!)
Feature selection: "a few thousand" -> 500 features
39
Feature-Selection approach
Lots of ways to perform feature selection.
–FEATURE SELECTION ~ DIMENSIONALITY REDUCTION
One simple strategy: mutual information.
Suppose we have two random variables A and B. Mutual information MI(A,B) is a numeric measure of what we can conclude about A if we know B, and vice versa.
MI(A,B) = Σ Pr(A&B) log[Pr(A&B) / (Pr(A)Pr(B))], summed over the possible values of A and B.
–Example: If A and B are independent, then we can't conclude anything: MI(A,B) = 0.
Note that MI can be calculated without needing conditional probabilities.
40
Mutual Information, continued
–Check our intuition: independence -> MI(A,B) = 0:
MI(A,B) = Σ Pr(A&B) log[Pr(A&B)/(Pr(A)Pr(B))] = Σ Pr(A&B) log[Pr(A)Pr(B)/(Pr(A)Pr(B))] = Σ Pr(A&B) log 1 = 0
–Fully correlated, it becomes the "information content" (entropy): MI(A,A) = -Σ Pr(A) log Pr(A). It depends on how "uncertain" the event is; for a binary event (with log base 2) the expression becomes maximum (= 1) when Pr(A) = .5. This makes sense, since the most uncertain event is one whose probability is .5 (if it is .3, we know it is likely not to happen; if it is .7, we know it is likely to happen).
41
MI and Feature Selection
Back to feature selection: pick features Xi that have high mutual information with the junk/legit classification C.
–These are exactly the features that are good for prediction.
–Pick the 500 features Xi with the highest value of MI(Xi, C).
NOTE: NBC's estimates of the probabilities are actually quite a bit wrong, but they still got by with those..
Also, note that this analysis looks at each feature in isolation and may thus miss highly predictive word groups whose individual words are quite non-predictive.
–e.g. "free" and "money" may each have low MI, but "free money" may have higher MI.
–A way to handle this is to look at the MI of not just words but subsets of words (in the worst case, you will need to compute 2^n MIs…).
–So instead, Sahami et al. add domain-specific phrases separately..
Note: There's no reason that the highest-MI features are the ones that least violate the independence assumption -- this is just a heuristic!
42
MI based feature selection vs. LSI
Both MI and LSI are dimensionality reduction techniques.
MI reduces dimensions by selecting a subset of the original dimensions.
–LSI looks instead at linear combinations of the original dimensions. (Good: can automatically capture sets of dimensions that are more predictive. Bad: the new features may not have any significance to the user.)
MI does feature selection w.r.t. a classification task (MI is computed between a feature and a class).
–LSI does dimensionality reduction independent of the classes (it just looks at data variance).
43
Experiments
1789 hand-tagged e-mail messages
–1578 junk
–211 legit
Split into…
–1538 training messages (86%)
–251 testing messages (14%)
–Similar to the experiment described in the AdEater lecture, except messages are not randomly split. This is unfortunate -- maybe performance is just a fluke.
Training phase: Compute Pr[X=x|C=junk], Pr[X=x], and Pr[C=junk] from the training messages.
Testing phase: Compute Pr[C=junk|X=x] for each testing message x. Predict "junk" if Pr[C=junk|X=x] > 0.999. Record mistake/correct answer in a confusion matrix.
44
Precision/Recall Curves
[Figure: precision/recall curves -- up and to the right is better performance. Points from the table on slide 14.]
45
Results

System                                     | Junk Prec | Junk Rec | Legit Prec | Legit Rec | Acc
Words (W)                                  |    97     |    94    |     88     |    93     |  –
Words + phrases (W+P)                      |    98     |    94    |     88     |    95     |  –
Words + phrases + extra features (W+P+EF)  |   100     |    98    |     96     |   100     |  –
W+P+EF (different messages*)               |    99     |    94    |     87     |    97     |  –
W+P+EF - legit/porn/junk                   |    96     |    77    |     61     |    91     |  –
W+P+EF - "real" scenario                   |    92     |    80    |     95     |    98     |  95

(*same configuration, just different training/test messages)
46
"Real" scenario
Data in the previous experiments was collected in a strange way. The "real" scenario tries to fix it. Three kinds of messages:
1. Read and keep
2. Read and discard (eg, joke from a friend)
3. Junk
The real scenario models a setting where messages arrive, and some are deleted because they are junk, others are deleted because they aren't worth saving, and others are read and then saved. Both 1 and 2 should count as "legit" -- but "read & discard" messages were not collected.
47
Summary Bayesian Classification Naïve Bayesian Classification Email features: automatically generated lists of words + hand- picked phrases + domain-specific features Feature selection by Mutual Information heuristic Semi-controlled experiments –Collect data in various ways; compare 2/3 categories –Confusion Matrix –Precision & recall vs Accuracy –Can trade precision for recall by varying classification threshold.
48
Current State of the Art in Spam Filtering
SpamAssassin (http://www.spamassassin.org) is pretty much the best spam filter out there (and it is FREE!).
It is based on a variety of tests. Each test gives a numerical score (spam points) to the message (the more positive the score, the more spammy the message). When the cumulative score is above a threshold, it puts the message in the spam box. The tests used are at http://www.spamassassin.org/tests.html.
Tests are of three types:
–Domain Specific: A set of hand-written rules (sort of like the Sahami et al. domain-specific features). If a rule matches, the message is given a score (+ve or -ve). If the cumulative score is more than a threshold, the message is classified as SPAM.
–Bayesian Filter: Uses NBC to train on messages that the user classified (requires that SA be integrated with a mail client; the ASU IMAP version does it). An interesting point is that it is hard to "explain" to the user why the Bayesian filter found a message to be spam (while the domain-specific filter can say that specific phrases were found).
–Collaborative Filter: E.g. Vipul's Razor, etc. If this type of message has been reported as SPAM by other users (to a central spam server), then the message is given additional spam points. Messages are reported in terms of their "signatures".
–Simple "checksum" signatures don't quite work (since spammers put minor variations in the body).
–So, these techniques use "fuzzy" signatures, and "similarity" rather than "equality" of signatures. (See the connection with crawling and duplicate detection.)
49
A message caught by SpamAssassin
Message 346:
From aetbones@ccinet.ab.ca Thu Mar 25 16:51:23 2004
From: Geraldine Montgomery
To: randy.mullen@asu.edu
Cc: ranga@asu.edu, rangers@asu.edu, rao@asu.edu, raphael@asu.edu, rapture@asu.edu, rashmi@asu.edu
Subject: V1AGKRA 80% DISCOUNT !! sg g pz kf
Date: Fri, 26 Mar 2004 02:49:21 +0000 (GMT)
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on parichaalak.eas.asu.edu
X-Spam-Level: ******************************************
X-Spam-Status: Yes, hits=42.2 required=5.0 tests=BIZ_TLD,DCC_CHECK, FORGED_MUA_OUTLOOK,FORGED_OUTLOOK_TAGS,HTML_30_40,HTML_FONT_BIG, HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_NO_CHARSET, MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MISSING_MIMEOLE, OBFUSCATING_COMMENT,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,RCVD_IN_NJABL, RCVD_IN_NJABL_PROXY,RCVD_IN_OPM,RCVD_IN_OPM_HTTP, RCVD_IN_OPM_HTTP_POST,RCVD_IN_SORBS,RCVD_IN_SORBS_HTTP,SORTED_RECIPS, SUSPICIOUS_RECIPS,X_MSMAIL_PRIORITY_HIGH,X_PRIORITY_HIGH autolearn=no version=2.63
MIME-Version: 1.0
This is a multi-part message in MIME format.
------------=_40637084.02AF45D4
Content-Type: text/plain
Content-Disposition: inline
50
Example of SpamAssassin explanation
X-Spam-Status: Yes, hits=42.2 required=5.0 tests=BIZ_TLD,DCC_CHECK,FORGED_MUA_OUTLOOK,FORGED_OUTLOOK_TAGS,HTML_30_40,HTML_FONT_BIG,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MISSING_MIMEOLE,OBFUSCATING_COMMENT,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,RCVD_IN_NJABL,RCVD_IN_NJABL_PROXY,RCVD_IN_OPM,RCVD_IN_OPM_HTTP,RCVD_IN_OPM_HTTP_POST,RCVD_IN_SORBS,RCVD_IN_SORBS_HTTP,SORTED_RECIPS,SUSPICIOUS_RECIPS,X_MSMAIL_PRIORITY_HIGH,X_PRIORITY_HIGH autolearn=no version=2.63
The fired tests are a mix of domain-specific rules and collaborative checks. In this case, autolearn is set to no, so the Bayesian filter is not active.
51
General comments on Spam
Spam is a technical problem (we created it).
It has an "arms-race" character to it.
We can't quite legislate against SPAM:
–Most spam comes from outside national boundaries…
Need "technical" solutions:
–To detect spam (we mostly have a handle on it).
–To STOP spam generation (detecting spam after it gets sent still taxes mail servers -- by some estimates, more than 66% of the mail relayed by AOL/Yahoo mail servers is SPAM).
–Brother Gates suggests a "monetary" cost: make every mailer pay for the mail they send.
–Not necessarily in "stamps", but perhaps by agreeing to give some CPU cycles to work on some problem (e.g. finding primes, computing PI, etc.).
–The cost will be minuscule for normal users, but will multiply for spam mailers who send millions of mails.
Other innovative ideas are needed -- we now have a conference on spam mail: http://www.ceas.cc/
52
NBC with Unigram Model
Assume that words from a fixed vocabulary V appear in the document D at different positions (assume D has L words).
P(D|C) is P(p1=w1, p2=w2, …, pL=wL | C).
–Assume that word appearance probabilities are independent of each other:
P(D|C) = P(p1=w1|C) * P(p2=w2|C) * … * P(pL=wL|C)
–Assume that word occurrence probability is INDEPENDENT of its position in the document:
P(p1=w1|C) = P(p2=w1|C) = … = P(pL=w1|C)
Use m-estimates; set p to 1/|V| and m to |V| (where |V| is the size of the vocabulary). This gives:
P(wk|Ci) = [#(wk, Ci) + 1] / [#w(Ci) + |V|]
–#(wk, Ci) is the number of times wk appears in the documents classified into class Ci.
–#w(Ci) is the total number of words in all documents classified into class Ci.
Used to classify usenet articles from 20 different groups -- achieved an accuracy of 89%!! (Random guessing will get you 5%.)
53
How Well (and WHY) DOES NBC WORK?
The naïve bayes classifier is darned easy to implement.
–Good learning speed, classification speed.
–Modest space storage.
–Supports incrementality: recommendations can be re-done as more attribute values of the new item become known.
It seems to work very well in many scenarios.
–Peter Norvig, the director of Machine Learning at GOOGLE, said, when asked about what sort of technology they use: "Naïve bayes".
But WHY?
–[Domingos/Pazzani; 1996] showed that NBC has much wider ranges of applicability than previously thought (despite using the independence assumption).
–Classification accuracy is different from probability-estimate accuracy. Notice that normal classification applications don't quite care about the actual probability, only which class's probability is the highest.
–An exception is cost-based learning -- suppose false positives and false negatives have different costs…
–E.g. Sahami et al. consider a message to be spam only if the spam class probability is > .9 (so they are relying on incorrect NBC estimates there).
54
Combining Content and Collaboration Content-based and collaborative methods have complementary strengths and weaknesses. Combine methods to obtain the best of both. Various hybrid approaches: –Apply both methods and combine recommendations. –Use collaborative data as content. –Use content-based predictor as another collaborator. –Use content-based predictor to complete collaborative data.
55
Movie Domain EachMovie Dataset [Compaq Research Labs] –Contains user ratings for movies on a 0–5 scale. –72,916 users (avg. 39 ratings each). –1,628 movies. –Sparse user-ratings matrix – (2.6% full). Crawled Internet Movie Database (IMDb) –Extracted content for titles in EachMovie. Basic movie information: –Title, Director, Cast, Genre, etc. Popular opinions: –User comments, Newspaper and Newsgroup reviews, etc.
56
Content-Boosted Collaborative Filtering
[Figure: system architecture -- a web crawler extracts movie content from IMDb into a movie content database; a content-based predictor uses it to fill the sparse EachMovie user-ratings matrix into a full user-ratings matrix, which collaborative filtering combines with the active user's ratings to produce recommendations.]
57
Content-Boosted CF - I
[Figure: for each user, the user-rated items in the user-ratings vector serve as training examples for a content-based predictor; the predictor rates the unrated items, and the actual plus predicted ratings together form a pseudo user-ratings vector.]
58
Content-Boosted CF - II
Compute the pseudo user-ratings matrix
–A full matrix that approximates the actual full user-ratings matrix
Perform CF
–Using Pearson correlation between pseudo user-rating vectors
[Figure: the content-based predictor maps the sparse user-ratings matrix to the full pseudo user-ratings matrix.]
59
Conclusions Recommending and personalization are important approaches to combating information overload. Machine learning is an important part of systems for these tasks. Collaborative filtering has problems. Content-based methods address these problems (but have problems of their own). Integrating both is best.