CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.


1 CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION
TRIVIKRAM BHAT, UNIVERSITY OF TEXAS AT ARLINGTON, DATA MINING CSE6362
BASED ON PAPER BY RAYMOND J. MOONEY AND LORIENE ROY, UNIVERSITY OF TEXAS, AUSTIN

2 OVERVIEW
- Introduction
- Techniques
- Drawbacks of Existing Systems
- Advantages of Content Based Systems
- LIBRA
- System Description
- Experimental Results
- Future Work
- Conclusions

3 INTRODUCTION
- General goal of a recommender system: make personalized suggestions based on previous examples of a user's likes and dislikes
- Types
  - Existing systems that use social filtering methods (base recommendations on other users' preferences)
  - Content-based systems (use information about an item itself to make suggestions)

4 INTRODUCTION
- Companies
  - Firefly
  - Net Perceptions
  - LikeMinds
  - Amazon (book recommending)
  - Barnes and Noble (book recommending)

5 TECHNIQUES
- Social / collaborative filtering
  - Maintain a database of user preferences
  - Find other users whose known preferences correlate significantly with those of a given user
- Content-based filtering
  - Allows a system to uniquely characterize each user without having to match their interests to someone else’s
  - Items are recommended based on information about the item itself
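The "correlate significantly" step in collaborative filtering can be sketched with a Pearson correlation over the items two users have both rated. This is only an illustration: the slides do not say which correlation measure the existing systems use, and the function name `pearson` and the dictionary layout (item id mapped to rating) are assumptions.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_b: dicts mapping an item id to a numeric rating
    (a hypothetical layout chosen for this sketch).
    Returns 0.0 when fewer than two items overlap or a user has no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)
```

A collaborative system would compute this score between the active user and every other user, then borrow ratings from the highest-scoring neighbours. A content-based system such as LIBRA skips this step entirely.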

6 DRAWBACKS OF EXISTING SYSTEMS
- Assume that a given user’s tastes are generally the same as another user’s
- Assume that there is a sufficient number of ratings
- Tend to recommend popular titles
- Need sufficient information about other users, which raises concerns about privacy and access to customer data

7 ADVANTAGES OF CONTENT BASED SYSTEMS
- Items are recommended based on the content of the item rather than on other users' preferences
- Provide a way to list the content features that caused an item to be recommended
- Allow users to provide initial subject information to aid the system

8 LIBRA (Learning Intelligent Book Recommending Agent)
- A database of book information extracted from web pages at Amazon.com
- Users select a set of training books and rate them on a scale of 1-10
- The system learns a profile of the user using a Bayesian learning algorithm
- Produces a ranked list of the most recommended additional titles from the system catalog

9 SYSTEM DESCRIPTION
- Extracting information and building a database
  - Perform an Amazon subject search
  - Download book description URLs
  - Information extraction uses slots to capture the useful information about each book
  - Current slots include title, authors, published reviews, and many more
  - A simple extraction system is sufficient because the layout of Amazon’s automatically generated pages is regular
  - Some preprocessing is done (author names are converted into unique tokens of the form first_initial_last-name)
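The author-name preprocessing can be sketched as follows. The slide only gives the token shape (first_initial_last-name), so the underscore separator, the upper-casing, and the function name `author_token` are assumptions made for illustration.

```python
def author_token(full_name):
    """Collapse an author name into a single token: first initial + last name.

    Middle names and initials are dropped. The underscore separator and
    upper-casing are assumptions; the slide only specifies the general form.
    """
    parts = full_name.replace(".", " ").split()
    if not parts:
        raise ValueError("empty author name")
    if len(parts) == 1:
        # Single-word names have no separate initial to prepend.
        return parts[0].upper()
    return f"{parts[0][0]}_{parts[-1]}".upper()
```

Turning each author into one token keeps the classifier from treating a first name and a last name as two unrelated words.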

10 SYSTEM DESCRIPTION
- Learning a profile
  - Users select the titles to rate (for example, those by a particular author), so they need not scan the entire database at random
  - Users rate the selected titles on a scale of 1-10
  - A naïve Bayesian text classifier is used to classify a book title as either positive (6-10) or negative (1-5)
  - N training books $B_e$ ($1 \le e \le N$), each with two real weights:
    - Positive weight: $\alpha_{e1} = (r - 1)/9$
    - Negative weight: $\alpha_{e0} = 1 - \alpha_{e1}$
    - $r$ = user rating ($1 \le r \le 10$)
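The rating-to-weight mapping on this slide is small enough to write out directly. The sketch below assumes integer ratings; the function names `rating_weights` and `binary_label` are illustrative and not taken from LIBRA.

```python
from typing import Tuple


def rating_weights(rating: int) -> Tuple[float, float]:
    """Map a 1-10 rating to (positive weight, negative weight).

    positive = (r - 1) / 9 and negative = 1 - positive, as on the slide.
    """
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    positive = (rating - 1) / 9
    return positive, 1.0 - positive


def binary_label(rating: int) -> str:
    """Hard label used for evaluation: 6-10 is positive, 1-5 is negative."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    return "positive" if rating >= 6 else "negative"
```

With soft weights, a book rated 7 counts mostly, but not entirely, as a positive example, so the strength of the user's opinion is kept instead of being thrown away by a hard positive/negative split.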

11 SYSTEM DESCRIPTION
- Parameters
  - $P(c_j) = \sum_{e=1}^{N} \alpha_{ej} / N$
  - $P(w_k \mid c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} \, n_{kem} / L(c_j, s_m)$
    - where $n_{kem}$ = the number of times word $w_k$ appears in example $B_e$ in slot $s_m$
    - $L(c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} \, |d_m|$ denotes the total weighted length of the documents in category $c_j$ and slot $s_m$
    - $d_m$ = the document in slot $s_m$ (each book is a vector of documents, one per slot)
- Strength: measures how much more likely a word in a slot is to appear in a positively rated book than in a negatively rated book
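The weighted estimates on this slide can be sketched as a small two-class model. Several details are assumptions that the slide does not state: the Laplace-style smoothing term, the use of a log ratio for strength, the class name `WeightedNaiveBayes`, and the input layout (a rating paired with a dictionary from slot name to a list of words).

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Two-class naive Bayes over slot-tagged bags of words with soft labels."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing   # assumed smoothing, not given on the slide
        self.prior = [0.0, 0.0]      # P(c_0), P(c_1)
        # weighted[j][(slot, word)] accumulates sum_e alpha_ej * n_kem
        self.weighted = [defaultdict(float), defaultdict(float)]
        # length[j][slot] accumulates L(c_j, s_m) = sum_e alpha_ej * |d_m|
        self.length = [defaultdict(float), defaultdict(float)]
        self.vocab = defaultdict(set)  # distinct words seen in each slot

    def fit(self, examples):
        """examples: list of (rating in 1..10, {slot: [word, ...]})."""
        n = len(examples)
        for rating, slots in examples:
            positive = (rating - 1) / 9
            alpha = (1.0 - positive, positive)  # (alpha_e0, alpha_e1)
            for j in (0, 1):
                self.prior[j] += alpha[j] / n
                for slot, words in slots.items():
                    self.length[j][slot] += alpha[j] * len(words)
                    for word in words:
                        self.weighted[j][(slot, word)] += alpha[j]
                        self.vocab[slot].add(word)
        return self

    def word_prob(self, word, j, slot):
        """Smoothed estimate of P(w_k | c_j, s_m)."""
        vocab_size = len(self.vocab.get(slot, ())) or 1
        numerator = self.weighted[j].get((slot, word), 0.0) + self.smoothing
        denominator = self.length[j].get(slot, 0.0) + self.smoothing * vocab_size
        return numerator / denominator

    def strength(self, word, slot):
        """Log of positive over negative word likelihood (assumed form)."""
        return math.log(self.word_prob(word, 1, slot) / self.word_prob(word, 0, slot))


# Tiny made-up training set, for illustration only.
examples = [
    (10, {"words": ["mars", "mars", "rocket"]}),
    (1, {"words": ["romance", "rocket"]}),
]
model = WeightedNaiveBayes().fit(examples)
```

Under these assumptions, a word seen mostly in highly rated books gets a positive strength and a word seen mostly in low-rated books gets a negative one, which is the property a profile listing relies on.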

12 Sample Positive Profile Features

Slot            Word          Strength
WORDS           ZUBRIN        9.85
WORDS           SMOLIN        9.39
WORDS           TREFIL        8.77
WORDS           DOT           8.67
SUBJECTS        COMPARATIVE   8.39
AUTHOR          D GOLDSMITH   8.04
WORDS           ALH           7.97
WORDS           MANNED        7.97
RELATED-TITLES  SETTLE        7.91

13 SYSTEM DESCRIPTION
- Producing, explaining, and revising recommendations
  - Once a profile is learned, it is used to predict the preferred ranking of the remaining books
  - The user reviews the recommendations and may assign their own ratings to examples they believe are incorrectly ranked
  - The system is retrained on the new ratings, and this cycle is repeated several times to produce the best results

14 EXPERIMENTAL RESULTS
- Data collection
  - Several data sets were assembled (LIT1, LIT2, MYST, SCI, SF)
  - To present a quantitative picture of performance on a realistic sample, books were selected at random
  - If the user was not familiar with a book, the user was asked to give a rating based on the information provided by the Amazon page describing the book

15 EXPERIMENTAL RESULTS
- Performance evaluation
  - Performed 10-fold cross-validation on the examples
  - Various metrics were used to measure performance
    - Classification accuracy (Acc): the percentage of examples correctly classified as positive or negative
    - Precision (Pr): the percentage of examples classified as positive that are actually positive
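The evaluation procedure can be sketched in a few lines. The function names, the fixed shuffle seed, and the encoding of positive as 1 are assumptions for illustration. The metrics are returned as fractions, so multiply by 100 for the percentages the slide describes.

```python
import random


def k_fold_splits(n_examples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering every example once."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def accuracy(actual, predicted):
    """Fraction of examples whose predicted class matches the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def precision(actual, predicted, positive=1):
    """Fraction of examples predicted positive that are actually positive."""
    flagged = [a for a, p in zip(actual, predicted) if p == positive]
    if not flagged:
        return 0.0
    return sum(a == positive for a in flagged) / len(flagged)
```

In each of the ten rounds, the classifier would be trained on the `train` indices and scored on the held-out `test` indices, with the metrics averaged across rounds.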

16 EXPERIMENTAL RESULTS
- Discussion
  - User-selected examples vs. randomly selected examples
    - User-selected examples are better because the user can rate the selection accurately
    - Randomly selected examples tend to cover the complete data set
  - Conclusion: avoid prematurely committing to a specific methodology

17 EXPERIMENTAL RESULTS
- Can collaborative and content-based approaches be combined to produce better results?
  - Slots: related authors, related titles (Amazon derives these from other customers' behavior, so they carry collaborative information)
  - When these slots were removed, performance degraded
  - Using both approaches together produces better results

18 FUTURE WORK
- Web-based interface (with a larger body of users)
- Compare LIBRA’s content-based approach to a standard collaborative approach
- Maximize the utility of the small training set by using various machine learning techniques
  - Unsupervised learning
  - Active learning (incremental approach)
- One effective approach: provide highly rated examples, generate initial recommendations, review the results, give low ratings to bad items, and retrain the system to get new recommendations

19 CONCLUSIONS
- The content-based approach holds the promise of effectively recommending items that have not been rated
- Provides accurate information without any background knowledge of other users' preferences
- Combining it with collaborative techniques does provide better results

www.cs.utexas.edu/users/ml/recommender.html
Partially supported by NSF

20 QUESTIONS?

