Techniques for Collaboration in Text Filtering 1 Ian Soboroff Department of Computer Science and Electrical Engineering University of Maryland, Baltimore.

1 Techniques for Collaboration in Text Filtering
Ian Soboroff
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County
ian@cs.umbc.edu

2 Overview
- Text filtering and collaborative filtering
- Finding collaboration among content profiles
- Experimental results
- Ongoing work

3 Information Filtering
- Given
  - a stream of documents (news articles, movies)
  - a set of users (with stable and specific interests)
- Recommend documents to users who will be interested in them
  - "Tell me when a jazz CD comes out that I'll like."
  - "Tell me when an earthquake is reported."

4 Content Filtering
- Construct profiles from example documents
  - vector of weights for terms in documents
  - can use known relevant and nonrelevant docs
  - can use external resources such as a home page, job description, or research papers
- Match new documents against content profiles
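A minimal sketch of the two steps on this slide, building a profile from example documents and matching a new document against it. The slides do not say how profiles are weighted or combined, so the log-tf times idf weighting, the Rocchio-style subtraction of nonrelevant documents, the `penalty` value, and all function names and sample texts here are assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def vectorize(text, idf):
    """Log-tf times idf weights; terms missing from the idf table are ignored."""
    tf = Counter(tokenize(text))
    return {t: (1.0 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}

def build_profile(relevant, nonrelevant, idf, penalty=0.25):
    """Mean relevant vector minus a scaled mean nonrelevant vector (an assumed scheme)."""
    profile = Counter()
    for doc in relevant:
        for t, w in vectorize(doc, idf).items():
            profile[t] += w / len(relevant)
    for doc in nonrelevant:
        for t, w in vectorize(doc, idf).items():
            profile[t] -= penalty * w / len(nonrelevant)
    # Keep only terms with positive weight.
    return {t: w for t, w in profile.items() if w > 0}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical training examples for an "earthquakes" profile.
relevant = ["earthquake reported in turkey", "strong earthquake damages buildings"]
nonrelevant = ["jazz cd released today"]
idf = idf_table(relevant + nonrelevant)
profile = build_profile(relevant, nonrelevant, idf)

score_quake = cosine(profile, vectorize("earthquake shakes turkey coast", idf))
score_jazz = cosine(profile, vectorize("new jazz cd from a quartet", idf))
```

In this toy setup the earthquake story scores above zero and the jazz story scores zero, so a threshold on the cosine score would decide which documents reach the user.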

5 Filtering in a Community
- Many people will be watching the same stream
- Some of them may have overlapping interests
  - earthquakes, Mideast politics, building codes, Turkey
  - Charles Mingus, Duke Ellington, Kenny G
- Want to take advantage of group effort

6 "Pure" Collaborative Filtering
- collect users' ratings for documents
  - thumbs up/down, or 1-5 scale
- compute correlations among users
- predict ratings for new/unseen items using existing ratings and correlation values
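The three steps above can be sketched as follows. The slides do not name a correlation measure or a prediction formula, so this assumes Pearson correlation over co-rated items and a mean-offset weighted average, which is one common formulation. The users, items, and ratings are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def predict(ratings, user, item):
    """User's mean rating plus correlation-weighted deviations of other raters."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - their_mean)
        den += abs(w)
    return my_mean if den == 0 else my_mean + num / den

# Hypothetical ratings on a 1-5 scale; u1 has not rated item "d".
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 5, "b": 5, "c": 1, "d": 5},
    "u3": {"a": 1, "b": 2, "c": 5, "d": 1},
}
prediction = predict(ratings, "u1", "d")
```

Here u1 agrees with u2 and disagrees with u3, so both neighbours push the prediction for item "d" above u1's own mean rating. Note that nothing can be predicted for an item nobody has rated.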

7 Pure CF Example
[Table: ratings from Alice, Bob, Carmen, and Doug for items grouped into Comedies and Dramas, with "?" marking the ratings to be predicted]

8 Combining Content and Collaboration
- Pure collaborative filtering
  - can recommend anything
  - must have ratings to give predictions
  - don't know much about documents or ratings
- Adding content to collaboration
  - content filtering can recommend an unrated document
  - exploit common themes among content profiles

9 One Approach to CBCF
- Construct content profiles
  - Documents are vectors of weighted features
  - Build profiles from known relevant and nonrelevant documents
- Collaborative step
  - Combine profile vectors into single matrix
  - Compute latent semantic index of profile collection
  - Route new documents in profiles' "LSI space"

10 Latent Semantic Indexing
- Compute singular value decomposition of a content matrix M, a t × d matrix with entries w_{td}:

$$M_{t \times d} = T_{t \times r}\, \Sigma_{r \times r}\, D^{\top}_{r \times d}$$

- D, a representation of M in r dimensions
- T, a matrix for transforming new documents
- Σ gives relative importance of dimensions

11 Collaborating with LSI
- LSI dimensions are...
  - based on term co-occurrence patterns between documents (profiles)
  - ordered by their prominence in collection
- LSI space built from profiles
  - highlights common patterns among profiles
  - "noisy" dimensions can be pruned
  - project new documents into a collaborative space for routing
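A small sketch of the collaborative step with NumPy: stack profile vectors into one matrix, take a truncated SVD, prune to r dimensions, and route a new document in the resulting space. The term-by-profile weights, the choice r = 2, the fold-in by dividing by the singular values, and the cosine scoring are all assumptions for illustration. The slides do not specify these details.

```python
import numpy as np

# Hypothetical term-by-profile weight matrix M (rows = terms, columns = profiles).
terms = ["hydrogen", "fuel", "car", "energy", "sweatshop", "clothing"]
M = np.array([
    [2.0, 1.0, 0.0, 0.0],   # hydrogen
    [1.0, 2.0, 1.0, 0.0],   # fuel
    [0.0, 1.0, 2.0, 0.0],   # car
    [1.0, 0.0, 0.0, 0.0],   # energy
    [0.0, 0.0, 0.0, 2.0],   # sweatshop
    [0.0, 0.0, 0.0, 1.0],   # clothing
])

def lsi(M, r):
    """Rank-r truncated SVD: M is approximated by T @ diag(s) @ D.T."""
    T, s, Dt = np.linalg.svd(M, full_matrices=False)
    return T[:, :r], s[:r], Dt[:r, :].T

def fold_in(doc, T, s):
    """Project a term-space document vector into the r-dimensional LSI space."""
    return (doc @ T) / s

def route(doc, T, s, D):
    """Cosine similarity between the projected document and each profile (row of D)."""
    q = fold_in(doc, T, s)
    return (D @ q) / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)

T, s, D = lsi(M, r=2)
new_doc = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # mentions "car" and "energy"
scores = route(new_doc, T, s, D)
```

In this toy matrix the first three profiles share a hydrogen/fuel/car theme, so after pruning to two dimensions the new document scores highly against all three, including profiles it shares only one term with, and near zero against the unrelated fourth profile.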

12 Experiments with Cranfield
- Cranfield, a standard (if small) IR collection
  - 1398 documents, 255 scored queries
- Profiles: selected Cranfield queries
  - 26 queries with ≥ 15 relevant documents
  - 70% of profile's relevant docs used in each profile
- Results show improvement for using LSI of profiles
  - compared to using profiles alone
  - compared to using LSI of all of Cranfield

13 Results: Average Precision

| Method | k-value | Set 1 | Set 2 |
|---|---|---|---|
| Content (log-tfidf) | - | 0.2894 | 0.2705 |
| Content LSI (LSI of all of Cranfield) | 25 | 0.2656 | 0.1980 |
| | 50 | 0.3136 | 0.2686 |
| | 100 | 0.3251 | 0.3053 |
| | 200 | 0.3314 | 0.3144 |
| | 500 | 0.3302 | 0.3149 |
| Collaborative LSI (LSI of profiles) | 8 | 0.3136 | 0.2583 |
| | 15 | 0.4151 | 0.3745 |
| | 18 | 0.3600 | 0.3615 |
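For reference, a sketch of how an average precision score can be computed for one profile's ranked output. The slides do not say which variant was used, so this assumes the common non-interpolated form. The document IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which a relevant document appears.

    Relevant documents that never appear in the ranking contribute zero.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Hypothetical ranking: relevant documents at ranks 1 and 3, one never retrieved.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}
ap = average_precision(ranking, relevant)
```

Averaging this value over all profiles in a set would give a single figure per configuration, comparable in form to the table entries.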

14 Results: Precision-Recall

15 Experiments with TREC
- TREC-8 routing task
  - Profiles: 50 topics (351-400)
  - Test Documents: Financial Times 1993-4
  - Training Documents: FT 92, LA Times 89-90, FBIS
- Building profiles
  - short topic description
  - known relevant documents in training set
  - sample of non-relevant documents from training set

16 Average Precision in TREC
- Average precision...
  - with profiles alone = 0.4464
  - with profile LSI = 0.3971
- LSI shows no improvement over original profiles
- Some topics conceivably have common interests
  - "hydrogen energy"; "hydrogen fuel automobiles"; "hybrid fuel cars"
  - "clothing sweatshops"; "human smuggling"
- But too little training overlap?

17 Conclusions
- LSI can improve filtering performance
  - but might not, if SVD can't find anything to work with
- LSI of profiles is much cheaper to compute than LSI of a whole collection (or even a sample!)

18 Current and Future Work
- Looking at other collections
  - More TREC!
  - Reuters-21578
  - Collaborative filtering collections... such as?
- Looking at other techniques
  - Comparison to collaboration alone?
  - Other methods of combining content and collaboration

