Techniques for Collaboration in Text Filtering. Ian Soboroff, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County.

Presentation transcript:

Slide 1: Techniques for Collaboration in Text Filtering
Ian Soboroff
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County

Slide 2: Overview
- Text filtering and collaborative filtering
- Finding collaboration among content profiles
- Experimental results
- Ongoing work

Slide 3: Information Filtering
- Given
  - a stream of documents (news articles, movies)
  - a set of users (with stable and specific interests)
- Recommend documents to users who will be interested in them
  - "Tell me when a jazz CD comes out that I'll like."
  - "Tell me when an earthquake is reported."

Slide 4: Content Filtering
- Construct profiles from example documents
  - vector of weights for terms in documents
  - can use known relevant and nonrelevant docs
  - can use external resources such as a home page, job description, or research papers
- Match new documents against content profiles
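The slide does not say how the profile weights are computed or how matching is scored. As a minimal sketch, assuming a Rocchio-style profile (mean relevant vector minus a damped mean nonrelevant vector) and cosine matching, both of which are illustrative choices rather than the author's stated method, content filtering could look like this. The function names, the `beta` and `gamma` weights, and the example texts are all hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Raw term-frequency vector for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def build_profile(relevant, nonrelevant, beta=1.0, gamma=0.5):
    """Rocchio-style profile built from example documents.

    beta and gamma are illustrative weights, not values from the slides.
    """
    profile = Counter()
    for doc in relevant:
        for term, tf in term_vector(doc).items():
            profile[term] += beta * tf / len(relevant)
    for doc in nonrelevant:
        for term, tf in term_vector(doc).items():
            profile[term] -= gamma * tf / len(nonrelevant)
    return dict(profile)

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical training examples for an "earthquakes" profile.
profile = build_profile(
    relevant=["earthquake reported in turkey", "strong earthquake damages buildings"],
    nonrelevant=["jazz festival opens in turkey"],
)

# Match two new documents against the profile.
score_quake = cosine(profile, term_vector("earthquake shakes coastal city"))
score_jazz = cosine(profile, term_vector("new jazz album released"))
```

A real system would replace raw term frequencies with a weighting scheme such as the log-tfidf mentioned later in the deck, and would pick a score threshold for deciding when to deliver a document.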

Slide 5: Filtering in a Community
- Many people will be watching the same stream
- Some of them may have overlapping interests
  - earthquakes, mideast politics, building codes, Turkey
  - Charles Mingus, Duke Ellington, Kenny G
- Want to take advantage of group effort

Slide 6: "Pure" Collaborative Filtering
- Collect users' ratings for documents
  - thumbs up/down, or 1-5 scale
- Compute correlations among users
- Predict ratings for new/unseen items using existing ratings and correlation values
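The three steps above can be sketched as follows. This assumes Pearson correlation over co-rated items and a mean-offset weighted prediction, which is one common formulation; the slide does not specify the exact correlation or prediction formula. The user names, document IDs, and ratings are made up for illustration and are not the values from the example slide.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0

def predict(ratings, user, item):
    """User's mean rating plus correlation-weighted deviations of other raters."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        num += weight * (theirs[item] - their_mean)
        den += abs(weight)
    return my_mean + num / den if den else my_mean

# Hypothetical 1-5 ratings; alice has not rated d4.
ratings = {
    "alice": {"d1": 5, "d2": 4, "d3": 1},
    "bob": {"d1": 5, "d2": 5, "d3": 1, "d4": 5},
    "carmen": {"d1": 1, "d2": 2, "d3": 5, "d4": 1},
}
predicted = predict(ratings, "alice", "d4")
```

Here bob agrees with alice and liked d4, while carmen disagrees with alice and disliked d4, so both push the prediction above alice's mean rating. The sketch does not clamp predictions to the rating scale.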

Slide 7: Pure CF Example
Ratings table: users Alice, Bob, Carmen, Doug; item groups Comedies, Dramas; "?" marks a rating to be predicted.
Extracted cell values (row and column alignment lost): 5 ?9 49 ?9 7 7?

Slide 8: Combining Content and Collaboration
- Pure collaborative filtering
  - can recommend anything
  - must have ratings to give predictions
  - don't know much about documents or ratings
- Adding content to collaboration
  - content filtering can recommend an unrated document
  - exploit common themes among content profiles

Slide 9: One Approach to CBCF
- Construct content profiles
  - Documents are vectors of weighted features
  - Build profiles from known relevant and nonrelevant documents
- Collaborative step
  - Combine profile vectors into single matrix
  - Compute latent semantic index of profile collection
  - Route new documents in profiles' "LSI space"

Slide 10: Latent Semantic Indexing
- Compute singular value decomposition of a content matrix $M$ ($t$ terms by $d$ documents, with entries $w_{td}$):
  $M_{t \times d} = T_{t \times r} \, \Sigma_{r \times r} \, D^{T}_{r \times d}$
- $D$, a representation of $M$ in $r$ dimensions
- $T$, a matrix for transforming new documents
- $\Sigma$ gives relative importance of dimensions
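The slide does not spell out how T transforms a new document. In the standard LSI formulation, which is assumed here rather than taken from the slides, the decomposition is truncated to the k largest singular values (k is an assumed symbol, with k no larger than r), and a new term vector d is "folded in" to the reduced space as follows:

```latex
M \approx M_k = T_k \, \Sigma_k \, D_k^{T},
\qquad
\hat{d} = \Sigma_k^{-1} \, T_k^{T} \, d
```

The folded-in vector has k coordinates and can be compared directly with the rows of D_k, for example by cosine similarity.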

Slide 11: Collaborating with LSI
- LSI dimensions are...
  - based on term co-occurrence patterns between documents (profiles)
  - ordered by their prominence in collection
- LSI space built from profiles
  - highlights common patterns among profiles
  - "noisy" dimensions can be pruned
  - project new documents into a collaborative space for routing
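A small sketch of routing in a space built from profiles, using NumPy's `numpy.linalg.svd`. The six-term vocabulary, the three binary profile vectors, the choice of k = 2, and the helper names are all hypothetical. Folding in a document as the inverse singular values times T transposed times d is the standard LSI convention and is assumed here, not taken from the slides.

```python
import numpy as np

def lsi_space(profiles, k):
    """SVD of a terms-by-profiles matrix, truncated to the k largest dimensions."""
    T, sigma, Dt = np.linalg.svd(profiles, full_matrices=False)
    return T[:, :k], sigma[:k], Dt[:k, :].T  # T_k, singular values, D_k

def fold_in(doc, T_k, sigma_k):
    """Project a term vector into the LSI space: Sigma_k^-1 T_k^T d."""
    return (T_k.T @ doc) / sigma_k

def cosine(u, v):
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (norm_u * norm_v)) if norm_u and norm_v else 0.0

# Hypothetical vocabulary: hydrogen, fuel, car, hybrid, sweatshop, clothing.
# Columns are profiles: P0 "hydrogen fuel", P1 "hybrid fuel car",
# P2 "clothing sweatshop".
profiles = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])

T_k, sigma_k, D_k = lsi_space(profiles, k=2)

# A new document mentioning only "car" and "hybrid" shares no term with P0.
doc = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
raw_sim = cosine(doc, profiles[:, 0])

# In the pruned profile space the document lands near P0 and far from P2.
doc_hat = fold_in(doc, T_k, sigma_k)
lsi_sim_p0 = cosine(doc_hat, D_k[0])
lsi_sim_p2 = cosine(doc_hat, D_k[2])
```

Because P0 and P1 share the term "fuel", the dominant dimension blends them, and pruning the third dimension removes what distinguishes them. The document then matches P0 even though it has no terms in common with it, which is the collaborative effect the slide describes.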

Slide 12: Experiments with Cranfield
- Cranfield, a standard (if small) IR collection
  - 1398 documents, 255 scored queries
- Profiles: selected Cranfield queries
  - 26 queries with ≥ 15 relevant documents
  - 70% of profile's relevant docs used in each profile
- Results show improvement for using LSI of profiles
  - compared to using profiles alone
  - compared to using LSI of all of Cranfield

Slide 13: Results: Average Precision
Table (numeric values lost in extraction; second column label truncated).
Columns: k-value, Set 1, Set
Rows:
- Content (log-tfidf)
- Content LSI (LSI of all of Cranfield)
- Collaborative LSI (LSI of profiles)
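For reference, average precision for a single profile can be computed as below. This is a sketch of the common non-interpolated definition (the mean of the precision values at each rank where a relevant document appears, divided by the total number of relevant documents); the slides do not state which variant was used, and the document IDs are hypothetical. It assumes the ranking contains no duplicate IDs.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

# Relevant documents appear at ranks 1 and 4: (1/1 + 2/4) / 2 = 0.75
ap = average_precision(["d3", "d1", "d7", "d2"], {"d3", "d2"})
```

Averaging this value over all profiles gives a single summary figure for a run.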

Slide 14: Results: Precision-Recall

Slide 15: Experiments with TREC
- TREC-8 routing task
  - Profiles: 50 topics ( )
  - Test Documents: Financial Times
  - Training Documents: FT 92, LA Times 89-90, FBIS
- Building profiles
  - short topic description
  - known relevant documents in training set
  - sample of non-relevant documents from training set

Slide 16: Average Precision in TREC
- Average precision...
  - with profiles alone =
  - with profile LSI =
- LSI shows no improvement over original profiles
- Some topics conceivably have common interests
  - "hydrogen energy"; "hydrogen fuel automobiles"; "hybrid fuel cars"
  - "clothing sweatshops"; "human smuggling"
- But too little training overlap?

Slide 17: Conclusions
- LSI can improve filtering performance
  - but might not, if SVD can't find anything to work with
- LSI of profiles is much cheaper to compute than LSI of a whole collection (or even a sample!)

Slide 18: Current and Future Work
- Looking at other collections
  - More TREC!
  - Reuters
  - Collaborative filtering collections... such as?
- Looking at other techniques
  - Comparison to collaboration alone?
  - Other methods of combining content and collaboration