Published by Daryl Addington. Modified over 10 years ago.
1
Adaptive Information Filtering Lanbo Zhang (ISSDM fellow) Yi Zhang (UCSC advisor) Carla Kuiken (LANL mentor)
2
Outline
Introduction
Our Research
– Interactive Retrieval Based on Faceted Feedback (SIGIR 2010)
– Discriminative Factored Prior Models for Personalized Content-Based Recommendation (CIKM 2010)
Future Work
3
Why Filtering?
In some cases, users want to persistently track certain kinds of information on the Internet
– CDC (Centers for Disease Control and Prevention) personnel: news reports about H1N1
– Physicians: new treatments for a disease
– FBI investigators: potential terrorist threats
– Financial analysts: news that may influence a stock
For these tasks, search engines that require users to actively issue queries are not enough
We need an intelligent system that can PUSH the desired information to us whenever it becomes available!
4
Adaptive Information Filtering
The central task
– Identify the relevant documents in a document stream
5
The Cold-Start Problem
Filtering performance for new users is usually poor because these users have provided too little training data (user feedback)
We follow two directions to handle this problem
– Explore new user interaction mechanisms that encourage more user feedback
– Research advanced filtering models that can borrow information from other users for new users
6
Outline
Introduction
Our Research
– Direction 1: A New User Feedback Mechanism (Faceted Feedback)
– Direction 2: A New Filtering Model (Discriminative Factored Prior Model)
Future Work
7
Semi-Structured Documents
Semi-structured documents with metadata are proliferating on the Internet
– Authors, Topic, Publisher, Created Time, etc.
– Metadata might be useful for filtering
8
[Figure: example article from the New York Times, annotated with human-assigned metadata and algorithm-generated metadata]
9
Definitions
Facet
– Each metadata field is called a facet
– E.g., Date, Topic, Location, Author, etc.
Facet-Value Pair
– A metadata field with a specific value is called a facet-value pair
– E.g., Publisher = New York Times
10
Faceted Feedback
Traditional User Feedback Mechanism
– Allows users to provide feedback on the relevance of documents
  Doc1: Relevant
  Doc2: Non-relevant
Faceted Feedback
– Allows users to provide feedback on facet-value pairs
– Each facet-value pair represents a constraint on the desired documents
  Topic = FIFA World Cup: Yes
  Year = 2010: Yes
  Year = 2006: No
11
Why Faceted Feedback
Users may have clear ideas about some facets of the target documents
– E.g., FIFA World Cup: Year = 2010
May encourage user feedback
– Facet-value pairs are short and easy to understand
12
Research Questions
Question 1
– How to select a small number of facet-value pair candidates?
Question 2
– How to make use of faceted feedback?
13
Q1: Facet-Value Pair Selection
Four approaches to rank facet-value pairs
– Top Document Frequency (TDF): frequency in the top N ranked documents
– TDF*IDF (Inverse Document Frequency)
– Query Likelihood (QL): P(q|f=v)
– TDF+QL, where TDF corresponds to P(f=v|q) and QL to P(q|f=v)
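The first two ranking approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the input layout (one set of facet-value pairs per top-ranked document), the logarithmic IDF form, and the toy counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_facet_value_pairs(top_docs, collection_df, num_docs, k=5):
    """Rank facet-value pairs by TDF and by TDF*IDF (hypothetical helper).

    top_docs: list of sets of (facet, value) pairs, one set per top-ranked document.
    collection_df: dict mapping (facet, value) to the number of documents in the
        whole collection that carry that pair.
    num_docs: total number of documents in the collection.
    """
    # TDF: in how many of the top N documents does each pair occur?
    tdf = Counter()
    for pairs in top_docs:
        tdf.update(pairs)  # sets guarantee one count per document

    def idf(pair):
        # Assumed IDF form: log(N / df); the slides do not give the exact formula.
        return math.log(num_docs / collection_df[pair])

    # Ties are broken alphabetically so the output is deterministic.
    by_tdf = sorted(tdf, key=lambda p: (-tdf[p], p))[:k]
    by_tdf_idf = sorted(tdf, key=lambda p: (-tdf[p] * idf(p), p))[:k]
    return by_tdf, by_tdf_idf


# Toy data (invented): three top-ranked documents and collection-wide counts.
top_docs = [
    {("Topic", "FIFA World Cup"), ("Year", "2010"), ("Type", "News")},
    {("Topic", "FIFA World Cup"), ("Year", "2010"), ("Type", "News")},
    {("Topic", "FIFA World Cup"), ("Year", "2006"), ("Type", "News")},
]
collection_df = {
    ("Topic", "FIFA World Cup"): 50,
    ("Year", "2010"): 200,
    ("Year", "2006"): 100,
    ("Type", "News"): 1000,
}
by_tdf, by_tdf_idf = rank_facet_value_pairs(top_docs, collection_df, num_docs=1000)
print(by_tdf)
print(by_tdf_idf)
```

In this toy collection every document carries Type = News, so the pair ranks high under plain TDF but drops to the bottom under TDF*IDF, which is the usual motivation for the IDF factor.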
14
Q2: How to Use Faceted Feedback?
The commonly used method
– Boolean Model
Problem with the Boolean Model
– Document metadata is not perfect: it can be inaccurate or incomplete
– This may badly hurt retrieval performance
15
The Soft Model
The basic idea
– Reward documents that contain user-identified facet-value pairs by adding a certain number of credits
– The number of credits for each facet is learned on training queries
Score(d) = original score + rewards for facet matches
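The contrast between the Boolean model and the soft model can be sketched in a few lines of Python. This is an illustrative sketch only: the document layout, the function names, and the credit values are hypothetical (the slides state that credits are learned on training queries, which is not shown here).

```python
def boolean_filter(docs, liked_pairs):
    """Boolean model: keep only documents whose metadata contains every approved pair."""
    return [d for d in docs if liked_pairs <= d["facets"]]


def soft_score(doc, liked_pairs, credit):
    """Soft model: original score plus a per-facet credit for each matched pair."""
    reward = sum(
        credit.get(facet, 0.0)
        for (facet, value) in liked_pairs
        if (facet, value) in doc["facets"]
    )
    return doc["score"] + reward


# Toy documents (invented). d2 is on topic but its Year metadata is missing.
docs = [
    {"id": "d1", "score": 1.2, "facets": {("Topic", "FIFA World Cup"), ("Year", "2010")}},
    {"id": "d2", "score": 1.5, "facets": {("Topic", "FIFA World Cup")}},
    {"id": "d3", "score": 1.4, "facets": {("Topic", "Stock Market"), ("Year", "2010")}},
]
liked_pairs = {("Topic", "FIFA World Cup"), ("Year", "2010")}
credit = {"Topic": 0.5, "Year": 0.2}  # hypothetical values, not learned

kept = [d["id"] for d in boolean_filter(docs, liked_pairs)]
ranked = [d["id"] for d in sorted(docs, key=lambda d: -soft_score(d, liked_pairs, credit))]
print(kept)
print(ranked)
```

With these numbers the Boolean filter discards d2 outright because of its incomplete metadata, while the soft model still ranks it first on the strength of its original score plus the Topic credit.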
16
Experimental Settings
Datasets
– OHSUMED + queries from the TREC (Text REtrieval Conference) 2000 filtering track: 348,566 medical articles, 63 queries
– RCV1 + queries from the TREC 2002 filtering track: ~810,000 news articles from Reuters, 50 queries
User Study
– We collected users' faceted feedback on Amazon Mechanical Turk
17
Chosen Facets
– OHSUMED: MeSH (Medical Subject Headings)
– RCV1: Region, Industry, Topic
18
Experimental Results: Overall Performance of Faceted Feedback
Faceted feedback significantly improves retrieval performance
19
Experimental Results: Boolean Models vs. Soft Model
[Figures: results on OHSUMED and RCV1]
The Boolean models don't work well or even hurt performance, while the soft model always performs well
20
Outline
Introduction
Our Research
– Direction 1: A New User Feedback Mechanism (Faceted Feedback)
– Direction 2: A New Filtering Model (Discriminative Factored Prior Model)
Future Work
21
Existing Filtering Approaches
Two categories
– Retrieval models + threshold setting methods: Rocchio, BM25, Language Models, etc.
– Standard machine learning models for binary text classification: Naïve Bayes, logistic regression, SVM, neural networks, etc.
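As a rough illustration of the first category, the sketch below pairs a Rocchio-style profile update with a fixed delivery threshold. The class name, the sparse term-weight dictionaries, the `beta` and `gamma` values, and the fixed threshold are all assumptions; practical systems also adapt the threshold, which is omitted here.

```python
from collections import defaultdict


class RocchioFilter:
    """Hypothetical Rocchio-style adaptive filter with a fixed threshold."""

    def __init__(self, query_terms, threshold, beta=0.5, gamma=0.25):
        # The profile starts as the initial query vector.
        self.profile = defaultdict(float)
        for term, weight in query_terms.items():
            self.profile[term] += weight
        self.threshold = threshold
        self.beta = beta    # weight applied to relevant documents
        self.gamma = gamma  # weight applied to non-relevant documents

    def score(self, doc_vec):
        # Dot product between the profile and the document vector.
        return sum(self.profile.get(t, 0.0) * w for t, w in doc_vec.items())

    def deliver(self, doc_vec):
        # Push the document to the user only if it scores at or above the threshold.
        return self.score(doc_vec) >= self.threshold

    def feedback(self, doc_vec, relevant):
        # Move the profile toward relevant documents and away from non-relevant ones.
        step = self.beta if relevant else -self.gamma
        for t, w in doc_vec.items():
            self.profile[t] += step * w


# Toy stream (invented term weights).
f = RocchioFilter({"h1n1": 1.0}, threshold=0.5)
doc1 = {"h1n1": 1.0, "vaccine": 1.0}
doc2 = {"vaccine": 1.0, "trial": 1.0}

before = f.deliver(doc2)         # doc2 shares no terms with the initial profile
f.feedback(doc1, relevant=True)  # the user marks doc1 as relevant
after = f.deliver(doc2)          # "vaccine" now carries weight in the profile
print(before, after)
```

The example also shows the cold-start issue in miniature: until the user gives feedback, the profile knows only the initial query terms.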
22
Characteristics of User Interests
For example,
– User 1: Sports, Technology
– User 2: Sports, Politics, Shopping
– User 3: Politics, Technology, Travel
Characteristics
– A single user may have multiple interests
– Different users may have overlapping interests
Existing filtering approaches don't explicitly capture these characteristics
23
Discriminative Factored Prior Models (DFPM)
[Model equations not recovered; the legend listed the following quantities]
– The hidden factor matrix
– The variance matrix
– The profile/classifier of user m
– The feature vector of the j-th training document of user m
– The label of the j-th training document of user m
– The hidden vector of user m
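Since only the legend of this slide survives, the following is one hypothetical way to write a model that uses exactly those six quantities. The symbols (F for the hidden factor matrix, Σ for the variance matrix, w_m for the profile of user m, x_{m,j} and y_{m,j} for the feature vector and label of the j-th training document, s_m for the hidden vector) are labels chosen here, and the Gaussian prior and logistic likelihood are assumptions, not the authors' stated equations.

```latex
\begin{aligned}
w_m &\sim \mathcal{N}\left(F s_m,\ \Sigma\right)
  && \text{(assumed factored prior on the user profile)} \\
P\left(y_{m,j} = 1 \mid x_{m,j}, w_m\right) &= \frac{1}{1 + \exp\left(-w_m^{\top} x_{m,j}\right)}
  && \text{(assumed logistic likelihood)}
\end{aligned}
```

Under this reading, all users share F while each user has a separate s_m, which is how a new user's profile could borrow strength from other users.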
24
Advantages
As discriminative models, our models can incorporate any kind of features
– Textual features (words)
– Semantic features (very useful), e.g., Topic = Lung Cancer, Source = Cancer Cause and Control
Borrow information from other users when learning profiles for new users
– All user profiles share a common hidden factor matrix
Capture a single user's multiple interests
– Each user profile follows a factored prior distribution
25
Parameter Estimation
Assume the variance matrix is diagonal and all of its diagonal entries are equal to a constant value c1; then
[Resulting objective not recovered]
26
Optimization
Use an EM-like iterative algorithm to solve the above optimization problem
1: Initialize [not recovered]
2: [not recovered]
3: [not recovered]
Annotations on the steps: closed-form solution; conjugate gradient descent
27
Experimental Settings
Dataset
– Collected from Digg.com, where users can "digg" news articles that interest them to promote the articles' rankings
– 15,162 users, 251 relevant documents per user
Details
– Split: 80% training, 10% validation, 10% test
– Words as features: 35,865 (TF-IDF scores)
– Metrics: Precision, Recall, Macro-F1
Baselines
– L2-regularized Logistic Regression (L2LR): learns each user profile separately without borrowing information
– The standard Bayesian Hierarchical model with Logistic Regression (BHLR): uses a standard prior
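The three metrics can be computed as in the sketch below. It assumes that Macro-F1 means the per-user F1 scores averaged with equal weight per user; the slides do not spell out the averaging, and the function names and toy data are invented.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall, and F1 for one user, given sets of document ids."""
    tp = len(predicted & relevant)  # delivered documents that are truly relevant
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(per_user):
    """Average of per-user F1 scores (assumed meaning of Macro-F1 here)."""
    scores = [precision_recall_f1(pred, rel)[2] for pred, rel in per_user]
    return sum(scores) / len(scores)


# Toy data (invented): (delivered ids, truly relevant ids) for two users.
per_user = [
    ({1, 2, 3, 4}, {1, 2, 5}),  # precision 1/2, recall 2/3
    ({7}, {7, 8}),              # precision 1, recall 1/2
]
print(macro_f1(per_user))
```

Averaging per user gives every user equal weight regardless of how many documents they rated, which matters when new users with little feedback are the focus.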
28
Performance Comparison
Our models outperform the baselines significantly
29
Outline
Introduction
Our Research
– A New User Interaction Mechanism (Faceted Feedback)
– A New Filtering Approach (Discriminative Factored Prior Model)
Future Work
30
Future Work
Active learning on facet-value pair selection
– To maximize learning benefits
Integrating multiple types of user feedback
– Feedback on documents
– Feedback on facets
– …
31
Thanks! Comments & Questions?