Exploration & Exploitation in Adaptive Filtering Based on Bayesian Active Learning. Yi Zhang, Jamie Callan (Carnegie Mellon University); Wei Xu (NEC Labs America)

A Typical Adaptive Filtering System. [System diagram] A document stream flows into the filtering system, which accumulates documents and decides which to deliver. The user profile (a binary classifier plus a utility function) is initialized from the user's first request and updated by learning from user feedback on the delivered documents.

Commonly Used Evaluation

                 Relevant   Non-Relevant
Delivered        A_R        A_N
Not Delivered    B_R        B_N

If we assume user satisfaction is mostly influenced by what he/she has seen, then a simplified utility is a weighted sum over the delivered documents only. For example: Utility = 2*R+ - N+, where R+ and N+ are the numbers of relevant and non-relevant documents delivered (used in the TREC-9, TREC-10, and TREC-11 Adaptive Filtering tracks, trec.nist.gov).
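The utility above is a simple linear function of the counts of delivered documents; a minimal sketch in Python (the function name and signature are illustrative, not from the paper):

```python
def linear_utility(relevant_delivered, nonrelevant_delivered,
                   credit=2.0, penalty=-1.0):
    """Linear utility of the delivered set, as in the TREC adaptive
    filtering tracks: Utility = 2 * R+ - N+ by default. The credit and
    penalty weights correspond to the A_R / A_N entries of the
    'Delivered' row of the evaluation table."""
    return credit * relevant_delivered + penalty * nonrelevant_delivered
```

For example, delivering 30 relevant and 20 non-relevant documents gives a utility of 2*30 - 20 = 40.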

Common Approach in Adaptive Filtering. Set the dissemination threshold where the immediate utility gain of delivering a document is zero. For example, to optimize Utility = 2*R+ - N+, the system delivers a document iff P(rel) >= 1/3 (about 0.33), because U_immediate = 2*P(rel) - P(nonrel) = 2*P(rel) - (1 - P(rel)) >= 0.
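The threshold follows from setting the expected immediate utility to zero; a small sketch for a general credit/penalty pair (the helper name is hypothetical):

```python
def dissemination_threshold(credit=2.0, penalty=-1.0):
    """Smallest P(rel) for which delivering has non-negative expected
    immediate utility. Setting
        credit * P(rel) + penalty * (1 - P(rel)) = 0
    and solving for P(rel) gives -penalty / (credit - penalty)."""
    return -penalty / (credit - penalty)
```

With the TREC weights (credit 2, penalty -1) this yields the 1/3 threshold stated above.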

Problem with Current Adaptive Filtering Research. Why deliver a document to the user?
1. It satisfies the information need immediately.
2. It elicits user feedback, so the system can improve its model of the user's information need and thus satisfy that need better in the future.
Current research in adaptive filtering underestimates the utility gain of delivering a document by ignoring the second effect. Related work: active learning, Bayesian experimental design.

Solution: Explicitly Model the Future Utility of Delivering a Document. N_future: the number of (discounted) documents in the future. Exploitation: estimate the immediate utility of delivering a new document based on the model learned so far. Exploration: estimate the future utility of delivering a new document by considering how much the learned model would improve if we obtained user feedback about the document.

Exploitation: Estimate U_immediate Using Bayesian Inference. Let P(θ | D_{t-1}) be the posterior distribution of the model parameters θ given the training data D_{t-1}. Using Bayesian inference, the expected immediate utility of delivering document x is:

U_immediate(x) = Σ_y A_y ∫ P(y | x, θ) P(θ | D_{t-1}) dθ

where A_y is the credit/penalty assigned by the utility function that models user satisfaction; y = R if the document is relevant, y = N if it is non-relevant (for Utility = 2*R+ - N+, A_R = 2 and A_N = -1).
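The posterior integral can be approximated by averaging over samples of θ; a minimal Monte Carlo sketch, assuming a `p_rel(x, theta)` callback that returns P(y = R | x, θ) (all names are illustrative, not from the paper):

```python
def immediate_utility(x, posterior_samples, p_rel,
                      credit=2.0, penalty=-1.0):
    """Monte Carlo estimate of U_immediate(x): average, over posterior
    samples of theta, the expected credit/penalty of delivering x.
    Here p_rel(x, theta) plays the role of P(y = R | x, theta), and
    credit / penalty are the A_R / A_N weights of the utility."""
    total = 0.0
    for theta in posterior_samples:
        p = p_rel(x, theta)
        total += credit * p + penalty * (1.0 - p)
    return total / len(posterior_samples)
```

With more posterior samples the average converges to the integral on the slide.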

Exploration: Utility Divergence to Measure Loss (1). If we make delivery decisions over the document space using an estimated model θ̂ while the true model is θ, we incur some loss, the utility divergence (UD): the utility achieved when deciding with the true model θ minus the utility achieved when deciding with θ̂.

Exploration: Utility Divergence to Measure Loss (2). We do not know the true model θ. However, based on our beliefs about its distribution, we can estimate the expected loss of using the estimate θ̂:

ExpLoss(θ̂, D) = ∫ UD(θ̂, θ) P(θ | D) dθ

Thus we can measure the quality of training data D as the expected loss incurred if we use the estimator θ̂ learned from D.
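The expected loss can likewise be approximated with posterior samples; a rough sketch, assuming delivery decisions and utilities are both computed from a `p_rel(x, theta)` callback (all names are hypothetical, and the paper's exact estimator may differ):

```python
def expected_loss(theta_hat, posterior_samples, docs, p_rel,
                  credit=2.0, penalty=-1.0):
    """Expected utility divergence of deciding with theta_hat, averaged
    over posterior samples of the unknown true model theta. A document
    is delivered when the model used for deciding gives it non-negative
    expected immediate utility; realized utility is scored under the
    true model."""
    def util(decide_theta, true_theta):
        u = 0.0
        for x in docs:
            p_dec = p_rel(x, decide_theta)
            if credit * p_dec + penalty * (1.0 - p_dec) >= 0.0:
                p_true = p_rel(x, true_theta)
                u += credit * p_true + penalty * (1.0 - p_true)
        return u

    losses = [util(theta, theta) - util(theta_hat, theta)
              for theta in posterior_samples]
    return sum(losses) / len(posterior_samples)
```

A low expected loss means the training data already pins the model down well, so feedback on a new document would add little future utility.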

The Whole Process. Step 1: estimate the combined (immediate + future) utility of delivering document x. Step 2: deliver x if this combined utility is non-negative, and use the resulting feedback y to update the model.

Adaptive Filtering: Logistic Regression to Find the Dissemination Threshold. X: a score* indicating how well each document matches the profile. A Metropolis-Hastings algorithm is used to sample θ_i for the integration. (*The scoring function is learned adaptively using the Rocchio algorithm.)
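A minimal random-walk Metropolis-Hastings sketch for sampling logistic-regression parameters over document scores (flat prior, Gaussian proposals; the paper's actual sampler, priors, and parameterization may differ):

```python
import math
import random

def metropolis_hastings_logistic(scores, labels, n_samples=2000,
                                 step=0.2, seed=0):
    """Sample (a, b) for the model P(rel | score) = sigmoid(a*score + b)
    with a flat prior, using a random-walk Metropolis-Hastings chain.
    Returns a list of (a, b) samples; these can be plugged into a
    posterior-averaged utility estimate."""
    rng = random.Random(seed)

    def log_lik(a, b):
        ll = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard log(0)
            ll += math.log(p) if y else math.log(1.0 - p)
        return ll

    a, b = 1.0, 0.0
    cur = log_lik(a, b)
    samples = []
    for _ in range(n_samples):
        a2, b2 = a + rng.gauss(0.0, step), b + rng.gauss(0.0, step)
        new = log_lik(a2, b2)
        # Accept with probability min(1, exp(new - cur)).
        if rng.random() < math.exp(min(0.0, new - cur)):
            a, b, cur = a2, b2, new
        samples.append((a, b))
    return samples
```

In practice one would discard an initial burn-in portion of the chain before using the samples.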

Experimental Data Sets and Evaluation Measures

                  TREC-9 (OHSUMED)    TREC-10 (Reuters)
Relevant %        0.016%              1.2%
Initialization    2 relevant documents + topic description

TREC-10 Filtering Data: Reuters Dataset. [Results table: T9U, T11SU, Precision, Recall, and Docs/Profile for the Bayesian Active, Bayesian Immediate, and Norm. Exp. ML methods] Active learning is very effective on the TREC-10 dataset.

TREC-9 Filtering Data: OHSUMED Dataset. [Results table: T9U, T11SU, Precision, Recall, and Docs/Profile for the Bayesian Active, Bayesian Immediate, and Norm. Exp. ML methods] On average, only 51 out of … documents are relevant. Active learning did not improve utility on the TREC-9 dataset, but it did not hurt either (the algorithm is robust).

Related Work
- Active learning
  - Uncertainty about the label of a document: request the label of the most uncertain document; minimize the uncertainty about future labels
  - Uncertainty about the model parameters (KL divergence, variance)
- Bayesian experimental design: improvement of the utility of the model
- Information retrieval: mutual information between document and label

Contribution and Future Work
Our contribution:
- Derivation of Utility Divergence to measure model quality
- Combining immediate utility and future utility gain in the adaptive filtering task
- An empirically robust algorithm
Future work:
- High-dimensional spaces: computational issues (variational algorithms, Gaussian approximations, Gibbs sampling, ...); amount of training data needed
- Other active learning applications: online marketing, interactive retrieval, ...

The End Thanks