Evidence from Behavior LBSC 796/CMSC 828o Douglas W. Oard Session 5, February 23, 2004.

Similar presentations
Recommender Systems & Collaborative Filtering
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
Search and Ye Shall Find (maybe) Seminar on Emergent Information Technology August 20, 2007 Douglas W. Oard.
Collaborative Filtering Sue Yeon Syn September 21, 2005.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Recommender Systems Aalap Kohojkar Yang Liu Zhan Shi March 31, 2008.
Search Engines and Information Retrieval
K nearest neighbor and Rocchio algorithm
Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
Presented by Li-Tal Mashiach Learning to Rank: A Machine Learning Approach to Static Ranking Algorithms for Large Data Sets Student Symposium.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
Search Session 12 LBSC 690 Information Technology.
Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.
Information Retrieval in Practice
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
Web Mining Research: A Survey
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
1 CS 430 / INFO 430 Information Retrieval Lecture 24 Usability 2.
Sigir’99 Inside Internet Search Engines: Search Jan Pedersen and William Chang.
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.
Information Filtering LBSC 796/INFM 718R Douglas W. Oard Session 10, November 12, 2007.
August 1, 2003SIGIR Implicit Workshop Protecting the Privacy of Observable Behavior in Distributed Recommender Systems Douglas W. Oard University of Maryland.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Signatures As Threats to Privacy Brian Neil Levine Assistant Professor Dept. of Computer Science UMass Amherst.
Social Networking and On-Line Communities: Classification and Research Trends Maria Ioannidou, Eugenia Raptotasiou, Ioannis Anagnostopoulos.
Search Engines and Information Retrieval Chapter 1.
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
1 Information Filtering & Recommender Systems (Lecture for CS410 Text Info Systems) ChengXiang Zhai Department of Computer Science University of Illinois,
IR Evaluation Evaluate what? –user satisfaction on specific task –speed –presentation (interface) issue –etc. My focus today: –comparative performance.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Evidence from Behavior INST 734 Doug Oard Module 7.
Patterns, effective design patterns Describing patterns Types of patterns – Architecture, data, component, interface design, and webapp patterns – Creational,
User Models for Personalization Josh Alspector Chief Technology Officer.
Information Filtering LBSC 796/INFM 718R Douglas W. Oard Session 10, April 13, 2011.
Filtering and Recommendation INST 734 Module 9 Doug Oard.
Implicit User Feedback Hongning Wang.
Information Filtering LBSC 878 Douglas W. Oard and Dagobert Soergel Week 10: April 12, 1999.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Collaborative Information Retrieval - Collaborative Filtering systems - Recommender systems - Information Filtering Why do we need CIR? - IR system augmentation.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.
Understanding User’s Query Intent with Wikipedia Yeo Seung-hu.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Similarity & Recommendation Arjen P. de Vries CWI Scientific Meeting September 27th 2013.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
“In the beginning -- before Google -- a darkness was upon the land.” Joel Achenbach Washington Post.
Relevance Feedback Hongning Wang
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
User Modeling and Recommender Systems: recommendation algorithms
KMS & Collaborative Filtering Why CF in KMS? CF is the first type of application to leverage tacit knowledge People-centric view of data Preferences matter.
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Overview of Machine Learning
Filtering and Recommendation
Presentation transcript:


Agenda Questions Observable Behavior Information filtering

Some Observable Behaviors [Table: observable behaviors, organized by behavior category and by minimum scope]

Some Examples –Read/ignored, saved/deleted, replied to (Stevens, 1993) –Reading time (Morita & Shinoda, 1994; Konstan et al., 1997) –Hypertext links (Brin & Page, 1998)

Estimating Authority from Links [Diagram: hub pages pointing to authority pages]

Collecting Click Streams Browsing histories are easily captured –Make all links initially point to a central site, encode the desired URL as a parameter, then redirect the browser to the desired page –Cookies identify users (when they use the same machine) –Build a time-annotated transition graph for each user Reading time is correlated with interest –Can be used to build individual profiles –Used to target advertising by doubleclick.com
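The time-annotated transition graph described above can be sketched as follows; the log format (user, timestamp, URL) and the dwell-time heuristic are illustrative assumptions, not a description of any particular system.

```python
from collections import defaultdict

def build_transition_graph(click_log):
    """Build a per-user, time-annotated transition graph from a click log.

    click_log: (user_id, timestamp_seconds, url) tuples, assumed sorted by
    time within each user (a hypothetical log format for illustration).
    Returns {user: {(from_url, to_url): [dwell_seconds, ...]}}.
    """
    last_seen = {}  # user -> (current url, arrival timestamp)
    graph = defaultdict(lambda: defaultdict(list))
    for user, ts, url in click_log:
        if user in last_seen:
            prev_url, prev_ts = last_seen[user]
            # time spent on the previous page approximates reading time
            graph[user][(prev_url, url)].append(ts - prev_ts)
        last_seen[user] = (url, ts)
    return graph
```

Each edge then carries a list of observed dwell times, which the next slides use as an interest signal.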

[Chart: rating (no / low / moderate / high interest) vs. reading time in seconds, for full-text articles (telecommunications)]
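A minimal way to turn dwell time into the coarse interest levels plotted above; the threshold values are invented placeholders, not figures from the study behind the chart, since real cutoffs depend on the user, the genre, and the document length.

```python
def interest_from_reading_time(seconds, thresholds=(5, 20, 60)):
    """Map dwell time (in seconds) to a coarse interest level.

    The threshold values are illustrative assumptions; a deployed
    system would fit them from observed behavior.
    """
    levels = ("no interest", "low interest", "moderate interest", "high interest")
    for level, cutoff in zip(levels, thresholds):
        if seconds < cutoff:
            return level
    return levels[-1]  # at or beyond the last cutoff
```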

More Complete Observations User selects an article –Interpretation: Summary was interesting User quickly prints the article –Interpretation: They want to read it User selects a second article –Interpretation: another interesting summary User scrolls around in the article –Interpretation: Parts with high dwell time and/or repeated revisits are interesting User stops scrolling for an extended period –Interpretation: User was interrupted

[Chart: rating (no / low / moderate / high interest) vs. reading time, for abstracts (pharmaceuticals)]

Information Access Problems [Table: retrieval = stable collection, information need different each time; filtering = stable information need, collection different each time]

Information Filtering [Diagram: new documents are matched against a user profile to produce recommendations; the user’s ratings feed back to update the profile]

Information Filtering An abstract problem in which: –The information need is stable Characterized by a “profile” –A stream of documents is arriving Each must either be presented to the user or not Introduced by Luhn in 1958 –As “Selective Dissemination of Information” Named “Filtering” by Denning in 1975

A Simple Filtering Strategy Use any information retrieval system –Boolean, vector space, probabilistic, … Have the user specify a “standing query” –This will be the profile Limit the standing query by date –Each use, show what arrived since the last use
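The strategy above is small enough to sketch directly; the document format and the term-overlap matcher are stand-ins for whichever retrieval model (Boolean, vector space, probabilistic, …) the underlying system actually uses.

```python
def filter_since(documents, standing_query, last_run):
    """Apply a standing query to documents that arrived after last_run.

    documents: dicts with 'arrived' (a timestamp) and 'text' keys; this
    format and the simple term-overlap match are assumptions standing in
    for a real retrieval system.
    """
    query_terms = set(standing_query.lower().split())
    hits = []
    for doc in documents:
        if doc["arrived"] <= last_run:
            continue  # already shown on a previous use
        if query_terms & set(doc["text"].lower().split()):
            hits.append(doc)
    return hits
```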

Social Filtering Exploit ratings from other users as features –Like personal recommendations, peer review, … Reaches beyond topicality to: –Accuracy, coherence, depth, novelty, style, … Applies equally well to other modalities –Movies, recorded music, … Sometimes called “collaborative” filtering

Rating-Based Recommendation Use ratings to describe objects –Personal recommendations, peer review, … Beyond topicality: –Accuracy, coherence, depth, novelty, style, … Has been applied to many modalities –Books, Usenet news, movies, music, jokes, beer, …
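A minimal memory-based sketch of rating-based recommendation, assuming ratings are stored as nested dicts; production collaborative filters add mean-centering, significance weighting, and much more.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(u[i] ** 2 for i in common)) * sqrt(sum(v[i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item, k=2):
    """Predict user's rating for item as the similarity-weighted average
    of the k most similar neighbors who rated that item."""
    neighbors = [(cosine(ratings[user], ratings[other]), ratings[other][item])
                 for other in ratings
                 if other != user and item in ratings[other]]
    neighbors.sort(reverse=True)
    top = [(s, r) for s, r in neighbors[:k] if s > 0]
    if not top:
        return None  # no usable neighbors: the cold start problem
    return sum(s * r for s, r in top) / sum(s for s, _ in top)
```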

Using Positive Information Source: Jon Herlocker, SIGIR 1999

Using Negative Information Source: Jon Herlocker, SIGIR 1999

The Cold Start Problem Social filtering will not work in isolation –Without ratings, we get no recommendations –Without recommendations, we read nothing –Without reading, we get no ratings An initial recommendation strategy is needed –Stereotypes –Content-based search The need for both leads to hybrid strategies

Some Things We (Sort of) Know Treating each genre separately can be useful –Separate predictions for separate tastes Negative information can be useful –“I hate everything my parents like” People like to know who provided ratings Popularity provides a useful fallback People don’t like to provide ratings –Few experiments have achieved sufficient scale

Challenges Any form of sharing necessarily incurs: –Distribution costs –Privacy concerns –Competitive concerns Requiring explicit ratings also: –Increases the cognitive load on users –Can adversely affect ease-of-use

Motivations to Provide Ratings Self-interest –Use the ratings to improve system’s user model Economic benefit –If a market for ratings is created Altruism

The Problem With Self-Interest [Chart: value of ratings vs. number of ratings (few to lots), comparing marginal cost, marginal value to the rater, and marginal value to the community]

Solving the Cost vs. Value Problem Maximize the value –Provide for continuous user model adaptation Minimize the costs –Use implicit feedback rather than explicit ratings –Minimize privacy concerns through encryption –Build an efficient scalable architecture –Limit the scope to noncompetitive activities

Solution: Reduce the Marginal Cost [Chart: the same curves, with the marginal cost lowered relative to the marginal value to the rater and to the community]

Implicit Feedback Observe user behavior to infer a set of ratings –Examine (reading time, scrolling behavior, …) –Retain (bookmark, save, save & annotate, print, …) –Refer to (reply, forward, include link, cut & paste, …) Some measurements are directly useful –e.g., use reading time to predict reading time Others require some inference –Should you treat cut & paste as an endorsement?
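One way to fold such observations into a single implicit rating; the category weights below are invented for illustration, and deciding what each behavior actually implies is exactly the inference problem just described.

```python
# Illustrative weights for the three behavior categories named above;
# the specific values are assumptions, not measured quantities.
EVIDENCE_WEIGHTS = {"examine": 1, "retain": 2, "refer": 3}

def implicit_rating(observed_behaviors, scale_max=5):
    """Sum behavior-category weights into a rating, clamped to the scale.

    Unrecognized behaviors contribute nothing: whether, say, cut & paste
    counts as an endorsement is an open inference question.
    """
    score = sum(EVIDENCE_WEIGHTS.get(b, 0) for b in observed_behaviors)
    return min(score, scale_max)
```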

Recommending w/Implicit Feedback [Diagram: two architectures. In the first, each user’s observations feed a local user model that estimates ratings; a ratings server pools user and community ratings into predicted ratings. In the second, user observations go to an observations server, which pools community observations into predicted observations and, from those, predicted ratings.]

Beyond Information Filtering Citation indexing –Exploits reference behavior Search for people based on their behavior –Discovery of potential collaborators Collaborative data mining in large collections –Discoveries migrate to people with similar interests

Relevance Feedback [Flowchart: initial profile terms → make profile vector; new documents → make document vectors; compute similarity between profile and document vectors → rank-ordered documents; the user selects and examines a document and assigns a rating; ratings and vectors update the user model, yielding new profile vector(s)]

Rocchio Formula new profile = α · (original profile) + β · (sum of positive feedback vectors) − γ · (sum of negative feedback vectors)
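As a sketch, the Rocchio update over term-weight dictionaries might look like this; the α/β/γ defaults are common textbook choices, averaging the feedback vectors and clipping negative weights to zero are conventional options rather than part of the formula itself.

```python
def rocchio(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """One Rocchio update: alpha * profile + beta * mean(relevant)
    - gamma * mean(nonrelevant), over term->weight dicts."""
    terms = set(profile)
    for d in relevant + nonrelevant:
        terms |= set(d)

    def mean(docs, t):
        return sum(d.get(t, 0.0) for d in docs) / len(docs) if docs else 0.0

    new_profile = {}
    for t in terms:
        w = (alpha * profile.get(t, 0.0)
             + beta * mean(relevant, t)
             - gamma * mean(nonrelevant, t))
        new_profile[t] = max(w, 0.0)  # negative weights are usually clipped
    return new_profile
```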

Supervised Learning Given a set of vectors with associated values –e.g., term vectors with relevance judgments Predict the values associated with new vectors –i.e., learn a mapping from vectors to values All learning systems share two problems –They need some basis for making predictions This is called an “inductive bias” –They must balance adaptation with generalization

Machine Learning Approaches Hill climbing (Rocchio) Instance-based learning Rule induction Regression Neural networks Genetic algorithms Statistical classification

Statistical Classification Represent relevant docs as one random vector –And nonrelevant docs as another Build a statistical model for each distribution –e.g., model each with mean and covariance Find the surface separating the distributions –e.g., a hyperplane for linear discriminant analysis Rank documents by distance from that surface –Possibly based on the shape of the distributions
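A deliberately crude sketch of the idea, assuming equal-length feature lists: it separates the two classes with the hyperplane midway between their means (ignoring the covariance that full linear discriminant analysis would model) and ranks documents by signed distance from that surface.

```python
def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rank_by_discriminant(relevant, nonrelevant, documents):
    """Rank documents by signed distance from the hyperplane midway
    between the relevant and nonrelevant class means (a stand-in for
    covariance-aware linear discriminant analysis)."""
    mu_r, mu_n = centroid(relevant), centroid(nonrelevant)
    w = [a - b for a, b in zip(mu_r, mu_n)]          # normal to the surface
    mid = [(a + b) / 2 for a, b in zip(mu_r, mu_n)]  # a point on the surface

    def score(d):
        return sum(wi * (di - mi) for wi, di, mi in zip(w, d, mid))

    return sorted(documents, key=score, reverse=True)
```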

Rule Induction Automatically derived Boolean profiles –(Hopefully) effective and easily explained Specificity from the “perfect query” –AND terms in a document, OR the documents Generality from a bias favoring short profiles –e.g., penalize rules with more Boolean operators –Balanced by rewards for precision, recall, …
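The "perfect query" construction is small enough to write out; documents here are plain strings, and a profile is a disjunction of term conjunctions. A bias toward short profiles would then prune or merge these clauses.

```python
def perfect_query(relevant_docs):
    """Build the 'perfect query': AND the terms within each relevant
    document, OR across documents. Represented as a list of term sets;
    a new document matches if it contains every term of some set."""
    return [set(doc.lower().split()) for doc in relevant_docs]

def matches(profile, doc):
    """True if any conjunctive clause is satisfied by the document."""
    terms = set(doc.lower().split())
    return any(clause <= terms for clause in profile)
```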

Training Strategies Overtraining can hurt performance –Performance on training data rises and plateaus –Performance on new data rises, then falls One strategy is to learn less each time –But it is hard to guess the right learning rate Splitting the training set is a useful alternative –One part provides the content for training –The other part assesses performance on unseen data
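The split-and-stop strategy above is the usual early-stopping loop; fit and evaluate are caller-supplied callbacks with hypothetical signatures, and the 80/20 split and patience value are arbitrary choices.

```python
def train_with_early_stopping(examples, fit, evaluate, max_epochs=50, patience=3):
    """Train on part of the data, stop when held-out performance stalls.

    fit(model, train) returns an updated model (None on the first call);
    evaluate(model, held_out) returns a score where higher is better.
    Both are hypothetical callbacks supplied by the caller.
    """
    split = int(0.8 * len(examples))
    train, held_out = examples[:split], examples[split:]
    best_score, best_model, since_best = float("-inf"), None, 0
    model = None
    for _ in range(max_epochs):
        model = fit(model, train)
        score = evaluate(model, held_out)
        if score > best_score:
            best_score, best_model, since_best = score, model, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # performance on unseen data has started to fall
    return best_model
```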

Critical Issues Protecting privacy –What absolute assurances can we provide? –How can we make remaining risks understood? Scalable rating servers –Is a fully distributed architecture practical? Non-cooperative users –How can the effect of spamming be limited?

Gaining Access to Observations Observe public behavior –Hypertext linking, publication, citing, … Policy protection –EU: Privacy laws –US: Privacy policies + FTC enforcement Architectural assurance of privacy –Distributed architecture –Model and mitigate privacy risks

A More Secure Data Flow [Diagram: each user’s item behaviors (IxB) are reduced locally to personal item features (IxF); only features are shared into the community features matrix (IxF), from which item recommendations (IxR) are computed]

Low Entropy Attack [Diagram: an adversary with side information reads the community features (IxF) to infer the behaviors (IxB) of user U] Solution space –Read access to IxF requires a minimum number of unique contributors –Cryptographic data structure support –Controlled mixing

Matrix Difference Attack [Diagram: an adversary snapshots the community features before and after user U contributes; the difference (IxF) - (IxF)' reveals IxB for user U] Solution space –Users can’t control the “next hop” –Routing can hide the real source and destination

Identity Integrity Attack [Diagram: an adversary using forged identities compares (IxF) and (IxF)' to recover IxB for user U] Solution space –Registrar service –Blinded credentials –Attribute membership credentials

One Minute Paper What do you think is the most significant factor that limits the utility of recommender systems? What was the muddiest point in today’s lecture?