Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.

Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007

Picture of Relevance Feedback x x x x o o o Revised query x non-relevant documents o relevant documents o o o x x x x x x x x x x x x  x x Initial query  x

Rocchio Formula q m = modified query vector; q 0 = original query vector; α,β,γ: weights (hand-chosen or set empirically); D r = set of known relevant doc vectors; D nr = set of known irrelevant doc vectors

Rocchio Example 040800 124001 201104 6370-3 040800 248002 8044016 Original query Positive Feedback Negative feedback (+) (-) New query Typically,  < 

Using Positive Information Source: Jon Herlocker, SIGIR 1999

Using Negative Information Source: Jon Herlocker, SIGIR 1999

Rating-Based Recommendation Use ratings as to describe objects –Personal recommendations, peer review, … Beyond topicality: –Accuracy, coherence, depth, novelty, style, … Has been applied to many modalities –Books, Usenet news, movies, music, jokes, beer, …

Motivations to Provide Ratings Self-interest –Use the ratings to improve system’s user model Economic benefit –If a market for ratings is created Altruism

Explicit Feedback: Assumptions A1: User has sufficient knowledge for a reasonable initial query A2: Selected examples are representative A3: The user will try giving feedback A4: The user will keep giving feedback

A1: Good Initial Query? Two problems: –User may not have sufficient initial knowledge –Few or no relevant documents may be retrieved Examples: –Misspellings (Brittany Speers) –Cross-language information retrieval –Vocabulary mismatch (e.g., cosmonaut/astronaut)

A2: Representative Examples? There may be several clusters of relevant documents Examples: –Burma/Myanmar –Contradictory government policies –Opinions

A3: Will People Use It? Efficiency –Longer queries require more processing time Understandability –Harder to see why subsequent documents retrieved Risk –Users are reluctant to provide negative feedback

A4: Self-Interest Decreases! Number of Ratings Value of ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

Solving the Cost vs. Value Problem Maximize the value –Provide for continuous user model adaptation Minimize the costs –Use implicit feedback rather than explicit ratings –Minimize privacy concerns through encryption –Build an efficient scalable architecture –Limit the scope to noncompetitive activities

Solution: Reduce the Marginal Cost Number of Ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Type Edit

View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

Minimum Scope SegmentObjectClass View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

Estimating Authority from Links Authority Hub

Anchor Text Indexing Sometimes yields unexpected results: –Google ranked Microsoft #1 for “evil empire” www.ibm.com Armonk, NY-based computer giant IBM announced today Joe’s computer hardware links Compaq HP IBM Big Blue today announced record profits for the quarter Adapted from: Manning and Raghavan

Collecting Click Streams Browsing histories are easily captured –Make all links initially point to a central site Encode the desired URL as a parameter –Build a time-annotated transition graph for each user Cookies identify users (when they use the same machine) –Redirect the browser to the desired page Reading time is correlated with interest –Can be used to build individual profiles –Used to target advertising by doubleclick.com

0 20 40 60 80 100 120 140 160 180 No Interest Low Interest Moderate Interest High Interest Rating Reading Time (seconds) Full Text Articles (Telecommunications) 50 32 58 43

More Complete Observations User selects an article –Interpretation: Summary was interesting User quickly prints the article –Interpretation: They want to read it User selects a second article –Interpretation: another interesting summary User scrolls around in the article –Interpretation: Parts with high dwell time and/or repeated revisits are interesting User stops scrolling for an extended period –Interpretation: User was interrupted

No Interest No Interest Low Interest Moderate Interest High Interest Abstracts (Pharmaceuticals) 42 55 52 51

Recommending w/Implicit Feedback Estimate Rating User Model Ratings Server User Ratings Community Ratings Predicted Ratings User Observations User Ratings User Model Estimate Ratings Observations Server Predicted Observations Community Observations Predicted Ratings User Observations

Critical Issues Protecting privacy –What absolute assurances can we provide? –How can we make remaining risks understood? Scalable rating servers –Is a fully distributed architecture practical? Non-cooperative users –How can the effect of spamming be limited?

Gaining Access to Observations Observe public behavior –Hypertext linking, publication, citing, … Policy protection –EU: Privacy laws –US: Privacy policies + FTC enforcement Statistical assurance of privacy –Distributed architecture –Model and mitigate privacy risks

Search Engine Query Logs A: Southeast Asia (Dec 27, 2004) B: Indonesia (Mar 29, 2005) C; Pakistan (Oct 10, 2005) D; Hawaii (Oct 16, 2006) E: Indonesia (Aug 8, 2007) F: Peru (Aug 16, 2007)

Search Engine Query Logs http://hannu.biz/aolsearch/

AOL User 4417749

Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.

Similar presentations

Presentation on theme: "Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.

Similar presentations

Presentation on theme: "Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007."— Presentation transcript:

Similar presentations

About project

Feedback