Presentation is loading. Please wait.

Presentation is loading. Please wait.

Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007.

Similar presentations


Presentation on theme: "Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007."— Presentation transcript:

1 Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007

2 Picture of Relevance Feedback x x x x o o o Revised query x non-relevant documents o relevant documents o o o x x x x x x x x x x x x  x x Initial query  x

3 Rocchio Formula q m = modified query vector; q 0 = original query vector; α,β,γ: weights (hand-chosen or set empirically); D r = set of known relevant doc vectors; D nr = set of known irrelevant doc vectors

4 Rocchio Example 040800 124001 201104 6370-3 040800 248002 8044016 Original query Positive Feedback Negative feedback (+) (-) New query Typically,  < 

5 Using Positive Information Source: Jon Herlocker, SIGIR 1999

6 Using Negative Information Source: Jon Herlocker, SIGIR 1999

7 Rating-Based Recommendation Use ratings as to describe objects –Personal recommendations, peer review, … Beyond topicality: –Accuracy, coherence, depth, novelty, style, … Has been applied to many modalities –Books, Usenet news, movies, music, jokes, beer, …

8 Motivations to Provide Ratings Self-interest –Use the ratings to improve system’s user model Economic benefit –If a market for ratings is created Altruism

9 Explicit Feedback: Assumptions A1: User has sufficient knowledge for a reasonable initial query A2: Selected examples are representative A3: The user will try giving feedback A4: The user will keep giving feedback

10 A1: Good Initial Query? Two problems: –User may not have sufficient initial knowledge –Few or no relevant documents may be retrieved Examples: –Misspellings (Brittany Speers) –Cross-language information retrieval –Vocabulary mismatch (e.g., cosmonaut/astronaut)

11 A2: Representative Examples? There may be several clusters of relevant documents Examples: –Burma/Myanmar –Contradictory government policies –Opinions

12 A3: Will People Use It? Efficiency –Longer queries require more processing time Understandability –Harder to see why subsequent documents retrieved Risk –Users are reluctant to provide negative feedback

13 A4: Self-Interest Decreases! Number of Ratings Value of ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

14 Solving the Cost vs. Value Problem Maximize the value –Provide for continuous user model adaptation Minimize the costs –Use implicit feedback rather than explicit ratings –Minimize privacy concerns through encryption –Build an efficient scalable architecture –Limit the scope to noncompetitive activities

15 Solution: Reduce the Marginal Cost Number of Ratings Marginal value to rater Marginal value to community Few Lots Marginal cost

16 View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Type Edit

17 View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

18 Minimum Scope SegmentObjectClass View Listen Select Print Bookmark Save Purchase Delete Subscribe Copy / paste Quote Forward Reply Link Cite Mark up Tag Publish Organize Behavior Category Examine Retain Reference Annotate Create Type Edit

19 Estimating Authority from Links Authority Hub

20 Anchor Text Indexing Sometimes yields unexpected results: –Google ranked Microsoft #1 for “evil empire” www.ibm.com Armonk, NY-based computer giant IBM announced today Joe’s computer hardware links Compaq HP IBM Big Blue today announced record profits for the quarter Adapted from: Manning and Raghavan

21 Collecting Click Streams Browsing histories are easily captured –Make all links initially point to a central site Encode the desired URL as a parameter –Build a time-annotated transition graph for each user Cookies identify users (when they use the same machine) –Redirect the browser to the desired page Reading time is correlated with interest –Can be used to build individual profiles –Used to target advertising by doubleclick.com

22 0 20 40 60 80 100 120 140 160 180 No Interest Low Interest Moderate Interest High Interest Rating Reading Time (seconds) Full Text Articles (Telecommunications) 50 32 58 43

23 More Complete Observations User selects an article –Interpretation: Summary was interesting User quickly prints the article –Interpretation: They want to read it User selects a second article –Interpretation: another interesting summary User scrolls around in the article –Interpretation: Parts with high dwell time and/or repeated revisits are interesting User stops scrolling for an extended period –Interpretation: User was interrupted

24 No Interest No Interest Low Interest Moderate Interest High Interest Abstracts (Pharmaceuticals) 42 55 52 51

25 Recommending w/Implicit Feedback Estimate Rating User Model Ratings Server User Ratings Community Ratings Predicted Ratings User Observations User Ratings User Model Estimate Ratings Observations Server Predicted Observations Community Observations Predicted Ratings User Observations

26 Critical Issues Protecting privacy –What absolute assurances can we provide? –How can we make remaining risks understood? Scalable rating servers –Is a fully distributed architecture practical? Non-cooperative users –How can the effect of spamming be limited?

27 Gaining Access to Observations Observe public behavior –Hypertext linking, publication, citing, … Policy protection –EU: Privacy laws –US: Privacy policies + FTC enforcement Statistical assurance of privacy –Distributed architecture –Model and mitigate privacy risks

28 Search Engine Query Logs A: Southeast Asia (Dec 27, 2004) B: Indonesia (Mar 29, 2005) C; Pakistan (Oct 10, 2005) D; Hawaii (Oct 16, 2006) E: Indonesia (Aug 8, 2007) F: Peru (Aug 16, 2007)

29 Search Engine Query Logs http://hannu.biz/aolsearch/

30 AOL User 4417749


Download ppt "Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007."

Similar presentations


Ads by Google