SLIDE 1 IS 202 – FALL 2003
SIMS 202: Information Organization and Retrieval
Lecture 19: Probabilistic IR and Relevance Feedback
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2003

SLIDE 2 IS 202 – FALL 2003 Lecture Overview
Review
– Vector Representation
– Term Weights
– Vector Matching
– Clustering
Probabilistic Models of IR
Relevance Feedback
Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 4 IS 202 – FALL 2003 Document Vectors

SLIDE 5 IS 202 – FALL 2003 Vector Space Documents and Queries
[Figure: documents D1–D11 plotted as Boolean term combinations in the term space t1, t2, t3]
Q is a query – also represented as a vector

SLIDE 6 IS 202 – FALL 2003 Documents in Vector Space
[Figure: documents D1–D11 positioned in the three-term space t1, t2, t3]

SLIDE 7 IS 202 – FALL 2003 Binary Weights
Only the presence (1) or absence (0) of a term is included in the vector

SLIDE 8 IS 202 – FALL 2003 Raw Term Weights
The frequency of occurrence of the term in each document is included in the vector
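A minimal sketch of both weighting schemes from slides 7 and 8; the toy collection is made up for illustration:

```python
from collections import Counter

# Toy collection (made-up text, for illustration only)
docs = {
    "D1": "information retrieval system".split(),
    "D2": "retrieval of information and more retrieval".split(),
}
vocab = sorted({t for words in docs.values() for t in words})

for name, words in docs.items():
    tf = Counter(words)
    binary = [1 if t in tf else 0 for t in vocab]  # slide 7: presence/absence
    raw = [tf[t] for t in vocab]                   # slide 8: raw term frequency
    print(name, binary, raw)
```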

SLIDE 9 IS 202 – FALL 2003 tf*idf Weights
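The weighting formula on this slide was an image in the original deck; the standard tf*idf form it refers to is, for term k in document i:

w_{ik} = tf_{ik} \cdot \log\left(\frac{N}{n_k}\right)

where tf_{ik} is the term's frequency in the document, N is the number of documents in the collection, and n_k is the number of documents containing term k.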

SLIDE 10 IS 202 – FALL 2003 Inverse Document Frequency
IDF provides high values for rare words and low values for common words
For a collection of documents (N = 10,000):
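The table of example values was an image; with N = 10,000 and a base-10 log, the pattern it illustrated is:

\mathrm{idf}_k = \log_{10}\frac{N}{n_k}:\quad n_k = 10{,}000 \Rightarrow 0,\quad n_k = 100 \Rightarrow 2,\quad n_k = 1 \Rightarrow 4

so a term occurring in every document contributes nothing, while a term occurring in a single document gets the maximum weight.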

SLIDE 11 IS 202 – FALL 2003 tf*idf Normalization
Normalize the term weights (so longer vectors are not unfairly given more weight)
– To normalize usually means to force all values to fall within a certain range, usually between 0 and 1, inclusive
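The normalization formula was an image in the deck; the conventional cosine-normalized tf*idf weight it refers to is:

w_{ik} = \frac{tf_{ik}\,\log(N/n_k)}{\sqrt{\sum_{j=1}^{t}\left(tf_{ij}\,\log(N/n_j)\right)^2}}

which forces every document vector to unit length.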

SLIDE 12 IS 202 – FALL 2003 Vector Space Similarity
Now, the similarity of two documents is given by the formula below
– This is also called the cosine, or normalized inner product
– The normalization was done when weighting the terms
– Note that the w_ik weights can be stored in the vectors/inverted files for the documents
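The similarity formula itself was an image; with weights normalized as on the previous slide it reduces to the plain inner product:

sim(D_i, D_j) = \sum_{k=1}^{t} w_{ik}\, w_{jk}

or, for unnormalized weights, the full cosine:

\cos(D_i, D_j) = \frac{\sum_k w_{ik} w_{jk}}{\sqrt{\sum_k w_{ik}^2}\,\sqrt{\sum_k w_{jk}^2}}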

SLIDE 13 IS 202 – FALL 2003 Vector Space Matching
[Figure: query Q and documents D1, D2 plotted against Term A and Term B]
D_i = (d_i1, w_di1; d_i2, w_di2; …; d_it, w_dit)
Q = (q_i1, w_qi1; q_i2, w_qi2; …; q_it, w_qit)
Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7)
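A short sketch that reproduces the slide's numbers by computing the cosine of the two-term vectors given above:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

Q = (0.4, 0.8)
D1 = (0.8, 0.3)
D2 = (0.2, 0.7)
print(round(cosine(Q, D1), 2))  # 0.73
print(round(cosine(Q, D2), 2))  # 0.98 -- D2 ranks above D1 for this query
```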

SLIDE 14 IS 202 – FALL 2003 Vector Space Visualization

SLIDE 15 IS 202 – FALL 2003 Document/Document Matrix

SLIDE 16 IS 202 – FALL 2003 Text Clustering
Clustering is "the art of finding groups in data." -- Kaufman and Rousseeuw
[Figure: points scattered in a two-dimensional space of Term 1 vs. Term 2, grouped into clusters]

SLIDE 18 IS 202 – FALL 2003 Problems with Vector Space
There is no real theoretical basis for the assumption of a term space
– It is more for visualization than having any real basis
– Most similarity measures work about the same regardless of model
Terms are not really orthogonal dimensions
– Terms are not independent of all other terms
Retrieval efficiency vs. indexing and update efficiency for stored pre-calculated weights

SLIDE 19 IS 202 – FALL 2003 Lecture Overview
Review
– Vector Representation
– Term Weights
– Vector Matching
– Clustering
Probabilistic Models of IR
Relevance Feedback
Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 20 IS 202 – FALL 2003 Probabilistic Models
A rigorous formal model that attempts to predict the probability that a given document will be relevant to a given query
Ranks retrieved documents according to this probability of relevance (the Probability Ranking Principle)
Relies on accurate estimates of probabilities

SLIDE 21 IS 202 – FALL 2003 Probability Ranking Principle
"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data has been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."
Stephen E. Robertson, Journal of Documentation, 1977

SLIDE 22 IS 202 – FALL 2003 Model 1 – Maron and Kuhns
Concerned with estimating probabilities of relevance at the point of indexing:
– If a patron came with a request using term t_i, what is the probability that she/he would be satisfied with document D_j?

SLIDE 23 IS 202 – FALL 2003 Model 1
A patron submits a query (call it Q) consisting of some specification of her/his information need. Different patrons submitting the same stated query may differ as to whether or not they judge a specific document to be relevant. The function of the retrieval system is to compute for each individual document the probability that it will be judged relevant by a patron who has submitted query Q.
Robertson, Maron & Cooper, 1982

SLIDE 24 IS 202 – FALL 2003 Model 1 – Bayes
A is the class of events of using the library
D_i is the class of events of document i being judged relevant
I_j is the class of queries consisting of the single term I_j
P(D_i | A, I_j) = the probability that, if query I_j is submitted to the system, document D_i will be judged relevant
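The slide's equation was an image; by Bayes' rule the quantity of interest decomposes as:

P(D_i \mid A, I_j) = \frac{P(D_i \mid A)\; P(I_j \mid D_i, A)}{P(I_j \mid A)}

so documents can be ranked by the prior probability of the document being used times the probability that a satisfied reader of D_i would have used the query term I_j.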

SLIDE 25 IS 202 – FALL 2003 Model 2
Documents have many different properties; some documents have all the properties that the patron asked for, and other documents have only some or none of the properties. If the inquiring patron were to examine all of the documents in the collection she/he might find that some having all the sought after properties were relevant, but others (with the same properties) were not relevant. And conversely, he/she might find that some of the documents having none (or only a few) of the sought after properties were relevant, others not. The function of a document retrieval system is to compute the probability that a document is relevant, given that it has one (or a set) of specified properties.
Robertson, Maron & Cooper, 1982

SLIDE 26 IS 202 – FALL 2003 Model 2 – Robertson & Sparck Jones
Given a term t and a query q (columns: document relevance; rows: document indexing):

                  Relevant   Non-relevant   Total
 Term present        r          n-r            n
 Term absent        R-r       N-n-R+r         N-n
 Total               R          N-R             N

SLIDE 27 IS 202 – FALL 2003 Robertson-Sparck Jones Weights
Retrospective formulation
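The formula was an image in the deck; using the contingency table on slide 26, the retrospective weight is:

w = \log \frac{r / (R - r)}{(n - r) / (N - n - R + r)}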

SLIDE 28 IS 202 – FALL 2003 Robertson-Sparck Jones Weights
Predictive formulation
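Again an image in the deck; the predictive form adds 0.5 to each cell to smooth the estimates:

w^{(1)} = \log \frac{(r + 0.5) / (R - r + 0.5)}{(n - r + 0.5) / (N - n - R + r + 0.5)}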

SLIDE 29 IS 202 – FALL 2003 Probabilistic Models: Some Unifying Notation
D = all present and future documents
Q = all present and future queries
(D_i, Q_j) = a document-query pair
x = class of similar documents, y = class of similar queries
Relevance (R) is a relation:
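The relation itself was rendered as an image; presumably the standard statement that relevance is a subset of document-query pairs:

R \subseteq D \times Q, \qquad (D_i, Q_j) \in R \iff D_i \text{ is judged relevant to } Q_j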

SLIDE 30 IS 202 – FALL 2003 Probabilistic Models
Model 1 -- Probabilistic Indexing, P(R | y, D_i)
Model 2 -- Probabilistic Querying, P(R | Q_j, x)
Model 3 -- Merged Model, P(R | Q_j, D_i)
Model 0 -- P(R | y, x)
Probabilities are estimated based on prior usage or relevance estimation

SLIDE 31 IS 202 – FALL 2003 Probabilistic Models
[Figure: the space of documents D containing a class x with member D_i, and the space of queries Q containing a class y with member Q_j]

SLIDE 32 IS 202 – FALL 2003 Logistic Regression
Another approach to estimating the probability of relevance
Based on work by William Cooper, Fred Gey and Daniel Dabney
Builds a regression model for relevance prediction based on a set of training data
Uses less restrictive independence assumptions than Model 2
– Linked Dependence

SLIDE 33 IS 202 – FALL 2003 So What's Regression?
A method for fitting a curve (not necessarily a straight line) through a set of points using some goodness-of-fit criterion
The most common type of regression is linear regression

SLIDE 34 IS 202 – FALL 2003 What's Regression?
Least squares fitting is a mathematical procedure for finding the best-fitting curve for a given set of points by minimizing the sum of the squares of the offsets ("the residuals") of the points from the curve
The sum of the squared offsets is used instead of the absolute values of the offsets because this allows the residuals to be treated as a continuous differentiable quantity
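A minimal illustration of least-squares fitting using numpy; the data points are made up:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

# Fit y = m*x + c by minimizing the sum of squared residuals
m, c = np.polyfit(x, y, deg=1)
residuals = y - (m * x + c)
print(m, c, (residuals ** 2).sum())
```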

SLIDE 35 IS 202 – FALL 2003 Logistic Regression
[Figure: relevance plotted against term frequency in the document, with a fitted logistic curve]

SLIDE 36 IS 202 – FALL 2003 Probabilistic Models: Logistic Regression
Estimates of relevance are based on a log-linear model with various statistical measures of document content as independent variables
The log odds of relevance is a linear function of the attributes, term contributions are summed, and the probability of relevance is the inverse of the log odds:
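The three equations on this slide were images; the standard log-linear form they refer to is:

\log O(R \mid Q, D) = c_0 + \sum_{i=1}^{n} c_i X_i

with the contributions of matching terms summed, and the probability of relevance recovered by the inverse logistic transform:

P(R \mid Q, D) = \frac{1}{1 + e^{-\log O(R \mid Q, D)}}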

SLIDE 37 IS 202 – FALL 2003 Logistic Regression Attributes
– Average Absolute Query Frequency
– Query Length
– Average Absolute Document Frequency
– Document Length
– Average Inverse Document Frequency
– Inverse Document Frequency
– Number of Terms in common between query and document -- logged

SLIDE 38 IS 202 – FALL 2003 Logistic Regression
The probability of relevance is based on a logistic regression over a sample set of documents, which determines the values of the coefficients
At retrieval time the probability estimate is obtained from the six X attribute measures shown previously
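A sketch of how retrieval-time scoring could look; the coefficient values and feature vector here are placeholders, not the trained values from any real system:

```python
import math

# Placeholder coefficients c0..c6 -- the real values would come from a
# logistic regression fit on a training sample (e.g., TREC data).
coef = [-3.5, 0.4, -0.2, 0.3, -0.1, 0.5, 0.9]

def relevance_probability(features):
    """features: the six X attribute measures for one query-document pair."""
    log_odds = coef[0] + sum(c * x for c, x in zip(coef[1:], features))
    return 1.0 / (1.0 + math.exp(-log_odds))

print(relevance_probability([1.2, 2.0, 0.8, 5.1, 3.3, 1.6]))
```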

SLIDE 39 IS 202 – FALL 2003 Probabilistic Models
Advantages:
– Strong theoretical basis
– In principle should supply the best predictions of relevance given available information
– Can be implemented similarly to Vector
Disadvantages:
– Relevance information is required -- or is "guestimated"
– Important indicators of relevance may not be terms -- though terms only are usually used
– Optimally requires on-going collection of relevance information

SLIDE 40 IS 202 – FALL 2003 Vector and Probabilistic Models
Support "natural language" queries
Treat documents and queries the same
Support relevance feedback searching
Support ranked retrieval
Differ primarily in theoretical basis and in how the ranking is calculated
– Vector assumes relevance
– Probabilistic relies on relevance judgments or estimates

SLIDE 41 IS 202 – FALL 2003 Current Use of Probabilistic Models
Virtually all the major systems in TREC now use the "Okapi BM25 formula," which incorporates the Robertson-Sparck Jones weights…

SLIDE 42 IS 202 – FALL 2003 Okapi BM25
Where:
– Q is a query containing terms T
– K is k1((1 - b) + b · dl/avdl)
– k1, b, and k3 are tuning parameters (k1 and b are usually 1.2 and 0.75)
– tf is the frequency of the term in a specific document
– qtf is the frequency of the term in the topic from which Q was derived
– dl and avdl are the document length and the average document length, measured in some convenient unit
– w(1) is the Robertson-Sparck Jones weight
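The formula itself was an image in the original deck; the usual TREC-era statement matching these definitions (omitting the k2 document-length correction term that sometimes appears) is:

BM25(Q, D) = \sum_{T \in Q} w^{(1)} \cdot \frac{(k_1 + 1)\, tf}{K + tf} \cdot \frac{(k_3 + 1)\, qtf}{k_3 + qtf}

with K = k_1\left((1 - b) + b \cdot dl / avdl\right).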

SLIDE 43 IS 202 – FALL 2003 Language Models
A recent addition to the probabilistic models is "language modeling," which estimates the probability that a query could have been produced by a given document
This slight variation on the other probabilistic models has led to some modest improvements in performance
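As a concrete illustration not on the slide: in its simplest (unigram, unsmoothed) form, the query-likelihood model ranks documents by

P(Q \mid D) = \prod_{t \in Q} P(t \mid D), \qquad P(t \mid D) \approx \frac{tf_{t,D}}{|D|}

and in practice the estimates are smoothed against collection statistics so that unseen query terms do not zero out the product.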

SLIDE 44 IS 202 – FALL 2003 Logistic Regression and Cheshire II
The Cheshire II system (see readings) uses logistic regression equations estimated from TREC full-text data
Used for a number of production-level systems here and in the U.K.

SLIDE 45 IS 202 – FALL 2003 Lecture Overview
Review
– Vector Representation
– Term Weights
– Vector Matching
– Clustering
Probabilistic Models of IR
Relevance Feedback
Credit for some of the slides in this lecture goes to Marti Hearst

SLIDE 46 IS 202 – FALL 2003 Querying in IR System
[Diagram: an information storage and retrieval system. Interest profiles & queries are formulated in terms of descriptors and kept in Store 1 (profiles/search requests); documents & data are indexed (descriptive and subject) and kept in Store 2 (document representations). The "rules of the game" = rules for subject indexing + a thesaurus (which consists of a lead-in vocabulary and an indexing language). Comparison/matching of the two stores yields potentially relevant documents.]

SLIDE 47 IS 202 – FALL 2003 Relevance Feedback in an IR System
[Diagram: the same information storage and retrieval system as on slide 46, with one addition: selected relevant documents from the output feed back into the query formulation step.]

SLIDE 48 IS 202 – FALL 2003 Query Modification
Problem: How to reformulate the query?
– Thesaurus expansion: suggest terms similar to query terms
– Relevance feedback: suggest terms (and documents) similar to retrieved documents that have been judged to be relevant

SLIDE 49 IS 202 – FALL 2003 Relevance Feedback
Main idea:
– Modify the existing query based on relevance judgements
  – Extract terms from relevant documents and add them to the query
  – And/or re-weight the terms already in the query
Two main approaches:
– Automatic (pseudo-relevance feedback)
– Users select relevant documents
  – Users/system select terms from an automatically-generated list

SLIDE 50 IS 202 – FALL 2003 Relevance Feedback
Usually do both:
– Expand the query with new terms
– Re-weight the terms in the query
There are many variations:
– Usually positive weights for terms from relevant docs
– Sometimes negative weights for terms from non-relevant docs
– Remove terms that appear ONLY in non-relevant documents

SLIDE 51 IS 202 – FALL 2003 Rocchio Method
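The formula on this slide was an image; the classic Rocchio update it presents is:

Q' = \alpha Q_0 + \frac{\beta}{|R|} \sum_{D_i \in R} D_i \; - \; \frac{\gamma}{|NR|} \sum_{D_i \in NR} D_i

where R and NR are the sets of known relevant and non-relevant documents, and α, β, γ control the relative weight of the original query and the positive and negative feedback.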

SLIDE 52 IS 202 – FALL 2003 Rocchio/Vector Illustration
[Figure: Q0, Q', Q'' and D1, D2 plotted against the dimensions "retrieval" and "information"]
Q0 = retrieval of information = (0.7, 0.3)
D1 = information science = (0.2, 0.8)
D2 = retrieval systems = (0.9, 0.1)
Q' = ½·Q0 + ½·D1 = (0.45, 0.55)
Q'' = ½·Q0 + ½·D2 = (0.80, 0.20)

SLIDE 53 IS 202 – FALL 2003 Example Rocchio Calculation
[Figure: a worked example showing the relevant docs, a non-relevant doc, the original query, the constants, the Rocchio calculation, and the resulting feedback query]
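The worked numbers on this slide were images; a minimal sketch of the same kind of calculation, with made-up weight vectors, α = β = γ = 1, and negative components clipped to zero (a common convention, assumed here):

```python
def rocchio(q0, rel, nonrel, alpha=1.0, beta=1.0, gamma=1.0):
    """One Rocchio feedback step over equal-length term-weight vectors."""
    n = len(q0)
    centroid = lambda docs: [sum(d[i] for d in docs) / len(docs) for i in range(n)]
    r, nr = centroid(rel), centroid(nonrel)
    return [max(0.0, alpha * q0[i] + beta * r[i] - gamma * nr[i]) for i in range(n)]

q0 = [1.0, 0.0, 1.0, 0.0]          # original query weights (made up)
rel = [[0.8, 0.6, 0.9, 0.0]]       # judged relevant
nonrel = [[0.0, 0.9, 0.1, 0.7]]    # judged non-relevant
print(rocchio(q0, rel, nonrel))    # expanded, re-weighted feedback query
```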

SLIDE 54 IS 202 – FALL 2003 Rocchio Method
Rocchio automatically
– Re-weights terms
– Adds in new terms (from relevant docs)
Have to be careful when using negative terms
Rocchio is not a machine learning algorithm
Most methods perform similarly
– Results are heavily dependent on the test collection
Machine learning methods are proving to work better than standard IR approaches like Rocchio

SLIDE 55 IS 202 – FALL 2003 Probabilistic Relevance Feedback
Given a query term t (columns: document relevance; rows: document indexing):

                  Relevant   Non-relevant   Total
 Term present        r          n-r            n
 Term absent        R-r       N-n-R+r         N-n
 Total               R          N-R             N

Where N is the number of documents seen

SLIDE 56 IS 202 – FALL 2003 Robertson-Sparck Jones Weights
Retrospective formulation

SLIDE 57 IS 202 – FALL 2003 Using Relevance Feedback
Known to improve results
– In TREC-like conditions (no user involved)
What about with a user in the loop?
– How might you measure this?

SLIDE 58 IS 202 – FALL 2003 Relevance Feedback Summary
Iterative query modification can improve precision and recall for a standing query
In at least one study, users were able to make good choices by seeing which terms were suggested for relevance feedback and selecting among them

SLIDE 59 IS 202 – FALL 2003 Alternative Notions of Relevance Feedback
Find people whose taste is "similar" to yours
– Will you like what they like?
Follow a user's actions in the background
– Can this be used to predict what the user will want to see next?
Track what lots of people are doing
– Does this implicitly indicate what they think is good and not good?

SLIDE 60 IS 202 – FALL 2003 Alternative Notions of Relevance Feedback
Several different criteria to consider:
– Implicit vs. explicit judgements
– Individual vs. group judgements
– Standing vs. dynamic topics
– Similarity of the items being judged vs. similarity of the judges themselves

SLIDE 61 Collaborative Filtering (Social Filtering)
If Pam liked the paper, I'll like the paper
If you liked Star Wars, you'll like Independence Day
Rating is based on the ratings of similar people
– Ignores the text, so it works on text, sound, pictures, etc.
– But: initial users can bias the ratings of future users

SLIDE 62 Ringo Collaborative Filtering
Users rate musical artists from like to dislike
– 1 = detest, 7 = can't live without, 4 = ambivalent
– There is a normal distribution around 4
– However, what matters are the extremes
Nearest Neighbors strategy: find similar users and predict the (weighted) average of their ratings
Pearson r algorithm: weight by the degree of correlation between user U and user J
– 1 means very similar, 0 means no correlation, -1 means dissimilar
– Works better to compare against the ambivalent rating (4) rather than the individual's average score
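A sketch of the nearest-neighbor prediction described above; the ratings matrix is made up, and, following the slide, deviations are taken from the ambivalent midpoint of 4 rather than each user's mean:

```python
def pearson_about_4(u, v):
    """Correlation of two rating dicts, with deviations taken from the
    ambivalent midpoint 4 rather than each user's mean (per the slide)."""
    common = sorted(set(u) & set(v))
    if not common:
        return 0.0
    du = [u[a] - 4 for a in common]
    dv = [v[a] - 4 for a in common]
    num = sum(x * y for x, y in zip(du, dv))
    den = (sum(x * x for x in du) * sum(y * y for y in dv)) ** 0.5
    return num / den if den else 0.0

def predict(target, others, artist):
    """Correlation-weighted average of the neighbors' rating deviations."""
    pairs = [(pearson_about_4(target, o), o[artist]) for o in others if artist in o]
    norm = sum(abs(w) for w, _ in pairs)
    return 4 + sum(w * (r - 4) for w, r in pairs) / norm if norm else 4.0

# Made-up ratings on the 1-7 scale
me = {"beatles": 7, "abba": 2, "dylan": 6}
others = [
    {"beatles": 6, "abba": 1, "dylan": 7, "prince": 6},
    {"beatles": 2, "abba": 7, "dylan": 1, "prince": 3},
]
print(predict(me, others, "prince"))  # about 5.5: the similar user pulls it up
```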

SLIDE 63 IS 202 – FALL 2003 Social Filtering
Ignores the content, only looks at who judges things similarly
Works well on data relating to "taste"
– Something that people are good at predicting about each other, too
Does it work for topic?
– GroupLens results suggest otherwise (preliminary)
– Perhaps for quality assessments
– What about for assessing whether a document is about a topic?

SLIDE 64 Learning Interface Agents
Add agents to the UI and delegate tasks to them
Use machine learning to improve performance
– Learn user behavior and preferences
Useful when:
– 1) Past behavior is a useful predictor of the future
– 2) There is a wide variety of behaviors amongst users
Examples:
– Mail clerk: sort incoming messages into the right mailboxes
– Calendar manager: automatically schedule meeting times?

SLIDE 65 IS 202 – FALL 2003 Summary
Relevance feedback is an effective means for user-directed query modification
Modification can be done with either direct or indirect user input
Modification can be done based on an individual's or a group's past input

SLIDE 66 IS 202 – FALL 2003 Next Time
Information Retrieval Evaluation & more on collaborative filtering
Readings:
– An Evaluation of Retrieval Effectiveness (Blair & Maron); Carolyn
– Rave Reviews: Acquiring Relevance Assessments from Multiple Users (Belew); Megan
– A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness (Koenemann & Belkin); Margaret Spring
– GroupLens: Applying Collaborative Filtering to Usenet News (Konstan et al.); Jeff
– Social Information Filtering: Algorithms for Automating "Word of Mouth" (Shardanand & Maes); Rebecca