Amanda Spink : Analysis of Web Searching and Retrieval Larry Reeve INFO861 - Topics in Information Science Dr. McCain - Winter 2004.


2 Background Amanda Spink  Self-described areas of work: Information Retrieval Web Retrieval Human Information Behavior / Information Seeking Medical Informatics  Ph.D. 1993 – Rutgers University Thesis - Feedback in Information Retrieval Studied under Tefko Saracevic

3 Background Amanda Spink  Over 140 papers published  5th in journal article production, 18th in citation production among U.S. IS faculty  Institute for Information Science – most highly cited paper in Web Retrieval:  Real Life, Real Users, Real Needs: A Study and Analysis of User Queries on the Web (2000)

4 Background Amanda Spink  Associate Professor at University of Pittsburgh  School of Information Sciences  Prior faculty positions Pennsylvania State University  School of Information Science & Technology  Web Research Group University of North Texas  School of Library and Information Sciences

5 Background Tefko Saracevic  Associate Dean School of Communication, Information and Library Studies, Rutgers University  Related research Test and Evaluation of IR systems Relevance in Information Science Analysis of web queries

6 Web Searching and Retrieval Analyze user queries  Important for building future IR systems on Web  Focus on search terms Failure analysis in query construction Term Relevance Feedback (TRF) Topics / Classification Use of language

7 Studies Conducted U.S. – Excite (www.excite.com)  “51K study”: 51,473 queries, 18,113 users, March 9, 1997  “1M study”: 1,025,910 queries, 211,063 users, September 16, 1997

8 Studies Conducted European – AllTheWeb.com  1 million queries  200,000 users  Logs from two days: February 6, 2001 and May 28, 2002  Most users from Norway and Germany

9 Studies Conducted Issues with Web transaction logs  Where does a session start and end? Temporal boundary: Spink found 15 min avg; others found 5 min, 12 min, 32 min, and 2 hours. Numerical boundary: 100 entries  How to eliminate non-individual users: meta-search engines, other agents  No insight into user’s process
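The temporal-boundary question on this slide can be illustrated with a minimal sketch. It assumes each log record is a (user_id, timestamp) pair and that a session ends after a fixed period of inactivity. The 15-minute default echoes the average reported on the slide, but treating it as an inactivity cutoff is an assumption made here for illustration, as are the function name, the record layout, and the sample data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_sessions(records, gap=timedelta(minutes=15)):
    """Group (user_id, timestamp) records into per-user sessions.

    A new session starts whenever the time since the user's previous
    query exceeds `gap` (an assumed inactivity cutoff, not a value
    prescribed by the studies).
    """
    by_user = defaultdict(list)
    for user_id, ts in records:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap:
                user_sessions.append([cur])      # gap too long: start a new session
            else:
                user_sessions[-1].append(cur)    # same session continues
        sessions[user_id] = user_sessions
    return sessions

# Made-up sample log, for illustration only.
log = [
    ("u1", datetime(1997, 9, 16, 10, 0)),
    ("u1", datetime(1997, 9, 16, 10, 5)),
    ("u1", datetime(1997, 9, 16, 11, 0)),
    ("u2", datetime(1997, 9, 16, 10, 2)),
]
result = split_sessions(log)
print(len(result["u1"]), len(result["u2"]))  # prints "2 1"
```

Changing `gap` to 5 minutes or 2 hours shows how strongly the session count depends on the chosen boundary, which is the difficulty the slide points at.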

10 Findings Relevance Feedback Advanced Search Techniques Term Characteristics Query Classification American vs. European

11 Findings: Relevance Feedback Term Relevance Feedback (TRF) rarely used  51K study: 1,597 queries from 823 users (<5% of queries); those using TRF had longer sessions; successful 60% of time  Implications: failure rate of 40% may be too high; IR designers could automatically perform TRF
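As a quick check of the "<5%" figure, the TRF counts can be set against the 51K study totals given on the Studies Conducted slide (51,473 queries, 18,113 users). This is only arithmetic on the numbers in the transcript; the variable names are illustrative.

```python
# Totals for the 51K study and the TRF subset, as given in the slides.
total_queries = 51_473
total_users = 18_113
trf_queries = 1_597
trf_users = 823

query_share = trf_queries / total_queries
user_share = trf_users / total_users

print(f"TRF share of queries: {query_share:.1%}")  # prints "TRF share of queries: 3.1%"
print(f"TRF share of users:   {user_share:.1%}")   # prints "TRF share of users:   4.5%"
```

Both shares fall under 5%, consistent with the slide.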

12 Findings: Relevance Feedback Mediated searching  11% of search terms come from TRF  37% from users, 63% from mediators  2/3 of TRF contributed positively

13 Findings: Relevance Feedback Identified 6 session states  Initial Query, Modified Query, Next Page,  New Query, Relevance Feedback, Prev Query Identified 4 session patterns  Using the 6 session states Implication: IR designers should accommodate these states and patterns

14 Findings: Relevance Feedback Relevance Feedback Session Patterns

15 Findings: Advanced Search Techniques Includes:  Boolean operators  Modifiers +, -  Quotes (phrases) Not often used by Web users, but used more in mediated search  Boolean <10%, Modifiers 9%, Phrases 6% Used incorrectly  Boolean: AND 50%, OR 28%, AND NOT 19%  Modifiers: 75% of time  Phrases: 8% Users and advanced techniques do not get along!

16 Findings: Advanced Search Techniques Boolean, most common problems:  Not capitalizing AND  Confusing ‘AND’ operator with ‘and’ conjunction, e.g. Science and Technology vs. Science AND Technology Modifiers, most common problems:  Prefix rather than mathematical infix: +news +weather rather than news+weather  No space required, as is required with Boolean
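The two problem types listed on this slide can be sketched as a simple query check. This is a hypothetical helper, not something from the studies: it assumes operators must be written in capitals and that the + modifier must be attached to the front of a term, as the slide describes. It cannot tell whether a lowercase "and" was meant as an operator or as an ordinary conjunction, which is exactly the ambiguity noted above, so it only flags the query for review.

```python
import re

def flag_query(query):
    """Return a list of likely syntax problems in a raw query string.

    Hypothetical checker based on the two problem types on the slide.
    """
    problems = []
    tokens = query.split()
    # A lowercase and/or sitting between two other terms may be an
    # operator the user forgot to capitalize.
    for i, tok in enumerate(tokens):
        is_bool_word = tok.lower() in {"and", "or"}
        if is_bool_word and tok != tok.upper() and 0 < i < len(tokens) - 1:
            problems.append(f"uncapitalized operator: {tok!r}")
    # A '+' glued between two word characters looks like arithmetic-style
    # use (news+weather) instead of the required prefix form (+news +weather).
    if re.search(r"\w\+\w", query):
        problems.append("'+' written between terms")
    return problems

print(flag_query("science and technology"))  # ["uncapitalized operator: 'and'"]
print(flag_query("science AND technology"))  # []
print(flag_query("news+weather"))            # ["'+' written between terms"]
print(flag_query("+news +weather"))          # []
```

The '-' modifier is deliberately not checked in the same way, since a hyphen between word characters is common in ordinary terms.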

17 Findings: Term Characteristics Terms per query  1: 26.6%, 2: 31.5%, 3: 18.2%, >7: 1.8%  Mediated searching: 7-15 terms Distribution of terms not quite Zipf:  Top terms account for 10% of all terms  Single-use terms account for 9% of all terms  Not understood why this occurs
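The two measurements on this slide, terms per query and the share of term occurrences taken by the most frequent and the single-use terms, can be computed along the following lines. This is a minimal sketch assuming whitespace tokenization and case-insensitive terms. The slide does not say how many terms count as "top terms", so `top_n` is an assumed parameter, and the single-use share is computed here against all term occurrences, which is one possible reading. The sample queries are made up.

```python
from collections import Counter

def query_length_distribution(queries):
    """Fraction of queries having 1, 2, 3, ... whitespace-separated terms."""
    lengths = Counter(len(q.split()) for q in queries)
    total = sum(lengths.values())
    return {n: count / total for n, count in sorted(lengths.items())}

def term_shares(queries, top_n=100):
    """Share of all term occurrences taken by the top_n most frequent
    terms and by terms that occur exactly once."""
    terms = Counter(t.lower() for q in queries for t in q.split())
    total = sum(terms.values())
    top = sum(count for _, count in terms.most_common(top_n))
    singles = sum(1 for count in terms.values() if count == 1)
    return top / total, singles / total

# Made-up sample queries, for illustration only.
sample = ["news", "weather news", "free music downloads", "news"]
print(query_length_distribution(sample))  # {1: 0.5, 2: 0.25, 3: 0.25}
top_share, single_share = term_shares(sample, top_n=1)
```

On a real log, comparing the sorted term counts against a 1/rank curve would show where the distribution departs from Zipf at the head and the tail.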

18 Findings: Query Classification Classification of queries based on Rutgers’ Web Classification

19 Findings: Query Classification What users are looking for is not what is on Web:  Distribution of content: 83% Commercial, 6% Educational, 3% Health  Example: 10% of searches are for Health Searchers find classifications understandable  IR system presentation design

20 Findings: American & European Searching Commonalities:  Three or fewer terms American: 80%, European 85%  Predominantly use English terms  Relevance judgments: less than 15 minutes viewing retrieved documents  Information seeking sessions short

21 Findings: American & European Searching Differences  Categories American: Entertainment, Sex, Commerce European: People-places-things, Computers, Commerce  American searchers spent more time searching e-commerce sites than European counterparts Did not examine:  Use of advanced techniques  Relevance feedback  First in initial set of studies?

22 Findings: Summary Number of query terms is about 2 TRF is not used often Boolean operators and modifiers not used often – difficulty in using them correctly Users do not spend much time making relevancy judgments Term frequency distribution is a few terms used often, many terms used only once

23 Findings: Summary Most users had single query only and did not follow up with successive queries Average viewing of 2 pages 50% did not access beyond first page; more than 75% did not go beyond 2 pages

24 Implications / Further Research Improve use of advanced search techniques UI changes, Venn Diagrams Improve use of relevance feedback Automatic generation of TRF results Improve classification of results UI changes, result overview Improve understanding of language use Adapt IR designs to language Examine cultural differences TRF, advanced search techniques (same or different)

25 Amanda Spink - Web Searching and Retrieval Questions
