Amanda Spink: Analysis of Web Searching and Retrieval
Larry Reeve
INFO861 - Topics in Information Science
Dr. McCain - Winter 2004
2. Background: Amanda Spink
- Self-described areas of work:
  - Information Retrieval
  - Web Retrieval
  - Human Information Behavior / Information Seeking
  - Medical Informatics
- Ph.D. 1993, Rutgers University
  - Thesis: Feedback in Information Retrieval
  - Studied under Tefko Saracevic
3. Background: Amanda Spink
- Over 140 papers published
- 5th in journal article production and 18th in citation production among U.S. IS faculty
- Institute for Scientific Information (ISI): most highly cited paper in Web Retrieval
  - Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web (2000)
4. Background: Amanda Spink
- Associate Professor at University of Pittsburgh, School of Information Sciences
- Prior faculty positions:
  - Pennsylvania State University, School of Information Science & Technology, Web Research Group
  - University of North Texas, School of Library and Information Sciences
5. Background: Tefko Saracevic
- Associate Dean, School of Communication, Information and Library Studies, Rutgers University
- Related research:
  - Test and evaluation of IR systems
  - Relevance in Information Science
  - Analysis of web queries
6. Web Searching and Retrieval
- Analyze user queries
  - Important for building future IR systems on the Web
- Focus on search terms
  - Failure analysis in query construction
  - Term Relevance Feedback (TRF)
  - Topics / classification
  - Use of language
7. Studies Conducted: U.S., Excite (www.excite.com)
- “51K study”: 51,473 queries from 18,113 users, March 9, 1997
- “1M study”: 1,025,910 queries from 211,063 users, September 16, 1997
8. Studies Conducted: European, AllTheWeb.com
- 1 million queries
- 200,000 users
- Logs from two days: February 6, 2001 and May 28, 2002
- Most users from Norway and Germany
9. Studies Conducted: Issues with Web Transaction Logs
- Where does a session start and end?
  - Temporal boundary: Spink found a 15-minute average; others found 5 minutes, 12 minutes, 32 minutes, and 2 hours
  - Numerical boundary: 100 entries
- How to eliminate non-individual users?
  - Meta-search engines, other agents
- No insight into the user’s process
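The session-boundary question can be made concrete with a small sketch. It is not the procedure used in the studies: it assumes that a gap of more than 15 minutes between two entries starts a new session, and that a session with more than 100 entries comes from a non-individual user and is dropped. The function name `split_sessions` and both parameters are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, gap=timedelta(minutes=15), max_entries=100):
    """Group one user's log timestamps into sessions.

    Assumptions (not from the studies): a new session starts when the
    time since the previous entry exceeds `gap`, and sessions longer
    than `max_entries` are treated as agents and discarded.
    """
    sessions = []
    current = []
    for ts in sorted(timestamps):
        # Close the running session when the inactivity gap is exceeded.
        if current and ts - current[-1] > gap:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    # Apply the numerical boundary as a filter on session size.
    return [s for s in sessions if len(s) <= max_entries]


if __name__ == "__main__":
    start = datetime(1997, 9, 16, 12, 0)
    log = [start, start + timedelta(minutes=5), start + timedelta(minutes=40)]
    for session in split_sessions(log):
        print(len(session), "entries starting at", session[0])
```

Changing `gap` to any of the other reported values (5 minutes, 32 minutes, 2 hours) changes how many sessions the same log yields, which is why the choice of boundary matters for every per-session statistic.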
10. Findings
- Relevance Feedback
- Advanced Search Techniques
- Term Characteristics
- Query Classification
- American vs. European
11. Findings: Relevance Feedback
- Term Relevance Feedback (TRF) rarely used
  - 51K study: 1,597 queries from 823 users (<5% of queries)
  - Those using TRF had longer sessions
  - Successful 60% of the time
- Implications:
  - Failure rate of 40% may be too high
  - IR designers could automatically perform TRF
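As a quick check of the "<5%" figure, the shares can be computed from the 51K study totals reported earlier in the deck (51,473 queries, 18,113 users). The variable names are illustrative only.

```python
# Figures for the 51K study as reported on the slides.
total_queries = 51_473
total_users = 18_113
trf_queries = 1_597
trf_users = 823

query_share = trf_queries / total_queries
user_share = trf_users / total_users

print(f"TRF share of queries: {query_share:.1%}")
print(f"TRF share of users:   {user_share:.1%}")
```

Both shares come out below 5%: roughly 3.1% of queries and 4.5% of users.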
12. Findings: Relevance Feedback
- Mediated searching
  - 11% of search terms come from TRF
  - 37% from users, 63% from mediators
  - 2/3 of TRF contributed positively
13. Findings: Relevance Feedback
- Identified 6 session states
  - Initial Query, Modified Query, Next Page, New Query, Relevance Feedback, Previous Query
- Identified 4 session patterns
  - Using the 6 session states
- Implication: IR designers should accommodate these states and patterns
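One way to picture the six states is to label each entry of a single user's session by comparing it with what came before. The slides do not say how the states were detected, so every rule below is an assumption made for illustration: an empty entry stands for a relevance feedback request, a repeat of the immediately preceding query stands for a next-page request, a repeat of an older query is a previous query, and term overlap separates a modified query from a new one. `State` and `label_states` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INITIAL_QUERY = "Initial Query"
    MODIFIED_QUERY = "Modified Query"
    NEXT_PAGE = "Next Page"
    NEW_QUERY = "New Query"
    RELEVANCE_FEEDBACK = "Relevance Feedback"
    PREVIOUS_QUERY = "Previous Query"


def label_states(queries):
    """Label each log entry of one session with a session state.

    All detection rules are illustrative assumptions, not the
    definitions used in the studies.
    """
    states = []
    seen = []    # distinct queries so far, as sets of terms
    last = None  # terms of the most recent non-empty query
    for raw in queries:
        terms = frozenset(raw.lower().split())
        if not terms:
            # Assumption: an empty entry marks a feedback request.
            states.append(State.RELEVANCE_FEEDBACK)
            continue
        if last is None:
            state = State.INITIAL_QUERY
        elif terms == last:
            state = State.NEXT_PAGE
        elif terms in seen:
            state = State.PREVIOUS_QUERY
        elif terms & last:
            state = State.MODIFIED_QUERY
        else:
            state = State.NEW_QUERY
        states.append(state)
        if terms not in seen:
            seen.append(terms)
        last = terms
    return states


if __name__ == "__main__":
    session = ["jazz clubs", "jazz clubs", "jazz clubs chicago", "", "weather", "jazz clubs"]
    for entry, state in zip(session, label_states(session)):
        print(repr(entry), "->", state.value)
```

A sequence of such labels is the raw material from which session patterns could then be counted.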
14. Findings: Relevance Feedback
[Figure: Relevance Feedback Session Patterns]
15. Findings: Advanced Search Techniques
- Includes:
  - Boolean operators
  - Modifiers (+, -)
  - Quotes (phrases)
- Not often used by Web users, but used more in mediated search
  - Boolean <10%, modifiers 9%, phrases 6%
- Used incorrectly
  - Boolean: AND 50%, OR 28%, AND NOT 19%
  - Modifiers: 75% of the time
  - Phrases: 8%
- Users and advanced techniques do not get along!
16. Findings: Advanced Search Techniques
- Boolean, most common problems:
  - Not capitalizing AND
  - Confusing the ‘AND’ operator with the ‘and’ conjunction
    - e.g. Science and Technology vs. Science AND Technology
- Modifiers, most common problems:
  - Modifiers are prefix operators, not mathematical infix: +news +weather rather than news+weather
  - No space is required between a modifier and its term, whereas Boolean operators require surrounding spaces
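The problems listed above are regular enough that a search interface could flag them before running the query. The following is a minimal sketch of such a check, not a description of any engine's actual parser; the function name `check_query` and the warning labels are hypothetical.

```python
import re

BOOLEAN_WORDS = {"and", "or", "not"}


def check_query(query):
    """Return warning labels for likely operator mistakes in a query."""
    warnings = []
    tokens = query.split()
    for i, tok in enumerate(tokens):
        # A lowercase and/or/not between two terms may be an
        # uncapitalized operator or just a conjunction.
        if tok in BOOLEAN_WORDS and 0 < i < len(tokens) - 1:
            warnings.append("lowercase-operator")
        # A bare + or - has been separated from its term by a space.
        if tok in ("+", "-"):
            warnings.append("detached-modifier")
    # A plus sign between two word characters is infix use (news+weather).
    if re.search(r"\w\+\w", query):
        warnings.append("infix-plus")
    return warnings


if __name__ == "__main__":
    for q in ["Science and Technology", "Science AND Technology",
              "news+weather", "+news +weather", "+ news"]:
        print(repr(q), "->", check_query(q))
```

Only the plus sign is checked for infix use, because a hyphen between two words is usually part of an ordinary term such as "e-commerce".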
17. Findings: Term Characteristics
- Terms per query: 1 term 26.6%, 2 terms 31.5%, 3 terms 18.2%, more than 7 terms 1.8%
  - Mediated searching: 7-15 terms
- Distribution of terms not quite Zipf:
  - Top terms account for 10% of all terms
  - Single-use terms account for 9% of all terms
  - Not understood why this occurs
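The statistics on this slide can be computed from a list of raw queries along the following lines. This is a sketch under stated assumptions: terms are split on whitespace and lowercased, and `top_k` (how many terms count as "top terms") is a hypothetical parameter, since the slides do not say how many were used.

```python
from collections import Counter


def term_statistics(queries, top_k=10):
    """Compute terms-per-query shares and term-frequency shares.

    Returns (length_share, top_share, single_share):
      length_share: {number of terms: fraction of queries}
      top_share:    fraction of all term occurrences taken by the
                    top_k most frequent terms
      single_share: fraction of all term occurrences that belong to
                    terms used exactly once
    """
    term_lists = [q.lower().split() for q in queries]
    length_counts = Counter(len(terms) for terms in term_lists)
    term_counts = Counter(t for terms in term_lists for t in terms)
    total_terms = sum(term_counts.values())
    if not term_lists or total_terms == 0:
        return {}, 0.0, 0.0
    length_share = {n: c / len(term_lists) for n, c in length_counts.items()}
    top_share = sum(c for _, c in term_counts.most_common(top_k)) / total_terms
    single_share = sum(1 for c in term_counts.values() if c == 1) / total_terms
    return length_share, top_share, single_share


if __name__ == "__main__":
    sample = ["cheap flights", "flights", "cheap hotels paris"]
    lengths, top, single = term_statistics(sample, top_k=1)
    print(lengths, top, single)
```

Comparing `top_share` and `single_share` against what a Zipf curve would predict for the same vocabulary size is one way to see the "not quite Zipf" observation in a given log.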
18. Findings: Query Classification
- Classification of queries based on Rutgers’ Web Classification
19. Findings: Query Classification
- What users are looking for is not what is on the Web:
  - Distribution of content: 83% commercial, 6% educational, 3% health
  - Example: 10% of searches are for health
- Searchers find classifications understandable
  - IR system presentation design
20. Findings: American & European Searching
- Commonalities:
  - Three or fewer terms: American 80%, European 85%
  - Predominantly use English terms
  - Relevance judgments: less than 15 minutes viewing retrieved documents
  - Information-seeking sessions short
21. Findings: American & European Searching
- Differences:
  - Categories
    - American: Entertainment, Sex, Commerce
    - European: People-places-things, Computers, Commerce
  - American searchers spent more time searching e-commerce sites than their European counterparts
- Did not examine:
  - Use of advanced techniques
  - Relevance feedback
- First in initial set of studies?
22. Findings: Summary
- Number of query terms is about 2
- TRF is not used often
- Boolean operators and modifiers are not used often, and users have difficulty using them correctly
- Users do not spend much time making relevance judgments
- Term frequency distribution: a few terms used often, many terms used only once
23. Findings: Summary
- Most users had a single query only and did not follow up with successive queries
- Average viewing of 2 pages
  - 50% did not access beyond the first page; more than 75% did not go beyond 2 pages
24. Implications / Further Research
- Improve use of advanced search techniques
  - UI changes, Venn diagrams
- Improve use of relevance feedback
  - Automatic generation of TRF results
- Improve classification of results
  - UI changes, result overview
- Improve understanding of language use
  - Adapt IR designs to language
- Examine cultural differences
  - TRF, advanced search techniques (same or different)
25. Amanda Spink: Web Searching and Retrieval
- Questions