Amanda Spink: Analysis of Web Searching and Retrieval. Larry Reeve, INFO861 - Topics in Information Science, Dr. McCain - Winter 2004.



2 Background Amanda Spink  Self-described areas of work: Information Retrieval, Web Retrieval, Human Information Behavior / Information Seeking, Medical Informatics  Ph.D. – Rutgers University; Thesis: Feedback in Information Retrieval; studied under Tefko Saracevic

3 Background Amanda Spink  Over 140 papers published; 5th in journal article production and 18th in citation production among U.S. IS faculty; Institute for Information Science – most highly cited paper in Web Retrieval:  Real Life, Real Users, Real Needs: A Study and Analysis of User Queries on the Web (2000)

4 Background Amanda Spink  Associate Professor at University of Pittsburgh  School of Information Sciences  Prior faculty positions: Pennsylvania State University  School of Information Science & Technology  Web Research Group; University of North Texas  School of Library and Information Sciences

5 Background Tefko Saracevic  Associate Dean, School of Communication, Information and Library Studies, Rutgers University  Related research: Test and Evaluation of IR systems; Relevance in Information Science; Analysis of web queries

6 Web Searching and Retrieval Analyze user queries  Important for building future IR systems on Web  Focus on search terms Failure analysis in query construction Term Relevance Feedback (TRF) Topics / Classification Use of language

7 Studies Conducted U.S. – Excite  “51K study”: 51,473 queries, 18,113 users, March 9, 1997  “1M study”: 1,025,910 queries, 211,063 users, September 16, 1997

8 Studies Conducted European - AllTheWeb.com  1 million queries  200,000 users  Logs from two days: February 6, 2001 and May 28, 2002  Most users from Norway and Germany

9 Studies Conducted Issues with Web transaction logs  Where does a session start and end? Temporal boundary – Spink found 15 mins avg;  others found 5 mins, 12 mins, 32 mins, and 2 hours. Numerical boundary – 100 entries  How to eliminate non-individual users: meta-search engines, other agents  No insight into user’s process
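The temporal-boundary question on this slide can be made concrete with a small sketch. It assumes a hypothetical log of (user_id, timestamp, query) records and treats an idle gap longer than a cutoff as the start of a new session. The 15-minute default only echoes the figure quoted on the slide; it is not a reconstruction of the method used in the studies, and the record layout and function name are illustrative.

```python
from datetime import datetime, timedelta


def split_sessions(records, gap=timedelta(minutes=15)):
    """Group (user_id, timestamp, query) records into per-user sessions.

    A new session starts when a user has been idle for longer than `gap`.
    The record layout and the cutoff are assumptions made for illustration.
    """
    sessions = {}   # user_id -> list of sessions, each a list of queries
    last_seen = {}  # user_id -> timestamp of that user's previous query
    for user_id, ts, query in sorted(records, key=lambda r: (r[0], r[1])):
        user_sessions = sessions.setdefault(user_id, [])
        if user_id not in last_seen or ts - last_seen[user_id] > gap:
            user_sessions.append([])  # idle too long: open a new session
        user_sessions[-1].append(query)
        last_seen[user_id] = ts
    return sessions


# Made-up sample log, not data from the Excite or AllTheWeb studies.
log = [
    ("u1", datetime(1997, 3, 9, 10, 0), "weather"),
    ("u1", datetime(1997, 3, 9, 10, 5), "weather boston"),
    ("u1", datetime(1997, 3, 9, 11, 0), "news"),
    ("u2", datetime(1997, 3, 9, 10, 1), "maps"),
]
sessions = split_sessions(log)
```

Changing `gap` to 5 minutes, 32 minutes, or 2 hours shows how strongly the session count depends on the chosen boundary, which is the difficulty the slide points to.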

10 Findings Relevance Feedback Advanced Search Techniques Term Characteristics Query Classification American vs. European

11 Findings: Relevance Feedback Term Relevance Feedback (TRF) rarely used  51K study: 1,597 queries from 823 users (<5% of queries); those using TRF had longer sessions; successful 60% of time. Implications:  Failure rate of 40% may be too high  IR designers could automatically perform TRF
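As a quick check of the "<5%" figure, the TRF counts on this slide can be set against the 51K study totals given on slide 7 (51,473 queries, 18,113 users). This is plain arithmetic on the slide numbers; the variable names are illustrative.

```python
# Totals for the "51K study" (slide 7) and TRF counts (slide 11).
total_queries = 51_473
total_users = 18_113
trf_queries = 1_597
trf_users = 823

trf_query_share = trf_queries / total_queries  # share of queries using TRF
trf_user_share = trf_users / total_users       # share of users using TRF

print(f"TRF queries: {trf_query_share:.1%}")  # prints "TRF queries: 3.1%"
print(f"TRF users:   {trf_user_share:.1%}")   # prints "TRF users:   4.5%"
```

Both shares fall under 5%, consistent with the slide.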

12 Findings: Relevance Feedback Mediated searching  11% of search terms come from TRF  37% from users, 63% from mediators  2/3 of TRF contributed positively

13 Findings: Relevance Feedback Identified 6 session states  Initial Query, Modified Query, Next Page,  New Query, Relevance Feedback, Prev Query Identified 4 session patterns  Using the 6 session states Implication: IR designers should accommodate these states and patterns

14 Findings: Relevance Feedback Relevance Feedback Session Patterns

15 Findings: Advanced Search Techniques Includes:  Boolean operators  Modifiers +, -  Quotes (phrases). Not often used by Web users, but used more by mediated search:  Boolean <10%, modifiers 9%, phrases 6%. Used incorrectly:  Boolean: AND 50%, OR 28%, AND NOT 19%  Modifiers: 75% of time  Phrases: 8%. Users and advanced techniques do not get along!

16 Findings: Advanced Search Techniques Boolean, most common problems:  Not capitalizing AND  Confusing ‘AND’ operator with ‘and’ conjunction, e.g. “Science and Technology” vs. “Science AND Technology”. Modifiers, most common problems:  Modifiers are prefix rather than mathematical infix: +news +weather rather than news+weather  No space is used after a modifier, whereas Boolean operators require spaces
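The two mistake types on this slide are mechanical enough to detect automatically. The sketch below is a hypothetical query checker, not part of the studies: it flags lowercase Boolean words between other terms and a "+" written between two terms. The function name, the warning wording, and the rule set are assumptions.

```python
import re

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def lint_query(query):
    """Return warnings for the common operator mistakes listed on the slide."""
    warnings = []
    tokens = query.split()
    # Lowercase and/or/not between other terms may be meant as an operator.
    for token in tokens[1:-1]:
        if token.upper() in BOOLEAN_OPERATORS and token != token.upper():
            warnings.append(
                f"'{token}' is lowercase; write '{token.upper()}' "
                "if a Boolean operator is intended"
            )
    # A '+' with word characters on both sides is infix, not a prefix modifier.
    for token in tokens:
        if re.search(r"\w\+\w", token):
            fixed = " ".join("+" + part for part in token.split("+") if part)
            warnings.append(f"'{token}' uses infix '+'; try '{fixed}'")
    return warnings


for q in ["Science and Technology", "Science AND Technology",
          "news+weather", "+news +weather"]:
    print(q, "->", lint_query(q))
```

Only "+" is checked for infix use, because a "-" inside a token is usually an ordinary hyphen.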

17 Findings: Term Characteristics Terms per query  1: 26.6%, 2: 31.5%, 3: 18.2%, >7: 1.8%  Mediated searching: 7-15 terms Distribution of terms not quite Zipf:  Top terms account for 10% of all terms  Single-use terms account for 9% of all terms  Not understood why this occurs
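The term statistics on this slide come from simple counts over a query log. A minimal sketch, assuming whitespace-separated terms, case-insensitive matching, and a made-up list of queries, shows how the terms-per-query distribution, the rank-frequency list used for a Zipf comparison, and the single-use share could be computed. The function names are illustrative.

```python
from collections import Counter


def terms_per_query_distribution(queries):
    """Map query length (in terms) to the fraction of queries of that length."""
    lengths = Counter(len(q.split()) for q in queries)
    total = sum(lengths.values())
    if not total:
        return {}
    return {n: count / total for n, count in sorted(lengths.items())}


def term_counts(queries):
    """Count term occurrences across all queries, ignoring case."""
    return Counter(term.lower() for q in queries for term in q.split())


def rank_frequency(queries):
    """Return (term, count) pairs from most to least frequent."""
    return term_counts(queries).most_common()


def single_use_share(queries):
    """Fraction of all term occurrences that belong to terms used only once."""
    counts = term_counts(queries)
    total = sum(counts.values())
    if not total:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / total


# Made-up sample queries, not data from the studies.
queries = ["weather", "weather boston", "free music download", "music"]
distribution = terms_per_query_distribution(queries)
ranked = rank_frequency(queries)
share = single_use_share(queries)
```

Plotting rank against count on log-log axes is the usual way to see whether such a list follows a Zipf-like straight line.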

18 Findings: Query Classification Classification of queries based on Rutgers’ Web Classification

19 Findings: Query Classification What users are looking for is not what is on Web:  Distribution of content: 83% Commercial, 6% Educational, 3% Health  Example: 10% of searches are for Health Searchers find classifications understandable  IR system presentation design

20 Findings: American & European Searching Commonalities:  Three or fewer terms: American 80%, European 85%  Predominantly use English terms  Relevance judgments: less than 15 minutes viewing retrieved documents  Information seeking sessions short

21 Findings: American & European Searching Differences  Categories: American: Entertainment, Sex, Commerce; European: People-places-things, Computers, Commerce  American searchers spent more time searching e-commerce sites than European counterparts. Did not examine:  Use of advanced techniques  Relevance feedback  First in initial set of studies?

22 Findings: Summary Number of query terms is about 2 TRF is not used often Boolean operators and modifiers not used often – difficulty in using them correctly Users do not spend much time making relevancy judgments Term frequency distribution is a few terms used often, many terms used only once

23 Findings: Summary Most users had single query only and did not follow up with successive queries Average viewing of 2 pages 50% did not access beyond first page; more than 75% did not go beyond 2 pages

24 Implications / Further Research Improve use of advanced search techniques: UI changes, Venn diagrams. Improve use of relevance feedback: automatic generation of TRF results. Improve classification of results: UI changes, result overview. Improve understanding of language use: adapt IR designs to language. Examine cultural differences: TRF, advanced search techniques (same or different)

25 Amanda Spink - Web Searching and Retrieval Questions