© Tefko Saracevic, Rutgers University. Vox populi: the public searching of the Web: A longitudinal study of large samples of Excite queries. Dietmar Wolfram, Amanda Spink, Major Bernard J. Jansen, Tefko Saracevic.

Presentation transcript:

1. Vox populi: the public searching of the Web
A longitudinal study of large samples of Excite queries
Dietmar Wolfram (U. of Wisconsin - Milwaukee)
Amanda Spink (Penn State U.)
Major Bernard J. Jansen (U.S. Army)
Tefko Saracevic (Rutgers U.)

2. A major Internet media company
Search capabilities:
–Up to 10 terms per query; default OR
–Advanced search:
  Boolean AND, OR, AND NOT & parentheses
  "phrase": must appear in answer
  + or - before a term: the term must or must not be in the answer
–More Like This: clickable relevance feedback
–Proprietary algorithms & concept linking method, but follow basic information retrieval
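The search capabilities listed above can be illustrated with a small sketch. This is not Excite's parser, which was proprietary; the names `parse_query`, `ParsedQuery`, and `matches` are hypothetical, matching is done by naive substring test, and Boolean keywords and parentheses are deliberately left out. It only shows the default-OR behavior together with the `+`, `-`, and "phrase" rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch: the real Excite parser and ranking were proprietary.
# One token = optional +/- sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: list = field(default_factory=list)  # +term or "phrase": must appear
    excluded: list = field(default_factory=list)  # -term: must not appear
    optional: list = field(default_factory=list)  # bare terms, combined by default OR


def parse_query(query):
    """Split a query string into required, excluded, and optional terms."""
    parsed = ParsedQuery()
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue  # skip empty quotes such as ""
        if sign == "-":
            parsed.excluded.append(term)
        elif sign == "+" or phrase:
            parsed.required.append(term)
        else:
            parsed.optional.append(term)
    return parsed


def matches(parsed, text):
    """Naive substring matching (an assumption, not how a real engine works)."""
    text = text.lower()
    if any(term in text for term in parsed.excluded):
        return False
    if not all(term in text for term in parsed.required):
        return False
    if parsed.required:
        return True
    # With no required terms, default OR: any one optional term is enough.
    return any(term in text for term in parsed.optional)
```

For example, `parse_query('cheap +flights -hotel "new york"')` puts `flights` and `new york` in the required list, `hotel` in the excluded list, and `cheap` in the optional list.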

3. Samples
Three samples:
–pilot: 51,000 queries by 18,000 users, collected in March 1997 (label: 51K)
–1 million queries by over 200,000 users, collected in September 1997 (label: 1M97)
–1 million queries by over 200,000 users, collected in December 1999 (label: 1M99)

4. Number of queries per user
Sessions (measured by number of queries) are SHORT

5. Terms per query distribution
SHORT QUERIES: some 60% have 1 or 2 terms

6. Use of Boolean operators
Many uses of Boolean operators are wrong: they do not follow the instructions on how to use them

7. Number of pages viewed per user
Most users view VERY FEW pages beyond the first or first two

8. Distribution of terms
TOP: a very small number of distinct terms used with very high frequency
BOTTOM: an unusually high number of distinct terms used with low frequency
Web query vocabulary contains a very large number of distinct terms
–more than in ordinary English texts
–has its own & unique characteristics

9. Term distribution, 51K sample
Top: frequency of 100 or more: 74 terms
–0.34% of all unique terms (of 21,862)
–18% of all terms in all queries (of 113,793)
Bottom: frequency of one: 9,790 terms
–44.8% of all unique terms
–8.6% of all terms in all queries
In frequency of 100 or more (subject terms only):
–63 subject terms: 0.29% of unique terms; 10.3% of all terms
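The shares on this slide follow directly from the counts it gives: 74 / 21,862 ≈ 0.34%, 9,790 / 21,862 ≈ 44.8%, 9,790 / 113,793 ≈ 8.6%, and 63 / 21,862 ≈ 0.29%. A minimal sketch of how such a breakdown could be computed from raw queries is shown below. The function name `term_distribution`, the whitespace tokenization, and the lowercasing are assumptions made for illustration; the slides do not describe the study's actual term-processing rules.

```python
from collections import Counter


def term_distribution(queries, top_threshold=100):
    """Summarize how skewed term usage is across a list of query strings.

    Assumption: terms are lowercase whitespace-separated tokens.
    """
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())   # all term occurrences in all queries
    unique = len(counts)           # distinct terms
    if unique == 0:
        raise ValueError("no terms found in the given queries")

    top_occurrences = [c for c in counts.values() if c >= top_threshold]
    singletons = sum(1 for c in counts.values() if c == 1)

    return {
        "unique_terms": unique,
        "total_terms": total,
        "top_share_of_unique": len(top_occurrences) / unique,
        "top_share_of_total": sum(top_occurrences) / total,
        "singleton_share_of_unique": singletons / unique,
        "singleton_share_of_total": singletons / total,
    }


# Re-deriving the slide's percentages from its own counts.
slide_check = {
    "top_share_of_unique_pct": round(100 * 74 / 21862, 2),
    "singleton_share_of_unique_pct": round(100 * 9790 / 21862, 1),
    "singleton_share_of_total_pct": round(100 * 9790 / 113793, 1),
    "subject_top_share_of_unique_pct": round(100 * 63 / 21862, 2),
}
```

On a real query log, `term_distribution(queries)` with the default threshold of 100 would give the same kind of top and bottom breakdown that the slide reports.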

10. Term distribution, 1M97 sample

11. Top 15 terms (common excluded)

12. Top 10 co-occurring terms (only meaningful ones)

13. Classification of queries - a sample

14. Major findings (across all three samples)
Users: not many queries per search
–2.4 mean
Terms: not many per query
–2.4 mean
–queries in traditional IR are 3 to 7 times longer
Boolean operators not used much
–used in 1 in 10 to 1 in 5 queries
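Summary figures like the ones on this slide could be derived from a query log along the following lines. This is only a sketch under stated assumptions: the log is taken to be a sequence of `(user_id, query)` pairs, every whitespace-separated token counts as a term (operators included), and an operator is detected as an uppercase `AND`, `OR`, or `NOT` token or a token prefixed with `+` or `-`. The names `summarize_log` and `uses_operator` are hypothetical, and the study's own counting rules may have differed.

```python
from collections import Counter

# Assumption: operators must be typed in uppercase to count as Boolean.
BOOLEAN_WORDS = {"AND", "OR", "NOT"}


def uses_operator(query):
    """True if the query contains a Boolean keyword or a +/- prefixed term."""
    return any(
        tok in BOOLEAN_WORDS or (len(tok) > 1 and tok[0] in "+-")
        for tok in query.split()
    )


def summarize_log(log):
    """Compute per-user and per-query means from (user_id, query) pairs."""
    log = list(log)
    if not log:
        raise ValueError("empty query log")
    queries_per_user = Counter(user for user, _ in log)
    n_queries = len(log)
    total_terms = sum(len(query.split()) for _, query in log)
    with_operator = sum(1 for _, query in log if uses_operator(query))
    return {
        "mean_queries_per_user": n_queries / len(queries_per_user),
        "mean_terms_per_query": total_terms / n_queries,
        "operator_share": with_operator / n_queries,
    }
```

Treating a user's whole set of queries as one session, as done here, is a simplification; a real analysis would also need a rule for where one session ends and the next begins.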

15. Major findings...
Users did not view many pages
–mean 1.9 pages - percentage of views falling
–1 in 2 to 1 in 3 users did not go beyond the first page
Relevance feedback (More Like This) not used much
–used in about 1 in 20 queries
Over time searching did NOT change much
–use changed mostly in greater use of advanced features

16. Major findings...
Frequency of use of terms is highly skewed
–highest 1/3 of 1% of terms accounted for 1 in every 10 terms used; terms that were used only once were 1/2 of unique terms
–Web query language quite unique
A lot of searching about sex, but queries in the category Sex still represent a small proportion of all categories
–great many other topics searched
–diversity of subjects searched very high

17. Conclusions
Web searching is still IR, but very different IR
–Web users search in different & simplified ways
Many Web search features need redesign to accommodate the way users use the Web
Web is a marvelous new technology
–but people are unpredictable in their use of any new technology
–how are they really using the Web?

18. Thank you
Gracias Danke Merci Hvala
… until the next installment...