
Vox populi: the public searching of the Web
A longitudinal study of large samples of Excite queries
Dietmar Wolfram (U. of Wisconsin - Milwaukee)
Amanda Spink (Penn State U.)
Major Bernard J. Jansen (U.S. Army)
Tefko Saracevic (Rutgers U.)
© Tefko Saracevic, Rutgers University

Excite@Home
A major Internet media company
Search capabilities:
–Up to 10 terms per query; default OR
–Advanced search:
  Boolean AND, OR, AND NOT & parentheses
  “phrase”: must appear in answer
  + or - before a term: the term must or must not be in the answer
–More Like This: clickable relevance feedback
–Proprietary algorithms & concept linking method, but follow basic information retrieval
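The search features listed above can be made concrete with a small sketch. This is an illustration only, not Excite's parser: the function name `query_features`, the whitespace tokenization, and the assumption that Boolean operators are typed in upper case are all mine. It simply flags which of the listed features appear in a query string.

```python
import re


def query_features(query: str) -> dict:
    """Flag which advanced-search features appear in a query string.

    Hypothetical helper for illustration; not Excite's actual parser.
    Assumes Boolean operators are written in upper case.
    """
    # Pull out quoted phrases first so words inside them are not
    # mistaken for operators.
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]*"', " ", query)

    # Split on whitespace after detaching parentheses from the terms.
    tokens = remainder.replace("(", " ").replace(")", " ").split()

    return {
        "boolean": any(tok in ("AND", "OR", "NOT") for tok in tokens),
        "parentheses": "(" in remainder or ")" in remainder,
        "phrase": bool(phrases),
        # A + or - directly in front of a term (a bare "+" does not count).
        "plus_minus": any(len(tok) > 1 and tok[0] in "+-" for tok in tokens),
    }


if __name__ == "__main__":
    for q in ['"new york" +hotel -cheap',
              "(cats OR dogs) AND NOT fish",
              "cats and dogs"]:
        print(q, query_features(q))
```

Under these assumptions, a query such as `cats and dogs` has no flagged features, because the lower-case `and` is treated as an ordinary term.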

Samples
Three samples:
–pilot: 51,000 queries by 18,000 users, collected in March 1997 (label: 51K)
–1 million queries by over 200,000 users, collected in September 1997 (label: 1M97)
–1 million queries by over 200,000 users, collected in December 1999 (label: 1M99)

Number of queries per user
Sessions (measured in number of queries) are SHORT

Terms per query distribution
SHORT QUERIES: some 60% have 1 or 2 terms

Use of Boolean operators
Many uses of Boolean operators are wrong: they do not follow the instructions for how to use them

Number of pages viewed per user
Most users view VERY FEW pages beyond the first or first two

Distribution of terms
TOP: a very small number of distinct terms used with very high frequency
BOTTOM: an unusually high number of distinct terms used with low frequency
Web query vocabulary contains a very large number of distinct terms
–more than in ordinary English texts
–has its own & unique characteristics
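A rough sketch of how such a term distribution could be tabulated from a list of queries. The queries below are invented for illustration and are not from the Excite samples; splitting on whitespace and lower-casing are assumptions about what counts as a term.

```python
from collections import Counter

# Made-up queries, for illustration only.
queries = [
    "free music",
    "free games",
    "music lyrics",
    "used cars boston",
    "free pictures",
]


def term_distribution(queries):
    """Count term frequencies and summarize the top and bottom of the list."""
    counts = Counter(term for q in queries for term in q.lower().split())
    singletons = [term for term, freq in counts.items() if freq == 1]
    return {
        "total_terms": sum(counts.values()),   # all term occurrences
        "unique_terms": len(counts),           # distinct terms
        "top": counts.most_common(1)[0],       # most frequent (term, count)
        "singleton_share_of_unique": len(singletons) / len(counts),
    }


if __name__ == "__main__":
    print(term_distribution(queries))
```

Even in this toy list the shape is the one described above: one term is used repeatedly, while most distinct terms occur only once.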

Term distribution: 51K sample
Top: frequency of 100 or more: 74 terms
–0.34% of all unique terms (of 21,862)
–18% of all terms in all queries (of 113,793)
Bottom: frequency of one: 9,790 terms
–44.8% of all unique terms
–8.6% of all terms in all queries
In freq. of 100 or more (subject terms only):
–63 subject terms: 0.29% of unique terms; 10.3% of all terms
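The percentages for the 51K sample follow directly from the counts given with them. A quick check using only those counts (the variable names and the helper `pct` are mine):

```python
# Counts reported for the 51K sample.
unique_terms = 21_862       # distinct terms
total_terms = 113_793       # all term occurrences in all queries

top_terms = 74              # terms with a frequency of 100 or more
singletons = 9_790          # terms with a frequency of one
top_subject_terms = 63      # subject terms among the top terms


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"top terms / unique terms:         {pct(top_terms, unique_terms):.2f}%")
    print(f"singletons / unique terms:        {pct(singletons, unique_terms):.2f}%")
    # Each singleton occurs exactly once, so its count is also its
    # number of occurrences among all terms.
    print(f"singletons / all terms:           {pct(singletons, total_terms):.2f}%")
    print(f"top subject terms / unique terms: {pct(top_subject_terms, unique_terms):.2f}%")
```

After rounding, these reproduce the reported 0.34%, 44.8%, 8.6% and 0.29%. The 18% and 10.3% shares cannot be checked this way, because the occurrence totals for the top terms are not given.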

Term distribution: 1M97 sample

Top 15 terms (common excluded)

Top 10 co-occurring terms (only meaningful ones)

Classification of queries - a sample

Major findings (across all three samples)
Users: not many queries per search
–2.4 mean
Terms: not many per query
–2.4 mean
–in traditional IR, queries are 3 to 7 times larger
Boolean operators not used much
–used in from 1 in 10 to 1 in 5 queries
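The two means above are simple to compute once queries are grouped by user. A minimal sketch, assuming a log of (user id, query) pairs and whitespace-separated terms; the log layout, the data, and the function name `session_stats` are invented for illustration and are not taken from the Excite samples.

```python
from collections import defaultdict
from statistics import mean

# Toy log of (user_id, query) pairs; made up for illustration.
log = [
    ("u1", "cheap flights"),
    ("u1", "cheap flights boston"),
    ("u2", "weather"),
    ("u3", "rutgers university library"),
    ("u3", "rutgers"),
    ("u3", "saracevic relevance"),
]


def session_stats(records):
    """Return (mean queries per user, mean terms per query)."""
    queries_by_user = defaultdict(list)
    for user_id, query in records:
        queries_by_user[user_id].append(query)

    mean_queries_per_user = mean(len(qs) for qs in queries_by_user.values())
    # Assumption: a term is any whitespace-delimited string.
    mean_terms_per_query = mean(len(query.split()) for _, query in records)
    return mean_queries_per_user, mean_terms_per_query


if __name__ == "__main__":
    per_user, per_query = session_stats(log)
    print(f"mean queries per user: {per_user}")
    print(f"mean terms per query:  {per_query}")
```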

Major findings...
Users did not view many pages
–mean 1.9 pages; percentage of views falling
–1 in 2 or 1 in 3 of users did not go beyond the first page
Relevance feedback (More Like This) not used much
–used in about 1 in 20 queries
Over time searching did NOT change much
–use changed mostly in greater use of advanced features

Major findings...
Frequency of use of terms is highly skewed
–the highest 1/3 of 1% of terms accounted for 1 in every 10 terms used; terms that were used only once were 1/2 of unique terms
–Web query language is quite unique
A lot of searching about sex, but queries in the category Sex still represent a small proportion of all categories
–a great many other topics searched
–diversity of subjects searched very high

Conclusions
Web searching is still IR, but very different IR
–Web users search in different & simplified ways
Many Web search features need redesign to accommodate the way users use the Web
Web is a marvelous new technology
–but people are unpredictable in their use of any new technology
–how are they really using the Web?

Thank you Gracias Danke Merci Hvala
… until the next installment...

