Published by Lynne Parker. Modified over 9 years ago.
1
Who Uses Web Search for What? And How?
2
Contribution
- Combine behavioral observation with demographic features of users
- Provide important insight into search behavior for:
  - Design decisions
  - Search results
  - Information flows
3
Key Ideas
- Who is searching: analyzing users' demographics
- What they are searching for: analyzing query topics
- How they are searching: analyzing session information
4
User Modeling
- Query topics (what?): On which topics does the user issue queries?
  - Y! Directory classification for topics
- User demographics (who?): What is the demographic profile of the user?
  - A mix of user-provided information (age and gender) and information derived from the user's zip code
- Session characteristics (how?): Does the user have many or few, short or long, and navigational or informational sessions?
  - Session length
  - Number of queries per session
5
Data
- 2008-2009 web search query log of the Yahoo! search engine (2.3 million)
- Log from the U.S. Yahoo! site
- Registered U.S. Yahoo! users with an identifiable cookie
- Only active users:
  1. Keep users who issued at least 100 queries over the sample period
  2. Remove users who issued more than 100,000 queries
  3. Remove users who clicked on fewer than 1/100 of their queries
  4. Remove users who clicked more than 100 times per query
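The four activity filters above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `UserStats` record, its field names, and both function names are hypothetical and not taken from the original study. Rule 3 is read as "fewer than 1% of the user's queries were followed by a click", and rule 4 as "more than 100 clicks per query on average".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStats:
    """Hypothetical per-user aggregate computed from the query log."""
    user_id: str
    num_queries: int       # queries issued over the sample period
    clicked_queries: int   # queries followed by at least one click
    num_clicks: int        # total number of result clicks


def is_active_user(u: UserStats) -> bool:
    """Apply the four filters from the slide (thresholds as stated there)."""
    if u.num_queries < 100:                     # rule 1: too few queries
        return False
    if u.num_queries > 100_000:                 # rule 2: too many queries
        return False
    if u.clicked_queries * 100 < u.num_queries:  # rule 3: clicks on < 1/100 of queries
        return False
    if u.num_clicks > 100 * u.num_queries:      # rule 4: > 100 clicks per query
        return False
    return True


def filter_active_users(users):
    """Return only the users that pass every filter, preserving order."""
    return [u for u in users if is_active_user(u)]


if __name__ == "__main__":
    sample = [
        UserStats("a", num_queries=500, clicked_queries=300, num_clicks=450),
        UserStats("b", num_queries=50, clicked_queries=40, num_clicks=60),
    ]
    print([u.user_id for u in filter_active_users(sample)])
```

Integer comparisons are used for rules 3 and 4 so that users sitting exactly on a threshold are not affected by floating-point rounding.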
6
“Who?” Data
- U.S. 2000 Census data, matched by zip code:
  - Per-capita income
  - Level of education
  - Ethnicity
- 2008 U.S. presidential election results, matched by zip code
7
“Who?” Data
8
“What?” Data
- Topic distribution of the queries issued by each user
- Use the top 10 Yahoo! search results obtained for a given query
- Use a proprietary classifier to classify them into 71 topics
- Classified queries issued at least 30 times: 10 million distinct queries
9
“Who?” Data
10
“How?” Data
- Session length and number of clicks per session
  - Count the number of queries issued within a session interval of 30 minutes
- Queries classified as:
  - Navigational: seek a single website
  - Informational: cover a broad topic
  - Transactional: reflect the user's intent to perform a particular action
- Click entropy: $H(D \mid q) = -\sum_{d} p(d \mid q) \log_2 p(d \mid q)$, where the sum runs over the documents d clicked for query q
  - Computed for queries issued at least 20 times
  - H(D|q) ≤ 1: focused query
  - H(D|q) ≥ 3: diverse query
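To make the 30-minute session interval concrete, here is a minimal sketch of one common way to split a user's query timestamps into sessions. It assumes the interval is an inactivity gap between consecutive queries; the slide does not say whether the original study used a gap or a fixed window, and the function name `split_sessions` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a new session starts after more than 30 minutes of inactivity.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group query timestamps into sessions separated by gaps longer than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # continues the current session
        else:
            sessions.append([ts])     # first query, or gap exceeded
    return sessions


if __name__ == "__main__":
    day = datetime(2009, 1, 15)
    queries = [
        day.replace(hour=10, minute=0),
        day.replace(hour=10, minute=10),
        day.replace(hour=10, minute=50),   # 40 minutes after the previous query
        day.replace(hour=11, minute=0),
    ]
    # Queries per session, one of the "how" features listed on the slide.
    print([len(s) for s in split_sessions(queries)])
```

The number of queries per session then falls out as the length of each group.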
11
“How?” Data
- 370k distinct queries for both focused and diverse queries
- Focused queries: navigational
- Diverse queries: informational
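The click-entropy labelling can be illustrated with a short Python sketch. It assumes that p(d|q) is estimated as the fraction of clicks for query q that land on document d, and that each query comes with a list of clicked URLs and an issue count; the function names and the `None` return for unlabelled queries are hypothetical choices made for this example.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """H(D|q) in bits, with p(d|q) estimated from click counts for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    # Sum of p * log2(1/p) equals -sum of p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def label_query(clicked_urls, times_issued, min_issued=20):
    """Return 'focused', 'diverse', or None using the thresholds from the slides."""
    if times_issued < min_issued or not clicked_urls:
        return None                     # too rare to estimate entropy
    h = click_entropy(clicked_urls)
    if h <= 1:
        return "focused"                # treated as navigational
    if h >= 3:
        return "diverse"                # treated as informational
    return None                         # between thresholds: left unlabelled


if __name__ == "__main__":
    one_site = ["example.com"] * 40
    many_sites = [f"site{i}.example" for i in range(16)] * 2
    print(label_query(one_site, times_issued=40))
    print(label_query(many_sites, times_issued=32))
```

Queries with entropy strictly between 1 and 3 receive neither label here, which matches the slides only in that they define no class for that range.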
12
Method
- Unsupervised k-means clustering
- K ranging from 8 to 20
- Cluster users by their topic distributions
- Use the resulting clusters to induce the “Who” and “How”
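As a rough illustration of clustering users by topic distribution, the sketch below implements plain Lloyd-style k-means in standard-library Python. It is not the authors' implementation: the initialization, stopping rule, toy two-topic vectors, and k=2 are all assumptions made to keep the example small, whereas the slides describe 71 topics and K between 8 and 20.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=50, seed=0):
    """Cluster `points` into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # assumption: random points as initial centroids
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                       # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    # Toy topic distributions over two hypothetical topics (each row sums to 1).
    users = [
        [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
        [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
    ]
    labels, centers = kmeans(users, k=2)
    print(labels)
```

Once users are assigned to clusters, the "who" and "how" profiles can be obtained by averaging demographic and session features within each cluster.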
13
Informational Users
- Use the search engine as a research engine to find information on a wide range of topics
- “Who”: well educated, with above-average income
- “What”: research a wide range of topics, with little interest in adult content
- “How”:
  - More likely to issue non-navigational queries
  - Less likely to have a single-click session
  - More likely to make use of the suggested query alternatives
14
Navigational Users
- Use the search engine as a replacement for web page bookmarks, to navigate to URLs they already know exist
- “Who”: background averages of the topical cluster under consideration
- “What”: dominated by the topics of popular websites, e.g. FB, Gmail
- “How”:
  - More likely to issue navigational queries
  - More likely to click on only a single result within a session
  - Less likely to make use of unnecessary suggested query alternatives
15
Transactional Users
- Use the search engine to reach a URL where they can perform a desired transaction
  - Little benefit in learning more about a subject
  - URL generally not known in advance
- “Who”: depends heavily on the kind of transaction
- “What”: predominant topics are shopping, adult content, and gaming sites
- “How”: diverse clicks
16
Close-Up
- Baby Boomers
  - Average age of 50 years
  - Simple navigational queries related to online banking; interested in finance
- Liberal Females
  - Most likely to have voted Democrat in the 2008 elections
  - Biggest single query topic is shopping, with longer sessions
- White Conservatives
  - Voted Republican in the 2008 elections
  - Search for automotive-related topics, business pages, and home and garden information
17
Conclusion
- Overall, the findings are stereotypical
- Future work:
  - Fine-grained analysis in terms of categories and search strategy
  - Closer look at long-tail queries