A Large Scale Study of Wireless Search Behavior: Google Mobile Search
By Maryam Kamvar, Shumeet Baluja
Presented by Prashanth Kumar Muthoju, Aditya Varakantam

Overview
Introduction
Previous Work
XHTML & PDA Interfaces
Query Analysis
–Top-level Statistics
–Query Categorization
–Query Distributions
Session Characteristics
User Persistence
Conclusion
Future Work
References
Discussion

Introduction
In 2004, more than 57% of the US population owned a cell phone
Huge growth in cell phone users
–From June 2004 to Dec 2004, growth = 2 million per month
Sources: infoplease.com, npg.org, census.gov, factfinder.census.gov

Introduction
Desktop Search – from conventional computers
–Desktop
–Laptop
Wireless Search – from mobile devices
–Cell phone
–PDA

Introduction
Goal of the paper:
–To present a snapshot of the current state of mobile search
–This helps in understanding the unique needs of mobile searchers
Analysis of
–Google's XHTML search logs
–Google's PDA search logs

Introduction
Study Parameters:
Data set – 1 million page view requests, randomly sampled during a 1-month period in 2005
Only English "web" searches included
A single, large US carrier used
Strictly anonymous data used
All results are aggregated statistics

Previous Work
Most previous studies concentrated on the differences between Information Retrieval and Web Search
–They assumed queries were initiated from a conventional computer
Very few mobile web studies have been conducted, and those used small datasets
The paper's goal is to provide insight, through large-scale log analysis, into how and for what mobile search is being used

XHTML & PDA Interfaces
Only Web and Image searches offered
Displays same text snippet as desktop
No transcoding done in PDA
Short snippets
No horizontal scrolling
No ads
No cached or similar-pages links
A sample mobile phone simulator is here..
PDA XHTML

Query Analysis
Difference between wireless and desktop search in terms of
–Content
–Variety
–Descriptive statistics (# of words, # of characters, etc.)
Top-level Statistics
Query Categorization
Query Distributions

Query Analysis
Top-level Statistics
–Number of words and number of characters in a typical query
[Graph taken from paper]

Query Analysis (cont'd)
Mean and median are approximately the same in both cases
In a small study done on a speech interface to search, the average length of spoken queries to Google was 2.1 words
Users have learned to form queries that give neither too many nor too few results
Tables (XHTML, PDA):
                 Words per query   Characters per query
Average
Median           3                 16
Max              65                396
Std. Deviation

                 Words per query   Characters per query
Average
Median           2                 14
Max              30                502
Std. Deviation

Query Analysis (cont'd)
Effort on a 12-key keypad is double that on a QWERTY keypad
Queries with only 'a-z' and whitespace ~ 74% of general queries
–Avg. length = 14.5 characters
–Avg. no. of key presses/query = 30.7 (triple-tap input method), median = 28, max = 237, S.D. = 17.8
Queries with a mix of alphanumeric characters and symbols (such as URLs)
–URLs ~ 17% of general queries
–It would be beneficial to build address-bar-like behavior into the search box

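The key-press count for an a-z/whitespace query can be sketched as below. This is a minimal illustration, not the paper's procedure: it assumes the standard letter layout of a 12-key keypad, assumes the space character sits on the 0 key, and ignores any pause or "next" press needed between two letters on the same key. The names KEYPAD, TAPS and multitap_presses are hypothetical.

```python
# Letter groups of a standard 12-key phone keypad.
# Assumption: space is a single press on the 0 key.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    "0": " ",
}

# Number of taps needed to reach each character on its key
# (first letter on a key = 1 tap, second = 2 taps, and so on).
TAPS = {
    ch: position + 1
    for letters in KEYPAD.values()
    for position, ch in enumerate(letters)
}


def multitap_presses(query: str) -> int:
    """Count key presses for a query made only of a-z letters and spaces.

    Raises KeyError for any other character (digits, symbols, URLs),
    since those fall outside the a-z/whitespace query class.
    """
    return sum(TAPS[ch] for ch in query.lower())


if __name__ == "__main__":
    for q in ("pizza", "google maps"):
        print(q, len(q), multitap_presses(q))
```

For comparison, the slide's averages (30.7 key presses for 14.5 characters) work out to roughly 2.1 presses per character.
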
Query Categorization
[Figure taken from paper]

Query Categorization
Observations:
Adult queries are popular on XHTML
–People might feel more secure on their cell phones
Local Services queries are popular on PDA
–PDAs are used by business/corporate people
Queries that include a zip code are very rare (<1%) in both cases
Shortest queries – Adult
Longest queries – Local Services

Query Categorization
Variations in queries
Measure(%) = (top-N unique queries) / (total query volume)
Test Data
–Random sampling of 50,000 queries from Desktop, XHTML and PDA searches during a month

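A minimal sketch of the measure, assuming it is read as the share of total query volume covered by the N most frequent unique queries (an interpretation of the slide's fraction, not a confirmed definition from the paper). The function name top_n_coverage and the sample data are hypothetical.

```python
from collections import Counter


def top_n_coverage(queries, n):
    """Percentage of all queries accounted for by the n most frequent unique queries."""
    if not queries:
        return 0.0
    counts = Counter(queries)
    # Volume contributed by the n most common unique query strings.
    top_volume = sum(count for _, count in counts.most_common(n))
    return 100.0 * top_volume / len(queries)


if __name__ == "__main__":
    # Toy log of 10 queries with 7 unique strings.
    sample = ["a", "a", "a", "b", "b", "c", "d", "e", "f", "g"]
    print(top_n_coverage(sample, 1))
    print(top_n_coverage(sample, 2))
```

A higher percentage for the same N means less variation: a few queries account for more of the traffic.
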
Query Distributions
Top wireless query accounts for only 1.2% of all wireless queries
Desktop queries have more variation
–Top 1000 XHTML queries ~ 22% of all XHTML queries
–Top 1000 desktop queries ~ 6% of all desktop queries
[Graph taken from paper; annotation: variation decreases]

Query Distributions (cont'd)
Possible reasons:
–People may have adapted their queries to those that return usable sites
–The user base is smaller and may share similar properties: XHTML searchers are more likely to be technology savvy; PDA users are more likely to share a corporate business profile
–Desktop browsers can display more advanced content such as JavaScript; PDA browsers are less advanced (no JavaScript); cell phones use only XHTML

Session Characteristics
Session – a series of queries made by a single user within a short span of time (this time is called the session delta)
–A session delta of 5 minutes was used for this paper
Only 51.3% of XHTML logs (in the total data set used) had cookie information
Cookies were present in all of the PDA logs

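A minimal sketch of splitting one user's queries into sessions with a 5-minute session delta. It assumes the delta is applied to the gap between consecutive queries; the slide does not say whether the paper measures it that way or against the first query of the session. The names SESSION_DELTA and split_sessions are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_DELTA = timedelta(minutes=5)  # session delta stated on the slide


def split_sessions(timestamps, delta=SESSION_DELTA):
    """Group one user's query timestamps into sessions.

    Assumption: a new session starts whenever the gap since the
    previous query is larger than delta.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= delta:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # start a new session
    return sessions


if __name__ == "__main__":
    t0 = datetime(2005, 1, 1, 12, 0)
    stamps = [t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=4),
              t0 + timedelta(minutes=20)]
    print([len(s) for s in split_sessions(stamps)])
```

Grouping by user first (for example by cookie) is left out here; the function expects the timestamps of a single user.
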
Session Characteristics
[Graph taken from paper: queries per HTML session]

Session Characteristics (cont'd)
Approximate time to input a query from a 12-key keypad ~ [value missing] sec
–Based on the time between the user first hitting the Google XHTML home page and the first query being received by Google
–This interval = (time to download Google homepage) + (time to input query) + (time to upload HTTP request to server)
–Avg = 66.3 sec, Med = 5, Max = 300, SD = 49.3
Time to upload/download content ~ 3–10 seconds

Session Characteristics
Observation: Time is proportional to the length of the query and the ease of input
[Figure taken from paper: seconds to result and time spent inputting query from XHTML and PDA interfaces; panels: XHTML, PDA]

Exploration of Result Links
Low click-through rate across all categories
–This suggests that users rely heavily on snippets for their information in wireless search
For users who clicked through: Avg. no. of clicks per query = 1.7, Med = 1, Max = 37, SD = 1.8
Users spent [value missing] seconds on the search results page before clicking on any link

Exploration of Result Links
More-results facility on desktop vs. wireless search
Only 8.5% of queries had a "more search results" request
–Among those: Avg. no. of requests viewed = 2.2, Med = 1, Max = 82, S.D. = 3.1
PDA users took approximately 15 sec less to request more results than XHTML users
–In the XHTML interface, users may have confused the search button with the next button
XHTML and PDA page views per query are significantly lower than the desktop statistics

User Persistence
To measure how persistent users are in finding what they are looking for
Measure 1: A pair of consecutive queries is considered a refinement if any of the following holds
–query 1 is a subset of query 2
–query 2 is a subset of query 1
–edit distance between q1 and q2 is < |q2|/2
Of all the consecutive queries in an XHTML query session
–Approximately 28.7% were refinements
–14% were refinements after spell check
–31.7% were the same query
–In the remaining 25%, the second query was not related to the first
PDA
–33.6% were the same query
–11.9% were refinements after spell check

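The Measure 1 refinement test can be sketched as follows. Several points are assumptions rather than details confirmed by the slide: "subset" is read as substring, the edit distance is taken to be character-level Levenshtein distance, and identical queries are excluded because the slide counts them separately. The names edit_distance and is_refinement are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_refinement(q1: str, q2: str) -> bool:
    """Return True if consecutive queries q1 -> q2 look like a refinement."""
    if q1 == q2:
        return False  # assumption: repeated queries are a separate class
    if q1 in q2 or q2 in q1:
        return True   # assumption: "subset" means substring
    return edit_distance(q1, q2) < len(q2) / 2


if __name__ == "__main__":
    print(is_refinement("pizza", "pizza hut"))
    print(is_refinement("brittany spears", "britney spears"))
    print(is_refinement("weather", "pizza hut"))
```

A word-level reading of "subset" (every word of one query appears in the other) would be an equally plausible variant; only the substring check would change.
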
User Persistence
Measure 2: Refinement requirements are relaxed here
–Queries need not occur in the same session
–Users who made at least 2 queries within a one-month period are considered

User Persistence
[Figure taken from paper; only values > 3% are shown]

Conclusion
An in-depth examination of wireless search patterns for a major US carrier is presented
Large-scale log analysis
–Strength: breadth of data analyzed
–Weakness: numbers do not tell the story behind a user's experience
No demographic information about users
Still, this study presents data on the current state of wireless search

Future Work - Discussion
Which aspects of a search result (title/snippet/URL/click-through page) are most important for a wireless user?
How does interface accessibility change search patterns?
How much do items that require scrolling reduce the click-through rate?
What would be the impact of (dynamic) query suggestion systems on query length and user persistence?