THE COMPLEX TASK OF MAKING SEARCH SIMPLE Jaime Teevan Microsoft Research UMAP 2015
THE WORLD WIDE WEB 20 YEARS AGO
Content
  2,700 websites (14% .com)
Tools
  Mosaic only 1 year old
  Pre-Netscape, IE, Chrome
  4 years pre-Google
Search engines
  54,000 pages indexed by Lycos
  1,500 queries per day
THE WORLD WIDE WEB TODAY Trillions of pages indexed. Billions of queries per day.
1996: We assume information is static. But web content changes!
SEARCH RESULTS CHANGE
New, relevant content
Improved ranking
Personalization
General instability
Can change during a query!
BIGGEST CHANGE ON THE WEB Behavioral data.
BEHAVIORAL DATA MANY YEARS AGO
"It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second, into two like powers. I have discovered a truly marvellous proof of this, which this margin is too narrow to contain." (Fermat's marginal note)
Marginalia adds value to books
Students prefer annotated texts
Do we lose marginalia when we move to digital documents?
No! Scale makes it possible to look at experiences in the aggregate, and to tailor and personalize
PAST SURPRISES ABOUT WEB SEARCH
Early log analysis
  Excite logs from 1997, 1999
  Silverstein et al. 1999; Jansen et al. 2000; Broder 2002
Queries are not 7 or 8 words long
Advanced operators not used or "misused"
Nobody used relevance feedback
Lots of people search for sex
Navigational behavior common
Prior experience was with library search
SEARCH IS A COMPLEX, MULTI-STEP PROCESS
Typical query involves more than one click
  59% of people return to the search page after their first click
  Clicked results often not the endpoint
  People orienteer from results, using context as a guide
  Not all information needs can be expressed with current tools
  Recognition is easier than recall
Typical search session involves more than one query
  40% of sessions contain multiple queries
  Half of all search time is spent in sessions of 30+ minutes
Search tasks often involve more than one session
  25% of queries are from multi-session tasks
IDENTIFYING VARIATION ACROSS INDIVIDUALS
WHICH QUERY HAS LESS VARIATION?
campbells soup recipes v. vegetable soup recipe
tiffany’s v. tiffany
nytimes v. connecticut newspapers v. federal government jobs
singaporepools.com v. singapore pools
NAVIGATIONAL QUERIES WITH LOW VARIATION
Use everyone’s clicks to identify queries with low click entropy
  12% of the query volume
  Only works for popular queries
Clicks predicted only 72% of the time
  Double the accuracy for the average query
  But what is going on the other 28% of the time?
Many typical navigational queries are not identified
  People visit interior pages (craigslist – 3% visit)
  People visit related pages (weather.com – 17% visit)
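A minimal sketch of the click-entropy idea behind this slide, assuming a toy log of (query, clicked URL) pairs; the function and data are illustrative, not the production pipeline:

import math
from collections import Counter, defaultdict

def click_entropy(log):
    """log: iterable of (query, clicked_url) pairs aggregated over all users."""
    clicks_by_query = defaultdict(Counter)
    for query, url in log:
        clicks_by_query[query][url] += 1
    entropy = {}
    for query, counts in clicks_by_query.items():
        total = sum(counts.values())
        entropy[query] = -sum((c / total) * math.log2(c / total)
                              for c in counts.values())
    return entropy

# Made-up example: the navigational query gets entropy near 0.
log = [("singapore pools", "singaporepools.com"),
       ("singapore pools", "singaporepools.com"),
       ("vegetable soup recipe", "allrecipes.com"),
       ("vegetable soup recipe", "foodnetwork.com")]
print(click_entropy(log))

Queries whose entropy falls below a tuned threshold would be the "low variation" set; the 12% volume and 72% prediction figures above come from the real logs, not this sketch.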
INDIVIDUALS FOLLOW PATTERNS Getting ready in the morning. Getting to a webpage.
FINDING OFTEN INVOLVES REFINDING
Repeat query (33%) – e.g., query [umap] (user modeling, adaptation, and personalization)
Repeat click (39%)
Lots of repeats (43%)

                 Repeat Click    New Click    Total
  Repeat Query       29%             4%        33%
  New Query          10%            57%        67%
  Total              39%            61%
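The repeat/new breakdown above can be tallied from a chronological click log; here is a hedged sketch using assumed (user, query, clicked URL) fields, not the papers' exact methodology:

from collections import Counter

def refinding_breakdown(log):
    """log: chronological list of (user_id, query, clicked_url) events."""
    seen_queries, seen_clicks, counts = set(), set(), Counter()
    for user, query, url in log:
        q_kind = "repeat_query" if (user, query) in seen_queries else "new_query"
        c_kind = "repeat_click" if (user, url) in seen_clicks else "new_click"
        counts[(q_kind, c_kind)] += 1
        seen_queries.add((user, query))
        seen_clicks.add((user, url))
    total = sum(counts.values())
    return {cell: round(100 * n / total, 1) for cell, n in counts.items()}

Run over a large log, the four cells would correspond to the 29 / 4 / 10 / 57% split in the table.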
IDENTIFYING PERSONAL NAVIGATION
Use an individual’s clicks to identify repeat (query, click) pairs
  15% of the query volume
  Most occur fewer than 25 times in the logs
Queries more ambiguous
  Rarely contain a URL fragment
  Click entropy the same as for general Web queries
  Multiple meanings – enquirer (National Enquirer v. Cincinnati Enquirer)
  Found navigation – bed bugs [informational]
  Serendipitous encounters – etsy (Etsy.com v. Regretsy.com, a parody)
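One way to operationalize "use an individual's clicks": flag (user, query) pairs where that user has always gone to the same URL, and predict that URL on the next occurrence. A minimal sketch with made-up field names and threshold:

from collections import defaultdict, Counter

def personal_navigation(log, min_repeats=2):
    """log: iterable of (user_id, query, clicked_url); returns
    {(user_id, query): url} for pairs the user consistently re-clicks."""
    history = defaultdict(Counter)
    for user_id, query, url in log:
        history[(user_id, query)][url] += 1
    predictions = {}
    for key, counts in history.items():
        url, count = counts.most_common(1)[0]
        if count >= min_repeats and count == sum(counts.values()):
            predictions[key] = url  # this user always ends up at the same place
    return predictions

Because the signal is per-user, it can cover ambiguous queries like enquirer or etsy that population-level click entropy misses.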
SUPPORTING PERSONAL NAVIGATION
The same result shown with two different snippets:
Tom Bosley - Wikipedia, the free encyclopedia
  Thomas Edward "Tom" Bosley (October 1, 1927 – October 19, 2010) was an American actor, best known for portraying Howard Cunningham on the long-running ABC sitcom Happy Days. Bosley was born in Chicago, the son of Dora and Benjamin Bosley.
  en.wikipedia.org/wiki/tom_bosley
Tom Bosley - Wikipedia, the free encyclopedia
  Bosley died at 4:00 a.m. of heart failure on October 19, 2010, at a hospital near his home in Palm Springs, California. … His agent, Sheryl Abrams, said Bosley had been battling lung cancer.
  en.wikipedia.org/wiki/tom_bosley
PATTERNS: A DOUBLE-EDGED SWORD Patterns are predictable. Changing a pattern is confusing.
CHANGE INTERRUPTS PATTERNS
Example: dynamic menus
  Put commonly used items at the top
  Slows menu item access
Does search result change interfere with refinding?
CHANGE INTERRUPTS REFINDING
When search result ordering changes, people are
  Less likely to click on a repeat result
  Slower to click on a repeat result when they do
  More likely to abandon their search
Happens within a query and across sessions
Even happens when the repeat result moves up!
How to reconcile the benefits of change with the interruption?
USE MAGIC TO MINIMIZE INTERRUPTION
ABRACADABRA Magic happens.
YOUR CARD IS GONE!
CONSISTENCY ONLY MATTERS SOMETIMES
BIAS PERSONALIZATION BY EXPERIENCE
CREATE CHANGE BLIND WEB EXPERIENCES
THE COMPLEX TASK OF MAKING SEARCH SIMPLE
Challenge: The web is complex
  Tools change, content changes
  Different people use the web differently
Fortunately, individuals are simple
  We are predictable, follow patterns
  Predictability enables personalization
Beware of breaking expectations!
  Bias personalization by expectations
  Create magic personal experiences
REFERENCES
Broder. A taxonomy of web search. SIGIR Forum, 2002.
Donato, Bonchi, Chi & Maarek. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW.
Dumais. Task-based search: A search engine perspective. NSF Task-Based Information Search Systems Workshop.
Jansen, Spink & Saracevic. Real life, real users, and real needs: A study and analysis of user queries on the web. IP&M, 2000.
Kim, Cramer, Teevan & Lagun. Understanding how people interact with web search results that change in real-time using implicit feedback. CIKM.
Lee, Teevan & de la Chica. Characterizing multi-click search behavior and the risks and opportunities of changing results during use. SIGIR.
Mitchell & Shneiderman. Dynamic versus static menus: An exploratory comparison. SIGCHI Bulletin.
Selberg & Etzioni. On the instability of web search engines. RIAO.
Silverstein, Marais, Henzinger & Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 1999.
Somberg. A comparison of rule-based and positionally constant arrangements of computer menu items. CHI.
Svore, Teevan, Dumais & Kulkarni. Creating temporally dynamic web search snippets. SIGIR.
Teevan. The Re:Search Engine: Simultaneous support for finding and re-finding. UIST.
Teevan. How people recall, recognize and reuse search results. TOIS.
Teevan, Alvarado, Ackerman & Karger. The perfect search engine is not enough: A study of orienteering behavior in directed search. CHI.
Teevan, Collins-Thompson, White & Dumais. Viewpoint: Slow search. CACM.
Teevan, Collins-Thompson, White, Dumais & Kim. Slow search: Information retrieval without time constraints. HCIR.
Teevan, Cutrell, Fisher, Drucker, Ramos, Andrés & Hu. Visual snippets: Summarizing web pages for search and revisitation. CHI.
Teevan, Dumais & Horvitz. Potential for personalization. TOCHI.
Teevan, Dumais & Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR.
Teevan, Liebling & Geetha. Understanding and predicting personal navigation. WSDM.
Tyler & Teevan. Large scale query log analysis of re-finding. WSDM.
More at:
THANK YOU! Jaime Teevan
EXTRA SLIDES How search engines can make use of change to improve search.
CHANGE CAN IDENTIFY IMPORTANT TERMS
Divergence from norm – e.g., cookbooks, frightfully, merrymaking, ingredient, latkes
Staying power in page – how long a term persists over time (Sep.–Dec.)
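A sketch of how these two signals might be combined, assuming we have several crawled snapshots of a page and a background term distribution; the scoring formula is illustrative, not the published method:

import math
from collections import Counter

def term_scores(snapshots, background_freq):
    """snapshots: list of token lists, one per crawl of the same page over time.
    background_freq: term -> probability in a background corpus."""
    latest = Counter(snapshots[-1])
    total = sum(latest.values())
    scores = {}
    for term, count in latest.items():
        p_page = count / total
        p_bg = background_freq.get(term, 1e-6)         # assumed smoothing floor
        divergence = p_page * math.log(p_page / p_bg)   # how unusual the term is on this page
        staying_power = sum(term in snap for snap in snapshots) / len(snapshots)
        scores[term] = divergence * staying_power       # unusual AND persistent
    return scores

Under this kind of scoring, a term like "latkes" would rank high in December if it both diverges from the background and persists across crawls.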
CHANGE CAN IDENTIFY IMPORTANT SEGMENTS
Page elements change at different rates
Pages are revisited at different rates
Resonance can serve as a filter for important content
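A hedged sketch of the resonance idea: keep segments whose change rate is roughly in step with how often people revisit the page. Segment names, intervals, and the tolerance are invented for illustration:

def resonant_segments(segment_change_intervals, revisit_interval, tolerance=0.5):
    """segment_change_intervals: segment_id -> average days between changes.
    revisit_interval: average days between user revisits to the page."""
    resonant = []
    for segment, change_interval in segment_change_intervals.items():
        ratio = change_interval / revisit_interval
        if 1 - tolerance <= ratio <= 1 + tolerance:
            resonant.append(segment)  # changes about as often as users return
    return resonant

# Made-up example: a daily-changing headline resonates with daily revisits;
# a static nav bar and a constantly rotating ad block do not.
print(resonant_segments({"headline": 1.0, "nav_bar": 90.0, "ad_block": 0.01},
                        revisit_interval=1.0))  # -> ['headline']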
EXTRA SLIDES Impact of change on refinding behavior.
BUT CHANGE HELPS WITH FINDING!
Change to the clicked result
  Unsatisfied initially: Gone > Down > Stay > Up
  Satisfied initially: Stay > Down > Up > Gone
Changes around the click
  Always benefit NSAT (unsatisfied) users
  Best below the click for satisfied (SAT) users
(Tables: effect of result movement – Up / Stay / Down / Gone – and of changes above v. below the click, for SAT and NSAT users)
EXTRA SLIDES Privacy issues and behavioral logs.
PUBLIC SOURCES OF BEHAVIORAL LOGS
Public Web service content
  Twitter, Facebook, Digg, Wikipedia
Research efforts to create logs
  Lemur Community Query Log Project
  1 year of data collection = 6 seconds of Google logs
Publicly released private logs
  DonorsChoose.org
  Enron corpus, AOL search logs, Netflix ratings
EXAMPLE: AOL SEARCH DATASET
August 4, 2006: Logs released to the academic community
  3 months, 650 thousand users, 20 million queries
  Logs contain anonymized User IDs
August 7, 2006: AOL pulled the files, but they were already mirrored
August 9, 2006: New York Times identified Thelma Arnold
  "A Face Is Exposed for AOL Searcher No. …"
  Queries for businesses, services in Lilburn, GA (pop. 11k)
  Queries for Jarrett Arnold (and others of the Arnold clan)
  NYT contacted all 14 people in Lilburn with the Arnold surname
  When contacted, Thelma Arnold acknowledged her queries
August 21, 2006: 2 AOL employees fired, CTO resigned
September 2006: Class action lawsuit filed against AOL

Sample log rows:
  AnonID  Query                          QueryTime  ItemRank  ClickURL
  …       jitp                           …:18:18    1         http://…
  …       jipt submission process        …:18:18    3         http://…
  …       computational social scinece   …:19:…
  …       computational social science   …:20:04    2         http://socialcomplexity.gmu.edu/phd.php
  …       seattle restaurants            …:25:50    2         http://seattletimes.nwsource.com/rests
  …       perlman montreal               …:15:14    4         http://oldwww.acm.org/perlman/guide.html
  …       jitp 2006 notification         …:13:13
  …
EXAMPLE: AOL SEARCH DATASET
Other well-known AOL users
  User 927: how to kill your wife
  User …: i love alaska
Anonymized IDs do not make logs anonymous
  Contain directly identifiable information
    Names, phone numbers, credit cards, social security numbers
  Contain indirectly identifiable information
    Example: Thelma’s queries
    Birthdate, gender, and zip code identify 87% of Americans
EXAMPLE: NETFLIX CHALLENGE
October 2, 2006: Netflix announces contest
  Predict people’s ratings for a $1 million prize
  100 million ratings, 480k users, 17k movies
  Very careful with anonymity post-AOL
May 18, 2008: Data de-anonymized
  Paper published by Narayanan & Shmatikov
  Uses background knowledge from IMDB
  Robust to perturbations in data
December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels second competition

Sample of the released data:
  Ratings
    1:            [Movie 1 of 17770]
    12, 3, …      [CustomerID, Rating, Date]
    1234, 5, …    [CustomerID, Rating, Date]
    2468, 1, …    [CustomerID, Rating, Date]
    …
  Movie Titles
    …
    10120, 1982, "Bladerunner"
    17690, 2007, "The Queen"
    …

Netflix: "All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy... Even if, for example, you knew all your own ratings and their dates you probably couldn’t identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."