THE COMPLEX TASK OF MAKING SEARCH SIMPLE Jaime Teevan Microsoft Research UMAP 2015

THE WORLD WIDE WEB 20 YEARS AGO
Content: 2,700 websites (14% .com)
Tools: Mosaic only 1 year old; pre-Netscape, IE, Chrome; 4 years pre-Google
Search engines: 54,000 pages indexed by Lycos; 1,500 queries per day

THE WORLD WIDE WEB TODAY Trillions of pages indexed. Billions of queries per day.

1996: We assume information is static. But web content changes!

SEARCH RESULTS CHANGE
- New, relevant content
- Improved ranking
- Personalization
- General instability
Results can even change during a query!

SEARCH RESULTS CHANGE

BIGGEST CHANGE ON THE WEB Behavioral data.

BEHAVIORAL DATA MANY YEARS AGO
"It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second, into two like powers. I have discovered a truly marvellous proof of this, which this margin is too narrow to contain." (Fermat's marginal note)
- Marginalia adds value to books
- Students prefer annotated texts
- Do we lose marginalia when we move to digital documents? No! Scale makes it possible to look at experiences in the aggregate, and to tailor and personalize.

PAST SURPRISES ABOUT WEB SEARCH
Early log analysis: Excite logs from 1997, 1999 (Silverstein et al. 1999; Jansen et al. 2000; Broder 2002)
- Queries are not 7 or 8 words long
- Advanced operators not used or "misused"
- Nobody used relevance feedback
- Lots of people search for sex
- Navigational behavior common
Prior experience was with library search.

SEARCH IS A COMPLEX, MULTI-STEP PROCESS
A typical query involves more than one click
- 59% of people return to the search page after their first click
- Clicked results are often not the endpoint; people orienteer from results using context as a guide
- Not all information needs can be expressed with current tools; recognition is easier than recall
A typical search session involves more than one query
- 40% of sessions contain multiple queries
- Half of all search time is spent in sessions of 30+ minutes
Search tasks often involve more than one session
- 25% of queries are from multi-session tasks

IDENTIFYING VARIATION ACROSS INDIVIDUALS

WHICH QUERY HAS LESS VARIATION?
- campbells soup recipes v. vegetable soup recipe
- tiffany's v. tiffany
- nytimes v. connecticut newspapers v. federal government jobs
- singaporepools.com v. singapore pools

NAVIGATIONAL QUERIES WITH LOW VARIATION
Use everyone's clicks to identify queries with low click entropy
- 12% of the query volume
- Only works for popular queries
Clicks predicted only 72% of the time
- Double the accuracy for the average query
- But what is going on the other 28% of the time?
Many typical navigational queries are not identified
- People visit interior pages (craigslist: 3% visit)
- People visit related pages (weather.com: 17% visit)
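A minimal sketch of the click-entropy idea above, assuming a log of (query, clicked_url) pairs; the entropy and popularity thresholds here are illustrative, not the values used in the actual system.

```python
# Flag likely navigational queries by computing click entropy over everyone's clicks.
from collections import Counter, defaultdict
from math import log2

def click_entropy(url_counts):
    """H(q) = -sum_u p(u|q) * log2 p(u|q), where p(u|q) is the share of
    clicks for the query that land on URL u."""
    total = sum(url_counts.values())
    return -sum((c / total) * log2(c / total) for c in url_counts.values())

def low_variation_queries(click_log, max_entropy=0.5, min_clicks=50):
    """click_log: iterable of (query, clicked_url) pairs from a search log.
    Returns {query: most_clicked_url} for popular, low-entropy queries."""
    per_query = defaultdict(Counter)
    for query, url in click_log:
        per_query[query.strip().lower()][url] += 1
    return {
        q: counts.most_common(1)[0][0]
        for q, counts in per_query.items()
        if sum(counts.values()) >= min_clicks and click_entropy(counts) <= max_entropy
    }

# Example: a query like [singaporepools.com], whose clicks all land on one URL,
# gets entropy near zero, while [tiffany] spreads its clicks across several URLs.
```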

INDIVIDUALS FOLLOW PATTERNS Getting ready in the morning. Getting to a webpage.

FINDING OFTEN INVOLVES REFINDING
- Repeat query (33%): e.g., re-issuing the query [umap]
- Repeat click (39%): e.g., re-clicking a result about user modeling, adaptation, and personalization
- Lots of repeats overall (43%)

                     Repeat click   New click
  Repeat query (33%)      29%           4%
  New query (67%)         10%          57%
  Total                   39%          61%
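As a rough illustration of how a breakdown like the one above can be tallied, the sketch below counts repeat queries and repeat clicks per user; the (user, query, clicked_url) fields and the exact-string notion of "repeat" are simplifying assumptions, not the study's methodology.

```python
# Tally the repeat-query x repeat-click cells from a time-ordered per-user log.
from collections import defaultdict

def refinding_breakdown(log):
    """log: iterable of (user, query, clicked_url), in time order.
    Returns counts for the four repeat-query x repeat-click cells."""
    seen_queries = defaultdict(set)   # user -> queries issued before
    seen_clicks = defaultdict(set)    # user -> URLs clicked before
    cells = {("repeat_q", "repeat_c"): 0, ("repeat_q", "new_c"): 0,
             ("new_q", "repeat_c"): 0, ("new_q", "new_c"): 0}
    for user, query, url in log:
        q = "repeat_q" if query in seen_queries[user] else "new_q"
        c = "repeat_c" if url in seen_clicks[user] else "new_c"
        cells[(q, c)] += 1
        seen_queries[user].add(query)
        seen_clicks[user].add(url)
    return cells
```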

IDENTIFYING PERSONAL NAVIGATION
Use an individual's clicks to identify repeat (query, click) pairs
- 15% of the query volume
- Most occur fewer than 25 times in the logs
These queries are more ambiguous
- Rarely contain a URL fragment
- Click entropy is the same as for general Web queries
- Multiple meanings: enquirer (National Enquirer v. Cincinnati Enquirer)
- Found navigation: bed bugs [Informational]
- Serendipitous encounters: etsy (Etsy.com v. Regretsy.com, a parody)
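The sketch below shows one way repeat (query, click) pairs could be mined per user; the repetition threshold and the single-URL consistency rule are assumptions for illustration, not the exact criteria from the personal navigation work.

```python
# Find per-user personal navigation candidates: (query, url) pairs where the same
# query from the same user repeatedly led to the same single click.
from collections import Counter, defaultdict

def personal_navigation_pairs(log, min_repeats=2):
    """log: iterable of (user, query, clicked_url) in time order.
    Returns {(user, query): url} when the user's clicks for that query
    always went to one URL and it happened at least min_repeats times."""
    per_user_query = defaultdict(Counter)
    for user, query, url in log:
        per_user_query[(user, query)][url] += 1
    return {
        key: counts.most_common(1)[0][0]
        for key, counts in per_user_query.items()
        if len(counts) == 1 and sum(counts.values()) >= min_repeats
    }
```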

SUPPORTING PERSONAL NAVIGATION
Snippet 1:
Tom Bosley - Wikipedia, the free encyclopedia
Thomas Edward "Tom" Bosley (October 1, 1927 - October 19, 2010) was an American actor, best known for portraying Howard Cunningham on the long-running ABC sitcom Happy Days. Bosley was born in Chicago, the son of Dora and Benjamin Bosley.
en.wikipedia.org/wiki/tom_bosley
Snippet 2:
Tom Bosley - Wikipedia, the free encyclopedia
Bosley died at 4:00 a.m. of heart failure on October 19, 2010, at a hospital near his home in Palm Springs, California. … His agent, Sheryl Abrams, said Bosley had been battling lung cancer.
en.wikipedia.org/wiki/tom_bosley

PATTERNS: A DOUBLE-EDGED SWORD Patterns are predictable. Changing a pattern is confusing.

CHANGE INTERRUPTS PATTERNS
Example: dynamic menus
- Put commonly used items at the top
- Slows menu item access
Does search result change interfere with refinding?

CHANGE INTERRUPTS REFINDING
When search result ordering changes, people are:
- Less likely to click on a repeat result
- Slower to click on a repeat result when they do
- More likely to abandon their search
This happens within a query and across sessions, and even when the repeat result moves up! How to reconcile the benefits of change with the interruption?

USE MAGIC TO MINIMIZE INTERRUPTION

ABRACADABRA Magic happens.

YOUR CARD IS GONE!

CONSISTENCY ONLY MATTERS SOMETIMES

BIAS PERSONALIZATION BY EXPERIENCE

CREATE CHANGE-BLIND WEB EXPERIENCES

THE COMPLEX TASK OF MAKING SEARCH SIMPLE
Challenge: the web is complex
- Tools change, content changes
- Different people use the web differently
Fortunately, individuals are simple
- We are predictable and follow patterns
- Predictability enables personalization
Beware of breaking expectations!
- Bias personalization by expectations
- Create magic personal experiences

REFERENCES
Broder. A taxonomy of web search. SIGIR Forum, 2002.
Donato, Bonchi, Chi & Maarek. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW.
Dumais. Task-based search: A search engine perspective. NSF Task-Based Information Search Systems Workshop.
Jansen, Spink & Saracevic. Real life, real users, and real needs: A study and analysis of user queries on the web. IP&M.
Kim, Cramer, Teevan & Lagun. Understanding how people interact with web search results that change in real-time using implicit feedback. CIKM.
Lee, Teevan & de la Chica. Characterizing multi-click search behavior and the risks and opportunities of changing results during use. SIGIR.
Mitchell & Shneiderman. Dynamic versus static menus: An exploratory comparison. SIGCHI Bulletin.
Selberg & Etzioni. On the instability of web search engines. RIAO.
Silverstein, Marais, Henzinger & Moricz. Analysis of a very large web search engine query log. SIGIR Forum.
Somberg. A comparison of rule-based and positionally constant arrangements of computer menu items. CHI.
Svore, Teevan, Dumais & Kulkarni. Creating temporally dynamic web search snippets. SIGIR.
Teevan. The Re:Search Engine: Simultaneous support for finding and re-finding. UIST.
Teevan. How people recall, recognize and reuse search results. TOIS.
Teevan, Alvarado, Ackerman & Karger. The perfect search engine is not enough: A study of orienteering behavior in directed search. CHI.
Teevan, Collins-Thompson, White & Dumais. Viewpoint: Slow search. CACM.
Teevan, Collins-Thompson, White, Dumais & Kim. Slow search: Information retrieval without time constraints. HCIR.
Teevan, Cutrell, Fisher, Drucker, Ramos, André & Hu. Visual snippets: Summarizing web pages for search and revisitation. CHI.
Teevan, Dumais & Horvitz. Potential for personalization. TOCHI.
Teevan, Dumais & Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR.
Teevan, Liebling & Geetha. Understanding and predicting personal navigation. WSDM.
Tyler & Teevan. Large scale query log analysis of re-finding. WSDM.
More at:

THANK YOU! Jaime Teevan

EXTRA SLIDES How search engines can make use of change to improve search.

CHANGE CAN IDENTIFY IMPORTANT TERMS
Divergence from the norm: cookbooks, frightfully, merrymaking, ingredient, latkes
Staying power in the page
[Figure: term presence in the page over time, Sep. through Dec.]
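A hedged sketch of the underlying idea: score terms that appear in newly added page content by how far their frequency diverges from a background distribution, and measure staying power as how long a term persists across page snapshots. The scoring function and smoothing below are illustrative assumptions, not the exact model.

```python
# Score newly added terms by divergence from a background distribution, and
# measure how long a term "stays" in the page across successive snapshots.
from collections import Counter
from math import log

def divergence_scores(new_terms, background_counts, background_total):
    """Pointwise KL-style contribution: p_new(t) * log(p_new(t) / p_bg(t)).
    new_terms: list of terms from newly added content.
    background_counts / background_total: term counts over a reference corpus."""
    counts = Counter(new_terms)
    total = sum(counts.values())
    scores = {}
    for term, c in counts.items():
        p_new = c / total
        p_bg = (background_counts.get(term, 0) + 1) / (background_total + 1)  # add-1 smoothing
        scores[term] = p_new * log(p_new / p_bg)
    return scores

def staying_power(snapshots, term):
    """Longest run of consecutive snapshots (sets of terms) that contain the term."""
    best = run = 0
    for snap in snapshots:
        run = run + 1 if term in snap else 0
        best = max(best, run)
    return best
```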

CHANGE CAN IDENTIFY IMPORTANT SEGMENTS
- Page elements change at different rates
- Pages are revisited at different rates
- Resonance can serve as a filter for important content

EXTRA SLIDES Impact of change on refinding behavior.

BUT CHANGE HELPS WITH FINDING!
Change to the clicked result
- Unsatisfied initially: Gone > Down > Stay > Up
- Satisfied initially: Stay > Down > Up > Gone
Changes around the click
- Always benefit unsatisfied (NSAT) users
- Best below the click for satisfied (SAT) users
[Tables: effect of change direction (Up/Stay/Down/Gone) and of changes above vs. below the click, for NSAT and SAT users]

EXTRA SLIDES Privacy issues and behavioral logs.

PUBLIC SOURCES OF BEHAVIORAL LOGS
Public Web service content
- Twitter, Facebook, Digg, Wikipedia
Research efforts to create logs
- Lemur Community Query Log Project
- 1 year of data collection = 6 seconds of Google logs
Publicly released private logs
- DonorsChoose.org
- Enron corpus, AOL search logs, Netflix ratings

EXAMPLE: AOL SEARCH DATASET
August 4, 2006: Logs released to the academic community
- 3 months, 650 thousand users, 20 million queries
- Logs contain anonymized user IDs
August 7, 2006: AOL pulled the files, but they were already mirrored
August 9, 2006: New York Times identified Thelma Arnold
- "A Face Is Exposed for AOL Searcher No. 4417749"
- Queries for businesses and services in Lilburn, GA (pop. 11k)
- Queries for Jarrett Arnold (and others of the Arnold clan)
- NYT contacted all 14 people in Lilburn with the Arnold surname
- When contacted, Thelma Arnold acknowledged her queries
August 21, 2006: Two AOL employees fired, CTO resigned
September 2006: Class action lawsuit filed against AOL
[Sample log rows with columns AnonID, Query, QueryTime, ItemRank, ClickURL; queries include "jitp submission process", "computational social science", "seattle restaurants", "perlman montreal", "jitp 2006 notification"]

EXAMPLE: AOL SEARCH DATASET
Other well-known AOL users
- User 927: how to kill your wife
- Another user: i love alaska
Anonymous IDs do not make logs anonymous
- Logs contain directly identifiable information: names, phone numbers, credit cards, social security numbers
- Logs contain indirectly identifiable information (example: Thelma's queries); birthdate, gender, and zip code identify 87% of Americans

EXAMPLE: NETFLIX CHALLENGE
October 2, 2006: Netflix announces contest
- Predict people's ratings for a $1 million prize
- 100 million ratings, 480k users, 17k movies
- Very careful with anonymity post-AOL
May 18, 2008: Data de-anonymized
- Paper published by Narayanan & Shmatikov
- Uses background knowledge from IMDB
- Robust to perturbations in the data
December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels second competition
Data format: ratings are listed per movie ("1: [Movie 1 of 17770]") as CustomerID, Rating, Date rows; movie titles are listed as ID, year, title (e.g., 10120, 1982, "Bladerunner"; 17690, 2007, "The Queen")
"All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy... Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."