THE WEB CHANGES EVERYTHING Jaime Teevan, Microsoft

Presentation transcript:

THE WEB CHANGES EVERYTHING Jaime Teevan, Microsoft

The Web Changes Everything: Content Changes [timeline: January through September]

The Web Changes Everything: Content Changes, People Revisit [timelines: January through September]
Today's tools focus on the present, but there is so much more information available!

The Web Changes Everything: Content Changes
- Large scale Web crawl over time
- Revisited pages: 55,000 pages crawled hourly for 18+ months
- Judged pages (relevance to a query): 6 million pages crawled every two days for 6 months

Measuring Web Page Change
- Summary metrics: number of changes, time between changes, amount of change
- Top-level pages change more, and faster, than pages with long URLs
- .edu and .gov pages do not change much or often
- News pages change quickly, but not as drastically as other types of pages

Measuring Web Page Change
- Summary metrics: number of changes, time between changes, amount of change
- Change curves: fix a starting point and measure similarity to it over increasing time intervals [curve annotation: knot point]
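
To make the summary metrics and change curves concrete, here is a minimal sketch of how they could be computed from crawled snapshots of a page. Shingle overlap with the Dice coefficient is one plausible way to score "amount of change"; the function names and snapshot format are illustrative assumptions, not the talk's actual pipeline.

```python
def shingles(text, n=3):
    # Word n-grams; overlap between snapshot shingle sets is one way
    # to quantify the "amount of change" between two crawls of a page.
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def dice(a, b):
    # Dice coefficient between two shingle sets (1.0 = identical).
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def change_curve(snapshots):
    # snapshots: page text at evenly spaced crawl times.
    # Returns similarity of each snapshot to the fixed starting point;
    # plotted against the interval, this is a change curve, and the
    # bend where similarity levels off is the knot point.
    start = shingles(snapshots[0])
    return [dice(start, shingles(s)) for s in snapshots]
```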

Measuring Within-Page Change
- DOM structure changes
- Term use changes
- Divergence from norm (e.g., seasonal terms on a cooking page: cookbooks, frightfully, merrymaking, ingredient, latkes)
- Staying power in page [figure: term presence over time, Sep. through Dec.]
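
A similarly small sketch shows one way "staying power in page" could be operationalized: the fraction of a page's snapshots that contain a term. Terms with low staying power, like the seasonal cooking words above, diverge from the page's norm. The data format here is an assumption for illustration.

```python
from collections import Counter

def staying_power(snapshot_terms):
    # snapshot_terms: one set of terms per crawl snapshot of a page.
    # A term's staying power = fraction of snapshots containing it;
    # ephemeral terms (low values) often mark timely content.
    counts = Counter(t for terms in snapshot_terms for t in set(terms))
    n = len(snapshot_terms)
    return {term: c / n for term, c in counts.items()}
```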

Accounting for Web Dynamics
- Avoid problems caused by change: caching, archiving, crawling
- Use change to our advantage
  - Ranking: match a term's staying power to query intent
  - Snippet generation, e.g. a static vs. a change-aware snippet for the same page:
    Tom Bosley - Wikipedia, the free encyclopedia
    Thomas Edward "Tom" Bosley (October 1, 1927 - October 19, 2010) was an American actor, best known for portraying Howard Cunningham on the long-running ABC sitcom Happy Days. Bosley was born in Chicago, the son of Dora and Benjamin Bosley. en.wikipedia.org/wiki/tom_bosley
    Tom Bosley - Wikipedia, the free encyclopedia
    Bosley died at 4:00 a.m. of heart failure on October 19, 2010, at a hospital near his home in Palm Springs, California. ... His agent, Sheryl Abrams, said Bosley had been battling lung cancer. en.wikipedia.org/wiki/tom_bosley

Revisitation on the Web [timelines: content changes, people revisit]
What's the last Web page you visited?
- Revisitation patterns
- Log analysis: browser logs for revisitation, query logs for re-finding
- User survey for intent

Measuring Revisitation
- Summary metrics: unique visitors, visits/user, time between visits
- Revisitation curves: revisit interval histogram, normalized time interval
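
As a rough illustration, a revisitation curve can be read off a histogram of intervals between successive visits to the same page; log-scale buckets separate the fast, medium, and slow patterns described on the next slide. The log format is hypothetical.

```python
import math
from collections import defaultdict

def revisit_intervals(visit_times):
    # visit_times: {url: sorted visit timestamps in seconds}.
    # Buckets the gaps between successive visits to the same page on a
    # log scale; bucket k covers intervals of 10**k .. 10**(k+1)-1 seconds.
    hist = defaultdict(int)
    for times in visit_times.values():
        for prev, cur in zip(times, times[1:]):
            bucket = int(math.log10(max(cur - prev, 1)))
            hist[bucket] += 1
    return dict(sorted(hist.items()))
```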

Four Revisitation Patterns
- Fast: hub-and-spoke, navigation within a site
- Hybrid: high quality fast pages
- Medium: popular homepages, mail and Web applications
- Slow: entry pages, bank pages, accessed via search engine

Search and Revisitation
- Repeat query (33%): e.g., university of michigan
- Repeat click (39%): e.g., the queries "um" and "ann arbor" leading to the same result
- Lots of repeats (43%): many navigational

              Repeat Click   New Click   Total
Repeat Query      29%            4%       33%
New Query         10%           57%       67%
Total             39%           61%
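
The percentages in this table come from comparing each query and click against the same user's history. A minimal sketch of that tally, assuming a flat (user, query, clicked URL) log format:

```python
def repeat_rates(log):
    # log: time-ordered (user, query, clicked_url) tuples.
    # A repeat query/click is one the same user has issued/clicked before.
    seen_queries, seen_clicks = set(), set()
    repeat_q = repeat_c = total = 0
    for user, query, url in log:
        total += 1
        q_key = (user, query.strip().lower())
        if q_key in seen_queries:
            repeat_q += 1
        if (user, url) in seen_clicks:
            repeat_c += 1
        seen_queries.add(q_key)
        seen_clicks.add((user, url))
    return {"repeat_query": repeat_q / total,
            "repeat_click": repeat_c / total}
```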

How Revisitation and Change Relate [timelines: content changes, people revisit]
Why did you revisit the last Web page you revisited?

Possible Relationships
- Interested in change: monitor
- Effect change: transact
- Change unimportant: find
- Change can interfere: re-find

Understanding the Relationship
- Compare summary metrics
  - Revisits: unique visitors, visits/user, interval
  - Change: number, interval, similarity
[table: number of changes, time between changes, and similarity, broken out by revisit rate (2, 3, 4, 5 or 6, and 7+ visits/user); the numeric values did not survive transcription]

Comparing Change and Revisit Curves
- Three pages: New York Times, Woot.com, Costco
- Similar change patterns
- Different revisitation: NYT fast (news, forums), Woot medium, Costco slow (retail)

Within-Page Relationship
- Page elements change at different rates
- Pages are revisited at different rates
- Resonance between the two can serve as a filter for interesting content

Exposing Change: the Diff-IE toolbar highlights changes to a page since your last visit

Interesting Features
- Always on
- In-situ
- New to you
- Non-intrusive

Studying Diff-IE [timelines: content changes, people revisit]
[study design: participants answered the same survey ("How often do pages change?", "How often do you revisit?") before installing Diff-IE and repeatedly afterwards]

Seeing Change Changes Web Use
- Changes to perception
  - Diff-IE users become more likely to notice change
  - They provide better estimates of how often content changes
- Changes to behavior
  - Diff-IE users start to revisit more
  - Revisited pages are more likely to have changed
  - The changes viewed are bigger changes
- Content gains value when history is exposed
[figure callouts: 14%, 51%, 53%]

Change Can Cause Problems
- Dynamic menus: putting commonly used items at the top slows menu item access
- Search result change: results change regularly, which inhibits re-finding (fewer repeat clicks, slower time to click)

Change During a Single Query
- Results even change as you interact with them
- Many reasons for change: intentional changes to improve ranking, general instability
- Approach: analyze behavior when people return to the result page after clicking

Understanding When Change Hurts
- Metrics: abandonment, satisfaction, click position, time to click
- Mixed impact: results change above: 4.5% increase in abandonment; results change below: 1.9% decrease

Abandonment   Above    Below
Static        36.6%    43.1%
Change        41.4%    42.3%
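
A sketch of how the abandonment split in this table could be tallied from logs. The record fields are assumptions for illustration, not the study's actual schema.

```python
from collections import defaultdict

def abandonment_by_condition(impressions):
    # impressions: dicts like {"results_changed": bool, "clicked_again": bool},
    # one per result page a user returned to after an initial click.
    # Abandonment = fraction of returns with no further click.
    tallies = defaultdict(lambda: [0, 0])  # condition -> [abandoned, total]
    for imp in impressions:
        cond = "change" if imp["results_changed"] else "static"
        tallies[cond][0] += 0 if imp["clicked_again"] else 1
        tallies[cond][1] += 1
    return {cond: abandoned / total
            for cond, (abandoned, total) in tallies.items()}
```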

Use Experience to Bias Presentation

Change Blind Search Experience

The Web Changes Everything [timelines: content changes, people revisit]
- Web content changes provide valuable insight
- People revisit and re-find Web content
- Explicit support for Web dynamics can impact how people use and understand the Web
- Relating revisitation and change enables us to:
  - Identify pages for which change is important
  - Identify interesting components within a page

Thank you.

Web Content Change
- Adar, Teevan, Dumais & Elsas. The Web changes everything: Understanding the dynamics of Web content. WSDM 2009.
- Kulkarni, Teevan, Svore & Dumais. Understanding temporal query dynamics. WSDM 2011.
- Svore, Teevan, Dumais & Kulkarni. Creating temporally dynamic Web search snippets. SIGIR 2012.

Web Page Revisitation
- Teevan, Adar, Jones & Potts. Information re-retrieval: Repeat queries in Yahoo's logs. SIGIR 2007.
- Adar, Teevan & Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.
- Tyler & Teevan. Large scale query log analysis of re-finding. WSDM 2010.
- Teevan, Liebling & Ravichandran. Understanding and predicting personal navigation. WSDM 2011.

Relating Change and Revisitation
- Adar, Teevan & Dumais. Resonance on the Web: Web dynamics and revisitation patterns. CHI 2009.
- Teevan, Dumais, Liebling & Hughes. Changing how people view changes on the Web. UIST 2009.
- Teevan, Dumais & Liebling. A longitudinal study of how highlighting Web content change affects people's Web interactions. CHI 2010.
- Lee, Teevan & de la Chica. Characterizing multi-click behavior and the risks and opportunities of changing results during use. SIGIR 2014.

Extra Slides

Sources of Logs to Study Change
- Temporal snapshots of content
  - A picture of what Web content looks like
  - Billions of pages with billions of changes
  - Difficult to capture personalization and interaction
- Behavioral data
  - A picture of how people interact with that content
  - Need to relate behavior to the actual content seen
  - Issues with privacy and sharing
  - Adversarial system use

Ways to Study Impact of Change
- Experimental: intentionally introduce change; may involve degrading the experience
- Naturalistic: look for naturally occurring change; the source of change can itself impact behavior
- Logs only show actions: the intention behind an action is not captured, so log data needs to be complemented with other methods

Example: AOL Search Dataset
- August 4, 2006: Logs released to the academic community
  - 3 months, 650 thousand users, 20 million queries
  - Logs contain anonymized user IDs
- August 7, 2006: AOL pulled the files, but they were already mirrored
- August 9, 2006: New York Times identified Thelma Arnold
  - "A Face Is Exposed for AOL Searcher No. 4417749"
  - Queries for businesses and services in Lilburn, GA (pop. 11k)
  - Queries for Jarrett Arnold (and others of the Arnold clan)
  - NYT contacted all 14 people in Lilburn with the Arnold surname
  - When contacted, Thelma Arnold acknowledged her queries
- August 21, 2006: 2 AOL employees fired, CTO resigned
- September 2006: Class action lawsuit filed against AOL
[sample log rows (AnonID, Query, QueryTime, ItemRank, ClickURL): queries include "jitp", "jipt submission process", "computational social scinece", "computational social science", "seattle restaurants", "perlman montreal", and "jitp 2006 notification"; the IDs, timestamps, ranks, and URLs did not survive transcription]

Example: AOL Search Dataset
- Other well known AOL users
  - User 927: how to kill your wife
  - User 711391: i love alaska
- Anonymized IDs do not make logs anonymous
  - They contain directly identifiable information: names, phone numbers, credit cards, social security numbers
  - They contain indirectly identifiable information, e.g. Thelma's queries; birthdate, gender, and zip code identify 87% of Americans
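
The 87% statistic comes from Latanya Sweeney's work on quasi-identifiers: very few people share the same birthdate, gender, and zip code. The check itself is simple to sketch; the record field names are assumptions.

```python
from collections import Counter

def unique_fraction(people, keys=("birthdate", "gender", "zip")):
    # Fraction of records whose quasi-identifier tuple is unique in the
    # dataset -- the computation behind "87% of Americans".
    combos = Counter(tuple(p[k] for k in keys) for p in people)
    return sum(1 for p in people
               if combos[tuple(p[k] for k in keys)] == 1) / len(people)
```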

Example: Netflix Challenge
- October 2, 2006: Netflix announces contest
  - Predict people's ratings for a $1 million prize
  - 100 million ratings, 480k users, 17k movies
  - Very careful with anonymity post-AOL
- May 18, 2008: Data de-anonymized
  - Paper published by Narayanan & Shmatikov
  - Uses background knowledge from IMDB
  - Robust to perturbations in the data
- December 17, 2009: Doe v. Netflix
- March 12, 2010: Netflix cancels second competition

Ratings file format:
1: [Movie 1 of 17770]
12, 3, [CustomerID, Rating, Date]
1234, 5, [CustomerID, Rating, Date]
2468, 1, [CustomerID, Rating, Date]
Movie titles:
10120, 1982, "Bladerunner"
17690, 2007, "The Queen"

Netflix's claim: "All customer identifying information has been removed; all that remains are ratings and dates. ... Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."
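
A heavily simplified sketch of the Narayanan-Shmatikov matching idea: score every released record against the auxiliary ratings known about a target, and look for a standout match. The real algorithm weights rare movies more heavily and models noisy ratings and dates; the data shapes below are assumptions.

```python
def best_matches(aux, records, days_tol=3, top=5):
    # aux: {movie_id: (rating, date)} known about a target, e.g. from IMDB.
    # records: {anon_user_id: {movie_id: (rating, date)}} from the release.
    # Dates are datetime.date objects; a record scores a hit when it has
    # the same rating for a movie at roughly the same time.
    def score(rec):
        hits = 0
        for movie, (rating, date) in aux.items():
            if movie in rec:
                r, d = rec[movie]
                if r == rating and abs((d - date).days) <= days_tol:
                    hits += 1
        return hits
    ranked = sorted(records.items(), key=lambda kv: score(kv[1]), reverse=True)
    return [(uid, score(rec)) for uid, rec in ranked[:top]]
```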

Examples of Diff-IE in Action

Expected New Content

Monitor

Unexpected Important Content

Serendipitous Encounters

Unexpected Unimportant Content

Understand Page Dynamics

Attend to Activity

Edit

[map slide: the preceding examples arranged along an expected-to-unexpected axis: Expected New Content, Monitor, Serendipitous Encounters, Unexpected Important Content, Understand Page Dynamics, Attend to Activity, Edit, Unexpected Unimportant Content]

Monitor

Find Expected New Content

Example: Click Entropy
- Question: How ambiguous is a query?
- Approach: look at variation in clicks
- Click entropy: low if there is no variation in what people click (human computer interaction); high if there is lots of variation (hci)
[figure: results for "hci" span a government contractor, a recruiting site, and the academic field]
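
Click entropy has a direct formula: H = -sum over clicked URLs of p(url) * log2 p(url). A small sketch:

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    # clicked_urls: URLs clicked for one query, across all users.
    # 0.0 when everyone clicks the same result; grows with variation.
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

For example, nine clicks on one site and one click elsewhere gives roughly 0.47 bits, while ten clicks spread over ten different sites gives log2(10), about 3.3 bits.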

Find the Lower Click Variation
- [query elided] v. federal government jobs
- find phone number v. msn live search
- singapore pools v. singaporepools.com
Click entropy: 1.5 v. 2.0; result entropy: 5.7 v. 10.7
Lesson: results change (result churn inflates click entropy)

Find the Lower Click Variation
- [query elided] v. federal government jobs
- find phone number v. msn live search
- singapore pools v. singaporepools.com
- tiffany v. tiffany's
- nytimes v. connecticut newspapers
Click entropy: 2.5 v. 1.0; click position: 2.6 v. 1.6
Lesson: result quality varies

Find the Lower Click Variation
- [query elided] v. federal government jobs
- find phone number v. msn live search
- singapore pools v. singaporepools.com
- tiffany v. tiffany's
- nytimes v. connecticut newspapers
- campbells soup recipes v. vegetable soup recipe
- soccer rules v. hockey equipment
Click entropy: 1.7 v. 2.2; clicks/user: 1.1 v. 2.1
Lessons: task affects the number of clicks; results change; result quality varies