THE WEB CHANGES EVERYTHING Jaime Teevan, Microsoft Research

The Web Changes Everything
Content Changes [timeline: January through September]

The Web Changes Everything
Content Changes [timeline: January through September]
People Revisit [timeline: January through September]
Today's tools focus on the present. But there's so much more information available!

The Web Changes Everything
Content Changes [timeline: January through September]
Large-scale Web crawl over time:
- Revisited pages: 55,000 pages crawled hourly for 18+ months
- Judged pages (relevance to a query): 6 million pages crawled every two days for 6 months

Measuring Web Page Change
- Summary metrics
  - Number of changes
  - Time between changes
  - Amount of change
Top-level pages change more, and faster, than pages with long URLs. .edu and .gov pages do not change much or often. News pages change quickly, but not as drastically as other types of pages.

Measuring Web Page Change
- Summary metrics
  - Number of changes
  - Time between changes
  - Amount of change
- Change curves
  - Fixed starting point
  - Measure similarity over different time intervals
  - [figure: change curve with its "knot point" marked]
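A minimal sketch of these summary metrics and change curves, assuming each crawl snapshot is a (timestamp, token set) pair; the Dice coefficient stands in here for the similarity measure used in the underlying studies, and all names are illustrative.

def dice(a, b):
    """Dice coefficient between two token sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def change_metrics(snapshots):
    """snapshots: time-sorted list of (datetime, token_set) for one page."""
    change_times, sims = [], []
    for (t0, s0), (t1, s1) in zip(snapshots, snapshots[1:]):
        sim = dice(s0, s1)
        if sim < 1.0:  # the page differed from the previous crawl
            change_times.append(t1)
            sims.append(sim)
    gaps = [(b - a).total_seconds() / 3600  # hours between observed changes
            for a, b in zip(change_times, change_times[1:])]
    return {
        "number_of_changes": len(change_times),
        "mean_hours_between_changes": sum(gaps) / len(gaps) if gaps else None,
        "mean_amount_of_change": (1 - sum(sims) / len(sims)) if sims else 0.0,
    }

def change_curve(snapshots):
    """Similarity to a fixed starting point at increasing time intervals;
    the knot point is where this curve levels off."""
    t0, s0 = snapshots[0]
    return [((t - t0).total_seconds() / 3600, dice(s0, s))
            for t, s in snapshots[1:]]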

Measuring Within-Page Change
- DOM structure changes
- Term use changes
  - Divergence from norm (example terms: cookbooks, frightfully, merrymaking, ingredient, latkes)
  - Staying power in page [figure: term presence over time, Sep. through Dec.]
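A sketch of these term-level measures, assuming `snapshots` is a time-ordered list of token sets for one page and `background` maps terms to corpus probabilities; both scoring rules are illustrative stand-ins, not the talk's exact formulas.

import math

def staying_power(term, snapshots):
    """Longest consecutive run of snapshots containing the term, as a
    fraction of all snapshots (e.g., 'ingredient' persists on a cookbook
    page; 'latkes' appears only seasonally)."""
    best = run = 0
    for tokens in snapshots:
        run = run + 1 if term in tokens else 0
        best = max(best, run)
    return best / len(snapshots)

def divergence_from_norm(term, snapshots, background, floor=1e-9):
    """Log-odds of the term on this page versus a background corpus;
    high values flag terms unusually prominent here."""
    p_page = max(sum(term in t for t in snapshots) / len(snapshots), floor)
    p_bg = max(background.get(term, 0.0), floor)
    return math.log(p_page / p_bg)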

Accounting for Web Dynamics
- Avoid problems caused by change: caching, archiving, crawling
- Use change to our advantage
  - Ranking: match a term's staying power to query intent
  - Snippet generation. Compare two snippets for the same page:

Tom Bosley - Wikipedia, the free encyclopedia
Thomas Edward "Tom" Bosley (October 1, 1927 – October 19, 2010) was an American actor, best known for portraying Howard Cunningham on the long-running ABC sitcom Happy Days. Bosley was born in Chicago, the son of Dora and Benjamin Bosley.
en.wikipedia.org/wiki/tom_bosley

Tom Bosley - Wikipedia, the free encyclopedia
Bosley died at 4:00 a.m. of heart failure on October 19, 2010, at a hospital near his home in Palm Springs, California. … His agent, Sheryl Abrams, said Bosley had been battling lung cancer.
en.wikipedia.org/wiki/tom_bosley
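A sketch of change-aware snippet generation in the spirit of the example above: prefer sentences that are new since the previous crawl, so a fresh death announcement can beat the stale lead paragraph. The sentence splitting and ranking here are deliberately simplistic, and all names are illustrative.

def changed_snippet(current_sentences, previous_sentences, query_terms, k=2):
    """Pick up to k sentences that are new since the last crawl, ranked
    by query-term overlap; fall back to the page opening if nothing new."""
    previous = set(previous_sentences)
    new = [s for s in current_sentences if s not in previous]
    ranked = sorted(new, key=lambda s: -sum(t in s.lower() for t in query_terms))
    return ranked[:k] or current_sentences[:k]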

Revisitation on the Web
Content Changes / People Revisit [timeline: January through September]
What's the last Web page you visited?
- Revisitation patterns
- Log analysis
  - Browser logs for revisitation
  - Query logs for re-finding
- User survey for intent

Measuring Revisitation
- Summary metrics
  - Unique visitors
  - Visits/user
  - Time between visits
- Revisitation curves (see the sketch below)
  - Revisit interval histogram
  - Normalized time interval
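A sketch of these revisitation metrics, assuming `events` is the list of (user_id, datetime) visits to one page extracted from a browser log; the binning scheme is illustrative.

from collections import defaultdict

def revisit_metrics(events):
    """events: (user_id, datetime) visits to one page."""
    by_user = defaultdict(list)
    for user, t in events:
        by_user[user].append(t)
    intervals = []
    for times in by_user.values():
        times.sort()
        intervals += [(b - a).total_seconds() / 3600
                      for a, b in zip(times, times[1:])]
    return {
        "unique_visitors": len(by_user),
        "visits_per_user": len(events) / len(by_user),
        "revisit_intervals_hours": intervals,
    }

def revisit_curve(intervals, bins):
    """Normalized revisit-interval histogram: the fraction of revisits in
    each time bin, so curves are comparable across pages."""
    counts = [sum(lo <= x < hi for x in intervals) for lo, hi in bins]
    total = sum(counts) or 1
    return [c / total for c in counts]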

Four Revisitation Patterns
- Fast
  - Hub-and-spoke
  - Navigation within site
- Hybrid
  - High-quality fast pages
- Medium
  - Popular homepages
  - Mail and Web applications
- Slow
  - Entry pages, bank pages
  - Accessed via search engine

Search and Revisitation
- Repeat query (33%): e.g., web science conference
- Repeat click (39%): e.g., for the query websci 11
- Lots of repeats (43%); many navigational

               Repeat Click   New Click   Total
Repeat Query        29%           4%       33%
New Query           10%          57%       67%
Total               39%          61%
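A sketch of how such a breakdown can be computed, assuming a time-ordered query log of (user, query, clicked_url) rows; names are illustrative.

def repeat_breakdown(log):
    """Tally each row as a repeat/new query crossed with a repeat/new
    click, relative to that user's earlier history in the log."""
    seen_queries, seen_clicks = set(), set()
    counts = {"repeat_q_repeat_c": 0, "repeat_q_new_c": 0,
              "new_q_repeat_c": 0, "new_q_new_c": 0}
    for user, query, url in log:
        q = "repeat_q" if (user, query) in seen_queries else "new_q"
        c = "repeat_c" if (user, url) in seen_clicks else "new_c"
        counts[q + "_" + c] += 1
        seen_queries.add((user, query))
        seen_clicks.add((user, url))
    return counts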


How Revisitation and Change Relate
Content Changes / People Revisit [timeline: January through September]
Why did you revisit the last Web page you did?

Possible Relationships
- Interested in change: Monitor
- Effect change: Transact
- Change unimportant: Find
- Change can interfere: Re-find

Understanding the Relationship
- Compare summary metrics
  - Revisits: unique visitors, visits/user, interval
  - Change: number, interval, similarity
[table: number of changes, time between changes, and similarity for pages with 2, 3, 4, 5 or 6, and 7+ visits/user; numeric values lost in transcription]
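One way to quantify the comparison in the table above is to correlate per-page revisit rates with per-page change metrics; a plain Pearson correlation is used here as an illustrative stand-in for the grouped analysis on the slide.

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# e.g., pearson(visits_per_user_by_page, changes_per_page)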

Comparing Change and Revisit Curves
- Three pages with similar change patterns but different revisitation:
  - New York Times: fast (news, forums)
  - Woot.com: medium
  - Costco: slow (retail)
[figure: change and revisit curves over time for the three pages]

Within-Page Relationship
- Page elements change at different rates
- Pages are revisited at different rates
- Resonance between the two can serve as a filter for interesting content (sketched below)
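A sketch of resonance as a content filter, assuming per-element change timestamps and page-level revisit timestamps are available; the score (ratio of median intervals) is an illustrative stand-in for the measure in the CHI 2009 paper cited at the end.

from statistics import median

def resonance(element_change_times, revisit_times):
    """1.0 when an element changes at the same pace the page is
    revisited; near 0.0 when the two rates are badly mismatched."""
    def median_gap(ts):
        ts = sorted(ts)
        gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        return median(gaps) if gaps else None
    ce, rv = median_gap(element_change_times), median_gap(revisit_times)
    if ce is None or rv is None:
        return 0.0
    return min(ce, rv) / max(ce, rv)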

Building Support for Web Dynamics
Content Changes / People Revisit [timeline: January through September]

Exposing Change with Diff-IE
[screenshot: the Diff-IE toolbar, highlighting changes to the page since your last visit]

Interesting Features of Diff-IE
- Always on
- In situ
- New to you
- Non-intrusive

Examples of Diff-IE in Action

Expected New Content

Monitor

Unexpected Important Content

Serendipitous Encounters

Unexpected Unimportant Content

Understand Page Dynamics

Attend to Activity

Edit

[diagram: the preceding examples arranged along an Expected–Unexpected spectrum: Monitor, Expected New Content, Unexpected Important Content, Serendipitous Encounter, Understand Page Dynamics, Attend to Activity, Edit, Unexpected Unimportant Content]

Monitor

Find Expected New Content

Studying Diff-IE
Study design: participants repeatedly answered a survey (How often do pages change? How often do you revisit?) both before and after installing Diff-IE.
[timeline: surveys, Diff-IE installation, then follow-up surveys]

Seeing Change Changes Web Use
- Changes to perception
  - Diff-IE users become more likely to notice change
  - They provide better estimates of how often content changes
- Changes to behavior
  - Diff-IE users start to revisit more
  - Revisited pages are more likely to have changed
  - The changes viewed are bigger changes
- Content gains value when history is exposed
[slide callouts: 14%, 51%, 53%]

The Web Changes Everything
- Web content changes provide valuable insight
- People revisit and re-find Web content
- Explicit support for Web dynamics can impact how people use and understand the Web
- Relating revisitation and change enables us to:
  - Identify pages for which change is important
  - Identify interesting components within a page

Thank you.

Web Content Change
- Adar, Teevan, Dumais & Elsas. The Web changes everything: Understanding the dynamics of Web content. WSDM 2009.
- Elsas & Dumais. Leveraging temporal dynamics of document content in relevance ranking. WSDM 2010.
- Kulkarni, Teevan, Svore & Dumais. Understanding temporal query dynamics. WSDM 2011.

Web Page Revisitation
- Teevan, Adar, Jones & Potts. Information re-retrieval: Repeat queries in Yahoo's logs. SIGIR 2007.
- Adar, Teevan & Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.
- Tyler & Teevan. Large scale query log analysis of re-finding. WSDM 2010.
- Teevan, Liebling & Ravichandran. Understanding and predicting personal navigation. WSDM 2011.

Relating Change and Revisitation
- Adar, Teevan & Dumais. Resonance on the Web: Web dynamics and revisitation patterns. CHI 2009.

Studying Diff-IE
- Teevan, Dumais, Liebling & Hughes. Changing how people view changes on the Web. UIST 2009.
- Teevan, Dumais & Liebling. A longitudinal study of how highlighting Web content change affects people's web interactions. CHI 2010.

Extra Slides

Example: AOL Search Dataset
- August 4, 2006: Logs released to the academic community
  - 3 months, 650 thousand users, 20 million queries
  - Logs contain anonymized user IDs
- August 7, 2006: AOL pulled the files, but they were already mirrored
- August 9, 2006: New York Times identified Thelma Arnold
  - "A Face Is Exposed for AOL Searcher No. 4417749"
  - Queries for businesses and services in Lilburn, GA (pop. 11k)
  - Queries for Jarrett Arnold (and others of the Arnold clan)
  - NYT contacted all 14 people in Lilburn with the Arnold surname
  - When contacted, Thelma Arnold acknowledged her queries
- August 21, 2006: Two AOL employees fired, CTO resigned
- September 2006: Class action lawsuit filed against AOL

[sample log rows; user IDs, dates, and full URLs partially lost in transcription]
AnonID | Query                         | QueryTime | ItemRank | ClickURL
…      | jitp                          | …         | 1        | http://…
…      | jitp submission process       | …         | 3        | http://…
…      | computational social scinece  | …         |          |
…      | computational social science  | …         | 2        | http://socialcomplexity.gmu.edu/phd.php
…      | seattle restaurants           | …         | 2        | http://seattletimes.nwsource.com/rests
…      | perlman montreal              | …         | 4        | http://oldwww.acm.org/perlman/guide.html
…      | jitp 2006 notification        | …         |          |

Example: AOL Search Dataset
- Other well-known AOL users
  - User 927: how to kill your wife
  - User 711391: i love alaska
- Anonymized IDs do not make logs anonymous
  - They contain directly identifying information: names, phone numbers, credit cards, social security numbers
  - They contain indirectly identifying information, e.g., Thelma's queries: birthdate, gender, and zip code identify 87% of Americans
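A sketch of why quasi-identifiers defeat ID-based anonymization: group records by (birthdate, gender, zip) and count the singletons, which are uniquely re-identifiable. The field names are illustrative; Sweeney's estimate for Americans on these three fields is roughly 87%.

from collections import Counter

def unique_fraction(records):
    """records: iterable of (birthdate, gender, zip_code) tuples.
    Returns the share of records unique on these three fields."""
    groups = Counter(records)
    total = sum(groups.values())
    return sum(1 for size in groups.values() if size == 1) / total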

Example: Netflix Challenge
- October 2, 2006: Netflix announces contest
  - Predict people's ratings for a $1 million prize
  - 100 million ratings, 480k users, 17k movies
  - Very careful with anonymity post-AOL
- May 18, 2008: Data de-anonymized
  - Paper published by Narayanan & Shmatikov
  - Uses background knowledge from IMDb
  - Robust to perturbations in the data
- December 17, 2009: Doe v. Netflix
- March 12, 2010: Netflix cancels second competition

[sample data format; dates lost in transcription]
Ratings:
1: [Movie 1 of 17770]
12, 3, … [CustomerID, Rating, Date]
1234, 5, … [CustomerID, Rating, Date]
2468, 1, … [CustomerID, Rating, Date]
Movie titles:
10120, 1982, "Bladerunner"
17690, 2007, "The Queen"

Netflix's statement: "All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy... Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."
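A sketch of the matching idea behind the de-anonymization, assuming the attacker knows a few of a target's ratings with approximate dates (e.g., from IMDb); the similarity score and the best-vs-runner-up acceptance test are simplified stand-ins for the paper's weighted scoring and eccentricity measure.

def match(aux, records, date_tol_days=14, rating_tol=1):
    """aux: {movie_id: (rating, date)} known about the target.
    records: {candidate_id: {movie_id: (rating, date)}} from the dataset.
    Returns a candidate_id only if it clearly beats the runner-up."""
    def similarity(record):
        return sum(1.0 for movie, (r, d) in aux.items()
                   if movie in record
                   and abs(record[movie][0] - r) <= rating_tol
                   and abs((record[movie][1] - d).days) <= date_tol_days)
    scored = sorted(((similarity(rec), cid) for cid, rec in records.items()),
                    reverse=True)
    (best, best_id), (second, _) = scored[0], scored[1]
    # Accept only a clear standout match (illustrative threshold).
    return best_id if best - second >= 1.5 else None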