Information Analysis in a Dynamic and Data-Rich World. Susan Dumais, Microsoft Research. NSA, Nov 4, 2010.

Change is Everywhere in Information Systems
- New documents appear all the time
- Query volume changes over time
- Document content changes over time
  - So does metadata and extracted info
- User interaction changes over time
  - E.g., anchor text, query-click streams, page visits
- What's relevant to a query changes over time
  - E.g., U.S. Open 2010 (in May vs. Sept)
  - E.g., Hurricane Earl (in Sept 2010 vs. before/after)
Change is pervasive in IR systems ... yet, we're not doing much about it!

Information Dynamics
- Today's browse and search experiences ignore both content changes and user visitation/revisitation

Digital Dynamics
- Easy to capture
- But ... few tools support dynamics

Overview
- Characterize change in digital content
  - Content changes over time
  - People re-visit and re-find over time
- Improve retrieval and understanding
  - Data -> Models/Systems -> Decision Analysis
- Examples
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
  - Web: tools for understanding change (e.g., Diff-IE); IR models that leverage dynamics
- Examples are drawn from our work on search and browser support ... but the themes are more general

Stuff I've Seen (SIS) [Dumais et al., SIGIR 2003]
- Many silos of information
- SIS: unified access to distributed, heterogeneous content (mail, files, web, tablet notes, RSS, etc.)
  - Index full content + metadata
  - Fast, flexible search
  - Information re-use
- SIS -> Windows Desktop Search

Example Searches
- Looking for: a recent email from Fedor that contained a link to his new demo. Initiated from: Start menu. Query: from:Fedor
- Looking for: the PDF of a SIGIR paper on context and ranking (not sure it used those words) that someone (don't remember who) sent me a month ago. Initiated from: Outlook. Query: SIGIR
- Looking for: the meeting invite for the last intern handoff. Initiated from: Start menu. Query: intern handoff kind:appointment
- Looking for: a C# program I wrote a long time ago. Initiated from: Explorer pane. Query: QCluster*.*
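A rough sketch of the kind of unified, metadata-aware search these queries exercise; this is an invented illustration (the Item fields, search function, and sample data are not from SIS):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Item:
    """One entry in a unified index: mail, file, web page, appointment, etc."""
    kind: str        # e.g., "email", "appointment", "file"
    sender: str      # empty for items without a sender
    date: datetime
    text: str        # full content, indexed alongside the metadata

def search(index, query):
    """Split a query into metadata filters (from:, kind:) and keywords,
    then return matching items sorted by date, newest first."""
    filters, keywords = {}, []
    for token in query.split():
        if ":" in token:
            key, value = token.split(":", 1)
            filters[key] = value.lower()
        else:
            keywords.append(token.lower())

    def matches(item):
        if "from" in filters and filters["from"] not in item.sender.lower():
            return False
        if "kind" in filters and filters["kind"] != item.kind:
            return False
        return all(kw in item.text.lower() for kw in keywords)

    return sorted(filter(matches, index), key=lambda i: i.date, reverse=True)

# The "from:Fedor" example from the slide, over invented sample items
index = [
    Item("email", "Fedor", datetime(2010, 10, 30), "link to my new demo"),
    Item("appointment", "", datetime(2010, 9, 1), "intern handoff meeting"),
]
print(search(index, "from:Fedor demo")[0].text)   # -> link to my new demo
```

Returning results date-sorted rather than by best-match rank anticipates the lessons-learned finding on the next slide.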

Stuff I've Seen: Lessons Learned
- Personal stores: 5–1500k items
- Age of retrieved items: today (5%), last week (21%), last month (47%)
  - Need to support episodic access to memory
- Information needs:
  - People are important: 25% of queries involve names/aliases
  - Date is by far the most common sort order, even for people who had best-match rank as the default
  - Few searches for the "best" matching object; many other criteria (e.g., time, people, type), depending on task
  - Need to support flexible access
- Abstractions important: "useful" date, people, pictures
- Desktop search != Web search

Beyond Stuff I've Seen
- Better support for human memory
  - Memory Landmarks
  - LifeBrowser
  - Phlat
- Beyond search
  - Proactive retrieval: Stuff I Should See (IQ), Temporal Gadget
- Using desktop index as a rich "user model"
  - PSearch
  - NewsJunkie
  - Diff-IE

Memory Landmarks
- Importance of episodes in human memory
  - Memory organized into episodes (Tulving, 1983)
  - People-specific events as anchors (Smith et al., 1978)
  - Time of events often recalled relative to other events, historical or autobiographical (Huttenlocher & Prohaska, 1997)
- Identify and use landmarks to facilitate search and information management
  - Timeline interface, augmented with landmarks
  - Learn Bayesian models to identify memorable events (see the sketch below)
  - Extensions beyond search, e.g., LifeBrowser
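A minimal sketch of the "learn Bayesian models" bullet, assuming a hand-rolled naive Bayes over invented binary event features; the published Memory Landmarks models use richer features and learned structure:

```python
import math

# Hypothetical binary features of a calendar event; the feature names and
# training data below are invented for illustration.
FEATURES = ["rare_location", "many_attendees", "recurring", "has_attachment"]

def train(events, labels):
    """Fit per-class Bernoulli feature probabilities with Laplace smoothing."""
    model = {}
    for c in (0, 1):                      # 0 = ordinary event, 1 = landmark
        rows = [e for e, y in zip(events, labels) if y == c]
        model[c] = {
            "prior": (len(rows) + 1) / (len(events) + 2),
            "p": {f: (sum(e[f] for e in rows) + 1) / (len(rows) + 2)
                  for f in FEATURES},
        }
    return model

def p_memorable(model, event):
    """Posterior P(landmark | features) via Bayes' rule in log space."""
    logp = {}
    for c in (0, 1):
        logp[c] = math.log(model[c]["prior"])
        for f in FEATURES:
            p = model[c]["p"][f]
            logp[c] += math.log(p if event[f] else 1 - p)
    return 1 / (1 + math.exp(logp[0] - logp[1]))

events = [dict(rare_location=1, many_attendees=1, recurring=0, has_attachment=0),
          dict(rare_location=0, many_attendees=0, recurring=1, has_attachment=0)]
model = train(events, labels=[1, 0])
print(p_memorable(model, events[0]))   # closer to 1: looks like a landmark
```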

Memory Landmarks (screenshot)
- Search results, with the distribution of results over time
- Memory landmarks: general (world, calendar) and personal (appointments, photos), linked to results by time
[Ringel et al., 2003]

Memory Landmarks: learned models of memorability

LifeBrowser [Horvitz & Koch, 2010]
- Images & videos, appointments & events, desktop & search activity, whiteboard capture, locations

LifeBrowser: learned models of selective memory

NewsJunkie [Gabrilovich et al., WWW 2004]
- Personalized news using information novelty
- Stream of news -> cluster groups of related articles
- Characterize what a user knows
- Compute the novelty of new articles relative to this background (relevant & novel)
  - Novelty = KL-divergence(article || current_knowledge), as sketched below
- Novelty score and user preferences guide what to show, when, and how
- (Figure: evolution of context over time)
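A minimal sketch of the novelty score defined above, assuming Laplace-smoothed unigram language models for the article and for the user's background; the sample texts are invented and NewsJunkie's actual models are richer:

```python
import math
from collections import Counter

def unigram(text, vocab):
    """Laplace-smoothed unigram language model over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def novelty(article, background):
    """KL(article || current_knowledge): large when the article uses words
    that the user's background reading does not explain."""
    vocab = set(article.lower().split()) | set(background.lower().split())
    p = unigram(article, vocab)       # article model
    q = unigram(background, vocab)    # what the user already knows
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

seen = "sars outbreak spreads in asia hospitals quarantine patients"
recap = "sars outbreak spreads hospitals quarantine"
offshoot = "swiss company develops sars vaccine candidate"
print(novelty(recap, seen) < novelty(offshoot, seen))   # True: the offshoot is more novel
```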

NewsJunkie: Types of Novelty, via Intra-Article Novelty Dynamics
- (Figure: novelty score vs. word position for four example articles)
- On-topic, recap
- On-topic, elaboration: SARS patient's wife held under quarantine
- Offshoot: SARS impact on Asian stock markets
- Offshoot: Swiss company develops SARS vaccine

Characterizing Web Change [Adar et al., WSDM 2009]
- Large-scale Web crawls, over time
- Revisited pages: 55,000 pages crawled hourly for 18+ months
  - Unique users, visits/user, time between visits
- Pages returned by a search engine (for ~100k queries): 6 million pages crawled every two days for 6 months

Measuring Web Page Change
- Summary metrics: number of changes, amount of change, time between changes
- Change curves: fixed starting point; measure similarity over different time intervals
- Within-page changes

Measuring Web Page Change: Summary Metrics
- 33% of Web pages change; 66% of visited Web pages change
  - 63% of these change every hour
- Avg. Dice coefficient = 0.80; avg. time between changes = 123 hrs
- .edu and .gov pages change infrequently, and not by much
- Popular pages change more frequently, but not by much
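A small sketch of the change measure behind these numbers, assuming word-shingle comparison between two crawled snapshots; shingle size and tokenization are arbitrary choices here, not the study's exact settings:

```python
def shingles(text, k=4):
    """The set of k-word shingles in one crawled snapshot of a page."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def dice(old_snapshot, new_snapshot, k=4):
    """Dice coefficient between two snapshots:
    1.0 = identical shingle sets, 0.0 = no overlap."""
    a, b = shingles(old_snapshot, k), shingles(new_snapshot, k)
    return 2 * len(a & b) / (len(a) + len(b))

v1 = "weather forecast for seattle rain expected through friday"
v2 = "weather forecast for seattle sun expected through sunday"
print(round(dice(v1, v2), 2))   # 0.2: the page kept some content, changed the rest
```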

Measuring Web Page Change: Change Curves
- Fixed starting point; measure similarity over different time intervals
- (Figure: similarity vs. time from the starting point, with a "knot point")

Measuring Within-Page Change
- DOM-level changes
- Term-level changes
  - Divergence from the norm, e.g.: cookbooks, salads, cheese, ingredient, bbq, ...
  - "Staying power" in page (over time: Sep., Oct., Nov., Dec.)

Example Term Longevity Graphs

Revisitation on the Web [Adar et al., CHI 2009]
- What was the last Web page you visited? Why did you visit (re-visit) the page?
- Revisitation patterns
  - Log analyses: toolbar logs for revisitation, query logs for re-finding
  - User survey to understand intent in revisitations

Measuring Revisitation
- 60-80% of the Web pages you visit, you've visited before
- Many motivations for revisits
- Summary metrics: unique visitors, visits/user, time between visits
- Revisitation curves: normalized histogram of revisit intervals (see the sketch below)
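A sketch of the revisitation-curve computation described above, assuming per-URL visit timestamps from a toolbar log; the bin edges are illustrative:

```python
from collections import Counter

def revisit_curve(visit_times, bin_edges):
    """Normalized histogram of intervals between successive visits to one URL.
    The curve's shape (mass in fast vs. slow bins) characterizes the page."""
    intervals = [b - a for a, b in zip(visit_times, visit_times[1:])]
    hist = Counter()
    for dt in intervals:
        for edge in bin_edges:            # first bin whose upper edge fits
            if dt <= edge:
                hist[edge] += 1
                break
    total = sum(hist.values()) or 1
    return [(edge, hist[edge] / total) for edge in bin_edges]

# Visit timestamps in hours; bins: within an hour, a day, a week, longer
visits = [0, 0.2, 0.5, 24, 25, 200]
print(revisit_curve(visits, bin_edges=[1, 24, 168, float("inf")]))
# -> [(1, 0.6), (24, 0.2), (168, 0.0), (inf, 0.2)]: a mostly "fast" page
```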

Four Revisitation Patterns
- Fast: hub-and-spoke; navigation within a site
- Hybrid: high-quality fast pages
- Medium: popular homepages; mail and Web applications
- Slow: entry pages, bank pages; accessed via search engine

Revisitation and Search (Re-Finding)
- Repeat query (33%), e.g., Q: microsoft research
- Repeat click (39%), e.g., Q: microsoft research, msr, ...
- Big opportunity (43%): 24% are "navigational revisits"
[Teevan et al., SIGIR 2007] [Tyler et al., WSDM 2010]

Building Support for Web Dynamics
- Diff-IE and temporal IR, addressing both content changes and user visitation/revisitation

Diff-IE [Teevan et al., UIST 2009] [Teevan et al., CHI 2010]
- The Diff-IE toolbar highlights changes to a page since your last visit

Interesting Features of Diff-IE
- Always on
- In situ
- New to you
- Non-intrusive

Examples of Diff-IE in Action

Expected New Content

Monitor

Unexpected Important Content

Serendipitous Encounters

Understand Page Dynamics

Diff-IE Use Cases
- (Diagram: use cases arranged from expected to unexpected: expected new content, monitor, attend to activity, edit, understand page dynamics, serendipitous encounters, unexpected important content, unexpected unimportant content)

Studying Diff-IE
- Feedback buttons: in situ
- Survey: representative; prior to installation and after a month of use
- Logging: longitudinal; URLs visited, amount of change when revisited
- Experience interview

People Revisit More
- Perception of revisitation remains constant
  - How often do you revisit? How often are revisits to view new content?
- Actual revisitation increases
  - First week: 39.4% of visits are revisits; last week: 45.0% (a 14% increase)
- Why are people revisiting more with Diff-IE?

Revisited Pages Change More
- Perception of change increases
  - What proportion of pages change regularly? How often do you notice unexpected change?
  - (Chart annotations: 51+%, 17%, 8%)
- Amount of change seen increases
  - First week: 21.5% revisits, changed by 6.2%; last week: 32.4% revisits, changed by 9.5%
- Diff-IE is driving visits to changed pages, and it supports people in understanding change

Other Examples of Dynamics and User Experience
- Content changes
  - Diff-IE; Zoetrope (Adar et al., 2008); Diffamation (Chevalier et al., 2010); temporal summaries and snippets, ...
- Interaction changes
  - Explicit annotations, ratings, wikis, etc.
  - Implicit interest via interaction patterns, e.g., edit wear and read wear (Hill et al., 1992)

Leveraging Dynamics for Retrieval: temporal IR over content changes and user visitation/revisitation

Temporal Retrieval Models [Elsas et al., WSDM 2010]
- Current IR algorithms look at only a single snapshot of a page; but Web pages change over time
- Can we leverage this to improve retrieval?
  - Pages have different rates of change: use different document priors (based on change vs. link structure)
  - Terms have different longevity (staying power): some are always on the page, some are transient
- Language modeling approach to ranking, combining a change prior with term longevity (see the sketch below)
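A compact sketch of the ranking idea on this slide, assuming Dirichlet-smoothed query likelihood plus a log document prior that grows with the page's change rate; the functional form, smoothing constant, and sample data are placeholders, not the Elsas & Dumais model:

```python
import math
from collections import Counter

MU = 2000   # Dirichlet smoothing parameter; a common default, not the paper's

def score(query, doc, collection, change_rate, alpha=1.0):
    """Query likelihood with a change-based document prior:
    score(d) = alpha * log P(d) + sum over query terms of log P(q | d)."""
    d, c = Counter(doc.split()), Counter(collection.split())
    dlen, clen = sum(d.values()), sum(c.values())
    s = alpha * math.log(1e-6 + change_rate)     # prior rises with change rate
    for term in query.split():
        p_c = (c[term] + 1) / (clen + len(c))    # smoothed collection model
        p_d = (d[term] + MU * p_c) / (dlen + MU) # Dirichlet-smoothed doc model
        s += math.log(p_d)
    return s

collection = "us open tennis golf scores news results schedule players 2010"
fresh = "us open 2010 live scores updated results today"
stale = "us open 2008 archive results"
q = "us open results"
print(score(q, fresh, collection, change_rate=0.6) >
      score(q, stale, collection, change_rate=0.05))   # True: the prior favors the fresh page
```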

Relevance and Page Change
- Page change is related to relevance judgments
  - Human relevance judgments on a 5-point scale: Perfect/Excellent/Good/Fair/Bad
  - Rate of change: 60% for Perfect pages vs. 30% for Bad pages
- Use change rate as a document prior (vs. priors based on link structure, like PageRank)
- Shingle prints to measure change

Relevance and Term Change
- Term patterns vary over time
- Represent a document as a mixture of terms with different "staying power": long, medium, short (see the sketch below)
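A sketch of the staying-power mixture, under the simplifying assumption that a term's class (long/medium/short) is set by how many snapshots contain it and that the class weights are fixed; the paper learns these quantities:

```python
from collections import Counter

def longevity_weighted_model(snapshots, weights=(3.0, 1.5, 1.0)):
    """Document language model that re-weights terms by staying power.
    A term is "long" if it appears in every snapshot, "medium" if in more
    than one, "short" (transient) if in just one. Weights are illustrative."""
    long_w, med_w, short_w = weights
    counts, presence = Counter(), Counter()
    for snap in snapshots:
        words = snap.lower().split()
        counts.update(words)
        presence.update(set(words))

    weighted = {}
    for term, tf in counts.items():
        if presence[term] == len(snapshots):
            w = long_w                     # long staying power
        elif presence[term] > 1:
            w = med_w                      # medium
        else:
            w = short_w                    # transient
        weighted[term] = w * tf
    z = sum(weighted.values())
    return {t: v / z for t, v in weighted.items()}

snaps = ["recipes salads cheese bbq",      # weekly crawls of one page
         "recipes salads cheese",
         "recipes salads soup"]
model = longevity_weighted_model(snaps)
print(model["recipes"] > model["bbq"])     # True: persistent terms dominate
```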

Evaluation: Queries & Documents
- 18K queries, 2.5M judged documents
  - 5-level relevance judgments (Perfect ... Bad)
  - 2.5M documents crawled weekly for 10 weeks
- Navigational queries
  - 2K queries identified with a "Perfect" judgment
  - Assume these relevance judgments are consistent over time

Experimental Results
- (Figure: retrieval effectiveness for four models: baseline static model, dynamic model, dynamic model + change prior, change prior)

Temporal Retrieval: Ongoing Work [Kulkarni et al., WSDM 2011]
- Initial evaluation/model focused on navigational queries and assumed their relevance is "static" over time
- But there are many other cases ...
  - E.g., US Open 2010 (in June vs. Sept)
  - E.g., World Cup results (in 2010 vs. 2006)
- Ongoing evaluation: collecting explicit relevance judgments, interaction data, page content, and query frequency over time

Relevance over Time
- Query: march madness [Mar 15 – Apr 4, 2010]
- (Figure: normalized relevance judgments vs. time in weeks, from Mar 25, 2010 to May 28, 2010)

Other Examples of Dynamics and IR Models/Systems
- Temporal retrieval models: Elsas & Dumais (2010); Liu & Croft (2004); Efron (2010); Aji et al. (2010)
- Document dynamics, for crawling and indexing: efficient crawling and updates
- Query dynamics: Jones & Diaz (2004); Kulkarni et al. (2011); Kotov et al. (2010)
- Integration of news into Web results: Diaz (2009)
- Extraction of temporal entities within documents
- Protocols for retrieving versions over time, e.g., Memento (Van de Sompel et al., 2010)

Summary
- Characterize change in information systems
  - Content changes over time
  - People re-visit and re-find over time
- Develop new algorithms and systems to improve retrieval and understanding of dynamic collections
  - Desktop: Stuff I've Seen; Memory Landmarks; LifeBrowser
  - News: analysis of novelty (e.g., NewsJunkie)
  - Web: tools for understanding change (e.g., Diff-IE); retrieval models that leverage dynamics

Key Themes
- Dynamics are pervasive in information systems
- Data-driven approaches are increasingly important
  - Data -> Models/Systems -> Decision Analysis
  - Large volumes of data are available: data scarcity is being replaced by data surfeit
  - Data exhaust is a critical resource in the internet economy, and can be more broadly leveraged
  - Machine learning and probabilistic models are replacing hand-coded rules
- Broadly applicable: search ... plus machine translation, spam filtering, traffic flow, evidence-based health care, skill assessment, etc.

Thank You! Questions/Comments ... More info, Diff-IE (coming soon)