Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use
Susan Dumais, Microsoft Research
IDM 2003 Workshop
Outline
- Search today
- Search with Stuff I’ve Seen (SIS)
  (with Edward Cutrell, JJ Cadiz, Gavin Jancke, Raman Sarin, Daniel Robbins)
- Experiences with SIS
  - Deployment
  - Usage data
  - UI innovations
- Next steps for SIS
Search Today …
- Many locations and interfaces for finding things (e.g., web, mail, local files, help, history, notes)
- Often slow
- “… the No.1 question we're trying to solve [in Longhorn] is ‘Where's my stuff?’ Right now, file space on any PC is a cesspool.” (Bill Gates, FORTUNE interview, June 23, 2002)
Search with SIS
- Unified index of stuff you’ve seen
  - All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc.
  - Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  - Automatic and immediate update of index
  - Rich UI possibilities, since it’s your content
- Get back to information you’ve seen
  - Re-use vs. initial discovery
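The idea of a single index over everything the user has seen, updated immediately, can be sketched with a toy in-memory index. This is not the SIS implementation (which builds on MS Search components); the `Item` schema, the `on_seen` hook, and the regex word splitting are hypothetical choices made only for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """Anything the user has seen, from any source (hypothetical schema)."""
    item_id: str
    source: str                                   # e.g. "email", "file", "web", "calendar"
    text: str                                     # full text produced by a content filter
    metadata: dict = field(default_factory=dict)  # e.g. author, title, size, created


class UnifiedIndex:
    """Toy in-memory index across all sources, updated as soon as an item is seen."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of items containing it
        self.items = {}                   # item id -> Item

    @staticmethod
    def terms(text):
        return set(re.findall(r"\w+", text.lower()))

    def on_seen(self, item):
        """Hypothetical hook fired whenever the user sees (or re-sees) an item."""
        if item.item_id in self.items:    # changed item: drop its stale postings first
            self.remove(item.item_id)
        self.items[item.item_id] = item
        for term in self.terms(item.text):
            self.postings[term].add(item.item_id)

    def remove(self, item_id):
        old = self.items.pop(item_id)
        for term in self.terms(old.text):
            self.postings[term].discard(item_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


idx = UnifiedIndex()
idx.on_seen(Item("m1", "email", "Budget review on Friday"))
idx.on_seen(Item("f1", "file", "Budget spreadsheet draft"))
print(idx.search("budget"))  # ['f1', 'm1']
```

Because postings are updated inside the same call that records the item, a search issued right after an item is seen already finds it, with no explicit user action.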
Related Work
- Several systems for improving access to specific sources (e.g., web, mail, files, photos, music)
- Some integration across sources
  - KFTF [Jones et al., 2002]
  - Lifestreams/Scopeware [Fertig, Freeman, Gelernter, 1996]
  - MyLifeBits [Gemmell et al., 2002]
  - Haystack [Adar et al., 1999; Huynh et al., 2002]
- Commercial products
  - OS: Mac Sherlock, Windows Indexing Service
  - Apps: Enfish, retriever, dtSearch, X1, etc.
- What’s new with SIS …
  - Full content and metadata for many different sources
  - Extensible architecture
  - Usage experiences and experimental data
  - UI focus
SIS Architecture
- Indexing infrastructure uses MS Search components (note: an IR platform)
  - Gatherer – interface to content sources, e.g., files, HTTP, MAPI
  - Filters – decode different file types, e.g., Word, PowerPoint, HTML, PDF, Journal
  - Tokenizer – break text into words, including date normalization, stemming, etc.
  - Indexer – standard inverted index
  - Retriever – Boolean and best match (Okapi)
- User interface
- Client-side indexing and storage
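The tokenizer, indexer, and retriever stages can be illustrated with a minimal sketch. It assumes a crude plural-stripping stemmer (not Porter), a single US-style `M/D/YYYY` pattern for date normalization, and the textbook Okapi BM25 formula with the common defaults `k1 = 1.2` and `b = 0.75`; the slide does not say which Okapi variant or parameters SIS actually used.

```python
import math
import re
from collections import Counter, defaultdict

US_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def tokenize(text):
    """Lowercase words; M/D/YYYY dates become YYYY-MM-DD; crude plural stemming."""
    tokens = []
    for chunk in re.findall(r"[\w/]+", text.lower()):
        m = US_DATE.match(chunk)
        if m:
            month, day, year = m.groups()
            tokens.append(f"{year}-{int(month):02d}-{int(day):02d}")
            continue
        for word in re.findall(r"\w+", chunk):
            if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
                word = word[:-1]  # crude stand-in for a real stemmer
            tokens.append(word)
    return tokens


class InvertedIndex:
    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    def add(self, doc_id, text):
        """Index a new document (re-adding an existing id is not handled here)."""
        counts = Counter(tokenize(text))
        self.doc_len[doc_id] = sum(counts.values())
        for term, tf in counts.items():
            self.postings[term][doc_id] = tf

    def boolean_and(self, query):
        """Ids of documents containing every query term."""
        sets = [set(self.postings.get(t, {})) for t in tokenize(query)]
        return sorted(set.intersection(*sets)) if sets else []

    def bm25(self, query):
        """(doc_id, score) pairs ranked by Okapi BM25, best first."""
        n = len(self.doc_len)
        if n == 0:
            return []
        avgdl = sum(self.doc_len.values()) / n
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            df = len(docs)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # non-negative IDF variant
            for doc_id, tf in docs.items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Boolean retrieval returns an unordered match set (sorted by id here only for stable output), while BM25 orders matches by term frequency, document-length normalization, and term rarity.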
SIS Design Principles
- Indexing
  - No additional work is required
  - User sees something, and it gets indexed
- Retrieval
  - Fast, flexible
  - Interactive refinement
  - Sort and filter on metadata (note: sort/filter automatically triggers a query)
- UI experiments
  - Top/Side, Previews, richer visualizations
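Interactive refinement, where changing a sort or filter immediately re-runs the query, can be sketched as one function called on every UI change. The `Result` fields are assumptions, and substring matching stands in for a real index lookup. An empty query string acts as a null query that returns everything passing the filters.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    kind: str   # e.g. "email", "file", "web"
    seen: date
    text: str


def run_query(items, words="", kind=None, since=None, sort_by="seen", descending=True):
    """Re-run the full query whenever the text, a filter, or the sort order changes.

    An empty `words` string is a null query: every item that passes the filters matches.
    Substring matching stands in for a real full-text index lookup.
    """
    terms = words.lower().split()
    hits = [
        r for r in items
        if all(t in r.text.lower() for t in terms)
        and (kind is None or r.kind == kind)
        and (since is None or r.seen >= since)
    ]
    return sorted(hits, key=lambda r: getattr(r, sort_by), reverse=descending)


items = [
    Result("Budget", "email", date(2003, 3, 1), "budget numbers"),
    Result("Plan", "file", date(2003, 2, 1), "project plan and budget"),
    Result("News", "web", date(2003, 3, 5), "headlines"),
]
print([r.title for r in run_query(items)])                    # ['News', 'Budget', 'Plan']
print([r.title for r in run_query(items, "budget", "file")])  # ['Plan']
```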
SIS Demo
SIS Demo Points
- Search
  - Fast
  - Integrates content from many places
  - Search by full text or properties, including null queries
  - Sort and filter results
  - Update index in real time, with no explicit user action
  - Right-click and other advanced functionality
  - Saved queries, queries from other apps, IQ (implicit queries)
- UI alternatives
  - Top/Side
  - Preview/Not
  - Default sort order
Evaluating SIS
- Internal deployment
  - ~1,500 downloads
  - Users include program management, test, sales, development, administrative, executives, etc.
- Research techniques
  - Free-form feedback
  - Questionnaires; structured interviews
  - Usage patterns from log data
  - UI experiments (randomly deploy different versions)
  - Lab studies for richer UI (e.g., timeline, trends)
    - But even here, must work with users’ own content
Top vs. Side Views; Previews vs. Not; Sort by Date vs. Rank
SIS Usage Data
- Detailed analysis of 234 people over 6 weeks of usage
- Personal store characteristics
  - 5k–100k items; index < 150 MB
- Query characteristics
  - Short queries (1.59 words on average)
  - Few advanced operators or fielded searches in the query box (7.5%)
  - Frequent query iteration (48%)
    - 50% of refined queries involve filters (type and date most common)
    - 35% of refined queries involve changes to the query
    - 13% of refined queries involve re-sorting
- Query content
  - Compared with Spink et al.’s analysis of web queries
  - Importance of people: 29% of queries involve people’s names
SIS Usage Data, cont’d
- Characteristics of items opened
  - File types opened: 76% email, 14% web pages, 10% files
  - Age of items opened: 7% today, 22% within the last week, 46% within the last month
- Ease of finding information
  - Easier after SIS for web, email, files
  - Non-SIS search decreases for web, email, files
- log(Freq) = … × log(DaysSinceSeen)
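The last line describes a linear relationship in log-log space, i.e., a power law between how often items are opened and how long ago they were seen. The coefficient is not recoverable from the slide, so this sketch only shows how such a slope can be estimated by ordinary least squares. The data are synthetic, generated from an exact power law with slope -1; they are not SIS measurements. The fitted slope does not depend on the base of the logarithm.

```python
import math


def fit_log_log(days_since_seen, freqs):
    """Least-squares fit of log(freq) = slope * log(days) + intercept."""
    xs = [math.log(d) for d in days_since_seen]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic counts from an exact power law freq = 1000 / days (not SIS data).
days = [1, 2, 4, 8, 16, 32]
freqs = [1000 / d for d in days]
slope, intercept = fit_log_log(days, freqs)
print(round(slope, 6), round(math.exp(intercept), 3))
```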
SIS Usage, cont’d
- UI usage
  - Small effects of Top/Side, Previews
  - Sort order
    - Date is by far the most common sort field, even for people who had Okapi rank as the default
    - Importance of time
    - Few searches for the “best” match; many other criteria …
(Figure: Number of Queries Issued)
SIS Usage, cont’d
- Observations about unified access
  - Metadata quality is variable
    - Email: rich, pretty clean
    - Web: little, not very useful for retrieval
    - Files: some, but often wrong
    - Human annotation: don’t depend on it …
  - Need abstractions, e.g., “useful date”
    - Initially used ‘date seen’, but …
    - Appointment: when it happens
    - Email and web: when seen
    - Files: when changed
  - What do people remember about time?
    - Memory landmarks
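A “useful date” abstraction amounts to choosing, per item type, which timestamp to show and sort by: appointments by when they happen, web pages (and, in this sketch, email) by when they were seen, and files by when they were changed. The dictionary field names below are hypothetical, and falling back to the date seen for unknown types is an assumption.

```python
from datetime import datetime


def useful_date(item):
    """Pick the date a person is most likely to remember for each kind of item.

    The field names ("kind", "start", "seen", "modified") are hypothetical.
    """
    kind = item["kind"]
    if kind == "appointment":
        return item["start"]      # when it happens
    if kind in ("email", "web"):
        return item["seen"]       # when it was seen (grouping email here is an assumption)
    if kind == "file":
        return item["modified"]   # when it was last changed
    return item["seen"]           # fallback: the original 'date seen'


items = [
    {"kind": "appointment", "seen": datetime(2003, 1, 2), "start": datetime(2003, 3, 10)},
    {"kind": "file", "seen": datetime(2003, 3, 1), "modified": datetime(2002, 11, 5)},
    {"kind": "web", "seen": datetime(2003, 2, 14)},
]
for item in sorted(items, key=useful_date):
    print(item["kind"], useful_date(item).date())
```

Sorting a mixed result list by `useful_date` puts the file first even though it was seen most recently, because its useful date is when it last changed.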
SIS, Timeline with Landmarks
- Timeline interface
- Augmented with landmarks as pointers into human memory
  - General: holidays, world events
  - Personal: important photos, appointments
- Heuristics or Bayesian models to identify memorable events
SIS, Timeline with Landmarks
(Figure: search results and their distribution over time, alongside memory landmarks: general (world, calendar) and personal (appointments, photos).)
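The timeline view needs two simple operations: bucketing result dates to draw their distribution over time, and finding the landmark closest to a given date. This sketch assumes monthly buckets and landmarks stored as `(date, label)` pairs; the landmark list is a hypothetical example, not data from SIS.

```python
from collections import Counter
from datetime import date


def monthly_histogram(result_dates):
    """Number of search results per (year, month), for drawing the timeline."""
    return Counter((d.year, d.month) for d in result_dates)


def nearest_landmark(d, landmarks):
    """Return the (date, label) landmark closest in time to date d."""
    return min(landmarks, key=lambda lm: abs((lm[0] - d).days))


# Hypothetical landmarks: one general (holiday), one personal (appointment).
landmarks = [
    (date(2002, 12, 25), "Christmas"),
    (date(2003, 1, 15), "Project review (hypothetical appointment)"),
]
results = [date(2003, 1, 3), date(2003, 1, 20), date(2003, 2, 2)]
print(sorted(monthly_histogram(results).items()))
print(nearest_landmark(date(2003, 1, 10), landmarks)[1])
```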
SIS, Timeline Experiment
(Figure: search time (s) with landmarks + dates vs. dates only.)
SIS, Visualizing Trends
- Summarize the results of a search
  - Abstraction beyond individual results
- Grid-based design
  - Axes represent topic, time, people
  - Cells encode frequency, recency
- Supports activities like:
  - What newsgroups are active (on topic x)?
  - What people are active, authoritative (on topic x)?
  - When did I last interact with people?
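The grid summary can be sketched as an aggregation step: group results by a row attribute (e.g., person or topic) and a column attribute (e.g., month), then record each cell’s frequency and recency. The record fields and the person-by-month layout are assumptions for illustration, and the sample records are made up.

```python
from collections import defaultdict
from datetime import date


def build_grid(results, row_of, col_of):
    """Aggregate results into grid cells keyed by (row, column).

    Each cell records frequency (how many results) and recency (newest date).
    """
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for r in results:
        cell = grid[(row_of(r), col_of(r))]
        cell["count"] += 1
        if cell["latest"] is None or r["date"] > cell["latest"]:
            cell["latest"] = r["date"]
    return dict(grid)


def last_interaction(results):
    """Most recent date per person: 'When did I last interact with people?'"""
    latest = {}
    for r in results:
        if r["person"] not in latest or r["date"] > latest[r["person"]]:
            latest[r["person"]] = r["date"]
    return latest


results = [
    {"person": "Alice", "date": date(2003, 1, 7)},
    {"person": "Alice", "date": date(2003, 1, 21)},
    {"person": "Bob", "date": date(2003, 2, 3)},
]
grid = build_grid(results,
                  row_of=lambda r: r["person"],
                  col_of=lambda r: (r["date"].year, r["date"].month))
print(grid[("Alice", (2003, 1))])
print(last_interaction(results))
```

A renderer could then map `count` to cell size or shading and `latest` to color, which is one way to encode frequency and recency in the same cell.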
SIS, Visualizing Trends
SIS, Grid vs. List Experiment
(Figure: Grid View vs. List View.)
Next Steps
- Continue explorations of rich UI
- Augment index with “usage” data
- SIS as a service, with many entry points
- “Contextualize” retrieval
  - Retrieve using implicit queries
  - Identify Stuff I Should See
- Flat-land
  - Good search makes filing less important
  - Attributes rather than directory locations
SIS Summary
- Unified index of stuff you’ve seen
  - Fast access to full text and metadata
  - Heterogeneous content: files, email, web, etc.
  - Automatic and immediate update of index
- Studied usage with several techniques
  - Ease of finding improves with SIS
  - Importance of people and time
  - Short queries, quick iteration
- Novel UI to leverage personal memories
- New capabilities for personal information management
More info:
Vannevar Bush’s Vision
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
V. Bush (1945). As we may think. Atlantic Monthly, 176, July 1945,