Haystack: an Adaptive Personalized Information Retrieval System

David Karger, Lynn Stein, Eytan Adar, Mark Asdoorian, Aidan Low, Jing Qian, Orion Richardson
MIT Laboratory for Computer Science and AI Laboratory, Cambridge MA 02139, USA

What is Haystack?

  An information retrieval system focused on exploiting interaction with individuals
    complements large search engines
    treats different people differently
  Interesting research issues:
    Heterogeneous Data: Deal with the variety of content individuals tend to collect
    User Interface: Offer ubiquitous access
    Big Brother: Develop user interface tools to gather all possible information about users
    Learning: Develop mechanisms for letting past user reactions influence future system actions
    Collaboration: Share data and metadata among a large community of users

The Bookshelf Metaphor

  Search Engines are Like Libraries
    Massive corpora: Mostly irrelevant, often out of date
    Anonymous: Treat all users exactly the same
    Rigid: Use librarians’ one-for-all ontology
  People prefer to start with bookshelves
    My data: Information gathered personally. High quality, easy to understand. Annotated.
    My organization: Owner-chosen subject partition. Best items near desk.
  Then they turn to colleagues
    Trust: Colleagues are authorities on other topics; can recommend good data
    Leverage: Colleagues have organized their data; makes searching easy

Integration

  Haystack archives all user content, adds metadata
    Plug-in components extract data from content
      Textifiers: ASCII, HTML, PostScript, OCR....
      Field finders: author, title, date, summary....
  Haystack mediates user-selected search tools
    Text search: mg, verity, isearch, grep....
    Database: LORE
  Haystack can be used without thinking
    access during all standard activities (mail, web, edit)
    application-specific stubs talk to kernel
      keyword search for file to edit
      archive and annotate current web page

Target Queries

  What research is being done on multicast?

Adaptation
  Goal: improved performance over time
  Annotations by user
    user can add to/change all data/metadata
    requires active intervention, so undesirable
  Observation of query process
    user performs query; Haystack returns results
    user selects relevant result
    Haystack records connection for future queries
    adapt using machine learning techniques
  Observation of general activity (proposed)
    items that are used a lot
    items that are used together
    items used after a search

  Target query: Where is the email about Haystack that I sent Lynn last month?

Collaboration

  Leverage individual users’ Haystacks
    simple RPC to query other Haystacks’ data
    gather data from several; combine evidence
  Exploits self interest
    individuals seek/organize info for own gain
    organization provides benefit to others
  Need to identify “expert colleagues”
    Haystacks of people I contact often
    Haystacks with much overlapping data
    Haystacks that gave good answers in past referrals
  Collaborative Filtering model & techniques
    CF finds “stuff you’ll like”
    Haystack finds “stuff you’ll like for this query”

  Target query: Which DARPA BAAs should I read?

Current Status

  Prototype completed Fall 1997
    some functionality in all categories; limited extensibility
  Kernel reimplemented from scratch, due Summer 1998
  Check out Web Site at http://www.ai.mit.edu/projects/haystack
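
Illustrative sketch (not from the Haystack code base)

  The query-observation loop listed under Adaptation (the user performs a query, selects a relevant result, and Haystack records the connection for future queries) could look roughly like the Python sketch below. The poster only says adaptation uses machine learning techniques; this sketch swaps in a plain selection-count heuristic to show the data flow. The class name ClickMemory, its methods, and the scoring rule are all hypothetical.

```python
from collections import defaultdict


class ClickMemory:
    """Hypothetical store of query-term -> selected-document counts."""

    def __init__(self):
        # counts[term][doc_id] = times doc_id was selected for a query containing term
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _terms(query):
        # Assumption: queries are whitespace-separated keywords, case-insensitive.
        return set(query.lower().split())

    def record_selection(self, query, doc_id):
        """Remember that the user picked doc_id after issuing query."""
        for term in self._terms(query):
            self.counts[term][doc_id] += 1

    def boost(self, query, doc_id):
        """Sum of past selections of doc_id over the terms of this query."""
        return sum(self.counts[term].get(doc_id, 0) for term in self._terms(query))

    def rerank(self, query, results):
        """Reorder (doc_id, base_score) pairs by base score plus the boost."""
        return sorted(
            results,
            key=lambda pair: pair[1] + self.boost(query, pair[0]),
            reverse=True,
        )


# Usage: the underlying search tool ranks doc_a first, but the user
# previously chose doc_b for a query that shares the term "email".
memory = ClickMemory()
memory.record_selection("haystack email", "doc_b")
results = [("doc_a", 0.9), ("doc_b", 0.8)]
reranked = memory.rerank("email from lynn", results)
```

  In this sketch a later query that shares a term with an earlier one moves the previously selected document up, while a query with no shared terms keeps the search tool's original order.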