November 8, 2005
NSF Expedition Workshop
Supporting E-Discovery with Search Technology
Douglas W. Oard
College of Information Studies and Institute for Advanced Computer Studies
University of Maryland, College Park
Outline
Search process
Evaluation
Research agenda
The E-Discovery Context
Classic “Information Retrieval”
  –Goal: satisfy a visceral information need
  –Understanding of need evolves during search
  –Personal view of relevance
E-Discovery
  –Goal: identify a set of responsive documents
  –Negotiated information need (with iteration)
  –Agreed / defensible / explainable process
Supporting Personal Searching
[Figure: process diagram. Labeled elements: Source Selection, Query Formulation, Query, Search, IR System, Ranked List, Selection, Document, Examination, Document, Delivery. Feedback loops: Query Reformulation and Relevance Feedback; Source Reselection. Annotations: Nominate, Choose, Predict.]
Current E-Discovery Support
[Figure: process diagram. Labeled elements: Acquisition, Collection, Indexing, Index, Query Formulation, Query, Search, Result Set, Review, Responsive Documents, Delivery.]
Future E-Discovery Support
[Figure: process diagram. Labeled elements: Acquisition, Collection, Indexing, Index, Source Selection, Query Formulation, Query, Search, IR System, Ranked List, Selection, Incremental Review, Result Set, Responsive Documents, Delivery.]
Incremental Ranked Review
Limit to: Marlboro NOT “Upper Marlboro”
Rank by: tobacco, {policy staff}

Limit to: Marlboro NOT “Upper Marlboro”
Rank by: regulation, manipulation
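The two "Limit to / Rank by" pairs above can be read as a fixed Boolean filter followed by a ranking that changes between review passes. The following is a minimal sketch of that reading, not the system shown in the talk. The function names (`limit_to`, `rank_by`, `next_batch`), the toy documents, and the term-count scoring are all assumptions made for illustration, and the `{policy staff}` set from the slide is not modeled.

```python
import re

def limit_to(docs, include, exclude):
    """Boolean filter: keep documents that contain `include` and do not contain `exclude`."""
    return {
        doc_id: text
        for doc_id, text in docs.items()
        if include.lower() in text.lower() and exclude.lower() not in text.lower()
    }

def rank_by(docs, terms):
    """Order document ids by how often the rank terms occur (ties broken by id)."""
    def score(text):
        tokens = re.findall(r"[a-z]+", text.lower())
        return sum(tokens.count(term) for term in terms)
    return sorted(docs, key=lambda doc_id: (-score(docs[doc_id]), doc_id))

def next_batch(docs, reviewed, terms, batch_size):
    """Rank only the documents not yet reviewed and return the top of that list."""
    remaining = {d: t for d, t in docs.items() if d not in reviewed}
    return rank_by(remaining, terms)[:batch_size]

# Hypothetical toy collection.
docs = {
    "d1": "Marlboro tobacco policy memo on tobacco advertising",
    "d2": "Meeting in Upper Marlboro about zoning",
    "d3": "Marlboro regulation and price manipulation report",
    "d4": "Marlboro sales figures",
    "d5": "Camel tobacco regulation",
}

# The limit stays fixed across passes; only the ranking changes.
limited = limit_to(docs, "Marlboro", "Upper Marlboro")

reviewed = set()
first_batch = next_batch(limited, reviewed, ["tobacco"], batch_size=1)
reviewed.update(first_batch)

# Second pass: same limit, new rank terms, already-reviewed documents skipped.
second_batch = next_batch(limited, reviewed, ["regulation", "manipulation"], batch_size=2)
```

Because the limit defines the set and the ranking only orders it, every document in the limited set is eventually reviewed. Changing the rank terms affects only which documents reviewers see first.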
Evaluation Design Issues
Real information needs are easily captured
  –Byproduct of negotiation process
Repeatable process models lack some fidelity
  –Tune system to query (not query to system)
  –Pooling reveals relative (not absolute) recall
  –Solution: augment lab studies with real cases
Confidentiality / privilege limit access to data
  –Corporate collections are incomplete
  –Public collections are less representative
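The point that pooling reveals relative rather than absolute recall can be shown with a small sketch. In pooled evaluation, only the top-ranked documents from the participating systems are judged, so recall can be measured only against the relevant documents found in the pool. The run data, the pool depth, and the `true_relevant` oracle below are hypothetical. In a real evaluation the full relevant set is unknown, which is exactly the limitation.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from every system's ranked list."""
    pool = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    return pool

def relative_recall(ranked, judged_relevant):
    """Fraction of the relevant documents found in the pool that this run retrieved."""
    if not judged_relevant:
        return 0.0
    return len(set(ranked) & judged_relevant) / len(judged_relevant)

# Hypothetical ranked lists from two systems.
runs = {
    "A": ["d1", "d4", "d3"],
    "B": ["d2", "d4", "d5"],
}

# Hypothetical oracle: d9 is relevant but no system retrieved it.
true_relevant = {"d1", "d4", "d9"}

pool = build_pool(runs, depth=2)
judged_relevant = pool & true_relevant  # assessors only see pooled documents

rel_a = relative_recall(runs["A"], judged_relevant)
rel_b = relative_recall(runs["B"], judged_relevant)

# Absolute recall needs the full relevant set, which pooling cannot supply.
abs_a = len(set(runs["A"]) & true_relevant) / len(true_relevant)
```

In this toy case system A has a relative recall of 1.0 but an absolute recall of only 2/3, because the relevant document that no system retrieved never enters the pool.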
TREC-2006 Legal Track
Collection options
  –Tobacco settlement (scanned/OCR)
  –State Department cables
  –Enron (email)
Schedule
  –Nov 05: Planning conference
  –Mar 06: Guidelines (collection, topics, …)
  –Jul 06: Experiments
  –Sep 06: Ground truth judgments
  –Nov 06: Report results
Research at Maryland
Center for Information Policy
  –Best practices: records management
NDIIPP
  –Law firm records
  –Policy: retention, access
Joint Institute for Knowledge Discovery
  –Enron email, phone calls, and databases
  –Linking: text-metadata, cross-source
MALACH
  –Oral history recordings
  –Search: conversational speech
Looking Forward
Continue this dialog
  –Technology drives process innovation
  –Process requirements drive technical innovation
Craft a balanced investment strategy
  –Use TREC to explore system design
      Create reusable test collections
      Foster development of a research community
  –User studies
  –Process standards