E-Discovery and E-Recordkeeping: The Litigation Risk Factor As A Driver of Institutional Change (Collaborative Expedition Workshop #63, National Science Foundation)

Presentation transcript:

E-Discovery and E-Recordkeeping: The Litigation Risk Factor As A Driver of Institutional Change
Collaborative Expedition Workshop #63
National Science Foundation, Washington, D.C.
Jason R. Baron, Director of Litigation, Office of General Counsel, U.S. National Archives and Records Administration
July 18, 2007

Searching Through E-Haystacks: A Big Challenge

Overview
Introduction: Myth, Hype, Reality – Information Retrieval and the Problem of Language
Case Study: U.S. v. Philip Morris
The TREC Legal Track: Initial Results
Strategic Challenges in Thinking About Search Issues Across the Enterprise
References

The Myth of Search & Retrieval
When lawyers request production of "all" relevant documents (and now ESI), all or substantially all will in fact be retrieved by existing manual or automated methods of search.
Corollary: in conducting automated searches, the use of "keywords" alone will reliably produce all or substantially all documents from a large document collection.

The "Hype" on Search & Retrieval
Claims in the legal tech sector that a very high rate of "recall" (i.e., finding all relevant documents) is easily obtainable, provided one uses a particular software product or service.

The Reality of Search & Retrieval
+ Past research (Blair & Maron, 1985) has shown a gap or disconnect between lawyers' perceptions of their ability to ferret out relevant documents and their actual ability to do so:
-- in a 40,000-document case (350,000 pages), lawyers estimated that a manual search would find 75% of relevant documents, when in fact the research showed only 20% or so had been found.

More Reality: IR is Hard
+ Information retrieval (IR) is a hard problem: difficult even with English-language text, and even harder with non-textual forms of ESI (audio, video, etc.) caught up in litigation.
+ A vast field of IR research exists, including some fundamental concepts and terminology, to which lawyers would benefit from greater exposure.

Why is IR hard (in general)?
+ Fundamental ambiguity of language
+ Human errors
+ OCR problems
+ Non-English language texts
+ Nontextual ESI (in .wav, .mpg, .jpg formats, etc.)
+ Lack of helpful metadata

Problems of language
Polysemy: ambiguous terms (e.g., "George Bush," "strike")
Synonymy: variation in describing the same person or thing in a multiplicity of ways (e.g., "diplomat," "consul," "official," "ambassador," etc.)
Pace of change: text messaging and computer gaming as latest examples (e.g., "POS," "1337")
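To make the synonymy problem concrete for search, here is a minimal sketch in Python; the two documents and the hand-built synonym map are invented for illustration (real systems draw on thesauri or statistical term associations):

```python
# Illustrative only: why synonymy defeats naive keyword search.
docs = {
    "doc1": "the diplomat met with trade officials",
    "doc2": "the consul met with trade officials",   # same role, different word
}

def keyword_hits(term, docs):
    """Return the names of documents containing the exact term."""
    return {name for name, text in docs.items() if term in text.split()}

print(keyword_hits("diplomat", docs))   # {'doc1'} - doc2 is silently missed

# Expanding the query with synonyms recovers the missed document.
synonyms = {"diplomat": {"diplomat", "consul", "ambassador"}}
expanded = set().union(*(keyword_hits(t, docs) for t in synonyms["diplomat"]))
print(expanded)                         # {'doc1', 'doc2'}
```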

Why is IR hard (for lawyers)?
+ Lawyers not technically grounded
+ Traditional lawyering doesn't emphasize front-end "process" issues that would help simplify or focus the search problem in particular contexts
+ The reality is that huge sources of heterogeneous ESI exist, presenting an array of technical issues
+ Deadlines and resource constraints
+ Failure to employ best strategic practices

Snapshot of 2007 ESI Heterogeneity
E-mail integrated with voice mail & VoIP, word processing (including not in English), spreadsheets, dynamic databases, instant messaging, Web pages including intraweb sites, blogs, wikis, and RSS feeds, backup tapes, hard drives, removable media, flash drives, new storage devices, remote PDAs, and audit logs and metadata of all types.

Sedona Guideline 11 (2007)
A responding party may satisfy its good faith obligation to preserve and produce potentially relevant electronically stored information by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data reasonably likely to contain relevant information.

Litigation Targets
+ Defining "relevance"
+ Maximizing # of responsive docs retrieved
+ Minimizing retrieval "noise" or false positives (non-responsive docs)

FINDING RESPONSIVE DOCUMENTS IN A LARGE DATA SET: FOUR LOGICAL CATEGORIES
Any document set divides into:
+ Relevant and Retrieved
+ Not Relevant and Retrieved (false positives)
+ Relevant and Not Retrieved (false negatives)
+ Not Relevant and Not Retrieved

FINDING RESPONSIVE DOCUMENTS IN A LARGE DATA SET: THE REALITY OF LARGE SCALE DISCOVERY
[Diagram: relevant documents, "hits" on nonrelevant documents, and "The Great Unknown" of documents never retrieved]

Measures of Information Retrieval
Recall = (# of responsive docs retrieved) / (# of responsive docs in collection)

Measures of Information Retrieval
Precision = (# of responsive docs retrieved) / (# of docs retrieved)
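To tie the two measures to the four logical categories above, here is a minimal worked sketch in Python; the document IDs and set sizes are invented for illustration:

```python
# Illustrative only: toy document IDs, not from any real collection.
relevant = {"d1", "d2", "d3", "d4", "d5"}        # responsive docs in the collection
retrieved = {"d1", "d2", "d6", "d7"}             # docs a search actually returned

true_positives = retrieved & relevant             # relevant and retrieved
false_positives = retrieved - relevant            # retrieved "noise"
false_negatives = relevant - retrieved            # responsive docs the search missed

recall = len(true_positives) / len(relevant)      # 2/5 = 0.40
precision = len(true_positives) / len(retrieved)  # 2/4 = 0.50

print(f"recall={recall:.2f}, precision={precision:.2f}")
```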

THE RECALL-PRECISION TRADEOFF
[Chart: precision (vertical axis) plotted against recall (horizontal axis, 0 to 100%); pushing recall higher tends to drive precision down]
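The tradeoff can be seen by sweeping a cutoff down a ranked result list. This toy continuation of the sketch above uses an invented ranking (R = relevant, N = not relevant):

```python
# Illustrative only: a toy ranked list, best match first.
ranked = ["R", "N", "R", "N", "N", "R", "N", "N", "N", "R"]
total_relevant = ranked.count("R")

# Retrieving more of the list raises recall while precision falls.
hits = 0
for k, label in enumerate(ranked, start=1):
    hits += label == "R"
    recall = hits / total_relevant
    precision = hits / k
    print(f"top-{k:2d}: recall={recall:.2f}, precision={precision:.2f}")
```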

Three Questions
(1) How can one go about improving rates of recall and precision, so as to find a greater number of relevant documents while spending less overall time and cost sifting through noise?
(2) What alternatives to keyword searching exist?
(3) Are there ways to benchmark alternative search methodologies so as to evaluate their efficacy?

Beyond Reliance on Keywords Alone: Alternative Search Methods
Greater use made of Boolean strings
Fuzzy search models
Probabilistic models (Bayesian)
Statistical methods (clustering)
Machine learning approaches to semantic representation
Categorization tools: taxonomies and ontologies
Social network analysis
(A sketch of one statistical approach appears below.)
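As one concrete example of a statistical method, here is a minimal TF-IDF ranking sketch in pure Python. It illustrates the general idea of ranking by term-weighted similarity rather than exact keyword hits; it is not any particular product or the method used in the track, and the mini-collection and query are invented:

```python
import math
from collections import Counter

# Illustrative only: a tiny collection standing in for a large ESI corpus.
docs = {
    "doc1": "tobacco ads placed in network television programs watched by children",
    "doc2": "internal approval guidelines for logo placement in cable programs",
    "doc3": "quarterly sales figures for overseas markets",
}

def tf_idf_vector(text, doc_freq, n_docs):
    """Weight each term by its frequency in the text and rarity in the collection."""
    counts = Counter(text.split())
    return {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

n_docs = len(docs)
doc_freq = Counter(t for text in docs.values() for t in set(text.split()))

query = "placement of tobacco logos in television programs watched by children"
doc_freq.update(t for t in set(query.split()) if t not in doc_freq)  # unseen query terms get df=1

q_vec = tf_idf_vector(query, doc_freq, n_docs)
vectors = {name: tf_idf_vector(text, doc_freq, n_docs) for name, text in docs.items()}

# Rank documents by similarity to the query; doc1 should score highest.
for name, score in sorted(((n, cosine(q_vec, v)) for n, v in vectors.items()),
                          key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```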

What is TREC?
Conference series co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research and Development Activity (ARDA) of the Department of Defense
Designed to promote research into the science of information retrieval
First TREC conference was held in 1992; the 15th conference was held November 15-17, 2006, at NIST headquarters in Gaithersburg, Maryland

TREC 2006 Legal Track
The TREC 2006 Legal Track was designed to evaluate the effectiveness of search technologies in a real-world legal context
First study of its kind using nonproprietary data since the Blair/Maron research in 1985
Hypothetical complaints and 43 "requests to produce" drafted by Sedona Conference members
"Boolean negotiations" conducted as a baseline for search efforts
Documents to be searched were drawn from a publicly available 7 million document tobacco litigation Master Settlement Agreement database
6 participating teams submitted 33 runs. Teams consisted of: Hummingbird, National University of Singapore, Sabir Research, University of Maryland, University of Missouri-Kansas City, and York University

TREC 2006 LEGAL TRACK XML ENCODED TOPICS WITH NEGOTIATION HISTORY: ONE EXAMPLE
Request: All documents discussing, referencing, or relating to company guidelines or internal approval for placement of tobacco products, logos, or signage, in television programs (network or cable), where the documents expressly refer to the programs being watched by children.
Negotiated Boolean query: (guide! OR strateg! OR approval) AND (place! OR promot! OR logos OR sign! OR merchandise) AND (TV OR "T.V." OR televis! OR cable OR network) AND ((watch! OR view!) W/5 (child! OR teen! OR juvenile OR kid! OR adolescent!))
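In typical litigation-search syntax, "!" marks right truncation and "W/5" means "within five words of each other." The Python sketch below is a simplified evaluator shaped around just this example query, invented here to show the mechanics; it is not the retrieval software used in the track:

```python
import re

# Simplified sketch: test one document's text against the negotiated query above.

def any_term(words, prefixes):
    """True if any word starts with any of the given truncated prefixes."""
    return any(w.startswith(p) for w in words for p in prefixes)

def within_n(words, left, right, n=5):
    """True if some left-prefix word occurs within n words of some right-prefix word (W/n)."""
    li = [i for i, w in enumerate(words) if any(w.startswith(p) for p in left)]
    ri = [i for i, w in enumerate(words) if any(w.startswith(p) for p in right)]
    return any(abs(i - j) <= n for i in li for j in ri)

def matches(text):
    words = re.findall(r"[a-z.']+", text.lower())
    return (any_term(words, ("guide", "strateg", "approval"))
            and any_term(words, ("place", "promot", "logos", "sign", "merchandise"))
            and any_term(words, ("tv", "t.v.", "televis", "cable", "network"))
            and within_n(words, ("watch", "view"),
                         ("child", "teen", "juvenile", "kid", "adolescent")))

doc = ("Internal strategy memo: approval of signage placement on network television "
       "shows watched weekly by children and teens.")
print(matches(doc))  # True
```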

[Chart: TREC Legal Track 2006 - percentage of unique documents by topic found by Boolean, expert searcher, and other combined methods of search]

[Chart: TREC Legal Track 2006 - topics sorted by increasing percentage of unique documents per topic found by combined methods other than a baseline Boolean search]

Boolean vs. Hypothetical Alternative Search Method
[Chart: success in retrieving relevant docs plotted against increasing effort (time, resources expended, etc.), comparing a Boolean run with a hypothetical alternative search run at points A, B, C, D]

Managing Litigation Risk
↑ Success per amount of effort = ↓ Litigation Risk

Strategic Challenges
Convincing lawyers and judges that automated searches are not just desirable but necessary in response to large e-discovery demands.

Challenges (cont'd)
Having all parties and adjudicators understand that the use of automated methods does not guarantee that all responsive documents will be identified in a large data collection.

Challenges (cont'd)
Designing an overall review process that maximizes the potential to find responsive documents in a large data collection (no matter which search tool is used), and using sampling and other analytic techniques to test hypotheses early on.

Challenges (cont'd)
Parties making a good faith attempt to collaborate on the use of particular search methods, including utilizing multiple "meet and confers" as necessary, informed by initial sampling or surveying of the ESI retrieved by whatever methods are used.

Challenges (cont'd)
Being open to using new and evolving search and information retrieval methods and tools.

Leading U.S. Case Precedent on Automated Searches
-- Current cases emphasize parties discussing proposed keyword searches under the rubric of coming up with a "search protocol." Treppel v. Biovail, 233 F.R.D. 363 (S.D.N.Y. 2006)
-- First case discussing parties' employing or relying on concept searches as an alternative search methodology. Disability Rights Council v. WMATA, 2007 WL (June 1, 2007)

Future Research
TREC 2007 Legal Track
The Sedona Conference Search & Retrieval Commentary & additional activities

References
J. Baron, "Toward A Federal Benchmarking Standard for Evaluating Information Retrieval Products Used in E-Discovery," 6 Sedona Conference Journal 237 (2005) (available on Westlaw and LEXIS)
J. Baron, "Information Inflation: Can The Legal System Adapt?" (with co-author George L. Paul), 13 Richmond J. Law & Technology (2007), vol. 3, article 10 (available online)

References
J. Baron, D. Oard, and D. Lewis, "TREC 2006 Legal Track Overview," available in the TREC 2006 (TREC 15) proceedings (document 3)
The Sedona Conference, Best Practices Commentary on The Use of Search & Retrieval Methods in E-Discovery (forthcoming 2007)
TREC 2007 Legal Track home page

References
ICAIL 2007 (International Conference on Artificial Intelligence and the Law), Workshop on Supporting Search and Sensemaking for ESI in Discovery Proceedings; see also J. Baron and P. Thompson, "The Search Problem Posed By Large Heterogeneous Data Sets in Litigation: Possible Future Approaches to Research," ICAIL 2007 Conference Paper, June 4-8, 2007 (available online via the workshop page)

Jason R. Baron
Director of Litigation
Office of General Counsel
National Archives and Records Administration
8601 Adelphi Road, Suite 3110
College Park, MD
(301)