Technology-Assisted Review Can be More Effective and More Efficient Than Exhaustive Manual Review
Gordon V. Cormack, University of Waterloo

Presentation transcript:

Technology-Assisted Review Can be More Effective and More Efficient Than Exhaustive Manual Review
Gordon V. Cormack, University of Waterloo
Maura R. Grossman, Wachtell, Lipton, Rosen & Katz

2 Watson Versus Jennings and Rutter

3 Debunking the Myth of Manual Review

The Myth:
- That “eyeballs-on” review of each and every document in a massive collection of ESI will identify essentially all responsive (or privileged) documents; and
- That computers are less reliable than humans in identifying responsive (or privileged) documents.

The Facts:
- Humans miss a substantial number of responsive (or privileged) documents;
- Computers, aided by humans, find at least as many responsive (or privileged) documents as humans alone; and
- Computers, aided by humans, make fewer errors on responsiveness (or privilege) than humans alone, and are far more efficient than humans.

4 Human Assessors Disagree!

- Suppose two assessors, A and B, review the same set of documents;
- Overlap = (# documents coded responsive by both A and B) / (# documents coded responsive by A or B, or both A and B)

Example: Primary and secondary assessors both code 2,504 documents as responsive; one or both code a total of 5,498 documents as responsive.
Overlap = 2,504 / 5,498 = 45.5%.

5 More Human Assessors Disagree Even More!

- Suppose three assessors, A, B, and C, review the same set of documents;
- Overlap = (# documents coded responsive by A and B and C) / (# documents coded responsive by one or more of A, B, or C)

Example: Primary, secondary, and tertiary assessors all code 1,972 documents as responsive; one or more code a total of 6,020 documents as responsive.
Overlap = 1,972 / 6,020 = 32.8%.
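For readers who want to reproduce these overlap figures, here is a minimal Python sketch; the function and variable names are mine, and the only numbers used are the ones quoted in the two examples above.

```python
def overlap(coded_responsive_by_all, coded_responsive_by_any):
    """Overlap = documents all assessors coded responsive, divided by
    documents at least one assessor coded responsive."""
    return coded_responsive_by_all / coded_responsive_by_any

# Two assessors (slide 4): both code 2,504; one or both code 5,498.
print(f"{overlap(2_504, 5_498):.1%}")   # 45.5%

# Three assessors (slide 5): all code 1,972; one or more code 6,020.
print(f"{overlap(1_972, 6_020):.1%}")   # 32.8%
```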

6 Pairwise Assessor Overlap in the TREC 4 IR Task (Voorhees 2000)

7 Assessor Overlap With the Original Response to a DOJ Second Request (Roitblat et al. 2010)

8 Assessor Overlap: IR Versus Legal Tasks

9 What is the “Truth”?
Option #1: Deem Someone Correct
Deem the primary reviewer as the gold standard (Voorhees 2000).

10 What is the “Truth”?
Option #2: Take the Majority Vote
Deem the majority vote as the gold standard.

11 What is the “Truth”?
Option #3: Have All Disagreements Adjudicated by a Topic Authority
Have a senior attorney adjudicate all, but only, cases of disagreement (Roitblat et al. 2010; TREC Interactive Task 2009).

12 How Good are Human Eyeballs?

What do we mean by “How Good”?
- Recall;
- Precision; and
- F1.

13 Measures of Information Retrieval

- Recall = (# of responsive documents retrieved) / (Total # of responsive documents in the entire document collection)
  (“How many of the responsive documents did I find?”)
- Precision = (# of responsive documents retrieved) / (Total # of documents retrieved)
  (“How much of what I retrieved was junk?”)
- F1 = the harmonic mean of Recall and Precision.
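To make these definitions concrete, here is a small Python sketch of the three measures; the counts in the usage example are hypothetical and are not taken from any study cited in this presentation.

```python
def recall(responsive_retrieved, responsive_in_collection):
    """Share of all responsive documents in the collection that were retrieved."""
    return responsive_retrieved / responsive_in_collection

def precision(responsive_retrieved, total_retrieved):
    """Share of the retrieved documents that are actually responsive."""
    return responsive_retrieved / total_retrieved

def f1(recall_value, precision_value):
    """Harmonic mean of recall and precision."""
    return 2 * recall_value * precision_value / (recall_value + precision_value)

# Hypothetical review: 8,000 of the collection's 10,000 responsive documents
# are found, out of 16,000 documents retrieved in total.
r = recall(8_000, 10_000)      # 0.80
p = precision(8_000, 16_000)   # 0.50
print(f"Recall={r:.0%}  Precision={p:.0%}  F1={f1(r, p):.2f}")  # F1 ≈ 0.62
```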

14 Recall and Precision

15 The Recall-Precision Trade-Off
[Chart: Precision versus Recall, 0% to 100% on both axes, showing Perfection, the TREC Best Benchmark (best performance on Precision at a given Recall), a typical result in a manual responsiveness review, and Blair & Maron (1985).]

16 How Good is Manual Review?

17 Effectiveness of Manual Review

18 How Good is Technology-Assisted Review?

19 What is “Technology-Assisted Review”?

20 Defining “Technology-Assisted Review”

- The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.
- Think of a spam filter that reviews incoming e-mail and classifies it into “ham,” “spam,” and “questionable.”
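The “rank, then cut” workflow described above can be illustrated with a short, generic Python sketch. This is not the method of any particular vendor or TREC participant; the document scores are assumed to come from whatever machine-learning model the review team has trained, and the IDs and cutoff below are made up for illustration.

```python
def rank_and_cut(scored_docs, review_cutoff):
    """Sort documents from most to least likely responsive, then partition
    the ranking at a cutoff score into 'review' and 'set aside' piles."""
    ranking = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    to_review = [d for d in ranking if d["score"] >= review_cutoff]
    set_aside = [d for d in ranking if d["score"] < review_cutoff]
    return to_review, set_aside

docs = [
    {"id": "doc-001", "score": 0.92},  # hypothetical model scores
    {"id": "doc-002", "score": 0.15},
    {"id": "doc-003", "score": 0.61},
]
to_review, set_aside = rank_and_cut(docs, review_cutoff=0.5)
print([d["id"] for d in to_review])   # ['doc-001', 'doc-003']
```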

21 Types of Machine Learning

- SUPERVISED LEARNING = where a human chooses the document exemplars (“seed set”) to feed to the system and requests that the system rank the remaining documents in the collection according to their similarity to, or difference from, the exemplars (i.e., “find more like this”).
- ACTIVE LEARNING = where the system chooses the document exemplars to feed to the human and requests that the human make responsiveness determinations, from which the system then learns and applies that learning to the remaining documents in the collection.
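As a rough illustration of the active-learning protocol (the system proposes exemplars, the human codes them, the system learns), here is a hedged Python sketch using scikit-learn. The tiny document collection, the TF-IDF features, the logistic-regression model, and the uncertainty-based selection rule are all assumptions of mine; the presentation does not prescribe any particular model or selection strategy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

collection = [
    "shred the retention schedule before the audit",   # hypothetical documents
    "please delete all backup tapes tonight",
    "lunch menu for the cafeteria this week",
    "quarterly parking pass renewal form",
    "destroy draft documents after the meeting",
    "company picnic rsvp reminder",
]
labels = {0: 1, 2: 0}   # human coding of a small initial seed set (1 = responsive)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(collection)

for _ in range(2):      # two active-learning rounds
    model = LogisticRegression()
    model.fit(X[sorted(labels)], [labels[i] for i in sorted(labels)])

    # The system picks the unlabeled exemplar it is least certain about ...
    unlabeled = [i for i in range(len(collection)) if i not in labels]
    probs = model.predict_proba(X[unlabeled])[:, 1]
    pick = unlabeled[min(range(len(unlabeled)), key=lambda k: abs(probs[k] - 0.5))]

    # ... and asks the human for a responsiveness call (simulated here).
    labels[pick] = int("delete" in collection[pick] or "destroy" in collection[pick])

# The learning is then applied to the remaining documents in the collection.
final = LogisticRegression().fit(X[sorted(labels)], [labels[i] for i in sorted(labels)])
responsiveness_scores = final.predict_proba(X)[:, 1]
```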

22 Machine Learning Step #1: Achieving High Precision
[Diagram: Document Set for Review. Source: Servient Inc.]

23 Machine Learning Step #2: Improving Recall
[Diagram: Document Set Excluded From Review. Source: Servient Inc.]

24 How Do We Evaluate Technology-Assisted Review?

25 The Text REtrieval Conference (“TREC”): Measuring the Effectiveness of Technology-Assisted Review

- International, interdisciplinary research project sponsored by the National Institute of Standards and Technology (NIST), which is part of the U.S. Department of Commerce.
- Designed to promote research into the science of information retrieval.
- The first TREC conference was held in 1992; the TREC Legal Track began in 2006.
- Designed to evaluate the effectiveness of search technologies in the context of e-discovery.
- Employs hypothetical complaints and requests for production drafted by members of The Sedona Conference®.
- For the first three years (2006-2008), documents were from the publicly available 7 million document tobacco litigation Master Settlement Agreement database.
- Since 2009, publicly available Enron data sets have been used.
- Participating teams of information scientists from around the world and U.S. litigation support service providers have contributed computer runs attempting to identify responsive (or privileged) documents.

26 The TREC Interactive Task

- The Interactive Task was introduced in 2008, and repeated in 2009 and 2010.
- It models a document review for responsiveness.
- It begins with a mock complaint and associated requests for production (“topics”).
- It has a single Topic Authority (“TA”) for each topic.
- Teams may interact with the Topic Authority for up to 10 hours.
- Each team must submit a binary (“responsive” / “unresponsive”) decision for each and every document in the collection for their assigned topic(s).
- It provides for a two-step assessment and adjudication process for the gold standard: where the team and assessor agree on coding, the coding decision is deemed correct; where the team and assessor disagree on coding, appeal is made to the Topic Authority, who determines which coding decision is correct (a sketch of this adjudication rule follows).
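The adjudication rule in the last bullet can be stated very compactly. Below is a minimal Python sketch under naming assumptions of my own; the presentation itself contains no code.

```python
def gold_standard(team_coding, assessor_coding, topic_authority_coding):
    """Two-step adjudication: if the team and the first-pass assessor agree,
    their shared coding stands; if they disagree, the Topic Authority's
    determination becomes the gold standard for that document."""
    if team_coding == assessor_coding:
        return team_coding
    return topic_authority_coding

# Hypothetical document: team says responsive, assessor says unresponsive,
# so the Topic Authority's call decides.
print(gold_standard("responsive", "unresponsive", "responsive"))  # responsive
```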

27 Effectiveness of Technology-Assisted Review at TREC 2009

28 Manual Versus Technology-Assisted Review

29 But!

- Roitblat, Voorhees, and the TREC 2009 Interactive Task all used different datasets, different topics, and different gold standards, so we cannot directly compare them.
- While technology-assisted review appears to be at least as good as manual review, we need to control for these differences.

30 Effectiveness of Manual Versus Technology-Assisted Review

31 So, Technology-Assisted Review is at Least as Effective as Manual Review, But is it More Efficient?

32 Efficiency of Technology-Assisted Versus Exhaustive Manual Review

- Exhaustive manual review involves coding 100% of the documents, while technology-assisted review involves coding of between 0.5% (Topic 203) and 5% (Topic 207) of the documents.
- Therefore, on average, technology-assisted review is 50 times more efficient than exhaustive manual review (coding on the order of 2% of the collection means roughly 1/50th of the documents pass before human eyes; see the short calculation below).
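To spell out the arithmetic behind the “50 times” figure, here is a back-of-the-envelope Python calculation; treating roughly 2% as the average review fraction is an inference from the slide's own numbers, not an independently reported statistic.

```python
def efficiency_gain(fraction_reviewed):
    """How many times less human coding than exhaustive (100%) review."""
    return 1.0 / fraction_reviewed

print(efficiency_gain(0.005))  # Topic 203: 0.5% reviewed -> 200x less coding
print(efficiency_gain(0.05))   # Topic 207: 5% reviewed   ->  20x less coding
print(efficiency_gain(0.02))   # ~2% reviewed on average  ->  50x, the slide's figure
```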

33 Why Are Humans So Lousy at Document Review?

34 Topic 204 (TREC 2009)

Document Request:
All documents or communications that describe, discuss, refer to, report on, or relate to any intentions, plans, efforts, or activities involving the alteration, destruction, retention, lack of retention, deletion, or shredding of documents or other evidence, whether in hard-copy or electronic form.

Topic Authority:
Maura R. Grossman (Wachtell, Lipton, Rosen & Katz)

35 Inarguable Error for Topic 204

36 Interpretation Error for Topic 204

37 Arguable Error for Topic 204

38 Topic 207 (TREC 2009)

Document Request:
All documents or communications that describe, discuss, refer to, report on, or relate to fantasy football, gambling on football, and related activities, including but not limited to, football teams, football players, football games, football statistics, and football performance.

Topic Authority:
K. Krasnow Waterman (LawTechIntersect, LLC)

39 Inarguable Error for Topic 207

40 Interpretation Error for Topic 207

41 Arguable Error for Topic 207

42 Types of Manual Coding Errors

43 Take-Away Messages

- Technology-assisted review finds at least as many responsive documents as exhaustive manual review (meaning that recall is at least as good).
- Technology-assisted review is more accurate than exhaustive manual review (meaning that precision is much better).
- Technology-assisted review is orders of magnitude more efficient than manual review (meaning that it is quicker and cheaper).

44 Measurement is Key

- Not all technology-assisted review (and not all exhaustive manual review) is created equal.
- Measurement is important in selecting and defending an e-discovery strategy.
- Measurement also is critical in discovering better search methods and tools.

45 Additional Resources

- TREC
- TREC Legal Track
- TREC 2008 Overview
- TREC 2009 Overview
- TREC 2010 Overview (forthcoming, April 2011)
- Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review Can be More Effective and More Efficient Than Exhaustive Manual Review, XVII:3 Richmond Journal of Law & Technology (Spring 2011) (in press).

46 Questions? Thank You!