1
Presented by Rebecca Shwayri
2
- Introduction to Predictive Coding and its benefits
- How can records managers use Predictive Coding
- Predictive Coding in action
- Limitations of keyword searches & human review
- Predictive Coding defensibility
3
What is predictive coding? How does it work?
4
- NOT magic
- NOT a cure for cancer
- NOT based on voodoo
5
- Keyword searching
- Concept searching
- E-mail threading
These methods can be useful, but they do not predict the relevance of future documents based on past documents.
6
- An expert (you) develops an understanding of the documents and classifies them
- Not new technology; in common use today (examples: spam filters, Amazon.com recommendations)
- Built on math and statistics
7
- Algorithms build a mathematical model
- Accuracy depends on the quality of the training set (a minimal sketch follows below)
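To make the idea concrete, here is a minimal sketch of building such a model. It assumes scikit-learn as a generic stand-in for whatever engine a review platform actually uses, and the documents and labels are invented for illustration, not taken from any real matter.

```python
# Minimal sketch: a mathematical model built from a coded training set.
# Assumes scikit-learn; commercial predictive coding tools use their own
# (often proprietary) models. The documents and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_docs = [
    "Quarterly revenue was restated after the internal audit.",
    "Lunch menu for the holiday party.",
    "Invoices were backdated to hide the shortfall.",
    "Reminder: the parking garage is closed on Friday.",
]
training_labels = [1, 0, 1, 0]  # 1 = responsive, 0 = non-responsive

# The "mathematical model": TF-IDF text features plus a logistic regression
# classifier. Its accuracy is only as good as the coded training set.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(training_docs, training_labels)

# Score an uncoded document for likely responsiveness.
scores = model.predict_proba(["The contract terms were altered without approval."])
print(scores[0][1])  # probability that the document is responsive
```

With only a handful of training examples the scores are meaningless; the point is simply that the classifier rises or falls with the size and quality of the coded sample.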
8
Predictive Coding in Practice
1. Take a random sample of the collection (typically 2,000-5,000 randomly selected documents)
2. A single person reviews and codes the sample as responsive or non-responsive (one person's time for 15-39 hours)
3. The computer learns and predicts
4. The computer categorizes all remaining documents as responsive or non-responsive
5. Repeat as needed (a rough sketch of this loop follows below)
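Here is a rough, runnable sketch of that loop. It again assumes scikit-learn, uses a tiny synthetic corpus, and replaces the human reviewer with a trivial stand-in function (human_reviewer); none of this reflects any particular vendor's workflow.

```python
# Illustrative sketch of the sample -> code -> learn -> categorize -> repeat
# loop described above. The corpus and the simulated reviewer are hypothetical;
# in practice a single person codes each sampled document by hand.
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

corpus = {f"doc{i}": ("backdated invoice hiding the shortfall" if i % 3 == 0
                      else "holiday party lunch menu")
          for i in range(300)}

def human_reviewer(text):
    # Stand-in for the single human reviewer coding a sampled document.
    return 1 if "invoice" in text else 0  # 1 = responsive, 0 = non-responsive

coded = {}
rng = random.Random(0)
for round_number in range(1, 4):  # "repeat as needed"
    # 1. Draw a random sample and have one person review and code it.
    uncoded = [doc_id for doc_id in corpus if doc_id not in coded]
    sample = rng.sample(uncoded, k=min(50, len(uncoded)))
    coded.update({doc_id: human_reviewer(corpus[doc_id]) for doc_id in sample})

    # 2. The computer learns from everything coded so far...
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit([corpus[doc_id] for doc_id in coded], list(coded.values()))

    # 3. ...and categorizes all remaining documents.
    remaining = [doc_id for doc_id in corpus if doc_id not in coded]
    predicted = model.predict([corpus[doc_id] for doc_id in remaining])
    responsive_count = int(predicted.sum())

    # 4. In a real review you would stop once precision and recall on a
    #    control sample reach the agreed-to rates and predictions stabilize.
    print(f"round {round_number}: {responsive_count} of {len(remaining)} "
          f"remaining documents predicted responsive")
```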
9
- Dramatic reduction in e-discovery costs
- More accurate than human review and keyword search
- Orders of magnitude faster than human review and keyword search
10
- Fact-driven, not fear-driven, settlements
- Learn the facts of the case in a few days rather than over months or years with traditional methods of review
- Helps avoid litigation: uncovers the facts more quickly
- Use as an information governance tool
11
Method             Recall Ratio    Cost             Speed
Keywords           20 percent      High $$$         Slow; misses content
Human Review       60 percent      Very High $$$$   60 docs/hr
Predictive Coding  75-98 percent   Low $            80-250x faster
12
- Information governance tool (proactive)
- Litigation tool (reactive)
13
Encompasses a variety of disciplines:
- Records Management
- Knowledge Management
- Information Security and Privacy
14
- Data breach risks
- E-discovery costs
- Unable to locate documents needed for the business units
15
- Standardized IG policies
- Reduce the need to review every single document to determine its importance to the company
- Locate data within the company's IT infrastructure and categorize it appropriately for the business units
- Locate data that needs to be destroyed
16
Example: Company is sued in a dispute involving fraud and breach of contract
- Custodians: 20 potential custodians with an average e-mail box of 40 GB each (800 GB of e-mail data in total)
- Other electronic files: 200 GB
- Total data: 1 terabyte
17
- Company is served with a Request for Production of Documents by Plaintiffs' Counsel
- Plaintiffs' Counsel demands searching through the ESI of the custodians
- Plaintiffs' Counsel makes a broad demand for accounting records
18
What do you do?
- Keyword search 1 TB of data? How do you keyword search "fraud"? Information disadvantage!
- Human review? It will take many months and millions of dollars to review 1 TB of data!
19
Use predictive coding. Should you disclose?
- One school of thought suggests disclosing the use of predictive coding to opposing counsel and agreeing to precision and recall rates (full agreement and full disclosure)
- The other school of thought suggests making no disclosures (to avoid litigation associated with the use of predictive coding)
20
Recall (Completeness): Recall measures how successful the system was in finding all of the responsive documents. If 1,000 documents in the full set were actually responsive but the system marked only 750 of them responsive, the recall would be 75 percent.
Precision (Accuracy): Precision measures how often the documents marked responsive were actually responsive. If the system marked 10 documents responsive and only six of them were actually responsive, the precision would be 60 percent. (A short numeric check follows below.)
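As a quick check on those two worked examples, here is the arithmetic in a few lines of Python; the counts are taken directly from the slide.

```python
# Recall and precision, using the counts from the examples above.
def recall(responsive_found, responsive_total):
    # Fraction of all truly responsive documents that the system found.
    return responsive_found / responsive_total

def precision(marked_and_responsive, marked_total):
    # Fraction of documents marked responsive that really were responsive.
    return marked_and_responsive / marked_total

print(recall(750, 1000))  # 0.75 -> 75 percent recall
print(precision(6, 10))   # 0.6  -> 60 percent precision
```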
21
- Depends on collection "richness"
- 2-5 days, one person (and one only!)
- 500-5,000 documents reviewed
- Stop when the system exhibits: high rates of precision and recall (above the agreed-to rates), no new topics left to teach the computer about, and consistent predictions
22
It is like exit polling. Statistical truth: a sample of a certain size yields a certain level of confidence and a certain margin of error. 400 randomly selected documents provide a 95% confidence level in the estimate of predictive coding accuracy, with a ±5% margin of error (see the sample-size check below). Reference: Cochran, W.G. 1977. Sampling Techniques, 3rd ed. John Wiley & Sons, New York, NY, USA.
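The roughly 400-document figure can be sanity-checked with the standard sample-size formula for a proportion from Cochran (1977); using the worst-case proportion p = 0.5 is my assumption, not something stated on the slide.

```python
# Cochran's (1977) sample-size formula for estimating a proportion:
#   n0 = z^2 * p * (1 - p) / e^2
z = 1.96   # z-score for a 95% confidence level
p = 0.5    # assumed worst-case (most conservative) proportion
e = 0.05   # desired margin of error (+/- 5%)

n0 = (z ** 2) * p * (1 - p) / (e ** 2)
print(round(n0))  # 384 -> in line with the slide's "400 randomly selected docs"
```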
23
- When you are out of time
- If you want to save money
- Consider using CAR (computer-assisted review) for cases involving 5 GB or more of data
- Predictive coding makes sense when you have 20,000 documents or more
24
Judge Facciola (D.D.C.): "If you are practicing e-discovery without a clawback, you are committing malpractice."
In a clawback agreement, the parties agree in writing that inadvertent production of privileged material does not automatically constitute a waiver.
25
What if the other side won’t agree to the clawback agreement? Go to the Court! Rajala v. McGuire Woods, 2010 WL 294582 (D. Kan. July 22, 2010): Court issued clawback order with no need to show reasonable efforts
26
- Consider a clawback agreement during the "meet and confer" conference
- Embody the agreement in a court order (Rule 502(d))
27
- Predictive coding should be used to cull the data set down to a manageable level
- Privilege review should occur AFTER predictive coding
- Attorneys should conduct the privilege review and decide what is privileged: do not put this on auto-pilot
28
Why Linear Review is Ineffective
Linear Review Compared to Other Methods
29
Limitations of Keywords
- Catches only 20 percent of relevant evidence; therefore, misses 80 percent
- The "Google" phenomenon
30
Problems With Keywords
- Failure of imagination (example: Nasdaq versus Stock Market)
- How many synonyms are there for the word "think"?
- Precise terms of art
- Misspellings (example: Mangment, Mangemnt…)
31
- A human problem: people express concepts differently
- Difficulties in learning to adopt another party's language style
- TREC (the Text Retrieval Conference) ran a competitive evaluation that showed keyword searches failing badly
32
- Human keyword-based review is expensive
- It is slow and inaccurate
- It unnecessarily complicates a simple process
- It is widely used because, until now, there were no alternatives
- Predictive coding, when "done right," can save a corporation 80-90% of review costs
33
TREC Legal Track Study 2009
- Keyword searches missed 96 percent of relevant documents (recall ratio averaged less than 4 percent)
34
TREC Legal Track Study 2010
- 97 percent of relevant documents were not found: only a 3 percent recall ratio (76,373 relevant documents not discovered)
- Boolean searches reduced the initial corpus from 685,592 documents to 2,715
- 87 percent precision ratio (2,362 of the 2,715 documents were relevant)
35
Blair and Maron Study
- Involved a San Francisco Bay Area Rapid Transit accident
- The discovery database contained 40,000 documents and 350,000 pages
- The attorneys believed keyword searches had uncovered 75 percent of relevant documents
- In reality, only 20 percent of relevant documents were uncovered
36
The "Gold" Standard
- Human eyeballs on every document
- Judge Peck: the "gold" standard does not have any gold
- Human assessors disagree on the relevance of a document to a single topic
37
- TREC conclusion: 65% recall and 65% precision is the best retrieval effectiveness human reviewers achieve
- Human eyeballs on every document is not working
- Reviewers disagree as often as 50 percent of the time
38
- Monique Da Silva Moore v. Publicis Groupe & MSL Group (S.D.N.Y.) (endorsed the use of predictive coding). The protocol was complicated and confusing – do not use it. Defendants offered plaintiffs everything they wanted, but the protocol was so confusing the plaintiffs could not see they had gotten everything they asked for, so they went after the judge.
- Global Aerospace, Inc. v. Landow Aviation Limited Partnership (Circuit Court of Loudoun County, Virginia) (authorized the use of predictive coding over objection). Nothing in the news, because there was no controversy – everything worked!
39
- Expensive: in the Kleen case, it took 1,400 attorney hours to determine search terms, the plaintiff was not satisfied, and neither side knew the overall effectiveness of the terms
- Not effective: over- or under-produces; known to be very problematic
- The "ostrich approach" is no longer advisable – the technology has evolved
- Judges know it exists; plaintiffs know it exists and ask for it
40
EORHB, Inc., et al. v. HOA Holdings, LLC (Delaware Chancery Court): the court, sua sponte, ordered the parties to use predictive coding and to use the same vendor. The judge may have overstepped.
41
- Technology is your friend
- Make data-driven decisions
- We are living in the "Moneyball" age
- If you are unsure, please ask – this is not going away
42
For more information, contact Rebecca Shwayri
Email: rebecca.shwayri@akerman.com
Tel: (813) 209-5029