Analysis 360: Blurring the line between EDA and PC Andrea Gibson, Product Director, Kroll Ontrack March 27, 2014.


1 Analysis 360: Blurring the line between EDA and PC Andrea Gibson, Product Director, Kroll Ontrack March 27, 2014

2 Discussion Overview
• Pushing the Boundaries of Early Data Analysis (EDA)
• Examining Traditional EDA Tools
• Leveraging Predictive Coding (PC) for Analysis
• Using PC in an EDA Environment

3 Pushing the Boundaries of EDA

4 EDA | an acronym worth defining
• Early Data Analysis (EDA) aids fact-finding and narrows the data scope by helping attorneys understand their datasets
  » Triage data into critical and non-critical groupings
  » Identify and reduce the number of key players
  » Test search terms
  » Identify critical case arguments
  » Categorize documents as efficiently as possible for production
• A true methodology: technology fuels human decisions

5 Traditional EDA | Overview
Workflow stages (diagram): Identify, Collect & Process; Analysis; Export to Review Platform; Import & Perform Early Analysis; Document Review
Tasks shown in the diagram:
  » Filter
  » Search
  » Cluster
  » Processing
  » Ensure portability of groups and tags
  » Ensure production/search capabilities of review platform
  » Search
  » Tag
  » Redact
  » Log
  » Route
  » Report
  » Test
  » QC

6 Where does Predictive Coding fit in?
Workflow stages (diagram): Identify, Collect & Process; Analysis; Export to Review Platform; Import & Perform Analysis; Document Review
Diagram callout: Predictive Coding!
Tasks shown in the diagram:
  » Filter
  » Search
  » Cluster
  » Processing
  » Ensure portability of groups and tags
  » Ensure production/search capabilities of review platform
  » Test
  » QC
  » Search
  » Tag
  » Redact
  » Log
  » Route
  » Report

7 Traditional EDA | How efficient is it?
Workflow stages (diagram): Identify, Collect & Process; Analysis; Export to Review Platform; Import & Perform Analysis; Review
Diagram callout: Predictive Coding!
Tasks shown in the diagram:
  » Filter
  » Search
  » Cluster
  » Ensure portability of groups and tags
  » Ensure production/search capabilities of review platform
  » Search
  » Tag
  » Redact
  » Log
  » Route
  » Report
  » Test
  » QC
The Bermuda Triangle of ediscovery
  » PC is massively underused
  » The tools used during analysis and review overlap substantially
  » Pointless inefficiencies are created by jockeying data between two standalone platforms

8 EDA + Review | Could it look like this?
Workflow stages (diagram): Identify, Collect & Process; Analyze and Review
Tasks shown in the diagram:
  » Process
  » PC
  » Filter
  » Search
  » Cluster
  » Test
  » QC
  » Route
  » Report
  » Tag

9 Examining Traditional EDA Tools

10 Keyword Search & Concept Search
Keyword Search
  » Uses search terms and Boolean operators (&, or, not) to retrieve documents that contain those exact terms
  » Standard practice
  » Generally accepted in the courts
  Example: “baseball & field”
Concept Search
  » Technology alternative
  » Allows reviewers to find documents with similar conceptual terms even if they do not contain exact search terms
  » Seldom used for filtering; increasingly used for review
  Example: “baseball” → diamond, MLB, hit, out
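
As a rough illustration of the keyword side of this slide, the sketch below evaluates a Boolean query such as “baseball & field” against a few made-up documents. The function names, the `all_of` / `any_of` / `none_of` parameters, and the sample texts are all assumptions for illustration; they do not describe any particular review platform.

```python
import re

def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_match(text, all_of=(), any_of=(), none_of=()):
    """Exact-term Boolean match: AND over all_of, OR over any_of, NOT over none_of."""
    tokens = tokenize(text)
    if any(term not in tokens for term in all_of):
        return False
    if any_of and not any(term in tokens for term in any_of):
        return False
    if any(term in tokens for term in none_of):
        return False
    return True

# Hypothetical documents, invented for this example.
docs = {
    "doc1": "The baseball field was flooded after the storm.",
    "doc2": "He hit a double at the diamond last night.",
    "doc3": "Baseball cards are kept in the vault.",
}

# Equivalent of the query "baseball & field".
hits = [doc_id for doc_id, text in docs.items()
        if keyword_match(text, all_of=("baseball", "field"))]
print(hits)
```

Note that `doc2` is arguably about baseball but contains neither exact term, so the keyword query misses it. That gap is what concept search is meant to cover.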

11 Topic Grouping & Language Identification
Topic Grouping
  » Documents automatically grouped by theme without human input (example theme in figure: Finance)
Language Identification
  » Identify all languages in a document
  » Used to group and sort documents for review by multilingual reviewers

12 Email Threading & Near Deduplication
Email Threading
  » Identifies and groups e-mail conversations based on content
  Figure labels: Start-Point Email, RE:, FWD:, End-Point Email
Near Deduplication
  » Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates
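
The slide does not say how near-duplicates are detected. One common approach, assumed here purely for illustration, is to compare documents by the Jaccard similarity of their overlapping word sequences ("shingles"). The function names, the shingle size, and the sample texts are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in a text (the whole text if shorter than k)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets: |intersection| / |union|."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

# Hypothetical documents, invented for this example.
original = "please review the attached contract before friday"
revised = "please review the attached contract before monday"
unrelated = "lunch menu for the office party"

print(round(jaccard(original, revised), 3))   # high overlap: near-duplicate candidates
print(jaccard(original, unrelated))           # no shared shingles
```

In practice a threshold on this score (chosen per matter) would decide which documents are grouped together for side-by-side comparison.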

13 Finding a Common Thread
• At their cores, these tools help attorneys learn more about their data
  » Does PC fit the bill?
Figure labels: Analytical Tools (Topic Group, Key Word Search, Language ID, Dedupe, Email Threading, Concept Search); Predictive Coding

14 Leveraging PC for Analysis

15 Predictive Coding for Production

16 Predictive Coding For Analysis
• PC has been praised for its ability to reduce the number of documents manually reviewed during first pass
• But at least three critical components of PC empower attorneys with unrivaled knowledge about their case:
  » Prioritization
  » Categorization
  » Active Learning

17 The Prioritization Component
• Learns from reviewer decisions and escalates documents based on two binary categories
  » Responsive or nonresponsive
  » Works based on modest amount of learning
• Increases the ratio of responsive documents that get routed to reviewers
Figure labels: 74,000; 480,000; Responsive; Non-responsive

18 The Prioritization Component
• How does this help attorneys analyze their case?
  » When attorneys ‘check out’ documents to review, they see the documents most likely to be responsive
  » For the same reasons this speeds up production, attorneys who put eyes on these richly relevant documents will know more about their case earlier, driving arguments and filling knowledge gaps
  » It runs in the background; you don’t need to carve into billable hours to test keywords
Figure labels: Request batch; Entire Corpus
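
The 'check out' behavior described above can be sketched as a ranked queue: given a predicted responsiveness score per document, the next review batch is simply the highest-scoring documents not yet reviewed. The scores, document IDs, and the `next_batch` function below are hypothetical; how a real PC system produces its scores is not covered in the slides.

```python
def next_batch(scores, reviewed, batch_size):
    """Return the unreviewed document IDs with the highest predicted scores."""
    candidates = [doc for doc in scores if doc not in reviewed]
    candidates.sort(key=lambda doc: scores[doc], reverse=True)
    return candidates[:batch_size]

# Hypothetical predicted responsiveness scores (0.0 to 1.0), invented for this example.
scores = {"A": 0.91, "B": 0.15, "C": 0.72, "D": 0.40, "E": 0.88}

# Document A has already been reviewed, so the batch starts with the next most likely.
batch = next_batch(scores, reviewed={"A"}, batch_size=2)
print(batch)
```

Because scores are recomputed as reviewers code more documents, each new batch would reflect the latest training rather than a fixed ordering.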

19 The Categorization Component
• Learns from trainer decisions and suggests coding on multiple categories for an entire collection of documents
• Assigns a predicted responsiveness score
• Improves speed and quality of categorization decisions
Figure labels: Responsive, Non-responsive, Privileged; 75% Predicted, 67% Predicted, 89% Predicted

20
• How does this help attorneys analyze their case?
  » Allows attorneys to segregate data at user-defined predicted responsiveness ratings after modest training
  » Empowers attorneys to route certain categories of documents (e.g. “hot” docs) to certain sub-groups within the team
Figure: Post Round One Categorization Results (65% cutoff); axis from 0% to 100%, % likelihood to be responsive; 1,427 docs; 9,522 docs
Figure callout: To: Brief-writer Bryan, Re: Good Luck on the first draft!
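
Segregating data at a user-defined cutoff, as in the 65% example on this slide, amounts to partitioning documents by predicted score. The sketch below assumes scores are already available; the function name, document IDs, and score values are made up, and the counts in the figure are not reproduced here.

```python
def split_at_cutoff(scores, cutoff=0.65):
    """Partition document IDs into those at or above the cutoff and those below it."""
    above = sorted(doc for doc, score in scores.items() if score >= cutoff)
    below = sorted(doc for doc, score in scores.items() if score < cutoff)
    return above, below

# Hypothetical predicted likelihoods of responsiveness, invented for this example.
scores = {"A": 0.92, "B": 0.65, "C": 0.64, "D": 0.10}

likely_responsive, likely_nonresponsive = split_at_cutoff(scores, cutoff=0.65)
print(likely_responsive)      # candidates to route to a sub-group, e.g. the brief writer
print(likely_nonresponsive)
```

Whether a document exactly at the cutoff counts as above or below it is a design choice; this sketch treats it as above.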

21 The Active Learning Component
• Key component of any true PC solution
  » Automatically escalates focus documents for training (as opposed to just handpicked, or just randomly selected training documents)
• Focus Documents:
  » Come from grey areas in the classifier because the machine is currently uncertain whether they are responsive or not responsive
  » Ideal candidates to improve machine learning
  » Not random, but queried
Figure: scale from 100% responsive down to 0% non-responsive in 10% steps
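
Selecting focus documents from the classifier's grey area resembles what the machine learning literature calls uncertainty sampling: pick the documents whose predicted score is closest to 50%. The slides do not state the exact selection rule, so the rule below, the `focus_documents` name, and the scores are assumptions for illustration.

```python
def focus_documents(scores, n):
    """Return the n document IDs whose predicted score is closest to 0.5 (most uncertain)."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:n]

# Hypothetical predicted responsiveness scores, invented for this example.
scores = {"A": 0.95, "B": 0.52, "C": 0.10, "D": 0.45, "E": 0.70}

# B (52%) and D (45%) sit nearest the grey area, so they are queried for training first.
print(focus_documents(scores, n=2))
```

Documents near 0% or 100% are skipped because a human decision on them would tell the classifier little it does not already predict with confidence.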

22 The Active Learning Component
• How does this help attorneys analyze their case?
  » Introduces attorneys to the documents on the fringe of relevancy
    – These could be case-changing documents that the machine just doesn’t know enough about yet
  » Most effective way to boost metrics and improve results between early training rounds
    – Reduces false positives; improves accuracy of machine’s concept of relevancy
Figure labels: Precision and Recall for training rounds TR 1 and TR 2

23 Additional Efficiencies
• Production
  » Can easily transition into production whether leveraging PC, or not
    – Most practical form of PC for EDA
• Reporting
  » Even if just one or two training rounds are performed, metrics will show where you stand
    – In this vein, no other EDA tool comes close to PC’s automatic reporting
    – There’s a reason courts often ask for recall and precision: these indicate whether your understanding of the data set is accurate
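
For readers less familiar with the two metrics named above: precision is the share of documents predicted responsive that truly are responsive, and recall is the share of truly responsive documents that were predicted responsive. A minimal sketch with made-up document IDs follows; the function name and sample sets are assumptions.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from sets of predicted and truly responsive doc IDs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical sample: the machine flags four documents; three are truly responsive.
predicted_responsive = {"A", "B", "C", "D"}
truly_responsive = {"A", "B", "E"}

precision, recall = precision_recall(predicted_responsive, truly_responsive)
print(precision)          # 2 of 4 flagged documents are responsive
print(round(recall, 3))   # 2 of 3 responsive documents were found
```

In a real matter the "truly responsive" set is not known for the whole corpus, so these figures are estimated from a reviewed sample.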

24 Additional Efficiencies
• Other ECA tools complement predictive coding
  » Predictive coding requires reviewing a few thousand documents in training
    – Most PC solutions also come equipped with all other EDA tools available
    – This helps you navigate the document set during training as well as during review
• Intra-team quality control
  » Can compare reviewer-machine agreement rates side-by-side
  » Identify points of disagreement and inconsistency
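
The intra-team quality control idea can be sketched as a per-reviewer comparison between human coding calls and the machine's predicted calls. The data layout, reviewer names, and the `agreement_report` function below are hypothetical; they only illustrate how agreement rates and points of disagreement might be tabulated.

```python
def agreement_report(reviewer_calls, machine_calls):
    """For each reviewer, compute the agreement rate with the machine and list disagreements."""
    report = {}
    for reviewer, calls in reviewer_calls.items():
        # Only compare documents that both the reviewer and the machine have coded.
        shared = [doc for doc in calls if doc in machine_calls]
        disagreements = sorted(doc for doc in shared if calls[doc] != machine_calls[doc])
        rate = 1 - len(disagreements) / len(shared) if shared else None
        report[reviewer] = {"agreement_rate": rate, "disagreements": disagreements}
    return report

# Hypothetical coding calls (True = responsive), invented for this example.
machine_calls = {"A": True, "B": False, "C": True, "D": False}
reviewer_calls = {
    "reviewer_1": {"A": True, "B": False, "C": True, "D": False},
    "reviewer_2": {"A": True, "B": True, "C": False, "D": False},
}

report = agreement_report(reviewer_calls, machine_calls)
for reviewer, stats in report.items():
    print(reviewer, stats)
```

A low agreement rate does not by itself show who is wrong; the listed disagreements are the documents worth a second look.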

25 Additional Efficiencies
• The small case conundrum
  » The analytical value from PC is greater where the subject-matter expert who trains the system is also the attorney who forms case strategy
    – This is most likely true in small-medium cases where one attorney may be in charge of a case through trial
  » The production value from using PC to aid review is greater where high upfront costs can be recouped by applying the machine’s logic to a large number of documents
    – Traditionally, this has been true only in large cases

26 Additional Efficiencies
• This is all changing
• The “portfolio approach” to ediscovery
  » Pay yearly for PC (and everything that preceded it) in all your cases for a data hosting fee (process on the vendor’s side)
    – Upload on day one, train on day one, see a list of documents ranked by relevancy on day one

27 Using PC in an EDA Environment

28 Overview
• It’s not that crazy
  » EDA tools let you learn more about your data; so does PC
  » Many of the tools discussed today (e.g. de-duplication, concept searching) already exist in standalone “PC solutions”
• Aggressive culling via keywords can have an impact on training in PC
• Any search strategy must be well designed according to the matter at hand
• The producing party has substantial deference in conducting its search

29 Pre-PC Keyword Cull?
• In re Biomet
  » Defendant’s search strategy (diagram): 19.5 million documents → Keyword → 3 million documents → PC → Production
  » Plaintiffs argued: the defendant should have used PC on the whole 19.5 million document corpus; the keywords tainted the training. We want joint review of training docs.
  » Court held: defendant’s search was reasonable

30 Parting Thoughts
• There are many ways to learn about data
  » Different tools on the same belt; multi-modal search
• Solutions are emerging that offer all of these tools in one location
  » No more data jockeying
  » More information for better decisions
• Quality control is essential whenever you use one of these tools to remove documents from production
