Background: The Big Data Era
Companies Generate Data at Staggering Rates
  Global email traffic creates 183 billion new messages every day.
  ExxonMobil’s employees create 5.2 million new messages every day.
  Corporate data grows at an annual rate of 30%.
  By 2020, 26 billion devices will be connected to the internet (more than three devices for every person on the planet).
Case Study 1 Reducing Data Volumes in Cross-Border Litigations and Investigations
Analytics in cross-border matters
The Client: A Taiwanese Fortune 500 manufacturer facing a global corruption investigation and represented by US-based counsel.
The Challenge: Approximately 8.5 million documents in various languages requiring analysis and review under very tight deadlines.
The Solution:
  Traditional Culling: 8,462,117 documents reduced to 1,651,094 through deduplication and the application of search terms and date filters (80.5% cull rate).
  Advanced Analytics Culling: Using “concept clustering” and “find more like this” functionality, an additional 1,387,350 documents were removed from the review set, leaving only 263,744 to be reviewed (96.4% total cull rate).
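The traditional culling step (deduplication, search terms, date filters) can be illustrated with a minimal Python sketch. The document fields, search terms, dates and the `cull` helper below are hypothetical and are not taken from the matter; exact-hash deduplication is assumed, whereas a real eDiscovery platform may use more elaborate methods.

```python
import hashlib
from datetime import date


def cull(documents, search_terms, start, end):
    """Drop exact duplicates, then keep documents that fall inside the
    date range and contain at least one search term."""
    seen_hashes = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of an earlier document
        seen_hashes.add(digest)
        if not (start <= doc["date"] <= end):
            continue  # outside the relevant date range
        text = doc["text"].lower()
        if any(term.lower() in text for term in search_terms):
            kept.append(doc)
    return kept


# Hypothetical sample data, for illustration only.
sample_docs = [
    {"id": 1, "date": date(2012, 3, 1), "text": "Payment to the agent was approved."},
    {"id": 2, "date": date(2012, 3, 1), "text": "Payment to the agent was approved."},
    {"id": 3, "date": date(2009, 1, 5), "text": "Payment schedule attached."},
    {"id": 4, "date": date(2012, 6, 9), "text": "Lunch on Friday?"},
]
kept_docs = cull(sample_docs, ["payment", "commission"],
                 date(2011, 1, 1), date(2013, 12, 31))
cull_rate = 1 - len(kept_docs) / len(sample_docs)
print(f"kept {len(kept_docs)} of {len(sample_docs)} documents ({cull_rate:.1%} culled)")
```

In this toy set, one document is a duplicate, one is out of range and one hits no search term, so a single document remains for review.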
Analytics in cross-border matters
The Results:
  Volume Reduction: 8,462,117 initial documents reduced to only 263,744 over the course of two days (a total cull rate of 96.4%).
  Cost Savings: $1,165,374 in attorney fees saved (at an average review rate of 50 documents/hour, charged at $42/hour for contract review attorneys).
  Time Savings: 27,747 hours of attorney review saved (at an average review rate of 50 documents/hour).
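The savings figures follow from the document counts and review rates quoted above. The short Python sketch below reproduces the arithmetic; the variable names are illustrative. It assumes, as the stated numbers suggest, that the time and cost savings are calculated on the 1,387,350 documents removed by advanced analytics. Computed directly from the stated counts, the total cull rate comes out at roughly 96.9%, slightly above the 96.4% quoted on the slide.

```python
# Figures as stated in the case study.
initial_docs = 8_462_117
after_traditional = 1_651_094
removed_by_analytics = 1_387_350
docs_per_hour = 50   # average contract-attorney review rate
rate_per_hour = 42   # USD per hour

remaining = after_traditional - removed_by_analytics
traditional_cull_rate = 1 - after_traditional / initial_docs
total_cull_rate = 1 - remaining / initial_docs

# Savings attributed to the documents removed by advanced analytics.
hours_saved = removed_by_analytics / docs_per_hour
fees_saved = hours_saved * rate_per_hour

print(f"remaining for review:  {remaining:,}")
print(f"traditional cull rate: {traditional_cull_rate:.1%}")
print(f"total cull rate:       {total_cull_rate:.1%}")
print(f"hours saved:           {hours_saved:,.0f}")
print(f"fees saved:            ${fees_saved:,.0f}")
```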
Case Study 2 Establishing Personal Jurisdiction Through Analytics
Establishing Personal Jurisdiction Through Analytics
The Client: Global Plaintiffs’ Litigation Boutique
The Case: Large class action against dozens of international banks alleged to have participated in a global rate-fixing scandal.
The Challenge: At the outset of the case, the defendants produced over 1.5 million documents. Shortly thereafter, certain defendants moved to dismiss for lack of personal jurisdiction. Plaintiffs had only 60 days to oppose the motion, which was not enough time to review the documents and identify contacts between the foreign defendants and the United States.
The Solution: TransPerfect searched all 1.5 million documents for any communications (emails, text messages, Bloomberg chats, audio recordings, IMs, etc.) involving US-based custodians (based on a list provided by outside counsel). The search returned 324,830 documents, which were provided to Praescient to visually “map” the communications based on IP addresses, phone numbers and addresses in signature blocks, in order to identify all communications between the US and the foreign defendants.
Establishing Personal Jurisdiction Through Analytics
The Analytic Process: Raw case data was ingested into cutting-edge link-analysis software, facilitating more robust and efficient analysis of the data. Specifically, large data sets were broken into smaller networks of associates and communications, where specific actions (emails, phone calls, etc.) could be identified and investigated based on temporal and geographic indicators.
The Results: Of the 324,830 documents sent to Praescient, 76,800 were identified as containing communications between the US and the foreign defendants. Outside counsel was provided with analytic insights and a graphical representation of the communication flow of those documents, which were also loaded onto an online review platform for review.
[Figure: communication map with nodes labeled “U.S. Based” and “UK Based”]
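The idea of mapping communications and isolating those that cross the US border can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the addresses, the `locations` lookup and the `cross_border_links` helper are hypothetical, and the link-analysis software used in the matter is not described here. In practice the location of each participant would be derived from indicators such as IP addresses, phone numbers and signature blocks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lookup from participant to location.
locations = {
    "alice@us.example": "US",
    "bob@uk.example": "UK",
    "carol@uk.example": "UK",
    "dan@us.example": "US",
}

# Hypothetical communications (emails, chats, calls) with their participants.
messages = [
    {"id": "m1", "participants": ["alice@us.example", "bob@uk.example"]},
    {"id": "m2", "participants": ["bob@uk.example", "carol@uk.example"]},
    {"id": "m3", "participants": ["alice@us.example", "dan@us.example"]},
    {"id": "m4", "participants": ["dan@us.example", "carol@uk.example",
                                  "alice@us.example"]},
]


def cross_border_links(messages, locations, home="US"):
    """Return the IDs of messages linking a home-country participant to a
    foreign one, plus a count of how often each such pair communicates."""
    flagged = []
    links = Counter()
    for msg in messages:
        # Ignore participants whose location is unknown.
        people = sorted(p for p in set(msg["participants"]) if p in locations)
        pairs = [
            (a, b)
            for a, b in combinations(people, 2)
            if (locations[a] == home) != (locations[b] == home)
        ]
        if pairs:
            flagged.append(msg["id"])
            links.update(pairs)
    return flagged, links


flagged_ids, link_counts = cross_border_links(messages, locations)
print(flagged_ids)
for (a, b), n in link_counts.most_common():
    print(f"{a} <-> {b}: {n}")
```

The pair counts form the weighted edges of a communication graph, which is the kind of structure a link-analysis tool would render visually.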
Establishing Personal Jurisdiction Through Analytics
Outcome: Ultimately, the defendants’ motion was denied. The analysis produced better investigative insights in less time, reducing hundreds of thousands of documents, emails, and internal chats to a consolidated list of “high-value” incidents and actors.
Case Study 3 Reusing Assets to Save Time and Money
Reusing Assets to Save Time and Money
What is it?
  Central datastore for eDiscovery metadata across related matters for a large financial institution.
  Maintain prior privilege and responsiveness calls.
  Generate custom metadata properties.
Analyze Document-level Metadata Across Matters
  Evaluate privilege screens
  Identify inconsistent calls for reconciliation
Dense Document Analysis
  Identifying frequently occurring documents
Workflow Modeling and Management
  Inform review workflow based on prior reviews and document properties
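One use of such a datastore, identifying inconsistent calls for reconciliation, might look like the following Python sketch. The row layout (`matter`, `doc_hash`, `privilege`) and the `inconsistent_calls` helper are assumptions made for illustration; the actual datastore schema is not described in the source.

```python
from collections import defaultdict

# Hypothetical coding rows: the same document (identified by its hash)
# may have been coded in several related matters.
coding_rows = [
    {"matter": "A", "doc_hash": "h1", "privilege": "privileged"},
    {"matter": "B", "doc_hash": "h1", "privilege": "not privileged"},
    {"matter": "A", "doc_hash": "h2", "privilege": "not privileged"},
    {"matter": "C", "doc_hash": "h2", "privilege": "not privileged"},
    {"matter": "B", "doc_hash": "h3", "privilege": "privileged"},
]


def inconsistent_calls(rows, field):
    """Return documents whose calls for `field` differ across matters."""
    values = defaultdict(set)
    for row in rows:
        values[row["doc_hash"]].add(row[field])
    return {h: sorted(v) for h, v in values.items() if len(v) > 1}


conflicts = inconsistent_calls(coding_rows, "privilege")
print(conflicts)
```

Only documents with conflicting calls are returned, so reviewers can reconcile those without re-reviewing the consistently coded remainder.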
Reusing Assets to Save Time and Money
View Discovery Calls Across Matters
  Over 1.4 million documents had been previously coded (3.5 million coding decisions) in 8 related litigations
  82% non-responsive; 18% responsive
  76% non-privileged; 14% privileged
  Bank Examination Privilege, Non-Public Personal Information and Hot
Use Prior Coding to Alter Workflow in Current Litigation
  Reduce Tier 2 review by law firm
  Identified ~10,000 documents previously coded “responsive/non-privileged” in Tier 2 review that hit new search terms and were tagged “responsive” by contract attorneys in the current review
  Produced without Tier 2 review, except “Hot” documents
  Over $75,000 in savings
Reusing Assets to Save Time and Money
Enhance Privilege Screen/Quality Control
  Tested privilege search term list against previously Tier 2 coded privileged documents (~80,000 privileged documents)
  Very high recall by email family (over 98%)
  Low precision (less than 20%)
  Eliminating low-precision terms had little impact on recall or precision
  Modifying terms had little impact on recall or precision
  Worst offenders: “Legal”, “Privilege”, “Confidential”, “Lawyer” and “Counsel”
Conclusion
  Search terms are not an efficient way to identify privileged documents, resulting in a costly review process
  Search terms can be designed with high recall for QC to reduce risk, but with high review costs
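Recall and precision for a privilege screen can be computed as in the minimal Python sketch below. The documents, terms and helper names are hypothetical, and the calculation is at document level for simplicity, whereas the results above were measured by email family. Recall is the share of truly privileged documents that the terms hit; precision is the share of hits that are truly privileged.

```python
def term_hits(docs, terms):
    """Return the IDs of documents containing at least one search term."""
    lowered = [t.lower() for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in lowered)}


def precision_recall(hits, privileged):
    """Precision and recall of a hit set against known privileged IDs."""
    true_positives = len(hits & privileged)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(privileged) if privileged else 0.0
    return precision, recall


# Hypothetical documents and prior Tier 2 privilege calls.
docs = {
    1: "Please send to legal for review",
    2: "Advice of counsel attached, privileged and confidential",
    3: "Legal holiday schedule",
    4: "Confidential pricing sheet",
    5: "Lunch menu",
}
privileged_ids = {1, 2}

hits = term_hits(docs, ["legal", "counsel", "confidential"])
precision, recall = precision_recall(hits, privileged_ids)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

The toy data shows the same pattern in miniature: broad terms such as “legal” and “confidential” catch every privileged document but also pull in many that are not privileged.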
Reusing Assets to Save Time and Money
Additional Workflow Alterations
  Identified ~20,000 privileged documents by Tier 1 review
  Fewer than 2,000 previously tagged as privileged by Tier 2
  Developing additional analytics to help identify privileged and non-privileged documents using metadata
For future cases, use data analytics from earlier cases
  Eliminate Dense Documents from Tier 1 review
  Very high rate of non-responsiveness (more than 98%)
  Sampling possible to confirm
  Use analytics (metadata) to help identify other highly likely non-responsive documents and eliminate them from Tier 1 review
  Use analytics (metadata) to better identify (with high precision) privileged, bank examination privileged and hot documents
  For this financial institution’s business line, use for other cases and purposes (e.g., compliance, information governance)
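A rough Python sketch of how dense (frequently occurring) documents might be identified and checked against prior responsiveness calls is given below. The data, the threshold and the `dense_documents` helper are hypothetical; an appropriate threshold would depend on the matter, and sampling would still be advisable before excluding anything from review.

```python
from collections import Counter

# Hypothetical prior calls: (document hash, was it coded responsive?)
prior_calls = (
    [("newsletter", False)] * 4
    + [("memo", True), ("memo", False)]
    + [("invoice", False)]
)


def dense_documents(calls, min_occurrences):
    """Return the non-responsive rate of each document that occurs at
    least `min_occurrences` times in the prior calls."""
    counts = Counter(doc_hash for doc_hash, _ in calls)
    non_responsive = Counter(
        doc_hash for doc_hash, responsive in calls if not responsive
    )
    return {
        doc_hash: non_responsive[doc_hash] / n
        for doc_hash, n in counts.items()
        if n >= min_occurrences
    }


dense = dense_documents(prior_calls, min_occurrences=3)
print(dense)
```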
Case Study 4 Analytics-Based Review
Analytics-Based Review
1. Identify production set based on objective criteria
2. Remove privileged documents
3. Identify key documents through iterative targeted searches
  Entity search
  Concept categorization
  Sentiment analysis
  Trends & anomalies
  Machine learning
Iterate until reasonably complete