6/22/2018 2:09 PM BRK3102 How Microsoft Legal drives down eDiscovery costs with machine learning in Office 365 Rachi Messing Senior Program Manager, O365.

Presentation transcript:

6/22/2018 2:09 PM
BRK3102 How Microsoft Legal drives down eDiscovery costs with machine learning in Office 365
Rachi Messing, Senior Program Manager, O365 Information Protection
EJ Bastien, Principal eDiscovery Program Manager, Legal
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

eDiscovery Features in O365
Stages: Preserve, Scope, Minimize, Analyze, Review
In-Place Preservation: Keep what you need where it is created, without impeding productivity.
Top Locations Reports: Search the network to identify the most significant data sources.
Complex Query Filtering
Predictive Coding
Deduplication
Threading
Near-Duplicate Identification
Themes
Format exports for direct ingestion to leading review platforms
Excel-based organization for small-scale projects

Average Microsoft case for FY17
Preserve: 3 TB, 41 people, 80-200 sources
Scope: 1 TB, 14 people, ~60 sources
Minimize: 38 GB (96% cull)
Analyze: Deduplication -30%, Threading -26%
Review: Organized, Prioritized, Consistent
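The 96% cull figure on this slide can be checked from the two volumes it quotes. The short sketch below does that arithmetic; it assumes 1 TB means 1024 GB, and the function name is illustrative only.

```python
# Volumes quoted on the slide; treating 1 TB as 1024 GB is an assumption.
scoped_gb = 1 * 1024
minimized_gb = 38


def cull_percent(before_gb: float, after_gb: float) -> float:
    """Percentage of volume removed between two stages."""
    return (before_gb - after_gb) / before_gb * 100


cull = cull_percent(scoped_gb, minimized_gb)
print(f"Cull from 1 TB to 38 GB: {cull:.1f}%")
```

Rounded to a whole number, the result matches the 96% shown on the slide.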

Pre-Advanced eDiscovery Workflow
Stages: Search, Export, Process, Analyze, Review
Custodians and search queries provided by counsel
Search results assessment and iteration
Search results exported with local downloads
Downloaded content packaged and transmitted to review hosting vendor
Native-format search results reprocessed for review ingestion
Deduplication, near-duplicate identification, threading, theme detection
Unique, inclusive content batched and promoted for review
Review commences

Advanced eDiscovery Workflow
Stages: Search, Analyze, Review
Custodians and search queries provided by counsel
Search results assessment and iteration
Search results promoted for AED analysis
Deduplication, near-duplicate identification, threading, theme detection
Relevance training / predictive coding (optional)
Unique, inclusive content batched for review
Organized review set exported, already formatted for direct ingestion to review tool
Review commences

Technology behind Advanced eDiscovery
Rachi Messing

Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-duplicates, Email threads, Relevance, Themes

Office 365 Advanced eDiscovery: Near-Duplicates
Identify near-duplicate (similar) documents
Choose a representative of the set
Benefit: easier to review similar documents when grouped together
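The slide does not say how near-duplicates are detected. As a minimal sketch of the general idea, the code below compares documents by the overlap of their word shingles (Jaccard similarity) and greedily groups each document under the first representative it resembles. The shingle size, the 0.5 threshold and all function names are assumptions for illustration, not the product's method.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs: dict, threshold: float = 0.5) -> dict:
    """Greedy grouping: a document joins the first representative it
    resembles, otherwise it becomes the representative of a new group."""
    groups = {}      # representative id -> member ids
    signatures = {}  # representative id -> shingle set
    for doc_id, text in docs.items():
        sig = shingles(text)
        for rep in groups:
            if jaccard(sig, signatures[rep]) >= threshold:
                groups[rep].append(doc_id)
                break
        else:
            groups[doc_id] = [doc_id]
            signatures[doc_id] = sig
    return groups


docs = {
    "a": "please review the attached contract draft before friday",
    "b": "please review the attached contract draft before monday",
    "c": "lunch menu for the team offsite next week",
}
groups = group_near_duplicates(docs)
print(groups)
```

Here documents "a" and "b" differ by one word and land in the same group with "a" as the representative, while "c" forms its own group.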

Office 365 Advanced eDiscovery: Email threads
Email thread: an email that includes previous correspondence
In each thread, identify the most inclusive email
Flag all unique content in the thread for review
Benefit: avoid reading many redundant emails
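One way to picture "most inclusive" is to treat each email as the set of message segments it contains (its own text plus everything it quotes). An email is then redundant when another email in the thread contains all of its segments and more. The sketch below applies that rule; the segment representation and the function name are assumptions made for illustration.

```python
def inclusive_emails(thread: dict) -> list:
    """Return ids of emails whose content is not fully contained in
    another email of the same thread (strict-subset test on segments)."""
    keep = []
    for email_id, segments in thread.items():
        contained = any(
            other_id != email_id and segments < other_segments
            for other_id, other_segments in thread.items()
        )
        if not contained:
            keep.append(email_id)
    return keep


# Each email is modelled as the set of message bodies it carries.
thread = {
    "m1": {"Can we meet Tuesday?"},
    "m2": {"Can we meet Tuesday?", "Tuesday works."},
    "m3": {"Can we meet Tuesday?", "Tuesday works.", "Booked room 4."},
    "m4": {"Can we meet Tuesday?", "I am out Tuesday."},  # side branch
}
to_review = inclusive_emails(thread)
print(to_review)
```

In this example m1 and m2 are fully quoted inside m3, so only m3 and the side-branch reply m4 need review. Exact copies of the same email are not collapsed by this rule; that is left to deduplication.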

Office 365 Advanced eDiscovery: Relevance
Supervised learning
Identify relevant documents using active learning with a support vector machine (SVM) classifier
Need to review a much smaller set of documents
Prioritize document review by relevance score
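To make the combination of active learning and an SVM concrete, here is a small self-contained sketch. It trains a linear SVM by stochastic subgradient descent on the regularized hinge loss, then repeatedly asks a simulated reviewer to tag the document the model is least sure about (the score closest to zero). The toy corpus, the `truth` labels standing in for the reviewer, and all parameter values are assumptions; nothing here reflects the product's actual implementation.

```python
from collections import Counter


def features(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def score(w, f):
    """Linear relevance score: dot product of weights and term counts."""
    return sum(w.get(term, 0.0) * n for term, n in f.items())


def train_linear_svm(examples, epochs=100, eta=0.1, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss.
    examples: list of (features, label) with label +1 or -1."""
    w = {}
    for _ in range(epochs):
        for f, y in examples:
            violated = y * score(w, f) < 1
            for term in w:                 # regularization shrink
                w[term] *= 1 - eta * lam
            if violated:                   # hinge-loss subgradient step
                for term, n in f.items():
                    w[term] = w.get(term, 0.0) + eta * y * n
    return w


docs = [
    "merger agreement draft", "lunch menu friday",
    "merger pricing terms", "friday parking notice",
    "acquisition terms sheet", "holiday party menu",
    "agreement signature pages", "parking garage closed",
]
truth = [1, -1, 1, -1, 1, -1, 1, -1]  # simulated reviewer decisions
feats = [features(d) for d in docs]

labeled = [0, 1]                      # seed documents tagged up front
for _ in range(3):                    # three active-learning rounds
    w = train_linear_svm([(feats[i], truth[i]) for i in labeled])
    unlabeled = [i for i in range(len(docs)) if i not in labeled]
    # Uncertainty sampling: query the document nearest the boundary.
    pick = min(unlabeled, key=lambda i: abs(score(w, feats[i])))
    labeled.append(pick)

w = train_linear_svm([(feats[i], truth[i]) for i in labeled])
ranking = sorted(range(len(docs)), key=lambda i: score(w, feats[i]),
                 reverse=True)
print("reviewed:", labeled)
print("ranked by relevance score:", ranking)
```

Only five of the eight documents are tagged by the reviewer, yet the final scores rank every relevant document ahead of every non-relevant one, which is the prioritization effect the slide describes.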

Relevance
Software calculates relevance scores for documents

How Relevance learns? (sequence of six illustration slides; images not captured in the transcript)

It’s not always that simple

Office 365 Advanced eDiscovery: Themes
Unsupervised learning
Automatically identify the main themes (topics) in the dataset using topic modeling
Benefits: an overview of the data collection; uncover ‘buried data’

THEMES – TOPIC MODELING
The idea is to imitate how a human writes a document:
One first thinks in terms of subjects and themes, then expresses them using words.
The model posits that each document is a mixture of latent themes (but typically not many).
Each word in a document is associated with a theme.
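The writing process described above can be sketched as a small generative routine: for every word position, first draw a theme from the document's theme mixture, then draw a word from that theme's word distribution. The theme names, word weights and function name below are illustrative assumptions.

```python
import random


def generate_document(theme_mix, theme_words, length, rng):
    """Generate (word, theme) pairs for one document.
    theme_mix:   {theme: proportion in this document}
    theme_words: {theme: {word: weight}}"""
    themes = list(theme_mix)
    mix_weights = [theme_mix[t] for t in themes]
    document = []
    for _ in range(length):
        # Step 1: pick the theme this word will express.
        theme = rng.choices(themes, weights=mix_weights)[0]
        # Step 2: pick a word from that theme's distribution.
        vocab = list(theme_words[theme])
        word_weights = [theme_words[theme][v] for v in vocab]
        word = rng.choices(vocab, weights=word_weights)[0]
        document.append((word, theme))
    return document


theme_words = {
    "products": {"Microsoft": 0.4, "computer": 0.2, "Windows": 0.1},
    "places": {"Redmond": 0.6, "Seattle": 0.2, "Washington": 0.2},
}
theme_mix = {"products": 0.7, "places": 0.3}  # a mixture of few themes

doc = generate_document(theme_mix, theme_words, length=8,
                        rng=random.Random(0))
print(doc)
```

Topic modeling runs this story in reverse: given only the words, it estimates the theme mixtures and word distributions that could plausibly have produced them.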

THEMES – TOPIC MODELING
Input: The documents in the corpus
Output:
List of themes (word distribution for each theme)
Theme distributions for each document

Themes
Theme1: Microsoft (0.4), computer (0.2), Windows (0.1), …
Theme2: Conference (0.3), Machine-Learning (0.3), …
…
Theme10: Redmond (0.6), Seattle (0.2), Washington (0.2), …

Theme distributions in documents
Doc1: Theme3 (0.2), Theme5 (0.2), Theme6 (0.1), Theme8 (0.05), …
Doc2: Theme1 (0.4), Theme4 (0.1), Theme5 (0.05), Theme7 (0.03), …
…
DocN: Theme1 (0.2), Theme6 (0.2), Theme8 (0.1), Theme9 (0.1), …
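The slides do not name the topic-modeling algorithm. One widely used model with exactly these inputs and outputs is latent Dirichlet allocation (LDA), and the sketch below fits it with collapsed Gibbs sampling on a toy corpus. The choice of LDA, the hyperparameters `alpha` and `beta`, and every name in the code are assumptions for illustration.

```python
import random


def lda_gibbs(docs, n_themes, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of token lists. Returns (theme_words, doc_themes)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_theme = [[0] * n_themes for _ in docs]        # counts per doc
    theme_word = [dict.fromkeys(vocab, 0) for _ in range(n_themes)]
    theme_total = [0] * n_themes

    def bump(d, k, w, delta):
        doc_theme[d][k] += delta
        theme_word[k][w] += delta
        theme_total[k] += delta

    # Random initial theme assignment for every word occurrence.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(n_themes) for _ in doc])
        for w, k in zip(doc, assign[d]):
            bump(d, k, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, assign[d][i], w, -1)          # remove this word
                weights = [
                    (doc_theme[d][k] + alpha)
                    * (theme_word[k][w] + beta)
                    / (theme_total[k] + n_vocab * beta)
                    for k in range(n_themes)
                ]
                k_new = rng.choices(range(n_themes), weights=weights)[0]
                assign[d][i] = k_new
                bump(d, k_new, w, +1)                 # add it back

    theme_words = [
        {w: (theme_word[k][w] + beta) / (theme_total[k] + n_vocab * beta)
         for w in vocab}
        for k in range(n_themes)
    ]
    doc_themes = [
        [(c + alpha) / (len(doc) + n_themes * alpha) for c in doc_theme[d]]
        for d, doc in enumerate(docs)
    ]
    return theme_words, doc_themes


docs = [
    "windows computer microsoft windows".split(),
    "microsoft computer windows".split(),
    "seattle redmond washington seattle".split(),
    "redmond washington seattle".split(),
]
theme_words, doc_themes = lda_gibbs(docs, n_themes=2)
for k, dist in enumerate(theme_words):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"Theme{k + 1}:", top)
print(doc_themes)
```

The two return values correspond to the two outputs on the slide: a word distribution for each theme, and a theme distribution for each document.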

THEMES – VISUALIZE
Theme proportions in a document and word-to-theme assignments
(Figure not captured in the transcript; surviving labels: "Documents", "survive", "prediction", proportions 0.15, 0.2, 0.1)

THEMES – SUGGEST THEMES
Suggest the words ‘venture, company’.
If the suggested words exist in the corpus, a new theme is built around them: ‘venture, company’, funds, partners, …
Use this to help uncover ‘buried data’.

Business Case Comparison

Case Scenario: Outsource All, No Analytics

Assumptions
Volume: 1024 GB
Post-Processing Expansion Rate: 325% (increase by volume)
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95% (reduction by volume)
Exact MD5 Deduplication Reduction: 30% (reduction by volume)
Tech PM Rate: [amount not captured] /hour (the PM Hours formula uses $200)
Outsourced PM hours: 300 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000
Items reviewed per hour: 50
Reviewer cost per hour: $50.00

Estimated Review Costs (Category: Formula = Cost)
Culling: 1024 GB x $50 = $51,200
Pre-Cull Hosting: 1024 GB x $1 x 24 months = $24,576
Processing: (1024 GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024 GB – 95%) x 3.25) – 30%) x $15 x 24 months = $41,933
PM Hours: 300 hours x $200 = $60,000
Tech Charges Subtotal: $187,949
1st Pass Review Charges: (((((1024 GB – 95%) x 3.25) – 30%) x 3,000) / 50) x $50 = $349,440
1st Pass Review Total: $537,389

Diagram: Culling, Collection, Processing, Analytics and Review stages, each placed Inside O365 or Outside O365.
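The nested formula for 1st Pass Review Charges is easier to follow one step at a time. The sketch below walks the volume through each reduction, reading "X – 95%" as "X reduced by 95%", which is an interpretation of the slide's notation; the variable names are illustrative.

```python
# Assumptions copied from the slide.
volume_gb = 1024
cull_reduction = 0.95
expansion_rate = 3.25
dedup_reduction = 0.30
items_per_gb = 3000
items_per_hour = 50
reviewer_rate = 50.0

# Volume left after the pre-processing cull.
processed_gb = volume_gb * (1 - cull_reduction)
# Processing expands the data, then deduplication shrinks it.
hosted_gb = processed_gb * expansion_rate * (1 - dedup_reduction)
# Convert the hosted volume into reviewer effort and cost.
review_items = hosted_gb * items_per_gb
review_hours = review_items / items_per_hour
review_cost = review_hours * reviewer_rate

print(f"After cull:        {processed_gb:,.2f} GB")
print(f"Hosted for review: {hosted_gb:,.2f} GB")
print(f"Items to review:   {review_items:,.0f}")
print(f"Reviewer hours:    {review_hours:,.1f}")
print(f"1st pass review:   ${review_cost:,.0f}")
```

The final figure reproduces the $349,440 shown in the table.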

Case Scenario: E3 Targeted Collections

Assumptions
Volume: 1024 GB
Post-Processing Expansion Rate: 325% (increase by volume)
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95% (reduction by volume)
Exact MD5 Deduplication Reduction: 30% (reduction by volume)
Tech PM Rate: [amount not captured] /hour (the PM Hours formula uses $200)
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000
Items reviewed per hour: 50
Reviewer cost per hour: $50.00

Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A = $0
Pre-Cull Hosting: N/A = $0
Processing: (1024 GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024 GB – 95%) x 3.25) – 30%) x $15 x 24 months = $41,933
PM Hours: (300 hours – 80 hours) x $200 = $44,000
Tech Charges Subtotal: $96,173
Tech Charges Savings: 50%
1st Pass Review Charges: (((((1024 GB – 95%) x 3.25) – 30%) x 3,000) / 50) x $50 = $349,440
1st Pass Review Total: $445,613
Overall Savings: 18%

Diagram: Culling, Collection, Processing, Analytics and Review stages, each placed Inside O365 or Outside O365.

Case Scenario: E5 Advanced eDiscovery

Assumptions
Volume: 1024 GB
Post-Processing Expansion Rate: 325% (increase by volume)
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95% (reduction by volume)
Blended Analytics Reduction (Deduplication, Thread Compression): 45% (reduction by volume)
Tech PM Rate: [amount not captured] /hour (the PM Hours formula uses $200)
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours
E5 hours saved: 40 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000
Items reviewed per hour: 75
Reviewer cost per hour: $50.00

Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A = $0
Pre-Cull Hosting: N/A = $0
Processing: N/A = $0
Post-Cull Hosting: (((1024 GB – 95%) x 3.25) – 45%) x $15 x 24 months = $32,947
PM Hours: (300 hours – 120 hours) x $200 = $36,000
Tech Charges Subtotal: $68,947
Tech Charges Savings: 63%
1st Pass Review Charges: (((((1024 GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $251,987
Overall Savings: 53%

Diagram: Culling, Analytics, Collection, Review and Processing stages, each placed Inside O365 or Outside O365.

Scenario Comparison Summary: 1TB Case
Outsource All, No Analytics: Cost $537,389; Savings 0%
Targeted Collections: Cost $445,613; Technical Savings 50%; Overall Savings 18%; Improved Security
Advanced eDiscovery: Cost $251,987; Technical Savings 63%; Overall Savings 53%; Improved Security; Faster; More Consistent
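The three totals in the summary can be reproduced from the slide assumptions with one parameterized cost function. The sketch below is a reading of the slide formulas, not an official calculator; the `Scenario` fields and function names are hypothetical, and the $200 PM rate is taken from the PM Hours formulas.

```python
from dataclasses import dataclass

VOLUME_GB, MONTHS = 1024, 24
CULL, EXPANSION = 0.95, 3.25
PM_RATE, REVIEWER_RATE, ITEMS_PER_GB = 200, 50, 3000


@dataclass
class Scenario:
    name: str
    pays_culling: bool          # culling and pre-cull hosting charged
    pays_processing: bool       # outsourced processing charged
    analytics_reduction: float  # dedup only (0.30) or blended (0.45)
    pm_hours: float
    items_per_hour: float


def costs(s: Scenario):
    """Return (technical charges, first-pass review total)."""
    processed_gb = VOLUME_GB * (1 - CULL)
    hosted_gb = processed_gb * EXPANSION * (1 - s.analytics_reduction)
    tech = hosted_gb * 15 * MONTHS + s.pm_hours * PM_RATE
    if s.pays_culling:
        tech += VOLUME_GB * 50 + VOLUME_GB * 1 * MONTHS
    if s.pays_processing:
        tech += processed_gb * 200
    review = hosted_gb * ITEMS_PER_GB / s.items_per_hour * REVIEWER_RATE
    return tech, tech + review


scenarios = [
    Scenario("Outsource All, No Analytics", True, True, 0.30, 300, 50),
    Scenario("Targeted Collections", False, True, 0.30, 300 - 80, 50),
    Scenario("Advanced eDiscovery", False, False, 0.45, 300 - 120, 75),
]
results = {s.name: costs(s) for s in scenarios}
base_tech, base_total = results["Outsource All, No Analytics"]

for name, (tech, total) in results.items():
    tech_saving = (1 - tech / base_tech) * 100
    overall_saving = (1 - total / base_total) * 100
    print(f"{name}: tech ${tech:,.0f} ({tech_saving:.1f}% saved), "
          f"total ${total:,.0f} ({overall_saving:.1f}% saved)")
```

The dollar totals match the slides. Recomputing the percentages from those totals gives roughly 63% and 53% for Advanced eDiscovery, as shown, and roughly 49% and 17% for Targeted Collections, slightly below the 50% and 18% on the slide; the slide figures may have been rounded differently.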

Questions?

Please evaluate this session
From your PC or tablet: visit MyIgnite https://myignite.microsoft.com/evaluations
From your phone: download and use the Microsoft Ignite mobile app https://aka.ms/ignite.mobileapp
Your input is important!


Appendix: Additional Case Scenarios

Case Scenario: Outsource All, Offline Analytics

Assumptions
Volume: 1024 GB
Post-Processing Expansion Rate: 325% (increase by volume)
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Analytics: [amount not captured] /GB (the Analytics formula uses $50)
Pre-Processing Cull Reduction: 95% (reduction by volume)
Exact MD5 Deduplication Reduction: 30%
Email Threading: 25%
Blended Analytics Reduction (Deduplication, Thread Compression): 45%
Tech PM Rate: [amount not captured] /hour (the PM Hours formula uses $200)
Outsourced PM hours per case: 300 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000
Items reviewed per hour: 75
Reviewer cost per hour: $50.00

Estimated Technical Costs (Category: Formula = Cost)
Culling: 1024 GB x $50 = $51,200
Pre-Cull Hosting: 1024 GB x $1 x 24 months = $24,576
Processing: (1024 GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024 GB – 95%) x 3.25) – 30%) x $15 x 24 months = $41,933
Analytics: (((1024 GB – 95%) x 3.25) – 30%) x $50 = $5,824
PM Hours: 300 hours x $200 = $60,000
Tech Charges Subtotal: $193,773
1st Pass Review Charges: (((((1024 GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $376,813

Diagram: Culling, Review, Collection, Processing and Analytics stages, each placed Inside O365 or Outside O365.

Case Scenario: E3 Targeted Collections, Offline Analytics

Assumptions
Volume: 1024 GB
Post-Processing Expansion Rate: 325% (increase by volume)
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Analytics: [amount not captured] /GB (the Analytics formula uses $50)
Pre-Processing Cull Reduction: 95% (reduction by volume)
Exact MD5 Deduplication Reduction: 30%
Email Threading: 25%
Blended Analytics Reduction (Deduplication, Thread Compression): 45%
Tech PM Rate: [amount not captured] /hour (the PM Hours formula uses $200)
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000
Items reviewed per hour: 75
Reviewer cost per hour: $50.00

Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A = $0
Pre-Cull Hosting: N/A = $0
Processing: (1024 GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024 GB – 95%) x 3.25) – 30%) x $15 x 24 months = $41,933
Analytics: (((1024 GB – 95%) x 3.25) – 30%) x $50 = $5,824
PM Hours: (300 hours – 80 hours) x $200 = $44,000
Tech Charges Total: $101,997
Tech Charges Savings: 46%
1st Pass Review Charges: (((((1024 GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $285,037
Overall Savings: 47%

Diagram: Culling, Collection, Review, Processing and Analytics stages, each placed Inside O365 or Outside O365.