6/22/2018 2:09 PM BRK3102 How Microsoft Legal drives down eDiscovery costs with machine learning in Office 365 Rachi Messing Senior Program Manager, O365.

1 BRK3102 How Microsoft Legal drives down eDiscovery costs with machine learning in Office 365
6/22/2018 2:09 PM
Rachi Messing, Senior Program Manager, O365 Information Protection
EJ Bastien, Principal eDiscovery Program Manager, Legal
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 eDiscovery Features in O365
Stages: Preserve, Scope, Minimize, Analyze, Review
In Place Preservation: Keep what you need where it is created, without impeding productivity.
Top Locations Reports: Search the network to identify the most significant data sources.
Complex Query Filtering
Predictive Coding
Deduplication
Threading
Near Duplicate Identification
Themes
Format exports for direct ingestion to leading review platforms
Excel-based organization for small-scale projects

3 Average Microsoft case for FY17
Preserve: 3 TB, 41 people sources
Scope: 1 TB, 14 people, ~60 sources
Minimize: 38 GB, 96% cull
Analyze: Deduplication: -30%, Threading: -26%
Review: Organized, Prioritized, Consistent
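As a quick check of the cull figure, the arithmetic can be sketched as follows. This assumes the 96% is measured from the 1 TB scoped volume down to the 38 GB that remains, and that 1 TB is counted as 1024 GB; neither assumption is stated on the slide.

```python
# Figures from the slide; the pairing of the two volumes is an assumption.
scoped_gb = 1 * 1024   # 1 TB scoped, expressed in GB
review_gb = 38         # volume remaining after minimization

cull_rate = 1 - review_gb / scoped_gb
print(f"Cull rate: {cull_rate:.0%}")
```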

4 Pre-Advanced eDiscovery Workflow
Stages: Search, Export, Process, Analyze, Review
Custodians and search queries provided by counsel
Native-format search results reprocessed for review ingestion
Unique, inclusive content batched and promoted for review
Search results exported with local downloads
Deduplication, near-duplicate identification, threading, theme detection
Search results assessment and iteration
Downloaded content packaged and transmitted to review hosting vendor
Review commences

5 Advanced eDiscovery Workflow
Stages: Search, Analyze, Review
Custodians and search queries provided by counsel
Search results promoted for AED analysis
Unique, inclusive content batched for review
Organized review set exported, already formatted for direct ingestion to review tool
Deduplication, near-duplicate identification, threading, theme detection
Search results assessment and iteration
Review commences
Relevance training / predictive coding (optional)

6 Technology behind Advanced eDiscovery
Rachi Messing

7 Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-Duplicates, Email threads, Relevance, Themes

8 Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-Duplicates, Email threads, Relevance, Themes
Identify near-duplicate (similar) documents
Choose a representative of the set
-> Easier to review similar documents when grouped together
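One common way to group near-duplicates is to compare documents by overlapping word sequences (shingles) and Jaccard similarity. The sketch below illustrates that general technique only; the slides do not say which similarity measure the product uses, and the function names, the 3-word shingle size, and the 0.5 threshold are all hypothetical choices.

```python
def shingles(text, k=3):
    """Set of k-word sequences in the text (k=3 is an arbitrary choice)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Share of shingles the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_near_duplicates(docs, threshold=0.5):
    """Greedy grouping: a document joins the first group whose
    representative (the group's first member) it resembles closely enough."""
    groups = []  # list of (representative_shingles, [document indices])
    for idx, text in enumerate(docs):
        sh = shingles(text)
        for rep_shingles, members in groups:
            if jaccard(sh, rep_shingles) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((sh, [idx]))
    return [members for _, members in groups]

docs = [
    "please review the attached draft contract before friday",
    "please review the attached draft contract before monday",
    "lunch menu for the cafeteria this week",
]
groups = group_near_duplicates(docs)
print(groups)  # [[0, 1], [2]]
```

Here the first document of each group acts as the representative, so a reviewer could read document 0 and then skim document 1 for its differences.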

9 Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-Duplicates, Email threads, Relevance, Themes
Thread: an email that includes previous correspondence
In each thread, identify the most inclusive email
Flag all unique content in the thread for review
-> Avoid reading many redundant emails
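The idea of an inclusive email can be sketched with sets. This is a minimal illustration, assuming each email has already been split into message segments (its own new text plus any quoted earlier messages); the segment labels and the `inclusive_emails` helper are hypothetical, and a real implementation would need to parse quoted text first.

```python
def inclusive_emails(thread):
    """Return ids of emails whose content is not fully contained in another email.

    `thread` maps an email id to the set of message segments it contains.
    """
    result = []
    for email_id, segments in thread.items():
        contained = any(
            other_id != email_id and segments < other_segments  # proper subset
            for other_id, other_segments in thread.items()
        )
        if not contained:
            result.append(email_id)
    return result

thread = {
    "msg1": {"A"},
    "msg2": {"A", "B"},
    "msg3": {"A", "B", "C"},
    "msg4": {"A", "D"},  # side branch replying to the first message
}
print(inclusive_emails(thread))  # ['msg3', 'msg4']
```

Reviewing only `msg3` and `msg4` covers segments A through D, so the two earlier emails need not be read separately.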

10 Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-Duplicates, Email threads, Relevance, Themes
Supervised learning
Identify relevant documents using Active Learning with a Support-Vector-Machine classifier
Need to review a much smaller set of documents
Prioritize document review by the relevance score
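The combination of a linear SVM and active learning can be sketched in a few lines. This is an illustration under stated assumptions, not the product's implementation: the classifier is a toy linear SVM trained by sub-gradient descent on the hinge loss, the active-learning step is plain uncertainty sampling (ask the reviewer about the document nearest the decision boundary), and the documents, labels, and hyperparameters are invented.

```python
def featurize(text):
    """Bag-of-words counts as a sparse dict."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def score(weights, features):
    """Relevance score; positive means 'relevant'."""
    return sum(weights.get(w, 0.0) * c for w, c in features.items())

def train_linear_svm(examples, epochs=50, lr=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss.

    `examples` is a list of (features, label) with label in {+1, -1}.
    """
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            margin = label * score(weights, features)
            for w in weights:                 # regularisation: shrink all weights
                weights[w] *= (1 - lr * reg)
            if margin < 1:                    # inside the margin: hinge-loss step
                for w, c in features.items():
                    weights[w] = weights.get(w, 0.0) + lr * label * c
    return weights

def most_uncertain(weights, unlabeled):
    """Active-learning query: the document closest to the decision boundary."""
    return min(unlabeled, key=lambda text: abs(score(weights, featurize(text))))

labeled = [
    (featurize("merger pricing agreement draft"), +1),
    (featurize("quarterly merger negotiation terms"), +1),
    (featurize("team lunch on friday"), -1),
    (featurize("parking garage closed friday"), -1),
]
weights = train_linear_svm(labeled)

unlabeled = ["merger agreement signed", "friday lunch menu", "pricing for parking garage"]
# Review order: highest relevance score first.
ranked = sorted(unlabeled, key=lambda t: score(weights, featurize(t)), reverse=True)
# Next document to hand to a human reviewer for labelling.
query = most_uncertain(weights, unlabeled)
print(ranked[0], "|", query)
```

In a full loop, the reviewer's label for `query` would be appended to `labeled` and the model retrained, which is how the number of documents needing manual review can stay small.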

11 Relevance
Software calculates relevance scores for documents

12 How Relevance learns?

13 How Relevance learns?

14 How Relevance learns?

15 How Relevance learns?

16 How Relevance learns?

17 How Relevance learns?

18 It’s not always that simple

19 It’s not always that simple

20 Office 365 Advanced eDiscovery
Advanced eDiscovery solutions include: Near-Duplicates, Email threads, Relevance, Themes
Unsupervised learning
Automatically identify the main themes (topics) in the dataset using Topic Modeling
-> Overview of the data collection
-> Uncover ‘buried data’

21 The idea is to imitate how a human writes a document
THEMES – TOPIC MODELING
The idea is to imitate how a human writes a document.
One first thinks in terms of subjects and themes, then expresses them using words.
The model posits that each document is a mixture of latent themes (but typically not many).
Each word in a document is associated with a theme.
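The generative story described above can be sketched directly: for each word position, pick a theme from the document's mixture, then pick a word from that theme. The themes, words, and probabilities below are hypothetical, and the sketch only shows the forward (writing) direction; a topic model works backwards from observed documents to infer the themes.

```python
import random

# Hypothetical themes: each one is a distribution over words.
THEMES = {
    "software": {"windows": 0.5, "computer": 0.3, "update": 0.2},
    "travel": {"redmond": 0.6, "seattle": 0.2, "flight": 0.2},
}

def generate_document(theme_mixture, length, rng):
    """Write a document the way the model assumes a human does:
    choose a theme per word from the mixture, then a word from that theme."""
    words = []
    for _ in range(length):
        theme = rng.choices(list(theme_mixture),
                            weights=list(theme_mixture.values()))[0]
        word_dist = THEMES[theme]
        word = rng.choices(list(word_dist), weights=list(word_dist.values()))[0]
        words.append((word, theme))  # every word carries its theme assignment
    return words

rng = random.Random(0)  # seeded so the run is repeatable
doc = generate_document({"software": 0.8, "travel": 0.2}, length=10, rng=rng)
print(doc)
```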

22 THEMES - TOPIC MODELING
Input: The documents in the corpus
Output: List of themes (word distribution for each theme); theme distributions for each document
Themes:
Theme1: Microsoft (0.4), computer (0.2), Windows (0.1), …
Theme2: Conference (0.3), Machine-Learning (0.3), …
Theme10: Redmond (0.6), Seattle (0.2), Washington (0.2), …
Theme distributions in documents:
Doc1: Theme3 (0.2), Theme5 (0.2), Theme6 (0.1), Theme8 (0.05)
Doc2: Theme1 (0.4), Theme4 (0.1), Theme5 (0.05), Theme7 (0.03)
...
DocN: Theme1 (0.2), Theme6 (0.2), Theme8 (0.1), Theme9 (0.1)
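To show how the per-document output might be used, here is a small sketch that looks up the documents in which a given theme is prominent. The weights are the slide's example values, assuming they are read in order for Doc1, Doc2, and DocN; the `documents_for_theme` helper and its 0.15 cut-off are hypothetical.

```python
# Document-theme weights from the slide's example (document assignment assumed).
doc_themes = {
    "Doc1": {"Theme3": 0.2, "Theme5": 0.2, "Theme6": 0.1, "Theme8": 0.05},
    "Doc2": {"Theme1": 0.4, "Theme4": 0.1, "Theme5": 0.05, "Theme7": 0.03},
    "DocN": {"Theme1": 0.2, "Theme6": 0.2, "Theme8": 0.1, "Theme9": 0.1},
}

def documents_for_theme(theme, min_weight=0.15):
    """Documents where `theme` has at least `min_weight`, strongest first."""
    hits = [(doc, weights[theme]) for doc, weights in doc_themes.items()
            if weights.get(theme, 0.0) >= min_weight]
    return [doc for doc, _ in sorted(hits, key=lambda pair: pair[1], reverse=True)]

print(documents_for_theme("Theme1"))  # ['Doc2', 'DocN']
print(documents_for_theme("Theme5"))  # ['Doc1']
```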

23 THEMES – VISUALIZE
Theme proportions in documents and word-to-theme assignments
[Figure: documents with theme proportions (0.15, 0.2, 0.1) and example words "survive" and "prediction"]

24 THEMES – SUGGEST THEMES
Suggest the words ‘venture, company’.
If the suggested words exist in the corpus, the result is a new theme: ‘venture, company’, funds, partners, …
Use this to help uncover ‘buried data’.

25 Business Case Comparison

26 Case Scenario: Outsource All, No Analytics
Assumptions (Category: Quantity, Unit)
Volume: 1024 GB
Post-Processing Expansion Rate: 325%, increase by volume
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95%, reduction by volume
Exact MD5 Deduplication Reduction: 30%, reduction by volume
Tech PM Rate: $200.00 /hour
Outsourced PM hours: 300 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000 reviewable items
Items reviewed per hour: 50 reviewable items
Reviewer cost per hour: $50.00
Estimated Review Costs (Category: Formula = Cost)
Culling: 1024GB x $50 = $51,200
Pre-Cull Hosting: 1024GB x $1 x 24 Months = $24,576
Processing: (1024GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024GB – 95%) x 3.25) – 30%) x $15 x 24 Months = $41,933
PM Hours: 300 hours x $200 = $60,000
Tech Charges Subtotal: $187,949
1st Pass Review Charges: (((((1024GB – 95%) x 3.25) – 30%) x 3,000) / 50) x $50 = $349,440
1st Pass Review Total: $537,389
Workflow diagram (Inside O365 / Outside O365): Culling, Collection, Processing, Analytics, Review
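The baseline figures can be reproduced with a short calculation. This is a sketch of the slide's formulas as read from the formula column; the $200/GB processing rate and the $200/hour PM rate are taken from those formulas, since the assumptions table does not show them legibly, and the variable names are illustrative.

```python
volume_gb = 1024
months = 24

# GB left after the 95% cull, 3.25x expansion, and 30% deduplication.
post_cull_gb = volume_gb * (1 - 0.95) * 3.25 * (1 - 0.30)

tech = {
    "Culling": volume_gb * 50,
    "Pre-Cull Hosting": volume_gb * 1 * months,
    "Processing": volume_gb * (1 - 0.95) * 200,
    "Post-Cull Hosting": post_cull_gb * 15 * months,
    "PM Hours": 300 * 200,
}
tech_subtotal = sum(tech.values())

# 3,000 items per GB, 50 items per reviewer hour, $50 per reviewer hour.
review = post_cull_gb * 3000 / 50 * 50
total = tech_subtotal + review

for name, cost in tech.items():
    print(f"{name:<18} ${cost:>10,.0f}")
print(f"{'Tech subtotal':<18} ${tech_subtotal:>10,.0f}")
print(f"{'1st pass review':<18} ${review:>10,.0f}")
print(f"{'Total':<18} ${total:>10,.0f}")
```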

27 Case Scenario: E3 Targeted Collections
Assumptions (Category: Quantity, Unit)
Volume: 1024 GB
Post-Processing Expansion Rate: 325%, increase by volume
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95%, reduction by volume
Exact MD5 Deduplication Reduction: 30%, reduction by volume
Tech PM Rate: $200.00 /hour
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours saved
Case Lifespan: 24 months
Reviewable items per GB: 3,000 reviewable items
Items reviewed per hour: 50 reviewable items
Reviewer cost per hour: $50.00
Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A
Pre-Cull Hosting: N/A
Processing: (1024GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024GB – 95%) x 3.25) – 30%) x $15 x 24 Months = $41,933
PM Hours: (300 hours – 80 hours) x $200 = $44,000
Tech Charges Subtotal: $96,173
Tech Charges Savings: 50%
1st Pass Review Charges: (((((1024GB – 95%) x 3.25) – 30%) x 3,000) / 50) x $50 = $349,440
1st Pass Review Total: $445,613
Overall Savings: 18%
Workflow diagram (Inside O365 / Outside O365): Culling, Collection, Processing, Analytics, Review

28 Case Scenario: E5 Advanced eDiscovery
Assumptions (Category: Quantity, Unit)
Volume: 1024 GB
Post-Processing Expansion Rate: 325%, increase by volume
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Pre-Processing Cull Reduction: 95%, reduction by volume
Blended Analytics Reduction (Deduplication, Thread Compression): 45%, reduction by volume
Tech PM Rate: $200.00 /hour
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours saved
E5 hours saved: 40 hours saved
Case Lifespan: 24 months
Reviewable items per GB: 3,000 reviewable items
Items reviewed per hour: 75 reviewable items
Reviewer cost per hour: $50.00
Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A
Pre-Cull Hosting: N/A
Processing: N/A
Post-Cull Hosting: (((1024GB – 95%) x 3.25) – 45%) x $15 x 24 Months = $32,947
PM Hours: (300 hours – 120 hours) x $200 = $36,000
Tech Charges Subtotal: $68,947
Tech Charges Savings: 63%
1st Pass Review Charges: (((((1024GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $251,987
Overall Savings: 53%
Workflow diagram (Inside O365 / Outside O365): Culling, Analytics, Collection, Review, Processing
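Two assumptions change the review charges in this scenario relative to the earlier ones: the volume reduction rises from 30% to 45%, and reviewer throughput rises from 50 to 75 items per hour. The sketch below separates the two effects using the slide's review formula; the `review_charges` function and its parameter names are illustrative, not part of the deck.

```python
def review_charges(reduction, items_per_hour,
                   volume_gb=1024, cull=0.95, expansion=3.25,
                   items_per_gb=3000, reviewer_rate=50):
    """First-pass review cost following the slide's formula."""
    post_cull_gb = volume_gb * (1 - cull) * expansion * (1 - reduction)
    review_hours = post_cull_gb * items_per_gb / items_per_hour
    return review_hours * reviewer_rate

baseline = review_charges(reduction=0.30, items_per_hour=50)
fewer_items = review_charges(reduction=0.45, items_per_hour=50)    # analytics only
faster_review = review_charges(reduction=0.30, items_per_hour=75)  # throughput only
both = review_charges(reduction=0.45, items_per_hour=75)

for label, cost in [("Baseline", baseline), ("45% reduction only", fewer_items),
                    ("75 items/hour only", faster_review), ("Both", both)]:
    print(f"{label:<20} ${cost:>9,.0f}")
```

Under these assumptions, the throughput gain alone accounts for a larger share of the review savings than the extra volume reduction alone.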

29 Scenario Comparison Summary: 1TB Case
Outsource All, No Analytics: Cost: $537,389; Savings: 0%
Targeted Collections: Cost: $445,613; Technical Savings: 50%; Overall Savings: 18%; Improved Security
Advanced eDiscovery: Cost: $251,987; Technical Savings: 63%; Overall Savings: 53%; Improved Security; Faster; More Consistent
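The overall savings can be recomputed from the three totals as one minus the ratio of each scenario's cost to the outsourced baseline. This is a sketch of that arithmetic, assuming savings are measured against the "Outsource All, No Analytics" total; the dictionary keys are just labels.

```python
totals = {
    "Outsource All, No Analytics": 537_389,
    "Targeted Collections": 445_613,
    "Advanced eDiscovery": 251_987,
}
baseline = totals["Outsource All, No Analytics"]

# Fraction of the baseline cost avoided in each scenario.
savings = {name: 1 - cost / baseline for name, cost in totals.items()}

for name, share in savings.items():
    print(f"{name:<28} {share:.1%}")
```

Computed this way, Advanced eDiscovery comes out at about 53%, matching the slide, while Targeted Collections comes out at about 17%, a point below the 18% shown, which may reflect rounding or slightly different inputs on the slide.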

30 Questions?

31 Please evaluate this session
From your PC or tablet: visit MyIgnite
From your phone: download and use the Microsoft Ignite mobile app
Your input is important!

33 Appendix: Additional Case Scenarios

34 Case Scenario: Outsource All, Offline Analytics
Assumptions (Category: Quantity, Unit)
Volume: 1024 GB
Post-Processing Expansion Rate: 325%, increase by volume
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Analytics: $50.00 /GB
Pre-Processing Cull Reduction: 95%, reduction by volume
Exact MD5 Deduplication Reduction: 30%
Threading: 25%
Blended Analytics Reduction (Deduplication, Thread Compression): 45%
Tech PM Rate: $200.00 /hour
Outsourced PM hours per case: 300 hours
Case Lifespan: 24 months
Reviewable items per GB: 3,000 reviewable items
Items reviewed per hour: 75 reviewable items
Reviewer cost per hour: $50.00
Estimated Technical Costs (Category: Formula = Cost)
Culling: 1024GB x $50 = $51,200
Pre-Cull Hosting: 1024GB x $1 x 24 Months = $24,576
Processing: (1024GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024GB – 95%) x 3.25) – 30%) x $15 x 24 Months = $41,933
Analytics: (((1024GB – 95%) x 3.25) – 30%) x $50 = $5,824
PM Hours: 300 hours x $200 = $60,000
Tech Charges Subtotal: $193,773
1st Pass Review Charges: (((((1024GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $376,813
Workflow diagram (Inside O365 / Outside O365): Culling, Review, Collection, Processing, Analytics

35 Case Scenario: E3 Targeted Collections, Offline Analytics
Assumptions (Category: Quantity, Unit)
Volume: 1024 GB
Post-Processing Expansion Rate: 325%, increase by volume
Pre-Cull Hosting: $1.00 /GB, per month
Post-Cull Hosting: $15.00 /GB, per month
Culling: $50.00 /GB
Processing: $200.00 /GB
Analytics: $50.00 /GB
Pre-Processing Cull Reduction: 95%, reduction by volume
Exact MD5 Deduplication Reduction: 30%
Threading: 25%
Blended Analytics Reduction (Deduplication, Thread Compression): 45%
Tech PM Rate: $200.00 /hour
Outsourced PM hours per case: 300 hours
E3 hours saved: 80 hours saved
Case Lifespan: 24 months
Reviewable items per GB: 3,000 reviewable items
Items reviewed per hour: 75 reviewable items
Reviewer cost per hour: $50.00
Estimated Technical Costs (Category: Formula = Cost)
Culling: N/A
Pre-Cull Hosting: N/A
Processing: (1024GB – 95%) x $200/GB = $10,240
Post-Cull Hosting: (((1024GB – 95%) x 3.25) – 30%) x $15 x 24 Months = $41,933
Analytics: (((1024GB – 95%) x 3.25) – 30%) x $50 = $5,824
PM Hours: (300 hours – 80 hours) x $200 = $44,000
Tech Charges Total: $101,997
Tech Charges Savings: 46%
1st Pass Review Charges: (((((1024GB – 95%) x 3.25) – 45%) x 3,000) / 75) x $50 = $183,040
1st Pass Review Total: $285,037
Overall Savings: 47%
Workflow diagram (Inside O365 / Outside O365): Culling, Collection, Review, Processing, Analytics

