Current Methodologies for Supervised Machine Learning in E-Discovery
Bill Dimm, Text Analytics Forum 2018, November 8, 2018.
Since supervised machine learning gained court acceptance for use in e-discovery six years ago, best practices have evolved. This talk describes the special circumstances of e-discovery and the best approaches currently in use. How robust is the Continuous Active Learning (CAL) approach? How much impact does the choice of seed documents have? What are SCAL and TAR 3.0?
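The CAL workflow the abstract refers to can be sketched in miniature. Everything below is an illustrative assumption, not Dimm's or Cormack & Grossman's actual implementation: a toy corpus where a document is relevant iff it contains the word "fraud", a naive per-word relevance scorer in place of a real classifier, and a crude stopping rule. The shape of the loop is the point: seed, train, rank, review the top batch, retrain, repeat.

```python
import random

# Toy collection: each document is a few words; a document is "relevant"
# iff it contains "fraud". True labels stand in for the human reviewer's
# coding decisions. All names and numbers here are illustrative.
random.seed(0)
vocab = ["fraud", "invoice", "payment", "lunch", "meeting",
         "travel", "memo", "budget", "report", "agenda"]
corpus = []
for doc_id in range(200):
    words = random.sample(vocab, k=3)
    corpus.append((doc_id, " ".join(words), "fraud" in words))

def train(examples):
    """Laplace-smoothed per-word relevance weights from reviewed docs."""
    rel, tot = {}, {}
    for _, text, label in examples:
        for w in set(text.split()):
            tot[w] = tot.get(w, 0) + 1
            rel[w] = rel.get(w, 0) + int(label)
    return {w: (rel[w] + 1) / (tot[w] + 2) for w in tot}

def score(text, weights):
    # Unseen words get a neutral 0.5 weight.
    return sum(weights.get(w, 0.5) for w in set(text.split()))

# Seed with a single keyword hit, then iterate: train, rank the unreviewed
# docs, review the top batch, and stop when a whole batch yields nothing.
reviewed = {}  # doc_id -> reviewer's label
seed_doc = next(d for d in corpus if "fraud" in d[1])
reviewed[seed_doc[0]] = seed_doc[2]

BATCH = 10
while True:
    weights = train((i, corpus[i][1], lab) for i, lab in reviewed.items())
    remaining = [d for d in corpus if d[0] not in reviewed]
    if not remaining:
        break
    remaining.sort(key=lambda d: score(d[1], weights), reverse=True)
    batch = remaining[:BATCH]
    for doc_id, text, label in batch:
        reviewed[doc_id] = label  # simulated human review
    if not any(label for _, _, label in batch):
        break  # naive stopping rule; real CAL stopping criteria are subtler

recall = sum(reviewed.values()) / sum(d[2] for d in corpus)
```

Because the highest-scoring documents are reviewed first and the model is retrained continuously, CAL tends to be robust to the initial seed; the "weak seed" and "wrong seed" slides that follow probe exactly how far that robustness goes.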
E-Discovery
Supervised Machine Learning Predictive Coding
Technology-Assisted Review
TAR Acceptance
My analysis (from the draft of my book) of the 2009 TREC Legal Track data that Grossman & Cormack analyzed.
E-Discovery Considerations
Need to hit a certain recall
Near-duplicates
Variable data
Dirty data
Small part of a document can be critical
Non-standard word usage
Prevalence can be low
Difficulty getting data for experiments
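The first consideration, hitting a certain recall, is typically validated by sampling. A minimal sketch with hypothetical audit numbers: estimate recall from the relevant documents in a random control sample and attach a Wilson score interval, which behaves reasonably at the small relevant-document counts that low prevalence produces.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z=1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical audit: a random sample of 400 documents contains 20
# relevant ones (5% prevalence); the review had flagged 17 of those 20.
found, relevant_in_sample = 17, 20
recall = found / relevant_in_sample            # point estimate: 0.85
low, high = wilson_interval(found, relevant_in_sample)
```

Note how wide the interval is with only 20 relevant documents in the sample: roughly 0.64 to 0.95. This is why low prevalence makes recall certification expensive; a defensible bound requires sampling enough documents to capture a meaningful number of relevant ones.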
Toy Example for Workflows
Jump out to animations after explaining this figure.
Weak Seed
Wrong Seed
Disjoint Relevance
References
Maura Grossman and Gordon Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review," Richmond Journal of Law and Technology, XVII, 2011.
William Dimm, Predictive Coding: Theory & Practice [draft, December 9, 2015], Appendix A.
William Dimm, "TAR 3.0 Performance," Clustify Blog, January 28, 2016.
Gordon Cormack and Maura Grossman, "Scalability of Continuous Active Learning for Reliable High-Recall Text Classification," Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016.
William Dimm, "The Single Seed Hypothesis," Clustify Blog, April 25, 2015.