NLP vs ML in the classification of legislative texts
Student: Kai Krabben
Supervisors: Radboud Winkels & Emile de Maat
Leibniz Centre for Law
Leibniz Centre for Law (UvA) AILaw
Automated Modelling of Sources of Law
Legal Texts → Models of Law
Automated Modelling of Sources of Law
Legal Texts → Classification → Models of Law
Text Classification in General
Assign an electronic document to one or more categories, based on its contents.
– Machine Learning Approach
– Natural Language Approach
In general, ML performs better
Text Classification in Legal Texts
Assign a legislative text fragment to a legal category, based on its content.
Arguments for ML:
– Flexibility
– Simplicity
– Performance
Arguments for NLP:
– Clear patterns
– Next step: modelling
– No black box
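The "clear patterns" argument can be illustrated with a minimal sketch of a pattern-based classifier. The English patterns and the category names below are hypothetical and chosen only for illustration; they are not the patterns or categories used by Winkels and De Maat, whose work concerns Dutch law.

```python
import re

# Hypothetical patterns and category names, for illustration only.
# Each category is recognised by a fixed phrase pattern.
PATTERNS = [
    ("definition", re.compile(r"\b(is understood to mean|means)\b", re.IGNORECASE)),
    ("obligation", re.compile(r"\b(must|shall|is obliged to)\b", re.IGNORECASE)),
    ("permission", re.compile(r"\b(may|is allowed to)\b", re.IGNORECASE)),
]


def classify(sentence):
    """Return the category of the first matching pattern, or 'unknown'."""
    for category, pattern in PATTERNS:
        if pattern.search(sentence):
            return category
    return "unknown"


if __name__ == "__main__":
    for s in [
        "In this Act, 'vehicle' means a motor car.",
        "The owner must register the vehicle.",
        "The minister may grant an exemption.",
    ]:
        print(classify(s), "<-", s)
```

Because every decision traces back to one visible pattern, such a classifier is not a black box, and the matched pattern can be reused in a later modelling step.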
NLP approach (Winkels and De Maat)
– Distinguishable patterns for every category
– Accuracy of 91%
ML approach (Biagioli et al.)
– Bag-of-words representation
– Multiclass Support Vector Machines
– Accuracy: up to 92%
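As a rough illustration of the bag-of-words representation named above, the sketch below turns text fragments into word-count vectors over a shared vocabulary. The tokenizer, the function names and the example sentences are assumptions made for this sketch, not details taken from Biagioli et al.; the classifier step (a multiclass Support Vector Machine trained on such vectors) is deliberately left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Collect every distinct token in the corpus, in sorted order."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def bag_of_words(text, vocabulary):
    """Represent a text as a vector of word counts; word order is discarded."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical example fragments, not taken from either corpus.
docs = [
    "The owner must register the vehicle.",
    "The minister may grant an exemption.",
]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]

if __name__ == "__main__":
    print(vocab)
    for vector in vectors:
        print(vector)
```

Each vector has one position per vocabulary word, so all fragments end up in the same feature space regardless of their length.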
ML vs. NLP
Problem: the studies are not comparable
– Italian Law vs. Dutch Law
– Different Categories
– Paragraphs vs. Sentences
Goal Bachelor Project
Main Goal
– Compare ML and NLP approach
– Use techniques of Biagioli et al. on data of Winkels and De Maat
Extra
– Further analysis of differences in approaches
– Further improvements to the current system
Planning
April
– Preprocessing
– Corpus annotation
– Software testing
May
– Experiments
– … more experiments?
June
– Validate results
– Write final report
– Prepare final presentation
Expected Results
Good results for ML…
… not as good as NLP!