NLP vs. ML in the Classification of Legislative Texts
Student: Kai Krabben
Supervisors: Radboud Winkels & Emile de Maat
Leibniz Centre for Law

Leibniz Centre for Law (UvA) AILaw

Automated Modelling of Sources of Law

Legal Texts → Models of Law

Automated Modelling of Sources of Law
Legal Texts → Classification → Models of Law

Text Classification in General
Assign an electronic document to one or more categories, based on its contents.
– Machine Learning approach
– Natural Language approach
In general, ML performs better.

Text Classification in Legal Texts
Assign a legislative text fragment to a legal category, based on its content.
Arguments for ML:
– Flexibility
– Simplicity
– Performance
Arguments for NLP:
– Clear patterns
– Next step: modelling
– No black box
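The "clear patterns" and "no black box" arguments can be illustrated with a minimal pattern-based classifier. This is a hypothetical sketch: the categories and English regular expressions below are illustrative assumptions, not the patterns Winkels and De Maat used on Dutch legislation. Each decision returns the text that triggered it, so every classification can be traced back to a rule.

```python
import re

# Hypothetical category patterns, checked in order. These English phrasings
# are illustrative assumptions, not the original (Dutch) patterns.
PATTERNS = [
    ("definition", re.compile(r"\b(is|are) understood to mean\b|\bmeans\b", re.I)),
    ("obligation", re.compile(r"\b(shall|must|is obliged to)\b", re.I)),
    ("permission", re.compile(r"\bmay\b", re.I)),
    ("repeal", re.compile(r"\bis (hereby )?repealed\b", re.I)),
]


def classify(sentence):
    """Return (category, matched_text) for the first pattern that matches,
    or ("unknown", None). The matched text explains each decision."""
    for category, pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return category, match.group(0)
    return "unknown", None


for s in [
    "In this Act, 'minister' means the Minister of Justice.",  # ('definition', 'means')
    "The applicant shall submit the form within four weeks.",  # ('obligation', 'shall')
    "Article 12 is repealed.",                                 # ('repeal', 'is repealed')
]:
    print(classify(s))
```

Pattern order matters: as written, a sentence containing "may not" would be labelled a permission, so a realistic rule set would need a prohibition pattern checked before the permission pattern.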

NLP approach (Winkels and De Maat)
– Distinguishable patterns for every category
– Accuracy: 91%
ML approach (Biagioli et al.)
– Bag-of-words representation
– Multiclass Support Vector Machines
– Accuracy: up to 92%
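A minimal sketch of the ML side, assuming a bag-of-words representation fed into one linear SVM per category (one-vs-rest multiclass). To keep the example to the Python standard library, each SVM is trained with Pegasos-style stochastic subgradient descent; this training method is a swapped-in choice, not necessarily the one Biagioli et al. used, and the stop-word list, `lam`, `epochs` and the toy English sentences are all assumptions for illustration.

```python
import random
import re
from collections import Counter

# Assumption: a tiny stop-word list; the original feature set may differ.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "be"}


def bag_of_words(text):
    """Sparse bag-of-words vector (token -> count) plus a constant bias feature."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    vec = Counter(tokens)
    vec["__bias__"] = 1
    return vec


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(xs, ys, lam=0.1, epochs=100, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent; ys are +1 / -1."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin checked before the step
            for f in w:  # shrink all weights: subgradient of the L2 penalty
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step toward the example
                for f, v in xs[i].items():
                    w[f] = w.get(f, 0.0) + eta * ys[i] * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """Train one binary SVM per category (one-vs-rest multiclass)."""
    xs = [bag_of_words(text) for text in texts]
    return {
        c: train_binary_svm(xs, [1 if y == c else -1 for y in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, text):
    """Pick the category whose SVM gives the highest score."""
    x = bag_of_words(text)
    return max(models, key=lambda c: dot(models[c], x))


# Toy, hypothetical training data (English stand-ins, not a real corpus).
TRAIN = [
    ("The applicant shall submit the form.", "obligation"),
    ("The minister shall publish the decision.", "obligation"),
    ("Employers shall keep a register.", "obligation"),
    ("The mayor may issue a permit.", "permission"),
    ("The board may extend the term.", "permission"),
    ("Inspectors may enter any building.", "permission"),
    ("Article 3 is repealed.", "repeal"),
    ("Article 5 is repealed.", "repeal"),
    ("Section 7 is hereby repealed.", "repeal"),
]

models = train_one_vs_rest([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(predict(models, "The council shall publish a report."))
```

Unlike the pattern-based approach, the learned weights are spread over every word seen in training, which is where the "black box" concern comes from; in exchange, no rules have to be written by hand.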

ML vs. NLP
Problem: the studies are not directly comparable
– Italian law vs. Dutch law
– Different categories
– Paragraphs vs. sentences

Goal of the Bachelor Project
Main goal:
– Compare the ML and NLP approaches
– Use the techniques of Biagioli et al. on the data of Winkels and De Maat
Extra:
– Further analysis of the differences between the approaches
– Further improvements to the current system
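The main goal depends on scoring both approaches on the same annotated data and the same text unit. A minimal, hypothetical evaluation harness is sketched below: a classifier is treated as any function from a text fragment to a category label, and all classifiers are scored on one shared list of (text, gold label) pairs. The two stand-in classifiers and the label names are placeholders for illustration, not the actual systems.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A classifier maps one text fragment to one category label.
Classifier = Callable[[str], str]
Example = Tuple[str, str]  # (text fragment, gold label)


def accuracy(classify: Classifier, data: List[Example]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if not data:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for text, gold in data if classify(text) == gold)
    return correct / len(data)


def confusion(classify: Classifier, data: List[Example]) -> Counter:
    """Counts of (gold, predicted) pairs, showing where a classifier errs."""
    return Counter((gold, classify(text)) for text, gold in data)


def compare(classifiers: Dict[str, Classifier], data: List[Example]) -> Dict[str, float]:
    """Score every classifier on the same data, so the numbers are comparable."""
    return {name: accuracy(clf, data) for name, clf in classifiers.items()}


# Hypothetical stand-ins for the two approaches (placeholders only).
def keyword_rule(text: str) -> str:
    if "shall" in text:
        return "obligation"
    if " may " in text:
        return "permission"
    if "repealed" in text:
        return "repeal"
    return "unknown"


def always_obligation(text: str) -> str:
    return "obligation"  # constant baseline


DATA: List[Example] = [
    ("The applicant shall submit the form.", "obligation"),
    ("The minister may grant an exemption.", "permission"),
    ("Article 3 is repealed.", "repeal"),
    ("This Act enters into force on 1 January.", "entry_into_force"),
]

print(compare({"keyword rule": keyword_rule, "baseline": always_obligation}, DATA))
# {'keyword rule': 0.75, 'baseline': 0.25}
print(confusion(keyword_rule, DATA))
```

Reporting the confusion counts alongside accuracy shows which categories each approach gets wrong, which is the kind of further analysis of differences the extra goal mentions.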

Planning
April: preprocessing, corpus annotation, software testing
May: experiments … more experiments?
June: validate results, write final report, prepare final presentation

Expected Results
Good results for ML…
… but not as good as NLP!

Automated Modelling of Sources of Law