Reflex 2: A Look at the Internals of an Automated Legislative Citator. Marc-André Morissette, Daniel Shane, Valentin Bujanca.


Reflex 2: A Look at the Internals of an Automated Legislative Citator
Marc-André Morissette, Daniel Shane, Valentin Bujanca

Reflex 2: A Look at the Internals of an Automated Legislative Citator at LVI 2012
Context
First, a look at CanLII
- The most popular resource for Canadian primary legal material
- More than 1 million court and tribunal decisions
- Federal statutes and regulations, plus statutes and regulations for 13 provinces and territories, most available in point-in-time versions
- More than 30,000,000 pages of legal text
Citator
- Automatically recognizes legal citations and adds hyperlinks
  - Convenience
  - Note-up
  - Future improvements to search

Traditional Architecture of a Citator
Phase 1: Recognition of Citation Elements
- A) Titles
- B) Section Numbers
- C) Chapter Number / Formal Citations
Phase 2: Heuristics
- Example: the words “of the” tie a section and its legislation together
Phase 3: Markup
- Add hyperlinks

Recognition of Titles (1)
- More than 40,000 titles in our database
- The words of those titles are composed together to create a Nondeterministic Finite Automaton
[Figure: automaton diagram whose transitions are labelled with title words: start, agriculture, agreement, on, internal, trade, implementation, marketing, programs, products, cooperative, act]

Recognition of Titles (2)
- The text of the document is split into words
- For each word, a new path through the automaton is attempted
- If a path is completed, then we have found the title of a document
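The two title-recognition slides can be sketched roughly as follows. This is a minimal illustration, not CanLII's implementation: a word-level trie stands in for the automaton, and the function names, the `END` marker, and the sample titles are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

END = "$end"  # hypothetical marker key: "a complete title ends at this state"

def build_automaton(titles: List[str]) -> Dict:
    """Compose the words of every title into one word-level trie."""
    root: Dict = {}
    for title in titles:
        node = root
        for word in title.lower().split():
            node = node.setdefault(word, {})
        node[END] = title
    return root

def find_titles(text: str, root: Dict) -> List[Tuple[int, int, str]]:
    """Return (start_word, end_word, title) for each title found in text."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    active: List[Tuple[int, Dict]] = []  # paths currently alive: (start index, state)
    found: List[Tuple[int, int, str]] = []
    for i, word in enumerate(words):
        active.append((i, root))         # attempt a new path at every word
        survivors = []
        for start, node in active:
            nxt = node.get(word)
            if nxt is None:
                continue                 # no transition on this word: the path dies
            if END in nxt:
                found.append((start, i + 1, nxt[END]))  # a path was completed
            survivors.append((start, nxt))
        active = survivors
    return found

# Illustrative titles only; the real database holds more than 40,000.
TITLES = [
    "Agricultural Marketing Programs Act",
    "Agricultural Products Marketing Act",
    "Criminal Code",
]
automaton = build_automaton(TITLES)
print(find_titles("Under the Agricultural Products Marketing Act, and the Criminal Code.", automaton))
```

Keeping several paths alive at once is what gives the matcher its nondeterministic flavour: a title that starts inside another title is still found, because every word opens a fresh path from the start state.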

Recognition of Sections
Regular Expressions
- Based on formal language and automata theory
- Determine whether a given string of text matches a given pattern
Examples:
- \d matches any digit
- \d+ matches one or more digits
- (s\.|ss\.|section|subsection) (\d+) matches a reference to a section (hopefully)
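As a quick illustration, the section pattern from the slide can be tried with Python's `re` module. The pattern is the slide's; the function name and the sample sentence are assumptions made for this sketch.

```python
import re

# The section pattern quoted on the slide, unchanged.
SECTION_RE = re.compile(r"(s\.|ss\.|section|subsection) (\d+)")

def find_sections(text):
    """Return (keyword, number) pairs for each section reference found."""
    return [(m.group(1), int(m.group(2))) for m in SECTION_RE.finditer(text)]

print(find_sections("See s. 12 and subsection 4 of the Act; section 25 applies."))
```

The "(hopefully)" is earned: the pattern is not anchored on a word boundary, so a sentence ending in "s." followed by a number would also match, which is one reason later heuristics are needed.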

Recognition of Chapter Numbers (1)
- Typically done with Regular Expressions too
  - RSC (\d{4}), c ([A-Z]\-\d+) matches RSC 1985, c C-46
  - For every match, check legislative databases for matches
- …but becomes a bother really quickly
  - Because citations vary greatly across jurisdictions
  - Because they even vary greatly within the same jurisdiction
    - RSC 1985, c C-46 (codified in 1985)
    - SRC 1970, c P-33 (codified in 1970)
    - SC 1997, c 36 (annual statute)
    - RSC 1985, c 32 (4th suppl.) (codified in the period)
    - SC , c 37 (rare)
    - SC 1992, c 46, Sch II (rare)
    - SC 2003, c 22, s. 6 (damn)
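To see how quickly the single-pattern approach runs out, the slide's expression can be tested against a few of the forms it lists. This is only a sketch; `parse_rsc` and `SAMPLES` are names invented for the example.

```python
import re

# Pattern quoted on the slide for revised-statute citations.
RSC_RE = re.compile(r"RSC (\d{4}), c ([A-Z]\-\d+)")

# Citation forms taken from the slide.
SAMPLES = [
    "RSC 1985, c C-46",
    "SRC 1970, c P-33",
    "SC 1997, c 36",
    "SC 1992, c 46, Sch II",
]

def parse_rsc(citation):
    """Return (year, chapter) if the slide's pattern matches, else None."""
    m = RSC_RE.search(citation)
    return (int(m.group(1)), m.group(2)) if m else None

for sample in SAMPLES:
    print(f"{sample!r}: {parse_rsc(sample)}")
```

Only the first sample matches. Each of the other forms would need its own pattern, or a much looser one, before any database lookup could happen.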

Recognition of Chapter Numbers (2)
- Complexity leads to lots of errors by judges in citing statutes and regulations
  - RSC 1985, c C-46
  - RS 1985, C-46
  - SC 1985, c46
  - RSC, 85, C, C46
- More complex regular expressions can deal with this… to some degree

Recognition of Chapter Numbers (3)
- Things start to break down when all these variations come together
  - # jurisdictions × # citation forms × # acceptable user errors = massive headache
- Solution: invert the problem
  - Don’t try to match every possible citation in the text
  - Instead, generate every possible acceptable variation

Recognition of Chapter Numbers (4)
For every citation in our database, add rules
- For every “RSC” in a citation, generate a variation with “RS” instead
  - RSC 1985, c C-46
  - RS 1985, c C-46
- For every “c” after the year, generate a variation with “chapter” or “ch”
  - RS 1985, c C-46 / RSC, 1985 c C-46
  - RS 1985, ch C-46 / RSC, 1985 ch C-46
  - RS 1985, chapter C-46 / RSC, 1985 chapter C-46
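The inverted approach might be sketched as below, assuming each rule is a plain function from one citation string to a list of variants. Only the two rules named on the slide are implemented; the function names, the `Rule` type, and the index structure are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

Rule = Callable[[str], List[str]]  # hypothetical rule type: citation -> variants

def rs_rule(citation: str) -> List[str]:
    """For every "RSC" in the citation, generate a variation with "RS"."""
    return [citation.replace("RSC", "RS")] if "RSC" in citation else []

def chapter_rule(citation: str) -> List[str]:
    """For a "c" right after the year, generate "ch" and "chapter" variations."""
    pattern = r"(?<=\d{4}, )c(?= )"
    if not re.search(pattern, citation):
        return []
    return [re.sub(pattern, word, citation) for word in ("ch", "chapter")]

RULES: List[Rule] = [rs_rule, chapter_rule]

def generate_variations(citation: str, rules: Iterable[Rule] = RULES) -> Set[str]:
    """Apply the rules repeatedly until no new variation appears."""
    rules = list(rules)
    seen = {citation}
    queue = [citation]
    while queue:
        current = queue.pop()
        for rule in rules:
            for variant in rule(current):
                if variant not in seen:
                    seen.add(variant)
                    queue.append(variant)
    return seen

def build_index(citations: Iterable[str]) -> Dict[str, str]:
    """Map every generated variation back to the citation it came from."""
    return {v: c for c in citations for v in generate_variations(c)}

print(sorted(generate_variations("RSC 1985, c C-46")))
```

Because the rules operate on citations that actually exist in the database, a rule written once applies to every jurisdiction, and rare forms are covered without anyone having to anticipate them.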

Recognition of Chapter Numbers (5)
Better than Regular Expressions because
- We can limit which variations can combine together
  [Figure: RSC, C, c., C46, C-46]
- One variation rule can be written to cover
  - all jurisdictions
  - every variation within that jurisdiction
- No need to know there are rare forms such as SC 1992, c 46, Sch II
The variations are fed into the Nondeterministic Finite Automaton
[Figure: automaton diagram with transitions labelled start, RSC, RS, 1985, c, ch, chapter, C, 46]

Phase 1: Recognition of Citation Elements
- A) Titles (NFA)
- B) Section Numbers (Regular Expressions, a form of Automata)
- C) Chapter Number / Formal Citations (NFA)
Done using an implementation of Pike’s VM
- Creates a large virtual machine out of any DFA or Regular Expression
- Created by Russ Cox (Bell Labs)
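For readers unfamiliar with the technique, here is a toy Pike-style virtual machine. It is a sketch of the general idea only, not the implementation the slides refer to: the instruction names (`char`, `split`, `jmp`, `match`) follow the usual textbook presentation, the program is hand-compiled for the expression `a+b`, and it works on characters where a citator would more likely work on words.

```python
from typing import List, Set, Tuple

# Instructions: ("char", c), ("split", x, y), ("jmp", x), ("match",)
Program = List[Tuple]

def add_thread(prog: Program, threads: List[int], seen: Set[int], pc: int) -> None:
    """Add a thread at pc, following jmp/split eagerly and skipping duplicates."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, seen, op[1])
    elif op[0] == "split":
        add_thread(prog, threads, seen, op[1])
        add_thread(prog, threads, seen, op[2])
    else:
        threads.append(pc)  # a char or match instruction waits for input

def pike_match(prog: Program, text: str) -> bool:
    """Return True if the program accepts the whole of text."""
    current: List[int] = []
    add_thread(prog, current, set(), 0)
    for ch in text:
        nxt: List[int] = []
        seen: Set[int] = set()
        for pc in current:           # all threads advance in lockstep
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, nxt, seen, pc + 1)
        current = nxt
    return any(prog[pc][0] == "match" for pc in current)

# Hand-compiled program for the regular expression a+b.
A_PLUS_B: Program = [
    ("char", "a"),    # 0: consume an "a"
    ("split", 0, 2),  # 1: either loop back for another "a" or go on
    ("char", "b"),    # 2: consume a "b"
    ("match",),       # 3: accept
]

print(pike_match(A_PLUS_B, "aaab"))
```

Since each program counter appears at most once per step, the work per input symbol is bounded by the program size. That property is what makes a single very large machine, built from tens of thousands of titles and citation variants, practical to run over a document.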

Phase 2: Association Heuristics
Our objective for these rules:
- Be conservative and minimize the number of false positives
A sample
- If multiple overlapping citations are recognized, use the longest one
  - Criminal Code / Order Designating Saskatchewan for the Purposes of the Criminal Interest Rate Provisions of the Criminal Code
- Learn shorthand aliases (the “Act”)
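The longest-match rule could look something like this. It is a sketch under the assumption that recognized citations arrive as `(start, end, title)` spans measured in words; the span format and the function name are invented for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # hypothetical span: (start word, end word, title)

def keep_longest(spans: List[Span]) -> List[Span]:
    """Among overlapping spans, keep only the longest one."""
    chosen: List[Span] = []
    # Consider longer spans first so they win over anything they overlap.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(span[1] <= c[0] or span[0] >= c[1] for c in chosen):
            chosen.append(span)
    return sorted(chosen)

ORDER = ("Order Designating Saskatchewan for the Purposes of the "
         "Criminal Interest Rate Provisions of the Criminal Code")
spans = [
    (0, 17, ORDER),             # the full Order title
    (15, 17, "Criminal Code"),  # the same two words, nested inside it
    (20, 22, "Criminal Code"),  # a separate mention later in the text
]
print(keep_longest(spans))
```

The nested "Criminal Code" is dropped in favour of the Order that contains it, while the later stand-alone mention survives.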

Phase 2: Association Heuristics (2)
Section association
- Some sections are strongly associated
  - Section 12 of the Criminal Code, RSC 1985, c C-46
- Others are weakly associated
  1. If one section is strongly associated, then every other section with the same number has the same association
  2. If a section is followed by the words “of the” without a citation, then do not associate
  3. If a section number follows another citation close enough
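Rules 1 and 2 above might be sketched as follows. The `SectionRef` record, its field names, and the precedence between the two rules are assumptions for illustration; rule 3 is left out because the slide does not state its outcome.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SectionRef:
    """Hypothetical record for one section reference found in a document."""
    number: str
    strong_citation: Optional[str] = None  # set when "section N of the <citation>" was recognized
    dangling_of_the: bool = False          # followed by "of the" but no recognized citation

def associate(refs: List[SectionRef]) -> List[Optional[str]]:
    """Return the citation associated with each reference, or None."""
    # Rule 1 setup: remember which citation each strongly associated number points to.
    strong: Dict[str, str] = {}
    for ref in refs:
        if ref.strong_citation is not None:
            strong.setdefault(ref.number, ref.strong_citation)

    result: List[Optional[str]] = []
    for ref in refs:
        if ref.strong_citation is not None:
            result.append(ref.strong_citation)
        elif ref.dangling_of_the:
            result.append(None)                    # rule 2: do not associate
        else:
            result.append(strong.get(ref.number))  # rule 1: same number, same association
    return result

CODE = "Criminal Code, RSC 1985, c C-46"
refs = [
    SectionRef("12", strong_citation=CODE),
    SectionRef("12"),
    SectionRef("12", dangling_of_the=True),
    SectionRef("7"),
]
print(associate(refs))
```

Letting rule 2 override rule 1 is a choice made here in the spirit of the stated objective, staying conservative and keeping false positives down; the slides do not say which rule wins.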

Phase 2: Association Heuristics (3)
- Popular, alternative and previous legislation titles and citations added to our databases
  - PIPEDA – Personal Information Protection and Electronic Documents Act
  - Unemployment Insurance Act – Employment Insurance Act
  - Gazette numbers for certain regulation collections
- Resolve ambiguous citations using basic jurisdictional rules

The End

Conclusion
- CanLII’s rate of recognition for legislative citations massively improved
- Harder numbers forthcoming

Legislative Citations
Simple example
- Criminal Code, RSC 1985, c C-46, ss
Not so simple examples
- Mangled chapter numbers
  - 1985 C-46
  - RSC C46
  - RS, (1985), chapter C 46
- Section numbers
  - s., ss., sec., section, subsec, sub-sec, para, alinea, etc.

Legislative Citations (2)
- Ambiguous citations
  - Family Law Act (which jurisdiction?)
- Familiar names
  - PIPEDA - Personal Information Protection and Electronic Documents Act
  - Obamacare – PPACA – Patient Protection and Affordable Care Act
- Substitution Acronyms
  - […] pursuant to s of the Environment Quality Act (“EQA”), to […]
  - […] interpretation of s EQA, but […]
- The vagaries of human language
  - […] not apply to section 25 of the Criminal Code […] Section 37, however […]
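Learning a substitution acronym such as (“EQA”) or (the “Act”) could be approached along these lines. This is a sketch only: it assumes title recognition has already produced character offsets for each title, and the regular expression, function name, and sample sentence are invented for the example.

```python
import re
from typing import Dict, List, Tuple

# A parenthesised, quoted alias right after a title, e.g. (“EQA”) or (the “Act”).
ALIAS_RE = re.compile(r'\s*\((?:the\s+)?[“"]([^”"]+)[”"]\)')

def learn_aliases(text: str, title_spans: List[Tuple[int, int, str]]) -> Dict[str, str]:
    """Map each alias declared right after a recognized title to that title.

    title_spans holds (start, end, title) character offsets, assumed to come
    from an earlier title-recognition pass.
    """
    aliases: Dict[str, str] = {}
    for _start, end, title in title_spans:
        m = ALIAS_RE.match(text, end)  # only look immediately after the title
        if m:
            aliases[m.group(1)] = title
    return aliases

TITLE = "Environment Quality Act"
text = "pursuant to the Environment Quality Act (“EQA”), to apply the EQA"
start = text.index(TITLE)
print(learn_aliases(text, [(start, start + len(TITLE), TITLE)]))
```

Once learned, an alias can be added as one more accepted name for that title for the rest of the same document, so that a later bare "EQA" resolves to the Act.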