The Rule-based Parser of the NLP Group of the University of Torino


Leonardo Lesmo
Dipartimento di Informatica and Centro di Scienze Cognitive, Università di Torino, Italy
Email: lesmo@di.unito.it

Goals
- Wide-coverage tool
- Domain independence
- Extensibility to semantics

Approach
- Manually developed rules
- Two phases: chunking and subcategorization
- Procedural analysis of conjunctions and procedural identification of verbal dependents

TULE (Turin University Linguistic Environment)

Text
1. TOKENIZER (resource: token automaton). Splits the text into words, numbers, punctuation marks. Output: tokens.
2. DICTIONARY LOOKUP (resources: morphological dictionary, suffix tables). Extracts all lexical interpretations of each token. Output: sets of lexical items.
3. POS TAGGER (resource: tagging rules). Chooses one lexical interpretation. Output: lexical items.
4. DEPENDENCY PARSER (resources: parsing rules, verbal caseframes). Establishes the connections between lexical items. Output: parse tree.
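The first three stages can be sketched as a chain of functions. This is only a toy illustration in Python, not the TULE implementation: the lexicon, the function names and the "pick the first interpretation" tagging rule are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: token -> list of (lemma, category) interpretations.
LEXICON = {
    "slitta": [("SLITTARE", "VERB"), ("SLITTA", "NOUN")],
    "la": [("IL", "ART"), ("LA", "PRON")],
    "decisione": [("DECISIONE", "NOUN")],
    ".": [(".", "PUNCT")],
}

def tokenize(text):
    """Split the text into words, numbers and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def lookup(tokens):
    """Return all lexical interpretations of each token."""
    return [LEXICON.get(t.lower(), [(t.upper(), "UNKNOWN")]) for t in tokens]

def tag(candidates):
    """Choose one interpretation per token.

    Assumption: the first listed interpretation wins. A real tagger would
    apply context-sensitive tagging rules here.
    """
    return [interpretations[0] for interpretations in candidates]

lexical_items = tag(lookup(tokenize("Slitta la decisione.")))
```

Each stage consumes exactly what the previous one produces, which is the property the diagram expresses; the dependency parser would take `lexical_items` as its input.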

The grammar
- Rule-based dependency grammar
- Chunking (non-verbal groups) + verbal subcategorization frames
- Output: a projective tree represented as pointers to parents, including some null elements (understood items, e.g. pro-drop, and traces)
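A tree stored as pointers to parents can be checked for projectivity by testing that no two arcs cross. The sketch below is a generic check in Python, not code from the parser; the list encoding (1-based positions, 0 for the artificial root) is an assumption chosen for the example.

```python
def is_projective(parents):
    """Check a parent-pointer tree for crossing arcs.

    parents[i] is the position (1-based) of the parent of word i+1,
    or 0 if the word hangs from the artificial root.
    """
    arcs = [(min(head, dep), max(head, dep))
            for dep, head in enumerate(parents, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the first.
            if a < c < b < d:
                return False
    return True

# Parent pointers of an 11-node tree: a chain hanging from node 1,
# plus two more direct dependents of node 1 (illustrative values).
nested = [0, 1, 2, 1, 4, 5, 6, 7, 8, 9, 1]
# Word 1 depends on word 3 and word 2 on word 4: the two arcs cross.
crossing = [3, 4, 0, 3]
```

With these inputs, `is_projective(nested)` holds and `is_projective(crossing)` does not.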

Parser Architecture

Lexical items
1. CHUNKING (chunking rules). Splits the text into groups of strictly connected words. Output: chunked text.
2. ANALYSIS OF CONJUNCTIONS (procedural preference rules 1). Connects chunks linked by conjunctions, to form larger chunks. Output: chunked text.
3. SEGMENTATION (procedural preference rules 2). Determines the dependents of verbs.
4. VERBAL ATTACHMENT (verb classes, verbal caseframes). Determines the role (arc labels) of the verbal dependents. Output: parse tree.

An example

Example: Slitta a Tirana la decisione sullo stato di emergenza. (The decision on the emergency status in Tirana has been delayed)

Lexical Items
1 Slitta (SLITTARE VERB MAIN IND PRES INTRANS 3 SING)
2 a (A PREP MONO)
3 Tirana (TIRANA NOUN PROPER F SING ££CITY)
4 la (IL ART DEF F SING)
5 decisione (DECISIONE NOUN COMMON F SING DECIDERE INTRANS)
6 sullo ((SU PREP MONO) 6.10 (IL ART DEF M SING))
7 stato (STATO NOUN COMMON M SING)
8 di (DI PREP MONO)
9 emergenza (EMERGENZA NOUN COMMON F SING)
10 . (#\. PUNCT)

Parse Tree Infos
[0;TOP-VERB] [1;PREP-RMOD] [2;PREP-ARG] [1;VERB-SUBJ] [4;DET+DEF-ARG] [5;PREP-RMOD] [6;PREP-ARG] [6.10;DET+DEF-ARG] [7;PREP-RMOD] [8;PREP-ARG] [1;END]

[Figure: dependency tree of the example, rooted at 1: Slitta; visible arcs include Prep-rmod (to 2: a), Prep-arg (to 3: Tirana), Verb-subj (to 4: la) and Det+def-arg (to 5: decisione).]
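The parse tree infos can be read back into a tree by pairing each bracketed entry with a lexical item. The Python sketch below assumes that the i-th entry belongs to the i-th lexical item (with 6.10 counted as its own item) and that each entry has the form [parent;label]; both are readings of the slide, not documented facts about the output format.

```python
import re

INFO_RE = re.compile(r"\[([\d.]+);([^\]]+)\]")

def parse_infos(text):
    """Return (parent_id, label) pairs. Ids stay strings because
    sub-token ids such as 6.10 are not plain integers."""
    return INFO_RE.findall(text)

def children_by_parent(ids, infos):
    """Group (child_id, label) pairs under their parent id."""
    tree = {}
    for node, (parent, label) in zip(ids, infos):
        tree.setdefault(parent, []).append((node, label))
    return tree

IDS = ["1", "2", "3", "4", "5", "6", "6.10", "7", "8", "9", "10"]
INFOS = parse_infos(
    "[0;TOP-VERB] [1;PREP-RMOD] [2;PREP-ARG] [1;VERB-SUBJ] [4;DET+DEF-ARG] "
    "[5;PREP-RMOD] [6;PREP-ARG] [6.10;DET+DEF-ARG] [7;PREP-RMOD] [8;PREP-ARG] [1;END]"
)
TREE = children_by_parent(IDS, INFOS)
```

Under that alignment, `TREE["1"]` lists the three direct dependents of "Slitta": the preposition "a", the subject headed by "la", and the final full stop.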

Chunking

Example: Puoi dirmi che spettacoli di cabaret posso vedere domani? (Can you tell me what cabaret plays I can see tomorrow?)

(Notation: word/tag; square brackets delimit chunks, with the chunk label after the closing bracket.)

Puoi/V-modal-2nd-sing-pres dir/V-inf [mi/Pron-1st-dative]Pron [che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group posso/V-modal-1st-sing-pres vedere/V-inf [domani/Adv]A-group ?

Chunking Rules
Chunking rules are grouped in packets. Each packet is associated with a lexical category and describes the possible "chunkable" dependents of words of that category. A dependent is "chunkable" if it is handled during chunking (e.g. auxiliaries, but not arguments of verbs).

A chunk rule

(NOUN common
  (precedes (ADJ qualif T (#\- #\' #\"))
            (ADJ ((type qualif) (agree)))
            ADJC+QUALIF-RMOD))

Parts of the rule:
- NOUN: packet (governing word)
- common: feature (constrains applicability)
- precedes (ADJ qualif T (#\- #\' #\")): position of the dependent (and possible words separating head from dependent)
- (ADJ ((type qualif) (agree))): category of the possible dependent (and constraints on it)
- ADJC+QUALIF-RMOD: label of the connecting arc
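How such a rule might be applied can be sketched in Python. This is a simplified illustration and not the parser's rule interpreter. It assumes that "precedes" means the dependent is looked for immediately before the governing word, it ignores the separator characters, and it reduces agreement to gender and number. The names Word, ChunkRule and apply_rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    form: str
    cat: str       # lexical category, e.g. "NOUN", "ADJ"
    subtype: str   # e.g. "common", "qualif"
    gender: str
    number: str

@dataclass
class ChunkRule:
    head_cat: str      # packet: category of the governing word
    head_subtype: str  # feature constraining applicability
    position: str      # "precedes": dependent sits before the head (assumption)
    dep_cat: str
    dep_subtype: str
    agree: bool        # require gender/number agreement
    label: str         # label of the connecting arc

def apply_rule(rule, words):
    """Return arcs (dep_index, head_index, label) licensed by the rule."""
    arcs = []
    for h, head in enumerate(words):
        if (head.cat, head.subtype) != (rule.head_cat, rule.head_subtype):
            continue
        d = h - 1 if rule.position == "precedes" else h + 1
        if not 0 <= d < len(words):
            continue
        dep = words[d]
        if (dep.cat, dep.subtype) != (rule.dep_cat, rule.dep_subtype):
            continue
        if rule.agree and (dep.gender, dep.number) != (head.gender, head.number):
            continue
        arcs.append((d, h, rule.label))
    return arcs

RULE = ChunkRule("NOUN", "common", "precedes", "ADJ", "qualif", True, "ADJC+QUALIF-RMOD")
agreeing = [Word("bella", "ADJ", "qualif", "F", "SING"),
            Word("casa", "NOUN", "common", "F", "SING")]
clashing = [Word("bello", "ADJ", "qualif", "M", "SING"),
            Word("casa", "NOUN", "common", "F", "SING")]
```

On the agreeing pair the rule attaches the adjective to the noun with the ADJC+QUALIF-RMOD label; on the clashing pair it produces no arc.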

Conjunctions
- When a coordinating conjunction is found, all following and preceding chunks are collected
- All pairs are built, and the best one is chosen according to criteria based on structural similarity and distance
- Special treatment for verbs

Example: Ho incontrato Marco e Lucia e li ho salutati (I met Marco and Lucia and I greeted them)

(Notation: word/tag; square brackets delimit chunks, with the chunk label after the closing bracket.)

Ho/V-aux incontrato/V-main [Marco/Noun-Proper]Noun e/Conj-coord [Lucia/Noun-Proper]Noun e/Conj-coord [li/Pron-pers]Pron ho/V-aux salutati/V-main
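The pair selection can be sketched as a search over all (left, right) candidate pairs. The Python below is a toy version: the scoring function (1 point for equal categories, minus 0.1 per position of distance) is an invented stand-in for the parser's preference criteria, and verb groups are treated as single units for simplicity.

```python
from itertools import product

def best_conjunct_pair(chunks, conj_index):
    """chunks: list of (text, category); conj_index: position of the
    coordinating conjunction. Returns the indices of the chosen pair."""
    left = [(i, c) for i, c in enumerate(chunks)
            if i < conj_index and c[1] != "Conj"]
    right = [(i, c) for i, c in enumerate(chunks)
             if i > conj_index and c[1] != "Conj"]

    def score(pair):
        (i, (_, cat_l)), (j, (_, cat_r)) = pair
        similarity = 1.0 if cat_l == cat_r else 0.0   # hypothetical criterion
        distance = (conj_index - i) + (j - conj_index)
        return similarity - 0.1 * distance            # hypothetical weights

    (i, _), (j, _) = max(product(left, right), key=score)
    return i, j

CHUNKS = [("Ho incontrato", "Verb"), ("Marco", "Noun"), ("e", "Conj"),
          ("Lucia", "Noun"), ("e", "Conj"), ("li", "Pron"), ("ho salutati", "Verb")]
first = best_conjunct_pair(CHUNKS, 2)
second = best_conjunct_pair(CHUNKS, 4)
```

With these toy weights the first conjunction links "Marco" and "Lucia", and the second links the two verb groups, since no closer pair shares a category.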

Segmentation

For each verb (going from left to right):
- Look for possible dependents (on its right and left)
- On the left, the search is blocked by the previous verb
- On the right, some "barriers" are defined to stop the search (for instance, a subordinating conjunction acts as a barrier)

(Notation: word/tag; square brackets delimit chunks, braces delimit the segments of the verbs.)

{ Puoi/V-modal-2nd-sing-pres { dir/V-inf [mi/Pron-1st-dative]Pron { [che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group posso/V-modal-1st-sing-pres { vedere/V-inf [domani/Adv]A-group ? } } } }
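The search described above can be sketched as a single left-to-right pass. The Python below is a simplified illustration under stated assumptions: tokens are pre-classified as "verb", "chunk" or "barrier", only barriers and the end of the sentence stop the rightward search, and the sample sentence is made up for the example.

```python
def segment(tokens):
    """tokens: list of (form, kind), kind in {"verb", "chunk", "barrier"}.

    Returns {verb_index: (left_candidates, right_candidates)}, where the
    candidates are indices of chunks that may depend on that verb.
    """
    result = {}
    prev_verb = -1
    for i, (_, kind) in enumerate(tokens):
        if kind != "verb":
            continue
        # Leftward search: blocked by the previous verb.
        left = [j for j in range(prev_verb + 1, i) if tokens[j][1] == "chunk"]
        # Rightward search: stopped by the first barrier.
        right = []
        for j in range(i + 1, len(tokens)):
            if tokens[j][1] == "barrier":
                break
            if tokens[j][1] == "chunk":
                right.append(j)
        result[i] = (left, right)
        prev_verb = i
    return result

# Illustrative sentence: "Maria dice che Luca dorme oggi"
TOKENS = [("Maria", "chunk"), ("dice", "verb"), ("che", "barrier"),
          ("Luca", "chunk"), ("dorme", "verb"), ("oggi", "chunk")]
SEGMENTS = segment(TOKENS)
```

Here "dice" can only take "Maria" as a dependent, because the subordinating conjunction "che" blocks its rightward search, while "dorme" gets "Luca" on its left and "oggi" on its right.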

Verbal Subcategorization

[Figure: hierarchy of the subcategorization classes, with dictionary entries linked to classes.]
Subcategorization classes shown: verbs, nosubj-verbs, subj-verbs, obj-verbs, basic-trans, empty-modal, modal, ssubj-inf-verbs, trans, indobj-verbs, trans-indobj
Dictionary entries shown: bisognare (need), camminare (walk), dovere (must), potere (can)

Example subcategorization class definitions:

(subj-verbs (intrans) (verbs)
  ; *** verbs with a subject. Definition of subject
  ( verb-subj ((noun (agree))
               (art (agree))
               (pron (not (word quale) (type relat)) (case lsubj) (agree))
               (adj (type (indef demons deitt interr poss)) (agree))
               (num (agree))
               (prep (word in) (down (cat pron) (type indef)) (agree)))))

(ssubj-inf-verbs () (verbs)
  ; *** verbs with an inf-verb sentential subject
  ( verb-subj ((verb (mood infinite) (agree)))))

(empty-modal () (no-subj-verbs)
  ; *** modals without subject
  ( verb-indcompl-modal ((verb (mood infinite)))))
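One way to read these definitions is as an inheritance hierarchy in which a class collects the cases of its parents and adds its own. The Python sketch below illustrates that reading only. It assumes that the third element of each definition names the parent classes and that no-subj-verbs descends from verbs. The class demo-trans and its verb-obj case are hypothetical, added to show inheritance.

```python
# name -> (parent classes, cases defined by the class itself)
CLASSES = {
    "verbs": ([], {}),
    "subj-verbs": (["verbs"], {"verb-subj": "noun/art/pron/adj/num/prep, agreeing"}),
    "no-subj-verbs": (["verbs"], {}),
    "empty-modal": (["no-subj-verbs"], {"verb-indcompl-modal": "verb, mood infinite"}),
    # Hypothetical class, not taken from the slides.
    "demo-trans": (["subj-verbs"], {"verb-obj": "noun phrase"}),
}

def cases(name):
    """Collect the cases of a class: inherited ones first, then its own.
    A class that redefines an inherited case overrides its definition."""
    parents, own = CLASSES[name]
    merged = {}
    for parent in parents:
        merged.update(cases(parent))
    merged.update(own)
    return merged
```

For example, `cases("demo-trans")` contains both verb-subj (inherited from subj-verbs) and verb-obj, while `cases("empty-modal")` contains only verb-indcompl-modal because no subject case is defined above it.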

Transformations

Transformations map a basic class (e.g. trans) to transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, ...).

Example transformation:

(infinitivization replacing
  (subj-verbs)
  (is-inf-form tr-verb v-casefr)
  (cancel-case s-subj))
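The effect of the infinitivization example can be sketched as a function from one case frame to a derived one. The Python below is an illustration under assumptions: it reads (subj-verbs) as "applies to frames descending from subj-verbs", implements only the cancel-case step, and leaves out the is-inf-form condition and the "replacing" keyword. The CaseFrame type and the s-obj case are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseFrame:
    name: str
    ancestors: frozenset  # classes the frame descends from
    cases: tuple          # surface cases, e.g. ("s-subj", "s-obj")

def infinitivization(frame):
    """Derive the infinitive variant of a frame by cancelling s-subj.
    Returns None when the frame does not descend from subj-verbs."""
    if "subj-verbs" not in frame.ancestors:
        return None
    remaining = tuple(c for c in frame.cases if c != "s-subj")
    return CaseFrame(frame.name + "+infinitivization", frame.ancestors, remaining)

TRANS = CaseFrame("trans", frozenset({"verbs", "subj-verbs"}), ("s-subj", "s-obj"))
DERIVED = infinitivization(TRANS)
```

Chaining several such functions would yield combined names like trans+passivization+infinitivization, which suggests how a small set of base classes can expand into a much larger set of surface case frames.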

Some statistics

Chunking rules
- Total: 295 rules
- Common: 250 rules
- English: 34 rules
- Italian: 7 rules
- Spanish + Catalan: 4 rules

Base Subcategorization
- Total: 118 classes
- Abstract: 21 classes
- plus verbal locutions
- Italian: 40 classes
- English: 1 class

Derived surface case frames
- 2653 case frames

Conclusions
- Test of the parser on other languages, using the same grammar augmented with extra rules (see previous slide)
- Partial use of semantic information (about 400 words classified according to a semantic taxonomy)
- The parser has been used in a project involving spoken and written linguistic interaction with a user. It has been interfaced with a repository of semantic knowledge to build a meaning representation.