Universal Dependencies

Slides:



Advertisements
Similar presentations
The Rule-based Parser of the NLP Group of the University of Torino
Advertisements

Tracking L2 Lexical and Syntactic Development Xiaofei Lu CALPER 2010 Summer Workshop July 14, 2010.
Lexical Functional Grammar History: –Joan Bresnan (linguist, MIT and Stanford) –Ron Kaplan (computational psycholinguist, Xerox PARC) –Around 1978.
Chapter 4 Syntax.
Dr. Abdullah S. Al-Dobaian1 Ch. 2: Phrase Structure Syntactic Structure (basic concepts) Syntactic Structure (basic concepts)  A tree diagram marks constituents.
Statistical NLP: Lecture 3
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University.
MORPHOLOGY - morphemes are the building blocks that make up words.
Annotating language data Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž Erjavec.
Natural Language Processing - Feature Structures - Feature Structures and Unification.
1 Words and the Lexicon September 10th 2009 Lecture #3.
Morphology I. Basic concepts and terms Derivational processes
Treebanks as Training Data for Parsers Joakim Nivre Växjö University and Uppsala University
PARTS OF SPEECH 1 The principles of the traditional classification of the English vocabulary 2 Notional and functional parts of speech. 3 The field structure.
Parts of Speech (Lexical Categories). Parts of Speech Nouns, Verbs, Adjectives, Prepositions, Adverbs (etc.) The building blocks of sentences The [ N.
Chapter 4 Syntax Part II.
UAM CorpusTool: An Overview Debopam Das Discourse Research Group Department of Linguistics Simon Fraser University Feb 5, 2014.
NLP superficial and lexic level1 Superficial & Lexical level 1 Superficial level What is a word Lexical level Lexicons How to acquire lexical information.
Corpus-based computational linguistics or computational corpus linguistics? Joakim Nivre Uppsala University Department of Linguistics and Philology.
Part D: multilingual dependency parsing. Motivation A difficult syntactic ambiguity in one language may be easy to resolve in another language (bilingual.
IV. SYNTAX. 1.1 What is syntax? Syntax is the study of how sentences are structured, or in other words, it tries to state what words can be combined with.
1 Introduction to Natural Language Processing ( ) Linguistic Essentials: Syntax AI-lab
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
Inductive Dependency Parsing Joakim Nivre
Reasons to Study Lexicography  You love words  It can help you evaluate dictionaries  It might make you more sensitive to what dictionaries have in.
ACBiMA: Advanced Chinese Bi-Character Word Morphological Analyzer 1 Ting-Hao (Kenneth) Huang Yun-Nung (Vivian) Chen Lingpeng Kong
Using a Lemmatizer to Support the Development and Validation of the Greek WordNet Harry Kornilakis 1, Maria Grigoriadou 1, Eleni Galiotou 1,2, Evangelos.
Parts of Speech (Lexical Categories). Parts of Speech n Nouns, Verbs, Adjectives, Prepositions, Adverbs (etc.) n The building blocks of sentences n The.
Spanish FrameNet Project Autonomous University of Barcelona Marc Ortega.
Ideas for 100K Word Data Set for Human and Machine Learning Lori Levin Alon Lavie Jaime Carbonell Language Technologies Institute Carnegie Mellon University.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
CPE 480 Natural Language Processing Lecture 4: Syntax Adapted from Owen Rambow’s slides for CSc Fall 2006.
Natural Language Processing Chapter 2 : Morphology.
1 ISOCAT Proposed solutions for Problems encountered in DUELME-LMF Jan Odijk Nijmegen 21 Sep 2010.
LING 6520: Comparative Topics in Linguistics (from a computational perspective) Martha Palmer Jan 15,
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
MORPHOLOGY definition; variability among languages.
POS Tagger and Chunker for Tamil
ACTL 2008 Syntax: Introduction to LFG Peter Austin, Linguistics Department, SOAS with thanks to Kersti Börjars & Nigel Vincent.
SYNTAX.
◦ Process of describing the structure of phrases and sentences Chapter 8 - Phrases and sentences: grammar1.
NLP. Introduction to NLP (U)nderstanding and (G)eneration Language Computer (U) Language (G)
Language and Cognition Colombo, June 2011 Day 2 Introduction to Linguistic Theory, Part 3.
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
Lecture 9: Part of Speech
contrastive linguistics
Name that syntactic category …
Annotating Urdu Corpus
Lecture – VIII Monojit Choudhury RS, CSE, IIT Kharagpur
Lecture 3: Functional Phrases
Statistical NLP: Lecture 3
Basic Parsing with Context Free Grammars Chapter 13
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27
contrastive linguistics
Constraint Grammar ESSLLI
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
A Statistical Model for Parsing Czech
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27
Universal Dependencies
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 26
TEITOK Dependency Grammar
Língua Inglesa - Aspectos Morfossintáticos
Introduction to Syntax
Natural Language Processing
Dependency Grammar & Stanford Dependencies
COMPARATIVE Linguistics 2018/2019
Hidden Markov Models Teaching Demo The University of Arizona
contrastive linguistics
contrastive linguistics
Artificial Intelligence 2004 Speech & Natural Language Processing
Presentation transcript:

Universal Dependencies Joakim Nivre Uppsala University

Universal Dependencies Background: Treebank annotation schemes vary across languages Hard to compare results across languages [Nivre et al. 2007] Hard to evaluate cross-lingual learning [McDonald et al. 2013] Hard to build multilingual systems Universal Dependencies (http://universaldependencies.github.io/docs/): Stanford universal dependencies [de Marneffe et al. 2014] Google universal part-of-speech tags [Petrov et al. 2012] Interset morphological features [Zeman 2008] First guidelines released Oct 1, 2014 First 10 treebanks released Jan 15, 2015

Universal Dependencies Syntactic words – explicit splitting of clitics and contractions Universal part-of-speech tags + morphological features Dependency tree + augmented dependencies (not shown)

Goals Cross-linguistically consistent grammatical annotation Support multilingual NLP and linguistic research Build on common usage and existing de-facto standards Complement – not replace – language-specific schemes Open community effort – anyone can contribute

Guiding Principles Maximize parallelism Don't annotate the same thing in different ways Don't make different things look the same Don't annotate things that are not there Languages select from a universal pool of categories Allow language-specific extensions

Design Principles Dependency Lexicalism Recoverability Widely used in practical NLP systems Available in treebanks for many languages Lexicalism Basic annotation units are words – syntactic words Words have morphological properties Words enter into syntactic relations Recoverability Transparent mapping from input text to word segmentation

Morphological Annotation Le La DET Definite=Def Gender=Masc Number=Sing chat NOUN chasse chasser VERB Mood=Ind Person=3 les le chiens chien Number=Plur . PUNCT Lemma represent the semantic content of a word Part-of-speech tag represent its grammatical class Features represent lexical and grammatical properties of the lemma or the particular word form

Syntactic Annotattion Content words are related by dependency relations Function words attach to the content word they modify Punctuation attach to head of phrase or clause

CoNLL-U Format 1 Le DET _ 2 det chat NOUN 3 nsubj boit boire VERB Root ID FORM LEMMA UPOSTAG XPOSTAG FEATS HEAD DEPREL DEPS MISC 1 Le DET _ 2 det chat NOUN 3 nsubj boit boire VERB Root 4-5 du 4 de De ADP 6 case 5 le DEP lait Lait obj SpaceAfter=no 7 . PUNCT Punct

Dependency Structure English Swedish Keeping content words as heads promotes parallelism Function words often correlate with morphology

Dependency Relations [de Marneffe et al. 2014] Taxonomy of 42 universal grammatical relations, broadly supported across many languages in language typology Language specific subtypes can be added

Morphology: POS Open class words Closed class words Other ADJ ADV INTJ NOUN PROPN VERB ADP AUX CONJ DET NUM PART PRON SCONJ PUNCT SYM X Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset [Petrov et al. 2012]

Morphology: Universal Features Standardized inventory of morphological features, based on the Interset system [Zeman 2008] Lexical features Inflectional features Nominal* Verbal* PronType Gender VerbForm NumType Animacy Mood Poss Number Tense Reflex Case Aspect Foreign Definite Voice Abbr Degree Evident Polarity Person Polite

Morphology: Examples la Definite=Def|Gender=Fem|Number=Sing|PronType=Art hanno Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin fatto Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part casa Gender=Fem|Number=Sing