1
Universal Dependencies
Joakim Nivre Uppsala University
2
Universal Dependencies
Background:
  Treebank annotation schemes vary across languages
  Hard to compare results across languages [Nivre et al. 2007]
  Hard to evaluate cross-lingual learning [McDonald et al. 2013]
  Hard to build multilingual systems
Universal Dependencies:
  Stanford universal dependencies [de Marneffe et al. 2014]
  Google universal part-of-speech tags [Petrov et al. 2012]
  Interset morphological features [Zeman 2008]
First guidelines released Oct 1, 2014
First 10 treebanks released Jan 15, 2015
3
Universal Dependencies
Syntactic words: explicit splitting of clitics and contractions
Universal part-of-speech tags + morphological features
Dependency tree + augmented dependencies (not shown)
4
Goals
Cross-linguistically consistent grammatical annotation
Support multilingual NLP and linguistic research
Build on common usage and existing de facto standards
Complement, not replace, language-specific schemes
Open community effort: anyone can contribute
5
Guiding Principles
Maximize parallelism
Don't annotate the same thing in different ways
Don't make different things look the same
Don't annotate things that are not there
Languages select from a universal pool of categories
Allow language-specific extensions
6
Design Principles
Dependency
  Widely used in practical NLP systems
  Available in treebanks for many languages
Lexicalism
  Basic annotation units are words (syntactic words)
  Words have morphological properties
  Words enter into syntactic relations
Recoverability
  Transparent mapping from input text to word segmentation
7
Morphological Annotation
FORM     LEMMA     UPOS    FEATS
Le       le        DET     Definite=Def Gender=Masc Number=Sing
chat     chat      NOUN    Gender=Masc Number=Sing
chasse   chasser   VERB    Mood=Ind Number=Sing Person=3
les      le        DET     Definite=Def Number=Plur
chiens   chien     NOUN    Gender=Masc Number=Plur
.        .         PUNCT

The lemma represents the semantic content of a word
The part-of-speech tag represents its grammatical class
Features represent lexical and grammatical properties of the lemma or the particular word form
8
Syntactic Annotation
Content words are related by dependency relations
Function words attach to the content word they modify
Punctuation attaches to the head of the phrase or clause
9
CoNLL-U Format
ID    FORM   LEMMA   UPOSTAG   XPOSTAG   FEATS   HEAD   DEPREL   DEPS   MISC
1     Le     le      DET       _         _       2      det      _      _
2     chat   chat    NOUN      _         _       3      nsubj    _      _
3     boit   boire   VERB      _         _       0      root     _      _
4-5   du     _       _         _         _       _      _        _      _
4     de     de      ADP       _         _       6      case     _      _
5     le     le      DET       _         _       6      det      _      _
6     lait   lait    NOUN      _         _       3      obj      _      SpaceAfter=no
7     .      .       PUNCT     _         _       3      punct    _      _
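The ten columns listed on this slide can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the function names `parse_conllu`, `syntactic_words`, and `surface_text` are hypothetical, the sample rows are an assumed tab-separated rendering of the slide's example with `_` in every unspecified column, and only the `SpaceAfter=no` value of MISC is interpreted.

```python
# Column names as listed on the slide.
FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc"]

# Assumed rendering of the slide's example sentence (tab-separated).
SAMPLE = "\n".join([
    "1\tLe\tle\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tchat\tchat\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tboit\tboire\tVERB\t_\t_\t0\troot\t_\t_",
    "4-5\tdu\t_\t_\t_\t_\t_\t_\t_\t_",
    "4\tde\tde\tADP\t_\t_\t6\tcase\t_\t_",
    "5\tle\tle\tDET\t_\t_\t6\tdet\t_\t_",
    "6\tlait\tlait\tNOUN\t_\t_\t3\tobj\t_\tSpaceAfter=no",
    "7\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_conllu(block):
    """Turn one sentence block into a list of dicts, one per line."""
    rows = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        rows.append(dict(zip(FIELDS, cols)))
    return rows


def syntactic_words(rows):
    """Keep only rows with a plain integer ID (the syntactic words)."""
    return [r for r in rows if r["id"].isdigit()]


def surface_text(rows):
    """Rebuild the input text from range rows and uncovered words."""
    pieces = []
    covered_until = 0
    for r in rows:
        if "-" in r["id"]:
            # A range such as 4-5 carries the surface form of a contraction.
            covered_until = int(r["id"].split("-")[1])
        elif not r["id"].isdigit():
            continue  # ignore any other ID shape
        elif int(r["id"]) <= covered_until:
            continue  # word already covered by the preceding range row
        pieces.append(r["form"])
        if "SpaceAfter=no" not in r["misc"].split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


rows = parse_conllu(SAMPLE)
print([r["form"] for r in syntactic_words(rows)])
print(surface_text(rows))
```

Keeping the range row (`4-5 du`) next to the split words (`de`, `le`) is what lets the original text be rebuilt while the syntactic analysis still operates on the split words.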
10
Dependency Structure
[Figure: dependency structures for English and Swedish]
Keeping content words as heads promotes parallelism
Function words often correlate with morphology
11
Dependency Relations [de Marneffe et al. 2014]
Taxonomy of 42 universal grammatical relations, broadly supported across many languages in language typology
Language-specific subtypes can be added
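In UD treebanks a language-specific subtype is written after the universal relation, separated by a colon (for example `nmod:poss`). A minimal sketch of separating the two parts, assuming Python; the function name `split_relation` is hypothetical.

```python
def split_relation(deprel):
    """Split a relation label into (universal relation, subtype or None)."""
    universal, _, subtype = deprel.partition(":")
    return universal, (subtype or None)


print(split_relation("nmod:poss"))
print(split_relation("nsubj"))
```

Comparing only the first element allows treebanks with different subtypes to be evaluated on the shared universal relations.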
12
Morphology: POS
Open class words    Closed class words    Other
ADJ                 ADP                   PUNCT
ADV                 AUX                   SYM
INTJ                CONJ                  X
NOUN                DET
PROPN               NUM
VERB                PART
                    PRON
                    SCONJ
Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset [Petrov et al. 2012]
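The three groups of tags on this slide can be captured as a small lookup table. This is a sketch under the assumption that the grouping is exactly as listed here; the names `UPOS_CLASSES`, `CLASS_OF`, and `word_class` are hypothetical.

```python
# Tag groups as listed on the slide.
UPOS_CLASSES = {
    "open": ["ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"],
    "closed": ["ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"],
    "other": ["PUNCT", "SYM", "X"],
}

# Reverse index: tag -> class name.
CLASS_OF = {tag: cls for cls, tags in UPOS_CLASSES.items() for tag in tags}


def word_class(tag):
    """Return 'open', 'closed', or 'other' for a universal POS tag."""
    try:
        return CLASS_OF[tag]
    except KeyError:
        raise ValueError(f"not a universal POS tag on this slide: {tag!r}")


print(len(CLASS_OF))
print(word_class("NOUN"), word_class("ADP"), word_class("X"))
```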
13
Morphology: Universal Features
Standardized inventory of morphological features, based on the Interset system [Zeman 2008]
Lexical features    Inflectional features
                    Nominal*    Verbal*
PronType            Gender      VerbForm
NumType             Animacy     Mood
Poss                Number      Tense
Reflex              Case        Aspect
Foreign             Definite    Voice
Abbr                Degree      Evident
                                Polarity
                                Person
                                Polite
14
Morphology: Examples
la       Definite=Def|Gender=Fem|Number=Sing|PronType=Art
hanno    Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin
fatto    Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part
casa     Gender=Fem|Number=Sing
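The feature strings in these examples are `Name=Value` pairs joined by `|`. A minimal sketch of converting between that notation and a dictionary, assuming Python; `parse_feats` and `format_feats` are hypothetical names, and treating `_` as "no features" follows the CoNLL-U convention.

```python
def parse_feats(feats):
    """Parse 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict back to 'A=x|B=y', sorted by feature name."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


hanno = parse_feats("Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin")
print(hanno["Tense"])
print(format_feats(parse_feats("Gender=Fem|Number=Sing")))
```

With features as a dictionary, a query such as "all plural finite verbs" becomes a simple key lookup rather than string matching.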