Universal Dependencies

Universal Dependencies
Joakim Nivre
Uppsala University, Linguistics and Philology

Universal Dependencies

Background:
- Treebank annotation schemes vary across languages
- Hard to compare results across languages [Nivre et al. 2007]
- Hard to evaluate cross-lingual learning [McDonald et al. 2013]
- Hard to build multilingual systems

Universal Dependencies (http://universaldependencies.github.io/docs/):
- Stanford universal dependencies [de Marneffe et al. 2014]
- Google universal part-of-speech tags [Petrov et al. 2012]
- Interset morphological features [Zeman 2008]
- First guidelines released Oct 1, 2014
- First 10 treebanks released Jan 15, 2015

Universal Dependencies
- Syntactic words: explicit splitting of clitics and contractions
- Universal part-of-speech tags + morphological features
- Dependency tree + augmented dependencies (not shown)
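The three annotation layers on this slide can be illustrated with a small reader. This is only a sketch: it assumes the tab-separated, ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), which the slides themselves do not mention, and the French sentence and its annotation are hand-written for illustration rather than taken from a released treebank. The names `Word`, `parse_feats` and `read_sentence` are hypothetical.

```python
from dataclasses import dataclass

# Hand-written illustrative annotation (assumption, not treebank data).
# "du" is a contraction that is split into the syntactic words "de" + "le".
ROWS = [
    "1 Le le DET _ Definite=Def|PronType=Art 2 det _ _",
    "2 prix prix NOUN _ Number=Sing 6 nsubj _ _",
    "3-4 du _ _ _ _ _ _ _ _",
    "3 de de ADP _ _ 5 case _ _",
    "4 le le DET _ Definite=Def|PronType=Art 5 det _ _",
    "5 pain pain NOUN _ Number=Sing 2 nmod _ _",
    "6 monte monter VERB _ Number=Sing 0 root _ _",
    "7 . . PUNCT _ _ 6 punct _ _",
]
# The format is tab-separated; the rows above use spaces only for readability.
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str      # universal part-of-speech tag
    feats: dict    # morphological features, e.g. {"Number": "Sing"}
    head: int      # ID of the head word, 0 for the root
    deprel: str    # dependency relation to the head


def parse_feats(field):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def read_sentence(text):
    """Return (syntactic words, {(first_id, last_id): surface token})."""
    words, contractions = [], {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if "-" in cols[0]:
            # Range line: one surface token covering several syntactic words.
            first, last = map(int, cols[0].split("-"))
            contractions[(first, last)] = cols[1]
            continue
        words.append(Word(int(cols[0]), cols[1], cols[2], cols[3],
                          parse_feats(cols[5]), int(cols[6]), cols[7]))
    return words, contractions


words, contractions = read_sentence(SAMPLE)
root = next(w for w in words if w.head == 0)
```

Here the dependency tree is defined over the seven syntactic words, while the surface token "du" survives only as a range entry in `contractions`.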

Guiding Principles
- Maximize parallelism
    - Don't annotate the same thing in different ways
    - Don't make different things look the same
- But don't overdo it
    - Don't annotate things that are not there
    - Languages select from a universal pool of categories
    - Allow language-specific extensions

Dependency Structure
- Keeping content words as heads promotes parallelism
- Function words often correlate with morphology

Dependency Relations [de Marneffe et al. 2014]
- Taxonomy of 42 universal grammatical relations, broadly supported across many languages in language typology
- Language-specific subtypes can be added
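As a rough illustration of how language-specific subtypes sit on top of the universal relations, the sketch below assumes the `universal:subtype` label convention (for example `nmod:poss`), which the slide does not spell out. The function names are hypothetical.

```python
def split_relation(label):
    """Split a relation label into (universal relation, subtype or None)."""
    universal, _, subtype = label.partition(":")
    return universal, (subtype or None)


def to_universal(labels):
    """Drop subtypes so that labels are comparable across languages."""
    return [split_relation(label)[0] for label in labels]


collapsed = to_universal(["nsubj", "nmod:poss", "acl:relcl", "punct"])
```

Collapsing labels this way keeps only the shared inventory, which is the part that is meant to be comparable across treebanks.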

Morphology

Open class words:   ADJ, ADV, INTJ, NOUN, PROPN, VERB
Closed class words: ADP, AUX, CONJ, DET, NUM, PART, PRON, SCONJ
Other:              PUNCT, SYM, X

- Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset [Petrov et al. 2012]
- Standardized inventory of morphological features, based on the Interset system [Zeman 2008]
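The tag inventory above is small enough to write down directly. The following sketch encodes the three groups exactly as listed on the slide; the names `UPOS_CLASSES` and `word_class` are hypothetical and not part of any official tool.

```python
# The 17 universal part-of-speech tags, grouped as on the slide.
UPOS_CLASSES = {
    "open": ["ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"],
    "closed": ["ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"],
    "other": ["PUNCT", "SYM", "X"],
}

# Reverse lookup: tag -> class name.
TAG_TO_CLASS = {
    tag: cls for cls, tags in UPOS_CLASSES.items() for tag in tags
}


def word_class(tag):
    """Return 'open', 'closed' or 'other'; raise on an unknown tag."""
    try:
        return TAG_TO_CLASS[tag]
    except KeyError:
        raise ValueError(f"not a universal POS tag: {tag!r}") from None


total_tags = len(TAG_TO_CLASS)
```

A lookup like this can serve as a quick sanity check that a treebank uses only tags from the universal inventory.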