Syntactic Annotation of Slovene Corpora (SDT, JOS) Nina Ledinek ISJ ZRC SAZU

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
Advertisements

Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.
Quranic Arabic Corpus Data Mining & Text Analytics By Ismail Teladia & Abdullah Alazwari.
CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.
Chapter 4 Syntax.
June 6, 20073rd PIRE Meeting1 Tectogrammatical Representation of English in Prague Czech-English Dependency Treebank Lucie Mladová Silvie Cinková, Kristýna.
The Language Model in Bulgarian Treebank (BulTreeBank) Petya Osenova (Sofia) , Prague.
The SALSA experience: semantic role annotation Katrin Erk University of Texas at Austin.
Dependency Parsing Joakim Nivre. Dependency Grammar Old tradition in descriptive grammar Modern theroretical developments: –Structural syntax (Tesnière)
Statistical NLP: Lecture 3
MORPHOLOGY - morphemes are the building blocks that make up words.
Annotating language data Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž Erjavec.
The Query Compiler Varun Sud ID: 104. Agenda Parsing  Syntax analysis and Parse Trees.  Grammar for a simple subset of SQL  Base Syntactic Categories.
DS-to-PS conversion Fei Xia University of Washington July 29,
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Layering the Annotation Jan Hajič Institute of Formal and Applied Linguistics.
Are Linguists Dinosaurs? 1.Statistical language processors seem to be doing away with the need for linguists. –Why do we need linguists when a machine.
Syntax Lecture 4.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Language-specific Issues Czech Jan Hajič Institute of Formal and Applied Linguistics.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Models of Generative Grammar Smriti Singh. Generative Grammar  A Generative Grammar is a set of formal rules that can generate an infinite set of sentences.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Building the Valency Lexicon of Arabic Verbs Viktor Bielický Otakar Smrž LREC 2008, Marrakech, Morocco.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Corpus-based computational linguistics or computational corpus linguistics? Joakim Nivre Uppsala University Department of Linguistics and Philology.
IV. SYNTAX. 1.1 What is syntax? Syntax is the study of how sentences are structured, or in other words, it tries to state what words can be combined with.
THE BIG PICTURE Basic Assumptions Linguistics is the empirical science that studies language (or linguistic behavior) Linguistics proposes theories (models)
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
SYNTAX Lecture -1 SMRITI SINGH.
An Intelligent Analyzer and Understander of English Yorick Wilks 1975, ACM.
Introduction to Human Language Technologies Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture 1: Overview
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Capturing patterns of linguistic interaction in a parsed corpus A methodological case study Sean Wallis Survey of English Usage University College London.
ENGLISH SYNTAX Introduction to Transformational Grammar.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 4.
Resemblances between Meaning-Text Theory and Functional Generative Description Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University,
Linguistic Essentials
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
CPE 480 Natural Language Processing Lecture 4: Syntax Adapted from Owen Rambow’s slides for CSc Fall 2006.
Rules, Movement, Ambiguity
The Minimalist Program
nd PIRE project workshop1 Tectogrammatical Representation of English Silvie Cinková Lucie Mladová, Anja Nedoluzhko, Jiří Semecký, Jana Šindlerová,
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Intro 1 The Prague Dependency Treebank (PDT) Introduction Jan Hajič Institute.
Unit 8 Syntax. Syntax Syntax deals with rules for combining words into sentences, as well as with relationship between elements in one sentence Basic.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
Supertagging CMSC Natural Language Processing January 31, 2006.
Annotation Procedure in Building the Prague Czech-English Dependency Treebank Marie Mikulová and Jan Štěpánek Institute of Formal and Applied Linguistics.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
SYNTAX.
Levels of Linguistic Analysis
3 Phonology: Speech Sounds as a System No language has all the speech sounds possible in human languages; each language contains a selection of the possible.
Language Language - a system for combining symbols (such as words) so that an unlimited number of meaningful statements can be made for the purpose of.
Arabic Syntactic Trees Zdeněk Žabokrtský Otakar Smrž Center for Computational Linguistics Faculty of Mathematics and Physics Charles University in Prague.
Morphology and Syntax- Week 5
Semantic annotation of a dialog corpus Silvie Cinková Institute of Formal and Applied Linguistics Charles University in Prague, Czech Republic COMPANIONS.
Text Linguistics. Definition of linguistics Linguistics can be defined as the scientific or systematic study of language. It is a science in the sense.
Grammar Grammar analysis.
CSC 594 Topics in AI – Natural Language Processing
Lecture – VIII Monojit Choudhury RS, CSE, IIT Kharagpur
Statistical NLP: Lecture 3
Natural Language Processing (NLP)
Writing Analytics Clayton Clemens Vive Kumar.
BBI 3212 ENGLISH SYNTAX AND MORPHOLOGY
CS4705 Natural Language Processing
Levels of Linguistic Analysis
Linguistic Essentials
Natural Language Processing (NLP)
Syntax.
Owen Rambow 6 Minutes.
Natural Language Processing (NLP)
Presentation transcript:

Syntactic Annotation of Slovene Corpora (SDT, JOS) Nina Ledinek ISJ ZRC SAZU

MPŠ, The practice of adding interpretative, linguistic information to a corpus of spoken/written language data → Human language technologies → Linguistic research The added notations → transcriptions, part-of- speech tagging, semantic tagging, syntactic analysis, named entity recognition, anaphora resolution, etc. Corpus Annotation Syntactically annotated corpus → t reebank

MPŠ, → Way in which linguistic elements (as words) are put together to form larger units, constituents (as phrases, clauses, sentences) (morpheme) → → word → phrase → clause → sentence → (text) Syntax → Principles and rules for constructing (grammatical) sentences (with a certain meaning)

MPŠ, Roots → Panini’s grammar (Sanskrit), traditional grammar, medieval theories, Slavic linguistics, etc. Culmination: work of L. Tesnière (1959) → modern dependency grammar → Large and fairly diverse family of grammatical theories and formalisms that share certain basic assumptions about syntax Syntactic structure ↓ → Lexical elements linked by binary asymmetrical relations (dependencies, connexions) → Head/governor – dependent/subordinate → Valency Dependency Grammar (Syntax) Problems

MPŠ, Functional Generative Description ↓ Prague Dependency Treebank Multi-stratal framework → Analytical layer – surface syntactic annotation (subject, object, attribute, adverbial, coordination, etc.) → Tectogrammatical layer – deep syntactic/shallow semantic annotation → thematic roles, co- reference, topic-focus articulation (agent, patient, predicate, antecedent, etc.) FGD

MPŠ, → Dependency links are close to the semantic relationships ( → deep syntactic annotation, shallow semantic annotation) → Parsing is efficient (computationally) → Complexity of parsing – expressivity of syntactic representations ( → good compromise) → Each node is assigned one head at most (single-head constraint) → All nodes have to be connected (connectedness) → Chains of dependency links do not contain cycles (acyclicity constraint) ↓ Syntactic tree structures Dependency Parsing

MPŠ, A linguistically annotated corpus that includes some grammatical analysis beyond the part-of-speech Empirical syntactic analysis of language patterns in large quantity of naturally occurring texts Treebank

MPŠ, Syntactic Annotation (Models) Complexity of the annotation system → Chunking, skeletal, shallow parsing → Full parsing Human vs. no human rule creation → Rule-based parsing (obsolete?) → Stochastic, data-driven parsing ↓ Robustness

MPŠ, Syntactic Annotation (Types) Grammatical theories and formalisms/types of syntactic information → Dependency models Asymmetric binary relations (connexions) Governor – dependent(s) Functional analysis Inflectionally rich languages with free word order → Phrase structure/constituent models Hierarchically embedded subparts (constituents) Part – whole relations Structural analysis Languages with fixed word order, clear constituency structures → Hybrid models

MPŠ, Slovene Dependency Treebank Problem → complexity of the theoretical framework SDT → Dependency treebank of Slovene written texts → Modeled after the Prague Dependency Treebank → Surface syntactic annotation → Two subcorpora (1984, SVEZ-IJS) → 2800 sentences, words → Experiments in inductive parsing → Freely available for research use

Lingvisti č ni krožek, SDT: Syntactic Tree Structure I

MPŠ, SDT: Syntactic Tree Structure II

Linearity Three types of connexions → green, red, blue Connexions → intuitive names Arrows Connectedness Root Sentence → the maximal unit of parsing

MPŠ, JOS: Syntactic Tagset Automatic annotation → robust linguistic units with clearly defined boundaries Manageable tagset → (SDT: >100), JOS: 10 Combining of the data: MSD + syntactic tags + etc. PpnmetnSometnDmSommm

MPŠ, First Level Tags “Phrase structure connexions” (Green) Dol “attr” Del “part” Prir “coord” Vez “conj” Skup “together”

MPŠ, Second Level Tags “Functional connexions” (Red) Ena “one” Dve “two” Tri “three” Štiri “four”

MPŠ, Third Level Tags “Residual” (Blue) Modra “blue”

Thank you!