TEITOK Dependency Grammar


TEITOK Dependency Grammar
Maarten Janssen
CELGA-ILTEC, Univ. de Coimbra

Dependency Grammar in TEITOK
- Tool for working with TEI/XML in linguistics
  - Fully XML driven corpus environment
  - Full use of TEI, not just converting verticalized lines to <w>
  - Modular design
  - Modules for very different types of corpora
  - NLP behind the scenes
  - Search, visualize, edit
- GUI for dependency-parsed corpora
  - Parse sentences
  - Edit parsed data
  - Visualized parse trees
  - Search parsed data

Constituency Grammar

Dependency Grammar
[Figure: dependency tree with the head "loves" and the dependents "John" and "Mary"; the arcs are labelled subj and obj]

CoNLL-U Format
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1  This      this      PRON   PD  Number=Sing|PronType=Dem  5  nsubj  _  _
2  is        be        AUX    VA  Mood=Ind|...              5  cop    _  _
3  a         a         DET    RI  Definite=Ind|..           5  det    _  _
4  test      test      ADJ    A   Degree=Pos                5  amod   _  _
5  sentence  sentence  NOUN   S   Number=Sing               0  root   _  _
6  .         .         PUNCT  FS  _                         5  punct  _  _
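The ten columns in the example above are, in CoNLL-U order: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. Actual CoNLL-U files separate them with tabs. The following is a minimal Python sketch of how such a file could be read into one dictionary per token. The function name `parse_conllu` and the `SAMPLE` constant are illustrative only and are not part of TEITOK; the elided feature strings from the slide are copied as they stand.

```python
from typing import Dict, List

# Column names in CoNLL-U order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Return a list of sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (# sent_id, # text, ...) are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# The sentence from the slide; feature strings are kept exactly as shown there.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "# text = This is a test sentence.",
    "\t".join(["1", "This", "this", "PRON", "PD", "Number=Sing|PronType=Dem", "5", "nsubj", "_", "_"]),
    "\t".join(["2", "is", "be", "AUX", "VA", "Mood=Ind|...", "5", "cop", "_", "_"]),
    "\t".join(["3", "a", "a", "DET", "RI", "Definite=Ind|..", "5", "det", "_", "_"]),
    "\t".join(["4", "test", "test", "ADJ", "A", "Degree=Pos", "5", "amod", "_", "_"]),
    "\t".join(["5", "sentence", "sentence", "NOUN", "S", "Number=Sing", "0", "root", "_", "_"]),
    "\t".join(["6", ".", ".", "PUNCT", "FS", "_", "5", "punct", "_", "_"]),
])

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```

The sketch ignores multiword-token ranges and empty nodes, which a complete reader would also have to handle.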

TEITOK
- Online GUI for working with XML corpora
  - A corpus consists of a set of XML files
  - GUI for editing heavy XML files
- Corpora beyond mere numbers
  - Data necessary for non-standard corpora
  - Normalization, facsimile images, sound, grammar, etc.
- Tool for "small corpora"
  - Historical corpora
  - Learner corpora
  - Less Resourced Languages
  - Spoken corpora
  - Dialect corpora

EWE Text

Adding Dependencies (1)
<s>This is a test sentence.</s>

Adding Dependencies (1b)
<s>This is a test sentence.</s>

<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>
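The step from a plain <s> to a tokenized <s> can be sketched in Python as follows. This is only an illustration and not TEITOK's own tokenizer, which the slides do not show: it assumes that the sentence element contains nothing but text, and it uses a simple regular expression (runs of word characters, or single punctuation marks) as a stand-in for real tokenization rules. The function name `tokenize_sentence` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in tokenization rule: a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_sentence(s_xml: str, first_id: int = 1) -> str:
    """Wrap every token of a text-only <s> element in <tok id="w-N">."""
    s = ET.fromstring(s_xml)
    text = s.text or ""
    s.text = None
    prev = None  # most recently created <tok>
    pos = 0      # end offset of the previous token in the original text
    for n, match in enumerate(TOKEN_RE.finditer(text), start=first_id):
        gap = text[pos:match.start()]  # whitespace preceding this token
        if prev is None:
            s.text = gap or None
        else:
            prev.tail = gap or None
        tok = ET.SubElement(s, "tok", {"id": f"w-{n}"})
        tok.text = match.group()
        prev = tok
        pos = match.end()
    if prev is None:
        s.text = text  # no tokens found: leave the content untouched
    else:
        prev.tail = text[pos:] or None
    return ET.tostring(s, encoding="unicode")


print(tokenize_sentence("<s>This is a test sentence.</s>"))
```

Keeping the original whitespace between the tokens means that the text of the sentence can still be read back unchanged from the XML, which matters when the XML file itself is the corpus.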

Full XML
<l id="s-1" bbox="57 195 759 275" gloss="Don Alfonso of Castile"><tok form="Don" id="w-1" pt="Dom"><hi type="dropcap" n="4" rend="black">D</hi>on</tok> <tok id="w-2" nform="Affonso" form="Afonsso" pt="Alfonso">Afonſſo</tok> <tok id="w-3">de</tok> <tok id="w-4" form="Castela">Caſtela</tok></l>
<l id="s-2" bbox="57 265 759 345" gloss="of Toledo, of León,"><tok id="w-5">de</tok> <tok id="w-6">Toledo</tok> <tok id="w-7">de</tok> <tok id="w-8" pt="Leão">Leon</tok></l>
<l id="s-3" bbox="57 335 775 415" gloss="King, indeed, of Compostela,"><tok id="w-9" pt="Rei">Rey</tok> <tok id="w-10">e</tok> <tok id="w-11" pt="bem">ben</tok> <tok id="w-12" form="des" pt="de">deſ</tok> <tok id="w-13" nform="Conpostela" pt="Compostela">Copostela</tok></l>

Adding Dependencies (1)
<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>

Adding Dependencies (2)
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1  This      this      PRON   PD  Number=Sing|PronType=Dem  5  nsubj  _  _
2  is        be        AUX    VA  Mood=Ind|...              5  cop    _  _
3  a         a         DET    RI  Definite=Ind|..           5  det    _  _
4  test      test      ADJ    A   Degree=Pos                5  amod   _  _
5  sentence  sentence  NOUN   S   Number=Sing               0  root   _  _
6  .         .         PUNCT  FS  _                         5  punct  _  _

Adding Dependencies (3)
<s>
  <tok id="w-1" xpos="PRON" deprel="nsubj" head="w-5">This</tok>
  <tok id="w-2" xpos="AUX" deprel="cop" head="w-5">is</tok>
  <tok id="w-3" xpos="DET" deprel="det" head="w-5">a</tok>
  <tok id="w-4" xpos="ADJ" deprel="amod" head="w-5">test</tok>
  <tok id="w-5" xpos="NOUN" deprel="root" head="0">sentence</tok>
  <tok id="w-6" xpos="PUNCT" deprel="punct" head="w-5">.</tok>
</s>
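The three steps above (tokenized XML, parser output, annotated XML) suggest a merge in which the numeric HEAD of each CoNLL-U row is rewritten as the id of the corresponding token. The Python sketch below shows one way this could be done; it is not TEITOK's implementation. The function name `add_dependencies` and the `pos_attr` parameter are hypothetical, the sketch assumes that tokens and parser rows are aligned one to one, and it follows the slide in storing the part-of-speech tag in an attribute named xpos and in writing "0" as the head of the root.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def add_dependencies(s_xml: str, rows: List[Dict[str, str]],
                     pos_attr: str = "xpos") -> str:
    """Copy tag, deprel and head from parser rows onto the <tok> elements."""
    s = ET.fromstring(s_xml)
    toks = s.findall(".//tok")
    if len(toks) != len(rows):
        raise ValueError("token count differs between XML and parse")
    # CoNLL-U ids are 1-based positions; map them to the XML token ids.
    id_of = {str(n): tok.get("id", "") for n, tok in enumerate(toks, start=1)}
    for tok, row in zip(toks, rows):
        tok.set(pos_attr, row["upos"])
        tok.set("deprel", row["deprel"])
        # The root keeps head="0", as on the slide; others point to a token id.
        tok.set("head", "0" if row["head"] == "0" else id_of[row["head"]])
    return ET.tostring(s, encoding="unicode")


SENTENCE = (
    '<s><tok id="w-1">This</tok> <tok id="w-2">is</tok> <tok id="w-3">a</tok> '
    '<tok id="w-4">test</tok> <tok id="w-5">sentence</tok><tok id="w-6">.</tok></s>'
)

# UPOS, HEAD and DEPREL columns of the parse shown in step (2).
PARSE = [
    {"upos": "PRON", "head": "5", "deprel": "nsubj"},
    {"upos": "AUX", "head": "5", "deprel": "cop"},
    {"upos": "DET", "head": "5", "deprel": "det"},
    {"upos": "ADJ", "head": "5", "deprel": "amod"},
    {"upos": "NOUN", "head": "0", "deprel": "root"},
    {"upos": "PUNCT", "head": "5", "deprel": "punct"},
]

annotated = add_dependencies(SENTENCE, PARSE)
print(annotated)
```

Pointing head at a token id rather than at a position in the sentence keeps the reference valid when tokens are later inserted, split or merged in the XML.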

Searching
- Corpus automatically exported to CWB
- Customized CQP version
- Searchable in many aspects
- Results shown as XML fragments, created from CWB results
- TT-CQP can use dependencies in search

a:[word="prata" & deprel="obj"] :: head(a).upos="VERB";
sort head(a).lemma;
tabulate a[-1].word, head(a).lemma, head(a).upos
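Read literally, the query above selects every token a whose word is "prata" and whose relation is obj, keeps it only if its head is a VERB, sorts the hits by the lemma of the head, and prints the preceding word together with the lemma and part of speech of the head. The Python sketch below mimics that reading over a flat list of tokens. It is an illustration of the query's logic under that interpretation and not TT-CQP itself; the function name `find_objects` and the two short Portuguese sentences are invented for the example.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]


def find_objects(tokens: List[Token], word: str) -> List[Tuple[str, str, str]]:
    """Return (previous word, head lemma, head upos) for matching objects."""
    by_id = {t["id"]: t for t in tokens}
    hits: List[Tuple[str, str, str]] = []
    for i, a in enumerate(tokens):
        # a:[word="..." & deprel="obj"]
        if a["word"] != word or a["deprel"] != "obj":
            continue
        # :: head(a).upos="VERB"
        head = by_id.get(a["head"])
        if head is None or head["upos"] != "VERB":
            continue
        # tabulate a[-1].word, head(a).lemma, head(a).upos
        prev_word = tokens[i - 1]["word"] if i > 0 else ""
        hits.append((prev_word, head["lemma"], head["upos"]))
    # sort head(a).lemma
    return sorted(hits, key=lambda row: row[1])


def tok(id_: str, word: str, lemma: str, upos: str, deprel: str, head: str) -> Token:
    return {"id": id_, "word": word, "lemma": lemma,
            "upos": upos, "deprel": deprel, "head": head}


# Invented sample data: "Ele vendeu a prata" / "Ela comprou muita prata".
CORPUS = [
    tok("w-1", "Ele", "ele", "PRON", "nsubj", "w-2"),
    tok("w-2", "vendeu", "vender", "VERB", "root", "0"),
    tok("w-3", "a", "o", "DET", "det", "w-4"),
    tok("w-4", "prata", "prata", "NOUN", "obj", "w-2"),
    tok("w-5", "Ela", "ela", "PRON", "nsubj", "w-6"),
    tok("w-6", "comprou", "comprar", "VERB", "root", "0"),
    tok("w-7", "muita", "muito", "DET", "det", "w-8"),
    tok("w-8", "prata", "prata", "NOUN", "obj", "w-6"),
]

for row in find_objects(CORPUS, "prata"):
    print("\t".join(row))
```

The lookup through by_id plays the role of head(a): the dependency search only needs a way to get from a token to the token named in its head attribute.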