1
TEITOK Dependency Grammar
Maarten Janssen CELGA-ILTEC, Univ. de Coimbra
2
Dependency Grammar in TEITOK
Tool for working with TEI/XML in linguistics
  Fully XML driven corpus environment
  Full use of TEI, not just converting verticalized lines to <w>
Modular design
  Modules for very different types of corpora
  NLP behind the scenes
  Search, visualize, edit
GUI for dependency-parsed corpora
  Parse sentences
  Edit parsed data
  Visualized parse trees
  Search parsed data
3
Constituency Grammar
4
Dependency Grammar
[Figure: dependency tree with head "loves" and dependents "John" (subj) and "Mary" (obj)]
5
CoNLL-U Format
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1 This this PRON PD Number=Sing|PronType=Dem 5 nsubj _ _
2 is be AUX VA Mood=Ind| cop _ _
3 a a DET RI Definite=Ind| det _ _
4 test test ADJ A Degree=Pos amod _ _
5 sentence sentence NOUN S Number=Sing root _ _
PUNCT FS _ punct _ _
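The column layout above can be read programmatically. The following is a minimal sketch, assuming tab-separated rows with the ten standard CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The function name `parse_conllu` and the sample rows are illustrative and are not taken from the slides or from TEITOK.

```python
from typing import Dict, List

# The ten column names defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Parse one sentence of CoNLL-U text into a list of token dicts.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(dict(zip(FIELDS, cols)))
    return tokens


# Hypothetical, fully specified sample rows (not the rows shown on the slide).
SAMPLE = "\n".join([
    "# sent_id = 1",
    "# text = John loves Mary",
    "1\tJohn\tJohn\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tloves\tlove\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tMary\tMary\tPROPN\t_\t_\t2\tobj\t_\t_",
])

if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE):
        print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

A HEAD value of 0 marks the root of the sentence; every other HEAD value is the ID of the governing token.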
6
TEITOK
Online GUI for working with XML corpora
  A corpus consists of a set of XML files
  GUI for editing heavy XML files
Corpora beyond mere numbers
  Data necessary for non-standard corpora
  Normalization, facsimile images, sound, grammar, etc.
Tool for “small corpora”
  Historical corpora
  Learner corpora
  Less Resourced Languages
  Spoken corpora
  Dialect corpora
10
EWE Text
12
Adding Dependencies (1)
<s>This is a test sentence.</s>
13
Adding Dependencies (1b)
<s>This is a test sentence.</s>
<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>
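The tokenization step shown on this slide can be sketched in Python. This is only an illustration under simple assumptions: a regular-expression tokenizer that splits words from punctuation, and sequential `w-N` ids as on the slide. The function name `tokenize_sentence` is hypothetical; TEITOK's own tokenizer is not shown in the slides and may behave differently.

```python
import re
import xml.etree.ElementTree as ET


def tokenize_sentence(s_elem: ET.Element) -> ET.Element:
    """Return a new <s> element whose text is split into <tok> children.

    Assumption: words are runs of word characters and every other
    non-space character is a token of its own.
    """
    text = s_elem.text or ""
    pieces = re.findall(r"\w+|[^\w\s]", text)
    new_s = ET.Element("s")
    for i, piece in enumerate(pieces, start=1):
        tok = ET.SubElement(new_s, "tok", {"id": f"w-{i}"})
        tok.text = piece
    return new_s


if __name__ == "__main__":
    plain = ET.fromstring("<s>This is a test sentence.</s>")
    print(ET.tostring(tokenize_sentence(plain), encoding="unicode"))
```

The sketch drops the whitespace between tokens; a real pipeline would need to preserve it so the original text can be reconstructed.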
14
Full XML
<l id="s-1" bbox=" " gloss="Don Alfonso of Castile"><tok form="Don" id="w-1" pt="Dom"><hi type="dropcap" n="4" rend="black">D</hi>on</tok> <tok id="w-2" nform="Affonso" form="Afonsso" pt="Alfonso">Afonſſo</tok> <tok id="w-3">de</tok> <tok id="w-4" form="Castela">Caſtela</tok></l>
<l id="s-2" bbox=" " gloss="of Toledo, of León,"><tok id="w-5">de</tok> <tok id="w-6">Toledo</tok> <tok id="w-7">de</tok> <tok id="w-8" pt="Leão">Leon</tok></l>
<l id="s-3" bbox=" " gloss="King, indeed, of Compostela,"><tok id="w-9" pt="Rei">Rey</tok> <tok id="w-10">e</tok> <tok id="w-11" pt="bem">ben</tok> <tok id="w-12" form="des" pt="de">deſ</tok> <tok id="w-13" nform="Conpostela" pt="Compostela">Copostela</tok></l>
15
Adding Dependencies (1)
<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>
16
Adding Dependencies (2)
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1 This this PRON PD Number=Sing|PronType=Dem 5 nsubj _ _
2 is be AUX VA Mood=Ind| cop _ _
3 a a DET RI Definite=Ind| det _ _
4 test test ADJ A Degree=Pos amod _ _
5 sentence sentence NOUN S Number=Sing root _ _
PUNCT FS _ punct _ _
17
Adding Dependencies (3)
<s>
  <tok id="w-1" xpos="PRON" deprel="nsubj" head="w-5">This</tok>
  <tok id="w-2" xpos="AUX" deprel="cop" head="w-5">is</tok>
  <tok id="w-3" xpos="DET" deprel="det" head="w-5">a</tok>
  <tok id="w-4" xpos="ADJ" deprel="amod" head="w-5">test</tok>
  <tok id="w-5" xpos="NOUN" deprel="root" head="0">sentence</tok>
  <tok id="w-6" xpos="PUNCT" deprel="punct" head="w-5">.</tok>
</s>
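Merging the parser output into the tokenized XML can be sketched as follows. This is a minimal illustration, not TEITOK's implementation: it assumes the parser returns one row per `<tok>` in the same order, and it maps each numeric head onto the id of the corresponding token, keeping `0` for the root. The function name `add_dependencies` is hypothetical. The attribute names `xpos`, `deprel`, and `head` follow the slide, and the head and relation values in `ROWS` are those shown on the slide.

```python
import xml.etree.ElementTree as ET


def add_dependencies(s_elem: ET.Element, rows: list) -> None:
    """Copy tag, relation and head from parser rows onto <tok> elements.

    Assumption: rows[i] describes the i-th <tok> of the sentence.
    """
    toks = s_elem.findall("tok")
    if len(toks) != len(rows):
        raise ValueError("token count and row count differ")
    ids = [tok.get("id") for tok in toks]
    for tok, row in zip(toks, rows):
        head = int(row["head"])
        # The slide stores the part-of-speech tag in an attribute named xpos.
        tok.set("xpos", row["upos"])
        tok.set("deprel", row["deprel"])
        # Head 0 is the root; otherwise point at the id of the head token.
        tok.set("head", "0" if head == 0 else ids[head - 1])


SENTENCE = (
    '<s><tok id="w-1">This</tok><tok id="w-2">is</tok>'
    '<tok id="w-3">a</tok><tok id="w-4">test</tok>'
    '<tok id="w-5">sentence</tok><tok id="w-6">.</tok></s>'
)

ROWS = [
    {"upos": "PRON", "deprel": "nsubj", "head": "5"},
    {"upos": "AUX", "deprel": "cop", "head": "5"},
    {"upos": "DET", "deprel": "det", "head": "5"},
    {"upos": "ADJ", "deprel": "amod", "head": "5"},
    {"upos": "NOUN", "deprel": "root", "head": "0"},
    {"upos": "PUNCT", "deprel": "punct", "head": "5"},
]

if __name__ == "__main__":
    s = ET.fromstring(SENTENCE)
    add_dependencies(s, ROWS)
    print(ET.tostring(s, encoding="unicode"))
```

Pointing `head` at a token id rather than a sentence-internal number keeps the reference valid when token ids do not restart at 1 for each sentence.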
20
Searching
Corpus automatically exported to CWB
  Searchable in many aspects
  Results shown as XML fragments, created from CWB results
Customized CQP version
  TT-CQP can use dependencies in search
  a:[word="prata" & deprel="obj"] :: head(a).upos="VERB";
  sort head(a).lemma;
  tabulate a[-1].word, head(a).lemma, head(a).upos
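One way to read the query above: find every token with the word "prata" and relation "obj" whose head is a verb, sort the hits by the lemma of the head, and list the preceding word together with the head's lemma and part of speech. The Python sketch below imitates that reading directly on `<tok>` elements. It is an interpretation for illustration only, not TT-CQP itself; the function name `find_by_head`, the attribute names `lemma` and `upos`, and the sample sentence are all invented.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence ("He bought the silver"), not from the slides.
SAMPLE = (
    '<s>'
    '<tok id="w-1" lemma="ele" upos="PRON" deprel="nsubj" head="w-2">Ele</tok>'
    '<tok id="w-2" lemma="comprar" upos="VERB" deprel="root" head="0">comprou</tok>'
    '<tok id="w-3" lemma="o" upos="DET" deprel="det" head="w-4">a</tok>'
    '<tok id="w-4" lemma="prata" upos="NOUN" deprel="obj" head="w-2">prata</tok>'
    '</s>'
)


def find_by_head(s_elem, word, deprel, head_upos):
    """Return (previous word, head lemma, head upos) for matching tokens."""
    toks = s_elem.findall("tok")
    by_id = {tok.get("id"): tok for tok in toks}
    rows = []
    for i, tok in enumerate(toks):
        if tok.text != word or tok.get("deprel") != deprel:
            continue
        head = by_id.get(tok.get("head"))  # None for the root (head="0")
        if head is None or head.get("upos") != head_upos:
            continue
        prev_word = toks[i - 1].text if i > 0 else ""
        rows.append((prev_word, head.get("lemma", ""), head.get("upos")))
    rows.sort(key=lambda row: row[1])  # sort by the lemma of the head
    return rows


if __name__ == "__main__":
    for row in find_by_head(ET.fromstring(SAMPLE), "prata", "obj", "VERB"):
        print("\t".join(row))
```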