TEITOK Dependency Grammar

1 TEITOK Dependency Grammar
Maarten Janssen, CELGA-ILTEC, Univ. de Coimbra

2 Dependency Grammar in TEITOK
- Tool for working with TEI/XML in linguistics
  - Fully XML-driven corpus environment
  - Full use of TEI, not just converting verticalized lines to <w>
- Modular design
  - Modules for very different types of corpora
  - NLP behind the scenes
  - Search, visualize, edit
- GUI for dependency-parsed corpora
  - Parse sentences
  - Edit parsed data
  - Visualize parse trees
  - Search parsed data

3 Constituency Grammar

4 Dependency Grammar
[Figure: dependency tree with "loves" as the head, "John" attached as subj and "Mary" attached as obj.]
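The tree on this slide can be written down as a flat list in which every word points at its head. This is only a minimal sketch: the word order "John loves Mary", the use of 0 for "no head", and the label "root" are assumptions, since the slide only shows the three words and the two relations subj and obj.

```python
# Each entry: (word, index of its head, relation to that head).
# Indices are 1-based; head 0 means the word has no head (assumed convention).
SENTENCE = [
    ("John", 2, "subj"),
    ("loves", 0, "root"),   # "root" is an assumed label, not on the slide
    ("Mary", 2, "obj"),
]


def dependents(sentence, head_index):
    """Return (word, relation) for every word attached to head_index."""
    return [(word, rel) for word, head, rel in sentence if head == head_index]


def root_word(sentence):
    """Return the single word that has no head."""
    roots = [word for word, head, _ in sentence if head == 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


deps_of_loves = dependents(SENTENCE, 2)
top = root_word(SENTENCE)
```

Unlike a constituency tree, nothing here is a phrase node: every node is a word, and the structure is fully described by one head pointer and one label per word.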

5 CoNLL-U Format
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1  This      this      PRON   PD  Number=Sing|PronType=Dem  5  nsubj  _  _
2  is        be        AUX    VA  Mood=Ind|…                5  cop    _  _
3  a         a         DET    RI  Definite=Ind|…            5  det    _  _
4  test      test      ADJ    A   Degree=Pos                5  amod   _  _
5  sentence  sentence  NOUN   S   Number=Sing               0  root   _  _
6  .         .         PUNCT  FS  _                         5  punct  _  _
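A CoNLL-U file of this kind can be read with a few lines of Python. The sketch below is not TEITOK code. It only relies on the published CoNLL-U layout: ten tab-separated columns per token, comment lines starting with "#", and a blank line between sentences. The feature values for "is" and "a" are shortened to the part visible on the slide.

```python
# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of column dicts."""
    sentences, current = [], []
    for raw in text.splitlines():
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines (# sent_id, # text, ...) are skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Sample rows written with spaces for readability, then joined with tabs.
_ROWS = [
    "1 This this PRON PD Number=Sing|PronType=Dem 5 nsubj _ _",
    "2 is be AUX VA Mood=Ind 5 cop _ _",          # features shortened
    "3 a a DET RI Definite=Ind 5 det _ _",        # features shortened
    "4 test test ADJ A Degree=Pos 5 amod _ _",
    "5 sentence sentence NOUN S Number=Sing 0 root _ _",
    "6 . . PUNCT FS _ 5 punct _ _",
]
SAMPLE = (
    "# sent_id = 1\n# text = This is a test sentence.\n"
    + "\n".join("\t".join(row.split(" ")) for row in _ROWS)
    + "\n"
)

sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```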

6 TEITOK
- Online GUI for working with XML corpora
  - A corpus consists of a set of XML files
  - GUI for editing heavy XML files
- Corpora beyond mere numbers
  - Data necessary for non-standard corpora
  - Normalization, facsimile images, sound, grammar, etc.
- Tool for “small corpora”
  - Historical corpora
  - Learner corpora
  - Less-resourced languages
  - Spoken corpora
  - Dialect corpora

10 EWE Text

12 Adding Dependencies (1)
<s>This is a test sentence.</s>

13 Adding Dependencies (1b)
<s>This is a test sentence.</s>

<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>
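The step from the plain <s> to the tokenized <s> can be sketched in Python with the standard library. This is not the TEITOK tokenizer. The splitting rule (runs of word characters, or single punctuation marks) and the function name are assumptions made for illustration. Only the <tok> element and the "w-N" id pattern come from the slide.

```python
import re
import xml.etree.ElementTree as ET

# One token = a run of word characters or a single punctuation mark,
# followed by whatever whitespace came after it in the original text.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def tokenize_sentence(s_elem, first_id=1):
    """Replace the text of an <s> element by a sequence of <tok> children."""
    text = s_elem.text or ""
    s_elem.text = None
    for number, match in enumerate(TOKEN_RE.finditer(text), first_id):
        tok = ET.SubElement(s_elem, "tok", id=f"w-{number}")
        tok.text = match.group(1)
        # Keep the original spacing as the tail so the text can be restored.
        tok.tail = match.group(2)
    return s_elem


sentence = ET.fromstring("<s>This is a test sentence.</s>")
tokenize_sentence(sentence)
print(ET.tostring(sentence, encoding="unicode"))
```

Storing the whitespace in the tail of each token means the running text of the sentence is unchanged by tokenization, which matters when the XML file itself is the corpus.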

14 Full XML
<l id="s-1" bbox=" " gloss="Don Alfonso of Castile"><tok form="Don" id="w-1" pt="Dom"><hi type="dropcap" n="4" rend="black">D</hi>on</tok> <tok id="w-2" nform="Affonso" form="Afonsso" pt="Alfonso">Afonſſo</tok> <tok id="w-3">de</tok> <tok id="w-4" form="Castela">Caſtela</tok></l>
<l id="s-2" bbox=" " gloss="of Toledo, of León,"><tok id="w-5">de</tok> <tok id="w-6">Toledo</tok> <tok id="w-7">de</tok> <tok id="w-8" pt="Leão">Leon</tok></l>
<l id="s-3" bbox=" " gloss="King, indeed, of Compostela,"><tok id="w-9" pt="Rei">Rey</tok> <tok id="w-10">e</tok> <tok id="w-11" pt="bem">ben</tok> <tok id="w-12" form="des" pt="de">deſ</tok> <tok id="w-13" nform="Conpostela" pt="Compostela">Copostela</tok></l>
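Because a token can contain further markup (the drop cap in w-1), the written form of a token is the concatenation of all text inside it, not just its direct text. A minimal sketch with Python's standard library, using the first line of the example. The attributes are listed as they stand, without interpreting them.

```python
import xml.etree.ElementTree as ET

# First verse line of the example, copied from the slide.
LINE = (
    '<l id="s-1" bbox=" " gloss="Don Alfonso of Castile">'
    '<tok form="Don" id="w-1" pt="Dom">'
    '<hi type="dropcap" n="4" rend="black">D</hi>on</tok> '
    '<tok id="w-2" nform="Affonso" form="Afonsso" pt="Alfonso">Afonſſo</tok> '
    '<tok id="w-3">de</tok> '
    '<tok id="w-4" form="Castela">Caſtela</tok></l>'
)

line = ET.fromstring(LINE)
tokens = []
for tok in line.iter("tok"):
    # itertext() also descends into <hi>, so "D" + "on" yields "Don".
    written = "".join(tok.itertext())
    # Every attribute except the id is an annotation layer on the token.
    layers = {name: value for name, value in tok.attrib.items() if name != "id"}
    tokens.append((tok.get("id"), written, layers))

for tok_id, written, layers in tokens:
    print(tok_id, written, layers)
```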

15 Adding Dependencies (1)
<s>
  <tok id="w-1">This</tok>
  <tok id="w-2">is</tok>
  <tok id="w-3">a</tok>
  <tok id="w-4">test</tok>
  <tok id="w-5">sentence</tok>
  <tok id="w-6">.</tok>
</s>

16 Adding Dependencies (2)
# newdoc
# newpar
# sent_id = 1
# text = This is a test sentence.
1  This      this      PRON   PD  Number=Sing|PronType=Dem  5  nsubj  _  _
2  is        be        AUX    VA  Mood=Ind|…                5  cop    _  _
3  a         a         DET    RI  Definite=Ind|…            5  det    _  _
4  test      test      ADJ    A   Degree=Pos                5  amod   _  _
5  sentence  sentence  NOUN   S   Number=Sing               0  root   _  _
6  .         .         PUNCT  FS  _                         5  punct  _  _

17 Adding Dependencies (3)
<s>
  <tok id="w-1" upos="PRON" deprel="nsubj" head="w-5">This</tok>
  <tok id="w-2" upos="AUX" deprel="cop" head="w-5">is</tok>
  <tok id="w-3" upos="DET" deprel="det" head="w-5">a</tok>
  <tok id="w-4" upos="ADJ" deprel="amod" head="w-5">test</tok>
  <tok id="w-5" upos="NOUN" deprel="root" head="0">sentence</tok>
  <tok id="w-6" upos="PUNCT" deprel="punct" head="w-5">.</tok>
</s>
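The three steps (tokenized XML, CoNLL-U parse, annotated XML) fit together roughly as follows. This is a sketch, not the TEITOK implementation. It assumes the parser returns exactly one row per <tok>, in the same order. The function name is hypothetical, and the name of the attribute that receives the part-of-speech column is left as a parameter because this sketch does not assume how a given corpus names it.

```python
import xml.etree.ElementTree as ET

TOKENIZED = (
    '<s><tok id="w-1">This</tok> <tok id="w-2">is</tok> <tok id="w-3">a</tok> '
    '<tok id="w-4">test</tok> <tok id="w-5">sentence</tok><tok id="w-6">.</tok></s>'
)

# Parser output reduced to the columns used here (CoNLL-U id, UPOS, HEAD, DEPREL).
ROWS = [
    {"id": "1", "upos": "PRON", "head": "5", "deprel": "nsubj"},
    {"id": "2", "upos": "AUX", "head": "5", "deprel": "cop"},
    {"id": "3", "upos": "DET", "head": "5", "deprel": "det"},
    {"id": "4", "upos": "ADJ", "head": "5", "deprel": "amod"},
    {"id": "5", "upos": "NOUN", "head": "0", "deprel": "root"},
    {"id": "6", "upos": "PUNCT", "head": "5", "deprel": "punct"},
]


def add_dependencies(s_elem, rows, pos_attr="upos"):
    """Copy POS, relation and head from CoNLL-U rows onto the <tok> elements."""
    toks = s_elem.findall("tok")
    if len(toks) != len(rows):
        raise ValueError("parser output and XML tokens are not aligned")
    # CoNLL-U numbers tokens per sentence; the XML uses ids such as "w-5".
    xml_id = {row["id"]: tok.get("id") for row, tok in zip(rows, toks)}
    for row, tok in zip(rows, toks):
        tok.set(pos_attr, row["upos"])
        tok.set("deprel", row["deprel"])
        # The root keeps head="0", as on the slide; others point at an XML id.
        tok.set("head", xml_id.get(row["head"], "0"))
    return s_elem


annotated = add_dependencies(ET.fromstring(TOKENIZED), ROWS)
print(ET.tostring(annotated, encoding="unicode"))
```

The key design point is the id translation: the head is stored as a reference to another token's XML id, so the annotation stays valid when sentences are moved or the file is re-split.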

20 Searching
- Corpus automatically exported to CWB
- Searchable in many aspects
- Results shown as XML fragments, created from CWB results
- Customized CQP version
  - TT-CQP can use dependencies in search

a:[word="prata" & deprel="obj"] :: head(a).upos="VERB";
sort head(a).lemma;
tabulate a[-1].word, head(a).lemma, head(a).upos
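To make clear what the query asks for, here is one plausible reading of it, written out in plain Python over an annotated sentence. This does not call CWB or TT-CQP. It reads the query as: find every token "prata" that is an obj and whose head is a VERB, then list the preceding word, the head's lemma and the head's upos, sorted by head lemma. The Portuguese sample sentence and its annotation are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample ("She bought the silver."), annotated by hand.
SAMPLE = """<s>
<tok id="w-1" lemma="ela" upos="PRON" deprel="nsubj" head="w-2">Ela</tok>
<tok id="w-2" lemma="comprar" upos="VERB" deprel="root" head="0">comprou</tok>
<tok id="w-3" lemma="o" upos="DET" deprel="det" head="w-4">a</tok>
<tok id="w-4" lemma="prata" upos="NOUN" deprel="obj" head="w-2">prata</tok>
<tok id="w-5" lemma="." upos="PUNCT" deprel="punct" head="w-2">.</tok>
</s>"""


def search_prata_objects(s_elem):
    """Return (previous word, head lemma, head upos) rows, sorted by head lemma."""
    toks = s_elem.findall("tok")
    by_id = {tok.get("id"): tok for tok in toks}
    rows = []
    for position, a in enumerate(toks):
        # a:[word="prata" & deprel="obj"]
        if a.text != "prata" or a.get("deprel") != "obj":
            continue
        # :: head(a).upos="VERB"
        head = by_id.get(a.get("head"))
        if head is None or head.get("upos") != "VERB":
            continue
        # tabulate a[-1].word, head(a).lemma, head(a).upos
        previous = toks[position - 1].text if position > 0 else ""
        rows.append((previous, head.get("lemma"), head.get("upos")))
    # sort head(a).lemma
    rows.sort(key=lambda row: row[1])
    return rows


results = search_prata_objects(ET.fromstring(SAMPLE))
for row in results:
    print("\t".join(row))
```

The point of head(a) is that the condition and the output columns refer to a token that is not adjacent to the match, which plain positional CQP offsets such as a[-1] cannot express.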
