Download presentation
Presentation is loading. Please wait.
Published byKristen Castillon Modified over 10 years ago
1
Finding your way through the woods with GrETEL Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde TABU-dag - June 14, 2013
2
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Nederbooms project Exploitation of Dutch treebanks for research in linguistics
3
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Nederbooms project Exploitation of Dutch treebanks for research in linguistics Goals o User-friendly tools o Access to large data files o Fast and accurate
4
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Treebank = syntactically annotated corpus e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch)
5
TREEBANKS
6
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Treebank = syntactically annotated corpus e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch) Parser e.g. Alpino (Van Noord 2006)
7
ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.”
8
ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.” XML trees Query language: XPath
9
XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
10
XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
11
XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
12
XPATH
13
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query treebanks by example
14
GrETEL Greedy Extraction of Trees for Empirical Linguistics Query treebanks by example First version => only for LASSY treebank New release => GrETEL for CGN treebank => update based on user reviews
15
GrETEL Example sentence Indicate relevant items of the sentence (Adapt XPath) Select treebank Inspect results Parser (Alpino) Automatically generate XPath expression Present results the user
16
OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work
17
CASE STUDY Verbs with fixed preposition o E.g. Hij keek met een bang hartje naar de heks. ‘he was looking at the witch with a heavy heart.’ o VERB + (…+) PREP LASSY: Xpath query //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
18
CASE STUDY Verbs with fixed preposition o E.g. Hij keek naar de heks. ‘he was looking at the witch.’ Discontinuous constructions! o E.g. Hij keek met een bang hartje naar de heks. ‘he was looking at the witch with a heavy heart.’ o VERB + (…+) PREP
19
GrETEL ONLINE
20
INPUT
21
ANNOTATION MATRIX
22
ANNOTATION GUIDELINES
23
XPATH GENERATOR
24
Other treebank, other format … Hij keek met een bank hartje naar de heks CGN /node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
25
Other treebank, other format … Hij keek met een bang hartje naar de heks CGN /node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
26
TREEBANK SELECTION
27
RESULTS Verb plus fixed preposition o E.g. Hij keek naar de heks. ‘A number of trees fell down.’ o VERB + (…+) PREP 4004 matches in 3881 sentences
28
RESULTS: table
29
RESULTS: data
30
RESULTS: trees
31
OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work
32
SEARCH OPTIONS Below annotation matrix
33
SEARCH OPTIONS Green versus red word order in Dutch o green: past participle – auxiliary De NAVO stelt dat ze er alles aan gedaan heeft o red: auxiliary – past participle De NAVO stelt dat ze er alles aan heeft gedaan “The NATO claim that they have done everything in their power” (deredactie.be)
34
SEARCH OPTIONS
39
OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work
40
CONCLUSIONS GrETEL: search engine for Dutch treebanks Input = natural language example Output = sample of similar sentences Syntactic concordancer Available online (via Mozilla Firefox) No installation required
41
FUTURE WORK GrETEL 2.0 o Include SoNaR corpus (ca 500M tokens) o More generic AfriBooms o GrETEL for Afrikaans o Include other treebank formats
42
CASE STUDY Collective noun constructions o E.g. Een aantal bomen zijn omgevallen. ‘A number of trees fell down.’ o DET + NOUN + PLURAL NOUN Discontinuous constructions! o E.g. Een groot aantal oude bomen zijn omgevallen. ‘A large number of old trees fell down.’
43
Thanks for your attention! Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel
44
Waaraan vs Waar … aan Waar denk je aan ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results) Waar bemoei je je mee? Wanneer gaat een koortsstuip over in epilepsie ?
45
Waaraan denk je ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results) Waarom werken we ? Waartoe verbind ik mij als ouder door dit formulier in te vullen ? Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?
46
Hij klom de boom in //node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results) Door haar winst komt Clijsters de top-20 binnen. In feite ging minder dan de helft van Dorsets de rivier over. Nederland gaat de bezettingstijd in.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.