Finding your way through the woods with GrETEL Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde TABU-dag - June 14, 2013.


1 Finding your way through the woods with GrETEL Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde TABU-dag - June 14, 2013

2 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Nederbooms project Exploitation of Dutch treebanks for research in linguistics

3 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Nederbooms project Exploitation of Dutch treebanks for research in linguistics Goals o User-friendly tools o Access to large data files o Fast and accurate

4 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Treebank = syntactically annotated corpus e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch)

5 TREEBANKS

6 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query engine for treebanks Treebank = syntactically annotated corpus e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch) Parser e.g. Alpino (Van Noord 2006)

7 ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.”

8 ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.” XML trees Query language: XPath

9 XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
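To make the semantics of this query concrete, here is a minimal Python sketch that checks the same pattern against a hand-written tree for "Dit is een zin." The XML is a simplified assumption of what Alpino-style output looks like (real output carries many more attributes), and `TREE`, `PATTERN` and `matches` are hypothetical names used only for illustration. The standard library's ElementTree does not evaluate nested XPath predicates, so the pattern is matched recursively instead of being passed in as an XPath string.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written tree for "Dit is een zin." (assumed shape, not real parser output).
TREE = ET.fromstring("""
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
      <node rel="hd" pt="ww" lemma="zijn" word="is"/>
      <node rel="predc" cat="np">
        <node rel="det" pt="lid" lemma="een" word="een"/>
        <node rel="hd" pt="n" lemma="zin" word="zin"/>
      </node>
    </node>
  </node>
</alpino_ds>
""")

# The query from the slide, written as nested (attributes, child patterns) pairs.
PATTERN = ({"cat": "smain"}, [
    ({"rel": "su", "pt": "vnw", "lemma": "dit"}, []),
    ({"rel": "hd", "pt": "ww", "lemma": "zijn"}, []),
    ({"rel": "predc", "cat": "np"}, [
        ({"rel": "det", "pt": "lid", "lemma": "een"}, []),
        ({"rel": "hd", "pt": "n", "lemma": "zin"}, []),
    ]),
])

def matches(node, pattern):
    """True if node has all required attributes and, for every child pattern,
    at least one direct child <node> that matches it (as in the XPath predicate)."""
    attrs, children = pattern
    if any(node.get(key) != value for key, value in attrs.items()):
        return False
    return all(
        any(matches(child, child_pattern) for child in node.findall("node"))
        for child_pattern in children
    )

# "//node[...]" means: test every <node> at any depth.
hits = [node for node in TREE.iter("node") if matches(node, PATTERN)]
print(len(hits), hits[0].get("cat"))
```

Like the XPath expression, the matcher constrains dominance and attributes only; it says nothing about the linear order of the words.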


12 XPATH

13 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query treebanks by example

14 GrETEL Greedy Extraction of Trees for Empirical Linguistics Query treebanks by example First version => only for LASSY treebank New release => GrETEL for CGN treebank => update based on user reviews

15 GrETEL Example sentence Indicate relevant items of the sentence (Adapt XPath) Select treebank Inspect results Parser (Alpino) Automatically generate XPath expression Present results to the user
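The "automatically generate XPath expression" step can be sketched as follows. This is not GrETEL's actual implementation: it assumes the parse has already been pruned to the nodes the user marked as relevant, and the names `SUBTREE`, `KEEP`, `predicate` and `to_xpath` are hypothetical. The attribute names (`rel`, `cat`, `pos`, `root`) are the ones that appear in the queries on these slides.

```python
import xml.etree.ElementTree as ET

# Hand-written, already pruned fragment (assumed shape): only the verb and the
# head of its prepositional complement were marked as relevant.
SUBTREE = ET.fromstring("""
<node cat="smain" rel="--">
  <node rel="hd" pos="verb" root="kijk" word="keek"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar" word="naar"/>
  </node>
</node>
""")

# Attributes copied into the query, in output order; "word" is deliberately left out.
KEEP = ("rel", "cat", "pos", "root")

def predicate(node, attrs):
    """Render one node as node[...], recursing into its child nodes."""
    conditions = [f'@{name}="{node.get(name)}"' for name in attrs if node.get(name) is not None]
    conditions += [predicate(child, KEEP) for child in node.findall("node")]
    return "node[" + " and ".join(conditions) + "]"

def to_xpath(top):
    """The top node drops @rel so the pattern may occur anywhere in a tree."""
    top_attrs = tuple(name for name in KEEP if name != "rel")
    return "//" + predicate(top, top_attrs)

xpath = to_xpath(SUBTREE)
print(xpath)
```

Leaving `word` out of `KEEP` makes the query match any inflected form of the verb; a stricter search would add it back.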

16 OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work

17 CASE STUDY Verbs with fixed preposition o E.g. Hij keek met een bang hartje naar de heks. ‘he was looking at the witch with a heavy heart.’ o VERB + (…+) PREP LASSY: XPath query //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

18 CASE STUDY Verbs with fixed preposition o E.g. Hij keek naar de heks. ‘he was looking at the witch.’ Discontinuous constructions! o E.g. Hij keek met een bang hartje naar de heks. ‘he was looking at the witch with a heavy heart.’ o VERB + (…+) PREP
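A tree query handles the discontinuous case because it constrains dominance, not adjacency. The sketch below illustrates this with two hand-written, simplified trees (assumed shapes, not real parser output) and a small recursive matcher; `QUERY`, `matches`, `SHORT` and `LONG` are hypothetical names. The same pattern matches both sentences, even though material intervenes between verb and preposition in the second one.

```python
import xml.etree.ElementTree as ET

# VERB + (...+) PREP as nested (attributes, child patterns) pairs.
QUERY = ({"cat": "smain"}, [
    ({"rel": "hd", "pos": "verb", "root": "kijk"}, []),
    ({"rel": "ld", "cat": "pp"}, [
        ({"rel": "hd", "pos": "prep", "root": "naar"}, []),
    ]),
])

def matches(node, pattern):
    """Attributes must agree, and each child pattern needs some matching child node."""
    attrs, children = pattern
    if any(node.get(key) != value for key, value in attrs.items()):
        return False
    return all(
        any(matches(child, child_pattern) for child in node.findall("node"))
        for child_pattern in children
    )

# "Hij keek naar de heks." (simplified)
SHORT = ET.fromstring("""
<node cat="smain" rel="--">
  <node rel="su" pos="pron" root="hij" word="Hij"/>
  <node rel="hd" pos="verb" root="kijk" word="keek"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar" word="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node>
""")

# "Hij keek met een bang hartje naar de heks." (simplified): an extra modifier PP
# sits between the verb and the fixed preposition.
LONG = ET.fromstring("""
<node cat="smain" rel="--">
  <node rel="su" pos="pron" root="hij" word="Hij"/>
  <node rel="hd" pos="verb" root="kijk" word="keek"/>
  <node rel="mod" cat="pp">
    <node rel="hd" pos="prep" root="met" word="met"/>
    <node rel="obj1" cat="np"/>
  </node>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar" word="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node>
""")

print(matches(SHORT, QUERY), matches(LONG, QUERY))
```

A plain string search for "keek naar" would find only the first sentence; the tree pattern finds both.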

19 GrETEL ONLINE

20 INPUT

21 ANNOTATION MATRIX

22 ANNOTATION GUIDELINES

23 XPATH GENERATOR

24 Other treebank, other format … Hij keek met een bang hartje naar de heks CGN /node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

25 Other treebank, other format … Hij keek met een bang hartje naar de heks CGN /node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

26 TREEBANK SELECTION

27 RESULTS Verb plus fixed preposition o E.g. Hij keek naar de heks. ‘he was looking at the witch.’ o VERB + (…+) PREP => 4004 matches in 3881 sentences

28 RESULTS: table

29 RESULTS: data

30 RESULTS: trees

31 OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work

32 SEARCH OPTIONS => Below annotation matrix

33 SEARCH OPTIONS Green versus red word order in Dutch o green: past participle – auxiliary De NAVO stelt dat ze er alles aan gedaan heeft o red: auxiliary – past participle De NAVO stelt dat ze er alles aan heeft gedaan “NATO claims that it has done everything in its power” (deredactie.be)
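The green and red orders have the same dependency structure and differ only in surface position, so telling them apart requires token positions. The Python sketch below shows one way this could be done; it is not a description of how GrETEL implements its search options. It assumes that verb nodes carry a `wvorm` attribute (`pv` for the finite verb, `vd` for the past participle) and a `begin` attribute holding the token index, following Alpino/LASSY conventions. The helper names `clause`, `first_verb` and `word_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

def clause(aux_begin, part_begin):
    """Build a toy subordinate clause; positions are assumed 0-based token indices."""
    return ET.fromstring(f"""
    <node cat="ssub" rel="body">
      <node rel="hd" pt="ww" wvorm="pv" lemma="hebben" word="heeft" begin="{aux_begin}"/>
      <node cat="ppart" rel="vc">
        <node rel="hd" pt="ww" wvorm="vd" lemma="doen" word="gedaan" begin="{part_begin}"/>
      </node>
    </node>
    """)

def first_verb(tree, wvorm):
    """Return the first node at any depth with the given verb form, or None."""
    return next((n for n in tree.iter("node") if n.get("wvorm") == wvorm), None)

def word_order(tree):
    """'green' if the participle precedes the auxiliary, 'red' if it follows."""
    auxiliary = first_verb(tree, "pv")
    participle = first_verb(tree, "vd")
    if auxiliary is None or participle is None:
        return None
    if int(participle.get("begin")) < int(auxiliary.get("begin")):
        return "green"
    return "red"

# "... dat ze er alles aan gedaan heeft": gedaan at index 8, heeft at index 9.
green = word_order(clause(aux_begin=9, part_begin=8))
# "... dat ze er alles aan heeft gedaan": heeft at index 8, gedaan at index 9.
red = word_order(clause(aux_begin=8, part_begin=9))
print(green, red)
```

The XML structure of the two clauses is identical, which is why a query that tests only categories and relations returns both orders.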

34 SEARCH OPTIONS


39 OUTLINE GrETEL in a nutshell GrETEL demo o Case study o Search options Conclusions and future work

40 CONCLUSIONS GrETEL: search engine for Dutch treebanks Input = natural language example Output = sample of similar sentences Syntactic concordancer Available online (via Mozilla Firefox) No installation required

41 FUTURE WORK GrETEL 2.0 o Include SoNaR corpus (ca 500M tokens) o More generic AfriBooms o GrETEL for Afrikaans o Include other treebank formats

42 CASE STUDY Collective noun constructions o E.g. Een aantal bomen zijn omgevallen. ‘A number of trees fell down.’ o DET + NOUN + PLURAL NOUN Discontinuous constructions! o E.g. Een groot aantal oude bomen zijn omgevallen. ‘A large number of old trees fell down.’

43 Thanks for your attention! Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel

44 Waaraan vs Waar … aan Waar denk je aan ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results) Waar bemoei je je mee? Wanneer gaat een koortsstuip over in epilepsie ?

45 Waaraan denk je ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results) Waarom werken we ? Waartoe verbind ik mij als ouder door dit formulier in te vullen ? Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?

46 Hij klom de boom in //node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results) Door haar winst komt Clijsters de top-20 binnen. In feite ging minder dan de helft van Dorsets de rivier over. Nederland gaat de bezettingstijd in.

