Jan Odijk LREC Miyazaki

1 GrETEL 4
Jan Odijk, LREC Miyazaki, 2018-05-10

2 Overview
GrETEL 1,2,3
GrETEL 4
Developers: Martijn van der Klis, Sheean Spoel, Gerson Foks (DH Lab)
Illustration

3 GrETEL 1,2,3
GrETEL: KU Leuven
Cooperation CLARIN-NL and CLARIN Flanders
GrETEL 2,3: extensions, improvements in other Flemish projects
Application for searching in a treebank
Treebank = text corpus in which each sentence has been assigned a syntactic structure
Syntactic structure is usually a tree
Core feature: example-based querying

4 GrETEL 1,2,3
Treebanks:
LASSY-Small (1 million tokens, written language)
CGN (1 million tokens, spoken language)
(V3) SoNaR Treebank (>500 million tokens)
V1: V2: V3:

5 GrETEL 4
GrETEL 4: UU Utrecht
In CLARIAH and UU-internal AnnCor project
New functionality that KU Leuven could not add:
Upload a user's own corpus incl. metadata
Search in the user's own automatically parsed corpus
Analysis of search results combined with metadata
Better support for XPath queries
Improved interface functionality
V4 (alpha!)

6 Illustration
Upload Corpus
Plain text or CHILDES CHAT
TEI and FoLiA to follow
CHAT utterances are cleaned and metadata uploaded:
knor knor [!= pigsound], ik heb honger → knor knor, ik heb honger ("oink oink, I am hungry")
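The cleaning step in the CHAT example can be sketched as follows. This is a minimal illustration, not GrETEL 4's actual cleaning code: it assumes that every square-bracketed span (such as "[!= pigsound]") is an annotation to be stripped from the utterance and kept as metadata. The function name clean_utterance is hypothetical, and real CHAT files contain many other codes that this sketch ignores.

```python
import re
from typing import List, Tuple

# Assumption: any square-bracketed span, e.g. "[!= pigsound]", is a CHAT
# annotation. Real CHAT has many more codes than this pattern covers.
ANNOTATION = re.compile(r"\s*\[[^\]]*\]")


def clean_utterance(utterance: str) -> Tuple[str, List[str]]:
    """Return the utterance without bracketed annotations, plus the
    annotation contents so they can be stored as metadata."""
    annotations = [m.strip()[1:-1].strip() for m in ANNOTATION.findall(utterance)]
    cleaned = ANNOTATION.sub("", utterance)
    # Collapse any whitespace runs left behind by the removal.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, annotations


text, meta = clean_utterance("knor knor [!= pigsound], ik heb honger")
print(text)
print(meta)
```

Only the cleaned text would be sent to the parser; the extracted annotations stay available for later analysis.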

7 Corpus Upload

8 Corpus Overview

9 Corpus Details

10 Query Example
Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus
Example sentence: Hij zal dat willen doen ("He will want to do that")

11 Example Sentence

12 Parse Tree

13 Select Parts

14 Query Tree

15 Select Treebank

16 Query [XPath query not preserved in the transcript]
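The query generated for the example sentence is not reproduced in this transcript. As a rough illustration of what such a structural query expresses, the sketch below searches a hand-written, heavily simplified Alpino-style parse of "Hij zal dat willen doen" for a main clause whose verb heads a chain of two nested infinitival complements. The toy XML, the attribute values, the helper names and the XPath in the comment are all assumptions made for illustration, not GrETEL output. The matching is done in plain Python because the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath for the same pattern (illustrative only, not executed here):
#   //node[@cat="smain" and node[@rel="hd" and @pt="ww"]
#          and node[@rel="vc" and @cat="inf" and node[@rel="hd" and @pt="ww"]
#                   and node[@rel="vc" and @cat="inf"
#                            and node[@rel="hd" and @pt="ww"]]]]

# Assumption: a simplified parse; real Alpino output has many more attributes.
TOY_PARSE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" word="Hij"/>
      <node rel="hd" pt="ww" word="zal"/>
      <node rel="vc" cat="inf">
        <node rel="hd" pt="ww" word="willen"/>
        <node rel="vc" cat="inf">
          <node rel="obj1" pt="vnw" word="dat"/>
          <node rel="hd" pt="ww" word="doen"/>
        </node>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def child_with(node, **attrs):
    """Return the first direct child <node> carrying all given attribute values."""
    for child in node.findall("node"):
        if all(child.get(key) == value for key, value in attrs.items()):
            return child
    return None


def verb_chain(node):
    """Collect the verb heads along a chain of nested infinitival complements."""
    words = []
    current = node
    while current is not None:
        head = child_with(current, rel="hd", pt="ww")
        if head is None:
            break
        words.append(head.get("word"))
        current = child_with(current, rel="vc", cat="inf")
    return words


def find_three_verb_clusters(root):
    """Return the verb chains of all main clauses with at least three verbs."""
    hits = []
    for node in root.iter("node"):
        if node.get("cat") == "smain":
            chain = verb_chain(node)
            if len(chain) >= 3:
                hits.append(chain)
    return hits


print(find_three_verb_clusters(ET.fromstring(TOY_PARSE)))
```

In example-based querying the user never writes such a pattern by hand: it is derived from the parts of the parse tree selected in the interface.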

17 Example: Query Output

18 Utterance Details

19 Result Statistics

20 Analysis

21 Some Results
3 verbs:
335 hits found
313 by adults, 12 by child
4 by child do not occur among adults
8 others are not in most frequent of adults
Child examples as of month 43 (3;7)
2 verbs:
6,645 in total, 1,363 uttered by child as of month 23 (1;11).
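The ages in parentheses use the CHILDES "years;months" notation, and the two-verb counts imply that roughly a fifth of those utterances come from the child. The short sketch below reproduces both pieces of arithmetic from the figures on the slide; the helper names chat_age and share are hypothetical.

```python
def chat_age(months: int) -> str:
    """Format an age in months in CHILDES 'years;months' notation."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Figures taken from the slide.
print(chat_age(43))         # first three-verb examples by the child
print(chat_age(23))         # first two-verb examples by the child
print(share(1363, 6645))    # child's share of the two-verb hits, in percent
```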

22 Concluding remarks
GrETEL is a very user-friendly search engine
Enables searching for constructions
Enables search for disambiguated words
Utrecht extensions
Enable searching in your own research corpus
Enable detailed analysis of search results

23 Concluding remarks
User-friendliness
Also implies limitations!
Automatic parsing
Is not flawless
Requires additional checks before conclusions can be reliably drawn
Try it out!
Even if it is still under development

24 Thanks for your attention

25 More information
http://portal.clarin.nl, http://www.clariah.nl
Recorded lecture on GrETEL:
Educational Package:
Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.) CLARIN in the Low Countries, pp. 269–280. London: Ubiquity Press. DOI: License: CC-BY 4.0
Odijk, J., van der Klis, M. and Spoel, S. (2018). Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46–55, Prague.
Odijk, J. and van Hessen, A. (eds.) CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI:

