1
Jan Odijk LREC Miyazaki 2018-05-10
GrETEL 4
2
Overview
GrETEL 1,2,3
GrETEL 4
Developers: Martijn van der Klis, Sheean Spoel, Gerson Foks (DH Lab)
Illustration
3
GrETEL 1,2,3
GrETEL: KU Leuven
Cooperation CLARIN-NL and CLARIN Flanders
GrETEL 2,3: extensions, improvements in other Flemish projects
Application for searching in a treebank
Treebank = text corpus in which each sentence has been assigned a syntactic structure
Syntactic structure is usually a tree
Core feature: example-based querying
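To make the notion of a treebank concrete, here is a minimal Python sketch of one sentence stored as a tree. The XML layout (nested `node` elements with `rel`, `cat`, `pt` and `word` attributes) is a simplified assumption loosely modelled on the Alpino/LASSY format, and the sentence and its analysis are illustrative rather than taken from any of the treebanks.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parse of "Hij slaapt" ("He sleeps").
SENTENCE_XML = """
<node cat="smain" rel="top">
  <node rel="su" pt="vnw" word="Hij"/>
  <node rel="hd" pt="ww" word="slaapt"/>
</node>
"""


def bracketed(node):
    """Render a tree as a labelled bracketing string."""
    word = node.get("word")
    if word is not None:
        # Leaf: a word with its relation and part of speech.
        return f"[{node.get('rel')}:{node.get('pt')} {word}]"
    # Phrase: a category with its recursively rendered children.
    children = " ".join(bracketed(child) for child in node)
    return f"[{node.get('rel')}:{node.get('cat')} {children}]"


tree = ET.fromstring(SENTENCE_XML)
print(bracketed(tree))
```

Each sentence in a treebank carries such a structure, which is what makes it possible to search for constructions rather than for word strings.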
4
GrETEL 1,2,3
Treebanks:
LASSY-Small (1 million tokens, written language)
CGN (1 million tokens, spoken language)
(V3) SoNaR Treebank (>500 million tokens)
V1:
V2:
V3:
5
GrETEL 4
GrETEL 4: UU Utrecht
In CLARIAH and UU-internal AnnCor project
New functionality that KU Leuven could not add:
Upload a user's own corpus incl. metadata
Search in the user's own automatically parsed corpus
Analysis of search results combined with metadata
Better support for XPath queries
Improved interface functionality
V4 (alpha!)
6
Illustration: Upload Corpus
Plain text or CHILDES CHAT
TEI and FoLiA to follow
CHAT utterances are cleaned and metadata uploaded:
knor knor [!= pigsound], ik heb honger
becomes
knor knor, ik heb honger ("oink oink, I am hungry")
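The cleaning step illustrated above can be sketched in Python. This is a simplified assumption, not GrETEL's actual cleaning code: it only strips square-bracketed annotations such as `[!= pigsound]`, whereas real CHAT transcripts contain many more kinds of codes. The names `ANNOTATION` and `clean_utterance` are hypothetical.

```python
import re

# Matches one bracketed annotation such as "[!= pigsound]",
# together with any whitespace directly before it.
ANNOTATION = re.compile(r"\s*\[[^\]]*\]")


def clean_utterance(utterance: str) -> str:
    """Remove bracketed annotations and tidy leftover whitespace."""
    without_annotations = ANNOTATION.sub("", utterance)
    return re.sub(r"\s+", " ", without_annotations).strip()


print(clean_utterance("knor knor [!= pigsound], ik heb honger"))
```

Removing such codes before parsing matters because an automatic parser would otherwise treat them as words of the utterance.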
7
Corpus Upload
8
Corpus Overview
9
Corpus Details
10
Query Example
Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus
Example sentence: Hij zal dat willen doen ("He will want to do that")
11
Example Sentence
12
Parse Tree
13
Select Parts
14
Query Tree
15
Select Treebank
16
Query
[Figure: generated XPath query, with conditions joined by "and"]
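As a rough illustration of what a query for three stacked verbs does, the Python sketch below searches a tiny hand-made treebank. Everything in it is an assumption made for illustration: the XML layout is a simplified, Alpino-like format, the two parses are written by hand, and the search uses the limited XPath subset of Python's `xml.etree.ElementTree` instead of the full XPath query that GrETEL generates. The helper names are hypothetical, and "at least three verbs" is a choice made here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini treebank: "Hij zal dat willen doen" and "Hij wil slapen".
TREEBANK_XML = """
<treebank>
  <node cat="smain" rel="top">
    <node rel="su" pt="vnw" word="Hij"/>
    <node rel="hd" pt="ww" word="zal"/>
    <node rel="vc" cat="inf">
      <node rel="hd" pt="ww" word="willen"/>
      <node rel="vc" cat="inf">
        <node rel="obj1" pt="vnw" word="dat"/>
        <node rel="hd" pt="ww" word="doen"/>
      </node>
    </node>
  </node>
  <node cat="smain" rel="top">
    <node rel="su" pt="vnw" word="Hij"/>
    <node rel="hd" pt="ww" word="wil"/>
    <node rel="vc" cat="inf">
      <node rel="hd" pt="ww" word="slapen"/>
    </node>
  </node>
</treebank>
"""

HEAD_VERB = "node[@rel='hd'][@pt='ww']"           # a verbal head
VERBAL_COMPLEMENT = "node[@rel='vc'][@cat='inf']"  # a bare infinitival complement


def verb_chain(clause):
    """Collect head verbs by following hd/vc links down from a clause."""
    verbs = []
    current = clause
    while current is not None:
        head = current.find(HEAD_VERB)
        if head is None:
            break
        verbs.append(head.get("word"))
        current = current.find(VERBAL_COMPLEMENT)
    return verbs


def three_verb_hits(root):
    """Return the verb chains of all main clauses with at least three verbs."""
    hits = []
    for clause in root.iterfind(".//node[@cat='smain']"):
        chain = verb_chain(clause)
        if len(chain) >= 3:
            hits.append(chain)
    return hits


root = ET.fromstring(TREEBANK_XML)
print(three_verb_hits(root))
```

Only the first sentence is reported, since the second contains just two verbs. In GrETEL the user does not write such a query by hand: it is derived from the example sentence and the parts selected in it.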
17
Example: Query Output
18
Utterance Details
19
Result Statistics
20
Analysis
21
Some Results
3 verbs:
335 hits found
313 by adults, 12 by child
4 by child do not occur among adults
8 others are not among the most frequent of the adults
Child examples as of month 43 (3;7)
2 verbs:
6,645 in total, 1,363 uttered by child as of month 23 (1;11).
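The ages in parentheses use the years;months notation common in child language research. A minimal Python sketch of the conversion from age in months, with a hypothetical helper name `chat_age`:

```python
def chat_age(months: int) -> str:
    """Convert an age in months to years;months notation."""
    years, remaining_months = divmod(months, 12)
    return f"{years};{remaining_months}"


print(chat_age(43))  # 3;7
print(chat_age(23))  # 1;11
```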
22
Concluding remarks
GrETEL is a very user-friendly search engine
Enables searching for constructions
Enables searching for disambiguated words
Utrecht extensions
Enable searching in your own research corpus
Enable detailed analysis of search results
23
Concluding remarks
User-friendliness
Also implies limitations!
Automatic parsing
Is not flawless
Requires additional checks before conclusions can be reliably drawn
Try it out!
Even if it is still under development
24
Thanks for your attention
25
More information
http://portal.clarin.nl, http://www.clariah.nl
Recorded lecture on GrETEL:
Educational Package:
Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.) CLARIN in the Low Countries, pp. 269–280. London: Ubiquity Press. DOI: License: CC-BY 4.0
Odijk, J., van der Klis, M. and Spoel, S. (2018). Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46-55, Prague.
Odijk, J. and van Hessen, A. (eds.) CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI: