GrETEL 4
Jan Odijk, LREC, Miyazaki, 2018-05-10
Overview
  GrETEL 1,2,3
  GrETEL 4
    Developers: Martijn van der Klis, Sheean Spoel, Gerson Foks (DH Lab)
  Illustration
GrETEL 1,2,3
  GrETEL: KU Leuven
    Cooperation between CLARIN-NL and CLARIN Flanders
  GrETEL 2,3: extensions and improvements in other Flemish projects
  Application for searching in a treebank
    Treebank = text corpus in which each sentence has been assigned a syntactic structure
    Syntactic structure is usually a tree
  Core feature: example-based querying
GrETEL 1,2,3
  Treebanks:
    LASSY-Small (1 million tokens, written language)
    CGN (1 million tokens, spoken language)
    (V3) SoNaR Treebank (>500 million tokens)
  V1: http://nederbooms.ccl.kuleuven.be/eng/gretel/
  V2: http://gretel.ccl.kuleuven.be/gretel-2.0/
  V3: http://gretel.ccl.kuleuven.be/gretel3/index.php
GrETEL 4
  GrETEL 4: UU Utrecht
    In CLARIAH and the UU-internal AnnCor project
  New functionality that KU Leuven could not add:
    Upload a user’s own corpus, incl. metadata
    Search in the user’s own automatically parsed corpus
    Analysis of search results combined with metadata
    Better support for XPath queries
    Improved interface functionality
  V4 (alpha!): http://gretel.hum.uu.nl/gretel4/
Illustration
  Upload Corpus
    Plain text or CHILDES CHAT
    TEI and FoLiA to follow
  CHAT
    Utterances are cleaned and metadata uploaded:
      Before: knor knor [!= pigsound], ik heb honger
      After: knor knor, ik heb honger
      (Dutch for "oink oink, I am hungry")
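The cleaning step can be illustrated with a minimal sketch. It is not GrETEL's actual implementation: the function name `clean_chat_utterance` is hypothetical, and the sketch assumes that only square-bracket annotations such as `[!= pigsound]` need to be removed, whereas real CHAT transcripts contain many more codes.

```python
import re

# Assumption: only square-bracket annotations are stripped here; real CHAT
# transcripts use many more codes that a full cleaner would have to handle.
BRACKET_ANNOTATION = re.compile(r"\s*\[[^\]]*\]")


def clean_chat_utterance(utterance: str) -> str:
    """Remove bracketed CHAT annotations and tidy up leftover whitespace."""
    without_annotations = BRACKET_ANNOTATION.sub("", utterance)
    # Collapse any runs of whitespace left behind by the removal.
    return re.sub(r"\s{2,}", " ", without_annotations).strip()


print(clean_chat_utterance("knor knor [!= pigsound], ik heb honger"))
# prints: knor knor, ik heb honger
```

The whitespace before the bracket is removed together with the annotation, so the comma ends up directly after the preceding word, as in the cleaned utterance on the slide.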
Corpus Upload
Corpus Overview
Corpus Details
Query Example
  Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus
  Example sentence: Hij zal dat willen doen ("He will want to do that")
Example Sentence
Parse Tree
Select Parts
Query Tree
Select Treebank
Query
  //node[@cat
    and node[@pt="ww" and @rel="hd"]
    and node[@cat="inf" and @rel="vc"
      and node[@rel="hd" and @pt="ww"]
      and node[@rel="vc" and @cat="inf"
        and node[@pt="ww" and @rel="hd"]]]]
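What the query matches can be spelled out in plain Python. This is a sketch under stated assumptions: the `SAMPLE` tree is a hypothetical, heavily simplified Alpino-style parse of the example sentence, and the helper names are invented for illustration. GrETEL itself evaluates the XPath query directly; the standard-library `xml.etree.ElementTree` does not support `and` inside predicates, so the conditions are checked by hand here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified tree for "Hij zal dat willen doen".
SAMPLE = """
<node cat="smain" rel="--">
  <node rel="su" pt="vnw" word="Hij"/>
  <node rel="hd" pt="ww" word="zal"/>
  <node rel="vc" cat="inf">
    <node rel="hd" pt="ww" word="willen"/>
    <node rel="vc" cat="inf">
      <node rel="obj1" pt="vnw" word="dat"/>
      <node rel="hd" pt="ww" word="doen"/>
    </node>
  </node>
</node>
"""


def children(node, **attrs):
    """Direct child <node> elements whose attributes equal the given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]


def has_verbal_head(node):
    """True if the node has a child with rel="hd" and pt="ww" (a verb head)."""
    return bool(children(node, rel="hd", pt="ww"))


def matches(node):
    """True if the node satisfies the three-verb pattern of the query."""
    if node.get("cat") is None or not has_verbal_head(node):
        return False
    for vc1 in children(node, rel="vc", cat="inf"):
        if not has_verbal_head(vc1):
            continue
        # The first infinitival complement must embed a second one.
        if any(has_verbal_head(vc2) for vc2 in children(vc1, rel="vc", cat="inf")):
            return True
    return False


def find_matches(root):
    """All nodes in the tree (including the root) that match the pattern."""
    return [n for n in root.iter("node") if matches(n)]


hits = find_matches(ET.fromstring(SAMPLE))
print([h.get("cat") for h in hits])
```

Only the top node matches: the embedded infinitival phrases have a verbal head but do not contain two further levels of verbal complements.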
Example: Query Output
Utterance Details
Result Statistics
Analysis
Some Results
  3 verbs:
    335 hits found
    313 by adults, 12 by child
    4 by child do not occur among adults
    8 others are not among the most frequent of adults
    Child examples as of month 43 (3;7)
  2 verbs:
    6,645 in total, 1,363 uttered by child as of month 23 (1;11)
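The age notation and the proportions behind these counts can be reproduced with a small sketch. The function names `chat_age` and `share` are hypothetical and not part of GrETEL; the figures are the ones reported above.

```python
def chat_age(months: int) -> str:
    """Convert an age in months to the years;months notation used above."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


def share(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


print(chat_age(43))       # first 3-verb examples by the child
print(chat_age(23))       # first 2-verb examples by the child
print(share(1363, 6645))  # child's share of the 2-verb hits, in percent
print(share(12, 335))     # child's share of the 3-verb hits, in percent
```

Months 43 and 23 come out as 3;7 and 1;11, matching the notation on the slide.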
Concluding remarks
  GrETEL is a very user-friendly search engine
    Enables searching for constructions
    Enables searching for disambiguated words
  Utrecht extensions
    Enable searching in your own research corpus
    Enable detailed analysis of search results
Concluding remarks
  User-friendliness and automatic parsing also imply limitations!
  Automatic parsing
    Is not flawless
    Requires additional checks before conclusions can be reliably drawn
  Try it out! http://gretel.hum.uu.nl/gretel4/index.php
    Even if it is still under development
Thanks for your attention
More information
  http://portal.clarin.nl, http://www.clariah.nl
  Recorded lecture on GrETEL: http://lecturenet.uu.nl/Site1/Catalog/Full/c9f887bc45154af5bd7cdb218216816621
  Educational Package: http://dev.clarin.nl/sites/default/files/EducationalModule-v4b.pdf
  Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. 2017. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.) CLARIN in the Low Countries, pp. 269–280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22. License: CC-BY 4.0
  Odijk, J., van der Klis, M. and Spoel, S. 2018. Extensions to the GrETEL treebank query application. In: Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46–55. Prague. http://aclweb.org/anthology/W/W17/W17-7608.pdf
  Odijk, J. and van Hessen, A. (eds.) 2017. CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI: http://dx.doi.org/10.5334/bbi