Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia,
NEDERBOOMS Exploitation of Dutch treebanks for research in linguistics CLARIN-VL Project Centre for Computational Linguistics (CCL) Dutch Grammar and Language Use (NGTG) Goals: –User-friendly tools –Access to large data files
NEDERBOOMS How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?
QUERYING LASSY Existing search tools: dtsearch (Kloosterman 2007) Dact (de Kok 2010) stand-alone tools query language: XPath = standard query language for xml trees
QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” and and
QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” and and “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. “Hij zegt dat ze spoedig zullen kennis maken” and and and and
QUERYING LASSY XPath –Not user-friendly –Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort
QUERYING LASSY XPath –Not user-friendly –Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort How to make interaction between computational linguistics and theoretical linguistics possible?
GrETEL Greedy Extraction of Trees for Empirical Linguistics Search tool based on example sentences Input = natural language No explicit knowledge of formal query language nor Alpino grammar required Bridge gap between descriptive and computational linguistics Available online (optimised for Mozilla Firefox)
GrETEL - online
GrETEL - input Green versus red word order in Dutch –green: past participle – auxiliary De NAVO stelt dat ze er alles aan gedaan heeft –red: auxiliary – past participle De NAVO stelt dat ze er alles aan heeft gedaan “The NATO claim that they have done everything in their power” (deredactie.be)
GrETEL - input
>> parsed with Alpino
GrETEL - annotation
>> info added to Alpino parse
GrETEL – query info
input example
GrETEL – query info input example Alpino parse
GrETEL – query info input example query treeAlpino parse
GrETEL – query info input example query treeAlpino parse XPath query
GrETEL – query info input example query treeAlpino parse XPath query treebanks
GrETEL – query tree
>> subtree extraction
GrETEL – query tree query tree
GrETEL – query tree query tree>> XPath generator >>
GrETEL – query tree query tree and and and XPath expression>> XPath generator >>
GrETEL – treebanks
LASSY Small in PostgreSQL database >> no local installation required
GrETEL – results
>> (adapted) query
GrETEL – results >> (adapted) query >> quantitative information
GrETEL – results
>> tree viewer
GrETEL – results >> tree viewer >> list of results
Thanks for your attention! Questions?