Download presentation
Presentation is loading. Please wait.
Published byBeau Goodheart Modified over 10 years ago
1
Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28
2
NEDERBOOMS Exploitation of Dutch treebanks for research in linguistics CLARIN-VL Project Centre for Computational Linguistics (CCL) Dutch Grammar and Language Use (NGTG) Goals: –User-friendly tools –Access to large data files
3
NEDERBOOMS How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?
4
QUERYING LASSY Existing search tools: dtsearch (Kloosterman 2007) Dact (de Kok 2010) stand-alone tools query language: XPath = standard query language for xml trees
5
QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” //node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]
6
QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” //node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"] “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. “Hij zegt dat ze spoedig zullen kennis maken” //node[node[@rel="hd" and @pos="verb" and @begin <../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin <../node[@rel="hd" and @pos="verb"]/@begin]]]
7
QUERYING LASSY XPath –Not user-friendly –Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort
8
QUERYING LASSY XPath –Not user-friendly –Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort How to make interaction between computational linguistics and theoretical linguistics possible?
9
GrETEL Greedy Extraction of Trees for Empirical Linguistics Search tool based on example sentences Input = natural language No explicit knowledge of formal query language nor Alpino grammar required Bridge gap between descriptive and computational linguistics Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)http://nederbooms.ccl.kuleuven.be
10
GrETEL - online
12
GrETEL - input Green versus red word order in Dutch –green: past participle – auxiliary De NAVO stelt dat ze er alles aan gedaan heeft –red: auxiliary – past participle De NAVO stelt dat ze er alles aan heeft gedaan “The NATO claim that they have done everything in their power” (deredactie.be)
13
GrETEL - input
14
>> parsed with Alpino
15
GrETEL - annotation
16
>> info added to Alpino parse
17
GrETEL – query info
18
input example
19
GrETEL – query info input example Alpino parse
20
GrETEL – query info input example query treeAlpino parse
21
GrETEL – query info input example query treeAlpino parse XPath query
22
GrETEL – query info input example query treeAlpino parse XPath query treebanks
23
GrETEL – query tree
24
>> subtree extraction
25
GrETEL – query tree query tree
26
GrETEL – query tree query tree>> XPath generator >>
27
GrETEL – query tree query tree //node[@cat="ssub" and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]] XPath expression>> XPath generator >>
28
GrETEL – treebanks
29
LASSY Small in PostgreSQL database >> no local installation required
30
GrETEL – results
31
>> (adapted) query
32
GrETEL – results >> (adapted) query >> quantitative information
33
GrETEL – results
34
>> tree viewer
35
GrETEL – results >> tree viewer >> list of results
36
Thanks for your attention! Questions? liesbeth@ccl.kuleuven.be http://nederbooms.ccl.kuleuven.be
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.