LING 581: Advanced Computational Linguistics Lecture Notes February 9th
tregex
Pattern matching for passives: using variable names and regex group numbering to match coindexation in passives (NP-SBJ-i and the object of VP, [NP [-NONE- [*-i]]])
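The coindexation idea (capture the index on NP-SBJ-i, then require the same index on the trace) can be illustrated with an ordinary Tcl regular expression and a back reference. This is only a string-level sketch, not a tregex pattern: it does not check that the trace is the object of the VP. The proc name coindexed_index and the example tree are made up for illustration.

```tcl
# Hypothetical helper: return the shared index i if the bracketed tree
# string contains NP-SBJ-i followed later by (NP (-NONE- *-i)) with the
# same i; return the empty string otherwise.
proc coindexed_index {tree} {
    # (\d+) captures the subject index; \1 requires the trace to reuse it.
    set pat {\(NP-SBJ-(\d+) .*\(NP \(-NONE- \*-\1\)\)}
    if {[regexp $pat $tree -> idx]} {
        return $idx
    }
    return ""
}

# Made-up example in Penn Treebank bracket style (not taken from the WSJ files).
set passive {(S (NP-SBJ-1 (DT The) (NN bill)) (VP (VBD was) (VP (VBN passed) (NP (-NONE- *-1)))))}
puts [coindexed_index $passive]
```

A tree whose trace carries a different index than the subject does not match, which is the behaviour the group-numbering trick is meant to give.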
Homework Task Report
Bracketing guide
– TREEBANK_3/docs/prsguid1.pdf
Pattern matching for selected constructions in
– wsj tregex.mrg
Bikel Collins
From treebank search to stochastic parsers trained on the WSJ Penn Treebank
Java re-implementation of Collins’ parser
Paper
– Daniel M. Bikel, Intricacies of Collins’ Parsing Model, in Computational Linguistics, 30(4)
– intricacies.pdf
Software
– parser
Bikel Collins
Some TCL/TK code (which I wrote for research use) makes it easy to work with the parser without memorizing the command line options
Bikel Collins
The wrapper is syntactic sugar for various commands
Scripting language is TCL/TK (“tickle T K”)
Assume variables
– set prefix "/Users/sandiway/research/"
– set dbprefix "$prefix/dbparser"
– set tbvprefix "/Applications/treebankviewer.app/Contents/MacOS"
POS tagging (MXPOST, in directory jmx)
– $prefix/jmx/mxpost $prefix/jmx/tagger.project /tmp/err.txt
Parsing
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt stdout
Training
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg stdout
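As a rough illustration of how a wrapper might assemble these command lines, the sketch below builds them as Tcl lists that could later be handed to exec. The proc names parse_command and train_command are hypothetical, the file name example.obj.gz is a placeholder, and the argument order is copied from the commands as printed above (including the trailing stdout token); any shell redirections that the original commands may have used are not reconstructed.

```tcl
# Hypothetical helpers: build the parse and train command lines as Tcl lists.
# The arguments follow the order shown in the notes.
proc parse_command {dbprefix properties ddf} {
    return [list $dbprefix/bin/parse 400 \
        $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt stdout]
}

proc train_command {dbprefix properties mrg} {
    return [list $dbprefix/bin/train 800 \
        $dbprefix/settings/$properties $dbprefix/bin/$mrg stdout]
}

set prefix "/Users/sandiway/research/"
set dbprefix "$prefix/dbparser"
set properties "collins.properties"

# "example.obj.gz" is a placeholder, not a file name from the notes.
set cmd [parse_command $dbprefix $properties "example.obj.gz"]
puts [join $cmd " "]

# To actually run the command (requires the parser to be installed):
#   exec {*}$cmd
```

Keeping each command as a list rather than a single string means exec receives every argument intact, even if a path contains spaces.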
Bikel Collins
POS tagging (MXPOST, in directory jmx)
– tagger_input
– $prefix/jmx/mxpost $prefix/jmx/tagger.project /tmp/err.txt
Parsing
– set ddf "wsj obj.gz"
– set properties "collins.properties"
– parser_input
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt stdout
Training
– set mrg "wsj mrg"
– set properties "collins.properties"
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg stdout
Unix file descriptors
  0 Standard input (stdin)
  1 Standard output (stdout)
  2 Standard error (stderr)
GUI components
  frame .input
  text .input.t -height 4 -yscrollcommand {.input.s set}
  scrollbar .input.s -command {.input.t yview}
  frame .tagged
  text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
  scrollbar .tagged.s -command {.tagged.t yview}
Code
  # Save the contents of the input text widget as the tagger's input file.
  proc tagger_input {} {
      set lines [.input.t get 1.0 end]
      set infile [open "/tmp/test.txt" w]
      # Trim trailing whitespace and write without a final newline.
      puts -nonewline $infile [string trimright $lines]
      close $infile
  }
  # Save the contents of the tagged text widget as the parser's input file.
  proc parser_input {} {
      set lines [.tagged.t get 1.0 end]
      set infile [open "/tmp/test2.txt" w]
      puts -nonewline $infile [string trimright $lines]
      close $infile
  }
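The two procedures above differ only in the widget they read and the file they write. A minimal sketch of the shared step, runnable without Tk, is given below; save_text and read_text are hypothetical helper names, and the demo file path under /tmp is an assumption chosen to match the /tmp files used in the notes.

```tcl
# Hypothetical helper: write text to a file, dropping trailing whitespace
# and omitting the final newline (the same steps as tagger_input and
# parser_input, minus the widget access).
proc save_text {text path} {
    set out [open $path w]
    puts -nonewline $out [string trimright $text]
    close $out
}

# Hypothetical helper: read a whole file back as a string.
proc read_text {path} {
    set in [open $path r]
    set data [read $in]
    close $in
    return $data
}

# The string below stands in for widget contents that end in extra newlines.
set demo_path "/tmp/save_text_demo.txt"
save_text "This is a test .\n\n" $demo_path
puts "saved: [read_text $demo_path]"
file delete $demo_path
```

With such a helper, tagger_input could be written as a single call, save_text [.input.t get 1.0 end] "/tmp/test.txt", and parser_input likewise for .tagged.t and /tmp/test2.txt.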
Bikel Collins
There’s also a simple tree viewer I wrote, but it may not run on your system…
Bikel Collins
Relevant files and directories
bikeldemo
– wrapper2.tcl (prefix set to /Users/sandiway)
jmx
– mxpost (shell script)
– mxpost.jar (Java code)
dbparser
– dbparser/bin/parse (shell script)
– dbparser/bin/train (shell script)
– dbparser/dbparser.jar (Java code)
– dbparser/userguide/guide.pdf
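Before running the wrapper on another machine, it may help to check that the files listed above are present. The sketch below assumes that bikeldemo, jmx and dbparser all sit directly under one prefix directory, which the notes do not state explicitly; missing_files is a hypothetical helper name.

```tcl
# Hypothetical helper: return the listed files that are not found under
# the given prefix directory. The layout under prefix is an assumption.
proc missing_files {prefix} {
    set expected {
        bikeldemo/wrapper2.tcl
        jmx/mxpost
        jmx/mxpost.jar
        dbparser/bin/parse
        dbparser/bin/train
        dbparser/dbparser.jar
        dbparser/userguide/guide.pdf
    }
    set missing {}
    foreach rel $expected {
        set path [file join $prefix $rel]
        if {![file exists $path]} {
            lappend missing $path
        }
    }
    return $missing
}

# Report anything missing under the prefix used in the notes.
foreach path [missing_files "/Users/sandiway/research"] {
    puts "missing: $path"
}
```

If prefix is changed in wrapper2.tcl, the same value can be passed to the check.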