LING/C SC 581: Advanced Computational Linguistics
Lecture 23
April 9th
Today's Topics continuing from last time…
Bikel Collins
Parser
  Java re-implementation of Collins’ parser (originally in C)
  easy to train (computationally inexpensive)
Paper
  Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511.
Software
  http://www.cis.upenn.edu/~dbikel/software.html#stat-parser (page no longer exists)
Bikel Collins
Download and install Dan Bikel’s parser: dbp.zip (on course homepage)
Bikel Collins
Training the parser with the WSJ PTB
  See guide: userguide/guide.pdf
  directory: TREEBANK_3/parsed/mrg/wsj
  chapters 02-21: create one single .mrg file
  events: wsj-02-21.obj.gz
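The "create one single .mrg file" step can be sketched as follows. This is a minimal sketch, not the procedure from the user guide: it assumes that each chapter (PTB section) is a two-digit subdirectory of the wsj directory and that its trees live in files ending in .mrg. The function name concatenate_sections and the output name used in the note below are hypothetical.

```python
from pathlib import Path


def concatenate_sections(wsj_root, output_path, first=2, last=21):
    """Concatenate every .mrg file in sections first..last into one file.

    Returns the number of files that were copied.
    """
    wsj_root = Path(wsj_root)
    copied = 0
    # Binary mode copies the treebank bytes unchanged, whatever the encoding.
    with open(output_path, "wb") as out:
        for section in range(first, last + 1):
            section_dir = wsj_root / f"{section:02d}"  # assumed layout: 02, 03, ...
            if not section_dir.is_dir():
                continue
            # Sort so the combined file follows the file-name order.
            for mrg_file in sorted(section_dir.glob("*.mrg")):
                data = mrg_file.read_bytes()
                out.write(data)
                # Keep trees from adjacent files on separate lines.
                if not data.endswith(b"\n"):
                    out.write(b"\n")
                copied += 1
    return copied
```

A call such as concatenate_sections("TREEBANK_3/parsed/mrg/wsj", "wsj-02-21.mrg") would then produce the single training file; the name wsj-02-21.mrg is only a guess modeled on the events file name above.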
Bikel Collins Settings:
Bikel Collins Parsing Command Input file format (sentences)
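Judging from the telescope.lisp example used in the test run in this lecture, the sentence input file holds one sentence per line, written as a parenthesized, space-separated word list. A minimal sketch of producing that format from tokenized sentences follows; the helper name to_parser_input is hypothetical, and tokens that themselves contain parentheses are not handled.

```python
def to_parser_input(sentences):
    """Render tokenized sentences, one parenthesized word list per line."""
    lines = []
    for tokens in sentences:
        lines.append("(" + " ".join(tokens) + ")")
    return "\n".join(lines) + "\n"


sentences = [
    "I saw a man with a telescope".split(),
    "I saw a man with a sword".split(),
]
print(to_parser_input(sentences), end="")
# (I saw a man with a telescope)
# (I saw a man with a sword)
```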
Java Runtime (JRE)
  java -version
  java version "1.8.0_191"
  Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
  Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
Notes:
  JDK: Java Development Kit (superset of..)
  JRE: Java Runtime Environment
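To check the installed version from a script rather than by eye, the quoted version string can be pulled out of the banner. This is a small sketch; the function name parse_java_version is hypothetical, and the sample banner is the one shown above.

```python
import re


def parse_java_version(banner):
    """Return the quoted version string from a `java -version` banner, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


banner = (
    'java version "1.8.0_191"\n'
    "Java(TM) SE Runtime Environment (build 1.8.0_191-b12)\n"
)
print(parse_java_version(banner))  # 1.8.0_191
```

Note that java -version writes its banner to standard error, so a script that captures it with subprocess.run(["java", "-version"], capture_output=True, text=True) should read the result's stderr field.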
Bikel Collins
Verify the trainer and parser work on your machine (must have Java installed)
Let's test it:
  cd dbp
  dbp$ ls
  LICENSE dbparser.jar scorer telescope.lisp README doc settings userguide bin policy-files src
  dbp$ more telescope.lisp
  (I saw a man with a telescope)
  (I saw a man with a sword)
  dbp$ bin/parse 500 settings/collins.properties ../wsj-02-21.obj.gz telescope.lisp
  Executing command
      java -server -Xms500m -Xmx500m -cp /Users/sandiway/courses/581/ling581-19/dbp/dbparser.jar -Dparser.settingsFile=settings/collins.properties danbikel.parser.Parser -is ../wsj-02-21.obj.gz -sa telescope.lisp
Bikel Collins
processing sentence No. 1: (I saw a man with a telescope)
danbikel.parser.Decoder: current sentence length: 7 words
danbikel.parser.Decoder: cummulative average length: 7.0 words
danbikel.parser.Decoder: trying with prune factor of 4.0
danbikel.parser.Decoder: highest probability item for sentence-length span (0,6): -35.89487016064518
(S (NP-A (NPB (PRP I))) (VP (VBD saw) (NP-A (NPB (DT a) (NN man)) (PP (IN with) (NP-A (NPB (DT a) (NN telescope)))))))
danbikel.parser.Decoder: top-ranked +TOP+ item:
(+TOP+ (S (NP-A (NPB (PRP I))) (VP (VBD saw) (NP-A (NPB (DT a) (NN man)) (PP (IN with) (NP-A (NPB (DT a) (NN telescope))))))))
Bikel Collins
processing sentence No. 2: (I saw a man with a sword)
danbikel.parser.Decoder: current sentence length: 7 words
danbikel.parser.Decoder: cummulative average length: 7.0 words
danbikel.parser.Decoder: trying with prune factor of 4.0
danbikel.parser.Decoder: highest probability item for sentence-length span (0,6): -35.625191959838
(S (NP-A (NPB (PRP I))) (VP (VBD saw) (NP-A (NPB (DT a) (NN man)) (PP (IN with) (NP-A (NPB (DT a) (NN sword)))))))
danbikel.parser.Decoder: top-ranked +TOP+ item:
(+TOP+ (S (NP-A (NPB (PRP I))) (VP (VBD saw) (NP-A (NPB (DT a) (NN man)) (PP (IN with) (NP-A (NPB (DT a) (NN sword))))))))
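To inspect where the parser attached the PP in the output above, the bracketed tree can be read into a nested list and searched. This is a rough sketch, not part of the parser distribution: read_tree and parents_of are hypothetical helper names, and the reader assumes that no word contains a parenthesis.

```python
def read_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # a subtree
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return [label] + children

    return node()


def parents_of(tree, target):
    """Return the label of every node that has a child labeled `target`."""
    found = []
    for child in tree[1:]:
        if isinstance(child, list):
            if child[0] == target:
                found.append(tree[0])
            found.extend(parents_of(child, target))
    return found


parse = (
    "(S (NP-A (NPB (PRP I))) (VP (VBD saw) (NP-A (NPB (DT a) (NN man)) "
    "(PP (IN with) (NP-A (NPB (DT a) (NN telescope)))))))"
)
print(parents_of(read_tree(parse), "PP"))  # ['NP-A']
```

For the telescope sentence the PP's parent is the object NP-A, not the VP. The sword parse shown above has the same shape, so the same check gives the same answer there.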
Bikel Collins File: bin/parse is a shell script that sets up program parameters and calls java
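The mapping from the bin/parse arguments to the java command echoed in the test run can be sketched as below. This is not the script itself: build_parse_command is a hypothetical name, the argument order follows the earlier call (heap size in MB, settings file, events file, sentence file), and the roles noted for -is and -sa are inferred from the files that follow them in the echoed command.

```python
def build_parse_command(heap_mb, settings, derived_data, sentences,
                        jar="dbparser.jar"):
    """Assemble the java invocation echoed by bin/parse as an argument list."""
    return [
        "java", "-server",
        f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m",  # initial and maximum JVM heap
        "-cp", jar,                             # classpath: the parser jar
        f"-Dparser.settingsFile={settings}",
        "danbikel.parser.Parser",
        "-is", derived_data,  # followed by the events file in the echoed command
        "-sa", sentences,     # followed by the sentence file in the echoed command
    ]


cmd = build_parse_command(
    500, "settings/collins.properties", "../wsj-02-21.obj.gz", "telescope.lisp"
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string means it can be passed directly to subprocess.run without shell quoting issues.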
Bikel Collins File: bin/train is another shell script
Bikel Collins Relevant WSJ PTB files