LING/C SC 581: Advanced Computational Linguistics Lecture Notes Jan 23rd
Today's Topics Homework 2 review
Homework 2 review
Task: write a Python program to print out the number of syllables in a word (in CMUdict).
Given:
>>> from nltk.corpus import cmudict
>>> cmudict.dict()['absolutely']
[['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']]
>>> cmudict.dict()['route']
[['R', 'UW1', 'T'], ['R', 'AW1', 'T']]
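One possible approach, not necessarily the intended solution: in CMUdict every vowel phoneme carries a stress digit (0, 1, or 2), so counting the phonemes that end in a digit gives the syllable count for each pronunciation. The names count_syllables, syllable_counts, and sample are ours, and sample is a stand-in for cmudict.dict() holding only the two entries shown above.

```python
def count_syllables(pronunciation):
    """Count the vowel phonemes, i.e. those ending in a stress digit."""
    return sum(1 for phoneme in pronunciation if phoneme[-1].isdigit())


def syllable_counts(word, pron_dict):
    """Return one syllable count per listed pronunciation of `word`."""
    return [count_syllables(p) for p in pron_dict[word.lower()]]


# Stand-in for cmudict.dict(), limited to the two entries given above.
sample = {
    'absolutely': [['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']],
    'route': [['R', 'UW1', 'T'], ['R', 'AW1', 'T']],
}

print(syllable_counts('absolutely', sample))  # [4]
print(syllable_counts('route', sample))       # [1, 1]
```

With nltk and the cmudict data installed, passing cmudict.dict() in place of sample should work the same way.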
Homework 3
Complete and test the installation of the Penn Treebank (version 3).
(No need to submit anything.)
Penn Treebank (PTB) with nltk
Handed out TREEBANK_3.zip last time.
Put your wsj directory (from mrg) here: ~/nltk_data/corpora/ptb
Filename case problem!
Penn Treebank (PTB) with nltk
Rename files to uppercase:
for f in `find wsj`; do mv -v "$f" "`echo $f | tr '[a-z]' '[A-Z]'`"; done
(found on stackoverflow.com)
Seems to work, but not clean.
The directory name needs to be uppercased too!
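As a possible alternative to the shell one-liner, here is a minimal Python sketch that renames bottom-up, so a directory is only renamed after everything inside it. This is our own illustration (the function name uppercase_tree is hypothetical), not the Stack Overflow command and not the course-provided script; try it on a copy of the data first.

```python
import os


def uppercase_tree(root):
    """Uppercase the names of all files and directories under `root`,
    then `root` itself. Returns the new path of `root`."""
    # topdown=False visits the deepest directories first, so the paths
    # still to be visited are never invalidated by an earlier rename.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            upper = name.upper()
            if upper != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, upper))

    # Finally rename the top-level directory itself (e.g. wsj -> WSJ).
    parent, base = os.path.split(os.path.abspath(root))
    new_root = os.path.join(parent, base.upper())
    if base.upper() != base:
        os.rename(os.path.join(parent, base), new_root)
    return new_root
```

For example, uppercase_tree(os.path.expanduser('~/nltk_data/corpora/ptb/wsj')) would be the call for the layout described above, assuming the wsj directory was placed there.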
Penn Treebank (PTB) with nltk
Note: you may run into problems with file permissions when renaming.
Change permissions (recursively):
chmod -R u+w atis
Penn Treebank (PTB) with nltk
Renaming script courtesy of Sandeep Suntwal:
Penn Treebank (PTB) with nltk
Checking the install:
class BracketParseCorpusReader
seems to be the Brown corpus + the Wall Street Journal corpus …
Penn Treebank (PTB) with nltk
WSJ only:
Defined in ~/nltk_data/corpora/ptb/allcats.txt:
Penn Treebank (PTB) with nltk
Validation: methods words(), tagged_words()
Validation: methods sents(), tagged_sents()
Validation: method parsed_sents()
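A minimal sketch of how the five validation methods could be exercised in one go. The helper name ptb_preview and its parameters are ours; the categories=['news'] argument follows the usage elsewhere in these notes. The helper returns None rather than raising if nltk or the ptb corpus directory cannot be found.

```python
def ptb_preview(n=5, category='news'):
    """Return a small sample from each reader method, or None if PTB cannot be loaded."""
    try:
        from nltk.corpus import ptb  # lazy loader: the corpus is located on first use
        return {
            'words': list(ptb.words(categories=[category])[:n]),
            'tagged_words': list(ptb.tagged_words(categories=[category])[:n]),
            'sents': list(ptb.sents(categories=[category])[:1]),
            'tagged_sents': list(ptb.tagged_sents(categories=[category])[:1]),
            'parsed_sents': list(ptb.parsed_sents(categories=[category])[:1]),
        }
    except (ImportError, LookupError):
        # nltk is not installed, or ~/nltk_data/corpora/ptb was not found.
        return None
```

Once the installation is complete, ptb_preview() should return a dictionary whose entries correspond to the Pierre Vinken sentence used in the rest of these notes; a None result suggests the corpus directory is missing or misplaced.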
Penn Treebank (PTB) with nltk
print function and method draw():
>>> ptb.parsed_sents(categories=['news'])[0].draw()
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
>>> s = ptb.parsed_sents(categories=['news'])[0]
>>> s.productions()
[S -> NP-SBJ VP ., NP-SBJ -> NP , ADJP ,, NP -> NNP NNP, NNP -> 'Pierre', NNP -> 'Vinken', , -> ',', ADJP -> NP JJ, NP -> CD NNS, CD -> '61', NNS -> 'years', JJ -> 'old', , -> ',', VP -> MD VP, MD -> 'will', VP -> VB NP PP-CLR NP-TMP, VB -> 'join', NP -> DT NN, DT -> 'the', NN -> 'board', PP-CLR -> IN NP, IN -> 'as', NP -> DT JJ NN, DT -> 'a', JJ -> 'nonexecutive', NN -> 'director', NP-TMP -> NNP CD, NNP -> 'Nov.', CD -> '29', . -> '.']
>>> type(s)
<class 'nltk.tree.Tree'>
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
s.productions() returns the list of productions shown above.
s.words() is not defined for a Tree; s.leaves() returns the words.
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
>>> len(s)
3
>>> s[0]
Tree('NP-SBJ', [Tree('NP', [Tree('NNP', ['Pierre']), Tree('NNP', ['Vinken'])]), Tree(',', [',']), Tree('ADJP', [Tree('NP', [Tree('CD', ['61']), Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree(',', [','])])
>>> s[1]
Tree('VP', [Tree('MD', ['will']), Tree('VP', [Tree('VB', ['join']), Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]), Tree('PP-CLR', [Tree('IN', ['as']), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])])]), Tree('NP-TMP', [Tree('NNP', ['Nov.']), Tree('CD', ['29'])])])])
>>> s[2]
Tree('.', ['.'])
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
>>> s.label()
'S'
>>> s.leaves()
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
>>> s.flatten()
Tree('S', ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.'])
>>> s.height()
7
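These accessors can also be tried without the corpus installed by building the same tree from its bracketed string. A minimal sketch, assuming only that nltk itself is installed; the variable name bracketed is ours, and the string is the example sentence as it appears in these notes.

```python
from nltk.tree import Tree

# The example sentence in bracketed (Penn Treebank) notation.
bracketed = """
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
"""
s = Tree.fromstring(bracketed)

print(s.label())        # S
print(len(s))           # 3 (children: NP-SBJ, VP, and the final period)
print(s.height())       # 7
print(s.leaves()[:2])   # ['Pierre', 'Vinken']
print(s[0].label())     # NP-SBJ
```

Indexing works like a list of children, so s[1][1][0] would be the (VB join) subtree.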
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
>>> for t in s.subtrees():
...     print(t)
...
(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) (ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP-TMP (NNP Nov.) (CD 29)))) (. .))
(NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) (ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,))
(NP (NNP Pierre) (NNP Vinken))
(NNP Pierre)
(NNP Vinken)
(, ,)
(ADJP (NP (CD 61) (NNS years)) (JJ old))
(NP (CD 61) (NNS years))
(CD 61)
(NNS years)
(JJ old)
(, ,)
(VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP-TMP (NNP Nov.) (CD 29))))
(MD will)
(VP (VB join) (NP (DT the) (NN board)) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP-TMP (NNP Nov.) (CD 29)))
(VB join)
(NP (DT the) (NN board))
(DT the)
(NN board)
(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
(IN as)
(NP (DT a) (JJ nonexecutive) (NN director))
(DT a)
(JJ nonexecutive)
(NN director)
(NP-TMP (NNP Nov.) (CD 29))
(NNP Nov.)
(CD 29)
(. .)
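subtrees() also accepts an optional filter function, which is handy for pulling out constituents of one type. A small sketch on the subject NP of the example sentence, assuming nltk is installed; the variable names are ours.

```python
from nltk.tree import Tree

# The subject of the example sentence, in bracketed notation.
np_sbj = Tree.fromstring(
    "(NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) "
    "(ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,))"
)

# subtrees() yields the tree itself first, then its descendants in preorder.
labels = [t.label() for t in np_sbj.subtrees()]
print(labels)

# With a filter, only subtrees for which the function returns True are yielded.
plain_nps = [' '.join(t.leaves())
             for t in np_sbj.subtrees(filter=lambda t: t.label() == 'NP')]
print(plain_nps)  # ['Pierre Vinken', '61 years']
```

Note that the filter compares the full label, so NP-SBJ itself is not matched by 'NP'.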
Penn Treebank (PTB) with nltk
Class nltk.tree.Tree methods
>>> s.pos()
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Source code here: http://www.nltk.org/_modules/nltk/tree.html
Other methods: chomsky_normal_form(), fromstring(), pretty_print()
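A brief sketch of the three methods just listed, using the inner VP of the example sentence and assuming nltk is installed. The variable names are ours. chomsky_normal_form() modifies the tree in place, so the sketch works on a deep copy to keep the original for comparison.

```python
from nltk.tree import Tree

# fromstring() builds a Tree from bracketed notation.
vp = Tree.fromstring(
    "(VP (VB join) (NP (DT the) (NN board)) "
    "(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) "
    "(NP-TMP (NNP Nov.) (CD 29)))"
)
widest_before = max(len(t) for t in vp.subtrees())
print(widest_before)   # 4: the VP node has four children

# chomsky_normal_form() binarizes in place, introducing intermediate nodes.
binarized = vp.copy(deep=True)
binarized.chomsky_normal_form()
widest_after = max(len(t) for t in binarized.subtrees())
print(widest_after)    # 2: no node has more than two children

# pretty_print() draws the tree as text on standard output.
vp.pretty_print()
```

The words themselves are unchanged by binarization; only the internal structure differs.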