LING/C SC 581: Advanced Computational Linguistics


1 LING/C SC 581: Advanced Computational Linguistics
Lecture Notes Jan 23rd

2 Today's Topics
Homework 2 review

3 Homework 2 review
Write a Python program to print out the number of syllables in a word (in CMUdict).
Given:
>>> from nltk.corpus import cmudict
>>> cmudict.dict()['absolutely']
[['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']]
>>> cmudict.dict()['route']
[['R', 'UW1', 'T'], ['R', 'AW1', 'T']]
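One possible approach, offered as a minimal sketch and not necessarily the solution presented in class: in CMUdict, vowel phones carry a trailing stress digit (0, 1 or 2), so counting the phones that end in a digit gives the syllable count. The helper name count_syllables and the small entries dictionary are illustrative assumptions; the pronunciations are copied from the slide so the sketch runs without the corpus installed.

```python
def count_syllables(pronunciation):
    """Count syllables in one CMUdict pronunciation (a list of phone strings)."""
    # Vowel phones end in a stress digit: 0 (none), 1 (primary), 2 (secondary).
    return sum(1 for phone in pronunciation if phone[-1].isdigit())


# Pronunciations copied from the slide (stand-in for cmudict.dict()).
entries = {
    'absolutely': [['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']],
    'route': [['R', 'UW1', 'T'], ['R', 'AW1', 'T']],
}

for word, pronunciations in entries.items():
    # A word may have several pronunciations, so report one count per entry.
    print(word, [count_syllables(p) for p in pronunciations])
```

With the corpus installed, entries can be replaced by cmudict.dict() as on the slide.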

4 Homework 2 review

5 Homework 3
Complete and test the installation of the Penn Treebank (version 3).
(No need to submit anything.)

6 Penn Treebank (PTB) with nltk
Handed out TREEBANK_3.zip last time
Put your wsj directory (from mrg) here: ~/nltk_data/corpora/ptb
Filename case problem!

7 Penn Treebank (PTB) with nltk
Rename files to uppercase:
for f in `find wsj`; do mv -v "$f" "`echo $f | tr '[a-z]' '[A-Z]'`"; done
(found on stackoverflow.com)
Seems to work, but not cleanly; the directory name needs to be uppercased too!
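A cleaner alternative can be sketched in Python. This is an illustrative sketch, not the script used in class: the function name uppercase_tree is hypothetical, and the idea is simply to walk the tree bottom-up so that each directory is renamed only after everything inside it.

```python
import os


def uppercase_tree(root):
    """Rename root and every file and directory below it to uppercase.

    Returns the new path of root.
    """
    root = os.path.abspath(root)
    # topdown=False visits children before their parent directory,
    # so the paths we build are still valid when we rename.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            upper = name.upper()
            if upper != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, upper))
    # Finally rename the top-level directory itself (e.g. wsj to WSJ).
    parent, base = os.path.split(root)
    new_root = os.path.join(parent, base.upper())
    if new_root != root:
        os.rename(root, new_root)
    return new_root
```

It would be called as uppercase_tree('wsj') from inside ~/nltk_data/corpora/ptb. Try it on a copy first, since the renames are not undone on failure.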

8 Penn Treebank (PTB) with nltk
Note: you may run into problems with file permissions when renaming.
Change permissions (recursively):
chmod -R u+w atis

9 Penn Treebank (PTB) with nltk
Renaming script courtesy of Sandeep Suntwal:

10 Penn Treebank (PTB) with nltk
Checking the install: class BracketParseCorpusReader
The ptb corpus seems to be the Brown corpus + the Wall Street Journal corpus.

11 Penn Treebank (PTB) with nltk
WSJ only: Defined in ~/nltk_data/corpora/ptb/allcats.txt:

12 Penn Treebank (PTB) with nltk
Validation: methods words(), tagged_words()

13 Penn Treebank (PTB) with nltk
Validation: methods sents(), tagged_sents()

14 Penn Treebank (PTB) with nltk
Validation: method parsed_sents()

15 Penn Treebank (PTB) with nltk
print function and method draw():
>>> ptb.parsed_sents(categories=['news'])[0].draw()

16 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> s = ptb.parsed_sents(categories=['news'])[0]
>>> s.productions()
[S -> NP-SBJ VP ., NP-SBJ -> NP , ADJP ,, NP -> NNP NNP, NNP -> 'Pierre', NNP -> 'Vinken', , -> ',', ADJP -> NP JJ, NP -> CD NNS, CD -> '61', NNS -> 'years', JJ -> 'old', , -> ',', VP -> MD VP, MD -> 'will', VP -> VB NP PP-CLR NP-TMP, VB -> 'join', NP -> DT NN, DT -> 'the', NN -> 'board', PP-CLR -> IN NP, IN -> 'as', NP -> DT JJ NN, DT -> 'a', JJ -> 'nonexecutive', NN -> 'director', NP-TMP -> NNP CD, NNP -> 'Nov.', CD -> '29', . -> '.']
>>> type(s)
<class 'nltk.tree.Tree'>

17 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> s.productions()
(same output as on slide 16)
s.words() is not defined

18 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> len(s)
3
>>> s[0]
Tree('NP-SBJ', [Tree('NP', [Tree('NNP', ['Pierre']), Tree('NNP', ['Vinken'])]), Tree(',', [',']), Tree('ADJP', [Tree('NP', [Tree('CD', ['61']), Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree(',', [','])])
>>> s[1]
Tree('VP', [Tree('MD', ['will']), Tree('VP', [Tree('VB', ['join']), Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]), Tree('PP-CLR', [Tree('IN', ['as']), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])])]), Tree('NP-TMP', [Tree('NNP', ['Nov.']), Tree('CD', ['29'])])])])
>>> s[2]
Tree('.', ['.'])

19 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> s.label()
'S'
>>> s.leaves()
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
>>> s.flatten()
Tree('S', ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.'])
>>> s.height()
7
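To see why the height is 7 and where the leaves come from, the following plain-Python sketch may help. It is not NLTK's implementation: parse, leaves and height are hypothetical helpers that read a bracketed tree into nested (label, children) tuples and recompute the same quantities, counting a bare word as height 1.

```python
import re


def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:                     # a word (leaf)
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Words of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def height(node):
    """A word has height 1; a node is one more than its tallest child."""
    if isinstance(node, str):
        return 1
    return 1 + max((height(child) for child in node[1]), default=0)


sentence = parse(
    "(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) "
    "(ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,)) "
    "(VP (MD will) (VP (VB join) (NP (DT the) (NN board)) "
    "(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) "
    "(NP-TMP (NNP Nov.) (CD 29)))) (. .))"
)
print(len(leaves(sentence)))  # 18 words
print(height(sentence))       # 7: S, VP, VP, PP-CLR, NP, DT, 'a'
```

The longest path runs through the nested VPs down to the words of "a nonexecutive director", which accounts for the seven levels.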

20 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> for t in s.subtrees():
...     print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
(NP-SBJ
  (NP (NNP Pierre) (NNP Vinken))
  (, ,)
  (ADJP (NP (CD 61) (NNS years)) (JJ old))
  (, ,))
(NP (NNP Pierre) (NNP Vinken))
(NNP Pierre)
(NNP Vinken)
(, ,)
(ADJP (NP (CD 61) (NNS years)) (JJ old))
(NP (CD 61) (NNS years))
(CD 61)
(NNS years)
(JJ old)
(, ,)
(VP
  (MD will)
  (VP
    (VB join)
    (NP (DT the) (NN board))
    (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
    (NP-TMP (NNP Nov.) (CD 29))))
(MD will)
(VP
  (VB join)
  (NP (DT the) (NN board))
  (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
  (NP-TMP (NNP Nov.) (CD 29)))
(VB join)
(NP (DT the) (NN board))
(DT the)
(NN board)
(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
(IN as)
(NP (DT a) (JJ nonexecutive) (NN director))
(DT a)
(JJ nonexecutive)
(NN director)
(NP-TMP (NNP Nov.) (CD 29))
(NNP Nov.)
(CD 29)
(. .)

21 Penn Treebank (PTB) with nltk
Class nltk.tree methods
>>> s.pos()
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Source Code here:
chomsky_normal_form() fromstring() pretty_print()
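The three methods listed last can be tried without the PTB install by building a tree from a bracketed string. The sketch below assumes only that the nltk package is importable; the variable names np_sbj and binarized are illustrative, and the string is the subject NP of the example sentence.

```python
from nltk.tree import Tree

# Build a Tree directly from a bracketed string (no corpus needed).
np_sbj = Tree.fromstring(
    "(NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) "
    "(ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,))"
)

print(np_sbj.label())    # label of the root node
print(np_sbj.leaves())   # words, left to right
print(np_sbj.pos())      # (word, tag) pairs
np_sbj.pretty_print()    # text drawing of the tree, no GUI window

# chomsky_normal_form() rewrites the tree in place, so use a deep copy.
binarized = np_sbj.copy(deep=True)
binarized.chomsky_normal_form()
# After binarization no node should have more than two children.
print(max(len(t) for t in binarized.subtrees()))
```

Unlike draw(), pretty_print() writes to the terminal, which is handy when no display is available.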

22 Penn Treebank (PTB) with nltk

