LING/C SC 581: Advanced Computational Linguistics


LING/C SC 581: Advanced Computational Linguistics Lecture Notes Jan 23rd

Today's Topics
Homework 2 review

Homework 2 review
Task: write a Python program to print out the number of syllables in a word (in CMUdict).
Given:
    >>> from nltk.corpus import cmudict
    >>> cmudict.dict()['absolutely']
    [['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']]
    >>> cmudict.dict()['route']
    [['R', 'UW1', 'T'], ['R', 'AW1', 'T']]
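One possible solution can be sketched as follows. It assumes that each syllable corresponds to exactly one vowel phoneme and relies on the fact that CMUdict marks vowels with a trailing stress digit (0, 1, or 2). The helper name count_syllables is illustrative and not from the slides, and the pronunciations are copied from the slide so that the sketch runs without the corpus installed.

```python
def count_syllables(pronunciation):
    """Count the phonemes that end in a stress digit (the vowels in CMUdict)."""
    return sum(1 for phoneme in pronunciation if phoneme[-1].isdigit())


# Pronunciations copied from the slide. With NLTK and the cmudict corpus
# installed, cmudict.dict()[word] returns lists of this shape.
entries = {
    'absolutely': [['AE2', 'B', 'S', 'AH0', 'L', 'UW1', 'T', 'L', 'IY0']],
    'route': [['R', 'UW1', 'T'], ['R', 'AW1', 'T']],
}

for word, pronunciations in entries.items():
    # A word may have several pronunciations, so report one count per entry.
    print(word, [count_syllables(p) for p in pronunciations])
```

Reporting one count per pronunciation, rather than a single number, keeps the result honest for words whose variants differ in syllable count.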


Homework 3
Complete and test the installation of the Penn Treebank (version 3).
(No need to submit anything.)

Penn Treebank (PTB) with nltk
Handed out TREEBANK_3.zip last time.
Put your wsj (from mrg) here: ~/nltk_data/corpora/ptb
Filename case problem!

Penn Treebank (PTB) with nltk
Rename files to uppercase:
    for f in `find wsj`; do mv -v "$f" "`echo $f | tr '[a-z]' '[A-Z]'`"; done
(found on stackoverflow.com)
Seems to work, but it is not clean.
The directory name needs to be uppercased too!
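As an alternative to the shell one-liner, the renaming could be sketched in Python. This is a minimal sketch, not the script referred to elsewhere in these slides: the function name uppercase_tree is hypothetical, and it assumes that every file and directory name under the given root (including extensions) should be uppercased. Entries are renamed deepest first so that no path is invalidated by renaming one of its parent directories.

```python
import os


def uppercase_tree(root):
    """Uppercase the names of root and everything below it, deepest first.

    Returns the new path of the (renamed) root directory.
    """
    root = os.path.abspath(root)
    # topdown=False yields subdirectories before their parents, so a
    # directory is renamed only after its contents have been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            upper = name.upper()
            if name != upper:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, upper))
    # Finally rename the top-level directory itself (e.g. wsj -> WSJ).
    parent, base = os.path.split(root)
    new_root = os.path.join(parent, base.upper())
    if base != base.upper():
        os.rename(root, new_root)
    return new_root


# Example call (the path is an assumption based on the slides):
# uppercase_tree(os.path.expanduser('~/nltk_data/corpora/ptb/wsj'))
```

Trying this on a copy of the data first is advisable, since the renames are performed in place.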

Penn Treebank (PTB) with nltk
Note: you may run into problems with file permissions when renaming.
Change permissions (recursively):
    chmod -R u+w atis

Penn Treebank (PTB) with nltk Renaming script courtesy of Sandeep Suntwal:

Penn Treebank (PTB) with nltk
Checking the install: class BracketParseCorpusReader
Seems to be the Brown corpus + the Wall Street Journal corpus …

Penn Treebank (PTB) with nltk
WSJ only
Defined in ~/nltk_data/corpora/ptb/allcats.txt:

Penn Treebank (PTB) with nltk
Validation: methods words(), tagged_words()

Penn Treebank (PTB) with nltk
Validation: methods sents(), tagged_sents()

Penn Treebank (PTB) with nltk
Validation: method parsed_sents()

Penn Treebank (PTB) with nltk
print function and method draw():
    ptb.parsed_sents(categories=['news'])[0].draw()

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> s = ptb.parsed_sents(categories=['news'])[0]
    >>> s.productions()
    [S -> NP-SBJ VP ., NP-SBJ -> NP , ADJP ,, NP -> NNP NNP, NNP -> 'Pierre', NNP -> 'Vinken', , -> ',', ADJP -> NP JJ, NP -> CD NNS, CD -> '61', NNS -> 'years', JJ -> 'old', , -> ',', VP -> MD VP, MD -> 'will', VP -> VB NP PP-CLR NP-TMP, VB -> 'join', NP -> DT NN, DT -> 'the', NN -> 'board', PP-CLR -> IN NP, IN -> 'as', NP -> DT JJ NN, DT -> 'a', JJ -> 'nonexecutive', NN -> 'director', NP-TMP -> NNP CD, NNP -> 'Nov.', CD -> '29', . -> '.']
    >>> type(s)
    <class 'nltk.tree.Tree'>

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> s.productions()
    [S -> NP-SBJ VP ., NP-SBJ -> NP , ADJP ,, NP -> NNP NNP, NNP -> 'Pierre', NNP -> 'Vinken', , -> ',', ADJP -> NP JJ, NP -> CD NNS, CD -> '61', NNS -> 'years', JJ -> 'old', , -> ',', VP -> MD VP, MD -> 'will', VP -> VB NP PP-CLR NP-TMP, VB -> 'join', NP -> DT NN, DT -> 'the', NN -> 'board', PP-CLR -> IN NP, IN -> 'as', NP -> DT JJ NN, DT -> 'a', JJ -> 'nonexecutive', NN -> 'director', NP-TMP -> NNP CD, NNP -> 'Nov.', CD -> '29', . -> '.']
Note: s.words() is not defined for a Tree.
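The production list lends itself to simple counting. The following is a minimal sketch that assumes NLTK is installed; to avoid depending on the installed corpus, it rebuilds the example tree from its bracketed string with Tree.fromstring() instead of reading it from ptb. The variable names are illustrative.

```python
from collections import Counter

from nltk.tree import Tree

# The first WSJ sentence, written out in Penn Treebank bracket notation.
s = Tree.fromstring("""
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
""")

productions = s.productions()

# How often does each nonterminal appear on the left-hand side?
lhs_counts = Counter(p.lhs().symbol() for p in productions)

# Lexical productions rewrite a POS tag as a word, one per leaf.
lexical = [p for p in productions if p.is_lexical()]

print(lhs_counts['NP'])                 # number of NP expansions
print(len(lexical), len(s.leaves()))    # these two counts should agree
```

Since s.words() is not available on a tree, s.leaves() is used here to get at the words.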

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> len(s)
    3
    >>> s[0]
    Tree('NP-SBJ', [Tree('NP', [Tree('NNP', ['Pierre']), Tree('NNP', ['Vinken'])]), Tree(',', [',']), Tree('ADJP', [Tree('NP', [Tree('CD', ['61']), Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree(',', [','])])
    >>> s[1]
    Tree('VP', [Tree('MD', ['will']), Tree('VP', [Tree('VB', ['join']), Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]), Tree('PP-CLR', [Tree('IN', ['as']), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])])]), Tree('NP-TMP', [Tree('NNP', ['Nov.']), Tree('CD', ['29'])])])])
    >>> s[2]
    Tree('.', ['.'])

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> s.label()
    'S'
    >>> s.leaves()
    ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
    >>> s.flatten()
    Tree('S', ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.'])
    >>> s.height()
    7

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> for t in s.subtrees():
    ...     print(t)
    (S
      (NP-SBJ
        (NP (NNP Pierre) (NNP Vinken))
        (, ,)
        (ADJP (NP (CD 61) (NNS years)) (JJ old))
        (, ,))
      (VP
        (MD will)
        (VP
          (VB join)
          (NP (DT the) (NN board))
          (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
          (NP-TMP (NNP Nov.) (CD 29))))
      (. .))
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken))
      (, ,)
      (ADJP (NP (CD 61) (NNS years)) (JJ old))
      (, ,))
    (NP (NNP Pierre) (NNP Vinken))
    (NNP Pierre)
    (NNP Vinken)
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (NP (CD 61) (NNS years))
    (CD 61)
    (NNS years)
    (JJ old)
    (, ,)
    (VP
      (MD will)
      (VP
        (VB join)
        (NP (DT the) (NN board))
        (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
        (NP-TMP (NNP Nov.) (CD 29))))
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29)))
    (VB join)
    (NP (DT the) (NN board))
    (DT the)
    (NN board)
    (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
    (IN as)
    (NP (DT a) (JJ nonexecutive) (NN director))
    (DT a)
    (JJ nonexecutive)
    (NN director)
    (NP-TMP (NNP Nov.) (CD 29))
    (NNP Nov.)
    (CD 29)
    (. .)
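subtrees() also accepts a filter function, which is handy when only one kind of constituent is of interest. A minimal sketch, assuming NLTK is installed; the tree is rebuilt from its bracketed string so that the example does not depend on the installed corpus, and the choice of the plain NP label is just for illustration.

```python
from nltk.tree import Tree

# The first WSJ sentence, written out in Penn Treebank bracket notation.
s = Tree.fromstring("""
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
""")

# Keep only subtrees labelled exactly 'NP' (so NP-SBJ and NP-TMP are skipped)
# and collect the words each one spans.
noun_phrases = [' '.join(t.leaves())
                for t in s.subtrees(filter=lambda t: t.label() == 'NP')]

for phrase in noun_phrases:
    print(phrase)
```

Matching with t.label().startswith('NP') instead would also pick up the function-tagged nodes NP-SBJ and NP-TMP.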

Penn Treebank (PTB) with nltk
Class nltk.tree methods
    >>> s.pos()
    [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]
Source code here: http://www.nltk.org/_modules/nltk/tree.html
Other methods: chomsky_normal_form(), fromstring(), pretty_print()
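The three methods listed last can be tried out together. This is a minimal sketch, assuming NLTK is installed: fromstring() rebuilds the example tree from bracket notation, chomsky_normal_form() binarizes a copy in place (it returns nothing, so the deep copy keeps the original intact), and pretty_print() draws the result as ASCII art. The variable names are illustrative.

```python
from nltk.tree import Tree

# The first WSJ sentence, written out in Penn Treebank bracket notation.
s = Tree.fromstring("""
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
""")

# chomsky_normal_form() modifies the tree in place, so work on a deep copy.
binarized = s.copy(deep=True)
binarized.chomsky_normal_form()

# After binarization no node should have more than two children.
widest = max(len(t) for t in binarized.subtrees())
print(widest)

# Draw the binarized tree as text.
binarized.pretty_print()
```

The words are unaffected by the transformation, so binarized.leaves() matches s.leaves(); only the internal structure gains extra nodes.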
