LING/C SC 581: Advanced Computational Linguistics Lecture 5 Jan 24th
2019 HLT Lecture Series
Speaker | Title | Date
Tatjana Scheffler | Analyzing Discourse Structure on Social Media | Friday Feb 15th, 3pm, Comm 311
Marcos Zampieri | Language Variation and Automatic Language Identification. The Case of Dialects and Similar Languages. | Wednesday Feb 20th, noon, room TBA
Adriana Picoral | Investigating Multilingualism through Computational Linguistics. | Wednesday Feb 27th, noon, room TBA
Gus Hahn-Powell | TBA | Wednesday Mar 13th, noon, room TBA
Miikka Silfverberg | Deep Learning for inflectional morphology and phonology | Wednesday Mar 20th, noon, room TBA
Administrivia
Homework 3 graded
Google Cloud Natural Language is pretty good: 85% of you said so…
Remark: -NONE- is a POS tag. It indicates an empty category, and its contents should not be part of the input sentence to Google!
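The remark above can be illustrated with a minimal sketch. It assumes the sentence is available as (word, tag) pairs and that the empty-category tag is spelled -NONE-, as in the Penn Treebank. The function name and the example sentence are hypothetical, made up for illustration.

```python
def strip_empty_categories(tagged_sentence):
    """Keep only the words whose POS tag is not the empty-category tag."""
    return [word for word, tag in tagged_sentence if tag != '-NONE-']

# Hypothetical tagged sentence containing one trace element.
tagged = [('The', 'DT'), ('bill', 'NN'), ('was', 'VBD'),
          ('passed', 'VBN'), ('*-1', '-NONE-'), ('.', '.')]

# Join the surviving words into the string that would be sent to the API.
sentence = ' '.join(strip_empty_categories(tagged))
```

Here the trace `*-1` is dropped, so only overt words remain in `sentence`.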
Last Time
Note on Homework 4: install and test it (no need to report)
Last Time
Penn Treebank (PTB) with nltk
Corpus location: ~/nltk_data/corpora/ptb
Quick Homework 5
In all cases show your work, i.e. how you derived your number.
Using:
from nltk.corpus import ptb
ptb.parsed_sents(categories=['news'])   (49208 sentences, Wall Street Journal)
.productions()
Q1: how many phrase structure rules are there?
Q2: what are the top 5 unary branching rules?
Q3: what are the top 5 binary branching rules?
Q4: what are the top 5 rules with more than binary branching?
Quick Homework 5
Hints: maybe use…
nested list comprehension to extract all production rules
Counter
.most_common()
Due date: by the end of next week
one PDF file!
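The hints can be combined roughly as follows. This is a sketch under stated assumptions, not the expected solution. The helper names `count_rules`, `top_rules` and `ptb_news_counts` are hypothetical. Each tree is assumed to offer `.productions()`, and each production `.rhs()`, as nltk trees and productions do. Rules are grouped purely by the length of their right-hand side, so lexical rules such as NN -> 'dog' count as unary here. Whether the homework wants those excluded is not stated.

```python
from collections import Counter

def count_rules(trees):
    """Count every production in an iterable of parse trees."""
    # Nested list comprehension: outer loop over trees, inner loop over rules.
    rules = [rule for tree in trees for rule in tree.productions()]
    return Counter(rules)

def top_rules(counts, keep, n=5):
    """Return the n most frequent rules whose right-hand-side length passes keep()."""
    selected = Counter({rule: c for rule, c in counts.items()
                        if keep(len(rule.rhs()))})
    return selected.most_common(n)

def ptb_news_counts():
    """Count rules in the PTB 'news' (WSJ) trees.

    Needs nltk and a local copy of the treebank under ~/nltk_data/corpora/ptb.
    """
    from nltk.corpus import ptb  # imported here so the helpers above work without nltk
    return count_rules(ptb.parsed_sents(categories=['news']))

# Usage (left as comments because it needs the corpus):
# counts = ptb_news_counts()
# len(counts)                            # number of distinct rules
# sum(counts.values())                   # number of rule occurrences
# top_rules(counts, lambda k: k == 1)    # unary branching
# top_rules(counts, lambda k: k == 2)    # binary branching
# top_rules(counts, lambda k: k > 2)     # more than binary branching
```

Reporting both the distinct-rule count and the occurrence count covers either reading of Q1.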