LING/C SC/PSYC 438/538 Lecture 23 Sandiway Fong
Today's Topics
- Natural language parsing: syntactic analysis
- Homeworks 11 and 12
Natural Language Parsing
- Syntax trees are a big deal in NLP
- Reminder: reading homework: JM chapter 5, sections 1 and 2; chapter 12
- Stanford Parser / Berkeley Parser (context-free grammars: type-2)
  - http://nlp.stanford.edu:8080/parser/index.jsp
  - http://tomato.banatao.berkeley.edu:8080/parser/parser.html
  - Use probabilistic rules learnt from a Treebank corpus
  - Output: syntax tree diagrams (Stanford also outputs a dependency graph)
- We do a lot with Treebanks in the follow-on course to this one (LING 581, Spring)
Natural Language Parsing
- A new generation of "deep learning" parsers (last two years):
  - Google Cloud Natural Language (aka SyntaxNet): https://cloud.google.com/natural-language/
  - UDPipe
- Output: dependency parses (only)
Training Data
- Penn Treebank: parsed by human annotators
- Example (WSJ section): Efforts by the Hong Kong Futures Exchange to introduce a new interest-rate futures contract continue to hit snags despite the support the proposed instrument enjoys in the colony’s financial community.
Natural Language Parsing Comparison between human parse and machine parse: empty categories not recovered by parsing, otherwise a good match!
Part of Speech (POS)
- JM Chapter 5: Parts of speech
- Classic eight parts of speech (e.g. englishclub.com): traced back to Latin scholars, and back further to ancient Greek (Thrax)
- Not everyone agrees on what they are. The textbook lists:
  - open class (4): nouns, verbs, adjectives, adverbs
  - closed class (7): prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals
- Nor on what the subclasses are, e.g. what is a Proper Noun? Saturday, April? Textbook answer below …
Part of Speech (POS)
- In computational linguistics, the Penn Treebank tagset is the most commonly used tagset (reprinted inside the front cover of your textbook)
  - 45 tags listed in the textbook: 36 POS + 9 punctuation
- Getting POS information about a word:
  - dictionary
  - pronunciation, e.g. are you conTENT with the CONtent of the slide?
  - possible n-gram sequences, e.g. *pronoun << common noun vs. the << common noun
  - structure of the sentence/phrase (Syntax)
  - possible inflectional endings, e.g. V-s/-ed/-en/-ing, N-s
- Task: POS tagging
Part of Speech (POS)
- Penn Treebank tag table: http://faculty.washington.edu/dillon/GramResources/penntable.html
- Proper nouns: NNP (singular), NNPS (plural)
Part of Speech (POS)
- Pronouns: PRP (personal), PRP$ (possessive)
Part of Speech (POS)
- Stanford parser: walk (noun/verb)
- Disambiguation:
  - Syntax
  - Bigram sequence: *PRP << NN vs. DT << NN
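The bigram idea above can be sketched in a few lines of Prolog. This is only an illustration under stated assumptions: it reads `X << Y` as "X immediately followed by Y", the lexicon is a toy one, the predicate names (`tag/2`, `bad_bigram/2`, `tagging/2`) are hypothetical, and the `dt`-before-`vbp` restriction is added for the example rather than taken from the slide.

```prolog
% Toy lexicon: candidate Penn Treebank tags per word (assumed for illustration).
tag(i,    prp).
tag(the,  dt).
tag(walk, nn).
tag(walk, vbp).

% Tag bigrams to rule out.
bad_bigram(prp, nn).    % from the slide: *PRP << NN
bad_bigram(dt,  vbp).   % assumption added for this example

% tagging(+Words, -Tags): pick one candidate tag per word so that
% no adjacent pair of tags is a bad bigram.
tagging([], []).
tagging([W|Ws], [T|Ts]) :-
    tag(W, T),
    tagging(Ws, Ts),
    compatible(T, Ts).

% compatible(+Tag, +FollowingTags): Tag may precede the next tag, if any.
compatible(_, []).
compatible(T, [Next|_]) :- \+ bad_bigram(T, Next).
```

With these facts, `?- tagging([i, walk], Ts).` yields `Ts = [prp, vbp]` and `?- tagging([the, walk], Ts).` yields `Ts = [dt, nn]`, so the neighbouring tag alone settles the noun/verb choice for walk. A real tagger would use probabilities learnt from a corpus instead of hard filters.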
Part of Speech (POS) Word sense disambiguation (WSD) is more than POS tagging: different senses of the noun bank
Syntax
- Words combine recursively with one another into phrases (aka constituents)
- Usually when two words combine, one word will head the phrase
  - e.g. [VB/VBP eat] [NN chocolate]
  - e.g. [VB/VBP eat] [DT some] [NN chocolate]
  - (diagram labels: the head projects; the noun phrase is the object)
- Warning: terminology and parses in computational linguistics are not necessarily the same as those used in theoretical linguistics
Syntax
- Words combine recursively with one another into phrases (aka constituents)
  - e.g. [PRP we] [VB/VBP eat] [NN chocolate]
  - e.g. [TO to] [VB/VBP eat] [NN chocolate]
  - (diagram label: subject)
Syntax
- Words combine recursively with one another into phrases (aka constituents)
  - e.g. [NNP John] [VBD noticed] [IN/DT/WDT that] [PRP we] [VB/VBP eat] [NN chocolate]
  - (diagram labels: selects/subcategorizes for CP; projects; projects; preposition; complementizer (C))
Syntax
- Words combine recursively with one another into phrases (aka constituents)
- How about an SBAR node?
- (diagram label: PRO) cf. John wanted me to eat chocolate
Syntax
- Words combine recursively with one another into phrases (aka constituents)
  - John noticed that we eat chocolate
  - John noticed we eat chocolate
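The bracketings on the Syntax slides can be written as a small Prolog DCG that returns a tree as an extra argument. This is a minimal sketch, not the parser's actual analysis: the nonterminal names (`s`, `np`, `vp`, `sbar`, `comp`) are assumptions, eat is fixed to VBP, and the clause without that is simply wrapped in SBAR with no empty complementizer node.

```prolog
% Phrase-level rules: each head projects a phrase node in the tree.
s(s(NP, VP))      --> np(NP), vp(VP).
np(np(T))         --> prp(T).
np(np(T))         --> nnp(T).
np(np(T))         --> nn(T).
vp(vp(V, NP))     --> vbp(V), np(NP).      % verb + object
vp(vp(V, SBAR))   --> vbd(V), sbar(SBAR).  % verb selecting a clause
sbar(sbar(C, S))  --> comp(C), s(S).       % overt "that"
sbar(sbar(S))     --> s(S).                % no overt complementizer (simplified)

% Lexicon, tagged with Penn Treebank labels.
prp(prp(we))        --> [we].
nnp(nnp(john))      --> [john].
nn(nn(chocolate))   --> [chocolate].
vbp(vbp(eat))       --> [eat].
vbd(vbd(noticed))   --> [noticed].
comp(in(that))      --> [that].            % "that" tagged IN, acting as C
```

For example, `?- phrase(s(T), [john, noticed, that, we, eat, chocolate]).` builds a tree in which `sbar(in(that), s(...))` is the complement of noticed, and dropping that from the input gives the same tree with `sbar(s(...))` instead.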
Homework 11
- Question 1: write a Prolog CFG for the following sentences:
  - John ate (sensibly) (intransitive eat)
  - I fish (intransitive fish)
  - I ate fish (transitive eat)
  - Bill ate rice
  - Harry ate roast beef
- Note: you can use lowercase names… (or quotes, e.g. 'John')
- Note: use the Penn Treebank tagset for words (see inside the cover of your textbook, or the Stanford Parser), e.g.
    prp(prp(i)) --> [i].
    nnp(nnp(john)) --> [john].
    vbd(vbd(ate)) --> [ate].
- Your grammar should produce one parse tree per example
- Your grammar should not contain infinite loops
- Use ; (for more answers) to show your code obeys the aforementioned constraints
- Submit your grammar and examples of runs
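Besides typing ; at the prompt, one way to check the "one parse tree per example" constraint is to collect all parses with `findall/3`. The sketch below uses a toy grammar with hypothetical vocabulary (not the homework sentences), and the helper names `parses/2` and `exactly_one_parse/1` are made up for illustration. Note that `findall/3` only returns if the grammar terminates, so it also exposes infinite loops.

```prolog
% Toy grammar (hypothetical vocabulary, for illustration only).
s(s(NP, VP))     --> np(NP), vp(VP).
np(np(N))        --> nnp(N).
vp(vp(V))        --> vbd(V).
nnp(nnp(mary))   --> [mary].
vbd(vbd(slept))  --> [slept].

% parses(+Words, -Trees): every tree that s//1 assigns to Words.
parses(Words, Trees) :-
    findall(T, phrase(s(T), Words), Trees).

% exactly_one_parse(+Words): succeeds only if there is a single tree.
exactly_one_parse(Words) :-
    parses(Words, [_]).
```

Here `?- parses([mary, slept], Ts).` gives `Ts = [s(np(nnp(mary)), vp(vbd(slept)))]`, and `?- exactly_one_parse([mary, slept]).` succeeds. A list with two or more trees would signal an ambiguous grammar.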
Homework 11
- Question 2: expand your grammar to handle these sentences:
  - I ate fish, and Bill ate rice
  - *I ate fish, Bill ate rice
  - I ate fish, Bill ate rice, and Harry ate roast beef
- Note: the comma can be a quoted terminal, e.g. [','], written in either of these ways:
    comma(comma(',')) --> [','].
    ','(','(',')) --> [','].
- Note: be careful of left recursion on S (Stanford Parser)
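To see why left recursion on S matters: Prolog expands DCG rules top-down and left to right, so a rule whose body begins with `s` calls itself before consuming any input. The sketch below shows one common workaround on a toy coordination grammar with hypothetical vocabulary. The helper nonterminal `simple_s` is an assumption, and the sketch deliberately ignores commas and the starred example.

```prolog
% Left-recursive version (would loop forever on backtracking or failure):
%   s(s(S1, CC, S2)) --> s(S1), cc(CC), s(S2).

% Right-recursive alternative: the first conjunct is a non-coordinated
% clause, so some input is always consumed before s//1 is called again.
s(s(S1, CC, S2)) --> simple_s(S1), cc(CC), s(S2).
s(S)             --> simple_s(S).

simple_s(s(NP, VP)) --> np(NP), vp(VP).
np(np(P))           --> prp(P).
vp(vp(V))           --> vbd(V).

% Toy lexicon.
prp(prp(it))      --> [it].
vbd(vbd(rained))  --> [rained].
vbd(vbd(snowed))  --> [snowed].
cc(cc(and))       --> [and].
```

With this layout, `?- phrase(s(T), [it, rained, and, it, snowed]).` returns a single tree and then fails cleanly on ; instead of looping.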
Homework 12
- Mandatory for 538; Extra Credit for 438.
- From Ross (1970), English exhibits (forward) gapping:
  - I ate fish, Bill rice, and Harry roast beef
  - cf. I ate fish, Bill ate rice, and Harry ate roast beef
- Forwards only (cf. Japanese: backwards):
  - I ate fish, Bill ate rice, and Harry roast beef
  - *I fish, Bill rice, and Harry ate roast beef
  - *I fish, Bill ate rice, and Harry ate roast beef
  - *I fish, Bill ate rice, and Harry roast beef
- Parallelism requirement:
  - *I ate fish, Bill, and Harry roast beef
  - *I ate fish, Bill rice, and Harry
Homework 12
- Gapping:
  - I ate fish, Bill rice, and Harry roast beef
- Not as gapping (you don't have to handle these two):
  - I ate fish, Bill rice, and roast beef
  - I ate fish, rice, and Harry roast beef
- Update your grammar from Homework 11 to handle gapping
- Hint 1: use an extra argument to represent and spread the elided verb
- Hint 2: you can insert Prolog code into rules, e.g. {nonvar(V)}, {var(V)}, or {A=B}
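The two hints rely on general DCG mechanisms: an extra argument shared between nonterminals, and `{...}` goals that run ordinary Prolog inside a rule. The sketch below illustrates both on a different, simpler phenomenon (a determiner shared across coordinated nouns, as in the cat and dog), deliberately not on gapping. All names (`nps`, `full_np`, `rest_np`, `det/1`, `noun/1`) and the vocabulary are hypothetical.

```prolog
det(the).   det(a).
noun(cat).  noun(dog).

% The shared variable D carries the first conjunct's determiner
% across to the second conjunct.
nps(nps(NP1, cc(and), NP2)) --> full_np(D, NP1), [and], rest_np(D, NP2).

% Overt determiner: [D] reads the next word, {det(D)} checks it.
full_np(D, np(dt(D), nn(N))) --> [D], { det(D) }, [N], { noun(N) }.

% Second conjunct: either a full NP with its own determiner ...
rest_np(_, NP) --> full_np(_, NP).
% ... or a bare noun that reuses D, guarded so D must already be bound.
rest_np(D, np(dt(D), nn(N))) --> { nonvar(D) }, [N], { noun(N) }.
```

For `?- phrase(nps(T), [the, cat, and, dog]).` the tree shows the determiner restored in the second conjunct: `T = nps(np(dt(the), nn(cat)), cc(and), np(dt(the), nn(dog)))`. The `{nonvar(D)}` guard is redundant in this tiny grammar, because the first conjunct always binds D; it is there to show where such a test would go.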
Homeworks 11 and 12
- Homework 11 due next Monday
- Homework 12 due next Wednesday
- Submit two files with each homework:
  - PDF writeup
  - Your .pl file (code)