Natural Language Processing

Presentation transcript:

Natural Language Processing (CSE391, 2005)
- Machine Translation
- Predicate-argument structures
- Syntactic parses
- Lexical semantics
- Probabilistic parsing
- Ambiguities in sentence interpretation

Machine Translation
One of the first applications for computers: bilingual dictionary → word-for-word translation.
Good translation requires understanding! (War and Peace, The Sound and the Fury?)
What can we do? Sublanguages: technical domains with a static vocabulary, e.g., Meteo in Canada, Caterpillar tractor manuals, botanical descriptions, military messages.

Example translation [figure]

Translation Issues: Korean to English
- Word order
- Dropped arguments
- Lexical ambiguities
- Structure vs. morphology

Common Thread
Predicate-argument structure: the basic constituents of the sentence and how they are related to each other.
- Constituents: John, Mary, the dog, pleasure, the store.
- Relations: loves, feeds, go, to, bring.

Abstracting away from surface structure [figure]

Transfer lexicons [figure]

Machine Translation: Lexical Choice (Word Sense Disambiguation)
- Iraq lost the battle. → Ilakuka centwey ciessta. [Iraq] [battle] [lost]
- John lost his computer. → John-i computer-lul ilepelyessta. [John] [computer] [misplaced]

Natural Language Processing
- Syntax: grammars, parsers, parse trees, dependency structures
- Semantics: subcategorization frames, semantic classes, ontologies, formal semantics
- Pragmatics: pronouns, reference resolution, discourse models

Syntactic Categories
- Nouns, pronouns, proper nouns
- Verbs: intransitive, transitive, ditransitive (subcategorization frames)
- Modifiers: adjectives, adverbs
- Prepositions
- Conjunctions

Syntactic Parsing
The cat sat on the mat.   → Det Noun Verb Prep Det Noun
Time flies like an arrow. → Noun Verb Prep Det Noun
Fruit flies like a banana. → Noun Noun Verb Det Noun

Parses
The cat sat on the mat.
(S (NP (Det the) (N cat))
   (VP (V sat)
       (PP (Prep on)
           (NP (Det the) (N mat)))))

Parses
Time flies like an arrow.
(S (NP (N time))
   (VP (V flies)
       (PP (Prep like)
           (NP (Det an) (N arrow)))))

Parses
Time flies like an arrow.
(S (NP (N time) (N flies))
   (VP (V like)
       (NP (Det an) (N arrow))))
Also: [you] time flies like an arrow, with "time" as a verb.

Recursive transition nets for CFGs
[diagram: transition networks for the S and NP rules below]
s :- np, vp.
np :- pronoun; noun; det, adj, noun; np, pp.

Lexicon
noun(cat). noun(mat). noun(time). noun(flies). noun(arrow).
det(the). det(a). det(an).
verb(sat). verb(flies). verb(time).
prep(on). prep(like).
Anything missing? The difference-list forms:
noun([fly|T], T). verb([fly|T], T). verb([sit|T], T).
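
The difference-list facts are the key trick: noun([fly|T], T) succeeds if the input token list starts with the word and "returns" the remaining tokens in T. A minimal Python sketch of the same token-consuming view (the word sets below are illustrative assumptions, not the full lexicon):

    # Each category test consumes one token and returns the remaining
    # list (the "T" of the difference list), or None on failure.
    NOUNS = {"cat", "mat", "time", "flies", "arrow"}
    VERBS = {"sat", "flies", "time"}

    def noun(tokens):
        return tokens[1:] if tokens and tokens[0] in NOUNS else None

    def verb(tokens):
        return tokens[1:] if tokens and tokens[0] in VERBS else None

    print(noun(["flies", "like", "an", "arrow"]))  # ['like', 'an', 'arrow']
    print(verb(["flies", "like", "an", "arrow"]))  # same: "flies" is POS-ambiguous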

Lexicon with Roots
noun(cat,cat). noun(mat,mat). noun(time,time). noun(flies,fly). noun(arrow,arrow).
det(the,the). det(a,a). det(an,an).
verb(sat,sit). verb(flies,fly). verb(time,time).
prep(on,on). prep(like,like).
Anything missing?
noun([fly|T], T). verb([fly|T], T). verb([sit|T], T).

Parses
The old can can hold the water.
(S (NP (det the) (adj old) (N can))
   (VP (aux can)
       (V hold)
       (NP (det the) (N water))))
(Lexicon worked out on the board.)

Structural ambiguities
- That factory can can tuna. (That factory cans cans of tuna and salmon.)
- Have the students in cse91 finish the exam in 212. (imperative)
- Have the students in cse91 finished the exam in 212? (question)

Lexicon
The old can can hold the water.
noun(can,can). noun(cans,can). noun(water,water). noun(hold,hold). noun(holds,hold).
det(the,the).
verb(hold,hold). verb(holds,hold).
aux(can,can).
adj(old,old).
Anything missing?

Simple Context-Free Grammar in BNF notation
S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
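
As a data structure, this grammar is just a map from each nonterminal to its alternative right-hand sides. A minimal sketch (the variable name is illustrative):

    # The BNF grammar above as plain Python data: nonterminal -> list of
    # right-hand-side alternatives, each a tuple of symbols.
    GRAMMAR = {
        "S":  [("NP", "VP")],
        "NP": [("Pronoun",), ("Noun",), ("Det", "Adj", "Noun"), ("NP", "PP")],
        "PP": [("Prep", "NP")],
        "V":  [("Verb",), ("Aux", "Verb")],
        "VP": [("V",), ("V", "NP"), ("V", "NP", "NP"),
               ("V", "NP", "PP"), ("VP", "PP")],
    }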

Top-down parse in progress
[The, old, can, can, hold, the, water]
S → NP VP
  NP → Pronoun? fail
  NP → Noun? fail
  NP → Det Adj Noun? Det? the  Adj? old  Noun? can  succeed.
  VP?

Top-down parse in progress
[can, hold, the, water]
VP → V?
  V → Verb? fail
  V → Aux Verb? Aux? can  Verb? hold  succeed
but [the, water] remains, so VP → V fails.

Top-down parse in progress
[can, hold, the, water]
VP → V NP?
  V → Verb? fail
  V → Aux Verb? Aux? can  Verb? hold
  NP → Pronoun? fail
  NP → Noun? fail
  NP → Det Adj Noun? Det? the  Adj? fail

Lexicon
noun(can,can). noun(cans,can). noun(water,water). noun(hold,hold). noun(holds,hold).
det(the,the).
verb(hold,hold). verb(holds,hold).
aux(can,can).
adj(old,old). adj( , ).   ← an empty adjective entry added
Anything missing?

Top-down parse in progress
[can, hold, the, water]
VP → V NP?
  V → Verb? fail
  V → Aux Verb? Aux? can  Verb? hold
  NP → Pronoun? fail
  NP → Noun? fail
  NP → Det Adj Noun? Det? the  Adj? (empty)  Noun? water  SUCCEED

Lexicon
noun(can,can). noun(cans,can). noun(water,water). noun(hold,hold). noun(holds,hold).
noun(old,old).   ← added
det(the,the).
verb(hold,hold). verb(holds,hold).
aux(can,can).
adj(old,old). adj( , ).
Anything missing?

Top-down approach
Start with the goal: a sentence.
S → NP VP
S → Wh-word Aux NP VP
The parser will try to find an NP four different ways before trying a parse where the verb comes first.
What does this remind you of? Search.
What would be better?
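
The walkthrough above is exactly depth-first search over rule expansions. A minimal backtracking recognizer along those lines, as a sketch: the lexicon entries are assumptions, the left-recursive rules NP → NP PP and VP → VP PP are omitted because naive recursive descent loops on them, and NP → Det Noun stands in for the empty-adjective trick from the lexicon slides.

    # Top-down, depth-first recognition with backtracking (a sketch).
    LEXICON = {
        "Det": {"the"}, "Adj": {"old"}, "Noun": {"can", "water", "hold"},
        "Verb": {"hold", "holds", "sat"}, "Aux": {"can"},
        "Pronoun": {"she"}, "Prep": {"on"},
    }
    RULES = {
        "S":  [("NP", "VP")],
        "NP": [("Pronoun",), ("Noun",), ("Det", "Adj", "Noun"), ("Det", "Noun")],
        "PP": [("Prep", "NP")],
        "V":  [("Verb",), ("Aux", "Verb")],
        "VP": [("V",), ("V", "NP"), ("V", "NP", "NP"), ("V", "NP", "PP")],
    }

    def expand(symbols, tokens):
        """Yield every token suffix left over after matching `symbols`."""
        if not symbols:
            yield tokens
            return
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:                      # preterminal: consume a word
            if tokens and tokens[0] in LEXICON[first]:
                yield from expand(rest, tokens[1:])
        else:                                     # nonterminal: try each rule
            for rhs in RULES[first]:
                for leftover in expand(rhs, tokens):
                    yield from expand(rest, leftover)

    def recognize(sentence):
        return any(t == [] for t in expand(("S",), sentence.lower().split()))

    print(recognize("the old can can hold the water"))  # True

Note how the generator backtracks through the same failures as the slides: VP → V consumes "can hold" but leaves [the, water], so the search falls through to VP → V NP.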

Bottom-up approach
Start with the words in the sentence. What structures do they correspond to?
Once a structure is built, it is kept on a CHART.

Bottom-up parse in progress
The old can can hold the water.
det adj noun aux verb det noun
det noun aux/verb noun/verb noun det noun
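
The standard bottom-up chart algorithm is CKY: fill a triangular chart where cell (i, j) holds every category that can span words i..j, so each constituent is built exactly once. A minimal recognizer, assuming a toy grammar in Chomsky normal form (all names here are illustrative):

    # A minimal bottom-up CKY recognizer over a chart of cells.
    UNARY = {  # word -> possible parts of speech (bottom row of the chart)
        "the": {"Det"}, "cat": {"Noun"}, "mat": {"Noun"},
        "sat": {"Verb"}, "on": {"Prep"},
    }
    BINARY = {  # (B, C) -> {A}  for binary rules A -> B C
        ("Det", "Noun"): {"NP"},
        ("NP", "VP"): {"S"},
        ("Verb", "PP"): {"VP"},
        ("Prep", "NP"): {"PP"},
    }

    def cky(words):
        n = len(words)
        # chart[i][j] = set of categories spanning words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(UNARY.get(w, ()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):          # every split point
                    for b in chart[i][k]:
                        for c in chart[k][j]:
                            chart[i][j] |= BINARY.get((b, c), set())
        return "S" in chart[0][n]

    print(cky("the cat sat on the mat".split()))   # True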

Top-down vs. Bottom-up
Top-down:
- Helps with POS ambiguities: only considers the relevant POS
- Rebuilds the same structure repeatedly
- Spends a lot of time on impossible parses
Bottom-up:
- Has to consider every POS
- Builds each structure once
- Spends a lot of time on useless structures
What would be better?

Hybrid approach
Top-down with a chart
- Use look-ahead and heuristics to pick the most likely sentence type
- Use probabilities for POS tagging, PP attachments, etc.

Features
- C for Case, subjective/objective: She visited her.
- P for Person agreement (1st, 2nd, 3rd): I like him, you like him, he likes him.
- N for Number agreement, subject/verb: He likes him, they like him.
- G for Gender agreement: English, reflexive pronouns (He washed himself); Romance languages, det/noun.
- T for Tense: auxiliaries, sentential complements, etc. (*will finished is bad)

Example Lexicon Entries
Using features: Case, Number, Gender, Person
pronoun(subj, sing, fem, third, she, she).
pronoun(obj, sing, fem, third, her, her).
pronoun(obj, Num, Gender, second, you, you).
pronoun(subj, sing, Gender, first, 'I', 'I').
noun(Case, plural, Gender, third, flies, fly).
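
In these entries, capitalized arguments (Num, Gender) are Prolog variables: they match anything, while constants must agree. A tiny sketch of that matching in Python, with None playing the role of a variable (feature names and entries are assumptions for illustration):

    # A minimal sketch of feature agreement: None means "unspecified"
    # (matches anything); constants must match exactly.
    def unify(f1, f2):
        if f1 is None: return f2
        if f2 is None: return f1
        return f1 if f1 == f2 else "FAIL"

    she   = {"case": "subj", "num": "sing", "gen": "fem", "per": "third"}
    you   = {"case": None,   "num": None,   "gen": None,  "per": "second"}
    likes = {"case": None,   "num": "sing", "gen": None,  "per": "third"}

    def agree(a, b, features=("num", "per")):
        return all(unify(a[f], b[f]) != "FAIL" for f in features)

    print(agree(she, likes))   # True:  "she likes"
    print(agree(you, likes))   # False: *"you likes" (person clash)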

Language to Logic
How do we get there?
John went to the book store. ⇒ go(John, store1)
John bought a book. ⇒ buy(John, book1)
John gave the book to Mary. ⇒ give(John, book1, Mary)
Mary put the book on the table. ⇒ put(Mary, book1, table1)

Lexical Semantics
Same event, different sentences:
- John broke the window with a hammer.
- John broke the window with the crack.
- The hammer broke the window.
- The window broke.

Same event, different syntactic frames:
- John broke the window with a hammer.  (SUBJ VERB OBJ MODIFIER)
- John broke the window with the crack. (SUBJ VERB OBJ MODIFIER)
- The hammer broke the window.          (SUBJ VERB OBJ)
- The window broke.                     (SUBJ VERB)

Semantics: predicate arguments
break(AGENT, INSTRUMENT, PATIENT)
- John broke the window with a hammer.  (AGENT PATIENT INSTRUMENT)
- The hammer broke the window.          (INSTRUMENT PATIENT)
- The window broke.                     (PATIENT)
(Fillmore 1968, "The Case for Case")

- John broke the window with a hammer. → SUBJ=AGENT, OBJ=PATIENT, MODIFIER=INSTRUMENT
- The hammer broke the window. → SUBJ=INSTRUMENT, OBJ=PATIENT
- The window broke. → SUBJ=PATIENT

Constraint Satisfaction
break(Agent: animate, Instrument: tool, Patient: physical-object)
- Agent <=> subj
- Instrument <=> subj, with-pp
- Patient <=> obj, subj
(ACL81, ACL85, ACL86, MT90, CUP90, AIJ93)

Syntax/semantics interaction
- Parsers will produce syntactically valid parses for semantically anomalous sentences.
- Lexical semantics can be used to rule them out.

Constraint Satisfaction
give(Agent: animate, Patient: physical-object, Recipient: animate)
- Agent <=> subj
- Patient <=> object
- Recipient <=> indirect-object, to-pp
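
A minimal sketch of checking one candidate role assignment against these selectional restrictions; the semantic classes assigned to the nouns below are illustrative assumptions:

    # Check a candidate argument assignment against the verb's
    # selectional restrictions.
    RESTRICTIONS = {
        "break": {"agent": "animate", "instrument": "tool",
                  "patient": "physical-object"},
        "give":  {"agent": "animate", "patient": "physical-object",
                  "recipient": "animate"},
    }
    CLASS = {"john": "animate", "mary": "animate", "hammer": "tool",
             "window": "physical-object", "book": "physical-object"}

    def satisfies(verb, assignment):
        """assignment maps role -> noun, e.g. {'agent': 'john', ...}"""
        req = RESTRICTIONS[verb]
        return all(CLASS.get(noun) == req[role]
                   for role, noun in assignment.items())

    print(satisfies("break", {"agent": "john", "instrument": "hammer",
                              "patient": "window"}))               # True
    print(satisfies("break", {"agent": "window", "patient": "john"}))  # False

A real system would test subsumption in an ontology (a hammer is a tool is a physical object) rather than string equality, but the flow is the same: semantically anomalous parses fail the check.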

Subcategorization Frequencies
The women kept the dogs on the beach.
- Where keep? (keep on beach): 95%; NP XP frame: 81%
- Which dogs? (dogs on beach): 5%; NP frame: 19%
The women discussed the dogs on the beach.
- Where discuss? (discuss on beach): 10%; NP PP frame: 24%
- Which dogs? (dogs on beach): 90%; NP frame: 76%
(Ford, Bresnan & Kaplan 1982; Jurafsky 1998; Roland & Jurafsky 1999)

Reading times
NP-bias verbs (slower reading times at the disambiguating word, shown in bold on the slide):
- The waiter confirmed the reservation was made yesterday.
- The defendant accepted the verdict would be decided soon.

Reading times
S-bias verbs (no slowdown at the disambiguating word):
- The waiter insisted the reservation was made yesterday.
- The defendant wished the verdict would be decided soon.
(Trueswell, Tanenhaus & Kello 1993; Trueswell & Kim 1998)

Probabilistic Context-Free Grammars
- Adding probabilities
- Lexicalizing the probabilities

Simple Context-Free Grammar in BNF
S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP

Simple Probabilistic CFG
S  → NP VP
NP → Pronoun [0.10] | Noun [0.20] | Det Adj Noun [0.50] | NP PP [0.20]
PP → Prep NP [1.00]
V  → Verb [0.20] | Aux Verb [0.20]
VP → V [0.10] | V NP [0.40] | V NP NP [0.10] | V NP PP [0.20] | VP PP [0.20]
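
Under a PCFG, the probability of a parse is the product of the probabilities of the rules used in its derivation, so parsers compare parses by summing log probabilities. A small sketch using the numbers above (the S → NP VP probability is an assumption, taken as 1.00 since no alternative is listed):

    import math

    # (lhs, rhs) -> rule probability, from the grammar above
    RULE_PROB = {
        ("S", ("NP", "VP")): 1.00,
        ("NP", ("Det", "Adj", "Noun")): 0.50,
        ("NP", ("Noun",)): 0.20,
        ("PP", ("Prep", "NP")): 1.00,
        ("V", ("Aux", "Verb")): 0.20,
        ("VP", ("V", "NP")): 0.40,
    }

    def derivation_logprob(rules_used):
        # log probabilities avoid underflow on long derivations
        return sum(math.log(RULE_PROB[r]) for r in rules_used)

    deriv = [("S", ("NP", "VP")), ("NP", ("Det", "Adj", "Noun")),
             ("VP", ("V", "NP")), ("V", ("Aux", "Verb")),
             ("NP", ("Det", "Adj", "Noun"))]
    print(math.exp(derivation_logprob(deriv)))  # 0.5 * 0.4 * 0.2 * 0.5 = 0.02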

Simple Probabilistic Lexicalized CFG
S  → NP VP
NP → Pronoun [0.10] | Noun [0.20] | Det Adj Noun [0.50] | NP PP [0.20]
PP → Prep NP [1.00]
V  → Verb [0.20] | Aux Verb [0.20]
VP → V [0.87] {sleep, cry, laugh}
   | V NP [0.03]
   | V NP NP [0.00]
   | V NP PP [0.00]
   | VP PP [0.10]

Simple Probabilistic Lexicalized CFG
VP → V [0.30]
   | V NP [0.60] {break, split, crack, ...}
   | V NP NP [0.00]
   | V NP PP [0.00]
   | VP PP [0.10]
What about leave? leave1, leave2?
VP → V [0.10]
   | V NP [0.40]
   | V NP NP [0.10]
   | V NP PP [0.20]
   | VP PP [0.20]

A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
An example from the Wall Street Journal Treebank II; the slide shows the sentence, its parse tree, and this bracketed annotation.
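
The bracketed annotation is just an S-expression, so it is easy to read programmatically. A minimal reader, as a sketch, that turns it into nested Python lists:

    # Turn "(S (NP-SBJ Analysts) (VP have ...))" into nested lists.
    def read_tree(s):
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()
        def read(pos):
            if tokens[pos] == "(":
                node, pos = [tokens[pos + 1]], pos + 2   # label after "("
                while tokens[pos] != ")":
                    child, pos = read(pos)
                    node.append(child)
                return node, pos + 1
            return tokens[pos], pos + 1                  # a leaf word
        tree, _ = read(0)
        return tree

    print(read_tree("(S (NP-SBJ Analysts) (VP have (VP been)))"))
    # ['S', ['NP-SBJ', 'Analysts'], ['VP', 'have', ['VP', 'been']]]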

The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))
expect(Analysts, GM-Jaguar pact)
give(GM-Jaguar pact, US car maker, 30% stake)
The trace *T*-1 links back to "a GM-Jaguar pact". The labels are unchanged under the paraphrase "a GM-Jaguar pact that would give an eventual ... stake to the US car maker": Arg0 = the pact, Arg1 = the stake, Arg2 = the US car maker. This works the same way for Chinese and Korean as for English (and presumably will for Arabic as well).

Headlines
- Police Begin Campaign To Run Down Jaywalkers
- Iraqi Head Seeks Arms
- Teacher Strikes Idle Kids
- Miners Refuse To Work After Death
- Juvenile Court To Try Shooting Defendant

Events: from the KRR lecture [slides reproduced from the KRR lecture]

Context Sensitivity
- Programming languages are context-free.
- Are natural languages context-sensitive?
  - Movement
  - Features
  - "respectively": John, Mary and Bill ate peaches, pears and apples, respectively.

The Chomsky Grammar Hierarchy
- Regular grammars (e.g., aabbbb): S → aS | bS | nil
- Context-free grammars (e.g., aaabbb): S → aSb | nil
- Context-sensitive grammars (e.g., aaabbbccc): xSy → xby
- Turing machines
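
The three example languages make the hierarchy concrete: the regular language needs no memory to recognize, aⁿbⁿ needs one unbounded counter, and aⁿbⁿcⁿ needs two matched counts. A quick sketch of recognizers for each:

    import re

    def regular(s):            # S -> aS | bS | nil, e.g. "aabbbb"
        return re.fullmatch(r"[ab]*", s) is not None

    def context_free(s):       # a^n b^n, e.g. "aaabbb"
        n = s.count("a")
        return s == "a" * n + "b" * n

    def context_sensitive(s):  # a^n b^n c^n, e.g. "aaabbbccc"
        n = len(s) // 3
        return s == "a" * n + "b" * n + "c" * n

    print(regular("aabbbb"), context_free("aaabbb"),
          context_sensitive("aaabbbccc"))   # True True True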

Recursive transition nets for CFGs
[diagram: transition networks for the S and NP rules below]
s :- np, vp.
np :- pronoun; noun; det, adj, noun; np, pp.

Most parsers are Turing Machines
- To give a more natural and comprehensible treatment of movement
- For a more efficient treatment of features
- Not because of "respectively": most parsers can't handle it.

Nested Dependencies and Crossing Dependencies
- CF: The dog chased the cat that bit the mouse that ran.
- CF: The mouse the cat the dog chased bit ran.
- CS: John, Mary and Bill ate peaches, pears and apples, respectively.

Movement
What did John give to Mary?
*Where did John give to Mary?
John gave cookies to Mary.
John gave <what> to Mary.

Handling Movement: Hold Registers / Slash Categories
S :- Wh, S/NP
S/NP :- VP
S/NP :- NP, VP/NP
VP/NP :- Verb
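
A minimal sketch of the slash-category idea in code: S/NP is "an S missing an NP", and a boolean gap flag says whether a constituent must contain the empty NP (the trace of the moved wh-word). The tiny lexicon, the Aux handling for the example question, and all names are illustrative assumptions.

    LEX = {"Wh": {"what"}, "Aux": {"did"}, "Name": {"john", "mary"},
           "Verb": {"give"}, "To": {"to"}}

    def term(cat, toks):
        """Match one word of category `cat`; return leftover token lists."""
        return [toks[1:]] if toks and toks[0] in LEX[cat] else []

    def np(toks, gap):
        # NP/NP is the empty trace; an ordinary NP is just a name here.
        return [toks] if gap else term("Name", toks)

    def vp(toks, gap):
        # VP -> Verb NP To NP; the gap, if any, fills the object NP slot.
        out = []
        for r1 in term("Verb", toks):
            for r2 in np(r1, gap):              # object NP carries the gap
                for r3 in term("To", r2):
                    out += np(r3, gap=False)
        return out

    def wh_question(toks):
        # S -> Wh Aux NP VP/NP   ("what did john give to mary")
        out = []
        for r1 in term("Wh", toks):
            for r2 in term("Aux", r1):
                for r3 in np(r2, gap=False):
                    out += vp(r3, gap=True)     # the VP must contain the gap
        return any(r == [] for r in out)

    print(wh_question("what did john give to mary".split()))      # True
    print(wh_question("what did john give it to mary".split()))   # False

Threading the gap flag through the rules is exactly what the slash notation does declaratively: the filler "what" licenses one missing NP somewhere downstream, and exactly one.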