LING 581: Advanced Computational Linguistics Lecture Notes April 6th.

Slides:



Advertisements
Similar presentations
Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.
Advertisements

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
LING 581: Advanced Computational Linguistics Lecture Notes March 26th.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
ICE1341 Programming Languages Spring 2005 Lecture #5 Lecture #5 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
ICE1341 Programming Languages Spring 2005 Lecture #4 Lecture #4 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
LING 388: Language and Computers Sandiway Fong Lecture 24.
CS21 Decidability and Tractability
Languages, grammars, and regular expressions
LING 388 Language and Computers Take-Home Final Examination 12/9/03 Sandiway FONG.
Normal forms for Context-Free Grammars
Regular Expressions and Automata Chapter 2. Regular Expressions Standard notation for characterizing text sequences Used in all kinds of text processing.
Finite State Machines Data Structures and Algorithms for Information Processing 1.
Models of Generative Grammar Smriti Singh. Generative Grammar  A Generative Grammar is a set of formal rules that can generate an infinite set of sentences.
11 CS 388: Natural Language Processing: Syntactic Parsing Raymond J. Mooney University of Texas at Austin.
Three Generative grammars
1 13. LANGUAGE AND COMPLEXITY 2007 년 11 월 03 일 인공지능연구실 한기덕 Text: Speech and Language Processing Page.477 ~ 498.
Introduction Syntax: form of a sentence (is it valid) Semantics: meaning of a sentence Valid: the frog writes neatly Invalid: swims quickly mathematics.
LING 581: Advanced Computational Linguistics Lecture Notes April 2nd.
Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi and introduced in Tree-adjoining grammars are somewhat similar to context-free.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.
Pushdown Automata (PDAs)
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Grammars CPSC 5135.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
Context Free Grammars Reading: Chap 9, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Rada Mihalcea.
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 4.
Rules, Movement, Ambiguity
CSA2050 Introduction to Computational Linguistics Parsing I.
1 / 48 Formal a Language Theory and Describing Semantics Principles of Programming Languages 4.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 2.
Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite.
Lecture 16: Modeling Computation Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output.
Formal Languages and Grammars
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
CSE 311 Foundations of Computing I Lecture 19 Recursive Definitions: Context-Free Grammars and Languages Spring
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2006.
Formal grammars A formal grammar is a system for defining the syntax of a language by specifying sequences of symbols or sentences that are considered.
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد [ ] 1 King Faisal University.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Context-Free and Noncontext-Free Languages Chapter 13.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
Natural Language Processing Vasile Rus
Child Syntax and Morphology
Chapter 3 – Describing Syntax
Automata and Languages What do these have in common?
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Formal Language Theory
Course 2 Introduction to Formal Languages and Automata Theory (part 2)
CS 388: Natural Language Processing: Syntactic Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
Corpus based Enrichment of GermaNet Verb Frames
Intro to Data Structures
CSE 311: Foundations of Computing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Structure of a Lexicon Debasri Chakrabarti 13-May-19.
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
COMPILER CONSTRUCTION
Presentation transcript:

LING 581: Advanced Computational Linguistics Lecture Notes April 6th

Homework Task: use Wordnet to determine which words best match which definitions: TWO QUIZZES ONLY! – (Quiz 64 from Word Smart for the GRE) WordDefinition 1.veritablemake ineffectivea. 2.vigilantannoyanceb. 3.verisimilitudecharacterize harshlyc. 4.vitiatestickyd. 5.vilifywatchfule. 6.vexationappearing truef. 7.virulentextremely harmfulg. 8.viscousauthentich.

Homework – (Quiz 65 from Word Smart for the GRE) WordDefinition 1.zealousbe in turmoila. 2.waftravenousb. 3.waverbe unsettled in opinion c. 4.vituperatecensure severelyd. 5.wendimpassionede. 6.weltergof. 7.voraciouslight breezeg. 8.volatilechangeableh.

Homework You can have another week. Please try to use the bfs code WordNet::Similarity is another possibility – recall star and telescope?

WordNet::Similarity Online version – Perl Module:

WordNet::Similarity Perl Module: –

Other Lexical Resources Framenet –

Predicate-Argument Structure Example: – John ate the sandwich – eat: predicate – eat has two arguments: eater, something that is eaten – eater = John – something to be eaten = the sandwich A Possible Representation – in Prolog term-like notation – eat(, ) – eat(john,sandwich) Linguists generally try to choose more general labels for the arguments – less verb-specific – eat(, ) – someone/something who performs some action – undergoes change of state etc. – eat(, ) – something applies to this argument but doesn’t undergo change of state

Predicate-Argument Structure It can be difficult to precisely specify the meaning of the arguments via thematic labels of this sort Here is a list of the major thematic relations. Agent : –deliberately performs the action –(e.g. Bill ate his soup quietly) Experiencer : –receives sensory or emotional input –(e.g. The smell of lilies filled Jennifer's nostrils ). Theme : –undergoes the action but does not change its state –(Sometimes used interchangeably with patient) –(e.g. Bill kissed Mary ). Patient: –undergoes the action and has its state changed –(Sometimes used interchangeably with theme) –(e.g. The falling rocks crushed the car ) Instrument : –used to carry out the action –(e.g. Jamie cut the ribbon with a pair of scissors ). Natural Cause : –mindlessly performs the action –(e.g. An avalanche destroyed the ancient temple). Location : –where the action occurs –(e.g. Johnny and Linda played carelessly in the park ). Goal : –what the action is directed towards –(e.g. The caravan continued on toward the distant oasis ). Recipient : –a special kind of goal associated with verbs expressing a change in ownership, possession. –(e.g I sent John the letter) Source : –where the action originated –(e.g. The rocket was launched from Central Command ). Time : –the time at which the action occurs –(e.g. The rocket was launched yesterday ) Beneficiary : –the entity for whose benefit the action occurs –(e.g. I baked Reggie a cake)

Predicate-Argument Structure Passives – The sandwich was eaten by John – John ate the sandwich – eat(, ) – eat(john,sandwich) – The sandwich was eaten – eat(_,sandwich) – an incomplete or underspecified predicate argument structure Not all Noun Phrases seem to have a meaningful thematic relation associated with them – It rains – It is likely that John ate the sandwich – John is likely to eat the sandwich – It seems that John ate the sandwich – John seemed to eat the sandwich – There seems to be a sandwich over there – A sandwich seems to be over there

Framenet Lexical unit index: – uIndex Let’s take a look at LU eat

Frame: Ingestion

Frame: Ingestion

Example

eat -> ingest -> ingestion

Example perl bfs3.perl eat#v#1 gobble#v#1 Found at distance 1 (11 nodes explored) gobble#v#1 hypo eat#v#1 Lexical Units (LU): breakfast.v, consume.v, devour.v, dine.v, down.v, drink.v, eat.v, feast.v, feed.v, gobble.v, gulp.n, gulp.v, guzzle.v, have.v, imbibe.v, ingest.v, lap.v, lunch.v, munch.v, nibble.v, nosh.v, nurse.v, put away.v, put back.v, quaff.v, sip.n, sip.v, slurp.n, slurp.v, snack.v, sup.v, swig.n, swig.v, swill.v, tuck.v

Example

perl bfs3.perl eat#v#2 breakfast#v#1 Found at distance 1 (14 nodes explored) breakfast#v#1 hypo eat#v#2 Lexical Units (LU): breakfast.v, consume.v, devour.v, dine.v, down.v, drink.v, eat.v, feast.v, feed.v, gobble.v, gulp.n, gulp.v, guzzle.v, have.v, imbibe.v, ingest.v, lap.v, lunch.v, munch.v, nibble.v, nosh.v, nurse.v, put away.v, put back.v, quaff.v, sip.n, sip.v, slurp.n, slurp.v, snack.v, sup.v, swig.n, swig.v, swill.v, tuck.v

Example perl bfs3.perl eat#v#1 munch#v#1 Found at distance 2 (101 nodes explored) crunch#v#3 hypo chew#v#1 enta eat#v#1 perl bfs3.perl eat#v#2 munch#v#1 Found at distance 3 (369 nodes explored) crunch#v#3 hypo chew#v#1 enta eat#v#1 hypo eat#v#2 Lexical Units (LU): breakfast.v, consume.v, devour.v, dine.v, down.v, drink.v, eat.v, feast.v, feed.v, gobble.v, gulp.n, gulp.v, guzzle.v, have.v, imbibe.v, ingest.v, lap.v, lunch.v, munch.v, nibble.v, nosh.v, nurse.v, put away.v, put back.v, quaff.v, sip.n, sip.v, slurp.n, slurp.v, snack.v, sup.v, swig.n, swig.v, swill.v, tuck.v

Example

Digression Generative capacity of (natural) language …

Chomsky Hierarchy Formal language theory [Chomsky, 1956] Finite state languages Context-free languages Context-sensitive languages Recursively enumerable (r.e.) languages

Chomsky Hierarchy grammar vs. machine characterization – A → aB, A → a – equivalent to a directed finite network of states and transitions Finite state languages 01 ab 3 cde ab > 2 fg ab S → aB B → bC C → cD D → dE E → eF F → fG G → g G → aH H → bC F → aI I → bC S → aB B → bC C → cD D → dE E → eF F → fG G → g G → aH H → bC F → aI I → bC Birdsong: Bengalese finch song [Berwick et al., 2011] Birdsong

Some useful notions Mathematical notion of a language as a set of strings L(Bengalese finch song) = {abcdefg, abcdeabcdefg, abcdefgabcdefg,.. } Discrete infinity – “A fsm is the simplest type of grammar which, with a finite amount of apparatus, can generate an infinite number of sentences.” 01 ab 3 cde ab > 2 fg ab Birdsong: Bengalese finch song [Berwick et al., 2011]

Discussion Finite state machine simply encodes precedence relations between elements of the string The (regular) grammar characterization adds in (possibly unintended) hierarchical relations between elements 01 ab 3 cde ab > 2 fg ab Birdsong: Bengalese finch song [Berwick et al., 2011] S → aB B → bC C → cD D → dE E → eF F → fG G → g G → aH H → bC F → aI I → bC S → aB B → bC C → cD D → dE E → eF F → fG G → g G → aH H → bC F → aI I → bC

Chomsky Hierarchy grammar vs. machine characterization – A → aB, A → a – equivalent to a directed finite network of states and transitions Finite state languages Birdsong English is not a finite state language [(9) pg.21, Chomsky, 1955] (11) (i) If S1, then S2 *If S1, or S2 (ii) Either S3, or S4 *Either S3, then S4 Context-free languages if-S-then-S either-S-or-S if-S-then-S either-S-or-S Context-free rules: S → if S, then S S → either S, or S or a pushdown automaton (= fsm + stack) “there are processes of sentence formation that fsm are intrinsically unable to handle.” [pg23, Chomsky, 1955]

Discussion (contd.) Grammatical construction – If.., then – Either.., or – “A grammatical construction is a syntactic template that is paired with conventionalized semantic and pragmatic content” [Wikipedia] In some modern linguistic theories, constructions have little or no distinguished theoretical status

Chomsky Hierarchy Finite state languages Birdsong Context-free languages if-S-then-S either-S-or-S if-S-then-S either-S-or-S English is not a context-free language [Higginbotham, 1984] – the woman such that she left is here Construction: relative clause + extra dependencies the woman such that (the man such that) she (gave this to him) left is here the woman such that (the man such that the man such that) she (gave this to him gave him to this) left is here

Discussion (contd.) – “if a grammar does not have recursive devices (e.g. loops in fsm) it will be prohibitively complex. If it does have recursive devices of some sort, it will produced infinitely many sentences.” [pg24, Chomsky, 1995] Suppose human language is generated by some form of a recursive device: – human language admits infinitely many sentences – sentences can be arbitrarily long in principle (practical limitations on utterance length, working memory etc.) Commonsense notion of grammaticality becomes a bit fuzzy

Chomsky Hierarchy (Fine Grained) Crossing dependencies (found in Dutch and Swiss-German) Finite state languages Context-free languages Mildly context-sensitive languages Birdsong Crossing dependencies (figures taken from [Kallmeyer, 2005]) Tree-adjoining Grammar (TAG) [Joshi] Linear Indexed Grammar [Gazdar] Combinatory Categorial Grammar [Steedman] if-S-then-S either-S-or-S if-S-then-S either-S-or-S

Indexed grammars Mildly context-sensitive languages (linear indexed grammars) Finite state languages Context-free languages if-S-then-S either-S-or-S Context-sensitive languages Chomsky Hierarchy (Fine Grained) Crossing dependencies Birdsong Computational advantage: polynomial time parsable

Primate language Northern muriquis Group studied extensively since 1982 Data collected – 212 sequential exchanges vocalizations, emitted by several members: – 10 adult males (n=80), – 1 sub adult male (n=1), – 17 adult females (n=122), and – 2 sub adult females (n=3) – sequential exchange = vocalization of an individual and the response of different group members – Muriqui vocalizations have a quite long duration allowing to analyze them as sequences of different elements that are repeated or varied over time – It is possible to identify subparts in vocalizations that we designate by the term ‘segmental’ units

Primate language Sample utterance t p t p t p p t p p p t p p p p G t

Indexed grammars Mildly context-sensitive languages (linear indexed grammars) Finite state languages Context-free languages if-S-then-S either-S-or-S Crossing dependencies Birdsong string containing an arithmetic progression Primate language Sequence: – tp tpp.. tpp..p Gt (monotonically increasing # p's) Arithmetic progression sequences

Primate language Sequence: – tp tpp.. tpp..p Gt (arithmetic progression) Indexed grammar [Aho, 1967] – equivalent to nested stack automaton – S[..] →K[p..] Gt – K[..] → tP[..] K[p..] – K[..] → tP[..] – P[p..] → pP[..] – P[] → [] Example derivation: S[] => K[p] Gt => tP[p] K[pp] Gt => tp K[pp] Gt => tp tP[pp] K[ppp] Gt => tp tpp K[ppp] Gt => tp tpp tP[ppp] K[pppp] Gt => tp tpp tppp tP[pppp] Gt => tp tpp tppp tpppp Gt Example derivation: S[] => K[p] Gt => tP[p] K[pp] Gt => tp K[pp] Gt => tp tP[pp] K[ppp] Gt => tp tpp K[ppp] Gt => tp tpp tP[ppp] K[pppp] Gt => tp tpp tppp tP[pppp] Gt => tp tpp tppp tpppp Gt Each non- terminal has individual stack copy

Primate language Sequence: – tp tpp.. tpp..p Gt (arithmetic progression) Rely on a crucial distinction between Indexed grammars (IG) and the strictly smaller class of Linear indexed grammars (LIG) – at most one nonterminal in each production be specified as receiving the stack, whereas in a normal indexed grammar, all nonterminals receive copies of the stack [Vijay-Shanker and Weir, 1994] demonstrated that LIG, CCG, TAG are all (weakly) equivalent formalisms = “Mildly Context- Sensitive Languages” Intuition: indexed grammar with binary branching 1 st nonterminal generates k p’s, and 2 nd nonterminal recursively generates sequences involving k+1 p’s Intuition: indexed grammar with binary branching 1 st nonterminal generates k p’s, and 2 nd nonterminal recursively generates sequences involving k+1 p’s

Indexed grammars Mildly context-sensitive languages (linear indexed grammars) Finite state languages Context-free languages if-S-then-S either-S-or-S Context-sensitive languages Chomsky Hierarchy (Fine Grained) Crossing dependencies Birdsong Computational dis- advantage: not polynomial time parsable Arithmetic progression sequences

Discussion (contd.) Possible objection – did I pick out a hard subset of an easy language? Finite state language “const- ruction” for example, a subset of a finite state language is not necessarily a finite state language

Discussion (contd.) Data what counts as a “construction” here?

Discussion (contd.) Weak generative capacity: – rely on surface string patterning only S[..] →K[p..] Gt K[..] → tP[..] K[p..] K[..] → tP[..] P[p..] → pP[..] P[] → [] S[..] →K[p..] Gt K[..] → tP[..] K[p..] K[..] → tP[..] P[p..] → pP[..] P[] → [] non-terminal and terminal symbols No intuition about the “constituents” of muriqui language

Finite state languages Context-free languages Mildly context-sensitive languages (linear indexed grammars) Indexed grammars if-S-then-S either-S-or-S Context-sensitive languages Human Languages