Chapter 12: Context-Free Grammars
Heshaam Faili, University of Tehran

2 Motivating CFGs: grammar correction What if we want to create a grammar checker for a word processor? We could write grammar correction rules that are essentially regular expressions: 1. Match a pattern, e.g. some or certain followed by extend: /(some|certain) extend/ 2. Change something in the pattern: extend → extent. Daniel Naber used 56 rules like this to build a grammar checker that works nearly as well as Microsoft Word's.
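The match-and-replace idea can be sketched in a few lines of Python. The rule below is an illustrative reconstruction of the slide's example, not Naber's actual rule set:

```python
import re

# One correction rule in the style described above: if "some" or "certain"
# is followed by the verb "extend", the writer almost certainly meant the
# noun "extent". (Illustrative sketch, not Naber's implementation.)
def correct(text: str) -> str:
    return re.sub(r"\b((?:some|certain)\s+)extend\b", r"\1extent", text)

print(correct("to some extend, this is true"))  # "to some extent, this is true"
```

A real checker would hold many such (pattern, replacement) pairs and apply them in sequence.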

3 More than regular expressions But what about correcting the following: A baseball teams were successful. We should change A to The, but a simple regular expression doesn’t work because we don’t know where the word teams might show up. A wildly overpaid horrendous baseball teams were successful. (the noun comes five words later; change needed) A player on both my teams was successful. (five words later; no change needed) We need to look at how the sentence is constructed in order to build a better rule.

4 Syntax Syntax = the study of the way that sentences are constructed from smaller units. There is no “dictionary” for sentences: the number of possible sentences is infinite. The house is large. John believes that the house is large. Mary says that John believes that the house is large. There are some basic principles of sentence organization: Linear order Hierarchical structure (Constituency) Grammatical relations Subcategorization/Dependency relations

5 Linear order Linear order = the order of words in a sentence. A sentence has different meanings based on its linear order. John loves Mary. Mary loves John. Languages vary as to what extent this is true, but linear order is still a guiding principle for organizing words into meaningful sentences.

6 Constituency But we can’t only use linear order to determine sentence organization. e.g. We can’t simply say “The verb is the second word in the sentence.” I eat at really fancy restaurants. Many executives eat at really fancy restaurants.

7 Constituency (cont.) What are the “meaningful units” of the sentence Many executives eat at really fancy restaurants? Many executives really fancy really fancy restaurants at really fancy restaurants eat at really fancy restaurants We refer to these meaningful groupings as constituents of a sentence. There are many “tests” to determine what a constituent is.

8 Constituency tests These tests sometimes work, sometimes don’t Preposed/Postposed constructions—i.e., can you move the grouping around? (1) a. On September seventeenth, I’d like to fly from Atlanta to Denver. b. I’d like to fly on September seventeenth from Atlanta to Denver. c. I’d like to fly from Atlanta to Denver on September seventeenth. Pro-form substitution (2) John has some very heavy books, but he didn’t want them. (3) I want to go home, and John wants to do so, too.

9 Hierarchical structure Note that constituents appear within other constituents. We can represent this in bracket form or in a syntactic tree. Bracketed notation: Unlabeled: [[Many executives] [eat [at [[really fancy] restaurants]]]] Labeled: [S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [Nom [N flight]]]]]] The syntactic tree is on the next page...

10 [[Many executives] [eat [at [[really fancy] restaurants]]]] (syntactic tree shown as an image in the original slides)

11 Categories We would also like some way to say that Many executives and really fancy restaurants are the same type of grouping, or constituent, whereas at really fancy restaurants seems to be something else. For this, we will talk about different categories Lexical Phrasal

12 Lexical categories Lexical categories are simply word classes, or parts of speech. The main ones are: verbs: eat, drink, sleep,... nouns: gas, food, lodging,... adjectives: quick, happy, brown,... adverbs: quickly, happily, well, westward prepositions: on, in, at, to, into, of,... determiners/articles: a, an, the, this, these, some, much,...

13 Determining lexical categories How do we determine which category a word belongs to? Distribution: where these kinds of words can appear in a sentence. e.g. Nouns like mouse can appear after articles (“determiners”) like the, while a verb like eat cannot. Morphology: what kinds of word prefixes/suffixes can a word take? e.g. Verbs like walk can take a -ed ending to mark them as past tense. A noun like mouse cannot.

14 Closed & Open classes We can add words to some classes, but not to others. This correlates with whether a word is “meaningful” on its own or is just a function word, whose meaning comes only from its use in a sentence. Open classes (new words can easily be added): verbs nouns adjectives adverbs Closed classes (new words cannot easily be added): prepositions determiners

15 Phrasal categories We can also look at the distribution of phrases and see which ones behave in the same way, in order to assign them categories. The joggers ran through the park. What other phrases can we put in place of The joggers? Susan students you most dogs some children a huge, lovable bear my friends from Brazil the people that we interviewed Since all of these contain nouns, we consider these to be noun phrases (NPs).

16 Noun Phrases Noun phrases, like other kinds of phrases, are headed: there is a designated item (the noun) which determines the properties of the whole phrase Before the noun, you can have determiners (and pre-determiners) and adjective phrases After the noun, you can have prepositional phrases, gerunds (and other verbal clauses), and relative clauses You can also have noun-noun compounds

17 Verb Phrases: Subcategorization Verbs tend to drive the analysis of a sentence because they subcategorize for elements We can say that verbs have subcategorization frames sleep: subject find: subject, object show: subject, object, second object want: subject, object, VP-infinitive think: subject, S

18 Grammatical relations Grammatical relations are the basic relations between words in a sentence (4) She eats a mammoth breakfast. In this sentence, She is the subject, while a mammoth breakfast is the object In English, the subject must agree in person and number with the verb.

19 Building a tree Other phrases work similarly, giving us the tree on the following page. (S = sentence, VP = verb phrase, PP = prepositional phrase, AdjP = adjective phrase)

20 (Syntactic tree for the sentence, shown as an image in the original slides.)

21 Phrase Structure Rules (PSRs) We can give rules for building these phrases. That is, we want a way to say that a determiner and a noun make up a noun phrase, but a verb and an adverb do not. Phrase structure rules (PSRs) are a way to build larger constituents from smaller ones. e.g. S → NP VP This says: A sentence (S) constituent is composed of a noun phrase (NP) constituent and a verb phrase (VP) constituent. (hierarchy) The NP must precede the VP. (linear order)

22 Some other English rules
NP → Det N (the cat, a house, this computer)
NP → Det AdjP N (the happy cat, a really happy house)
We can combine these by putting parentheses around AdjP, indicating that it is optional. (Note that this is a different use of parentheses than with regular expressions.)
NP → Det (AdjP) N
AdjP → (Adv) Adj (really happy)
VP → V (laugh, run, eat)
VP → V NP (love John, hit the wall, eat cake)
VP → V NP NP (give John the ball)
PP → P NP (to the store, at John, in a New York minute)
NP → NP PP (the cat on the stairs)
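These rules can be encoded directly as a Python dictionary and used to generate sentences top-down. The mini-lexicon below is invented for illustration, and the left-recursive NP → NP PP rule is omitted because a naive generator could loop on it:

```python
import random

# A subset of the PSRs above, plus a tiny invented lexicon.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["wall"]],
    "V":   [["ran"], ["hit"]],
}

def generate(symbol, rng):
    """Expand a category top-down into a list of words."""
    if symbol not in RULES:               # terminal: an actual word
        return [symbol]
    words = []
    for sym in rng.choice(RULES[symbol]):
        words += generate(sym, rng)
    return words

print(" ".join(generate("S", random.Random(1))))
```

Every generated string starts with a determiner (S → NP VP, NP → Det N) and is either 3 or 5 words long, depending on which VP rule is chosen.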

23 PSRs and Trees With every phrase structure rule (PSR), you can draw a tree for it.

24 PSR Practice Try analyzing these sentences and drawing trees for them, based on the PSRs given above. The man in the kitchen drives a truck. That dang cat squeezed some fresh orange juice. The mouse in the corner by the stairs ate the cheese.

25 Properties of Phrase Structure Rules generative = a schematic strategy that describes a set of sentences completely. potentially (structurally) ambiguous = a string may have more than one analysis: The mouse in [the corner [by the stairs]] vs. [The mouse [in the corner] [by the stairs]]; I saw the man with the telescope. hierarchical = categories have internal structure; they aren’t just linearly ordered. recursive = a rule can be reapplied (within its hierarchical structure), e.g. NP → NP PP, PP → P NP. The property of recursion means that the set of potential sentences in a language is infinite.

26 Coordination One type of phrase we have not mentioned yet is the coordinate phrase, for example John and Mary. Coordination can generally apply to any kind of (identical) phrases. This makes it ambiguous and causes problems for parsing: (5) I saw John and Mary left early. At some point, a parser has to decide between and joining NPs and and joining Ss.

27 Context-free grammars A context-free grammar (CFG) is essentially a collection of phrase structure rules. Each rule must have: a left-hand side (LHS): a single non-terminal element = a (phrasal or lexical) category; a right-hand side (RHS): a mixture of non-terminal and terminal elements, where terminal elements = actual words. A CFG tries to capture a natural language completely. Why “context-free”? Because these rules make no reference to any surrounding context, i.e. you can’t restrict a rule like PP → P NP to apply only when there is a verb phrase (VP) to the left.

28 Formal definition of CFGs 1. N: a set of non-terminal (phrasal) symbols, e.g., NP, VP, etc. 2. Σ: a set of terminal (lexical) symbols; N and Σ are disjoint 3. P: a set of productions (rules) of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals 4. S: a designated start symbol

29 The language defined by a CFG Derivation: A directly derives β if there is a rule of the form A → β. A chain of derivations can be established, such that A derives a string: A ⇒ α1 ⇒ α2 ⇒ ... ⇒ αm. We can thus define a language as the set of strings which are derivable from the designated start symbol S. Sentences in the language are grammatical.
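A leftmost derivation can be traced mechanically; the toy grammar below (rules invented for illustration) has one rule per category, so the derivation is deterministic:

```python
# Each step rewrites the leftmost non-terminal, illustrating A ⇒ α1 ⇒ ... ⇒ αm.
RULES = {
    "S": ("NP", "VP"), "NP": ("Det", "N"), "VP": ("V",),
    "Det": ("the",), "N": ("cat",), "V": ("slept",),
}

def leftmost_derivation(start="S"):
    form, steps = [start], [[start]]
    while any(sym in RULES for sym in form):
        # find the leftmost non-terminal and rewrite it
        i = next(j for j, s in enumerate(form) if s in RULES)
        form[i:i + 1] = RULES[form[i]]
        steps.append(list(form))
    return steps

for step in leftmost_derivation():
    print(" ".join(step))
# S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the cat VP ⇒ the cat V ⇒ the cat slept
```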

30 Pushdown automata Pushdown automaton = the computational implementation of a context-free grammar. It uses a stack (its memory device) and has two operations: push = put an element onto the top of the stack; pop = take the topmost element from the stack. This has the property of being last in, first out (LIFO). So, when you have a rule like PP → P NP, you push NP onto the stack and then push P onto it. If you find a preposition (e.g. on), you pop P off of the stack and now you know that the next thing you need is an NP.
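A minimal sketch of this stack discipline as a top-down recognizer, with a toy grammar and lexicon (both invented here): for PP → P NP we push NP first and P second, so P is matched first, exactly as described above.

```python
# Toy grammar and lexicon for illustration only.
GRAMMAR = {"PP": ["P", "NP"], "NP": ["Det", "N"]}
LEXICON = {"on": "P", "the": "Det", "stairs": "N"}

def recognize(words, goal="PP"):
    stack = [goal]
    for word in words:
        # expand non-terminals on top of the stack until a lexical
        # category surfaces
        while stack and stack[-1] in GRAMMAR:
            top = stack.pop()
            for sym in reversed(GRAMMAR[top]):   # push rightmost first (LIFO)
                stack.append(sym)
        # the word must match the predicted lexical category
        if not stack or LEXICON.get(word) != stack.pop():
            return False
    return not stack    # success only if all predictions were consumed

print(recognize(["on", "the", "stairs"]))  # True for this toy grammar
```

A full parser would also handle the choice between competing rules (e.g. via backtracking), which this sketch omits.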

31 Writing grammar correction rules So, with context-free grammars, we can now write some correction rules, which we will just sketch here. A baseball teams were successful. A followed by a plural NP: change A → The. John at the taco. The structure of this sentence is NP PP, but that doesn’t make up a whole sentence. We need a verb somewhere (ate, not at).

32 CFGs and natural language Can we just use regular languages/finite-state automata for natural languages? (6) a. The mouse escaped. b. The mouse that the cat chased escaped. c. The mouse that the cat that the dog saw chased escaped. d.... (7) a. aa b. abba c. abccba d.... Center-embedding of arbitrary depth must be captured to model language competence, and this is not possible with a finite-state automaton.
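The recursive rule behind center-embedding can be written as S → a S b | ε; applying the first rule n times and then the second generates aⁿbⁿ, the classic non-regular pattern behind examples (6) and (7). A sketch:

```python
def derive(n: int) -> str:
    # apply S → a S b  n times, then S → ε
    return "" if n == 0 else "a" + derive(n - 1) + "b"

print(derive(3))  # "aaabbb": no finite-state automaton can check this
                  # nesting for arbitrary n, but one CFG rule captures it
```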

33 Grammar Equivalency & Normal Form Weak/strong grammar equivalency. Chomsky Normal Form (CNF): every rule has the form A → B C or A → a. A longer rule such as A → B C D can be binarized as A → B X, X → C D, or equivalently as A → X D, X → B C.
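The binarization step shown above is mechanical; a sketch that introduces fresh non-terminals X1, X2, ... for rules with more than two right-hand-side symbols:

```python
def binarize(lhs, rhs):
    """Split A → B C D ... into binary rules via fresh non-terminals.

    Sketch only: fresh names X1, X2, ... are reused across calls, so a
    real converter would need globally unique symbols.
    """
    rules, cur, n = [], lhs, 0
    while len(rhs) > 2:
        n += 1
        new = f"X{n}"
        rules.append((cur, [rhs[0], new]))   # peel off the first symbol
        cur, rhs = new, rhs[1:]
    rules.append((cur, list(rhs)))
    return rules

print(binarize("A", ["B", "C", "D"]))
# [('A', ['B', 'X1']), ('X1', ['C', 'D'])]
```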

34 Some Grammar Rules for English (Sentence-Level Constructions)
Declarative: S → NP VP
Imperative: S → VP
Yes-no question: S → Aux NP VP
Wh-phrases are built around a wh-word (who, whose, when, where, what, which, how, why)
Wh-subject question: S → Wh-NP VP
Wh-non-subject question: S → Wh-NP Aux NP VP
Long-distance dependencies

35 The Noun Phrase
The determiner: Det → NP ′s
The nominal: Nominal → Noun
Before the head noun: cardinal numbers, ordinal numbers, quantifiers, adjective phrases
After the head noun:
Gerundive postmodifiers: Nominal → Nominal GerundVP; GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP; GerundV → being | arriving | leaving | ...
Relative clauses, introduced by a relative pronoun (who, that, ...): subject relative: Nominal → Nominal RelClause; RelClause → (who | that) VP

36 Agreement What flights leave in the morning? *What flight leave in the morning? S → Aux NP VP splits into S → 3sgAux 3sgNP VP and S → Non3sgAux Non3sgNP VP, with 3sgAux → does | has | can | ... and Non3sgAux → do | have | can | ... This doubles the grammar just to handle subject/verb agreement. What about agreement between determiners and nouns (these flights)? Using features and unification instead avoids increasing the grammar size.
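The features idea can be sketched very simply: instead of doubling the rules, attach a number feature to each lexical entry and check subject-verb agreement by matching features. The lexicon entries below are invented for illustration and stand in for full unification:

```python
# Toy feature-annotated lexicon (illustrative, not a real unification grammar).
LEX = {
    "flights": {"cat": "N", "num": "pl"},
    "flight":  {"cat": "N", "num": "sg"},
    "leave":   {"cat": "V", "num": "pl"},
    "leaves":  {"cat": "V", "num": "sg"},
}

def agrees(subject: str, verb: str) -> bool:
    # One S → NP VP rule plus a feature check replaces the doubled
    # 3sg/Non3sg rules above.
    return LEX[subject]["num"] == LEX[verb]["num"]

print(agrees("flights", "leave"))  # True:  "flights leave"
print(agrees("flight", "leave"))   # False: *"flight leave"
```

Real unification also handles underspecified features (e.g. can, which is neutral for number), which this sketch does not.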

37 The Verb Phrase and Subcategorization VP → Verb (disappear) VP → Verb NP (prefer a morning flight) VP → Verb NP PP (leave Boston in the morning) VP → Verb PP (leaving on Thursday) Sentential complements: You [VP [V said [S there were two flights that were the cheapest]]] (VP → Verb S). Verbs may be transitive or intransitive; each verb subcategorizes for its complements, and the set of complements it takes is its subcategorization frame.

38 Subcategorization One option is extending the grammar: Verb-with-NP-complement → find | leave | repeat | ... Verb-with-S-complement → think | believe | say | ... Verb-with-Inf-VP-complement → want | try | need | ... A better option is using features and unification.

39 Auxiliaries Modal, perfect (have), progressive (be), passive (be). When auxiliaries combine, they obey the ordering modal < perfect < progressive < passive: modal perfect: could have been a contender; modal passive: will be married; perfect progressive: have been feasting; modal perfect passive: might have been prevented.

40 Coordination Conjunctions Please repeat [NP [NP the flights] and [NP the costs]] (NP → NP and NP) What flights do you have [VP [VP leaving Denver] and [VP arriving in San Francisco]] [S [S I’m interested in a flight from Dallas to Washington] and [S I’m also interested in going to Baltimore]] VP → VP and VP S → S and S

41 TREEBANKS