

Syntax and Context-Free Grammars CMSC 723: Computational Linguistics I ― Session #6 Jimmy Lin The iSchool University of Maryland Wednesday, October 7, 2009

Today’s Agenda Words… structure… meaning… Formal Grammars Context-free grammar Grammars for English Treebanks Dependency grammars Next week: parsing algorithms

Grammar and Syntax By grammar, or syntax, we mean implicit knowledge of a native speaker Acquired by around three years old, without explicit instruction It’s already inside our heads, we’re just trying to formally capture it Not the kind of stuff you were later taught in school: Don’t split infinitives Don’t end sentences with prepositions

Syntax Why should you care? Syntactic analysis is a key component in many applications Grammar checkers Conversational agents Question answering Information extraction Machine translation …

Constituency Basic idea: groups of words act as a single unit Constituents form coherent classes that behave similarly With respect to their internal structure: e.g., at the core of a noun phrase is a noun With respect to other constituents: e.g., noun phrases generally occur before verbs

Constituency: Example The following are all noun phrases in English... Why? They can all precede verbs They can all be preposed …

Grammars and Constituency For a particular language: What is the “right” set of constituents? What rules govern how they combine? Answer: not obvious and difficult That’s why there are so many different theories of grammar and competing analyses of the same data! Approach here: Very generic Focus primarily on the “machinery” Doesn’t correspond to any modern linguistic theory of grammar

Context-Free Grammars Context-free grammars (CFGs) Aka phrase structure grammars Aka Backus-Naur form (BNF) Consist of Rules Terminals Non-terminals

Context-Free Grammars Terminals We’ll take these to be words (for now) Non-Terminals The constituents in a language (e.g., noun phrase) Rules Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
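
The three ingredients above can be sketched as plain data. This is a minimal illustration, not the notation used in the course: the `Rule` class, the helper names, and the six-rule toy grammar are all assumptions made for the example. Terminals are simply the symbols that never appear on a left-hand side.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str              # exactly one non-terminal on the left
    rhs: Tuple[str, ...]  # any number of terminals and non-terminals on the right


# Toy grammar (hypothetical; chosen only to keep the example small).
RULES = [
    Rule("S", ("NP", "VP")),
    Rule("NP", ("Det", "Noun")),
    Rule("VP", ("Verb",)),
    Rule("Det", ("the",)),
    Rule("Noun", ("plane",)),
    Rule("Verb", ("left",)),
]


def nonterminals(rules):
    """Every symbol that appears on the left of some rule."""
    return {r.lhs for r in rules}


def terminals(rules):
    """Every right-hand-side symbol that is never rewritten: here, the words."""
    nts = nonterminals(rules)
    return {sym for r in rules for sym in r.rhs if sym not in nts}


print(sorted(nonterminals(RULES)))
print(sorted(terminals(RULES)))
```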

Some NP Rules Here are some rules for our noun phrases Rules 1 & 2 describe two kinds of NPs: One that consists of a determiner followed by a nominal Another that consists of proper names Rule 3 illustrates two things: An explicit disjunction A recursive definition

L0 Grammar

CFG: Formal definition

Three-fold View of CFGs Generator Acceptor Parser
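
Two of the three views can be tried out on a toy grammar. The sketch below is an assumption-laden illustration: the grammar, the words, and the function names are made up, and the exhaustive enumeration only terminates because this grammar has no recursive rules. The generator view lists every string the grammar licenses; the acceptor view answers yes or no for a given string. The parser view, which also recovers structure, is not shown.

```python
from itertools import product

# Hypothetical non-recursive toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["the"], ["a"]],
    "Noun": [["plane"], ["pilot"]],
    "Verb": [["left"], ["saw"]],
}


def generate(symbol):
    """Generator view: every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:          # a terminal, i.e. a word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


def accepts(sentence):
    """Acceptor view: is the sentence in the language of the grammar?"""
    return sentence.split() in generate("S")


print(len(generate("S")))                 # 4 NPs x (2 + 2*4) VPs = 40
print(accepts("the pilot saw a plane"))   # True
print(accepts("pilot the saw"))           # False
```

Enumerating the whole language is only feasible for tiny grammars like this one; a grammar with recursive rules generates infinitely many strings.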

Derivations and Parsing A derivation is a sequence of rule applications that Covers all tokens in the input string Covers only the tokens in the input string Parsing: given a string and a grammar, recover the derivation Derivation can be represented as a parse tree Multiple derivations?
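
A derivation can be replayed step by step. The following sketch assumes a leftmost derivation (always rewriting the leftmost non-terminal), which is one common convention and not necessarily the one used in the lecture; the rule sequence and the function name are hypothetical.

```python
def leftmost_derivation(start, rules_applied, nonterminals):
    """Apply each rule to the leftmost non-terminal and record every intermediate form."""
    form = [start]
    steps = [list(form)]
    for lhs, rhs in rules_applied:
        # Position of the leftmost symbol that can still be rewritten.
        i = next(k for k, sym in enumerate(form) if sym in nonterminals)
        if form[i] != lhs:
            raise ValueError(f"rule {lhs} -> {rhs} does not apply to {form[i]}")
        form[i:i + 1] = rhs
        steps.append(list(form))
    return steps


NONTERMINALS = {"S", "NP", "VP", "Det", "Noun", "Verb"}

# Hypothetical rule sequence deriving "a plane left".
DERIVATION = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "Noun"]),
    ("Det", ["a"]),
    ("Noun", ["plane"]),
    ("VP", ["Verb"]),
    ("Verb", ["left"]),
]

steps = leftmost_derivation("S", DERIVATION, NONTERMINALS)
for step in steps:
    print(" ".join(step))
```

The last line printed is the input string itself: the derivation covers all of its tokens and nothing else.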

Parse Tree: Example Note: equivalence between parse trees and bracket notation
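
The equivalence between trees and bracket notation can be checked by converting in both directions. In this sketch a tree is a nested tuple `(label, child, ...)` with words as plain strings; that representation, the square-bracket style, and the example tree are assumptions for illustration, since the slide's own figure is not reproduced in the transcript.

```python
def to_brackets(tree):
    """Render a nested-tuple tree as a bracketed string."""
    if isinstance(tree, str):          # a word
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"


def from_brackets(text):
    """Rebuild the nested-tuple tree from a bracketed string."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] == "[":
            pos += 1                   # skip "["
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != "]":
                children.append(parse())
            pos += 1                   # skip "]"
            return (label, *children)
        word = tokens[pos]
        pos += 1
        return word

    return parse()


TREE = ("S", ("NP", ("Det", "a"), ("Noun", "plane")), ("VP", ("Verb", "left")))

text = to_brackets(TREE)
print(text)  # [S [NP [Det a] [Noun plane]] [VP [Verb left]]]
print(from_brackets(text) == TREE)  # True
```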

Natural vs. Programming Languages Wait, don’t we do this for programming languages? What’s similar? What’s different?

An English Grammar Fragment Sentences Noun phrases Issue: agreement Verb phrases Issue: subcategorization

Sentence Types Declaratives: A plane left. S → NP VP Imperatives: Leave! S → VP Yes-No Questions: Did the plane leave? S → Aux NP VP WH Questions: When did the plane leave? S → WH-NP Aux NP VP

Noun Phrases Let’s consider these rules in detail: NPs are a bit more complex than that! Consider: “All the morning flights from Denver to Tampa leaving before 10”

A Complex Noun Phrase “head” = central, most critical part of the NP “stuff that comes before” “stuff that comes after”

Determiners Noun phrases can start with determiners... Determiners can be Simple lexical items: the, this, a, an, etc. (e.g., “a car”) Or simple possessives (e.g., “John’s car”) Or complex recursive versions thereof (e.g., John’s sister’s husband’s son’s car)

Premodifiers Come before the head Examples: Cardinals, ordinals, etc. (e.g., “three cars”) Adjectives (e.g., “large car”) Ordering constraints “three large cars” vs. “?large three cars”

Postmodifiers Naturally, come after the head Three kinds Prepositional phrases (e.g., “from Seattle”) Non-finite clauses (e.g., “arriving before noon”) Relative clauses (e.g., “that serve breakfast”) Similar recursive rules to handle these Nominal → Nominal PP Nominal → Nominal GerundVP Nominal → Nominal RelClause
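
Because each of these rules has Nominal on both sides, postmodifiers can be stacked without limit. The sketch below applies the recursive rules one postmodifier at a time to part of the earlier example phrase. The function names are hypothetical, the starting rule Nominal → Noun is an assumption, and the inside of each postmodifier is left unanalyzed as a plain string.

```python
def attach_postmodifiers(head_noun, postmodifiers):
    """Apply Nominal -> Nominal X once per postmodifier, left to right."""
    nominal = ("Nominal", ("Noun", head_noun))   # assumed base rule: Nominal -> Noun
    for category, words in postmodifiers:
        # Each application wraps the previous Nominal in a new, larger Nominal.
        nominal = ("Nominal", nominal, (category, words))
    return nominal


def leaves(tree):
    """Read the words back off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


tree = attach_postmodifiers(
    "flights",
    [("PP", "from Denver"), ("PP", "to Tampa"), ("GerundVP", "leaving before 10")],
)

print(tree)
print(" ".join(leaves(tree)))  # flights from Denver to Tampa leaving before 10
```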

A Complex Noun Phrase Revisited

Agreement Agreement: constraints that hold among various constituents Example, number agreement in English This flight Those flights One flight Two flights *This flights *Those flight *One flights *Two flight

Problem Our NP rules don’t capture agreement constraints Accepts grammatical examples (this flight) Also accepts ungrammatical examples (*these flight) Such rules overgenerate We’ll come back to this later

Verb Phrases English verb phrases consist of Head verb Zero or more following constituents (called arguments) Sample rules:

Subcategorization Not all verbs are allowed to participate in all VP rules We can subcategorize verbs according to argument patterns (sometimes called “frames”) Modern grammars may have 100s of such classes This is a finer-grained articulation of traditional notions of transitivity

Subcategorization Sneeze: John sneezed Find: Please find [a flight to NY] NP Give: Give [me] NP [a cheaper fare] NP Help: Can you help [me] NP [with a flight] PP Prefer: I prefer [to leave earlier] TO-VP Told: I was told [United has a flight] S …

Subcategorization Subcategorization at work: *John sneezed the book *I prefer United has a flight *Give with a flight But some verbs can participate in multiple frames: I ate I ate the apple How do we formally encode these constraints?
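
One way to picture these constraints, before asking how to encode them in a grammar, is a lookup table from each verb to the argument frames it allows. This is only a sketch: the table holds just the frames mentioned in the examples above (real verbs typically allow more), and the names `SUBCAT` and `licenses` are hypothetical. It is a check that sits outside the CFG, not a CFG encoding.

```python
# Frames taken from the examples above; each frame is a tuple of argument categories.
SUBCAT = {
    "sneeze": {()},                 # John sneezed
    "find": {("NP",)},              # find [a flight to NY]
    "give": {("NP", "NP")},         # give [me] [a cheaper fare]
    "help": {("NP", "PP")},         # help [me] [with a flight]
    "prefer": {("TO-VP",)},         # prefer [to leave earlier]
    "tell": {("S",)},               # was told [United has a flight]
    "eat": {(), ("NP",)},           # I ate / I ate the apple
}


def licenses(verb, args):
    """True if the verb allows exactly this sequence of argument categories."""
    return tuple(args) in SUBCAT.get(verb, set())


print(licenses("sneeze", []))        # True:  John sneezed
print(licenses("sneeze", ["NP"]))    # False: *John sneezed the book
print(licenses("prefer", ["S"]))     # False: *I prefer United has a flight
print(licenses("give", ["PP"]))      # False: *Give with a flight
print(licenses("eat", ["NP"]))       # True:  I ate the apple
```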

Why? As presented, the various rules for VPs overgenerate: John sneezed [the book] NP Allowed by the second rule…

Possible CFG Solution Encode agreement in non-terminals: SgS → SgNP SgVP PlS → PlNP PlVP SgNP → SgDet SgNom PlNP → PlDet PlNom PlVP → PlV NP SgVP → SgV NP Can use the same trick for verb subcategorization

Possible CFG Solution Critique? It works… But it’s ugly… And it doesn’t scale (explosion of rules) Alternatives? Multi-pass solutions
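
The explosion of rules is easy to quantify. In the sketch below, a trailing `+` marks symbols that must share agreement features; that marker convention, the three base rules, and the function name are assumptions made for the example. Copying the base rules once per feature combination reproduces the six Sg/Pl rules for number alone, and the count multiplies again with each added feature.

```python
from itertools import product

# Base rules; "+" marks symbols that must agree with each other (hypothetical convention).
BASE = [
    ("S+", ("NP+", "VP+")),
    ("NP+", ("Det+", "Nom+")),
    ("VP+", ("V+", "NP")),      # the object NP does not agree with the verb
]


def specialize(rules, feature_values):
    """Copy every rule once per combination of feature values."""
    prefixes = ["".join(combo) for combo in product(*feature_values)]
    out = []
    for lhs, rhs in rules:
        for prefix in prefixes:
            def mark(symbol, p=prefix):
                # Replace the "+" marker with the feature prefix, e.g. NP+ -> SgNP.
                return p + symbol[:-1] if symbol.endswith("+") else symbol
            out.append((mark(lhs), tuple(mark(s) for s in rhs)))
    return list(dict.fromkeys(out))   # drop duplicates from rules with no marked symbol


number_only = specialize(BASE, [["Sg", "Pl"]])
with_person = specialize(BASE, [["Sg", "Pl"], ["1", "2", "3"]])

for lhs, rhs in number_only:
    print(lhs, "->", " ".join(rhs))
print(len(number_only), len(with_person))   # 6 18
```

As in the rules above, the unmarked object NP would still need its own rules linking it to SgNP and PlNP, which adds further to the total.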

Three-fold View of CFGs Generator Acceptor Parser

The Point CFGs have just about the right amount of machinery to account for basic syntactic structure in English Lots of issues though... Good enough for many applications! But there are many alternatives out there…

Treebanks Treebanks are corpora in which each sentence has been paired with a parse tree Hopefully the right one! These are generally created: By first parsing the collection with an automatic parser And then having human annotators correct each parse as necessary But… Detailed annotation guidelines are needed Explicit instructions for dealing with particular constructions

Penn Treebank The Penn Treebank is a widely used treebank 1 million words from the Wall Street Journal Treebanks implicitly define a grammar for the language

Penn Treebank: Example

Treebank Grammars Such grammars tend to be very flat Recursion avoided to ease annotators’ burden Penn Treebank has 4500 different rules for VPs, including… VP → VBD PP VP → VBD PP PP VP → VBD PP PP PP VP → VBD PP PP PP PP
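
A treebank defines a grammar implicitly because every node in every tree can be read off as a rule. The sketch below does this for a single hand-written tree with a flat VP; the tree, its words, and the nested-tuple representation are assumptions for illustration and are not taken from the Penn Treebank itself.

```python
from collections import Counter


def productions(tree):
    """Yield one (lhs, rhs) rule per internal node of a nested-tuple tree."""
    if isinstance(tree, str):          # a word has no rule of its own
        return
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        yield from productions(child)


# Hypothetical tree with a flat VP: both PPs are sisters of the verb.
TREE = (
    "VP",
    ("VBD", "flew"),
    ("PP", ("IN", "from"), ("NP", ("NNP", "Denver"))),
    ("PP", ("IN", "to"), ("NP", ("NNP", "Tampa"))),
)

counts = Counter(productions(TREE))
for (lhs, rhs), n in counts.items():
    print(n, lhs, "->", " ".join(rhs))
```

Run over a whole treebank, the same counting would give both the rule inventory and the rule frequencies. A flat analysis yields a separate VP rule for each number of PPs, which is how the rule count grows so large.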

Why treebanks? Treebanks are critical to training statistical parsers Also valuable to linguists when investigating phenomena

Dependency Grammars CFGs focus on constituents Non-terminals don’t actually appear in the sentence So what if you got rid of them? In dependency grammar, a parse is a graph where: Nodes represent words Edges represent dependency relations between words (typed or untyped, directed or undirected)

Dependency Relations

Example Dependency Parse They hid the letter on the shelf Compare with constituent parse… What’s the relation?
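
A dependency parse of this sentence can be written down as a list of typed, directed edges between word positions. The analysis below is one plausible reading, assumed for illustration: it attaches "on the shelf" to the verb, although the phrase could also modify "letter". The relation labels are informal placeholders, not the label set from the lecture.

```python
WORDS = ["They", "hid", "the", "letter", "on", "the", "shelf"]

# (head index, dependent index, relation); labels are illustrative only.
EDGES = [
    (1, 0, "subject"),       # hid -> They
    (1, 3, "object"),        # hid -> letter
    (3, 2, "determiner"),    # letter -> the
    (1, 4, "location"),      # hid -> on   (assumed attachment to the verb)
    (4, 6, "prep-object"),   # on -> shelf
    (6, 5, "determiner"),    # shelf -> the
]


def head_of(edges):
    """Map each dependent to its single head, rejecting words with two heads."""
    heads = {}
    for head, dependent, _relation in edges:
        if dependent in heads:
            raise ValueError(f"word {dependent} has more than one head")
        heads[dependent] = head
    return heads


heads = head_of(EDGES)
roots = [i for i in range(len(WORDS)) if i not in heads]

for head, dependent, relation in EDGES:
    print(f"{WORDS[head]} --{relation}--> {WORDS[dependent]}")
print("root:", [WORDS[i] for i in roots])   # root: ['hid']
```

Every node here is a word of the sentence; there are no NP or VP nodes, which is the main contrast with the constituent parse.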

Summary CFG can be used to capture various facts about the structure of language Agreement and subcategorization cause problems… And there are alternative formalisms Treebanks as an important resource for NLP Next week: parsing