Published by Jennifer Robbins. Modified over 8 years ago.
1
CSA3050: NLP Algorithms Sentence Grammar NLP Algorithms
2
Introduction
This lecture has two aims:
  Crash course in sentence-level grammar
  Show how different linguistic phenomena can be captured by grammar rules
See also:
  Jurafsky and Martin, Chapter 9
  Internet Grammar of English
3
Part I: Grammar of English
4
Different Kinds of Rule
Morphological rules govern how words may be composed: re+invest+ing = reinvesting.
Syntactic rules govern how words and constituents combine to form grammatical sentences.
Semantic rules govern how meanings may be combined.
5
Syntax: Why?
You need knowledge of syntax in many applications:
  Parsing
  Grammar checkers
  Question answering/database access
  Information extraction
  Generation
  Translation
Full versus superficial analysis?
6
Levels of Grammar Organisation
Word Classes: different parts of speech (POS).
Phrase Classes: sequences of words inheriting the characteristics of certain word classes.
Clause Classes: sequences of phrases containing at least one verb phrase.
On the basis of these one may define:
  Grammatical Relations: the roles played by constituents, e.g. subject, predicate, object
  Syntax-Semantics interface
7
Word Classes
Closed classes.
  determiners: the, a, an, four.
  pronouns: it, he etc.
  prepositions: by, on, with.
  conjunctions: and, or, but.
Open classes.
  nouns refer to objects or concepts: cat, beauty, Coke.
  adjectives describe or qualify nouns: fried chickens.
  verbs describe what the noun does: John jumps.
  adverbs describe how it is done: John runs quickly.
8
Word Class Characteristics
Different word classes have characteristic subclasses and properties.

Class      Subclasses                Properties
Noun       proper; mass; count       number; gender
Verb       transitive; intransitive  number; gender; person; tense
Adjective  dimension; age; colour    number; gender

Further Notes on Word Classes
Nouns, for example, can be classified into different types:
  PROPER nouns are names like John or Paris.
  COMMON nouns like cow or house stand for collections of properties that characterise classes of object.
  ABSTRACT nouns like beauty stand for abstract ideas rather than physically realised objects.
  MASS nouns such as bread name substances rather than individual things. Consequently, they cannot be counted.
  COUNT nouns like loaf are precisely the opposite.
Nouns also have properties such as number (i.e. singular or plural), gender (i.e. masculine, feminine, neuter), and (in some languages like German) case (e.g. nominative, dative, accusative).
Nouns are often preceded by the words the, a, or an. These words are called determiners. They indicate the kind of reference the noun has, which can express:
  Definiteness/indefiniteness: a, the
  Proximity to speaker: this, that
  Quantity: all, both, many, several, four
Adjectives come before nouns (in English) and typically describe attributes of them. These attributes can be classified into types (e.g. colour, size, age), and in some languages there is a default ordering between the types. In English, for example, we prefer big brown dog to brown big dog.
Like nouns, adjectives have properties of number, gender and case, and “agree with” the nouns they modify. Hence mejda gbira not mejda gbir.
Adjectives can be modified by adverbs, which in English come before the adjective (e.g. largely irrelevant).
Finally, most adjectives admit comparison:
  Basic form: young, recent
  Comparative form: younger, more recent
  Superlative form: youngest, most recent
9
Phrases
Longer phrases may be used in place of a single word while fulfilling the same role in a sentence.
  Noun phrases refer to objects: four fried chickens.
  Verb phrases state what the noun phrase does: kicks the dog.
  Adjective phrases describe/qualify an object: sickly sweet.
  Adverbial phrases describe how it is done: very carefully.
  Prepositional phrases add information to a verb phrase: on the table.
10
Phrases can be Complex e.g. Noun Phrases
Proper Name or Pronoun: Monday; it
Specifiers, noun: the day
Specifier, premodifier, noun: the first wet day
Specifiers, qualifiers, noun, postmodifier: The first wet day that I enjoyed in June
11
But they all fit the same context
Each of the following noun phrases can be followed by "was sunny.":
  Monday
  It
  The day
  The first wet day
  The first wet day that I enjoyed in June
12
Clauses
A clause is a combination of noun phrases and verb phrases.
Clauses can exist at the top level (main clause) or can be embedded (subordinate clause).
  A top-level clause is a sentence, e.g. The cat ate the mouse.
  An embedded clause is subordinate, e.g. John said that Sandy is sick.
Unlike phrases, whole sentences can be used to say something complete, e.g. to state a fact or ask a question.
13
Different Kinds of Sentences
Assertion: John ate the cat.
Yes/No question: Did John eat the cat?
Wh- question: What did John eat?
Command: Eat the cat, John!
14
Part II: Context Free Grammar Rules
15
Formal Grammar
A formal grammar consists of:
  Terminal Symbols (T)
  Non Terminal Symbols (NT, disjoint from T)
  Start Symbol (a distinguished NT)
  Rewrite rules of the form α → β, where α and β are strings of symbols
Different classes of grammar result from various restrictions on the form of rules.
16
Classes of Grammar

Type  Grammars                         Languages               Machines
0     Phrase Structure (Unrestricted)  Recursively Enumerable  TM
1     Context Sensitive                Context Sensitive       LBA
2     Context Free                     Context Free            PDA
3     Regular                          Regular                 FSA

The strength of the grammar type increases from Type 3 to Type 0.
17
Restrictions on Rules
For all rules α → β:
  Type 0 (unrestricted): no restrictions
  Type 1 (context sensitive): |α| ≤ |β|
  Type 2 (context free): α is a single NT symbol
  Type 3 (regular): every rule is of the form A → aB or A → a, where A, B ∈ NT and a ∈ T
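The restrictions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the names `rule_type` and `grammar_type` are hypothetical, a rule is assumed to be a pair of symbol lists, and empty right-hand sides are deliberately left out.

```python
from typing import Sequence, Set, Tuple

Rule = Tuple[Sequence[str], Sequence[str]]  # (alpha, beta) for a rule alpha -> beta


def rule_type(alpha: Sequence[str], beta: Sequence[str], nonterminals: Set[str]) -> int:
    """Return the most restrictive type (3, 2, 1 or 0) that one rule satisfies."""
    if not alpha or not beta:
        # Assumption: empty sides (epsilon rules) are outside this sketch.
        raise ValueError("empty left- or right-hand side is not handled")
    if len(alpha) == 1 and alpha[0] in nonterminals:
        starts_with_terminal = beta[0] not in nonterminals
        if starts_with_terminal and len(beta) == 1:
            return 3  # A -> a
        if starts_with_terminal and len(beta) == 2 and beta[1] in nonterminals:
            return 3  # A -> a B
        return 2      # alpha is a single NT symbol
    if len(alpha) <= len(beta):
        return 1      # |alpha| <= |beta|
    return 0          # no restrictions


def grammar_type(rules: Sequence[Rule], nonterminals: Set[str]) -> int:
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(alpha, beta, nonterminals) for alpha, beta in rules)


# Toy grammar in the style of the lecture; n, adj and v are treated as terminals here.
NT = {"s", "np", "vp"}
example = [(["s"], ["np", "vp"]), (["np"], ["n"]), (["np"], ["adj", "n"]),
           (["np"], ["n", "np"]), (["vp"], ["v", "np"])]
print(grammar_type(example, NT))  # prints 2
```

The toy grammar comes out as Type 2 because rules such as s → np vp have a single non-terminal on the left but do not fit the A → aB pattern.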
18
Which Class for NLP?
Type 3 (Regular). Good for morphology. Cannot handle central embedding of sentences, e.g. The man that John saw eating died.
Type 2 (Context Free). OK, but has problems handling certain phenomena, e.g. agreement.
Type 1 (Context Sensitive). Computational properties not well understood. Too powerful.
Type 0 (Turing). Too powerful.
19
Weak versus Strong
A grammar class that is too weak cannot characterise/discriminate exactly the NL sentence structures.
A grammar class that is too strong has the power to characterise/discriminate structures that don't exist in human languages.
Stronger grammar, higher complexity → less efficient computations.
20
Example Grammar
Example sentences:
  Cabinet discusses police chief’s case
  French gunman kills four
Rules:
  s → np vp
  np → n
  np → adj n
  np → n np
  vp → v np
21
Classifying the Symbols
NT – symbols appearing on the left
Start – symbol appearing only on the left, from which every other symbol can be derived
T – symbols appearing only on the right
To include words we also need special rules such as
  n → [police]
  n → [gunman]
  n → [four]
These latter rules define the lexicon or “dictionary interface”.
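The three definitions can be applied directly to a rule set. This is a small sketch under assumptions: rules are written as (left, right-hand list) pairs, and the helper names `classify_symbols` and `derivable_from` are hypothetical.

```python
# Phrase-structure rules of the example grammar, as (left, right) pairs.
RULES = [
    ("s", ["np", "vp"]),
    ("np", ["n"]),
    ("np", ["adj", "n"]),
    ("np", ["n", "np"]),
    ("vp", ["v", "np"]),
]


def derivable_from(symbol, rules):
    """Return every symbol reachable from `symbol` by applying rules."""
    seen, frontier = {symbol}, [symbol]
    while frontier:
        current = frontier.pop()
        for left, right in rules:
            if left == current:
                for sym in right:
                    if sym not in seen:
                        seen.add(sym)
                        frontier.append(sym)
    return seen


def classify_symbols(rules):
    """Split the symbols into non-terminals, terminals and start candidates."""
    lhs = {left for left, _ in rules}
    rhs = {sym for _, right in rules for sym in right}
    nonterminals = lhs            # symbols appearing on the left
    terminals = rhs - lhs         # symbols appearing only on the right
    # Start: appears only on the left and derives every other symbol.
    start = [s for s in sorted(lhs - rhs) if derivable_from(s, rules) == lhs | rhs]
    return sorted(nonterminals), sorted(terminals), start


print(classify_symbols(RULES))
# (['np', 's', 'vp'], ['adj', 'n', 'v'], ['s'])
```

Without the lexical rules, n, adj and v count as terminals; adding rules such as n → [police] would move them to the non-terminal set and make the words the terminals.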
22
Grammar Induces Phrase Structure
s
  np
    adj  French
    n    gunman
  vp
    v    kills
    np
      n  four
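The structure for "French gunman kills four" can be derived mechanically from the example grammar. Below is a minimal top-down backtracking sketch; the function names and the `LEXICON` entries are assumptions made for illustration, and the approach only terminates because this grammar has no left-recursive rules.

```python
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["n"], ["adj", "n"], ["n", "np"]],
    "vp": [["v", "np"]],
}
# Assumed lexical categories for the example sentence.
LEXICON = {"French": ["adj"], "gunman": ["n"], "kills": ["v"], "four": ["n"]}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol in GRAMMAR:
        for rhs in GRAMMAR[symbol]:
            for children, end in parse_sequence(rhs, words, start):
                yield (symbol,) + children, end
    elif start < len(words) and symbol in LEXICON.get(words[start], []):
        yield (symbol, words[start]), start + 1


def parse_sequence(symbols, words, start):
    """Yield (tuple_of_trees, end) for a sequence of symbols parsed left to right."""
    if not symbols:
        yield (), start
        return
    for tree, mid in parse(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield (tree,) + rest, end


def parse_sentence(sentence):
    """Return all trees rooted in s that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("s", words, 0) if end == len(words)]


def bracket(tree):
    """Render a tree in labelled bracket notation."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"[{label} {children[0]}]"
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


for tree in parse_sentence("French gunman kills four"):
    print(bracket(tree))
# [s [np [adj French] [n gunman]] [vp [v kills] [np [n four]]]]
```

The bracketed output records dominance (nesting) and precedence (left-to-right order) but not the order in which the rules were tried.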
23
Phrase Structure
PS includes information about
  precedence between constituents
  dominance between constituents
PS constitutes a trace of the rule applications used to derive a sentence.
PS does not tell you the order in which the rules were used.
24
Handling Sentence Types
Declaratives: John left.  S → NP VP
Imperatives: Leave!  S → VP
Yes-No Questions: Did John leave?  S → Aux NP VP
WH Questions: When did John leave?  S → Wh-word Aux NP VP
25
Handling Recursive Structures
Flights to Miami
Flights to Miami from Boston
Flights to Miami from Boston in April
Flights to Miami from Boston in April on Friday
Flights to Miami from Boston in April on Friday under $300
Flights to Miami from Boston in April on Friday under $300 with lunch
26
Recursive Rules
NP → NP PP
PP → Preposition NP
19.11.2002
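The two recursive rules are enough to license any number of trailing prepositional phrases. The sketch below builds one such structure, attaching each PP to the NP built so far; `expand_np` and `leaves` are hypothetical helper names, and this left-branching attachment is only one of the structures the rules allow.

```python
def expand_np(head, pps):
    """Apply NP -> NP PP once per (preposition, noun) pair in `pps`."""
    tree = ("NP", head)
    for prep, noun in pps:
        pp = ("PP", prep, ("NP", noun))  # PP -> Preposition NP
        tree = ("NP", tree, pp)          # NP -> NP PP
    return tree


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


modifiers = [("to", "Miami"), ("from", "Boston"), ("in", "April")]
for n in range(len(modifiers) + 1):
    print(" ".join(leaves(expand_np("Flights", modifiers[:n]))))
# Flights
# Flights to Miami
# Flights to Miami from Boston
# Flights to Miami from Boston in April
```

Because NP appears on both sides of NP → NP PP, the loop can run any number of times without new rules being needed.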
27
Handling Agreement
NP → Determiner N
  Include: these days, this day
  Exclude: this days, these day
NP → NPSing
NP → NPPlur
NPSing → DetSing NSing
NPPlur → DetPlur NPlur
Agreement can involve number, gender and case.
Danger: proliferation of categories/rules.
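Splitting categories by number can be sketched as a lookup. The lexicon entries, the name `np_category` and the feature inventory used for the count are all assumptions made for illustration.

```python
from itertools import product

# Assumed lexicon: each word carries a number-specific category.
LEXICON = {"this": "DetSing", "these": "DetPlur", "day": "NSing", "days": "NPlur"}

# Number-specific NP rules: right-hand side -> left-hand side.
NP_RULES = {
    ("DetSing", "NSing"): "NPSing",
    ("DetPlur", "NPlur"): "NPPlur",
}


def np_category(phrase):
    """Return NPSing or NPPlur if the phrase agrees in number, else None."""
    categories = tuple(LEXICON.get(word) for word in phrase.split())
    return NP_RULES.get(categories)


for phrase in ["these days", "this day", "this days", "these day"]:
    print(phrase, "->", np_category(phrase))
# these days -> NPPlur
# this day -> NPSing
# this days -> None
# these day -> None

# Proliferation: one NP category per combination of feature values.
# The feature inventory below is illustrative only.
FEATURES = {
    "number": ["sing", "plur"],
    "gender": ["masc", "fem", "neut"],
    "case": ["nom", "acc", "dat"],
}
print(len(list(product(*FEATURES.values()))))  # prints 18
```

With only number the grammar needs two NP categories; with three features of this size it would need eighteen, which is the proliferation the slide warns about.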
28
Subcategorisation
Intransitive verb: no object
  John disappeared
  John disappeared the cat*
Transitive verb: one object
  John opened the window
  John opened*
Ditransitive verb: two objects
  John gave Mary the book
  John gave Mary*
29
Subcategorisation Rules
Intransitive verb: no object  VP → V
Transitive verb: one object  VP → V NP
Ditransitive verb: two objects  VP → V NP NP
If you take account of the category of items following the verb, there are about 40 different patterns like this in English.
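One way to picture subcategorisation is as a per-verb record of the complements it requires. This is a sketch under assumptions: the `SUBCAT` table and `vp_ok` are hypothetical, and each verb is given exactly one frame, matching the judgements in the lecture examples rather than the full behaviour of English verbs.

```python
# Assumed subcategorisation frames: verb -> list of required complements.
SUBCAT = {
    "disappeared": [],        # intransitive: VP -> V
    "opened": ["NP"],         # transitive:   VP -> V NP
    "gave": ["NP", "NP"],     # ditransitive: VP -> V NP NP
}


def vp_ok(verb, complements):
    """Accept a verb phrase only if the complements match the verb's frame."""
    return SUBCAT.get(verb) == list(complements)


print(vp_ok("disappeared", []))      # True:  John disappeared
print(vp_ok("disappeared", ["NP"]))  # False: John disappeared the cat
print(vp_ok("opened", ["NP"]))       # True:  John opened the window
print(vp_ok("gave", ["NP"]))         # False: John gave Mary
```

Keeping the frames in the lexicon avoids writing a separate verb category for each of the roughly 40 patterns.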
30
Overgeneration
A grammar should generate only sentences in the language. It should exclude sentences not in the language.
  s → n vp
  vp → v
  n → [John]
  v → [snore]
  v → [snores]
This grammar overgenerates: it accepts John snore as well as John snores.
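Overgeneration can be made visible by listing everything a small grammar generates. The sketch below assumes the toy grammar for this slide is s → n vp, vp → v, with John as the noun and snore/snores as verbs; `generate` is a hypothetical helper that only works for grammars without recursion.

```python
from itertools import product

# Assumed encoding of the toy grammar; words are symbols with no rules.
GRAMMAR = {
    "s": [["n", "vp"]],
    "vp": [["v"]],
    "n": [["John"]],
    "v": [["snore"], ["snores"]],
}


def generate(symbol):
    """Return every word sequence derivable from `symbol` (finite grammars only)."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a word
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in generate("s")]
print(sentences)  # ['John snore', 'John snores']
```

The first output is not an English sentence, so the grammar generates more than the language allows.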
31
Undergeneration
A grammar should generate all sentences in the language. There should not be sentences in the language that are not generated by the grammar.
  s → n vp
  vp → v
  n → [John]
  n → [gold]
  v → [found]
This grammar undergenerates: it cannot derive John found gold, because no rule lets vp contain an object.
32
Movement
John looked up the number
John looked the number up
33
Appropriate Structures
A grammar should assign linguistically plausible structures.
Flat structure:
  [s [n John] [v ate] [d a] [a juicy] [n hamburger]]
vs. hierarchical structure:
  [s [np [n John]] [vp [v ate] [np [d a] [a juicy] [n hamburger]]]]
34
Ambiguity
np → np pp
pp → prep np
(the man) (on the hill with a telescope by the sea)
(the man on the hill) (with a telescope by the sea)
(the man on the hill with a telescope) (by the sea)
etc.
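The "etc." hides a fast-growing number of structures. The sketch below counts how many parses the rules np → np pp and pp → prep np assign to a simple noun phrase followed by a given number of prepositional phrases; `count_np_parses` is a hypothetical name, and each simple NP such as "the man" is assumed to have exactly one analysis.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_np_parses(num_pps):
    """Count structures for a simple NP followed by `num_pps` prepositional phrases."""
    if num_pps == 0:
        return 1  # just the simple NP
    total = 0
    # Top rule np -> np pp: the final pp contains its own simple NP plus
    # `inside` of the trailing PPs; the left np keeps the remaining ones.
    for inside in range(num_pps):
        left = num_pps - 1 - inside
        total += count_np_parses(left) * count_np_parses(inside)
    return total


print([count_np_parses(k) for k in range(6)])
# [1, 1, 2, 5, 14, 42]
```

"the man on the hill with a telescope by the sea" has three PPs and therefore five structures under these rules; the counts are the Catalan numbers.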
35
Criteria for Evaluating Grammars
Does it undergenerate?
Does it overgenerate?
Does it assign appropriate structures to sentences it generates?
Is it simple to understand? How many rules are there?
Does it contain generalisations or is it just a collection of special cases?
How ambiguous is it?