101035 中文信息处理 Chinese NLP Lecture 8
Sentences: Grammatical Analysis (1)
Basics of grammatical analysis
Formal grammars
Context-free grammars
Dependency grammar
Basics of Grammatical Analysis
Constituency
Grammar, or strictly speaking syntax, is about how words are put together to make sentences.
A constituent is a group of words that functions as a single unit with a certain syntactic role.
A constituent stands in certain grammatical relations to other constituents.
Examples of Constituents: English noun phrases
English noun phrases appear in similar syntactic environments, for example before a verb. The individual words inside a noun phrase cannot appear in those environments on their own.
Noun phrases: Harry the Horse; a high-class spot such as Mindy's; the Broadway coppers; the reason he comes into the Hot Box
Grammatical:
a high-class spot such as Mindy's attracts ...
the Broadway coppers love ...
Ungrammatical:
* a high-class attracts ...
* the love ...
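Examples of Constituents: Chinese phrases with 把 (ba) and 被 (bei)
Structural account

老师被迟到的学生逗乐了。 (The teacher was amused by the student who came late.)
= 迟到的学生把老师逗乐了。 (The student who came late amused the teacher.)
≠ * 老师被迟到的学生被逗乐了。

老师被冤枉的事情传开了。 (The news that the teacher had been wronged spread.)
≠ * 冤枉的事情把老师传开了。
= 老师被冤枉的事情被传开了。 (The news that the teacher had been wronged was spread.)

电话被监听的老师找到了。 (Structurally ambiguous: both paraphrases below are possible.)
= 监听的老师把电话找到了。 (The teacher who was monitoring found the phone.)
= 电话被监听的老师被找到了。 (The teacher whose phone was tapped was found.)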
Formal Grammars
Enumeration
In principle, the grammar of a language could be given as the set of all its sentences, enumerated one by one.
But we cannot exhaust all possible sentences, and a fixed list cannot deal with new sentences.
Instead, we should use recursive rules that describe sentences in terms of their internal structure.
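Regular expressions
Symbols of a language (parts of speech): ART (article), PRON (pronoun), N (noun), V (verb), ADJ (adjective), ADV (adverb)
Combination patterns of the symbols: ART+N; ART+N+V; ART+ADJ+N+V
Regular expression symbols (a plain + joins adjacent symbols; a + written directly after a symbol is the repetition operator):
*: occurs zero or more times, e.g. ART+ADJ*+N
+: occurs one or more times, e.g. ART+ADJ++N
( ): occurs zero or one time, e.g. ART+(ADJ)+N
|: disjunction, e.g. N | PRON + V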
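The slide notation can be tried out with Python's re module. This is only a sketch: the encoding of a tag sequence as a string in which every tag is followed by one space, and the names PATTERNS and matches, are assumptions made for illustration and are not part of the lecture.

```python
import re

# Slide notation -> Python regex over an encoded tag string.
# Assumption: every POS tag is followed by exactly one space,
# so the sequence ART ADJ N is encoded as "ART ADJ N ".
PATTERNS = {
    "ART+ADJ*+N": r"ART (ADJ )*N ",   # zero or more adjectives
    "ART+ADJ++N": r"ART (ADJ )+N ",   # one or more adjectives
    "ART+(ADJ)+N": r"ART (ADJ )?N ",  # at most one adjective
}

def matches(pattern_name, tags):
    """Return True if the whole tag sequence fits the named pattern."""
    encoded = "".join(tag + " " for tag in tags)
    return re.fullmatch(PATTERNS[pattern_name], encoded) is not None

print(matches("ART+ADJ*+N", ["ART", "N"]))
print(matches("ART+ADJ++N", ["ART", "N"]))
print(matches("ART+(ADJ)+N", ["ART", "ADJ", "ADJ", "N"]))
```

Note that the optional ( ) of the slides corresponds to ? in Python, where parentheses only group.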
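In-Class Exercise
Write a regular expression that can describe all the following sentences.
老张是一个环卫工。 (Lao Zhang is a sanitation worker.)
老张是一个聪明的环卫工。 (Lao Zhang is a clever sanitation worker.)
老张是一个聪明勤劳的环卫工。 (Lao Zhang is a clever and hardworking sanitation worker.)
他是一个聪明的人。 (He is a clever person.)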
Rules in a Formal Grammar
A set of rules, or productions, expresses the ways in which symbols of the language can be grouped and ordered together.
Typical symbols: S (sentence), NP (noun phrase), VP (verb phrase), PP (prepositional phrase)
Formal Definition of a Formal Grammar
N: a set of non-terminal symbols (or variables)
Σ: a set of terminal symbols (disjoint from N)
R: a set of rules or productions, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
S: a designated start symbol
Example rules: S → NP VP, NP → Det N, VP → V NP, PP → Prep NP
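The four-part definition can be written down directly as a data structure. The sketch below is illustrative only: the class name Grammar, the helper leftmost_derivation, and the lexical rules (Det → the, N → dog, and so on) are assumptions added so that a derivation can reach terminal symbols; only the four phrase rules come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    rules: tuple              # R: pairs (A, β), with β a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Check the conditions stated in the formal definition."""
        if self.nonterminals & self.terminals:
            return False      # N and Σ must be disjoint
        if self.start not in self.nonterminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )

def leftmost_derivation(grammar, rule_indices):
    """Apply the chosen rules, always rewriting the leftmost non-terminal."""
    form = [grammar.start]
    steps = [tuple(form)]
    for i in rule_indices:
        lhs, rhs = grammar.rules[i]
        pos = next(k for k, s in enumerate(form) if s in grammar.nonterminals)
        if form[pos] != lhs:
            raise ValueError(f"rule {i} does not rewrite {form[pos]}")
        form[pos:pos + 1] = rhs
        steps.append(tuple(form))
    return steps

RULES = (
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),       # 0, 1 (from the slide)
    ("VP", ("V", "NP")), ("PP", ("Prep", "NP")),     # 2, 3 (from the slide)
    ("Det", ("the",)), ("N", ("dog",)),              # 4, 5 (assumed lexicon)
    ("N", ("cat",)), ("V", ("chased",)),             # 6, 7 (assumed lexicon)
    ("Prep", ("in",)),                               # 8    (assumed lexicon)
)
G = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "PP", "Det", "N", "V", "Prep"}),
    terminals=frozenset({"the", "dog", "cat", "chased", "in"}),
    rules=RULES,
    start="S",
)

steps = leftmost_derivation(G, [0, 1, 4, 5, 2, 7, 1, 4, 6])
print(" ".join(steps[-1]))
```

Each element of steps is one sentential form, so printing the whole list shows the derivation from S down to the terminal string.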
Context-Free Grammars
Definition
Context-free grammars (CFGs) are a kind of formal grammar and the most commonly used mathematical system for modeling the constituent structure of a language.
They are also called phrase-structure grammars.
Parse tree
A parse tree is a tree structure that shows how the rules of a CFG are applied in sequence to expand a non-terminal node into terminal nodes.
NP → Det Nominal
Det → a
Nominal → Noun
Noun → flight
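The four rules above license exactly one tree for the phrase "a flight". A minimal sketch of that tree as nested tuples follows; the tuple encoding (label first, children after) and the function names leaves and rules_used are illustrative choices, not something the lecture prescribes.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# A child that is a plain string is a terminal (a word).
tree = ("NP", ("Det", "a"), ("Nominal", ("Noun", "flight")))

def leaves(node):
    """Collect the terminal symbols from left to right."""
    if isinstance(node, str):
        return [node]
    _, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def rules_used(node):
    """List the CFG rules applied in the tree, top-down and left to right."""
    if isinstance(node, str):
        return []
    label, *children = node
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules = [f"{label} → {' '.join(rhs)}"]
    for child in children:
        rules.extend(rules_used(child))
    return rules

print(leaves(tree))
print(rules_used(tree))
```

Reading off rules_used(tree) recovers the rule sequence listed on the slide, which is what it means for the tree to show how the rules are applied.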
An English Example (Lexicon): I prefer a morning flight.
An English Example (Grammar): I prefer a morning flight.
An English Example (Parse Tree): I prefer a morning flight.
Chinese Examples
Treebanks
A treebank is a corpus in which every sentence is syntactically annotated with a parse tree.
Treebanks are invaluable resources for NLP, especially for parsing.
The Penn Treebank is a representative example.
Samples from the Penn Treebank.
Chomsky Normal Form
A CFG is in Chomsky Normal Form (CNF) if each production is either of the form A → B C or A → a. That is, the right-hand side of each rule has either two non-terminal symbols or one terminal symbol.
Conversion to CNF
VP → VBD NP PP
VP → VP PP
VP → VBD NP PP*
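One step of converting to CNF is splitting rules whose right-hand side is longer than two symbols. The sketch below shows only that binarization step, under stated assumptions: the naming scheme for the new symbols (such as VP|<NP-PP>) is a hypothetical convention, POS tags such as VBD are treated as non-terminals, and the starred rule VP → VBD NP PP* is not handled because it is not a plain CFG rule. A full conversion would also deal with unit productions, empty productions, and terminals mixed into longer rules.

```python
def binarize(rules):
    """Split every rule with more than two right-hand symbols into binary rules."""
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        current = lhs
        while len(rhs) > 2:
            # Hypothetical naming convention for the introduced symbol.
            new_symbol = f"{lhs}|<{'-'.join(rhs[1:])}>"
            out.append((current, (rhs[0], new_symbol)))
            current, rhs = new_symbol, rhs[1:]
        out.append((current, rhs))
    return out

def is_cnf(rules, nonterminals):
    """True if every rule is A -> B C (non-terminals) or A -> a (terminal)."""
    for _, rhs in rules:
        if len(rhs) == 2 and all(s in nonterminals for s in rhs):
            continue
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue
        return False
    return True

original = [("VP", ("VBD", "NP", "PP")), ("VP", ("VP", "PP"))]
binary = binarize(original)
for lhs, rhs in binary:
    print(lhs, "→", " ".join(rhs))
```

The binary rules generate the same strings as the original ones; the extra symbol only records which part of the long rule is still to be expanded.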
Dependency Grammar
Definition
Dependency grammar describes the syntactic structure of a sentence purely in terms of words and binary semantic or syntactic relations between these words.
Dependency relations are directional.
Unlike a CFG, it has no structural levels or non-terminal nodes.
A Chinese Example: 那个小孩喜欢通俗歌曲 (That child likes popular songs.)
Dependency Tree
Dependency Graph
Axioms of Dependency
(1) Only one constituent in a sentence is independent.
(2) All the other constituents in the sentence are dependent on some constituent.
(3) No constituent is dependent on two or more other constituents.
(4) If A is dependent on B and C is situated between A and B in the sentence, then C is dependent on A, on B, or on a constituent between A and B.
Conditions of Dependency Tree
Single Type Node: A dependency tree has only terminal nodes and no non-terminal nodes.
Single Parent Node: The root node is the only node without a parent. Every other node has exactly one parent node.
Unique Root Node: A dependency tree has only one root node, which governs all the other nodes.
Non-overlapping: The branches of a dependency tree cannot cross each other.
Mutual exclusiveness: The relations of governing and preceding are mutually exclusive. If two nodes have a "governing" relation between them, they cannot have a "preceding" relation.
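Some of these conditions can be checked mechanically. The sketch below is a hedged illustration: it encodes a tree as a list of head positions (1-based, with 0 marking the root), which makes the single-parent condition hold by construction, and it tests the unique-root and non-overlapping conditions. The head assignments for 那个小孩喜欢通俗歌曲 are an assumed analysis, since the original figure is not available, and the function names are hypothetical.

```python
def has_unique_root(heads):
    """Exactly one word has head 0 (no parent)."""
    return sum(1 for h in heads if h == 0) == 1

def reaches_root(heads):
    """Every word reaches the root by following head links, with no cycle."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def has_crossing_arcs(heads):
    """True if two arcs cross (the arc into the root word is ignored)."""
    arcs = [(min(d, h), max(d, h))
            for d, h in enumerate(heads, start=1) if h != 0]
    return any(a < c < b < d for a, b in arcs for c, d in arcs)

words = ["那个", "小孩", "喜欢", "通俗", "歌曲"]
# Assumed analysis: 那个 <- 小孩 <- 喜欢 (root), 通俗 <- 歌曲 <- 喜欢
heads = [2, 3, 0, 5, 3]

print(has_unique_root(heads), reaches_root(heads), has_crossing_arcs(heads))
```

A list such as [3, 4, 0, 3] still forms a tree with one root, but its arcs 1-3 and 2-4 cross, so it violates the non-overlapping condition.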
Dependency Relations
There are more than 50 dependency relations in English (Stanford Parser).

Relation  Meaning                   Example                               Dependency
amod      adjectival modifier       Sam eats red meat                     amod(meat, red)
dobj      direct object             She gave me a raise                   dobj(gave, raise)
nsubj     nominal subject           Clinton defeated Dole                 nsubj(defeated, Clinton)
pcomp     prepositional complement  They heard about you missing classes  pcomp(about, missing)
tmod      temporal modifier         Last night, I swam in the pool        tmod(swam, night)
In-Class Exercise
Given the sentence "The sausage was eaten by his dog", complete the following dependency relations by choosing from the list {nsubj, amod, dobj, pcomp, tmod}.
_____(eat, sausage)
_____(eat, dog)
Heads and Dependency
Each syntactic constituent can be associated with a lexical head: N is the head of an NP, V is the head of a VP, and so on.
Example: Workers dumped sacks into a bin.
Heads and Dependency
A dependency graph can be derived automatically from a context-free parse by using head rules.
Example: Vinken will join the board as a nonexecutive director Nov 29.
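A minimal sketch of this derivation, using the shorter sentence "Workers dumped sacks into a bin". Everything specific here is an assumption for illustration: the bracketing of the sentence, the simplified HEAD_RULES table (real head-rule tables are much larger), and the function names. Words are used directly as node identifiers, which only works because no word occurs twice.

```python
# Constituency tree as nested tuples: (label, child, ...); a string child is a word.
tree = ("S",
        ("NP", ("NNS", "Workers")),
        ("VP", ("VBD", "dumped"),
               ("NP", ("NNS", "sacks")),
               ("PP", ("IN", "into"),
                      ("NP", ("DT", "a"), ("NN", "bin")))))

# Assumed, simplified head rules: for each phrase label, the child labels
# that may serve as its head, in order of preference.
HEAD_RULES = {"S": ["VP"], "VP": ["VBD"], "NP": ["NN", "NNS"], "PP": ["IN"]}

def head_child(label, children):
    """Pick the head child according to HEAD_RULES (fallback: last child)."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]

def extract(node, deps):
    """Return the lexical head of node; append (head, dependent) pairs to deps."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]            # pre-terminal: the word is its own head
    head = head_child(label, children)
    head_word = extract(head, deps)
    for child in children:
        if child is not head:
            # The lexical head of each non-head child depends on head_word.
            deps.append((head_word, extract(child, deps)))
    return head_word

deps = []
root = extract(tree, deps)
print("root:", root)
for head, dependent in deps:
    print(f"{dependent} <- {head}")
```

The resulting arcs are unlabeled; assigning relation names such as nsubj or dobj would need additional rules beyond head finding.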
Wrap-Up
Basics of grammatical analysis: Constituents; Examples
Formal grammars: Regular Expressions; Symbols and Rules; Formal Definition
Context-free grammars: Parse Tree; Examples; Treebanks
Dependency grammar: Axioms; Dependency Tree and Graph; Dependency Relations; Heads and Dependency