13. LANGUAGE AND COMPLEXITY. November 3, 2007. Artificial Intelligence Lab, Han Gi-deok (한기덕). Text: Speech and Language Processing, pp. 477-498.



Contents
- 13.1 The Chomsky Hierarchy
- 13.2 How to Tell if a Language Isn’t Regular
  - The Pumping Lemma
  - Are English and Other Natural Languages Regular Languages?
- 13.3 Is Natural Language Context-Free?
- 13.4 Complexity and Human Processing

13. LANGUAGE AND COMPLEXITY
13.1 The Chomsky Hierarchy

The Chomsky Hierarchy
- Notation: A is a single non-terminal; α, β, and γ are arbitrary strings of terminal and non-terminal symbols.
- Type 0 (unrestricted) grammars
  - There are no restrictions on the form of the rules, except that the left-hand side cannot be the empty string ε.
  - Type 0 grammars characterize the recursively enumerable languages, that is, those whose strings can be listed (enumerated) by a Turing machine.
- [Figure: Turing machine]

The Chomsky Hierarchy
- Context-sensitive grammars
  - Rules rewrite a non-terminal symbol A in the context αAβ as any non-empty string of symbols γ.
  - They can be written in the form αAβ → αγβ or in the form A → γ / α_β.
- Context-free grammars
  - Context-free rules allow any single non-terminal to be rewritten as any string of terminals and non-terminals.
  - A non-terminal may also be rewritten as ε.

The Chomsky Hierarchy
- Regular grammars
  - Regular grammars can be either right-linear or left-linear.
  - A rule in a right-linear grammar has a single non-terminal on the left-hand side and at most one non-terminal on the right-hand side.
  - If there is a non-terminal on the right-hand side, it must be the last symbol in the string.
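The right-linear restriction can be checked mechanically. The following is a minimal sketch, not part of the original slides; the rule encoding (pairs of strings, uppercase letters for non-terminals, lowercase for terminals, the empty string for ε) and the name `is_right_linear` are assumptions made for illustration.

```python
def is_right_linear(rules):
    """Return True if every rule is right-linear.

    Assumed encoding (not from the slides): a rule is a (lhs, rhs) pair of
    strings, uppercase letters are non-terminals, lowercase letters are
    terminals, and the empty string stands for epsilon.
    """
    for lhs, rhs in rules:
        # The left-hand side must be exactly one non-terminal.
        if len(lhs) != 1 or not lhs.isupper():
            return False
        # A non-terminal may only appear as the last right-hand symbol.
        if any(symbol.isupper() for symbol in rhs[:-1]):
            return False
    return True


# Hypothetical example grammars.
regular = [("S", "aA"), ("A", "bA"), ("A", "b")]
context_free = [("S", "aSb"), ("S", "")]

print(is_right_linear(regular))       # True
print(is_right_linear(context_free))  # False: S appears before b in aSb
```

The second grammar generates a^n b^n and fails the check because its non-terminal sits in the middle of the right-hand side.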

How to tell if a language isn’t regular
- If English were regular, we could write regular expressions for it and process them with efficient automata.
- If English were context-free, we could write context-free rules and parse sentences with the Earley algorithm, and so on.
- Example regular expression: a+b*c?
- Example Earley dotted rule: S → NP • VP

How to tell if a language isn’t regular
- The Pumping Lemma
  - A useful tool for proving that a given language is not regular.
  - Two intuitions lie behind the pumping lemma:
    - First, if a language can be modeled by a finite automaton, we must be able to decide with a bounded amount of memory whether any string is in the language or not.
      => The memory needed must not grow in proportion to the length of the input.
      => This means, for example, that languages like a^n b^n are not likely to be regular.
    - Second, if a regular language has any long strings (longer than the number of states in the automaton), there must be some sort of loop in the automaton for the language.
      => We can use this fact by showing that if a language doesn’t have such a loop, then it can’t be regular.
  - The lemma relies on the principle that pumping a suitable substring of a string in an infinite regular language yields a string that is again in that language.
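The bounded-memory intuition can be made concrete with a small automaton for the regular expression a+b*c? mentioned earlier. This is an illustrative sketch only; the state names and the transition table are assumptions, not taken from the slides. The only thing the recognizer remembers between symbols is the current state, no matter how long the input is.

```python
# Hypothetical transition table for a+b*c? (state names are illustrative).
TRANSITIONS = {
    ("start", "a"): "as",
    ("as", "a"): "as",
    ("as", "b"): "bs",
    ("as", "c"): "done",
    ("bs", "b"): "bs",
    ("bs", "c"): "done",
}
ACCEPTING = {"as", "bs", "done"}


def accepts(string):
    """Run the automaton; the single variable `state` is all the memory used."""
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts("aaabbc"))  # True
print(accepts("abca"))    # False
```

Recognizing a^n b^n the same way would require a separate state for every possible count of as, which no finite table can provide.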

The Pumping Lemma
- [Figure: deterministic FSA]

The Pumping Lemma
- Pumping lemma. Let L be an infinite regular language. Then there are strings x, y, and z, such that y ≠ ε and xy^n z ∈ L for n ≥ 0.
- The lemma is not used for showing that a language is regular. Rather, it is used for showing that a language isn’t regular.
- Let’s use the pumping lemma to show that the language a^n b^n is not regular. Take any string in a^n b^n and any division of it into x, y, and z with y ≠ ε. There are three cases, and y cannot be pumped in any of them:
  1. y is composed only of as. (This implies that x is all as too, and z contains all the bs, perhaps preceded by some as.) But if y is all as, then for n > 1 the string xy^n z has more as than xyz. This means it has more as than bs, and so cannot be a member of the language a^n b^n.
  2. y is composed only of bs. The problem here is similar to case 1: if y is all bs, then for n > 1 the string xy^n z has more bs than xyz, and hence has more bs than as.
  3. y is composed of both as and bs (this implies that x is only as, while z is only bs). Then for n > 1 the string xy^n z must have some bs before as, and again cannot be a member of the language a^n b^n.
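The three cases can be checked by brute force on a concrete string. The sketch below is an illustration rather than a proof, since it only tries a finite range of pump counts; the function names `in_anbn` and `pumpable_splits` and the bound `max_n` are assumptions introduced here.

```python
def in_anbn(s):
    """Membership test for a^n b^n (n >= 0)."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half


def pumpable_splits(s, max_n=3):
    """Return every split (x, y, z) of s with non-empty y such that
    x + y*n + z stays in a^n b^n for all n in 0..max_n."""
    found = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            x, y, z = s[:i], s[i:j], s[j:]
            if all(in_anbn(x + y * n + z) for n in range(max_n + 1)):
                found.append((x, y, z))
    return found


print(pumpable_splits("aaabbb"))  # [] : no split survives pumping
```

Every candidate y falls into one of the three cases: all as, all bs, or a mix, and each one produces a string outside the language for some small n.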

How to tell if a language isn’t regular
- Are English and Other Natural Languages Regular Languages?
- Chomsky (1956, 1957)
  - Consider these English syntactic structures:
    - If S1, then S2
    - Either S3, or S4
    - The man who said S5 is arriving today
  - “If” must be followed by “then”, and “either” by “or”.
  - These structures can be nested: If either the man who said S5 is arriving today or the man who said S5 is arriving tomorrow, then the man who said S6 is arriving the day after..
  - Substitution:
    - if -> a
    - then -> a
    - either -> b
    - or -> b
    - other words -> ε
  - If we apply this substitution to the sentence above, we get the string abba. Strings of this kind form the mirror-image language {xx^R | x ∈ {a,b}*}, which is not a regular language.
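The substitution can be applied mechanically. The following sketch is illustrative only: the tokenization (lowercasing and splitting on non-alphanumeric characters) and the helper names are assumptions, and the sentence is written with S5 and S6 as single tokens.

```python
import re

# The substitution from the slide; every other word maps to the empty string.
SUBSTITUTION = {"if": "a", "then": "a", "either": "b", "or": "b"}


def apply_substitution(sentence):
    """Replace if/then by a, either/or by b, and delete all other words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return "".join(SUBSTITUTION.get(word, "") for word in words)


def is_mirror_image(s):
    """True if s has the form x followed by the reverse of x."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:][::-1]


sentence = ("If either the man who said S5 is arriving today or the man who "
            "said S5 is arriving tomorrow, then the man who said S6 is "
            "arriving the day after")

result = apply_substitution(sentence)
print(result)                   # abba
print(is_mirror_image(result))  # True
```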

Are English and Other Natural Languages Regular Languages?
- Partee et al. (1990)
- This proof is based on a famous class of sentences with center-embedded structures (Yngve, 1960); here is a variant of these sentences:
  - The cat likes tuna fish.
  - The cat the dog chased likes tuna fish.
  - The cat the dog the rat bit chased likes tuna fish.
  - The cat the dog the rat the elephant admired bit chased likes tuna fish.
- Since every fronted NP must have its associated verb, these sentences are of the form:
  - (the + noun)^n (transitive verb)^(n-1) likes tuna fish.
  - A = { the cat, the dog, the rat, the elephant, the kangaroo, … }
  - B = { chased, bit, admired, ate, befriended, … }
  - L = x^n y^(n-1) likes tuna fish, x ∈ A, y ∈ B
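The pattern x^n y^(n-1) can be made explicit with a small generator. This is a sketch under stated assumptions: the function name `center_embedded` and the fixed word lists are illustrative, with the verbs ordered so that each one belongs to the noun that follows the first.

```python
NOUNS = ["cat", "dog", "rat", "elephant"]
VERBS = ["chased", "bit", "admired"]  # VERBS[i] is the verb of NOUNS[i + 1]


def center_embedded(n):
    """Build the sentence with n noun phrases and n - 1 transitive verbs."""
    if not 1 <= n <= len(NOUNS):
        raise ValueError("n must be between 1 and %d" % len(NOUNS))
    noun_phrases = ["the " + noun for noun in NOUNS[:n]]
    # The innermost noun's verb comes first, so the verbs appear reversed.
    verbs = list(reversed(VERBS[: n - 1]))
    return " ".join(noun_phrases + verbs + ["likes tuna fish"])


for n in range(1, 5):
    print(center_embedded(n))
```

The reversal of the verb list is what makes the dependencies nested: the first noun pairs with the last verb, the second noun with the second-to-last verb, and so on.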

Is Natural Language Context-Free?
- Languages that are not context-free:
  - {xx | x ∈ {a,b}*}
  - a^n b^m c^n d^m
- The dependencies in these xx-style languages are called cross-serial dependencies.
- The non-context-free nature of such languages can be shown using the pumping lemma for context-free languages.
- Swiss German requires that the number of verbs requiring dative objects (hälfe) equal the number of dative NPs (em Hans), and similarly for accusatives.
- L = Jan säit das mer (d’chind)^n (em Hans)^m es huus haend wele (laa)^n (hälfe)^m aastriiche.
- This language is of the form w a^n b^m x c^n d^m y, which is not context-free, so we can conclude that Swiss German is not context-free.
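A membership test for a^n b^m c^n d^m shows where the crossing counts come in. This is a minimal sketch with an assumed function name; the regular expression only checks the order of the blocks, and the two length comparisons carry the cross-serial constraint.

```python
import re


def in_cross_serial(s):
    """Membership test for a^n b^m c^n d^m (n, m >= 0)."""
    # The regular part: blocks of a, b, c, d in that order.
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    # The crossing part: as pair with cs, bs pair with ds.
    return a == c and b == d


print(in_cross_serial("aabccd"))  # True  (n = 2, m = 1)
print(in_cross_serial("aabcd"))   # False (two as but one c)
```

The pairs a-c and b-d overlap instead of nesting, which is the pattern matched by the Swiss German noun phrases and their verbs.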

Complexity and Human Processing
- Complexity is a fact about “parsers” rather than grammars. In the examples below, # marks a sentence that is hard to process.
- Center embedding:
  - (i) The cat likes tuna fish.
  - (ii) #The cat the dog the rat the elephant admired bit chased likes tuna fish.
- Relative clauses:
  - (i) The child damaged the pictures which were taken by the photographer who the professor met at the party.
  - (ii) #The pictures which the photographer who the professor met at the party took were damaged by the child.
- The early suggestions that the complexity of human sentence processing is related to memory seem to be correct at some level; complexity in both natural and formal languages is caused by the need to keep many unintegrated things in memory.