Finite State Languages (CSE 140: Intro to Cognitive Science)

Slide 1: The Computational Modeling of Language: Finite State Languages
Lecture 1: Slides 1-21
Lecture 2: Slides 22-…

Slide 2: Language Exists at Many Levels
Language exists at many levels: sounds, words, sentences (utterances), discourse (text), dialog, combinations with other modalities, etc. We will focus on a formal account of the sentence level, which provides a formal account of grammaticality judgments using simple yet powerful models.

Slide 3: A Formal Theory
We will present a mathematical theory of language. Because of time constraints we will be somewhat informal in introducing concepts, but EVERYTHING we present can be made completely rigorous, starting from definitions and proceeding through proofs.
Strategy: first, examples from English; then, more abstract examples.

Slide 4: The Set of Strings over an Alphabet
Given a finite alphabet $\Sigma$, the set of strings over $\Sigma$, including the null string $\varepsilon$, is denoted by $\Sigma^*$.
Let $\Sigma = \{\text{all words of English}\}$. Then $\Sigma^*$ denotes all strings of words of English, including the empty (null) string $\varepsilon$. Only some of these strings are grammatical sentences of English.
Let $\Sigma = \{a, b\}$. Then $\Sigma^*$ denotes all strings of a's and b's, including the empty (null) string $\varepsilon$.
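To make $\Sigma^*$ concrete, here is a small Python sketch (the helper name `strings_up_to` is ours, not from the slides) that lists strings over $\Sigma = \{a, b\}$ in order of length. Since $\Sigma^*$ is infinite, it can only enumerate a finite prefix: every string up to a chosen maximum length.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first.

    Sigma* is infinite, so only this finite prefix of it can be listed.
    """
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the null string.
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

print(list(strings_up_to("ab", 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```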

Slide 5: A Language over $\Sigma$
A language $L$ over $\Sigma$ is a subset of $\Sigma^*$.
Let $L_E$ be the set of all grammatical sentences of English. Then $L_E \subseteq \Sigma^*$ is a language over $\Sigma = \{\text{all words of English}\}$.
Sentences in $L_E$: John likes apples; Apples like John; Two is greater than four; The black cat is on the mat; …
(Notation: $L \subseteq \Sigma^*$ means $L$ is a subset of $\Sigma^*$.)

Slide 6: Sentences not in $L_E$
The the the; John like peanuts; Every student hates any course; The rat the cat the dog chased bit ate the cheese (?); etc.

Slide 7: Another Language over Another $\Sigma$
$L = \{a^m b^n \mid m \ge 1, n \ge 2\}$ is the set of all strings of a's and b's such that all a's precede all b's, there is at least one a, and there are at least two b's.
$L$ is a language over $\{a, b\}$, i.e. $L \subseteq \{a, b\}^*$.
(Notation: $a^2$ means aa; $a^2 b^3$ means aabbb.)
(Notation: $\{x \mid y\}$ means the set of all $x$ such that condition $y$ is true of $x$.)
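A minimal Python sketch of a membership test for this $L$, assuming strings are ordinary Python strings over the letters a and b; `in_L` and `L_PATTERN` are hypothetical names. It relies on Python's `re.fullmatch`, which succeeds only if the pattern matches the whole string.

```python
import re

# L = {a^m b^n | m >= 1, n >= 2}: one or more a's, then two or more b's.
# fullmatch anchors both ends, so "all a's precede all b's" is enforced
# and no other symbols may appear.
L_PATTERN = re.compile(r"a+b{2,}")

def in_L(w: str) -> bool:
    """Return True iff w is in L (hypothetical helper, not from the slides)."""
    return L_PATTERN.fullmatch(w) is not None

for w in ["abb", "aabbb", "ab", "bba", "aab", ""]:
    print(repr(w), in_L(w))
```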

Slide 8: Infinite Languages from Finite Models
A language over $\Sigma$ can be finite or infinite. $L_E$, the set of all grammatical sentences of English, is potentially infinite. A potentially infinite set can often be given a finite characterization, which can alternatively take the form of a grammar characterization, a machine characterization, or a behavioral characterization.

Slide 9: Road Map to the Reading!
We will begin with Chapters 17 and 18, returning to the general characterization of grammars (Sections 16.4 and 16.5) later. (Skip Section 16.3.)
Chapter 17:
- machine characterization of languages with finite state machines
- an equivalent grammar characterization
- an equivalent 'behavioral' characterization in terms of terminal symbols only: regular expressions

Slide 10: Introduction to Finite State Automata
Finite State Automata (FSAs) are characterized by:
- states (circles), including initial and final states
- a vocabulary $\Sigma$ (here {the, a, big, very, book, poor})
- transitions between states (arrows)
An FSA accepts a language $L$.
[Figure: an FSA with states $q_0$ to $q_4$ whose arrows are labeled the, a, big, very, poor, and book.]

Slide 11: Another Finite State Automaton
States: $K = \{q_0, q_1\}$
Initial state: $q_0$
Final states: $F = \{q_1\}$
Vocabulary: $\Sigma = \{a, b\}$
(Back to English on Slide 27…)
[Figure: $q_0$ loops to itself on a and goes to $q_1$ on b; $q_1$ loops to itself on b.]

Slide 12: Another Finite State Automaton II
The arrows are a graphical representation of the transition function $\delta$:
$\delta(q_0, a) = q_0$
$\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$
[Figure: the machine of Slide 11.]

Slide 13: A Formal Definition of an FSA M
We can characterize a language $L \subseteq \Sigma^*$ by an FSA $M = (K, \Sigma, \delta, q_0, F)$, where:
- $K$ is the finite set of states
- $\Sigma$ is the finite input alphabet
- $q_0 \in K$ is the initial state
- $F \subseteq K$ is the set of final states
- $\delta$ is the transition function: for each state and each input symbol, $\delta$ specifies the next state of the machine ($\delta : K \times \Sigma \to K$)
(Notation: $a \in A$ means $a$ is an element of the set $A$.)

Slide 14: The Language Accepted by an FSA
Given an FSA $M$, the language accepted by $M$ (i.e., the set of strings accepted by $M$) is defined as follows:
$L(M) = \{\, w \mid w \in \Sigma^* \text{ and, starting in } q_0 \text{ and following the transitions specified by } \delta \text{ for each symbol of } w, M \text{ ends in one of the final states} \,\}$
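The definition of $L(M)$ can be sketched directly in Python. This is an illustration under our own assumptions, not code from the course: $\delta$ is stored as a dictionary keyed by (state, symbol) pairs, and `accepts` (a hypothetical name) reads $w$ one symbol at a time and checks whether the final state reached is in $F$.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]

def accepts(delta: Delta, start: str, finals: Set[str], w: str) -> bool:
    """Follow delta from `start` over the symbols of w; accept iff we end in F.

    A missing (state, symbol) entry is treated as rejection, which is what
    an explicit dead state would do (see slides 18-20).
    """
    state = start
    for symbol in w:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in finals

# The machine of slides 11-12: K = {q0, q1}, F = {q1}.
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}

print(accepts(DELTA, "q0", {"q1"}, "aab"))  # True
print(accepts(DELTA, "q0", {"q1"}, "aba"))  # False
```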

Slide 15: Definition of Finite State Language
A language $L$ is a finite state language (fsl), or regular language, if there is an FSA $M$ such that $L(M) = L$.

Slide 16: The Language Accepted by our FSA?
$L$ = any number of a's (including none) followed by at least one b
$= \{a^n b^m \mid n \ge 0, m \ge 1\}$
$= a^* b\, b^*$ (here $^*$ means any number of repetitions, including none)
[Figure: the machine of Slide 11.]
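As a sanity check, rather than a proof, this sketch compares the machine of Slides 11-12 against the Python regular expression `a*bb*` on every string of length at most 6. `dfa_accepts` and `first_disagreement` are hypothetical helper names.

```python
import re
from itertools import product

# Slide 11-12 machine; a missing (state, symbol) pair means "reject".
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
FINALS = {"q1"}
REGEX = re.compile(r"a*bb*")  # the slide's a* b b*

def dfa_accepts(w: str) -> bool:
    state = "q0"
    for s in w:
        state = DELTA.get((state, s))
        if state is None:  # no transition defined: reject
            return False
    return state in FINALS

def first_disagreement(max_len: int):
    """Return the first string (length <= max_len) where DFA and regex differ, else None."""
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            if dfa_accepts(w) != (REGEX.fullmatch(w) is not None):
                return w
    return None

print(first_disagreement(6))  # None: all 127 strings of length <= 6 agree
```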

Slide 17: Let's Do an Example!
Let $\Sigma = \{0, 1\}$. Let $L$ = the set of all strings of 0's and 1's that contain exactly two 1's. Show that $L$ is a finite state language!
First step: $L$ is a finite state language if…
Second step: define such an $M$.
(For other examples, see Exercise 3, Ch. 17.)
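One possible answer to the second step, sketched in Python under our own naming: states q0, q1, q2 count the 1's read so far, and an extra state q3 (our addition) absorbs strings with more than two 1's. Accepting only in q2 gives $M = (K, \Sigma, \delta, q_0, F)$ with $K = \{q_0, q_1, q_2, q_3\}$ and $F = \{q_2\}$.

```python
# Hypothetical 4-state DFA for "exactly two 1's" over {0, 1}.
# q0, q1, q2 record how many 1's have been read; q3 is a dead state
# for "more than two", so delta is fully specified.
NEXT_ON_1 = {"q0": "q1", "q1": "q2", "q2": "q3", "q3": "q3"}
DELTA = {}
for q, nxt in NEXT_ON_1.items():
    DELTA[(q, "0")] = q    # reading a 0 never changes the count
    DELTA[(q, "1")] = nxt  # reading a 1 advances the count (q3 absorbs)
START, FINALS = "q0", {"q2"}

def accepts(w: str) -> bool:
    """Run the DFA; raises KeyError on symbols outside {0, 1}."""
    state = START
    for s in w:
        state = DELTA[(state, s)]
    return state in FINALS

for w in ["11", "0101", "1001000", "1", "111", ""]:
    print(repr(w), accepts(w))
```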

Slide 18: But $\delta$ Isn't a Function in Our Example!
So far, with $\Sigma = \{a, b\}$:
$\delta(q_0, a) = q_0$
$\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$
$\delta(q_1, a) = ?$
[Figure: the machine of Slide 11.]

Slide 19: But $\delta$ Isn't a Function in Our Example!
So far, with $\Sigma = \{a, b\}$:
$\delta(q_0, a) = q_0$
$\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$
$\delta(q_1, a) = ?$
Needed: a dead state.
[Figure: the machine of Slide 11.]

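Slide 20: A Fully Specified FSA with Dead States
$\Sigma = \{a, b\}$
$\delta(q_0, a) = q_0$, $\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$, $\delta(q_1, a) = q_2$
$\delta(q_2, a) = q_2$, $\delta(q_2, b) = q_2$
Now $\delta$ is defined for every state and every input symbol. The new state $q_2$ is a dead state: it is not final, and once the machine enters it, it can never leave, so every string that reaches $q_2$ is rejected.
[Figure: the machine of Slide 11 plus $q_2$, reached from $q_1$ on a, with a loop on both a and b.]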
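The following Python sketch encodes the fully specified table above and checks that it really is a function, i.e. that every (state, symbol) pair has a next state. `is_fully_specified` and `accepts` are illustrative names of our own.

```python
from itertools import product

STATES = {"q0", "q1", "q2"}
SIGMA = {"a", "b"}
FINALS = {"q1"}
# Slide 20's transition function; q2 is the (non-final) dead state.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def is_fully_specified(delta, states, sigma) -> bool:
    """True iff every (state, symbol) pair has a next state.

    A dict key maps to a single value, so a complete table is a function.
    """
    return all((q, s) in delta for q, s in product(states, sigma))

def accepts(w: str) -> bool:
    state = "q0"
    for s in w:
        state = DELTA[(state, s)]  # never fails: the table is complete
    return state in FINALS

print(is_fully_specified(DELTA, STATES, SIGMA))  # True
print(accepts("aab"), accepts("aba"))            # True False
```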

Slide 21: A Taste of FSA Algebra: Complements
Definition: the complement of $L$ is $\Sigma^* - L$, i.e. the set of strings in $\Sigma^*$ not contained in $L$.
To find the complement of an fsl $L$:
1. Find a fully specified FSA $M$ such that $L = L(M)$.
2. Switch the final and non-final states!
So the complements of fsls are fsls!
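A Python sketch of the two-step recipe, applied to the fully specified machine of Slide 20 (the helper names are hypothetical). It switches the final states and then checks, for every string up to a bounded length, that exactly one of the two machines accepts it.

```python
from itertools import product

STATES = {"q0", "q1", "q2"}
# Fully specified DFA of slide 20 (q2 is the dead state), for L = a* b b*.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
FINALS = {"q1"}
COMPLEMENT_FINALS = STATES - FINALS  # step 2: switch final and non-final

def run(w: str, finals) -> bool:
    state = "q0"
    for s in w:
        state = DELTA[(state, s)]
    return state in finals

def complement_is_exact(max_len: int) -> bool:
    """Check that every string up to max_len is accepted by exactly one machine."""
    return all(
        run(w, FINALS) != run(w, COMPLEMENT_FINALS)
        for n in range(max_len + 1)
        for w in ("".join(t) for t in product("ab", repeat=n))
    )

print(complement_is_exact(6))  # True
print([w for w in ["", "a", "b", "ba", "abb"] if run(w, COMPLEMENT_FINALS)])
# ['', 'a', 'ba']
```

The full specification in step 1 matters: if we switched the final states of the partial machine from Slide 11 instead, a string such as ba would hit the missing transition $\delta(q_1, a)$ and be rejected by both machines, so the result would not be the complement.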

Slide 22: Complements: An Example
Let $\Sigma = \{a, b\}$. Let $L = \{a^n b^m \mid n \ge 0, m \ge 1\}$ (our old friend). What's the complement of $L$??

Slide 23: 1. Find … FSA M such that L = L(M)
$\Sigma = \{a, b\}$
$\delta(q_0, a) = q_0$, $\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$, $\delta(q_1, a) = q_2$
$\delta(q_2, a) = q_2$, $\delta(q_2, b) = q_2$
[Figure: the fully specified machine of Slide 20.]

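Slide 24: 2. Switch Final and Non-final States
$\Sigma = \{a, b\}$
$\delta(q_0, a) = q_0$, $\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$, $\delta(q_1, a) = q_2$
$\delta(q_2, a) = q_2$, $\delta(q_2, b) = q_2$
[Figure: the machine of Slide 20 with final and non-final states switched: $q_0$ and $q_2$ are now final, $q_1$ is not.]
The resulting machine accepts exactly $\{a, b\}^* - L$.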

Slide 25: Definition: A Deterministic FSA
$\Sigma = \{a, b\}$
$\delta(q_0, a) = q_0$, $\delta(q_0, b) = q_1$
$\delta(q_1, b) = q_1$, $\delta(q_1, a) = q_2$
$\delta(q_2, a) = q_2$, $\delta(q_2, b) = q_2$
[Figure: the fully specified machine of Slide 20.]
An FSA $M$ is deterministic if $\delta$ is a function, i.e. for each state and each input symbol there is exactly one next state. The FSAs we have considered since Slide 11 are all deterministic FSAs (DFAs), once any missing transitions are sent to a dead state as on Slide 20.

Slide 26: Non-deterministic Finite Automata
In a non-deterministic FSA (NFA), the transition relation $\Delta$ allows any number of next states (including none) for each state and each input symbol. We will also allow transitions on no input (i.e., on the null string $\varepsilon$). A string $w$ is accepted by an NFA if there is at least one state sequence, starting with the initial state, that reads all of $w$ and ends in one of the final states.
(Notation: $\Delta$ is the upper-case version of $\delta$.)
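NFA acceptance ("at least one state sequence") can be simulated by tracking the set of all states the machine could currently be in. The sketch below is a hypothetical example loosely modeled on the noun-phrase NFA of Slide 27, not a transcription of it: the tag names are the slides', but the states and arcs are our own. The null-string label $\varepsilon$ is written as the empty string `""`.

```python
EPS = ""  # label used here for a transition on the null string

# Hypothetical NFA over part-of-speech tags.
# Delta maps (state, label) to a SET of possible next states.
DELTA = {
    ("q0", "DET"): {"q1", "q3"},  # two choices on the same input
    ("q0", EPS): {"q1"},          # skip the determiner: "peanuts"
    ("q1", "N"): {"q2"},          # "the cat"
    ("q3", "ADJ"): {"q4"},        # "the poor"
}
START, FINALS = "q0", {"q2", "q4"}

def eps_closure(states):
    """All states reachable from `states` using only null transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(tags):
    """Track every state the NFA could be in; accept if any path ends in F."""
    current = eps_closure({START})
    for tag in tags:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, tag), set())
        current = eps_closure(moved)
    return bool(current & FINALS)

print(nfa_accepts(["DET", "N"]), nfa_accepts(["N"]), nfa_accepts(["ADJ"]))
# True True False
```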

Slide 27: A Non-deterministic FSA for English…
Simple noun phrases of English containing:
- a determiner (DET) followed by a noun (N): the cat
- a DET followed by an adjective (ADJ): the poor
- an N only: peanuts
[Figure: an NFA with states $q_0$ to $q_4$ and arcs labeled DET, N, ADJ, and $\varepsilon$.]

Slide 28: A Surprise!
While NFAs are often convenient to use, it turns out that for every non-deterministic FSA $M$ there is a simple algorithm (the subset construction) which will construct an FSA $M'$ such that $M'$ is deterministic and $L(M') = L(M)$. (If $M'$ accepts exactly the strings accepted by $M$, we say $M'$ is equivalent to $M$.) So NFAs are no more powerful than DFAs!!
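In the subset construction, each state of $M'$ is a set of states of $M$. Below is a Python sketch of it, applied to a small hypothetical NFA over part-of-speech tags (our own example, with $\varepsilon$-transitions written as `""`). It then checks, on every tag sequence of length at most 4, that $M'$ and $M$ accept the same strings; that is a bounded test, not a proof of equivalence.

```python
from itertools import product

EPS = ""  # label for a null (epsilon) transition
# Hypothetical NFA: (state, label) -> set of possible next states.
NFA = {
    ("q0", "DET"): {"q1", "q3"},
    ("q0", EPS): {"q1"},
    ("q1", "N"): {"q2"},
    ("q3", "ADJ"): {"q4"},
}
START, FINALS, SIGMA = "q0", {"q2", "q4"}, ["DET", "ADJ", "N"]

def closure(states):
    """Epsilon-closure: everything reachable via null transitions."""
    result, stack = set(states), list(states)
    while stack:
        for r in NFA.get((stack.pop(), EPS), set()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return frozenset(result)

def step(states, label):
    """Set of NFA states reachable from `states` on one input label."""
    return closure({r for q in states for r in NFA.get((q, label), set())})

def subset_construction():
    """Build a DFA whose states are frozensets of NFA states."""
    start = closure({START})
    delta, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for a in SIGMA:
            T = step(S, a)  # the empty set plays the role of a dead state
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    finals = {S for S in seen if S & FINALS}
    return start, delta, finals

START_D, DELTA_D, FINALS_D = subset_construction()

def dfa_accepts(tags):
    state = START_D
    for t in tags:
        state = DELTA_D[(state, t)]  # exactly one next state: deterministic
    return state in FINALS_D

def nfa_accepts(tags):
    current = closure({START})
    for t in tags:
        current = step(current, t)
    return bool(current & FINALS)

same = all(dfa_accepts(w) == nfa_accepts(w)
           for n in range(5) for w in product(SIGMA, repeat=n))
print(len({S for S, _ in DELTA_D}), same)  # 5 True
```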

Slide 29: An Equivalent NFA and DFA
[Figure: an NFA (states $q_0$ to $q_4$, arcs labeled DET, N, ADJ, $\varepsilon$) and an equivalent DFA (states $q'_0$, $q'_1$, $q'_2$, $q'_4$, $q'_5$, arcs labeled DET, N, ADJ).]

Slide 30: Back to Noun Phrases: ADJs
How can we add optional adjectives before the noun? the black cat, the beautiful black cat
[Figure: the NFA of Slide 27.]

Slide 31: More About Noun Phrases: ADJs
To add optional adjectives before the noun, add to $\Delta$ the transition $q_1 \in \Delta(q_1, \mathrm{ADJ})$: the black cat, the beautiful black cat
[Figure: the NFA of Slide 27 with an ADJ self-loop on $q_1$.]

Slide 32: More About Noun Phrases: ADVs
How can we add optional adverbs (ADV) on adjectives? the very old, the very very old
[Figure: the NFA of Slide 31.]

Slide 33: More About Noun Phrases: ADVs
To add optional adverbs (ADV) on adjectives, add to $\Delta$ the transition $q_3 \in \Delta(q_3, \mathrm{ADV})$: the very old, the very very old
[Figure: the NFA of Slide 31 with an ADV self-loop on $q_3$.]

Slide 34: Bug!
What about "the very old cat"?? We need to allow optional ADVs before the ADJ from $q_1$ as well…
[Figure: the NFA of Slide 33.]

Slide 35: Consistently Adding ADVs before ADJs
Why did we need to add an extra state, $q_5$??
[Figure: the noun-phrase NFA with a new state $q_5$ and arcs labeled ADV, ADJ, and $\varepsilon$.]

Slide 36: Adding Prepositional Phrase Modifiers
A prepositional phrase (PP) consists of a preposition (P), like in, on, above, for, near, followed by a noun phrase: on the dirty old mat; in the very old box on the mantle; for the very poor.
PPs can also modify NPs: the black cat on the dirty old mat; the very old box on the mantle.

Slide 37: Extending our NFA for PP Modifiers
The cat; The cat on the mat; The cat on the mat by the door in the back
[Figure: the NFA of Slide 35 extended with arcs labeled P.]
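Because these modifiers only repeat (a noun phrase followed by any number of P + noun phrase pairs), the idea can also be sketched as a regular expression over tag sequences. This Python sketch is our own approximation, not the slides' NFA: the one-letter tag codes and `is_np` are hypothetical, the leading determiner is optional, and it does not cover noun-less phrases such as "the poor".

```python
import re

# Hypothetical one-letter codes for the tags used on the slides.
CODE = {"DET": "d", "ADV": "r", "ADJ": "j", "N": "n", "P": "p"}

CORE = r"d?(?:r*j)*n"                     # optional DET, then (ADV* ADJ)*, then N
NP = re.compile(rf"{CORE}(?:p{CORE})*")   # then any number of P + NP modifiers

def is_np(tags):
    """True iff the tag sequence matches the sketch's noun-phrase pattern."""
    return NP.fullmatch("".join(CODE[t] for t in tags)) is not None

print(is_np("DET ADV ADV ADJ N".split()))        # the very very old cat: True
print(is_np("DET N P DET N P DET N".split()))    # the cat on the mat by the door: True
print(is_np("DET ADV N".split()))                # the very cat: False
```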

Slide 38: An NFA for Simple Sentences
The dog chased the cat; The young admire the old; Foxes eat chickens
Looks promising…
[Figure: two copies of the noun-phrase NFA (states $q_0$ to $q_4$ and $q_5$ to $q_9$) joined by arcs labeled V.]

Slide 39: An NFA for Less Simple Sentences
The very old man watched young brown puppies; The very very poor want a good education; The young puppies in the old brown box watched the cat in the corner
Looks even better… BUT…
[Figure: two copies of the extended noun-phrase NFA (states $q_0$ to $q_5$ and $q_6$ to $q_{11}$) joined by arcs labeled V.]

Slide 40: Bug: The NFA "Loses Generalizations"
For these sentences, the subject NP states and the object NP states are duplicates. What would an FSA for "NP gave NP to NP" look like? The FSA model loses the generalization that NPs are NPs are NPs.
[Figure: the NFA of Slide 39.]

Slide 41: Another FSA Bug (17.3.2)
More serious trouble:
The cat died. (NP V)
The cat the dog chased died. (NP NP V V)
The cat the dog the rat bit chased died. ($\mathrm{NP}^3\,\mathrm{V}^3$)
The cat the dog the rat the elephant admired bit chased died. ($\mathrm{NP}^4\,\mathrm{V}^4$)
These are all of the form $\mathrm{NP}^n\,\mathrm{V}^n$. FSAs can't generate these, as we'll see next…

Slide 42: A Language That Is Not an FSL
Consider $L = \{a^n b^n \mid n \ge 1\}$, i.e. $L$ consists of all strings in which there are equal numbers of a's and b's and all a's precede all b's.
$L$ is not an fsl: FSAs cannot count up to an arbitrary number! So English isn't an fsl!! (This step assumes, as on Slide 41, that English contains center-embedded sentences of the form $\mathrm{NP}^n\,\mathrm{V}^n$ for every $n$.)

Slide 43: The Pumping Lemma for fsl's (17.2.1)
If $L$ is an fsl (regular), then every sufficiently long string $w \in L$ has the following property:
- $w$ can be segmented into three parts, which we'll call $x$, $u$, and $y$, so that $w = x\,u\,y$ and $u$ is not empty;
- all strings of the form $x\,u^i\,y$ are in $L$ (where $u^i$ means $i$ copies of $u$, $i \ge 0$).
[Figure: from $q_0$ the machine reads $x$ to reach $q_1$, loops back to $q_1$ reading $u$, then reads $y$ to reach the final state $q_3$. The loop involving $u$ may include several states.]

Slide 44: Showing $L = \{a^n b^n \mid n \ge 1\}$ Is Not an fsl
The Pumping Lemma: if $L$ is an fsl, then every sufficiently long string $w \in L$ can be written as $w = x\,u\,y$ with $u$ non-empty, such that all strings of the form $x\,u^i\,y$ are in $L$.
1. Try locating the $u$ segment in various places in the string aa…bb….
2. In each case, the string obtained by iterating $u$ is not in $L$: if $u$ contains only a's or only b's, repeating it makes the counts unequal; if $u$ contains both, repeating it puts a b before an a.
3. Hence, $L$ is not an fsl.
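The case analysis in steps 1-3 can be spot-checked mechanically. This Python sketch (helper names are ours) tries every way of splitting $w = a^n b^n$ into $x\,u\,y$ with $u$ non-empty and confirms that pumping once more ($i = 2$) always leaves $L$. It checks particular strings only; the proof itself still rests on the argument above.

```python
def in_L(w: str) -> bool:
    """Membership in L = {a^n b^n | n >= 1}."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def every_split_fails(w: str, i: int = 2) -> bool:
    """True iff for EVERY split w = x u y with u non-empty, x u^i y is not in L."""
    for start in range(len(w)):
        for end in range(start + 1, len(w) + 1):
            x, u, y = w[:start], w[start:end], w[end:]
            if in_L(x + u * i + y):
                return False
    return True

print(all(every_split_fails("a" * n + "b" * n) for n in range(1, 8)))  # True
```

The choice $i = 2$ matters for this check: with $i = 0$, the split $x = $ a, $u = $ ab, $y = $ b of aabb gives ab, which is still in $L$.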

Slide 45: Characterizing fsl's Using Grammars
Finite State Grammars, aka Type 3 Grammars, aka Right Linear Grammars.
The languages generated by right linear grammars are exactly the languages accepted by FSAs.

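Slide 46: An Informal Intro to FSGs
$S \to \text{John}\ A$
$A \to \text{likes}\ B$
$B \to \text{roasted}\ C$
$C \to \text{peanuts}$
Derivation: $S \Rightarrow \text{John}\ A \Rightarrow \text{John likes}\ B \Rightarrow \text{John likes roasted}\ C \Rightarrow \text{John likes roasted peanuts}$
(Here $A$ plays the role of a VP, $B$ of an NP, and $C$ of an N.)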

Slide 47: Finite State Grammars
A finite state grammar $G = (V_T, V_N, S, R)$ consists of:
- $V_T$, the terminal vocabulary
- $V_N$, the non-terminal vocabulary
- $S$, the start symbol, $S \in V_N$
- $R$, a finite set of rewrite rules (productions)
The rewrite rules are of the following forms, where $A, B \in V_N$ and $a \in V_T$:
$A \to a\,B$
$A \to a$

Slide 48: An Example FSG
$G = (V_T, V_N, S, R)$, where
$V_T = \{\text{John}, \text{roasted}, \text{peanuts}, \text{likes}\}$
$V_N = \{S, A, B, C\}$
$R = \{S \to \text{John}\ A,\ A \to \text{likes}\ B,\ B \to \text{roasted}\ C,\ C \to \text{peanuts}\}$

Slide 49: Derivations
A derivation starts with $S$. Since the right-hand side of a rule has at most one non-terminal, there is only one non-terminal (if any) that can be rewritten at each step. A derivation stops when there are no more non-terminals to be rewritten.
$L(G)$ = the language derived by $G$ = the set of all terminal strings derived in $G$ starting from $S$.
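A minimal Python sketch of derivation in a finite state grammar, using the example grammar of Slide 48. Because each right-hand side has at most one non-terminal, a sentential form can be stored as (terminals so far, current non-terminal). The rule encoding and the name `derive` are assumptions of this sketch; `max_steps` bounds the search, since a grammar with recursive rules can derive infinitely many strings.

```python
# Rules of the example FSG: A -> a B is stored as ("a", "B");
# a terminating rule A -> a is stored as ("a", None).
RULES = {
    "S": [("John", "A")],
    "A": [("likes", "B")],
    "B": [("roasted", "C")],
    "C": [("peanuts", None)],
}

def derive(symbol="S", max_steps=10):
    """Return all terminal strings derivable from `symbol` in <= max_steps rewrites."""
    results = []
    frontier = [((), symbol)]  # (terminals so far, the one non-terminal left)
    for _ in range(max_steps):
        next_frontier = []
        for words, nt in frontier:
            for terminal, nxt in RULES.get(nt, []):
                if nxt is None:  # no non-terminal left: derivation stops
                    results.append(" ".join(words + (terminal,)))
                else:
                    next_frontier.append((words + (terminal,), nxt))
        frontier = next_frontier
    return results

print(derive())  # ['John likes roasted peanuts']
```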

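Slide 50: A Derivation in our Example FSG
$G = (V_T, V_N, S, R)$, where
$V_T = \{\text{John}, \text{roasted}, \text{peanuts}, \text{likes}\}$
$V_N = \{S, A, B, C\}$
$R = \{S \to \text{John}\ A,\ A \to \text{likes}\ B,\ B \to \text{roasted}\ C,\ C \to \text{peanuts}\}$
Derivation: $S \Rightarrow \text{John}\ A \Rightarrow \text{John likes}\ B \Rightarrow \text{John likes roasted}\ C \Rightarrow \text{John likes roasted peanuts}$
(Here $A$ plays the role of a VP, $B$ of an NP, and $C$ of an N.)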

Slide 51: The Equivalence of FSGs and FSAs
We can construct an FSG $G$ given an FSA $M$:
1. Treat the states of $M$ as the non-terminals (treat $K$ as $V_N$), with the initial state $q_0$ as the start symbol.
2. Treat the vocabulary of $M$ as the terminals (treat $\Sigma$ as $V_T$).
3. For each transition from state $A$ to state $B$ on input symbol $a$, create a rule $A \to a\,B$.
4. If $B$ is a final state of $M$, that transition also yields the rule $A \to a$.
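The construction can be sketched in a few lines of Python, here applied to the machine of Slides 11-12. Rules are returned as (A, a, B) triples, with B set to None for a terminating rule $A \to a$; this encoding and the name `fsa_to_fsg` are hypothetical.

```python
def fsa_to_fsg(delta, finals):
    """Build right-linear rules from an FSA's transitions.

    Each transition delta(A, a) = B yields A -> a B, and also A -> a
    when B is a final state.
    """
    rules = []
    for (A, a), B in sorted(delta.items()):
        rules.append((A, a, B))
        if B in finals:
            rules.append((A, a, None))
    return rules

DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
for A, a, B in fsa_to_fsg(DELTA, {"q1"}):
    print(f"{A} -> {a} {B}" if B else f"{A} -> {a}")
```

For that machine the sketch yields $q_0 \to a\,q_0$, $q_0 \to b\,q_1$, $q_0 \to b$, $q_1 \to b\,q_1$, and $q_1 \to b$. If the initial state were itself final, the grammar would also need to generate the empty string, which these two rule forms cannot do on their own.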

Slide 52: More to come…