Languages, grammars, and regular expressions

Slides:



Advertisements
Similar presentations
Chapter 5: Languages and Grammar 1 Compiler Designs and Constructions ( Page ) Chapter 5: Languages and Grammar Objectives: Definition of Languages.
Advertisements

Natural Language Processing - Formal Language - (formal) Language (formal) Grammar.
CFGs and PDAs Sipser 2 (pages ). Long long ago…
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
Regular Grammars Formal definition of a regular expression.
CFGs and PDAs Sipser 2 (pages ). Last time…
CS5371 Theory of Computation
CS Master – Introduction to the Theory of Computation Jan Maluszynski - HT Lecture 4 Context-free grammars Jan Maluszynski, IDA, 2007
Fall 2005 CSE 467/567 1 Formal languages regular expressions regular languages finite state machines.
Finite state automaton (FSA)
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
1 Finite state automaton (FSA) LING 570 Fei Xia Week 2: 10/07/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA.
CS 490: Automata and Language Theory Daniel Firpo Spring 2003.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Normal forms for Context-Free Grammars
Chapter 3: Formal Translation Models
A shorted version from: Anastasia Berdnikova & Denis Miretskiy.
Languages and Machines Unit two: Regular languages and Finite State Automata.
Chapter 2 Languages.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Languages & Strings String Operations Language Definitions.
Week 14 - Friday.  What did we talk about last time?  Exam 3 post mortem  Finite state automata  Equivalence with regular expressions.
Finite-State Machines with No Output
Thopson NFA Presenter: Yuen-Shuo Li Date: 2014/5/7 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
CS490 Presentation: Automata & Language Theory Thong Lam Ran Shi.
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
L ECTURE 1 T HEORY OF A UTOMATA. P RAGMATICS  Pre-Requisites  No Pre-Requisite  Text book  Introduction to Computer Theory by Daniel I.A. Cohen 
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
1 INFO 2950 Prof. Carla Gomes Module Modeling Computation: Language Recognition Rosen, Chapter 12.4.
1 Chapter 1 Introduction to the Theory of Computation.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Grammars CPSC 5135.
1 Introduction to Regular Expressions EELS Meeting, Dec Tom Horton Dept. of Computer Science Univ. of Virginia
1 Computability Five lectures. Slides available from my web page There is some formality, but it is gentle,
Introduction to Language Theory
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
9.7: Chomsky Hierarchy.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Re-enter Chomsky More about grammars. 2 Parse trees S  A B A  aA | a B  bB | b Consider L = { a m b n | m, n > 0 } (one/more a ’s followed by one/more.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
November 2003Computational Morphology III1 CSA405: Advanced Topics in NLP Xerox Notation.
Formal Languages and Grammars
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Transparency No. 1 Formal Language and Automata Theory Homework 5.
BİL711 Natural Language Processing1 Regular Expressions & FSAs Any regular expression can be realized as a finite state automaton (FSA) There are two kinds.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 3.
Lecture 6: Context-Free Languages
Theory of Languages and Automata By: Mojtaba Khezrian.
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
Chapter 2. Formal Languages Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
Lecture 1 Theory of Automata
Complexity and Computability Theory I
Natural Language Processing - Formal Language -
Context Sensitive Grammar & Turing Machines
Formal Language Theory
Course 2 Introduction to Formal Languages and Automata Theory (part 2)
A HIERARCHY OF FORMAL LANGUAGES AND AUTOMATA
Lecture 4: Lexical Analysis & Chomsky Hierarchy
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Chapter 1 Introduction to the Theory of Computation
Presentation transcript:

Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/05/09 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A

Unit #1 Formal grammar, language and regular expression Finite-state automaton (FSA) Finite-state transducer (FST) Morphological analysis using FST

Regular expression Two concepts: Regular expression in formal language theory Regular expression (or pattern) in pattern matching: it is a way of expressing a pattern for the purpose of matching a string Both concepts describe a set of strings. The two concepts are closely related, but the latter is often more expressive than the former.

Outline Formal languages Regular languages Context-free languages Regular expression in formal language theory Formal grammars Regular grammars Context-free grammars Regular expression in pattern matching

Formal languages

Definition of formal language An alphabet is a finite set of symbols: Ex: § = {a, b, c} A string is a finite sequence of symbols from a particular alphabet juxtaposed: Ex: the string “baccab” Ex: empty string ² A formal language is a set of strings defined over some alphabet. Ex1: {aa, bb, cc, aaaa, abba, acca, baab, bbbb, ….} Ex2: {an bn | n > 0} Ex3: the empty set Á

Definition of regular languages The class of regular languages over an alphabet § is formally defined as: The empty set, Á, is a regular language 8 a 2 § [ {²}, {a} is a regular language. If L1 and L2 are regular languages, then so are: (a) L1 ² L2 = {xy | x 2 L1; y 2 L2} (concatenation) (b) L1  L2 (union or disjunction) (c) L1* = {x1 x2 …xn | xi 2 L1 , n 2 N} (Kleene closure) There are no other regular languages.

Kleene star Another way to define L*: L2 = L ² L Ln = Ln-1 ² L Examples: L = {a, bc} L2 = {aa, abc, bca, bcbc} L* = {abcbca, ….} = { (a|bc)*}

Properties Regular languages are closed under Concatenation Union Kleene closure Regular languages are also closed under: Intersection: L1 Å L2 Difference: L1 – L2 Complementation: §* - L1 Reversal

Are the following languages regular? {a, aa, aaa, ….} Any finite set of strings {xy | x2 §*, and y is the reverse of x} {xx | x 2 §*} {an bn | n 2 N} {an bn cn | n 2 N}

Regular expression

Definition of Regular expression (as in formal language theory) The set of regular expressions is defined as follows: (1) Every symbol of is a regular expression (2) ² is a regular expression (3) If r1 and r2 are regular expressions, so are (r1), r1 r2, r1 | r2 , r1* (4) Nothing else is a regular expression.

Examples ab*c a (0|1|2|..|9)* b (CV | CCV)+ C?C?: C is a consonant, V is a vowel Other operations that we can use: a+ = a a* a? = (a | ²)

Relation between regular language and Regex They are equivalent: With every regular expression we can associate a regular language. Conversely, every regular language can be obtained from a regular expression. Examples: Regular expression = ab*c Regular language = {ac, abc, abbc, ….}

Formal grammars

Definition of formal grammar A formal grammar is a concise description of a formal language. It is a (N, §, P, S) tuple: A finite set N of nonterminal symbols A finite set Σ of terminal symbols that is disjoint from N A finite set P of production rules, each of the form: (§  N)* N (§  N)* ! (§  N)* A distinguished symbol S 2 N that is the start symbol

Chomsky hierarchy The left-hand side of a rule must contain at least one non-terminal. ®, ¯, ° 2 (N  §)*, A,B 2 N, a 2 § Type 0: unrestricted grammar: no other constraints. Type 1: Context-sensitive grammar: The rules must be of the form: ® A ¯ ! ® ° ¯ Type 2: Context-free grammar (CFGs): The rules must be of the form: A ! ® Type 3: Regular grammar: The rules are of the forms: right regular grammar: A ! a, A ! aB, or A ! ² left regular grammar: A ! a, A ! Ba, or A ! ² Are there other kinds of grammars?

Strings generated from a grammar The rules are: S → x | y | z | S + S | S - S | S * S | S/S | (S) What strings can be generated? A grammar is ambiguous if there exists at least one string which has multiple parse trees. Is this grammar ambiguous?

Languages generated by grammars Given a grammar G, L(G) is the set of strings that can be generated from G. Ex: G = (N, §, P, S) N = {S}, § = {a, b, c} P = { S ! a S b, S ! c } What is L(G)? L(G) = {an c bn}

The relation between regular grammars and regular languages The regular grammars describe exactly all regular languages. All the following are equivalent: Regular languages Regular grammars Regular expression Finite state automaton (FSA)

Relation between grammars and languages (from wikipedia page)** Chomsky hierarchy Grammars Languages Minimal automaton Type-0 Unrestricted Recursively enumerable Turing machine n/a (no common name) Recursive Decider Type-1 Context-sensitive Linear-bounded Indexed Nested stack Tree-adjoining Mildly context-sensitive Thread Type-2 Context-free Nondeterministic pushdown Deterministic context-free Deterministic pushdown Type-3 Regular Finite state

How about human languages? Are they formal languages? What is alphabet? What is string? What type of formal languages are they? crossing dependency: N1 N2 V1 V2

Outline Formal language Regular expression in formal language theory Regular language Regular expression in formal language theory Formal grammar Regular grammar Patterns in pattern matching  J&M 2.1

Patterns in Perl [ab] a|b . match any character ^ the starting position in a string $ the ending position in a string (..) defines a marked subexpression a* match “a” zero or more times a+ match “a” one or more time a? match “a” zero or one time a{n,m} “a” appears n to m times

Special symbols in the patterns \s match any whitespace char \d match any digit \w match any letter or digit \S match any non-whitespace char … \+, \-, \., \?, \*, …

Examples Integer: (\+|\-)?\d+ Real: (\+|\-)?\d+\.\d+ Scientific notation: (\+|\-)? \d+ (\.\d+)?e (\+|\-)?\d+ Any of the three: (\+|\-)? \d+ (\.\d+)? (e (\+|\-)?\d+)?

Patterns in Perl and Regex /^(.*)\1$/  { xx | x 2 §*} /^(.+)a(.+)\1\2$/  {xayxy | x, y 2 §*}  The extra power comes from the ability to refer to marked subexpression.

Outline Formal language Regular expression in formal language theory Regular language Regular expression in formal language theory Formal grammars Regular grammars Regex patterns in pattern matching