Transformational Grammars The Chomsky hierarchy of grammars Context-free grammars describe languages that regular grammars can’t Unrestricted Context-sensitive.

Presentation transcript:

Transformational Grammars
The Chomsky hierarchy of grammars (each class contains the next):
  Unrestricted ⊃ Context-sensitive ⊃ Context-free ⊃ Regular
Context-free grammars describe languages that regular grammars can't.
Slide after Durbin, et al., 1998

Limitations of Regular Grammars
Regular grammars can't describe languages where there are long-distance interactions between the symbols! Two classic examples are palindrome and copy languages:
  Regular language:    a b a a a b
  Palindrome language: a a b b a a
  Copy language:       a a b a a b
Yes, OK. Regular grammars can produce palindromes. But you can't design one that produces every palindrome and nothing but palindromes!
Illustration after Durbin, et al., 1998

Context-Free Grammars
Symbols and Productions (A.K.A. "rewriting rules")
Like regular grammars, context-free grammars are defined by their set of symbols and the production rules for manipulating strings consisting of those symbols.
There are still only two types of symbols:
  Terminals (generically represented as "a"): these actually appear in the final observed string (so imagine nucleotide or amino acid symbols)
  Non-terminals (generically represented as "W"): abstract symbols; it is easiest to see how they are used through example. The start state (usually shown as "S") is a commonly used non-terminal
The difference arises from the allowable types of production.

Context-free Grammars
Symbols and Productions (A.K.A. "rewriting rules")
The left-hand side must still be just a non-terminal, but the right-hand side can be any combination of terminals and non-terminals:
  W → aW      W → abWa
  W → abW     W → WW
  W → aWa     W → aWb
  W → aabb    W → ε
These are just examples of some possible valid productions.

Context-free Grammars
Symbols and Productions (A.K.A. "rewriting rules")
Here's the minimal CFG that produces (even-length) palindromes:
  W = {S = "Start"}
  a = {a, b}
  S → aSa
  S → bSb
  S → aa
  S → bb
As before, we start with S, then repeatedly choose any of the valid productions, with the non-terminal S being replaced each time by the string on the right-hand side of the production we've chosen…

Context-free Grammars
Symbols and Productions (A.K.A. "rewriting rules")
Here's the minimal CFG that produces palindromes:
  W = {S = "Start"}
  a = {a, b, ε}
  S → aSa | bSb | aa | bb
Or, with an explicit end state:
  S → aSa | bSb | ε
Here's one possible sequence of productions:
  S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
Note that the sequence now grows from outside in, rather than from left to right!!
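The derivation on this slide can be replayed in a few lines of Python. This is only a sketch: the names `PRODUCTIONS` and `derive` are illustrative and do not come from the slides, and the productions are those of the first (no ε) version of the grammar.

```python
# Right-hand sides of the palindrome grammar: S -> aSa | bSb | aa | bb
PRODUCTIONS = ["aSa", "bSb", "aa", "bb"]


def derive(choices):
    """Apply the chosen productions in order, starting from S.

    Returns every intermediate string of the derivation.
    """
    current = "S"
    forms = [current]
    for index in choices:
        # Replace the single non-terminal S with the chosen right-hand side.
        current = current.replace("S", PRODUCTIONS[index], 1)
        forms.append(current)
    return forms


# aSa, then aSa again, then bSb, then aa: the sequence shown on the slide.
steps = derive([0, 0, 1, 2])
print(" => ".join(steps))
```

Each step wraps a matching pair of symbols around the remaining S, which is why the string grows from the outside in.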

A CFG for RNA stem-loops
[Figure: three hairpin stem-loops labelled Seq1, Seq2 and Seq3, with the paired positions of each stem drawn opposite one another; Seq3's stem positions are marked with an x. Figure after Durbin, et al., 1998]
RNA secondary structure imposes nested pairwise constraints similar to those of a palindrome language.
  Seq1: C A G G A A A C U G
  Seq2: G C U G C A A A G C

A CFG for RNA stem-loops
[Figure: the same three stem-loops, Seq1, Seq2 and Seq3. Figure after Durbin, et al., 1998]
Sequences that violate the constraints would be rejected.
  Seq3: G C G G C A A C U G

A CFG for RNA stem-loops
[Figure: the same three stem-loops, Seq1, Seq2 and Seq3]
A context-free grammar specifying stem loops with a three base-pair stem and either a GAAA or GCAA loop:
  W = {S = "Start", W1, W2, W3}
  a = {a, c, g, u}
  S  → aW1u | cW1g | gW1c | uW1a
  W1 → aW2u | cW2g | gW2c | uW2a
  W2 → aW3u | cW3g | gW3c | uW3a
  W3 → gaaa | gcaa
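Because this grammar is finite, one way to get a feel for it is simply to enumerate every string it derives. The sketch below assumes nothing beyond the productions above; the names `PAIRS`, `LOOPS` and `stem_loops` are made up for illustration, and the recursion exploits the fact that S, W1 and W2 all share the same four pairing productions.

```python
# The four base-pair productions shared by S, W1 and W2, and the two loops of W3.
PAIRS = [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]
LOOPS = ["gaaa", "gcaa"]


def stem_loops(stem_length=3):
    """Enumerate every string derived by the stem-loop grammar."""
    if stem_length == 0:
        return list(LOOPS)
    inner = stem_loops(stem_length - 1)
    # Wrap each inner string in one more complementary base pair.
    return [left + core + right for left, right in PAIRS for core in inner]


language = set(stem_loops())
print(len(language), "strings in the language")

for name, seq in [("Seq1", "CAGGAAACUG"),
                  ("Seq2", "GCUGCAAAGC"),
                  ("Seq3", "GCGGCAACUG")]:
    print(name, seq, seq.lower() in language)
```

Enumeration only works because the stem length is fixed; a grammar with a recursive production such as the palindrome grammar derives infinitely many strings and needs a real parser.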

Context-free grammars are parsed with push-down automata

  Grammar                      Parsing automaton
  Regular grammar              Finite state automaton
  Context-free grammar         Push-down automaton
  Context-sensitive grammar    Linear bounded automaton
  Unrestricted grammar         Turing machine

Proviso: Push-down automata are generally only practical with deterministic CFGs!! The PDA faces a combinatorial explosion if confronted with a non-deterministic CFG of non-trivial problem size… but we can brute-force small N.

A Push-Down Automaton
An RNA stem-loop considered as a sequence of states?
[Diagram: states S, W1, W2, W3 and ε]
The regular grammar / finite state automaton paradigm will not work!!
  S  → aW1u | cW1g | gW1c | uW1a
  W1 → aW2u | cW2g | gW2c | uW2a
  W2 → aW3u | cW3g | gW3c | uW3a
  W3 → gaaa | gcaa

Push-Down Automaton
Parse trees are the most useful way to depict a PDA.
  S  → aW1u | cW1g | gW1c | uW1a
  W1 → aW2u | cW2g | gW2c | uW2a
  W2 → aW3u | cW3g | gW3c | uW3a
  W3 → gaaa | gcaa
[Parse tree: S, W1, W2 and W3 over the sequence G C C G C A A G G C]
This depiction suggests a stack-based method for parsing…

Python focus – stacks
Python lists have handy stack-like methods!
    myStack = []                   # creates an empty list
    myStack.append(someObject)     # "push"
    otherObject = myStack.pop()    # "pop"
Remember, the stack is a "First-In, Last-Out" (FILO) data structure.
How is FILO relevant to context-free grammars?
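One possible answer to the FILO question, sketched with a production from the stem-loop grammar (the variable names here are illustrative only): if the right-hand side of a production is pushed rightmost symbol first, the symbols pop back off in left-to-right order, which is the order in which they must be matched against the input.

```python
production = ("g", "W1", "c")   # right-hand side of S -> gW1c

stack = []
# Push the rightmost symbol first, so the leftmost symbol ends up on top.
for symbol in reversed(production):
    stack.append(symbol)

print(stack)   # the top of the stack is the last element of the list

# Popping now yields the symbols in their original left-to-right order.
popped = [stack.pop() for _ in range(len(production))]
print(popped)
```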

Python focus – stacks
Python exception handling may be convenient:
    try:
        otherObject = myStack.pop()    # "pop"
    except IndexError:
        # means myStack was empty!
        # accepting the input sequence
        return self.return_string
We'll introduce exception handling on an "as-needed" basis, but it is a very powerful and useful feature of Python.
Errors of various sorts each have their own internal error type. These are objects too!

Algorithm for PDA parsing
Initialization:
  Set cur_position in the sequence under test ("input sequence") to zero
  Push the start state "S" onto the stack
Iteration:
  Pop a symbol off the stack
  Stack empty? Accept!! Return string
  Is the symbol from the stack a terminal or non-terminal?
    Terminal? Does the stack symbol match the symbol at cur_position?
      Yes! Accept symbol and increment cur_position
      No? Reject sequence, return False
    Non-terminal? Does the symbol at cur_position have a valid production?
      No? Reject sequence, return False
      Yes! Push right side of production onto stack, rightmost symbols first
For non-deterministic grammars, we need to consider each possible production!
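The algorithm above could be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference solution: the names `GRAMMAR` and `pda_accepts` are invented here, the function returns True/False rather than a string, and it additionally requires the whole input to be consumed before accepting. It handles the non-deterministic choice at W3 (both productions start with g) by keeping a list of pending configurations, and it assumes every production starts with a terminal, which holds for this grammar.

```python
PAIRING = [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]

# The stem-loop grammar: non-terminal -> list of right-hand sides (tuples).
GRAMMAR = {
    "S": [(left, "W1", right) for left, right in PAIRING],
    "W1": [(left, "W2", right) for left, right in PAIRING],
    "W2": [(left, "W3", right) for left, right in PAIRING],
    "W3": [("g", "a", "a", "a"), ("g", "c", "a", "a")],
}


def pda_accepts(sequence, grammar=GRAMMAR, start="S"):
    """Return True if the grammar derives the sequence, else False."""
    sequence = sequence.lower()
    # Each configuration is (stack, cur_position); the stack top is the list end.
    pending = [([start], 0)]
    while pending:
        stack, pos = pending.pop()
        if not stack:
            if pos == len(sequence):
                return True          # stack empty and all input consumed
            continue                 # leftover input: abandon this branch
        symbol, rest = stack[-1], stack[:-1]
        lookahead = sequence[pos] if pos < len(sequence) else None
        if symbol in grammar:        # non-terminal: try each valid production
            for production in grammar[symbol]:
                if production[0] == lookahead:
                    # Push the right-hand side, rightmost symbol first.
                    pending.append((rest + list(reversed(production)), pos))
        elif symbol == lookahead:    # terminal that matches the input
            pending.append((rest, pos + 1))
        # otherwise: mismatch, so this branch is rejected
    return False


for seq in ["GCCGCAAGGC", "CAGGAAACUG", "GCUGCAAAGC", "GCGGCAACUG"]:
    print(seq, pda_accepts(seq))
```

Keeping whole configurations on a worklist is a brute-force way to explore every production, so it is only reasonable for small inputs, in line with the proviso about non-deterministic grammars.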

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: S
Valid production: S → gW1c

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cW1g (the top of the stack is the rightmost symbol)
Action: Accept G, move right
Remember, the previous production is added to the stack right-to-left!!

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cW1
Valid production: W1 → cW2g

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cgW2c
Action: Accept C, move right

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cgW2
Valid production: W2 → cW3g

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cggW3c
Action: Accept C, move right

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cggW3
Valid production: W3 → gcaa

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: cggaacg
Action: Accept G, move right

PDA parsing – an example
An interlude…
Input string: GCCGCAAGGC
Stack: cggaacg
If the stack has no non-terminals and corresponds to the input string, we would accept several symbols in a row. Let's skip ahead a few steps!!

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: c
Action: Accept C, move right

PDA parsing – an example
Input string: GCCGCAAGGC
Stack: Empty or ε
Action: Accept input string!
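The walk-through above can be reproduced by replaying the four productions it uses and recording the stack before every pop. This is an illustrative sketch only: `trace` is a hypothetical helper, the productions are supplied by hand rather than chosen by a parser, and non-terminals are recognised simply by their upper-case first letter.

```python
def trace(sequence, productions):
    """Replay a parse and return the stack contents before each pop."""
    stack = ["S"]
    pos = 0
    choices = iter(productions)
    snapshots = []
    while stack:
        snapshots.append("".join(stack))   # rightmost symbol is the stack top
        top = stack.pop()
        if top[0].isupper():
            # Non-terminal: push the next chosen production, rightmost first.
            stack.extend(reversed(next(choices)))
        else:
            # Terminal: it must match the input symbol at the current position.
            if top != sequence[pos].lower():
                raise ValueError("mismatch at position %d" % pos)
            pos += 1
    return snapshots


snapshots = trace(
    "GCCGCAAGGC",
    [("g", "W1", "c"), ("c", "W2", "g"), ("c", "W3", "g"), ("g", "c", "a", "a")],
)
for line in snapshots:
    print(line)
```

The first eight snapshots are the stacks shown on the slides, and the run of terminal-only stacks that follows is the stretch the interlude skips over.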

Push-down Automata
Our stem-loop context-free grammar as a Python data structure
This dict has keys that are states corresponding to the left-hand side of valid productions, and values that are lists corresponding to the right-hand side of valid productions. These again are encapsulated as tuples.
As with our regular grammar, this is just one possible way…
    states = {
        "Start": [("A", "W1", "U"), ("C", "W1", "G"),
                  ("G", "W1", "C"), ("U", "W1", "A")],
        "W1": [("A", "W2", "U"), ("C", "W2", "G"),
               ("G", "W2", "C"), ("U", "W2", "A")],
        "W2": [("A", "W3", "U"), ("C", "W3", "G"),
               ("G", "W3", "C"), ("U", "W3", "A")],
        "W3": [("G", "A", "A", "A"), ("G", "C", "A", "A")],
    }
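One convenience of this layout, sketched below, is that the sets of non-terminals and terminals need not be written out separately: the non-terminals are the dict's keys, and the terminals are whatever other symbols appear on a right-hand side. The variable names `non_terminals` and `terminals` are illustrative and not part of the original slides.

```python
states = {
    "Start": [("A", "W1", "U"), ("C", "W1", "G"),
              ("G", "W1", "C"), ("U", "W1", "A")],
    "W1": [("A", "W2", "U"), ("C", "W2", "G"),
           ("G", "W2", "C"), ("U", "W2", "A")],
    "W2": [("A", "W3", "U"), ("C", "W3", "G"),
           ("G", "W3", "C"), ("U", "W3", "A")],
    "W3": [("G", "A", "A", "A"), ("G", "C", "A", "A")],
}

# Every key is a non-terminal (the left-hand side of some production).
non_terminals = set(states)

# Any right-hand-side symbol that is not a key must be a terminal.
terminals = {
    symbol
    for right_hand_sides in states.values()
    for production in right_hand_sides
    for symbol in production
} - non_terminals

print(sorted(non_terminals))
print(sorted(terminals))
```

A parser can then decide between the terminal and non-terminal branches with a simple `symbol in states` test.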

Python focus
Some possibly useful Python
The in keyword can be used to test membership in a list:
    if my_symbol in mylist_of_terminals:
        pass    # do something
Reverse iterate through a list or tuple with reversed():
    for element in reversed(cur_tuple):
        pass    # do something
Iterate by both index and item with enumerate():
    for i, NT in enumerate(list_of_nucleotides):
        print(i)     # first will be 0, then 1, etc.
        print(NT)    # first will be A, then C, etc.