Published by Megan Cooper. Modified over 6 years ago.
1
Automata and Languages
What do these have in common?
Copyright © Curt Hill
2
Regular Expressions
The finite state machines that we have seen and regular expressions have equivalent power to express or recognize a language. What sort of languages can they accept? Or not accept? How complicated may they be? We now detour through formal languages.
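To make the equivalence concrete, here is a small Python sketch (not from the original slides) that recognizes the same language two ways: with Python's `re` module and with a hand-coded two-state finite state machine. The example language (binary strings with an even number of 1s) and the names `dfa_accepts`, `regex_accepts` and `agree_up_to` are illustrative assumptions.

```python
import itertools
import re

# Regular expression for binary strings containing an even number of 1s.
EVEN_ONES_RE = re.compile(r"0*(10*10*)*")

# Equivalent finite state machine. State True means "an even number of 1s
# seen so far"; reading '1' flips the parity, reading '0' leaves it alone.
TRANSITIONS = {
    (True, "0"): True,
    (True, "1"): False,
    (False, "0"): False,
    (False, "1"): True,
}
START = True
ACCEPTING = {True}


def dfa_accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


def regex_accepts(s: str) -> bool:
    return EVEN_ONES_RE.fullmatch(s) is not None


def agree_up_to(max_len: int) -> bool:
    """Check both recognizers on every binary string up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product("01", repeat=n):
            s = "".join(chars)
            if dfa_accepts(s) != regex_accepts(s):
                return False
    return True


print(agree_up_to(10))  # True
```

An exhaustive check over short strings is evidence, not a proof; the general equivalence comes from the standard constructions between regular expressions and finite state machines.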
3
Noam Chomsky
Professor emeritus of linguistics at MIT. Developed a theory of generative grammars, which includes a language hierarchy, AKA the Chomsky-Schützenberger Hierarchy. The hierarchy includes the recursively enumerable, context sensitive, context free and regular languages.
4
Language Hierarchies
Type 3: Regular
Type 2: Context Free
Type 1: Context Sensitive
Type 0: Unrestricted, or Recursively Enumerable
Each class of languages contains the classes with higher type numbers.
5
Languages and Automata
Each of these classes of languages corresponds to a machine that can accept it. The weakest class, the regular languages, can be accepted by a finite state machine (equivalently, described by a regular expression). Later machines correspond to stronger classes of languages. Let's consider languages for a minute.
6
Formal Grammars
A grammar should be able to enumerate any legal sentence. Each grammar consists of four things:
V – a finite set of non-terminals (aka variables)
T – a finite set of terminal symbols: words made up from an alphabet
S – the start symbol, which must be an element of V
P – a set of productions
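The four parts can be written down directly as data. The sketch below is a hypothetical Python representation (`Grammar` and `sentences` are illustrative names, not a library API) that enumerates the sentences of a grammar up to a length bound by repeatedly rewriting the leftmost non-terminal. The example grammar, S ::= aSb | ab, is an assumption chosen for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    V: frozenset  # non-terminals (variables)
    T: frozenset  # terminal symbols
    S: str        # start symbol, must be in V
    P: tuple      # productions as (lhs, rhs) pairs; rhs is a tuple of symbols

    def __post_init__(self):
        assert self.S in self.V, "start symbol must be a non-terminal"
        assert not (self.V & self.T), "V and T must be disjoint"


def sentences(g: Grammar, max_len: int) -> list:
    """Enumerate every sentence of length <= max_len by leftmost derivation.

    Assumes every production has a non-empty right-hand side, so sentential
    forms never shrink and forms longer than max_len can be pruned.
    """
    found = set()
    start = (g.S,)
    queue, seen = deque([start]), {start}
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal, if there is one.
        idx = next((i for i, sym in enumerate(form) if sym in g.V), None)
        if idx is None:
            found.add("".join(form))  # only terminals remain: a sentence
            continue
        for lhs, rhs in g.P:
            if lhs == form[idx]:
                new = form[:idx] + rhs + form[idx + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))


# Hypothetical example: S ::= aSb | ab generates a^n b^n for n >= 1.
G = Grammar(
    V=frozenset({"S"}),
    T=frozenset({"a", "b"}),
    S="S",
    P=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
)
print(sentences(G, 6))  # ['ab', 'aabb', 'aaabbb']
```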
7
C as an Example
V – set of non-terminals, e.g. Statement, Declaration, For-statement
T – set of terminals: reserved words, punctuation, identifiers
8
C Example Again
S – start symbol: an independently compilable part (Program, Function, Constant)
P – set of productions: rewrite rules that start at the start symbol and end at terminals
Before we consider productions we must consider notation.
9
Copyright © 2003-2014 by Curt Hill
John Backus
Principal designer of FORTRAN. Made substantial contributions to ALGOL 60. Designed Backus Normal Form. Eventually became a proponent of functional programming. Turing Award winner.
10
BNF
In 1959, independently of Chomsky, John Backus introduced a notation equivalent to context free grammars, using it to describe ALGOL 58 (the International Algebraic Language). Peter Naur extended it slightly in describing ALGOL 60. It became known as BNF, for Backus Normal Form or Backus Naur Form. A meta-language is any language that describes another language.
11
Simplest Notation
Form of productions: LHS ::= RHS, where:
LHS is a single non-terminal (for context free grammars)
RHS is any sequence of terminals and non-terminals, including the empty sequence
A common alternative to ::= is an arrow (→).
There can be many productions with exactly the same LHS; these are alternatives.
If the RHS contains the LHS, the rule is recursive.
12
Notation
There is usually a simple way to distinguish terminals from non-terminals. Rosen and others enclose non-terminals in angle brackets:
<if> ::= if ( <condition> ) <statement>
<if> ::= if ( <condition> ) <statement> else <statement>
13
Simple Extensions
Sometimes there is an alternation symbol, often the vertical bar, so that only one production with a given LHS is needed:
<sign> ::= + | -
Sometimes things enclosed in [ and ] are optional: they may be present zero or one times.
Sometimes things enclosed in { and } may be present one or more times. Thus [{x}] allows zero or more x items. (Other conventions, such as ISO EBNF, use { and } to mean zero or more repetitions.)
14
More
The extensions are often called EBNF (Extended BNF). Syntax graphs are equivalent to EBNF and tend to be easier to read.
15
Syntax Graphs
A circle represents a terminal, such as a reserved word or operator; it needs no further definition.
A rectangle represents a non-terminal, such as a for statement or an expression; it must be defined elsewhere.
An arrow represents the path from one item to the next. Arrows may branch, indicating alternatives.
Recursion is also allowed.
16
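Simple Expressions
[Syntax graphs for expression, term and factor: an expression is terms joined by + or -, a term is factors joined by * or /, and a factor is a constant, a parenthesized ( expression ), or an ident.]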
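These syntax graphs translate almost mechanically into a recursive-descent parser: one function per non-terminal (rectangle), a loop for each arrow that returns to an earlier point, and a recursive call where factor refers back to expression. The Python sketch below is an illustration, not the author's code. It evaluates as it parses, treats constants as unsigned integers, looks identifiers up in a caller-supplied dictionary, and does not handle a leading sign. All names are hypothetical.

```python
import re

# One token: an unsigned integer, an identifier, or a single operator/paren.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text, env):
        self.tokens, self.pos, self.env = tokenize(text), 0, env

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # expression: a term, then zero or more (+ or -) term pairs
    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term: a factor, then zero or more (* or /) factor pairs
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor: constant | ( expression ) | ident
    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.take("(")
            value = self.expression()  # recursion through the graph
            self.take(")")
            return value
        if tok is not None and tok.isdigit():
            return int(self.take())
        if tok is not None and (tok[0].isalpha() or tok[0] == "_"):
            return self.env[self.take()]
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text, env=None):
    parser = Parser(text, env or {})
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return value


print(evaluate("(2 + 3) * x", {"x": 4}))  # 20
```

Because each loop folds its operands from left to right, 8 - 3 - 2 evaluates to 3, and * and / bind tighter than + and - simply because term sits below expression in the graphs.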
17
Productions
Productions may be represented as BNF, EBNF or syntax graphs. A production is a rewrite rule: we take a construct and find one way to rewrite it. A derivation goes from the distinguished (start) symbol to an actual program by repeated application of these rewrite rules; parsing recovers such a derivation for a given program.
18
C For Production
For-statement ::= for ( expression ; expression ; expression ) statement
This contains the terminals: for ( ; )
and the non-terminals: expression, statement
(In C, each of the three expressions may be omitted.)
19
Productions Again
Every non-terminal must have one or more productions that define it. Multiple productions for the same non-terminal usually signify alternation. Recursion is allowed.
20
Recursion
Productions may be recursive. Recall for-statement; here is statement:
Statement ::= expression ;
Statement ::= for-statement
Statement ::= if-statement
Statement ::= while-statement
Statement ::= compound-statement
Etc.
Since for-statement itself contains a statement, statement is defined recursively through for-statement.
21
Hierarchy Again
Type  Grammar            Language                 Automata
3     Regular            Regular                  Finite state
2     Context free       Context free             Pushdown
1     Context sensitive  Context sensitive        Linear bounded
0     Unrestricted       Recursively enumerable   Turing machine
22
How Are These Related?
These grammars differ in how their productions may be constructed. Regular grammars are the most restrictive; unrestricted grammars are the least restrictive. Let's compare them. Upper case letters represent non-terminals; lower case letters represent terminals.
23
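Regular Grammars (Type 3)
A ::= b | A ::= bC | A ::= Cd
The production must have only one non-terminal on the left. The right-hand side must be:
a terminal,
a terminal followed by a non-terminal, or
a non-terminal followed by a terminal.
A single grammar must stick to one of the last two forms: all right-linear (A ::= bC) or all left-linear (A ::= Cd). Mixing them can generate languages that are not regular.
The right side may not be terminal, non-terminal, terminal: a terminal may lead or follow the non-terminal, but not both.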
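A right-linear grammar converts directly into a finite state machine: each non-terminal becomes a state, A ::= bC becomes a transition from A to C on b, and A ::= b becomes a transition on b to an extra accepting state. The Python sketch below simulates the resulting nondeterministic machine by tracking the set of states it could be in. The example grammar (binary strings ending in 01) and the name `accepts` are assumptions for illustration.

```python
# Hypothetical right-linear grammar for binary strings ending in "01":
#   S ::= 0S | 1S | 0A
#   A ::= 1
# Each production is (lhs, terminal, rhs non-terminal or None).
PRODUCTIONS = [
    ("S", "0", "S"),
    ("S", "1", "S"),
    ("S", "0", "A"),
    ("A", "1", None),  # None: a rule of the form A ::= b ends the derivation
]
FINAL = "<accept>"  # extra accepting state; no production leaves it


def accepts(productions, start, text):
    states = {start}
    for ch in text:
        # Follow every production whose lhs is a current state and whose
        # terminal matches the input character.
        states = {
            FINAL if rhs is None else rhs
            for lhs, term, rhs in productions
            if lhs in states and term == ch
        }
    return FINAL in states


print(accepts(PRODUCTIONS, "S", "1101"))  # True
```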
24
Aside on Scanners
The first phase of a compiler is the lexical analyzer, AKA the scanner. It does the following:
Converts the source to a series of tokens
Removes comments and white space
The token stream is then used by the parser.
25
Scanners Again
A token could be:
Any constant, usually typed
Any reserved word
Any punctuation mark
Any identifier
The parser inputs the stream of tokens. The scanner will often be just a finite state machine.
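A minimal hand-written scanner in Python, sketched under assumptions: the token classes, the small reserved-word set, and the `//` comment form are an illustrative subset of C, not a complete C lexer, and `scan` is a hypothetical name. Each branch of the loop plays the role of a state in a finite state machine, and each inner loop stays in that state until the token ends (longest match).

```python
RESERVED = {"if", "else", "for", "while", "return"}  # illustrative subset
PUNCTUATION = set("(){};,+-*/=<>")


def scan(source):
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():                    # white space: discard
            i += 1
        elif source.startswith("//", i):    # line comment: discard to newline
            while i < n and source[i] != "\n":
                i += 1
        elif ch.isalpha() or ch == "_":     # identifier or reserved word
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            word = source[i:j]
            tokens.append(("RESERVED" if word in RESERVED else "IDENT", word))
            i = j
        elif ch.isdigit():                  # integer constant, typed as int
            j = i
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(("INT", int(source[i:j])))
            i = j
        elif ch in PUNCTUATION:             # single-character punctuation
            tokens.append(("PUNCT", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(scan("for (i = 0; i < 10; i = i + 1) // count"))
```

The parser never sees the white space or the comment; it receives only the (kind, value) pairs.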
26
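Context Free (Type 2)
A ::= α, where α is any string of terminals and non-terminals
A single non-terminal on the left; any number or arrangement of non-terminals and terminals on the right.
Most programming languages are largely context free. The optional else in C makes the obvious grammar ambiguous (the dangling else), but it can still be described by a context free grammar; the parts of C that are not context free include declaration before use and telling typedef names apart from other identifiers.
These languages may be recognized by a pushdown machine.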
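Why a stack? The bracket language B ::= ε | ( B ) B | [ B ] B | { B } B is context free, but no finite state machine accepts it: nesting depth is unbounded, and a finite machine can only remember a bounded amount. The Python sketch below behaves like a deterministic pushdown machine, using a list as the stack; the name `balanced` and the three-bracket alphabet are illustrative assumptions.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closing bracket -> matching opener


def balanced(text):
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                # push on an opening bracket
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                # closer must match the top
        else:
            return False                    # not in this alphabet
    return not stack                        # accept only with an empty stack


print(balanced("([]{})"))  # True
```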
27
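Context Sensitive (Type 1)
x A y ::= x γ y, where γ is any non-empty string of terminals and non-terminals
The left-hand side has a single non-terminal, optionally surrounded by context strings x and y.
Whatever context is present on the left must appear unchanged on the right.
Any number or arrangement of non-terminals and terminals may appear on the right, in between the context strings.
Recognized by a linear bounded automaton: a Turing machine restricted to the part of the tape that holds its input.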
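The linear bound has a grammar-side explanation: context sensitive productions never make a sentential form shorter, so a membership test never needs to examine forms longer than the input. The Python sketch below exploits that with a breadth-first search over sentential forms. It uses a hypothetical noncontracting grammar for a^n b^n c^n (n ≥ 1), a language that is context sensitive but not context free. The rule CB ::= BC is not in the strict x A y form, but noncontracting grammars generate the same languages as context sensitive ones (apart from details about the empty string). The search is exponential in general; it illustrates decidability, not an efficient parser.

```python
from collections import deque

# Hypothetical noncontracting grammar for a^n b^n c^n, n >= 1.
# Upper case letters are non-terminals, lower case letters are terminals.
RULES = [
    ("S", "aSBC"), ("S", "aBC"),
    ("CB", "BC"),              # reorder so all B's precede all C's
    ("aB", "ab"), ("bB", "bb"),
    ("bC", "bc"), ("cC", "cc"),
]


def derives(target, rules=RULES, start="S"):
    limit = len(target)        # no rule shrinks a form, so this bound is safe
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:     # apply the rule at every matching position
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False               # finitely many bounded forms: search ends


print(derives("aabbcc"))  # True
```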
28
Unrestricted (Type 0)
Almost anything on left and right: any non-empty combination of terminals and non-terminals may be replaced by any combination of terminals and non-terminals.
May be recognized by a Turing machine, which may run forever on strings that are not in the language.
29
Finally
It may seem strange that languages and automata are related, but they are. We find that most programming languages are context free, sometimes with small exceptions. There are a number of table driven parsers for context free languages.
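One family of table driven parsers is LL(1) predictive parsing (LR parsers, as built by tools such as yacc, are another). The Python sketch below drives an explicit stack from a parse table. The grammar is a left-recursion-free version of a simple expression grammar limited to + and *, with id standing for any identifier or constant:
E ::= T E'    E' ::= + T E' | ε
T ::= F T'    T' ::= * F T' | ε
F ::= ( E ) | id
The table was filled in by hand for this sketch, and `TABLE` and `ll1_parse` are hypothetical names.

```python
# (non-terminal, lookahead token) -> right-hand side to push; [] means ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    stack = ["$", "E"]                 # end marker below the start symbol
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False           # empty table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                   # terminal matches: consume it
        else:
            return False
    return pos == len(tokens)


print(ll1_parse(["id", "+", "id", "*", "id"]))  # True
```

All of the grammar knowledge lives in the table; the driver loop stays the same for any LL(1) grammar.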