LESSON 16
Overview of Previous Lesson(s)
Overview
For a deterministic finite automaton M, the minimum number of states in any equivalent deterministic finite automaton equals the number of equivalence groups of M's states.
Final states: A = {s2, s7}
Non-final states: B = {s0, s1, s3, s4, s5, s6}
Overview..
The following table shows the result of applying the inputs a and b to these states. Input a leads from s1 to s5 in group B, while input b leads from s1 to s2 in group A. Looking at the table, we find that input b distinguishes two of the states (s1 and s6) from the rest of the group, since it leads to group A for these two instead of group B.
Overview...
The states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set {s1, s6}, so we must partition B into two groups. Now we have the groups:
A = {s2, s7}, B = {s0, s3, s4, s5}, C = {s1, s6}
The next examination of where the inputs lead shows that s3 is not equivalent to the rest of group B. Continuing this process until no input test can distinguish between the states in any group, we end up with the groups:
A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = {s6}
Minimizing DFA States...
In view of the above definitions and results, it is easy to argue that all of the states in each final group are equivalent, because they all go to the same groups under the inputs a and b.
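The group-splitting procedure described above can be sketched in Python. This is a minimal illustration on a small hypothetical three-state DFA of our own (not the eight-state machine from the slides), where states 0 and 1 are equivalent:

```python
def minimize(states, alphabet, delta, finals):
    """Split groups until no input symbol distinguishes two states in a group."""
    # Initial partition: final vs. non-final states.
    groups = [g for g in (set(finals), set(states) - set(finals)) if g]

    def group_of(s):
        return next(i for i, g in enumerate(groups) if s in g)

    changed = True
    while changed:
        changed = False
        refined = []
        for g in groups:
            buckets = {}
            for s in g:
                # Signature: which group each input symbol leads to from s.
                sig = tuple(group_of(delta[s][c]) for c in alphabet)
                buckets.setdefault(sig, set()).add(s)
            refined.extend(buckets.values())
            changed |= len(buckets) > 1
        groups = refined
    return groups

# Hypothetical DFA: states 0 and 1 behave identically, so they merge.
delta = {0: {'a': 1, 'b': 2}, 1: {'a': 1, 'b': 2}, 2: {'a': 1, 'b': 2}}
parts = minimize([0, 1, 2], 'ab', delta, [2])   # two groups: {0, 1} and {2}
```

Each pass only splits groups, and there are at most as many groups as states, so the loop terminates.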
Overview... Transition Diagrams:
The behavior of a lexical analyzer can often be described by a transition diagram. These diagrams have states, each of which represents something about the history of the characters seen during the current search for a lexeme that matches one of the possible patterns. There are arrows, or transitions, from one state to another, each of which indicates the possible next input characters that cause the lexical analyzer to make that change of state.
Overview... Finite Automata:
These are a formalization of transition diagrams that include a designation of a start state and one or more accepting states, as well as the set of states, input characters, and transitions among states. Accepting states indicate that the lexeme for some token has been found. Unlike transition diagrams, finite automata can make transitions on empty input as well as on input characters.
Overview...
DFA: A DFA is a special kind of finite automaton that has exactly one transition out of each state for each input symbol. Transitions on empty input are disallowed.
NFA: Automata that are not DFAs are called nondeterministic. NFAs are often easier to design than DFAs, and they allow ε-transitions.
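Determinism makes a DFA trivial to simulate: from any state, each input symbol leads to exactly one next state, so a single loop suffices. A minimal sketch on a hypothetical two-state machine over {a, b} that accepts strings ending in b:

```python
def run_dfa(delta, start, accepting, text):
    """Simulate a DFA: the next state is always uniquely determined."""
    state = start
    for ch in text:
        state = delta[(state, ch)]   # exactly one transition per (state, symbol)
    return state in accepting

# Hypothetical machine: accepts strings over {a, b} that end in 'b'.
delta = {('q0', 'a'): 'q0', ('q0', 'b'): 'q1',
         ('q1', 'a'): 'q0', ('q1', 'b'): 'q1'}
```

Simulating an NFA would instead require tracking a *set* of possible states (plus ε-closures), which is why DFAs are preferred at runtime.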
Overview... Lex:
There is a family of software systems, including Lex and Flex, that are lexical-analyzer generators. The user specifies the patterns for tokens using an extended regular-expression notation. Lex converts these expressions into a lexical analyzer that is essentially a DFA recognizing any of the patterns.
Minimization of Finite Automata: For every DFA there is a minimum-state DFA accepting the same language.
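The Lex idea — combine the token patterns into one recognizer and repeatedly match at the current input position — can be imitated with Python's re module. The token names and patterns here are illustrative, not taken from any particular Lex specification:

```python
import re

# Toy token classes in Lex style: name, pattern.
TOKEN_SPEC = [
    ('NUM',  r'\d+'),             # integer literals
    ('ID',   r'[A-Za-z_]\w*'),    # identifiers
    ('OP',   r'[+*/()-]'),        # single-character operators
    ('SKIP', r'\s+'),             # whitespace, discarded
]
# One master pattern with a named group per token class.
MASTER = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (token-name, lexeme) pairs; characters matching no pattern
    are silently skipped in this sketch."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(text)
            if m.lastgroup != 'SKIP']
```

For example, tokenize('x1 + 42') yields [('ID', 'x1'), ('OP', '+'), ('NUM', '42')]. A real generator compiles the patterns into a single DFA rather than trying alternatives at runtime.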
TODAY’S LESSON
Contents
Syntax Analysis: Introduction, The Role of the Parser, Representative Grammars, Syntax Error Handling, Error-Recovery Strategies
Context-Free Grammars: Formal Definition of a CFG, Notational Conventions, Derivations, Parse Trees and Derivations, Ambiguity
Syntax Analysis
In this section we examine the parsing methods typically used in compilers. We first discuss basic concepts, then techniques suitable for hand implementation, and finally algorithms that have been used in automated tools. Since programs may contain syntactic errors, we also discuss extensions of the parsing methods for recovery from common errors. The syntax of programming-language constructs can be specified in context-free grammar (CFG) or Backus-Naur Form (BNF) notation.
Syntax Analysis..
Grammars offer significant benefits for both language designers and compiler writers. A grammar gives a precise syntactic specification of a programming language. From certain classes of grammars, we can automatically construct an efficient parser that determines the syntactic structure of a source program. The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors.
Role of the Parser
In the compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language.
Role of the Parser..
There are three general types of parsers for grammars: universal, top-down, and bottom-up. Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are, however, too inefficient to use in production compilers. The methods commonly used in compilers are either top-down or bottom-up. Top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root.
Representative Grammars
Expressions with + and *:
E → E + T | T
T → T * F | F
F → ( E ) | id
This grammar takes care of precedence, but as we saw before, it causes trouble for top-down parsing because it is left-recursive. So we use the following non-left-recursive grammar, which generates the same language.
Representative Grammars
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
The following ambiguous grammar will also be used for illustration, although in general we try to avoid ambiguity:
E → E + E | E * E | ( E ) | id
This grammar enforces neither precedence nor left vs. right associativity. For example, id + id + id and id * id + id each have two parse trees.
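The non-left-recursive grammar lends itself directly to top-down (recursive-descent) parsing: each non-terminal becomes a function, and the ε-alternatives of E' and T' become loop exits. A sketch in Python, where, as an illustrative assumption, integer literals stand in for id and the parser evaluates as it recognizes:

```python
def parse(tokens):
    """Recursive-descent parser/evaluator for the grammar
    E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f'expected {expected!r}, got {peek()!r}')
        pos += 1

    def E():                      # E  -> T E'
        value = T()
        while peek() == '+':      # E' -> + T E' | ε  (loop = right recursion)
            eat('+')
            value += T()
        return value

    def T():                      # T  -> F T'
        value = F()
        while peek() == '*':      # T' -> * F T' | ε
            eat('*')
            value *= F()
        return value

    def F():                      # F  -> ( E ) | id
        tok = peek()
        if tok == '(':
            eat('(')
            value = E()
            eat(')')
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f'expected id, got {tok!r}')
        eat(tok)
        return int(tok)

    result = E()
    if peek() is not None:
        raise SyntaxError('trailing input')
    return result
```

Here parse(['2', '+', '3', '*', '4']) gives 14, not 20 — the grammar's layering of E, T, and F makes * bind tighter than +.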
Syntax Error Handling
Here we consider syntactic errors and general strategies for error recovery: panic-mode and phrase-level recovery. Common programming errors can occur at many different levels.
Lexical errors include misspellings of identifiers, keywords, or operators - e.g., the use of an identifier elipseSize instead of ellipseSize - and missing quotes around text intended as a string.
Semantic errors include type mismatches between operators and operands - e.g., returning a value from a Java method whose result type is void.
Syntax Error Handling..
Syntactic errors include misplaced semicolons and extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.
Logical errors can be anything from incorrect reasoning on the part of the programmer to the use, in a C program, of the assignment operator = in place of the comparison operator ==.
Syntax Error Handling...
The error handler in a parser has goals that are simple to state but challenging to realize:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
Error Recovery Strategies
Once an error is detected, how should the parser recover? The simplest approach is for the parser to quit with an informative error message when it detects the first error.
Trivial approach (no recovery): Print an error message when parsing cannot continue, then terminate parsing.
Panic-mode recovery: The parser discards input until it encounters a synchronizing token. These tokens are chosen so that the parser can make a fresh beginning. Good examples are ; and }.
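Panic-mode recovery is easy to sketch: on an error, skip tokens until a synchronizing token appears, then resume just past it. The token strings below are illustrative:

```python
SYNC_TOKENS = {';', '}'}   # chosen so the parser can make a fresh beginning

def panic_mode_recover(tokens, pos):
    """Discard input from position `pos` up to and including the next
    synchronizing token; return the position where parsing resumes."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1 if pos < len(tokens) else pos
```

Given the token list ['x', '+', ';', 'y'], recovery from position 0 skips 'x' and '+', consumes the ';', and resumes at position 3. The cost is that any real errors inside the discarded region go unreported.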
Error Recovery Strategies
Phrase-level recovery: Locally replace some prefix of the remaining input by some string. Simple cases are exchanging ; with , and = with ==. Difficulties arise when the real error occurred long before it was detected.
Error productions: Include productions for common errors in the grammar.
Global correction: Change the input I to the closest correct input I' and produce the parse tree for I'.
CFG
Grammars are used to systematically describe the syntax of programming-language constructs like expressions and statements.
stmt → if ( expr ) stmt else stmt
Here the syntactic variable stmt denotes statements and expr denotes expressions. Other productions then define precisely what an expr is and what else a stmt can be. A language generated by a (context-free) grammar is called a context-free language.
CFG Definition
A context-free grammar (or simply grammar) consists of terminals, non-terminals, a start symbol, and productions.
Terminals: The basic symbols found by the lexer. They are sometimes called token names, i.e., the first component of the token as produced by the lexer.
Non-terminals: Syntactic variables that denote sets of strings. The sets of strings denoted by non-terminals help define the language generated by the grammar.
CFG Definition..
Start Symbol: A non-terminal that forms the root of the parse tree. Conventionally, the productions for the start symbol are listed first.
Productions: The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of:
A non-terminal called the head or left side of the production; the production defines some of the strings denoted by the head.
CFG Definition..
The symbol →; sometimes ::= is used in place of the arrow.
A body or right side consisting of zero or more terminals and non-terminals. The components of the body describe one way in which strings of the non-terminal at the head can be constructed.
CFG Definition.. Example Grammar
Terminals: id + - * / ( )
Non-terminals: expression, term, factor
Start symbol: expression
Productions:
expression → expression + term | expression - term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id
Notational Conventions
Notational conventions for grammars (used in this course). These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, …, 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
Notational Conventions..
These symbols are non-terminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent non-terminals for the constructs. For example, non-terminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
Notational Conventions…
Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals.
Lowercase letters late in the alphabet, chiefly u, v, …, z, represent (possibly empty) strings of terminals.
Lowercase Greek letters, such as α, β, γ, represent (possibly empty) strings of grammar symbols.
A set of productions A → α1, A → α2, …, A → αk with a common head A (call them A-productions) may be written as A → α1 | α2 | … | αk
Notational Conventions…
With these conventions, the grammar we defined earlier becomes:
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
Assume we have a production A → α. We then say that A derives α and write A ⇒ α.
Generalizing, if β and γ are strings, we say that βAγ derives βαγ and write βAγ ⇒ βαγ.
Generalizing further, if α derives β and β derives γ, we say α derives γ and write α ⇒* γ, where ⇒* means derives in zero or more steps.
Derivations
The formal definition of derives in zero or more steps:
α ⇒* α, for any string α.
If α ⇒* β and β ⇒ γ, then α ⇒* γ.
If S is the start symbol and S ⇒* α, we say α is a sentential form of the grammar. A sentential form may contain non-terminals and terminals; if it contains only terminals, it is a sentence of the grammar, and the language generated by a grammar G, written L(G), is its set of sentences. Two grammars generating the same language are called equivalent.
Derivations
Example: E → E + E | E * E | ( E ) | id
We see that id + id is a sentence; indeed, it can be derived in two ways from the start symbol E:
E ⇒ E + E ⇒ id + E ⇒ id + id
E ⇒ E + E ⇒ E + id ⇒ id + id
In the first derivation, we replaced the leftmost non-terminal by the body of a production having that non-terminal as head; this is called a leftmost derivation. Similarly, the second derivation, in which the rightmost non-terminal is replaced at each step, is called a rightmost derivation or canonical derivation.
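These derivation steps can be replayed mechanically: a leftmost step rewrites the first non-terminal in the sentential form (a rightmost step would rewrite the last). A small sketch using this grammar, with sentential forms as lists of symbols:

```python
# The ambiguous example grammar E -> E + E | E * E | ( E ) | id,
# with production bodies indexed 0..3.
GRAMMAR = {'E': [['E', '+', 'E'], ['E', '*', 'E'],
                 ['(', 'E', ')'], ['id']]}

def leftmost_step(form, choice):
    """Replace the leftmost non-terminal with production body `choice`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            return form[:i] + GRAMMAR[sym][choice] + form[i + 1:]
    return form   # no non-terminal left: `form` is already a sentence

# Replaying E => E + E => id + E => id + id from above:
form = ['E']
for choice in (0, 3, 3):
    form = leftmost_step(form, choice)
# form is now ['id', '+', 'id']
```

Every intermediate value of form is a sentential form; the final one contains only terminals and is therefore a sentence.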
Thank You