241-303 Discrete Maths: Grammars/8 1 241-303 Discrete Maths, Semester 1 2014-2015. 8. Grammars. Objectives: – –to introduce grammars and show their importance for defining programming languages and writing compilers; – –to show the connection between REs (regular expressions) and grammars

241-303 Discrete Maths: Grammars/8 2 Overview 1. Why Grammars? 2. Languages 3.Using a Grammar 4.Parse Trees 5.Ambiguous Grammars 6.Top-down and Bottom-up Parsing continued

241-303 Discrete Maths: Grammars/8 3 7.Building Recursive Descent Parsers 8.Making the Translation Easy 9.Building a Parse Tree 10.Kinds of Grammars 11.From RE to a Grammar 12.Context-free Grammars vs. REs

241-303 Discrete Maths: Grammars/8 4 1. Why Grammars? Grammars are the standard way of defining programming languages. Tools exist for translating grammars into compilers (e.g. JavaCC, lex, yacc, ANTLR) – –this saves weeks of work

241-303 Discrete Maths: Grammars/8 5 2. Languages We use a natural language to communicate – –its grammar rules are very complex – –the rules don’t cover important things We use a formal language to define a programming language – –its grammar rules are fairly simple – –the rules cover almost everything continued

241-303 Discrete Maths: Grammars/8 6 A formal language is a set of legal strings. The strings are legal if they correctly use the language’s alphabet and grammar rules. The alphabet is often called the language’s terminal symbols (or terminals).

241-303 Discrete Maths: Grammars/8 7 Example 1 Alphabet (terminals) = {1, 2, 3} Using the grammar rules (not shown here; see later), the language is: L1 = {11, 12, 13, 21, 22, 23, 31, 32, 33} L1 is the set of strings of length 2.

241-303 Discrete Maths: Grammars/8 8 Example 2 Terminals = {1, 2, 3} Using different grammar rules, the language is: L2 = { 111, 222, 333} L2 is the set of strings of length 3, where all the terminals are the same.

241-303 Discrete Maths: Grammars/8 9 Example 3 Terminals = {1, 2, 3} Using different grammar rules, the language is: L3 = {2, 12, 22, 32, 112, 122, 132,...} L3 is the set of strings whose numerical value is divisible by 2.

241-303 Discrete Maths: Grammars/8 10 3. Using a Grammar A grammar is a notation for defining a language, and is made from 4 parts: – –the terminal symbols – –the syntactic categories (nonterminal symbols), e.g. statement, expression, noun, verb – –the grammar rules (productions), e.g. A => B1 B2... Bn – –the starting nonterminal: the top-most syntactic category for this grammar continued

241-303 Discrete Maths: Grammars/8 11 We define a grammar G as a 4-tuple: G = (T, N, P, S) – –T = terminal symbols – –N = nonterminal symbols – –P = productions – –S = starting nonterminal

241-303 Discrete Maths: Grammars/8 12 3.1. Example 1 Consider the grammar:
T = {0, 1}
N = {S, R}
P = { S => 0
      S => 0 R
      R => 1 S }
S is the starting nonterminal. The right hand sides of productions usually use a mix of terminals and nonterminals.

241-303 Discrete Maths: Grammars/8 13 Is “01010” in the language? Start with an S rule:
Rule          String Generated
--            S
S => 0 R      0 R
R => 1 S      0 1 S
S => 0 R      0 1 0 R
R => 1 S      0 1 0 1 S
S => 0        0 1 0 1 0
No more rules can be applied since there are no more nonterminals left in the string. Yes, it is in the language.
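The derivation above can also be carried out by a program. The following is a minimal sketch in C, assuming the input is a NUL-terminated string; the names parse_S, parse_R and in_language are illustrative and not part of the slides. Each function consumes the text generated by its nonterminal and uses the next character to pick between S => 0 and S => 0 R.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each parse function returns a pointer to the first unread character,
   or NULL if no rule for its nonterminal fits the input. */
static const char *parse_R(const char *s);

/* S => 0 | 0 R */
static const char *parse_S(const char *s) {
    if (*s != '0') return NULL;        /* both S rules start with 0 */
    s++;
    if (*s == '1') return parse_R(s);  /* a following 1 selects S => 0 R */
    return s;                          /* otherwise use S => 0 */
}

/* R => 1 S */
static const char *parse_R(const char *s) {
    if (*s != '1') return NULL;
    return parse_S(s + 1);
}

/* True if the whole input can be generated from S. */
bool in_language(const char *input) {
    assert(input != NULL);
    const char *rest = parse_S(input);
    return rest != NULL && *rest == '\0';
}
```

Calling in_language("01010") follows the same five rule applications as the table above.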

241-303 Discrete Maths: Grammars/8 14 Example 2 Consider the grammar:
T = {a, b, c, d, z}
N = {S, R, U, V}
P = { S => R U z | z
      R => a | b R
      U => d V U | c
      V => b | c }
S is the starting nonterminal

241-303 Discrete Maths: Grammars/8 15 The notation: X => Y | Z is shorthand for the two rules: X => Y X => Z Read ‘|’ as ‘or’.

241-303 Discrete Maths: Grammars/8 16 Is “adbdbcz” in the language?
Rule            String Generated
--              S
S => R U z      R U z
R => a          a U z
U => d V U      a d V U z
V => b          a d b U z
U => d V U      a d b d V U z
V => b          a d b d b U z
U => c          a d b d b c z
Yes! This grammar has choices about how to rewrite the string.

241-303 Discrete Maths: Grammars/8 17 Is “abdbcz” in the language?
Rule            String Generated
--              S
S => R U z      R U z
R => a          a U z
Which U rule? U must be replaced by something beginning with a ‘b’, but the only U rule is: U => d V U | c. No, it is not in the language.

241-303 Discrete Maths: Grammars/8 18 3.2. BNF BNF is a shorthand notation for productions – –Backus Normal Form, or – –Backus-Naur Form We have already used ‘|’: X => Y1 | Y2 |... | Yn continued

241-303 Discrete Maths: Grammars/8 19 X => Y [Z] is shorthand for two rules: X => Y X => Y Z [Z] means 0 or 1 occurrences of Z. continued

241-303 Discrete Maths: Grammars/8 20 X => Y { Z } is shorthand for an infinite number of rules: X => Y X => Y Z X => Y Z Z X => Y Z Z Z : { Z } means 0 or more occurrences of Z.

241-303 Discrete Maths: Grammars/8 21 3.3. A Grammar for Expressions Consider the grammar: T = { 0, 1, 2,..., 9, +, -, *, /, (, ) } N = { Expr, Number } P = {Expr => Number Expr => ( Expr ) Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr } Expr is the starting nonterminal

241-303 Discrete Maths: Grammars/8 22 Defining Number The RE definition for a number is: number = digit digit* digit = [0-9] The productions for Number are: Number => Digit { Digit } Digit => 0 | 1 | 2 | 3 | … | 9 or Number => Number Digit | Digit Digit => 0 | 1 | 2 | 3 |... | 9

241-303 Discrete Maths: Grammars/8 23 Using Productions Expand Expr into (125-2)*3 Expr => Expr * Expr => ( Expr ) * Expr => ( Expr - Expr ) * Expr => ( Number - Number ) * Number : => ( 125 - 2 ) * 3 continued

241-303 Discrete Maths: Grammars/8 24 Expand Number into 125 Number => Number Digit => Number Digit Digit => Digit Digit Digit => 1 2 5

241-303 Discrete Maths: Grammars/8 25 3.4. Grammars are not Unique Two grammars that do the same thing:
Balanced => ε
Balanced => ( Balanced ) Balanced
and:
Balanced => ε
Balanced => ( Balanced )
Balanced => Balanced Balanced
Both generate the same strings, e.g.: (()(()))()   ε   (()())

241-303 Discrete Maths: Grammars/8 26 3.5. Productions for parts of C Control structures: Statement => while ( Cond ) Statement Statement => if ( Cond ) Statement Statement => if ( Cond ) Statement else Statement Testing (conditionals): Cond => Expr < Expr |... continued

241-303 Discrete Maths: Grammars/8 27 Statement blocks: Statement => ‘{‘ StatList ‘}’ StatList => Statement ; StatList | Statement ;

241-303 Discrete Maths: Grammars/8 28 Using the Statement Production
Statement => while ( Cond ) Statement
          => while ( Expr < Expr ) Statement
          => while ( Expr < Expr ) { StatList }
          => while ( Expr < Expr ) { Statement ; StatList }
          :
          => while (x < 10) { y++; x++; }
This example requires an extra Expr production for variables: Expr => VariableName

241-303 Discrete Maths: Grammars/8 29 3.6. Generating a Language For a given grammar, what strings can it generate? – –the language is the set of legal strings Most languages contain an infinite number of strings (e.g. English) – –but there is a process for generating them continued

241-303 Discrete Maths: Grammars/8 30 For each production, list the strings that can be derived immediately. On the 2nd round, put those strings back into the productions to generate more strings. On the 3rd round, put those strings back... Continue for as many rounds as you want.

241-303 Discrete Maths: Grammars/8 31 Example Consider the grammar:
T = { w, c, s, ‘{‘, ‘}’, ‘;’ }
N = { S, L }
P = { S => w c S | ‘{‘ L ‘}’ | s ‘;’
      L => L S | ε }
S is the starting nonterminal

241-303 Discrete Maths: Grammars/8 32 Strings in First 3 Rounds (only the new strings are listed in each round)
            S                      L
Round 1:    s;                     ε
Round 2:    wcs;  {}               s;
Round 3:    wcwcs;  wc{}  {s;}     wcs;  {}  s;s;  s;wcs;  s;{}
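The round-by-round process can be sketched in C. This is only an illustration under stated assumptions: the fixed-size struct set, the limits MAX_STRINGS and MAX_LEN, and the function names contains, add and round_step are all invented here, and strings that do not fit the limits are silently dropped. The empty string "" stands for ε.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 64   /* assumed capacity of one set */
#define MAX_LEN 32       /* assumed maximum string length, including NUL */

struct set {
    int count;
    char items[MAX_STRINGS][MAX_LEN];
};

int contains(const struct set *s, const char *str) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->items[i], str) == 0) return 1;
    return 0;
}

/* Adds str unless it is already present or does not fit. */
void add(struct set *s, const char *str) {
    if (contains(s, str) || s->count == MAX_STRINGS || strlen(str) >= MAX_LEN)
        return;
    strcpy(s->items[s->count++], str);
}

/* One round: put the strings known before this round back into
   every production of  S => w c S | { L } | s ;   L => L S | ε  */
void round_step(struct set *S, struct set *L) {
    assert(S != NULL && L != NULL);
    struct set oldS = *S, oldL = *L;   /* strings from earlier rounds */
    char buf[2 * MAX_LEN];

    add(S, "s;");                      /* S => s ; */
    add(L, "");                        /* L => ε   */
    for (int i = 0; i < oldS.count; i++) {       /* S => w c S */
        snprintf(buf, sizeof buf, "wc%s", oldS.items[i]);
        add(S, buf);
    }
    for (int i = 0; i < oldL.count; i++) {       /* S => { L } */
        snprintf(buf, sizeof buf, "{%s}", oldL.items[i]);
        add(S, buf);
    }
    for (int i = 0; i < oldL.count; i++)         /* L => L S */
        for (int j = 0; j < oldS.count; j++) {
            snprintf(buf, sizeof buf, "%s%s", oldL.items[i], oldS.items[j]);
            add(L, buf);
        }
}
```

Starting from two empty sets and calling round_step() three times accumulates the strings listed in the three rounds above.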

241-303 Discrete Maths: Grammars/8 33 4. Parse Trees A parse tree is a graphical way of showing how productions are used to generate a string. Data structures representing parse trees are used inside compilers to store information about the program being compiled.

241-303 Discrete Maths: Grammars/8 34 Example 1 Consider the grammar: T = { a, b } N = { S } P = { S => S S | a S b | a b | b a } S is the starting nonterminal

241-303 Discrete Maths: Grammars/8 35 Parse Tree for “aabbba” The root of the tree is the start symbol S. At each step, expand one nonterminal in a leaf position:
1. Expand the root using S => S S
2. Expand the left S using S => a S b
3. Expand the inner S using S => a b
4. Expand the right S using S => b a
The finished tree (children are indented below their parent):
S
  S
    a
    S
      a
      b
    b
  S
    b
    a
Stop when there are no more nonterminals in leaf positions. Read off the string by reading the leaves left to right: a a b b b a.

241-303 Discrete Maths: Grammars/8 38 Example 2 Consider the grammar: T = { a, +, *, (, ) } N = { E, T, F } P = {E => T | T + E T => F | F * T F => a | ( E )} E is the starting nonterminal

241-303 Discrete Maths: Grammars/8 39 Is “a+a*a” in the Language? Start from E and expand using E => T + E, then expand the left T using T => F. Continue the expansion until every leaf is a terminal:
E
  T
    F
      a
  +
  E
    T
      F
        a
      *
      T
        F
          a
The leaves, read left to right, give a+a*a, so the string is in the language.

241-303 Discrete Maths: Grammars/8 41 5. Ambiguous Grammars A grammar is ambiguous when a string can be represented by more than one parse tree – –it means that the string has more than one “meaning” in the language e.g. a variant of the last grammar example: P = { E => E + E | E * E | ( E ) | a }

241-303 Discrete Maths: Grammars/8 42 Parse Trees for “a+a*a”
Left hand tree (the root uses E => E + E):
E
  E
    a
  +
  E
    E
      a
    *
    E
      a
and the right hand tree (the root uses E => E * E):
E
  E
    E
      a
    +
    E
      a
  *
  E
    a
continued

241-303 Discrete Maths: Grammars/8 43 The two parse trees allow a string like “5+5*5” to be read in two different ways: – –5 + 25 = 30 (the left hand tree) – –10 * 5 = 50 (the right hand tree)
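The two readings can be checked by evaluating both trees. The sketch below is an illustration only: struct expr, eval, left_tree and right_tree are names invented here, and the trees are simplified to hold numbers and operators directly rather than E nodes.

```c
#include <assert.h>
#include <stddef.h>

/* A node is either a number (op == 0) or an operator with two subtrees. */
struct expr {
    char op;                         /* '+', '*', or 0 for a number */
    int value;                       /* used only when op == 0 */
    const struct expr *left, *right;
};

/* Evaluates a tree bottom-up. */
int eval(const struct expr *e) {
    assert(e != NULL);
    if (e->op == 0) return e->value;
    int l = eval(e->left), r = eval(e->right);
    return e->op == '+' ? l + r : l * r;
}

static const struct expr five    = { 0, 5, NULL, NULL };
static const struct expr product = { '*', 0, &five, &five };   /* 5*5 */
static const struct expr sum     = { '+', 0, &five, &five };   /* 5+5 */

/* Left hand tree: + at the root, so the string is read as 5 + (5*5). */
const struct expr left_tree  = { '+', 0, &five, &product };
/* Right hand tree: * at the root, so the string is read as (5+5) * 5. */
const struct expr right_tree = { '*', 0, &sum, &five };
```

eval(&left_tree) and eval(&right_tree) give different results for the same string, which is exactly the problem that ambiguity causes.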

241-303 Discrete Maths: Grammars/8 44 Why is Ambiguity Bad? In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it. e.g in C: x = 5 + 5 * 5; // what is the value in x?

241-303 Discrete Maths: Grammars/8 45 6. Top-down and Bottom-up Parsing Top-down parsing creates a parse tree starting from the start symbol and moves down towards the leaves. – –used in most compilers – –usually implemented as recursive-descent parsing continued

241-303 Discrete Maths: Grammars/8 46 Bottom-up parsing creates a parse tree starting from the leaves, and moves up towards the start symbol. – –productions are used in ‘reverse’ Both kinds of parsing often require “guessing” to decide which productions to use to parse a string.

241-303 Discrete Maths: Grammars/8 47 Example Consider the grammar: T = { a, +, *, (, ) } N = { E, T, F } P = {E => T | T + E T => F | F * T F => a | ( E )} E is the starting nonterminal

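241-303 Discrete Maths: Grammars/8 48 Top-down Parse of “a+a*a” [Figure: the parse tree for a+a*a (E at the root, using E => T + E), built from the root down towards the leaves.]

241-303 Discrete Maths: Grammars/8 49 Bottom-up Parse of “a+a*a” [Figure: the same parse tree for a+a*a, built from the leaves up towards the root E.]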

241-303 Discrete Maths: Grammars/8 50 Guessing when Building Guessing occurs when there are several rules which can apply to the current nonterminal. Compilers are very bad at guessing, and so programming language designers try to make grammars as simple as possible.

241-303 Discrete Maths: Grammars/8 51 Guessing in Bottom-up [Figure: each a in “a+a*a” has been reduced a, F, T, E, leaving E + E * E, and the parse is STUCK.] The compiler must backtrack to an earlier point and try a different rule.

241-303 Discrete Maths: Grammars/8 52 7. Building Recursive Descent Parsers The parser will read a string as input, and test if it fits the grammar. [Figure: input string (e.g. "a+a*a") -> parser (checks input against the grammar) -> output: "yes" or "no"] continued

241-303 Discrete Maths: Grammars/8 53 In section 9 we will add the ability to generate a parse tree. [Figure: input string (e.g. "a+a*a") -> parser (checks input against the grammar) -> output: "no" or a tree] continued

241-303 Discrete Maths: Grammars/8 54 The parser will be coded in 2 steps: – –1) Convert the grammar into syntax graphs – –2) Convert the syntax graphs into code [Figure: grammar -> syntax graphs -> parser] The pay-off is that a programmer writes high-level grammar rules instead of complex code.

241-303 Discrete Maths: Grammars/8 55 7.1. What is a Syntax Graph? A syntax graph is a graphical representation of a grammar – –easier to manipulate than grammars For example:
P = { A => x | ( B )
      B => A C
      C => { + A } }
Valid strings: x, (x), (x+x+x). Remember that { R } means 0 or more R’s.

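241-303 Discrete Maths: Grammars/8 56 Graphs for A, B, and C [Figure: graph A is a choice point between the circle x and the sequence circle (, box B, circle ). Graph B is box A followed by box C. Graph C is a choice point that either exits or loops through circle + and box A.]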

241-303 Discrete Maths: Grammars/8 57 Meanings The input string is processed by following the graph for the top-most symbol in the grammar. A circle means: – –if the current input character is a "x" then continue by reading the next input character, otherwise reject the input string. x continued

241-303 Discrete Maths: Grammars/8 58 A box means: – –that the current input should be processed by the B syntax graph. It's like 'calling' the B graph to do the work. B

241-303 Discrete Maths: Grammars/8 59 Choice Points Make Things Hard The graph must decide which path to take when execution reaches a choice point. – –"deciding" can be hard continued ( x choice point

241-303 Discrete Maths: Grammars/8 60 General solution is to lookahead in the graph: – –e.g. if the current input character is a "(" then go along the path that looks for a "(" next

241-303 Discrete Maths: Grammars/8 61 For lookahead to be fast, it should be possible to decide which path to take by looking at only the next graph symbol. This is possible if all the paths start with circles, e.g. at a choice point where the top path wants a "(" and the bottom path wants an "x".

241-303 Discrete Maths: Grammars/8 62 7.2. From Grammar to Syntax Graph There are 6 translation rules to convert a grammar into a syntax graph.

241-303 Discrete Maths: Grammars/8 63 7.2.1. Translate a Production The production: A => Body is mapped to a graph labelled A. Body A The text inside the hexagon still needs to be translated to a graph.

241-303 Discrete Maths: Grammars/8 64 7.2.2. Translate a Terminal A terminal symbol x is translated to the graph: x the translation is finished

241-303 Discrete Maths: Grammars/8 65 7.2.3. Translate a Nonterminal A nonterminal symbol B is translated to the graph: B the translation is finished

241-303 Discrete Maths: Grammars/8 66 7.2.4. Translate ‘|’ A production body of the form: Body1 | Body2 |... | BodyN is translated to the graph: Body1 Body2 BodyN : The text inside the hexagons still needs to be translated to graphs.

241-303 Discrete Maths: Grammars/8 67 7.2.5. Translate a Sequence A production body of the form: Body1 Body2... BodyN is translated to the graph: Body1Body2BodyN... The text inside the hexagons still needs to be translated to graphs.

241-303 Discrete Maths: Grammars/8 68 7.2.6. Translate {...} (0 or more) A production body of the form: { Body1 } is translated to the graph: Body1 The text inside the hexagon still needs to be translated to a graph.

241-303 Discrete Maths: Grammars/8 69 7.3. Grammar to Graph Example Consider the grammar: T = { x, +, (, ) } N = { A, B, C } P = {A => x | ( B ) B => A C C => { + A }} A is the starting nonterminal

241-303 Discrete Maths: Grammars/8 70 Translate the A Rule A => x | ( B ) uses 7.2.1 to become a graph labelled A whose body is the hexagon "x | ( B )". Use 7.2.4 to become a choice point with two branches: the hexagon "x" (top) and the hexagon "( B )" (bottom). continued

241-303 Discrete Maths: Grammars/8 71 Use 7.2.2 on the top branch: it becomes the circle x. Use 7.2.5 on the bottom branch: it becomes the sequence of hexagons "(", "B", ")". continued

241-303 Discrete Maths: Grammars/8 72 Use rules 7.2.2 and 7.2.3 on the bottom branch: it becomes circle (, box B, circle ).

241-303 Discrete Maths: Grammars/8 73 Graphs for A, B, and C [Figure: the three syntax graphs from section 7.1, with one choice point in graph A and one in graph C.]

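241-303 Discrete Maths: Grammars/8 74 Combining the Graphs Combine B and C graphs with A: [Figure: graph A is a choice point between the circle x and the sequence circle (, box A, a loop (second choice point) through circle + and box A, circle ).]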

241-303 Discrete Maths: Grammars/8 75 Two Easy Choice Points – –It is easy to decide which path to take at the two choice points. – –Each path starts with a different terminal. – –We can look ahead to decide which path to take.

241-303 Discrete Maths: Grammars/8 76 7.4. From Syntax Graphs to Code Each syntax graph is transformed into a function using 6 basic transformations. main() does two things:
– –reads the first input character:
    ch = getchar();   // ch is a global variable
– –calls the function representing the starting nonterminal:
    A();

241-303 Discrete Maths: Grammars/8 77 7.4.1. Transform a Graph A graph labelled G with body GBody becomes the function:
    void G()
    {
        /* the code generated by transforming the graph GBody */
    }
The graph inside the pentagon (GBody) still needs to be translated to code.

241-303 Discrete Maths: Grammars/8 78 7.4.2. Transform a Terminal The circle x (check input is x; get next input) becomes the code:
    if (ch == 'x')
        ch = getchar();   // get ch for next step
    else
        error();          // reports error then exits

241-303 Discrete Maths: Grammars/8 79 7.4.3. Transform a Nonterminal The box G1 becomes the function call:
    G1();

241-303 Discrete Maths: Grammars/8 80 7.4.4. Transform a Choice A choice point with paths GBody1, GBody2, ..., GBodyN becomes a switch or multiple if statement. continued

241-303 Discrete Maths: Grammars/8 81
    if (ch == firstGBody1)
        // transformation of GBody1 ;
    else if (ch == firstGBody2)
        // transformation of GBody2 ;
    else if ...
        :
    else if (ch == firstGBodyN)
        // transformation of GBodyN ;
    else
        error();
continued

241-303 Discrete Maths: Grammars/8 82 The translation tests ch to see if it is the character firstGBody1, firstGBody2, etc. – –ch is the current input character – –firstGBody1, firstGBody2, etc. are the first terminals (circles) of the paths GBody1, GBody2, etc. These terminals must be distinct (different) – –then only one test will succeed

241-303 Discrete Maths: Grammars/8 83 7.4.5. Transform a Sequence The sequence GBody1 GBody2 ... GBodyN becomes the block:
    {
        // transformation of GBody1 ;
        // transformation of GBody2 ;
        :
        // transformation of GBodyN ;
    }

241-303 Discrete Maths: Grammars/8 84 7.4.6. Transform a Multiple A choice point that loops through GBody1 becomes the loop:
    while (ch == firstGBody1)
        // transformation of GBody1 ;
firstGBody1 is the first terminal in GBody1.

241-303 Discrete Maths: Grammars/8 85 Two Optimising Transformations There are two other transformations for a choice and a multiple. These are optimisations when the graph is a special shape.

241-303 Discrete Maths: Grammars/8 86 7.4.7. Optimising Choice A choice point whose paths start with the circles x1, x2, ..., xN (followed by GBody1, GBody2, ..., GBodyN) becomes a switch or multiple if statement. continued

241-303 Discrete Maths: Grammars/8 87
    if (ch == 'x1') {
        ch = getchar();
        // transformation of GBody1 ;
    }
    else if (ch == 'x2') {
        ch = getchar();
        // transformation of GBody2 ;
    }
    else if ...
        :
    else if (ch == 'xN') {
        ch = getchar();
        // transformation of GBodyN ;
    }
    else
        error();
continued

241-303 Discrete Maths: Grammars/8 88 Here the assumption is that the terminals x1, x2, etc. are all different – –this means that only 1 test will succeed

241-303 Discrete Maths: Grammars/8 89 7.4.8. Optimising Multiple A choice point that loops through the circle x followed by GBody1 becomes the loop:
    while (ch == 'x') {
        ch = getchar();
        // transformation of GBody1 ;
    }

241-303 Discrete Maths: Grammars/8 90 Code Optimisations Sometimes the generated code can be simplified. For example:
    ch = getchar();
    foo();
    while (ch == 'x') {
        ch = getchar();
        foo();
    }
can be rewritten as:
    do {
        ch = getchar();
        foo();
    } while (ch == 'x');

241-303 Discrete Maths: Grammars/8 91 error() Function A simple error reporting function:
    void error()
    {
        printf("Error while processing %c\n", ch);
        exit(1);
    }

241-303 Discrete Maths: Grammars/8 92 7.5. Graph to Code Example The original grammar in section 7.3: T = { x, +, (, ) } N = { A, B, C } P = {A => x | ( B ) B => A C C => { + A }} A is the starting nonterminal

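241-303 Discrete Maths: Grammars/8 93 Graphs for A, B, and C (again) [Figure: the three syntax graphs from section 7.1: A is a choice between circle x and the sequence circle (, box B, circle ); B is box A then box C; C loops through circle + and box A.]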

241-303 Discrete Maths: Grammars/8 94 Code
    #include <stdio.h>    // getchar(), printf()
    #include <stdlib.h>   // exit(), used by error()

    void A(void);         // parse functions
    void B(void);
    void C(void);
    void error(void);

    int ch;               // holds current input char

    int main(void)
    {
        ch = getchar();
        A();
        printf("parsed successfully\n");
        return 0;
    }
continued

241-303 Discrete Maths: Grammars/8 95
    void A(void)          // A => x | ( B )
    {
        if (ch == 'x')
            ch = getchar();
        else if (ch == '(') {
            ch = getchar();
            B();
            if (ch == ')')
                ch = getchar();
            else
                error();
        }
        else
            error();
    }
This code has been optimised to reduce the number of calls to error(). continued

241-303 Discrete Maths: Grammars/8 96
    void B(void)          // B => A C
    {
        A();
        C();
    }

    void C(void)          // C => { + A }
    {
        while (ch == '+') {
            ch = getchar();
            A();
        }
    }
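For experimenting with the parser without typing input, the same three functions can be driven from a string. The following is a sketch under assumptions that differ from the slides: it reads from a string instead of getchar(), it records failure in a flag instead of calling exit(), and it also insists that the whole input is used up. The names in, ok, next and parses are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *in;   /* next unread input character */
static bool ok;          /* cleared as soon as an error is found */

static void A(void);

/* Moves to the next character, but never past the end of the string. */
static void next(void) { if (*in != '\0') in++; }

/* C => { + A } */
static void C(void) {
    while (ok && *in == '+') {
        next();
        A();
    }
}

/* B => A C */
static void B(void) {
    A();
    if (ok) C();
}

/* A => x | ( B ) */
static void A(void) {
    if (*in == 'x') {
        next();
    } else if (*in == '(') {
        next();
        B();
        if (ok && *in == ')') next();
        else ok = false;
    } else {
        ok = false;
    }
}

/* True if the whole input string fits the grammar. */
bool parses(const char *input) {
    assert(input != NULL);
    in = input;
    ok = true;
    A();
    return ok && *in == '\0';
}
```

For example, parses("(x+x+x)") succeeds, while parses("x+x") fails because A stops after the first x and input is left over.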

241-303 Discrete Maths: Grammars/8 97 8. Making the Translation Easy The translation (syntax graphs to code) requires the grammar to have special properties. When there is a choice about which path to take through a graph, the decision should depend only on the current character and the first terminals on the paths. continued ( x

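241-303 Discrete Maths: Grammars/8 98 Examples [Figure: execution is at a choice point whose paths start with the circles ( and x; the current input is 'x'.] The choice is easy to make. [Figure: execution is at a choice point whose two paths both start with the circle a; the current input is 'a'.] The choice isn't easy.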

241-303 Discrete Maths: Grammars/8 99 It may be possible to “convert” a grammar into a suitable form by using techniques such as: – –left recursion elimination – –left factoring

241-303 Discrete Maths: Grammars/8 100 8.1. Left Recursion Elimination Example of left recursion: L => L a d | ε How many times should the L production be used to parse “adad”? Rearrange the grammar: L => a d L | ε Such a rearrangement is not always possible.
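With the rearranged rule, one character of lookahead is enough to decide whether to use L => a d L again or to stop with L => ε. A minimal sketch, assuming a NUL-terminated input string; parse_L and matches_L are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* L => a d L | ε
   Returns a pointer to the first unread character, or NULL on error. */
static const char *parse_L(const char *s) {
    if (*s == 'a') {                  /* lookahead 'a' selects L => a d L */
        if (s[1] != 'd') return NULL; /* an 'a' must be followed by 'd' */
        return parse_L(s + 2);
    }
    return s;                         /* anything else selects L => ε */
}

/* True if the whole input can be generated from L. */
bool matches_L(const char *input) {
    assert(input != NULL);
    const char *rest = parse_L(input);
    return rest != NULL && *rest == '\0';
}
```

Translating the original rule L => L a d directly would make parse_L call itself before reading any input, so it would never terminate.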

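241-303 Discrete Maths: Grammars/8 101 [Figure: BAD: the graph for L => L a d starts with the box L, so with current input "a" there is nothing to look ahead at. GOOD: the graph for L => a d L starts with the circle a, which can be compared with the current input "a".]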

241-303 Discrete Maths: Grammars/8 102 Another Example Left recursive grammar: Number => Number Digit | Digit Digit => 0 | 1 | 2 | 3 |... | 9 How many times should the Number production be used to parse “123”? Rearrange to: Number => Digit Number | Digit Digit => 0 | 1 | 2 | 3 |... | 9 there’s still a problem; see next slides

241-303 Discrete Maths: Grammars/8 103 8.2. Left factoring When 2 (or more) productions begin with the same terminal or nonterminal, then which production should be used? e.g. Which X rule to use to parse “ae...”? X => a d S X => a e R continued

241-303 Discrete Maths: Grammars/8 104 [Figure: BAD: the graph for X has two paths, a d S and a e R, which both start with the circle a; the current input is 'a'.]

241-303 Discrete Maths: Grammars/8 105 Left factoring creates a new production which represents the “tails” of the left factored rules. e.g. left factoring the X rules: X => a XTail XTail => d S | e R

241-303 Discrete Maths: Grammars/8 106 [Figure: GOOD: the graph for X starts with a single circle a, followed by XTail, which is a choice point between the paths d S and e R; the current input is 'a'.]

241-303 Discrete Maths: Grammars/8 107 Another Example Which Number rule should be used to parse “123”? Number => Digit Number | Digit Digit => 0 | 1 | 2 | 3 |... | 9 Left factorise Number: Number => Digit NumTail NumTail => Number | ε
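After left factoring, the choice inside NumTail depends only on whether the next character is a digit. A small sketch of the factored rules, assuming a NUL-terminated input string; parse_number and is_number are names made up for this example, and isdigit() from <ctype.h> stands in for the Digit rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Number  => Digit NumTail
   NumTail => Number | ε
   Returns a pointer to the first unread character, or NULL on error. */
static const char *parse_number(const char *s) {
    if (!isdigit((unsigned char)*s)) return NULL;  /* Digit */
    s++;
    if (isdigit((unsigned char)*s))                /* NumTail => Number */
        return parse_number(s);
    return s;                                      /* NumTail => ε */
}

/* True if the whole input can be generated from Number. */
bool is_number(const char *input) {
    assert(input != NULL);
    const char *rest = parse_number(input);
    return rest != NULL && *rest == '\0';
}
```

For “123” the Number rule is used three times, and each time the lookahead character decides between NumTail => Number and NumTail => ε.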

241-303 Discrete Maths: Grammars/8 108 9. Building a Parse Tree Now we will augment the parser code of section 7 to generate a parse tree. The grammar again: T = { x, +, (, ) } N = { A, B, C } P = {A => x | ( B ) B => A C C => { + A }} A is the starting nonterminal

241-303 Discrete Maths: Grammars/8 109 9.1. Representing the Parse Trees The production: A => x | ( B ) can create two possible parse trees: A x or A ( tree for B ) continued

241-303 Discrete Maths: Grammars/8 110 The production: B => A C will create the parse tree: B tree for A tree for C continued

241-303 Discrete Maths: Grammars/8 111 The production: C => { + A } can generate an infinite number of parse trees: C with the single child ε, or C with the children +, tree for A, or C with the children +, tree for A, +, tree for A, or...

241-303 Discrete Maths: Grammars/8 112 A Parse Tree for “(x+x+x)”
A
  (
  B
    A
      x
    C
      +
      A
        x
      +
      A
        x
  )
Our code will read in a string and create a parse tree data structure like this one.

241-303 Discrete Maths: Grammars/8 113 9.2. The Tree Data Structure The nodes in a parse tree can have different numbers of children. The C grammar rule can generate 1 child or any even number of children! – –2, 4, 6, 8,... continued

241-303 Discrete Maths: Grammars/8 114 Tree Node Data Structure
    struct node {
        char label;
        struct node *leftChild;
        struct node *rightSib;   // sibling
    };
    typedef struct node *TREE;
[Figure: a node cell with the fields label, leftChild and rightSib.] This approach allows us to have a variable number of siblings.

241-303 Discrete Maths: Grammars/8 115 9.3. Tree Building Functions A collection of building functions: – –a function to create a node with 0 children – –a function to create a node with 1 child – –a function to create a node with 2 children – –etc. The C production will require some fancy coding.

241-303 Discrete Maths: Grammars/8 116 makeLeaf() creates a node labelled x whose leftChild and rightSib are both NULL (N):
    TREE makeLeaf(char x)
    {
        TREE root = (TREE) malloc(sizeof(struct node));
        root->label = x;
        root->leftChild = NULL;
        root->rightSib = NULL;
        return root;
    }
continued

241-303 Discrete Maths: Grammars/8 117 makeNode1() creates a node labelled x whose leftChild is t. It does not care what t’s own pointers are (shown as ‘?’ in the diagram):
    TREE makeNode1(char x, TREE t)
    {   // the subtree t is supplied
        TREE root = makeLeaf(x);
        root->leftChild = t;
        return root;
    }
continued

241-303 Discrete Maths: Grammars/8 118 makeNode2() creates a node labelled x with the children t1 and t2:
    TREE makeNode2(char x, TREE t1, TREE t2)
    {   // subtrees t1 and t2 are supplied
        TREE root = makeNode1(x, t1);
        t1->rightSib = t2;
        return root;
    }
continued

241-303 Discrete Maths: Grammars/8 119 makeNode3() creates a node labelled x with the children t1, t2 and t3:
    TREE makeNode3(char x, TREE t1, TREE t2, TREE t3)
    {   // the subtrees are supplied
        TREE root = makeNode2(x, t1, t2);
        t2->rightSib = t3;
        return root;
    }
This approach can be used to create makeNode4(), and so on.

241-303 Discrete Maths: Grammars/8 120 Dealing with the C production The C production can generate any even number of children: C => { + A } continued

241-303 Discrete Maths: Grammars/8 121 A C tree will be constructed in three ways: – –a C node with 1 child: use makeNode1('C', makeLeaf('e')), where 'e' stands for ε – –a C node with 2 children: use makeNode2() – –a C node with 4, 6, 8,... children: use add2Children() repeatedly, after calling makeNode2() first

241-303 Discrete Maths: Grammars/8 122 add2Children() adds t1 and t2 to the end of t’s children (after rm, the current rightmost child):
    TREE add2Children(TREE t, TREE t1, TREE t2)
    {
        TREE rm = rightMostChild(t);   // we will not define this function
        rm->rightSib = t1;
        t1->rightSib = t2;
        return t;
    }
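The slides leave rightMostChild() undefined. One possible definition is sketched below, as an assumption rather than the original author's code: it walks along the rightSib pointers from the first child. The struct is repeated so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    char label;
    struct node *leftChild;
    struct node *rightSib;   /* sibling */
};
typedef struct node *TREE;

/* Returns the last child of t, or NULL if t has no children. */
TREE rightMostChild(TREE t) {
    assert(t != NULL);
    TREE child = t->leftChild;
    if (child == NULL) return NULL;
    while (child->rightSib != NULL)   /* follow the sibling chain */
        child = child->rightSib;
    return child;
}
```

add2Children() dereferences the result without checking it, which is safe in the parser because it is only called on a C node that already has two children.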

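241-303 Discrete Maths: Grammars/8 123 9.4. Parse Trees as Code The tree A with the single child x becomes two cells: an A cell whose leftChild points to an x cell; all other pointers are NULL (N). The tree A with the children (, tree for B, ) becomes: an A cell whose leftChild points to a ( cell, whose rightSib points to the B cell (the root of the cells representing the B tree), whose rightSib points to a ) cell.

241-303 Discrete Maths: Grammars/8 124 The tree B with the children tree for A, tree for C becomes: a B cell whose leftChild points to the A cell (the root of the cells for the A tree), whose rightSib points to the C cell (the root of the cells for the C tree). continued

241-303 Discrete Maths: Grammars/8 125 The tree C with the children +, tree for A, +, tree for A becomes: a C cell whose leftChild points to a + cell, followed along the rightSib pointers by an A cell, a + cell, and an A cell; the last rightSib is NULL (N).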

241-303 Discrete Maths: Grammars/8 126 9.5. Code with Parse Tree Generation
    #include <stdio.h>    // getchar(), printf()
    #include <stdlib.h>   // malloc(), exit()

    struct node {... };
    typedef struct node *TREE;

    TREE A(void);         // parse functions
    TREE B(void);
    TREE C(void);
    void error(void);

    TREE makeLeaf(char x);
      :                   // other TREE building prototypes

    int ch;               // holds current input char (int, as returned by getchar())
continued

241-303 Discrete Maths: Grammars/8 127
    int main(void)
    {
        ch = getchar();
        TREE parseTree = A();
          :               // use parseTree, print it, etc.
        return 0;
    }
continued

241-303 Discrete Maths: Grammars/8 128
    TREE A(void)
    {
        if (ch == 'x') {
            ch = getchar();
            return makeNode1('A', makeLeaf('x'));
        }
        else if (ch == '(') {
            ch = getchar();
            TREE BTree = B();
            if (ch == ')') {
                ch = getchar();
                return makeNode3('A', makeLeaf('('), BTree, makeLeaf(')'));
            }
            else
                error();
        }
        else
            error();
        return NULL;      // not reached: error() exits
    }
continued

241-303 Discrete Maths: Grammars/8 129
    TREE B(void)
    {
        TREE ATree = A();
        TREE CTree = C();
        return makeNode2('B', ATree, CTree);
    }
continued

241-303 Discrete Maths: Grammars/8 130
    TREE C(void)
    {
        TREE ATree, CTree = NULL;
        int numLoops = 0;          // times round the loop
        while (ch == '+') {
            numLoops++;
            ch = getchar();
            ATree = A();
            if (numLoops == 1)     // 1st time through loop
                CTree = makeNode2('C', makeLeaf('+'), ATree);
            else                   // 2nd, 3rd, etc time
                CTree = add2Children(CTree, makeLeaf('+'), ATree);
        }
        if (numLoops == 0)         // skipped the loop
            CTree = makeNode1('C', makeLeaf('e'));
        return CTree;
    }
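One simple use of the finished parse tree is to read the leaves from left to right, which should give back the input string (see section 4). The function below is a sketch only: the name yield, its buffer parameters, and the decision to skip 'e' leaves (which stand for ε) are assumptions made here, and the struct is repeated so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    char label;
    struct node *leftChild;
    struct node *rightSib;   /* sibling */
};
typedef struct node *TREE;

/* Appends the leaf labels found under t and under t's right siblings
   to out, left to right, starting at position len. At most cap - 1
   characters are stored, and out is always NUL-terminated (cap > 0).
   Returns the new length. */
size_t yield(TREE t, char *out, size_t len, size_t cap) {
    assert(out != NULL && cap > 0);
    for (; t != NULL; t = t->rightSib) {
        if (t->leftChild != NULL)
            len = yield(t->leftChild, out, len, cap);  /* interior node */
        else if (t->label != 'e' && len + 1 < cap)
            out[len++] = t->label;                     /* terminal leaf */
    }
    out[len] = '\0';
    return len;
}
```

Calling yield(parseTree, buf, 0, sizeof buf) on the tree built for “(x+x+x)” should leave that same string in buf.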

241-303 Discrete Maths: Grammars/8 131 10. Kinds of Grammars There are 4 main kinds of grammar, of increasing expressive power: – –regular (type 3) grammars – –context-free (type 2) grammars – –context-sensitive (type 1) grammars – –unrestricted (type 0) grammars They vary in the kinds of productions they allow.

241-303 Discrete Maths: Grammars/8 132 10.1. Regular Grammars Every production is of the form: A => a | a B | ε – –A, B are nonterminals, a is a terminal These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last. Regular grammars are equivalent to REs (and also to automata). Example: S => wT T => xT T => a

241-303 Discrete Maths: Grammars/8 133 An Equivalence Diagram [Figure: regular grammars, REs and automata linked to each other; all three have the same expressive power.]

241-303 Discrete Maths: Grammars/8 135 10.2. Context-Free Grammars Every production is of the form: A => α – –A is a nonterminal, α can be any number of nonterminals or terminals Most of our examples have been context-free grammars – –used widely to define programming languages – –they subsume regular grammars Example: A => a A => aBcd B => ae

241-303 Discrete Maths: Grammars/8 136 10.3. Context-Sensitive Grammars Every production is of the form: α => β – –α, β can contain any number of terminals and nonterminals – –α must contain at least 1 nonterminal – –size(β) >= size(α) – –β cannot be ε Example: A => a 11A => aB2d B2 => ae continued

241-303 Discrete Maths: Grammars/8 137 Context-sensitive rules allow the grammar to specify a context for a rewrite – –e.g. A1a0 => 1b00 – –the string 2A1a01 becomes 21b001 – –Context-sensitive grammars are more powerful than context-free grammars because of this context ability.

241-303 Discrete Maths: Grammars/8 138 Example The language: E = {012, 001122, 000111222,... } or, in brief, E = {0^n 1^n 2^n | n >= 1} can only be expressed using a context-sensitive grammar:
S => 0 A 1 2 | 0 1 2
A => 0 A 1 C | 0 1 C
C 1 => 1 C
C 2 => 2 2

241-303 Discrete Maths: Grammars/8 139 Rewrite S to 001122
S           => 0 A 1 2        using S => 0 A 1 2
0 A 1 2     => 0 0 1 C 1 2    using A => 0 1 C
0 0 1 C 1 2 => 0 0 1 1 C 2    using C 1 => 1 C
0 0 1 1 C 2 => 0 0 1 1 2 2    using C 2 => 2 2

241-303 Discrete Maths: Grammars/8 140 10.4. Unrestricted Grammars Every production is of the form: α => β – –α, β can contain any number of terminals and nonterminals; α must contain at least 1 nonterminal – –no restrictions on size(β); it may be smaller than size(α) – –β can be ε Also called phrase-structure grammars; they are more general than context-sensitive grammars. Example: A => ε 11A => a B2 => aeA

241-303 Discrete Maths: Grammars/8 141 Example The language: E = { ε, 012, 001122, 000111222,... } or, in brief, E = {0^n 1^n 2^n | n >= 0} can only be expressed using an unrestricted grammar:
S => 0 A 1 2 | ε
A => 0 A 1 C | ε
C 1 => 1 C
C 2 => 2 2
The new features are the two ε alternatives.

241-303 Discrete Maths: Grammars/8 142 Rewrite S to 012
S       => 0 A 1 2    using S => 0 A 1 2
0 A 1 2 => 0 1 2      using A => ε

241-303 Discrete Maths: Grammars/8 143 10.5. Why so many Grammar Kinds? More powerful grammars are more expressive, but also harder to implement efficiently – –a trade-off between power and implementation continued

241-303 Discrete Maths: Grammars/8 144 For example, most compilers have two grammar-based components: – –the lexical analyzer uses REs (regular grammars) to parse basic nonterminals such as identifier and number – –the syntax analyzer uses (context-free) grammars to deal with complex syntactic categories such as loops and expressions

241-303 Discrete Maths: Grammars/8 145 Lexical and Syntax Analyzers [Figure: the compiler pipeline. program text file (int x=43;....) -> chars ('i' 'n' 't' ' ' 'x' '=' '4' '3' ';'...) -> lexical analyzer -> tokens (int, x, =, 43, ;, ....) -> syntax analyzer -> parse tree -> code generation]

241-303 Discrete Maths: Grammars/8 146 11. From REs to Grammars It is easy to translate a RE into a context-free grammar. – –each RE operand and operator can be implemented by a grammar rule In fact, the power of context-free grammars is not needed, since REs are equivalent to regular grammars – –we translate to context-free because it is simple to do

241-303 Discrete Maths: Grammars/8 147 Operand to Production Assume that R is the regular expression, and G is the new production.
Operand      translates to   Production
R = x                        G => x
R = ε                        G => ε
R = {}                       nothing

241-303 Discrete Maths: Grammars/8 148 Operator to Production Assume that S and T are REs; Gs and Gt are their translations to productions.
Operator     translates to   Production
R = S | T                    G => Gs | Gt
R = S T                      G => Gs Gt
R = S*                       G => Gs G | ε   or   G => { Gs }

241-303 Discrete Maths: Grammars/8 149 Example: translate a | bc*
The RE with brackets: a | ( b ( c* ) )
Translate the operands:
A => a
B => b
C => c
- the nonterminals A, B, C are invented

241-303 Discrete Maths: Grammars/8 150 Translate the operators in precedence order.
Translate c*: CStar => C CStar | ε
Translate b c*: BCStar => B CStar
Translate a | b c*: S => A | BCStar
The CStar, BCStar, and S nonterminals are invented.

241-303 Discrete Maths: Grammars/8 151 The complete grammar:
T = { a, b, c }
N = { S, BCStar, CStar, A, B, C }
P = {
  S => A | BCStar
  BCStar => B CStar
  CStar => C CStar | ε
  A => a
  B => b
  C => c
}
S is the starting nonterminal.
These rules can be simplified.
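The operand and operator rules are mechanical enough to code up. The sketch below is one possible way to do it, under assumptions of our own: the RE is given as a nested tuple (`("sym", x)`, `("eps",)`, `("alt", S, T)`, `("cat", S, T)`, `("star", S)`), and the invented nonterminals are simply numbered `G0`, `G1`, ... instead of being given names like CStar.

```python
from itertools import count

def translate(regex, rules, fresh):
    """Add productions for `regex` to `rules`; return its nonterminal.

    Each right-hand side is a list of symbols; an empty list stands for ε.
    """
    kind = regex[0]
    name = f"G{next(fresh)}"              # invented nonterminal
    if kind == "sym":                     # R = x    gives  G => x
        rules[name] = [[regex[1]]]
    elif kind == "eps":                   # R = ε    gives  G => ε
        rules[name] = [[]]
    elif kind == "alt":                   # R = S|T  gives  G => G_s | G_t
        s = translate(regex[1], rules, fresh)
        t = translate(regex[2], rules, fresh)
        rules[name] = [[s], [t]]
    elif kind == "cat":                   # R = S T  gives  G => G_s G_t
        s = translate(regex[1], rules, fresh)
        t = translate(regex[2], rules, fresh)
        rules[name] = [[s, t]]
    elif kind == "star":                  # R = S*   gives  G => G_s G | ε
        s = translate(regex[1], rules, fresh)
        rules[name] = [[s, name], []]
    else:
        raise ValueError(f"unknown RE node: {kind!r}")
    return name

# a | ( b ( c* ) )
A_OR_BC_STAR = ("alt", ("sym", "a"), ("cat", ("sym", "b"), ("star", ("sym", "c"))))

rules = {}
start = translate(A_OR_BC_STAR, rules, count())
for nonterminal, alternatives in rules.items():
    rhs = " | ".join(" ".join(alt) if alt else "ε" for alt in alternatives)
    print(f"{nonterminal} => {rhs}")
```

Apart from the names, the printed productions have the same shape as the complete grammar for a | bc*: G0 plays the role of S, G2 of BCStar and G4 of CStar.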

241-303 Discrete Maths: Grammars/8 152 Rules Simplification
Substitute in the right hand sides for the A, B, and C rules:
P = {
  S => a | BCStar
  BCStar => b CStar
  CStar => c CStar | ε
}
Substitute in the right hand side for BCStar:
P = {
  S => a | b CStar
  CStar => c CStar | ε
}
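One way to gain confidence in the simplified rules is to code them as a small recognizer and compare it with the RE a|bc* on every short string. This is a sketch only; the function names are invented, and the comparison uses Python's `re.fullmatch`.

```python
import re
from itertools import product

def match_cstar(text, pos):
    """CStar => c CStar | ε : return the position after the run of c's."""
    if pos < len(text) and text[pos] == "c":
        return match_cstar(text, pos + 1)
    return pos                      # the ε alternative

def match_s(text):
    """S => a | b CStar : True if the whole of text is derivable from S."""
    if text == "a":
        return True
    if text.startswith("b"):
        return match_cstar(text, 1) == len(text)
    return False

def agrees_with_re(max_len):
    """Compare the grammar with the RE on all strings over {a, b, c}."""
    pattern = re.compile(r"a|bc*")
    for length in range(max_len + 1):
        for letters in product("abc", repeat=length):
            text = "".join(letters)
            if match_s(text) != (pattern.fullmatch(text) is not None):
                return False
    return True

print(agrees_with_re(5))
```

Agreement on short strings does not prove the two are equivalent, but a mistake in the simplification would very likely show up here.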

241-303 Discrete Maths: Grammars/8 153 12. Context-free Grammars vs. REs
REs (and automata) are equivalent to regular grammars
- they can be used for all the same problems
Every production in a regular grammar is right linear: A => a | a B | ε
- A, B are nonterminals, a is a terminal

241-303 Discrete Maths: Grammars/8 154 This means that regular grammars (and also REs and automata) cannot express every language that a context-free grammar can, nor the further languages that need context-sensitive or unrestricted grammars. REs are less powerful than context-free grammars.

241-303 Discrete Maths: Grammars/8 155 Example
Context-free grammar: S => 0 1 | 0 S 1
- it defines the language E = { 0^n 1^n | n >= 1 }
The S production is not right linear, because S is not at the end of 0 S 1. This suggests that an RE cannot be used to model the language E; section 12.1 gives the argument.
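The grammar S => 0 1 | 0 S 1 maps directly onto a recursive function. The sketch below (with the invented name `derives_from_s`) strips one 0 from the front and one 1 from the back at each step, mirroring the 0 S 1 alternative.

```python
def derives_from_s(text):
    """Return True if text can be derived from S => 0 1 | 0 S 1."""
    if text == "01":                                   # S => 0 1
        return True
    if len(text) >= 4 and text[0] == "0" and text[-1] == "1":
        return derives_from_s(text[1:-1])              # S => 0 S 1
    return False

for sample in ["01", "0011", "000111", "001", "0101", ""]:
    print(repr(sample), derives_from_s(sample))
```

The recursion depth plays the role of a counter for n, which is exactly what a fixed set of automaton states cannot provide.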

241-303 Discrete Maths: Grammars/8 156 12.1. Proof Using Automata
Proving that REs are weaker than context-free grammars is easiest if we prove that automata are weaker than context-free grammars
- remember that REs are equivalent to automata

241-303 Discrete Maths: Grammars/8 157 Assume an automaton with 2*k states. How could it be used to represent E = { 0^n 1^n | n >= 1 }? We will consider the case when n >> k.

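241-303 Discrete Maths: Grammars/8 158 First Automaton
[Figure: an automaton with start state 1. States 1, 2, 3, ..., k are linked by 0-transitions and states k+1, ..., 2k-2, 2k-1, 2k by 1-transitions; the run of 0's and the run of 1's must be of equal length.]
It uses all of its allowed 2*k states.
Problem: not enough states since n >> k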

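241-303 Discrete Maths: Grammars/8 159 Second Try
Add loops to reuse states.
[Figure: the same automaton (states 1, 2, 3, ..., k on 0-transitions; states k+1, ..., 2k-2, 2k-1, 2k on 1-transitions) with a loop labelled 0 and a loop labelled 1 added; the run of 0's and the run of 1's must still be of equal length.]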

241-303 Discrete Maths: Grammars/8 160 Question: how many 0’s were matched between state 1 and k? – –Answer: it could be any number Question: how can the number of matched 0’s be used to fix the number of matched 1’s? – –Answer: it cannot when n can be any number continued

241-303 Discrete Maths: Grammars/8 161 So, no automaton can model the language: E = { 0^n 1^n | n >= 1 }
- so there is no RE for E
- but E can be written as a context-free grammar
This shows that REs are weaker than context-free grammars.
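The "reused states" argument can be made concrete for any particular automaton. The sketch below assumes a deterministic automaton given as a complete transition table (a Python dict keyed by (state, symbol)); the three-state table and the function names are made up purely for illustration. It finds two different counts of 0's that leave the automaton in the same state, and then shows that the automaton cannot tell the two prefixes apart once the 1's arrive.

```python
def find_confusable_prefixes(transitions, start):
    """Return (i, j) with i < j where 0^i and 0^j end in the same state."""
    seen = {}                      # state -> number of 0's read on arrival
    state = start
    zeros = 0
    while state not in seen:       # must stop: only finitely many states
        seen[state] = zeros
        state = transitions[(state, "0")]
        zeros += 1
    return seen[state], zeros

def run(transitions, start, text):
    """Return the state reached after reading text from the start state."""
    state = start
    for symbol in text:
        state = transitions[(state, symbol)]
    return state

# A hypothetical 3-state automaton over {0, 1}.
EXAMPLE = {
    (0, "0"): 1, (0, "1"): 0,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}

i, j = find_confusable_prefixes(EXAMPLE, 0)
same = run(EXAMPLE, 0, "0" * i + "1" * i) == run(EXAMPLE, 0, "0" * j + "1" * i)
print(i, j, same)
```

Since 0^i 1^i and 0^j 1^i end in the same state, the automaton must accept both or reject both, yet only the first is in E. The same search succeeds on any finite automaton, whatever its final states are.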
