CS 461 – Oct. 7
Applications of CFLs: Compiling
Scanning vs. parsing
Expression grammars
–Associativity
–Precedence
Programming language (handout)

Compiling
Grammars are used to define a programming language and to check syntax.
Phases of a compiler:
source code → scanner → stream of tokens → parser → parse tree

Scanning
The scanner needs to know what to expect when eating your program.
–identifiers
–numbers
–strings
–comments
Specifications for tokens can be expressed by regular expressions (or a regular grammar).
While scanning, we can be in different states, such as inside a comment.
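
To make the scanning idea concrete, here is a minimal sketch of a scanner driven by regular expressions. The token names, the patterns, and the `#` comment syntax are assumptions chosen for illustration; they do not come from any particular language. The regular expression engine tracks the "inside a comment" state internally, so no explicit state variable is needed here.

```python
import re

# Hypothetical token specification: names and patterns are illustrative only.
TOKEN_SPEC = [
    ("COMMENT",    r"#[^\n]*"),        # assumed comment style: '#' to end of line
    ("STRING",     r'"[^"\n]*"'),      # double-quoted, no escape sequences
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[-+*/=]"),
    ("SKIP",       r"\s+"),            # whitespace between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Turn source text into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        # Whitespace and comments are recognized but not passed on to the parser.
        if kind not in ("SKIP", "COMMENT"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


print(scan('x = 42 + y  # add'))
```

The parser then works on the token stream and never sees whitespace or comments.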

Parser
Purpose is to understand the structure of the program.
All programming structures can be expressed as a CFG.
Simple example for + and -
expr → expr + digit | expr - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
How would we derive the string 9 - 5 + 2?

9 - 5 + 2
expr → expr + digit | expr - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
“parse tree”
              expr
            /  |   \
        expr   +   digit
      /  |  \        |
  expr   -  digit    2
    |         |
  digit       5
    |
    9
Leftmost derivation:
expr ⇒ expr + digit
     ⇒ expr - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2
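
The leftmost derivation can also be generated mechanically. The sketch below is only an illustration for this one grammar (expr → expr + digit | expr - digit | digit); the function name `leftmost_derivation` is made up, and the input is assumed to be a well-formed token list that alternates digits and operators.

```python
def leftmost_derivation(tokens):
    """List the sentential forms of the leftmost derivation of tokens.

    Assumes tokens alternate digit, operator, digit, ... for example
    ['9', '-', '5', '+', '2'].
    """
    digits = tokens[0::2]
    operators = tokens[1::2]

    form = ["expr"]
    steps = [form]

    # The leftmost nonterminal is always the leading expr. Each expansion
    # expr -> expr op digit exposes the LAST remaining operator first.
    for op in reversed(operators):
        form = ["expr", op, "digit"] + form[1:]
        steps.append(form)

    # Finish the recursion with expr -> digit.
    form = ["digit"] + form[1:]
    steps.append(form)

    # Replace each digit nonterminal, left to right, with the actual digit.
    for d in digits:
        i = form.index("digit")
        form = form[:i] + [d] + form[i + 1:]
        steps.append(form)

    return [" ".join(step) for step in steps]


for line in leftmost_derivation(["9", "-", "5", "+", "2"]):
    print(line)
```

Note that the first production applied introduces the + even though the - comes first in the string: with left recursion, the rightmost operator ends up at the top of the parse tree.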

Left & right recursion
What is the difference between these 2 grammars? Which one is better?
Grammar 1:
expr → expr + digit | expr - digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Grammar 2:
expr → digit + expr | digit - expr | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Let’s try 9 - 5 + 2 on both of these.
The grammar must convey the order of operations!
Operators may be left associative or right associative.
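
One way to see the difference is to evaluate a string according to the shape of the parse tree each grammar produces. This is a minimal sketch, assuming a well-formed token list of single digits with + and -; the function names are made up for illustration.

```python
def eval_left_recursive(tokens):
    """Evaluate as grouped by expr -> expr op digit | digit (left associative)."""
    value = int(tokens[0])
    # Each operator combines everything to its left with one digit on its right.
    for i in range(1, len(tokens), 2):
        op, digit = tokens[i], int(tokens[i + 1])
        value = value + digit if op == "+" else value - digit
    return value


def eval_right_recursive(tokens):
    """Evaluate as grouped by expr -> digit op expr | digit (right associative)."""
    left = int(tokens[0])
    if len(tokens) == 1:
        return left
    # The first operator combines one digit with the value of the whole rest.
    rest = eval_right_recursive(tokens[2:])
    return left + rest if tokens[1] == "+" else left - rest


tokens = ["9", "-", "5", "+", "2"]
print(eval_left_recursive(tokens))   # (9 - 5) + 2 = 6
print(eval_right_recursive(tokens))  # 9 - (5 + 2) = 2
```

The left-recursive grammar gives the answer that ordinary arithmetic expects, because + and - are left associative.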

+ - * /
Question: How do we write a grammar for all 4 operators? Can we do it this way…
expr → expr + digit | expr - digit | expr * digit | expr / digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BTW, we can ignore “digits” and from now on just replace them with “num”, and understand it’s any single number.

Precedence
(* /) bind stronger than (+ -)
(+ -) separate better than (* /)
Need to break up expression into terms
–Ex. 9 – 8 * / 5
–We want to say that an expression consists of “terms” separated by + and –
–And each term consists of numbers separated by * and /
–But which should we define first, expr or term?

Precedence (2)
Which grammar is right?
expr → expr + term | expr - term | term
term → term * num | term / num | num
Or this one:
expr → expr * term | expr / term | term
term → term + num | term - num | num
Let’s try examples 1 + 2 * 3 and 1 * 2 + 3
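
To compare the two candidate grammars, it helps to print the grouping that each one imposes. The sketch below is a small recursive-descent parser written for illustration only; `parenthesize` and its `levels` parameter are made-up names. Each entry in `levels` is the set of operators defined at one grammar level, listed in the order the rules appear (expr first, then term), and the left recursion is implemented with a loop. The sample inputs 1 + 2 * 3 and 1 * 2 + 3 are assumed to be already split into tokens.

```python
def parenthesize(tokens, levels):
    """Return tokens fully parenthesized according to a layered grammar.

    levels[0] holds the operators of the first rule (expr), levels[1] those
    of the next rule (term), and so on. Below the last level is a single num.
    """
    pos = 0

    def parse(level):
        nonlocal pos
        if level == len(levels):
            num = tokens[pos]          # base case: a single num
            pos += 1
            return num
        left = parse(level + 1)
        # Loop form of the left-recursive rule: X -> X op Y | Y
        while pos < len(tokens) and tokens[pos] in levels[level]:
            op = tokens[pos]
            pos += 1
            right = parse(level + 1)
            left = f"({left} {op} {right})"
        return left

    result = parse(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


first_grammar = [{"+", "-"}, {"*", "/"}]    # expr uses + -, term uses * /
second_grammar = [{"*", "/"}, {"+", "-"}]   # expr uses * /, term uses + -

for text in ("1 + 2 * 3", "1 * 2 + 3"):
    tokens = text.split()
    print(text, "|", parenthesize(tokens, first_grammar),
          "|", parenthesize(tokens, second_grammar))
```

The first grammar groups the inputs as (1 + (2 * 3)) and ((1 * 2) + 3), which matches ordinary arithmetic. The second groups them as ((1 + 2) * 3) and (1 * (2 + 3)), so the operators defined deeper in the grammar always bind first.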

Moral
If a grammar is defining something hierarchical, like an expression, define large groupings first.
Lower precedence operators appear first in grammar. (They separate better)
–Ex. * appears lower in parse tree than + because it gets evaluated first.
In a real programming language, there can be more than 10 levels of precedence. C has ~15!

C language
Handout
–How does the grammar begin?
–Where are the mathematical expressions?
–Do you agree with the precedence?
–Do you see associativity?
–What else is defined in grammar?
–Where are the terminals?