1 Grammars and Parsing

2 Grammars
Grammar: a set of rules for generating the sentences of a language. Our sample grammar has these rules:
Sentence → Noun Verb Noun
Noun → boys
Noun → girls
Noun → dogs
Verb → like
Verb → see
In words:
–a Sentence can be a Noun followed by a Verb followed by a Noun
–a Noun can be 'boys', 'girls', or 'dogs'
–a Verb can be 'like' or 'see'
Examples of Sentence:
–boys see dogs
–dogs like girls
–…
Note: white space between words does not matter.
This is a very boring grammar because the set of Sentences is finite (exactly 18 sentences). Work this out as an exercise.
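For the exercise, a small sketch: since the only Sentence rule is Noun Verb Noun, enumerating every choice of Noun, Verb, and Noun lists the whole language. The class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class FiniteGrammar {
    // Right-hand sides of the Noun and Verb rules.
    static final String[] NOUNS = {"boys", "girls", "dogs"};
    static final String[] VERBS = {"like", "see"};

    // Sentence -> Noun Verb Noun: one sentence per choice of (Noun, Verb, Noun).
    static List<String> allSentences() {
        List<String> out = new ArrayList<>();
        for (String n1 : NOUNS) {
            for (String v : VERBS) {
                for (String n2 : NOUNS) {
                    out.add(n1 + " " + v + " " + n2);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentences = allSentences();
        sentences.forEach(System.out::println);
        System.out.println(sentences.size() + " sentences"); // 3 * 2 * 3 = 18
    }
}
```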

3 Recursive grammar
Sentence → Sentence and Sentence
Sentence → Sentence or Sentence
Sentence → Noun Verb Noun
Noun → boys
Noun → girls
Noun → dogs
Verb → like
Verb → see
Examples of Sentences in this language:
–boys like girls
–boys like girls and girls like dogs
–boys like girls and girls like dogs and girls like dogs
–boys like girls and girls like dogs and girls like dogs and girls like dogs
–…
This grammar is more interesting than the one on the last slide because the set of Sentences is infinite. What makes this set infinite? Answer: the recursive definition of Sentence. Sentence appears on the right-hand side of its own rules, so those rules can be applied again and again.
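One way to see that the set is infinite is to count the strings with exactly k clauses: each clause has 18 choices and each of the k − 1 connectives has 2, giving 18^k · 2^(k−1) strings, which grows without bound. The sketch below (names are illustrative) builds those strings by repeatedly appending "and"/"or" plus a clause. That produces exactly the strings the recursive rules can derive with k uses of Sentence → Noun Verb Noun; different derivations of the same string are counted once.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveGrammar {
    static final String[] NOUNS = {"boys", "girls", "dogs"};
    static final String[] VERBS = {"like", "see"};
    static final String[] CONNECTIVES = {"and", "or"};

    // The 18 strings produced by Sentence -> Noun Verb Noun.
    static List<String> clauses() {
        List<String> out = new ArrayList<>();
        for (String n1 : NOUNS)
            for (String v : VERBS)
                for (String n2 : NOUNS)
                    out.add(n1 + " " + v + " " + n2);
        return out;
    }

    // All strings with exactly k clauses joined by "and" / "or".
    static List<String> sentences(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        List<String> base = clauses();
        if (k == 1) return base;
        List<String> out = new ArrayList<>();
        for (String prefix : sentences(k - 1))
            for (String c : CONNECTIVES)
                for (String clause : base)
                    out.add(prefix + " " + c + " " + clause);
        return out;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 3; k++) {
            System.out.println(k + " clause(s): " + sentences(k).size() + " sentences");
        }
    }
}
```

Each extra clause multiplies the count by 36, which is why no finite list can hold the whole language.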

4 Detour
What if we want to add a period at the end of every sentence? Does this work?
Sentence → Sentence and Sentence .
Sentence → Sentence or Sentence .
Sentence → Noun Verb Noun .
Noun → …
No! Every nested Sentence also ends in a period, so these rules produce sentences like "girls like boys. and boys like dogs..".

5 Sentences with periods
Add a new rule that adds a period only at the end of the sentence:
TopLevelSentence → Sentence .
Sentence → Sentence and Sentence
Sentence → Sentence or Sentence
Sentence → Noun Verb Noun
Noun → boys
Noun → girls
Noun → dogs
Verb → like
Verb → see
Thought exercise: how does this work?
End of detour
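One way to think about the detour is to treat each rule as a string-building function. The sketch below is a hypothetical encoding with invented method names: under slide 4's rules every nested Sentence carries its own period, while under slide 5's rules only TopLevelSentence adds one.

```java
class Periods {
    // Slide 4's rules: every Sentence rule ends with a period.
    static String badClause(String n1, String v, String n2) { return n1 + " " + v + " " + n2 + "."; }
    static String badAnd(String s1, String s2) { return s1 + " and " + s2 + "."; }

    // Slide 5's rules: Sentence rules add no period ...
    static String clause(String n1, String v, String n2) { return n1 + " " + v + " " + n2; }
    static String and(String s1, String s2) { return s1 + " and " + s2; }

    // ... and TopLevelSentence -> Sentence . adds exactly one, at the very end.
    static String topLevel(String sentence) { return sentence + "."; }

    public static void main(String[] args) {
        // girls like boys. and boys like dogs..
        System.out.println(badAnd(badClause("girls", "like", "boys"), badClause("boys", "like", "dogs")));
        // girls like boys and boys like dogs.
        System.out.println(topLevel(and(clause("girls", "like", "boys"), clause("boys", "like", "dogs"))));
    }
}
```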

6 Grammar for simple expressions
This is a grammar for simple expressions:
Expression → integer
Expression → ( Expression + Expression )
–An Expression (E for short) can be an integer.
–An E can be '(' followed by an E followed by '+' followed by an E followed by ')'.
The set of Expressions defined by this grammar is a recursively-defined set.

7 E → integer
E → ( E + E )
Here are some legal expressions:
–2
–(3 + 34)
–((4+23) + 89)
–((89 + 23) + (23 + (34+12)))
Here are some illegal expressions:
–(3 (missing "+ E )")
–3 + 4 (missing the enclosing parentheses)

8 Parsing
Parsing: given a grammar and some text, determine whether that text is a legal sentence in the language defined by that grammar.
For many grammars, such as the simple expression grammar, we can write efficient programs to answer this question.
Next slides: a parser for our small expression language.

9 Helper class: SamTokenizer
Read the online code for
–Tokenizer: interface
–SamTokenizer: code
The code lets you
–open a file for input: SamTokenizer f = new SamTokenizer(String-for-file-name)
–examine what the next thing in the file is: f.peekAtKind()
   integer: such as 3, -34, 46
   word: such as x, r45, y78z (a variable name in Java)
   operator: such as +, -, *, (, ), etc.
   …
–read the next thing from the file:
   integer: f.getInt()
   word: f.getWord()
   operator: f.getOp()

10 Useful methods in SamTokenizer class
–f.check(char c): char → boolean
   Check if the next thing in the input is c:
   If so, eat it up and return true.
   Otherwise, return false.
   Example: f.check('*'); //true if next thing in input is *
–f.check(String s): String → boolean
   Return true if the next token in the input is the word s.
   Example: f.check("if"); //true if next token in input is the word if
–f.match(char c): char → void
   Like f.check, but throws TokenizerException if the next token in the input is not c.
–f.match(String s): String → void
   Example: f.match("if")
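The real SamTokenizer reads from a file and does more than this. For experimenting without it, here is a hypothetical, much smaller stand-in (MiniTokenizer, not part of SaM) that reads from a String and offers only peekAtKind, getInt, check(char), and match(char). It recognizes integers (optionally negative) and the one-character operators ( ) + *; any other token is reported as OPERATOR, words are not supported, and match throws IllegalStateException rather than SaM's TokenizerException.

```java
class MiniTokenizer {
    enum Kind { INTEGER, OPERATOR, EOF }

    private final String[] tokens;
    private int pos = 0;

    MiniTokenizer(String text) {
        // Put spaces around ( ) + * so that splitting on whitespace separates them.
        String spaced = text.replaceAll("([()+*])", " $1 ").trim();
        tokens = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
    }

    // Kind of the next token, without consuming it.
    Kind peekAtKind() {
        if (pos >= tokens.length) return Kind.EOF;
        return tokens[pos].matches("-?\\d+") ? Kind.INTEGER : Kind.OPERATOR;
    }

    // Consume the next token as an integer (call only after peekAtKind() == INTEGER).
    int getInt() {
        return Integer.parseInt(tokens[pos++]);
    }

    // If the next token is c, consume it and return true; otherwise return false.
    boolean check(char c) {
        if (pos < tokens.length && tokens[pos].equals(String.valueOf(c))) {
            pos++;
            return true;
        }
        return false;
    }

    // Like check, but throws if the next token is not c.
    void match(char c) {
        if (!check(c)) throw new IllegalStateException("expected '" + c + "'");
    }

    public static void main(String[] args) {
        MiniTokenizer f = new MiniTokenizer("(3 + -34)");
        System.out.println(f.peekAtKind()); // OPERATOR
        System.out.println(f.check('('));   // true
        System.out.println(f.peekAtKind()); // INTEGER
        System.out.println(f.getInt());     // 3
        System.out.println(f.check('*'));   // false: next token is +
        f.match('+');
        System.out.println(f.getInt());     // -34
        f.match(')');
        System.out.println(f.peekAtKind()); // EOF
    }
}
```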

11 Parser for simple expressions
Expression → integer
Expression → ( Expression + Expression )
Input: file
Output: true if the file contains a single expression as defined by this grammar, false otherwise
Note: the file must contain exactly one expression; a file containing (2+3) (3+4) will return false.

12 Parser for expression language
static boolean expParser(String fileName) { //parser for expression in file
    try {
        SamTokenizer f = new SamTokenizer(fileName);
        return getExp(f) && (f.peekAtKind() == Tokenizer.TokenType.EOF); //must be at EOF
    } catch (Exception e) {
        System.out.println("Aaargh");
        return false;
    }
}
static boolean getExp(SamTokenizer f) {
    switch (f.peekAtKind()) {
        case INTEGER: //E -> integer
            f.getInt();
            return true;
        case OPERATOR: //E -> (E+E)
            return f.check('(') && getExp(f) && f.check('+') && getExp(f) && f.check(')');
        default:
            return false;
    }
}
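To run the parser without the SaM library, one option is to swap in a small hypothetical String-based tokenizer (the nested Tokens class below, not part of SaM) and parse a String instead of a file. getExp keeps the same structure as on the slide; only the tokenizer and the input source differ.

```java
class ExpParser {
    // Hypothetical stand-in for SamTokenizer: integers and ( ) + only, read from a String.
    static class Tokens {
        enum Kind { INTEGER, OPERATOR, EOF }
        private final String[] toks;
        private int pos = 0;

        Tokens(String text) {
            String spaced = text.replaceAll("([()+])", " $1 ").trim();
            toks = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
        }
        Kind peekAtKind() {
            if (pos >= toks.length) return Kind.EOF;
            return toks[pos].matches("-?\\d+") ? Kind.INTEGER : Kind.OPERATOR;
        }
        int getInt() { return Integer.parseInt(toks[pos++]); }
        boolean check(char c) {
            if (pos < toks.length && toks[pos].equals(String.valueOf(c))) { pos++; return true; }
            return false;
        }
    }

    // True iff text holds exactly one expression: E -> integer | ( E + E )
    static boolean expParser(String text) {
        Tokens f = new Tokens(text);
        return getExp(f) && f.peekAtKind() == Tokens.Kind.EOF; // must be at EOF
    }

    static boolean getExp(Tokens f) {
        switch (f.peekAtKind()) {
            case INTEGER:  // E -> integer
                f.getInt();
                return true;
            case OPERATOR: // E -> ( E + E ); && stops at the first failed check
                return f.check('(') && getExp(f) && f.check('+') && getExp(f) && f.check(')');
            default:       // EOF
                return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"2", "(3 + 34)", "((89 + 23) + (23 + (34+12)))", "(3", "3 + 4", "(2+3) (3+4)"};
        for (String s : inputs) System.out.println(s + " -> " + expParser(s));
    }
}
```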

13 Note on boolean operators
Java supports two kinds of boolean operators:
–E1 & E2: evaluate both E1 and E2 and compute their conjunction (i.e., "and").
–E1 && E2: evaluate E1. If E1 is false, E2 is not evaluated, and the value of the expression is false. If E1 is true, E2 is evaluated, and the value of the expression is the conjunction of the values of E1 and E2.
In our parser code, we use &&
–if f.check('(') returns false, we simply return false without trying to read anything more from the input file. This gives a graceful way to handle errors.
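A quick way to observe the difference is to count how many operands actually get evaluated. The helper names here are invented for the demo.

```java
class ShortCircuit {
    static int evaluations = 0;

    // Records that it was evaluated, then returns v.
    static boolean operand(boolean v) {
        evaluations++;
        return v;
    }

    // Number of operands evaluated for: false & true
    static int countWithAnd() {
        evaluations = 0;
        boolean r = operand(false) & operand(true);  // & evaluates both sides
        return evaluations;
    }

    // Number of operands evaluated for: false && true
    static int countWithShortCircuit() {
        evaluations = 0;
        boolean r = operand(false) && operand(true); // && skips the right side
        return evaluations;
    }

    public static void main(String[] args) {
        System.out.println("&  evaluated " + countWithAnd() + " operands");          // 2
        System.out.println("&& evaluated " + countWithShortCircuit() + " operands"); // 1
    }
}
```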

14 Tracing recursive calls to getExp
[Figure: the recursive calls to getExp( ) made while parsing (3 + (34 + 23)); the trace shows five calls.]

15 Modifying parser to do SaM code generation
Let us modify the parser so that it generates SaM code to evaluate arithmetic expressions, e.g.:
–2:
   PUSHIMM 2
   STOP
–(2 + 3):
   PUSHIMM 2
   PUSHIMM 3
   ADD
   STOP
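To check that such code really evaluates the expression, here is a hypothetical toy interpreter that understands only the three instructions used here. It assumes PUSHIMM n pushes n, ADD pops two values and pushes their sum, and STOP halts with the result on top of the stack. It is a simplified model for illustration, not the real SaM simulator.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinySam {
    // Runs a program made only of PUSHIMM n, ADD, and STOP; returns the top of stack at STOP.
    static int run(String program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String line : program.split("\n")) {
            if (line.trim().isEmpty()) continue;          // ignore blank lines
            String[] parts = line.trim().split("\\s+");
            switch (parts[0]) {
                case "PUSHIMM":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD": {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case "STOP":
                    return stack.peek();
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + line);
            }
        }
        throw new IllegalArgumentException("program has no STOP");
    }

    public static void main(String[] args) {
        System.out.println(run("PUSHIMM 2\nSTOP\n"));                 // 2
        System.out.println(run("PUSHIMM 2\nPUSHIMM 3\nADD\nSTOP\n")); // 5
    }
}
```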

16 Idea
The recursive method getExp should return a string containing the SaM code for the expression it has parsed.
The top-level method (expCodeGen in the code) should tack on a STOP command after the code it receives from getExp.
Method getExp generates code recursively:
–For an integer i, it returns the string "PUSHIMM " + i + "\n"
–For (E1 + E2), the recursive calls return the code for E1 and E2; say these are strings S1 and S2. The method returns S1 + S2 + "ADD\n"

17 CodeGen for expression language
static String expCodeGen(String fileName) { //returns SaM code for expression in file
    try {
        SamTokenizer f = new SamTokenizer(fileName);
        String pgm = getExp(f);
        return pgm + "STOP\n";
    } catch (Exception e) {
        System.out.println("Aaargh");
        return "STOP\n";
    }
}
static String getExp(SamTokenizer f) {
    switch (f.peekAtKind()) {
        case INTEGER: //E -> integer
            return "PUSHIMM " + f.getInt() + "\n";
        case OPERATOR: { //E -> (E+E)
            f.match('('); //must be '('
            String s1 = getExp(f);
            f.match('+'); //must be '+'
            String s2 = getExp(f);
            f.match(')'); //must be ')'
            return s1 + s2 + "ADD\n";
        }
        default:
            return "ERROR\n";
    }
}
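The same substitution makes the code generator runnable. This is a sketch, assuming a hypothetical String-based tokenizer (the nested Tokens class, not part of SaM) in place of SamTokenizer; getExp follows the slide line for line, and match throws IllegalStateException instead of TokenizerException.

```java
class ExpCodeGen {
    // Hypothetical stand-in for SamTokenizer: integers and ( ) + only, read from a String.
    static class Tokens {
        enum Kind { INTEGER, OPERATOR, EOF }
        private final String[] toks;
        private int pos = 0;

        Tokens(String text) {
            String spaced = text.replaceAll("([()+])", " $1 ").trim();
            toks = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
        }
        Kind peekAtKind() {
            if (pos >= toks.length) return Kind.EOF;
            return toks[pos].matches("-?\\d+") ? Kind.INTEGER : Kind.OPERATOR;
        }
        int getInt() { return Integer.parseInt(toks[pos++]); }
        // Consume the next token if it is c; otherwise throw.
        void match(char c) {
            if (pos < toks.length && toks[pos].equals(String.valueOf(c))) { pos++; return; }
            throw new IllegalStateException("expected '" + c + "'");
        }
    }

    // SaM code for the expression in text, followed by STOP.
    static String expCodeGen(String text) {
        try {
            return getExp(new Tokens(text)) + "STOP\n";
        } catch (IllegalStateException e) {
            System.out.println("Aaargh");
            return "STOP\n";
        }
    }

    static String getExp(Tokens f) {
        switch (f.peekAtKind()) {
            case INTEGER: // E -> integer
                return "PUSHIMM " + f.getInt() + "\n";
            case OPERATOR: { // E -> ( E + E ): code for E1, then E2, then ADD
                f.match('(');
                String s1 = getExp(f);
                f.match('+');
                String s2 = getExp(f);
                f.match(')');
                return s1 + s2 + "ADD\n";
            }
            default:
                return "ERROR\n";
        }
    }

    public static void main(String[] args) {
        System.out.print(expCodeGen("(3 + (34 + 23))"));
    }
}
```

For (3 + (34 + 23)) this prints PUSHIMM 3, PUSHIMM 34, PUSHIMM 23, ADD, ADD, STOP, one instruction per line: the inner sum is computed first, then added to 3.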

18 Tracing recursive calls to getExp
[Figure: the recursive calls to getExp( ) on (3 + (34 + 23)), annotated with the code each call returns: PUSHIMM 3 for 3; PUSHIMM 34 and PUSHIMM 23 for the inner integers; PUSHIMM 34, PUSHIMM 23, ADD for (34 + 23); and PUSHIMM 3, PUSHIMM 34, PUSHIMM 23, ADD, ADD for the whole expression.]

19 Exercises
Think about the recursive calls made to parse and generate code for these simple expressions:
–2
–(2 + 3)
–((2 + 45) + (34 + -9))
Can you derive an expression for the total number of calls made to getExp for parsing an expression?
–Hint: think inductively
Can you derive an expression for the maximum number of recursive calls that are active at any time during the parsing of an expression?

20 Number of recursive calls
Claim: # of calls to getExp for expression E = # of integers in E + # of addition symbols in E.
Example: ((2 + 3) + 5) has 3 integers and 2 addition symbols, so # of calls to getExp = 3 + 2 = 5.
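The claim is easy to spot-check. The hypothetical sketch below re-implements getExp over a String (no SaM), increments a counter on every call, and compares the total with the number of integers plus '+' symbols. It checks the slide's examples; it is not a proof.

```java
class CallCounter {
    private final String[] toks;
    private int pos = 0;
    private int calls = 0;

    private CallCounter(String text) {
        String spaced = text.replaceAll("([()+])", " $1 ").trim();
        toks = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
    }

    private boolean check(String s) {
        if (pos < toks.length && toks[pos].equals(s)) { pos++; return true; }
        return false;
    }

    // Same logic as getExp on slide 12, plus one increment per call.
    private boolean getExp() {
        calls++;
        if (pos < toks.length && toks[pos].matches("-?\\d+")) { pos++; return true; } // E -> integer
        return check("(") && getExp() && check("+") && getExp() && check(")");     // E -> ( E + E )
    }

    // Number of getExp calls needed to parse text (which must be a legal expression).
    static int countCalls(String text) {
        CallCounter p = new CallCounter(text);
        if (!p.getExp() || p.pos != p.toks.length) {
            throw new IllegalArgumentException("not an expression: " + text);
        }
        return p.calls;
    }

    // # of integers + # of '+' symbols: the right-hand side of the claim.
    static int predicted(String text) {
        int count = 0;
        for (String t : new CallCounter(text).toks) {
            if (t.matches("-?\\d+") || t.equals("+")) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (String e : new String[] {"2", "(2 + 3)", "((2 + 3) + 5)", "((2 + 45) + (34 + -9))"}) {
            System.out.println(e + ": " + countCalls(e) + " calls, predicted " + predicted(e));
        }
    }
}
```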

21 Formal Languages
Grammars for computer languages have been studied extensively.
We will study context-free languages (CFLs) later in the course.
For now, we will just introduce some terminology informally.

22 Terminology
Sentence → Noun Verb Noun
Noun → boys
Noun → girls
Noun → dogs
Verb → like
Verb → see
Symbols: names/strings in the grammar
–(eg) Sentence, Noun, Verb, boys, girls, dogs, like, see
Non-terminals: symbols that occur on the left-hand sides of rules
–(eg) Sentence, Noun, Verb
Terminals: symbols that do not occur on the left-hand sides of rules
–(eg) boys, girls, dogs, like, see
Start symbol: the symbol used to begin the derivation of sentences
–(eg) Sentence
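These definitions can be applied mechanically. In this sketch, rules are stored as hypothetical {left-hand side, right-hand side} string pairs, and we assume the start symbol is the left-hand side of the first rule (which matches this slide, but is only a convention).

```java
import java.util.LinkedHashSet;
import java.util.Set;

class Terminology {
    // Each rule is {lhs, rhs}; the rhs is a space-separated list of symbols.
    static final String[][] RULES = {
        {"Sentence", "Noun Verb Noun"},
        {"Noun", "boys"}, {"Noun", "girls"}, {"Noun", "dogs"},
        {"Verb", "like"}, {"Verb", "see"}
    };

    // Non-terminals: symbols that occur on a left-hand side.
    static Set<String> nonTerminals(String[][] rules) {
        Set<String> nt = new LinkedHashSet<>();
        for (String[] r : rules) nt.add(r[0]);
        return nt;
    }

    // Terminals: symbols that never occur on a left-hand side.
    static Set<String> terminals(String[][] rules) {
        Set<String> nt = nonTerminals(rules);
        Set<String> t = new LinkedHashSet<>();
        for (String[] r : rules)
            for (String sym : r[1].split(" "))
                if (!nt.contains(sym)) t.add(sym);
        return t;
    }

    // Assumed convention: the first rule's left-hand side is the start symbol.
    static String startSymbol(String[][] rules) {
        return rules[0][0];
    }

    public static void main(String[] args) {
        System.out.println("non-terminals: " + nonTerminals(RULES));
        System.out.println("terminals:     " + terminals(RULES));
        System.out.println("start symbol:  " + startSymbol(RULES));
    }
}
```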

23 Parse trees
Derivation: a description of how to produce a sentence from the start symbol
–(eg) boys like dogs:
   Sentence ⇒ Noun Verb Noun ⇒ boys Verb Noun ⇒ boys Verb dogs ⇒ boys like dogs
Derivations can be shown as parse trees
–You can decorate the tree with the names of the rules used in each step of the derivation
[Figure: parse tree for "boys like dogs": the root Sentence has children Noun, Verb, Noun, whose leaves are boys, like, dogs.]
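The derivation above can be replayed step by step. In this hypothetical sketch each step names a non-terminal and the right-hand side to substitute, and the rewrite is applied to the first occurrence of that non-terminal in the current string of symbols; that is enough to reproduce the slide's order (first Noun, then the remaining Noun, then Verb).

```java
import java.util.ArrayList;
import java.util.List;

class Derivation {
    // Applies each step {nonTerminal, replacement} to the first occurrence of nonTerminal,
    // recording every intermediate form.
    static List<String> derive(String start, String[][] steps) {
        List<String> current = new ArrayList<>(List.of(start));
        List<String> forms = new ArrayList<>();
        forms.add(String.join(" ", current));
        for (String[] step : steps) {
            int i = current.indexOf(step[0]);
            if (i < 0) throw new IllegalArgumentException("no " + step[0] + " to rewrite");
            current.remove(i);                                   // drop the non-terminal
            current.addAll(i, List.of(step[1].split(" ")));      // splice in the rule's right-hand side
            forms.add(String.join(" ", current));
        }
        return forms;
    }

    public static void main(String[] args) {
        String[][] steps = {
            {"Sentence", "Noun Verb Noun"}, {"Noun", "boys"}, {"Noun", "dogs"}, {"Verb", "like"}
        };
        // Sentence => Noun Verb Noun => boys Verb Noun => boys Verb dogs => boys like dogs
        System.out.println(String.join(" => ", derive("Sentence", steps)));
    }
}
```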

24 Conclusion
The two parsers we have written are called "recursive descent parsers"
–the parser is essentially a big recursive function that operates more or less directly off of the grammar
Not all grammars can be parsed by a recursive descent parser
–most grammars require more complex parsers
Recursive descent parsers were among the first parsers invented by compiler writers
Ideally, we would like to be able to generate parsers directly from the grammar
–software maintenance would be much easier:
   maintain the "parser-generator" for everyone
   maintain the specification of your grammar
Today we have lots of tools that can generate parsers automatically from many grammars
–yacc is perhaps the most famous one: it needs an LALR(1) grammar, which we will study later

25 Extra CS 211 material: Number of recursive calls
Claim: # of calls to getExp for expression E = # of integers in E + # of addition symbols in E.
Example: ((2 + 3) + 5) has 3 integers and 2 addition symbols, so # of calls to getExp = 3 + 2 = 5.

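26 Inductive Proof
Order expressions by their length (# of tokens): E1 < E2 if length(E1) < length(E2).
[Figure: examples of expressions ordered by length: integers such as 0, 23, and -2 have length 1; (2 + 3) and (1 + 0) have length 5.]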

27 Proof of # of recursive calls
Base case (length = 1): the expression must be an integer, so getExp is called exactly once, as predicted by the formula (1 integer + 0 addition symbols = 1).
Inductive case: assume the formula is true for all expressions with n or fewer tokens.
–If there are no expressions with n+1 tokens, the result is trivially true for n+1.
–Otherwise, consider an expression E of length n+1. E cannot be an integer; therefore it must be of the form (E1 + E2), where E1 and E2 have n or fewer tokens (since length(E) = length(E1) + length(E2) + 3). By the inductive assumption, the result is true for E1 and E2.
(contd. on next slide)

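28 Proof (contd.)
Parsing E = (E1 + E2) takes one call for E itself plus all the calls made while parsing E1 and E2, so:
#-of-calls-for-E
   = 1 + #-of-calls-for-E1 + #-of-calls-for-E2
   = 1 + #-of-integers-in-E1 + #-of-'+'-in-E1 + #-of-integers-in-E2 + #-of-'+'-in-E2   (by the inductive assumption)
   = #-of-integers-in-E + #-of-'+'-in-E
as required, since E contains exactly the integers of E1 and E2, and exactly one more '+' than E1 and E2 combined.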
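The proof covers every expression; as an extra sanity check (not a substitute for it), the sketch below generates every expression of nesting depth at most 3 over the integers 1 and 2, parses each with a call-counting version of getExp, and confirms the formula. To stay short it assumes single-digit integers and no spaces, so it scans characters instead of using a tokenizer; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ClaimCheck {
    // Every expression of nesting depth <= d over the integers 1 and 2.
    static List<String> exprs(int d) {
        List<String> out = new ArrayList<>(List.of("1", "2"));
        if (d == 0) return out;
        List<String> smaller = exprs(d - 1);
        for (String a : smaller)
            for (String b : smaller)
                out.add("(" + a + "+" + b + ")");
        return out;
    }

    // Character-level getExp (single-digit integers, no spaces) with a call counter.
    static String s;
    static int pos, calls;

    static boolean getExp() {
        calls++;
        if (pos < s.length() && Character.isDigit(s.charAt(pos))) { pos++; return true; }
        return check('(') && getExp() && check('+') && getExp() && check(')');
    }

    static boolean check(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // True iff e parses completely and #calls == #integers + #'+'.
    static boolean claimHolds(String e) {
        s = e; pos = 0; calls = 0;
        boolean parsed = getExp() && pos == s.length();
        long ints = e.chars().filter(Character::isDigit).count();
        long plus = e.chars().filter(ch -> ch == '+').count();
        return parsed && calls == ints + plus;
    }

    public static void main(String[] args) {
        List<String> all = exprs(3);
        long ok = all.stream().filter(ClaimCheck::claimHolds).count();
        System.out.println(ok + " of " + all.size() + " expressions satisfy the claim");
    }
}
```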

