1
Lab 3: Using ML-Yacc
Zhong Zhuang
dyzz@mail.ustc.edu.cn
2
How to write a parser?
  Write a parser by hand
  Use a parser generator
    May not be as efficient as a hand-written parser
    General and robust
How does it work?
  Parser specification -> parser generator -> parser
  Stream of tokens -> parser -> abstract syntax
3
ML-Yacc specification
  Three parts again, separated by %% lines:
    User declarations: declare values available in the rule actions
    %%
    ML-Yacc definitions: declare terminals and nonterminals; special declarations to resolve conflicts
    %%
    Rules: the parser is specified by CFG rules and associated semantic actions that generate abstract syntax
4
ML-Yacc Definitions
  Specify the type of positions
    %pos int * int
  Specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
  Specify the end-of-parse token
    %eop EOF
  Specify the start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog
5
A Simple ML-Yacc File
  %%
  %term NUM | PLUS | MUL | LPAR | RPAR
  %nonterm exp | fact | base
  %pos int
  %start exp
  %eop EOF
  %%
  exp  : fact ()
       | fact PLUS exp ()
  fact : base ()
       | base MUL fact ()
  base : NUM ()
       | LPAR exp RPAR ()
  The %term and %nonterm lines declare the grammar symbols
  The lines after the second %% are the grammar rules
  The parenthesized parts are semantic actions (currently they do nothing)
6
Each nonterminal may have a semantic value associated with it
When the parser reduces with (X ::= s):
  a semantic action is executed
  the action uses semantic values from the symbols in s
When parsing completes successfully:
  the parser returns the semantic value associated with the start symbol
  usually a syntax tree
7
To use semantic values during parsing, we must declare symbol types:
  %term NUM of int | PLUS | MUL | ...
  %nonterm exp of int | fact of int | base of int
The type of a semantic action must match the type declared for the nonterminal in its rule
8
A Simple ML-Yacc File with Action
  %%
  %term NUM of int | PLUS | MUL | LPAR | RPAR
  %nonterm exp of int | fact of int | base of int
  %pos int
  %start exp
  %eop EOF
  %%
  exp  : fact              (fact)
       | fact PLUS exp     (fact + exp)
  fact : base              (base)
       | base MUL fact     (base * fact)
  base : NUM               (NUM)
       | LPAR exp RPAR     (exp)
  The %term and %nonterm lines are grammar symbols with type declarations
  The lines after the second %% are grammar rules with semantic actions
  The semantic actions compute an integer result
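For comparison with the "write a parser by hand" option, the same grammar can be evaluated by a small recursive-descent parser. This is only a sketch and is not what ML-Yacc generates: the token datatype, the ParseError exception, and the parse function are hypothetical names, and it assumes the right-recursive rules exp : fact | fact PLUS exp and fact : base | base MUL fact.

```sml
(* Hypothetical token type mirroring the %term declaration. *)
datatype token = NUM of int | PLUS | MUL | LPAR | RPAR | EOF

exception ParseError of string

(* Each function takes the remaining tokens and returns
   (semantic value, tokens left over), one function per nonterminal. *)
fun exp ts =
      let val (f, rest) = fact ts
      in case rest of
             PLUS :: rest' =>
               let val (e, rest'') = exp rest'
               in (f + e, rest'') end           (* action: fact + exp *)
           | _ => (f, rest)                     (* action: fact *)
      end
and fact ts =
      let val (b, rest) = base ts
      in case rest of
             MUL :: rest' =>
               let val (f, rest'') = fact rest'
               in (b * f, rest'') end           (* action: base * fact *)
           | _ => (b, rest)                     (* action: base *)
      end
and base (NUM n :: rest) = (n, rest)            (* action: NUM *)
  | base (LPAR :: rest) =
      (case exp rest of
           (e, RPAR :: rest') => (e, rest')     (* action: exp *)
         | _ => raise ParseError "expected RPAR")
  | base _ = raise ParseError "expected NUM or LPAR"

(* Succeeds only if the whole input up to EOF is consumed. *)
fun parse ts =
      case exp ts of
          (v, [EOF]) => v
        | _ => raise ParseError "trailing input"

val r1 = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3, EOF]
val r2 = parse [LPAR, NUM 1, PLUS, NUM 2, RPAR, MUL, NUM 3, EOF]
```

Here r1 corresponds to 1 + 2 * 3 and r2 to (1 + 2) * 3. Each function body plays the role that a semantic action plays in the ML-Yacc file.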
9
Conflicts in ML-Yacc
  We often write ambiguous grammars
  Example
    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
    Tokens from lexer: NUM PLUS NUM MUL NUM
    State of parser: E+E
    To be read: MUL NUM
10
Conflicts in ML-Yacc
  We often write ambiguous grammars
  Example
    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
    Tokens from lexer: NUM PLUS NUM MUL NUM
    State of parser: E+E
    To be read: MUL NUM
  If we shift:
    Shift    E+E*
    Shift    E+E*E
    Reduce   E+E
    Reduce   E
  Result is: E+(E*E)
11
Conflicts in ML-Yacc
  We often write ambiguous grammars
  Example
    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
    Tokens from lexer: NUM PLUS NUM MUL NUM
    State of parser: E+E
    To be read: MUL NUM
  If we reduce:
    Reduce   E
    Shift    E*
    Shift    E*E
    Reduce   E
  Result is: (E+E)*E
12
This is a shift-reduce conflict
  We want E+(E*E), because “*” has higher precedence than “+”
Another shift-reduce conflict
  Tokens from lexer: NUM PLUS NUM PLUS NUM
  State of parser: E+E
  To be read: PLUS NUM
  If we shift:
    Shift    E+E+
    Shift    E+E+E
    Reduce   E+E
    Reduce   E
    Result is: E+(E+E)
  If we reduce:
    Reduce   E
    Shift    E+
    Shift    E+E
    Reduce   E
    Result is: (E+E)+E
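The choice between shifting and reducing matters because the two parse trees mean different things. A minimal sketch, assuming the input NUM PLUS NUM MUL NUM stands for 1 + 2 * 3; the datatype and the eval function are illustrative names, not part of the lab files.

```sml
(* Illustrative syntax tree with just the two operators from the example. *)
datatype exp = Int of int
             | Add of exp * exp
             | Mul of exp * exp

fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* Tree built if the parser shifts MUL in state E+E: E+(E*E) *)
val shiftTree = Add (Int 1, Mul (Int 2, Int 3))

(* Tree built if the parser reduces E+E first: (E+E)*E *)
val reduceTree = Mul (Add (Int 1, Int 2), Int 3)

val shiftValue = eval shiftTree
val reduceValue = eval reduceTree
```

The two values differ (7 for the shift tree, 9 for the reduce tree), so the grammar has to say which one is intended.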
13
Deal with shift-reduce conflicts
  In this case we need to reduce, because “+” is left associative
  Ways to deal with it:
    Let ML-Yacc complain
      the default choice is to shift when it encounters a shift-reduce conflict
      BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
    Rewrite the grammar to eliminate the ambiguity
      can be complicated and less clear
    Use Yacc precedence directives
      %left, %right, %nonassoc
14
Precedence and Associativity
  The precedence of a terminal is based on the order in which associativity is specified (later declarations have higher precedence)
  The precedence of a rule is the precedence of its right-most terminal
    e.g. precedence of (E ::= E + E) == prec(+)
  A shift-reduce conflict is resolved as follows:
    prec(terminal) > prec(rule) ==> shift
    prec(terminal) < prec(rule) ==> reduce
    prec(terminal) = prec(rule) ==>
      assoc(terminal) = left ==> reduce
      assoc(terminal) = right ==> shift
      assoc(terminal) = nonassoc ==> report as error
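The resolution rule above can be written down as a small function. This is a sketch of the rule as stated on the slide, not ML-Yacc's actual implementation; the type names, the resolve function, and the encoding of precedence as an integer (larger binds tighter) are assumptions made for illustration.

```sml
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* termPrec:  precedence of the lookahead terminal
   termAssoc: associativity declared for that terminal
   rulePrec:  precedence of the rule that could be reduced *)
fun resolve (termPrec : int, termAssoc : assoc, rulePrec : int) : action =
      if termPrec > rulePrec then Shift
      else if termPrec < rulePrec then Reduce
      else case termAssoc of
               Left => Reduce
             | Right => Shift
             | NonAssoc => Error

(* Assumed levels for "%left PLUS" followed by "%left MUL". *)
val plusPrec = 1
val mulPrec = 2

(* Stack E+E, lookahead MUL *)
val a1 = resolve (mulPrec, Left, plusPrec)
(* Stack E+E, lookahead PLUS *)
val a2 = resolve (plusPrec, Left, plusPrec)
(* Stack E*E, lookahead PLUS *)
val a3 = resolve (plusPrec, Left, mulPrec)
```

With these levels a1 is Shift, which gives E+(E*E), while a2 and a3 are Reduce, which gives (E+E)+E and (E*E)+E.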
15
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM              (Int NUM)
    | exp PLUS exp     (Add (exp1, exp2))
    | exp MINUS exp    (Sub (exp1, exp2))
    | exp MUL exp      (Mul (exp1, exp2))
    | exp DIV exp      (Div (exp1, exp2))
    | LPAR exp RPAR    (exp)
Higher precedence: the later %left line (MUL DIV) binds tighter than the earlier one (PLUS MINUS)
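One way to check that the precedence declarations do what was intended is to print the resulting tree fully parenthesized. The sketch below builds by hand the trees one would expect for 1 - 2 - 3 and 1 + 2 * 3 under the declarations above; show and bin are hypothetical helpers, not part of the lab files.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp

(* Hypothetical helper: print a tree with every operator parenthesized. *)
fun show (Int n) = Int.toString n
  | show (Add (a, b)) = bin ("+", a, b)
  | show (Sub (a, b)) = bin ("-", a, b)
  | show (Mul (a, b)) = bin ("*", a, b)
  | show (Div (a, b)) = bin ("/", a, b)
and bin (sym, a, b) = "(" ^ show a ^ " " ^ sym ^ " " ^ show b ^ ")"

(* Expected tree for 1 - 2 - 3, since MINUS is declared %left. *)
val leftAssoc = Sub (Sub (Int 1, Int 2), Int 3)

(* Expected tree for 1 + 2 * 3, since MUL is declared after PLUS. *)
val mulFirst = Add (Int 1, Mul (Int 2, Int 3))

val s1 = show leftAssoc
val s2 = show mulFirst
```

Printing the tree returned by the real parser in the same way makes a wrong associativity or precedence choice visible at a glance.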
16
Reduce-reduce Conflict
  This kind of conflict is more difficult to deal with
  Example
    sequence  ::= (empty)
                | maybeword
                | sequence word
    maybeword ::= (empty)
                | word
  When we get a “word” from the lexer:
    word -> maybeword -> sequence                       (rule 1)
    empty -> sequence, then sequence word -> sequence   (rule 2)
  We have more than one way to get “sequence” from the input “word”
17
Reduce-reduce Conflict
  A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input
  This usually indicates a serious error in the grammar
  ML-Yacc reduces by the first rule
  Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
  We need to fix our grammar:
    sequence ::= (empty)
               | sequence word
18
Summary of conflicts
  Shift-reduce conflict
    resolve with precedence and associativity
    shift by default
  Reduce-reduce conflict
    reduce by first rule
    Not allowed!
19
Lab3
  Your job is to finish a parser for the C language
    Input: a “.c” file
    Output: “Success!” if the “.c” file is correct
  File description
    c.lex
    c.grm
    main.sml
    call-main.sml
    sources.cm
    lab3.mlb
    test.c
20
Using ML-Yacc
  Read the ML-Yacc Manual
  Run
    Once you finish “c.grm” and “c.lex”, on the command line (using MLton's tools):
      mlyacc c.grm
      mllex c.lex
    This produces “c.grm.sig”, “c.grm.sml”, “c.grm.desc”, and “c.lex.sml”
  Then compile Lab3
    Start SML/NJ and run:
      CM.make "sources.cm";
    or on the command line:
      mlton lab3.mlb
  To run lab3
    In SML/NJ:
      Main.parse "test.c";
    or on the command line:
      lab3 test.c
21
“Debug” ML-Yacc File
  When you run mlyacc, you'll see error messages if your ML-Yacc file has conflicts. For example:
    mlyacc c.grm
    2 shift/reduce conflicts
  Open the file “c.grm.desc” (this file is generated by mlyacc)
  The beginning of this file lists the conflicts:
    2 shift/reduce conflicts
    error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
  The rest are all the states:
    state 0:
      prog : . structs vdecs preds funcs
      MYSTRUCT    shift 3
      prog        goto 429
      structs     goto 2
      structdec   goto 1
      .           reduce by rule 12
  “rule 12” means the 12th rule (counting from 0) in your ML-Yacc file
22
Use ML-Lex with ML-Yacc
  Most of the work in “c.lex” this time can be copied from Lab2
    You can re-use regular expressions and lexical rules
  Difference from Lab2
    You have to define the tokens in “c.grm”
      %term INT of int | EOF
    The “%term” declarations in “c.grm” automatically appear in “c.grm.sig”
      signature C_TOKENS =
      sig
        type ('a,'b) token
        type svalue
        val EOF: 'a * 'a -> (svalue,'a) token
        val INT: (int) * 'a * 'a -> (svalue,'a) token
      end
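The signature shows how a lexer action is expected to build tokens: a terminal without a value takes a left and a right position, and a terminal declared "of int" takes the value first. The sketch below imitates this with a stand-in structure so that it runs on its own. TOY_TOKENS, ToyTokens, mkInt, and mkEof are hypothetical names, and the token representation here is an assumption, not the one ML-Yacc generates. In a real “c.lex” the calls would go to the generated Tokens structure, with yytext and yypos supplied by ML-Lex.

```sml
(* Same shape as the generated C_TOKENS signature. *)
signature TOY_TOKENS =
sig
  type ('a,'b) token
  type svalue
  val EOF : 'a * 'a -> (svalue,'a) token
  val INT : int * 'a * 'a -> (svalue,'a) token
end

(* Stand-in implementation; the real one comes from c.grm.sml. *)
structure ToyTokens =
struct
  datatype svalue = VOID | INTVAL of int
  datatype ('a,'b) token = TOKEN of string * 'a * 'b * 'b
  fun EOF (left, right) = TOKEN ("EOF", VOID, left, right)
  fun INT (n, left, right) = TOKEN ("INT", INTVAL n, left, right)
end

(* Checks that the stand-in matches the signature. *)
structure Check : TOY_TOKENS = ToyTokens

(* What a lexer action for an integer literal might do:
   convert the matched text and attach start/end positions. *)
fun mkInt (yytext : string, yypos : int) =
      ToyTokens.INT (valOf (Int.fromString yytext), yypos, yypos + size yytext)

(* What the end-of-file action might do. *)
fun mkEof (yypos : int) = ToyTokens.EOF (yypos, yypos)

val t1 = mkInt ("42", 10)
val t2 = mkEof 12
```

Note that mkInt assumes the matched text is a valid integer literal; valOf raises Option otherwise, which is acceptable when the lexer rule only matches digits.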
23
Hints
  Read the ML-Yacc Manual
  Read the language specification
  Test a lot!