Saumya Debray The University of Arizona Tucson, AZ 85721

Slides:



Advertisements
Similar presentations
Chapter 3 Syntax Analysis
Advertisements

Yacc YACC BNF grammar example.y Other modules example.tab.c Executable
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
Language processing: introduction to compiler construction Andy D. Pimentel Computer Systems Architecture group
Yacc Examples Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
CPSC Compiler Tutorial 5 Parser & Bison. Bison Concept Bison reads tokens and pushes them onto a stack along with the semantic values. The process.
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ
Parser construction tools: YACC
Syntax Analysis – Part II Quick Look at Using Bison Top-Down Parsers EECS 483 – Lecture 5 University of Michigan Wednesday, September 20, 2006.
Compilers: Yacc/7 1 Compiler Structures Objective – –describe yacc (actually bison) – –give simple examples of its use , Semester 1,
CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.
Saumya Debray The University of Arizona Tucson, AZ 85721
LEX and YACC work as a team
1 Using Yacc: Part II. 2 Main() ? How do I activate the parser generated by yacc in the main() –See mglyac.y.
Using the LALR Parser Generator yacc By J. H. Wang May 10, 2011.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 YACC Parser Generator. 2 YACC YACC (Yet Another Compiler Compiler) Produce a parser for a given grammar.  Compile a LALR(1) grammar Original written.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Miscellaneous 컴파일러 입문.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
CS308 Compiler Principles Introduction to Yacc Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
CPS 506 Comparative Programming Languages Syntax Specification.
–Writing a parser with YACC (Yet Another Compiler Compiler). Automatically generate a parser for a context free grammar (LALR parser) –Allows syntax direct.
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
Compiler Principle and Technology Prof. Dongming LU Mar. 26th, 2014.
YACC. Introduction What is YACC ? a tool for automatically generating a parser given a grammar written in a yacc specification (.y file) YACC (Yet Another.
Introduction to YACC CS 540 George Mason University.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
Set 6 Debugging with Lex & Yacc. DEBUGGING YOUR GRAMMAR WITH YACC Assume your Yacc defn. file is called “Test” 1.OBTAINING THE PARSING MACHINE Employ:
More LR Parsing and Bison CPSC 388 Ellen Walker Hiram College.
LECTURE 11 Semantic Analysis and Yacc. REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.
YACC Primer CS 671 January 29, CS 671 – Spring Yacc Yet Another Compiler Compiler Automatically constructs an LALR(1) parsing table from.
YACC (Yet Another Compiler-Compiler) Chung-Ju Wu
1 Syntax Analysis Part III Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
YACC SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
Introduction to Parsing
Syntax error handling –Errors can occur at many levels lexical: unknown operator syntactic: unbalanced parentheses semantic: variable never declared runtime:
Yacc.
Language processing: introduction to compiler construction
Chapter 3 – Describing Syntax
Syntax Analysis Part III
Tutorial On Lex & Yacc.
CS510 Compiler Lecture 4.
Compiler Construction
Chapter 4 Syntax Analysis.
Syntax Analysis Part III
Bison: Parser Generator
4 (c) parsing.
Lexical and Syntax Analysis
Syntax Analysis Part III
Bison Marcin Zubrowski.
CSE 3302 Programming Languages
Syntax Analysis Part III
Subject Name:Sysytem Software Subject Code: 10SCS52
R.Rajkumar Asst.Professor CSE
Syntax Analysis Part III
Syntax-Directed Translation
Kanat Bolazar February 16, 2010
Compiler Lecture Note, Miscellaneous
Yacc Yacc.
Compiler Structures 7. Yacc Objectives , Semester 2,
Appendix B.2 Yacc Appendix B.2 -- Yacc.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Design Yacc Example "Yet Another Compiler Compiler"
CMPE 152: Compiler Design December 4 Class Meeting
Presentation transcript:

Saumya Debray The University of Arizona Tucson, AZ 85721 A brief yacc tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on yacc Yacc: Overview Parser generator: Takes a specification for a context-free grammar. Produces code for a parser. Output: C code implementing a parser: function: yyparse() file [default]: y.tab.c Input: a set of grammar rules and actions yacc (or bison) A quick tutorial on yacc

Scanner-Parser interaction Parser assumes the existence of a function ‘int yylex()’ that implements the scanner. Scanner: return value indicates the type of token found; other values communicated to the parser using yytext, yylval (see man pages). Yacc determines integer representations for tokens: Communicated to scanner in file y.tab.h use “yacc -d” to produce y.tab.h Token encodings: “end of file” represented by ‘0’; a character literal: its ASCII value; other tokens: assigned numbers  257. A quick tutorial on yacc

A quick tutorial on yacc Using Yacc lexical rules grammar rules y.output flex yacc describes states, transitions of parser (useful for debugging) “yacc -v” “yacc -d”  y.tab.h lex.yy.c y.tab.c yylex() yyparse() tokens parsed input input A quick tutorial on yacc

A quick tutorial on yacc int yyparse() Called once from main() [user-supplied] Repeatedly calls yylex() until done: On syntax error, calls yyerror() [user-supplied] Returns 0 if all of the input was processed; Returns 1 if aborting due to syntax error. Example: int main() { return yyparse(); } A quick tutorial on yacc

A quick tutorial on yacc yacc: input format A yacc input file has the following structure: definitions %% rules user code required optional Shortest possible legal yacc input: %% A quick tutorial on yacc

A quick tutorial on yacc Definitions Information about tokens: token names: declared using ‘%token’ single-character tokens don’t have to be declared any name not declared as a token is assumed to be a nonterminal. start symbol of grammar, using ‘%start’ [optional] operator info: precedence, associativity stuff to be copied verbatim into the output (e.g., declarations, #includes): enclosed in %{ … }% A quick tutorial on yacc

A quick tutorial on yacc Rules Grammar production yacc rule A  B1 B2 … Bm A  C1 C2 … Cn A  D1 D2 … Dk | C1 C2 … Cn | D1 D2 … Dk ; /* ‘;’ optional, but advised */ Rule RHS can have arbitrary C code embedded, within { … }. E.g.: A : B1 { printf(“after B1\n”); x = 0; } B2 { x++; } B3 Left-recursion more efficient than right-recursion: A : A x | … rather than A : x A | … A quick tutorial on yacc

A quick tutorial on yacc Conflicts Conflicts arise when there is more than one way to proceed with parsing. Two types: shift-reduce [default action: shift] reduce-reduce [default: reduce with the first rule listed] Removing conflicts: specify operator precedence, associativity; restructure the grammar use y.output to identify reasons for the conflict. A quick tutorial on yacc

Specifying Operator Properties Binary operators: %left, %right, %nonassoc: %left '+' '-' %left '*' '/' %right '^‘ Unary operators: %prec Changes the precedence of a rule to be that of the token specified. E.g.: %left '+' '-' %left '*' '/‘ Expr: expr ‘+’ expr | ‘–’ expr %prec ‘*’ | … Operators in the same group have the same precedence A quick tutorial on yacc

Specifying Operator Properties Binary operators: %left, %right, %nonassoc: %left '+' '-' %left '*' '/' %right '^' Unary operators: %prec Changes the precedence of a rule to be that of the token specified. E.g.: %left '+' '-' Expr: expr ‘+’ expr | ‘–’ expr %prec ‘*’ | … Operators in the same group have the same precedence Across groups, precedence increases going down. A quick tutorial on yacc

Specifying Operator Properties Binary operators: %left, %right, %nonassoc: %left '+' '-' %left '*' '/' %right '^‘ Unary operators: %prec Changes the precedence of a rule to be that of the token specified. E.g.: %left '+' '-' Expr: expr '+' expr | '–' expr %prec '*' | … Operators in the same group have the same precedence Across groups, precedence increases going down. The rule for unary ‘–’ has the same (high) precedence as ‘*’ A quick tutorial on yacc

A quick tutorial on yacc Error Handling The “token” ‘error’ is reserved for error handling: can be used in rules; suggests places where errors might be detected and recovery can occur. Example: stmt : IF '(' expr ')' stmt | IF '(' error ')' stmt | FOR … | … Intended to recover from errors in ‘expr’ A quick tutorial on yacc

Parser Behavior on Errors When an error occurs, the parser: pops its stack until it enters a state where the token ‘error’ is legal; then behaves as if it saw the token ‘error’ performs the action encountered; resets the lookahead token to the token that caused the error. If no ‘error’ rules specified, processing halts. A quick tutorial on yacc

Controlling Error Behavior Parser remains in error state until three tokens successfully read in and shifted prevents cascaded error messages; if an error is detected while parser in error state: no error message given; input token causing the error is deleted. To force the parser to believe that an error has been fully recovered from: yyerrok; To clear the token that caused the error: yyclearin; A quick tutorial on yacc

Placing ‘error’ tokens Some guidelines: Close to the start symbol of the grammar: To allow recovery without discarding all input. Near terminal symbols: To allow only a small amount of input to be discarded on an error. Consider tokens like ‘)’, ‘;’, that follow nonterminals. Without introducing conflicts. A quick tutorial on yacc

A quick tutorial on yacc Error Messages On finding an error, the parser calls a function void yyerror(char *s) /* s points to an error msg */ user-supplied, prints out error message. More informative error messages: int yychar: token no. of token causing the error. user program keeps track of line numbers, as well as any additional info desired. A quick tutorial on yacc

Error Messages: example #include "y.tab.h" extern int yychar, curr_line; static void print_tok() { if (yychar < 255) { fprintf(stderr, "%c", yychar); } else { switch (yychar) { case ID: … case INTCON: … … void yyerror(char *s) { fprintf(stderr, "[line %d]: %s", curr_line, s); print_tok(); } A quick tutorial on yacc

A quick tutorial on yacc Debugging the Parser To trace the shift/reduce actions of the parser: when compiling: #define YYDEBUG at runtime: set yydebug = 1 /* extern int yydebug; */ A quick tutorial on yacc

Adding Semantic Actions Semantic actions for a rule are placed in its body: an action consists of C code enclosed in { … } may be placed anywhere in rule RHS Example: expr : ID { symTbl_lookup(idname); } decl : type_name { tval = … } id_list; A quick tutorial on yacc

Synthesized Attributes Each nonterminal can “return” a value: The return value for a nonterminal X is “returned” to a rule that has X in its body, e.g.: A : … X … X : … This is different from the values returned by the scanner to the parser! value “returned” by X A quick tutorial on yacc

Attribute return values To access the value returned by the ith symbol in a rule body, use $i an action occurring in a rule body counts as a symbol. E.g.: decl : type { tval = $1 } id_list { symtbl_install($3, tval); } To set the value to be returned by a rule, assign to $$ by default, the value of a rule is the value of its first symbol, i.e., $1. 1 2 3 4 A quick tutorial on yacc

A quick tutorial on yacc Example /* A variable declaration is an identifier followed by an optional subscript, e.g., x or x[10] */ var_decl : ident opt_subscript { if ( symtbl_lookup($1) != NULL ) ErrMsg(“multiple declarations”, $1); else { st_entry = symtbl_install($1); if ( $2 == ARRAY ) { st_entry->base_type = ARRAY; st_entry->element_type = tval; } st_entry->base_type = tval; st_entry->element_type = UNDEF; opt_subscript : ‘[‘ INTCON ‘]’ { $$ = ARRAY; } | /* null */ { $$ = INTEGER; } ; A quick tutorial on yacc

Declaring Return Value Types Default type for nonterminal return values is int. Need to declare return value types if nonterminal return values can be of other types: Declare the union of the different types that may be returned: %union { symtbl_ptr st_ptr; id_list_ptr list_of_ids; tree_node_ptr syntax_tree_ptr; int value; } Specify which union member a particular grammar symbol will return: %token <value> INTCON, CHARCON; %type <st_ptr> identifier; %type <syntax_tree_ptr> expr, stmt; terminals nonterminals A quick tutorial on yacc

A quick tutorial on yacc Conflicts A conflict occurs when the parser has multiple possible actions in some state for a given next token. Two kinds of conflicts: shift-reduce conflict: The parser can either keep reading more of the input (“shift action”), or it can mimic a derivation step using the input it has read already (“reduce action”). reduce-reduce conflict: There is more than one production that can be used for mimicking a derivation step at that point. A quick tutorial on yacc

A quick tutorial on yacc Example of a conflict Grammar rules: S  if ( e ) S /* 1 */ Input: if ( e1 ) if ( e2 ) S2 else S3 | if ( e ) S else S /* 2 */ Parser state when input token = ‘else’: Input already seen: if ( e1 ) if ( e2 ) S2 Choices for continuing: 1. keep reading input (“shift”): ‘else’ part of innermost if eventual parse structure: if (e1) { if (e2) S2 else S3 } 2. mimic derivation step using S  if ( e ) S (“reduce”): ‘else’ part of outermost if eventual parse structure: if (e1) { if (e2) S2 } else S3 shift-reduce conflict A quick tutorial on yacc

A quick tutorial on yacc Handling Conflicts General approach: Iterate as necessary: Use “yacc -v” to generate the file y.output. Examine y.output to find parser states with conflicts. For each such state, examine the items to figure why the conflict is occurring. Transform the grammar to eliminate the conflict: Reason for conflict Possible grammar transformation Ambiguity with operators in expressions Specify associativity, precedence Error action Move or eliminate offending error action Semantic action Move the offending semantic action Insufficient lookahead “expand out” the nonterminal involved Other …???… A quick tutorial on yacc

Understanding y.output Input grammar: S : ‘0’ S ‘0’ | /* epsilon */ ; y.output file: 0 $accept : S $end 1 S : ‘0’ S ‘0’ 2 | … 1: shift/reduce conflict (shift 1, reduce 2) on ‘0’ state 1 …. artificial rule, introduced by yacc rules from input grammar rule numbers indicates which state has conflict information about state 1 A quick tutorial on yacc

Understanding y.output: cont’d no. of state with conflict 1: shift/reduce conflict (shift 1, reduce 2) on ‘0’ nature of the conflict: “shift 1” : “shift and go to state 1 “reduce 2” : reduce using rule 2 (S  ) input token on which the conflict occurs What’s going on? haven’t reached midpoint of input string seen ‘0’, looking to process S  if input has ‘0’, continue processing  shift state 1: S : ‘0’ . S ‘0’ S : . Reached midpoint of input string next ‘0’ in input is from second half of string if input is ‘0’, reduce using (S  ) A quick tutorial on yacc