YACC (Yet Another Compiler-Compiler) Chung-Ju Wu

Slides:



Advertisements
Similar presentations
Application: Yacc A parser generator A context-free grammar An LR parser Yacc Yacc input file:... definitions... %... production rules... %... user-defined.
Advertisements

Chapter 3 Syntax Analysis
Yacc YACC BNF grammar example.y Other modules example.tab.c Executable
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
Language processing: introduction to compiler construction Andy D. Pimentel Computer Systems Architecture group
Yacc Examples Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
CPSC Compiler Tutorial 5 Parser & Bison. Bison Concept Bison reads tokens and pushes them onto a stack along with the semantic values. The process.
ML-YACC David Walker COS 320. Outline Last Week –Introduction to Lexing, CFGs, and Parsing Today: –More parsing: automatic parser generation via ML-Yacc.
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
PLLab, NTHU,Cs2403 Programming Languages
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
1 YACC Yet Another Compiler Compiler. 2 Yacc is a parser generator: Input: A Grammar Output: A parser for the grammar (Reminder: a parser finds derivations)
Parser construction tools: YACC
Syntax Analysis – Part II Quick Look at Using Bison Top-Down Parsers EECS 483 – Lecture 5 University of Michigan Wednesday, September 20, 2006.
Compilers: Yacc/7 1 Compiler Structures Objective – –describe yacc (actually bison) – –give simple examples of its use , Semester 1,
Saumya Debray The University of Arizona Tucson, AZ 85721
LEX and YACC work as a team
Using the LALR Parser Generator yacc By J. H. Wang May 10, 2011.
1 October 14, October 14, 2015October 14, 2015October 14, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 YACC Parser Generator. 2 YACC YACC (Yet Another Compiler Compiler) Produce a parser for a given grammar.  Compile a LALR(1) grammar Original written.
1 Programming Languages (CS 550) Lecture 1 Summary Grammars and Parsing Jeremy R. Johnson.
Lab 3: Using ML-Yacc Zhong Zhuang
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Miscellaneous 컴파일러 입문.
YACC Example Taken from LEX & YACC Simple calculator a = a a=10 b = 7 c = a + b c c = 17 pressure = ( ) * 16.4 $
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
CS308 Compiler Principles Introduction to Yacc Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University.
–Writing a parser with YACC (Yet Another Compiler Compiler). Automatically generate a parser for a context free grammar (LALR parser) –Allows syntax direct.
Introduction to Yacc Ying-Hung Jiang
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
Compiler Principle and Technology Prof. Dongming LU Mar. 26th, 2014.
YACC. Introduction What is YACC ? a tool for automatically generating a parser given a grammar written in a yacc specification (.y file) YACC (Yet Another.
1 LEX & YACC Tutorial February 28, 2008 Tom St. John.
Introduction to YACC CS 540 George Mason University.
1 Programming Languages (CS 550) Lecture 1 Summary Grammars and Parsing Jeremy R. Johnson.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
Calculator 구현. Simple Calculator 문제 " 입력되는 수식의 값을 계산하는 lex 와 yacc 의 source 을 작성하시오 " – 허용하는 연산자 : + * - / ( ) – 우선순위 : ( ) > * / > + - – 결합순위 : 모두 좌측.
LECTURE 11 Semantic Analysis and Yacc. REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
2-1. LEX & YACC. 2 Overview  Syntax  What its program looks like –Context-free grammar, BNF  Syntax-directed translation –A grammar-oriented compiling.
YACC Primer CS 671 January 29, CS 671 – Spring Yacc Yet Another Compiler Compiler Automatically constructs an LALR(1) parsing table from.
Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.
1 Syntax Analysis Part III Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
YACC SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
Compiler Design BMZ 1 Chapter4: Syntax Analysis. Compiler Design BMZ 2 Syntax Analysis Source Program Target Program Semantic Analyser Intermediate Code.
Syntax error handling –Errors can occur at many levels lexical: unknown operator syntactic: unbalanced parentheses semantic: variable never declared runtime:
Yacc.
Language processing: introduction to compiler construction
Syntax Analysis Part III
Tutorial On Lex & Yacc.
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Compiler Construction
Chapter 4 Syntax Analysis.
Context-free Languages
Syntax Analysis Part III
Bison: Parser Generator
Syntax Analysis Part III
Bison Marcin Zubrowski.
Syntax Analysis Part III
Subject Name:Sysytem Software Subject Code: 10SCS52
Syntax Analysis Part III
Syntax-Directed Translation
Compiler Lecture Note, Miscellaneous
Compiler Structures 7. Yacc Objectives , Semester 2,
Saumya Debray The University of Arizona Tucson, AZ 85721
Compiler Design Yacc Example "Yet Another Compiler Compiler"
CMPE 152: Compiler Design December 4 Class Meeting
Presentation transcript:

YACC (Yet Another Compiler-Compiler) Chung-Ju Wu

The Goal Compiler Lex Yacc C programs (subset) Microsoft Intermediate Language Executed on CLR MSIL Assembler

Why use Lex & Yacc ? Writing a compiler is difficult requiring lots of time and effort. Construction of the scanner and parser is routine enough that the process may be automated. Lexical Rules Grammar Semantics Compiler Scanner Parser Code generator

Introduction What is YACC ? –Tool which will produce a parser for a given grammar. –YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.

History Original written by Stephen C. Johnson, Variants: –lex, yacc (AT&T) –bison: a yacc replacement (GNU) –flex: fast lexical analyzer (GNU) –BSD yacc –PCLEX, PCYACC (Abraxas Software)

How YACC Works a.out File containing desired grammar in yacc format yacc program C source program created by yacc C compiler Executable program that will parse grammar given in gram.y gram.y yacc y.tab.c cc or gcc

yacc How YACC Works (1) Parser generation time YACC source (*.y) y.tab.h y.tab.c C compiler/linker (2) Compile time y.tab.c a.out (3) Run time Token stream Abstract Syntax Tree y.output

An YACC File Example

Works with Lex How to work ?

Works with Lex call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM

YACC File Format %{ C declarations %} yacc declarations % Grammar rules % Additional C code –Comments enclosed in /*... */ may appear in any of the sections.

Definitions Section %{ #include %} %token ID NUM %start expr It is a terminal 由 expr 開始 parse

Start Symbol The first non-terminal specified in the grammar specification section. To overwrite it with %start declaraction. %start non-terminal

Rules Section Is a grammar Example expr : expr '+' term | term; term : term '*' factor | factor; factor : '(' expr ')' | ID | NUM;

Rules Section Normally written like this Example: expr : expr '+' term | term ; term : term '*' factor | factor ; factor : '(' expr ')' | ID | NUM ;

The Position of Rules expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr ')' { $$ = $2; } | ID | NUM ;

The Position of Rules expr expr : expr '+' term { $$ = $1 + $3; } term | term { $$ = $1; } ; term term : term '*' factor { $$ = $1 * $3; } factor | factor { $$ = $1; } ; ( factor : '(' expr ')' { $$ = $2; } | ID | NUM ; $1

The Position of Rules + expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; * term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; expr factor : '(' expr ')' { $$ = $2; } | ID | NUM ; $2

The Position of Rules term expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; factor term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; ) factor : '(' expr ')' { $$ = $2; } | ID | NUM ; $3 Default: $$ = $1;

Communication between LEX and YACC call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM LEX and YACC 需要一套方法確認 token 的身份

Communication between LEX and YACC yacc -d gram.y Will produce: y.tab.h Use enumeration ( 列舉 ) / define 由一方產生,另一方 include YACC 產生 y.tab.h LEX include y.tab.h

Communication between LEX and YACC %{ #include #include "y.tab.h" %} id [_a-zA-Z][_a-zA-Z0-9]* % int { return INT; } char { return CHAR; } float { return FLOAT; } {id} { return ID;} %{ #include %} %token CHAR, FLOAT, ID, INT % yacc -d xxx.y Produced y.tab.h # define CHAR 258 # define FLOAT 259 # define ID 260 # define INT 261 parser.y scanner.l

YACC Rules may be recursive Rules may be ambiguous Uses bottom up Shift/Reduce parsing –Get a token –Push onto stack –Can it reduced (How do we know?) If yes: Reduce using a rule If no: Get another token Yacc cannot look ahead more than one token

Another Problem? Every terminal-token (symbol) may represent a value or data type –May be a numeric quantity in case of a number (42) –May be a pointer to a string ("Hello, World!") When using lex we put the value into yylval –In complex situations yylval is a union Typical lex code: [0-9]+ {yylval = atoi(yytext); return NUM}

Another Problem? Yacc allows symbols to have multiple types of value symbols %union { double dval; int vblno; char* strval; }

Another Problem? %union { double dval; int vblno; char* strval; } yacc -d y.tab.h … extern YYSTYPE yylval; [0-9]+ { yylval.vblno = atoi(yytext); return NUM;} [A-z]+ { yylval.strval = strdup(yytext); return STRING;} Lex file include “y.tab.h”

Yacc Example Taken from Lex & Yacc Simple calculator a = a a=10 b = 7 c = a + b c c = 17 $

Grammar expression ::= expression '+' term | expression '-' term | term term ::= term '*' factor | term '/' factor | factor factor ::= '(' expression ')' | '-' factor | NUMBER | NAME

#define NSYMS 20/* maximum number of symbols */ struct symtab { char *name; double value; } symtab[NSYMS]; struct symtab *symlook(); parser.h namevalue 0 namevalue 1 namevalue 2 namevalue 3 namevalue 4 namevalue 5 namevalue 6 namevalue 7 namevalue 8 namevalue 9 namevalue 10 Symbol Table

%{ #include "parser.h" #include %} %union { double dval; struct symtab *symp; } %token NAME %token NUMBER %type expression %type term %type factor % parser.y Parser Terminal token NAME 與 有 相同的 data type. Nonterminal token expression 與 有相同的 data type.

statement_list:statement '\n' |statement_list statement '\n' ; statement:NAME '=' expression{ $1->value = $3; } |expression{ printf("= %g\n", $1); } ; expression: expression '+' term { $$ = $1 + $3; } | expression '-' term { $$ = $1 - $3; } | term ; parser.y Parser (cont’d)

term: term '*' factor { $$ = $1 * $3; } | term '/' factor { if ($3 == 0.0) yyerror("divide by zero"); else $$ = $1 / $3; } | factor ; factor: '(' expression ')' { $$ = $2; } | '-' factor { $$ = -$2; } | NUMBER { $$ = $1; } | NAME { $$ = $1->value; } ; % parser.y Parser (cont’d)

%{ #include "y.tab.h" #include "parser.h" #include %} % ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) { yylval.dval = atof(yytext); return NUMBER; } [ \t]; /* ignore white space */ Scanner scanner.l

[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ yylval.symp = symlook(yytext); return NAME; } "$"{ return 0; /* end of input */ } \n|”=“|”+”|”-”|”*”|”/”return yytext[0]; % Scanner (cont’d) scanner.l

YACC Command Yacc (AT&T) –yacc –d xxx.y Bison (GNU) –bison –d –y xxx.y 產生 y.tab.c, 與 yacc 相同 不然會產生 xxx.tab.c

Precedence / Association = (1-2)-3? or 1-(2-3)? Define ‘-’ operator is left-association *3 = 1-(2*3) Define “*” operator is precedent to “-” operator (1) 1 – (2) 1 – 2 * 3

Precedence / Association %right ‘=‘ %left ' ' NE LE GE %left '+' '-‘ %left '*' '/' highest precedence

Precedence / Association expr: expr ‘+’ expr{ $$ = $1 + $3; } | expr ‘-’ expr{ $$ = $1 - $3; } | expr ‘*’ expr{ $$ = $1 * $3; } | expr ‘/’ expr { if($3==0) yyerror(“divide 0”); else $$ = $1 / $3; } | ‘-’ expr %prec UMINUS {$$ = -$2; } %left '+' '-' %left '*' '/' %noassoc UMINUS

IF-ELSE Ambiguity Consider following rule: Following state :

IF-ELSE Ambiguity It is a shift/reduce conflict. Yacc will always do shift first. Solution1 :re-write grammar

Solution2: IF-ELSE Ambiguity the rule has the same precedence as token IFX

Shift/Reduce Conflicts shift/reduce conflict –occurs when a grammar is written in such a way that a decision between shifting and reducing can not be made. –ex: IF-ELSE ambigious. To resolve this conflict, yacc will choose to shift.

Reduce/Reduce Conflicts Reduce/Reduce Conflicts: start : expr | stmt ; expr : CONSTANT; stmt : CONSTANT; Yacc(Bison) resolves the conflict by reducing using the rule that occurs earlier in the grammar. NOT GOOD!! So, modify grammar to eliminate them.

Error Messages Bad error message: –Syntax error. –Compiler needs to give programmer a good advice. It is better to track the line number in lex:

Debug Your Parser 1.Use –t option or define YYDEBUG to 1. 2.Set variable yydebug to 1 when you want to trace parsing status. 3.If you want to trace the semantic values Define your YYPRINT function

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: a = 7; b = 3 + a + 2 stack:

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: = 7; b = 3 + a + 2 stack: NAME SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 7; b = 3 + a + 2 stack: NAME ‘=‘ SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: NAME ‘=‘ 7 SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: NAME ‘=‘ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: ; b = 3 + a + 2 stack: stmt REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: b = 3 + a + 2 stack: stmt ‘;’ SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: = 3 + a + 2 stack: stmt ‘;’ NAME SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 3 + a + 2 stack: stmt ‘;’ NAME ‘=‘ SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + a + 2 stack: stmt ‘;’ NAME ‘=‘ NUMBER SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + a + 2 stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: a + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ NAME SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: + 2 stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: 2 stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ NUMBER SHIFT!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt ‘;’ NAME ‘=‘ exp ‘+’ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt ‘;’ NAME ‘=‘ exp REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt ‘;’ stmt REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt REDUCE!

Shift and reducing stmt: stmt ‘;’ stmt | NAME ‘=‘ exp exp: exp ‘+’ exp | exp ‘-’ exp | NAME | NUMBER input: stack: stmt DONE!

Recursive Grammar Left recursion Right recursion LR parser prefer left recursion. LL parser prefer right recursion.

YACC Declaration Summary `%start' Specify the grammar's start symbol `%union' Declare the collection of data types that semantic values may have `%token' Declare a terminal symbol (token type name) with no precedence or associativity specified `%type' Declare the type of semantic values for a nonterminal symbol

YACC Declaration Summary `%right' Declare a terminal symbol (token type name) that is right-associative `%left' Declare a terminal symbol (token type name) that is left-associative `%nonassoc' Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, ex: x op. y op. z is syntax error)