Published by Gyles Payne. Modified over 9 years ago.
1
Applications of Context-Free Grammars (CFG)
Parsers. The YACC Parser-Generator.
by: Saleh Al-shomrani
2
(1) Parsers

Parsers are programs that create parse trees from source programs.

Many aspects of a programming language have a structure that can be described by regular expressions (REs). For example, identifiers can be described by an RE and recognized by a lex-generated lexical analyzer.

However, some very important aspects of programming languages cannot be described by REs. Typical languages use parentheses and/or brackets in a nested and balanced fashion, and balanced nesting is beyond what REs can express.

Example #1: A grammar G_bal = ({B}, {(, )}, P, B), where P consists of:

    B -> BB | (B) | ε

Example #2: A grammar that generates the possible sequences of if and else in C (represented as i and e, respectively) is:

    S -> SS | iS | iSeS | ε

Q: Can we generate the following strings using the above grammar, and why: ieie, iie, ei, iei, ieeii? How about iieie?
3
The answer for the last one is yes, because iieie corresponds to a C program whose structure is like:

    if (Condition) {
        …
        if (Condition) Statement; else Statement;
        …
        if (Condition) Statement; else Statement;
        …
    }
4
Compilation Sequence

    Source code:     a = b + c * d
                         |
                  Lexical Analyzer
                         |
    Tokens:          id1 = id2 + id3 * id4
                         |
                  Syntax Analyzer
                         |
    Syntax tree:         =
                        / \
                     id1   +
                          / \
                       id2   *
                            / \
                         id3   id4
                         |
                   Code Generator
                         |
    Generated code:  load id3
                     mul id4
                     add id2
                     store id1
5
(2) The YACC Parser-Generator

Yacc and lex are very closely related, so it should not be surprising that the two program generators are often used in combination. The structure of a yacc program closely resembles the structure of a lex program. A yacc program has the following structure:

    declarations
    %%
    rules
    %%
    programs

A yacc program describes the production rules for a context-free grammar. (A yacc program usually has a ".y" suffix.)

Yacc generates a procedure yyparse() that processes a stream of tokens generated by yylex() and attempts to match a sentence in the specified language. Notice that yyparse() calls the scanner when it needs the next token. The scanner is called yylex(); it may or may not be generated by lex.

The output of yacc is placed in a file called y.tab.c, unless otherwise specified.
6
Building a compiler with Lex/Yacc:

    bas.y -> yacc -> y.tab.c (yyparse), y.tab.h
    bas.l -> lex  -> lex.yy.c (yylex)
    y.tab.c + lex.yy.c -> cc -> bas.exe
    source -> bas.exe -> compiled output

Commands to create our compiler, bas.exe, are:

    yacc -d bas.y                     # create y.tab.h, y.tab.c
    lex bas.l                         # create lex.yy.c
    cc lex.yy.c y.tab.c -o bas.exe    # compile/link
7
You may use a lex-generated version of yylex() by simply including the statement

    #include "lex.yy.c"

in the program section of the yacc definition file.

The declaration section contains declaration statements such as

    %token TK_ID

and

    %start set

The heart of a yacc program is the rules section. This section describes the grammar productions and the actions to perform once those productions are recognized. For example, a typical grammar rule might be the following:

    set : '{' list_of_ids '}'
        ;

Here set and list_of_ids are variables (nonterminals) and '{' and '}' are terminals. (The semicolon in the above rule definition marks the end of the sequence of productions for set.) Similarly, we might define

    list_of_ids : TK_ID
                | TK_ID ',' list_of_ids
                ;
8
where list_of_ids is a variable and TK_ID and ',' are tokens. Notice that alternation is denoted by "|" in our grammar rules.

Yacc works with attribute grammars, i.e., grammars in which every nonterminal and terminal may have an associated attribute or value. In yacc actions, these attributes may be read and/or set when needed. The attribute of the variable (nonterminal) on the left-hand side is denoted by $$. The attributes of the other elements in a production are accessed by their position: $1 is the first symbol on the right-hand side, $2 the second, and so on. For example, if the variable EXPR has an integer attribute, then the following production rule and action are appropriate:

    EXPR : EXPR '+' EXPR { $$ = $1 + $3; }
         ;

Example 1: A lex program that removes comments, tabs, newlines, etc. and returns { } , ; = TKID, and TK_COLORS as tokens, together with the yacc program that parses them.
9
/* exam-y.y: Use strings and sets as yacc types */
%{
#include <stdlib.h>   /* malloc, used in the rule actions */

struct color_list {
    char *my_color;
    struct color_list *next;
};
%}

%union {
    char *t_val;
    struct color_list *color_set;
}

/* These are the token types that exam-l.l returns */
%token TKID
%token TK_COLORS

/* The token TKID returns the type t_val. */
%type <t_val> TKID
%type <color_set> color_def list_of_ids

%start color_def;
10
Here are the production rules:

%%
color_def   : TK_COLORS '=' '{' list_of_ids '}' ';'
                { print_set($4); }
            ;

list_of_ids : TKID
                {
                  struct color_list *set1;
                  set1 = (struct color_list *)malloc(sizeof(struct color_list));
                  set1->next = NULL;
                  set1->my_color = $1;
                  $$ = set1;
                }
            | TKID ',' list_of_ids
                {
                  struct color_list *set1;
                  set1 = (struct color_list *)malloc(sizeof(struct color_list));
                  set1->next = $3;
                  set1->my_color = $1;
                  $$ = set1;
                }
            ;
%%
11
/* exam-l.l: Here is a lex program that removes comments, tabs, newlines, etc. */
/* It returns { } , ; = TKID, and TK_COLORS as tokens. */
%%
[ \t\n\f]   { ACC(yytext[0]); /* Remove tabs/spaces/newlines */ }

\/\*        { int c;   /* int rather than char, so the EOF comparison is reliable */
              int line_cur;
              line_cur = linecount;
              while (1) {
                if ((c = input()) == EOF) {
                  /* If this is the case, there is an error, an unterminated comment. */
                  printf("Detected unterminated comment starting on line %d \n", line_cur);
                  return(0);
                }
                ACC(c);
12
                if (c == '*') {
                  if ((c = input()) == '/') {
                    break;
                  } else {
                    unput(c);
                  }
                }
              }
            }

colors      { printf("%s\n", yytext); return TK_COLORS; }

[a-zA-Z_.][a-zA-Z_.0-9]*  { /* Copy yytext to yylval.t_val */
              yylval.t_val = strdup(yytext);
              return TKID; }

[{},;=]     { return yytext[0]; }

.           { printf("Illegal Character %s on line %d\n", yytext, linecount);
              printf("Ignored \n");
              ACC(yytext[0]); }
%%
13
Input:

    /* This is a test of a colors file. */
    colors = {red, green, blue, white};
    /* End of test. */

Output:

    Reading a colors definition
    Equals
    Beginning of set
    Color = red
    Separator
    Color = green
    Separator
    Color = blue
    Separator
    Color = white
    Semicolon
14
Example 2: This is a yacc program that acts as an interpreter for a simple language called SSET that manipulates STRINGS and SETS of STRINGS. It has two classes:

(1) set_list: represents sets and their functions, such as searching, storing, set union, set intersection, set difference, and printing the contents of a set.
(2) symtab: represents a symbol table that stores information about each variable, such as its name, type, and values.

Here are the production rules from the YACC file without their actions (too long to fit here!).
15
%%
Sset        : declaration program
            |                                 { $$ = NULL; }
            ;
declaration : TK_SET TK_ID setdeclar ';'      { … }
            | TK_STRING TK_ID strdeclar ';'   { … }
            ;
setdeclar   : ',' TK_ID setdeclar
            |                                 { $$ = NULL; }
            ;
program     : declaration program             { $$ = $1; }
            | statement program
            |                                 { /* if lambda */ $$ = NULL; }
            ;
statement   : TK_ID '=' simp_exp              { … }
            | TK_DISPLAY TK_STR_CONST ';'     { … }
            | TK_DISPLAY TK_ID ';'            { … }
            ;
simp_exp    : TK_ID ';'                       { … }
            | TK_STR_CONST ';'                { … }
            | set_def                         { $$ = $1; }
            | bin_exp                         { $$ = $1; }
            ;
set_def     : '{' list_of_ids '}' ';'         { $$ = $2; }
            ;
list_of_ids : TK_ID                           { … }
            | TK_STR_CONST                    { … }
            | TK_ID ',' list_of_ids           { … }
            | TK_STR_CONST ',' list_of_ids
            |                                 { /* if lambda */ $$ = NULL; }
            ;
bin_exp     : bin1 ';'                        { $$ = $1; }
            | bin2 ';'                        { $$ = $1; }
            | bin3 ';'                        { $$ = $1; }
            | bin4 ';'                        { $$ = $1; }
            ;
bin1        : TK_ID '+' TK_ID                 { … }
            | TK_ID '+'                       { … }
            ;
bin2        : TK_ID '*' TK_ID                 { … }
            | TK_ID '*' '{' TK_ID '}'         { … }
            ;
bin3        : TK_ID '-' TK_ID                 { … }
            | TK_ID '-' '{' TK_ID '}'         { … }
            ;
bin4        : '(' bin1 ')'
            | TK_ID                           { … }
            ;
%%
16
Input:

    SET s1;
    STRING s2;
    s2 = "John";
    s1 = {s2,"Paul","Ringo", "George"};
    DISPLAY "The Beatles ---- ";
    DISPLAY s1;
    s1 = s1 - {s2};
    DISPLAY s1;
    s1={};
    DISPLAY s1;

Output:

    The Beatles ----
    {John, Paul, Ringo, George}
    {Paul, Ringo, George}
    { }
17
Compilation (Makefile):

    CFLAG = -g

    sset: y.tab.o
            g++ -g -o sset y.tab.o

    y.tab.o: y.tab.c lex.yy.c
            g++ -c $(CFLAG) y.tab.c

    y.tab.c: start.y
            yacc start.y

    lex.yy.c: start.l
            flex start.l

Other References:

    http://www.combo.org/lex_yacc_page/lex.html
    http://www.epaperpress.com
    http://www.gnu.org
    http://www.cygnus.com