CPSC 325 - Compiler
Tutorial 1: Scanner & Parser
Scanner and Parser
Input -> Scanner -> tokens -> Parser (using the Grammar) -> Parse Tree (Syntax Tree)
Token – example 1
Jeremy sees the cute monkey.
Jeremy – noun
sees – verb
the – determiner
cute – adjective
monkey – noun
Token – example 2
int x = ( 3 + 21 ) * 6;
int – type (reserved/key word)
x – variable
= – assign (reserved/key word)
( – left parenthesis (reserved/key word)
3 – number
+ – plus (reserved/key word)
21 – number
) – right parenthesis (reserved/key word)
* – multiply (reserved/key word)
6 – number
; – semicolon/end (reserved/key word)
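The token list above can be reproduced with a small scanner. The following is only a sketch in Python, not part of the tutorial's toolchain: the names `TOKEN_SPEC` and `scan` and the token class labels are assumptions made for illustration. It tries each pattern in order at the current position and skips white space.

```python
import re

# (token class, pattern) pairs, tried in order; the class names are illustrative.
TOKEN_SPEC = [
    ("type", r"int\b"),              # keyword must come before the variable rule
    ("number", r"\d+"),
    ("variable", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("left parenthesis", r"\("),
    ("right parenthesis", r"\)"),
    ("plus", r"\+"),
    ("multiply", r"\*"),
    ("semicolon", r";"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def scan(text):
    """Split text into (lexeme, token class) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match:
                tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            # No pattern matched at this position.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


for lexeme, kind in scan("int x = ( 3 + 21 ) * 6;"):
    print(lexeme, "-", kind)
```

Putting the keyword rule first is a design choice in this sketch: it keeps `int` from being classified as a variable, while a name such as `integer` still falls through to the variable rule.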
Parsing
- Top-down parsing
- Bottom-up parsing
- Ambiguous
Examples:
Jeremy sees the cute monkeys sleeps
int x = ( 3 + 21 ) * 6;
Context Free Grammar (CFG)
Simple Debug
- Missing ending: int x = 3 // ";" is missing: add it
- Extra character: int x = 3&; // "&" is extra: remove it
- Report the possible cause: x = 3 + // ??? what happened here? (type, second argument, semicolon, etc.)
Note: Some errors are impossible to fix automatically (for example: a misspelling or a missing argument).
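The three cases above can be sketched as a tiny checker. This is a hypothetical helper written in Python for illustration (the name `diagnose`, the allowed character set, and the message wording are all assumptions); a real compiler reports such errors from its parser rather than by inspecting characters.

```python
def diagnose(statement):
    """Return a hint for a few easy-to-spot errors, or None if none is found."""
    text = statement.strip()
    allowed = set("=+-*/(); ")

    # Case 2: a character that cannot appear in this toy statement language.
    for ch in text:
        if not (ch.isalnum() or ch == "_" or ch in allowed):
            return f"'{ch}' is extra: remove it"

    if not text.endswith(";"):
        # Case 3: the statement stops right after an operator, so more than
        # the semicolon is missing and the checker can only report it.
        if text and text[-1] in "=+-*/":
            return "incomplete expression: an operand may be missing"
        # Case 1: everything is there except the terminator.
        return "';' is missing: add it"
    return None


print(diagnose("int x = 3"))
print(diagnose("int x = 3&;"))
print(diagnose("x = 3 +"))
```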
Practice
Parse the following:
- The big dog crushes the small kid.
- double x = y + 3 / 2; // Syntax
Write the grammar for the following:
sentence
  noun phrase
    noun: I
  verb phrase
    verb: sit
    prep. phrase
      prep: on
      you
Practice
Parse and write the grammar:
( 5 + 2 ) * 4 - ( 1 + 2 )
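One possible answer, given here only as a sketch: the grammar below is a common textbook choice for arithmetic and is an assumption, not necessarily the grammar the tutorial expects.

    expr   -> term   ( ("+" | "-") term )*
    term   -> factor ( ("*" | "/") factor )*
    factor -> NUMBER | "(" expr ")"

A top-down (recursive-descent) parser for that grammar can be written in Python with one method per rule. The names `Parser`, `parse`, and `evaluate` are illustrative; the parse tree is a nested tuple `(operator, left, right)`.

```python
import re


def tokenize(text):
    # Numbers, operators and parentheses; anything else is ignored in this sketch.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        self.pos += 1
        return tok

    def expr(self):      # expr -> term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):      # term -> factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):    # factor -> NUMBER | "(" expr ")"
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        return int(tok)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def evaluate(node):
    """Walk the parse tree bottom-up and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


tree = parse("( 5 + 2 ) * 4 - ( 1 + 2 )")
print(tree)            # ('-', ('*', ('+', 5, 2), 4), ('+', 1, 2))
print(evaluate(tree))  # 25
```

Splitting the grammar into `expr`, `term`, and `factor` is what gives `*` higher precedence than `+` and `-`, and the loops make the operators left-associative.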
Lex
- Lex finds the strings in its input that fit the given patterns.
- Good for searching through a program or a document.
- You can specify, in C or C++, what action to take when a matching input string is found.
Lex - Structure
Declarations/Definitions
%%
Rules/Productions
 - Lex expression
 - white space
 - C statement (optional)
%%
Additional Code/Subroutines
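Lex – statement Example
%%
[+++]+.* ;
Removes all comments in Aldor code.
 - After the Lex expression, the C statement is empty, so no action is taken and the matched text is discarded.
 - [+++] is a character class, equivalent to [+], so the pattern matches one or more "+" followed by the rest of the line.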
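The effect of this rule can be imitated outside Lex. The sketch below uses Python's `re` module as a stand-in (an assumption made for illustration; `strip_comments` is a hypothetical name). It relies on two Lex behaviours: text matched by a rule with an empty action is dropped, and text that matches no rule is copied to the output unchanged.

```python
import re

# Same pattern as the Lex rule: one or more '+' followed by the rest of the line.
# As in Lex, '.' does not match a newline, so line breaks are preserved.
COMMENT = re.compile(r"[+++]+.*")


def strip_comments(source):
    """Drop every match of COMMENT; everything else passes through unchanged."""
    return COMMENT.sub("", source)


print(repr(strip_comments("x := 1 +++ doc\ny := 2\n")))  # 'x := 1 \ny := 2\n'
print(repr(strip_comments("z := a + b")))                # 'z := a '
```

The second call shows a side effect worth noticing: because the character class accepts a single "+", the pattern also removes everything after an ordinary addition operator.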
Lex – Basic operators
*  - zero or more occurrences
.  - any character except newline
.* - matches any sequence of characters (up to the end of the line)
|  - alternation (separates alternatives)
+  - one or more occurrences (a+ :== aa*)
?  - zero or one of something (b? :== (b | empty))
[ ] - choice, so [12345] :== (1|2|3|4|5)
      (Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.)
-  - range inside [ ]: [a-zA-Z] is a to z and A to Z, all the letters
\  - escape: \* matches *, and \. matches a period or decimal point
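These operators can be tried out quickly without running Lex. The sketch below uses Python's `re` module, whose syntax agrees with Lex for the operators listed here (treating it as a stand-in is an assumption, and `matches` and `CASES` are illustrative names). Each case checks whether the whole input fits the pattern.

```python
import re


def matches(pattern, text):
    """True when the whole of text fits pattern."""
    return re.fullmatch(pattern, text) is not None


CASES = [
    # (pattern, input, expected result)
    (r"a*", "", True),             # zero occurrences are allowed
    (r"a*", "aaa", True),
    (r"a+", "", False),            # at least one occurrence is required
    (r"aa*", "a", True),           # same language as a+
    (r"b?", "", True),
    (r"b?", "bb", False),          # at most one occurrence
    (r"[12345]", "3", True),
    (r"(1|2|3|4|5)", "3", True),   # same language as [12345]
    (r"[*+]", "*", True),          # inside brackets, * and + are literal
    (r"[*+]", "+", True),
    (r"[a-zA-Z]+", "Lex", True),
    (r"[a-zA-Z]+", "x1", False),   # a digit is not a letter
    (r"\*", "*", True),            # escaped star is a literal star
    (r"\.", ".", True),
    (r"\.", "x", False),           # escaped dot matches only a period
    (r".", "x", True),
    (r".", "\n", False),           # dot does not match a newline
    (r".*", "anything at all", True),
]

for pattern, text, expected in CASES:
    assert matches(pattern, text) == expected, (pattern, text)
print("all", len(CASES), "cases behave as described")
```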
Lex – Additional Code
%%
int main(void)
{
    yylex();
    [any C statements]
    return 0;
}
 - When you run lex on the Lex file (__.lex or __.l), it generates the file lex.yy.c, which defines yylex().
 - Type "man lex" on a UNIX system for more information.