Lexical and Syntax Analysis David Woolbright
Syntax Analysers or Parsers
- Usually based on a formal description of a language
- Context-free grammars, usually written in BNF, are used to describe the syntax of a language
- Syntax analysers are organized into two parts
  - Lexical analyser: small-scale language constructs (names, literals)
  - Syntax analyser: large-scale language constructs (expressions, statements, …)
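Why Separate?
- Simplicity: lexical analysis needs simpler techniques than syntax analysis, so splitting it off keeps the syntax analyser smaller
- Efficiency: the lexical analyser can be optimized on its own
- Portability: the lexical analyser reads input files and so may be platform dependent, while the syntax analyser can stay platform independent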
Lexical Analysis
- Pattern matching
- Token patterns are languages that can be described by a DFA
- A DFA is easily translated into code
- Acts as a front end for syntactic analysis
- Collects characters into lexemes (character groupings) and outputs them
- The internal codes for lexeme categories are called tokens
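To make the lexeme/token distinction concrete, here is a minimal sketch of a DFA written directly as code. The token codes (IDENT, INT_LIT, ERROR), the state names, and the function classify are hypothetical names chosen for illustration, and the two patterns (ASCII identifiers and unsigned integers) are an assumed toy language, not one defined in these slides.

```python
# Hypothetical token codes for a toy language (not from the slides).
IDENT, INT_LIT, ERROR = "IDENT", "INT_LIT", "ERROR"


def is_letter(ch: str) -> bool:
    # ASCII letters and underscore only.
    return ch == "_" or (ch.isascii() and ch.isalpha())


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def classify(lexeme: str) -> str:
    """Run a small DFA over one lexeme and return its token code."""
    state = "start"
    for ch in lexeme:
        if state == "start":
            if is_letter(ch):
                state = "in_ident"
            elif is_digit(ch):
                state = "in_int"
            else:
                return ERROR
        elif state == "in_ident":
            # Identifiers continue with letters or digits.
            if not (is_letter(ch) or is_digit(ch)):
                return ERROR
        elif state == "in_int":
            # Integer literals continue with digits only.
            if not is_digit(ch):
                return ERROR
    # Only in_ident and in_int are accepting states; "start" (empty input) is not.
    return {"in_ident": IDENT, "in_int": INT_LIT}.get(state, ERROR)
```

Under these assumptions the lexeme is the character string itself (for example "sum1") and the token is the code the DFA assigns to it (IDENT).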
Lexical Analysis
- Produces the "next" lexeme and token
- Skips comments and white space
- Inserts user-defined names into a symbol table (for later use by a compiler)
- Detects syntactic errors in tokens
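The duties listed on this slide can be sketched as a single "next token" routine. This is only an illustration under assumptions: the class name Lexer, the token names, the "#" comment syntax, and the layout of the symbol table are all hypothetical, and the patterns are matched with Python's re module rather than a hand-built DFA.

```python
import re

# Assumed toy language: "#" starts a comment that runs to the end of the line.
SKIP = re.compile(r"(?:\s+|#[^\n]*)+")
TOKEN = re.compile(
    r"(?P<IDENT>[A-Za-z_]\w*)|(?P<INT_LIT>\d+)|(?P<OP>[-+*/=()])",
    re.ASCII,
)


class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.symbols = {}  # symbol table: name -> order of first appearance

    def next_token(self):
        """Return the next (token, lexeme) pair, or ("EOF", "") at end of input."""
        # Skip any run of white space and comments.
        skipped = SKIP.match(self.text, self.pos)
        if skipped:
            self.pos = skipped.end()
        if self.pos >= len(self.text):
            return ("EOF", "")
        m = TOKEN.match(self.text, self.pos)
        if m is None:
            # Report characters that cannot begin any token.
            bad = self.text[self.pos]
            raise SyntaxError(f"illegal character {bad!r} at offset {self.pos}")
        self.pos = m.end()
        token, lexeme = m.lastgroup, m.group()
        if token == "IDENT":
            # Record user-defined names for later compiler phases.
            self.symbols.setdefault(lexeme, len(self.symbols))
        return (token, lexeme)
```

A syntax analyser would call next_token() repeatedly until it receives the EOF token; the symbol table fills in as a side effect of scanning.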
Building a Lexical Analyser
Three approaches:
1. Write a formal description of the token patterns using a descriptive language related to regular expressions, then use a software tool to generate the lexical analyser
2. Design a DFA for the token patterns and write a program that implements the DFA
3. Design a DFA for the token patterns and hand-construct a table-driven implementation of the DFA
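As a rough illustration of the table-driven approach, the sketch below stores a DFA's transitions in a dictionary and drives it with a generic loop. The states, character classes, token names, and the function scan are hypothetical choices for a toy language with only identifiers and integer literals; a real lexical analyser would have a much larger table.

```python
def char_class(ch):
    """Map a character to the column of the transition table it belongs to."""
    if ch == "_" or (ch.isascii() and ch.isalpha()):
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


# TABLE[state][character class] -> next state.
# A missing entry means "no transition", which ends the scan.
TABLE = {
    "start":    {"letter": "in_ident", "digit": "in_int"},
    "in_ident": {"letter": "in_ident", "digit": "in_ident"},
    "in_int":   {"digit": "in_int"},
}

# Accepting states and the (hypothetical) token each one produces.
ACCEPTING = {"in_ident": "IDENT", "in_int": "INT_LIT"}


def scan(text, pos=0):
    """Scan the longest token starting at pos.

    Returns (token, lexeme), or None if no token starts at pos.
    """
    state, i = "start", pos
    while i < len(text):
        nxt = TABLE[state].get(char_class(text[i]))
        if nxt is None:
            break
        state, i = nxt, i + 1
    token = ACCEPTING.get(state)
    return (token, text[pos:i]) if token else None
```

The design point is that the loop in scan never changes: adding a token pattern means adding rows and columns to TABLE and entries to ACCEPTING. This simple version relies on every non-start state being accepting; a DFA with non-accepting intermediate states would also need to remember the last accepting position.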