CSE 425: Syntax I

Syntax and Semantics

Syntax gives the structure of statements in a language
  – Allowed ordering, nesting, repetition, omission of symbols
  – Can automate the process of checking correct syntax
Semantics give meaning to the (structured) symbols
  – E.g., kinds of labels, types of variables, layout of classes
  – For example, what does 11 mean? (at least 3 good answers)
Separating syntactic and semantic evaluation helps
  – Can isolate the problem of syntactic recognition in an engine
  – Can use the structure produced by the engine directly
  – Sometimes called syntax-directed (compiler in charge)
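
As a small illustration of the "what does 11 mean?" question, the sketch below reads the same two-character lexeme under three different semantic assumptions (decimal, binary, hexadecimal). The helper name `interpret` is made up for this example, and these are only some of the possible readings.

```cpp
#include <string>

// Hypothetical helper: the syntax ("11") is fixed, but the meaning depends
// on the base we assume when we interpret it.
int interpret(const std::string& lexeme, int base) {
    return std::stoi(lexeme, nullptr, base);
}
```

Calling `interpret("11", 10)`, `interpret("11", 2)`, and `interpret("11", 16)` gives three different values for one and the same token, which is why recognizing the token and assigning it a meaning are worth keeping separate.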

Syntax and Lexical Structure

Syntax gives the structure of statements in a language
  – E.g., the format of tokens and how they can be arranged
  – Lexical structure also describes how to recognize them
Scanning obtains tokens from a stream of characters
  – E.g., whitespace delimited vs. regular-expression based
  – Tokens include keywords, constants, symbols, identifiers
  – Usually based on the assumption of taking the longest substring
Parsing recognizes more complex expressions
  – E.g., well-formed statements in logic, arithmetic, etc.
  – Free-format languages ignore indentation, etc., while fixed-format languages have specific restrictions/requirements
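
The "longest substring" assumption can be sketched as a tiny hand-written scanner. This is only an illustration: the `Kind` and `Token` types, the `scan` function, and the small set of two-character symbols are assumptions made for this example, not part of the course code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only.
enum class Kind { Identifier, Number, Symbol };

struct Token {
    Kind kind;
    std::string text;
};

// Scan `input` into tokens, always taking the longest substring that still
// forms a token (e.g. "<=" rather than "<" followed by "=").
std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace only separates tokens
        std::size_t start = i;
        if (std::isalpha(c)) {
            // Identifier: a letter followed by as many letters/digits as possible.
            while (i < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Kind::Identifier, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Number: as many digits as possible.
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Kind::Number, input.substr(start, i - start)});
        } else {
            ++i;
            // Extend a one-character symbol to a two-character one when possible.
            if ((c == '<' || c == '>' || c == '=' || c == '!') &&
                i < input.size() && input[i] == '=') ++i;
            tokens.push_back({Kind::Symbol, input.substr(start, i - start)});
        }
    }
    return tokens;
}
```

For instance, `scan("count<=10")` yields the three tokens `count`, `<=`, and `10` even though the input contains no whitespace, because each branch keeps consuming characters for as long as they extend the current token.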

Scanning vs. Parsing Roles

It is often possible to simplify a grammar’s structure by making its tokens more sophisticated
  – For example, scanning for the terminal token NUMBER vs. parsing for the non-terminal number → nonzerodigit digit*
Such simplification delegates work to a scanner
  – Often this is a good separation of concerns, especially since the scanner can appropriately specialize its logic, etc.
  – E.g., a fairly general scanner built from classification functions (which look for all digits, all alphabetic, etc.) can be re-used or refactored easily for scanning different grammars
  – E.g., the C++11 library is worth studying and using
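
One way to picture classification functions is sketched below. The function names are invented for this example; the NUMBER check simply follows the rule number → nonzerodigit digit* from the slide, so a lone "0" is deliberately rejected.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Classification functions (hypothetical names): each answers one simple
// question about a whole candidate lexeme.
bool isAllDigits(const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
}

bool isAllAlphabetic(const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return std::isalpha(c) != 0; });
}

// NUMBER token for the rule  number -> nonzerodigit digit* :
// every character is a digit and the first one is not '0'.
bool isNumberToken(const std::string& s) {
    return isAllDigits(s) && s[0] != '0';
}
```

Because the scanner decides that "425" is a NUMBER, the grammar can treat NUMBER as a terminal and needs no digit-level productions. The same classification functions could be recombined for a different grammar, for example one that allows leading zeros.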

Regular Expressions, DFAs, NDFAs

Regular expressions capture the lexical structure of symbols that can be built using 3 composition rules
  – Concatenation (ab), selection (a | b), repetition (b*)
Finite automata can recognize the strings that regular expressions describe
  – Deterministic finite automata (DFAs) associate a unique state with each sequence generated by a regular expression
  – Non-deterministic finite automata (NDFAs) let multiple states be reached by the same input sequence (adding “choice”)
Can generate a unique (minimal) DFA in 3 steps
  – Generate NDFA from the regular expression (Scott pp. 56)
  – Convert NDFA to (possibly larger) DFA (Scott pp )
  – Minimize the DFA (Scott pp. 59) to get a unique automaton
C++11 library automates all this for you
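
To make the DFA idea concrete, here is a sketch of a hand-built four-state DFA for the regular expression (a|b)*abb, which uses all three composition rules. The regular expression, the transition table, and the function names are choices made for this example. The second function assumes that the C++11 library meant on the slide is `<regex>`; it is shown only to compare results and makes no claim about how that library matches internally.

```cpp
#include <regex>
#include <string>

// Hand-built minimal DFA for (a|b)*abb.
// State k (0..3) records how much of the suffix "abb" has been seen so far.
bool dfaAccepts(const std::string& input) {
    // transition[state][symbol], where symbol 0 is 'a' and symbol 1 is 'b'.
    static const int transition[4][2] = {
        {1, 0},  // state 0: nothing useful seen yet
        {1, 2},  // state 1: seen "a"
        {1, 3},  // state 2: seen "ab"
        {1, 0}   // state 3: seen "abb" (accepting)
    };
    int state = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;  // symbol outside the alphabet
        state = transition[state][c == 'a' ? 0 : 1];
    }
    return state == 3;
}

// The same language expressed with std::regex (assumed to be the library
// the slide refers to).
bool regexAccepts(const std::string& input) {
    static const std::regex pattern("(a|b)*abb");
    return std::regex_match(input, pattern);
}
```

Each input character causes exactly one table lookup, which is the "deterministic" part: the current state and the next symbol fully determine the next state. Both functions should agree on every input, e.g. accepting "ababb" and rejecting "abba".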

Today’s Studio Exercises

We’ll code up some ideas from Scott Chapter
  – Looking at mechanisms for recognizing tokens and for parsing basic CFGs with straightforward recursion
  – Next studio we’ll look at more complicated variations
Today’s exercises are all in C++
  – We’ll write our own code, but check out the library too, since you’ll be allowed to use it for lab assignments!
  – Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
  – As always, please ask us for help as needed
When done, send your answers to the course account with “Syntax Studio I” in the subject line
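
For a feel of "parsing basic CFGs with straightforward recursion", the sketch below is a minimal recursive-descent parser in which each non-terminal becomes one function. The grammar, the `Parser` class, and the choice to compute a value while parsing are all assumptions made for this illustration; they are not the studio exercise itself.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical grammar for this sketch:
//   expr   -> term ('+' term)*
//   term   -> factor ('*' factor)*
//   factor -> NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    // Parse the whole input and return its value; throws on a syntax error.
    long parse() {
        long value = expr();
        skipSpaces();
        if (pos_ != text_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    long expr() {
        long value = term();
        while (peek() == '+') { ++pos_; value += term(); }
        return value;
    }

    long term() {
        long value = factor();
        while (peek() == '*') { ++pos_; value *= factor(); }
        return value;
    }

    long factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            long value = expr();  // recursion handles arbitrary nesting
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            long value = 0;
            while (pos_ < text_.size() &&
                   std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
                value = value * 10 + (text_[pos_] - '0');
                ++pos_;
            }
            return value;
        }
        throw std::runtime_error("expected a number or '('");
    }

    // Skip whitespace and return the next character, or '\0' at end of input.
    char peek() {
        skipSpaces();
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    std::string text_;
    std::size_t pos_;
};
```

With this grammar, `Parser("2 + 3 * 4").parse()` groups the multiplication first because `term` sits below `expr` in the grammar, while `Parser("(2 + 3) * 4").parse()` uses the recursive call in `factor` to handle the parentheses. Malformed input such as "2 +" raises `std::runtime_error`.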