Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.

Syntax: The Structure of a Language

Lexical Structure
The structure of the tokens of a programming language
The scanner takes a sequence of characters and collects them into tokens

Tokens
Reserved words (keywords)
– if, while
Literals or constants
– 3.14, “Fred”
Special symbols
– +, =
Identifiers

Principle of Longest Substring
At each point, the longest possible string is collected into a single token
Natural token separators
– Token separators such as ; + =
– White space: spaces and tabs, newlines, comments
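The principle of longest substring can be sketched as a small scanner. This is a minimal illustration, not a scanner for any real language: the token classes in `TOKEN_SPEC` and the function name `scan` are assumptions made up for the example. At each position every pattern is tried and the longest match wins, so `<=` becomes one token rather than `<` followed by `=`.

```python
import re

# Hypothetical token classes, chosen only for this illustration.
TOKEN_SPEC = [
    ("NUMBER", re.compile(r"[0-9]+(?:\.[0-9]+)?")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("OP", re.compile(r"<=|>=|==|[+\-*/=<>;]")),
]
WHITESPACE = re.compile(r"\s+")


def scan(text):
    """Return (kind, lexeme) pairs, taking the longest match at each point."""
    tokens = []
    pos = 0
    while pos < len(text):
        # White space separates tokens but is otherwise skipped.
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_kind, best_match = None, None
        for kind, pattern in TOKEN_SPEC:
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_kind, best_match = kind, m
        if best_match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best_kind, best_match.group()))
        pos = best_match.end()
    return tokens


print(scan("count<=10;"))
# [('IDENT', 'count'), ('OP', '<='), ('NUMBER', '10'), ('OP', ';')]
```

Note that `count` is collected as a single identifier even though shorter prefixes such as `c` would also match the identifier pattern.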

FORTRAN violates these rules
DO 99 I = 1.10
– Assigns 1.10 to the variable DO99I (blanks are not significant, so DO 99 I is read as DO99I)
DO 99 I = 1,10
– Sets up a loop with loop counter I going from 1 to 10
FORTRAN has no reserved words at all

C token conventions
Six classes of tokens
– Identifiers
– Keywords
– Constants
– String literals
– Operators
– Other separators
White space characters are ignored except as they separate tokens
Adheres to the principle of longest substring

Regular Expressions
Regular expressions were invented by Stephen Kleene and appeared in a RAND Corporation report in about 1950
Regular expressions represent a form of language definition
Each regular expression E denotes a language L(E) defined over the alphabet of the language

Rules defining REs
Empty
– ε (the empty string) is an RE
Atom
– Any symbol from the alphabet is an RE
Alternation
– If a and b are REs then so is a|b
– All strings identified by a and all those identified by b
Concatenation
– If a and b are REs then so is ab
– All strings formed by concatenating a string identified by b to the end of one identified by a

More rules for REs
Kleene Closure
– If a is an RE then so is a*
– All strings formed by concatenating zero or more strings identified by a
Positive Closure
– If a is an RE then so is a+
– All strings formed by concatenating one or more strings identified by a

Examples of REs
(a|b)c
– Recognizes ac and bc but no others
(a|b)*c
– Recognizes c, ac, bc, aac, abc, abac, and so on
(a|b)+c
– Does not recognize c but recognizes all the others above
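These examples can be checked mechanically. The sketch below uses Python's `re` module, whose syntax for `|`, `*` and `+` agrees with the rules above; the list `candidates` and the helper `accepted` are names invented for the illustration. `re.fullmatch` is used so that the whole string must match, not just a prefix.

```python
import re

# Strings from the slide, plus "ab", which none of the three patterns accepts.
candidates = ["c", "ac", "bc", "aac", "abc", "abac", "ab"]


def accepted(pattern):
    """Return the candidate strings that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(accepted(r"(a|b)c"))   # ['ac', 'bc']
print(accepted(r"(a|b)*c"))  # ['c', 'ac', 'bc', 'aac', 'abc', 'abac']
print(accepted(r"(a|b)+c"))  # ['ac', 'bc', 'aac', 'abc', 'abac']
```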

Extensions
[ ] – any one of a set of characters
– [A-Z] – any capital letter
– [0123456789] – any digit
? – an optional item (0 or 1 of these)
– [A-Z][0-9]? – a single capital letter, or a single capital letter followed by a single digit
. (period) – any character

More Examples
[0-9]+
– Simple integer constants
[0-9]+(\.[0-9]+)?
– Simple floating-point constants
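A short sketch of how such patterns behave, again using Python's `re` module. The floating-point pattern here allows one or more digits after the decimal point; the helper names `is_integer` and `is_float` are assumptions for the example.

```python
import re

INTEGER = re.compile(r"[0-9]+")
# One or more digits, optionally followed by a point and one or more digits.
FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_integer(text):
    """True if the whole string is a simple integer constant."""
    return INTEGER.fullmatch(text) is not None


def is_float(text):
    """True if the whole string is a simple floating-point constant."""
    return FLOAT.fullmatch(text) is not None


print(is_integer("42"), is_integer("4.2"))   # True False
print(is_float("3.14"), is_float("7"))       # True True
print(is_float("3."), is_float(".5"))        # False False
```

Because the fractional part is optional, every simple integer constant also matches the floating-point pattern; a scanner would need an extra rule to decide which token class such a string belongs to.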

Context-Free Grammars (CFGs)
Context-free grammars were developed by Noam Chomsky as a way to specify language
Rules are generally specified in Backus-Naur Form (BNF) or in Extended BNF (EBNF)

What makes up a CFG?
A set N of non-terminal symbols
A set T of terminal symbols
A set P of production rules
A special non-terminal symbol S called the start symbol (or goal symbol)

Sample CFG
sentence → noun-phrase verb-phrase .
noun-phrase → article noun
article → a | the
noun → girl | dog
verb-phrase → verb noun-phrase
verb → sees | pets

Parts of the grammar
Non-terminal symbols: {sentence, noun-phrase, article, noun, verb-phrase, verb}
Terminal symbols: {., a, the, girl, dog, sees, pets}
Production rules: the six rules of the sample CFG
Start symbol: sentence

Notes on CFG
Non-terminal symbols are those that appear on the left-hand side (lhs) of the production rules
Terminal symbols are those that appear only on the right-hand side (rhs) of the production rules
→ and | are meta-symbols

(Left-Most) Derivation
sentence ⇒ noun-phrase verb-phrase .
⇒ article noun verb-phrase .
⇒ the noun verb-phrase .
⇒ the girl verb-phrase .
⇒ the girl verb noun-phrase .
⇒ the girl sees noun-phrase .
⇒ the girl sees article noun .
⇒ the girl sees a noun .
⇒ the girl sees a dog .
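The derivation can be replayed in code. In this sketch the grammar is stored as a dictionary from each non-terminal to its list of alternatives, and each step replaces the leftmost non-terminal with one chosen alternative. The names `GRAMMAR`, `leftmost_step`, `derive` and `CHOICES` are assumptions made for the illustration.

```python
# Sample grammar: non-terminal -> list of alternatives (each a list of symbols).
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_step(form, choice):
    """Expand the leftmost non-terminal in form with alternative number choice."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return form[:i] + GRAMMAR[symbol][choice] + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


def derive(choices, start="sentence"):
    """Return every sentential form produced by the given sequence of choices."""
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return forms


# Index of the alternative used at each of the nine steps for "the girl sees a dog ."
CHOICES = [0, 0, 1, 0, 0, 0, 0, 0, 1]

for form in derive(CHOICES):
    print(" ".join(form))
```

The last line printed is `the girl sees a dog .`, and the intermediate lines are the sentential forms of the derivation in order.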

Corresponding Parse Tree
sentence
    noun-phrase
        article: the
        noun: girl
    verb-phrase
        verb: sees
        noun-phrase
            article: a
            noun: dog
    .

Ambiguous Grammars
A grammar is ambiguous if some sentence has two distinct leftmost derivations or, equivalently, two distinct parse trees

Grammar for expressions
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

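Parse trees for 3 + 5 * 7
Tree 1: + at the root, grouping the input as 3 + (5 * 7)
expr
    expr (number, digit, 3)
    +
    expr
        expr (number, digit, 5)
        *
        expr (number, digit, 7)
Tree 2: * at the root, grouping the input as (3 + 5) * 7
expr
    expr
        expr (number, digit, 3)
        +
        expr (number, digit, 5)
    *
    expr (number, digit, 7)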
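The ambiguity can be made concrete by counting parse trees. The sketch below assumes the ambiguous grammar expr → expr + expr | expr * expr | number, leaves out parentheses for brevity, and treats each number as a single token (the number rules themselves are not ambiguous). The function name `count_parse_trees` is made up for the example. Any operator in the input can serve as the root of a tree, so the count for a span is the sum over operators of the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for a token list under the ambiguous expr grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from expr.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for i in range(lo + 1, hi - 1):
            if tokens[i] in ("+", "*"):
                # tokens[i] is the root operator: expr op expr.
                total += count(lo, i) * count(i + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("3 + 5 * 7".split()))      # 2
print(count_parse_trees("1 + 2 + 3 + 4".split()))  # 5
```

A count greater than 1 for any input is enough to show that the grammar is ambiguous.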

Handling Ambiguity
The grammar rules for expressions can be modified to eliminate the ambiguity that precedence should take care of
Introduce a new non-terminal that forces the higher-precedence operator lower in the parse tree

Precedence handled
expr → expr + expr | term
term → term * term | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Associativity
This grammar is still ambiguous
There are two parse trees for 5 + 7 + 9
This may be OK for addition and multiplication, but not for subtraction and division, which are left-associative

Revised Grammar (not ambiguous)
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

EBNFs
Extended BNF adds more metasymbols
{ } – a repeated item (0 or more times)
[ ] – an optional item (0 or 1 time)

Expression Grammar in EBNF
expr → term { + term }
term → factor { * factor }
factor → ( expr ) | number
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

EBNF for if-statement
if-statement → if ( expression ) statement [ else statement ]

Syntax Diagrams
Syntax diagrams are an alternative to EBNF
Study the diagrams on pp. 99-101 and observe the direct relationship of each to the EBNF grammar rules for expressions

Parsers
The simplest parser is a recognizer
– Accepts or rejects strings based on whether they are legal strings in the language
More general parsers
– Build parse trees (or abstract syntax trees)
– May calculate values of expressions, etc.

Bottom-up Parsers
Attempt to match the input with the RHSs of the grammar rules
When a match occurs, the RHS is replaced by the non-terminal on the LHS of the rule (called a reduce)
Sometimes called shift-reduce parsing
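The shift and reduce steps can be sketched for the sample sentence grammar given earlier. This is a deliberately naive recognizer: it reduces whenever the top of the stack matches any RHS, which happens to work for this small grammar but not in general. Practical bottom-up parsers use lookahead and parsing tables to decide when to reduce. The names `RULES`, `try_reduce` and `recognize` are assumptions for the illustration.

```python
# (LHS, RHS) pairs for the sample sentence grammar.
RULES = [
    ("sentence", ["noun-phrase", "verb-phrase", "."]),
    ("noun-phrase", ["article", "noun"]),
    ("verb-phrase", ["verb", "noun-phrase"]),
    ("article", ["a"]),
    ("article", ["the"]),
    ("noun", ["girl"]),
    ("noun", ["dog"]),
    ("verb", ["sees"]),
    ("verb", ["pets"]),
]


def try_reduce(stack):
    """Replace a matching RHS on top of the stack by its LHS; report success."""
    for lhs, rhs in RULES:
        if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
            del stack[-len(rhs):]
            stack.append(lhs)
            return True
    return False


def recognize(tokens):
    """Accept the token list if it reduces to the start symbol."""
    stack = []
    for token in tokens:
        stack.append(token)        # shift
        while try_reduce(stack):   # reduce while some RHS matches
            pass
    return stack == ["sentence"]


print(recognize("the girl sees a dog .".split()))  # True
print(recognize("the girl sees .".split()))        # False
```

Since it only answers yes or no, this is a recognizer in the sense described under Parsers; building a parse tree would require recording each reduction.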

Top-down Parsers
Non-terminals are expanded to match incoming tokens and the parser directly constructs a derivation

Recursive-Descent Parsing
A program made up of a collection of mutually recursive procedures, one for each non-terminal
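A minimal sketch of this idea for the EBNF expression grammar (expr, term, factor), written in Python. Each non-terminal becomes one method, and each EBNF repetition { ... } becomes a while loop. Several things here are assumptions for the example rather than part of the slides: the class and function names, the decision to compute a value instead of building a tree, and the use of a regular expression to collect whole numbers as tokens instead of parsing them digit by digit.

```python
import re


def tokenize(text):
    # Simplification: characters outside digits, +, *, ( and ) are silently skipped.
    return re.findall(r"[0-9]+|[+*()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # expr -> term { + term }
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        # term -> factor { * factor }
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        # factor -> ( expr ) | number
        token = self.peek()
        if token == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse text as an expr and return its value; reject trailing tokens."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("3 + 5 * 7"))    # 38
print(evaluate("(3 + 5) * 7"))  # 56
```

Because `term` is called from inside `expr`, multiplication binds more tightly than addition, which mirrors how the grammar places the higher-precedence operator lower in the parse tree.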
