Published by Lillian Hines. Modified over 9 years ago.
1
Syntax: The Structure of a Language
2
Lexical Structure
The structure of the tokens of a programming language
The scanner takes a sequence of characters and collects them into tokens
3
Tokens
Reserved words (keywords)
  – if, while
Literals or constants
  – 3.14, “Fred”
Special symbols
  – +, =
Identifiers
4
Principle of Longest Substring
At each point, the longest possible string is collected into a single token
Natural token separators
  – Token separators: ; + =
  – White space
      Spaces and tabs
      Newlines
      Comments
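The principle of longest substring can be sketched as a small scanner. This is only an illustration: the token classes in `TOKEN_SPEC`, the keyword set, and the function name `scan` are assumptions made for the example, not part of the slides.

```python
import re

# Hypothetical token classes, chosen only for this illustration.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(\.[0-9]+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"<=|>=|==|[+\-*/=<>;()]"),
    ("SPACE", r"[ \t\n]+"),
]
KEYWORDS = {"if", "while"}


def scan(text):
    """Collect characters into tokens, always keeping the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match and len(match.group()) > best_len:
                best_kind, best_len = kind, len(match.group())
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        lexeme = text[pos:pos + best_len]
        pos += best_len
        if best_kind == "SPACE":
            continue  # white space only separates tokens
        if best_kind == "IDENT" and lexeme in KEYWORDS:
            best_kind = "KEYWORD"  # reserved words look like identifiers
        tokens.append((best_kind, lexeme))
    return tokens
```

With this sketch, `scan("ifx <= 10")` yields one identifier `ifx` rather than the keyword `if` followed by `x`, and `<=` is collected as one symbol rather than `<` followed by `=`.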
5
FORTRAN violates these rules
White space is ignored, even inside a token
DO 99 I = 1.10
  – Assigns 1.10 to the variable DO99I
DO 99 I = 1,10
  – Sets up a loop with loop counter I going from 1 to 10
FORTRAN has no reserved words at all
6
C token conventions
Six classes of tokens
  – Identifiers
  – Keywords
  – Constants
  – String literals
  – Operators
  – Other separators
White space characters are ignored except as they separate tokens
Adheres to the principle of longest substring
7
Regular Expressions
Regular expressions were invented by Stephen Kleene and appeared in a RAND Corporation report in about 1950
Regular expressions represent a form of language definition
Each regular expression E denotes a language L(E) defined over the alphabet of the language
8
Rules defining REs
Empty
  – ε (the empty string) is an RE
Atom
  – Any symbol from the alphabet is an RE
Alternation
  – If a and b are REs then so is a|b
  – Denotes all strings identified by a and all those identified by b
Concatenation
  – If a and b are REs then so is ab
  – Denotes all strings formed by concatenating a string identified by b to the end of one identified by a
9
More rules for REs
Kleene Closure
  – If a is an RE then so is a*
  – Denotes all strings formed by concatenating zero or more strings identified by a
Positive Closure
  – If a is an RE then so is a+
  – Denotes all strings formed by concatenating one or more strings identified by a
10
Examples of REs
(a|b)c
  – Recognizes ac and bc but no others
(a|b)*c
  – Recognizes c, ac, bc, aac, abc, abac, and so on
(a|b)+c
  – Does not recognize c, but recognizes all the others above
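These three expressions can be checked with Python's `re` module, whose syntax for `|`, `*` and `+` agrees with the rules above. The helper name `recognized` and the candidate list are assumptions made for this sketch; `re.fullmatch` is used so that the whole string must be matched.

```python
import re

# Candidate strings; "ca" is included as one that none of the patterns accept.
strings = ["c", "ac", "bc", "aac", "abc", "abac", "ca"]


def recognized(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


one = recognized(r"(a|b)c", strings)    # exactly one a or b, then c
star = recognized(r"(a|b)*c", strings)  # zero or more a's and b's, then c
plus = recognized(r"(a|b)+c", strings)  # one or more a's and b's, then c
```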
11
Extensions
[ ] – any one of a set of characters
  – [A-Z] – any capital letter
  – [0123456789] – any digit
? – an optional item (0 or 1 of these)
  – [A-Z][0-9]? – a single capital letter, or a single capital letter followed by a single digit
. (period) – any character
12
More Examples
[0-9]+
  – Simple integer constants
[0-9]+(\.[0-9]+)?
  – Simple floating-point constants (digits, optionally followed by a period and more digits)
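A short sketch of these patterns in Python's `re` module follows. The constant and function names are assumptions for illustration, and the fractional part is written as `[0-9]+` (one or more digits after the period), which is also an assumption about what a "simple" floating-point constant should allow.

```python
import re

INTEGER = re.compile(r"[0-9]+")
FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?")          # assumed: one or more fraction digits
LETTER_OPTIONAL_DIGIT = re.compile(r"[A-Z][0-9]?")


def is_integer(s):
    """True if s is a simple integer constant."""
    return INTEGER.fullmatch(s) is not None


def is_float(s):
    """True if s is digits, optionally followed by a period and more digits."""
    return FLOAT.fullmatch(s) is not None


def is_letter_optional_digit(s):
    """True if s is a capital letter optionally followed by one digit."""
    return LETTER_OPTIONAL_DIGIT.fullmatch(s) is not None
```

Note that under this pattern a plain integer such as `42` is also accepted by `is_float`, because the fractional part is optional.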
13
Context-Free Grammars (CFGs)
Context-free grammars were developed by Noam Chomsky as a way to specify language
Rules are generally specified in Backus-Naur Form (BNF) or in Extended BNF (EBNF)
14
What makes up a CFG?
A set N of non-terminal symbols
A set T of terminal symbols
A set P of production rules
A special non-terminal symbol S called the start symbol (or goal symbol)
15
Sample CFG
sentence → noun-phrase verb-phrase .
noun-phrase → article noun
article → a | the
noun → girl | dog
verb-phrase → verb noun-phrase
verb → sees | pets
16
Parts of the grammar
Non-terminal symbols
  – {sentence, noun-phrase, article, noun, verb-phrase, verb}
Terminal symbols
  – {., a, the, girl, dog, sees, pets}
Production rules
  – The previous slide provides these
Start symbol
  – sentence
17
Notes on CFG
Non-terminal symbols are those that appear on the left-hand side (lhs) of the production rules
Terminal symbols are those that appear only on the right-hand side (rhs) of the production rules
→ and | are meta-symbols
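The four parts of the sample grammar can be written down as plain data. In this sketch the dictionary layout and the names `PRODUCTIONS`, `START`, `NON_TERMINALS` and `TERMINALS` are assumptions for illustration. The two symbol sets are computed exactly as the notes describe: non-terminals are the left-hand sides, and terminals are the symbols that appear only on right-hand sides.

```python
# Production rules: each non-terminal maps to its list of alternatives,
# and each alternative is a sequence of symbols.
PRODUCTIONS = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}
START = "sentence"

# Non-terminals appear on a left-hand side.
NON_TERMINALS = set(PRODUCTIONS)

# Terminals appear only on right-hand sides.
TERMINALS = {
    symbol
    for alternatives in PRODUCTIONS.values()
    for rhs in alternatives
    for symbol in rhs
    if symbol not in NON_TERMINALS
}
```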
18
(Left-Most) Derivation
sentence
  ⇒ noun-phrase verb-phrase .
  ⇒ article noun verb-phrase .
  ⇒ the noun verb-phrase .
  ⇒ the girl verb-phrase .
  ⇒ the girl verb noun-phrase .
  ⇒ the girl sees noun-phrase .
  ⇒ the girl sees article noun .
  ⇒ the girl sees a noun .
  ⇒ the girl sees a dog .
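The derivation can be reproduced mechanically by always expanding the left-most non-terminal. The sketch below is tailored to the sample grammar, where every choice between alternatives is a choice between single words, so the next word of the target sentence decides which alternative to use. The function name `leftmost_derivation` and the data layout are assumptions for illustration, not a general parsing method.

```python
PRODUCTIONS = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_derivation(start, words):
    """Derive `words` from `start`, expanding the left-most non-terminal each step."""
    form = [start]
    steps = [" ".join(form)]
    while True:
        # Position of the left-most non-terminal in the current sentential form.
        index = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if index is None:
            break  # only terminals remain
        alternatives = PRODUCTIONS[form[index]]
        if len(alternatives) == 1:
            rhs = alternatives[0]
        else:
            # Everything left of `index` is already terminal, so the form and
            # the target sentence line up word for word at this position.
            word = words[index] if index < len(words) else None
            rhs = next((alt for alt in alternatives if alt == [word]), None)
            if rhs is None:
                raise ValueError(f"cannot derive {word!r} from {form[index]}")
        form[index:index + 1] = rhs
        steps.append(" ".join(form))
    if form != list(words):
        raise ValueError("sentence is not in the language")
    return steps
```

Calling `leftmost_derivation("sentence", "the girl sees a dog .".split())` returns the ten sentential forms of the derivation, from `sentence` down to `the girl sees a dog .`.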
19
Corresponding Parse Tree
sentence
  noun-phrase
    article
      the
    noun
      girl
  verb-phrase
    verb
      sees
    noun-phrase
      article
        a
      noun
        dog
  .
20
Ambiguous Grammars
A grammar is ambiguous if some sentence has two distinct left-most derivations or, equivalently, two distinct parse trees
21
Grammar for expressions
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
22
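Parse trees for 3 + 5 * 7
[Figure: two parse trees for the same expression. In one, + is at the root, with 3 as the left subtree and 5 * 7 as the right subtree. In the other, * is at the root, with 3 + 5 as the left subtree and 7 as the right subtree. Each of 3, 5 and 7 is reached through number and digit.]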
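The ambiguity can be made concrete by enumerating every parse tree the expression grammar allows. The sketch below is an illustration under simplifying assumptions: it handles only the rules expr → expr + expr | expr * expr | number (parentheses are left out), represents a tree as a nested tuple, and uses the hypothetical names `parse_trees` and `evaluate`.

```python
def parse_trees(tokens):
    """All parse trees of `tokens` under expr -> expr + expr | expr * expr | number."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # Try every operator as the root and combine all left and right subtrees.
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees


def evaluate(tree):
    """Compute the value that a given parse tree implies."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = parse_trees(["3", "+", "5", "*", "7"])
values = [evaluate(t) for t in trees]
```

The two trees give different values, 38 and 56, which is why the ambiguity matters and not just the shape of the tree.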
23
Handling Ambiguity
The grammar rules for expressions can be modified to remove the ambiguity by building operator precedence into the grammar
Introduce a new non-terminal that forces the higher-precedence operator lower in the parse tree
24
Precedence handled
expr → expr + expr | term
term → term * term | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
25
Associativity
This grammar is still ambiguous
There are two parse trees for 5 + 7 + 9
This may be ok for addition & multiplication, but not for subtraction & division, which are left-associative
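A small worked example shows why the grouping matters for subtraction but not for addition. The two function names are assumptions for illustration: one groups operands from the left, as a left-recursive rule does, and the other groups them from the right.

```python
from functools import reduce


def left_assoc_subtract(operands):
    """((a - b) - c) ..., the grouping a left-recursive rule produces."""
    return reduce(lambda acc, x: acc - x, operands)


def right_assoc_subtract(operands):
    """a - (b - (c ...)), the grouping a right-recursive rule would produce."""
    result = operands[-1]
    for x in reversed(operands[:-1]):
        result = x - result
    return result


left = left_assoc_subtract([5, 7, 9])    # (5 - 7) - 9
right = right_assoc_subtract([5, 7, 9])  # 5 - (7 - 9)
```

For 5 + 7 + 9 both groupings give 21, but for 5 - 7 - 9 the left grouping gives -11 and the right grouping gives 7, so the grammar has to force the left-associative tree.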
26
Revised Grammar (not ambiguous)
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
27
EBNFs
Extended BNF adds more metasymbols
{ } – a repeated item (0 or more times)
[ ] – an optional item (0 or 1 time)
28
Expression Grammar in EBNF
expr → term { + term }
term → factor { * factor }
factor → ( expr ) | number
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
29
EBNF for if-statement
if-statement → if ( expression ) statement [ else statement ]
30
Syntax Diagrams
Syntax diagrams are an alternative to EBNF
Study the diagrams on pp. 99-101 and observe the direct relationship of each to the EBNF grammar rules for expressions
31
Parsers
The simplest parser is a recognizer
  – Accepts or rejects strings based on whether they are legal strings in the language
More general parsers
  – Build parse trees (or abstract syntax trees)
  – May calculate values of expressions, etc.
32
Bottom-up Parsers
Attempt to match the input with the RHSs of the grammar rules
When a match occurs, the RHS is replaced by the non-terminal on the LHS of the rule (called a reduce)
Sometimes called shift-reduce parsing
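The shift and reduce steps can be sketched for the small sentence grammar (sentence, noun-phrase, verb-phrase and so on). This is a deliberately naive recognizer: it reduces whenever the top of the stack matches some right-hand side and shifts otherwise. That greedy strategy happens to work for this grammar, but it is an assumption of the sketch and not how a general shift-reduce parser resolves its choices. The names `RULES` and `shift_reduce` are hypothetical.

```python
# (LHS, RHS) pairs for the sample sentence grammar.
RULES = [
    ("article", ["a"]), ("article", ["the"]),
    ("noun", ["girl"]), ("noun", ["dog"]),
    ("verb", ["sees"]), ("verb", ["pets"]),
    ("noun-phrase", ["article", "noun"]),
    ("verb-phrase", ["verb", "noun-phrase"]),
    ("sentence", ["noun-phrase", "verb-phrase", "."]),
]


def shift_reduce(words):
    """Return (accepted, actions) for a greedy shift-reduce pass over `words`."""
    stack, actions = [], []
    remaining = list(words)
    while True:
        for lhs, rhs in RULES:
            # Reduce: the top of the stack matches a right-hand side.
            if stack[-len(rhs):] == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs}")
                break
        else:
            if not remaining:
                break  # nothing to reduce and nothing left to shift
            # Shift: move the next input word onto the stack.
            stack.append(remaining.pop(0))
            actions.append("shift")
    return stack == ["sentence"], actions
```

For `"the girl sees a dog ."` the pass performs six shifts and nine reduces and ends with only `sentence` on the stack, so the string is accepted.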
33
Top-down Parsers
Non-terminals are expanded to match incoming tokens, and the parser directly constructs a derivation
34
Recursive-Descent Parsing
A program made up of a collection of mutually recursive procedures, one for each non-terminal
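A minimal recursive-descent sketch for the EBNF expression grammar (expr → term { + term }, term → factor { * factor }, factor → ( expr ) | number) follows. Each non-terminal becomes one method, and each EBNF repetition `{ }` becomes a `while` loop. The class name `Parser`, the helper `evaluate`, and the choices to compute values directly and to strip spaces up front are assumptions made for this illustration.

```python
class Parser:
    def __init__(self, text):
        # Simplification: drop spaces instead of running a separate scanner.
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        """Next character, or the empty string at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self):
        # expr -> term { + term }
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        # term -> factor { * factor }
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        # factor -> ( expr ) | number
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at position {self.pos}")
            self.pos += 1
            return value
        return self.number()

    def number(self):
        # number -> digit { digit }
        start = self.pos
        while "0" <= self.peek() <= "9":
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a digit at position {self.pos}")
        return int(self.text[start:self.pos])


def evaluate(text):
    """Parse the whole of `text` as an expr and return its value."""
    parser = Parser(text)
    value = parser.expr()
    if parser.pos != len(parser.text):
        raise SyntaxError(f"unexpected input at position {parser.pos}")
    return value
```

Because `term` is called from inside `expr`, multiplication ends up lower in the implied parse tree and binds tighter, so `evaluate("3 + 5 * 7")` gives 38 while `evaluate("(3 + 5) * 7")` gives 56.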