

1 SYNTAX

2 Outline
- Programming Language Specification
- Lexical Structure of PLs
- Syntactic Structure of PLs
- Context-Free Grammar / BNF
- Parse Trees
- Abstract Syntax Trees
- Ambiguous Grammar
- Associativity and Precedence
- EBNFs and Syntax Diagrams
Nandigam 2

3 Programming Language Specification
- PLs require precise definitions (i.e., no ambiguity)
  - Language form (syntax)
  - Language meaning (semantics)
- Consequently, PLs are specified using formal notation:
  - Formal syntax
    - Tokens
    - Grammar
  - Formal semantics
    - Operational
    - Denotational
    - Axiomatic

4 Lexical Structure of PLs

5 Lexical Structure of PLs (cont.)
- Main task of the scanner: identify tokens
  - Tokens are the basic building blocks of programs
  - E.g., keywords, identifiers, numbers, punctuation marks
- Lexeme: an instance of a token
  - One can think of programs as strings of lexemes rather than of characters
- A token of a language is a category of its lexemes (or instances)
- Some tokens have many possible lexemes
  - E.g., keyword, identifier, number
- In some cases, a token has only a single possible lexeme
  - E.g., equal_sign, plus_op, mult_op

6 Lexical Structure of PLs (cont.)
Consider the following Java statement:
  index = 2 * count + 17 ;
The lexemes and tokens of this statement are:
  Lexeme    Token
  index     identifier
  =         equal_sign
  2         int_literal
  *         mult_op
  count     identifier
  +         plus_op
  17        int_literal
  ;         semicolon
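The lexeme/token pairing for this statement can be reproduced with a small hand-written scanner. What follows is only a sketch: the class name TinyScanner, the regex group names, and the output format are assumptions made for illustration, and only the six token categories from the statement are covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner sketch; class, group, and method names are assumptions.
class TinyScanner {
    // Java group names may contain only letters and digits, so each group
    // name is paired with the token name used on the slide.
    private static final String[] GROUPS = {"id", "num", "eq", "mul", "plus", "semi"};
    private static final String[] TOKENS = {
        "identifier", "int_literal", "equal_sign", "mult_op", "plus_op", "semicolon"
    };

    // One alternative per token category, plus whitespace (which is skipped).
    private static final Pattern LEXEME = Pattern.compile(
        "(?<id>[a-zA-Z][a-zA-Z0-9_]*)"
        + "|(?<num>[0-9]+)"
        + "|(?<eq>=)"
        + "|(?<mul>\\*)"
        + "|(?<plus>\\+)"
        + "|(?<semi>;)"
        + "|(?<ws>\\s+)");

    // Returns one "lexeme : token" string per lexeme found in the source.
    static List<String> scan(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = LEXEME.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            // Restrict matching to the unread input; lookingAt() anchors at its start.
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
            for (int i = 0; i < GROUPS.length; i++) {
                if (m.group(GROUPS[i]) != null) {
                    result.add(m.group() + " : " + TOKENS[i]);
                }
            }
            pos = m.end();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : scan("index = 2 * count + 17 ;")) {
            System.out.println(line);
        }
    }
}
```

Each pass through the loop consumes exactly one lexeme and reports the category it belongs to, which is the lexeme/token distinction in miniature.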

7 Lexical Structure of PLs (cont.)
- Tokens in a programming language are described formally by regular expressions.
- Regular expressions: descriptions of patterns of characters
- Regular expression operations
  Basic operations
    Concatenation              item sequencing
    Choice or selection        |
    Repetition                 *
    Grouping                   ( )
  Additional operations
    One or more repetitions    +
    Range of characters        [ - ]
    Optional                   ?
    Any character              .

8 Lexical Structure of PLs (cont.)
Regular expression examples
- (a|b)*c
  Strings that match include ababaac, aac, bbc, c, and babc
- [0-9]+
  Integer constants with one or more digits
- [0-9]+(\.[0-9]+)?
  Floating-point literals (the fractional part is optional, so plain integers also match)
- [a-zA-Z][a-zA-Z0-9_]*
  Identifiers
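These four patterns can be tried directly with java.util.regex. The sketch below assumes a hypothetical helper class RegexExamples; Pattern.matches requires the whole string to match, which is the behavior wanted when testing a single lexeme.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and constant names are assumptions.
class RegexExamples {
    static final String ABC = "(a|b)*c";
    static final String INTEGER = "[0-9]+";
    static final String FLOAT = "[0-9]+(\\.[0-9]+)?";
    static final String IDENTIFIER = "[a-zA-Z][a-zA-Z0-9_]*";

    // True only if the entire text matches the regular expression.
    static boolean matches(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(matches(ABC, "ababaac"));
        System.out.println(matches(INTEGER, "17"));
        System.out.println(matches(FLOAT, "3.14"));
        System.out.println(matches(IDENTIFIER, "count_1"));
    }
}
```

Note that the backslash in \. has to be doubled inside a Java string literal.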

9 Lexical Structure of PLs (cont.)
- Scanner generators:
  - lex, flex
  - ANTLR (Another Tool for Language Recognition)
- These programs can be used to generate a program (i.e., a scanner) that can extract tokens from a stream of characters.
- Many PLs provide good support for regular expressions: Java, C#, Perl, Ruby, …
- Support for regular expressions in Java
  - java.util.regex package
  - split() method of the String class
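As a minimal illustration of split(), the sketch below breaks a statement into lexemes. The class name SplitDemo is an assumption, and the approach only works because every lexeme in the input is separated by whitespace; a real scanner cannot rely on that.

```java
import java.util.Arrays;

// Hypothetical helper; the class and method names are assumptions.
class SplitDemo {
    // Splits on runs of whitespace; the argument to split() is a regular expression.
    static String[] lexemes(String statement) {
        return statement.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lexemes("index = 2 * count + 17 ;")));
    }
}
```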

10 Syntactic Structure of PLs
- Specifying the form of a programming language
  - Tokens: described by regular expressions
  - Syntax (the organization of tokens): described by context-free grammars (CFGs)

11 Context-Free Grammar
- Context-free grammars (CFGs) are used to describe the syntax of PLs.
  - Proposed by Noam Chomsky, a noted linguist
- BNF (Backus-Naur Form) is a notation for describing syntax.
  - Proposed by John Backus and Peter Naur
- CFG and BNF are nearly identical and are used interchangeably.
- BNF is a metalanguage for programming languages.
  - A metalanguage is a language that is used to describe another language.

12 Context-Free Grammar (cont.)
- A CFG or BNF description consists of a series of rules or productions.
- Productions are made up of:
  - Nonterminals: structures that are broken down into further structures
  - Terminals: things that cannot be broken down
  - Metasymbols
    - Symbols that are part of CFG/BNF
    - These are not actual symbols in the language being described
    - Sometimes, a metasymbol is also an actual symbol in a language
- One of the nonterminals is designated as the start symbol.
  - The start symbol stands for the entire structure being defined.

13 Context-Free Grammar (cont.)
CFG/BNF Example (Figure 4.2, page 83)
  (1) sentence → noun-phrase verb-phrase .
  (2) noun-phrase → article noun
  (3) article → a | the
  (4) noun → girl | dog
  (5) verb-phrase → verb noun-phrase
  (6) verb → sees | pets

14 Context-Free Grammar (cont.)
The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:
  sentence ⇒ noun-phrase verb-phrase .        (rule 1)
           ⇒ article noun verb-phrase .       (rule 2)
           ⇒ the noun verb-phrase .           (rule 3)
           ⇒ the girl verb-phrase .           (rule 4)
           ⇒ the girl verb noun-phrase .      (rule 5)
           ⇒ the girl sees noun-phrase .      (rule 6)
           ⇒ the girl sees article noun .     (rule 2)
           ⇒ the girl sees a noun .           (rule 3)
           ⇒ the girl sees a dog .            (rule 4)
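The replacement process in this derivation can be mimicked mechanically. In the sketch below a sentential form is a list of symbols and each step replaces the leftmost occurrence of the named nonterminal; the class DerivationDemo and its methods are hypothetical names, and the sequence of rule choices is copied from the derivation above rather than discovered by a parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; class and method names are assumptions.
class DerivationDemo {
    // Replaces the leftmost occurrence of the nonterminal with the rule's right-hand side.
    static List<String> apply(List<String> form, String nonterminal, String... rhs) {
        int i = form.indexOf(nonterminal);
        if (i < 0) {
            throw new IllegalArgumentException(nonterminal + " does not occur in " + form);
        }
        List<String> next = new ArrayList<>(form.subList(0, i));
        next.addAll(Arrays.asList(rhs));
        next.addAll(form.subList(i + 1, form.size()));
        return next;
    }

    // Applies rules 1, 2, 3, 4, 5, 6, 2, 3, 4 in the order used on the slide.
    static List<String> deriveExample() {
        List<String> f = List.of("sentence");
        f = apply(f, "sentence", "noun-phrase", "verb-phrase", ".");
        f = apply(f, "noun-phrase", "article", "noun");
        f = apply(f, "article", "the");
        f = apply(f, "noun", "girl");
        f = apply(f, "verb-phrase", "verb", "noun-phrase");
        f = apply(f, "verb", "sees");
        f = apply(f, "noun-phrase", "article", "noun");
        f = apply(f, "article", "a");
        f = apply(f, "noun", "dog");
        return f;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", deriveExample()));
    }
}
```

Because every step here rewrites the leftmost nonterminal, this particular sequence is a leftmost derivation.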

15 Context-Free Grammar (cont.)
- Derivation: generating sentences of the language through a sequence of applications of rules (or productions), beginning with a special nonterminal called the start symbol.
- Leftmost derivation: the replaced nonterminal is always the leftmost nonterminal.
- Rightmost derivation: the replaced nonterminal is always the rightmost nonterminal.
- A derivation may be neither leftmost nor rightmost.
- Derivation order has no effect on the language generated by a grammar.

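16 Context-Free Grammar (cont.)
A grammar for a small language:
  <program> → begin <stmt_list> end
  <stmt_list> → <stmt> | <stmt> ; <stmt_list>
  <stmt> → <var> := <expression>
  <expression> → <var> + <var> | <var> - <var> | <var>
  <var> → A | B | C
Derive the following program:
  begin A := B + C ; B := C end
Is the language defined by this grammar finite or infinite?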

17 Context-Free Grammar (cont.)
- Left recursive rule: a BNF rule is left recursive if the left-hand side (LHS) appears at the beginning of its right-hand side (RHS).
- Right recursive rule: a BNF rule is right recursive if the LHS appears at the right end of the RHS.
- Examples:
  number → number digit | digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  expr → expr + expr | expr * expr | ( expr ) | number
- Uses of recursion in BNF:
  - to show repetition
  - to describe complex structures

18 Parse Trees
- A parse tree is a graphical representation of the hierarchical syntactic structure of sentences.
- It describes graphically the replacement process in a derivation.
- A parse tree is labeled by nonterminals at interior nodes and terminals at leaves.
- A parse tree better expresses the structure inherent in a derivation.

19 Parse Trees (cont.)
Problem 1:
  <assign> → <id> := <expr>
  <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
  <id> → A | B | C
Show a leftmost derivation and a parse tree for each of the following statements:
  A := A + ( B * C )
  A := B + C + A
  A := A * ( B + C )
  A := B * ( C * ( A + B ) )

20 Parse Trees (cont.)
Problem 2: Describe, in English, the language defined by the following grammar:
  <S> → <A> <B> <C>
  <A> → a <A> | a
  <B> → b <B> | b
  <C> → c <C> | c
Problem 3: Consider the following grammar:
  <S> → <A> a <B> b
  <A> → <A> b | b
  <B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
  baab
  bbbab
  bbaaaaa
  bbaab

21 Parse Trees (cont.)
Problem 4: Consider the following grammar:
  <S> → a <S> c <B> | <A> | b
  <A> → c <A> | c
  <B> → d | <A>
Which of the following sentences are in the language generated by the grammar?
  abcd
  acccbd
  acccbcc
  acd
  accc

22 Abstract Syntax Trees
- Parse trees are still too detailed in their structure, since every step in a derivation is expressed as nodes.
- An abstract syntax tree (AST), or just syntax tree, shows the essential structure of a parse tree.
  - An AST is more compact than the corresponding parse tree
  - An (abstract) syntax tree condenses a parse tree to its essential structure
- Language designers and translator writers are most interested in abstract syntax.
- A programmer is most interested in concrete syntax.
- Examples on the next two slides…

23 Abstract Syntax Trees (cont.)
[Figure: a parse tree and its corresponding AST]

24 Abstract Syntax Trees (cont.)
[Figure: a parse tree and its corresponding AST]
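One way to see how an AST condenses a parse tree is to build a small parse tree by hand and collapse it. The sketch below is not the example from the slides; it assumes the grammar expr → expr + term | term, term → term * factor | factor, factor → ( expr ) | NUMBER, and the types TreeNode and AstDemo are hypothetical. Chains of single-child nodes are skipped, parentheses are dropped, and each operator becomes an interior node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree type; the names TreeNode, AstDemo, and condense are assumptions.
class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        for (TreeNode k : kids) {
            children.add(k);
        }
    }

    // Total number of nodes in the tree rooted at this node.
    int size() {
        int n = 1;
        for (TreeNode c : children) {
            n += c.size();
        }
        return n;
    }

    // Prefix rendering: a leaf is its label, an interior node is (label child child ...).
    String show() {
        if (children.isEmpty()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(" + label);
        for (TreeNode c : children) {
            sb.append(' ').append(c.show());
        }
        return sb.append(')').toString();
    }
}

class AstDemo {
    // Condenses a parse tree of the assumed expression grammar into an AST.
    static TreeNode condense(TreeNode t) {
        if (t.children.isEmpty()) {
            return t;                                   // terminal: keep as is
        }
        if (t.children.size() == 1) {
            return condense(t.children.get(0));         // skip unit steps such as expr -> term
        }
        if (t.children.size() == 3) {
            TreeNode a = t.children.get(0);
            TreeNode b = t.children.get(1);
            TreeNode c = t.children.get(2);
            if (a.label.equals("(") && c.label.equals(")")) {
                return condense(b);                     // parentheses only group
            }
            return new TreeNode(b.label, condense(a), condense(c)); // operator becomes the root
        }
        throw new IllegalArgumentException("unexpected node shape at " + t.label);
    }

    // Parse tree of 3 + 4 * 5 under the assumed grammar.
    static TreeNode exampleParseTree() {
        TreeNode three = new TreeNode("expr",
            new TreeNode("term", new TreeNode("factor", new TreeNode("3"))));
        TreeNode product = new TreeNode("term",
            new TreeNode("term", new TreeNode("factor", new TreeNode("4"))),
            new TreeNode("*"),
            new TreeNode("factor", new TreeNode("5")));
        return new TreeNode("expr", three, new TreeNode("+"), product);
    }

    public static void main(String[] args) {
        TreeNode parseTree = exampleParseTree();
        TreeNode ast = condense(parseTree);
        System.out.println(parseTree.size() + " parse-tree nodes, " + ast.size() + " AST nodes");
        System.out.println(ast.show());
    }
}
```

For this input the parse tree has 13 nodes and the AST has 5, which is the compactness the slides refer to.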

25 Ambiguous Grammars
- A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string.
- Example:
  Grammar: expr → expr + expr | expr * expr | ( expr ) | NUMBER
  Expression: 2 + 3 * 4
  Parse trees: ambiguity in operator precedence

26 Ambiguous Grammars (cont.)
- Another example:
  Grammar: expr → expr + expr | expr - expr | ( expr ) | NUMBER
  Expression: 2 - 3 - 4
  Parse trees: ambiguity in operator associativity
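The practical consequence of these two ambiguities is that one expression can evaluate to different results depending on the parse tree chosen. The sketch below enumerates every parse tree of a parenthesis-free expression under a rule of the form expr → expr op expr | NUMBER and evaluates each one. AmbiguityDemo and its methods are hypothetical names, and parenthesized subexpressions are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are assumptions.
class AmbiguityDemo {
    // One value per distinct parse tree of tokens[lo..hi].
    static List<Integer> values(String[] tokens, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        if (lo == hi) {
            out.add(Integer.parseInt(tokens[lo]));      // expr -> NUMBER
            return out;
        }
        // Operators sit at odd offsets; each one can be the root of a parse tree.
        for (int k = lo + 1; k < hi; k += 2) {
            for (int left : values(tokens, lo, k - 1)) {
                for (int right : values(tokens, k + 1, hi)) {
                    out.add(applyOp(tokens[k], left, right));
                }
            }
        }
        return out;
    }

    private static int applyOp(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Expects tokens separated by whitespace, e.g. "2 + 3 * 4".
    static List<Integer> allValues(String expression) {
        String[] tokens = expression.trim().split("\\s+");
        return values(tokens, 0, tokens.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(allValues("2 + 3 * 4"));
        System.out.println(allValues("2 - 3 - 4"));
    }
}
```

For 2 + 3 * 4 the two trees give 14 and 20; for 2 - 3 - 4 they give 3 and -5. Only one value in each pair matches the usual precedence and associativity conventions.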

27 Ambiguous Grammars (cont.)
- Ways to resolve ambiguities in a grammar
  - Revise the grammar (the desired approach)
  - Provide a disambiguating rule (semantic help)
- Revising a grammar to address precedence and associativity ambiguities
  - Do not write rules that allow a parse tree to grow on both the left and right sides
  - Use left recursive rules for left-associative operators
  - Use right recursive rules for right-associative operators
  - Add new rules that establish a “precedence cascade” between rules to specify precedence
  - Make sure operators with higher precedence appear lower in the cascade of rules
- Revised grammar
  expr → expr + term | term
  term → term * factor | factor
  factor → ( expr ) | NUMBER

28 Ambiguous Grammars (cont.)
Problem 1:
  <expr> → <expr> + <expr> | <expr> - <expr> | <expr> * <expr> | <expr> / <expr> | ( <expr> ) | NUMBER
  NUMBER = [0-9]+
Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:
  30 + 5 + 2
  30 - 5 - 2
  30 * 5 * 2
  30 / 5 / 2
  30 + 5 * 2

29 Ambiguous Grammars (cont.)
Revised unambiguous grammar:
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → ( <expr> ) | NUMBER
  NUMBER = [0-9]+

30 Ambiguous Grammars (cont.)
Problem 2: Show that the following grammar is ambiguous:
  <S> → <A>
  <A> → <A> + <A> | <id>
  <id> → a | b | c

31 Ambiguous Grammars (cont.)
- Are there other alternatives to resolving ambiguities?
- Yes, but they change the language!
  - Fully-parenthesized expressions:
    expr → ( expr + expr ) | ( expr - expr ) | NUMBER
  - Prefix expressions:
    expr → + expr expr | - expr expr | NUMBER
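The prefix grammar needs neither precedence nor associativity rules, because the first token of an expression always decides which alternative applies. A minimal evaluator sketch follows; the class PrefixEvaluator is a hypothetical name and tokens are assumed to be separated by whitespace.

```java
// Hypothetical sketch of an evaluator for expr -> + expr expr | - expr expr | NUMBER.
class PrefixEvaluator {
    private final String[] tokens;
    private int pos = 0;

    private PrefixEvaluator(String input) {
        tokens = input.trim().split("\\s+");
    }

    static int evaluate(String input) {
        PrefixEvaluator p = new PrefixEvaluator(input);
        int value = p.expr();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("extra input after expression");
        }
        return value;
    }

    // The first token selects the alternative, so no lookahead or backtracking is needed.
    private int expr() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String t = tokens[pos++];
        if (t.equals("+")) {
            int left = expr();
            int right = expr();
            return left + right;
        }
        if (t.equals("-")) {
            int left = expr();
            int right = expr();
            return left - right;
        }
        return Integer.parseInt(t);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("- - 2 3 4"));
        System.out.println(evaluate("- 2 - 3 4"));
    }
}
```

The two groupings of 2 - 3 - 4 are now different strings: "- - 2 3 4" is (2 - 3) - 4 and "- 2 - 3 4" is 2 - (3 - 4), so each string has exactly one parse tree.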

32 Extended BNF
- Adds new metasymbols (or operations) to BNF to enhance readability and writability.
- These extensions do not enhance the descriptive power of BNF.
- EBNF facilitates the development of parsing tools based on an approach called recursive-descent parsing.
- New metasymbols added in EBNF:
    { }      zero or more repetitions
    [ ]      optional parts
    ( | )    multiple choice

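33 Extended BNF (cont.)
Examples:
  BNF:  <number> → <number> <digit> | <digit>
  EBNF: <number> → <digit> { <digit> }

  BNF:  <expr> → <expr> + <term> | <term>
  EBNF: <expr> → <term> { + <term> }

  BNF:  <factor> → <exp> ^ <factor> | <exp>
  EBNF: <factor> → <exp> [ ^ <factor> ]

  BNF:  <if_stmt> → if <cond> then <stmt> | if <cond> then <stmt> else <stmt>
  EBNF: <if_stmt> → if <cond> then <stmt> [ else <stmt> ]

  BNF:  <for_stmt> → for <var> := <expr> to <expr> do <stmt> | for <var> := <expr> downto <expr> do <stmt>
  EBNF: <for_stmt> → for <var> := <expr> ( to | downto ) <expr> do <stmt>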

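34 Extended BNF (cont.)
More examples:
  BNF:
    <expr> → <expr> + <term> | <term>
    <term> → <term> * <factor> | <term> / <factor> | <term> % <factor> | <factor>
    <factor> → <exp> ^ <factor> | <exp>
    <exp> → ( <expr> ) | NUMBER
    NUMBER = [0-9]+
  EBNF:
    <expr> → <term> { + <term> }
    <term> → <factor> { * <factor> | / <factor> | % <factor> }
    <factor> → <exp> [ ^ <factor> ]
    <exp> → ( <expr> ) | NUMBER
    NUMBER = [0-9]+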
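EBNF maps almost directly onto a recursive-descent parser: each nonterminal becomes a method, a { } repetition becomes a while loop, and a [ ] option becomes an if. The sketch below evaluates expressions built from +, then * / %, then a right-associative ^, with parentheses and NUMBER at the bottom of the cascade. The class EbnfParser and the method names expr, term, factor, and exp are assumptions, and tokens are assumed to be separated by whitespace.

```java
// Hypothetical recursive-descent evaluator; all names are assumptions.
class EbnfParser {
    private final String[] tokens;
    private int pos = 0;

    private EbnfParser(String input) {
        tokens = input.trim().split("\\s+");
    }

    static long evaluate(String input) {
        EbnfParser p = new EbnfParser(input);
        long value = p.expr();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return value;
    }

    private String peek() {
        return pos < tokens.length ? tokens[pos] : "";
    }

    private void expect(String token) {
        if (!peek().equals(token)) {
            throw new IllegalArgumentException("expected '" + token + "' but found '" + peek() + "'");
        }
        pos++;
    }

    // expr -> term { + term } : the braces become a loop (left-associative).
    private long expr() {
        long value = term();
        while (peek().equals("+")) {
            pos++;
            value += term();
        }
        return value;
    }

    // term -> factor { * factor | / factor | % factor }
    private long term() {
        long value = factor();
        while (peek().equals("*") || peek().equals("/") || peek().equals("%")) {
            String op = tokens[pos++];
            long right = factor();
            if (op.equals("*")) {
                value *= right;
            } else if (op.equals("/")) {
                value /= right;
            } else {
                value %= right;
            }
        }
        return value;
    }

    // factor -> exp [ ^ factor ] : the brackets become an if; the recursion on the
    // right makes ^ right-associative.
    private long factor() {
        long base = exp();
        if (peek().equals("^")) {
            pos++;
            return power(base, factor());
        }
        return base;
    }

    // exp -> ( expr ) | NUMBER
    private long exp() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            long value = expr();
            expect(")");
            return value;
        }
        if (t.matches("[0-9]+")) {
            pos++;
            return Long.parseLong(t);
        }
        throw new IllegalArgumentException("unexpected token: '" + t + "'");
    }

    // Integer power by repeated multiplication; exponents here are never negative.
    private static long power(long base, long exponent) {
        long result = 1;
        for (long i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 ^ 3 ^ 2"));
        System.out.println(evaluate("30 + 5 * 2"));
    }
}
```

The left-recursive BNF forms cannot be coded this way directly, since a method for expr that begins by calling itself would never terminate. The loop form of the EBNF rule avoids that while still grouping operands from the left.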

35 Syntax Diagrams
- A graphical representation for a grammar rule
- An alternative to EBNF
- Circles or ovals for terminals
- Squares or rectangles for nonterminals
- Terminals and nonterminals are connected with lines and arrows
- Visually appealing but takes up space
- Rarely seen any more: EBNF is much more compact

