Presentation is loading. Please wait.

Presentation is loading. Please wait.

Programming Languages An Introduction to Grammars Oct 18th 2002.

Similar presentations


Presentation on theme: "Programming Languages An Introduction to Grammars Oct 18th 2002."— Presentation transcript:

1 Programming Languages An Introduction to Grammars Oct 18th 2002

2 Context Free Grammars One or more non terminal symbols One or more non terminal symbols Lexically distinguished, e.g. upper case Lexically distinguished, e.g. upper case Terminal symbols are actual characters in the language Terminal symbols are actual characters in the language Or they can be tokens in practice Or they can be tokens in practice One non-terminal is the distinguished start symbol. One non-terminal is the distinguished start symbol.

3 Grammar Rules Non-terminal ::= sequence Non-terminal ::= sequence Where sequence can be non-terminals or terminals Where sequence can be non-terminals or terminals At least some rules must have ONLY terminals on the right side At least some rules must have ONLY terminals on the right side

4 Example of Grammar S ::= (S) S ::= (S) S ::= S ::= S ::= empty S ::= empty This is the language D2, the language of two kinds of balanced parens This is the language D2, the language of two kinds of balanced parens E.g. (( >)) E.g. (( >)) Well not quite D2, since that should allow things like (())<> Well not quite D2, since that should allow things like (())<>

5 Example, continued So add the rule So add the rule S ::= SS S ::= SS And that is indeed D2 And that is indeed D2 But this is ambiguous But this is ambiguous ()<>() can be parsed two ways ()<>() can be parsed two ways ()<> is an S and () is an S ()<> is an S and () is an S () is an S and <>() is an S () is an S and <>() is an S Nothing wrong with ambiguous grammars Nothing wrong with ambiguous grammars

6 BNF (Backus Naur/Normal Form) Properly attributed to Sanskrit scholars Properly attributed to Sanskrit scholars An extension of CFG with An extension of CFG with Optional constructs in [] Optional constructs in [] Sequences {} = 0 or more Sequences {} = 0 or more Alternation | Alternation | All these are just short hands All these are just short hands

7 BNF Shorthands IF ::= if EXPR then STM [else STM] fi IF ::= if EXPR then STM [else STM] fi IF ::= if EXPR then STM fi IF ::= if EXPR then STM fi IF ::= if EXPR then STM else STM fi IF ::= if EXPR then STM else STM fi STM ::= IF | WHILE STM ::= IF | WHILE STM ::= IF STM ::= IF STM ::= WHILE STM ::= WHILE STMSEQ ::= STM {;STM} STMSEQ ::= STM {;STM} STMSEQ ::= STM STMSEQ ::= STM STMSEQ ::= STM ; STMSEQ STMSEQ ::= STM ; STMSEQ

8 Programming Language Syntax Expressed as a CFG where the grammar is closely related to the semantics Expressed as a CFG where the grammar is closely related to the semantics For example For example EXPR ::= PRIMARY {OP | PRIMARY} EXPR ::= PRIMARY {OP | PRIMARY} OP ::= + | * OP ::= + | * Not good, better is Not good, better is EXPR ::= TERM | EXPR + TERM EXPR ::= TERM | EXPR + TERM TERM ::= PRIMARY | TERM * PRIMARY TERM ::= PRIMARY | TERM * PRIMARY This implies associativity and precedence This implies associativity and precedence

9 PL Syntax Continued No point in using BNF for tokens, since no semantics involved No point in using BNF for tokens, since no semantics involved ID ::= LETTER | LETTER ID ID ::= LETTER | LETTER ID Is actively confusing since the BC of ABC is not an identifier, and anyway there is no tree structure here Is actively confusing since the BC of ABC is not an identifier, and anyway there is no tree structure here Better to regard ID as a terminal symbol. In other words grammar is a grammar of tokens, not characters Better to regard ID as a terminal symbol. In other words grammar is a grammar of tokens, not characters

10 Grammars and Trees A Grammar with a starting symbol naturally indicates a tree representation of the program A Grammar with a starting symbol naturally indicates a tree representation of the program Non terminal on left is root of tree node Non terminal on left is root of tree node Right hand side are descendents Right hand side are descendents Leaves read left to right are the terminals that give the tokens of the program Leaves read left to right are the terminals that give the tokens of the program

11 The Parsing Problem Given a grammar of tokens Given a grammar of tokens And a sequence of tokens And a sequence of tokens Construct the corresponding parse tree Construct the corresponding parse tree Giving good error messages Giving good error messages

12 General Parsing Not known to be easier than matrix multiplication Not known to be easier than matrix multiplication Cubic, or more properly n**2.71.. (whatever that unlikely constant is) Cubic, or more properly n**2.71.. (whatever that unlikely constant is) In practice almost always linear In practice almost always linear In any case not a significant amount of time In any case not a significant amount of time Hardest part by far is to give good messages Hardest part by far is to give good messages

13 Regular Grammars Also called type 3 Also called type 3 Only one non-terminal on right side Only one non-terminal on right side Right side is either Right side is either Terminal Terminal Terminal Non-terminal Terminal Non-terminal Correspond to regular expressions Correspond to regular expressions Used to express grammar of tokens Used to express grammar of tokens Real_Literal ::= {digit}*. {digit}* [E[+|-]integer_literal] Real_Literal ::= {digit}*. {digit}* [E[+|-]integer_literal]

14 Context Sensitive Grammars Also called type 1 Also called type 1 Left side can contain multiple symbols Left side can contain multiple symbols JUNK digit => digit JUNK JUNK digit => digit JUNK But rhs cannot be shorter than left side But rhs cannot be shorter than left side Can express powerful rules Can express powerful rules But hard to parse But hard to parse

15 Type 0 Grammars No restrictions on left/right sides No restrictions on left/right sides Arbitrary mixture of terminals/non- terminals on either side Arbitrary mixture of terminals/non- terminals on either side Very powerful (Turing powerful) Very powerful (Turing powerful) Impossible to parse Impossible to parse

16 In Practice … Context free grammars used to express syntax. Context free grammars used to express syntax. Other rules (formal english, VDL) used to express static semantics Other rules (formal english, VDL) used to express static semantics Possible to do more in grammars Possible to do more in grammars E.g. Algol-68 W-grammars E.g. Algol-68 W-grammars But too difficult to deal with But too difficult to deal with


Download ppt "Programming Languages An Introduction to Grammars Oct 18th 2002."

Similar presentations


Ads by Google