
1 Syntax: describe the structures of programs. Semantics: describe the meaning of programs. (IT 327, 10/18/2015)
Programming Languages (formal languages)
-- How to describe them?
-- How to use them? (machine and human)
Grammars -- ambiguous (sometimes); solution: use unambiguous grammars only
Textbooks, manuals -- confusing (always); solution: denotational semantics (for nuts only)

2 English Grammar
The man hit the ball. (subject, verb, object)
The man saw the girl with a telescope. (subject, verb, object)
The purpose of grammar:
–To have a device that generates all valid sentences in the target language (from a root).
–To tell whether a sentence is valid.
Chomsky: (old-fashioned)

3 Noam Chomsky (1928 - )
http://www.canada.com/nationalpost/news/issuesideas/story.html?id=1385b76d-6c34-4c22-942a-18b71f2c4a44
Syntactic Structures (1957)
Generative Grammar: a valid sentence is generated from a root according to some fixed rules (grammar).

4 A generative grammar in Syntactic Structures
S → NP + VP
NP → T + N
VP → Verb + NP
T → the | a
N → man | ball | car
Verb → hit | take | took | run | ran
…..
S is the root; NP, VP, T, N and Verb are non-terminal symbols; the words are terminal symbols.
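
To make "a device to generate all valid sentences" concrete, here is a minimal Python sketch that enumerates every sentence this small grammar can produce. The dictionary encoding and the helper name `expand` are illustrative assumptions, not part of the slides; the rules are the ones read off the slide, with "+" taken to mean concatenation.

```python
from itertools import product

# Rules as read off the slide; each right-hand side is a list of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["Verb", "NP"]],
    "T": [["the"], ["a"]],
    "N": [["man"], ["ball"], ["car"]],
    "Verb": [["hit"], ["take"], ["took"], ["run"], ["ran"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # terminal symbol: a plain word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences), "sentences, e.g.", sentences[0])
```

Because this grammar has no recursive rule, the enumeration terminates; a grammar with recursion would need a depth limit.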

5 Syntactic Structures
Tree for "the man hit the ball":
S
  NP
    T: the
    N: man
  VP
    Verb: hit
    NP
      T: the
      N: ball

6 Backus-Naur Form, BNF
Grammar 1: ::= ::= the ::= man | ball ::= hit | took
Grammar 2: ::= ::= loves | hates | eats ::= a | the ::= dog | cat | rat
::= | ::= loves | hates | eats | hit | took ::= a | the ::= the ::= dog | cat | rat | man | ball

7 Derivation: the sequence of steps that generates a sentence
<S> ⇒ the … ⇒ the man … ⇒ the man hit … ⇒ the man hit the … ⇒ the man hit the ball
Grammar 1: ::= ::= the ::= man | ball ::= hit | took
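
A derivation can be replayed mechanically. The sketch below searches for a leftmost derivation of a target sentence and returns every intermediate form. The non-terminal names S, NP, VP, T, N and Verb are borrowed from the Syntactic Structures slide and are an assumption here, as is the function name `leftmost_derivation`; only the word lists (the; man | ball; hit | took) come from Grammar 1.

```python
# Assumed rule shapes; only the terminal word lists come from Grammar 1.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["Verb", "NP"]],
    "T": [["the"]],
    "N": [["man"], ["ball"]],
    "Verb": [["hit"], ["took"]],
}


def leftmost_derivation(target, start="S"):
    """Return the list of sentential forms deriving `target`, or None."""
    goal = target.split()

    def search(form):
        # Locate the leftmost non-terminal in the current form.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            return [form] if form == goal else None
        # Prune: words already produced must match, and forms never shrink.
        if form[:idx] != goal[:idx] or len(form) > len(goal):
            return None
        for rhs in GRAMMAR[form[idx]]:
            rest = search(form[:idx] + rhs + form[idx + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return search([start])


steps = leftmost_derivation("the man hit the ball")
for form in steps:
    print(" ".join(form))
```

Each printed line replaces exactly one non-terminal of the previous line, which is what a derivation step means.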

8 Parse: v. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part. (American Heritage Dict.)
the dog loves the cat
the loves dog the cat ×
loves the dog the cat ×

9 A Parse Tree
Parse tree for "the dog loves the cat", rooted at <S>.
Grammar: ::= ::= loves | hates | eats ::= a | the ::= dog | cat | rat
"the loves dog the cat" doesn't have a parse tree.
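
The difference between a sentence that has a parse tree and one that does not can be sketched in a few lines of Python. The rule shapes `<S> ::= <NP> <V> <NP>` and `<NP> ::= <A> <N>`, and the names S, NP, V, A and N, are assumptions made for this illustration; only the word lists appear in the grammar above.

```python
# Hypothetical non-terminal names; the word lists come from the grammar.
A = {"a", "the"}
N = {"dog", "cat", "rat"}
V = {"loves", "hates", "eats"}


def parse_np(words, i):
    """Parse <NP> ::= <A> <N> at position i; return (tree, next_i) or None."""
    if i + 1 < len(words) and words[i] in A and words[i + 1] in N:
        return ("NP", ("A", words[i]), ("N", words[i + 1])), i + 2
    return None


def parse_sentence(text):
    """Parse <S> ::= <NP> <V> <NP>; return a nested-tuple tree or None."""
    words = text.split()
    subject = parse_np(words, 0)
    if subject is None:
        return None
    subject_tree, i = subject
    if i >= len(words) or words[i] not in V:
        return None
    verb_tree = ("V", words[i])
    obj = parse_np(words, i + 1)
    if obj is None or obj[1] != len(words):  # must consume the whole input
        return None
    return ("S", subject_tree, verb_tree, obj[0])


print(parse_sentence("the dog loves the cat"))
print(parse_sentence("the loves dog the cat"))
```

The first call returns a tree; the second returns `None` because no rule lets a verb follow an article.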

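10 A grammar for Arithmetic Expressions
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c
Example: ((a+b)*c). Is this expression valid?
<exp>
⇒ ( <exp> )
⇒ ( <exp> * <exp> )
⇒ (( <exp> ) * <exp> )
⇒ (( <exp> + <exp> ) * <exp> )
⇒ (( a + <exp> ) * <exp> )
⇒ (( a + b ) * <exp> )
⇒ ((a+b)*c)
Yes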
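
The validity question can also be answered by a program. The following is a minimal recognizer sketch for this expression grammar; it tries every way of splitting a substring at an operator or stripping a pair of parentheses. The function name `is_exp` and the memoized span-based approach are choices made for this illustration.

```python
from functools import lru_cache


def is_exp(text):
    """True if `text` (spaces ignored) can be derived from the expression rule."""
    s = text.replace(" ", "")

    @lru_cache(maxsize=None)
    def exp(i, j):
        # Can the substring s[i:j] be derived?
        if j - i == 1:
            return s[i] in "abc"          # the alternatives a | b | c
        if j - i < 3:
            return False
        # The alternative ( exp )
        if s[i] == "(" and s[j - 1] == ")" and exp(i + 1, j - 1):
            return True
        # The alternatives exp + exp and exp * exp: try every split point
        return any(
            s[k] in "+*" and exp(i, k) and exp(k + 1, j)
            for k in range(i + 1, j - 1)
        )

    return exp(0, len(s))


print(is_exp("((a+b)*c)"))
print(is_exp("(a+b"))
```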

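11 A Parse Tree for ((a+b)*c)
The root <exp> expands to ( <exp> ); the inner <exp> expands to <exp> * <exp>; the left operand expands to ( <exp> ) and then to <exp> + <exp> with leaves a and b; the right operand is the leaf c.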

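12 Parse Trees for a+b*c
Tree 1: + at the root, with a on the left and b*c on the right, i.e. a+(b*c).
Tree 2: * at the root, with a+b on the left and c on the right, i.e. (a+b)*c.
? What is the meaning of a+b*c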
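
A short Python sketch can enumerate the parse trees of a+b*c under the ambiguous expression grammar and show that the two trees give different values. The tuple encoding of trees and the sample values a=1, b=2, c=3 are assumptions chosen for illustration.

```python
def parse_trees(s):
    """Return every parse tree of `s` under the ambiguous expression grammar."""
    trees = []
    if s in ("a", "b", "c"):
        trees.append(s)
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        trees += [("()", t) for t in parse_trees(s[1:-1])]
    for k in range(1, len(s) - 1):
        if s[k] in "+*":
            # Split at this operator and combine all left and right subtrees.
            for left in parse_trees(s[:k]):
                for right in parse_trees(s[k + 1:]):
                    trees.append((s[k], left, right))
    return trees


def evaluate(tree, env):
    """Compute the value a tree denotes, given values for a, b, c."""
    if isinstance(tree, str):
        return env[tree]
    if tree[0] == "()":
        return evaluate(tree[1], env)
    op, left, right = tree
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


trees = parse_trees("a+b*c")
values = sorted(evaluate(t, {"a": 1, "b": 2, "c": 3}) for t in trees)
print(trees)
print(values)
```

Two trees come back, one meaning a+(b*c) and one meaning (a+b)*c, so the grammar alone does not fix the meaning of the expression.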

13 Restrictions on Grammars
Unrestricted Grammars (type-0)
Context Sensitive (type-1)
Context Free (type-2)
Right/Left Linear Grammars (type-3)
Diagram in terms of the sizes of the sets of restrictions
Why do context-sensitive grammars have fewer restrictions than context-free grammars?

14 Chomsky Hierarchy
Computable (formal) languages (type-0)
Context-sensitive languages (type-1)
Context-free languages (type-2)
Regular languages, i.e. those described by regular expressions (type-3)
Diagram in terms of the sizes of the language families

15 Grammars in BNF (Backus-Naur Form)
A BNF grammar consists of four parts:
–The finite set of tokens (terminal symbols)
–The finite set of non-terminal symbols
–The start symbol
–The finite set of production rules
Example: ::= ::= the ::= man | ball ::= hit | took

16 Constructing Grammars
Use divide and conquer to simplify the job.
Data types, variable names (identifiers)
One variable, one type (enforcing this is not the grammar's job)
float a; boolean a, b, c; int a, b;

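17 Primitive type names (using divide and conquer)
<var-dec> ::= <type-name> <declarator-list> ;
<type-name> ::= boolean | byte | short | int | long | char | float | double
<declarator-list> ::= <declarator> | <declarator> , <declarator-list>
<declarator> ::= <variable-name> | <variable-name> = <expr>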
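
As a rough illustration of how such a declaration grammar turns into a parser, the Python sketch below accepts a type name, a comma-separated list of variable names, and a closing semicolon. The non-terminal names in the docstring and the function name `parse_var_dec` are hypothetical, and initializers (the `=` alternative) are left out to keep the sketch short. A separate loop then checks "one variable, one type", which the grammar itself cannot enforce.

```python
import re

TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}


def parse_var_dec(text):
    """Parse one declaration; return (type_name, names) or raise SyntaxError.

    Assumed grammar (non-terminal names are hypothetical):
      <var-dec>         ::= <type-name> <declarator-list> ;
      <declarator-list> ::= <declarator> | <declarator> , <declarator-list>
      <declarator>      ::= <variable-name>   (initializers omitted here)
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[,;]|\S", text)
    pos = 0

    def expect(predicate, what):
        nonlocal pos
        if pos >= len(tokens) or not predicate(tokens[pos]):
            raise SyntaxError(f"expected {what} at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def is_name(tok):
        return bool(re.fullmatch(r"[A-Za-z_]\w*", tok)) and tok not in TYPE_NAMES

    type_name = expect(lambda t: t in TYPE_NAMES, "a type name")
    names = [expect(is_name, "a variable name")]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        names.append(expect(is_name, "a variable name"))
    expect(lambda t: t == ";", "';'")
    if pos != len(tokens):
        raise SyntaxError("unexpected text after ';'")
    return type_name, names


# All three declarations are grammatical; the type clash is found separately.
declared, conflicts = {}, []
for line in ["float a;", "boolean a, b, c;", "int a, b;"]:
    type_name, names = parse_var_dec(line)
    for name in names:
        if name in declared and declared[name] != type_name:
            conflicts.append(name)
        declared.setdefault(name, type_name)
print(conflicts)
```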

18 Tokens: How is a program file (a sequence of characters) divided into a sequence of tokens?
Programs stored in files are just sequences of characters, but we want to group them into tokens before further analysis.
Tokens are the atoms of the program, e.g.
–identifiers ( const, x, fact )
–keywords, i.e. reserved words ( if, const )
–operators ( == )
–constants ( 123.4 ), etc.

19 Lexical Structure And Phrase Structure
Grammars so far have defined phrase structure: how a program is built from a sequence of tokens.
We also need to define lexical structure: how a text file is divided into tokens.

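20 Separate Grammars
Usually there are two separate grammars
–to construct a sequence of tokens from a file of characters (Lexical Structure)
–to construct a parse tree from a sequence of tokens (Phrase Structure)
<program-file> ::= <end-of-file> | <element> <program-file>
<element> ::= <token> | <one-white-space> | <comment>
<one-white-space> ::= <space> | <tab> | <end-of-line>
<token> ::= <identifier> | <operator> | <constant> | …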

21 Separate Compiler Passes
string → scanner → tokens → parser → parse tree (more to do afterwards)
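
The scanner pass can be sketched with Python's standard `re` module: a handful of regular expressions split the character string into tokens, and whitespace is dropped. The token categories follow the earlier Tokens slide; the exact patterns, the operator set, and the name `scan` are assumptions for this example.

```python
import re

KEYWORDS = {"if", "const"}  # reserved words named on the Tokens slide

# One named group per token kind; the operator set is an assumed sample.
TOKEN_SPEC = [
    ("constant", r"\d+(?:\.\d+)?"),
    ("word", r"[A-Za-z_]\w*"),
    ("operator", r"==|[=+\-*/<>(){};]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Split a character string into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # whitespace separates tokens but is not one
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
    return tokens


print(scan("if x == 123.4"))
```

The parser pass would then work on this token list and never look at individual characters.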

22 Historical Note #1
Early languages sometimes did not separate lexical structure from phrase structure
–Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword
–Other languages, like PL/I or early Fortran, allow keywords to be used as identifiers
This makes them difficult to scan and parse
It also reduces readability

23 Historical Note #2
Some languages have a fixed-format lexical structure -- column positions are significant
–One statement per line (i.e. per card)
–First few columns for statement label
–Etc.
Early dialects of Fortran, Cobol, and Basic
Almost all modern languages are free-format: column positions are ignored

24 Other Grammar Forms
–BNF variations
–EBNF variations
–Syntax diagrams

25 BNF Variations
Some use → or = instead of ::=
Some leave out the angle brackets and use a distinct typeface for tokens
Some allow single quotes around tokens, for example to distinguish ‘|’ as a token from | as a meta-symbol
(Cartoon: "Sir, please step away from the ASR-33." Callout: "Interesting operator!! Or not!")

26 EBNF Variations
Additional syntax to simplify some grammar chores:
–{x} to mean zero or more repetitions of x
–[x] to mean x is optional (i.e. x | <empty>)
–() for grouping
–| anywhere to mean a choice among alternatives
–Quotes around tokens, if necessary, to distinguish from meta-symbols

27 EBNF Examples
<stmt-list> ::= {<stmt> ;}
<if-stmt> ::= if <expr> then <stmt> [else <stmt>]
<thing-list> ::= { (<stmt> | <declaration>) ;}
Anything that extends BNF this way is called an Extended BNF: EBNF
There are many variations
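
EBNF's brackets map directly onto control flow in a hand-written parser: an optional part [x] becomes an `if`, and a repetition {x} becomes a `while`. The Python sketch below parses a statement list of the shape { stmt ; } where a statement is either a bare name or an if-statement with an optional else. Treating an expression as a single name, and the names `parse_stmt_list`, `peek`, `take`, are simplifying assumptions for this illustration.

```python
KEYWORDS = {"if", "then", "else"}


def parse_stmt_list(tokens):
    """Parse { stmt ; } over a list of tokens; return a list of statement trees."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'} at position {pos}")
        pos += 1
        return tok

    def name():
        tok = take()
        if tok in KEYWORDS or not tok.isidentifier():
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def stmt():
        if peek() == "if":
            take("if")
            cond = name()              # expression: a single name in this sketch
            take("then")
            then_part = stmt()
            else_part = None
            if peek() == "else":       # [ else stmt ]: optional part becomes an if
                take("else")
                else_part = stmt()
            return ("if", cond, then_part, else_part)
        return name()

    stmts = []
    while peek() is not None:          # { stmt ; }: repetition becomes a while
        stmts.append(stmt())
        take(";")
    return stmts


print(parse_stmt_list("if c then x else y ; z ;".split()))
```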

28 Syntax Diagrams
Syntax diagrams ("railroad diagrams")
<if-stmt> ::= if <expr> then <stmt> else <stmt>
Diagram for if-stmt: if → expr → then → stmt → else → stmt

29 Bypasses
<if-stmt> ::= if <expr> then <stmt> [else <stmt>]
Diagram for if-stmt: if → expr → then → stmt, followed by an else → stmt segment that a bypass track can skip.

30 Branching
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c

31 Loops
<exp> ::= <addend> {+ <addend>}

32 Syntax Diagrams, Pro and Con
Easier for humans to read (follow)
Difficult to perceive the phrase structure (syntax tree)?
Harder for machines to read (for automatic parser-generators)

33 Conclusion
We use grammars to define programming language syntax, both lexical structure and phrase structure
Connection between theory and practice
–Two grammars, two compiler passes
–Parser-generators can produce code for those two passes automatically from grammars (compiler tools)

