CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.

1 CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann

2 Michael Eckmann - Skidmore College - CS 330 - Fall 2007 Today’s Topics Questions/comments? Syntax & Semantics

3 Now Chapter 3... Describing a language
–How to give it a clear and precise definition so that implementers (compiler writers) will get it right
–How to describe it to users (programmers) of the language

4 Syntax and Semantics Describing a language involves both its syntax and its semantics. Syntax is the form, semantics is the meaning.
–e.g. English language example: Time flies like an arrow.
–Syntactically correct but has 3 different meanings (semantics)
It is easier to describe syntax formally than it is to describe semantics formally.

5 Syntax A language is a set of strings (or sentences or statements) of characters from an alphabet. Lexemes vs. tokens
–tokens (more general) are categories of lexemes (more specific)
–e.g. Some tokens might be: identifier, int_literal, plus_op
–e.g. Some lexemes might be: idx, 42, +

6 Syntax The statement idx = 42 + count; breaks down as follows (lexeme: token)
–idx: identifier
–=: equal_sign
–42: int_literal
–+: plus_op
–count: identifier
–;: semicolon
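
The lexeme/token pairing above can be sketched in a few lines of Python. This is only an illustration: the token names come from the slide, but the regular expressions and the `tokenize` helper are assumptions, not part of the course material.

```python
import re

# Token categories from the slide; the patterns themselves are assumed.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes and is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs; raise on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs


for token, lexeme in tokenize("idx = 42 + count;"):
    print(f"{lexeme!r:10} {token}")
```

Note that both idx and count map to the same token, identifier: the token is the category, the lexeme is the particular string.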

7 Syntax Recognizers and generators are used to define languages. Generators generate valid programs in a language. Recognizers determine whether or not a program is in the language (that is, whether it is syntactically valid). Generators are studied in Chapter 3 (coming up next in this lecture) and recognizers (parsers) in Chapter 4. How many valid programs are there for some particular language, say Java?

8 Syntax Context Free Grammars (CFGs), developed by Noam Chomsky, are essentially equivalent to Backus-Naur Form (BNF), by Backus and Naur. Both are used to describe syntax. These are metalanguages (languages used to describe languages).

9 Syntax A Context Free Grammar is a four-tuple (T, N, S, P) where
–T is the set of terminal symbols
–N is the set of non-terminal symbols
–S is the start symbol (which is one of the non-terminals)
–P is the set of productions of the form: A -> X1 ... Xm where
–A element of N
–Xi element of N U T, 1 <= i <= m, m >= 0
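
One way to make the four-tuple concrete is to write it down as a small data structure and check the side conditions on the productions. The sketch below is an assumption-laden illustration: the `Grammar` class, its field names, and `is_well_formed` are invented here, and the sample grammar is a one-nonterminal identifier list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A CFG as the four-tuple (T, N, S, P); all names here are hypothetical."""
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S
    productions: tuple        # P, as pairs (A, (X1, ..., Xm))

    def is_well_formed(self):
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals                      # S is in N
            and self.terminals.isdisjoint(self.nonterminals)     # T and N do not overlap
            and all(
                lhs in self.nonterminals                         # A is in N
                and all(x in symbols for x in rhs)               # each Xi is in N U T
                for lhs, rhs in self.productions                 # rhs may be empty (m = 0)
            )
        )


IDENT_LIST = Grammar(
    terminals=frozenset({"ident", ","}),
    nonterminals=frozenset({"<ident_list>"}),
    start="<ident_list>",
    productions=(
        ("<ident_list>", ("ident",)),
        ("<ident_list>", ("ident", ",", "<ident_list>")),
    ),
)

print(IDENT_LIST.is_well_formed())
```

A production whose left-hand side is a terminal, or whose right-hand side mentions a symbol outside N U T, makes `is_well_formed` return False.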

10 Syntax How are CFGs used to describe the syntax of a programming language?
–The nonterminals are abstractions
–The terminals are tokens and lexemes
–The productions are used to describe programs, individual statements, expressions etc.

11 Syntax Example production:
<while_stmt> -> while ( <logic_expr> ) <stmt>
Everything to the left of the arrow is considered the left-hand side (LHS), and everything to the right is the right-hand side (RHS). The only thing that can appear on the LHS is one nonterminal. Multiple RHSs for one LHS are separated by the | (or) symbol, e.g.
<stmt> -> <single_stmt> ; | { <stmt_list> }

12 Syntax Recursion is allowed in productions, e.g.
<ident_list> -> ident | ident, <ident_list>

13 Syntax An example grammar:
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const

14 Syntax Derivations are repeated applications of production rules. An example derivation:
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const

15 Syntax Every string of symbols in the derivation is a sentential form. A sentence is a sentential form that has only terminal symbols. A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded in each step of the derivation.
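
A leftmost derivation is mechanical enough to sketch in code. The grammar below is my reading of the small assignment-statement grammar used in this lecture, written as a Python dictionary; the `leftmost_derivation` and `is_sentence` helpers and the list of production choices are assumptions made for illustration.

```python
# Nonterminal -> list of alternative right-hand sides (assumed reading of the grammar).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in choices.

    Each choice is the index of the alternative to use for that nonterminal.
    Returns every sentential form produced, starting with the start symbol.
    """
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        forms.append(" ".join(form))
    return forms


def is_sentence(form):
    """A sentence is a sentential form containing only terminal symbols."""
    return all(symbol not in GRAMMAR for symbol in form.split())


for step in leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```

Every line printed is a sentential form; only the last one, a = b + const, is a sentence.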

16 Syntax / Parse Trees A hierarchical representation of a derivation (parse trees also hold some semantic information).
[Figure: parse tree for a = b + const]

17 An Ambiguous Expression Grammar
<expr> -> <expr> <op> <expr> | const
<op> -> / | -
[Figure: two parse trees for const - const / const]

18 This one is now unambiguous Ambiguity is bad for compilers, so the language description should be unambiguous.
<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const
The compiler examines the parse tree to determine the code to generate. If the same code has two parse trees, its meaning (semantics) is not unique.
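
To see why two parse trees mean two meanings, take the sentence const - const / const. An ambiguous grammar such as <expr> -> <expr> <op> <expr> | const allows either operator to sit at the root of the tree. The sketch below encodes the two trees as nested tuples and evaluates them; the tuple encoding, the `evaluate` helper, and the sample constants 8, 4, and 2 are all assumptions chosen for illustration.

```python
def evaluate(tree):
    """Evaluate a tree given as (operator, left, right) or a bare number."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    left_value, right_value = evaluate(left), evaluate(right)
    return left_value - right_value if op == "-" else left_value / right_value


# Two parse trees for the same sentence: 8 - 4 / 2
division_lower = ("-", 8, ("/", 4, 2))   # reads as 8 - (4 / 2)
subtraction_lower = ("/", ("-", 8, 4), 2)  # reads as (8 - 4) / 2

print(evaluate(division_lower))     # 6.0
print(evaluate(subtraction_lower))  # 2.0
```

An unambiguous grammar that pushes division further down the tree admits only the first shape, which matches the usual precedence of / over -.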

19 Ambiguous? Look at the if statement rules below
<if_stmt> -> if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>
<stmt> -> <if_stmt> | ...
Do you think this is ambiguous? That is, can more than one parse tree be generated from the same code?
if (a==b) then if (c==d) then print_something() else print_something_else()

20 Ambiguous?
if (a==b) then if (c==d) then print_something() else print_something_else()
The else can be read as belonging to the inner if:
if (a==b) then
    if (c==d) then
        print_something()
    else
        print_something_else()
or as belonging to the outer if:
if (a==b) then
    if (c==d) then
        print_something()
else
    print_something_else()

21 Ambiguous? To make it unambiguous take a look at page 131 in our text.

22 Extended BNF So far the examples we've seen have used BNF. Common extensions to BNF include:
–Use of square brackets [ ] to enclose optional parts of RHSs.
–Use of braces { } to enclose parts of RHSs that can be repeated indefinitely or left out. That is, the part in the braces may be repeated 0 or more times.
–Use of parentheses ( ) around a group of items of which one is chosen. The items are separated by |.

23 Extended BNF It should be obvious that these new symbols are neither terminal symbols in the language nor nonterminals. If the language does require brackets, braces or parentheses as terminal symbols (as many languages do), they have to be marked in some way, such as underlining, to differentiate them from the EBNF symbols. What good are these extensions?

24 CFGs and Recognizers Given a formal description of a language, a recognizer (syntax analyzer, aka parser) for that language can be algorithmically constructed. Therefore a program can be written to construct recognizers. yacc is one such program.
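
To get a feel for how directly a grammar turns into a recognizer, here is a hand-written sketch. It is not what yacc produces (yacc generates table-driven parsers); it uses recursive descent instead, with one function per nonterminal. The grammar is assumed to be the EBNF form <expr> -> <term> { - <term> } and <term> -> const { / const }, and the `recognize` function and its helpers are invented for this example.

```python
def recognize(tokens):
    """Return True if the token list is a sentence of the assumed grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}, found {peek()!r}")
        pos += 1

    def term():
        # <term> -> const { / const }
        expect("const")
        while peek() == "/":
            expect("/")
            expect("const")

    def expr():
        # <expr> -> <term> { - <term> }
        term()
        while peek() == "-":
            expect("-")
            term()

    try:
        expr()
        if peek() is not None:
            raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    except SyntaxError:
        return False
    return True


print(recognize("const - const / const".split()))
print(recognize("const - / const".split()))
```

Each EBNF brace pair becomes a while loop, which is one practical answer to what the extensions are good for.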

25 A more complex grammar Let's take a look at the handout for the mini-pascal language. Let's try to determine if some programs are syntactically correct.

26 Attribute Grammars An attribute grammar is an extension to a CFG. There are some rules of programming languages that cannot be specified in BNF (or by a CFG for that matter), e.g. all variables must be declared before they are used. There are also rules that can be specified with CFGs but are just too hairy to write that way, so attribute grammars are used for those too. Rules of this kind are termed “static semantics.” This is a bit of a misnomer because they are really still syntax rules, not semantics.

27 Attribute Grammars An attribute grammar is a CFG (T, N, S, P) with the following additions:
–For each grammar symbol x there is a set A(x) of attribute values
–Each rule has a set of functions that define certain attributes of the nonterminals in the rule
–Each rule has a (possibly empty) set of predicates to check for attribute consistency
Proposed by Knuth in 1968.

28 Attribute Grammars The example on page 138 shows the use of an attribute grammar to enhance the BNF of an assignment statement with rules that specify which types can be assigned to which. e.g. A float (real) cannot be assigned to a variable whose type is int, but the opposite is allowed. The example also shows how one can determine the resulting type of an expression.
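
The flavor of those rules can be sketched informally. The code below is not the textbook's attribute grammar; it is a rough Python analogue under stated assumptions: the symbol table `DECLARED`, the helper names, and the rule that an expression is int only when all of its operands are int are all chosen here for illustration.

```python
# Hypothetical declarations; in an attribute grammar these would be looked up
# from the variables' declared types.
DECLARED = {"A": "int", "B": "real", "C": "int"}


def expr_type(operands):
    """Attribute function: the expression is int only if every operand is int."""
    return "int" if all(DECLARED[name] == "int" for name in operands) else "real"


def assignment_ok(target, operands):
    """Predicate: a real value may not be assigned to an int variable."""
    expected = DECLARED[target]
    actual = expr_type(operands)
    return actual == expected or (expected == "real" and actual == "int")


print(assignment_ok("A", ["A", "C"]))  # int = int + int
print(assignment_ok("A", ["A", "B"]))  # int = int + real
print(assignment_ok("B", ["A", "C"]))  # real = int + int
```

The function plays the role of an attribute-computing rule and the predicate plays the role of a consistency check attached to the assignment production.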

29 Attribute Grammars I'm not concerned with us knowing all the ins and outs of attribute grammars; what I feel is important is the general concepts involved and their intended purpose. Attribute grammars are generally not used in practice for a few reasons. Can you guess them?

30 Attribute Grammars Attribute grammars are generally not used in practice for a few reasons. Can you guess them?
–Size and complexity of the grammar will be high for a typical modern programming language
–The many attributes and rules that need to be added make the formal grammar difficult to read and write
–The attribute values would be costly to evaluate during parsing (the way it is described in the text)
So, in practice less formal ways are used to check for “static semantics” at compile-time, but the ideas are the same.

31 Practice Problems Before moving on to our discussion of formally describing the semantics of a language, let's take a look at problems 8, 10, and 11 on pages 163-164.

