
Slide 1: Programming Languages and Design
Lecture 2: Syntax Specifications of Programming Languages
Instructor: Li Ma
Department of Computer Science, Texas Southern University, Houston
January 2008

Slide 2: Review and Preview
Last lecture
- Introduction to programming languages
  - Fundamental concepts
  - Computation models
  - Programming models/paradigms
  - Program processing
Today's lecture
- Syntax specifications of programming languages
  - Reference: Chapter 4 of "Foundations of Programming Languages: Design and Implementation", S. H. Roosta
  - Three mechanisms: regular expressions, formal grammars, attribute grammars

Slide 3: Language Description
A formal language is any set of character strings whose characters are chosen from a fixed, finite alphabet of symbols.
- The strings that belong to the language are called its constructs, or phrases.
The description of any programming language can be divided into its:
- Syntax, which deals with the formation of phrases
- Semantics, which deals with the meaning of phrases
- Pragmatics, which deals with the practical use of phrases

Slide 4: Syntax
Syntax refers to the formation of constructs in the language and defines the relations between them.
- It describes the structure of the language without addressing the meaning of its constructs.
- The syntax of a programming language plays the same role as the grammar of a natural language.
Three mechanisms are used to specify syntax in the design and implementation of programming languages:
- Regular expressions
- Formal grammars
- Attribute grammars

Slide 5: Regular Expressions
Invented by Stephen Kleene in about 1950
Represent a form of language definition
- Each regular expression E denotes a language L(E) defined over the alphabet of the language.
Defined by the following set of rules:
- Alternation: if a and b are regular expressions, then so is (a+b). The language defined by (a+b) contains all strings in the language identified by a and all strings in the language identified by b.
- Concatenation: if a and b are regular expressions, then so is (a*b). Here an infix * denotes concatenation (often written simply ab); the postfix a* on the next slide is Kleene closure. The language defined by (a*b) contains every string formed by appending a string in the language identified by b to the end of a string in the language identified by a.

Slide 6: Regular Expressions (cont’)
Defined by the following set of rules (cont’):
- Kleene closure: if a is a regular expression, then so is a*. The language of a* consists of all strings formed by concatenating zero or more strings from the language identified by a.
- Positive closure: if a is a regular expression, then so is a+. The language of a+ consists of all strings formed by concatenating one or more strings from the language identified by a. a+ is the same as a* except that the empty string ε is excluded, unless ε already belongs to the language of a.
- Empty: ø is a regular expression whose language contains no strings.
- Atom: any single symbol a, as well as ε, is a regular expression whose language is the single string {a} or {ε}, respectively.
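The rules above can be read as operations on sets of strings. The following Python sketch is only an illustration, not the textbook's construction: each hypothetical helper returns a language as a Python set, ε is represented by the empty string "", and every language is truncated to strings of at most MAX_LEN symbols (an assumption) so that the closures stay finite.

```python
# Hypothetical helpers: each returns a language as a set of strings.
# Assumption: languages are truncated to strings of length <= MAX_LEN so that
# Kleene and positive closure produce finite sets.
MAX_LEN = 4


def empty():
    """ø: the language containing no strings."""
    return set()


def atom(symbol):
    """A single symbol a, or ε written as "": the language {a} or {ε}."""
    return {symbol}


def alt(la, lb):
    """Alternation (a+b): every string of L(a) and every string of L(b)."""
    return la | lb


def cat(la, lb):
    """Concatenation (a*b): a string of L(a) followed by a string of L(b)."""
    return {x + y for x in la for y in lb if len(x + y) <= MAX_LEN}


def star(la):
    """Kleene closure a*: zero or more strings of L(a) concatenated."""
    result = {""}                      # zero strings gives ε
    frontier = {""}
    while frontier:                    # add one more factor until nothing new appears
        frontier = cat(frontier, la) - result
        result |= frontier
    return result


def plus(la):
    """Positive closure a+: one or more strings of L(a), i.e. L(a) followed by L(a)*."""
    return cat(la, star(la))


ab = alt(atom("a"), atom("b"))
print(sorted(star(ab), key=lambda s: (len(s), s))[:7])
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Computing plus(alt(atom(""), atom("a"))) returns a set that contains "", which illustrates that a+ contains ε whenever the language of a already does.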

Slide 7: Defined Language of the Regular Expressions
Regular expression    Denoted language
ø                     L(ø) = { }
ε                     L(ε) = {ε}
a                     L(a) = {a}
(A*B)                 L(A)L(B) = {ab | a in L(A) and b in L(B)}
(A+B)                 L(A) ∪ L(B) = {a | a in L(A) or a in L(B)}
(A*)                  L(A)* = {a1 a2 … an | a1, a2, …, an in L(A) and n ≥ 0}
(A+)                  L(A)+ = {a1 a2 … an | a1, a2, …, an in L(A) and n > 0}
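The same operators exist in practical regular-expression libraries, with different spelling. The Python sketch below uses the standard `re` module; the translation table in the comments is our own assumption for illustration (alternation + becomes |, concatenation becomes juxtaposition, ε is the empty pattern), and `fullmatch` is used so that the whole string, not just a prefix, must belong to the language.

```python
import re

# Assumed translation from the slide notation to Python `re` syntax:
#   (A+B)  alternation       ->  (?:A|B)
#   (A*B)  concatenation     ->  AB      (juxtaposition)
#   A*     Kleene closure    ->  (?:A)*
#   A+     positive closure  ->  (?:A)+
#   ø      empty language    ->  (?!)    (a pattern that can never match)

# ((a+b)* concatenated with a, b, b): strings over {a, b} that end in "abb"
ends_in_abb = re.compile(r"(?:a|b)*abb")

for s in ["abb", "babb", "aabbabb", "ab", "abba", ""]:
    # fullmatch requires the whole string to be in the language
    print(repr(s), ends_in_abb.fullmatch(s) is not None)

# L(A*) contains ε, L(A+) does not (here L(A) = {ab})
print(re.fullmatch(r"(?:ab)*", "") is not None)   # True
print(re.fullmatch(r"(?:ab)+", "") is not None)   # False
print(re.fullmatch(r"(?!)", "") is not None)      # False: ø matches nothing
```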

Slide 8: Formal Grammars
A grammar is a notation that you can use to specify a structural description of the various constructs in the language.
Four components of the grammar of a programming language:
- Terminal symbols
- Variable (nonterminal) symbols
- Production rules
- Start symbol

Slide 9: Production Rules
Each production rule has:
- one or more symbols as its left side
- the symbol =>
- a string over the set of terminals and variables as its right side
A production rule indicates that the left-side symbols derive (or simply imply) the right-side symbols.
A derivation begins with the start symbol.
- Each successive string in the sequence is derived from the preceding string by applying one production rule.
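A derivation can be traced mechanically. The sketch below uses a hypothetical grammar S => aSb | ab (not from the slides) and stores each production as a pair of strings, so a left side may contain more than one symbol as the slide allows; each step rewrites the leftmost occurrence of the chosen rule's left side.

```python
# Hypothetical grammar: S => aSb | ab  (generates a^n b^n for n >= 1).
# Productions are (left side, right side) pairs; uppercase letters are variables.
PRODUCTIONS = [("S", "aSb"), ("S", "ab")]


def apply_rule(sentential, rule):
    """Rewrite the leftmost occurrence of the rule's left side."""
    left, right = rule
    i = sentential.find(left)
    if i < 0:
        raise ValueError(f"{left!r} does not occur in {sentential!r}")
    return sentential[:i] + right + sentential[i + len(left):]


def derive(start, rule_indices):
    """Return the derivation sequence produced by applying the rules in order."""
    steps = [start]
    for k in rule_indices:
        steps.append(apply_rule(steps[-1], PRODUCTIONS[k]))
    return steps


print(" => ".join(derive("S", [0, 0, 1])))
# S => aSb => aaSbb => aaabbb
```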

Slide 10: Definitions for Grammar
The grammar of a programming language can be defined as a quadruple G = (T, V, P, S):
- T is a finite set of terminal symbols (lowercase characters)
- V is a finite set of variable symbols, with V ∩ T = ø (uppercase characters)
- P is a finite set of production rules of the form αXβ => δ, where α, β, δ in (V ∪ T)* and X in V
- S in V is the start symbol
Two grammars, G1 and G2, are equivalent if and only if L(G1) = L(G2).
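The quadruple can be written down directly as a data structure. In this Python sketch (an illustration under our own assumptions: single-character symbols, productions as string pairs, and hypothetical names such as `language_up_to`) the language is enumerated breadth-first up to a length bound. Comparing two grammars this way can expose a difference, but agreement up to a bound is only evidence of equivalence, not a proof; equivalence of context-free grammars is undecidable in general.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """G = (T, V, P, S) with single-character symbols (an assumption)."""
    T: frozenset          # terminal symbols
    V: frozenset          # variable symbols
    P: tuple              # production rules as (left side, right side) pairs
    S: str                # start symbol

    def __post_init__(self):
        if self.T & self.V:
            raise ValueError("V and T must be disjoint")
        if self.S not in self.V:
            raise ValueError("the start symbol must be in V")


def language_up_to(g, max_len):
    """Terminal strings of length <= max_len derivable from S, found breadth-first.

    Pruning forms longer than max_len is only safe when no rule has a right side
    shorter than its left side; the example grammars below satisfy this.
    """
    seen, found = {g.S}, set()
    queue = deque([g.S])
    while queue:
        form = queue.popleft()
        if all(c in g.T for c in form):
            found.add(form)
            continue
        for left, right in g.P:
            i = form.find(left)
            while i >= 0:                       # every place the rule applies
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return found


g1 = Grammar(frozenset("a"), frozenset("S"), (("S", "aS"), ("S", "a")), "S")
g2 = Grammar(frozenset("a"), frozenset("S"), (("S", "Sa"), ("S", "a")), "S")
print(language_up_to(g1, 6) == language_up_to(g2, 6))   # True
```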

Slide 11: Classification of Grammars
Type 0: unrestricted grammar
- Requires at least one nonterminal symbol on the left side of a production rule
  - Form α => β, where α in (V ∪ T)+ and β in (V ∪ T)*
- Also called a phrase-structure grammar; it generates the recursively enumerable languages
Type 1: context-sensitive grammar
- Requires that the right side of a production rule have no fewer symbols than the left side
  - Form α => β, where α = δ1 A δ2, β = δ1 ω δ2, A in V, ω in (V ∪ T)+ and δ1, δ2 in (V ∪ T)*

Slide 12: Classification of Grammars (cont’)
Type 2: context-free grammar
- Requires that the left side of a production rule be a single variable symbol and the right side be a string of terminal and variable symbols
  - Form A => α, where A in V and α in (V ∪ T)*
- Backus-Naur Form (BNF) grammar
  - Equivalent to a context-free grammar; the two differ only in notation
  - Nonterminals are enclosed in angle brackets, e.g. <expr>
  - The symbol ::= is used in place of => for derivation

Slide 13: Classification of Grammars (cont’)
Type 3: regular grammar
- Restricted to one terminal, or one terminal and one variable, on the right side of a production rule
- The most restrictive of the four types
- Right-linear grammar
  - Form A => xB or A => x, where A, B in V, x in T
  - Rightmost derivation: the variable, if any, is always the rightmost symbol of each sentential form
- Left-linear grammar
  - Form A => Bx or A => x, where A, B in V, x in T
  - Leftmost derivation: the variable, if any, is always the leftmost symbol of each sentential form
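The four forms on slides 11 to 13 can be checked rule by rule. The following Python sketch is a hypothetical classifier, not a standard library function: it assumes single-character symbols, productions given as (left, right) string pairs, and a set of variable symbols (everything else is a terminal). It returns the most restrictive type whose form every rule satisfies, requiring a Type 3 grammar to be entirely right-linear or entirely left-linear.

```python
def grammar_type(productions, variables):
    """Most restrictive Chomsky type (3, 2, 1 or 0) that every rule satisfies.

    productions: iterable of (left, right) strings; variables: set of
    single-character variable symbols (everything else is a terminal).
    """
    def is_var(c):
        return c in variables

    def right_linear(l, r):          # A => xB or A => x
        return len(l) == 1 and is_var(l) and (
            (len(r) == 1 and not is_var(r))
            or (len(r) == 2 and not is_var(r[0]) and is_var(r[1])))

    def left_linear(l, r):           # A => Bx or A => x
        return len(l) == 1 and is_var(l) and (
            (len(r) == 1 and not is_var(r))
            or (len(r) == 2 and is_var(r[0]) and not is_var(r[1])))

    def context_free(l, r):          # A => α
        return len(l) == 1 and is_var(l)

    def context_sensitive(l, r):     # δ1 A δ2 => δ1 ω δ2 with ω non-empty
        return any(is_var(c) and len(r) > len(l) - 1
                   and r.startswith(l[:i]) and r.endswith(l[i + 1:])
                   for i, c in enumerate(l))

    def unrestricted(l, r):          # at least one variable on the left
        return any(is_var(c) for c in l)

    rules = list(productions)
    if all(right_linear(*p) for p in rules) or all(left_linear(*p) for p in rules):
        return 3
    for t, fits in ((2, context_free), (1, context_sensitive), (0, unrestricted)):
        if all(fits(*p) for p in rules):
            return t
    raise ValueError("some rule has no variable on its left side")


print(grammar_type([("S", "aS"), ("S", "a")], {"S"}))     # 3
print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}))   # 2
```

Mixing right-linear and left-linear rules in one grammar yields Type 2 here, because such a mixture is not a regular grammar in the sense of the slide.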

Slide 14: Syntax Tree
Two parts of programming language syntax:
- Lexical syntax: describes the smallest units with significance, called tokens
- Phrase-structure syntax: explains how tokens are arranged into programs
The syntactic structure of a phrase can be represented with a syntax tree (derivation tree or parse tree):
- Terminal (leaf) nodes: terminal symbols
- Internal nodes: variable symbols
- Root: the start symbol
- The label of an internal node is the left side of a production rule; the labels of the children of that node (from left to right) are the right side of the same rule
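The labelling rule can be checked node by node. The Python sketch below is illustrative only: the `Node` class, the helper names, and the grammar E => E + T | T, T => id are our own assumptions, and the check assumes the grammar has no ε-rules (so every childless node must be a terminal). Reading the leaves from left to right recovers the phrase that the tree derives.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # a terminal or variable symbol
    children: list = field(default_factory=list)  # ordered left to right


def is_valid_tree(node, productions, terminals):
    """Check the labelling rule at every node (assumes no ε-rules)."""
    if not node.children:                         # leaves must be terminal symbols
        return node.label in terminals
    right_side = tuple(child.label for child in node.children)
    return ((node.label, right_side) in productions
            and all(is_valid_tree(c, productions, terminals) for c in node.children))


def frontier(node):
    """Leaf labels read left to right: the phrase that the tree derives."""
    if not node.children:
        return [node.label]
    return [sym for child in node.children for sym in frontier(child)]


# Hypothetical grammar: E => E + T | T,  T => id
RULES = {("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("id",))}
TERMINALS = {"+", "id"}
tree = Node("E", [Node("E", [Node("T", [Node("id")])]),
                  Node("+"),
                  Node("T", [Node("id")])])
print(tree.label == "E" and is_valid_tree(tree, RULES, TERMINALS), frontier(tree))
# True ['id', '+', 'id']
```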

Slide 15: Syntax Tree (cont’)
Recognition/representation
- Recognition: determining whether the phrase is syntactically valid
- Representation: production rules are used to construct a syntax tree
The grammar-oriented compiling technique consists of two components:
- A lexical analyzer, which converts the stream of input characters into a stream of tokens
- A syntactic analyzer, which forms a derivation tree from the token list; it is a combination of
  - a parser and
  - an intermediate code generator
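A lexical analyzer of the kind described above can be sketched with Python's standard `re` module. The token names and patterns below are hypothetical, chosen for a tiny expression language; each token kind is a named group in one combined pattern, and `match.lastgroup` reports which kind matched.

```python
import re

# Hypothetical token set for a tiny expression language (order matters:
# earlier alternatives are tried first at each position).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),        # whitespace carries no significance
    ("ERROR",  r"."),          # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Convert a stream of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


print(tokenize("x1 = 42 + y"))
# [('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

The resulting token list is what the syntactic analyzer would consume.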

Slide 16: Parsers
Parsing: deriving the parse tree
Two basic approaches to deriving parse trees:
- Top-down parsers
  - Begin with the start symbol as the root of the tree
  - Repeatedly replace variable symbols with the right side of a production rule until only terminal symbols remain
- Bottom-up parsers
  - Begin with the string of terminal symbols
  - Repeatedly replace a sequence in the string that matches the right side of a production rule with the variable symbol on its left side
  - The process continues until the start symbol is produced
In both cases, the tree is the result of a syntactic analysis of the input string according to the grammar.
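A recursive-descent parser is one common way to build a top-down parser: one function per variable symbol, starting from the start symbol at the root. The Python sketch below assumes a hypothetical grammar written without left recursion (a rule such as E => E + T would make this kind of parser recurse forever), and returns the tree as nested (label, children) pairs.

```python
def parse(tokens):
    """Recursive-descent (top-down) parser for the hypothetical grammar
        E  => T Ep
        Ep => + T Ep | ε
        T  => id | ( E )
    Returns the parse tree as nested (label, children) pairs."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):                  # consume one terminal symbol
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1
        return (tok, [])

    def E():                          # start symbol: the root of the tree
        return ("E", [T(), Ep()])

    def Ep():
        if peek() == "+":             # choose the rule by looking at the next token
            return ("Ep", [expect("+"), T(), Ep()])
        return ("Ep", [("ε", [])])

    def T():
        if peek() == "(":
            return ("T", [expect("("), E(), expect(")")])
        return ("T", [expect("id")])

    tree = E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def leaves(tree):
    """Terminal symbols of the tree, left to right (ε leaves dropped)."""
    label, children = tree
    if not children:
        return [] if label == "ε" else [label]
    return [sym for child in children for sym in leaves(child)]


tokens = ["id", "+", "(", "id", "+", "id", ")"]
print(leaves(parse(tokens)) == tokens)   # True
```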

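Slide 17: Ambiguity
Ambiguous grammar: a grammar is ambiguous if some phrase of its language has two or more distinct derivation trees.
- Ambiguity is due to a lack of syntactic structure in the grammar
- Ambiguity should be eliminated whenever possible, by
  - revising the grammar, or
  - introducing a disambiguating rule (for example, operator precedence and associativity)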
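Ambiguity can be made concrete by counting derivation trees. This Python sketch counts trees for a hypothetical ambiguous grammar E => E + E | E * E | a by choosing, for each substring, which operator sits at the top of the tree; the phrase a + a * a has two trees, one grouping the addition first and one grouping the multiplication first.

```python
from functools import lru_cache


def count_trees(tokens):
    """Number of derivation trees of E => E + E | E * E | a for the tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                       # trees for E deriving tokens[i:j]
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):      # tokens[k] is the operator at the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_trees(["a", "+", "a", "*", "a"]))   # 2
```

Revising the grammar to E => E + T | T, T => T * F | F, F => a encodes precedence and left associativity, and gives every phrase exactly one tree.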

Slide 18: BNF Variations
Other notational variations
- Example: the notation { … }_i^j (subscript i, superscript j) expresses n occurrences of the enclosed sequence of symbols, for any n with i ≤ n ≤ j
- Extended BNF grammar
  - Adds some extra notations to allow easier description of languages
  - Anything that can be specified with BNF can also be specified with an Extended BNF (EBNF) grammar
  - Increases the readability and writability of the production rules
- Syntax diagram
  - A pictorial technique, equivalent to a BNF grammar
  - Each production rule is represented as a directed graph whose vertices are symbols
    - Terminal symbols: circles
    - Variable symbols: rectangles
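The practical difference between BNF recursion and EBNF repetition shows up when a parser is written by hand. In this Python sketch the grammars are hypothetical: an identifier list written recursively in BNF and with braces in EBNF, where the braces become a loop. The identifier pattern uses a bounded repetition { letter | digit }_0^5 (the bound 5 is our own assumption), which Python's `re` writes as {0,5}.

```python
import re

# <id> ::= letter { letter | digit }_0^5   (hypothetical length bound)
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")


def expect_ident(tokens, pos):
    if pos >= len(tokens) or not IDENT.fullmatch(tokens[pos]):
        raise SyntaxError(f"expected identifier at position {pos}")
    return tokens[pos]


def id_list_bnf(tokens, pos=0):
    """BNF: <id-list> ::= <id> | <id> , <id-list>   (recursion)"""
    first = expect_ident(tokens, pos)
    if pos + 1 < len(tokens) and tokens[pos + 1] == ",":
        return [first] + id_list_bnf(tokens, pos + 2)
    if pos + 1 != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos + 1}")
    return [first]


def id_list_ebnf(tokens):
    """EBNF: <id-list> ::= <id> { , <id> }   (the braces become a loop)"""
    names = [expect_ident(tokens, 0)]
    pos = 1
    while pos < len(tokens) and tokens[pos] == ",":
        names.append(expect_ident(tokens, pos + 1))
        pos += 2
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return names


print(id_list_bnf(["x", ",", "y2"]), id_list_ebnf(["x", ",", "y2"]))
# ['x', 'y2'] ['x', 'y2']
```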

Slide 19: Attribute Grammars
Developed by Donald Knuth in 1968
Powerful and elegant mechanisms that formalize both the context-free and the context-sensitive aspects of a language’s syntax
- Can be used to determine whether a variable has been declared and whether the use of the variable is consistent with its declaration
An extension of a context-free grammar with certain formal primitives
- Enables the syntactic aspects of a language to be specified more precisely
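The declaration check mentioned above can be sketched as attribute evaluation over a small hypothetical grammar, StmtList => Stmt StmtList | ε and Stmt => decl type id | use type id. In this Python illustration (statements are assumed to be already parsed into tuples) the environment of declarations is an inherited attribute passed from left to right, while the updated environment and the list of errors are synthesized attributes returned upward.

```python
def check_stmt(stmt, env):
    """Stmt => decl type id | use type id.
    Inherited attribute: env (declarations seen so far).
    Synthesized attributes: the updated env and a list of errors."""
    kind, typ, name = stmt
    if kind == "decl":
        return {**env, name: typ}, []
    if kind == "use":
        if name not in env:
            return env, [f"{name!r} used before declaration"]
        if env[name] != typ:
            return env, [f"{name!r} declared {env[name]} but used as {typ}"]
        return env, []
    raise ValueError(f"unknown statement kind {kind!r}")


def check_stmt_list(stmts, env=None):
    """StmtList => Stmt StmtList | ε.
    env flows left to right into each statement; errors flow up to the root."""
    env = {} if env is None else env
    if not stmts:                              # ε: env passes through, no errors
        return env, []
    env_after_first, errors_first = check_stmt(stmts[0], env)
    env_out, errors_rest = check_stmt_list(stmts[1:], env_after_first)
    return env_out, errors_first + errors_rest


program = [("decl", "int", "x"), ("use", "int", "x"),
           ("use", "bool", "x"), ("use", "int", "y")]
env, errors = check_stmt_list(program)
print(errors)
# ["'x' declared int but used as bool", "'y' used before declaration"]
```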

