Programming Languages Ezra K. Mugisa, PhD (London), DIC (Imperial College) Visiting Academic Institute Of Computer Science Makerere University Otherwise … Head, Computer Science Section Department of Mathematics & Computer Science The University of the West Indies Kingston, Jamaica
Related Courses MCSC 610: Survey of Computer Languages (3 CU) Content: Organization and types of programming languages. Analysis of imperative, object-oriented, and declarative language paradigms. A study and comparative analysis of high-level languages, fourth-generation languages, and command languages used in the development of software for management information systems. The logical and physical structure of programs and data. Concepts of structured programming. Data structures, file management, and their use in problem-solving. Students will complete a variety of high-level language computer programs. References:
MCSC 611: Programming Languages (3 CU) Content: Formal languages and language hierarchies, syntactic and semantic specification, abstract machines and corresponding languages, context-free languages, abstraction, modularity, and program structure. Fundamental programming language concepts. Analysis of imperative, object-oriented, and declarative language paradigms. Several programming languages will be analyzed. References: 1. Title: Programming Languages; Authors: Terrence W. Pratt, Marvin V. Zelkowitz; ISBN: ; Edition: 4th edition, 2001; Publisher: Prentice Hall 2. Title: Programming Languages: Concepts and Constructs; Author: Ravi Sethi, AT&T Bell Laboratories; ISBN: ; Edition: 2nd edition; Publisher: Addison Wesley
Topics To Be Covered 1. Formal languages and language hierarchies 2. Syntactic and semantic specification 3. Abstract machines and corresponding languages 4. Context-free languages 5. Abstraction, modularity and program structure 6. Fundamental programming language concepts 7. Analysis of programming paradigms and the languages that support them: (a) imperative, (b) object-oriented, (c) declarative 8. Several programming languages will be analysed.
So What Is The Course About? This is not a survey course (see MCSC 610). We shall develop a Theory of Programming Languages (TPL) –i.e. we strive for a thorough understanding of what all programming languages are about. Our TPL will essentially tell a story about programming languages. The TPL will consist of –Coherent facts about Programming Languages –Ways of deriving new facts and verifying them
Course Objectives At the end of the course students should be able to –Define a programming language, essentially as a context-free language –Describe an abstract machine for context-free languages –Identify essential programming language concepts –Relate the (historical) development of programming languages to programming paradigms –Have a unifying view of programming languages
Formal Languages And Language Hierarchies We distinguish between Formal languages and Natural languages. A formal language is man-made (in very simple terms) whereas a natural language develops in less controlled ways. All programming languages are formal. Examples of natural languages include English, Luganda, Swahili, French – basically the languages humans (and other creatures) use for communicating with each other.
Formal Vs. Natural Formal languages are distinguished by an exact syntax (structure) and (hopefully) a precise semantics (meaning) as well. Typically if the syntax of a piece of communication is incorrect it will be rejected by the receiving agent, as its meaning may be difficult to ascertain.
Syntax Is Defined By A Grammar A formal grammar, G, is a finite formal description that defines (generates) a language, L, over some alphabet. An alphabet denotes a finite set of symbols. G defines the set of valid sentences in L. A sentence is a sequence of symbols from a given alphabet. L(G) is "the language defined by G".
Defining A Grammar A grammar is the 4-tuple: G = (N, T, P, S) where –N is a finite set of non-terminals (or rule names) –T is a finite set of terminals (token names) –P is a finite set of productions (rules); (a production has a left-hand and a right-hand side and may be seen as a rewriting rule) –S ∈ N is the start symbol (the highest level rule - the goal)
An Example G1 = (N, T, P, S), where N = {E, I}, T = {1, 2, 3, 4, 5, +, *}, S = E, and P consists of the following productions: E → I; E → E + E; E → E * E; I → 1 | 2 | 3 | 4 | 5
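The 4-tuple above maps directly onto a small data structure. The following is only a sketch: the names `Grammar`, `G1` and `is_well_formed` are illustrative, and the check that N and T are disjoint is a standard convention assumed here rather than something stated on the slide.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """A grammar as the 4-tuple (N, T, P, S)."""
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs (lhs, rhs); rhs is a tuple of symbols
    start: str


# The example grammar: I -> 1 | 2 | 3 | 4 | 5 is written as five productions.
G1 = Grammar(
    nonterminals=frozenset({"E", "I"}),
    terminals=frozenset({"1", "2", "3", "4", "5", "+", "*"}),
    productions=(
        ("E", ("I",)),
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
    ) + tuple(("I", (digit,)) for digit in "12345"),
    start="E",
)


def is_well_formed(g):
    """Check the side conditions of the definition for a context-free grammar."""
    symbols = g.nonterminals | g.terminals
    return (
        g.start in g.nonterminals                      # S is a member of N
        and not (g.nonterminals & g.terminals)         # assumed: N and T are disjoint
        and all(
            lhs in g.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in g.productions
        )
    )


print(is_well_formed(G1))
```

Storing each alternative of `I` as a separate production keeps every rule in the same (left-hand side, right-hand side) shape.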
A Grammar Defines A Language A Grammar G defines language L(G). L(G) is the set of all possible terminal strings w that you can derive by starting at S and repeatedly applying rules (productions). Applying a rule means to replace the left-hand side of the rule with its right-hand side.
α Derives β If you can convert α to β by applying rules, we say that α derives β, written α ⇒* β, where ⇒ means to derive in one step, ⇒* means to derive in zero or more steps, and ⇒+ means to derive in one or more steps.
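An Example The language defined by G1 is denoted by L(G1). L(G1) = {1, 2, 3, 4, 5, 1 + 1, 1 + 2, …, 1 * 1, …, 1 + 2 * 3, …}, that is, every expression formed from the digits 1 to 5 joined by + and *.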
Deriving A String A partially derived string is called a sentential form and may contain both terminals and non-terminals. If S ⇒* α then α is a sentential form, e.g. these sentential forms are generated by G1: I + E, 5 + E, 3 * I. A sentence contains only terminals.
Language L(G) L(G) = {w ∈ T* | S ⇒* w}. That is, L(G) is the set of all sentences that can be reached by a derivation from the start symbol, S.
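The definition of L(G) can be explored by enumerating sentences mechanically. The sketch below, with the illustrative names `PRODUCTIONS` and `sentences`, searches breadth-first over sentential forms of G1 and always expands the leftmost non-terminal. It assumes that no production shortens a sentential form (true of G1), which is what makes it safe to discard forms longer than the bound.

```python
from collections import deque

# Productions of the example grammar G1, keyed by non-terminal.
PRODUCTIONS = {
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E")],
    "I": [("1",), ("2",), ("3",), ("4",), ("5",)],
}


def sentences(start, max_len):
    """Return every sentence of at most max_len symbols derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    found = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is a sentence.
        idx = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if idx is None:
            found.add(" ".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Forms never shrink in G1, so longer forms cannot yield short sentences.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


print(sorted(sentences("E", 3)))
```

With a bound of three symbols the result contains the five digits and every string of the form digit, operator, digit. L(G1) itself is infinite, so only a bounded slice can ever be listed.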
Is A Sentence Generated By A Grammar? This is the first question we try to answer when we compile our programs. If you can find a derivation from S to the target sentence, then that sentence is in L(G). In other words, if w ∈ T* and S ⇒* w then w ∈ L(G).
An Example Is abc generated by the grammar with the following productions? A → Bc; B → ab
Example (Contd.) Yes, the derivation looks like this: A ⇒ Bc ⇒ abc. Therefore, since A ⇒* abc, abc must be a valid sentence of L(G), i.e. abc ∈ L(G).
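The membership question for this small grammar can be answered by searching for a derivation. This is a minimal sketch, not a practical parsing method: `RULES` and `find_derivation` are illustrative names, upper-case letters are taken to be non-terminals, and the length cut-off assumes that no rule shortens a sentential form.

```python
from collections import deque

# Productions A -> Bc and B -> ab; upper-case letters are non-terminals.
RULES = {"A": ["Bc"], "B": ["ab"]}


def find_derivation(start, target):
    """Breadth-first search for a derivation of target; None if there is none."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        for i, symbol in enumerate(form):
            for rhs in RULES.get(symbol, []):
                new_form = form[:i] + rhs + form[i + 1:]
                # Rules here never shrink a form, so overlong forms are dead ends.
                if len(new_form) <= len(target) and new_form not in seen:
                    seen.add(new_form)
                    queue.append(path + [new_form])
    return None


derivation = find_derivation("A", "abc")
print(" => ".join(derivation))
print(find_derivation("A", "abd"))
```

The first call returns the sentential forms A, Bc, abc in order; the second returns `None` because no derivation of abd exists.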
Leftmost And Rightmost Derivations Sometimes you have a choice as to which non-terminal to replace in a sentential form. Let us try to derive the sentence 4 * 5 from G1.
Deriving 4 * 5 From G1 We find that there are multiple derivations. How then should we proceed, or does it matter? The 2 extremes are 1. Replace the leftmost non-terminal first 2. Replace the rightmost non-terminal first
It Does Not Matter Every sentence has both a leftmost and a rightmost derivation. The order of replacement differs, but the same rules are applied for each non-terminal, so the sentence structure is the same and, hence, has the same "meaning"; it is just examined in a different order. We may actually mix the 2 strategies in the same derivation.
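To make the two extremes concrete, the sketch below lists the leftmost and the rightmost derivation of 4 * 5 in G1. It assumes the rule choices are already fixed and recorded as a nested tuple (`TREE`, an illustrative encoding), so only the order of replacement varies.

```python
# Rule choices for 4 * 5, as nested tuples: (non-terminal, child, child, ...).
# Terminals are plain strings.
TREE = ("E", ("E", ("I", "4")), "*", ("E", ("I", "5")))


def show(form):
    """Render a sentential form; unexpanded subtrees show their non-terminal."""
    return " ".join(x[0] if isinstance(x, tuple) else x for x in form)


def derivation(tree, leftmost=True):
    """List the sentential forms, replacing the leftmost (or rightmost)
    non-terminal at every step."""
    form = [tree]
    steps = [show(form)]
    while True:
        pending = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = list(form[i][1:])  # apply the rule recorded at this node
        steps.append(show(form))


print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Both runs apply the same five rules and end at 4 * 5; they differ only in the intermediate sentential forms, for example I * E in the leftmost derivation against E * I in the rightmost.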
Derivation Trees Ignore The Order A derivation tree is a two-dimensional tree that records the derivation from start symbol to sentence. Interior nodes are non-terminals and leaves are terminals. Derivation trees are insensitive to derivation order (leftmost or rightmost): you get the same tree regardless. In fact you can think of them as expanding all non-terminals in parallel. Only changing which rule you apply changes the tree.
Constructing A Derivation Tree Begin construction by creating a root node labelled with start symbol, S. Then, until every leaf node is a terminal node, add a child to non-terminal leaf node, A, for each symbol in α when applying rule A → α.
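The construction procedure can be sketched as follows for the sentence 4 + 5 in G1. The names `Node`, `build_tree` and `render` are illustrative, and the rules are assumed to be supplied in leftmost-derivation order; the code records a derivation that is already known rather than finding one.

```python
NONTERMINALS = {"E", "I"}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def leaves(node):
    """Leaves of the tree, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def build_tree(start, rules):
    """Apply each rule (lhs, rhs) to the leftmost non-terminal leaf."""
    root = Node(start)
    for lhs, rhs in rules:
        leaf = next((n for n in leaves(root) if n.symbol in NONTERMINALS), None)
        if leaf is None or leaf.symbol != lhs:
            raise ValueError(f"rule for {lhs} does not apply here")
        leaf.children = [Node(s) for s in rhs]  # one child per symbol of rhs
    return root


def render(node):
    """Bracketed text form of the tree."""
    if not node.children:
        return node.symbol
    return f"{node.symbol}({' '.join(render(c) for c in node.children)})"


# Leftmost derivation of 4 + 5: E => E + E => I + E => 4 + E => 4 + I => 4 + 5
RULES_4_PLUS_5 = [
    ("E", ("E", "+", "E")),
    ("E", ("I",)),
    ("I", ("4",)),
    ("E", ("I",)),
    ("I", ("5",)),
]

tree = build_tree("E", RULES_4_PLUS_5)
print(render(tree))
print(" ".join(leaf.symbol for leaf in leaves(tree)))
```

Reading the leaves from left to right gives back the sentence, and every interior node is a non-terminal.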
An Example Construct derivation trees for –4 + 5 –4 + 5 * 3
Does Rule Order Matter? To check w ∈ L(G): 1. you choose which non-terminal in a sentential form to replace and 2. which rule to apply. For a valid sentence, there will normally be exactly one replacement choice that will lead to a complete derivation. What happens when you have a choice of rules to apply, both of which yield valid derivations?
Ambiguous Grammar When you have a choice of rules in a valid derivation this means that … The derived sentence has multiple interpretations (or meanings) Your grammar is said to be ambiguous
An (Ambiguous) Example Let us derive * 3
Derivation Tree T1
Derivation Tree T2
Two Different Meanings Notice that grammar G1 does not assign meaning to the symbols or to the productions. If we assign the normal arithmetic meaning to the symbols used in G1: Derivation tree T1 gives: Derivation tree T2 gives:
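The effect of ambiguity on meaning can be checked with a short sketch. It assumes the sentence 4 + 5 * 3 (taken from the earlier derivation-tree exercise) and uses the illustrative names `TREE_A` and `TREE_B` for its two derivation trees in G1; which of these corresponds to T1 and which to T2 is not assumed.

```python
# Two derivation trees in G1 for the same sentence, as nested tuples:
# (non-terminal, child, child, ...); terminals are plain strings.
FOUR = ("E", ("I", "4"))
FIVE = ("E", ("I", "5"))
THREE = ("E", ("I", "3"))

TREE_A = ("E", FOUR, "+", ("E", FIVE, "*", THREE))   # groups as 4 + (5 * 3)
TREE_B = ("E", ("E", FOUR, "+", FIVE), "*", THREE)   # groups as (4 + 5) * 3


def frontier(tree):
    """Terminal symbols at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [s for child in tree[1:] for s in frontier(child)]


def evaluate(tree):
    """Give the symbols their normal arithmetic meaning."""
    symbol, *children = tree
    if symbol == "I":
        return int(children[0])
    if len(children) == 1:            # E -> I
        return evaluate(children[0])
    left, op, right = children        # E -> E + E or E -> E * E
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


print(" ".join(frontier(TREE_A)), "=", evaluate(TREE_A))
print(" ".join(frontier(TREE_B)), "=", evaluate(TREE_B))
```

Both trees have the same leaves, so they derive the same sentence, yet they evaluate to 19 and 27 respectively.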
Can We Tell? There is no algorithm that decides whether an arbitrary CFG is ambiguous; the problem is undecidable. In contrast, certain subclasses of CFGs can be shown to be unambiguous. Specifically, if a deterministic parser generator (for example an LL(k) or LR(k) tool) accepts the grammar without reporting conflicts, the grammar is unambiguous. There are tools that can help us.
Grammar Types In the 1950s, Noam Chomsky defined a classification or hierarchy of grammars that neatly categorises the difficulty of describing a language. There are four categories: Type-0: unrestricted; Type-1: Context-sensitive grammar; Type-2: Context-free grammar (CFG); Type-3: Regular grammar. The difference is in the form of the productions.
Comparison Power: Type-3 ⊂ Type-2 ⊂ Type-1 ⊂ Type-0. Type-0 and Type-1 languages are not used for computer languages because their power is not needed and no efficient means of generating parsers for them is known. Fortunately, for most things, a context-free grammar immediately presents itself or you can rephrase your problem such that a context-free grammar suffices.
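Unrestricted Type-0: unrestricted; all formal grammars. Generates all languages recognizable by a Turing machine (the recursively enumerable languages). That is, you can write a program by hand that accepts every sentence of the language, even if it's hard to do, although that program may not halt on strings outside the language. Grammars have no restrictions on their form.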
Context-Sensitive Grammar Type-1: Context-sensitive grammar. Grammars where you can restrict the validity of applying a rule to a certain context. Generates the context-sensitive languages. Has productions of the form: αAβ → αγβ where A ∈ N; α, β, γ ∈ (N ∪ T)*; and γ is non-empty (but you can have S → ε as long as S is not on the right-hand side of a production). Replace A with γ, but only in the context of α and β.
Context-Free Grammar Type-2: Context-free grammar. Generates the context-free languages. These languages can all be recognised with pushdown automata. Productions have the form: A → α where A ∈ N and α ∈ (N ∪ T)*
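Regular Grammar Type-3: Regular grammars. Generates the regular languages; these can be recognized with simple deterministic finite automata (DFA). Productions have the form: A → a or A → aB (right-linear), or alternatively A → a or A → Ba (left-linear), where A, B ∈ N and a ∈ T. The two forms must not be mixed in one grammar, since a mixture can generate languages that are not regular. You can have S → ε as long as S is not on the right-hand side of a production. We use regular expressions in practice.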
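Because the grammar types differ only in the form of their productions, the Type-2 and Type-3 conditions can be checked mechanically. The sketch below uses illustrative names (`is_context_free`, `is_regular`) and example grammars of its own; it treats any symbol not in N as a terminal, ignores the S → ε special case, and treats a regular grammar as either wholly right-linear or wholly left-linear.

```python
def is_context_free(productions, nonterminals):
    """Type-2 form: every left-hand side is a single non-terminal."""
    return all(len(lhs) == 1 and lhs[0] in nonterminals for lhs, _ in productions)


def _is_linear(productions, nonterminals, right):
    def ok(rhs):
        if len(rhs) == 1:                      # A -> a
            return rhs[0] not in nonterminals
        if len(rhs) == 2:                      # A -> aB (right) or A -> Ba (left)
            a, b = rhs if right else rhs[::-1]
            return a not in nonterminals and b in nonterminals
        return False

    return is_context_free(productions, nonterminals) and all(
        ok(rhs) for _, rhs in productions
    )


def is_regular(productions, nonterminals):
    """Type-3 form: all productions right-linear, or all left-linear."""
    return _is_linear(productions, nonterminals, True) or _is_linear(
        productions, nonterminals, False
    )


# G1 from the earlier example: context-free but not regular in form.
G1_N = {"E", "I"}
G1_P = [
    (("E",), ("I",)),
    (("E",), ("E", "+", "E")),
    (("E",), ("E", "*", "E")),
] + [(("I",), (d,)) for d in "12345"]

# Hypothetical right-linear grammar for one or more 1s: S -> 1 S | 1
ONES_P = [(("S",), ("1", "S")), (("S",), ("1",))]

# Hypothetical grammar mixing both linear forms: S -> a S | S b | a
MIXED_P = [(("S",), ("a", "S")), (("S",), ("S", "b")), (("S",), ("a",))]

print(is_context_free(G1_P, G1_N), is_regular(G1_P, G1_N))
print(is_regular(ONES_P, {"S"}), is_regular(MIXED_P, {"S"}))
```

These functions classify the form of a grammar, not the language it generates: a language may be regular even when the particular grammar given for it is not.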
Syntactic and Semantic Specification
Abstract Machines and Corresponding Languages
Context-Free Languages
Abstraction
Modularity
Program Structure
Fundamental Programming Language Concepts
Analysis of Language Paradigms A Paradigm is (From Wikipedia): –(in the vernacular) a pattern behind a set of typical examples of something –(in philosophy of science) a Kuhnian paradigm –(in experimental science) an experimental setup –(in computing) a paradigm is a style of programming, usually enforced by the programming language used –(in linguistics) an inflection paradigm 1. We look at 3 programming paradigms and language support for them. 2. The three are: –imperative, –object-oriented, and –declarative language paradigms. 3. Several programming languages will be analysed.