Description of programming languages
Using regular expressions and context free grammars

Introduction
Programming languages must be described in an exact language
  – So there is no discussion about whether a language element is legal or not
I will introduce 2 description languages
  – Regular expressions
      Used to describe the “small” parts of a programming language
      – Identifiers, numbers, etc.
  – Context free grammars
      Used to describe the “bigger” parts of a programming language
      – Expressions, statements, classes, etc.

Regular expressions defined
We need an alphabet called Σ
  – Example alphabets: ASCII, Unicode
Regular expressions are sets of strings
  – Ø (the empty set) is a regular expression
  – { ε } is a regular expression
      ε means the empty string
  – All sets {a} where a is in the alphabet Σ are regular expressions
  – From two regular expressions R and S we can generate more regular expressions
      R | S   The union R ∪ S
      RS      Concatenations of strings from R and from S
      R*      0 or more concatenated strings from R; if R is {a} then R* is {ε, a, aa, aaa, … }

Regular expressions examples
Set of non-negative integers (0 and leading zeros are also matched)
  – (0|1|2|3|4|5|6|7|8|9) (0|1|2|3|4|5|6|7|8|9)*
Set of words in English
  – (a|b|…|z)(a|b|…|z)*
  – Not exactly English … bbz is in the set, but is not an English word

Regular expressions, short hand notation
R+ means R R*
  – 1 or more occurrences
R? means ε | R
  – 0 or 1 occurrence
[a-z] means a|b|c|…|z
[a-zA-Z] means [a-z] | [A-Z]
Examples
  – Integer: -?[0-9]+
  – Identifier: [a-zA-Z][a-zA-Z0-9]*
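The two example patterns can be tried out directly with java.util.regex. This is a minimal sketch; the class name ShorthandDemo and its helper methods are made up for illustration, only the two patterns come from the slide.

```java
import java.util.regex.Pattern;

class ShorthandDemo {
    // The two patterns from the slide, compiled once.
    static final Pattern INTEGER = Pattern.compile("-?[0-9]+");
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // matches() succeeds only if the WHOLE string belongs to the set.
    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-42"));    // true
        System.out.println(isInteger("4x"));     // false
        System.out.println(isIdentifier("x1"));  // true
        System.out.println(isIdentifier("1x"));  // false
    }
}
```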

Regular expressions in Java
Java API which uses regular expressions
  – Class String
      String[] split(String regex)
      "Java is my favorite language".split(" ")
      – produces an array {"Java", "is", "my", "favorite", "language"}
      – " " is a very simple regular expression
  – Package java.util.regex
      Class Pattern
      Class Matcher
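A small sketch of both APIs in use, assuming Java 8 or later. The class name SplitDemo, the helper methods and the sample input "a1b22c333" are illustrative only; split, Pattern.compile, matcher, find and group are the standard library calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDemo {
    // String.split takes a regular expression as its separator.
    static String[] words(String text) {
        return text.split(" ");
    }

    // Pattern + Matcher: collect every substring that matches [0-9]+.
    static List<String> numbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("[0-9]+").matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints Java,is,my,favorite,language
        System.out.println(String.join(",", words("Java is my favorite language")));
        // prints [1, 22, 333]
        System.out.println(numbers("a1b22c333"));
    }
}
```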

What regular expressions can’t do
Regular expressions can only describe simple languages.
Regular expressions have no “memory”
  – Cannot describe arbitrarily deep nested parenthesis structures
      (((a + b) + c) + d)
      if (…) { if (…) … else …} else …
We need something stronger!
  – Context free grammars
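To see what kind of “memory” is missing, here is a hand-written check for balanced parentheses. It is only a sketch (the class ParenDemo is hypothetical), but it shows that the check needs a counter that can grow without bound, which is exactly what a regular expression in the sense defined above does not have.

```java
class ParenDemo {
    // Returns true if every '(' has a matching ')' and no ')' comes too early.
    // All other characters are ignored.
    static boolean balanced(String s) {
        int depth = 0; // the "memory": number of currently open parentheses
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' without a matching '('
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(balanced("(((a + b) + c) + d)")); // true
        System.out.println(balanced("((a + b)"));            // false
    }
}
```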

Context free grammars defined
A context free grammar consists of 4 parts
  – V is an alphabet
  – Σ is a set of terminals, Σ ⊂ V
      The elements of the set V − Σ are called non-terminals
  – R is a set of production rules, R ⊆ (V − Σ) × V*
  – S is the start symbol, S ∈ V − Σ

Context free grammars examples
Example: non-empty strings of a and b
  – Alphabet { a, b, A }
  – Terminals { a, b }
      Non-terminals { A }
  – Productions { A → Aa, A → Ab, A → a, A → b }
  – Start symbol A
  – Some derivations
      A → Aa → Aaa → Abaa → abaa
      A → Ab → ab
      A → Ab → bb
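A derivation is just repeated string rewriting, which can be replayed in a few lines. The sketch below (the class DerivationDemo and method step are made up for illustration) replaces the leftmost A with the right-hand side of a chosen production and reproduces the first derivation from the slide.

```java
class DerivationDemo {
    // Apply the production A -> rhs to the leftmost A in the current string.
    static String step(String form, String rhs) {
        int i = form.indexOf('A');
        if (i < 0) {
            throw new IllegalStateException("no non-terminal left to rewrite");
        }
        return form.substring(0, i) + rhs + form.substring(i + 1);
    }

    public static void main(String[] args) {
        String form = "A"; // start symbol
        // Productions used, in order: A -> Aa, A -> Aa, A -> Ab, A -> a
        for (String rhs : new String[] {"Aa", "Aa", "Ab", "a"}) {
            form = step(form, rhs);
            System.out.println(form); // prints Aa, Aaa, Abaa, abaa (one per line)
        }
    }
}
```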

Example: Boolean expressions
We only state the productions explicitly
  – Terminals and non-terminals can be inferred by looking at the productions
  – Convention
      Capital letters: Non-terminals
      Non-capital letters: Terminals
Boolean expressions
  – E → true
  – E → false
  – E → E && E
  – E → E || E
  – E → (E)
  – E → !E
  – Derivations
      E → E && E → E && (E) → E && (E || E) →* true && (false || true)
Sometimes pictured as a (parse) tree.
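One possible way to turn this grammar into a parser is recursive descent. The grammar as stated is ambiguous and left-recursive, so the sketch below assumes an equivalent layered form with Java's usual precedence (! binds tighter than &&, which binds tighter than ||). That rewrite, the class name BoolParser and the choice to evaluate while parsing are assumptions of this sketch, not part of the slide. Spaces are simply removed before parsing, which is a simplification.

```java
class BoolParser {
    private final String src;
    private int pos = 0;

    private BoolParser(String text) {
        this.src = text.replace(" ", ""); // simplification: ignore all spaces
    }

    // Parse and evaluate a complete Boolean expression.
    static boolean eval(String text) {
        BoolParser p = new BoolParser(text);
        boolean value = p.or();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Consume tok if it is next in the input.
    private boolean eat(String tok) {
        if (src.startsWith(tok, pos)) {
            pos += tok.length();
            return true;
        }
        return false;
    }

    // Or -> And ( "||" And )*
    private boolean or() {
        boolean value = and();
        while (eat("||")) {
            boolean right = and(); // always parse the right operand
            value = value || right;
        }
        return value;
    }

    // And -> Not ( "&&" Not )*
    private boolean and() {
        boolean value = not();
        while (eat("&&")) {
            boolean right = not();
            value = value && right;
        }
        return value;
    }

    // Not -> "!" Not | Primary
    private boolean not() {
        if (eat("!")) {
            return !not();
        }
        return primary();
    }

    // Primary -> "true" | "false" | "(" Or ")"
    private boolean primary() {
        if (eat("true")) return true;
        if (eat("false")) return false;
        if (eat("(")) {
            boolean value = or();
            if (!eat(")")) {
                throw new IllegalArgumentException("expected ) at " + pos);
            }
            return value;
        }
        throw new IllegalArgumentException("expected expression at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(eval("true && (false || true)")); // true
    }
}
```

Each method corresponds to one non-terminal of the layered grammar, so the call stack at any moment mirrors a path in the parse tree.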

What context free grammars can’t do
Context free grammars cannot be used to check that a variable is declared before it is used
  – Nor can they check the variable’s type
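Such checks are instead done with extra bookkeeping while walking the program. The sketch below uses an invented mini format, where each statement is either "declare NAME" or "use NAME", and a set of declared names as a very small symbol table. The format and the class DeclCheck are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeclCheck {
    // Returns the names that are used before they have been declared.
    // Each statement is "declare NAME" or "use NAME" (hypothetical format).
    static List<String> undeclared(String[] statements) {
        Set<String> symbols = new HashSet<>(); // names declared so far
        List<String> errors = new ArrayList<>();
        for (String statement : statements) {
            String[] parts = statement.split(" ");
            if (parts[0].equals("declare")) {
                symbols.add(parts[1]);
            } else if (!symbols.contains(parts[1])) {
                errors.add(parts[1]);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String[] program = {"declare x", "use x", "use y", "declare y"};
        System.out.println(undeclared(program)); // prints [y]
    }
}
```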

The phases of a compiler
Lexical analysis (scanning)
  – Using regular expressions
Syntax analysis (parsing)
  – Using context free grammars
Semantic analysis
  – Using a symbol table
Code generation
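As an illustration of the first phase, here is a minimal scanner sketch built on java.util.regex. It assumes Java 7 or later for named groups. The token classes NUM, ID and OP, the operator set and the class name ScanDemo are choices made for this sketch, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanDemo {
    // One alternative per token class; leading whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<NUM>[0-9]+)|(?<ID>[a-zA-Z][a-zA-Z0-9]*)|(?<OP>[-+*/=()]))");

    // Split the input into tokens written as "CLASS:text".
    static List<String> scan(String text) {
        String input = text.trim(); // so that no whitespace is left at the end
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) { // no token starts at this position
                throw new IllegalArgumentException("bad character at " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("NUM:" + m.group("NUM"));
            } else if (m.group("ID") != null) {
                tokens.add("ID:" + m.group("ID"));
            } else {
                tokens.add("OP:" + m.group("OP"));
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:x1, OP:=, NUM:42, OP:+, ID:y]
        System.out.println(scan("x1 = 42 + y"));
    }
}
```

The token list produced here is what a parser for a context free grammar would take as its input in the next phase.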

References
Wikipedia
  – Regular expression
  – Context-free grammar
Friedl: Mastering Regular Expressions, 2nd edition, O’Reilly 2002
  – An entire book (460 pages) devoted to regular expressions
J2SE 5.0 API specification
  – package java.util.regex
Scott A. Hommel: Regular Expressions, The Java Tutorial
Lewis & Papadimitriou: Elements of the Theory of Computation, Pearson 1997
  – Introduction to regular expressions and context free grammars (and a lot more)
Aho, Sethi & Ullman: Compilers: Principles, Techniques and Tools, Addison Wesley 1986
  – A famous book on compilers.
  – Referred to as “The Dragon Book”