
1 1 Languages and Compilers (SProg og Oversættere) Lecture 15 (2) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson whose slides this lecture is based on.

2 2 Curricula (Studieordning) The purpose of the course is for the student to gain knowledge of important principles in programming languages and for the student to gain an understanding of techniques for describing and compiling programming languages.

3 3 What was this course about? Programming Language Design –Concepts and Paradigms –Ideas and philosophy –Syntax and Semantics Compiler Construction –Tools and Techniques –Implementations –The nuts and bolts

4 4 The principal paradigms Imperative Programming (C) Object-Oriented Programming (C++) Logic/Declarative Programming (Prolog) Functional/Applicative Programming (Lisp) New paradigms? –Agent Oriented Programming –Business Process Oriented (Web computing) –Grid Oriented –Aspect Oriented Programming

5 5 Criteria in a good language design Readability –understand and comprehend a computation easily and accurately Write-ability –express a computation clearly, correctly, concisely, and quickly Reliability –assures a program will not behave in unexpected or disastrous ways Orthogonality –A relatively small set of primitive constructs can be combined in a relatively small number of ways –Every possible combination is legal –Lack of orthogonality leads to exceptions to rules

6 6 Criteria (Continued) Uniformity –similar features should look similar and behave similarly Maintainability –errors can be found and corrected and new features added easily Generality –avoid special cases in the availability or use of constructs, and combine closely related constructs into a single more general one Extensibility –provide some general mechanism for the user to add new constructs to a language Standardability –allow programs to be transported from one computer to another without significant change in language structure Implementability –ensure a translator or interpreter can be written

7 7 Tennent’s Language Design principles

8 8 Important! Syntax is the visible part of a programming language –Programming Language designers can waste a lot of time discussing unimportant details of syntax The language paradigm is the next most visible part –The choice of paradigm, and therefore language, depends on how humans best think about the problem –There are no right models of computations – just different models of computations, some more suited for certain classes of problems than others The most invisible part is the language semantics –Clear semantics usually leads to simple and efficient implementations

9 9 Levels of Programming Languages High-level program: class Triangle {... float surface() { return b*h/2; } } Low-level program: LOAD r1,b LOAD r2,h MUL r1,r2 DIV r1,#2 RET Executable machine code: 0001001001000101 0010010011101100 10101101001...

10 10 Terminology A translator takes a source program as input and produces an object program as output. The source program is expressed in the source language, the translator is expressed in the implementation language, and the object program is expressed in the target language. Q: Which programming languages play a role in this picture? A: All of them!

11 11 Tombstone Diagrams What are they? –diagrams consisting of a set of “puzzle pieces” we can use to reason about language processors and programs –different kinds of pieces –combination rules (not all diagrams are “well formed”) The pieces: a machine M implemented in hardware; a translator from S to T implemented in L; an interpreter for language M implemented in L; a program P implemented in L.

12 12 Syntax Specification Syntax is specified using “Context Free Grammars”: –A finite set of terminal symbols –A finite set of non-terminal symbols –A start symbol –A finite set of production rules A CFG defines a set of strings –This is called the language of the CFG.

13 13 Backus-Naur Form Usually CFGs are written in BNF notation. A production rule in BNF notation is written as: N ::= α where N is a non-terminal and α is a sequence of terminals and non-terminals. N ::= α | β | ... is an abbreviation for several rules with N as left-hand side.

14 14 Concrete and Abstract Syntax The previous grammar specified the concrete syntax of Mini Triangle. The concrete syntax is important for the programmer who needs to know exactly how to write syntactically well-formed programs. The abstract syntax omits irrelevant syntactic details and only specifies the essential structure of programs. Example: different concrete syntaxes for an assignment: v := e, (set! v e), e -> v, v = e

15 15 Abstract Syntax Trees Abstract Syntax Tree for: d:=d+10*n [Figure: tree rooted at AssignmentCmd, built from the node kinds SimpleVName, VNameExp, IntegerExp and BinaryExpression, with leaves Ident (d, n), Op (+, *) and Int-Lit (10)]
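As a rough illustration of how such a tree might be represented in a compiler written in Java, the sketch below builds the AST for d := d + 10*n. The class names follow the node labels on the slide; the constructors, the `show` method and the `AstDemo` class are assumptions made for this example. It also assumes Triangle-style grouping, where binary operators have no precedence and associate to the left, so d + 10 becomes the left operand of the multiplication.

```java
// Minimal AST node classes; names follow the node labels on the slide.
abstract class Expression {
    abstract String show();
}

class IntegerExp extends Expression {
    final int value;
    IntegerExp(int value) { this.value = value; }
    String show() { return Integer.toString(value); }
}

class VNameExp extends Expression {
    final String ident;
    VNameExp(String ident) { this.ident = ident; }
    String show() { return ident; }
}

class BinaryExpression extends Expression {
    final Expression left;
    final char op;
    final Expression right;
    BinaryExpression(Expression left, char op, Expression right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }
    // Parenthesize so the tree structure stays visible in the printed form.
    String show() { return "(" + left.show() + " " + op + " " + right.show() + ")"; }
}

class AssignmentCmd {
    final String target;
    final Expression value;
    AssignmentCmd(String target, Expression value) {
        this.target = target;
        this.value = value;
    }
    String show() { return target + " := " + value.show(); }
}

class AstDemo {
    // Builds the tree for d := d + 10*n, grouped left to right (an assumption).
    static AssignmentCmd build() {
        Expression sum = new BinaryExpression(new VNameExp("d"), '+', new IntegerExp(10));
        return new AssignmentCmd("d", new BinaryExpression(sum, '*', new VNameExp("n")));
    }

    public static void main(String[] args) {
        System.out.println(build().show()); // prints "d := ((d + 10) * n)"
    }
}
```

The concrete syntax (spacing, the exact assignment symbol) is gone from the tree; only the essential structure remains.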

16 16 Contextual Constraints Syntax rules alone are not enough to specify the format of well-formed programs. Example 1: let const m~2 in m + x (x is undefined: violates the scope rules) Example 2: let const m~2 ; var n:Boolean in begin n := m<4; n := n+1 end (n+1 is a type error: violates the type rules)

17 17 Semantics Specification of semantics is concerned with specifying the “meaning” of well-formed programs. Terminology: Expressions are evaluated and yield values (and may or may not perform side effects). Commands are executed and perform side effects. Declarations are elaborated to produce bindings. Side effects: change the values of variables, perform input/output

18 18 Phases of a Compiler A compiler’s phases are steps in transforming source code into object code. The different phases correspond roughly to the different parts of the language specification: Syntax analysis <-> Syntax; Contextual analysis <-> Contextual constraints; Code generation <-> Semantics

19 19 The “Phases” of a Compiler Source Program -> Syntax Analysis -> Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree -> Code Generation -> Object Code. Error Reports are produced along the way.

20 20 Compiler Passes A pass is a complete traversal of the source program, or a complete traversal of some internal representation of the source program. A pass can correspond to a “phase” but it does not have to! Sometimes a single “pass” corresponds to several phases that are interleaved in time. What and how many passes a compiler does over the source program is an important design decision.

21 21 Single Pass Compiler A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once. Dependency diagram of a typical Single Pass Compiler: Compiler Driver calls Syntactic Analyzer; Syntactic Analyzer calls Contextual Analyzer and Code Generator.

22 22 Multi Pass Compiler A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases. Dependency diagram of a typical Multi Pass Compiler: Compiler Driver calls Syntactic Analyzer (input: Source Text, output: AST), Contextual Analyzer (input: AST, output: Decorated AST) and Code Generator (input: Decorated AST, output: Object Code).

23 23 Syntax Analysis Dataflow chart: Source Program (Stream of Characters) -> Scanner -> Stream of “Tokens” -> Parser -> Abstract Syntax Tree. Both Scanner and Parser can produce Error Reports.

24 24 Regular Expressions REs are a notation for expressing a set of strings of terminal symbols. Different kinds of RE: ε: the empty string. t: generates only the string t. X Y: generates any string xy such that x is generated by X and y is generated by Y. X | Y: generates any string which is generated either by X or by Y. X*: the concatenation of zero or more strings generated by X. (X): for grouping.
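The RE operators above map directly onto `java.util.regex`. The following is a small sketch, not part of the lecture: the token definitions (an identifier is a letter followed by letters or digits, an integer literal is one or more digits) and the class name `TokenPatterns` are assumptions chosen for illustration. Character classes such as `[0-9]` are just shorthand for the alternation 0 | 1 | ... | 9.

```java
import java.util.regex.Pattern;

class TokenPatterns {
    // Identifier: Letter (Letter | Digit)*
    static final Pattern IDENT = Pattern.compile("[a-zA-Z]([a-zA-Z]|[0-9])*");

    // Integer literal: Digit Digit*  (one or more digits)
    static final Pattern INT_LIT = Pattern.compile("[0-9][0-9]*");

    // matches() requires the whole string to be generated by the RE.
    static boolean isIdentifier(String s) {
        return IDENT.matcher(s).matches();
    }

    static boolean isIntLiteral(String s) {
        return INT_LIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("n1"));  // true
        System.out.println(isIdentifier("1n"));  // false
        System.out.println(isIntLiteral("10"));  // true
    }
}
```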

25 25 FA and the implementation of Scanners Regular expressions, NDFA-ε, NDFA and DFA are all equivalent formalisms in terms of what languages can be defined with them. Regular expressions are a convenient notation for describing the “tokens” of programming languages. Regular expressions can be converted into FAs (the algorithm for conversion into NDFA-ε is straightforward). DFAs can be easily implemented as computer programs.
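To make the last point concrete, here is a minimal sketch of a hand-coded DFA that classifies a lexeme as an identifier or an integer literal. The state numbering, the `Kind` enum and the class name are assumptions for this example; a real scanner would also handle whitespace, operators and keywords.

```java
class IntOrIdentDfa {
    enum Kind { IDENTIFIER, INT_LITERAL, ERROR }

    // States: 0 = start, 1 = inside identifier, 2 = inside integer literal,
    // -1 = dead state (no transition defined).
    static int step(int state, char c) {
        switch (state) {
            case 0:
                if (Character.isLetter(c)) return 1;
                if (Character.isDigit(c)) return 2;
                return -1;
            case 1:
                return Character.isLetterOrDigit(c) ? 1 : -1;
            case 2:
                return Character.isDigit(c) ? 2 : -1;
            default:
                return -1; // the dead state has no way out
        }
    }

    // Runs the DFA over the whole lexeme; states 1 and 2 are accepting.
    static Kind classify(String lexeme) {
        int state = 0;
        for (int i = 0; i < lexeme.length(); i++) {
            state = step(state, lexeme.charAt(i));
        }
        if (state == 1) return Kind.IDENTIFIER;
        if (state == 2) return Kind.INT_LITERAL;
        return Kind.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify("count1")); // IDENTIFIER
        System.out.println(classify("42"));     // INT_LITERAL
        System.out.println(classify("4x"));     // ERROR
    }
}
```

Each `case` is one DFA state and each `if` is one outgoing transition, which is why the translation from automaton to program is mechanical.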

26 26 Parsing Parsing == Recognition + determining phrase structure (for example by generating AST) –Different types of parsing strategies bottom up top down

27 27 Top-Down vs Bottom-Up parsing LL analysis (top-down): look-ahead and derivation. LR analysis (bottom-up): look-ahead and reduction.

28 28 Development of Recursive Descent Parser (1) Express grammar in EBNF (2) Grammar transformations: left factorization and left recursion elimination (3) Create a parser class with –private variable currentToken –methods to call the scanner: accept and acceptIt (4) Implement private parsing methods: –add a private parseN method for each non-terminal N –add a public parse method that gets the first token from the scanner and calls parseS (S is the start symbol of the grammar)
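The four steps above can be sketched for a tiny expression grammar. Everything specific here is an assumption for illustration: the grammar (Expression ::= Primary (('+' | '*') Primary)* and Primary ::= name-or-number | '(' Expression ')'), the whitespace-separated token list that stands in for a real scanner, and the class name `MiniParser`. The parser returns a fully parenthesized string so the recovered phrase structure is visible.

```java
import java.util.ArrayList;
import java.util.List;

class MiniParser {
    private final List<String> tokens = new ArrayList<>(); // stands in for the scanner
    private int pos = 0;
    private String currentToken;

    MiniParser(String source) {
        for (String t : source.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        tokens.add("<eot>"); // end-of-text marker
    }

    // Fetches the next token unconditionally.
    private void acceptIt() {
        currentToken = tokens.get(pos++);
    }

    // Checks that the current token is the expected one, then fetches the next.
    private void accept(String expected) {
        if (!currentToken.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + currentToken);
        }
        acceptIt();
    }

    // Expression ::= Primary (('+' | '*') Primary)*
    private String parseExpression() {
        String left = parsePrimary();
        while (currentToken.equals("+") || currentToken.equals("*")) {
            String op = currentToken;
            acceptIt();
            String right = parsePrimary();
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // Primary ::= name-or-number | '(' Expression ')'
    private String parsePrimary() {
        if (currentToken.equals("(")) {
            acceptIt();
            String inner = parseExpression();
            accept(")");
            return inner;
        }
        if (currentToken.matches("[A-Za-z0-9]+")) {
            String leaf = currentToken;
            acceptIt();
            return leaf;
        }
        throw new IllegalStateException("unexpected token " + currentToken);
    }

    // Gets the first token, parses the start symbol, and checks for end of text.
    public String parse() {
        acceptIt();
        String tree = parseExpression();
        if (!currentToken.equals("<eot>")) {
            throw new IllegalStateException("unexpected token " + currentToken);
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(new MiniParser("d + 10 * n").parse()); // prints "((d + 10) * n)"
    }
}
```

One token of lookahead (currentToken) is enough to choose every alternative here, which is what makes this grammar suitable for the method.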

29 29 LL(1) Grammars The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so-called LL(1) grammars. Basically, an LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token. What grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)? –We can deduce the necessary conditions from the parser generation algorithm. –We can use a formal definition

30 30 Converting EBNF into RD parsers The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated! => JavaCC “Java Compiler Compiler”

31 31 JavaCC and JJTree

32 32 LR parsing –The algorithm makes use of a stack. –The first item on the stack is the initial state of a DFA –A state of the automaton is a set of LR(0)/LR(1) items. –The initial state is constructed from productions of the form S ::= •α [, $] (where S is the start symbol of the CFG) –The stack contains, in alternating order: a DFA state; a terminal symbol or part (subtree) of the parse tree being constructed –The items on the stack are related by transitions of the DFA –There are two basic actions in the algorithm: shift: get next input token; reduce: build a new node (remove children from stack)

33 33 Bottom Up Parsers: Overview of Algorithms LR(0) : The simplest algorithm, theoretically important but rather weak (not practical) SLR : An improved version of LR(0) more practical but still rather weak. LR(1) : LR(0) algorithm with extra lookahead token. –very powerful algorithm. Not often used because of large memory requirements (very big parsing tables) LALR : “Watered down” version of LR(1) –still very powerful, but has much smaller parsing tables –most commonly used algorithm today

34 34 JavaCUP: A LALR generator for Java Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class (uses the scanner to get tokens, parses the stream of tokens). Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class (recognizes tokens). Together the two generated classes form the Syntactic Analyzer.

35 35 Steps to build a compiler with SableCC 1. Create a SableCC specification file 2. Call SableCC 3. Create one or more working classes, possibly inherited from classes generated by SableCC 4. Create a Main class activating lexer, parser and working classes 5. Compile with javac

36 36 Contextual Analysis Phase Purposes: –Finish syntax analysis by deriving context-sensitive information –Associate semantic routines with individual productions of the context free grammar or subtrees of the AST –Start to interpret meaning of program based on its syntactic structure –Prepare for the final stage of compilation: Code generation

37 37 Contextual Analysis -> Decorated AST [Figure: decorated AST for a Program consisting of a LetCommand that declares n: Integer and c: Char and then executes c := ‘&’; n := n+1. Annotations: links from applied identifier occurrences to their declarations (result of identification) and :char / :int labels on expressions (result of type checking).]

38 38 Nested Block Structure A language exhibits nested block structure if blocks may be nested one within another (typically with no upper bound on the level of nesting that is allowed). There can be any number of scope levels (depending on the level of nesting of blocks). Typical scope rules: –no identifier may be declared more than once within the same block (at the same level). –for any applied occurrence there must be a corresponding declaration, either within the same block or in a block in which it is nested.
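Both scope rules can be enforced with an identification table that keeps one map per open block. The sketch below is an assumption-laden illustration, not the table used in the course: the method names `openScope`, `closeScope`, `enter` and `retrieve`, and the use of plain strings as attributes, are choices made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class IdTable {
    // One map per open block; the innermost block is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void openScope() { scopes.push(new HashMap<>()); }

    void closeScope() { scopes.pop(); }

    // Rule 1: an identifier may be declared only once per block.
    // Returns false if id is already declared in the current block.
    // Assumes openScope() has been called at least once.
    boolean enter(String id, String attr) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(id)) return false;
        current.put(id, attr);
        return true;
    }

    // Rule 2: search the current block first, then the enclosing blocks.
    // Returns null if no declaration is found (an undeclared identifier).
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {
            String attr = scope.get(id);
            if (attr != null) return attr;
        }
        return null;
    }

    public static void main(String[] args) {
        IdTable table = new IdTable();
        table.openScope();
        table.enter("m", "const Integer");
        table.openScope();
        table.enter("m", "var Char");              // legal: a new block hides the outer m
        System.out.println(table.retrieve("m"));   // var Char
        table.closeScope();
        System.out.println(table.retrieve("m"));   // const Integer
        System.out.println(table.retrieve("x"));   // null
    }
}
```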

39 39 Type Checking For most statically typed programming languages, type checking is a bottom up algorithm over the AST: Types of expression AST leaves are known immediately: –literals => obvious –variables => from the ID table –named constants => from the ID table Types of internal nodes are inferred from the type of the children and the type rule for that kind of expression
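A minimal sketch of this bottom-up scheme follows. The `Type` enum, the node classes and the two type rules ("+" maps two integers to an integer, "<" maps two integers to a Boolean) are assumptions made for the example; the ID table is reduced to a map from names to types. The sample table mirrors the earlier example with const m~2 and var n:Boolean.

```java
import java.util.HashMap;
import java.util.Map;

class TypeCheckDemo {
    enum Type { INT, BOOL, ERROR }

    interface Expr {
        Type check(Map<String, Type> idTable);
    }

    // Leaf: the type of a literal is known immediately.
    static class IntLit implements Expr {
        public Type check(Map<String, Type> idTable) { return Type.INT; }
    }

    // Leaf: the type of a variable or constant comes from the ID table.
    static class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public Type check(Map<String, Type> idTable) {
            return idTable.getOrDefault(name, Type.ERROR);
        }
    }

    // Internal node: type inferred from the children and the operator's rule.
    static class Binary implements Expr {
        final Expr left;
        final String op;
        final Expr right;
        Binary(Expr left, String op, Expr right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }
        public Type check(Map<String, Type> idTable) {
            Type l = left.check(idTable);
            Type r = right.check(idTable);
            if (l != Type.INT || r != Type.INT) return Type.ERROR;
            if (op.equals("<")) return Type.BOOL; // Integer x Integer -> Boolean
            if (op.equals("+")) return Type.INT;  // Integer x Integer -> Integer
            return Type.ERROR;
        }
    }

    // ID table for: let const m~2 ; var n:Boolean in ...
    static Map<String, Type> exampleTable() {
        Map<String, Type> table = new HashMap<>();
        table.put("m", Type.INT);
        table.put("n", Type.BOOL);
        return table;
    }

    public static void main(String[] args) {
        Map<String, Type> table = exampleTable();
        System.out.println(new Binary(new Var("m"), "<", new IntLit()).check(table)); // BOOL
        System.out.println(new Binary(new Var("n"), "+", new IntLit()).check(table)); // ERROR
    }
}
```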

40 40 Contextual Analysis Identification and type checking are combined into a depth-first traversal of the abstract syntax tree. [Figure: the AST of the same program (Program, LetCommand, SequentialDeclaration with two variable declarations, SequentialCommand with two AssignCommands), traversed depth-first.]

41 41 Visitor Solution Nodes accept visitors and call the appropriate method of the visitor. Visitors implement the operations and have one method for each type of node they visit. Classes: NodeVisitor declares VisitAssignment( AssignmentNode ) and VisitVariableRef( VariableRefNode ); TypeCheckingVisitor and CodeGeneratingVisitor each implement both methods. Node declares Accept( NodeVisitor v ); VariableRefNode implements Accept(NodeVisitor v) {v->VisitVariableRef(this)} and AssignmentNode implements Accept(NodeVisitor v) {v->VisitAssignment(this)}.
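The same structure might look as follows in Java. The names NodeVisitor, Node, AssignmentNode and VariableRefNode come from the slide; the fields of the node classes and the `PrintingVisitor` are hypothetical additions so that the example does something observable. A type-checking or code-generating visitor would implement the same two methods.

```java
interface NodeVisitor {
    void visitAssignment(AssignmentNode node);
    void visitVariableRef(VariableRefNode node);
}

abstract class Node {
    abstract void accept(NodeVisitor v);
}

class VariableRefNode extends Node {
    final String name;
    VariableRefNode(String name) { this.name = name; }
    // Each node class calls the visitor method that matches its own type.
    void accept(NodeVisitor v) { v.visitVariableRef(this); }
}

class AssignmentNode extends Node {
    final VariableRefNode target;
    final VariableRefNode source;
    AssignmentNode(VariableRefNode target, VariableRefNode source) {
        this.target = target;
        this.source = source;
    }
    void accept(NodeVisitor v) { v.visitAssignment(this); }
}

// Hypothetical visitor for this sketch: renders the tree as text.
class PrintingVisitor implements NodeVisitor {
    final StringBuilder out = new StringBuilder();

    public void visitAssignment(AssignmentNode node) {
        node.target.accept(this);
        out.append(" := ");
        node.source.accept(this);
    }

    public void visitVariableRef(VariableRefNode node) {
        out.append(node.name);
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        PrintingVisitor printer = new PrintingVisitor();
        new AssignmentNode(new VariableRefNode("a"), new VariableRefNode("b")).accept(printer);
        System.out.println(printer.out); // prints "a := b"
    }
}
```

Adding a new operation means adding one visitor class; the node classes stay untouched.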

42 42 Runtime organization Data Representation: how to represent values of the source language on the target machine. Primitives, arrays, structures, unions, pointers Expression Evaluation: How to organize computing the values of expressions (taking care of intermediate results) Register vs. stack machine Storage Allocation: How to organize storage for variables (considering different lifetimes of global, local and heap variables) Activation records, static links Routines: How to implement procedures, functions (and how to pass their parameters and return values) Value vs. reference, closures, recursion Object Orientation: Runtime organization for OO languages Method tables

43 43 RECAP: TAM Frame Layout Summary Arguments: arguments for the current procedure; they were put here by the caller. Link data (starting at LB): dynamic link, static link, return address. Local data (up to ST): local variables and intermediate results; grows and shrinks during execution.

44 44 Garbage Collection: Conclusions Relieves the burden of explicit memory allocation and deallocation. Software module coupling related to memory management issues is eliminated. An extremely dangerous class of bugs is eliminated. The compiler generates code for allocating objects. The compiler must also generate code to support GC –The GC must be able to recognize root pointers from the stack –The GC must know about data layout and object descriptors

45 45 Code Generation Source program: let var n: integer; var c: char in begin c := ‘&’; n := n+1 end Target program: PUSH 2 LOADL 38 STORE 1[SB] LOAD 0[SB] LOADL 1 CALL add STORE 0[SB] POP 2 HALT Source and target program must be “semantically equivalent”. Semantic specification of the source language (SL) is structured in terms of phrases in the SL: expressions, commands, etc. => Code generation follows the same “inductive” structure.

46 46 Specifying Code Generation with Code Templates The code generation functions for Mini Triangle (phrase class: function: effect of the generated code) –Program: run P: Run program P then halt. Starting and finishing with empty stack. –Command: execute C: Execute command C. May update variables but does not shrink or grow the stack! –Expression: evaluate E: Evaluate E, net result is pushing the value of E on the stack. –V-name: fetch V: Push value of constant or variable on the stack. –V-name: assign V: Pop value from stack and store in variable V. –Declaration: elaborate D: Elaborate declaration, make space on the stack for constants and variables in the decl.
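The stack discipline behind "evaluate E" (the net effect is one extra value on the stack) can be tried out on a toy stack machine. This is only a sketch: the instruction names LOADL, ADD and MULT and their textual format are simplifications invented for this example and are not the actual TAM instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {
    // Runs instructions of the form "LOADL <n>", "ADD" or "MULT" and
    // returns the single value left on the stack.
    static int run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : program) {
            String[] parts = instr.split(" ");
            switch (parts[0]) {
                case "LOADL": // push a literal
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD": { // pop two operands, push their sum
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                case "MULT": { // pop two operands, push their product
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction " + instr);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("expected exactly one result on the stack");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Code for (2 + 10) * 3: operands first, then the operator.
        String[] program = { "LOADL 2", "LOADL 10", "ADD", "LOADL 3", "MULT" };
        System.out.println(run(program)); // prints 36
    }
}
```

Because every sub-expression leaves exactly one value behind, the code for a compound expression is just the code for its operands followed by the operator.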

47 47 Code Generation with Code Templates While command: execute [ while E do C ] = JUMP h; g: execute [ C ]; h: evaluate [ E ]; JUMPIF(1) g
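One complication with this template is that the address h is not yet known when JUMP h is emitted. A common way around this is backpatching: emit a placeholder and fill in the address later. The sketch below assumes instructions are plain strings and that the code for E and C has already been generated as lists; the class name and the sample instruction strings in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhileCodeGen {
    final List<String> code = new ArrayList<>();

    // Appends an instruction and returns its address.
    int emit(String instr) {
        code.add(instr);
        return code.size() - 1;
    }

    // execute [while E do C] = JUMP h; g: execute [C]; h: evaluate [E]; JUMPIF(1) g
    void genWhile(List<String> condCode, List<String> bodyCode) {
        int jumpAddr = emit("JUMP ?");      // h is not known yet
        int g = code.size();                // g: first instruction of the body
        code.addAll(bodyCode);              // execute [C]
        int h = code.size();                // h: first instruction of the condition
        code.set(jumpAddr, "JUMP " + h);    // backpatch the forward jump
        code.addAll(condCode);              // evaluate [E]
        emit("JUMPIF(1) " + g);             // loop back while E is true
    }

    public static void main(String[] args) {
        WhileCodeGen gen = new WhileCodeGen();
        gen.genWhile(
            Arrays.asList("LOAD n", "LOADL 0", "CALL gt"),     // placeholder code for E
            Arrays.asList("LOAD n", "CALL pred", "STORE n"));  // placeholder code for C
        for (int i = 0; i < gen.code.size(); i++) {
            System.out.println(i + ": " + gen.code.get(i));
        }
    }
}
```

Placing the condition after the body costs one extra jump on entry but needs only a single conditional jump per iteration.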

48 48 Developing a Code Generator “Visitor” execute [ C1 ; C2 ] = execute[ C1 ] execute[ C2 ] public Object visitSequentialCommand( SequentialCommand com, Object arg) { com.C1.visit(this,arg); com.C2.visit(this,arg); return null; } LetCommand, IfCommand, WhileCommand => later. –LetCommand is more complex: memory allocation and addresses –IfCommand and WhileCommand: complications with jumps

49 49 Code improvement (optimization) The code generated by our compiler is not efficient: It computes values at runtime that could be known at compile time It computes values more times than necessary We can do better! Constant folding Common sub-expression elimination Code motion Dead code elimination
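Constant folding, the first technique in the list, can be sketched as a bottom-up rewrite of the expression tree. The node classes and the restriction to addition are assumptions made to keep the example short; a production compiler would also have to respect the target language's overflow rules.

```java
class FoldDemo {
    interface Expr {}

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Folds children first; an Add of two constants becomes one constant.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add add = (Add) e;
            Expr l = fold(add.left);
            Expr r = fold(add.right);
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value + ((Const) r).value);
            }
            return new Add(l, r);
        }
        return e; // constants and variables are already as simple as possible
    }

    static String show(Expr e) {
        if (e instanceof Const) return Integer.toString(((Const) e).value);
        if (e instanceof Var) return ((Var) e).name;
        Add add = (Add) e;
        return "(" + show(add.left) + " + " + show(add.right) + ")";
    }

    public static void main(String[] args) {
        Expr e = new Add(new Add(new Const(2), new Const(3)), new Var("x"));
        System.out.println(show(e));       // prints "((2 + 3) + x)"
        System.out.println(show(fold(e))); // prints "(5 + x)"
    }
}
```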

50 50 Optimization implementation Is the optimization correct or safe? Is the optimization an improvement? What sort of analyses do we need to perform to get the required information? –Local –Global

51 51 Programming Language Life cycle The requirements for the new language are identified The language syntax and semantics are designed –BNF or EBNF, experiments with front-end tools –Informal or formal semantics An informal or formal specification is developed Initial implementation –Prototype via interpreter or interpretive compiler Language tested by designers, implementers and a few friends Feedback on the design and possible reconsiderations Improved implementation

52 52 Programming Language Life cycle Design Specification Manuals, Textbooks Compiler Prototype

53 53 Programming Language Life cycle Lots of research papers Conference sessions dedicated to the new language Textbooks and manuals Used in large applications Huge international user community Dedicated conference International standardisation efforts Industry de facto standard Programs written in the language become legacy code Language enters “hall-of-fame” and features are taught in CS course on Programming Language Design and Implementation

54 54 The Most Important Open Problem in Computing Increasing Programmer Productivity –Write programs correctly –Write programs quickly –Write programs easily Why? –Decreases support cost –Decreases development cost –Decreases time to market –Increases satisfaction

55 55 Why Programming Languages? 3 ways of increasing programmer productivity: 1.Process (software engineering) –Controlling programmers 2.Tools (verification, static analysis, program generation) –Important, but generally of narrow applicability 3.Language design --- the center of the universe! –Core abstractions, mechanisms, services, guarantees –Affect how programmers approach a task (C vs. SML) –Multi-paradigm integration

56 56 How to recognize a problem that can be solved with programming language techniques when you see one? Problem - a Scrabble game to be distributed as an applet. Create a dictionary of 50,000 words. Two options –Program 1: create an external file words.txt and read it into an array when the program starts: while ((word = f.readLine()) != null) { words.addElement(word); } –Program 2: create a 50,000 element table in the program and initialize it to the words: String[] words = {"hill", "fetch", "pail", "water", ...}; Advantages/disadvantages of each approach? –performance –flexibility –correctness –…. Example from J. Craig Cleaveland. Program Generators with XML and Java, chapter 1

57 57 A program generator approach
import java.io.*;
import java.util.*;
class Dictionary1Generator {
  static Vector words = new Vector();
  static void loadWords() {
    // read the words in file words.txt into the Vector words
  }
  static public void main(String[] args) {
    loadWords();
    // Generate Dictionary1 program
    System.out.println("class Dictionary1 {\n");
    System.out.println("  String[] words = {");
    for (int j = 0; j < words.size(); ++j) {
      System.out.println("\"" + words.elementAt(j) + "\",");
    }
    System.out.println("  };\n}");
  }
}

58 58 Typical program generator Dictionary example The data –simply a list of words Analyzing/transforming data –duplicate word removal –sorting Generate program –simply use print statements to write program text General picture The data –some more complex representation of data formal specs, grammar, spreadsheet, XML, etc. Analyzing/transforming data –parse, check for inconsistencies, transform to other data structures Generate program –generate syntax tree, use templates,…

59 59 The next wave of Program Generators: Model-Driven Development [Figure: development cycle with the stages Requirements, Analysis & Design, Implementation, Testing]

60 60 New Programming Language! Why Should I Care? The problem is not designing a new language –It’s easy! Thousands of languages have been developed The problem is how to get wide adoption of the new language –It’s hard! Challenges include Competition Usefulness Interoperability Fear “It’s a good idea, but it’s a new idea; therefore, I fear it and must reject it.” --- Homer Simpson The financial rewards are low, but …

61 61 Famous Danish Computer Scientists Peter Naur –BNF and Algol Per Brinch Hansen –Monitors and Concurrent Pascal Dines Bjørner –VDM and Ada Bjarne Stroustrup –C++ Mads Tofte –SML Rasmus Lerdorf –PHP Anders Hejlsberg –Turbo Pascal and C# Jakob Nielsen

64 64 Fancy joining this crowd? Join the Programming Language Technology Research Group when you get to DAT5/DAT6 or SW8/SW9 Research Programme underway –How would you like to programme in 20 years? Experimenting with advanced programming –Functional and OO integration –Programmatic Program Construction Developing a new programming language “The P-gang”: Kurt Nørmark, Lone Leth, Bent Thomsen, Simon Kongshøj (Petur Olsen and Thomas Bøgholm) (Thomas Vestdam)

65 65 2003/2004/2005/2006/2007/2008 Projects DAT5/INF7/SW9 –Java vs. .NET Mobile (ver. 1 and 2) –Business Process Management –Quality control in Open Source Development –Impedance mismatch (performance, C#, Java) –XML and programming language representation –Languages and games –Aspect oriented Programming –Testing and PrgL. Design DAT6/INF8/SW10 –Mobile Business Process Infrastructure based on Ambients –Aspect.NET and JTL –Search for WS based on Semantic Web –Performance analysis of J2ME systems –Communication in Open Source Projects –New concurrency constructs in Java –Type inference for Ruby –Dependent types for super computing –Analysis of Real-Time Java Programs –Testing tool for .NET DAT8/D8 –Java vs. C on DSP –Multiple dispatch in C#

66 66 2008/2009 projects Uniform client- and server-side programming –Daniel Solsø Korsgård, Markus Krogh, Michael Stampe Knudsen, Morten Bøgh Sørensen Coherent Declarative, Object Oriented Programming –Martin Skou, Morten Friis, Allan Bødker GPGPU Programming –Christian Eske Hansen, Morten Christiansen Virtual Machines for Dynamic Languages –Simon Kongshøj Verification of Real-Time Java Programs using UPPAAL –Thomas Bøgholm (and Petur Olsen)

67 67 Finally Keep in mind, the compiler is the program from which all other programs arise. If your compiler is under par, all programs created by the compiler will also be under par. No matter the purpose or use -- your own enlightenment about compilers or commercial applications -- you want to be patient and do a good job with this program; in other words, don't try to throw this together on a weekend. Asking a computer programmer to tell you how to write a compiler is like saying to Picasso, "Teach me to paint like you." *Sigh* Well, Picasso tried.

68 68 What I promised you at the start of the course Ideas, principles and techniques to help you –Design your own programming language or design your own extensions to an existing language –Tools and techniques to implement a compiler or an interpreter –Lots of knowledge about programming I hope you feel you got what I promised

69 69 Top 10 reasons COMPILERS must be female 10. Picky, picky, picky. 9. They hear what you say, but not what you mean. 8. Beauty is only shell deep. 7. When you ask what's wrong, they say "nothing". 6. Can produce incorrect results with alarming speed. 5. Always turning simple statements into big productions. 4. Small talk is important. 3. You do the same thing for years, and suddenly it's wrong. 2. They make you take the garbage out. 1. Miss a period and they go wild.

