Compilers CSCI/CMPE 3334 David Egle.

Trends in programming languages
  A programming language and its compiler are the programmer’s key tools.
  Languages undergo constant change: from C to C++ to Java in roughly two decades.
    C in the early 1970s
    C++ in 1979
    Java in 1991 (project started at Sun)
  Be prepared to program in new ones.

Review of historic development
  wired interconnects
  von Neumann machines and machine code
  procedures
  assembly (compiled by hand)
  assemblers
  FORTRAN I
  “cleaner” loops
  object-oriented programming in C
  virtual calls

Where will languages go from here?
  As you just saw, the trend is towards higher-level abstractions.
    Express the algorithm concisely,
    which means hiding often-repeated code fragments:
    new language constructs hide more of these low-level details.
  Or at least try to detect more bugs when the program is compiled:
    stricter type checking.

Three execution environments
  Interpreters
    Scheme, Lisp, Perl, Python
    popular interpreted languages later got compilers
  Compilers
    C
    Java (compiled to bytecode)
  Virtual machines
    Java bytecode runs on an interpreter
    the interpreter is often aided by a JIT compiler

The Structure of a Compiler
  1. Scanning (Lexical Analysis)
  2. Parsing (Syntactic Analysis)
  3. Type checking (Semantic Analysis)
  4. Optimization
  5. Code Generation
  The first three, at least, can be understood by analogy to how humans comprehend English.

Lexical Analysis
  The lexical analyzer divides program text into “words” or “tokens”.
    if x == y then z = 1; else z = 2;
  Units: if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;
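The splitting step above can be sketched in a few lines of Python. This is only an illustration, not the scanner of any particular compiler: the token class names (KEYWORD, ID, INT, OP, SEMI, SKIP) and the function name tokenize are assumptions made for this example.

```python
import re

# Hypothetical token classes for the toy statement on this slide.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else)\b"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),
    ("INT",     r"[0-9]+"),
    ("OP",      r"==|="),          # try the longer operator first
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),           # white space is recognized, then ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

lexemes = [lexeme for _, lexeme in tokenize("if x == y then z = 1; else z = 2;")]
print(lexemes)
```

The printed lexemes are the same fourteen units listed on the slide, in the same order.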

Parsing
  Once words are understood, the next step is to understand sentence structure.
  Parsing = Diagramming Sentences
  The diagram is a tree.

Diagramming a Sentence
  This      line   is     a         longer      sentence
  article   noun   verb   article   adjective   noun
  subject:  This line
  object:   a longer sentence
  sentence: subject, verb, object

Parsing Programs
  Parsing program expressions is the same.
  Consider: if x == y then z = 1; else z = 2;
  Diagrammed:
    if-then-else
      predicate: relation (x == y)
      then-stmt: assign (z, 1)
      else-stmt: assign (z, 2)
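One possible in-memory form of this diagram is a nested tuple, sketched below. The node labels come from the slide; the tuple layout and the helper name leaves are assumptions chosen for illustration.

```python
# Each node is (label, child, child, ...); leaves are plain strings.
tree = (
    "if-then-else",
    ("predicate", ("relation", "x", "==", "y")),
    ("then-stmt", ("assign", "z", "1")),
    ("else-stmt", ("assign", "z", "2")),
)

def leaves(node):
    """Collect the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:        # node[0] is the label
        result.extend(leaves(child))
    return result

print(leaves(tree))
```

Reading the leaves from left to right recovers the operands and operators of the statement in source order, which is what makes the tree a diagram of that sentence.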

Semantic Analysis in English
  Example: Jack said Jerry left his assignment at home.
    Who does “his” refer to? Jack or Jerry?
  Even worse: Jack said Jack left his assignment at home.
    How many Jacks are there? Which one left the assignment?

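Semantic Analysis I
  Programming languages define strict rules to avoid such ambiguities.
  In a language that lets an inner block redeclare a name, such as C or C++, the block below prints “4”; the inner definition is used:
    { int Jack = 3; { int Jack = 4; printf("%d", Jack); } }
  Java is stricter: it rejects a local variable that redeclares another local variable still in scope, so the equivalent Java block does not compile.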

Semantic Analysis II
  Compilers also perform checks to find bugs.
  Example: Jack left her homework at home.
  A “type mismatch” between “her” and “Jack”: we know they are different people (presumably Jack is male).

Code Generation
  A translation into another language, analogous to human translation.
  Compilers for Java, C, C++ produce machine or assembly code.
  Code generators produce C or Java.

Languages
  A language is a set of sentences (strings of symbols) with well-defined structure and meaning.
  Syntax of a language: the rules specifying the valid constructions of the language.
    e.g. syntax of algebra: x+2 is valid; x2+ is not valid
  Semantics of a language: the interpretation of symbols and strings.
    e.g. semantics of algebra: x+2 is the sum of the values of x and 2

Language Definition
  All languages contain an unlimited (or very large) number of valid sentences,
    so it is not possible to store a list of all valid strings.
  English is not suitable for defining languages formally because it is too vague.
  Formal language definition: a meta-language (formal system) is used to talk about the object language.

Formal Specification
  An alphabet T is a finite set of terminal symbols.
  A string (sentence) is a concatenation of symbols.
  A language, L, is a subset of the set of finite concatenations of symbols in an alphabet T.
  The terminal symbols are the symbols of the alphabet T.
  The nonterminal symbols are a set N of symbols (not in T) that represent intermediate states in a string generation process.
  The starting symbol is a distinguished nonterminal symbol from which all strings of the language are derived.

Formal Grammar
  A production is a string transformation rule. Its left-hand side is a pattern that matches a substring (possibly all) of the string being transformed, and its right-hand side gives the replacement for the matched portion.
  A formal grammar G is a 4-tuple G = (T, N, E, P) where
    T is the set of terminal symbols
    N is the set of nonterminal symbols (T ∩ N is empty)
    E is the starting symbol (E ∈ N)
    P is the set of productions α → β, where α is not null and α, β ∈ (N ∪ T)*

Example 1
  A language consists of all strings formed from a non-empty string of ‘a’s followed by a non-empty string of ‘b’s.
  T = {a, b}
  N = {A, B, E}
  P = {
    E → AB
    A → aA
    A → a
    B → Bb
    B → b
  }
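To see which strings these productions generate, one can enumerate derivations mechanically. The sketch below is one way to do that, assuming uppercase letters are nonterminals and always expanding the leftmost one; the names PRODUCTIONS and generate are illustrative, not part of the slides.

```python
from collections import deque

# Productions of Example 1; uppercase letters are nonterminals.
PRODUCTIONS = {
    "E": ["AB"],
    "A": ["aA", "a"],
    "B": ["Bb", "b"],
}

def generate(start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if the form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if idx is None:
            sentences.add(form)
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # No production here shrinks a form, so over-long forms can be pruned.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(sentences, key=lambda s: (len(s), s))

print(generate("E", 4))
# ['ab', 'aab', 'abb', 'aaab', 'aabb', 'abbb']
```

Every generated string has at least one 'a' and at least one 'b', and the counts of the two letters are independent of each other.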

Example 2
  A language consists of all strings formed from a string of ‘a’s followed by an equal number of ‘b’s.
  T = {a, b}
  N = {A, E}
  P = {
    E → A
    A → aAb
    A → ab
  }
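A derivation in this grammar can be traced step by step. The following sketch lists the sentential forms produced on the way to n 'a's followed by n 'b's; the function name derive is an assumption made for the example.

```python
def derive(n):
    """Return the sentential forms in the derivation of n a's followed by n b's."""
    if n < 1:
        raise ValueError("the grammar generates no string with n < 1")
    forms = ["E", "A"]                      # E -> A
    form = "A"
    for _ in range(n - 1):
        form = form.replace("A", "aAb")     # A -> aAb
        forms.append(form)
    forms.append(form.replace("A", "ab"))   # A -> ab
    return forms

print(" => ".join(derive(3)))
# E => A => aAb => aaAbb => aaabbb
```

Because each use of A → aAb adds one 'a' and one 'b' at the same time, the two counts can never differ.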

Hierarchy of Languages
  Type 0 grammar: no restrictions on the productions (also called an unrestricted grammar).
    Productions that eliminate symbols are permitted, e.g. aAB → aB
    Called: contracting context-sensitive grammar
  Type 1 grammar: requires the right-hand side of every production to have at least as many symbols as the left-hand side.
    Called: non-contracting context-sensitive grammar
    e.g. context-sensitive: σατ → σβτ

Hierarchy of Languages – 2
  Type 2 grammar: the left-hand side of every production is restricted to a single nonterminal symbol.
    Its application cannot depend on the context in which the symbol occurs.
    Called: context-free grammar
  Type 3 grammar: restricts the number of terminals and nonterminals that each step can create.
    Called: regular or finite-state grammar

Regular language
  Linear production
    At most one nonterminal symbol is used on each of the right- and left-hand sides of a production.
  Right-linear production
    The nonterminal occurs to the right of all other symbols on the right-hand side of a production.
    e.g. A → aB; A → a
  Left-linear production
    The nonterminal occurs to the left of all other symbols on the right-hand side of a production.
    e.g. A → Ba; A → a
  A regular language can be generated by a right- or left-linear grammar.
  Regular languages can be recognized by a finite-state machine.
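As an illustration of the last point, here is a minimal sketch of a deterministic finite-state machine for one regular language: one or more 'a's followed by one or more 'b's. The state names (start, in_a, in_b) and the function name accepts are hypothetical choices for this example.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "a"): "in_a",
    ("in_a", "a"): "in_a",
    ("in_a", "b"): "in_b",
    ("in_b", "b"): "in_b",
}
ACCEPTING = {"in_b"}

def accepts(text):
    """Run the machine over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:          # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("aabbb"), accepts("aba"))
# True False
```

The machine needs only a fixed number of states because it never has to count; a language that requires equal numbers of 'a's and 'b's cannot be handled this way.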

Regular Expressions
  Regular expressions are a compact notation well suited to defining a language.
  They are used as the input to a scanner generator:
    define each token, and
    also define white space, comments, etc.
      White space and comments do not correspond to tokens, but must be recognized and ignored.

Example 1: Pascal identifier (id)
  Lexical specification (in English): a letter, followed by zero or more letters or digits.
  Lexical specification (as a regular expression): letter . (letter | digit)*
    |    means “or”
    .    means “followed by”
    *    means zero or more instances of
    ( )  used for grouping

Operands of a regular expression
  "letter" is a shorthand for a | b | c | ... | z | A | ... | Z
  the special character ε (the empty string)
  "digit" is a shorthand for 0 | 1 | … | 9
  Sometimes we put the characters in quotes; this is necessary when denoting | . *
  Consider the regular expressions:
    letter.letter | digit*
    letter.(letter | digit)*
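The two expressions differ only in their parentheses, and trying both on a few strings shows the effect. The sketch below translates them into Python's re syntax, where concatenation is written by juxtaposition and character classes stand in for "letter" and "digit"; the names UNGROUPED and GROUPED are made up for this example.

```python
import re

# letter.letter | digit*   : two letters, OR a possibly empty run of digits
UNGROUPED = re.compile(r"[A-Za-z][A-Za-z]|[0-9]*")

# letter.(letter | digit)* : a letter, then any mix of letters and digits
GROUPED = re.compile(r"[A-Za-z](?:[A-Za-z]|[0-9])*")

for text in ["ab", "a1", "x", "123"]:
    print(text,
          UNGROUPED.fullmatch(text) is not None,
          GROUPED.fullmatch(text) is not None)
```

Without the parentheses, "|" has the lowest precedence, so the first expression accepts "ab" and "123" but not "a1" or "x"; only the second describes identifiers.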

Example 2: Integer Literals (int)
  An integer literal with an optional sign can be defined in English as:
    “(nothing or + or -) followed by one or more digits”
  The corresponding regular expression is: (+|-|ε).(digit.digit*)
  A new convenient operator ‘+’:
    digit.digit* is the same as digit+, which means "one or more digits"
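The same definition can be tried out with Python's re module. In this sketch the optional sign (+|-|ε) is written as [+-]? and "digit" as [0-9]; both spellings of "one or more digits" are included so that they can be compared. The names LONG_FORM, SHORT_FORM and is_int_literal are illustrative.

```python
import re

# (+|-|ε).(digit.digit*)  and the shorter  (+|-|ε).digit+
LONG_FORM = re.compile(r"[+-]?[0-9][0-9]*")
SHORT_FORM = re.compile(r"[+-]?[0-9]+")

def is_int_literal(text):
    """True if the whole of text is an optionally signed integer literal."""
    return SHORT_FORM.fullmatch(text) is not None

for text in ["42", "+42", "-7", "+", "", "4-2"]:
    # The two forms should always agree.
    assert (LONG_FORM.fullmatch(text) is not None) == is_int_literal(text)
    print(repr(text), is_int_literal(text))
```

The first three strings are accepted; a lone sign, the empty string, and "4-2" are rejected because at least one digit is required and the sign may appear only at the front.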

Language Defined by a Regular Expression
  Recall: language = set of strings
  Language defined by an automaton: the set of strings accepted by the automaton.
  Language defined by a regular expression: the set of strings that match the expression.

  Regular Expression    Corresponding set of strings
  ε                     {""}
  a                     {"a"}
  a.b.c                 {"abc"}
  a | b | c             {"a", "b", "c"}
  (a | b | c)*          {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}

Backus-Naur Form (BNF)
  BNF is a notation for writing grammars that is commonly used to specify the syntax of programming languages.
  Nonterminals are written as names enclosed in corner brackets ‘< >’.
  The → sign is written ‘::=’ (read “is replaced by”).
  Alternative ways of rewriting a given nonterminal are separated by a vertical bar | (read “or”).

Example: Pascal Identifier
  <id> ::= <letter> | <id><letter> | <id><digit>
  <letter> ::= A | B | C | … | Z
  <digit> ::= 0 | 1 | 2 | … | 9

BNF for a Simplified Pascal Grammar
  1.  <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
  2.  <prog-name> ::= id
  3.  <dec-list> ::= <dec> | <dec-list> ; <dec>
  4.  <dec> ::= <id-list> : <type>
  5.  <type> ::= INTEGER
  6.  <id-list> ::= id | <id-list> , id
  7.  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
  8.  <stmt> ::= <assign> | <read> | <write> | <for>
  9.  <assign> ::= id := <exp>
  10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
  11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
  12. <factor> ::= id | int | ( <exp> )

Simplified Pascal Grammar (cont’d)
  13. <read> ::= READ ( <id-list> )
  14. <write> ::= WRITE ( <id-list> )
  15. <for> ::= FOR <index-exp> DO <body>
  16. <index-exp> ::= id := <exp> TO <exp>
  17. <body> ::= <stmt> | BEGIN <stmt-list> END
  Note:
    Recursive rules (e.g. rule 6)
    Multiplication and division have higher precedence than addition and subtraction (rules 10-12)
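The precedence encoded in rules 10-12 becomes visible when they are turned into a small parser. The sketch below is a hand-written recursive-descent parser, which is a swapped-in technique and not something the slides prescribe: the left-recursive rules are rewritten as loops, the input is assumed to be already tokenized into a Python list (ints for "int", alphabetic strings for "id"), and the result is a nested tuple. The name parse_exp is hypothetical.

```python
def parse_exp(tokens):
    """Parse a token list by rules 10-12 and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def exp():                       # rule 10: <exp> ::= <term> { (+|-) <term> }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():                      # rule 11: <term> ::= <factor> { (*|DIV) <factor> }
        node = factor()
        while peek() in ("*", "DIV"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                    # rule 12: <factor> ::= id | int | ( <exp> )
        tok = peek()
        if tok == "(":
            advance()
            node = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if isinstance(tok, int):
            return advance()
        if isinstance(tok, str) and tok.isalpha() and tok != "DIV":
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_exp(["a", "+", "b", "*", 2]))
# ('+', 'a', ('*', 'b', 2))
```

Because a term is built completely before exp looks for "+" or "-", the multiplication ends up lower in the tree and is therefore grouped first. The loops also keep the left-to-right grouping that the left-recursive rules describe, so a - b - c parses as (a - b) - c.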