1
Compilers CSCI/CMPE 3334 David Egle
2
Trends in programming languages
Programming language and its compiler: the programmer’s key tools
Languages undergo constant change
  from C to C++ to Java in just 21 years
    C in 1970
    C++ in 1979
    Java in 1991 (project started at Sun)
  be prepared to program in new ones
3
Review of historic development
wired interconnects
von Neumann machines & machine code
procedures
assembly (compile by hand)
assemblers
FORTRAN I
“cleaner” loops
object-oriented programming in C
virtual calls
4
Where will languages go from here?
As you just saw, the trend is toward higher-level abstractions
  express the algorithm concisely!
  which means hiding often-repeated code fragments
  new language constructs hide more of these low-level details
Or at least try to detect more bugs when the program is compiled
  stricter type checking
5
Three execution environments
Interpreters
  Scheme, Lisp, Perl, Python
  popular interpreted languages later got compilers
Compilers
  C
  Java (compiled to bytecode)
Virtual machines
  Java bytecode runs on an interpreter
  the interpreter is often aided by a JIT compiler
6
The Structure of a Compiler
1. Scanning (Lexical Analysis)
2. Parsing (Syntactic Analysis)
3. Type checking (Semantic Analysis)
4. Optimization
5. Code Generation
The first three, at least, can be understood by analogy to how humans comprehend English.
7
Lexical Analysis
The lexical analyzer divides program text into “words” or “tokens”
  if x == y then z = 1; else z = 2;
Units: if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;
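The token split above can be reproduced with a small regular-expression scanner. This is only a sketch: the category names (`KEYWORD`, `ID`, `INT`, `OP`, `SKIP`) and the function name `tokenize` are assumptions made for illustration, not part of the slides.

```python
import re

# Hypothetical token categories; order matters (keywords before identifiers,
# "==" before "=").
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else)\b"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),
    ("INT",     r"[0-9]+"),
    ("OP",      r"==|=|;"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split program text into a list of token strings, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.group())
        pos = match.end()
    return tokens
```

Calling `tokenize("if x == y then z = 1; else z = 2;")` yields the same units listed on the slide.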
8
Parsing
Once words are understood, the next step is to understand sentence structure
Parsing = Diagramming Sentences
The diagram is a tree
9
Diagramming a Sentence
This     line   is     a        longer     sentence
article  noun   verb   article  adjective  noun
subject: “This line”    object: “a longer sentence”
sentence: subject + verb + object
10
Parsing Programs
Parsing program expressions is the same
Consider:
  if x == y then z = 1; else z = 2;
Diagrammed:
  if-then-else
    predicate: relation (x == y)
    then-stmt: assign (z, 1)
    else-stmt: assign (z, 2)
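To make the diagram concrete, here is a minimal hand-written parser for exactly this statement shape. It takes a list of token strings and returns the tree as nested tuples. The function name `parse_if` and the tuple layout are assumptions for illustration. The sketch does not check that operands are identifiers or integers, and it handles only this one statement form.

```python
def parse_if(tokens):
    """Parse: if <x> == <y> then <z> = <v> ; else <z> = <v> ;"""
    pos = 0

    def expect(value):
        # Consume one specific token or fail.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != value:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {value!r}, found {found!r}")
        pos += 1

    def take():
        # Consume and return the next token, whatever it is.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def relation():
        left = take()
        expect("==")
        right = take()
        return ("relation", left, right)

    def assign():
        target = take()
        expect("=")
        value = take()
        expect(";")
        return ("assign", target, value)

    expect("if")
    predicate = relation()
    expect("then")
    then_stmt = assign()
    expect("else")
    else_stmt = assign()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return ("if-then-else", predicate, then_stmt, else_stmt)
```

Each nested tuple corresponds to one interior node of the diagram; the leaves are the token strings.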
11
Semantic Analysis in English
Example: Jack said Jerry left his assignment at home.
  Who does “his” refer to? Jack or Jerry?
Even worse: Jack said Jack left his assignment at home.
  How many Jacks are there? Which one left the assignment?
12
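Semantic Analysis I
Programming languages define strict rules to avoid such ambiguities
In a language that permits shadowing in nested blocks, such as C++, this code prints “4”; the inner definition is used
  { int Jack = 3; { int Jack = 4; std::cout << Jack; } }
(Java itself rejects redeclaring a local variable inside a nested block of the same method.)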
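The "innermost definition wins" rule can be modeled as a chain of scopes, each of which looks a name up locally before asking its parent. The class name `Scope` and its methods are hypothetical; this is a sketch of the idea, not how any particular compiler stores its symbol table.

```python
class Scope:
    """One block's symbol table, linked to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def define(self, name, value):
        # Two definitions of one name in the *same* block are an error.
        if name in self.symbols:
            raise NameError(f"{name} is already defined in this scope")
        self.symbols[name] = value

    def lookup(self, name):
        # Walk outward from the innermost scope until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"{name} is not defined")


outer = Scope()
outer.define("Jack", 3)
inner = Scope(parent=outer)
inner.define("Jack", 4)
```

Here `inner.lookup("Jack")` resolves to the inner definition, while `outer.lookup("Jack")` still sees the outer one.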
13
Semantic Analysis II
Compilers also perform checks to find bugs
Example: Jack left her homework at home.
  A “type mismatch” between “her” and “Jack”
  we know they are different people (presumably Jack is male)
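The programming-language counterpart of this mismatch is an operation applied to operands of incompatible types. Below is a minimal sketch of a checker over a tiny expression tree. The tuple encoding (`"int"`, `"str"`, `"var"`, `"add"`) and the function name `type_of` are assumptions made for illustration.

```python
def type_of(expr, env):
    """Return the type name of a tiny expression tree, or raise TypeError.

    expr is one of:
      ("int", value), ("str", value), ("var", name), ("add", left, right)
    env maps variable names to their declared type names.
    """
    kind = expr[0]
    if kind == "int":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        if expr[1] not in env:
            raise NameError(f"{expr[1]} is not declared")
        return env[expr[1]]
    if kind == "add":
        left = type_of(expr[1], env)
        right = type_of(expr[2], env)
        if left != right:
            raise TypeError(f"type mismatch: {left} + {right}")
        return left
    raise ValueError(f"unknown expression kind {kind!r}")
```

With `env = {"x": "int"}`, the tree for `x + 2` checks as `"int"`, while `x + "two"` is rejected before the program ever runs.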
14
Code Generation
A translation into another language
  analogous to human translation
Compilers for Java, C, C++ produce machine or assembly code
Code generators produce C or Java
15
Languages
A language is a set of sentences (strings of symbols) with well-defined structure and meaning
Syntax of a language
  the rules specifying valid constructions of a language
  e.g. syntax of algebra: x+2 is valid; x2+ is not valid
Semantics of a language
  the interpretation of symbols and strings
  e.g. semantics of algebra: x+2 is the sum of the values of x and 2
16
Language Definition
All languages contain an unlimited (or very large) number of valid sentences
  it is not possible to store a list of all valid strings
English is not suitable for defining languages formally because it is too vague
Formal language definition
  A meta-language (formal system) is used to talk about the object language
17
Formal Specification
An alphabet T is a finite set of terminal symbols.
A string (sentence) is a concatenation of symbols.
A language, L, is a subset of the set of finite concatenations of symbols in an alphabet T.
The terminal symbols are the symbols of the alphabet T.
The nonterminal symbols are a set N of symbols (not in T) that represent intermediate states in a string generation process.
The starting symbol is a distinguished nonterminal symbol from which all strings of the language are derived.
18
Formal Grammar
A production is a string transformation rule with a left-hand side that is a pattern matching a substring (possibly all) of the string being transformed, and a right-hand side that gives the replacement for the matched portion.
A formal grammar G is a 4-tuple G = (T, N, E, P) where
  T is the set of terminal symbols
  N is the set of nonterminal symbols (T ∩ N is empty)
  E is the starting symbol (E ∈ N)
  P is the set of productions α → β, where α is not null and α, β ∈ (N ∪ T)*
19
Example 1
A language consists of all strings formed from a string of ‘a’s followed by a string of ‘b’s
T = {a, b}
N = {A, B, E}
P = {
  E → AB
  A → aA
  A → a
  B → Bb
  B → b
}
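One rough way to sanity-check a small grammar like this is to apply its productions as string rewrites and collect every all-terminal string up to some length. The sketch below assumes single-character symbols and productions whose right-hand side is never shorter than the left-hand side, so that over-long intermediate strings can be discarded safely. The names `generate` and `EXAMPLE_1` are hypothetical.

```python
from collections import deque

# Productions of Example 1 as (left-hand side, right-hand side) pairs.
EXAMPLE_1 = [("E", "AB"), ("A", "aA"), ("A", "a"), ("B", "Bb"), ("B", "b")]

def generate(productions, start, terminals, max_length):
    """Return all terminal strings of length <= max_length derivable from start.

    Assumes no production shrinks the string, so any intermediate string
    longer than max_length can be dropped.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            results.add(form)
            continue
        for left, right in productions:
            # Try the production at every position where its left side occurs.
            index = form.find(left)
            while index != -1:
                new_form = form[:index] + right + form[index + len(left):]
                if len(new_form) <= max_length and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                index = form.find(left, index + 1)
    return results
```

For instance, `generate(EXAMPLE_1, "E", {"a", "b"}, 3)` returns the three strings `ab`, `aab`, and `abb`: one or more a's followed by one or more b's.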
20
Example 2
A language consists of all strings formed from a string of ‘a’s followed by an equal number of ‘b’s
T = {a, b}
N = {A, E}
P = {
  E → A
  A → aAb
  A → ab
}
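The two productions for A translate almost directly into a recursive recognizer: a string matches A if it is exactly `ab`, or if it is an `a`, something that matches A, and a `b`. The function name `matches_A` is an assumption for illustration.

```python
def matches_A(text):
    """Recognize the language of A -> aAb | ab (n a's followed by n b's, n >= 1)."""
    if text == "ab":                      # A -> ab
        return True
    return (                              # A -> aAb
        len(text) >= 4
        and text[0] == "a"
        and text[-1] == "b"
        and matches_A(text[1:-1])
    )
```

Because E → A is the only production for the start symbol, a string is in the language exactly when `matches_A` accepts it.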
21
Hierarchy of Languages
Type 0 grammar:
  No restrictions on the productions
  Productions that eliminate symbols are permitted, e.g. aAB → aB
  Called: contracting context-sensitive grammar (also known as an unrestricted grammar)
Type 1 grammar:
  requires the right-hand side of every production to have at least as many symbols as the left-hand side
  Called: non-contracting context-sensitive grammar
  e.g. context-sensitive: σατ → σβτ
22
Hierarchy of Languages – 2
Type 2 grammar:
  the left-hand side of the production is restricted to a single nonterminal symbol
  Its application cannot be dependent on the context in which the symbol occurs
  Called: context-free grammar
Type 3 grammar:
  restricts the number of terminals and nonterminals that each step can create
  Called: regular or finite-state grammar
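The four levels can be read as successively stricter tests on the shape of each production. The sketch below classifies a grammar by the strictest test that all of its productions pass. It makes several simplifying assumptions: symbols are single characters, no right-hand side is empty, and Type 3 is checked only in its right-linear form. The function name `grammar_type` is hypothetical.

```python
def grammar_type(productions, nonterminals):
    """Return 3, 2, 1, or 0 for a list of (left, right) production pairs."""

    def is_right_linear(left, right):
        # Single nonterminal on the left; on the right, only the last
        # symbol may be a nonterminal.
        return (
            left in nonterminals
            and len(right) >= 1
            and all(symbol not in nonterminals for symbol in right[:-1])
        )

    def is_context_free(left, right):
        # Single nonterminal on the left, anything on the right.
        return left in nonterminals

    def is_non_contracting(left, right):
        # The right side has at least as many symbols as the left side.
        return len(right) >= len(left)

    for level, test in ((3, is_right_linear), (2, is_context_free), (1, is_non_contracting)):
        if all(test(left, right) for left, right in productions):
            return level
    return 0
```

Under these assumptions, a grammar with the productions A → aB and A → a is classified as Type 3, a grammar containing E → AB as Type 2, and one containing the contracting production aAB → aB as Type 0.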
23
Regular language
Linear production
  At most one non-terminal symbol appears on each of the right- and left-hand sides of a production
Right linear production
  The non-terminal occurs to the right of all other symbols on the right-hand side of a production
  e.g. A → aB; A → a
Left linear production
  The non-terminal occurs to the left of all other symbols on the right-hand side of a production
  e.g. A → Ba; A → a
A regular language can be generated by a right- or left-linear grammar
Regular languages can be recognized by a finite-state machine
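As an illustration of the last point, here is a small table-driven finite-state machine for one regular language: one or more a's followed by one or more b's. The state names (`start`, `as`, `bs`) and the function name `accepts` are assumptions chosen for this sketch.

```python
# (current state, input character) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "a"): "as",
    ("as", "a"): "as",
    ("as", "b"): "bs",
    ("bs", "b"): "bs",
}
ACCEPTING = {"bs"}

def accepts(text):
    """Run the machine over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING
```

The machine needs no memory beyond its current state, which is why it cannot also check that the number of a's equals the number of b's.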
24
Regular Expressions
Regular expressions are a compact specification suitable for defining a language
Used as the input to a scanner generator
  define each token, and also define white-space, comments, etc.
  white-space and comments do not correspond to tokens, but must be recognized and ignored
25
Example 1: Pascal identifier (id)
Lexical specification (in English): a letter, followed by zero or more letters or digits.
Lexical specification (as a regular expression): letter . (letter | digit)*
  | means “or”
  . means “followed by”
  * means zero or more instances of
  ( ) used for grouping
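The same specification can be tried out in Python's `re` syntax, where concatenation is written by juxtaposition instead of `.`. The names `IDENTIFIER`, `UNGROUPED`, and `is_identifier` are assumptions for this sketch; the second pattern shows what the expression means if the grouping parentheses are dropped.

```python
import re

# letter . (letter | digit)*
IDENTIFIER = re.compile(r"[A-Za-z]([A-Za-z]|[0-9])*")

# Without the parentheses, | splits the whole expression in two:
# (letter . letter) or (digit*)
UNGROUPED = re.compile(r"[A-Za-z][A-Za-z]|[0-9]*")

def is_identifier(text):
    """True if the whole string is a letter followed by letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None
```

`is_identifier` accepts `x1` and rejects `1x`. The ungrouped pattern, by contrast, matches a two-letter string or a run of digits, and does not match `a1` as a whole.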
26
Operands of a regular expression
"letter" is a shorthand for a | b | c | ... | z | A | ... | Z
the special character ε (the empty string)
"digit" is a shorthand for 0 | 1 | … | 9
sometimes we put the characters in quotes
  necessary when denoting | . *
Consider regular expressions:
  letter.letter | digit*
  letter.(letter | digit)*
27
Example 2: Integer Literals (int)
An integer literal with an optional sign can be defined in English as:
  “(nothing or + or -) followed by one or more digits”
The corresponding regular expression is:
  (+|-|ε).(digit.digit*)
A new convenient operator ‘+’
  digit.digit* is the same as digit+, which means "one or more digits"
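In Python's `re` syntax the optional sign `(+|-|ε)` is usually written `[+-]?`, and `digit+` carries over directly as `[0-9]+`. A minimal sketch follows; the names `INTEGER` and `is_integer_literal` are hypothetical.

```python
import re

# (+|-|ε) . digit+
INTEGER = re.compile(r"[+-]?[0-9]+")

def is_integer_literal(text):
    """True if the whole string is an optionally signed run of digits."""
    return INTEGER.fullmatch(text) is not None
```

A bare sign such as `+` is rejected because at least one digit is required.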
28
Language Defined by a Regular Expression
Recall: language = set of strings
Language defined by an automaton
  the set of strings accepted by the automaton
Language defined by a regular expression
  the set of strings that match the expression

Regular Expression    Corresponding set of strings
ε                     {""}
a                     {"a"}
a.b.c                 {"abc"}
a | b | c             {"a", "b", "c"}
(a | b | c)*          {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}
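The last row of the table describes an infinite set, but its shortest members can be listed by enumerating every string over the alphabet up to a given length. This sketch assumes a hypothetical helper name, `strings_up_to`, and uses Python's `re` module to confirm that each enumerated string matches `(a|b|c)*`.

```python
import itertools
import re

def strings_up_to(alphabet, max_length):
    """Yield every string over alphabet with length 0..max_length, shortest first."""
    for length in range(max_length + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

STAR = re.compile(r"(a|b|c)*")
SHORT_STRINGS = list(strings_up_to("abc", 2))
ALL_MATCH = all(STAR.fullmatch(s) is not None for s in SHORT_STRINGS)
```

Up to length 2 there are 1 + 3 + 9 = 13 such strings, starting with the empty string.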
29
Backus-Naur Form (BNF)
BNF is a notation for writing grammars that is commonly used to specify the syntax of programming languages
Nonterminals are written as names enclosed in angle brackets ‘< >’
The → sign is written ‘::=’ (read “is replaced by”)
Alternative ways of rewriting a given nonterminal are separated by a vertical bar | (read “or”)
30
Example: Pascal Identifier
<id> ::= <letter> | <id><letter> | <id><digit>
<letter> ::= A | B | C | … | Z
<digit> ::= 0 | 1 | 2 | … | 9
31
BNF for a Simplified Pascal Grammar
1. <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
2. <prog-name> ::= id
3. <dec-list> ::= <dec> | <dec-list> ; <dec>
4. <dec> ::= <id-list> : <type>
5. <type> ::= INTEGER
6. <id-list> ::= id | <id-list> , id
7. <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
8. <stmt> ::= <assign> | <read> | <write> | <for>
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
32
Simplified Pascal Grammar (cont’d)
13. <read> ::= READ ( <id-list> )
14. <write> ::= WRITE ( <id-list> )
15. <for> ::= FOR <index-exp> DO <body>
16. <index-exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt-list> END
Note:
  Recursive rules (e.g. rule 6)
  Multiplication and division have higher precedence than addition and subtraction (rules 10-12)
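The precedence note can be seen in action with a small recursive-descent evaluator that follows rules 10-12, one function per nonterminal. This is a sketch under stated assumptions: the input is already a list of token strings, the left-recursive rules are implemented as loops, `id` values come from a hypothetical `variables` dictionary, and `DIV` is modeled with Python's `//`, which agrees with integer division only for non-negative operands.

```python
def evaluate(tokens, variables=None):
    """Evaluate a token list according to <exp>, <term>, <factor> (rules 10-12)."""
    variables = variables or {}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():                       # rule 12: id | int | ( <exp> )
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        token = advance()
        if token == "(":
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token.isdigit():
            return int(token)
        if token in variables:
            return variables[token]
        raise SyntaxError(f"unexpected token {token!r}")

    def term():                         # rule 11: * and DIV bind tighter
        value = factor()
        while peek() in ("*", "DIV"):
            op = advance()
            right = factor()
            value = value * right if op == "*" else value // right
        return value

    def exp():                          # rule 10: + and - are applied last
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = exp()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return result
```

Because `exp` only ever combines complete terms, `2 + 3 * 4` groups as `2 + (3 * 4)`; the loops also make `-` and `DIV` left-associative, matching the left-recursive rules.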