PART I SYSTEM UTILITIES Lecture 6 Compilers Ştefan Stăncescu 1
COMPILERS 2
A "high-level language" (HLL) has complex grammar rules and is closer to human language.
The HLL is the means of linking man and computer: human language <-> HLL <-> binary language.
COMPILER: an automatic translation machine from the HLL to binary language.
COMPILERS 3
Source code => in the HLL
Object code => in binary language (machine code)
COMPILATION follows the HLL grammar rules:
  lexical rules: type and structure of the language elements
  syntactic rules: composition rules for the language elements
  "semantic" rules (translation programs): the object-code counterpart of each syntactic rule, i.e. the "semantic programs" for the machine
COMPILERS 4
Compiling = reviewing + translating the HLL source text
  lexical rules => scanner
  syntactic rules => parser
  "semantic" rules => object code generator (for a VM: intermediate code, "bytecode")
COMPILERS 5
The SCANNER identifies tokens (language elements): one or more adjacent characters, separated by characters such as SP, LF, FF, etc.
  words: START, STOP, LABEL01
  operators: + * / -
  special signs: ( ) { } // . ,
COMPILERS 6
SCANNER step 1: scan the HLL source text
  determine the token list by finding token boundaries
  identify the HLL tokens
  identify the programmer-invented tokens
  create a look-up table with numerical symbols for the tokens
COMPILERS 7
SCANNER step 2: create an intermediate source file in which the tokens are replaced by the numerical symbols from the look-up table created in step 1
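The two scanner steps can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the numeric codes in FIXED_CODES, the regular expression TOKEN_RE and the function names are all assumptions made for the example.

```python
import re

# Hypothetical codes for the language's own tokens (not taken from the slides).
FIXED_CODES = {"START": 1, "STOP": 2, "+": 3, "-": 4, "*": 5, "/": 6, "(": 7, ")": 8}

# A token is a word, a number, or a single operator / special sign.
# Whitespace (SP, LF, FF, ...) only separates tokens and is skipped.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|[-+*/(){}.,]")

def build_lookup_table(source):
    """Step 1: find the tokens and give every distinct token a numeric symbol."""
    table = dict(FIXED_CODES)
    next_code = max(table.values()) + 1
    for token in TOKEN_RE.findall(source):
        if token not in table:          # a programmer-invented token
            table[token] = next_code
            next_code += 1
    return table

def encode(source, table):
    """Step 2: rewrite the source as a list of numeric symbols."""
    return [table[token] for token in TOKEN_RE.findall(source)]

source = "START LABEL01 + LABEL01 STOP"
table = build_lookup_table(source)
codes = encode(source, table)
```

Here LABEL01 is the only programmer-invented token, so it receives the first free code after the fixed ones, and both of its occurrences are replaced by that same code.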
COMPILERS 8
BNF (Backus-Naur Form): REPRESENTATION of syntactic rules
A rule in BNF format describes a valid construction in the HLL.
It is a formal template applied to a line of the source file (and, in turn, to the lines of a line list).
COMPILERS 9
A syntactic rule describes a valid construction in the HLL.
A template carries the name of the newly built and checked element, which can itself be part of another construction (including one with the same pattern).
The name of the new construction is a "nonterminal" symbol.
BNF rule form: <nonterminal> ::= building template
COMPILERS 10
Parsing: discovering successive valid BNF rules (templates) in the HLL source file, until no undiscovered rules (no more "nonterminal" symbols) remain.
Parsing ends only on tokens ("terminal" symbols).
Chaining BNF rules (templates) => syntax tree
The purpose of parsing => discovering the syntax tree of the source file
COMPILERS 11
Line in the source file: S = A + B (A, B, S are integer variables, i.e. tokens)
The code generator must explain to the machine the templates that were found.
The scanner identifies the tokens "S" "=" "A" "+" "B":
  tokens "A", "B", "S" are variables
  token "+" is an operator, token "=" is the assignment
COMPILERS 12
The parser also verifies that the variable types are coherent:
  if A, B, S are all integers: OK
  if one is different, the templates for "+" and "=" need a conversion to a coherent type
Example: if S is real and A, B are integers
  "+" rule: OK, the result is an integer
  "=" (assignment rule): add a format conversion integer => real (float)
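The type check above can be sketched as a small Python function that lists the operations needed for target = left + right. The operation names CONVERT, ADD and ASSIGN, the temporary T1 and the function name are hypothetical, chosen only for this illustration.

```python
def plan_assignment(types, target, left, right):
    """Return the ordered operations for: target = left + right."""
    ops = []
    result_type = "integer"
    if "real" in (types[left], types[right]):
        # mixed or real operands: the "+" template works on reals
        result_type = "real"
        for name in (left, right):
            if types[name] == "integer":
                ops.append(("CONVERT", name, "integer->real"))
    ops.append(("ADD", left, right, "T1"))        # result kept in a temporary
    if result_type != types[target]:
        # the "=" template needs a conversion to the target's type
        ops.append(("CONVERT", "T1", f"{result_type}->{types[target]}"))
    ops.append(("ASSIGN", "T1", target))
    return ops

# The case from the slide: S is real, A and B are integers.
slide_case = plan_assignment({"S": "real", "A": "integer", "B": "integer"}, "S", "A", "B")

# All three variables are integers: no conversion is needed.
all_integer = plan_assignment({"S": "integer", "A": "integer", "B": "integer"}, "S", "A", "B")
```

For the slide's case the addition stays an integer addition and a single integer-to-real conversion is added in front of the assignment.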
COMPILERS 13
1st parser operation: structure (type) consistency (conversion, if needed)
2nd parser operation: A+B (result in temporary memory)
3rd parser operation: assigning the result to S (S=A+B)
Applicable BNF rules: conversion, addition, assignment, in that order
COMPILERS 14
EXAMPLE II (bottom-up parsing)
S = A + B*C - D
Scan the line and discover the operations to be performed first; each result becomes a "nonterminal" symbol => operator precedence (* before + and -), assuming the rules of algebraic expressions.
Syntactic rule of multiplication: <nonterminal> ::= <operand> * <operand>
Syntactic rule of addition: <nonterminal> ::= ( <operand> + <operand> ) | ( <operand> - <operand> )
COMPILERS 15
EXAMPLE II (bottom-up parsing)
<N1> ::= B*C
<N2> ::= A+<N1>
<N3> ::= <N2>-D
Syntax tree of the expression A+B*C-D
COMPILERS 16
EXAMPLE II (bottom-up parsing)
S=A+(B*C-D)
S=ATTRIB(N3)
N3=SUM(A,N2)
N2=SCAD(N1,D)
N1=PROD(B,C)
(ATTRIB = assignment, SUM = addition, SCAD = subtraction, PROD = multiplication)
Syntax tree of the expression A+(B*C-D)
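The bottom-up reductions of Example II can be reproduced with a small Python sketch that uses two stacks and a precedence table. This is one common way to do it, not necessarily the method used in the lecture; the function name and the PRECEDENCE values are assumptions, while SUM, SCAD and PROD follow the slides.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2}             # * binds tighter than + and -
NAMES = {"+": "SUM", "-": "SCAD", "*": "PROD"}    # operation names as on the slides

def reduce_expression(tokens):
    """Return the reductions in order, e.g. ("N1", "PROD", "B", "C")."""
    operands, operators, steps = [], [], []

    def reduce_top():
        # replace "left op right" by a new nonterminal N1, N2, ...
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        name = f"N{len(steps) + 1}"
        steps.append((name, NAMES[op], left, right))
        operands.append(name)

    for tok in tokens:
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()                        # discard the "("
        elif tok in PRECEDENCE:
            # reduce everything that must be computed before this operator
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)                   # a variable name
    while operators:
        reduce_top()
    return steps

plain = reduce_expression(list("A+B*C-D"))
grouped = reduce_expression(list("A+(B*C-D)"))
```

For A+B*C-D the reductions are N1=PROD(B,C), N2=SUM(A,N1), N3=SCAD(N2,D); for A+(B*C-D) they are N1=PROD(B,C), N2=SCAD(N1,D), N3=SUM(A,N2).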
COMPILERS 17
SAMPLE PROGRAM IN THE SIMPLIFIED PASCAL LANGUAGE
1   PROGRAM MEDIA ANALYSIS
2   VAR
3     NRCRT, I: INTEGER;
3     SARITM, SARMON, DIF: REAL
4   BEGIN
5     SARITM := 0;
6     SARMON := 0;
7     FOR I := 1 TO 100 DO
8       BEGIN
9         READ (NRCRT);
10        SARITM := SARITM + NRCRT;
11        SARMON := SARMON + 1 DIV NRCRT;
12      END;
13    DIF := SARITM DIV 100 - 100 DIV SARMON;
14    WRITE (DIF);
15  END.
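COMPILERS 18
GRAMMAR (BNF) OF THE SIMPLIFIED PASCAL LANGUAGE
1.  <prog> ::= PROGRAM <prog_name> VAR <dec_list> BEGIN <stmt_list> END.
2.  <prog_name> ::= id
3.  <dec_list> ::= <dec> | <dec_list> ; <dec>
4.  <dec> ::= <id_list> : <type>
5.  <type> ::= INTEGER | REAL
6.  <id_list> ::= id | <id_list> , id
7.  <stmt_list> ::= <stmt> | <stmt_list> ; <stmt>
8.  <stmt> ::= <assign> | <read> | <write> | <for>
9.  <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
13. <read> ::= READ ( <id_list> )
14. <write> ::= WRITE ( <id_list> )
15. <for> ::= FOR <index_exp> DO <body> ;
16. <index_exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt_list> END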
COMPILERS 19
Token     Code
PROGRAM   1
VAR       2
BEGIN     3
END.      4
END       5
INTEGER   6
REAL      7
READ      8
WRITE     9
FOR       10
TO        11
DO        12
;         13
:         14
,         15
:=        16
+         17
-         18
DIV       19
(         20
)         21
ID        22
INT       23
COMPILERS 20
File produced by the scanner
Columns: LINE, TOKEN, SPECIFIER
Line 1: token 1 (PROGRAM), then token 22 (ID) with specifier ^STATUS
Other recoverable specifier: ^I
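A scanner that emits (token code, specifier) pairs for the simplified Pascal language might look like the Python sketch below. The codes follow the token table of the lecture; the codes 16, 17 and 18 for ":=", "+" and "-" are an assumption, inferred from the gap between "," (15) and DIV (19). The regular expression and the function name are also illustrative.

```python
import re

TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END.": 4, "END": 5, "INTEGER": 6,
    "REAL": 7, "READ": 8, "WRITE": 9, "FOR": 10, "TO": 11, "DO": 12,
    ";": 13, ":": 14, ",": 15,
    ":=": 16, "+": 17, "-": 18,     # assumed codes, see the note above
    "DIV": 19, "(": 20, ")": 21,
}
ID, INT = 22, 23

# "END." and ":=" are listed before their shorter prefixes so they match first.
TOKEN_RE = re.compile(r"END\.|[A-Za-z][A-Za-z0-9]*|\d+|:=|[;:,+\-()]")

def scan_line(line):
    """Return a list of (token code, specifier) pairs for one source line."""
    pairs = []
    for lexeme in TOKEN_RE.findall(line.upper()):
        if lexeme in TOKEN_CODES:
            pairs.append((TOKEN_CODES[lexeme], None))   # keywords carry no specifier
        elif lexeme.isdigit():
            pairs.append((INT, int(lexeme)))            # specifier: the integer value
        else:
            pairs.append((ID, lexeme))                  # specifier: the identifier name
    return pairs

read_line = scan_line("READ (NRCRT);")
assign_line = scan_line("SARITM := 0;")
```

In this sketch the specifier of an identifier is simply its name; a real scanner would more likely store a reference to a symbol-table entry.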
COMPILERS 21
Sample statement, line 9: READ (NRCRT);
BNF rules applied:
13. <read> ::= READ ( <id_list> )
6.  <id_list> ::= id | <id_list> , id
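COMPILERS 22
Sample statement, line 13: DIF := SARITM DIV 100 - 100 DIV SARMON;
BNF rules applied:
9.  <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> - <term>
11. <term> ::= <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )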
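To show how rules 9 to 12 give the syntax tree of this statement, here is a small recursive-descent parser in Python. Note that this is a top-down technique, swapped in for brevity; it is not the bottom-up method the lecture describes. The nonterminal names exp, term and factor are assumed, and the left-recursive rules are implemented as loops.

```python
def parse_assign(tokens):
    """Parse: id := <exp>, returning a nested tuple (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                        # rule 12: id | int | ( <exp> )
        tok = take()
        if tok == "(":
            node = exp()
            if take() != ")":
                raise SyntaxError("expected )")
            return node
        return tok

    def term():                          # rule 11: chains of * and DIV
        node = factor()
        while peek() in ("*", "DIV"):
            op = take()
            node = (op, node, factor())
        return node

    def exp():                           # rule 10: chains of + and -
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    target = take()                      # rule 9: id := <exp>
    if take() != ":=":
        raise SyntaxError("expected :=")
    tree = (":=", target, exp())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_assign("DIF := SARITM DIV 100 - 100 DIV SARMON".split())
```

The resulting tree has the assignment at the root, the subtraction below it, and the two DIV operations as its children, which is the order in which the operations must be generated.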
COMPILERS 24
[Precedence matrix of the simplified Pascal grammar. Rows and columns are the tokens PROGRAM, VAR, BEGIN, END., END, INTEGER, REAL, READ, WRITE, FOR, TO, DO, ; : , := + - DIV ( ) ID INT. Each cell is empty or holds one of the precedence relations .=. , <. or .> between the row token and the column token.]
COMPILERS 25
Examples of precedence relations:
  PROGRAM .=. VAR
  BEGIN <. FOR
  ; .> END.
Empty pairs: grammatical errors
Precedence relations: at most one per pair (consistent grammar)
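A precedence matrix can be stored as a dictionary keyed by token pairs, as in the Python sketch below. Only the three relations given as examples on the slide are filled in, so the table is deliberately incomplete; the function names are assumptions.

```python
# Only the three example relations from the slide; a full matrix has many more.
RELATIONS = {
    ("PROGRAM", "VAR"): "=",     # PROGRAM .=. VAR
    ("BEGIN", "FOR"): "<",       # BEGIN <. FOR
    (";", "END."): ">",          # ; .> END.
}

def relation(left, right):
    """Return the precedence relation for an adjacent token pair."""
    try:
        return RELATIONS[(left, right)]
    except KeyError:
        # an empty cell of the matrix signals a grammatical error
        raise SyntaxError(
            f"no precedence relation between {left!r} and {right!r}"
        ) from None

def check_pairs(tokens):
    """Look up every adjacent pair; raises SyntaxError on the first empty pair."""
    return [relation(a, b) for a, b in zip(tokens, tokens[1:])]
```

Because a dictionary holds a single value per key, each pair has at most one relation, which mirrors the consistency requirement on the grammar. A call such as relation("VAR", "PROGRAM") raises SyntaxError with this partial table.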
COMPILERS 26
Generating semantic programs
DIF := SARITM DIV 100 - 100 DIV SARMON
id1 := id2 DIV int - int DIV id4
id1 := exp1 - exp2
id1 := exp3
Resulting intermediate code:
  DIV   SARITM   #100     i1
  DIV   #100     SARMON   i2
  -     i1       i2       i3
  :=    i3       ,        DIF
COMPILERS 27
(1)   :=     #0       ,        SARITM   {SARITM := 0}
(2)   :=     #0       ,        SARMON   {SARMON := 0}
(3)   :=     #1       ,        I        {FOR I := 1 TO 100}
(4)   JGT    I        #100     (15)
(5)   CALL   XREAD                      {READ(NRCRT)}
(6)   PARAM  NRCRT
(7)   +      SARITM   NRCRT    i1       {SARITM := SARITM + NRCRT}
(8)   :=     i1       ,        SARITM
(9)   DIV    #1       NRCRT    i2       {SARMON := SARMON + 1 DIV NRCRT}
(10)  +      SARMON   i2       i3
(11)  :=     i3       ,        SARMON
(12)  +      I        #1       i4       {end of FOR}
(13)  :=     i4       ,        I
(14)  J                        (4)
(15)  DIV    SARITM   #100     i6       {DIF := SARITM DIV 100 - 100 DIV SARMON}
(16)  DIV    #100     SARMON   i7
(17)  -      i6       i7       i8
(18)  :=     i8       ,        DIF
(19)  CALL   XWRITE                     {WRITE(DIF)}
(20)  PARAM  DIF
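To make the meaning of this intermediate code concrete, the following Python sketch executes it. Several points are assumptions made for the illustration: each instruction is stored as a tuple (operation, operand 1, operand 2, result), the library routines are spelled XREAD and XWRITE, each CALL is followed by exactly one PARAM, and DIV is treated as integer division.

```python
PROGRAM = [
    (":=", "#0", None, "SARITM"),       # (1)
    (":=", "#0", None, "SARMON"),       # (2)
    (":=", "#1", None, "I"),            # (3)
    ("JGT", "I", "#100", 15),           # (4)  leave the loop when I > 100
    ("CALL", "XREAD", None, None),      # (5)
    ("PARAM", "NRCRT", None, None),     # (6)
    ("+", "SARITM", "NRCRT", "i1"),     # (7)
    (":=", "i1", None, "SARITM"),       # (8)
    ("DIV", "#1", "NRCRT", "i2"),       # (9)
    ("+", "SARMON", "i2", "i3"),        # (10)
    (":=", "i3", None, "SARMON"),       # (11)
    ("+", "I", "#1", "i4"),             # (12)
    (":=", "i4", None, "I"),            # (13)
    ("J", None, None, 4),               # (14) back to the loop test
    ("DIV", "SARITM", "#100", "i6"),    # (15)
    ("DIV", "#100", "SARMON", "i7"),    # (16)
    ("-", "i6", "i7", "i8"),            # (17)
    (":=", "i8", None, "DIF"),          # (18)
    ("CALL", "XWRITE", None, None),     # (19)
    ("PARAM", "DIF", None, None),       # (20)
]

def run(quads, inputs):
    """Execute the instructions; numbering starts at 1 as on the slide."""
    env, output, pc = {}, [], 1
    inputs = iter(inputs)

    def value(operand):
        # "#100" is an immediate constant, anything else a variable or temporary
        return int(operand[1:]) if operand.startswith("#") else env[operand]

    while pc <= len(quads):
        op, a, b, res = quads[pc - 1]
        if op == ":=":
            env[res] = value(a)
        elif op == "+":
            env[res] = value(a) + value(b)
        elif op == "-":
            env[res] = value(a) - value(b)
        elif op == "DIV":
            env[res] = value(a) // value(b)
        elif op == "JGT":
            if value(a) > value(b):
                pc = res
                continue
        elif op == "J":
            pc = res
            continue
        elif op == "CALL":
            param = quads[pc][1]         # operand of the PARAM that follows
            if a == "XREAD":
                env[param] = next(inputs)
            else:                        # XWRITE
                output.append(env[param])
            pc += 1                      # skip the PARAM instruction
        pc += 1
    return output

result = run(PROGRAM, [1] * 99 + [101])
```

With 99 inputs equal to 1 and one input equal to 101, SARITM ends at 200 and SARMON at 99, so the program writes 200 DIV 100 - 100 DIV 99 = 2 - 1 = 1. With integer division, any input greater than 1 contributes 0 to SARMON, so inputs that leave SARMON at 0 would cause a division by zero in instruction (16).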