System Software Unit-1 (Language Processors) A TOY Compiler Prepared By :- Bhavin Dalsaniya MEFGI-MCA Student
The Front End
The front end performs lexical analysis, syntax analysis and semantic analysis of the source program. Each kind of analysis involves the following functions:
1. Determine the validity of a source statement.
2. Determine the ‘content’ of a source statement.
3. Construct the IC (intermediate code) of a source statement for use by subsequent analysis functions.
The word ‘content’
The word ‘content’ has a different meaning in lexical, syntax and semantic analysis. In lexical analysis, the content is the lexical class to which each lexical unit belongs. In syntax analysis, it is the syntactic structure of a source statement. In semantic analysis, the content is the meaning of a statement.
After Analysis of ‘content’
Each analysis generates information in the form of:
1. Tables of information
2. A description of the source statement
Subsequent analysis uses this information for its own purpose and either adds information to these tables and descriptions or constructs its own. For example, syntax analysis uses the information generated by lexical analysis and constructs a representation of the syntactic structure of the source statement. Semantic analysis uses the information generated by syntax analysis and constructs a representation of the meaning of the statement. The tables and descriptions at the end of semantic analysis form the IR (Intermediate Representation) of the front end. The following diagram makes this clearer.
[Diagram: front end of the toy compiler. Source program -> Lexical analysis (scanning) -> tokens -> Syntax analysis (parsing) -> trees -> Semantic analysis -> IC / IR. Each phase reports its own errors (lexical errors, syntax errors, semantic errors), and the phases share the symbol table, the constant table and other tables.]
1. Lexical Analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. ids, constants, reserved ids, etc., and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code and number in class. The class code identifies the class to which a lexical unit belongs. The number in class is the entry number of the lexical unit in the relevant table. We depict a token as Code #no.
Example :-
i : integer
a, b : real
The statement a := b + i; is represented as the string of tokens
Id,#2 Op,#5 Id,#3 Op,#3 Id,#1 Op,#10

Symbol Table
No  Symbol  Type  Length  Address
1   i       int
2   a       real
3   b       real
4   i*      real
5   temp    real

Intermediate Code
1. Convert (Id,#1) to real, giving (Id,#4)
2. Add (Id,#4) to (Id,#3), giving (Id,#5)
3. Store (Id,#5) in (Id,#2)
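The token string in the example can be reproduced with a small scanner. This is only a sketch under assumptions: the symbol table is taken as already filled from the declarations (i = 1, a = 2, b = 3), and the operator table numbers (":=" = 5, "+" = 3, ";" = 10) are hypothetical values chosen to match the tokens shown on the slide. The names SYMBOLS, OPERATORS and scan are illustrative.

```python
import re

# Symbol table entries from the example: i = 1, a = 2, b = 3.
SYMBOLS = {"i": 1, "a": 2, "b": 3}

# Hypothetical operator table; the entry numbers are assumptions chosen
# only so that the output matches the token string on the slide.
OPERATORS = {":=": 5, "+": 3, ";": 10}

# A lexical unit is either ":=", an identifier, or one of "+" and ";".
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z_]\w*|[+;])")


def scan(statement):
    """Return the statement as a list of (class code, number in class)."""
    statement = statement.rstrip()
    tokens = []
    pos = 0
    while pos < len(statement):
        match = TOKEN_RE.match(statement, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}")
        lexeme = match.group(1)
        if lexeme in OPERATORS:
            tokens.append(("Op", OPERATORS[lexeme]))
        else:
            # An unseen identifier gets the next free symbol table entry.
            number = SYMBOLS.setdefault(lexeme, len(SYMBOLS) + 1)
            tokens.append(("Id", number))
        pos = match.end()
    return tokens


tokens = scan("a := b + i;")
print(" ".join(f"{code},#{no}" for code, no in tokens))
# prints: Id,#2 Op,#5 Id,#3 Op,#3 Id,#1 Op,#10
```

Keeping the tables separate from the scanner mirrors the slide: the token only carries a class code and an entry number, and everything else about the lexical unit lives in the table.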
2. Syntax Analysis (Parsing)
Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g. assignment statement, if statement, etc. It then builds an IC which represents the structure of the statement. The IC is passed to semantic analysis to determine the meaning of the statement. A tree form is chosen for the IC because a tree can represent the hierarchical structure of a PL statement appropriately.
[Figure: two trees. For the declaration a, b : real, the root is real with children a and b. For the statement a := b + i; the root is := with children a and +, where + has children b and i.]
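As a rough illustration of how a tree can be built for an assignment statement, here is a minimal sketch. It assumes a tiny statement form, identifier := identifier { + identifier } ;, which is a hypothetical fragment and not the full toy language. For readability it works on lexemes rather than on (class code, number) tokens. The name parse_assignment is illustrative.

```python
def parse_assignment(lexemes):
    """Parse  id := id { + id } ;  and return the tree as nested tuples."""
    pos = 0

    def expect(predicate, what):
        # Consume one lexeme if it satisfies the predicate, else report an error.
        nonlocal pos
        if pos >= len(lexemes) or not predicate(lexemes[pos]):
            raise SyntaxError(f"expected {what} at lexeme {pos}")
        pos += 1
        return lexemes[pos - 1]

    target = expect(str.isidentifier, "identifier")
    expect(lambda s: s == ":=", "':='")
    tree = expect(str.isidentifier, "identifier")
    while pos < len(lexemes) and lexemes[pos] == "+":
        pos += 1
        right = expect(str.isidentifier, "identifier")
        tree = ("+", tree, right)  # '+' becomes the parent of both operands
    expect(lambda s: s == ";", "';'")
    if pos != len(lexemes):
        raise SyntaxError("unexpected text after ';'")
    return (":=", target, tree)


print(parse_assignment(["a", ":=", "b", "+", "i", ";"]))
# prints: (':=', 'a', ('+', 'b', 'i'))
```

The nested tuple has the same shape as the tree for a := b + i; on the slide: := at the root, a on the left, and + with children b and i on the right.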
3. Semantic Analysis
Semantic analysis identifies the sequence of actions necessary to implement the meaning of a source statement. When semantic analysis determines the meaning of a subtree in the IC, it adds information to a table or adds an action to the sequence of actions. It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has been completely processed.
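Example of Semantic Analysis
Source statement: a := b + i;
Analysis steps :-
1. Add type information to the tree (tree A).
2. The right-hand-side expression is evaluated first in an assignment.
3. Before the addition, convert i from int to real, giving i* (tree B).
4. Perform the addition and store the result in temp (tree C).
5. Store temp into a.
The trees make this clearer:
A) := with children (a, real) and +; + has children (b, real) and (i, int)
B) := with children (a, real) and +; + has children (b, real) and (i*, real)
C) := with children (a, real) and (temp, real)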
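The analysis steps for a := b + i; can be sketched in code. This is a simplified illustration, not the actual algorithm of any compiler: it assumes the tree always has the shape (':=', target, ('+', left, right)), it only knows the types int and real, and it does not check the type of the target. The names TYPES and analyse are illustrative.

```python
# Types taken from the declarations in the example.
TYPES = {"i": "int", "a": "real", "b": "real"}


def analyse(tree, types):
    """Return (actions, types) for a tree (':=', target, ('+', left, right))."""
    types = dict(types)  # work on a copy so the caller's table is unchanged
    actions = []
    _, target, (_, left, right) = tree

    if types[left] != types[right]:
        # Mixed int/real operands: convert the int operand to real first.
        converted = []
        for name in (left, right):
            if types[name] == "int":
                types[name + "*"] = "real"
                actions.append(f"Convert {name} to real, giving {name}*")
                name = name + "*"
            converted.append(name)
        left, right = converted

    # The addition produces a temporary of the (now common) operand type.
    types["temp"] = types[left]
    actions.append(f"Add {right} to {left}, giving temp")
    actions.append(f"Store temp in {target}")
    return actions, types


actions, types = analyse((":=", "a", ("+", "b", "i")), TYPES)
for number, action in enumerate(actions, start=1):
    print(f"{number}. {action}")
# prints:
# 1. Convert i to real, giving i*
# 2. Add i* to b, giving temp
# 3. Store temp in a
```

The returned type table also contains the new entries i* and temp, both real, which correspond to entries 4 and 5 of the symbol table in the lexical analysis example.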
The Back End
The back end performs two tasks:
1. Memory allocation
2. Code generation
Memory Allocation :- Memory allocation is a simple task given the presence of the symbol table. The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it. The address of the memory area is entered in the symbol table.
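A minimal sketch of memory allocation over a symbol table follows. The sizes (4 bytes for int, 8 bytes for real), the start address 0 and the field name elements (used here for the number of elements of an array) are assumptions made for illustration. Alignment is ignored.

```python
# Assumed sizes in bytes; a real target machine defines its own.
SIZE_OF = {"int": 4, "real": 8}


def allocate(symbols, start=0):
    """Fill in 'length' and 'address' for each entry; return the next free address."""
    address = start
    for entry in symbols:
        # Memory requirement = size of the type * number of elements.
        entry["length"] = SIZE_OF[entry["type"]] * entry.get("elements", 1)
        entry["address"] = address
        address += entry["length"]
    return address


symbol_table = [
    {"symbol": "i", "type": "int"},
    {"symbol": "a", "type": "real"},
    {"symbol": "b", "type": "real"},
    {"symbol": "i*", "type": "real"},
    {"symbol": "temp", "type": "real"},
]

next_free = allocate(symbol_table)
for entry in symbol_table:
    print(entry["symbol"], entry["type"], entry["length"], entry["address"])
print("next free address:", next_free)
```

This fills the Length and Address columns of the symbol table for every entry. A code generator that keeps i* and temp in a register would not need memory for them, so a real allocator might skip those entries.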
Code Generation :- Code generation uses knowledge of the target architecture, i.e. knowledge of the instructions and addressing modes in the target computer, to select the appropriate instructions. The important issues in code generation are:
1. Determine the places where the intermediate results should be kept, i.e. whether they should be kept in memory locations or in machine registers.
2. Determine which instructions should be used for type conversion operations.
3. Determine which addressing modes should be used for accessing variables.
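The first two issues can be illustrated with a small sketch that turns the intermediate code for a := b + i; into instructions for an imaginary one-register machine. Everything about the target is an assumption: the register AREG and the mnemonics MOVER (load), MOVEM (store), ADD_R (add real) and CONV_R (convert a memory operand to real and leave the result in the register) are hypothetical. The sketch keeps intermediate results in the register and never spills them to memory, so it only works for simple statements like this one.

```python
def generate(intermediate_code):
    """Translate (operation, operands...) steps into hypothetical instructions."""
    code = []
    in_register = None  # name of the value currently held in AREG

    def load(name):
        # Bring a value into AREG unless it is already there.
        nonlocal in_register
        if in_register != name:
            code.append(f"MOVER AREG, {name.upper()}")
            in_register = name

    for step in intermediate_code:
        if step[0] == "convert":
            _, source, result = step
            code.append(f"CONV_R AREG, {source.upper()}")
            in_register = result
        elif step[0] == "add":
            _, x, y, result = step
            if in_register == x:
                other = y
            elif in_register == y:
                other = x
            else:
                load(x)
                other = y
            code.append(f"ADD_R AREG, {other.upper()}")
            in_register = result  # the intermediate result stays in AREG
        elif step[0] == "store":
            _, source, target = step
            load(source)
            code.append(f"MOVEM AREG, {target.upper()}")
    return code


steps = [
    ("convert", "i", "i*"),
    ("add", "i*", "b", "temp"),
    ("store", "temp", "a"),
]
print("\n".join(generate(steps)))
# prints:
# CONV_R AREG, I
# ADD_R AREG, B
# MOVEM AREG, A
```

Because i* and temp never leave the register, no memory instructions are generated for them. Keeping them in memory instead would need extra store and load instructions, which is the trade-off the first issue refers to.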
Toy Compiler
Programming Language Grammar
A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a “Formal Language”.
Terminal Symbol :- The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g. Σ = {a, b, c, …, z, 0, 1, 2, …, 9}
Here the symbols {, ‘,’ and } are part of the notation. We call them metasymbols to differentiate them from the terminal symbols.
Strings :- A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc. For example, α = axy is a string over Σ. The length of a string is the number of symbols in it. Note that the absence of any symbol is also a string, the null string ε. The concatenation operation combines two strings into a single string.
Nonterminal symbols :- A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter or as a name enclosed between <…>, e.g. A or <Noun>. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, <Noun> represents a noun.
Productions :- A production, also called a rewriting rule, is a rule of the grammar. A production has the form
A nonterminal symbol ::= String of Ts and NTs
Each grammar G defines a language Lg. G contains an NT called the distinguished symbol or start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G. A valid string α of Lg is obtained by using the following procedure:
1. Let α = ‘S’.
2. While α is not a string of terminal symbols:
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.
Grammar, Derivation, Reduction, Parse Tree
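The procedure for obtaining a valid string can be sketched as follows. The grammar S ::= a S b | a b is a hypothetical example, not one from the slides. The procedure leaves open which NT and which production to select, so this sketch assumes the leftmost NT is always selected and that the caller supplies the production numbers to use, in order. The names GRAMMAR and derive are illustrative.

```python
# Hypothetical grammar: S ::= a S b | a b
GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}


def derive(grammar, start, choices):
    """Apply the procedure and return every intermediate string alpha."""
    alpha = [start]                      # Let alpha = 'S'
    steps = [" ".join(alpha)]
    choices = iter(choices)
    # While alpha is not a string of terminal symbols ...
    while any(symbol in grammar for symbol in alpha):
        # ... select an NT appearing in alpha (here: the leftmost one) ...
        position = next(i for i, symbol in enumerate(alpha) if symbol in grammar)
        # ... and replace it by the RHS of one of its productions.
        rhs = grammar[alpha[position]][next(choices)]
        alpha[position:position + 1] = rhs
        steps.append(" ".join(alpha))
    return steps


for step in derive(GRAMMAR, "S", [0, 0, 1]):
    print(step)
# prints:
# S
# a S b
# a a S b b
# a a a b b b
```

Each printed line is one step of the loop, and the last line, a a a b b b, contains only terminal symbols, so it is a valid string of the language defined by this grammar. If the list of choices runs out before α consists only of terminals, the sketch stops with an error instead of guessing a production.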