Mini-Pascal
- We will be compiling the Mini-Pascal (MPC) language
  - A subset of the Pascal programming language
  - Somewhat similar to the Java and “C” programming languages
  - There are many differences, however
  - The differences make it much easier to compile ☺
- We will discuss details of the actual language as they become important
Scanning
- 1st stage of compiling a program
- Code is written in a programming language
  - High-level languages are supposed to resemble English…
  - …but they don’t
  - They contain many features designed specifically for the computer
  - How often do you use a semicolon?
Scanning
- Raw text is hard for a computer to understand
  - It is much easier for it to work with objects
- Scanning converts text into tokens
  - A token is an object encoding a single textual idea (a number, a symbol, a word)
  - This is a very common problem, not just in compilers
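To make "an object encoding a single text idea" concrete, here is a minimal sketch of what a token object might look like. The class name `Token` and its fields `kind`, `text`, and `line` are assumptions for illustration only; the slides do not prescribe a layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token object; the field names are illustrative assumptions."""
    kind: str  # category of the token, e.g. "Int", "Operator", "String", "Id"
    text: str  # the characters that make up the token
    line: int  # source line where the token starts (handy for error messages)


# The raw text "35" becomes an object that later stages can inspect directly.
tok = Token(kind="Int", text="35", line=1)
print(tok.kind, tok.text)
```

Later stages of the compiler can then ask a token for its kind instead of re-reading characters.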
Scanning
- Only considers the token being processed
  - This stage of compilation only generates tokens
  - Looks for obvious lexical errors: text that cannot be legal
- Does not track past tokens
- Does not worry whether the text has any real meaning
  - Understanding meaning occurs later in the process
Lexical analysis for tokens in English
- Legal: ? “”, Snap p. 35 crackle < pop ppo quack!
- Illegal: ¡ I am excited !
- Illegal: gemütlichkeit façade
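One way to read the English examples is that a lexical check only asks whether each character could appear in English at all, not whether the text means anything. The sketch below assumes an "English alphabet" of ASCII letters, digits, ASCII punctuation, spaces, and curly quotes; that alphabet and the name `first_illegal_char` are assumptions made for this illustration.

```python
import string

# Assumed alphabet for "English": ASCII letters, digits, punctuation, space,
# plus curly quotes. This set is an illustration, not a definition.
LEGAL_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")
LEGAL_CHARS |= set("“”‘’")


def first_illegal_char(text):
    """Return the first character outside the assumed alphabet, or None."""
    for ch in text:
        if ch not in LEGAL_CHARS:
            return ch
    return None


# Nonsense such as "ppo" passes: meaning is not checked at this stage.
print(first_illegal_char("? “”, Snap p. 35 crackle < pop ppo quack!"))
print(first_illegal_char("¡ I am excited !"))
print(first_illegal_char("gemütlichkeit façade"))
```

Under these assumptions the first line has no offending character, while the other two are rejected because of "¡" and "ü".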
Types of Tokens in Mini-Pascal
- Operator
  - All the meaningful symbols in Mini-Pascal:
    - Numerical: + - * ^
    - Comparative: < > <= >= <> ==
    - Separator: ( ) [ ] . ; ,
    - Assignment: :=
  - Spaces are meaningful
    - “:=” is one token
    - “: =” is two tokens
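The rule that spaces are meaningful can be sketched as a longest-match lookup: at each position the scanner prefers a two-character symbol over a one-character one, and a space ends the current symbol. The function names `scan_operator` and `scan_operators` are hypothetical, and the table holds only the symbols listed on this slide, so the demonstration uses “<>” versus “< >” rather than guessing at standalone “:” and “=” tokens.

```python
# Symbols from the slide, two-character spellings first so they win the match.
OPERATORS = ["<=", ">=", "<>", "==", ":=",
             "+", "-", "*", "^", "<", ">",
             "(", ")", "[", "]", ".", ";", ","]


def scan_operator(text, pos):
    """Return the longest listed symbol starting at text[pos], or None."""
    for op in OPERATORS:
        if text.startswith(op, pos):
            return op
    return None


def scan_operators(text):
    """Split a string made only of symbols and spaces into operator tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1  # a space separates tokens and is then discarded
            continue
        op = scan_operator(text, pos)
        if op is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        tokens.append(op)
        pos += len(op)
    return tokens


print(scan_operators("<>"))   # one token
print(scan_operators("< >"))  # two tokens
```

Listing the longer spellings first is one simple way to get longest-match behavior; a real scanner might instead peek at the next character.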
Types of Tokens in Mini-Pascal
- Int
  - Includes all numbers defined by Mini-Pascal
    - Mini-Pascal does not include real numbers
  - An Int token is an uninterrupted series of digits
    - “1354934573212” is one token
    - “13 45” is two tokens: “13” and “45”
    - “13.65” is three tokens: “13”, “.”, and “65”
    - “2,585” is three tokens: “2”, “,”, and “585”
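The Int rule amounts to "keep reading while the next character is a digit". A minimal sketch, assuming a hypothetical helper `scan_int` that takes the text and a starting position:

```python
DIGITS = "0123456789"


def scan_int(text, pos):
    """Return (digits, new_pos) for the run of digits starting at text[pos]."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return text[pos:end], end


# "13.65" yields "13" first; the "." and "65" are left for later calls.
digits, next_pos = scan_int("13.65", 0)
print(digits, next_pos)
```

Anything that is not a digit (a space, a period, a comma) ends the token, which is why “13.65” and “2,585” each become three tokens.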
Types of Tokens in Mini-Pascal
- String
  - The literal strings in Mini-Pascal
    - Java strings begin and end with a double quote (“”)
    - Pascal strings begin and end with a single quote (‘’)
  - Can include any characters, including letters and digits, but cannot span multiple lines
  - ‘Hi Mom. #1’ is one token with the text “Hi Mom. #1”
  - Note: the quotes are not included in the token
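A sketch of the String rule follows. It assumes the source file uses the plain ASCII single quote (') and that a quote cannot appear inside a string; the slides do not cover embedded quotes, and the name `scan_string` is hypothetical.

```python
def scan_string(text, pos):
    """Scan a single-quoted literal starting at text[pos].

    Returns (contents, new_pos); the contents exclude the surrounding quotes.
    """
    if pos >= len(text) or text[pos] != "'":
        raise ValueError("a string literal must start with a single quote")
    end = pos + 1
    while end < len(text) and text[end] != "'":
        if text[end] == "\n":
            raise ValueError("string literal crosses a line break")
        end += 1
    if end == len(text):
        raise ValueError("unterminated string literal")
    return text[pos + 1:end], end + 1


contents, next_pos = scan_string("'Hi Mom. #1'", 0)
print(contents)
```

Reporting an error for a newline or a missing closing quote matches the idea that scanning catches only text that cannot be legal.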
Types of Tokens in Mini-Pascal
- Identifier/Id
  - Includes the keywords (reserved words) in Mini-Pascal:
    and array begin case const div do downto else end for function if mod nil not of or procedure program record repeat then to type until var while
  - Also includes potential variable and method names
    - Begin with a letter, followed by any combination of letters and digits
    - DO NOT worry (yet) whether it is an actual name
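The identifier rule (a letter, then letters and digits) and the keyword list can be combined in one helper. In this sketch the name `scan_identifier` and the returned `is_keyword` flag are assumptions; letter case is compared exactly because the slides do not say how case is handled.

```python
import string

# Reserved words copied from the slide.
KEYWORDS = {
    "and", "array", "begin", "case", "const", "div", "do", "downto", "else",
    "end", "for", "function", "if", "mod", "nil", "not", "of", "or",
    "procedure", "program", "record", "repeat", "then", "to", "type",
    "until", "var", "while",
}
LETTERS = set(string.ascii_letters)
LETTERS_OR_DIGITS = LETTERS | set(string.digits)


def scan_identifier(text, pos):
    """Return (name, is_keyword, new_pos), or None if text[pos] is not a letter."""
    if pos >= len(text) or text[pos] not in LETTERS:
        return None
    end = pos + 1
    while end < len(text) and text[end] in LETTERS_OR_DIGITS:
        end += 1
    name = text[pos:end]
    # No check that the name was ever declared: that belongs to a later stage.
    return name, name in KEYWORDS, end


print(scan_identifier("begin x1 := 5", 0))
print(scan_identifier("begin x1 := 5", 6))
```

The scanner accepts `x1` without knowing whether any variable by that name exists, which is the point of the "DO NOT worry (yet)" bullet.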
Other Work While Scanning
- Comments
  - Pascal also includes comments
    - Begin with either “{” or “(*”
    - Then include any legal characters, including letters, digits, spaces, and newlines
    - End with either “}” or “*)”
  - The opening and closing styles may be mixed:
    { This is a legal comment *)
    (* and so is this }
  - There is no comment token: comments are not used in compilation
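Since comments produce no token, a scanner can simply jump past them. The sketch below assumes a hypothetical helper `skip_comment` that accepts either opener and stops at whichever closer appears first, so mixed pairs such as “{ … *)” are accepted. Treating a missing closer as an error is an assumption; the slides do not say what should happen in that case.

```python
def skip_comment(text, pos):
    """If a comment starts at text[pos], return the position just past it.

    Returns pos unchanged when no comment starts there.
    """
    if text.startswith("{", pos):
        start = pos + 1
    elif text.startswith("(*", pos):
        start = pos + 2
    else:
        return pos
    # Either closer may end the comment, so take whichever comes first.
    closers = [(text.find("}", start), 1), (text.find("*)", start), 2)]
    found = [(index, length) for index, length in closers if index != -1]
    if not found:
        raise ValueError("unterminated comment")
    index, length = min(found)
    return index + length


source = "{ This is a legal comment *) x"
print(repr(source[skip_comment(source, 0):]))
```

The caller resumes scanning at the returned position, so nothing about the comment reaches later stages.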