Principles of Programming Languages www.cs.bgu.ac.il/~ppl192 Lesson 9 – Variables in Scheme
What we saw so far Static properties are computed by traversing the AST, independently of any execution or runtime information. We use a systematic design pattern, which consists of traversing the AST and testing each possible type in the AST data type. We distinguish variable declarations and variable references. Variable declarations define a scope in which the variable is visible.
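The height computation below is a minimal sketch of this design pattern. It assumes a hypothetical three-case AST (numbers, variable references, applications) rather than the course's full AST types: one clause per AST type, with no evaluation of the expression.

```typescript
// Hypothetical miniature AST, only to illustrate the traversal pattern;
// operators such as + are represented as plain variable references here.
type NumExp = { tag: "NumExp"; val: number };
type VarRef = { tag: "VarRef"; var: string };
type AppExp = { tag: "AppExp"; rator: Exp; rands: Exp[] };
type Exp = NumExp | VarRef | AppExp;

// Type predicates, one per member of the disjoint union.
const isNumExp = (e: Exp): e is NumExp => e.tag === "NumExp";
const isVarRef = (e: Exp): e is VarRef => e.tag === "VarRef";
const isAppExp = (e: Exp): e is AppExp => e.tag === "AppExp";

// One clause per AST type: atoms have height 1, and an application is one
// more than its tallest sub-expression. Nothing is evaluated.
const height = (e: Exp): number =>
    isNumExp(e) ? 1 :
    isVarRef(e) ? 1 :
    isAppExp(e) ? 1 + Math.max(height(e.rator), ...e.rands.map(height)) :
    0;

// (+ 1 (* x 2))
const example: Exp = {
    tag: "AppExp", rator: { tag: "VarRef", var: "+" },
    rands: [
        { tag: "NumExp", val: 1 },
        { tag: "AppExp", rator: { tag: "VarRef", var: "*" },
          rands: [{ tag: "VarRef", var: "x" }, { tag: "NumExp", val: 2 }] },
    ],
};
console.log(height(example)); // 3
```

The answer depends only on the shape of the tree, which is what makes it a static property.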
S-Expressions The S-expression parser performs tokenization and handles nested parentheses, which hides most of the complexity of parsing from us. S-expressions are nested lists of strings (strings are the only atomic type). The function parseL1Sexp() performs the actual work of translating an s-expression into an AST.
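To make the two steps concrete, here is a minimal sketch of such a parser. It is not the course's parseSexp; the names tokenize, parseAt and parseSexpSketch are hypothetical. It ignores string literals and the quote shorthand, and, following this lesson's style, it returns Error values instead of throwing.

```typescript
// An s-expression: a string atom, or a (possibly nested) list of s-expressions.
type Sexp = string | Sexp[];

// Tokenization: pad parentheses with spaces, then split on whitespace.
const tokenize = (src: string): string[] =>
    src.replace(/\(/g, " ( ").replace(/\)/g, " ) ")
       .split(/\s+/).filter((t) => t.length > 0);

// Parse one s-expression starting at tokens[i]; return it together with the
// index of the next unread token, or an Error for unbalanced parentheses.
const parseAt = (tokens: string[], i: number): [Sexp, number] | Error => {
    if (i >= tokens.length) return new Error("unexpected end of input");
    const tok = tokens[i];
    if (tok === ")") return new Error("unexpected )");
    if (tok !== "(") return [tok, i + 1];          // atom
    const items: Sexp[] = [];
    let j = i + 1;
    while (j < tokens.length && tokens[j] !== ")") {
        const res = parseAt(tokens, j);            // nested element
        if (res instanceof Error) return res;
        items.push(res[0]);
        j = res[1];
    }
    return j < tokens.length ? [items, j + 1] : new Error("missing )");
};

const parseSexpSketch = (src: string): Sexp | Error => {
    const tokens = tokenize(src);
    const res = parseAt(tokens, 0);
    if (res instanceof Error) return res;
    return res[1] === tokens.length ? res[0] : new Error("trailing tokens");
};

console.log(JSON.stringify(parseSexpSketch("(L1 (define x 5) (+ x 1))")));
// ["L1",["define","x","5"],["+","x","1"]]
```

Note that numbers and booleans such as 5 or #t stay strings at this stage; turning them into typed AST nodes is the job of the next step.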
L1 BNF We present a first version of the TypeScript program, which encodes the following BNF as a set of disjoint union types in TypeScript.
<program> ::= (L1 <exp>+)                      // program(exps:List(exp))
<exp> ::= <define-exp> | <cexp>
<define-exp> ::= (define <var-decl> <cexp>)    // def-exp(var:var-decl, val:cexp)
<cexp> ::= <num-exp>                           // num-exp(val:Number)
        |  <bool-exp>                          // bool-exp(val:Boolean)
        |  <prim-op>                           // prim-op(op:String)
        |  <var-ref>                           // var-ref(var:String)
        |  (<cexp> <cexp>*)                    // app-exp(rator:cexp, rands:List(cexp))
<prim-op> ::= + | - | * | / | < | > | = | not
<num-exp> ::= a number token
<bool-exp> ::= #t | #f
<var-ref> ::= an identifier token
<var-decl> ::= an identifier token
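One possible encoding of this BNF is sketched below. The constructor names follow those used by the parser in this lesson (makeNumExp, makeVarRef, makeAppExp, ...), but the tag strings and field layouts are assumptions, not necessarily the course's exact definitions. Each annotated production becomes one interface, and each alternative list (<cexp>, <exp>) becomes a union type.

```typescript
// Sketch of the L1 AST as a disjoint union; the tag strings are assumptions.
interface NumExp { tag: "NumExp"; val: number; }
interface BoolExp { tag: "BoolExp"; val: boolean; }
interface PrimOp { tag: "PrimOp"; op: string; }
interface VarRef { tag: "VarRef"; var: string; }
interface VarDecl { tag: "VarDecl"; var: string; }
interface AppExp { tag: "AppExp"; rator: CExp; rands: CExp[]; }
interface DefineExp { tag: "DefineExp"; var: VarDecl; val: CExp; }
interface Program { tag: "Program"; exps: Exp[]; }

// <cexp> and <exp> from the BNF become union types.
type CExp = NumExp | BoolExp | PrimOp | VarRef | AppExp;
type Exp = DefineExp | CExp;

// One constructor per BNF production.
const makeNumExp = (val: number): NumExp => ({ tag: "NumExp", val });
const makeBoolExp = (val: boolean): BoolExp => ({ tag: "BoolExp", val });
const makePrimOp = (op: string): PrimOp => ({ tag: "PrimOp", op });
const makeVarRef = (v: string): VarRef => ({ tag: "VarRef", var: v });
const makeVarDecl = (v: string): VarDecl => ({ tag: "VarDecl", var: v });
const makeAppExp = (rator: CExp, rands: CExp[]): AppExp => ({ tag: "AppExp", rator, rands });
const makeDefineExp = (v: VarDecl, val: CExp): DefineExp => ({ tag: "DefineExp", var: v, val });
const makeProgram = (exps: Exp[]): Program => ({ tag: "Program", exps });

// (L1 (define x 5) (+ x 1))
const program = makeProgram([
    makeDefineExp(makeVarDecl("x"), makeNumExp(5)),
    makeAppExp(makePrimOp("+"), [makeVarRef("x"), makeNumExp(1)]),
]);
console.log(program.exps.map((e) => e.tag).join(", ")); // DefineExp, AppExp
```

Because every member carries a distinct literal tag, a single comparison on tag is enough for the type checker to tell the members apart.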
Recipe Interface of L1
Type Guards The parser switches according to the type of the parameter. We implement this switch as chained ternary conditionals, e.g., e1 ? e2 : e3 ? e4 : ..., which form an expression in JavaScript (as opposed to switch or if / else if, which are statements). The types of the parameters are disjoint unions, and the clauses of the switch are all type predicates, also called guards. After a guard, the TypeScript type checker verifies that the parameter has the requested type (we call this part of the code the guarded clause). In parseL1Sexp, the parameter sexp is of type any, but in the clause guarded by isArray(sexp) it is of type any[]. Therefore, we can pass it to a function that accepts any[] as a parameter.
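A minimal sketch of a guarded chained ternary follows. The function describeSexp and the guard isString are hypothetical; Array.isArray is the standard built-in, whose declared type is itself a guard (arg is any[]).

```typescript
// A user-defined type guard: the return type `x is string` tells the checker
// that x is a string inside the clause it guards.
const isString = (x: unknown): x is string => typeof x === "string";

// A "switch" written as chained ternaries, so the whole body is one expression.
// Array.isArray is itself a built-in guard (x is any[]), which is why x.length
// type-checks in the first clause but would not type-check elsewhere.
const describeSexp = (x: unknown): string =>
    Array.isArray(x) ? `compound with ${x.length} elements` :
    isString(x) ? `atom ${x.toUpperCase()}` :
    "unexpected type";

console.log(describeSexp(["+", "1", "2"])); // compound with 3 elements
console.log(describeSexp("not"));           // atom NOT
```

Moving x.toUpperCase() into the first clause would be rejected at compile time, since there x is only known to be an array.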
Error Handling We deal with errors by returning a value of type Error. Error is a built-in type in JavaScript and TypeScript (not a primitive type), generally used as the argument of exceptions (with try / catch / throw). We avoid exceptions and handle errors explicitly: we add the Error type as another disjoint type to the expression type. Similarly, we use Error as one of the disjoint types of the Value data type, which represents the possible values computed by the interpreter of the L1 language.
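The pattern can be shown outside the parser with a small sketch. safeDiv, isError and halfOfQuotient are hypothetical names: the result type is a union with Error, and every caller tests for the Error member before using the value.

```typescript
// Errors are returned as ordinary values instead of being thrown.
const safeDiv = (a: number, b: number): number | Error =>
    b === 0 ? new Error("division by zero") : a / b;

// A guard for the Error member of the union (hypothetical helper name).
const isError = (x: unknown): x is Error => x instanceof Error;

// Callers test for the Error case explicitly and propagate it unchanged.
const halfOfQuotient = (a: number, b: number): number | Error => {
    const q = safeDiv(a, b);
    return isError(q) ? q : q / 2;
};

console.log(halfOfQuotient(8, 2));                    // 2
console.log((halfOfQuotient(1, 0) as Error).message); // division by zero
```

Without the isError test, q / 2 would not type-check, so the compiler forces the error case to be handled.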
Examples of Error Handling
// Assumes the list helpers (first, rest, isEmpty, isArray, isString, isNumericString)
// and the AST constructors are defined elsewhere; map is a curried map(fn, list) such as Ramda's.
export const parseL1 = (x: string): Program | DefineExp | CExp | Error =>
    parseL1Sexp(parseSexp(x));

export const parseL1Sexp = (sexp: any): Program | DefineExp | CExp | Error =>
    isEmpty(sexp) ? Error("Unexpected empty") :
    isArray(sexp) ? parseL1Compound(sexp) :
    isString(sexp) ? parseL1Atomic(sexp) :
    Error("Unexpected type " + sexp);

const parseL1Compound = (sexps: any[]): Program | DefineExp | CExp | Error =>
    first(sexps) === 'L1' ? makeProgram(map(parseL1Sexp, rest(sexps))) :
    first(sexps) === 'define' ? makeDefineExp(makeVarDecl(sexps[1]), parseL1CExp(sexps[2])) :
    parseL1CExp(sexps);

const parseL1Atomic = (sexp: string): CExp =>
    sexp === '#t' ? makeBoolExp(true) :
    sexp === '#f' ? makeBoolExp(false) :
    isNumericString(sexp) ? makeNumExp(+sexp) :
    isPrimitiveOp(sexp) ? makePrimOp(sexp) :
    makeVarRef(sexp);

const isPrimitiveOp = (x: string): boolean =>
    x === '+' || x === '-' || x === '*' || x === '/' ||
    x === '>' || x === '<' || x === '=' || x === 'not';

const parseL1CExp = (sexp: any): CExp | Error =>
    isArray(sexp) ? makeAppExp(parseL1CExp(first(sexp)), map(parseL1CExp, rest(sexp))) :
    isString(sexp) ? parseL1Atomic(sexp) :
    Error("Unexpected type " + sexp);
References and Declarations We start with an example of variable binding. In Scheme, variables occur as references and as declarations. A variable reference uses a variable. For example, in the expression (+ 1 x), x refers to a value that was previously attached to the variable. A variable declaration defines a new variable as an abstraction (a name) for a value. For example, in the expressions (lambda (x) ...) and (let ((x ...)) ...), x is declared as a new variable. In the lambda case, the value of x is provided when the function is invoked; in the let case, the value of x is provided in the binding location of the let-expression.
Scope Variable declarations usually have a limited scope in the program. This means that the name x in different locations of the program may refer to different variables. In the case of lambda and let, the declared variables are visible only within the body of the expression. Programming languages come with binding rules, which determine how variable references relate to variable declarations. In Scheme, these rules are syntactic: they can be computed by analyzing the AST of the program without executing it. In other words, binding is a static property, as opposed to a dynamic property, which would depend on a specific execution of the program. Static properties are defined through structural induction: they are defined for all possible types of expressions by going over the list of all possible expression types in the abstract syntax of the language.
Binding Rules for Scheme In an expression of the form (lambda (<variable>) <body>), <variable> is a declaration that binds all references to that variable in <body>, unless some intervening declaration of the same variable occurs. In an expression of the form (let ((<variable> <value>)) <body>), <variable> is a declaration that binds all references to that variable in <body>, unless some intervening declaration of the same variable occurs. References inside <value> are outside <body>, so this declaration does not bind them.
Free and Bound Variables A variable x is free in expression E if and only if there is a reference to x in E that is not bound within E. A variable x is bound in expression E if and only if all references to x in E are bound within E.
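The two binding rules and the two definitions can be checked mechanically. This is a sketch over a hypothetical miniature AST (the names Exp, refs, isFree and isBound are illustrative). It walks the expression while carrying the names declared by the enclosing lambda and let forms, and records for each reference whether some declaration binds it. One deviation from the literal definition: isBound also requires x to be referenced at least once.

```typescript
// Hypothetical miniature AST with the two binding forms discussed above.
type Exp =
    | { tag: "Num"; val: number }
    | { tag: "VarRef"; var: string }
    | { tag: "Lambda"; params: string[]; body: Exp[] }
    | { tag: "Let"; bindings: [string, Exp][]; body: Exp[] }
    | { tag: "App"; rator: Exp; rands: Exp[] };

// One entry per variable reference: its name and whether a declaration binds it.
type Ref = { name: string; bound: boolean };

const refsAll = (es: Exp[], scope: string[]): Ref[] =>
    es.reduce<Ref[]>((acc, x) => acc.concat(refs(x, scope)), []);

// scope lists the variables declared by the enclosing lambda/let forms.
const refs = (e: Exp, scope: string[]): Ref[] =>
    e.tag === "VarRef" ? [{ name: e.var, bound: scope.indexOf(e.var) >= 0 }] :
    // lambda: the parameters bind references in the body
    e.tag === "Lambda" ? refsAll(e.body, e.params.concat(scope)) :
    // let: the variables bind references in the body, not in the values
    e.tag === "Let" ? refsAll(e.bindings.map((b) => b[1]), scope)
        .concat(refsAll(e.body, e.bindings.map((b) => b[0]).concat(scope))) :
    e.tag === "App" ? refs(e.rator, scope).concat(refsAll(e.rands, scope)) :
    [];

// x is free in E iff some reference to x in E is not bound within E.
const isFree = (x: string, e: Exp): boolean =>
    refs(e, []).some((r) => r.name === x && !r.bound);

// x is bound in E iff all references to x in E are bound within E
// (this sketch also requires x to be referenced at least once).
const isBound = (x: string, e: Exp): boolean => {
    const xs = refs(e, []).filter((r) => r.name === x);
    return xs.length > 0 && xs.every((r) => r.bound);
};

// ((lambda (x) x) y)
const e1: Exp = {
    tag: "App",
    rator: { tag: "Lambda", params: ["x"], body: [{ tag: "VarRef", var: "x" }] },
    rands: [{ tag: "VarRef", var: "y" }],
};
console.log(isBound("x", e1), isFree("y", e1)); // true true
```

Because the inner declarations are placed in front of the outer ones in scope, an intervening declaration of the same name is found first.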
Example ((lambda (x) x) y) Which variable is free and which is bound? x is bound - since the definition of x in the ‘formals’ of the lambda binds its occurrence in the body of the lambda. y is free.
Example (lambda (y) ((lambda (x) x) y)) Which variable is free and which is bound? x is bound. y is bound: the declaration of y in the parameter list of the outer lambda binds its reference in the body, the application ((lambda (x) x) y).
Free or Bound algorithm The following algorithm uses the same recipe we used to compute the height of an expression (Eheight): a structural induction over the disjoint union types of Scheme's AST.
const occursFree = (v: string, e: Exp): boolean =>
    isBoolExp(e) ? false :
    isNumExp(e) ? false :
    isStrExp(e) ? false :
    isLitExp(e) ? false :
    isVarRef(e) ? (v === e.var) :
    isIfExp(e) ? occursFree(v, e.test) || occursFree(v, e.then) || occursFree(v, e.alt) :
    // v is free in a lambda only if it is not a parameter and occurs free in the body
    isProcExp(e) ? !(map((p) => p.var, e.args).includes(v)) &&
                   some((b) => occursFree(v, b), e.body) :
    isPrimOp(e) ? false :
    isAppExp(e) ? occursFree(v, e.rator) || some((rand) => occursFree(v, rand), e.rands) :
    isDefineExp(e) ? (v !== e.var.var) && occursFree(v, e.val) :
    false;
Collecting Variable References The referencedVars algorithm recursively collects all referenced variables in an expression.
export const referencedVars = (e: Parsed | Error): ReadonlyArray<VarRef> =>
    isBoolExp(e) ? [] :
    isNumExp(e) ? [] :
    isStrExp(e) ? [] :
    isLitExp(e) ? [] :
    isPrimOp(e) ? [] :
    isVarRef(e) ? [e] :
    // union takes two lists: a third argument would be silently dropped,
    // so the three branches of an if are combined in two steps.
    isIfExp(e) ? union(referencedVars(e.test), union(referencedVars(e.then), referencedVars(e.alt))) :
    isAppExp(e) ? union(referencedVars(e.rator), reduce(union, [], map(referencedVars, e.rands))) :
    isProcExp(e) ? reduce(union, [], map(referencedVars, e.body)) :
    isDefineExp(e) ? referencedVars(e.val) :
    isProgram(e) ? reduce(union, [], map(referencedVars, e.exps)) :
    isLetExp(e) ? [] : // TODO
    [];
Note that this function has an almost identical structure to any other AST visitor. By combining referencedVars and occursFree, we can obtain the list of variables that occur free within an expression.
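That combination can be sketched as follows: keep each referenced name for which occursFree holds. This is a self-contained simplification on a hypothetical miniature AST (numbers, variables, if, lambda, application), with variable names represented as plain strings instead of VarRef nodes. The list-of-lists union helper is written here rather than taken from a library.

```typescript
// Hypothetical miniature AST: numbers, variables, if, lambda, application.
type Exp =
    | { tag: "NumExp"; val: number }
    | { tag: "VarRef"; var: string }
    | { tag: "IfExp"; test: Exp; then: Exp; alt: Exp }
    | { tag: "ProcExp"; params: string[]; body: Exp[] }
    | { tag: "AppExp"; rator: Exp; rands: Exp[] };

// Set union of several lists of names, keeping first-occurrence order.
const union = (lists: string[][]): string[] => {
    const out: string[] = [];
    lists.forEach((l) => l.forEach((v) => { if (out.indexOf(v) < 0) out.push(v); }));
    return out;
};

// Every variable name referenced anywhere in e (bound or not).
const referencedVars = (e: Exp): string[] =>
    e.tag === "VarRef" ? [e.var] :
    e.tag === "IfExp" ? union([referencedVars(e.test), referencedVars(e.then), referencedVars(e.alt)]) :
    e.tag === "ProcExp" ? union(e.body.map(referencedVars)) :
    e.tag === "AppExp" ? union([referencedVars(e.rator)].concat(e.rands.map(referencedVars))) :
    [];

// Does some reference to v in e escape every enclosing declaration of v?
const occursFree = (v: string, e: Exp): boolean =>
    e.tag === "VarRef" ? v === e.var :
    e.tag === "IfExp" ? occursFree(v, e.test) || occursFree(v, e.then) || occursFree(v, e.alt) :
    e.tag === "ProcExp" ? e.params.indexOf(v) < 0 && e.body.some((b) => occursFree(v, b)) :
    e.tag === "AppExp" ? occursFree(v, e.rator) || e.rands.some((r) => occursFree(v, r)) :
    false;

// The combination: keep the referenced names that occur free.
const freeVars = (e: Exp): string[] => referencedVars(e).filter((v) => occursFree(v, e));

const ref = (name: string): Exp => ({ tag: "VarRef", var: name });
// (lambda (x y) ((lambda (x) (+ x y)) (+ x x)) 1)
const example: Exp = {
    tag: "ProcExp", params: ["x", "y"],
    body: [
        { tag: "AppExp",
          rator: { tag: "ProcExp", params: ["x"],
                   body: [{ tag: "AppExp", rator: ref("+"), rands: [ref("x"), ref("y")] }] },
          rands: [{ tag: "AppExp", rator: ref("+"), rands: [ref("x"), ref("x")] }] },
        { tag: "NumExp", val: 1 },
    ],
};
console.log(freeVars(example).join(", ")); // +
```

In this miniature AST + is an ordinary variable reference, which is why it is reported as free, consistent with the [+ free] annotations used for lexical addresses.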
Variable Declaration and References in Abstract Syntax We represent declarations and references as two different data types, as reflected in this updated BNF, where we define the category <cexpLA> for "expression with lexical address":
;; <cexpLA> ::= <number>                                / num-exp(val:number)
;;           |  <boolean>                               / bool-exp(val:boolean)
;;           |  <string>                                / str-exp(val:string)
;;           |  ( quote <sexp> )                        / literal-exp(val:sexp)
;;           |  <var-ref>                               / var-ref(var:string)
;;           |  ( lambda ( <var-decl>* ) <cexpLA>+ )    / proc-expLA(params:List(var-decl), body:List(cexpLA))
;;           |  ( if <cexpLA> <cexpLA> <cexpLA> )       / if-expLA(test:cexpLA, then:cexpLA, else:cexpLA)
;;           |  ( <cexpLA> <cexpLA>* )                  / app-expLA(rator:cexpLA, rands:List(cexpLA))
To simplify, we ignore define-exp and let-exp here. We distinguish between var-decl and var-ref: identifiers in the parameter list of a lambda-expression are var-decl, and identifiers elsewhere are var-ref. Compound expressions have the same structure as in the original syntactic definition, but refer to the new type cexpLA instead of cexp. We use the same AST definitions for the atomic types (number, boolean, string) and the same literal expressions.
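The var-decl / var-ref distinction can be made concrete with a small parser sketch from s-expressions (nested string arrays) to a subset of cexpLA. The type and function names here (parseLA, parseAll) and the tag strings are assumptions for illustration. Booleans, strings, quote, define and let are omitted. The only place that produces VarDecl is the lambda parameter list; every other identifier becomes a VarRef.

```typescript
// Sketch of a subset of the cexpLA types (tag names assumed).
interface VarDecl { tag: "VarDecl"; var: string; }
interface VarRef { tag: "VarRef"; var: string; }
interface NumExp { tag: "NumExp"; val: number; }
interface ProcExpLA { tag: "ProcExpLA"; params: VarDecl[]; body: CExpLA[]; }
interface IfExpLA { tag: "IfExpLA"; test: CExpLA; then: CExpLA; alt: CExpLA; }
interface AppExpLA { tag: "AppExpLA"; rator: CExpLA; rands: CExpLA[]; }
type CExpLA = NumExp | VarRef | ProcExpLA | IfExpLA | AppExpLA;

type Sexp = string | Sexp[];
const isError = (x: unknown): x is Error => x instanceof Error;

// Parse several s-expressions, stopping at the first Error.
const parseAll = (ss: Sexp[]): CExpLA[] | Error => {
    const out: CExpLA[] = [];
    for (const s of ss) {
        const e = parseLA(s);
        if (isError(e)) return e;
        out.push(e);
    }
    return out;
};

// Identifiers in a lambda parameter list become VarDecl; all others become VarRef.
const parseLA = (s: Sexp): CExpLA | Error => {
    if (typeof s === "string")
        return /^-?\d+$/.test(s) ? { tag: "NumExp", val: Number(s) } : { tag: "VarRef", var: s };
    if (s.length === 0) return new Error("empty combination");
    const [head, ...tail] = s;
    if (head === "lambda") {
        const params = tail[0];
        if (!Array.isArray(params) || !params.every((p) => typeof p === "string"))
            return new Error("bad parameter list");
        const decls = (params as string[]).map((p): VarDecl => ({ tag: "VarDecl", var: p }));
        const body = parseAll(tail.slice(1));
        return isError(body) ? body : { tag: "ProcExpLA", params: decls, body };
    }
    if (head === "if") {
        const ps = parseAll(tail);
        if (isError(ps)) return ps;
        return ps.length === 3 ? { tag: "IfExpLA", test: ps[0], then: ps[1], alt: ps[2] }
                               : new Error("if needs 3 sub-expressions");
    }
    const parts = parseAll(s);
    if (isError(parts)) return parts;
    return { tag: "AppExpLA", rator: parts[0], rands: parts.slice(1) };
};

console.log(JSON.stringify(parseLA(["lambda", ["x"], "x"])));
// {"tag":"ProcExpLA","params":[{"tag":"VarDecl","var":"x"}],"body":[{"tag":"VarRef","var":"x"}]}
```

The same name x appears once as a VarDecl and once as a VarRef, which is exactly the distinction a later lexical-address pass needs.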
Determining the Scope of Variable Declarations In lexically scoped languages, the same variable name can refer to different declarations. These relations between variable references and variable declarations are static properties: they depend only on the syntactic structure of the expression. Which x is referenced in each line?
((lambda (x) (* x x))     ; 1
 ((lambda (x) (+ x x))    ; 2
  2))
Solution: The variable references in line 1 refer to the declaration in the first lambda in line 1, and those in line 2 refer to the declaration of the second lambda in line 2.
Determining the Scope of Variable Declarations Which declaration does each variable reference refer to?
(lambda (x y)               ; 1
  ((lambda (x) (+ x y))     ; 2
   (+ x x)) 1)              ; 3
Solution: The reference x in line 2 refers to the declaration in line 2; the reference y in line 2 refers to the declaration in line 1; the references x in line 3 refer to the declaration in line 1.
Lexical Address We disambiguate variable references with a lexical address, which matches a variable reference with its declaration. The contour of a sub-expression within an embedding expression defines the scope of each variable declaration inside it (e.g., lambda and let expressions). Contours are nested inside each other. In the example above, there is a contour that starts at line 1 with the lambda declaration, and a second, nested contour in line 2. Variable references can refer to the declarations in the contours in which they appear, starting from the innermost declaration and looking outwards. For example, in line 2, the x reference looks up the declaration in the inner contour in line 2; the y reference looks up the external declaration in the outer contour in line 1.
To indicate these relations, we define a lexical address as a tuple [var : depth pos], where:
var is the name of the variable;
depth is the number of contours that are crossed to reach the variable declaration;
pos is the offset of the variable within the declaration.
For example, the lexical-address annotations for the expressions above are:
((lambda (x) (* [x : 0 0] [x : 0 0]))      ; 1
 ((lambda (x) (+ [x : 0 0] [x : 0 0]))     ; 2
  2))
The variable references in line 1 refer to the declaration in the first lambda in line 1, and those in line 2 refer to the declaration of the second lambda in line 2.
(lambda (x y)                               ; 1
  ((lambda (x) (+ [x : 0 0] [y : 1 1]))     ; 2
   (+ [x : 0 0] [x : 0 0])) 1)              ; 3
Here y has depth 1 because one contour (the inner lambda) is crossed to reach (lambda (x y)), and position 1 because it is the second parameter of that lambda (positions start at 0).
Note that the variable references + and * in these examples are not bound to any declaration, because they occur free in the expression. We annotate such references as [var free]:
((lambda (x) ([* free] [x : 0 0] [x : 0 0]))      ; 1
 ((lambda (x) ([+ free] [x : 0 0] [x : 0 0]))     ; 2
  2))
(lambda (x y)                                       ; 1
  ((lambda (x) ([+ free] [x : 0 0] [y : 1 1]))      ; 2
   ([+ free] [x : 0 0] [x : 0 0])) 1)               ; 3
Algorithm for Lexical Addresses
/*
<lexical-address> ::= [<identifier> : <number> <number>]  / lexical-address(var:string, depth:number, pos:number)
                   |  [<identifier> free]                 / free-var(var:string)
*/
export type LexAddress = FreeVar | LexicalAddress;
export const isLexAddress = (x: any): x is LexAddress =>
    isFreeVar(x) || isLexicalAddress(x);

export interface FreeVar {
    tag: "FreeVar";
    var: string;
};
// typeof null is also 'object', so null is excluded explicitly
export const isFreeVar = (x: any): x is FreeVar =>
    (typeof(x) === 'object') && (x !== null) && (x.tag === "FreeVar");
export const makeFreeVar = (v: string): FreeVar =>
    ({tag: "FreeVar", var: v});

export interface LexicalAddress {
    tag: "LexicalAddress";
    var: string;
    depth: number;
    pos: number;
};
export const isLexicalAddress = (x: any): x is LexicalAddress =>
    (typeof(x) === "object") && (x !== null) && (x.tag === "LexicalAddress");
export const makeLexicalAddress = (v: string, depth: number, pos: number): LexicalAddress =>
    ({tag: "LexicalAddress", var: v, depth: depth, pos: pos});
// Crossing one more contour increases the depth; the position is unchanged.
export const makeDeeperLexicalAddress = (la: LexicalAddress): LexicalAddress =>
    makeLexicalAddress(la.var, la.depth + 1, la.pos);
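To connect these two data types with the bracket notation used in the examples, here is a small sketch. The printer unparseLexAddress is hypothetical. The local makeDeeperLexicalAddress has the same behaviour as the function of that name above, but uses an object spread.

```typescript
// The two kinds of lexical address, mirroring FreeVar and LexicalAddress above.
interface FreeVar { tag: "FreeVar"; var: string; }
interface LexicalAddress { tag: "LexicalAddress"; var: string; depth: number; pos: number; }
type LexAddress = FreeVar | LexicalAddress;

// Print an address in the lesson's bracket notation (hypothetical helper).
const unparseLexAddress = (la: LexAddress): string =>
    la.tag === "FreeVar" ? `[${la.var} free]` : `[${la.var} : ${la.depth} ${la.pos}]`;

// Crossing one more contour increases depth and leaves pos unchanged.
const makeDeeperLexicalAddress = (la: LexicalAddress): LexicalAddress =>
    ({ ...la, depth: la.depth + 1 });

const y: LexicalAddress = { tag: "LexicalAddress", var: "y", depth: 0, pos: 1 };
console.log(unparseLexAddress(y));                           // [y : 0 1]
console.log(unparseLexAddress(makeDeeperLexicalAddress(y))); // [y : 1 1]
console.log(unparseLexAddress({ tag: "FreeVar", var: "+" })); // [+ free]
```

Seen from inside (lambda (x) ...), the y declared by (lambda (x y) ...) at [y : 0 1] becomes [y : 1 1], as in the annotated example.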
Algorithm for Lexical Addresses
/*
Purpose: get the closest enclosing lexical address given a variable name.
Signature: getLexicalAddress(var, lexicalAddresses)
Pre-conditions: Lexical-addresses are sorted by depth
Examples:
getLexicalAddress((var-ref b), [[lex-addr a 0 0], [lex-addr b 0 1]]) => [LexAddr b 0 1]
getLexicalAddress((var-ref c), [[lex-addr a 0 0], [lex-addr b 0 1]]) => [FreeVar c]
getLexicalAddress((var-ref a), [[lex-addr a 0 0], [lex-addr b 0 1], [lex-addr a 1 1]]) => [LexAddr a 0 0]
*/
export const getLexicalAddress = (v: VarRef, lexAddresses: LexAddress[]): LexAddress => {
    const loop = (addresses: LexAddress[]): LexAddress =>
        isEmpty(addresses) ? makeFreeVar(v.var) :
        v.var === first(addresses).var ? first(addresses) :
        loop(rest(addresses));
    return loop(lexAddresses);
};
Note how we mark the variable as occurring free when it is not found in any of the visible declarations. Because the addresses are sorted by depth, the first match is the innermost declaration. Observe how we implement iteration by defining a local recursive procedure called loop and invoking it inside the main body of the procedure. The algorithm to compute the lexical address of all variable references is implemented as follows:
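The traversal relies on a helper, crossContour, that this lesson does not define. Below is a hypothetical sketch of what it has to do, derived from the definition of depth and pos: when entering a lambda, it places the parameters at depth 0 (pos is the index in the parameter list) in front of the outer addresses, and pushes each outer address one contour deeper. The result stays sorted by depth, which is exactly the precondition of getLexicalAddress. The lookup function re-implements the same first-match search as getLexicalAddress, but takes a plain name instead of a VarRef.

```typescript
interface FreeVar { tag: "FreeVar"; var: string; }
interface LexicalAddress { tag: "LexicalAddress"; var: string; depth: number; pos: number; }
type LexAddress = FreeVar | LexicalAddress;

const makeLexicalAddress = (v: string, depth: number, pos: number): LexicalAddress =>
    ({ tag: "LexicalAddress", var: v, depth, pos });

// Hypothetical crossContour: new parameters first (depth 0, pos = index),
// then every outer address pushed one contour deeper.
const crossContour = (params: string[], addresses: LexicalAddress[]): LexicalAddress[] =>
    params.map((p, i) => makeLexicalAddress(p, 0, i))
        .concat(addresses.map((la) => makeLexicalAddress(la.var, la.depth + 1, la.pos)));

// Same search as getLexicalAddress: the first (innermost) match wins,
// and a name with no visible declaration is reported as free.
const lookup = (v: string, addresses: LexicalAddress[]): LexAddress => {
    for (const la of addresses) {
        if (la.var === v) return la;
    }
    return { tag: "FreeVar", var: v };
};

// Addresses visible inside (lambda (x) ...) nested in (lambda (x y) ...)
const inner = crossContour(["x"], crossContour(["x", "y"], []));
console.log(inner.map((la) => `${la.var}:${la.depth}:${la.pos}`).join(" ")); // x:0:0 x:1:0 y:1:1
```

The outer x is still in the list as x:1:0, but it is never returned, because the inner x:0:0 comes first and shadows it.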
Traversing the Whole AST
export const addLexicalAddresses = (exp: CExpLA | Error): CExpLA | Error => {
    // crossContour is not defined in this lesson; its expected role is to make
    // proc.params visible at depth 0 and push the outer addresses one contour deeper.
    const visitProc = (proc: ProcExpLA, addresses: LexAddress[]): ProcExpLA | Error => {
        const newAddresses = crossContour(proc.params, addresses);
        return makeProcExpLA(proc.params, map((b) => visit(b, newAddresses), proc.body));
    };
    const visit = (exp: CExpLA | Error, addresses: LexAddress[]): CExpLA | Error =>
        isBoolExp(exp) ? exp :
        isNumExp(exp) ? exp :
        isStrExp(exp) ? exp :
        isVarRef(exp) ? getLexicalAddress(exp, addresses) :
        // the input should not contain addresses yet; template literals need backticks
        isFreeVar(exp) ? Error(`unexpected LA ${exp.var}`) :
        isLexicalAddress(exp) ? Error(`unexpected LA ${exp.var}`) :
        isLitExp(exp) ? exp :
        isIfExpLA(exp) ? makeIfExpLA(visit(exp.test, addresses), visit(exp.then, addresses), visit(exp.alt, addresses)) :
        isProcExpLA(exp) ? visitProc(exp, addresses) :
        isAppExpLA(exp) ? makeAppExpLA(visit(exp.rator, addresses), map((r) => visit(r, addresses), exp.rands)) :
        exp;
    return isError(exp) ? exp : visit(exp, []);
};
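To see the whole traversal produce the annotations from the examples, here is a self-contained sketch that works directly on s-expressions (nested string arrays) instead of the CExpLA AST, and returns the annotated program as a string. Its helpers (crossContour, annotateVar, annotateLambda, annotate) are hypothetical stand-ins, not the course code. It assumes well-formed input and does not handle quote or let.

```typescript
// An s-expression: a string atom or a nested list.
type Sexp = string | Sexp[];
// A bound address; a name with no entry is free.
type Addr = { var: string; depth: number; pos: number };

// Entering a lambda: parameters at depth 0, outer addresses one level deeper.
const crossContour = (params: string[], addrs: Addr[]): Addr[] =>
    params.map((p, i) => ({ var: p, depth: 0, pos: i }))
        .concat(addrs.map((a) => ({ var: a.var, depth: a.depth + 1, pos: a.pos })));

// Innermost declaration wins; no declaration means the reference is free.
const annotateVar = (v: string, addrs: Addr[]): string => {
    for (const a of addrs) {
        if (a.var === v) return `[${v} : ${a.depth} ${a.pos}]`;
    }
    return `[${v} free]`;
};

const annotateLambda = (params: string[], body: Sexp[], addrs: Addr[]): string => {
    const inner = crossContour(params, addrs);
    return `(lambda (${params.join(" ")}) ${body.map((b) => annotate(b, inner)).join(" ")})`;
};

// Numbers are kept, variable references are annotated, lambda opens a new
// contour, if keeps its keyword, and anything else is an application.
const annotate = (s: Sexp, addrs: Addr[]): string =>
    typeof s === "string" ? (/^-?\d+$/.test(s) ? s : annotateVar(s, addrs)) :
    s[0] === "lambda" ? annotateLambda(s[1] as string[], s.slice(2), addrs) :
    s[0] === "if" ? `(if ${s.slice(1).map((x) => annotate(x, addrs)).join(" ")})` :
    `(${s.map((x) => annotate(x, addrs)).join(" ")})`;

// (lambda (x y) ((lambda (x) (+ x y)) (+ x x)) 1)
const example: Sexp = ["lambda", ["x", "y"], [["lambda", ["x"], ["+", "x", "y"]], ["+", "x", "x"]], "1"];
console.log(annotate(example, []));
// (lambda (x y) ((lambda (x) ([+ free] [x : 0 0] [y : 1 1])) ([+ free] [x : 0 0] [x : 0 0])) 1)
```

As in addLexicalAddresses, the traversal starts with an empty list of addresses, and only lambda changes the list passed to its sub-expressions.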
Summary Static properties of expressions can be computed by traversing the AST without executing it. We distinguish variable declarations and variable references in the syntax. We defined the contour and scope of variable declarations. We defined binding rules to match references and declarations. Some variables occur bound, some occur free. We defined the lexical address of a bound variable reference. We saw an algorithm that computes the lexical address of every variable reference.