CS 320: Compiling Techniques David Walker
People
  David Walker (Professor), 412 Computer Science Building
    office hours: after each class
  Guilherme Ottoni (TA), 417 Computer Science Building
    office hours: Mondays 2-2:30 PM, Fridays 2-3 PM
Information
  Web site: ng05/cos320/index.htm
  Mailing list:
    To subscribe:
    To post to this list, send your email to:
Books
  Modern Compiler Implementation in ML, Andrew Appel
  A reference manual for SML
    best choice: online references (see course web site)
    several hardcopy books
      Elements of ML Programming, Jeffrey D. Ullman
Assignment 0
  Write your name and other information on the sheet circulating
  Find, skim and bookmark the course web pages
  Subscribe to course list
  Begin assignment 1
    Figure out how to run & use SML
    Due next Thursday, February 10
onward!
What is a compiler?
  A compiler is a program that translates a program in a source language into an equivalent program in a target language
What is a compiler?
  C program:
    while (i > 3) { a[i] = b[i]; i++; }
  (compiler does this)
  assembly program:
    mov eax, ebx
    add eax, 1
    cmp eax, 3
    jcc eax, edx
What is a compiler?
  Java program:
    class foo { int bar; ... }
  (compiler does this)
  C program:
    struct foo { int bar; ... }
What is a compiler?
  Java program:
    class foo { int bar; ... }
  (compiler does this)
  Java virtual machine program
What is a compiler?
  LaTeX program:
    \newcommand{.... } \sfd\sf\fadg
  (compiler does this)
  TeX program
What is a compiler?
  TeX program:
    \newcommand{.... } \sfd\sf\fadg
  (compiler does this)
  PostScript program
What is a compiler?
  Other places:
    Web scripts are compiled into HTML
    assembly language is compiled into machine language
    hardware description language is compiled into a hardware circuit
    ...
Compilers are complex
  front-end: text file to abstract syntax
    lexing; parsing
  middle-end: abstract syntax to intermediate form (IR)
    analysis; optimizations; data layout
  back-end: IR to machine code
    code generation; register allocation
Course project
  front-end: Fun source language
    simple imperative language
  middle-end: only 1 IR (the initial abstract syntax generated by the parser)
    type checking; high-level optimizations
  back-end: code generation
    instruction selection algorithms; register allocation via graph coloring
Standard ML
  Standard ML is a domain-specific language for building compilers
  Support for
    Complex data structures (abstract syntax, compiler intermediate forms)
    Memory management like Java
    Large projects with many modules
    Advanced type system for error detection
Introduction to ML
  You will be responsible for learning ML on your own. Today I will cover some basics.
  Resources:
    Robert Harper’s online book “an introduction to ML” is a good place to start
    See course webpage for pointers and info about how to get the software
Intro to ML
  Highlights
    Data structures for compilers
      Data type definitions
      Pattern matching
    Strongly-typed language
      Every expression has a type
      Certain errors cannot occur
      Polymorphic types provide flexibility
    Flexible module system
      Abstract types
      Higher-order modules (functors)
Intro to ML
  Interactive language
    Type in expressions
    Evaluate and print type and result
    Compiler as well
  High-level programming features
    Data types
    Pattern matching
    Exceptions
    Mutable data discouraged
Preliminaries
  start sml in Unix by typing sml at a prompt:
    tux% sml
    Standard ML of New Jersey, Version , September 28, 2000 [CM; autoload enabled]
    -
    (* quit SML by pressing ctrl-D *)
    (* just so you know, comments can be (* nested *) *)
Preliminaries
  Read-Eval-Print Loop
    - 3 + 2;
    > 5 : int
    - it + 7;
    > 12 : int
    - it - 3;
    > 9 : int
    - 4 + true;
    stdIn: Error: operator and operand don't agree [literal]
      operator domain: int * int
      operand:         int * bool
      in expression:
        4 + true
Preliminaries
  Read-Eval-Print Loop
    - 3 div 0;
    Failure : Div            => run-time error
Basic Values
    - ();
    > () : unit              => like “void” in C (sort of)
                             => the uninteresting value/type
    - true;
    > true : bool
    - false;
    > false : bool
    - if it then 3+2 else 7; => “else” clause is always necessary
    > 7 : int
    - false andalso loop_Forever;
    > false : bool           => andalso, orelse: short-circuit evaluation
Basic Values
  Integers
    - 3 + 2;
    > 5 : int
    - 3 + (if not true then 5 else 7);
    > 10 : int               => no distinction between expressions and statements
  Strings
    - "Dave" ^ " " ^ "Walker";
    > "Dave Walker" : string
    - (print "foo\n"; 3);
    foo
    > 3 : int
  Reals
    - 3.14;
    > 3.14 : real
Using SML/NJ
  Interactive mode is a good way to start learning and to debug programs, but…
  Type a series of declarations into a “.sml” file
    - use "foo.sml";
    [opening foo.sml]
    …
    list of declarations with their types
Larger Projects
  SML has its own built-in interactive “make”
  Pros:
    It automatically does the dependency analysis for you
    No crazy makefile syntax to learn
  Cons:
    May be more difficult to interact with other languages or tools
Compilation Manager
    % sml
    - OS.FileSys.chDir "~/courses/510/a2";
    - CM.make();              => looks for “sources.cm”, analyzes dependencies
    [compiling … ]            => compiles files in group
    [wrote … ]                => saves binaries in ./CM/
    - CM.make' "myproj/" ();  => specify directory
  sources.cm:
    Group is
      a.sig
      b.sml
      c.sml
What is next?
  ML has a rich set of structured values
    Tuples: (17, true, "stuff")
    Records: {name = "Dave", ssn = }
    Lists: 3::4::5::nil or [3,4,5]
    Datatypes
    Functions
    And more!
  Rather than list all the details, we will write a couple of programs
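The structured values listed above can be tried directly at the prompt. The following is a minimal sketch, not taken from the slides: the record field `office` and the helper `sum` are illustrative names chosen here.

```sml
(* Tuples: fixed size, components may have different types *)
val t : int * bool * string = (17, true, "stuff")
val (n, flag, s) = t            (* pattern matching takes the tuple apart *)

(* Records: like tuples, but with named fields.
   The field names here are illustrative only. *)
val r = {name = "Dave", office = 412}
val who = #name r               (* field selection *)

(* Lists: all elements share one type; [3,4,5] abbreviates 3::4::5::nil *)
val xs = 3 :: 4 :: 5 :: nil
val ys = [3, 4, 5]
val same = (xs = ys)

(* A function over lists, defined by pattern matching on the two list forms *)
fun sum [] = 0
  | sum (x :: rest) = x + sum rest
val total = sum xs
```

Here `same` evaluates to `true` and `total` to `12`.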
An interpreter
  Interpreters are usually implemented as a series of transformers:
    stream of characters
      -> (lexing/parsing) ->
    abstract syntax
      -> (evaluate) ->
    abstract value
      -> (print) ->
    stream of characters
A little language (LL)
  An arithmetic expression e is
    a boolean value
    an if statement (if e1 then e2 else e3)
    an integer
    an add operation
    a test for zero (isZero e)
LL abstract syntax in ML
    datatype term =
        Bool of bool
      | If of term * term * term
      | Num of int
      | Add of term * term
      | IsZero of term
  -- by convention, constructors are capitalized
  -- constructors can take a single argument of a particular type
  -- term * term * term is the type of a tuple (another e.g.: string * char)
  -- the vertical bar separates alternatives
LL abstract syntax in ML
    Add (Num 2, Num 3)
  represents the expression “2 + 3”
        Add
       /   \
    Num 2  Num 3
LL abstract syntax in ML
    If (Bool true, Num 0, Add (Num 2, Num 3))
  represents “if true then 0 else 2 + 3”
                 If
          /      |      \
    Bool true  Num 0    Add
                       /   \
                    Num 2  Num 3
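To connect the constructor form with the concrete syntax it represents, here is a small sketch that builds the two example terms and renders them back as text. The function `toString` is a hypothetical helper introduced for illustration; it is not part of the slides, and it only parenthesizes the argument of `isZero`.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* The two example terms from the slides *)
val ex1 = Add (Num 2, Num 3)
val ex2 = If (Bool true, Num 0, Add (Num 2, Num 3))

(* toString is a hypothetical helper: one rule per constructor *)
fun toString t =
  case t of
      Bool b => Bool.toString b
    | Num n => Int.toString n
    | Add (t1, t2) => toString t1 ^ " + " ^ toString t2
    | If (t1, t2, t3) =>
        "if " ^ toString t1 ^ " then " ^ toString t2 ^ " else " ^ toString t3
    | IsZero t1 => "isZero (" ^ toString t1 ^ ")"

val s1 = toString ex1
val s2 = toString ex2
```

With these definitions `s1` is `"2 + 3"` and `s2` is `"if true then 0 else 2 + 3"`.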
Function declarations
    fun isValue t =
      case t of
          Num n => true
        | Bool b => true
        | _ => false
  isValue: function name
  t: function parameter
  _: default pattern, matches anything
What is the type of the parameter t? Of the function?
    fun isValue (t:term) : bool =
      case t of
          Num n => true
        | Bool b => true
        | _ => false
    val isValue : term -> bool
  ML does type inference => you need not annotate functions yourself (but it can be helpful)
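A few more unannotated functions show what inference produces. This is a sketch with illustrative function names (`double`, `swap`, `twice`, `half` are not from the slides); the types in the comments are the ones Standard ML infers.

```sml
(* No annotations: the literal 2 forces x to be an int *)
fun double x = x * 2              (* int -> int *)

(* Nothing constrains a or b, so the function is polymorphic *)
fun swap (a, b) = (b, a)          (* 'a * 'b -> 'b * 'a *)

(* f is applied to its own result, so argument and result types must agree *)
fun twice f x = f (f x)           (* ('a -> 'a) -> 'a -> 'a *)

(* Annotations are optional, but they document intent and are checked *)
fun half (x : real) : real = x / 2.0

val twelve = twice double 3
val swapped = swap (1, "a")
```

Here `twelve` is `12` and `swapped` is `("a", 1)`.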
A type error
    fun isValue t =
      case t of
          Num n => n
        | _ => false
    ex.sml: Error: types of rules don't agree [literal]
      earlier rule(s): term -> int
      this rule: term -> bool
      in rule:
        _ => false
A type error
  Actually, ML will give you several errors in a row:
    ex.sml: Error: types of rules don't agree [literal]
      earlier rule(s): term -> int
      this rule: term -> bool
      in rule:
        Successor t2 => true
    ex.sml: Error: types of rules don't agree [literal]
      earlier rule(s): term -> int
      this rule: term -> bool
      in rule:
        _ => false
A very subtle error
    fun isValue t =
      case t of
          num => true
        | _ => false
  The code above type checks. But when we test it, we find that the function always returns “true.” What has gone wrong?
  -- num is not capitalized (and has no argument)
  -- ML treats it like a variable pattern (matches anything!)
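The effect of the variable pattern can be observed in a short sketch. The names `isValueBuggy` and `notAValue` are introduced here for illustration. The trailing `_ => false` rule is left out of the buggy version on purpose: once `num` matches everything, that rule can never fire, and some compilers may report it as a redundant match.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* Buggy: lowercase num is a fresh variable, so this rule matches every term.
   The inferred type is 'a -> bool rather than term -> bool, which is a hint
   that no constructor of term is mentioned at all. *)
fun isValueBuggy t =
  case t of
      num => true

(* Intended: capitalized constructors, each applied to an argument pattern *)
fun isValue t =
  case t of
      Num _ => true
    | Bool _ => true
    | _ => false

val notAValue = If (Bool true, Num 0, Num 1)
val buggyAnswer = isValueBuggy notAValue
val rightAnswer = isValue notAValue
```

Here `buggyAnswer` is `true` while `rightAnswer` is `false`.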
Exceptions
    exception Error of string
    fun debug s : unit = raise (Error s)
  in SML interpreter:
    - debug "hello";
    uncaught exception Error
      raised at: ex.sml:
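An exception does not have to escape to the top level; it can be caught with `handle`. The sketch below reuses `Error` and `debug` from the slide; `safeDebug` is a hypothetical wrapper added for illustration.

```sml
exception Error of string

fun debug s : unit = raise (Error s)

(* safeDebug is a hypothetical wrapper: it runs debug and converts
   the exception into an ordinary string result *)
fun safeDebug s =
  (debug s; "no exception")
  handle Error msg => "caught: " ^ msg

val result = safeDebug "hello"
```

Because `debug` always raises, `result` is `"caught: hello"`. The handler uses pattern matching on the exception constructor, just like a `case` rule.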
Evaluator
    fun isValue t = ...
    exception NoRule
    fun eval t =
      case t of
          (Bool _ | Num _) => t
        | ...
Evaluator
    ...
    fun eval t =
      case t of
          (Bool _ | Num _) => t
        | If (t1,t2,t3) =>
            let val v = eval t1 in
              case v of
                  Bool b => if b then (eval t2) else (eval t3)
                | _ => raise NoRule
            end
  let expression for remembering temporary results
Evaluator
    exception NoRule
    fun eval t =
      case t of
          (Bool _ | Num _) => ...
        | ...
        | Add (t1,t2) =>
            (case (eval t1, eval t2) of
                 (Num n1, Num n2) => Num (n1 + n2)
               | (_,_) => raise NoRule)
Finishing the Evaluator
    fun eval t =
      case t of
          ...
        | ...
        | Add (t1,t2) => ...
        | IsZero t => ...
  be sure your case is exhaustive
Finishing the Evaluator
    fun eval t =
      case t of
          ...
        | ...
        | Add (t1,t2) => ...
  What if we forgot a case?
    ex.sml: Warning: match nonexhaustive
      (Bool _ | Num _) => ...
      If (t1,t2,t3) => ...
      Add (t1,t2) => ...
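Putting the pieces together, one possible complete evaluator looks like the sketch below. It is an assumption about how the elided parts might be filled in, not the official solution: the `IsZero` rule in particular is my own completion, and the two value cases are written as separate rules instead of an or-pattern so the code does not rely on a compiler extension. Each nested `case` is parenthesized so that the rules that follow it belong to the outer `case`.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

exception NoRule

fun eval t =
  case t of
      Bool _ => t                       (* values evaluate to themselves *)
    | Num _ => t
    | If (t1, t2, t3) =>
        (case eval t1 of
             Bool b => if b then eval t2 else eval t3
           | _ => raise NoRule)         (* condition was not a boolean *)
    | Add (t1, t2) =>
        (case (eval t1, eval t2) of
             (Num n1, Num n2) => Num (n1 + n2)
           | _ => raise NoRule)         (* an operand was not a number *)
    | IsZero t1 =>                      (* assumed completion of the elided rule *)
        (case eval t1 of
             Num n => Bool (n = 0)
           | _ => raise NoRule)

val v1 = eval (If (Bool true, Num 0, Add (Num 2, Num 3)))
val v2 = eval (Add (Num 2, Num 3))
val v3 = eval (IsZero (Add (Num 0, Num 0)))
```

Here `v1` is `Num 0`, `v2` is `Num 5`, and `v3` is `Bool true`. An ill-formed term such as `Add (Bool true, Num 1)` raises `NoRule`. Since every constructor of `term` has a rule, the match is exhaustive and no warning is expected.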
Last Things
  Learning to program in SML can be tricky at first
  But once you get used to it, you will never want to go back to imperative languages
  Check out the reference materials listed on the course homepage