Cyclone: Safe C-Level Programming (With Multithreading Extensions) Dan Grossman Cornell University October 2002 Joint work with: Trevor Jim (AT&T), Greg.

Slides:



Advertisements
Similar presentations
Cyclone: A Memory-Safe C-Level Programming Language Dan Grossman University of Washington Joint work with: Trevor JimAT&T Research Greg MorrisettHarvard.
Advertisements

Chapter 17 vector and Free Store John Keyser’s Modifications of Slides By Bjarne Stroustrup
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
The Interface Definition Language for Fail-Safe C Kohei Suenaga, Yutaka Oiwa, Eijiro Sumii, Akinori Yonezawa University of Tokyko.
Various languages….  Could affect performance  Could affect reliability  Could affect language choice.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 18.
CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.
C. FlanaganSAS’04: Type Inference Against Races1 Type Inference Against Races Cormac Flanagan UC Santa Cruz Stephen N. Freund Williams College.
Summer School on Language-Based Techniques for Integrating with the External World Types for Safe C-Level Programming Part 3: Basic Cyclone-Style Region-
Type-Safe Programming in C George Necula EECS Department University of California, Berkeley.
Typed Memory Management in a Calculus of Capabilities David Walker (with Karl Crary and Greg Morrisett)
Cyclone, Regions, and Language-Based Safety CS598e, Princeton University 27 February 2002 Dan Grossman Cornell University.
Run time vs. Compile time
Existential Types for Imperative Languages Dan Grossman Cornell University Eleventh European Symposium on Programming April 2002.
Region-Based Memory Management in Cyclone Dan Grossman Cornell University June 2002 Joint work with: Greg Morrisett, Trevor Jim (AT&T), Michael Hicks,
Type-Safe Multithreading in Cyclone Dan Grossman Cornell University TLDI Jan 2003.
1 Run time vs. Compile time The compiler must generate code to handle issues that arise at run time Representation of various data types Procedure linkage.
1 Pointers, Dynamic Data, and Reference Types Review on Pointers Reference Variables Dynamic Memory Allocation –The new operator –The delete operator –Dynamic.
Advanced Type Systems for Low-Level Languages Greg Morrisett Cornell University.
Cyclone: Safe Programming at the C Level of Abstraction Dan Grossman Cornell University May 2002 Joint work with: Trevor Jim (AT&T), Greg Morrisett, Michael.
Cyclone: Safe Programming at the C Level of Abstraction Dan Grossman Cornell University Joint work with: Trevor Jim (AT&T), Greg Morrisett, Michael Hicks.
CSE341: Programming Languages Lecture 11 Type Inference Dan Grossman Winter 2013.
C++ Programming: Program Design Including Data Structures, Fourth Edition Chapter 13: Pointers, Classes, Virtual Functions, and Abstract Classes.
Chapter 7: Runtime Environment –Run time memory organization. We need to use memory to store: –code –static data (global variables) –dynamic data objects.
Pointer Data Type and Pointer Variables
C++ Programming: From Problem Analysis to Program Design, Fourth Edition Chapter 14: Pointers, Classes, Virtual Functions, and Abstract Classes.
CS3012: Formal Languages and Compilers The Runtime Environment After the analysis phases are complete, the compiler must generate executable code. The.
CSE 425: Data Types II Survey of Common Types I Records –E.g., structs in C++ –If elements are named, a record is projected into its fields (e.g., via.
Compiler Construction
Chapter 3.5 Memory and I/O Systems. 2 Memory Management Memory problems are one of the leading causes of bugs in programs (60-80%) MUCH worse in languages.
Computer Science and Software Engineering University of Wisconsin - Platteville 2. Pointer Yan Shi CS/SE2630 Lecture Notes.
Experience with Safe Memory Management in Cyclone Michael Hicks University of Maryland, College Park Joint work with Greg Morrisett - Harvard Dan Grossman.
Basic Semantics Associating meaning with language entities.
CSE 425: Data Types I Data and Data Types Data may be more abstract than their representation –E.g., integer (unbounded) vs. 64-bit int (bounded) A language.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
C++ Memory Overview 4 major memory segments Key differences from Java
Pointers and Dynamic Memory Allocation Copyright Kip Irvine 2003, all rights reserved. Revised 10/28/2003.
Dynamic Memory Allocation. Domain A subset of the total domain name space. A domain represents a level of the hierarchy in the Domain Name Space, and.
Sharing Objects  Synchronization  Atomicity  Specifying critical sections  Memory visibility  One thread’s modification seen by the other  Visibility.
RUN-Time Organization Compiler phase— Before writing a code generator, we must decide how to marshal the resources of the target machine (instructions,
CSCI Rational Purify 1 Rational Purify Overview Michel Izygon - Jim Helm.
Combining Garbage Collection and Safe Manual Memory Management Michael Hicks University of Maryland, College Park Joint work with Greg Morrisett - Harvard,
11th Nov 2004PLDI Region Inference for an Object-Oriented Language Wei Ngan Chin 1,2 Joint work with Florin Craciun 1, Shengchao Qin 1,2, Martin.
Concepts of programming languages Chapter 5 Names, Bindings, and Scopes Lec. 12 Lecturer: Dr. Emad Nabil 1-1.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 10 – C: the heap and manual memory management.
Efficient Detection of All Pointer and Array Access Errors Todd M.Austin Scott E.Breach Gurindar S.Sohi Computer Sciences Department University of Wisconsin-Madison.
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator.
Dr. M. Al-Mulhem Introduction 1 Chapter 6 Type Systems.
CSE 332: C++ Exceptions Motivation for C++ Exceptions Void Number:: operator/= (const double denom) { if (denom == 0.0) { // what to do here? } m_value.
Object Lifetime and Pointers
CSE341: Programming Languages Lecture 11 Type Inference
Names and Attributes Names are a key programming language feature
CSE 374 Programming Concepts & Tools
CS 215 Final Review Ismail abumuhfouz Fall 2014.
Cyclone A Very Short Introduction Dan Grossman
Threads and Memory Models Hal Perkins Autumn 2011
Names, Binding, and Scope
Chapter 15 Pointers, Dynamic Data, and Reference Types
Programming with Regions
Dan Grossman University of Washington 26 July 2007
Pointers C#, pointers can only be declared to hold the memory addresses of value types int i = 5; int *p; p = &i; *p = 10; // changes the value of i to.
Threads and Memory Models Hal Perkins Autumn 2009
CSE341: Programming Languages Lecture 11 Type Inference
Cyclone: A safe dialect of C
CSE 451: Operating Systems Autumn 2003 Lecture 7 Synchronization
CSE 451: Operating Systems Autumn 2005 Lecture 7 Synchronization
CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization
CSE341: Programming Languages Lecture 11 Type Inference
SPL – PS2 C++ Memory Handling.
Presentation transcript:

Cyclone: Safe C-Level Programming (With Multithreading Extensions) Dan Grossman Cornell University October 2002 Joint work with: Trevor Jim (AT&T), Greg Morrisett, Michael Hicks, James Cheney, Yanling Wang (Cornell)

9 Oct 2002Dan Grossman, Cyclone2 A disadvantage of C Lack of memory safety means code cannot enforce modularity/abstractions: void f(){ *((int*)0xBAD) = 123; } What might address 0xBAD hold? Memory safety is crucial for your favorite policy No desire to compile programs like this

9 Oct 2002Dan Grossman, Cyclone3 Safety violations rarely local void g(void**x,void*y); int y = 0; int *z = &y; g(&z,0xBAD); *z = 123; Might be safe, but not if g does *x=y Type of g enough for separate code generation Type of g not enough for separate safety checking

9 Oct 2002Dan Grossman, Cyclone4 Some other problems One safety violation can make your favorite policy extremely difficult to enforce So prohibit: incorrect casts, array-bounds violations, misused unions, uninitialized pointers, dangling pointers, null-pointer dereferences, dangling longjmp, vararg mismatch, not returning pointers, data races, …

9 Oct 2002Dan Grossman, Cyclone5 What to do? Stop using C –YFHLL is usually a better choice Compile C more like Scheme –type fields, size fields, live-pointer table, … –fail-safe for legacy whole programs Static analysis –very hard, less modular Restrict C –not much left

9 Oct 2002Dan Grossman, Cyclone6 Cyclone in brief A safe, convenient, and modern language at the C level of abstraction Safe: memory safety, abstract types, no core dumps C-level: user-controlled data representation and resource management, easy interoperability, “manifest cost” Convenient: may need more type annotations, but work hard to avoid it Modern: add features to capture common idioms “New code for legacy or inherently low-level systems”

9 Oct 2002Dan Grossman, Cyclone7 The plan from here Not-null pointers Type-variable examples –parametric polymorphism –region-based memory management –multithreading Dataflow analysis Status Related work I will skip many very important features

9 Oct 2002Dan Grossman, Cyclone8 Subtyping: < t * but t < t  Downcast via run-time check, often avoided via flow analysis Not-null pointers t*t* pointer to a t value or NULL pointer to a t value /

9 Oct 2002Dan Grossman, Cyclone9 Example FILE* fopen(const const int int void g() { FILE* f = fopen(“foo”, “r”); while(fgetc(f) != EOF) {…} fclose(f); } Gives warning and inserts one null-check Encourages a hoisted check

9 Oct 2002Dan Grossman, Cyclone10 The same old moral FILE* fopen(const const int int Richer types make interface stricter Stricter interface make implementation easier/faster Exposing checks to user lets them optimize Can’t check everything statically (e.g., close-once)

9 Oct 2002Dan Grossman, Cyclone11 “Change void* to alpha” struct Lst { void* hd; struct Lst* tl; }; struct Lst* map( void* f(void*), struct Lst*); struct Lst* append( struct Lst*, struct Lst*); struct Lst { `a hd; struct Lst * tl; }; struct Lst * map( `b f(`a), struct Lst *); struct Lst * append( struct Lst *, struct Lst *);

9 Oct 2002Dan Grossman, Cyclone12 Not much new here Closer to C than ML: less type inference allows first-class polymorphism and polymorphic recursion data representation may restrict α to pointers, int (why not structs? why not float ? why int ?) Not C++ templates

9 Oct 2002Dan Grossman, Cyclone13 Existential types Programs need a way for “call-back” types: struct T { void (*f)(void*, int); void* env; }; We use an existential type (simplified for now): struct T { void int); `a env; }; more C-level than baked-in closures/objects

9 Oct 2002Dan Grossman, Cyclone14 The plan from here Not-null pointers Type-variable examples –parametric polymorphism (α, , , λ) –region-based memory management –multithreading Dataflow analysis Status Related work I will skip many very important features

9 Oct 2002Dan Grossman, Cyclone15 Regions a.k.a. zones, arenas, … Every object is in exactly one region Allocation via a region handle All objects in a region are deallocated simultaneously (no free on an object) An old idea with recent support in languages (e.g., RC) and implementations (e.g., ML Kit)

9 Oct 2002Dan Grossman, Cyclone16 Cyclone regions heap region: one, lives forever, conservatively GC’d stack regions: correspond to local-declaration blocks: {int x; int y; s} dynamic regions: scoped lifetime, but growable: region r {s} allocation: rnew(r,3), where r is a handle handles are first-class –caller decides where, callee decides how much –no handles for stack regions

9 Oct 2002Dan Grossman, Cyclone17 That’s the easy part The implementation is really simple because the type system statically prevents dangling pointers void f() { int* x; if(1) { int y = 0; x = &y; // x not dangling } *x = 123; // x dangling }

9 Oct 2002Dan Grossman, Cyclone18 The big restriction Annotate all pointer types with a region name (a type variable of region kind) means “pointer into the region created by the construct that introduces `r ” –heap introduces `H – L:… introduces `L – region r {s} introduces `r r has type region_t

9 Oct 2002Dan Grossman, Cyclone19 Region polymorphism Apply what we did for type variables to region names (only it’s more important and could be more onerous) void x, y){ int tmp = *x; *x = *y; *y = tmp; } sumptr(region_t r,int x,int y){ return rnew(r) (x+y); }

9 Oct 2002Dan Grossman, Cyclone20 Type definitions struct ILst { hd; struct ILst *`r2 tl; };

9 Oct 2002Dan Grossman, Cyclone21 Region subtyping If p points to an int in a region with name `r1, is it ever sound to give p type int* `r2 ? If so, let int* `r1 < int* `r2 Region subtyping is the outlives relationship region r1 {… region r2 {…}…} LIFO makes subtyping common

9 Oct 2002Dan Grossman, Cyclone22 Soundness Ignoring , scoping prevents dangling pointers int*`L f() { L: int x; return &x; } End of story if you don’t use  For , we leak a region bound: struct T { :regions(`a) > `r void int); `a env; }; A powerful effect system is there in case you want it

9 Oct 2002Dan Grossman, Cyclone23 Regions summary Annotating pointers with region names (type variables) makes a sound, simple, static system Polymorphism, type constructors, and subtyping recover much expressiveness Inference and defaults reduce burden With additional run-time checks, can move beyond LIFO, but checks can fail Key point: do not check on every access

9 Oct 2002Dan Grossman, Cyclone24 The plan from here Not-null pointers Type-variable examples –parametric polymorphism (α, , , λ) –region-based memory management –multithreading Dataflow analysis Status Related work I will skip many very important features

9 Oct 2002Dan Grossman, Cyclone25 Data races break safety Data race: One thread accessing memory while another thread writes it On shared-memory MPs, a data race can corrupt a pointer Atomic word writes insufficient –struct with array bound and pointer to array –more generally, existential types Cyclone must prevent data races

9 Oct 2002Dan Grossman, Cyclone26 Preventing data races Static –Don’t have threads –Don’t have thread-shared memory –Require mutexes for all memory –Require mutexes for shared memory –Require sound synchronization for shared memory –... Dynamic –Detect races as they occur –Control scheduling and preemption –...

9 Oct 2002Dan Grossman, Cyclone27 Mutual exclusion support Require mutual exclusion for shared memory: –For each shared object, there exists a lock that must be acquired before access –Thread-local data must not escape its thread New terms: –spawn(f,p,sz) run f(p2) in a thread where *p2 is a shallow copy of *p1 and sz is sizeof(*p1) –newlock() create a new lock –nonlock a pseudolock for thread-local data –sync e s acquire lock e, run s, release lock Only sync requires language support

9 Oct 2002Dan Grossman, Cyclone28 Example (w/o types) void p){*p = *p + 1;} void inc2(lock_t p){sync m inc(p);} struct LkInt {lock_t m; p;}; void g(struct s){inc2(s->m, s->p);} void f(){ lock_t lk = newlock(); p1 = new 0; p2 = new 0; struct s = new LkInt{.m=lk,.p=p1}; spawn(g, s, sizeof(*s)); inc2(lk, p1); inc2(nonlock, p2); } Once again, this is the easy part

9 Oct 2002Dan Grossman, Cyclone29 Haven’t we been here before Annotate all pointers and locks with a lock name (e.g., lock_t, ) Special lock name loc for thread-local ( nonlock has type lock_t ) newlock has type  `L. lock_t sync e s where e has type lock_t allows *p in s where p has type default is caller locks (perfect for thread-local): void p;{`L}){*p=*p+1;}

9 Oct 2002Dan Grossman, Cyclone30 More about access rights For each program point, there is a set of lock names describing “held locks” –loc is always in the set –functions have set annotations, but default is caller-locks –sync adds appropriate name to the set Lexical scope for sync keeps rules simple, but is not essential

9 Oct 2002Dan Grossman, Cyclone31 Analogy with regions region_t int*`r `H region r s lock_t int*`L loc {let m =newlock(); sync m s} Access rights: region live or lock held Static rights amplified in lexical scope: region, sync Can ignore for prototyping or common case: `H, loc

9 Oct 2002Dan Grossman, Cyclone32 Differences as well... region r s... {let m =newlock(); sync m s} A region’s objects are accessible from region creation to region deletion (which happens once) A lock’s objects are accessible within a sync (which happens many times) So region combines newlock and sync So locks don’t induce subtyping Language/type-system design reflects reality

9 Oct 2002Dan Grossman, Cyclone33 Safe multithreading, so far Terms newlock, nonlock, sync, spawn Types lock_t, t*`L, lock_t, t*loc Type system assigns access rights to each program point Strikingly similar to memory management But have we prevented data races? If we never pass thread-local data to spawn!

9 Oct 2002Dan Grossman, Cyclone34 Enforcing loc A possible type for spawn: void spawn(void ;{}), sizeof_t ;{`L}); But not any `a will do We already have different kinds of type variables: R for regions, L for locks, B for pointer types, A for all types Examples: loc::L, `H::R, int*`H::B, struct T :: A

9 Oct 2002Dan Grossman, Cyclone35 Enforcing loc cont’d Enrich kinds with sharabilities, S or U loc::LU newlock() has type  `L::LS. lock_t A type is sharable only if every part is sharable Every type is unsharable Unsharable is the default void spawn sizeof_t ;{`L});

9 Oct 2002Dan Grossman, Cyclone36 Threads summary A type system where: –thread-shared data must have locks –thread-local data must not escape –locks are first-class and code is reusable Like regions except locks are reacquirable and thread-local is harder than lives-forever Did not discuss: thread-shared regions (must not deallocate until all threads are done with it)

9 Oct 2002Dan Grossman, Cyclone37 Threads shortcomings Global variables need top-level locks –otherwise, single-threaded code works unchanged Shared data enjoys an initialization phase Object migration Read-only data and reader/writer locks Semaphores, signals,... Deadlock (not a safety problem)

9 Oct 2002Dan Grossman, Cyclone38 The plan from here Not-null pointers Type-variable examples –parametric polymorphism (α, , , λ) –region-based memory management –multithreading Dataflow analysis Status Related work I will skip many very important features

9 Oct 2002Dan Grossman, Cyclone39 Example int*`r* f(int*`r q) { int **p = malloc(sizeof(int*)); // p not NULL, points to malloc site *p = q; // malloc site now initialized return p; } Harder than in Java because of pointers Analysis includes must-points-to information Interprocedural annotation: “initializes” a parameter

9 Oct 2002Dan Grossman, Cyclone40 Flow-analysis strategy Current uses: definite assignment, null checks, array- bounds checks, must return When invariants are too strong, program-point information is more useful Checked interprocedural annotations keep analysis local Two hard technical issues: –sound and explainable with respect to aliases –under-specified evaluation order

9 Oct 2002Dan Grossman, Cyclone41 Status Cyclone really exists (except for threads) –110KLOC, including bootstrapped compiler, web server, multimedia overlay network, … –gcc back-end (Linux, Cygwin, OSX, …) –user’s manual, mailing lists, … –still a research vehicle –more features: exceptions, tagged unions, varargs, … Publications (threads work submitted) –overview: USENIX 2002 –regions: PLDI 2002 –existentials: ESOP 2002

9 Oct 2002Dan Grossman, Cyclone42 Related work: higher and lower Adapted/extended ideas: –polymorphism [ML, Haskell, …] –regions [Tofte/Talpin, Walker et al., …] –lock types [Flanagan et al., Boyapati et al.] –safety via dataflow [Java, …] –existential types [Mitchell/Plotkin, …] –controlling data representation [Ada, Modula-3, …] Safe lower-level languages [TAL, PCC, …] –engineered for machine-generated code Vault: stronger properties via restricted aliasing

9 Oct 2002Dan Grossman, Cyclone43 Related work: making C safer Compile to make dynamic checks possible –Safe-C [Austin et al., …] –Purify, Stackguard, Electric Fence, … –CCured [Necula et al.] performance via whole-program analysis more array-bounds, less memory management inherently single-threaded RC [Gay/Aiken]: reference-counted regions, unsafe stack and heap LCLint [Evans]: unsound-by-design, but very useful SLAM: checks user-defined property w/o annotations; assumes no bounds errors

9 Oct 2002Dan Grossman, Cyclone44 Plenty left to do Beyond LIFO memory management Resource exhaustion (e.g., stack overflow) More annotations for aliasing properties More “compile-time arithmetic” (e.g., array initialization) Better error messages (not a beginner’s language)

9 Oct 2002Dan Grossman, Cyclone45 Summary Memory safety is essential for your favorite policy C isn’t safe, but the world’s software-systems infrastructure relies on it Cyclone combines advanced types, flow analysis, and run-time checks to create a safe, usable language with C-like data, resource management, and control best to write some code