1
Outline: Introduction, The Approach's Overview, A Language of Pointers, The Type System, Operational Semantics, Type Safety, Type Inference, The Rest of C, Experiments, Summary
George Necula, Scott McPeak, Wes Weimer
Presented by Anastasia Braginsky
Some slides were taken from George Necula's presentation: http://www.slidefinder.net/c/ccured_taming_pointers_george_necula/6827275
2
- C is popular; it is part of the infrastructure.
- C is also unsafe, and its weak type system can cause subtle bugs.
3
- Add type safety to C: make C "feel" as safe as Java.
- Catch memory-safety errors by static analysis as much as possible.
- Add as few run-time checks to C programs as possible (for performance).
- Require minimal user effort.
- Add type inference to C.
4
Figure (pipeline): C Program -> CCured Translator -> Instrumented C Program -> Compile & Execute -> either "Halt: Memory Safety Violation" or "Success"
5
- Usually a large part of a C program can be verified statically to be type safe.
- The remaining part can be instrumented with run-time checks that ensure the execution is memory safe.
- In many applications, some loss of performance due to run-time checks is an acceptable price for type safety.
6
Boxed integers
- A boxed word has a 31-bit payload (an integer or a pointer) and a 1-bit tag.
- The C type int* is used to represent a boxed integer.
- Un-boxing recovers the integer from a boxed word.
Figure (word layout, payload | tag): 0011…11101001 | 0, 0101…10101110 | 0, 0001…11000101 | 1
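The tagging scheme can be sketched in C as follows. This is only an illustration under stated assumptions: the names boxed_t, box_int, box_ptr, is_int and unbox are hypothetical, the sketch uses uintptr_t so that it also runs on 64-bit machines, and it follows the convention of the example program on the next slide (an odd word is an integer, an even word is a pointer to another boxed word).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A boxed word: low bit 1 = tagged integer, low bit 0 = aligned pointer
   to another boxed word. (Hypothetical names, not CCured's own.) */
typedef uintptr_t boxed_t;

/* Box an integer: shift the 31-bit payload left and set the tag bit. */
static boxed_t box_int(uint32_t n) {
    assert(n <= 0x7FFFFFFFu);           /* payload must fit in 31 bits */
    return ((boxed_t)n << 1) | 1u;
}

/* Box a pointer: aligned addresses already have a zero low bit. */
static boxed_t box_ptr(boxed_t *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (boxed_t)(uintptr_t)p;
}

/* Tag check: is this word an integer? */
static bool is_int(boxed_t w) {
    return (w & 1u) != 0;
}

/* Un-boxing: follow pointer words until an integer word is reached,
   then strip the tag bit. */
static uint32_t unbox(boxed_t w) {
    while (!is_int(w)) {
        w = *(boxed_t *)w;
    }
    return (uint32_t)(w >> 1);
}
```

For example, boxing 21 and then boxing a pointer to that word gives two words that both un-box to 21.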
7
    int **a;    // array
    int i;      // index
    int acc;    // accumulator
    int **p;    // element ptr
    int *e;     // unboxer
    acc = 0;
    for (i = 0; i < 100; i++) {
        p = a + i;                  // ptr arithmetic
        e = *p;                     // read element
        while ((int)e % 2 == 0) {   // check tag
            e = *(int **)e;         // unbox
        }
        acc += ((int)e >> 1);       // strip tag
    }

Figure: a row of tagged memory words (for example 0011…11101001 | 0 and 0101…10101110 | 1) with the pointers a, p and e drawn into it; legend: SAFE, SEQuence, DYNamic.
8
(Same program and figure as on the previous slide.)
But due to aliasing, all of the pointers are considered to point to DYNAMIC!
9
SAFE pointer: a single word, ptr, pointing to a value of its static type
On use:
- null check
Can do:
- dereference
10
SEQ pointer: three words, base, ptr and end, where base and end delimit the area that ptr moves in
On use:
- null check
- bounds check
Can do:
- dereference
- pointer arithmetic
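The difference between the SAFE and SEQ checks can be sketched in C. This is a minimal illustration, not CCured's generated code: safe_read, seq_int, seq_make, seq_add and seq_read are hypothetical names, the checks return false instead of halting the program, and the SEQ pointer stores an element count and an index rather than raw ptr and end words, because forming an out-of-range pointer is undefined behavior in plain C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SAFE pointer: an ordinary C pointer; the only check on use is for null. */
static bool safe_read(const int *p, int *out) {
    if (p == NULL) return false;        /* null check */
    *out = *p;
    return true;
}

/* SEQ pointer (sketch): the area it belongs to plus a current position. */
typedef struct {
    int      *base;   /* start of the area (NULL if there is none) */
    size_t    count;  /* number of elements in the area */
    ptrdiff_t index;  /* current position relative to base */
} seq_int;

static seq_int seq_make(int *area, size_t count) {
    assert(area != NULL || count == 0);
    seq_int s = { area, count, 0 };
    return s;
}

/* Pointer arithmetic only moves the position; nothing is checked yet. */
static seq_int seq_add(seq_int s, ptrdiff_t delta) {
    s.index += delta;
    return s;
}

/* Dereference: null check, then bounds check. */
static bool seq_read(seq_int s, int *out) {
    if (s.base == NULL) return false;                             /* null check */
    if (s.index < 0 || (size_t)s.index >= s.count) return false;  /* bounds check */
    *out = s.base[s.index];
    return true;
}
```

With a three-element array, seq_read succeeds at offsets 0 to 2 and fails at offsets -1 and 3, while safe_read fails only for a null pointer.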
11
DYN pointer: two words, home and ptr; the home area records its length (len) and a tag per word (tags, for example 110) next to the DYN and int values it stores
On use:
- null check
- bounds check
- tag check/update
Can do:
- dereference
- pointer arithmetic
- arbitrary typecasts
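A rough C sketch of the DYN checks is given below. It rests on several assumptions that are not in the slides: all names (dyn_area, dyn_ptr, dyn_word and the dyn_* functions) are hypothetical, every word of an area is modeled as a small struct with an explicit tag, areas have a fixed maximum size, and a failed check returns false instead of halting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DYN_MAX_WORDS 8   /* fixed capacity, only to keep the sketch small */

typedef struct dyn_area dyn_area;

/* DYN pointer: the home area plus a word offset inside it. */
typedef struct {
    dyn_area *home;   /* NULL when the pointer was made from an integer */
    size_t    off;
} dyn_ptr;

/* One word of a home area, with its tag. */
typedef struct {
    bool    is_ptr;   /* tag: does the word currently hold a DYN pointer? */
    long    i;        /* meaningful when !is_ptr */
    dyn_ptr p;        /* meaningful when is_ptr */
} dyn_word;

struct dyn_area {
    size_t   len;                     /* number of words in use */
    dyn_word words[DYN_MAX_WORDS];
};

static void dyn_area_init(dyn_area *a, size_t len) {
    assert(len <= DYN_MAX_WORDS);
    a->len = len;
    for (size_t k = 0; k < len; k++) {
        a->words[k].is_ptr = false;
        a->words[k].i = 0;
        a->words[k].p.home = NULL;
        a->words[k].p.off = 0;
    }
}

/* Null check and bounds check shared by all accesses. */
static bool dyn_ok(dyn_ptr d) {
    return d.home != NULL && d.off < d.home->len;
}

/* Writes update the tag of the word they touch. */
static bool dyn_write_int(dyn_ptr d, long v) {
    if (!dyn_ok(d)) return false;
    d.home->words[d.off].is_ptr = false;
    d.home->words[d.off].i = v;
    return true;
}

static bool dyn_write_ptr(dyn_ptr d, dyn_ptr v) {
    if (!dyn_ok(d)) return false;
    d.home->words[d.off].is_ptr = true;
    d.home->words[d.off].p = v;
    return true;
}

/* Reading a pointer checks the tag, so an integer cannot be used as one. */
static bool dyn_read_ptr(dyn_ptr d, dyn_ptr *out) {
    if (!dyn_ok(d)) return false;
    if (!d.home->words[d.off].is_ptr) return false;   /* tag check */
    *out = d.home->words[d.off].p;
    return true;
}
```

The point of the tag is visible when a word is overwritten: after dyn_write_int, a later dyn_read_ptr on the same word fails, whatever the static types of the program claimed.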
12
- To simplify the presentation, the approach is described formally for a small language, CCured.
- How to extend the approach to the remaining C constructs is then described informally.
13
Types: τ ::= int | τ ref SAFE | τ ref SEQ | DYNAMIC
- only integers or pointers
- τ ref is the ML syntax of references
- DYNAMIC doesn't carry the type of the pointed-to value
Expressions: e ::= x | n | e1 op e2 | (τ) e | e1 ⊕ e2 | !e
- n: integer literals
- e1 op e2: an assortment of binary integer operations
- (τ) e: casting
- e1 ⊕ e2: pointer arithmetic
- !e: like *e in C
Commands: c ::= skip | c1 ; c2 | e1 := e2
- e1 := e2: memory update through a pointer, like *e1 = e2 in C
14
C program, with each pointer type constructor numbered:

    int * /*1*/ * /*2*/ a;    // array
    int i;                    // index
    int acc;                  // accumulator
    int * /*3*/ * /*4*/ p;    // element ptr
    int * /*5*/ e;            // unboxer
    acc = 0;
    for (i = 0; i < 100; i++) {
        p = a + i;                               // ptr arithmetic
        e = *p;                                  // read element
        while ((int)e % 2 == 0) {                // check tag
            e = *(int * /*6*/ * /*7*/)e;         // unbox
        }
        acc += ((int)e >> 1);                    // strip tag
    }

CCured translation:

    DYNAMIC ref SEQ a;                // array (sequence pointer to DYN)
    int ref SAFE p_i;                 // index
    int ref SAFE p_acc;               // accumulator
    DYNAMIC ref SAFE ref SAFE p_p;    // element ptr (safe pointer to DYN)
    DYNAMIC ref SAFE p_e;             // unboxer (dynamic)
    p_acc := 0;
    for (p_i := 0; !p_i < 100; p_i := !p_i + 1) {
        p_p := (DYNAMIC ref SAFE)(a ⊕ !p_i);    // ptr arith
        p_e := !!p_p;                            // read element
        while ((int)!p_e % 2 == 0) {             // check tag
            p_e := !!p_e;                        // unbox
        }
        p_acc := !p_acc + ((int)!p_e >> 1);      // strip tag
    }
16
- The purpose of the type system is to maintain the separation between the statically typed and the untyped worlds.
- The presented type system assumes that the program contains complete pointer-kind information.
- The type environment provides a type for every variable name.
- Derivation rules are needed to give types to expressions and commands.
17
"a ≤ b": it is possible to convert type a to type b
- τ ≤ τ (reflexivity)
- τ ≤ int (reading addresses)
- int ≤ τ ref SEQ (pointer arithmetic)
- int ≤ DYN (dereferences are prevented by run-time checks; the pointer has lost its capability to perform memory operations)
- τ ref SEQ ≤ τ ref SAFE (the referenced type can't change; bounds are checked at run time)
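The five conversion rules are small enough to write down as a C predicate. The sketch below is an illustration only: type_t, kind_t, type_equal and convertible are hypothetical names, and the function checks a single conversion step exactly as listed on the slide, without composing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_REF_SAFE, T_REF_SEQ, T_DYNAMIC } kind_t;

/* A CCured type: int, DYNAMIC, or a SAFE/SEQ reference to an element type. */
typedef struct type {
    kind_t             kind;
    const struct type *elem;   /* used only by T_REF_SAFE and T_REF_SEQ */
} type_t;

static bool is_ref(const type_t *t) {
    return t->kind == T_REF_SAFE || t->kind == T_REF_SEQ;
}

/* Structural equality of two types. */
static bool type_equal(const type_t *a, const type_t *b) {
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind) return false;
    return is_ref(a) ? type_equal(a->elem, b->elem) : true;
}

/* from ≤ to, following the rules on the slide one by one. */
static bool convertible(const type_t *from, const type_t *to) {
    if (type_equal(from, to)) return true;              /* τ ≤ τ */
    if (to->kind == T_INT) return true;                 /* τ ≤ int */
    if (from->kind == T_INT &&
        (to->kind == T_REF_SEQ || to->kind == T_DYNAMIC))
        return true;                                    /* int ≤ τ ref SEQ, int ≤ DYN */
    if (from->kind == T_REF_SEQ && to->kind == T_REF_SAFE)
        return type_equal(from->elem, to->elem);        /* τ ref SEQ ≤ τ ref SAFE */
    return false;
}
```

Note what is absent: nothing converts into a SAFE pointer except a SEQ pointer to the same element type, and nothing converts between DYNAMIC and SEQ.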
18
"x : τ": expression x has type τ
- (τ ref SAFE) 0 : τ ref SAFE (creating a safe null pointer)
- IF e : τ ref SAFE THEN !e : τ
- IF e : DYN THEN !e : DYN
  (memory operations only for safe and dynamic pointers)
- IF (e : τ' AND τ' ≤ τ) THEN (τ) e : τ (casting rule)
- IF (e1 : int AND e2 : int) THEN e1 op e2 : int (binary integer operations)
- IF (e1 : τ ref SEQ AND e2 : int) THEN e1 ⊕ e2 : τ ref SEQ
- IF (e1 : DYN AND e2 : int) THEN e1 ⊕ e2 : DYN
  (pointer arithmetic only for sequence and dynamic pointers)
19
Typing of memory writes:
- IF (e1 : τ ref SAFE AND e2 : τ) THEN e1 := e2 is well typed
- IF (e1 : DYN AND e2 : DYN) THEN e1 := e2 is well typed
20
- H is a set of allocated memory areas, which are called homes.
- A home is represented by its starting address and its size.
- All homes are disjoint.
- There is a special null home: 0 ∈ H, with size(0) = 1.
- Safe pointers and integers have no representation overhead over C.
- Sequence and dynamic pointers carry their home with them.
Figure: memory containing a home starting at h1 and a home starting at h2.
21
- Any integer with value n can be cast to a sequence or dynamic pointer with value n and the null home. No further memory operations are possible through it.
- Any sequence or dynamic pointer with value n and a home starting at address h can be cast to an integer with value n + h.
- Any dynamic pointer can be cast to a different dynamic pointer with the same value and home.
- There is no dynamic ↔ sequence cast, since the type system does not allow it.
- Any sequence pointer with value n and a home starting at address h can be cast to a safe pointer with value n + h, but only if 0 ≤ n < size(home) (a run-time check).
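These cast rules can be made concrete with a small C model over abstract addresses. The sketch assumes hypothetical names (home_t, seq_ptr, NULL_HOME, int_to_seq, seq_to_int, seq_to_safe), treats addresses as plain numbers rather than real memory, and reports a failed run-time check by returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A home: starting address and size. */
typedef struct { uintptr_t start; size_t size; } home_t;

/* A sequence pointer: its home plus the value n, an offset into the home. */
typedef struct { home_t home; intptr_t off; } seq_ptr;

/* The special null home: address 0, size(0) = 1. */
static const home_t NULL_HOME = { 0, 1 };

/* int -> SEQ: keep the value, attach the null home. */
static seq_ptr int_to_seq(intptr_t n) {
    seq_ptr p = { NULL_HOME, n };
    return p;
}

/* SEQ -> int: the integer value is n + h. */
static intptr_t seq_to_int(seq_ptr p) {
    return (intptr_t)p.home.start + p.off;
}

/* SEQ -> SAFE: allowed only if 0 <= n < size(home); the result is n + h. */
static bool seq_to_safe(seq_ptr p, uintptr_t *out) {
    if (p.off < 0 || (uintptr_t)p.off >= p.home.size) return false;
    *out = p.home.start + (uintptr_t)p.off;
    assert(*out >= p.home.start);   /* no wrap-around in this sketch */
    return true;
}
```

A pointer that went through an integer keeps its numeric value but loses its home, so casting it to SAFE fails unless the value is 0, which yields the null pointer that the SAFE null check then catches on use.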
22
Run-time checks:
- a null-pointer check for every memory operation that uses a safe pointer
- a bounds check on memory accesses
- a non-pointer check (null home) for sequence and dynamic pointers
Programs that cast pointers to integers and then back to pointers will not be able to use the resulting pointers as memory addresses.
23
Evaluation can fail:
- due to a failed run-time check
Evaluation cannot fail:
- due to unexpected types
- due to an attempt to access an invalid memory location
24
IF e : τ (for a valid type τ)
AND the contents of each memory address correspond to the typing constraints of the home to which it belongs
THEN EITHER one of the run-time checks fails during the evaluation of the expression e
OR ELSE e evaluates to a value v AND v is a valid value of type τ
25
For any command c that is built from valid types:
IF the contents of each memory address correspond to the typing constraints of the home to which it belongs
THEN EITHER the execution of the command fails due to a run-time check
OR ELSE the command succeeds, and the contents of each memory address still correspond to the typing constraints of the home to which it belongs
26
- Given a C program, translate the pointer types so that the program is well typed in the CCured type system.
- The C program already uses types of the form "τ ref"; what remains is to discover whether each one should be safe, sequence or dynamic.
- Write τ ref q, where q is a qualifier ranging over the set {SAFE, SEQ, DYN}.
- The overall strategy is to find as many SAFE and SEQ pointers as possible.
27
1. Introduce a qualifier variable for each syntactic occurrence of the pointer type constructor in the C program.
2. Scan the program and collect a set of constraints C on these qualifier variables.
3. Solve the system of constraints to produce a substitution S of qualifier variables with qualifier values:
   S(int) = int
   S(τ ref q) = DYNAMIC if S(q) = DYN
   S(τ ref q) = S(τ) ref S(q) otherwise
4. Apply the substitution to the types of the C program to produce a CCured program.
28
Convertibility
- int ≤ τ ref q generates the constraint q ≠ SAFE.
- τ1 ref q1 ≤ τ2 ref q2 generates the constraints q1 ← q2 and (q1 = q2 = DYN OR τ1 = τ2 = int).
- q1 ← q2 means that either a SEQ pointer is cast to SAFE (q1 is SEQ and q2 is SAFE) or the qualifiers are equal.
Expressions and commands
- If e1 : τ ref q and e2 : int then e1 ⊕ e2 : τ ref q, generating the constraint q ≠ SAFE (pointer arithmetic).
29
Additional rules bridge the gap between C and CCured:
- Allow memory access through SEQ (not just SAFE) pointers.
- Allow ints to be read or written through DYNAMIC pointers.
- Both cases are handled by an implicit cast, with no run-time checks.
- In a memory write, allow the value being written to be converted to the referenced type.
- For each type of the form τ ref q' ref q, collect the constraint q = DYN => q' = DYN.
30
Constraint kinds:
- ARITH: q ≠ SAFE
- CONV: q ← q'
- POINTSTO: q = DYN => q' = DYN
- ISDYN: q = DYN
- EQ: q = q'
Constraint solving:
1. Propagate the ISDYN constraints using the EQ, CONV and POINTSTO constraints.
2. Set all qualifier variables involved in ARITH constraints to SEQ, and propagate this information using the EQ and CONV constraints.
3. Make all the other variables SAFE.
The whole type inference process is linear in the size of the program!
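The three solving steps can be sketched in C as below. Several things here are assumptions rather than the authors' algorithm: the names (qual_t, constraint_t, raise_to, solve) are hypothetical, qualifier variables are plain array indices, and propagation is done by repeated passes to a fixed point, which is simpler to read but not linear; a worklist would be needed for the linear bound. The sketch also reads CONV "q ← q'" as a cast from a q pointer to a q' pointer, so SEQ flows from q' back to q but not forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { Q_SAFE, Q_SEQ, Q_DYN } qual_t;   /* ordered SAFE < SEQ < DYN */
typedef enum { C_ARITH, C_CONV, C_POINTSTO, C_ISDYN, C_EQ } ckind_t;

/* A constraint over qualifier variables q and q2 (q2 unused by ARITH/ISDYN). */
typedef struct { ckind_t kind; size_t q, q2; } constraint_t;

/* Raise variable v to at least `want`; report whether anything changed. */
static bool raise_to(qual_t *s, size_t v, qual_t want) {
    if (s[v] >= want) return false;
    s[v] = want;
    return true;
}

static void solve(const constraint_t *cs, size_t ncs, qual_t *s, size_t nvars) {
    /* Step 3 is the default: everything not forced upward stays SAFE. */
    for (size_t v = 0; v < nvars; v++) s[v] = Q_SAFE;

    /* Step 1: seed with ISDYN, spread DYN through EQ, CONV and POINTSTO. */
    for (bool changed = true; changed; ) {
        changed = false;
        for (size_t k = 0; k < ncs; k++) {
            const constraint_t *c = &cs[k];
            assert(c->q < nvars && c->q2 < nvars);
            if (c->kind == C_ISDYN)
                changed |= raise_to(s, c->q, Q_DYN);
            if (c->kind == C_EQ || c->kind == C_CONV || c->kind == C_POINTSTO)
                if (s[c->q] == Q_DYN) changed |= raise_to(s, c->q2, Q_DYN);
            if (c->kind == C_EQ || c->kind == C_CONV)
                if (s[c->q2] == Q_DYN) changed |= raise_to(s, c->q, Q_DYN);
        }
    }

    /* Step 2: ARITH variables become SEQ; spread SEQ through EQ and CONV. */
    for (bool changed = true; changed; ) {
        changed = false;
        for (size_t k = 0; k < ncs; k++) {
            const constraint_t *c = &cs[k];
            if (c->kind == C_ARITH)
                changed |= raise_to(s, c->q, Q_SEQ);
            if (c->kind == C_EQ && s[c->q] == Q_SEQ)
                changed |= raise_to(s, c->q2, Q_SEQ);
            if ((c->kind == C_EQ || c->kind == C_CONV) && s[c->q2] == Q_SEQ)
                changed |= raise_to(s, c->q, Q_SEQ);
        }
    }
}
```

As a usage example, one hand-derived reading of the constraints for the boxed-integer program (ARITH on a's outer pointer, ISDYN on both sides of the bad cast, and the CONV, EQ and POINTSTO links between them) makes a's pointer SEQ, p's outer pointer SAFE and everything reachable from the cast DYN.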
31
- In the DYNAMIC world, structures and arrays are simply alternative notations for saying how many bytes of storage to allocate.
- Explicit deallocation is ignored (a garbage collector is used).
- The address-of operator in C can yield a pointer to a stack-allocated variable; an additional run-time check ensures that a stack pointer is not copied to the heap or to globals.
- DYNAMIC function pointers and variable-argument functions are handled by passing a hidden argument that specifies the types of all arguments passed (checked by the callee).
- …
32
There are still a few cases in which a legal program will stop with a failed run-time check, so some manual intervention is still necessary:
- Casting a pointer to an integer and then back to a pointer: make it all void*.
- Some programs attempt to store the addresses of stack variables into memory allocated on the heap.
- Calling functions in libraries that were not compiled with CCured: write a wrapper function.
33
Benchmark   LOC     %Safe  %Seq  %Dyn  CCured ratio  Purify ratio
compress    1590    87     12    0     1.25          28
go          29315   96     4     0     2.01          51
ijpeg       31371   36     1     62    2.15          30
li          7761    93     6     0     1.86          50
bh          2053    80     18    0     1.53          94
bisort      707     90     10    0     1.03          42
em3d        557     85     15    0     2.44          7
ks          973     92     8     0     1.47          31
health      725     93     7     0     0.94          25
34
- ks: passes FILE* to printf, not char*
- compress, ijpeg: array bound violations
- go: 8 array bound violations
- go: 1 uninitialized variable used as an array index
- Many involve multi-dimensional arrays
- Purify found only the go uninitialized-variable bug
- ftpd: buffer overrun bug
35
- C is a popular and useful programming language, but it needs type safety.
- Even in C programs, most pointers can be verified to be type safe; the rest can be checked at run time.
- This work makes it possible to infer, simply and accurately, which pointers need to be checked at run time.
- Since the majority of the pointers are safe, the overheads are smaller than those of comparable tools.
- The presented type system is formally defined, and its type safety is proved.
36
Thank you!