Detecting Type-Based Alias Analysis Violations in C
Iain Ireland (University of Alberta)
Jose Nelson Amaral (University of Alberta)
Raul Silvera (IBM Canada)
Shimin Cui (IBM Canada)
Contents
- Background
- Analysis
- Future Work
Type-Based Alias Analysis
Use programming language types to rule out potential aliases.
Example: can an int * and a double * point to the same object?
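As a rough illustration of what ruling out aliases buys the compiler, here is a minimal sketch. The function names `store_then_load` and `demo_distinct_objects` are hypothetical and do not come from the talk. Because `int` and `double` are incompatible types, a compiler using TBAA may assume the store through `dp` cannot change `*ip`, so it is free to return the constant 1 without reloading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: with TBAA the compiler may assume that *ip and *dp
   are different objects, because int and double are incompatible types. */
int store_then_load(int *ip, double *dp)
{
    assert(ip != NULL && dp != NULL);
    *ip = 1;
    *dp = 2.0;   /* assumed not to modify *ip */
    return *ip;  /* may be folded to the constant 1 */
}

/* Usage with two genuinely distinct objects: well defined, returns 1. */
int demo_distinct_objects(void)
{
    int i = 0;
    double d = 0.0;
    int r = store_then_load(&i, &d);
    assert(d == 2.0);
    return r;
}
```

If a caller passed the same storage for both pointers through a cast, the program would break the standard's access rules and the folded result could differ from what the programmer expected.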
History
- Amer Diwan, Kathryn S. McKinley, and J. Eliot B. Moss, 'Type-based alias analysis'
- August G. Reinig, 'Alias Analysis in the DEC C and Digital C++ Compilers'
- Rakesh Ghiya, Daniel Lavery, and David Sehr, 'On the importance of points-to analysis and other memory disambiguation methods for C programs'
Status Quo
General adoption:
- Compiler support: gcc, XL C/C++, Clang (as of April 2011)
- C standard
So what's the problem?
The C Standard
The Problem
- Lots of code violates the standard
- TBAA on non-compliant programs is unsafe
- "Solution": turn off TBAA
The Problem
Can we do better?
The Problem
Problem: Identify points in a program where memory objects may be accessed in a way that violates the restrictions in the C standard.
The C Standard: Valid Accesses
6.5: [...] 7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Case 1: The types are the same, ignoring qualifiers and signedness.
(The type of the lvalue, and the effective type of the accessed object)
Case 2: There is structural aliasing.
Case 3: The access is through a character type.
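The three cases can be illustrated with a short sketch of accesses that satisfy paragraph 7. All names here (`struct pair`, `case1_same_type`, and so on) are hypothetical and chosen only for illustration. A violating access such as reading a `double` object through an `int *` is deliberately not executed, since its behaviour is undefined.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int first; double second; };  /* hypothetical aggregate */

/* Case 1: the lvalue type matches the effective type, ignoring
   qualifiers and signedness. */
int case1_same_type(void)
{
    int i = 42;
    const int *cp = &i;                     /* qualified version of int */
    unsigned int *up = (unsigned int *)&i;  /* unsigned counterpart of int */
    return *cp == 42 && *up == 42u;
}

/* Case 2: an aggregate lvalue accesses the members it contains. */
int case2_structural(void)
{
    struct pair p = { 1, 2.0 };
    int *ip = &p.first;
    *ip = 7;            /* int lvalue, int member */
    struct pair q = p;  /* struct lvalue reads both members of p */
    return q.first == 7 && q.second == 2.0;
}

/* Case 3: any object may be accessed through a character type. */
int case3_character(void)
{
    int src = 0x1234, dst = 0;
    const unsigned char *from = (const unsigned char *)&src;
    unsigned char *to = (unsigned char *)&dst;
    for (size_t k = 0; k < sizeof src; k++)
        to[k] = from[k];  /* byte-wise copy of the representation */
    return dst == src;
}

/* Usage: each helper returns 1 when the valid access behaves as expected. */
void check_valid_accesses(void)
{
    assert(case1_same_type());
    assert(case2_structural());
    assert(case3_character());
}
```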
The C Standard: Effective Types
6.5: [...] 6. The effective type of an object for an access to its stored value is the declared type of the object, if any. (Allocated objects have no declared type.) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
Case 1: The object has a declared type. The effective type is the same.
Case 2: The object has no declared type. Use the type of the last store.
Case 3: memcpy, memmove, and char arrays: type of the copied object
Case 4: Otherwise, type of the lvalue used for the access
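A small sketch may make Cases 2 and 3 more concrete. It assumes an allocated buffer large enough for either an `int` or a `double`; the function name `effective_type_demo` is hypothetical. Each read uses the type of the most recent store or copy, so every access shown here stays within the rules.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if every read sees the value written with the matching type,
   or -1 if the allocation fails. */
int effective_type_demo(void)
{
    size_t n = sizeof(int) > sizeof(double) ? sizeof(int) : sizeof(double);
    void *buf = malloc(n);  /* allocated object: no declared type */
    if (buf == NULL)
        return -1;

    *(int *)buf = 5;        /* Case 2: the store makes the effective type int */
    int a = *(int *)buf;    /* valid: read as int */

    *(double *)buf = 1.5;   /* a new store: the effective type is now double */
    double b = *(double *)buf;
    /* Reading *(int *)buf at this point would violate 6.5 paragraph 7. */

    double src = 2.5;
    memcpy(buf, &src, sizeof src);  /* Case 3: effective type taken from src */
    double c = *(double *)buf;

    free(buf);
    assert(a == 5);
    return a == 5 && b == 1.5 && c == 2.5;
}
```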
High Level Reasoning
- Everything starts safe.
- If every statement preserves safety, then everything stays safe.
- Approach: find the statements that do not preserve safety.
Analysis
- Implementation in the XL C compiler
- Pass over the intermediate representation
- On-demand flow sensitivity
- Requires points-to analysis
- Can't use TBAA
Example
    int *ip = […];
    double *dp = […];
    ip = (int *) dp;
    for (…)
        *dp += (double) *ip;
Example
    int i, *ip;
    double d, *dp;
    void *vp;
    ip = &i;
    vp = ip;
    dp = &d;
    for (…)
        ip = (int *) vp;
        *ip += 1;
        vp = dp;
        dp = (double *) vp;
        *dp += 1.0;
Type-set annotations (column "Prev"): {}, {double}, {int}, {int,double}
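The type sets in this example can be mimicked with a toy flow-sensitive pass. This is only a sketch under stated assumptions, not the XL C implementation: it assumes the five statements after `for (…)` form the loop body, tracks for each pointer just the set of types it may point to, and ignores character types and structural aliasing. All identifiers (`T_INT`, `struct state`, `analyze_example`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { T_INT = 1u << 0, T_DOUBLE = 1u << 1 };  /* hypothetical type tags */

/* For each pointer, the set of types of the objects it may point to. */
struct state { unsigned ip, vp, dp; };

struct report { int ip_access_flagged, dp_access_flagged; };

/* An access may violate the rules if some possible pointee type differs
   from the lvalue type (simplified: no char or aggregate cases). */
static int may_violate(unsigned pointee_types, unsigned lvalue_type)
{
    return (pointee_types & ~lvalue_type) != 0;
}

/* Transfer functions for one pass over the assumed loop body. */
static struct state loop_body(struct state s, struct report *r)
{
    assert(r != NULL);
    s.ip = s.vp;                                           /* ip = (int *) vp;    */
    if (may_violate(s.ip, T_INT)) r->ip_access_flagged = 1;    /* *ip += 1;       */
    s.vp = s.dp;                                           /* vp = dp;            */
    s.dp = s.vp;                                           /* dp = (double *) vp; */
    if (may_violate(s.dp, T_DOUBLE)) r->dp_access_flagged = 1; /* *dp += 1.0;     */
    return s;
}

/* Iterate to a fixed point, merging the back edge into the loop head. */
struct report analyze_example(void)
{
    struct report r = { 0, 0 };
    /* State after: ip = &i; vp = ip; dp = &d; */
    struct state head = { T_INT, T_INT, T_DOUBLE };
    for (;;) {
        struct state out = loop_body(head, &r);
        struct state merged = { head.ip | out.ip, head.vp | out.vp,
                                head.dp | out.dp };
        if (merged.ip == head.ip && merged.vp == head.vp &&
            merged.dp == head.dp)
            break;
        head = merged;
    }
    return r;
}
```

On the first pass `vp` may only point to an `int`, so nothing is flagged. After merging the back edge, `vp` may point to `{int, double}`, and the access `*ip += 1` is flagged because `ip` may then refer to the `double` object `d`.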
Future Work
- Interprocedural analysis
- Automatic adjustment
Conclusion
- TBAA is not safe on arbitrary C code
- We can detect points where it becomes unsafe
- Long run: a safer TBAA
Thanks.