Efficient Field-Sensitive Pointer Analysis for C
David J. Pearce, Paul H.J. Kelly and Chris Hankin
Imperial College, London, UK
What is Pointer Analysis?
Determine pointer targets without running the program
What is flow-insensitive pointer analysis?
> One solution for all statements, so precision is lost
> This trades precision for efficiency
> This work considers flow-insensitive pointer analysis only

int a,b,*p,*q = NULL;
p = &a;
if(…) q = p;   // flow-insensitive: p ↦ {a,b}, q ↦ {a,b,NULL}
p = &b;
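To make the precision loss concrete, here is a minimal C sketch that solves the example flow-insensitively. The bit encoding of the locations a, b and NULL (`LOC_A`, `LOC_B`, `LOC_NULL`) and the function `solve_flow_insensitive` are hypothetical: every statement becomes a constraint that holds everywhere, and the constraints are re-applied until nothing changes.

```c
#include <assert.h>

/* Hypothetical encoding: each abstract location is one bit of a mask. */
enum { LOC_A = 1, LOC_B = 2, LOC_NULL = 4 };

/* Flow-insensitive solve of the example. Statement order and the `if`
 * are ignored; each statement yields a constraint that holds everywhere,
 * and we iterate until no set changes (a fixpoint). */
void solve_flow_insensitive(unsigned *p, unsigned *q)
{
    assert(p && q);
    unsigned P = 0, Q = 0;
    int changed = 1;
    while (changed) {
        unsigned oldP = P, oldQ = Q;
        Q |= LOC_NULL;   /* int *q = NULL;  q ⊇ {NULL} */
        P |= LOC_A;      /* p = &a;         p ⊇ {a}    */
        Q |= P;          /* q = p;          q ⊇ p      */
        P |= LOC_B;      /* p = &b;         p ⊇ {b}    */
        changed = (P != oldP) || (Q != oldQ);
    }
    *p = P;
    *q = Q;
}
```

Calling `solve_flow_insensitive(&p, &q)` gives p = {a,b} and q = {a,b,NULL}: q picks up b because the analysis ignores that `p = &b` runs after `q = p`. A flow-sensitive analysis would report q ↦ {a,NULL} just after the `if`.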
Pointer analysis via set-constraints
Generate set-constraints from program and solve them
> Use constraint graph for efficient solving

int a,b,c,*p,*q,*r;
p = &a;          // p ⊇ {a}
r = &b;          // r ⊇ {b}
q = &c;          // q ⊇ {c}
if(...) q = p;   // q ⊇ p
else q = r;      // q ⊇ r
Constraint graph: edges p → q (from q ⊇ p) and r → q (from q ⊇ r); initial sets p {a}, r {b}, q {c}
Solved graph: propagating sets along the edges gives p {a}, r {b}, q {a,b,c}
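Solving on the constraint graph can be sketched as worklist propagation: each copy constraint q ⊇ p becomes an edge p → q, and a node's set is pushed along its outgoing edges whenever it grows. This is a minimal sketch, not the authors' solver; the node numbering, the bitmask sets and the names `propagate` and `solve_example` are hypothetical, and refinements that practical solvers use, such as cycle detection, are omitted.

```c
#include <assert.h>

/* Hypothetical numbering of the constraint-graph nodes in the example. */
enum { P, Q, R, NNODES };
/* Abstract locations a, b, c as bits of a mask. */
enum { LOC_A = 1, LOC_B = 2, LOC_C = 4 };

/* A copy constraint dst ⊇ src is stored as the edge src -> dst. */
struct edge { int src, dst; };

#define WL_CAP 64   /* worklist capacity; ample for this tiny graph */

/* Worklist propagation: when a node's set grows, push it along every
 * outgoing edge. Simplified sketch: no cycle detection, linear edge scan. */
void propagate(unsigned sol[NNODES], const struct edge *edges, int nedges)
{
    int worklist[WL_CAP], top = 0;
    for (int v = 0; v < NNODES; v++)
        worklist[top++] = v;                 /* initially visit every node */
    while (top > 0) {
        int v = worklist[--top];
        for (int i = 0; i < nedges; i++) {
            if (edges[i].src != v)
                continue;
            int w = edges[i].dst;
            unsigned before = sol[w];
            sol[w] |= sol[v];
            if (sol[w] != before) {          /* w grew: revisit its successors */
                assert(top < WL_CAP);
                worklist[top++] = w;
            }
        }
    }
}

/* The slide's constraints: p ⊇ {a}, r ⊇ {b}, q ⊇ {c}, q ⊇ p, q ⊇ r. */
void solve_example(unsigned sol[NNODES])
{
    static const struct edge edges[] = { { P, Q }, { R, Q } };
    sol[P] = LOC_A;
    sol[R] = LOC_B;
    sol[Q] = LOC_C;
    propagate(sol, edges, (int)(sizeof edges / sizeof edges[0]));
}
```

After `solve_example(sol)`, `sol[Q]` holds a, b and c, matching the solved graph.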
Field-Sensitivity
How to deal with aggregate types?
> Standard approach treats them as single variables

typedef struct { int *f1; int *f2; } t1;
int a,b,*p,*q,*r;
t1 x;
p = &a;      // p ⊇ {a}
q = &b;      // q ⊇ {b}
x.f1 = p;    // x ⊇ p
x.f2 = q;    // x ⊇ q
r = x.f1;    // r ⊇ x

Constraint graph: edges p → x, q → x, x → r; initial sets p {a}, q {b}, x {}, r {}
Solved graph: x {a,b}, r {a,b}; r also receives b, although only x.f2 ever holds &b
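A minimal sketch of the field-insensitive model, assuming hypothetical node numbers (`P`, `Q`, `R`, `X`) and bitmask sets: both field writes and the field read go through the single node `X`, so `r` inherits everything stored in any field of `x`. The solver is a naive sweep-until-fixpoint, chosen for brevity rather than speed; `solve_field_insensitive` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical node numbering: the whole struct x is ONE node, X. */
enum { P, Q, R, X, NNODES };
enum { LOC_A = 1, LOC_B = 2 };   /* abstract locations a and b as bits */

/* Copy constraints dst ⊇ src from the slide, stored as {src, dst}:
 *   x.f1 = p;   x ⊇ p   (field name dropped)
 *   x.f2 = q;   x ⊇ q
 *   r = x.f1;   r ⊇ x                          */
static const int edges[][2] = { { P, X }, { Q, X }, { X, R } };
enum { NEDGES = sizeof edges / sizeof edges[0] };

void solve_field_insensitive(unsigned sol[NNODES])
{
    for (int v = 0; v < NNODES; v++)
        sol[v] = 0;
    sol[P] = LOC_A;   /* p = &a;  p ⊇ {a} */
    sol[Q] = LOC_B;   /* q = &b;  q ⊇ {b} */

    /* Naive fixpoint: sweep every edge until no set changes. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < NEDGES; i++) {
            int src = edges[i][0], dst = edges[i][1];
            unsigned before = sol[dst];
            sol[dst] |= sol[src];
            if (sol[dst] != before)
                changed = 1;
        }
    }
    assert((sol[X] & ~sol[R]) == 0);   /* sanity check: r ⊇ x at the fixpoint */
}
```

Here `sol[R]` ends up as {a,b}; the spurious b flows in through `x.f2`.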
Field-Sensitivity – A simple solution
Use a separate node per field for each aggregate
> Node "x" split in two

typedef struct { int *f1; int *f2; } t1;
int a,b,*p,*q,*r;
t1 x;
p = &a;      // p ⊇ {a}
q = &b;      // q ⊇ {b}
x.f1 = p;    // x.f1 ⊇ p
x.f2 = q;    // x.f2 ⊇ q
r = x.f1;    // r ⊇ x.f1

Constraint graph: edges p → x.f1, q → x.f2, x.f1 → r; initial sets p {a}, q {b}, x.f1 {}, x.f2 {}, r {}
Solved graph: x.f1 {a}, x.f2 {b}, r {a}; r no longer receives b
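The same kind of sketch with one node per field (hypothetical `X_F1` and `X_F2` in place of a single `X`) shows the precision gain: the read of `x.f1` is connected only to the node written by `x.f1 = p`. The numbering, bitmask sets and `solve_field_sensitive` are again illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical node numbering: x is split into one node per field. */
enum { P, Q, R, X_F1, X_F2, NNODES };
enum { LOC_A = 1, LOC_B = 2 };   /* abstract locations a and b as bits */

/* Copy constraints dst ⊇ src, stored as {src, dst}:
 *   x.f1 = p;   x.f1 ⊇ p
 *   x.f2 = q;   x.f2 ⊇ q
 *   r = x.f1;   r ⊇ x.f1                       */
static const int edges[][2] = { { P, X_F1 }, { Q, X_F2 }, { X_F1, R } };
enum { NEDGES = sizeof edges / sizeof edges[0] };

void solve_field_sensitive(unsigned sol[NNODES])
{
    for (int v = 0; v < NNODES; v++)
        sol[v] = 0;
    sol[P] = LOC_A;   /* p = &a;  p ⊇ {a} */
    sol[Q] = LOC_B;   /* q = &b;  q ⊇ {b} */

    /* Naive fixpoint: sweep every edge until no set changes. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < NEDGES; i++) {
            int src = edges[i][0], dst = edges[i][1];
            unsigned before = sol[dst];
            sol[dst] |= sol[src];
            if (sol[dst] != before)
                changed = 1;
        }
    }
    assert((sol[X_F1] & ~sol[R]) == 0);   /* sanity check: r ⊇ x.f1 */
}
```

Now `sol[R]` is just {a}, while `sol[X_F2]` keeps b to itself.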
Problem – can take address of field in C
System thus far has no mechanism for this
First idea – use string concatenation operator ||
> Works well for this example

typedef struct { int *f1; int *f2; } t1;
int **p;
t1 x,*s;
s = &x;          // s ⊇ {x}
p = &(s->f2);    // p ⊇ (*s) || f2
                 // p ⊇ {x} || f2
                 // p ⊇ {x.f2}

Nodes: x.f1 {..}, x.f2 {..}
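The concatenation idea can be sketched with named nodes: for each target of `s`, append the field name and look the result up in the node table. The table contents, the `"<var>.<field>"` naming scheme and the functions `find_node` and `concat_field` are hypothetical, chosen only to illustrate the operator ||.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node table: one node per variable, and one node per field
 * of each aggregate, named "<var>.<field>". */
static const char *const nodes[] = { "p", "s", "x.f1", "x.f2" };
enum { NNODES = sizeof nodes / sizeof nodes[0] };

/* Return the index of the node called `name`, or -1 if none exists. */
int find_node(const char *name)
{
    for (int i = 0; i < NNODES; i++)
        if (strcmp(nodes[i], name) == 0)
            return i;
    return -1;
}

/* One step of p ⊇ (*s) || field: given one target of s (here "x"),
 * build the field node's name by concatenation and look it up. */
int concat_field(const char *target, const char *field)
{
    char name[64];
    int len = snprintf(name, sizeof name, "%s.%s", target, field);
    assert(len > 0 && (size_t)len < sizeof name);   /* name must fit */
    return find_node(name);
}
```

`concat_field("x", "f2")` finds node `x.f2` (index 3 in this table), which is the p ⊇ {x.f2} step of the derivation.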
Problem – compatible types
First idea – use string concatenation operator ||
> Casting between types that are identical except for field names
> Derivation same as before, but node x.f2 no longer exists!

typedef struct { int *f1; int *f2; } t1;
typedef struct { int *f3; int *f4; } t2;
int **p;
t1 *s;
t2 x;
s = (t1*) &x;    // s ⊇ {x}
p = &(s->f2);    // p ⊇ (*s) || f2
                 // p ⊇ {x} || f2
                 // p ⊇ {x.f2}

Nodes: x.f3 {..}, x.f4 {..}
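Applying the same hypothetical name-based sketch to the compatible-types example shows where it breaks: `x` is declared as `t2`, so only `x.f3` and `x.f4` exist and the lookup of `x.f2` fails. For contrast, `resolve_by_position` (also hypothetical) resolves `f2` by its position in `t1`, which is what the cast actually relies on; it assumes a target's field nodes are numbered consecutively, first field first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node table: x is declared as t2, so its field nodes are
 * named after t2's fields and stored consecutively, first field first. */
static const char *const nodes[] = { "p", "s", "x.f3", "x.f4" };
enum { NNODES = sizeof nodes / sizeof nodes[0] };

/* Fields of t1 (the type s points to), in declaration order. */
static const char *const t1_fields[] = { "f1", "f2" };
enum { T1_NFIELDS = sizeof t1_fields / sizeof t1_fields[0] };

/* Return the index of the node called `name`, or -1 if none exists. */
int find_node(const char *name)
{
    for (int i = 0; i < NNODES; i++)
        if (strcmp(nodes[i], name) == 0)
            return i;
    return -1;
}

/* Name-based resolution of (*s) || field, as in the first idea. */
int resolve_by_name(const char *target, const char *field)
{
    char name[64];
    int len = snprintf(name, sizeof name, "%s.%s", target, field);
    assert(len > 0 && (size_t)len < sizeof name);
    return find_node(name);
}

/* For contrast: resolve the field by its position in t1 rather than by
 * its name, offsetting from the node of the target's first field. */
int resolve_by_position(int first_field_node, const char *field)
{
    assert(first_field_node >= 0);
    for (int k = 0; k < T1_NFIELDS; k++)
        if (strcmp(t1_fields[k], field) == 0)
            return first_field_node + k;
    return -1;
}
```

`resolve_by_name("x", "f2")` returns -1, while `resolve_by_position(find_node("x.f3"), "f2")` lands on `x.f4`.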
Field-Sensitivity – Our Solution
Our solution – map variables to integers
> Solution sets become integer sets
> Use integer addition to model taking address of field
> Address of aggregate modelled by address of its first field

typedef struct { int *f1; int *f2; } t1;
typedef struct { int *f3; int *f4; } t2;
int **p;
t1 *s;
t2 x;
s = (t1*) &x;    // s ⊇ {x.f3}, i.e. s ⊇ {2}
p = &(s->f2);    // p ⊇ s + 1   (f2 is field 1 of t1)
                 // p ⊇ {2} + 1
                 // p ⊇ {3}, i.e. p may point to x.f4

Numbering: p = 0, s = 1, x.f3 = 2, x.f4 = 3
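The integer encoding can be sketched with bitmask sets over the slide's numbering (p = 0, s = 1, x.f3 = 2, x.f4 = 3). A constraint p ⊇ s + k adds k to every element of s's solution. The names `intset`, `add_offset` and `solve_example` are hypothetical; the sketch only drops results that fall outside the node range and does not check that an offset stays within the pointed-to object.

```c
#include <assert.h>

/* Integer numbering from the slide: p = 0, s = 1, x.f3 = 2, x.f4 = 3. */
enum { P = 0, S = 1, X_F3 = 2, X_F4 = 3, NNODES = 4 };

/* Solution sets are sets of integers, stored as bitmasks:
 * bit i set means node i is a possible target. */
typedef unsigned intset;

/* dst ⊇ src + k: add k to every element of sol(src). Results outside the
 * node range are dropped (a simplification of this sketch; it does not
 * check object bounds). Returns nonzero if sol(dst) grew. */
static int add_offset(intset sol[NNODES], int dst, int src, int k)
{
    intset before = sol[dst];
    for (int i = 0; i < NNODES; i++)
        if (((sol[src] >> i) & 1u) && i + k < NNODES)
            sol[dst] |= 1u << (i + k);
    return sol[dst] != before;
}

void solve_example(intset sol[NNODES])
{
    for (int i = 0; i < NNODES; i++)
        sol[i] = 0;
    /* s = (t1*) &x;  the address of x is that of its first field: s ⊇ {2} */
    sol[S] |= 1u << X_F3;
    /* p = &(s->f2);  f2 is field 1 of t1: p ⊇ s + 1 */
    while (add_offset(sol, P, S, 1))
        ;   /* repeat until no change (one pass suffices here) */
    assert(sol[S] != 0);
}
```

After solving, s holds {2} and p holds {3}, i.e. p may point to x.f4, regardless of the field names used in `t2`.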
Experimental Study
Metrics: Time (s) and Avg Deref Size, each for Field-insensitive and Field-sensitive analysis
Benchmarks: bash (55324 LOC), emacs (93151 LOC), sendmail (49053 LOC), named (75599 LOC), ghostscript (… LOC)
Conclusion
Field-sensitive Pointer Analysis
> Presented new technique for C language
> Elegantly copes with language features
  - Taking address of field
  - Compatible types and casting
  - Technique also handles function pointers without modification
> Experimental evaluation over 7 common C programs
  - Considerable improvements in precision obtained
  - But much higher solving times
  - And relative gains appear to diminish with larger benchmarks
Constraint Graphs (continued)
What about statements involving a pointer dereference?
> *q ⊇ s cannot be represented as a fixed edge in the constraint graph
> Instead, add edges as the solution of q becomes known
> Thus, computation similar to dynamic transitive closure

int a,*r,*s,**p,**q;
p = &r;    // p ⊇ {r}
s = &a;    // s ⊇ {a}
q = p;     // q ⊇ p
*q = s;    // *q ⊇ s

Constraint graph: edge p → q; initial sets p {r}, q {}, s {a}, r {}
Once q {r} is known, *q ⊇ s becomes r ⊇ s: add edge s → r
Sets so far: p {r}, q {r}, s {a}, r {}
Propagating along s → r gives r {a}
Final sets: p {r}, q {r}, s {a}, r {a}
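The dynamic edge insertion for `*q = s` can be sketched as follows: after each propagation sweep, every node v in q's solution gets an edge s → v (modelling v ⊇ s), and the loop continues until neither sets nor edges change. The numbering, the adjacency matrix `edge` and the functions `sweep` and `solve_example` are hypothetical, and a naive re-sweep stands in for a proper dynamic transitive closure algorithm.

```c
#include <assert.h>

/* Hypothetical numbering: every variable is a node, and a solution set
 * is a bitmask of nodes (bit v set = "may point to v"). */
enum { A, R, S, P, Q, NNODES };
#define BIT(v) (1u << (v))

static int edge[NNODES][NNODES];   /* edge[u][w] != 0 means w ⊇ u */

/* Push every set along every edge once; return nonzero if a set grew. */
static int sweep(unsigned sol[NNODES])
{
    int changed = 0;
    for (int u = 0; u < NNODES; u++)
        for (int w = 0; w < NNODES; w++)
            if (edge[u][w] && (sol[w] | sol[u]) != sol[w]) {
                sol[w] |= sol[u];
                changed = 1;
            }
    return changed;
}

void solve_example(unsigned sol[NNODES])
{
    for (int u = 0; u < NNODES; u++) {
        sol[u] = 0;
        for (int w = 0; w < NNODES; w++)
            edge[u][w] = 0;
    }
    sol[P] = BIT(R);   /* p = &r;  p ⊇ {r} */
    sol[S] = BIT(A);   /* s = &a;  s ⊇ {a} */
    edge[P][Q] = 1;    /* q = p;   q ⊇ p   */

    /* *q = s: no fixed edge. Whenever v appears in sol(q), add the edge
     * s -> v (v ⊇ s) and keep propagating until nothing changes. */
    int changed = 1;
    while (changed) {
        changed = sweep(sol);
        for (int v = 0; v < NNODES; v++)
            if ((sol[Q] & BIT(v)) && !edge[S][v]) {
                edge[S][v] = 1;
                changed = 1;
            }
    }
    assert((sol[P] & ~sol[Q]) == 0);   /* sanity check: q ⊇ p holds */
}
```

The result has q = {r} and r = {a}, matching the final sets above; the edge s → r only exists because r turned up in q's solution during solving.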