Shape Analysis for Low-level Code Hongseok Yang (Seoul National University) (Joint work with Cristiano Calcagno, Dino Distefano and Peter O’Hearn)
Dream: Automatically verify the memory safety of systems code, such as device drivers and memory managers. Challenges: Pointer arithmetic. Scalability. Concurrency.
Our Analyzer: Handles programs for dynamic memory management. Experimental results (Pentium 3.2GHz, 4GB). Found a hidden assumption of the K&R memory manager. These are “fixed” versions. Proved memory safety and even partial correctness.
Sample Analysis Result
Program: ans = malloc_bestfit_acyclic(n);
Precondition: n ≥ 2 ∧ mls(freep,0)
Postcondition:
  (ans=0 ∧ n ≥ 2 ∧ mls(freep,0))
  ∨ (n ≥ 2 ∧ nd(ans,q’,n) * mls(freep,0))
  ∨ (n ≥ 2 ∧ nd(ans,q’,n) * mls(freep,q’) * mls(q’,0))
Hidden Assumption in K&R Malloc/Free
[Figure: memory layout showing the Heap, Global Vars and Stack regions]
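Multiword Lists
[Figure: a multiword list starting at lp. Each node stores a Link Field followed by a Size Field. Nodes (address: link, size): 5: (15, 3), 15: (18, 3), 18: (24, 5), 24: (nil, 2)]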
Coalescing

  p = lp;
  while (p!=0) {
    local q = *p;
    if (p + *(p+1) == q) {
      *(p+1) = *(p+1) + *(q+1);
      *p = *q;
    } else {
      p = q;
    }
  }

Heap during the loop (address: link, size):
  Initially:                         5: (15, 3)   15: (18, 3)   18: (24, 5)   24: (nil, 2)
  After *(p+1) = *(p+1) + *(q+1):    5: (15, 3)   15: (18, 8)   18: (24, 5)   24: (nil, 2)
  After *p = *q:                     5: (15, 3)   15: (24, 8)   18: (24, 5)   24: (nil, 2)
  Resulting list:                    5: (15, 3)   15: (24, 8)   24: (nil, 2)
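The coalescing loop can be replayed on a concrete heap. The following is a minimal sketch, assuming a word-addressed memory modelled as a Python dict and nil encoded as 0; the helper names `coalesce` and `nodes` are illustrative and not from the slides.

```python
NIL = 0  # assumption: nil is address 0, as in the loop test p != 0

def coalesce(mem, lp):
    """Run the coalescing loop on mem (dict: address -> value), starting at lp."""
    p = lp
    while p != NIL:
        q = mem[p]                                # link field of the node at p
        if p + mem[p + 1] == q:                   # node at q starts right after node at p
            mem[p + 1] = mem[p + 1] + mem[q + 1]  # merge the two sizes
            mem[p] = mem[q]                       # bypass the absorbed node
        else:
            p = q
    return mem

def nodes(mem, lp):
    """List the (address, link, size) triples reachable from lp."""
    out, p = [], lp
    while p != NIL:
        out.append((p, mem[p], mem[p + 1]))
        p = mem[p]
    return out

# Free list from the slides: nodes at 5, 15, 18 and 24.
mem = {5: 15, 6: 3, 15: 18, 16: 3, 18: 24, 19: 5, 24: NIL, 25: 2}
coalesce(mem, 5)
print(nodes(mem, 5))  # [(5, 15, 3), (15, 24, 8), (24, 0, 2)]
```

The node at 18 is absorbed into the node at 15 because 15 + 3 == 18. Its two header cells are left in memory unchanged, but are no longer reachable through link fields.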
Coalescing
[Figure: the analysis of the loop moves between a Nodeful High-level View and a Nodeless Low-level View, and returns to the nodeful view]
Complex numerical relationships are used only for reconstructing a high-level view.
At loop exit: p=0, with the list 5: (15, 3), 15: (24, 8), 24: (nil, 2).
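Separation Logic
nd(p,q,5) =def (p ↦ q) * (p+1 ↦ 5) * blk(p+2,p+5)
[Figure: blk(p+2,p+5) drawn as a block between p+2 and p+5; nd(p,q,5) drawn as a node from p to p+5 with link q and size 5; mls(p,q) drawn as a list of multiword nodes (labelled 3, 4, 2) from p to q]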
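To make the node predicate concrete, here is a small sketch that lists the cells described by nd(p,q,n). It assumes that blk(E,F) covers the addresses from E up to but not including F, which is what makes a node of size n occupy exactly n cells; the function name `unroll_nd` is hypothetical.

```python
def unroll_nd(p, q, n):
    """Cells of nd(p, q, n) = (p |-> q) * (p+1 |-> n) * blk(p+2, p+n).

    Returns a dict from address to content; None marks a cell of the
    block part, whose content the predicate leaves unspecified.
    """
    if n < 2:
        raise ValueError("a node needs at least its link and size cells")
    cells = {p: q, p + 1: n}          # link field, then size field
    for addr in range(p + 2, p + n):  # assumption: blk is half-open
        cells[addr] = None
    return cells

print(unroll_nd(15, 18, 3))  # {15: 18, 16: 3, 17: None}
```

Under that reading, nd(15,18,3) ends exactly where a node at 18 would begin, which is the adjacency test `p + *(p+1) == q` used by the coalescing loop.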
Symbolic Heaps
∃x’,y’. (P1 ∧ P2 ∧ … ∧ Pn) ∧ (H1 * H2 * … * Hm)
where
  P ::= E=F | E≤F | E!=F | …
  H ::= E↦F | blk(E,F) | mls(E,F) | nd(E,F,G) | …
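Abstract Domain
Nodeful view: P(CanSymH)⊤, ordered by ⊆, with elements {Q1, Q2, …, Qn}
  e.g. nd(x,y,z) * mls(y,0)
Nodeless view: Pfin(SymH)⊤, ordered by ⊆, with elements {T1, T2, …, Tn}
  e.g. y=x+z ∧ x↦y * x+1↦z * blk(x+2,y) * mls(y,0)
P(Emb) maps the nodeful view to the nodeless view; P(Abs) maps the nodeless view back to the nodeful view.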
Our Analysis
Nodeless View: Pfin(SymH)⊤
Nodeful View: P(CanSymH)⊤
For while(B) { C; }:
  {Q1, Q2, …, Qn}      (nodeful)
    Emb; Rearrangement
  {T1, T2, …, Tn}      (nodeless)
    Sym. Execution
  {T’1, T’2, …, T’m}   (nodeless)
    Abstraction
  {Q’1, Q’2, …, Q’m}   (nodeful)
Analysis
⟦C⟧ : Pfin(SymH)⊤ → Pfin(SymH)⊤
⟦A⟧d = (P(SymExec(A)) ∘ lift(Rearrange(A)))d
⟦while b C⟧d = FixComp(F)
  where F : P(CanSymH)⊤ → P(CanSymH)⊤
        F(d’) = P(Abs)(d ∪ (⟦C⟧ ∘ P(Emb))d’)
SymExec(A): Proof Rules in Sep. Log.
Rearrange(A): Unrolling of mls and nd
FixComp: Widened Differential Fixpoint Algorithm
Abs : SymH → CanSymH (Information Loss)
Emb : CanSymH → SymH
Abstraction Function Abs
Abs : SymH → CanSymH
Package all nodes. Drop numerical relationships. Combine two connected multiword lists.

Example:
  (5 ≤ x+x ∧ p+3=z’) ∧ (p↦q’ * p+1↦3 * blk(p+2,z’) * mls(q’,0))
  Package all nodes:             (5 ≤ x+x ∧ p+3=z’) ∧ (nd(p,q’,3) * mls(q’,0))
  Drop numerical relationships:  nd(p,q’,3) * mls(q’,0)
  Combine connected lists:       mls(p,0)

A cell that belongs to no node is replaced by true:
  (5 ≤ x+x ∧ p+3=z’) ∧ (nd(p,q’,3) * mls(q’,0) * r↦4)
  becomes (5 ≤ x+x ∧ p+3=z’) ∧ (nd(p,q’,3) * mls(q’,0) * true)

Packaging rules:
  Precondition: true
    … (x↦x’,s) * blk(x+2,x+s)  ⇝  … nd(x,x’,s)
  Precondition: s = s’+i
    … (x↦x’,s) * blk(x+2,x+i) * nd(x+i,y’,s’)  ⇝  … nd(x,x’,s)
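The three steps of Abs can be sketched on a toy representation of symbolic heaps. This is an illustration under several assumptions, not the analyzer's implementation: expressions are (variable, offset) pairs, the pure part is given only as equalities that define primed variables, a header atom `hdr` stands for x↦link,size, and two segments are combined only when their shared endpoint is a primed variable. All names are hypothetical.

```python
NIL = ("nil", 0)  # assumption: the constant 0 is modelled as a distinguished base

def subst(e, eqs):
    """Replace a primed variable by its defining expression, e.g. z' := p+3."""
    base, off = e
    if base in eqs:
        b2, o2 = eqs[base]
        return (b2, o2 + off)
    return e

def is_primed(e):
    return e[0].endswith("'") and e[1] == 0

def abstract(eqs, atoms):
    # Use the defining equalities before they are dropped.
    atoms = [(a[0],) + tuple(subst(f, eqs) if isinstance(f, tuple) else f
                             for f in a[1:]) for a in atoms]
    # Step 1: package hdr(x, link, s) * blk(x+2, x+s) into nd(x, link, s).
    heap = list(atoms)
    for a in atoms:
        if a[0] == "hdr":
            _, x, link, size = a
            blk = ("blk", (x[0], x[1] + 2), (x[0], x[1] + size))
            if blk in heap:
                heap.remove(a)
                heap.remove(blk)
                heap.append(("nd", x, link, size))
    # Step 2: drop numerical relationships (the pure part is not returned).
    # Step 3: combine two segments that meet at a primed variable.
    changed = True
    while changed:
        changed = False
        for a in heap:
            for b in heap:
                if (a is not b and a[0] in ("nd", "mls") and b[0] in ("nd", "mls")
                        and a[2] == b[1] and is_primed(a[2])):
                    heap.remove(a)
                    heap.remove(b)
                    heap.append(("mls", a[1], b[2]))
                    changed = True
                    break
            if changed:
                break
    return heap

# (p+3=z') ∧ (p↦q',3 * blk(p+2,z') * mls(q',0))
example = [("hdr", ("p", 0), ("q'", 0), 3),
           ("blk", ("p", 2), ("z'", 0)),
           ("mls", ("q'", 0), NIL)]
print(abstract({"z'": ("p", 3)}, example))  # [('mls', ('p', 0), ('nil', 0))]
```

Because p is a program variable rather than a primed one, the same function leaves mls(lp,p) * mls(p,0) as two separate segments, so the position of p in the list is not forgotten.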
Coalescing

  while (p!=0) {
    local q = *p;
    if (p + *(p+1) == q) {
      *(p+1) = *(p+1) + *(q+1);
      *p = *q;
    } else {
      p = *p;
    }
  }

Symbolic heaps along the then-branch of one iteration:
  At loop entry:
    … mls(lp,p) * mls(p,0)
  After rearrangement:
    p!=0 ∧ mls(lp,p) * p↦q,s’ * blk(p+2,p+s’) * mls(q,0)
  After the test p + *(p+1) == q:
    p!=0 ∧ p+s’=q ∧ mls(lp,p) * p↦q,s’ * blk(p+2,p+s’) * mls(q,0)
  After *(p+1) = *(p+1) + *(q+1):
    p!=0 ∧ p+s’=q ∧ mls(lp,p) * p↦q,s’+t’ * blk(p+2,p+s’) * q↦r’,t’ * blk(q+2,q+t’) * mls(r’,0)
  After *p = *q:
    p!=0 ∧ p+s’=q ∧ mls(lp,p) * p↦r’,s’+t’ * blk(p+2,p+s’) * q↦r’,t’ * blk(q+2,q+t’) * mls(r’,0)
  After the local q goes out of scope (q becomes q’):
    p!=0 ∧ p+s’=q’ ∧ mls(lp,p) * p↦r’,s’+t’ * blk(p+2,p+s’) * q’↦r’,t’ * blk(q’+2,q’+t’) * mls(r’,0)
  Abstraction, packaging the node at q’:
    p!=0 ∧ p+s’=q’ ∧ mls(lp,p) * p↦r’,s’+t’ * blk(p+2,p+s’) * nd(q’,r’,t’) * mls(r’,0)
  Abstraction, packaging the node at p:
    p!=0 ∧ p+s’=q’ ∧ mls(lp,p) * nd(p,r’,s’+t’) * mls(r’,0)
  Abstraction, dropping numerical relationships:
    mls(lp,p) * nd(p,r’,s’+t’) * mls(r’,0)
  Abstraction, combining connected lists:
    mls(lp,p) * mls(p,0)
Theorem Prover for “Q1 ⊢ Q2”
                without prover     with prover
  malloc_K&R    about 20 hours     502.23 secs
  free_K&R      23.844 secs        9.69 secs
Put Prover inside Hoare Powerdomain?
P(CanSymH), ⊆ vs. PH(CanSymH), ⊑
Suppose Q1 ⊢ Q2 and Q3 ⊢ Q4.
  x0 = {}
  x1 = F(x0) = {Q1, Q2, Q4}
  x2 = F(x1) = {Q1, Q2, Q3, Q4}
[Figure: {Q1, Q2, Q3, Q4} and {Q2, Q3} compared under ⊑]
But, works only when ⊢ is transitive.
Put Prover inside Hoare Powerdomain?
P(CanSymH), ⊆ vs. PH(CanSymH), ⊑
Suppose Q1 ⊢ Q2, Q2 ⊢ Q3 and Q3 ⊢ Q1.
  x0 = {}
  x1 = F(x0) = {Q1, Q2}
  x2 = F(x1) = {Q2, Q3}
  x3 = F(x2) = {Q3, Q1}
  x4 = F(x3) = {Q1, Q2}
But, works only when ⊢ is transitive.
Put Prover inside Widening!
∇ : P(CanSymH) × P(CanSymH) → P(CanSymH)
x0 ∇ x1 =def x0 ∪ { Q ∈ x1 | ∀Q’ ∈ x0. Q ⊬ Q’ }
  x0 = {}
  x1 = x0 ∇ F(x0)
  x2 = x1 ∇ F(x1)
  …
  xn+1 = xn ∇ F(xn)
x0 ⊆ x1 ⊆ x2 ⊆ x3 …
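The widening can be sketched as follows, with the prover abstracted as a Boolean function `entails(Q, Q')`. The sketch reads the side condition as "Q is not entailed by anything already in x0". The entailment relation and the function F below are toy tables chosen to mirror the cyclic example Q1 ⊢ Q2, Q2 ⊢ Q3, Q3 ⊢ Q1; they are assumptions for illustration only.

```python
def widen(x, y, entails):
    """x ∇ y: keep x, and add only the Q in y that entail nothing already in x."""
    return x | frozenset(q for q in y if not any(entails(q, q2) for q2 in x))

def widened_fix(F, entails):
    """Iterate x_{n+1} = x_n ∇ F(x_n) from the empty set until it stabilises."""
    x = frozenset()
    while True:
        nxt = widen(x, F(x), entails)
        if nxt == x:
            return x
        x = nxt

# Toy, non-transitive entailment: Q1 ⊢ Q2, Q2 ⊢ Q3, Q3 ⊢ Q1 (plus reflexivity).
PAIRS = {("Q1", "Q2"), ("Q2", "Q3"), ("Q3", "Q1")}

def entails(a, b):
    return a == b or (a, b) in PAIRS

# Hypothetical F given as a lookup table over the sets it is applied to.
TABLE = {
    frozenset(): frozenset({"Q1", "Q2"}),
    frozenset({"Q1", "Q2"}): frozenset({"Q2", "Q3"}),
    frozenset({"Q2", "Q3"}): frozenset({"Q3", "Q1"}),
    frozenset({"Q3", "Q1"}): frozenset({"Q1", "Q2"}),
}

def F(x):
    return TABLE[x]

result = widened_fix(F, entails)
print(sorted(result))  # ['Q1', 'Q2']
```

Because each iterate contains the previous one, the sequence cannot oscillate. In this run Q3 is never added, since it already entails Q1.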
Add Differencing
F : P(CanSymH) → P(CanSymH)
Without differencing, F is reapplied to the whole set at every step:
  x0 = {}
  x1 = x0 ∇ F({}) = {Q1}
  x2 = x1 ∇ F({Q1}) = {Q1,Q2}
  x3 = x2 ∇ F({Q1,Q2}) = {Q1,Q2,Q3}
  x4 = x3 ∇ F({Q1,Q2,Q3}) = {Q1,Q2,Q3}
With differencing:
  xn+1 = xn ∇ F(yn), yn+1 = xn+1 − xn
Nonstandard Fixpoint Algorithm: NOT y ⊆ (x ∇ y). NOT F(wdfix F) ⊆ wdfix F.
Notes: Mention Cai, Eo and Yi, Ahn and Kwon. Mention ASTREE. Cousot & Cousot 92 and 79.
Soundness: Analysis results can be compiled into separation-logic proofs.
Widened Differential Fixpoint Algo.
⟦while (*) C⟧d0 = ??
  x0 = d0
  x1 = x0 ∇ F(x0)     y1 = x1 − x0
  x2 = x1 ∇ F(y1)     y2 = x2 − x1
  x3 = x2 ∇ F(y2) = x2
So x3 = d0 ∇ F(d0) ∇ F(y1) ∇ F(y2), and
  (x3) ⊆ (d0) ∪ (y1) ∪ (y2)
  (x3) ⊇ (d0) ∪ (F(d0)) ∪ (F(y1)) ∪ (F(y2))
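The iteration xn+1 = xn ∇ F(yn), yn+1 = xn+1 − xn can be sketched as below. It is a minimal illustration, assuming F distributes over union so that applying it to the newly added elements is enough; the transition table `STEP` and the single entailment Q4 ⊢ Q2 are hypothetical.

```python
def widen(x, y, entails):
    """x ∇ y: keep x, and add only the Q in y that entail nothing already in x."""
    return x | frozenset(q for q in y if not any(entails(q, q2) for q2 in x))

def wdfix(F, d0, entails):
    """Widened differential fixpoint: apply F only to what the last step added."""
    x = frozenset(d0)
    y = x                       # the first step applies F to all of d0
    while True:
        nxt = widen(x, F(y), entails)
        y = nxt - x             # difference: elements new in this step
        if not y:               # nothing new, so nxt == x
            return nxt
        x = nxt

# Hypothetical abstract transitions; F is their pointwise lifting to sets.
STEP = {"Q1": {"Q2"}, "Q2": {"Q3"}, "Q3": {"Q4"}, "Q4": {"Q4"}}
calls = []                      # records every argument F is applied to

def F(s):
    calls.append(s)
    return frozenset(q2 for q in s for q2 in STEP[q])

def entails(a, b):
    return a == b or (a, b) == ("Q4", "Q2")   # toy prover: Q4 ⊢ Q2

x3 = wdfix(F, {"Q1"}, entails)
print(sorted(x3))                   # ['Q1', 'Q2', 'Q3']
print([sorted(c) for c in calls])   # [['Q1'], ['Q2'], ['Q3']]
```

Each abstract state is passed to F exactly once, and Q4 is discarded by the widening because it entails Q2, which is already present.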
Widened Differential Fixpoint Algo.
Triples given by the analysis of C:
  {d0} C {F(d0)}     {y1} C {F(y1)}     {y2} C {F(y2)}
Consequence, using (x3) ⊇ (d0) ∪ (F(d0)) ∪ (F(y1)) ∪ (F(y2)):
  {d0} C {x3}     {y1} C {x3}     {y2} C {x3}
Disjunction Rule:
  {d0 ∨ y1 ∨ y2} C {x3}
Consequence, using (x3) ⊆ (d0) ∪ (y1) ∪ (y2):
  {x3} C {x3}
  {x3} while (*) C {x3}
  {d0} while (*) C {x3}
Notes: Designing the rewriting rules for abstraction is not trivial and needs insight into the programs in mind; that is the price to pay. Manual counterexample-driven abstraction refinement can help. Abstract interpretation can be viewed as proof search.