
Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin.


1 Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin

2 Overview
– Introduction
– Challenges
– Background
– Recovering A-locs via Iteration
– An Abstraction for Heap-Allocated Storage
– Experiments

3 Introduction
The Need for Analyzing Executables
– What You See Is Not What You eXecute
Many Obstacles in Analyzing Executables
– Data objects are not easily identifiable.
– Symbol tables and debugging information are absent.
– The memory addresses of data objects must be determined.
– It is difficult to track the flow of data through memory.
– It is challenging to get useful information about the heap.
e.g.)
    memset(password, '\0', len);
    free(password);

4 Challenges (1/3)
Recovering Variable-like Entities
– The layout of memory is known at compile time or assembly time (IDAPro's approach).
– To recover y, the set of values that eax holds at instruction 5 needs to be determined.

    void main() {
      int x, y;
      x = 1;
      y = 2;
      return;
    }

    proc main
    1 mov ebp, esp
    2 sub esp, 8
    3 mov [ebp-8], 1
    4 mov eax, ebp
    5 mov [eax-4], 2
    6 add esp, 8
    7 retn

5 Challenges (2/3)
Granularity of Recovered Variable-like Entities
– Affects the complexity and accuracy of subsequent analyses
The Structure of Heap-Allocated Objects
– Only the size of the allocated block is known.
– Recovered using an abstraction-refinement algorithm

6 Challenges (3/3)
Resolving Virtual-Function Calls
– A definite link between the object and the virtual-function table is never established (weak update).
– Cause: the one-variable-per-malloc-site abstraction

7 Background (1/6)
Abstract Locations (A-locs)
– Memory Region
  A set of disjoint memory areas
  Represents a group of locations that have similar runtime properties
– Abstract Locations
  Locations between two addresses/offsets in a memory region
  Addresses and offsets are statically determined

8 Background (2/6)
Abstract Locations (cont'd)

    proc main
    0 mov ebp,esp
    1 sub esp,40
    2 mov ecx,0
    3 lea eax,[ebp-40]
    L1: mov [eax], 1
    5 mov [eax+4],2
    6 add eax, 8
    7 inc ecx
    8 cmp ecx, 5
    9 jl L1
    10 mov eax,[ebp-36]
    11 add esp,40
    12 retn

9 Background (3/6)
Value-Set Analysis (VSA)
– Combined numeric analysis and pointer analysis
– Computes an over-approximation of the set of values that each a-loc holds at each program point
– Value-Set
  The set of addresses and numeric values
  An n-tuple of strided intervals of the form s[l, u], one per memory region (Global Region, Procedure Region, …)
  e.g.) (1[0, 9], ∅) versus (∅, 8[-40, -8])
  e.g.) 8[-40, -8] = {-40, -32, -24, -16, -8}
  n : the number of memory regions
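The strided-interval notation on this slide can be illustrated with a short Python sketch. The function name `concretize` is a hypothetical helper, not something from the slides; it simply enumerates the integers that s[l, u] denotes, assuming u is reachable from l in steps of s.

```python
def concretize(stride, lower, upper):
    """Enumerate the integers denoted by the strided interval stride[lower, upper]."""
    if stride == 0:
        # A zero stride only makes sense for a single value.
        return [lower]
    return list(range(lower, upper + 1, stride))


# The slide's example: 8[-40, -8]
print(concretize(8, -40, -8))  # [-40, -32, -24, -16, -8]
```

A value-set would then be a tuple of such intervals, one component per memory region, with an empty component for regions the a-loc cannot point into.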

10 Background (4/6)
Value-Set Analysis (cont'd)
– The value-set of eax at L1: (∅, 8[-40, -8])
  eax holds the offsets {-40, -32, -24, -16, -8}
  These are the starting addresses of field x of the elements of p

    proc main
    0 mov ebp,esp
    1 sub esp,40
    2 mov ecx,0
    3 lea eax,[ebp-40]
    L1: mov [eax], 1
    5 mov [eax+4],2
    6 add eax, 8
    7 inc ecx
    8 cmp ecx, 5
    9 jl L1
    10 mov eax,[ebp-36]
    11 add esp,40
    12 retn

    typedef struct {
      int x, y;
    } Point;

    int main() {
      int i;
      Point p[5];
      for (i = 0; i < 5; ++i) {
        p[i].x = 1;
        p[i].y = 2;
      }
      return p[0].y;
    }

11 Background (5/6)
Aggregate Structure Identification (ASI)
– Can distinguish between accesses to different parts of the same aggregate
– An aggregate is broken up into smaller parts (atoms)
– Data-Access Constraint Language (DAC)
  Specifies the data-access patterns in the program
  DataRef : reference to a set of sequences of bytes
  UnifyConstraint : flow of data in the program

12 Background (6/6)
Aggregate Structure Identification (cont'd)
– Data-Access Constraint Language (DAC)
  DataRef[l : u] refers to bytes l through u in DataRef
  DataRef\n : n is the number of elements
  e.g.) P[0:11]\3 = P[0:3], P[4:7], or P[8:11]
– ASI DAG (figure not reproduced; surviving label: return_main)
  p[0:39]\5[0:3] ≈ const_1[0:3];
  p[0:39]\5[4:7] ≈ const_2[0:3];
  return_main[0:3] ≈ p[4:7]
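The array notation in these data references can be unpacked mechanically. The sketch below is an illustration only: `split_array` and `subrange` are hypothetical helper names, and byte ranges are modelled as plain (first, last) tuples rather than as ASI's own data structures.

```python
def split_array(lo, hi, n):
    """Split bytes lo..hi into n equally sized elements, as in DataRef[lo:hi]\\n."""
    total = hi - lo + 1
    if n <= 0 or total % n != 0:
        raise ValueError("byte range does not divide into n equal elements")
    size = total // n
    return [(lo + i * size, lo + (i + 1) * size - 1) for i in range(n)]


def subrange(elements, lo, hi):
    """Select bytes lo..hi inside every element, as in a trailing [lo:hi]."""
    return [(start + lo, start + hi) for start, _end in elements]


# P[0:11]\3 denotes one of P[0:3], P[4:7], P[8:11]
print(split_array(0, 11, 3))               # [(0, 3), (4, 7), (8, 11)]

# p[0:39]\5[0:3]: the first four bytes of each of the five elements of p
print(subrange(split_array(0, 39, 5), 0, 3))
```

Read this way, the first constraint on the slide relates const_1 to the x field of every element of p, and the second relates const_2 to every y field.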

13 Recovering A-locs via Iteration
Problems of VSA
– Can only represent a contiguous sequence of memory locations
– Cannot detect internal substructure
Basic Idea
1. VSA is used to obtain memory-access patterns in the executable;
2. ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns obtained from the information recovered by VSA.
(Figure: IDAPro, VSA, ASI, Final Value-Sets)

14 Recovering A-locs via Iteration
Generating Data-Access Constraints from Value

Input : (r, s[l, u], length)
Output : (ASI Ref, Boolean)

    if s[l, u] is a singleton then
      return
    else
      size ← max(s, length)
      n ← (u – l + size – 1) / size
      ref ← "r[l : u+size-1]\n[0 : size-1]"
      return
    end if

Annotations:
– A singleton is e.g.) s[l, l]
– r[l : u+size-1] is the actual byte range
– n is the number of array elements

e.g.) (AR_main, 8[-40, -8], length) => {AR_main[(-40):(-1)]\5[0:7]}
    AR_main[-40:-33][0:7]
    AR_main[-32:-25][0:7]
    AR_main[-24:-17][0:7]
    AR_main[-16:-9][0:7]
    AR_main[-8:-1][0:7]
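The conversion on this slide can be sketched in Python under several assumptions. The function name `si2asi_ref` is hypothetical, the Boolean half of the output is omitted because its definition did not survive in the transcript, and the singleton case is assumed to denote one access of `length` bytes. The element count is derived from the byte range as (u - l + size) / size, because the formula as transcribed does not yield the count of 10 shown for the index 1[0, 9] on the next slide.

```python
def si2asi_ref(region, stride, lower, upper, length):
    """Build an ASI reference string for accesses of `length` bytes whose
    start offsets are given by the strided interval stride[lower, upper]."""
    if lower == upper:
        # Assumption: a singleton is a single access covering `length` bytes.
        return f"{region}[{lower}:{lower + length - 1}]"
    size = max(stride, length)
    # Assumption: count the elements from the byte range lower..upper+size-1.
    n = (upper - lower + size) // size
    return f"{region}[{lower}:{upper + size - 1}]\\{n}[0:{size - 1}]"


# The slide's example, assuming a 4-byte access:
print(si2asi_ref("AR_main", 8, -40, -8, 4))  # AR_main[-40:-1]\5[0:7]
```

When `length` exceeds the stride and the byte range is not a multiple of `size`, the integer division truncates; the slides do not say how that case should be treated.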

15 Recovering A-locs via Iteration
Generating Data-Access Constraints from Value

    if s1[l1, u1] or s2[l2, u2] is a singleton then
      return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    end if
    if s1 ≥ (u2 – l2 + length) then
      baseSI ← s1[l1, u1]
      indexSI ← s2[l2, u2]
    else if s2 ≥ (u1 – l1 + length) then
      baseSI ← s2[l2, u2]
      indexSI ← s1[l1, u1]
    else
      return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    end if
    ← SI2ASI(r, baseSI, stride(baseSI))
    if exactRef is false then
      return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    else
      return concat(baseRef, SI2ASI('', indexSI, length))
    end if

Annotations:
– Determine the base register
– Row-major order
– Base Addr, Index Addr

e.g.) eax : (1[0, 9], ∅)
      ecx : (∅, 16[-160, -16])
      In case of [ecx+eax] => AR[-160:-1]\10[0:15][0:9]\10[0:0]
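The base/index selection step of this algorithm can be isolated in a few lines of Python. This is a sketch, not the authors' implementation: `pick_base_and_index` is a hypothetical name, each strided interval is modelled as a (stride, lower, upper) tuple, and `None` stands for the branches in which the slide falls back to SI2ASI on the sum of the two intervals.

```python
def pick_base_and_index(si1, si2, length):
    """Return (base, index) for two strided intervals, or None when the
    slide's algorithm would fall back to treating their sum as one interval."""
    s1, l1, u1 = si1
    s2, l2, u2 = si2
    if l1 == u1 or l2 == u2:
        return None  # one operand is a singleton
    if s1 >= (u2 - l2 + length):
        return si1, si2  # a stride of si1 spans every access made through si2
    if s2 >= (u1 - l1 + length):
        return si2, si1
    return None


# The slide's example, assuming a 1-byte access through [ecx+eax]:
eax = (1, 0, 9)
ecx = (16, -160, -16)
print(pick_base_and_index(eax, ecx, 1))  # ((16, -160, -16), (1, 0, 9))
```

Here ecx is chosen as the base because its stride of 16 is at least as large as the 10 bytes swept by eax, which matches the two-level array reference in the slide's example.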

16 Recovering A-locs via Iteration
Interpreting Indirect Memory-References
– Lookup Algorithm
  NodeDesc :
    name : the name associated with the ASI tree node
    length : the length of the above node
  NodeDescList : an ordered list of NodeDesc, e.g.) [nd1, nd2, …, ndn]
  Three Operations

  Name                    Output
  GetChildren(aloc)       List of child nodes
  GetRange(start, end)    List of nodes with offsets in the given range [start, end]
  GetArrayElements(m)     List of nodes with m elements

17 Recovering A-locs via Iteration
Lookup Algorithm Examples
e.g.) Lookup p[0:39]\5[0:3]
    GetChildren(p) = [,, ]
    GetRange(0, 39) = [,, ]
    GetArrayElements(5) = [, ], [, ]
    GetRange(0, 3) = [, ]

18 An Abstraction for Heap-Allocated Storage
Previous Abstraction
– All of the nodes allocated at a given allocation site s are folded together into a single summary node n_s.
Recency Abstraction
– Allows VSA and ASI to recover information about virtual-function tables
– Uses two memory regions per allocation site s
  MRAB[s] : Most Recently Allocated Block
  NMRAB[s] : Non-Most Recently Allocated Block
– count : how many concrete blocks the memory region represents (MRAB[s].count, NMRAB[s].count)
  SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]}
– size : an over-approximation of the size of the block (MRAB[s].size, NMRAB[s].size)

19 An Abstraction for Heap-Allocated Storage
Operation
– AbsEnv[s] : MRAB[s]/NMRAB[s] →
– AlocEnv = a-loc → ValueSet
– Allocation site s transforms absEnv to absEnv'
    absEnv'(MRAB[s]) =
    absEnv'(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count
    absEnv'(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size
    absEnv'(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv
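The three NMRAB[s] equations can be sketched in Python as follows. Everything named here is an assumption made for illustration: the `Region` class, the modelling of sizes and value-sets as frozensets, and the rounding of counts to the tightest SmallRange element. The new value of MRAB[s] is not legible in the transcript, so the sketch takes it as a caller-supplied argument (`fresh_mrab`) instead of guessing it.

```python
import math
from dataclasses import dataclass

INF = math.inf
SMALL_RANGE = [(0, 0), (0, 1), (1, 1), (0, INF), (1, INF), (2, INF)]


def to_small_range(lo, hi):
    """Tightest SmallRange element that contains the interval [lo, hi]."""
    candidates = [(a, b) for a, b in SMALL_RANGE if a <= lo and hi <= b]
    return max(candidates, key=lambda r: (r[0], -r[1]))


@dataclass
class Region:
    count: tuple          # (lo, hi), an element of SMALL_RANGE
    size: frozenset       # possible block sizes (simplified)
    aloc_env: dict        # a-loc name -> frozenset of abstract values


def join_env(a, b):
    """Pointwise union of two a-loc environments."""
    return {k: a.get(k, frozenset()) | b.get(k, frozenset()) for k in a.keys() | b.keys()}


def allocate(mrab, nmrab, fresh_mrab):
    """Effect of executing allocation site s: the old MRAB is folded into NMRAB."""
    new_nmrab = Region(
        count=to_small_range(nmrab.count[0] + mrab.count[0],
                             nmrab.count[1] + mrab.count[1]),
        size=nmrab.size | mrab.size,
        aloc_env=join_env(nmrab.aloc_env, mrab.aloc_env),
    )
    return fresh_mrab, new_nmrab
```

In this model, a MRAB region with count [1, 1] can keep a single definite value for a field such as a virtual-function-table pointer, and that value is only merged with others once the block is folded into NMRAB[s] at the next allocation.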

20 An Abstraction for Heap-Allocated Storage

21 Experiments
Environments

Software
OS         Compiler             Language    Target Files
Windows    Visual Studio 6.0    C++         .obj

22 Experiments Results of Virtual-Function Call Resolution

23 Experiments
Results of A-loc Identification
– Comparing the results of the algorithm with debugging information
– The structure of 87% of the local variables is correct

24 Experiments
Results of A-loc Identification
– The structure of 72% of the objects in the heap is correct

25 Q & A

