Published by Brett Houston. Modified over 9 years ago.
1
Recovery of Variables and Heap Structure in x86 Executables
Gogul Balakrishnan, Thomas Reps
University of Wisconsin
2
Overview
–Introduction
–Challenges
–Background
–Recovering A-locs via Iteration
–An Abstraction for Heap-Allocated Storage
–Experiments
3
Introduction
The Need for Analyzing Executables
–What You See Is Not What You eXecute
  e.g. memset(password, '\0', len); free(password);
Many Obstacles in Analyzing Executables
–Data objects are not easily identifiable.
–Symbol table and debugging information are absent.
–The memory addresses of data objects must be determined.
–It is difficult to track the flow of data through memory.
–It is challenging to get useful information about the heap.
4
Challenges (1/3)
Recovering Variable-like Entities
–The layout of memory is known at compile time or assembly time (IDA Pro's approach).
–To recover y, the set of values that eax holds at instruction 5 needs to be determined.

  void main() {
    int x, y;
    x = 1;
    y = 2;
    return;
  }

  proc main
  1 mov ebp, esp
  2 sub esp, 8
  3 mov [ebp-8], 1
  4 mov eax, ebp
  5 mov [eax-4], 2
  6 add esp, 8
  7 retn
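A minimal sketch of why the store at instruction 5 cannot be found by scanning operands alone: `[eax-4]` names y only once the value of eax is known. The function name and the convention of measuring every value as an offset from ebp are assumptions made for illustration. The sketch hand-codes the three relevant instructions and is not a model of a real analysis.

```python
def recover_store_offsets():
    """Track registers as offsets from ebp and record where each store lands."""
    regs = {"ebp": 0}            # ebp is the reference point, so its offset is 0
    stores = {}                  # frame offset -> constant written there

    stores[regs["ebp"] - 8] = 1  # 3: mov [ebp-8], 1   (direct operand)
    regs["eax"] = regs["ebp"]    # 4: mov eax, ebp
    stores[regs["eax"] - 4] = 2  # 5: mov [eax-4], 2   (indirect operand)
    return stores


print(recover_store_offsets())   # {-8: 1, -4: 2}
```

Offset -8 corresponds to x and offset -4 to y. Only the first store is visible from a direct operand.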
5
Challenges (2/3)
Granularity of Recovered Variable-like Entities
–Affects the complexity and accuracy of subsequent analyses
The Structure of Heap-Allocated Objects
–Only the size of the allocated block is known.
–Using an abstraction-refinement algorithm
6
Challenges (3/3)
Resolving Virtual-Function Calls
–With the one-variable-per-malloc-site abstraction, updates are weak updates, so a definite link between the object and the virtual-function table is never established.
7
Background (1/6)
Abstract Locations (A-locs)
–Memory Region
  A set of disjoint memory areas
  Represents a group of locations that have similar runtime properties
–Abstract Locations
  Locations between two addresses/offsets in a memory-region
  Addresses and offsets are statically determined
8
Background (2/6)
Abstract Locations (cont’d)

  proc main
  0 mov ebp,esp
  1 sub esp,40
  2 mov ecx,0
  3 lea eax,[ebp-40]
  L1: mov [eax], 1
  5 mov [eax+4],2
  6 add eax, 8
  7 inc ecx
  8 cmp ecx, 5
  9 jl L1
  10 mov eax,[ebp-36]
  11 add esp,40
  12 retn
9
Background (3/6)
Value-Set Analysis (VSA)
–Combined numeric analysis & pointer analysis
–Over-approximation of the values that each a-loc holds at each program point
–Value-Set
  The set of addresses and numeric values
  N-tuple of strided intervals of the form s[l, u], where N is the number of memory-regions
  (Global Region, Procedure Region, …)
  (1[0, 9], ⊥) versus (⊥, 8[-40, -8])
  e.g. 8[-40, -8] = {-40, -32, -24, -16, -8}
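The strided-interval notation s[l, u] can be sketched as a small class that enumerates and tests the values it stands for. The class name, the method names, and the convention that stride 0 denotes a singleton are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedInterval:
    stride: int
    lower: int
    upper: int

    def concretize(self):
        """List every integer the interval represents."""
        if self.stride == 0:          # assumed convention: stride 0 is a singleton
            return [self.lower]
        return list(range(self.lower, self.upper + 1, self.stride))

    def __contains__(self, value):
        if not (self.lower <= value <= self.upper):
            return False
        if self.stride == 0:
            return value == self.lower
        return (value - self.lower) % self.stride == 0


offsets = StridedInterval(8, -40, -8)
print(offsets.concretize())           # [-40, -32, -24, -16, -8]
print(-24 in offsets, -20 in offsets)
```

A value-set would then be a tuple holding one such interval (or bottom) per memory-region.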
10
Background (4/6)
Value-Set Analysis (cont’d)
–The value-set of eax at L1 is (⊥, 8[-40, -8])
  eax holds the offsets {-40, -32, -24, -16, -8}
  These are the starting addresses of field x of the elements of p

  proc main
  0 mov ebp,esp
  1 sub esp,40
  2 mov ecx,0
  3 lea eax,[ebp-40]
  L1: mov [eax], 1
  5 mov [eax+4],2
  6 add eax, 8
  7 inc ecx
  8 cmp ecx, 5
  9 jl L1
  10 mov eax,[ebp-36]
  11 add esp,40
  12 retn

  typedef struct { int x, y; } Point;
  int main() {
    int i;
    Point p[5];
    for (i = 0; i < 5; ++i) {
      p[i].x = 1;
      p[i].y = 2;
    }
    return p[0].y;
  }
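As a sanity check on the value-set quoted for eax, the loop can be run concretely while recording eax each time control reaches L1. This is a hand translation of the listing, with eax kept as an offset from ebp. The function names are assumptions, and the sketch does not perform value-set analysis itself.

```python
def eax_offsets_at_l1():
    """Run the loop concretely; eax is kept as an offset from ebp."""
    ecx = 0                    # 2: mov ecx, 0
    eax = -40                  # 3: lea eax, [ebp-40]
    seen = []
    while True:
        seen.append(eax)       # L1: value of eax on entry to the loop body
        eax += 8               # 6: add eax, 8
        ecx += 1               # 7: inc ecx
        if not ecx < 5:        # 8-9: cmp ecx, 5 / jl L1
            break
    return seen


def in_strided_interval(value, stride, lower, upper):
    """Membership test for stride[lower, upper], assuming stride > 0."""
    return lower <= value <= upper and (value - lower) % stride == 0


offsets = eax_offsets_at_l1()
print(offsets)
print(all(in_strided_interval(o, 8, -40, -8) for o in offsets))
```

Every observed offset lies in 8[-40, -8], so for this loop the over-approximation happens to be exact.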
11
Background (5/6)
Aggregate Structure Identification (ASI)
–Can distinguish between accesses to different parts of the same aggregate
–Aggregate is broken up into smaller parts (atoms)
–Data-Access Constraint Language (DAC)
  Specifies the data-access patterns in the program
  DataRef : reference to a set of sequences of bytes
  UnifyConstraint : flow of data in the program
12
Background (6/6)
Aggregate Structure Identification (cont’d)
–Data-Access Constraint Language (DAC)
  DataRef[l : u] refers to bytes l through u in DataRef
  DataRef\n : n is the number of elements
  e.g. P[0:11]\3 = P[0:3], P[4:7], or P[8:11]
–ASI DAG
  p[0:39]\5[0:3] ≈ const_1[0:3];
  p[0:39]\5[4:7] ≈ const_2[0:3];
  return_main[0:3] ≈ p[4:7]
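The array form of a data reference can be unpacked into concrete byte ranges. The sketch below splits a byte range into n equal elements and optionally selects a sub-range inside each element. The function name and its parameters are assumptions made for illustration.

```python
def expand_array_ref(lo, hi, n, sub_lo=None, sub_hi=None):
    """Split bytes lo..hi into n equal elements; optionally pick a sub-range of each."""
    total = hi - lo + 1
    if total % n != 0:
        raise ValueError("range does not divide evenly into n elements")
    elem = total // n
    ranges = []
    for i in range(n):
        start = lo + i * elem
        if sub_lo is None:
            ranges.append((start, start + elem - 1))
        else:
            ranges.append((start + sub_lo, start + sub_hi))
    return ranges


print(expand_array_ref(0, 11, 3))         # [(0, 3), (4, 7), (8, 11)]
print(expand_array_ref(0, 39, 5, 0, 3))   # field x of each of the 5 elements of p
print(expand_array_ref(0, 39, 5, 4, 7))   # field y of each of the 5 elements of p
```

The first call reproduces the P[0:11] example with three elements. The other two correspond to the left-hand sides of the constraints for p[i].x and p[i].y.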
13
Recovering A-locs via Iteration
Problems of VSA
–Can only represent a contiguous sequence of memory locations
–Cannot detect internal substructure
Basic Idea
1. VSA is used to obtain memory-access patterns in the executable;
2. ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns obtained from the information recovered by VSA.
Figure labels: IDAPro, ASI, VSA, Final Value-Sets
14
Recovering A-locs via Iteration
Generating Data-Access Constraints from Value
Input : (r, s[l, u], length)
Output : (ASI Ref, Boolean)

  if s[l,u] is a singleton then
    return
  else
    size ← max(s, length)
    n ← (u - l + size - 1) / size
    ref ← “r[l : u+size-1]\n[0 : size-1]”
    return
  end if

Annotations: a singleton is e.g. s[l, l]; r[l : u+size-1] is the actual byte range; n is the number of array elements

e.g. (AR_main, 8[-40, -8], length) => {AR_main[(-40):(-1)]\5[0:7]}
  AR_main[-40:-33][0:7]
  AR_main[-32:-25][0:7]
  AR_main[-24:-17][0:7]
  AR_main[-16:-9][0:7]
  AR_main[-8:-1][0:7]
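The conversion above can be sketched as a function that builds the reference string. Several points are assumptions, because the transcript does not show them: the value returned for a singleton (taken here to be one access of `length` bytes), the backslash used to write the array count, and the rounding of n. The sketch computes n as the number of bytes in r[l : u+size-1] divided by size, rounded up, which reproduces the 5 elements of the slide's example. The Boolean output is not modeled, since the slide does not show how it is computed. The name `si_to_asi` and the example length of 4 are also hypothetical.

```python
def si_to_asi(region, stride, lower, upper, length):
    """Build an ASI reference string for the strided interval stride[lower, upper]."""
    if lower == upper:
        # Assumption: a singleton names a single access of `length` bytes.
        return f"{region}[{lower}:{lower + length - 1}]"
    size = max(stride, length)                 # element size
    byte_count = upper - lower + size          # bytes in region[lower : upper+size-1]
    n = -(-byte_count // size)                 # assumption: element count, rounded up
    return f"{region}[{lower}:{upper + size - 1}]\\{n}[0:{size - 1}]"


print(si_to_asi("AR_main", 8, -40, -8, 4))     # AR_main[-40:-1]\5[0:7]
print(si_to_asi("AR_main", 0, -8, -8, 4))      # AR_main[-8:-5]
```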
15
Recovering A-locs via Iteration
Generating Data-Access Constraints from Value

  if s1[l1, u1] or s2[l2, u2] is a singleton then
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
  end if
  if s1 ≥ (u2 - l2 + length) then
    baseSI ← s1[l1, u1]
    indexSI ← s2[l2, u2]
  else if s2 ≥ (u1 - l1 + length) then
    baseSI ← s2[l2, u2]
    indexSI ← s1[l1, u1]
  else
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
  end if
  (baseRef, exactRef) ← SI2ASI(r, baseSI, stride(baseSI))
  if exactRef is false then
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
  else
    return concat(baseRef, SI2ASI(‘’, indexSI, length))
  end if

Annotations: determine base register; row-major order; base address; index address

e.g. eax : (1[0, 9], ⊥), ecx : (⊥, 16[-160, -16])
In case of [ecx+eax] => AR[-160:-1]\10[0:15][0:9]\10[0:0]
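A sketch of the two-interval case, under stated assumptions. The helper `si_to_asi` repeats the single-interval conversion, with an assumed singleton result and an assumed rounding of the element count. `si_add` stands for ⊕ and takes the gcd of the strides and the sums of the bounds, ignoring machine-integer overflow. The exactRef check is not modeled, because the transcript does not show how the Boolean is computed. All function names and the access length of 1 in the example are hypothetical.

```python
from math import gcd


def si_to_asi(region, stride, lower, upper, length):
    """Single-interval helper, following the slide's size and ref formulas."""
    if lower == upper:
        # Assumption: a singleton names a single access of `length` bytes.
        return f"{region}[{lower}:{lower + length - 1}]"
    size = max(stride, length)
    n = -(-(upper - lower + size) // size)     # assumption: bytes covered / size, rounded up
    return f"{region}[{lower}:{upper + size - 1}]\\{n}[0:{size - 1}]"


def si_add(si1, si2):
    """Sum of two strided intervals, ignoring machine-integer overflow."""
    (s1, l1, u1), (s2, l2, u2) = si1, si2
    return (gcd(s1, s2), l1 + l2, u1 + u2)


def two_si_to_asi(region, si1, si2, length):
    """Pick a base and an index interval, or fall back to the summed interval."""
    (s1, l1, u1), (s2, l2, u2) = si1, si2
    if l1 == u1 or l2 == u2:                   # either operand is a singleton
        return si_to_asi(region, *si_add(si1, si2), length)
    if s1 >= u2 - l2 + length:                 # si2 fits inside one stride of si1
        base, index = si1, si2
    elif s2 >= u1 - l1 + length:               # si1 fits inside one stride of si2
        base, index = si2, si1
    else:
        return si_to_asi(region, *si_add(si1, si2), length)
    base_ref = si_to_asi(region, *base, base[0])   # element length = stride of base
    return base_ref + si_to_asi("", *index, length)


eax = (1, 0, 9)          # 1[0, 9]
ecx = (16, -160, -16)    # 16[-160, -16]
print(two_si_to_asi("AR", eax, ecx, 1))   # AR[-160:-1]\10[0:15][0:9]\10[0:0]
```

In the example, ecx becomes the base because its stride of 16 is at least the 10 bytes spanned by eax, which matches a row-major array of 10 rows of 16 bytes indexed within each row.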
16
Recovering A-locs via Iteration
Interpreting Indirect Memory-References
–Lookup Algorithm
  NodeDesc : (name, length)
    name : the name associated with the ASI tree node
    length : the length of that node
  NodeDescList : an ordered list of NodeDesc, e.g. [nd1, nd2, …, ndn]
  Three Operations (Name : Output)
    GetChildren(aloc) : list of child nodes
    GetRange(start, end) : list of nodes with offsets in the given range [start, end]
    GetArrayElements(m) : list of nodes with m elements
17
Recovering A-locs via Iteration
Lookup Algorithm Examples
e.g. Lookup p[0:39]\5[0:3]
  GetChildren(p) = [,, ]
  GetRange(0, 39) = [,, ]
  GetArrayElements(5) = [, ], [, ]
  GetRange(0, 3) = [, ]
18
An Abstraction for Heap-Allocated Storage
Previous Abstraction
–All of the nodes allocated at a given allocation site s are folded together into a single summary node n_s.
Recency Abstraction
–Allows VSA & ASI to recover information about virtual-function tables
–Uses two memory-regions per allocation site s
  MRAB[s] : Most Recently Allocated Block
  NMRAB[s] : Non-Most Recently Allocated Block
  count : how many concrete blocks the memory-region represents (MRAB[s].count, NMRAB[s].count)
    SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]}
  size : over-approximation of the size of the block (MRAB[s].size, NMRAB[s].size)
19
An Abstraction for Heap-Allocated Storage
Operation
–AbsEnv[s] : MRAB[s]/NMRAB[s] → (count, size, alocEnv)
–AlocEnv = a-loc → ValueSet
–Allocation site s transforms absEnv to absEnv’
  absEnv’(MRAB[s]) =
  absEnv’(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count
  absEnv’(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size
  absEnv’(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv
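The allocation-site transformer can be sketched as follows. Several parts are assumptions: sizes and value-sets are modeled as plain Python sets rather than strided intervals, count addition saturates so that the result stays inside SmallRange, and the fresh MRAB state (count [1, 1], the requested size, an empty a-loc environment) is a guess, because the transcript does not show the right-hand side of absEnv’(MRAB[s]). The names `Region`, `add_small_range`, `join_env`, and `allocate` are hypothetical.

```python
import math
from dataclasses import dataclass, field

INF = math.inf


def add_small_range(a, b):
    """Add two count ranges, saturating so the result stays in SmallRange."""
    lo, hi = a[0] + b[0], a[1] + b[1]
    if hi > 1:                       # upper bounds above 1 are represented as infinity
        lo, hi = min(lo, 2), INF
    return (lo, hi)


@dataclass
class Region:
    count: tuple = (0, 0)
    size: frozenset = frozenset()
    aloc_env: dict = field(default_factory=dict)


def join_env(e1, e2):
    """Pointwise union of two a-loc environments (a-loc -> set of values)."""
    return {k: e1.get(k, frozenset()) | e2.get(k, frozenset())
            for k in e1.keys() | e2.keys()}


def allocate(mrab, nmrab, requested_size):
    """Fold the old MRAB into NMRAB, then start a fresh MRAB."""
    new_nmrab = Region(
        count=add_small_range(nmrab.count, mrab.count),
        size=nmrab.size | mrab.size,
        aloc_env=join_env(nmrab.aloc_env, mrab.aloc_env),
    )
    # Assumption: the new block is counted exactly once and has no known contents.
    new_mrab = Region(count=(1, 1), size=frozenset({requested_size}), aloc_env={})
    return new_mrab, new_nmrab


mrab, nmrab = Region(), Region()
for _ in range(3):                   # three allocations at the same site
    mrab, nmrab = allocate(mrab, nmrab, 16)
print(mrab.count, nmrab.count)       # (1, 1) (2, inf)
```

Because MRAB[s] always stands for exactly one concrete block under this reading, writes through a pointer to it can be treated as strong updates, which is what lets a link to the virtual-function table survive.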
20
An Abstraction for Heap-Allocated Storage
21
Experiments
Environments
Software
  OS : Windows
  Compiler : Visual Studio 6.0
  Language : C++
  Target Files : .obj
22
Experiments Results of Virtual-Function Call Resolution
23
Experiments
Results of A-loc Identification
–Comparing the results of the algorithm with debugging information
The structure of 87% of the local variables is correct
24
Experiments
Results of A-loc Identification
The structure of 72% of the objects in the heap is correct
25
Q & A