Published by Dayna Morgan Crawford. Modified over 9 years ago.
1
Deriving Input Syntactic Structure From Execution
Zhiqiang Lin, Xiangyu Zhang
Purdue University
November 11th, 2008
The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)
2
Motivation -- Most software takes structural input
3
Applications -- Software Testing/Debugging
- Using an input grammar to generate test cases
  - K. Hanford. Automatic Generation of Test Cases. IBM Systems Journal, 9(4), 1970.
  - P. Purdom. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3), 1972.
  - Grammar-based whitebox fuzzing [PLDI’08]
- Delta debugging
  - Reducing large failure-inducing inputs [TSE’02]
  - Hierarchical Delta Debugging (HDD) [ICSE’06]
- Execution fast forwarding
  - Reducing the event log for failure replay [FSE’06]
4
Applications -- Computer Security
- Malware, attack instance: signature generation
  - Exploit (input) → signature: payload length, keywords, field structure…
- Penetration testing: software vulnerability
  - Play with input (fuzz)
  - Packet Vaccine [CCS’06]
  - ShieldGen [IEEE S&P’07]
- Malware protocol replayer: malware feature
  - Replay the protocol
- Input format
5
Challenges
- Input structure exists in a machine-unfriendly way
  - Plain text (ASCII stream, e.g., a C file)
  - Binary code (protocol message stream)
- Known specification (RFC)
  - Implementation deviation
- Unknown specification
  - Malware: bot, botnet protocol
  - Legal software: SAMBA protocol (12 years for the open source community)
6
Challenges
- May not have source code access
  - Penetration testing
  - Malware analysis
  - Legal software
- Working on binaries
7
Our Contributions
- Two different approaches for handling two types of parsers
  - Using dynamic control dependency to handle top-down parsers
  - A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack
- Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees
8
Outline
- Motivation
- Technical Description
  - Handling Inputs with a Top-down Parser
  - Handling Inputs with a Bottom-up Parser
- Evaluation
- Discussion
- Related Work
- Conclusion
9
I. Top-down Parser
Parse input in a top-down manner.
Grammar:
  S → H B
  H → h N
  N → 1 | 2
  B → b B | ε
Input: h1bbε
Parse tree: S( H( h N(1) ) B( b B( b B(ε) ) ) )
10
Implementation
Grammar: S → H B;  H → h N;  N → 1 | 2;  B → b B | ε
Lines 1-7 parse H; lines 9-14 parse B.

    void Parser() {
 1      char c = getchar();
 2      if (c == 'h') {
 3          c = getchar();
 4          if (c == '1' || c == '2') {
 5              c = getchar();
 6          } else error();
 7      }
 8
 9      while (c == 'b') {
10          c = getchar();
11          if (c == 'ε') {
12              break;
13          }
14      }
15      error();
16  }
11
Execution Trace
(Parser code as on slide 10.)
Input: h1bbε
Executed statements, with the input character each predicate tests:
  c = getchar()
  if (c == 'h')                   h
  c = getchar()
  if (c == '1' || c == '2')       1
  c = getchar()
  while (c == 'b')                b1
  c = getchar()
  if (c == 'ε')                   b2
  while (c == 'b')                b2
  c = getchar()
  if (c == 'ε')                   ε
  break
Control dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.
12
Execution Trace (continued)
Same code, input, and trace as slide 11, with control dependencies highlighted. For example, c = getchar() and if (c == 'ε') in the loop body are control-dependent on while (c == 'b').
Control dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.
13
Control dependency graph for the execution trace
(Parser code as on slide 10.)
A control dependency graph is a graph in which each node directly controls the execution of its child nodes. Children are indented under their parent; the input character a predicate tests is shown on the right.

START
  c = getchar()
  if (c == 'h')                       h
    c = getchar()
    if (c == '1' || c == '2')         1
      c = getchar()
  while (c == 'b')                    b1
    c = getchar()
    if (c == 'ε')                     b2
      while (c == 'b')                b2
        c = getchar()
        if (c == 'ε')                 ε
          break

Parse tree for comparison: S( H( h N(1) ) B( b B( b B(ε) ) ) )
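The graph above can be reproduced with a hand-instrumented copy of the parser. The following is a minimal sketch, not the authors' binary-level analysis: it reads from a string instead of stdin, uses '$' as a stand-in for ε, and the names `Event`, `emit`, `open_preds`, and `parse_traced` are hypothetical. Each executed statement is recorded together with the predicate instance that controls it.

```c
#include <string.h>

enum { MAX_INPUT = 80, MAX_EVENTS = 3 * MAX_INPUT + 8 };

typedef struct {
    const char *stmt; /* text of the executed statement */
    int parent;       /* index of the controlling predicate event, -1 = START */
    int offset;       /* offset of the input byte the statement tests, -1 = none */
} Event;

static Event trace[MAX_EVENTS];
static int n_events;
static int open_preds[MAX_EVENTS]; /* predicates whose region is still open */
static int n_open;

/* Record one executed statement under the innermost open predicate. */
static int emit(const char *stmt, int offset) {
    trace[n_events].stmt = stmt;
    trace[n_events].parent = n_open > 0 ? open_preds[n_open - 1] : -1;
    trace[n_events].offset = offset;
    return n_events++;
}

/* Returns 0 if `in` is h, then 1 or 2, then zero or more b, then '$'. */
int parse_traced(const char *in) {
    int pos = 0, at, p;
    char c;
    n_events = 0;
    n_open = 0;
    /* At most 3 events per input byte plus 6, so the trace fits. */
    if (strlen(in) > MAX_INPUT) return -1;

    at = pos; c = in[pos++];
    emit("c = getchar()", -1);
    p = emit("if (c == 'h')", at);
    if (c != 'h') return -1;
    open_preds[n_open++] = p;
    at = pos; c = in[pos++];
    emit("c = getchar()", -1);
    p = emit("if (c == '1' || c == '2')", at);
    if (c != '1' && c != '2') return -1;
    open_preds[n_open++] = p;
    at = pos; c = in[pos++];
    emit("c = getchar()", -1);
    n_open -= 2; /* both if-regions end here */

    for (;;) {
        p = emit("while (c == 'b')", at);
        if (c != 'b') break;
        open_preds[n_open++] = p;
        at = pos; c = in[pos++];
        emit("c = getchar()", -1);
        p = emit("if (c == '$')", at);
        /* The next loop test runs only if this predicate does not break,
           so its region stays open across iterations. */
        open_preds[n_open++] = p;
        if (c == '$') { emit("break", -1); break; }
    }
    return c == '$' ? 0 : -1;
}
```

Calling `parse_traced("h1bb$")` fills `trace` with twelve events whose `parent` links follow the indentation of the graph above. The second `while` instance is recorded as a child of the first `if (c == '$')` instance.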
14
Eliminate non-data-use nodes
Nodes that do not use input data, here c = getchar() and break, are removed from the control dependency graph. The predicates that test input characters (h, 1, b1, b2, b2, ε) remain.
15
Add data-use leaf nodes
Each remaining predicate gets a leaf node for the input character it uses (shown on the right):

START
  if (c == 'h')                       h
    if (c == '1' || c == '2')         1
  while (c == 'b')                    b1
    if (c == 'ε')                     b2
      while (c == 'b')                b2
        if (c == 'ε')                 ε
16
Add data-use leaf nodes (continued)
Read left to right, the distinct leaves are h, 1, b1, b2, ε. The leaf b2 is shared by two predicates: if (c == 'ε') from the first loop iteration and while (c == 'b') from the second.
17
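Eliminate redundant nodes
Labels give the source line and, as a subscript, the loop iteration (9₁ is the first execution of line 9).

START
  2    if (c == 'h')                    h
    4    if (c == '1' || c == '2')      1
  9₁   while (c == 'b')                 b1
    11₁  if (c == 'ε')                  b2
      9₂   while (c == 'b')             b2    (identical node)
        11₂  if (c == 'ε')              ε

Nodes 11₁ and 9₂ use the same input character b2, so they are identical nodes and are merged. The result mirrors the parse tree S( H( h N(1) ) B( b B( b B(ε) ) ) ).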
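The two pruning steps (dropping nodes that use no input, then merging a node into its parent when both use the same input byte) can be sketched in one pass over the control dependency graph. This is an illustrative sketch under assumptions not stated on the slides: the graph is given as parallel arrays in trace order, so parents precede children. Input bytes are identified by offset, and a kept node carries its offset as a label instead of a separate leaf node. The name `prune_tree` is hypothetical.

```c
enum { MAX_NODES = 64 };

/*
 * parent[i]: controlling node of node i (-1 = START); parents precede children.
 * offset[i]: input offset used by node i, or -1 if it uses no input.
 * Writes the pruned tree to out_parent/out_offset and returns its size,
 * or -1 if n exceeds MAX_NODES.
 */
int prune_tree(const int *parent, const int *offset, int n,
               int *out_parent, int *out_offset) {
    int kept_as[MAX_NODES]; /* original node -> pruned node that represents it */
    int n_kept = 0;
    if (n > MAX_NODES) return -1;
    for (int i = 0; i < n; i++) {
        /* Nearest surviving ancestor of node i, -1 if that is START. */
        int anc = parent[i] < 0 ? -1 : kept_as[parent[i]];
        if (offset[i] < 0) {
            /* Step 1: no data use; children re-attach to the ancestor. */
            kept_as[i] = anc;
        } else if (anc >= 0 && out_offset[anc] == offset[i]) {
            /* Step 2: same input byte as the ancestor; merge into it. */
            kept_as[i] = anc;
        } else {
            out_parent[n_kept] = anc;
            out_offset[n_kept] = offset[i];
            kept_as[i] = n_kept++;
        }
    }
    return n_kept;
}
```

For the twelve-node graph of the h1bbε run, this leaves five nodes, one per input character, with h above 1 and b1 above b2 above ε.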
18
II. Bottom-up Parser
Parse input in a bottom-up manner
- Programming languages
- lex/yacc
Grammar:
  S → A B
  A → a a
  B → b
Input: aab
Parse tree: S( A( a a ) B( b ) )
19
A General Bottom-Up Parsing Algorithm

    while (…) {
        if (stack should not be reduced) {
            stack.push(c);
            …
        } else {    // reduce by A → β
            stack.pop(|β|);
            stack.push(A);
        }
    }

Grammar: S → A B;  A → a a;  B → b
Input: aab
Trace:
  while (…); if (stack should not be reduced); stack.push(a);
  while (…); if (stack should not be reduced); stack.push(a);
  while (…); if (stack should not be reduced); stack.pop(aa); stack.push(A); …
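A runnable version of this loop for the example grammar might look as follows. It is a sketch under simplifying assumptions: it reduces greedily whenever the top of the stack matches a right-hand side, whereas yacc-generated parsers consult LR tables and lookahead. The names `Rule`, `log_op`, and `shift_reduce` are hypothetical. The function logs every stack operation, which is the information the analysis needs.

```c
#include <stdio.h>
#include <string.h>

typedef struct { const char *rhs; char lhs; } Rule;

/* Grammar from the slide: S -> A B, A -> a a, B -> b. */
static const Rule RULES[] = { {"aa", 'A'}, {"b", 'B'}, {"AB", 'S'} };
enum { N_RULES = 3, MAX_STACK = 64 };

/* Append "op(arg)" to the NUL-terminated log; returns 0 if it did not fit. */
static int log_op(char *log, size_t cap, const char *op, const char *arg) {
    size_t len = strlen(log);
    int n = snprintf(log + len, cap - len, "%s%s(%s)", len ? " " : "", op, arg);
    return n >= 0 && (size_t)n < cap - len;
}

/* Returns 1 if input reduces to S, 0 otherwise; log receives the stack ops. */
int shift_reduce(const char *input, char *log, size_t cap) {
    char stack[MAX_STACK];
    size_t top = 0, pos = 0;
    if (cap == 0) return 0;
    log[0] = '\0';
    for (;;) {
        const Rule *r = NULL;
        /* Does the top of the stack match some right-hand side? */
        for (int i = 0; i < N_RULES && r == NULL; i++) {
            size_t len = strlen(RULES[i].rhs);
            if (top >= len && memcmp(stack + top - len, RULES[i].rhs, len) == 0)
                r = &RULES[i];
        }
        if (r == NULL) {                       /* stack should not be reduced */
            char sym[2] = { input[pos], '\0' };
            if (sym[0] == '\0' || top == MAX_STACK) break;
            stack[top++] = sym[0];             /* shift the next input symbol */
            pos++;
            if (!log_op(log, cap, "Push", sym)) return 0;
        } else {                               /* reduce by lhs -> rhs */
            char sym[2] = { r->lhs, '\0' };
            top -= strlen(r->rhs);
            stack[top++] = r->lhs;
            if (!log_op(log, cap, "Pop", r->rhs)) return 0;
            if (!log_op(log, cap, "Push", sym)) return 0;
        }
    }
    return input[pos] == '\0' && top == 1 && stack[0] == 'S';
}
```

With input "aab" the log reads Push(a) Push(a) Pop(aa) Push(A) Push(b) Pop(b) Push(B) Pop(AB) Push(S).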
21
Tree Construction
Grammar: S → A B;  A → a a;  B → b
Input: aab
Stack operation trace: Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)
Each Push that follows a Pop creates a parent node over the popped symbols, which yields the tree S( A( a a ) B( b ) ).
Prerequisite: identify the parsing stack.
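Rebuilding the tree from such a stack operation trace can be sketched as below. The sketch assumes the trace is already available as a list of push and pop records; finding the parsing stack in a binary is the hard part and is not shown. It also assumes every reduction appears as one Pop followed by one Push. The types `StackOp` and `Node` and the functions `build_tree` and `format_tree` are hypothetical names.

```c
#include <string.h>

enum { MAX_NODES = 64, MAX_KIDS = 8 };

typedef enum { OP_PUSH, OP_POP } OpKind;
/* sym is used by OP_PUSH, count (number of symbols popped) by OP_POP. */
typedef struct { OpKind kind; char sym; int count; } StackOp;
typedef struct { char sym; int kids[MAX_KIDS]; int n_kids; } Node;

/* Fills nodes (capacity MAX_NODES); returns the root index or -1 on error. */
int build_tree(const StackOp *ops, int n_ops, Node *nodes) {
    int stack[MAX_NODES], top = 0, n_nodes = 0;
    int popped[MAX_KIDS], n_popped = 0;
    for (int i = 0; i < n_ops; i++) {
        if (ops[i].kind == OP_POP) {
            int k = ops[i].count;
            if (n_popped != 0 || k < 1 || k > top || k > MAX_KIDS) return -1;
            top -= k;
            /* Remember the popped nodes in left-to-right order. */
            memcpy(popped, stack + top, (size_t)k * sizeof popped[0]);
            n_popped = k;
        } else {
            Node *nd;
            if (n_nodes == MAX_NODES) return -1;
            nd = &nodes[n_nodes];
            nd->sym = ops[i].sym;
            /* A push right after a pop becomes the parent of the popped nodes;
               any other push is a leaf. */
            nd->n_kids = n_popped;
            memcpy(nd->kids, popped, (size_t)n_popped * sizeof popped[0]);
            n_popped = 0;
            stack[top++] = n_nodes++;
        }
    }
    return (top == 1 && n_popped == 0) ? stack[0] : -1;
}

/* Append one character if there is room, keeping out NUL-terminated. */
static void put(char *out, size_t cap, size_t *len, char ch) {
    if (*len + 1 < cap) { out[(*len)++] = ch; out[*len] = '\0'; }
}

/* Writes node i in bracket notation, e.g. S(A(a a) B(b)). */
void format_tree(const Node *nodes, int i, char *out, size_t cap, size_t *len) {
    put(out, cap, len, nodes[i].sym);
    if (nodes[i].n_kids == 0) return;
    put(out, cap, len, '(');
    for (int k = 0; k < nodes[i].n_kids; k++) {
        if (k > 0) put(out, cap, len, ' ');
        format_tree(nodes, nodes[i].kids[k], out, cap, len);
    }
    put(out, cap, len, ')');
}
```

Feeding the nine operations of the aab trace to `build_tree` and formatting the root gives S(A(a a) B(b)).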
22
Evaluation – Top down grammar Bad?
23
Evaluation – Top down grammar
24
Evaluation – Bottom up grammar
25
Performance Overhead 5X-45X 6X-8X
26
Discussion
- Grammar categories: top-down, bottom-up, any others?
- It is possible to evade the control dependency structure in a top-down parser implementation.
- Individual input → multiple inputs → final grammar
- Syntactic structure → semantics
27
Related Work
- Network protocol format reverse engineering
  - Instruction semantics (comparison, loop keyword, delimiter)
    - Polyglot [CCS’07]
    - Automatic Network Protocol Analysis [NDSS’08]
    - Tupni [CCS’08]
  - Execution context (call stack, PC)
    - AutoFormat [NDSS’08]
- Limitations
  - Part of the problem space: only top-down parsers.
  - Part of the problem’s essence: comparison (predicate), call stack → control dependency
28
Conclusion
- Two dynamic analyses that construct input structure from program execution.
- No source code access or symbolic information is required.
- The analyses are highly effective and produce high-quality input syntax trees.
29
Thank you
Contact: {zlin,xyzhang}@cs.purdue.edu
Q & A