General approaches to exploit detection and signature generation: White-box: needs the source code. Gray-box: more accurate, but needs to monitor the program's execution flow. Black-box: detects and analyzes an exploit using only the outputs of the vulnerable program.
Packet Vaccine approach: a black-box approach. Faster, but makes little use of data format information.
ShieldGen approach: a gray-box approach. A general gray-box approach is inherently specific to the attack input used in the data flow analysis. ShieldGen generalizes attack-specific, symbolic predicate-based signatures to cover significantly more attack variants by probing the oracle with data-format-informed inputs.
Packet Vaccine: Black-box Exploit Detection and Signature Generation. Xiaofeng Wang, Zhuowei Li, Jun Xu, Michael K. Reiter, Chongkyung Kil, Jong Youl Choi. Presented by Zhaosheng Zhu
Outline: Introduction to Packet Vaccine; Related work; Design of the packet vaccine mechanism; Implementation and evaluation; Application (good points); Limitations (bad points); Conclusion
Introduction to Packet Vaccine: the principle of a vaccine. Packet vaccine: identify anomalous tokens in packet payloads; randomize the contents of those tokens to obtain a vaccine; generate a signature when the vaccine triggers an exception.
Design of the packet vaccine mechanism
Design: 1. Vaccine Generation. Build a target address set: $T = [b_s - a u_s,\ b_s] \cup [b_h,\ b_h + a u_h] \cup S$. Aggregate the application payloads of the packets in one session into a dataflow and carry out the proper decoding. Replace (randomize) every byte sequence whose value falls in $T$. Construct vaccine packets from the new dataflows.
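The vaccine-generation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 32-bit little-endian addresses, reads the subscripts s and h as stack and heap bounds, treats `a` as a scalar factor, and uses made-up layout constants (`STACK_BASE`, `HEAP_BASE`, `EXTRA`, etc. are hypothetical). It slides a 4-byte window over the decoded dataflow and replaces every word that lands in T with random bytes.

```python
import os
import struct

# Hypothetical layout values for illustration only; a real monitor would
# read them from the protected process.
STACK_BASE = 0xBFFFF000   # b_s
STACK_USED = 0x21000      # u_s
HEAP_BASE = 0x08049000    # b_h
HEAP_USED = 0x20000       # u_h
A = 1                     # scaling factor a (assumed value)
EXTRA = {0x40017000}      # S: extra target addresses (assumed)

def in_target_set(value):
    """Membership test for T = [b_s - a*u_s, b_s] U [b_h, b_h + a*u_h] U S."""
    return (STACK_BASE - A * STACK_USED <= value <= STACK_BASE
            or HEAP_BASE <= value <= HEAP_BASE + A * HEAP_USED
            or value in EXTRA)

def make_vaccine(dataflow, rng=os.urandom):
    """Randomize every 4-byte little-endian word of the dataflow that falls in T.

    Returns the vaccine bytes and the offsets that were randomized."""
    data = bytearray(dataflow)
    offsets = []
    i = 0
    while i + 4 <= len(data):
        (word,) = struct.unpack_from("<I", data, i)
        if in_target_set(word):
            data[i:i + 4] = rng(4)   # replace the address-like string
            offsets.append(i)
            i += 4                   # continue after the replaced word
        else:
            i += 1
    return bytes(data), offsets
```

The returned offsets are kept so that a later diagnosis step can tell which randomized sequence an exception is tied to.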
Example
Design: 2. Exploit Detection and Vulnerability Diagnosis. Correlate the exception with each byte sequence that equals the forensic string (e.g., the faulting address). Validation test: randomize the byte sequences again, generate a new vaccine, check, and repeat.
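A sketch of the correlation and validation steps, under stated assumptions: the forensic string is taken to be the 4-byte faulting address, and `run_target` is a hypothetical hook that injects a dataflow into the monitored process and returns the faulting address (or `None`). A randomized sequence is accepted as the cause only if, after fresh re-randomization, the reported fault keeps tracking the new value.

```python
import os
import struct

def correlate(vaccine, offsets, fault_addr):
    """Return the randomized offsets whose 4 bytes equal the faulting address."""
    needle = struct.pack("<I", fault_addr)
    return [off for off in offsets if vaccine[off:off + 4] == needle]

def validate(vaccine, offset, run_target, rounds=3, rng=os.urandom):
    """Re-randomize one byte sequence several times; the diagnosis holds only
    if the fault address reported by the (hypothetical) run_target hook
    equals the freshly injected value in every round."""
    for _ in range(rounds):
        fresh = rng(4)
        probe = vaccine[:offset] + fresh + vaccine[offset + 4:]
        if run_target(probe) != struct.unpack("<I", fresh)[0]:
            return False
    return True
```

Repeating the test with new random values guards against a coincidental match between the fault address and the payload.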
Design: 3. Signature Generation. Construct packet vaccines, or probes, by randomizing address-like strings. Detect the exploit by observing a memory exception upon packet vaccine injection. Generate signatures by finding the bytes in the attack input that cannot take random values.
Byte-based vaccine injection: can be parallelized in most cases.
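The "bytes that cannot take random values" idea can be illustrated with a per-byte probing loop. This is a simplified sketch, not the paper's prober: `exploit_succeeds` is a hypothetical oracle that injects a probe and reports whether the exploit's exception still occurs, and the number of random trials per byte is an assumed parameter.

```python
import os

def invariant_bytes(attack, exploit_succeeds, trials=4, rng=os.urandom):
    """Return {offset: byte} for bytes that cannot be randomized without
    breaking the exploit; these form a byte-level signature."""
    signature = {}
    for i in range(len(attack)):
        for _ in range(trials):
            r = rng(1)
            if r[0] == attack[i]:
                continue  # the draw did not change the byte; try another
            probe = attack[:i] + r + attack[i + 1:]
            if not exploit_succeeds(probe):
                signature[i] = attack[i]  # randomizing this byte defeats the exploit
                break
    return signature
```

Run sequentially, this costs at least one injection per byte of the attack input, which is why the probing time grows with the input length.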
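Because each single-byte probe is independent of the others, the probes can be dispatched concurrently. The sketch below uses a thread pool only as a stand-in; it assumes the hypothetical `exploit_succeeds` oracle runs each probe against its own copy of the target, which a real deployment would have to arrange (e.g., separate process instances).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def probe_byte(attack, i, exploit_succeeds, rng=os.urandom):
    """Randomize byte i (forcing a different value) and report whether
    the exploit breaks."""
    r = rng(1)[0]
    if r == attack[i]:
        r = (r + 1) % 256  # guarantee the byte actually changes
    probe = attack[:i] + bytes([r]) + attack[i + 1:]
    return i, not exploit_succeeds(probe)

def invariant_bytes_parallel(attack, exploit_succeeds, workers=4):
    """Run the independent per-byte probes concurrently and collect the
    bytes whose randomization defeats the exploit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda i: probe_byte(attack, i, exploit_succeeds),
                           range(len(attack)))
        return {i: attack[i] for i, breaks in results if breaks}
```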
Implementation: the target address set is extracted from /proc files; the process monitor is built on ptrace; kernel mode is needed to read the CR2 register; signature generation consists of a prober and a verifier; vaccines are injected sequentially (a performance penalty).
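As an illustration of extracting address ranges from /proc files, here is a sketch that parses a Linux `/proc/<pid>/maps` dump (format: address range, permissions, offset, device, inode, optional pathname). Which ranges the original prototype actually fed into the target address set is not stated on the slide; the grouping into stack, heap, and file-backed mappings is an assumption.

```python
def target_ranges(maps_text):
    """Collect (start, end) ranges for the stack, heap, and file-backed
    mappings listed in a /proc/<pid>/maps dump."""
    ranges = {"stack": [], "heap": [], "libs": []}
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)  # the pathname may contain spaces
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) == 6 else ""
        if path == "[stack]":
            ranges["stack"].append((start, end))
        elif path == "[heap]":
            ranges["heap"].append((start, end))
        elif path.startswith("/"):
            ranges["libs"].append((start, end))
        # anonymous mappings (no pathname) are ignored in this sketch
    return ranges
```

On a live system the input would come from reading `/proc/<pid>/maps` for the monitored server process.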
Evaluation: Linux exploits. Windows-based exploits: Code Red II; heap-based overflow.
Evaluation: comparison with MEP signatures. MEP signatures contain richer information, but MEP's quality advantage diminishes when multiple exploit instances and application information are available. MEP is slower.
Application: an architecture that protects Internet servers using Packet Vaccine.
Application (good points): Fast: up to an order of magnitude faster than gray-box approaches, and does not use source code. Effective: immune to interference. Low overhead: no need to install anything on the host; lightweight collector.
Limitations: Its main probing scheme randomizes each byte rather than leveraging data format information. It works more reliably for text-based protocols than for binary ones, because it lacks protocol knowledge for binary data formats. The paper only briefly mentions the benefit of leveraging protocol specifications; it is unclear what type of protocol specification language is considered and how the specifications are leveraged. It can only detect control-flow hijacking attacks, so it cannot detect exploits of the WMF vulnerability.
Conclusion: Packet Vaccine is a fast, black-box technique for exploit detection, but it is not good enough in some cases. If the input data format is given, there is a better approach: ShieldGen.
ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing. Weidong Cui, Marcus Peinado, Helen J. Wang, Michael E. Locasto. Presented by Zhaosheng Zhu
Outline: What is ShieldGen; Related work and comparison; System design; Evaluation and performance; Some future work; Conclusion
What is ShieldGen: a system for automatically generating a data patch or a vulnerability signature for an unknown vulnerability. It leverages knowledge of the data format and uses a data patch instead of a traditional software patch.
ShieldGen system overview
Related work: Polygraph: significant false negatives and false positives. Nemean: generalization depends on the attack instance. COVERS: signatures do not contain any protocol context. Packet Vaccine: randomizes each byte rather than leveraging data format information; not efficient enough; can only detect control-flow hijacking attacks.
The Oracle: a Zero-Day Attack Detector. Uses Vigilante's zero-day detector, which is based on dynamic data flow analysis. It implements three vulnerability conditions: arbitrary execution control (AEC), arbitrary code execution (ACE), and arbitrary function arguments (AFA).
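To make the three conditions concrete, here is a toy classifier over a hypothetical execution trace whose taint flags have already been computed by a dynamic data flow tracker. The `Event` type, its fields, and the `CRITICAL_FUNCS` list are all assumptions for illustration; the condition definitions paraphrase Vigilante's (input loaded into the program counter, input executed as code, input reaching arguments of a critical call).

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One step of a hypothetical trace with precomputed taint flags."""
    kind: str                 # "load_pc", "exec", or "call"
    tainted: bool = False     # load_pc/exec: does the value or code come from input?
    func: str = ""            # call: callee name
    tainted_args: tuple = ()  # call: taint flag per argument

CRITICAL_FUNCS = {"system", "execve", "CreateProcessA"}  # assumed list

def vulnerability_condition(trace):
    """Return the first condition (AEC, ACE, or AFA) matched by the trace,
    or None if the trace looks benign."""
    for ev in trace:
        if ev.kind == "load_pc" and ev.tainted:
            return "AEC"  # input data loaded into the program counter
        if ev.kind == "exec" and ev.tainted:
            return "ACE"  # bytes received as input executed as code
        if ev.kind == "call" and ev.func in CRITICAL_FUNCS and any(ev.tainted_args):
            return "AFA"  # input data reaches arguments of a critical call
    return None
```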
Data Format Spec and Data Analyzer: two assumptions about the input data: data formats are known, and no encryption or obfuscation is used. Two types of analyzers: file data; application-level protocol (host-based) network data. High-speed parsing with a preprocessed protocol parser, e.g., binpac and GAPA. ShieldGen uses GAPA as its data analyzer.
System design: design goals: no false positives; minimize the number of false negatives; minimize the number of probes.
Data patch generation
Some methods to reduce the number of probes: recognize iterative elements; obey protocol semantics to reduce illegitimate probes; exploit the high likelihood that the vulnerability predicate depends only on the last message.
Probe generation algorithm, in three steps: (1) buffer-overrun heuristic for character strings; (2) iteration removal; (3) elimination of irrelevant field conditions.
Buffer overrun heuristic: if the offending byte lies in the middle of a byte or Unicode string, ShieldGen diagnoses a buffer overrun and adds the following condition as a refinement: $\mathrm{sizeof}(\mathit{buffer}) > \mathrm{offset}(\mathit{offendingByte}) - \mathrm{offset}(\mathit{bufferStart})$
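A sketch of how this refinement might be turned into a filter condition. The function and parameter names are hypothetical, offsets are byte positions in the attack input, and sizeof(buffer) is read here as the length of that string field in a new input.

```python
def overrun_refinement(buffer_start, buffer_len, offending_offset):
    """If the offending byte lies strictly inside the string field, return a
    predicate over a new input's field length; otherwise return None
    (the heuristic does not apply)."""
    if not (buffer_start < offending_offset < buffer_start + buffer_len):
        return None
    threshold = offending_offset - buffer_start
    # sizeof(buffer) > offset(offendingByte) - offset(bufferStart)
    return lambda field_len: field_len > threshold
```

For example, with a string starting at offset 10 and an offending byte at offset 110, any input whose string field is longer than 100 bytes is flagged, which covers attack variants with different overflow lengths.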
Iteration removal: many popular input formats include arbitrary sequences of largely independent elements (records). Any input that contains a malicious record is an attack. ShieldGen generates probes by removing some of the iterative elements; an element can be removed if the probe still exploits successfully.
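A greedy one-record-at-a-time version of iteration removal, as a sketch. The removal order is an assumption (the slide does not specify it), and `is_attack` is a hypothetical oracle that reassembles the records into a probe, replays it, and reports whether the exploit still succeeds.

```python
def remove_iterations(records, is_attack):
    """Greedily drop records while the remaining input still exploits."""
    kept = list(records)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if candidate and is_attack(candidate):
            kept = candidate  # record i was not needed; recheck the same index
        else:
            i += 1            # record i is needed (or it is the last one left)
    return kept
```

Each successful removal costs one probe, so the number of probes is roughly linear in the number of records.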
Eliminating irrelevant field conditions: construct probes over the remaining data fields to eliminate don't-care fields and to find additional values of the data fields for which the attack succeeds, evaluating one field at a time.
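A simplified sketch of one-field-at-a-time probing. It assumes each field has a finite list of candidate values (for instance, legal values taken from the data format specification) and a hypothetical `is_attack` oracle that replays a probe built from a field dictionary. A field counts as don't-care only if every tried value still exploits; for the other fields it records the values for which the attack succeeds.

```python
def probe_fields(fields, candidates, is_attack):
    """Vary one field at a time while holding the others at their attack values.

    Returns (dont_care, accepted): the names of don't-care fields and, for
    the remaining fields, the set of values for which the attack succeeds."""
    dont_care, accepted = set(), {}
    for name, original in fields.items():
        tried = set(candidates.get(name, ()))
        ok = {original}
        for value in tried:
            probe = {**fields, name: value}  # change only this field
            if is_attack(probe):
                ok.add(value)
        if tried and tried <= ok:
            dont_care.add(name)      # every tried value still exploits
        else:
            accepted[name] = ok      # keep a condition on this field
    return dont_care, accepted
```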
Evaluation: ShieldGen was run on three well-known vulnerabilities: an SQL vulnerability, an RPC vulnerability, and the WMF (Windows Metafile) vulnerability.
Filter quality of ShieldGen: a larger sample of real-world vulnerabilities
Failure cases and analysis: complex conditions; unchecked array indices; other special cases.
Future work: Quality of the data format specification: in this scheme, the quality of the data format specification matters. Complex filter conditions.
Future work: Probing time: a reference VM is preferred. Attacks not delivered by the last message.
Conclusion: ShieldGen leverages data format information to construct new attack instances and generates high-quality vulnerability signatures, with fewer don't-care fields and fewer false negatives.
Thanks!