1 NetShield: Matching a Large Vulnerability Signature Ruleset for High Performance Network Defense
Yan Chen Department of Electrical Engineering and Computer Science Northwestern University Lab for Internet & Security Technology (LIST)

2 Background NIDS/NIPS (Network Intrusion Detection/Prevention System) operation: packets are checked against a signature DB, producing security alerts. Key requirements: accuracy, speed, and attack coverage.

3 IDS/IPS Overview We try to answer:
- What is protocol identification?
- Why do we need protocol parsing?
- Why do we need a MINI-parser, and how "mini" can it be?
- How can we achieve a MINI-parser?

4 State of the Art: Regular Expression (Regex) Based Approaches
Used by Cisco IPS, Juniper IPS, and the open-source Snort.
Example: .*Abc.*\x90+de[^\r\n]{30}
Pros:
- Can efficiently match multiple signatures simultaneously, through a DFA
- Can describe the syntactic context
Cons:
- Limited expressive power; cannot describe the semantic context
- Inaccurate; cannot combat Conficker!
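To illustrate the "match many rules in one scan" idea, here is a minimal sketch that folds several made-up signatures into one alternation. Python's re engine is a backtracking matcher, not a true DFA, so this shows only the interface idea, not the DFA's single-pass performance guarantee; the signature names and patterns are illustrative assumptions.

```python
import re

# Hypothetical simplified signatures (not real Snort rules).
signatures = {
    "sig1": r"Abc.*\x90+de",
    "sig2": r"GET /cmd\.exe",
}

# Combine all rules into one alternation with named groups, so a single
# scan over the payload reports which rule matched.
combined = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in signatures.items())
)

def match_payload(payload: str):
    """Return the name of the first matching signature, or None."""
    m = combined.search(payload)
    return m.lastgroup if m else None

print(match_payload("xxAbc\x90\x90dezz"))  # -> sig1
```

A real multi-pattern engine would compile the alternation into one automaton so the cost per byte is independent of the number of rules; that is the property vulnerability signatures lack.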

5 State of the Art: Vulnerability Signatures [Wang et al. 04]
A vulnerability is a design flaw that lets a bad input drive the program from a good state into a bad state; a vulnerability signature describes the inputs that trigger it.
Blaster Worm (WINRPC) example:
BIND: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && context[0].abstract_syntax.uuid==UUID_RemoteActivation
BIND-ACK: rpc_vers==5 && rpc_vers_minor==1
CALL: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && stub.RemoteActivationBody.actual_length>=40 && matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)
Pros:
- Directly describes the semantic context; very expressive, can express the vulnerability condition exactly
- Accurate
Cons:
- Slow! Existing approaches all use sequential matching
- Requires protocol parsing

6 Motivation of NetShield

7 Motivation
Desired features for a signature-based NIDS/NIPS: accuracy (especially for IPS), speed, and coverage (a large ruleset). Regex approaches cannot capture vulnerability conditions well; vulnerability signatures (Shield [SIGCOMM'04]) can.

             Regular Expression   Vulnerability
Accuracy     Relatively poor      Much better
Speed        Good                 ??
Memory       OK                   ??
Coverage     OK                   ??
The ?? cells are the focus of this work.

8 Vulnerability Signature Studies
Use protocol semantics to express vulnerabilities. A signature is defined on a sequence of PDUs, with one predicate per PDU.
Example: ver==1 && method=="put" && len(buf)>300
Data representations: for all the vulnerability signatures we studied, only numbers and strings are needed.
- Number operators: ==, >, <, >=, <=
- String operators: ==, match_re(.,.), len(.)
Blaster Worm (WINRPC) example: the BIND / BIND-ACK / CALL signature shown on slide 5.
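As a sketch of how such a signature evaluates against a parsed PDU, the following checks the Blaster BIND predicate over a dict of parsed fields. The dict-based PDU layout and the match_bind helper are illustrative assumptions, not NetShield's actual API; the UUID constant is the well-known RemoteActivation interface UUID.

```python
# One vulnerability signature as a conjunction of predicates over
# parsed PDU fields (field names follow the Blaster BIND example).
UUID_RemoteActivation = "4d9f4ab8-7d1c-11cf-861e-0020af6e7c57"

def match_bind(pdu: dict) -> bool:
    """Evaluate the BIND predicate; any missing field fails the match."""
    return (
        pdu.get("rpc_vers") == 5
        and pdu.get("rpc_vers_minor") == 1
        and pdu.get("packed_drep") == b"\x10\x00\x00\x00"
        and pdu.get("uuid") == UUID_RemoteActivation
    )

pdu = {
    "rpc_vers": 5,
    "rpc_vers_minor": 1,
    "packed_drep": b"\x10\x00\x00\x00",
    "uuid": UUID_RemoteActivation,
}
print(match_bind(pdu))  # True
```

Note that only numbers (==, >, <, >=, <=) and strings (==, match_re, len) appear, matching the data-representation observation above.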

9 Research Challenges
- Matching thousands of vulnerability signatures simultaneously: regex rules can be merged into a single DFA, but vulnerability signature rules cannot easily be combined, so we must move from sequential matching to matching multiple signatures simultaneously.
- Need high-speed protocol parsing.

10 Outline
Motivation and NetShield Overview
High Speed Matching for Large Rulesets
High Speed Parsing
Evaluation
Research Contributions

11 NetShield Overview

12 Matching Problem Formulation
Suppose we have n signatures defined on k matching dimensions (matchers). A matcher is a two-tuple (field, operation), or a four-tuple for associative-array elements. Translate the n signatures into an n-by-k table; this translation unlocks the potential of matching multiple signatures simultaneously.
Example, Rule 4: URI.Filename=="fp40reg.dll" && len(Headers["host"])>300

RuleID   Method ==   Filename ==    Header ==; LEN
1        DELETE      *              *
2        POST        Header.php     *
3        *           awstats.pl     *
4        *           fp40reg.dll    name=="host"; len(value)>300
5        *           *              name=="User-Agent"; len(value)>544
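The n-by-k table can be sketched as follows. The row/column encoding (None for "don't care") is an illustrative assumption, and the naive row-by-row evaluation shown here is only the sequential baseline that candidate selection improves on.

```python
# Each rule is a row, each matcher a column, None marks "don't care".
# Rule contents follow the HTTP example table above.
RULES = [
    # (Method==, Filename==, (header name, min length) or None)
    ("DELETE",  None,          None),                # rule 1
    ("POST",    "Header.php",  None),                # rule 2
    (None,      "awstats.pl",  None),                # rule 3
    (None,      "fp40reg.dll", ("host", 300)),       # rule 4
    (None,      None,          ("User-Agent", 544)), # rule 5
]

def matches(rule, method, filename, headers):
    """Naive sequential check of one rule against one PDU."""
    m, f, h = rule
    if m is not None and method != m:
        return False
    if f is not None and filename != f:
        return False
    if h is not None:
        name, min_len = h
        if len(headers.get(name, "")) <= min_len:
            return False
    return True

hits = [i + 1 for i, r in enumerate(RULES)
        if matches(r, "POST", "fp40reg.dll", {"host": "x" * 450})]
print(hits)  # [4]
```

The point of the table form is that each column can be matched once for all n rules at a time, instead of once per rule as above.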

13 Matching Problem Formulation
Challenges for the single-PDU matching problem (SPM):
- Large number of signatures n
- Large number of matchers k
- Large number of "don't cares"
- Matchers cannot be reordered arbitrarily (buffering constraint)
- Field dependency: arrays, associative arrays, mutually exclusive fields
Beyond SPM: the multiple-PDU matching problem (MPM).

14 Matching Algorithms: Candidate Selection Algorithm
- Pre-computation decides the rule order and the matcher order.
- Decomposition: match each matcher separately, then iteratively combine the results efficiently.
Matcher implementations:
- Integer range checking: balanced binary search tree
- String exact matching: trie
- Regular expressions: DFA (XFA)
- String length checking: binary search tree
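As a hedged sketch of the integer-range matcher, the following flattens rule intervals into sorted elementary segments so that a lookup is one binary search, in the spirit of the balanced-search-tree matcher above; the rule IDs and intervals are made up for illustration.

```python
import bisect

# Each rule contributes an inclusive integer interval (made-up values).
rules = {1: (0, 99), 2: (50, 200), 3: (150, 300)}

# Flatten the intervals into sorted elementary segment boundaries; each
# segment start maps to the set of rules covering that segment.
points = sorted({p for lo, hi in rules.values() for p in (lo, hi + 1)})
segments = [
    {rid for rid, (lo, hi) in rules.items() if lo <= start <= hi}
    for start in points
]

def lookup(value: int):
    """One binary search returns every rule whose range contains value."""
    i = bisect.bisect_right(points, value) - 1
    if i < 0:
        return set()
    return segments[i]

print(lookup(75))   # {1, 2}
print(lookup(250))  # {3}
```

One lookup serves all rules at once, which is what lets a matcher feed the candidate-selection merge with a small result set instead of being probed per rule.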

15 Step 1: Pre-Computation
Optimize the matcher order based on the buffering constraint and the field arrival order, and reorder the rules accordingly (grouping rules by which matchers they require and which they don't care about).
Complexity: for k matchers and n signatures, in the worst case a matcher has O(n) candidates, requiring O(k × n) operations in total. However, based on observations 1-3, a matcher usually has only C candidates, where C is a small constant; in that case we get O(k) matching time. The algorithm needs O(k × n) space to hold the bitmaps. Per connection, the worst case needs O(n) space to hold the candidates, but in practice only a constant space determined by C.

16 Step 2: Iterative Matching
PDU = {Method=POST, Filename=fp40reg.dll, Header: name="host", len(value)=450}
Using the rule table from slide 12: a rule in Si survives if it doesn't care about matcher i+1 or appears in the matched set Ai+1; Bi+1 holds the rules newly added at matcher i+1.
S1 = {2} (candidates after matching column 1, method==)
S2 = (S1 ∩ A2) + B2 = {} + {4} = {4}
S3 = (S2 ∩ A3) + B3 = {4} + {} = {4}
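The merge steps above can be sketched as follows; the merge helper and its argument encoding (explicit don't-care, matched, and newly-added sets) are illustrative assumptions filled in by hand from the HTTP example, not NetShield's internal representation.

```python
def merge(S, dont_care, A, B):
    """One candidate-selection merge step:
    keep rules of S that don't care about, or matched, the next matcher,
    then add the rules newly introduced by that matcher."""
    return {r for r in S if r in dont_care or r in A} | B

# PDU = {Method=POST, Filename=fp40reg.dll, Header host, len(value)=450}
S1 = {2}                                            # after matcher 1 (method==)
S2 = merge(S1, dont_care=set(), A=set(),  B={4})    # matcher 2 (filename==)
S3 = merge(S2, dont_care=set(), A={4},    B=set())  # matcher 3 (header len)
print(S2, S3)  # {4} {4}
```

Rule 2 requires the filename matcher but does not match fp40reg.dll, so it drops out at step 2; rule 4 enters via B2 and survives the header-length check.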

17 Complexity Analysis
Measured candidate sets are tiny: three HTTP traces give avg(|Si|) < 0.04; two WINRPC traces give avg(|Si|) < 1.5.
Merging complexity: k-1 merging iterations are needed. Each iteration costs O(n) in the worst case, since Si can have O(n) candidates for worst-case rulesets. For real-world rulesets the number of candidates is a small constant, so each merge is O(1) and the total is O(k), which is the optimal we can get.

18 Refinement and Extension
SPM improvements:
- Allow negative conditions
- Handle array and associative-array cases
- Handle mutually exclusive cases
Extension to Multiple PDU Matching (MPM):
- Allow checkpoints

19 Outline
Motivation
High Speed Matching for Large Rulesets
High Speed Parsing
Evaluation
Research Contributions

20 Observations
A PDU parses into a parse tree whose leaf nodes are numbers or strings (possibly arrays).
Observation 1: we only need to parse the fields related to signatures (mostly leaf nodes).
Observation 2: traditional recursive descent parsers, which need one function call per node, are too expensive.

21 Efficient Parsing with State Machines
Studied eight protocols (HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP, and DNS) and their vulnerability signatures, finding common relationships among leaf nodes.
Pre-construct parsing state machines based on the parse trees and the vulnerability signatures; protocol semantics are context sensitive, so the machines carry parsing variables.
Designed UltraPAC, an automated fast parser generator.
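A toy parsing state machine in this spirit might look like the following: a flat table of (field, length) steps replaces per-node function calls, and only signature-relevant fields are stored. The field layout is a simplified stand-in, not the real WINRPC wire format and not UltraPAC's generated code.

```python
# Flat step table: (field name, byte length). "skip" steps advance the
# cursor without storing anything, since no signature needs those fields.
STEPS = [
    ("rpc_vers",       1),  # 1-byte version
    ("rpc_vers_minor", 1),  # 1-byte minor version
    ("skip",           2),  # packet type + flags, unused by the rules
    ("packed_drep",    4),  # data representation
]

def parse(buf: bytes) -> dict:
    """One loop over the step table; no per-node function calls."""
    fields, pos = {}, 0
    for name, length in STEPS:
        chunk = buf[pos:pos + length]
        pos += length
        if name != "skip":  # store only signature-relevant fields
            fields[name] = chunk
    return fields

pdu = bytes([5, 1, 0x0B, 0x03]) + b"\x10\x00\x00\x00"
print(parse(pdu))
```

The real state machines also branch on parsed values (context-sensitive semantics) and handle variable-length fields, but the flat-table idea is what removes the recursive-descent overhead.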

22 Example for WINRPC
Rectangles are states; parsing variables are R0–R4.
Cost: 0.61 instructions per byte for the BIND PDU.

23 Outline
Motivation
High Speed Matching for Large Rulesets
High Speed Parsing
Evaluation
Research Contributions

24 Evaluation Methodology
- Fully implemented prototype: 12,000 lines of C++ and 3,000 lines of Python; runs on both Linux and Windows.
- Deployed at a university data center; the measured links sustain roughly 20 Mbps with bursts up to 106 Mbps.
- 26 GB+ of traces from Tsinghua University (TH), Northwestern (NU), and DARPA.
- Run on a P4 3.8 GHz single-core PC with 4 GB memory, after TCP reassembly and with the PDUs preloaded in memory.
- Rulesets: 794 HTTP vulnerability signatures covering 973 Snort rules; 45 WINRPC vulnerability signatures covering 3,519 Snort rules.

25 Parsing Results
Trace                     TH DNS  TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Binpac throughput (Gbps)  0.31    1.41       1.11       2.10     14.2     1.69
Our parser (Gbps)         3.43    16.2       12.9       7.46     44.4     6.67
Speedup ratio             11.2    11.5       11.6       3.6      3.1      3.9
Max. memory per connection (bytes): 15 / 14

26 Matching Results
Trace                          TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Sequential throughput (Gbps)   10.68      9.23       0.34     2.37     0.28
CS matching (Gbps)             14.37      10.61      2.63     17.63    1.85
Matching-only speedup ratio    4          1.8        11.3     11.7     8.8
Avg # of candidates            1.16       1.48       0.033    0.038    0.0023
Max. memory per connection (bytes): 27 / 20

27 Scalability and Accuracy Results
Rule scaling: performance decreases gracefully as the ruleset grows.
Accuracy: we created two polymorphic WINRPC exploits that bypass the original Snort rules but are detected accurately by our scheme. On a 10-minute "clean" HTTP trace, Snort reported 42 alerts (manually verified to be false positives); NetShield reported 0 alerts.

28 Research Contribution
Make vulnerability signatures a practical solution for NIDS/NIPS: NetShield combines regex-like speed and memory with the much better accuracy of existing vulnerability-based IDS.
- Multiple signature matching → candidate selection algorithm
- Parsing → parsing state machine
Achieves high speed with much better accuracy: a better Snort alternative!

29 Q & A Thanks!

30 Comparing With Regex
Memory for 973 Snort rules: DFA 5.29 GB (XFA, 863 rules: 1.08 MB); NetShield 2.3 MB.
Per-flow memory: XFA 36 bytes, NetShield 20 bytes.
Throughput: XFA 756 Mbps, NetShield 1.9+ Gbps.
(XFA numbers from [SIGCOMM08][Oakland08].)

31 Measuring Snort Rules
Semi-manually classify the rules: group by CVE-ID, then manually examine each vulnerability.
Results:
- 86.7% of the rules can be improved by protocol-semantic vulnerability signatures.
- Most of the remaining rules (9.9%) are web DHTML- and script-related, which are not suitable for a signature-based approach.
- On average, 4.5 Snort rules reduce to one vulnerability signature. For binary protocols the reduction ratio is much higher than for text-based ones; for netbios.rules the ratio is 67.6.
CVE Identifiers (also called "CVE-IDs", "CVE names", "CVE numbers", and "CVEs") are unique, common identifiers for publicly known information-security vulnerabilities (Common Vulnerabilities and Exposures).

32 Matcher Order
Placing a matcher earlier can reduce Si+1; placing it later can enlarge Si+1.
Merging overhead is proportional to |Si| (membership in Ai+1 costs O(1) with a hash table). With |Si| fixed, putting the matcher later reduces Bi+1.

33 Matcher Order Optimization
A matcher Mj is worth buffering only if estmaxB(Mj) <= MaxB.
For each Mi in AllMatchers:
- Try to clear all buffered Mj with estmaxB(Mj) <= MaxB
- Buffer Mi if estmaxB(Mi) > MaxB
- When len(Buf) > BufLen, remove the Mj with minimum estmaxB(Mj)


35 Backup Slides

36 Experiences
Work in progress: in collaboration with MSR, applying semantic-rich analysis to cloud Web service profiling, to understand why services are slow and how to improve them.
Interdisciplinary research.
Student mentoring: three undergraduates, six junior graduate students.

37 Future Work
Near term:
- Web security (browser security, web server security)
- Data center security
- High-speed network intrusion prevention systems with hardware support
Long-term research interests:
- Combating professional, profit-driven attackers will be a continuous arms race.
- Online applications (including Web 2.0 applications) are becoming more complex and vulnerable.
- Network speed keeps increasing, which demands highly scalable approaches.

38 Research Contributions
- Demonstrated that vulnerability signatures can be applied to NIDS/NIPS, significantly improving the accuracy of current NIDS/NIPS
- Proposed the candidate selection algorithm for efficiently matching a large number of vulnerability signatures
- Proposed the parsing state machine for fast protocol parsing
- Implemented NetShield

39 Motivation
Network security has been recognized as the single most important attribute of their networks, according to an AT&T survey of 395 senior executives. Many new emerging threats make the situation even worse.

40 Candidate Merge Operation
A rule in Si survives the merge if it does not care about matcher i+1, or if it requires matcher i+1 and appears in the matched set Ai+1.

41 A Vulnerability Signature Example
Data representations: for all the vulnerability signatures we studied, only numbers and strings are needed.
- Number operators: ==, >, <, >=, <=
- String operators: ==, match_re(.,.), len(.)
Example signature for the Blaster worm:
BIND: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && context[0].abstract_syntax.uuid==UUID_RemoteActivation
BIND-ACK: rpc_vers==5 && rpc_vers_minor==1
CALL: rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00 && stub.RemoteActivationBody.actual_length>=40 && matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)

42 System Framework
Design goals of the framework components: scalability; accuracy, scalability, and coverage; accuracy and fast adaptation.

43 Example of Vulnerability Signatures
At least 75% of vulnerabilities are due to buffer overflow.
Sample vulnerability signature: the length of the protocol-message field corresponding to the vulnerable buffer exceeds a certain threshold. This condition is intrinsic to the buffer overflow vulnerability and hard to evade.

