Understanding Intrusion Detection Systems Dr. K. V. Arya Multimedia & Information Security Research Group ABV-Indian Institute of Information Technology & Management Gwalior, India kvarya@iiitm.ac.in
Introduction The Internet is changing computing. The possibilities and opportunities are limitless. Secure information transmission is very important in the present scenario for a healthy reputation and financial status. Risks and chances of malicious involvement are increasing. Intrusion attempt: the potential possibility of a deliberate unauthorized attempt to access information, manipulate information, or render a system unreliable or unusable.
Introduction An Intrusion Detection system (IDS) Detects attacks as soon as possible and takes appropriate action. Does not usually take preventive measures when an attack is detected. It is a reactive rather than a pro-active agent. It plays a role of informant rather than a police officer.
Introduction Eugene Spafford reports: Information theft is up over 250% in the last 5 years. 99% of all major companies report at least one major incident. Telecom and computer fraud totaled $10 billion in the US alone. Since it seems obvious that we cannot prevent subversion, we should at least try to detect it and prevent similar attacks in future.
Very simple intrusion-detection system Source: H. Debar, An Introduction to Intrusion-Detection System, IBM Research, Zurich Research Lab
Taxonomy Source: H. Debar, An Introduction to Intrusion-Detection System, IBM Research, Zurich Research Lab
Objectives Understand the concept of IDS/IPS and its two major categories, signature-based and anomaly-based detection, together with the false detections they can generate. Discuss the proposed hybrid IDS. Be able to write a Snort rule when given the signature and other configuration information.
Elements of Intrusion Detection Primary assumptions: System activities are observable Normal and intrusive activities have distinct evidence Components of intrusion detection systems: From an algorithmic perspective: Features - capture intrusion evidences Models - piece evidences together From a system architecture perspective: Various components: audit data processor, knowledge base, decision engine, alarm generation and responses
Components of Intrusion Detection System [Figure: Audit Records -> Audit Data Preprocessor -> Activity Data -> Detection Engine (using Detection Models) -> Alarms -> Decision Engine (using Decision Table) -> Action/Report. Annotations: system activities are observable; normal and intrusive activities have distinct evidence.]
Intrusion Detection Approaches Modeling Features: evidences extracted from audit data Analysis approach: piecing the evidences together Misuse detection ( signature-based) Anomaly detection ( statistical-based) Deployment: Network-based or Host-based Network based: monitor network traffic Host based: monitor computer processes Need “both” on all these.
Misuse Detection Intrusion patterns (sequences of system calls, patterns of network traffic, etc.) are matched against observed activities; a match indicates an intrusion. Example: if (traffic contains “\x90+de[^\r\n]{30}”) then “attack detected”. Advantage: mostly accurate. Problem: cannot detect new attacks.
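The pattern-matching idea on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the example pattern is the regular expression \x90+de[^\r\n]{30} (a run of 0x90 bytes, the literal "de", then 30 bytes that are neither CR nor LF), and the payloads and the function name misuse_detect are hypothetical.

```python
import re

# Assumed reading of the slide's pattern: one or more 0x90 bytes,
# the literal bytes "de", then 30 bytes that are neither CR nor LF.
SIGNATURE = re.compile(rb"\x90+de[^\r\n]{30}")

def misuse_detect(payload: bytes) -> bool:
    """Return True when the payload contains the known attack pattern."""
    return SIGNATURE.search(payload) is not None

# Hypothetical payloads for illustration only.
attack = b"GET /" + b"\x90" * 8 + b"de" + b"A" * 30 + b"\r\n"
benign = b"GET /index.html HTTP/1.1\r\n"
variant = b"GET /" + b"\x91" * 8 + b"de" + b"A" * 30 + b"\r\n"  # mutated sled

for name, payload in [("attack", attack), ("benign", benign), ("variant", variant)]:
    print(name, "->", "attack detected" if misuse_detect(payload) else "no match")
```

The mutated variant is not matched, which illustrates the slide's point that a signature recognizes only the attacks it was written for.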
Anomaly Detection Define a profile describing “normal” behavior in terms of activity measures, then detect deviations and flag them as probable intrusions. Thus it can detect potential new attacks. [Figure: plot of activity measures; the two axes not shown are IO and page faults.] Any problems? Relatively high false positive rates. Anomalies can just be new normal activities. Anomalies can be caused by faults in other elements, e.g., router failure or misconfiguration, P2P misconfiguration. Which method will detect DDoS SYN flooding?
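A minimal sketch of the profile-then-deviation idea, assuming a single activity measure and a simple "k standard deviations from the mean" test. The slide does not prescribe a particular statistic; the SYN-rate numbers, the threshold k = 3 and the function names are all hypothetical.

```python
from statistics import mean, stdev

def build_profile(normal_samples):
    """Summarise 'normal' behavior as (mean, standard deviation)."""
    return mean(normal_samples), stdev(normal_samples)

def is_anomalous(value, profile, k=3.0):
    """Flag a measurement more than k standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Hypothetical SYN packets per second observed during normal operation.
normal_syn_rate = [48, 52, 50, 47, 53, 49, 51, 50]
profile = build_profile(normal_syn_rate)

print(is_anomalous(5000, profile))  # flood-like burst
print(is_anomalous(51, profile))    # ordinary traffic
print(is_anomalous(60, profile))    # modest legitimate spike, still flagged
```

The last call shows how a harmless change in load can raise an alarm, which is the false positive problem noted on the slide.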
Host-Based IDSs Use OS auditing and monitoring/analysis mechanisms to find malware Can execute full static and dynamic analysis of a program Monitor shell commands and system calls executed by user applications and system programs Has the most comprehensive program info for detection, thus accurate Problems: If attacker takes over machine, can tamper with IDS binaries and modify audit logs User dependent: install/update IDS on all user machines! Only local view of the attack
Network Based IDSs [Figure: the Internet, gateway routers and our network, with host-based detection inside our network.] At the early stage of a worm, only limited worm samples are available. Host-based sensors can cover only a limited IP space, which raises scalability issues, so they might not be able to detect the worm in its early stage.
Network IDSs Deploying sensors at strategic locations For example, Packet sniffing via tcpdump at routers Inspecting network traffic Watch for violations of protocols and unusual connection patterns Look into the packet payload for malicious code Limitations Cannot execute the payload or do any code analysis ! Even DPI gives limited application-level semantic information Record and process huge amount of traffic May be easily defeated by encryption, but can be mitigated with encryption only at the gateway/proxy Problems: mainly accuracy
Key Metrics of IDS/IPS Algorithm: Alarm: A; Intrusion: I. Detection (true alarm) rate: P(A|I). False negative rate: P(¬A|I). False alarm (aka false positive) rate: P(A|¬I). True negative rate: P(¬A|¬I). Architecture: Throughput of NIDS, targeting 10s of Gbps, e.g., 32 nsec for a 40-byte TCP SYN packet. Resilient to attacks.
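The four rates and the per-packet time budget can be worked out as follows. The confusion-matrix counts are made up for illustration. The 10 Gbps link speed is an assumption: it is the speed at which a 40-byte packet occupies the wire for the 32 nsec quoted on the slide.

```python
def ids_metrics(tp, fn, fp, tn):
    """Compute the four conditional rates from confusion-matrix counts."""
    intrusions = tp + fn  # events where I holds
    normals = fp + tn     # events where I does not hold
    return {
        "detection_rate": tp / intrusions,       # P(A|I)
        "false_negative_rate": fn / intrusions,  # P(not A|I)
        "false_alarm_rate": fp / normals,        # P(A|not I)
        "true_negative_rate": tn / normals,      # P(not A|not I)
    }

def wire_time_ns(packet_bytes, link_gbps):
    """Time a packet occupies the link in nanoseconds (1 Gbps = 1 bit per ns)."""
    return packet_bytes * 8 / link_gbps

# Hypothetical counts: 100 intrusions and 900 normal events.
m = ids_metrics(tp=80, fn=20, fp=5, tn=895)
print(m)
print(wire_time_ns(40, 10))  # 40-byte TCP SYN on an assumed 10 Gbps link
```

Each pair of rates conditioned on the same event sums to 1, so reporting the detection rate and the false alarm rate is enough to recover the other two.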
Architecture of Network IDS [Figure, processing stack from bottom to top: Packet stream -> Packet capture (libpcap) -> TCP reassembly -> Protocol identification -> Signature matching (& protocol parsing when needed).]
Problems with Current IDSs Inaccuracy of exploit-based signatures. Cannot recognize unknown anomalies/intrusions. Cannot provide quality information for forensics or situation-aware analysis. Hard to differentiate malicious events from unintentional anomalies: anomalies can be caused by network element faults (e.g., router misconfiguration, link failures) or by application (such as P2P) misconfiguration. Cannot report situation-aware information: attack scope/target/strategy, attacker (botnet) size, etc.
Limitations of Exploit Based Signature [Figure: traffic filtering between the Internet and our network using an exploit-based signature (bit pattern 1010101 10111101 11111100 00010111).] Polymorphism! A polymorphic worm might not have an exact exploit-based signature.
Vulnerability Signature [Figure: traffic filtering between the Internet and our network using a vulnerability signature.] Works for polymorphic worms. Works for all worms that target the same vulnerability.
Example of Vulnerability Signatures At least 75% of vulnerabilities are due to buffer overflow. Sample vulnerability signature: field length corresponding to the vulnerable buffer > certain threshold. Intrinsic to the buffer overflow vulnerability and hard to evade. [Figure: a protocol message whose field overflows the vulnerable buffer.]
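A rough sketch of the length-based vulnerability signature described on this slide. Everything concrete here is hypothetical: the toy "key=value;key=value" message format, the field name user, and the 64-byte buffer size. A real NIDS would need a proper parser for the protocol in question.

```python
BUFFER_SIZE = 64  # hypothetical size of the vulnerable buffer, in bytes

def parse_fields(message: bytes) -> dict:
    """Split a toy 'key=value;key=value' protocol message into fields."""
    fields = {}
    for part in message.split(b";"):
        if b"=" in part:
            key, value = part.split(b"=", 1)
            fields[key] = value
    return fields

def matches_vulnerability_signature(message: bytes, field: bytes = b"user") -> bool:
    """True when the field that feeds the vulnerable buffer is too long."""
    value = parse_fields(message).get(field, b"")
    return len(value) > BUFFER_SIZE

# Two exploits with different byte content, and one benign message.
exploit_a = b"user=" + b"\x90" * 100 + b";cmd=login"
exploit_b = b"user=" + bytes(range(65, 91)) * 4 + b";cmd=login"
benign = b"user=alice;cmd=login"

for msg in (exploit_a, exploit_b, benign):
    print(matches_vulnerability_signature(msg))
```

Both exploits are caught even though their bytes differ, because the check depends on the field length and not on the payload content.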
Related Tools for Network IDS (I) While not an element of Snort, Wireshark (formerly called Ethereal) is the best open source GUI-based packet viewer. www.wireshark.org offers: Support for various operating systems: Windows, Mac OS. Included in the standard packages of many different versions of Linux and UNIX. For both wired and wireless networks.
Related Tools for Network IDS (II) Also not an element of Snort, tcpdump is a well-established Command Line Interface (CLI) packet capture tool www.tcpdump.org offers UNIX source http://www.winpcap.org/windump/ offers windump, a Windows port of tcpdump
The Hybrid Intrusion Detection Misuse based Anomaly base Rule formation Experimental Results
Misuse-Based Detection System (MDS) A typical misuse detection system [Figure labels: Audit Data, System Profile, Rule match?, Attack state, Timing Information, Modify existing rules, Add new rules.]
Misuse-Based Detection System (MDS) Misuse-based detection uses a database of known attack signatures. It matches them against the contents of the test packet and, if a match is found, generates an alert. All other activities that do not match any of the attack signatures are considered to be normal. Disadvantage: cannot detect unknown attacks or any variant of known attacks.
Types of MDSs Expert systems These are modeled in such a way as to separate the rule matching phase from the action phase. Ex: NIDES, developed by SRI. NIDES follows a hybrid ID technique. It builds user profiles based on many different criteria. The expert system misuse detection component encodes known scenarios and attack patterns.
Types of MDSs (Contd.) Key Stroke Monitoring This is a very simple technique that monitors keystrokes for attack patterns. Features of shells in which user definable aliases are present defeat the technique unless alias expansion and semantic analysis of commands is taken up. Operating systems do not offer much support for keystroke capturing, so the keystroke monitor should have a hook that analyses keystrokes before sending them to their intended receiver. An improvement would be to monitor system calls by application programs as well.
Types of MDSs (Contd.) Model Based Intrusion Detection This states that certain scenarios are inferred by certain other observable activities. The model based scheme consists of three important modules: The anticipator uses the active models and the scenario models to try to predict the next step in the scenario that is expected to occur. The planner then translates this hypothesis into a format that shows the behavior as it would occur in the audit trail. The interpreter then searches for this data in the audit trail. The system proceeds in this way, accumulating more and more evidence for an intrusion attempt until a threshold is crossed.
Types of MDSs (Contd.) State Transition Analysis The monitored system is represented as a state transition diagram. As data is analyzed, the system makes transitions from one state to another. A transition takes place on some Boolean condition being true. Drawbacks: Attack patterns can specify only a sequence of events, rather than more complex forms. There are no general purpose methods to prune the search except through the assertion primitives. They can’t detect denial of service attacks.
Types of MDSs (Contd.) Pattern Matching This model encodes known intrusion signatures as patterns that are then matched against the audit data. The implementation makes transitions on certain events called labels, and Boolean variables called guards can be placed at each transition. Advantages: Declarative specification. Multiple event streams. Portability. Real-time capabilities.
Other Models Generic Intrusion Detection Model: independent of any particular system, application environment, system vulnerability, or type of intrusion. Network Security Monitor: NSM is a network-based IDS that differs from the other IDSs in that it does not use or analyze the host machines' audit trails.
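A small sketch of state transition analysis, in which each transition fires when a Boolean condition on the current audit event is true. The states, the event fields (action, target) and the attack scenario are hypothetical and chosen only to keep the example short.

```python
# Hypothetical attack scenario: state -> (Boolean condition, next state).
TRANSITIONS = {
    "start": (lambda e: e.get("action") == "login_failed", "probing"),
    "probing": (lambda e: e.get("action") == "login_ok", "logged_in"),
    "logged_in": (
        lambda e: e.get("action") == "exec" and e.get("target") == "/bin/sh",
        "compromised",
    ),
}

def analyse(events, initial="start", final="compromised"):
    """Walk the audit events and report whether the final state is reached."""
    state = initial
    for event in events:
        rule = TRANSITIONS.get(state)
        if rule is None:  # no outgoing transition: final state reached
            break
        condition, next_state = rule
        if condition(event):
            state = next_state
    return state == final

attack_trail = [
    {"action": "login_failed"},
    {"action": "login_ok"},
    {"action": "exec", "target": "/bin/sh"},
]
normal_trail = [{"action": "login_ok"}, {"action": "exec", "target": "/bin/ls"}]

print(analyse(attack_trail), analyse(normal_trail))
```

Because the scenario is a fixed sequence of events, the sketch also shows the first drawback listed above: patterns more complex than a sequence are hard to express.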
Proposed Misuse based detection model
Step 1: (a) Extract the header information (b) Check for anomalous field values (c) If anomalous values discovered Then Go to Step 3 Else continue with Step 2;
Step 2: (a) Derive network attribute values as in Table 1 (b) Send for statistical analysis (c) Check for anomalous behavior by matching the appropriate "group of attributes" threshold values as in newly formed rules (d) If anomalous values discovered
Step 3: (a) Extract payload or content portion of the packet (b) Match it against the known signatures and patterns from the database (c) If attack signature match found Then Alert; Else continue from Step 1.
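The control flow of the three steps might be sketched as below. Step 2(d) is incomplete on the slide, so the sketch assumes that anomalous attribute values lead to Step 3 and that otherwise processing moves on to the next packet. The packet layout (a dict with header, attributes and payload keys) and the two checker callbacks are hypothetical.

```python
def inspect_packet(packet, header_anomalous, behaviour_anomalous, signatures):
    """Return "alert" or "continue" for one packet (hypothetical dict layout)."""
    # Step 1: check header fields for anomalous values.
    suspicious = header_anomalous(packet["header"])
    # Step 2: statistical check on derived attributes, only if the header is clean.
    if not suspicious:
        suspicious = behaviour_anomalous(packet["attributes"])
    if not suspicious:
        return "continue"  # assumed Else branch of Step 2(d)
    # Step 3: match the payload against known signatures.
    if any(sig in packet["payload"] for sig in signatures):
        return "alert"
    return "continue"

# Hypothetical checks and signature database for illustration.
signatures = [b"/etc/passwd"]
header_check = lambda h: h.get("ttl", 64) == 0
behaviour_check = lambda a: a.get("count", 0) > 100

packet = {"header": {"ttl": 64}, "attributes": {"count": 500},
          "payload": b"cat /etc/passwd"}
print(inspect_packet(packet, header_check, behaviour_check, signatures))
```

Under this reading the relatively expensive payload match runs only for packets that already look suspicious in Step 1 or Step 2.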
Anomaly Based Detection An anomaly is defined as something that is not nominal or normal. Anomaly detection is split into two separate categories: static and dynamic. Static assumes that one or more sections on the host should remain constant Focus only on the software side and ignore any unusual changes in hardware Used to monitor data integrity Dynamic Depends on a baseline or profile Baseline established by IDS or network administrator Baseline tells the system what kind of traffic looks normal May include information about bandwidth, ports, time frames etc...
Advantages of Anomaly Based Detection Can detect unknown or new attacks, because it flags deviations from a profile of normal behavior rather than matching known signatures.
Disadvantages of Anomaly Based Detection The network can be in an unprotected state as the system builds its profile. If malicious activity looks like normal traffic to the system it will never send an alarm. False positives can become cumbersome with an anomaly based setup. Normal usage such as checking e-mail after a meeting has the potential to signal an alarm
Active Intrusion Detection Systems Passive systems can only send an alarm to an administrator when there is an attempt in progress. An active system can take control of the situation by disconnecting the assailant. Methods: Session Disruption: The IDS may send a TCP reset packet if the attacker has opened a TCP connection to the victim. The IDS may send various UDP packets to disrupt a UDP connection. This will not permanently remedy the situation; it only disconnects the current connection. Rule Modification: The IDS is linked to a firewall via an administrative link. The IDS communicates with the firewall, telling it to drop all packets from the attacker's IP address.
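As one possible illustration of rule modification, the sketch below builds (but does not run) a firewall command that drops packets from an attacker's address. It assumes a Linux firewall managed with iptables or ip6tables; the slide does not name a firewall product, and the function name and the example address are hypothetical.

```python
import ipaddress

def block_command(attacker_ip: str) -> list:
    """Build the argument list for a DROP rule; raises ValueError on bad input."""
    ip = ipaddress.ip_address(attacker_ip)  # validates the address
    tool = "iptables" if ip.version == 4 else "ip6tables"
    return [tool, "-A", "INPUT", "-s", str(ip), "-j", "DROP"]

# 203.0.113.7 is a documentation address used here as a stand-in attacker.
print(" ".join(block_command("203.0.113.7")))
```

Validating the address before building the command matters here, since the value originates from attacker-controlled traffic.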
COSTS CSO magazine’s 2006 E-Crime Watch survey revealed that the damage done by enterprise security events is getting worse. 63% of respondents reported operational losses as a result of e-crime, 23% reported harm done to their organization’s reputation and 40% reported financial losses, which averaged "$740,000 in 2005 compared to an average of $507,000 in 2004." Intrusion Detection Systems range in price anywhere from $4,000 - $60,000 depending on the features that a company may need The price may appear high to some but when compared to the cost of the damage that may be done it’s a well spent investment to a company Remember that data is very hard to put a price tag on if lost
Anomaly based detection Anomaly based detection algorithms create a model of normal use and look for activities that deviate from this normal behavior. Data mining and artificial intelligence can be used to find outlier patterns. Advantage: can detect unknown or new attacks. Most of the available IDSs use one of the two detection algorithms. To improve the detection process, misuse detection can be combined with anomaly detection.
HYBRID INTRUSION DETECTION The hybrid intrusion detection (HID) algorithm combines anomaly detection using seeded k-means and misuse based detection using tuned Snort rules. The network traffic is analyzed after the packets are captured using the libpcap library. Preprocessing and decoding of the captured packets is done using unmodified Snort-IDS.
HYBRID INTRUSION DETECTION Next, misuse-based detection is carried out for signature and content matching through modified and tuned Snort rules. Thereafter, anomaly based detection is performed by the trained seeded k-means clustering algorithm.
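The ordering described here, misuse detection first and anomaly detection afterwards, can be sketched as a two-stage function. The rule set, the record fields and the stand-in anomaly model are hypothetical; in the proposed system these roles are played by tuned Snort rules and a trained seeded k-means model.

```python
def hybrid_detect(record, misuse_rules, anomaly_model):
    """Run misuse detection first, then fall back to anomaly detection."""
    for name, rule in misuse_rules.items():
        if rule(record):
            return ("misuse", name)
    if anomaly_model(record):
        return ("anomaly", None)
    return ("normal", None)

# Hypothetical stand-ins for the two detectors.
misuse_rules = {"passwd-read": lambda r: b"/etc/passwd" in r["payload"]}
anomaly_model = lambda r: r["count"] > 100

print(hybrid_detect({"payload": b"cat /etc/passwd", "count": 1}, misuse_rules, anomaly_model))
print(hybrid_detect({"payload": b"hello", "count": 500}, misuse_rules, anomaly_model))
print(hybrid_detect({"payload": b"hello", "count": 3}, misuse_rules, anomaly_model))
```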
Flowchart of proposed model
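Anomaly based detection of proposed model
Seeded k-means for anomaly based detection
Input: Data set $X = \{x_i\}$ for all $i = 1, \dots, N$, number of clusters $K$, and set of initial seeds $S = \{S_l\}$ for all $l = 1, \dots, K$.
Output: Disjoint $K$-partitioning $\{X_l\}_{l=1}^{K}$ of $X$ such that the K-means objective function is optimized.
Process:
Initialize: $\bar{x}_h^{(0)} = \frac{1}{|S_h|} \sum_{x \in S_h} x$ for $h = 1, \dots, K$; $t = 0$.
Assign Clusters: Assign each data point $x_i$ to the cluster $h^*$, i.e., to the set $X_{h^*}^{(t+1)}$, for $h^* = \arg\min_h \| x_i - \bar{x}_h^{(t)} \|^2$.
Estimate Mean: $\bar{x}_h^{(t+1)} = \frac{1}{|X_h^{(t+1)}|} \sum_{x \in X_h^{(t+1)}} x$; $t = t + 1$.
Repeat the Assign Clusters and Estimate Mean steps until convergence is achieved.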
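A compact sketch of the seeded k-means procedure on this slide, written in plain Python with points as tuples. It follows the standard formulation (centroids initialised as seed-set means, then alternating assignment and mean estimation). The iteration cap, the empty-cluster handling and the toy data are assumptions for illustration, not details of the proposed system.

```python
def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(data, seeds, max_iter=100):
    """data: list of points; seeds: list of K non-empty lists of seed points."""
    centroids = [mean_point(s) for s in seeds]  # initialise from the seed sets
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda h: sq_dist(x, centroids[h]))
            for x in data
        ]
        if new_assignment == assignment:  # no change: converged
            break
        assignment = new_assignment
        # Re-estimate each centroid as the mean of its members.
        for h in range(len(centroids)):
            members = [x for x, a in zip(data, assignment) if a == h]
            if members:  # assumption: keep the old centroid if a cluster empties
                centroids[h] = mean_point(members)
    return assignment, centroids

# Toy data: two well-separated groups, one labelled seed per cluster.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [[(0, 0)], [(10, 10)]]
assignment, centroids = seeded_kmeans(data, seeds)
print(assignment, centroids)
```

Compared with random initialisation, the seeds tie each cluster index to a known class (for example normal versus attack) from the first iteration.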
FORMATION OF RULES
IF (protocol type = tcp) AND (src bytes = 0) AND (hot = 0) AND (same srv rate < 1) AND (diff srv rate > 0) AND (dst host same srv rate < 0.30) AND (dst host diff srv rate > 0.12) AND (dst host same src port rate > 0.05) AND (dst host serror rate > 0.05) AND (dst host srv serror rate > 0.9) THEN attack = PORTSWEEP
IF (protocol type = tcp) AND (src bytes = 0) AND (hot = 0) AND (serror rate > 0) AND (rerror rate < 1) AND (diff srv rate = 1) AND (dst host same srv rate < 0.30) AND (dst host diff srv rate = 1) AND (dst host same src port rate > 0.05) AND (dst host rerror rate > 0.6) THEN attack = SATAN
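The two rules can be encoded as data and evaluated against a connection record, as in the sketch below. It assumes that records are dicts keyed by the attribute names with underscores, and that the values written 0:30 and 0:05 on the slide mean 0.30 and 0.05. The sample connection is invented.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

# Each rule is a list of (attribute, comparison, value) conditions joined by AND.
RULES = {
    "PORTSWEEP": [
        ("protocol_type", "=", "tcp"), ("src_bytes", "=", 0), ("hot", "=", 0),
        ("same_srv_rate", "<", 1), ("diff_srv_rate", ">", 0),
        ("dst_host_same_srv_rate", "<", 0.30), ("dst_host_diff_srv_rate", ">", 0.12),
        ("dst_host_same_src_port_rate", ">", 0.05),
        ("dst_host_serror_rate", ">", 0.05), ("dst_host_srv_serror_rate", ">", 0.9),
    ],
    "SATAN": [
        ("protocol_type", "=", "tcp"), ("src_bytes", "=", 0), ("hot", "=", 0),
        ("serror_rate", ">", 0), ("rerror_rate", "<", 1), ("diff_srv_rate", "=", 1),
        ("dst_host_same_srv_rate", "<", 0.30), ("dst_host_diff_srv_rate", "=", 1),
        ("dst_host_same_src_port_rate", ">", 0.05), ("dst_host_rerror_rate", ">", 0.6),
    ],
}

def classify(conn):
    """Return the first attack whose conditions all hold, or None."""
    for attack, conditions in RULES.items():
        if all(OPS[op](conn[attr], value) for attr, op, value in conditions):
            return attack
    return None

# Invented connection record that satisfies the PORTSWEEP rule.
conn = {
    "protocol_type": "tcp", "src_bytes": 0, "hot": 0, "same_srv_rate": 0.5,
    "diff_srv_rate": 0.5, "serror_rate": 0.0, "rerror_rate": 0.0,
    "dst_host_same_srv_rate": 0.1, "dst_host_diff_srv_rate": 0.5,
    "dst_host_same_src_port_rate": 0.1, "dst_host_serror_rate": 0.1,
    "dst_host_srv_serror_rate": 0.95, "dst_host_rerror_rate": 0.0,
}
print(classify(conn))
```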
Mapping of network attributes to Snort keywords
Network attribute | Snort keyword | Description
ip_proto | Protocol | IP protocol type
src_ip | $HOME_NET or $EXTERNAL_NET | Source address or the originating address of the packets
dest_ip | $HOME_NET or $EXTERNAL_NET | Destination address of the packets
service | Port number | Service type: http, ftp, etc.
icmp_type | itype | ICMP message type
src_bytes | dsize | Size of the payload
Flags | Flags | TCP flags
dest_count (n) | Threshold: track by_dst, count(n) | Number of connection from the same source
src_count (n) | Threshold: track by_src, count(n) | Number of connections to the same destination
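To show how such a mapping turns into a rule, the sketch below assembles a rule string in Snort 2.x syntax from a few attribute values. The message text, the sid, the count and the time window are hypothetical, and the helper build_snort_rule is not part of Snort. The threshold option is used because the mapping refers to it.

```python
def build_snort_rule(msg, sid, proto="tcp", src="$EXTERNAL_NET", dst="$HOME_NET",
                     dst_port="any", options=None, track="by_dst",
                     count=None, seconds=60):
    """Assemble a Snort 2.x style alert rule as a string (illustrative only)."""
    opts = [f'msg:"{msg}"']
    opts.extend(options or [])
    if count is not None:
        # dest_count(n) / src_count(n) map to the threshold option.
        opts.append(
            f"threshold:type threshold, track {track}, count {count}, seconds {seconds}"
        )
    opts.append(f"sid:{sid}")
    opts.append("rev:1")
    return f"alert {proto} {src} any -> {dst} {dst_port} ({'; '.join(opts)};)"

# Hypothetical rule: SYN packets with empty payload, 20 per minute per destination.
rule = build_snort_rule("Possible portsweep", 1000001,
                        options=["flags:S", "dsize:0"], count=20)
print(rule)
```

Here src_bytes = 0 becomes dsize:0, the TCP flag becomes flags:S, and the connection count becomes the threshold option. Any rule produced this way should still be checked against the installed Snort version.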
Network attributes used to construct rules to detect attacks
Name of attribute | Type | Description
Protocol | nominal | IP protocol type: TCP, UDP or ICMP
Count | integer | sum of connections to the same destination IP address
Src bytes | integer | source bytes
srv count | integer | sum of connections to the same destination port number
dst host count | integer | sum of connections to the same destination IP address
dst host srv count | integer | sum of connections to the same destination port number
Hot | integer | sum of hot actions in a connection, such as entering a system directory, creating programs, executing programs
serror rate | real | % of connections that have activated s0, s1, s2 or s3, among the connections aggregated in count
rerror rate | real | % of connections that activated flag REJ, among the connections aggregated in count
diff srv rate | real | % of connections that were to different services, among the connections aggregated in count
dst host same srv rate | real | % of connections to the same service, among connections aggregated in count
dst host diff srv rate | real | % of connections that were to different services, among connections aggregated in dst host count
dst host same src port rate | real | % of connections to the same source port, among connections aggregated in dst host srv count
dst host rerror rate | real | % of connections that activated flag REJ, among connections aggregated in dst host count
Comparison of Detection Algorithms on 10% unlabeled KDDcup99 dataset
Detection Module | Total attacks detected | Detection Rate (%) | Miss Rate (%) | False Alarm Rate (%)
Snort | 4142 | 70.50 | 29.50 | 1.24
Seeded k-means | 4285 | 72.93 | 27.07 | 1.12
Proposed hybrid algorithm | 4889 | 83.21 | 16.79 | 1.20
Comparison of time required to train the proposed hybrid algorithm and FERs
Detection Module | Total no. of connection events | Training Time (seconds) | Detection Time
Proposed hybrid algorithm | 20000 | 347 | 174
FERs | 20000 | 1125 | 1512
Conclusions A hybrid detection algorithm, developed by combining the attributes of signature based and anomaly based detection algorithms, performs better detection than either method alone (only Snort or only seeded k-means). The seeded k-means algorithm is used for anomaly detection, and modified and tuned Snort rules are used for misuse detection. On the KDDcup99 dataset, the hybrid algorithm achieves a better detection rate than seeded k-means and the tuned Snort IDS.
Next Generation IDSs Vulnerability-based Adaptive - Automatically detect & generate signatures for zero-day attacks Scenario-based for forensics and being situational-aware Correlate (multiple sources of) audit data and attack information