Information Fusion Ganesh Godavari. DDoS Data Set DARPA DDoS data set (2000) is available –MIT Lincoln Laboratory –Data Set spans approximately 3 hours.


1 Information Fusion Ganesh Godavari

2 DDoS Data Set
DARPA DDoS data set (2000) is available
– MIT Lincoln Laboratory
– Data set spans approximately 3 hours
The five phases of the attack scenario depicted [1]:
– IP sweep of the Air Force Base (AFB) from a remote site
– Probe of live IPs to look for the sadmind daemon running on Solaris hosts
– Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts
– Installation of the trojan mstream DDoS software on three hosts at the AFB
– Launching the DDoS

3 Related Work
Charu C. Aggarwal, Philip S. Yu (2001) "Outlier detection for high dimensional data", ACM SIGMOD International Conference on Management of Data, pp. 37-46
John McHugh (2000) "Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory", ACM TISSEC, 3(4), pp. 262-294
Risto Vaarandi (2003) "A Data Clustering Algorithm for Mining Patterns From Event Logs", IEEE Workshop on IP Operations and Management

4 Attack Scenario [1]

5 Phase 1 Attack (DDoS Data Set)

Id  Date        Time      Duration  SrcIP           Target IP       Analyzer        Service
1   03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.5    tcpdump_inside  icmp-E-R
2   03/07/2000  09:51:36  00:00:05  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
3   03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.20   tcpdump_inside  icmp-E-R
4   03/07/2000  09:51:36  00:00:00  172.16.115.20   202.77.162.213  tcpdump_inside  icmp-E-Rp
5   03/07/2000  09:51:38  00:00:00  202.77.162.213  172.16.115.87   tcpdump_inside  icmp-E-R
6   03/07/2000  09:51:38  00:00:00  172.16.115.87   202.77.162.213  tcpdump_inside  icmp-E-Rp
7   03/07/2000  09:51:41  00:00:00  202.77.162.213  172.16.115.234  tcpdump_inside  icmp-E-R
8   03/07/2000  09:51:50  00:00:00  202.77.162.213  172.16.113.50   tcpdump_inside  icmp-E-R
9   03/07/2000  09:51:50  00:00:00  172.16.113.50   202.77.162.213  tcpdump_inside  icmp-E-Rp
10  03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.84   tcpdump_inside  icmp-E-R
11  03/07/2000  09:51:51  00:00:09  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
12  03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.105  tcpdump_inside  icmp-E-R
13  03/07/2000  09:51:51  00:00:00  172.16.113.105  202.77.162.213  tcpdump_inside  icmp-E-Rp
14  03/07/2000  09:51:52  00:00:00  202.77.162.213  172.16.113.148  tcpdump_inside  icmp-E-R
::::::
32  03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.194  tcpdump_inside  icmp-E-R
33  03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.207  tcpdump_inside  icmp-E-R

icmp-E-R => icmp-echo-request
icmp-E-Rp => icmp-echo-reply

6 Algorithm
Step 1: Go over the data file and build the vocabulary
– Read all the unique fields in the data files
Step 2: Identify the frequent vocabulary in the data file
– How should frequency be determined? How can one determine the frequency threshold?
Step 3: Generate cluster candidates
– Lines containing the same frequent words form a cluster
Step 4: Identify temporal relationships between cluster candidates
– The 24 relationships of data
Step 5: Generate unique lines
– Lines in the data file based on the candidate cluster
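
Steps 1 to 3 can be sketched roughly as follows. This is only a minimal illustration, not the author's implementation: the function names, the choice to treat a "word" as a (column, value) pair, the strict `count > threshold` test, and the reading of Step 3 as "one candidate per frequent word" are all assumptions. The sample rows are the first four Phase 1 records without the Id field.

```python
from collections import Counter, defaultdict


def build_vocabulary(rows):
    """Step 1 (assumed form): count every (column, word) pair in the data."""
    vocab = Counter()
    for row in rows:
        for col, word in enumerate(row):
            vocab[(col, word)] += 1
    return vocab


def frequent_words(vocab, threshold):
    """Step 2 (assumed form): keep pairs seen more than `threshold` times."""
    return {key for key, count in vocab.items() if count > threshold}


def cluster_candidates(rows, frequent):
    """Step 3 (assumed form): list the line numbers holding each frequent pair."""
    clusters = defaultdict(list)
    for line_no, row in enumerate(rows, start=1):
        for col, word in enumerate(row):
            if (col, word) in frequent:
                clusters[(col, word)].append(line_no)
    return dict(clusters)


# First four Phase 1 records (Id column omitted for brevity).
rows = [
    ("03/07/2000", "09:51:36", "00:00:00", "202.77.162.213", "172.16.115.5", "tcpdump_inside", "icmp-E-R"),
    ("03/07/2000", "09:51:36", "00:00:05", "172.16.112.194", "202.77.162.213", "tcpdump_inside", "icmp-E-Rp"),
    ("03/07/2000", "09:51:36", "00:00:00", "202.77.162.213", "172.16.115.20", "tcpdump_inside", "icmp-E-R"),
    ("03/07/2000", "09:51:36", "00:00:00", "172.16.115.20", "202.77.162.213", "tcpdump_inside", "icmp-E-Rp"),
]

vocab = build_vocabulary(rows)
frequent = frequent_words(vocab, threshold=3)  # threshold chosen only for this toy input
candidates = cluster_candidates(rows, frequent)
for key, lines in sorted(candidates.items()):
    print(key, lines)
```

Keeping the column index in the key means the same IP address seen as a source and as a target is counted as two different words, which is what the frequent-word listing on the later slide appears to do.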

7 Need Suggestions
– Is it safe to assume that a threshold parameter is provided?
– Cluster candidate generation can involve too much data generation (next slide shows how)

8 Cluster Candidate Generation
Data set has 8 dimensions
Frequent words (4-byte column number followed by the word) with threshold > 10 are:
– 0004 202.77.162.213 repeated 22
– 0001 03/07/2000 repeated 33
– 0003 00:00:00 repeated 31
– 0007 icmp-echo-request repeated 22
– 0007 icmp-echo-reply repeated 11
– 0006 tcpdump_inside repeated 33
– 0005 202.77.162.213 repeated 11
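
The column-prefixed word format above could be produced along these lines. This is a hedged sketch: the helper names `encode`, `decode` and `count_keys` are hypothetical, and it assumes the 4-byte prefix is a zero-padded decimal column number counted from 0 with the Id field as column 0 (inferred from the date appearing under 0001).

```python
from collections import Counter


def encode(col, word):
    """Build a vocabulary key: 4-character zero-padded column number, then the word."""
    return f"{col:04d}{word}"


def decode(key):
    """Split a key back into (column number, word)."""
    return int(key[:4]), key[4:]


def count_keys(rows):
    """Count how often each column-prefixed word occurs across all rows."""
    return Counter(encode(col, word) for row in rows for col, word in enumerate(row))


# Two Phase 1 records, Id included as column 0 (all fields kept as strings).
rows = [
    ("1", "03/07/2000", "09:51:36", "00:00:00", "202.77.162.213", "172.16.115.5", "tcpdump_inside", "icmp-E-R"),
    ("2", "03/07/2000", "09:51:36", "00:00:05", "172.16.112.194", "202.77.162.213", "tcpdump_inside", "icmp-E-Rp"),
]

counts = count_keys(rows)
print(counts["000103/07/2000"])       # date seen in both rows
print(decode("0004202.77.162.213"))   # source-IP column
```

Because the prefix has a fixed width, a key can be split without a separator, and 202.77.162.213 as a source (0004) stays distinct from 202.77.162.213 as a target (0005).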

9 Candidate Generation Example
Example
03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R
03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp
03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R
03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp
The first field is common to all lines, so should they be considered a candidate cluster?
Cluster 1 = { line 1, line 2, line 3, line 4}
Cluster 2 = { line 1, line 3, line 4}
Cluster 3 = { line 1, line 3}
Cluster 4 = { line 2, line 4}
Cluster 5 = { line 1, line 2, line 3, line 4}
Cluster 6 = { line 1, line 3}
Cluster 7 = { line 2, line 4}
Reduction but loss of information?
– Cluster 1 = { line 1, line 3}
– Cluster 2 = { line 2}
– Cluster 3 = { line 4}
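
One way to arrive at the reduced clusters is to group lines by their complete set of frequent words instead of creating one cluster per frequent word. The slides do not say how the reduction was obtained, so this is an assumed interpretation; `signature_clusters` and `FREQUENT` are hypothetical names, and the frequent set is taken from the frequent-word slide with the service names abbreviated as in the data rows.

```python
from collections import defaultdict

# Frequent (column, word) pairs; columns counted from 0 with Id as column 0.
FREQUENT = {
    (1, "03/07/2000"),
    (3, "00:00:00"),
    (4, "202.77.162.213"),
    (5, "202.77.162.213"),
    (6, "tcpdump_inside"),
    (7, "icmp-E-R"),
    (7, "icmp-E-Rp"),
}

# The four example lines, with the line number as the Id field.
LINES = [
    (1, "03/07/2000", "09:51:36", "00:00:00", "202.77.162.213", "172.16.115.5", "tcpdump_inside", "icmp-E-R"),
    (2, "03/07/2000", "09:51:36", "00:00:05", "172.16.112.194", "202.77.162.213", "tcpdump_inside", "icmp-E-Rp"),
    (3, "03/07/2000", "09:51:36", "00:00:00", "202.77.162.213", "172.16.115.20", "tcpdump_inside", "icmp-E-R"),
    (4, "03/07/2000", "09:51:36", "00:00:00", "172.16.115.20", "202.77.162.213", "tcpdump_inside", "icmp-E-Rp"),
]


def signature_clusters(lines, frequent):
    """Group line Ids by the full set of frequent words each line contains."""
    groups = defaultdict(list)
    for line in lines:
        signature = frozenset(
            (col, word) for col, word in enumerate(line) if (col, word) in frequent
        )
        groups[signature].append(line[0])
    return list(groups.values())


reduced = signature_clusters(LINES, FREQUENT)
print(reduced)
```

Each line lands in exactly one group, so the output stays small, but the fact that lines 2 and 4 share a target address and a service is no longer visible, which is the loss of information the slide asks about.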

10 Work to be done Complete the algorithm and coding part

11 References
[1] MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html

