Information Fusion
Ganesh Godavari

DDoS Data Set
DARPA DDoS data set (2000) is available
–MIT Lincoln Laboratory
–Data Set spans approximately 3 hours
The five phases of the attack scenario depicted [1]:
–IPsweep of the Air Force Base from a remote site
–Probe of live IPs to look for the sadmind daemon running on Solaris hosts
–Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts
–Installation of the trojan mstream DDoS software on three hosts at the AFB
–Launching the DDoS

Attack Scenario [1]

Phase 1 Attack (DDoS DataSet)
Date        Time      Duration  SrcIP           Target IP       Analyzer        Service
03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.5    tcpdump_inside  icmp-E-R
03/07/2000  09:51:36  00:00:05  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.20   tcpdump_inside  icmp-E-R
03/07/2000  09:51:36  00:00:00  172.16.115.20   202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:38  00:00:00  202.77.162.213  172.16.115.87   tcpdump_inside  icmp-E-R
03/07/2000  09:51:38  00:00:00  172.16.115.87   202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:41  00:00:00  202.77.162.213  172.16.115.234  tcpdump_inside  icmp-E-R
03/07/2000  09:51:50  00:00:00  202.77.162.213  172.16.113.50   tcpdump_inside  icmp-E-R
03/07/2000  09:51:50  00:00:00  172.16.113.50   202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.84   tcpdump_inside  icmp-E-R
03/07/2000  09:51:51  00:00:09  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.105  tcpdump_inside  icmp-E-R
03/07/2000  09:51:51  00:00:00  172.16.113.105  202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:52  00:00:00  202.77.162.213  172.16.113.148  tcpdump_inside  icmp-E-R
:           :         :         :               :               :               :
03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.194  tcpdump_inside  icmp-E-R
03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.207  tcpdump_inside  icmp-E-R
icmp-E-R => icmp-echo-request
icmp-E-Rp => icmp-echo-reply
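To make the record layout concrete, here is a minimal sketch of how one line of the Phase 1 data could be parsed into typed fields. The names `Record`, `parse_record` and `SERVICE_NAMES` are hypothetical, and the date is assumed to be in month/day/year order; neither is stated on the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Legend from the slide: abbreviated service names and their full forms.
SERVICE_NAMES = {"icmp-E-R": "icmp-echo-request", "icmp-E-Rp": "icmp-echo-reply"}


@dataclass(frozen=True)
class Record:
    start: datetime
    duration: timedelta
    src_ip: str
    target_ip: str
    analyzer: str
    service: str

    @property
    def end(self) -> datetime:
        # End of the event, useful later when comparing records in time.
        return self.start + self.duration


def parse_record(line: str) -> Record:
    """Split one whitespace-separated record into its seven fields."""
    date, time, dur, src, dst, analyzer, service = line.split()
    hours, minutes, seconds = (int(part) for part in dur.split(":"))
    return Record(
        # Assumption: dates are written month/day/year.
        start=datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
        duration=timedelta(hours=hours, minutes=minutes, seconds=seconds),
        src_ip=src,
        target_ip=dst,
        analyzer=analyzer,
        service=SERVICE_NAMES.get(service, service),
    )


rec = parse_record(
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp"
)
print(rec.service, rec.end)
```

Keeping start and duration as `datetime` and `timedelta` values means the end time of each record is available for the temporal comparison in step 4.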

Algorithm
Step 1: Go over the data file and build the vocabulary
–Read all the unique fields in the data files
Step 2: Identify the frequent vocabulary in the data file
–How should frequency be determined? How can the frequency threshold be chosen?
Step 3: Generate cluster candidates
–Lines containing the same frequent words form a cluster
Step 4: Identify temporal relationships between cluster candidates
–The 24 relationships of data
Step 5: Generate unique lines
–Lines in the data file, based on the candidate clusters

Need Suggestions
Is it safe to assume that a threshold parameter is provided?
Cluster candidate generation can produce too much data (next slide shows how)
The 24 relations cover everything. Which of them are we interested in?

Cluster Candidate Generation
Data Set has 8 dimensions
Frequent words (4-byte column number followed by the word) with frequency > 10 are:
–0004202.77.162.213 repeated 22
–000103/07/2000 repeated 33
–000300:00:00 repeated 31
–0007icmp-echo-request repeated 22
–0007icmp-echo-reply repeated 11
–0006tcpdump_inside repeated 33
–0005202.77.162.213 repeated 11
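The counting behind such a list (steps 1 and 2) could look roughly like the sketch below. It assumes whitespace-separated fields, 1-based column numbers zero-padded to four characters, and a strict "count greater than threshold" test; the function names are hypothetical. The four sample lines are the ones used in the candidate generation example, with a threshold of 1 instead of 10 because the sample is so small.

```python
from collections import Counter


def encode(col: int, word: str) -> str:
    """Prefix a word with its 1-based column number, zero-padded to 4 characters."""
    return f"{col:04d}{word}"


def frequent_words(lines, threshold):
    """Count every column-prefixed word and keep those seen more than `threshold` times."""
    counts = Counter()
    for line in lines:
        for col, word in enumerate(line.split(), start=1):
            counts[encode(col, word)] += 1
    return {word: n for word, n in counts.items() if n > threshold}


sample = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp",
]

freq = frequent_words(sample, threshold=1)
for word, n in sorted(freq.items()):
    print(word, "repeated", n)
```

The column prefix is what keeps 202.77.162.213 as a source address (column 0004) separate from the same address as a target (column 0005).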

Candidate Generation Example
Example
03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.5    tcpdump_inside  icmp-E-R
03/07/2000  09:51:36  00:00:05  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.20   tcpdump_inside  icmp-E-R
03/07/2000  09:51:36  00:00:00  172.16.115.20   202.77.162.213  tcpdump_inside  icmp-E-Rp
In all data the first field is common, so should it be considered a candidate cluster?
for each frequent-word in frequent-word-list {
    while (Read a Line of data != EOF) {
        if (frequent-word in line)
            add line no. to the Cluster of frequent-word
    } // end of while
} // end of for
Cluster 1 = { line 1, line 2, line 3, line 4 }
Cluster 2 = { line 1, line 3, line 4 }
Cluster 3 = { line 1, line 3 }
Cluster 4 = { line 2, line 4 }
Cluster 5 = { line 1, line 2, line 3, line 4 }
Cluster 6 = { line 1, line 3 }
Cluster 7 = { line 2, line 4 }
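A runnable sketch of this candidate generation, under a few assumptions: the frequent-word list is the one from the previous slide (column-prefixed, in an order chosen here to match the clusters above), service names are written out in full as in that list, and the loops are swapped so the data is read once rather than once per frequent word. The helper names are hypothetical.

```python
def encode_line(line):
    """Return the set of column-prefixed words (4-digit, 1-based column) in a line."""
    return {f"{col:04d}{word}" for col, word in enumerate(line.split(), start=1)}


def cluster_candidates(lines, frequent):
    """Map each frequent word to the 1-based numbers of the lines containing it."""
    clusters = {word: [] for word in frequent}
    for line_no, line in enumerate(lines, start=1):  # single pass over the data
        words = encode_line(line)
        for word in frequent:
            if word in words:
                clusters[word].append(line_no)
    return clusters


# The four example lines, with icmp-E-R / icmp-E-Rp expanded per the legend.
lines = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-echo-request",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-echo-reply",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-echo-request",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-echo-reply",
]

# Frequent words taken from the previous slide.
frequent = [
    "000103/07/2000",
    "000300:00:00",
    "0004202.77.162.213",
    "0005202.77.162.213",
    "0006tcpdump_inside",
    "0007icmp-echo-request",
    "0007icmp-echo-reply",
]

clusters = cluster_candidates(lines, frequent)
for number, (word, members) in enumerate(clusters.items(), start=1):
    print(f"Cluster {number} ({word}) = {members}")
```

On these four lines the sketch yields the same seven line sets as listed above, which also shows the concern raised earlier: every line lands in several clusters, so the candidate data grows quickly.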

Another Approach?
Reduction but loss of information?
char key
while (Read a Line of data != EOF) {
    key = "" // start a new key for every line
    for each frequent-word in frequent-word-list {
        if (frequent-word in line)
            key = key + frequent-word
    } // end of for
    if (key not in Cluster)
        create an empty Cluster for key
    add line no. to the Cluster for key
} // end of while
–Cluster 1 = { line 1, line 3 }
–Cluster 2 = { line 2 }
–Cluster 3 = { line 4 }
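A minimal sketch of this key-based grouping, assuming the key is rebuilt for every line and that lines with the same key share one cluster, which is what the three clusters above imply. The key is kept as a tuple of frequent words rather than a concatenated string, and the helper names are hypothetical.

```python
def encode_line(line):
    """Return the set of column-prefixed words (4-digit, 1-based column) in a line."""
    return {f"{col:04d}{word}" for col, word in enumerate(line.split(), start=1)}


def cluster_by_key(lines, frequent):
    """Group 1-based line numbers by the combination of frequent words each line contains."""
    clusters = {}
    for line_no, line in enumerate(lines, start=1):
        words = encode_line(line)
        # The key is rebuilt for each line from the frequent words it contains.
        key = tuple(word for word in frequent if word in words)
        clusters.setdefault(key, []).append(line_no)
    return clusters


# The four example lines, with icmp-E-R / icmp-E-Rp expanded per the legend.
lines = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-echo-request",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-echo-reply",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-echo-request",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-echo-reply",
]

# Frequent words taken from the Cluster Candidate Generation slide.
frequent = [
    "000103/07/2000",
    "000300:00:00",
    "0004202.77.162.213",
    "0005202.77.162.213",
    "0006tcpdump_inside",
    "0007icmp-echo-request",
    "0007icmp-echo-reply",
]

by_key = cluster_by_key(lines, frequent)
for number, members in enumerate(by_key.values(), start=1):
    print(f"Cluster {number} = {members}")
```

Each line now belongs to exactly one cluster, which is the reduction. The loss of information is that infrequent fields such as the 00:00:05 duration on line 2 or the individual target addresses no longer play any part in the grouping.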

Temporal Relations
Unable to find a case in which none of the 24 temporal relationships holds
Need to identify the relationships that are needed for decision making

Work to be done
Completed the algorithm and coding part up to step 4.

References
[1] MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html
