Published by Homer Price, modified over 8 years ago
1
Cross-Analysis of Botnet Victims: New Insights and Implications
Seungwon Shin, Raymond Lin, Guofei Gu
Presented by Bert Huang
2
Introduction
- Botnet: a network of compromised Internet-connected computers, a.k.a. a "zombie army"
- Architecture: command and control (C&C) via Internet Relay Chat (IRC) or peer-to-peer (P2P)
- What can they do? DDoS attacks, spam, and facilitating other malware
3
Introduction
- Propagation methods
  - Auto-self-propagating (Type I): network scanning, dictionary attacks on admin shares
  - Non-auto-self-propagating (Type II): phishing, drive-by downloads, pay-per-installation
4
Introduction
- Research question: are there similarities or differences in infection patterns (e.g., the distribution of victims) between these two types of botnets?
- Motivation
  - Further understand the nature of botnets
  - Develop more accurate, targeted malware monitoring, detection, and prediction systems
- Three major botnets: Conficker (Type I), MegaD (Type II), Srizbi (Type II)
5
Data Collection – Conficker
- Gains total control of infected computers
- Uses domain fluxing to generate the C&C domain names that victims contact
- Methodology: shadowserver.org sinkhole servers
  - Registered the same domain names as Conficker's master servers, redirecting queries from infected computers to the sinkholes
  - The sinkholes capture the victims' communication activity
6
Data Collection – MegaD & Srizbi
- Spam bots
- Methodology: BOTLAB project
  - Spam trap servers
  - Crawling URLs
  - DNS monitoring
  - Victims determined via correlation
7
Data Collection – Limitations
- Dynamic IP addresses make it hard to identify individual hosts
  - Generalize IP addresses to the subnet/network level (/24 subnets)
  - If 155.245.23.70 is infected, the whole subnet (155.245.23.*) is considered an infected network
- Complete data is unlikely to be collected for certain botnets (in this case, MegaD and Srizbi). Such is life.
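The /24 generalization is why the victim counts on the next slide are much larger than the infected-network counts: many victim IPs collapse into one network. A minimal sketch using Python's standard `ipaddress` module; the function names are hypothetical, not taken from the paper.

```python
import ipaddress

def to_slash24(ip: str) -> ipaddress.IPv4Network:
    """Map a single IPv4 address to the /24 network that contains it."""
    # strict=False accepts a host address and zeroes the host bits.
    return ipaddress.ip_network(f"{ip}/24", strict=False)

def infected_networks(victim_ips) -> set:
    """Collapse victim IPs into the set of distinct infected /24 networks."""
    return {to_slash24(ip) for ip in victim_ips}

print(to_slash24("155.245.23.70"))  # 155.245.23.0/24
print(len(infected_networks(["155.245.23.70", "155.245.23.9", "81.1.2.3"])))  # 2
```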
8
Data Collection
Botnet      Victims       Infected networks
Conficker   24,912,492    1,339,699
MegaD       83,316        71,896
Srizbi      106,446       77,934
9
Data Collection
- Type I: 1,339,699 infected networks
- Type II: 137,902 infected networks
- Common (Type I & Type II): 97,290 infected networks
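The Type I / Type II / common split is plain set arithmetic over infected /24 networks. Assuming Type II is the union of the MegaD and Srizbi networks, 71,896 + 77,934 = 149,830 exceeds 137,902, which would mean roughly 11,928 networks host both spam botnets. The sketch below uses toy sets and a hypothetical `cross_tabulate` helper; "EX" here is assumed to mean "infected only by that type".

```python
def cross_tabulate(type1: set, type2: set) -> dict:
    """Split infected /24 networks into per-type, common, and exclusive counts."""
    return {
        "type1": len(type1),
        "type2": len(type2),
        "common": len(type1 & type2),    # infected by both types
        "type1_ex": len(type1 - type2),  # infected by Type I only
        "type2_ex": len(type2 - type1),  # infected by Type II only
    }

# Toy data: Type II is the union of two spam botnets that overlap.
megad = {"1.2.3.0/24", "5.6.7.0/24"}
srizbi = {"5.6.7.0/24", "9.9.9.0/24"}
conficker = {"1.2.3.0/24", "8.8.8.0/24"}
type2 = megad | srizbi
print(cross_tabulate(conficker, type2))
```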
10
Cross Analysis – Initial Analysis
- Hypothesis I: since the two types of botnets have very different infection vectors, they may exhibit different infection patterns (e.g., in the distribution of their infected networks)
- The distributions of Type I and Type II botnets are VERY SIMILAR
  - The IP address ranges (77.* - 96.*), (109.* - 125.*), and (186.* - 222.*) are highly infected by BOTH types of botnets
  - These ranges are contiguous, which may imply that vulnerable networks are close to each other in the address space
11
Cross Analysis – Initial Analysis
- So far, only the distribution over the IP address space is considered, with no semantic meaning
  - What are the reasons for these highly infected ranges?
  - Where are these networks located?
  - Who is using these networks?
  - What about the non-infected machines?
- Need more fine-grained analysis and investigation
12
Cross Analysis – Initial Analysis
- Follow-up investigation
  - Geographical distribution of infected networks
  - IP address population/density
  - Remote accessibility of networks
  - Dynamism of IP addresses
- Build a hypothesis for each point based on intuition
13
Cross Analysis – Geographical Location
- Hypothesis II: Type I and Type II infected networks are mainly distributed over similar countries
- Infected networks are spread all over the world, with some areas of concentration
- Some countries are prone to infection by both Type I and Type II botnets
- Some countries are more likely to be infected by one type of botnet
  - E.g., China is highly prone to Type I, but much less so to Type II
  - Countries more prone to Type I... discussed later!
  - Countries more prone to Type II... also discussed later!
14
Cross Analysis – IP Address Population
- IP addresses are not assigned evenly across networks or locations
- Some addresses are registered only for special purposes, e.g., (224.* - 239.*) is reserved for multicast
- IP address population differs from country to country
  - > 37% of IP addresses are assigned to the United States
  - < 0.5% of IP addresses are assigned to Turkey
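When counting a country's IP address population, special-purpose blocks such as multicast should not be counted as ordinary host space. A minimal sketch with the standard `ipaddress` module; which reserved categories to exclude is an assumption here, not something the slides specify, and `is_countable` is a hypothetical name.

```python
import ipaddress

MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # 224.* - 239.*

def is_countable(ip: str) -> bool:
    """True if the address lies in ordinary public unicast space."""
    addr = ipaddress.ip_address(ip)
    # Assumed exclusions: multicast, private, loopback, and reserved (240/4) blocks
    # are not publicly routable unicast space, so they are left out of population counts.
    return not (addr.is_multicast or addr.is_private
                or addr.is_loopback or addr.is_reserved)

print(ipaddress.ip_address("239.1.1.1") in MULTICAST)      # True
print(is_countable("239.1.1.1"), is_countable("8.8.8.8"))  # False True
```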
15
Cross Analysis – IP Address Population
- The numbers of infected networks for Type I, Type II, Type I EX, and Type II EX (EX: networks infected exclusively by that type) are roughly proportional to IP address population
  - The more IP addresses a country has, the more infected networks it contains
- Observed a spike of infections in countries with low IP address populations
- Hypothesis III: countries with more IP addresses (high IP address population) might contain more infected networks of both types than countries with low IP address populations
16
Cross Analysis – IP Address Population
- What's up with the spike?
  - Security education/knowledge may not be as prevalent in these countries as in others
  - People may open vulnerable services or click suspicious URLs without serious consideration
  - Network configuration/protection may not be as up to date as elsewhere, making it easier for malware to exploit these weaknesses
17
Cross Analysis – IP Address Population
- Since high IP address population countries are likely to have more infected networks...
  - Those infected networks could still be a relatively small percentage of the address population
  - Comparing absolute infection counts alone could mask countries with a high percentage of infections
- Need a way to find countries with:
  - Low IP address population
  - High infection percentage
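Normalizing by population turns absolute counts into an infection percentage, which surfaces small countries that absolute counts would hide. A minimal sketch, assuming population is measured in allocated /24 networks per country; the helper name and the toy numbers are hypothetical.

```python
def rank_by_infection_rate(infected: dict, population: dict, top: int = 3):
    """Rank countries by the share of their /24 networks that are infected.

    infected:   country -> number of infected /24 networks
    population: country -> total /24 networks allocated to that country
    """
    rates = {
        country: infected.get(country, 0) / total
        for country, total in population.items()
        if total > 0  # skip countries with no allocated space
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Toy numbers: "A" has the most infected networks, "B" the highest rate.
infected = {"A": 5000, "B": 900, "C": 50}
population = {"A": 100_000, "B": 3_000, "C": 10_000}
print(rank_by_infection_rate(infected, population))
```

Ranking by rate puts "B" first even though "A" has over five times as many infected networks.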
18
Cross Analysis – IP Address Population
21
Cross Analysis – Remote Accessibility
- Check network accessibility via ping
  - Send 5 ICMP echo request packets
  - Regard the network as reachable if the ping succeeds
- Only shows a lower bound on reachable networks
  - Perimeter defense systems (firewalls, IDS, etc.) may block the probes
  - The host may be offline
- Assumes each /24 subnet has the same network access control policy
- Hypothesis IV: networks that are more open (more directly accessible from remote hosts) might contain more Type I infected networks than Type II infected networks
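The reachability check can be sketched by shelling out to the system `ping` with five echo requests and treating any reply as success. This assumes a Unix-like `ping` that accepts `-c` and exits with status 0 when at least one reply arrives (true of Linux iputils and BSD/macOS ping); the helper names and the one-probed-host-per-/24 mapping are hypothetical. The probe is injectable so the aggregation can be tested without network access.

```python
import subprocess

def ping_probe(host: str, count: int = 5, timeout_s: float = 30.0) -> bool:
    """Send `count` ICMP echo requests via the system ping; True if any reply arrived."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # no answer in time, or no ping binary: count as unreachable
    return result.returncode == 0

def reachable_share(targets: dict, probe=ping_probe) -> float:
    """Fraction of /24 networks whose probed host answered (a lower bound).

    targets maps each infected /24 network to one known victim address in it.
    Under the assumption that a /24 shares one access policy, a single reply
    marks the whole subnet as reachable.
    """
    if not targets:
        return 0.0
    hits = sum(1 for ip in targets.values() if probe(ip))
    return hits / len(targets)

# Usage (sends real ICMP traffic):
# reachable_share({"155.245.23.0/24": "155.245.23.70"})
```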
22
Cross Analysis – Remote Accessibility
- Able to access 54.32% of Type I infected networks
  - The true ratio could be higher (lower-bound limitation)
  - Networks that are aware of malware scanning attacks may be more defensive
- Able to access 46.85% of Type II infected networks
  - Type II botnets do not depend on remote accessibility anyway...
- Able to access >60% of common infected networks
  - Remotely accessible networks are much more vulnerable to malware attacks
23
Cross Analysis – Dynamism of IP Address
- Analyze the percentage of infected networks that use dynamic IP addresses
  - Look for keywords in reverse DNS PTR (pointer) records, e.g., dynamic-host.abcd.com
- Only shows a lower bound on dynamic IP addresses
  - Limited by reverse DNS lookup coverage and the selected keywords
- Hypothesis V: places (or networks) with more dynamic IP addresses are more prone to infection by both types of botnets
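The keyword heuristic can be sketched with `socket.gethostbyaddr` for the PTR lookup and a substring test on the name's labels. The keyword list is hypothetical: the slides only give `dynamic-host.abcd.com` as an example. A failed lookup counts as "not dynamic", which keeps the estimate a lower bound, as the slide notes.

```python
import socket

# Hypothetical keyword list; the slides do not state which keywords were used.
DYNAMIC_KEYWORDS = ("dyn", "dhcp", "pool", "dialup", "ppp", "dsl", "cable")

def ptr_lookup(ip: str):
    """Return the reverse DNS (PTR) name for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None

def looks_dynamic(ptr_name: str) -> bool:
    """Heuristic: does any label of the PTR name contain a dynamic-IP keyword?"""
    labels = ptr_name.lower().replace("-", ".").split(".")
    return any(k in label for label in labels for k in DYNAMIC_KEYWORDS)

def dynamic_share(ptr_names) -> float:
    """Share of names that look dynamic; a missing PTR counts as not dynamic."""
    names = list(ptr_names)
    if not names:
        return 0.0
    return sum(1 for n in names if n and looks_dynamic(n)) / len(names)

# Usage (performs real DNS lookups):
# dynamic_share(ptr_lookup(ip) for ip in ["155.245.23.70"])
```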
24
Cross Analysis – Dynamism of IP Address
- Dynamic IP addresses are more vulnerable
- Type I botnets show no network preference
  - They scan the address space regardless of whether a victim uses a dynamic or static IP
- Type II botnets prefer dynamic addresses
  - Victims are most likely home users with less security awareness

Type         Dynamic IP   Static IP
Common       62%          38%
Type I       50.1%        49.9%
Type II      58.4%        41.6%
Type I EX    49.08%       50.92%
Type II EX   51.87%       48.13%
25
Victim Prediction
- K-Nearest Neighbour classification (k-NN)
  - A popular machine learning algorithm
- Type I botnets
  - Done in previous work, with >90% accuracy
  - Strong neighbourhood correlation due to network scanning
- Attempt k-NN for Type II botnets
26
Victim Prediction
- k-NN for Type II botnets
- Select the same features for the k-NN classifier
  - /24 subnet address
  - Physical location of infected networks
- Data preparation
  - Infected networks are labeled as malicious
  - Clean networks are labeled as benign, as determined by looking up several DNS blacklists
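A k-NN classifier over these two features can be sketched in a few lines. The feature encoding (the /24 prefix scaled to [0, 1) plus latitude/longitude scaled to [-1, 1]) and the Euclidean distance are assumptions for illustration; the slides do not say how the authors encoded or weighted the features. All helper names and training points are hypothetical.

```python
import ipaddress
import math
from collections import Counter

def features(subnet: str, lat: float, lon: float) -> tuple:
    """Hypothetical encoding: /24 prefix scaled to [0, 1) plus scaled coordinates."""
    net = ipaddress.ip_network(subnet)
    prefix = int(net.network_address) >> 8  # keep the 24 network bits
    return (prefix / 2**24, lat / 90.0, lon / 180.0)

def knn_predict(train, query, k: int = 3) -> str:
    """Majority label among the k training points nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set: infected networks are "malicious", blacklist-clean ones "benign".
train = [
    (features("81.2.1.0/24", 40.0, -74.0), "malicious"),
    (features("81.2.2.0/24", 40.1, -74.1), "malicious"),
    (features("81.2.3.0/24", 40.2, -74.0), "malicious"),
    (features("200.1.1.0/24", -23.5, -46.6), "benign"),
    (features("200.1.2.0/24", -23.6, -46.7), "benign"),
]
print(knn_predict(train, features("81.2.4.0/24", 40.05, -74.05)))
```

An odd k avoids ties in this two-class setting.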
27
Victim Prediction
- Around 88% accuracy
- Strong neighbourhood correlation
  - Type II infected networks depend heavily on who receives spam emails
  - Spammers specifically harvest email addresses
  - Recipients share similar locations, e.g., the same company or the same university

Botnet   K   Accuracy   False positive
MegaD    1   88.35%     7.35%
MegaD    3   88.25%     7.36%
MegaD    5   88.14%     7.54%
Srizbi   1   88.20%     6.23%
Srizbi   3   87.70%     6.04%
Srizbi   5   88.30%     5.77%
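The slides do not define the false positive column; the sketch below assumes the usual false positive rate FP / (FP + TN), i.e., the share of benign networks wrongly flagged as infected, alongside plain accuracy. The function name is hypothetical.

```python
def accuracy_and_fpr(y_true, y_pred, positive="malicious"):
    """Accuracy and false positive rate, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # benign network flagged as infected
        elif truth == positive:
            fn += 1      # infected network missed
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, fpr

print(accuracy_and_fpr(["malicious", "malicious", "benign", "benign"],
                       ["malicious", "benign", "malicious", "benign"]))
```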
28
Cross-Botnet Prediction
- Confirm the similarity between botnets
- Calculated the Manhattan distance between the distributions of the two types of botnets
  - The Manhattan distance between two items is the sum of the absolute differences of their values over all features
  - It is frequently used to indicate whether two data distributions are similar
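Written out for two distributions p and q over the same n buckets (how the buckets are chosen is not stated on the slides), the Manhattan distance is below. When both are probability distributions, the triangle inequality bounds it by 2, so values are read on a 0 (identical) to 2 (no shared buckets) scale.

```latex
d_1(p, q) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert
\;\le\; \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} q_i = 2
\qquad \text{when } \textstyle\sum_i p_i = \sum_i q_i = 1

% Worked example: p = (0.5, 0.5, 0), q = (0.2, 0.3, 0.5)
d_1(p, q) = 0.3 + 0.2 + 0.5 = 1.0
```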
29
Cross-Botnet Prediction
- Probability distribution of infected networks for Conficker, MegaD, and Srizbi
- Manhattan distance
  - Conficker / MegaD: 1.1427
  - Conficker / Srizbi: 1.1604
  - MegaD / Srizbi: 0.8404
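A minimal sketch of the comparison: turn each botnet's infected /24 networks into a probability distribution, then take the Manhattan distance. Bucketing by first octet (/8) is an assumption chosen to match the address-range view of the initial analysis; the slides do not say which buckets the authors used. The helper names and toy data are hypothetical.

```python
from collections import Counter

def prefix_distribution(networks) -> dict:
    """Probability distribution of /24 networks over first-octet (/8) buckets."""
    counts = Counter(int(net.split(".")[0]) for net in networks)
    total = sum(counts.values())
    return {octet: c / total for octet, c in counts.items()}

def manhattan(p: dict, q: dict) -> float:
    """L1 distance between two sparse distributions; missing buckets count as 0."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

a = prefix_distribution(["81.1.1.0/24", "81.2.2.0/24", "190.1.1.0/24", "200.0.0.0/24"])
b = prefix_distribution(["81.5.5.0/24", "190.9.9.0/24"])
print(manhattan(a, b))  # 0.5
```

Distributions that share no buckets at all reach the maximum distance of 2.0.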
30
Cross-Botnet Prediction
- Use currently available information to predict future, unknown botnets
- Use k-NN classification again, but with the training and testing sets taken from different botnets
- More than 83% accuracy
  - Even with no knowledge of a botnet, if it shares a similar infection vector with known botnets, we may be able to predict its unknown infected networks

Training   Testing   K   Accuracy   False positive
MegaD      Srizbi    1   87.80%     7.41%
MegaD      Srizbi    3   86.75%     7.49%
MegaD      Srizbi    5   86.45%     7.69%
Srizbi     MegaD     1   84.09%     6.53%
Srizbi     MegaD     3   83.89%     6.31%
Srizbi     MegaD     5   83.65%     5.09%
31
Conclusion
- Type I and Type II botnets have similar infection distributions
- Some countries are more vulnerable than others
- Infection can be largely influenced by:
  - Geographical location
  - Security education/knowledge
  - Malware owners' intentions (specific or general targeting)
  - Remote accessibility
  - Dynamism of IP addresses
- Prediction and classification of future malware is possible if it shares common traits with existing botnets
32
Criticism
- Limitations in the dataset can bias these evaluations
- Conclusions drawn from each hypothesis are largely speculative
- Assumes each /24 subnet has the same network access control policy
- Many evaluations rely on previous works, which may not be applicable or directly comparable due to differences in methodology
33
The End
Any questions?