
1 A MULTIFACETED APPROACH TO UNDERSTANDING THE BOTNET PHENOMENON (2006) Jonathan Brant CAP 6135 – Spring 2010 Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis Computer Science Department Johns Hopkins University

2 Overview  Introduction  Background  Measurement Methodology  Malware Collection  Graybox testing  Longitudinal Tracking of Botnets  Results and Analysis  Botnet Prevalence  Spreading Methods  Growth Patterns  Botnet Structures  Effective Botnet Size  Lifetime  “Insider’s view”  Conclusion

3 Introduction  Botnets – “networks of infected end-hosts that are under the control of a human operator”  Bots – end-hosts  Botmaster – human operator  Command and Control channels facilitate botmaster commands to bots in the botnet  Channels can use different communication mechanisms (e.g. P2P) Most modern botnets use Internet Relay Chat (IRC) Originally used to form large chat rooms

4 Introduction  Botnets almost always used for illegal activities  Extortion  E-mail spamming  Identity theft  Software piracy

5 Introduction  Paper attempts to address inquiries such as:  Number of botnet “species” Behavioral categorization of different species  Evolution of a botnet

6 Background: Step 1: Botnets commandeer victims by remotely exploiting a vulnerability in software running on the victim. Infection strategies include: self-replicating worms; e-mail viruses; social engineering (convincing victims to run malicious code on their machine).

7 Background  Step 2 – Victim executes shellcode and image of bot binary is fetched from location within botnet  When fetch is complete, the binary installs itself on target machine and automatically starts on each reboot

8 Background  Step 3 – Bot attempts to contact IRC server (address stored in executable)  Using a DNS name instead of IP address allows botmaster to retain control if IP is blacklisted by ISP

9 Background: Step 4: Bot attempts to establish IRC session and join C2 channel. Three authentication steps: (1) Bot authenticates itself using a PASS message, which carries the IRC session password. (2) Bot issues the C2 channel password; this password and the session password are both stored in the bot binary. (3) Botmaster authenticates to the bot population, which prevents other botmasters from seizing control of the botnet.
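The first two authentication steps can be sketched as the raw lines an IRC client sends when it registers and joins a keyed channel. This is a minimal illustration, not code from the paper: the function name `registration_lines` and every argument value are hypothetical, and the message layout follows the standard IRC client protocol (PASS, NICK, USER, then JOIN with a channel key).

```python
def registration_lines(session_password, nick, user, channel, channel_key):
    """Return the raw IRC lines for registering and joining a keyed channel.

    All names and values are illustrative; nothing here is taken from a
    real bot binary.
    """
    return [
        f"PASS {session_password}",       # step 1: IRC session password
        f"NICK {nick}",                   # nickname for the session
        f"USER {user} 0 * :{user}",       # username, mode, unused, real name
        f"JOIN {channel} {channel_key}",  # step 2: C2 channel plus its password
    ]


if __name__ == "__main__":
    for line in registration_lines("sessionpw", "bot-0001", "bot", "#c2", "chanpw"):
        print(line)
```

Step 3, the botmaster authenticating to the bots, happens inside the channel and depends on the bot family, so it is left out of the sketch.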

10 Background  Step 5 – Channel topic is parsed and executed  Contains default command that every bot executes  Future commands coming from botmaster can vary widely  Wide variety of available commands/responses increases difficulty of classifying botnet behaviors

11 Measurement Methodology  Data collection includes three phases:  Malware collection  Binary analysis via gray-box testing  Tracking of IRC botnets through IRC and DNS trackers

12 Measurement| Malware Collection  Distributed darknet  Locally deployed darknet Allocated but unused portion of IP address space  14 distributed nodes using PlanetLab testbed  Goal is to collect as many bot binaries as possible  Must support a wide array of data collection endpoints and be highly scalable

13 Measurement| Malware Collection  Modified nepenthes platform  Mimics replies generated by vulnerable services Collects first-stage exploit (shell-code)  Raw packets from PlanetLab nodes translated Using translation module written in Click  Packets were injected into local tunneling interface

14 Measurement | Malware Collection  On-line download modules in nepenthes disabled to prevent excessive downloads  Binaries retrieved by generating list of URL targets and sending to download station  Download station filtered entries in list and extracted unique sources/URLs

15 Measurement | Malware Collection: Honeynet catches exploits missed by nepenthes. Composed of honeypots running unpatched, virtual instances of Windows XP. Each honeypot is assigned a private static IP on a separate VLAN. Infected honeypots sustain IRC connections until the VMs are reimaged. Suspect binaries are retrieved by comparing VM contents to a clean Windows image.

16 Measurement | Malware Collection  Gateway routes darknet traffic to various parts on internal network  Half of darknet prefixes directed to local responder and other half to honeynet NAT used to map each honeypot to 128 darknet IP addresses
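One way to picture the 128-to-1 NAT mapping is as a lookup from a darknet address to the honeypot that answers for it. The sketch below assumes each honeypot owns a contiguous block of 128 darknet addresses starting at a base address; that layout, the function name `honeypot_for`, and all addresses are assumptions for illustration, since the slides do not say how the gateway assigns blocks.

```python
import ipaddress

ADDRESSES_PER_HONEYPOT = 128  # figure stated on the slide


def honeypot_for(darknet_ip, darknet_base, honeypots):
    """Map a darknet IP to a honeypot, assuming contiguous 128-address blocks.

    Returns None when the address falls outside the range covered by the
    available honeypots.
    """
    offset = int(ipaddress.ip_address(darknet_ip)) - int(ipaddress.ip_address(darknet_base))
    if offset < 0:
        return None
    index = offset // ADDRESSES_PER_HONEYPOT
    if index >= len(honeypots):
        return None
    return honeypots[index]


if __name__ == "__main__":
    pots = ["192.168.1.10", "192.168.2.10"]  # hypothetical private honeypot IPs
    print(honeypot_for("10.0.0.200", "10.0.0.0", pots))
```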

17 Measurement | Malware Collection: Gateway serves as a firewall preventing honeypots from conducting outbound attacks or infecting each other. Cross-infection is prevented by placing each honeypot on a separate VLAN and terminating cross-VLAN traffic. Outbound traffic is blocked on popular vulnerable ports (135, 139, 445, etc.).

18 Measurement | Malware Collection  Runs IRC detection module Application-level traffic searched for common IRC protocol strings NICK, JOIN, USER Once IRC connection witnessed, detection module establishes record for IRC session When honeypot attempts to reconnect, connection allowed to proceed to IRC server
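The string search performed by the detection module can be approximated as a check for IRC client commands at the start of a line in the application payload. This is a rough sketch under that assumption; the name `looks_like_irc` is hypothetical, and the actual module's matching rules are not given in the slides.

```python
# Commands named on the slide, with a trailing space so that an unrelated
# word such as "NICKEL" is not matched.
IRC_KEYWORDS = (b"NICK ", b"JOIN ", b"USER ")


def looks_like_irc(payload: bytes) -> bool:
    """Return True if any payload line begins with a common IRC client command."""
    for line in payload.splitlines():
        if line.lstrip().upper().startswith(IRC_KEYWORDS):
            return True
    return False


if __name__ == "__main__":
    print(looks_like_irc(b"NICK bot-0001\r\nUSER bot 0 * :bot\r\n"))
    print(looks_like_irc(b"GET / HTTP/1.1\r\nHost: example.org\r\n"))
```

Matching on content rather than on port 6667 matters here because bots frequently run IRC on non-standard ports.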

19 Measurement | Malware Collection  Detection module only allows one honeypot to connect to an IRC server at given point in time Gateway detects when honeypot is infected Rules inserted to block inbound attacks to that honeypot

20 Measurement | Malware Collection: Gateway also performs miscellaneous tasks: triggering honeypot re-imaging; loading clean Windows images; pre-filtering for download station; running local DNS server to resolve DNS queries from honeypots.

21 Measurement | Graybox Testing: Graybox testing is used to extract features of suspicious binaries. Analysis spans two distinct phases (performed on an isolated network segment): the first phase derives the network fingerprint of the binary; the second phase extracts the binary's IRC-specific features.

22 Measurement | Graybox Testing: Phase 1: creation of a network fingerprint. Server acts as network sink, so all network activity initiated by malware is detected. Traffic logs are automatically processed to extract a network fingerprint with four fields: DNS (targets of DNS requests); IPs (destination IP addresses); Ports (contacted ports and protocols); Scan (whether or not default scanning behavior was detected). Default scanning behavior is any attempt to contact more than 20 distinct destinations on the same port during the monitored period.
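The scan flag in the network fingerprint follows directly from the stated rule: more than 20 distinct destinations on one port. A minimal sketch, assuming the traffic log has already been reduced to (destination IP, destination port) pairs; the function name and the input shape are illustrative choices, not the authors' tooling.

```python
from collections import defaultdict

SCAN_THRESHOLD = 20  # "more than 20 distinct destinations on the same port"


def shows_default_scanning(flows, threshold=SCAN_THRESHOLD):
    """Return True if any port was contacted on more than `threshold` distinct hosts.

    `flows` is an iterable of (dst_ip, dst_port) pairs from the monitored period.
    """
    destinations_by_port = defaultdict(set)
    for dst_ip, dst_port in flows:
        destinations_by_port[dst_port].add(dst_ip)
    return any(len(hosts) > threshold for hosts in destinations_by_port.values())


if __name__ == "__main__":
    sweep = [(f"10.0.0.{i}", 445) for i in range(1, 22)]  # 21 distinct hosts
    print(shows_default_scanning(sweep))
```

Repeated contacts to the same host do not count twice, because destinations are stored in a set per port.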

23 Measurement | Graybox Testing  Phase 2: Extraction of IRC-related features  Modified version of UnrealIRC daemon instantiated on network sink  IRC listens on all ports ever observed in network fingerprint  Upon detecting an IRC connection, IRC-fingerprint is created PASS – initial password to establish IRC session NICK – nickname USER – username MODE – modes set JOIN – IRC channels to be automatically joined (and their associated passwords)
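The IRC fingerprint can be thought of as the parameters of the five listed commands, captured from whatever the bot sends to the sink's IRC server. The sketch below assumes the session has already been split into text lines; `irc_fingerprint` and the sample values in the usage example are hypothetical.

```python
FINGERPRINT_COMMANDS = ("PASS", "NICK", "USER", "MODE", "JOIN")


def irc_fingerprint(lines):
    """Collect the parameters of each fingerprint command seen in a session.

    Returns a dict mapping each command to the list of parameter strings
    observed, in order. Other commands are ignored.
    """
    fingerprint = {command: [] for command in FINGERPRINT_COMMANDS}
    for line in lines:
        command, _, params = line.strip().partition(" ")
        command = command.upper()
        if command in fingerprint:
            fingerprint[command].append(params)
    return fingerprint


if __name__ == "__main__":
    session = ["PASS sessionpw", "NICK bot-0001", "USER bot 0 * :bot",
               "MODE bot-0001 +i", "JOIN #c2 chanpw"]
    print(irc_fingerprint(session))
```

Keeping JOIN parameters whole preserves both the channel name and its password, which a tracker needs in order to rejoin the same channel.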

24 Measurement | Graybox Testing  (Phase 2 continued…)  To learn botnet “dialect”, bot connects to local IRC server and enters default channel IRC query engine plays role of botmaster Bot behavior is learned by subjecting it to series of commands Command set includes: IRC commands observed in honeynet traces Commands extracted from publicly available bot source code

25 Measurement | Longitudinal Tracking  Botnet tracking is performed by two means:  The use of a custom, lightweight IRC tracker  Probing DNS caches across the globe

26 Measurement | Longitudinal Tracking  IRC Tracker  “A modified IRC client that can join a specified IRC channel and automatically answer directed queries based on the template created by the graybox testing technique”  IRC tracker instantiates new IRC session to IRC server using fingerprint and template IRC trackers need to appear responsive

27 Measurement | Longitudinal Tracking  In order to appear “real”, the following must be performed: Traffic filtered so inappropriate information is not included in template Filtering performed automatically while bot is executing Computer specifications (e.g. memory, disk space) are changed to resemble specifications of a real machine IRC query engine issues a set of commands that require stateful responses Emulates a bot’s stateful software

28 Measurement | Longitudinal Tracking  DNS Tracking  Most bots issue DNS queries to resolve IP addresses of IRC servers  Caches of DNS servers are probed to determine number of DNS servers giving cache hits “Cache hit” implies at least one client queried DNS server during lifetime of its DNS entry

29 Measurement | Longitudinal Tracking: Original list contained 1.6 million DNS servers. First filter removed servers in top-level domains such as .gov and .mil. Second filter checked consistency of replies using two consecutive DNS queries: the first query was recursive and forced the DNS server to completely resolve the query; the second query was non-recursive and obtained a local answer from the server cache; the TTL field in the second response should be smaller than in the first. After filtering, the master list consisted of 800,000 name servers. For a given IRC server, the caches of all DNS servers were probed and any associated cache hits recorded.
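The two filters can be sketched as a single decision over one candidate server, given the TTLs returned by the recursive and the non-recursive query. This is an illustration under stated assumptions: the function name `keep_server` is hypothetical, only the two excluded domains named on the slide are listed, a missing answer is represented as `None`, and no DNS traffic is actually sent.

```python
# Only the two domains named on the slide; the full exclusion list is not given.
EXCLUDED_SUFFIXES = (".gov", ".mil")


def keep_server(hostname, recursive_ttl, cached_ttl):
    """Decide whether a name server stays on the probing list.

    recursive_ttl: TTL from the first (recursive) query, or None if unanswered.
    cached_ttl:    TTL from the second (non-recursive) query, or None if the
                   server returned nothing from its cache.
    """
    if hostname.lower().rstrip(".").endswith(EXCLUDED_SUFFIXES):
        return False  # first filter: excluded top-level domains
    if recursive_ttl is None or cached_ttl is None:
        return False  # no consistent pair of replies to compare
    # Second filter: a cached answer should have a smaller remaining TTL.
    return cached_ttl < recursive_ttl


if __name__ == "__main__":
    print(keep_server("ns1.example.com", 3600, 3590))
```

A decreasing TTL is the signal that the second answer really came from the cache rather than from a fresh resolution, which is what makes later cache-hit probes meaningful.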

30 Results and Analysis  Results include:  Traffic traces captured on local darknet 3 month period  IRC logs gathered 3 month period  DNS cache hit results from tracking 65 IRC servers 45 day period

31 Results| Botnet Prevalence  Botnet Traffic share  Two week snapshot of total incoming SYN packets to local darknet vs. packets originating from botnet spreaders A botnet spreader is any source that delivered a bot executable  27% of incoming SYNs attributed to botnet spreaders  76% come from botnet spreaders if target ports considered

32 Results| Botnet Prevalence  More than 90% of all traffic during peaks targeted ports used by botnet spreaders  More than 70% of sources during peak periods sent shell exploits  This suggests the total amount of botnet-related traffic is far greater than 27%

33 Results | Botnet Prevalence: 11% (85,000) of probed servers were involved in at least one botnet activity. 55% of servers in the dataset are for .com domains, and 82% of DNS cache hits came from name servers in that domain. 29% of .com servers had at least 1 cache hit. .cn servers were only 0.2% of total servers, but 95% of them exhibited botnet activity.

34 Results | Spreading Methods: Botnets use a variety of means to spread and recruit new victims: email; web; active scanning (most prevalent). Botnets can be grouped into two types: Type I (worm-like) continuously scan ports following a target selection algorithm; Type II (variable scanning behavior) use a number of scanning algorithms (uniform, non-uniform, localized).

35 Results|Spreading Methods  192 botnets captured  34 botnets were Type-I Upon infection, bot starts scanning IP space for new victims Initiates connection to IRC servers (identified by hard-coded list of DNS names) All IRC servers/channels bots tried to join were unreachable Channel was banned by public IRC server DNS name did not resolve to valid IP address Still, botnet grew over time due to persistence of scanning

36 Results | Spreading Methods: Type-II botnets were the most prevalent class. Scanning is triggered by a command. More difficult to track due to continuously changing behavior. Localized and targeted scanning were the most prevalent techniques: localized scanning focused on Class B address space; targeted scanning focused on Class A address space.

37 Results|Growth Patterns  In order to examine botnet growth patterns, two approaches were taken:  Cumulative number of unique DNS cache hits for distinct botnets over time was plotted  Growth pattern was compared to behavior learned from IRC tracker
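The growth curve described here is a running count of distinct DNS servers that have ever reported a cache hit for the botnet's IRC server name. A small sketch, assuming observations arrive as (day, DNS server) pairs; the function name and the toy data in the usage example are made up for illustration.

```python
def cumulative_unique_hits(observations):
    """Return [(day, cumulative distinct servers with a cache hit)] sorted by day.

    `observations` is an iterable of (day, dns_server) pairs, one per cache hit.
    """
    servers_by_day = {}
    for day, server in observations:
        servers_by_day.setdefault(day, set()).add(server)

    seen = set()
    curve = []
    for day in sorted(servers_by_day):
        seen |= servers_by_day[day]  # only servers not seen before raise the count
        curve.append((day, len(seen)))
    return curve


if __name__ == "__main__":
    toy = [(1, "ns-a"), (1, "ns-b"), (2, "ns-a"), (2, "ns-c"), (3, "ns-c")]
    print(cumulative_unique_hits(toy))
```

The curve never decreases, so its slope at any point reflects how quickly the botnet is reaching networks it had not touched before.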

38 Results | Growth Patterns: Botnets with semi-exponential growth patterns exhibit persistent random scanning activity (unchanging over time). Example: for one botnet, the topic of the corresponding channel stayed set for one month to a command to randomly scan port 445 indefinitely. Related to worm infections.

39 Results|Growth Patterns  Also representative of botnets with intermittent activity profiles  Example: Botnet III corresponds to botnet that infected honeypots on 3/13/2006 IRC server went down between 4/12/2006 – 4/30/2006 When IRC server became available, growth slope increased and honeypots were re-infected by the same botnet

40 Results|Growth Patterns  Predominantly used time-scoped scanning commands  As opposed to continuous scanning like the previous two

41 Results|Growth Patterns  Botnet evolution estimated by counting unique sources for message broadcast to the channel  Only plotted botnets of comparable size on a given plot  Trends confirm heterogeneity in botnets

42 Results | Botnet Structures: 60% of 318 collected malicious binaries were IRC bots. Four predominant IRC structures were revealed: (1) All bots connected to a single IRC server; prevalent among smaller classes of botnets (few hundred users); 70% of observed botnets fell into this category. (2) IRC servers connected to form an IRC network supporting large numbers of users; 30% of botnets bridged on multiple servers; 50% bridged between two servers only. (3) Seemingly unrelated botnets that appear more similar when comparing their naming conventions, channel names, and operators' user IDs; these botnets may belong to the same botmaster. (4) Selected group of bots commanded to download an updated binary, which results in bots being moved to a different IRC server.

43 Results | Effective Botnet Size: Botnet footprint can become fairly large (> 15,000 bots). Predominant structures were botnets managed by a single or few servers. Distinction drawn between a botnet's footprint and its effective size, the number of bots connected to the IRC channel at a given time.

44 Results | Effective Botnet Size  Some “chatty” IRC servers broadcast join/leave information for members on channel  Number of online bots versus time for these IRC servers is plotted in figure 9  Maximum size of online population is significantly smaller than botnet’s footprint  Footprint greater than 10,000  No more than 3,000 bots online at the same time  Effective size has little impact on long term activity, however, it affects number of bots available to execute commands in a timely manner
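The gap between footprint and effective size can be reproduced from a join/leave log of the kind a "chatty" server broadcasts. The sketch below takes footprint to mean every distinct nickname ever seen on the channel and reports the peak number online at once; treating a nickname as one bot, the function name, and the toy log are all assumptions for illustration.

```python
def footprint_and_peak_online(events):
    """Return (footprint, peak simultaneous population) from a channel log.

    `events` is a time-ordered iterable of (nick, action) pairs where action
    is "join" or "part". Nicknames stand in for individual bots.
    """
    ever_seen = set()
    online = set()
    peak = 0
    for nick, action in events:
        if action == "join":
            ever_seen.add(nick)
            online.add(nick)
        elif action == "part":
            online.discard(nick)
        peak = max(peak, len(online))
    return len(ever_seen), peak


if __name__ == "__main__":
    log = [("a", "join"), ("b", "join"), ("a", "part"),
           ("c", "join"), ("b", "part"), ("d", "join")]
    print(footprint_and_peak_online(log))
```

In the toy log four bots appear overall but never more than two are online together, the same shape as the slide's figures of over 10,000 versus at most 3,000.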

45 Results | Lifetime  Discrepancy between footprint and effective size likely due to the long lifetime of a typical botnet  Bot death rates and high churn rates can affect botnet’s effective size

46 Results | Lifetime  High churn rates  Bots do not stay long on IRC channel Average stay time: 25 minutes 90% stay less than 50 minutes  Likely causes include  Client instability (as a result of infection)  Machine hibernation  Botmasters commanding bots to leave the channel
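The two churn figures, average stay time and the share of stays under 50 minutes, are simple summaries over per-bot channel sessions. A brief sketch, assuming stay times have already been extracted in minutes; the function name and the sample numbers are invented and do not reproduce the paper's data.

```python
from statistics import mean


def stay_time_summary(stay_minutes, cutoff=50):
    """Return (average stay in minutes, fraction of stays shorter than cutoff)."""
    share_below = sum(1 for minutes in stay_minutes if minutes < cutoff) / len(stay_minutes)
    return mean(stay_minutes), share_below


if __name__ == "__main__":
    sample = [10, 20, 30, 40, 100]  # made-up stay times, in minutes
    print(stay_time_summary(sample))
```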

47 Results | Botnet Software Taxonomy  183 of 192 confirmed IRC-based bot executables responded to probes of IRC query engine  49% of bots run AV/FW killer – a utility that disables anti-virus and firewall processes  43% run identd server which performs user identification Ensures only intended bots join a given IRC channel  40% run system security monitor which tightens bot security E.g. disables DCOM service and file sharing  38% run a registry monitor which alerts the bot of any attempts to disable it

48 Results | Botnet Software Taxonomy  Number of exploits within bot binaries varied from 3 to 29  Average of 15 exploits per binary  Most popular exploits (appeared in over 75% of binaries) DCOM135 LSASS445 NTPASS

49 Results | Botnet Software Taxonomy  Authors evaluated effectiveness of ClamAV and Norton anti-virus on 192 malicious binaries  ClamAV classified 137 binaries as malicious  Norton anti-virus classified 179 binaries as malicious  Windows XP service pack 2 still not immune
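Expressed as detection rates over the 192 binaries, the counts above work out as follows. The helper name `detection_rate` is illustrative; the inputs are the figures from the slide.

```python
def detection_rate(detected, total):
    """Return the percentage of binaries flagged as malicious."""
    return 100.0 * detected / total


if __name__ == "__main__":
    print(round(detection_rate(137, 192), 1))  # ClamAV
    print(round(detection_rate(179, 192), 1))  # Norton
```

That is roughly 71.4% for ClamAV and 93.2% for Norton, so 55 and 13 binaries respectively went undetected.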

50 Results | “Insider’s view”  Traces show that:  Botmasters share information concerning what prefixes should not be scanned  Bots are tweaked to minimize chatter on C2 channel  Bots are probed to detect and isolate “misbehavers” Also look for “super-bots” with high bandwidth network links and large storage capacities

51 Results | “Insider’s view”  Bots migrate from one IRC channel to another, instructed by:  Command from botmaster  Download of replacement software that points to a different C2 server

52 Results | “Insider’s view”  Control commands include channel joins and leaves  Mining category includes commands that collect machine specifications  Attack category includes commands from botmasters to attack other network computers

53 Results | “Insider’s view”  Small botnets receive larger portion of control and mining commands  Hands-on botmasters that devote large amounts of time to manually control their botnet  Medium and large botnets have a larger percentage of cloning and download commands  Cloning could include the use of one botnet to attack another botnet by overloading its IRC server with join requests

54 Conclusion  Botnets are a major contributor to overall unwanted internet traffic  Most botnet traffic can be attributed to scans used to recruit new bots  IRC is still the dominant protocol used for C2 communications  Effective sizes of botnets can range from a few hundred to a few thousand  Botnet footprints are usually much larger than effective size This is due to high churn rate within a botnet Bot’s average channel occupancy is less than half an hour  Graybox testing revealed sophistication of modern bot software E.g. Self-protection measures

55 Contributions: Established empirical measurements for botnet prevalence, particularly in considering DNS cache hits by IRC botnets that were tracked. Classified typical characteristics of bot binaries: registry monitoring tactics; locking down host vulnerabilities. Classified most prevalent botnet activities as a function of botnet size. Delineated between botnet footprint and “effective size.” Large experiment samples further solidified results.

56 Critique  Focused mainly on Windows-based systems  It would be interesting to see the effectiveness of noted infection strategies on Unix systems  Only evaluated two anti-virus applications  Perhaps include other popular anti-virus applications McAfee, Symantec Corporate, AVG, etc.  Authors noted 60% of binaries collected were IRC bots  Did the other 40% use a different communication mechanism? If so, it would be interesting to know how they were structured and if the authors evaluated them in any way

57 References [1] Rajab, M. A., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil.

