1 Using Failure Information Analysis to Detect Enterprise Zombies
Zhaosheng Zhu, Vinod Yegneswaran, Yan Chen
Lab of Internet and Security Technology, Northwestern University; SRI International
2 Motivation
- Increasing prevalence and sophistication of malware
- Current solutions are a day late and a dollar short
  - NIDS
  - Firewalls
  - AV systems
- Conficker is a great example!
  - Over 10M hosts infected across variants A/B/C
3 Related Work
- BotHunter [Usenix Security 2007]
  - Dialog Correlation Engine to detect enterprise bots
  - Models lifecycle of bots: Inbound Scan / Exploit / Egg download / C&C / Outbound Scans
  - Relies on Snort signatures to detect different phases
- Rishi [HotBots 07]: Detects IRC bots based on nickname patterns
- BotSniffer [NDSS 08]
  - Uses spatio-temporal correlation to detect C&C activity
- BotMiner [Usenix Security 08]
  - Combines clustering with BotHunter and BotSniffer heuristics
- Focus on successful bot communication patterns
4 Objective and Approach
- Develop a complement to existing network defenses to improve their resilience and robustness
  - Signature independent
  - Malware family independent: no prior knowledge of malware semantics or C&C mechanisms needed
  - Malware class independent (detects more than bots)
- Key idea: Failure Information Analysis
  - Observation: malware communication patterns result in abnormally high failure rates
  - Correlates network and application failures at multiple points
5 Outline
- Motivations and Key Idea
- Empirical Failure Pattern Study: Malware and Normal Applications
- Netfuse Design
- Evaluations
- Conclusions
6 Malware Failure Patterns
- Empirical survey of 32 malware instances with long-lived traces (5 to 8 hours)
  - SRI honeynet, spamtrap and Offensive Computing
  - Spyware, HTTP botnet, IRC botnet, P2P botnet, Worm
- Application protocols studied: DNS, HTTP, FTP, SMTP, IRC
- 24/32 generated failures
- 18/32 generated DNS failures
  - Mostly NXDOMAINs
  - DNS failures are part of normal behavior for some bots, such as Kraken and Conficker, which generate a new list of C&C rendezvous points every day
7 Malware Failure Patterns (2)
- SMTP failures are part of most spam bots
  - Storm, Bobax, etc.
  - 550: recipient address rejected
- HTTP failures
  - Generated by worms: Virut (DoS attacks) and Weby
  - Weby contacts remote servers to get configuration info
- IRC failures
  - Channel removed from a public IRC server
  - Channel is full due to too many bots
8 Table: failure rates per malware sample
Columns: Malware | Class | DNS rate | HTTP rate | ICMP rate | SMTP rate | TCP rate
- SPYWARE: Look2me, Wsnpoem
- HTTP BOTNET: Bobax, Kraken
- IRC BOTNET: Agobot, Gobot, Sdbot I+II, Spybot I/II/III, Wootbot, Webloit
- P2P BOTNET: Nugache, Storm I/II
- WORM: Allaple, Grum, Kwbot, Mytob, Netsky, Protoride, Virut, Weby
9 Normal Applications Studied
- Webcrawler: news.sohu.com, amazon.com, bofa.com, imdb.com
- P2P: BitTorrent, eMule
- Video: Youtube.com
- HTTP 304/Not Modified errors whitelisted
10 Normal Applications Studied (2)
- For video traffic, no transport-layer failures
  - At the application level, only “HTTP 304/Not Modified” failures
11 Normal Application Failure Patterns
Columns: Application | HTTP hourly rate | ICMP | TCP (# ports / hourly rate)
- Sohu.com, Amazon.com, Imdb.com, Bofa.com (surviving values: /0.041/1.41/0.21/0.9)
- BitTorrent, eMule (surviving values: /333839/370)
12 Empirical Analysis Summary
- High-volume failures are good indicators of malware
- DNS failures (NXDOMAIN messages) are common among malware
- Malware failures tend to be persistent
- Malware failure patterns tend to be repetitive (low entropy), while those of normal applications are not
13 Outline
- Motivations and Key Idea
- Empirical Failure Pattern Study: Malware and Normal Applications
- Netfuse Design
- Evaluations
- Conclusions
14 Netfuse Design
- Netfuse: a behavior-based network monitor
  - Correlates network and application failures
  - Wireshark and L7 filters for protocol parsing
  - Multi-point failure monitoring
- Netfuse components
  - FIA (Failure Information Analysis) Engine
  - DNSMon
  - SVM-based Correlation Engine
  - Clustering
15 Multi-point Deployment
[Figure labels: Enterprise Network; DNSMon; Gateway; FIA; Failure Scores; SVM Correlation; Clustering]
16 FIA Engine
- Wireshark: open source protocol analyzer / dissector
  - Analyzes online and offline pcap captures
  - Supports most protocols
  - Uses port numbers to choose dissectors
- Augment Wireshark with L7 protocol signatures
  - Automated decoding with payload signatures
  - Sample signature for HTTP:
    http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9] [\x09-\x0d -~]*(connection:|content-type:|content-length:|date:)|post [\x09-\x0d -~]* http/[01]\.[019]
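A minimal sketch of how a payload signature like the HTTP sample could identify a protocol independently of the port number. It assumes the character classes in the signature read `[\x09-\x0d -~]` and `[01]`, and that matching is case-insensitive and unanchored, as is conventional for L7-filter patterns. The function name `looks_like_http` and the sample payloads are hypothetical and are not part of Netfuse.

```python
import re

# HTTP payload signature from the slide, compiled as a bytes pattern.
# Assumption: case-insensitive, unanchored matching against flow payload.
HTTP_SIG = re.compile(
    rb"http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9] [\x09-\x0d -~]*"
    rb"(connection:|content-type:|content-length:|date:)"
    rb"|post [\x09-\x0d -~]* http/[01]\.[019]",
    re.IGNORECASE,
)


def looks_like_http(payload: bytes) -> bool:
    """Return True if the payload matches the HTTP signature (hypothetical helper)."""
    return HTTP_SIG.search(payload) is not None


# Illustrative payloads: an HTTP error response, a POST request, and IRC traffic.
response = b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
post = b"POST /gate.php HTTP/1.1\r\nHost: example.invalid\r\n\r\n"
irc = b"NICK bot123\r\nUSER a b c d\r\n"
```

Note that the signature covers HTTP responses and POST requests only, so a bare GET request would not match it on its own.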
17 DNSMon
- DNS servers are typically located inside enterprise networks
- Suspicious domain lookups cannot be tracked back to the original clients from gateway traces
  - Especially true for NXDOMAIN lookups
  - DNS caching
- DNSMon tracks traffic between clients and the resolving DNS server
  - More comprehensive view of failure activity
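As an illustration of why monitoring between clients and the resolver helps, the sketch below attributes NXDOMAIN responses to the client that issued the lookup. The record layout `(client_ip, qname, rcode)` and the function name are assumptions made for this example, not DNSMon's actual interface. RCODE 3 is the DNS "Name Error" response code.

```python
from collections import defaultdict

NXDOMAIN = 3  # DNS RCODE 3 (Name Error)


def nxdomain_by_client(responses):
    """Group failed lookups by client.

    responses: iterable of (client_ip, qname, rcode) tuples observed between
    clients and the resolving DNS server (hypothetical record format).
    Returns a dict mapping client_ip to the list of names that failed.
    """
    failed = defaultdict(list)
    for client, qname, rcode in responses:
        if rcode == NXDOMAIN:
            # Normalize case and the trailing root dot before recording.
            failed[client].append(qname.lower().rstrip("."))
    return dict(failed)


# Illustrative observations: one client with repeated failed lookups.
observed = [
    ("10.0.0.5", "qwxzy.example.", 3),
    ("10.0.0.5", "ABCDE.example.", 3),
    ("10.0.0.7", "www.example.com.", 0),
]
per_client = nxdomain_by_client(observed)
```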
18 Correlation Engine
- Integrates four failure scores
  - Composite Failure Score
  - Failure Divergence Score
  - Failure Entropy Score
  - Failure Persistence Score: malware failures tend to be long-lived
- SVM-based correlation using Weka
19 Composite Failure Score
- Estimates severity of each host based on failure volume
- Consider hosts with
  - a large number of application failures (e.g., > 15 per min), or
  - TCP RST or ICMP failures > 2 std. dev. from the mean of all hosts
- Compute a weighted failure score based on the failure frequency of each protocol
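The host-selection step can be sketched as follows. This covers only the filtering rule; the slide does not give the weighting formula, so none is shown. Assumptions in this sketch: TCP RST and ICMP failures are combined into a single count per host, the population standard deviation is used, and only hosts above the mean are flagged. The function and parameter names are hypothetical.

```python
from statistics import mean, pstdev

APP_FAILURES_PER_MIN = 15  # example threshold from the slide
STD_DEVS = 2               # network-failure threshold from the slide


def candidate_hosts(app_rate, net_count):
    """Select hosts that pass either filter.

    app_rate:  host -> application failures per minute
    net_count: host -> TCP RST + ICMP failure count (assumed combined)
    """
    counts = list(net_count.values())
    if counts:
        cutoff = mean(counts) + STD_DEVS * pstdev(counts)
    else:
        cutoff = float("inf")  # no network data: the second filter never fires

    selected = set()
    for host in set(app_rate) | set(net_count):
        if app_rate.get(host, 0) > APP_FAILURES_PER_MIN:
            selected.add(host)
        elif net_count.get(host, 0) > cutoff:
            selected.add(host)
    return selected


# Illustrative data: host "a" has many application failures, host "f" is a
# network-failure outlier relative to the other hosts.
app = {"a": 20, "b": 1, "c": 0}
net = {"a": 5, "b": 5, "c": 5, "d": 5, "e": 5, "f": 100}
flagged = candidate_hosts(app, net)
```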
20 Failure Persistence Score
- Motivated by the observation that malware failures tend to be long-lived
- Split the time horizon into N parts and compute the number of parts where a failure occurs
  - In our experiments N = 24
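A minimal sketch of the persistence computation, assuming failures are given as timestamps and the horizon is split into N equal-width intervals. Dividing the count by N to obtain a value in [0, 1] is an assumption of this sketch; the slide only says to count the parts in which a failure occurs.

```python
def persistence_score(failure_times, start, end, n_parts=24):
    """Fraction of the n_parts equal-width intervals in [start, end)
    that contain at least one failure timestamp."""
    if end <= start:
        raise ValueError("end must be greater than start")
    width = (end - start) / n_parts
    hit = set()
    for t in failure_times:
        if start <= t < end:
            # Clamp to the last interval to guard against float rounding.
            hit.add(min(int((t - start) / width), n_parts - 1))
    return len(hit) / n_parts


# One day split into 24 hourly parts; failures fall in hours 0, 1 and 23.
DAY = 86400
score = persistence_score([10, 20, 3700, 86399], 0, DAY)
```

A host that fails in every hour of the day scores 1.0, while a short burst confined to a single hour scores 1/24.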
21 Failure Divergence Score
- Measures the degree of uptick in a host’s failure profile
- Newly infected hosts would demonstrate strong and positive dynamics
- EWMA algorithm
  - α = 0.5
- For each host, protocol and date, compute the difference between the expected and actual value
- Add the divergence of each protocol for that host
- Normalize by dividing by the maximum divergence value over all hosts
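The divergence steps can be sketched as follows, under several assumptions that the slide does not spell out: the EWMA is seeded with the first day's count, the divergence is taken for the most recent day only, and negative differences (fewer failures than expected) are clipped to zero. The function names and input layout are hypothetical.

```python
ALPHA = 0.5  # smoothing factor from the slide


def expected_today(history, alpha=ALPHA):
    """EWMA forecast from past daily failure counts (oldest first)."""
    if not history:
        return 0.0
    estimate = float(history[0])
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def divergence_scores(daily_counts):
    """daily_counts: host -> protocol -> daily failure counts, last entry = today.
    Returns host -> divergence normalized by the maximum over all hosts."""
    raw = {}
    for host, per_proto in daily_counts.items():
        total = 0.0
        for counts in per_proto.values():
            if not counts:
                continue
            *history, today = counts
            # Assumption: only upticks contribute to the score.
            total += max(today - expected_today(history), 0.0)
        raw[host] = total
    top = max(raw.values(), default=0.0)
    return {h: (v / top if top > 0 else 0.0) for h, v in raw.items()}


# Illustrative data: h1 shows a sharp DNS uptick, h2 a milder one.
scores = divergence_scores({
    "h1": {"dns": [2, 2, 2, 50], "http": [1, 1, 1, 1]},
    "h2": {"dns": [4, 4, 4, 16]},
})
```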
22 Failure Entropy Score
- Measures the degree of diversity in a host’s failure profile
- Malware failures tend to be redundant (low diversity)
  - TCP: track server/port distribution of each client receiving failures
  - DNS: track domain name diversity
  - HTTP/SMTP/FTP: track failure types and host names
  - Ignore ICMP
- Compute weighted average failure entropy score
  - Protocols that dominate the failure volume of a host get higher weights
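A minimal sketch of a weighted entropy score along these lines. It assumes Shannon entropy over each protocol's failure keys, normalized by log2 of the number of failures so that each per-protocol value lies in [0, 1], and it weights each protocol by its share of the host's failure volume. The normalization, the key layout, and the function names are assumptions of this sketch.

```python
import math
from collections import Counter


def normalized_entropy(items):
    """Shannon entropy of the item distribution, scaled to [0, 1]."""
    n = len(items)
    if n < 2:
        return 0.0
    counts = Counter(items)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def failure_entropy_score(failures):
    """failures: protocol -> list of failure keys, e.g. (server, port) for TCP,
    domain names for DNS, (failure type, host name) for HTTP/SMTP/FTP.
    ICMP is ignored, as on the slide."""
    considered = {p: k for p, k in failures.items() if p.lower() != "icmp" and k}
    total = sum(len(k) for k in considered.values())
    if total == 0:
        return 0.0
    # Weight each protocol by its share of the host's failure volume.
    return sum(len(k) / total * normalized_entropy(k) for k in considered.values())


# Illustrative hosts: one repeating a single failed lookup, one failing
# against four distinct servers, and one in between.
repetitive = failure_entropy_score({"dns": ["a.example"] * 8, "icmp": ["x", "y"]})
diverse = failure_entropy_score({"tcp": [("s1", 80), ("s2", 80), ("s3", 443), ("s4", 80)]})
mixed = failure_entropy_score({"dns": ["a", "a", "b", "b"], "http": [("404", "x")] * 4})
```

Under these assumptions a host that repeats the same failure scores near 0, matching the low-diversity pattern described for malware.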
23 Outline
- Motivations and Key Idea
- Empirical Failure Pattern Study: Malware and Normal Applications
- Netfuse Design
- Evaluations
- Conclusions
24 Evaluation Traces
- Malware I: 24 malware traces from the failure pattern study
- Malware II: 5 new malware families (Peacomm, Mimail, Rbot, Bifrose, Kraken) + 3 trained families
  - Run for 8 to 10 hours each
- Malware III: 242 traces selected from 5000 malware sandbox traces based on duration and trace size
- Institute Traces: benign traces from a well-administered Class B (/16) network with hundreds of machines (5-day and 12-day)
25 Evaluation Methodology
[Figure labels: 5-day Institute Trace; 12-day Institute Trace; Malware Trace I; Training / Testing; Malware Trace II: Testing; Malware Trace III: Testing]
26 Detection Rate
27 False Positive Rate
28 Performance Summary
- Detection rate > 92% for traces I/II
- Detection rate under 40% for trace III
  - Trace includes many types of malware, including adware with failure patterns similar to benign applications
  - Traces are short, many under 15 mins
- False positive rate < 5%
29 Clustering Results
- Cluster detected hosts based on their failure profile
- 24 instances belong to 8 different types of malware
Malware | pkts | clustered | rate
- Peacomm | | 3/3 | 100%
- Bifrose | 30635 | 3/3 | 100%
- Mimail | | 3/3 | 100%
- Kraken | 49505 | 3/3 | 100%
- Sdbot | | 3/3 | 100%
- Spybot | 79750 | 3/3 | 100%
- Rbot | | 3/3 | 100%
- Weby | 9000 | 3/3 | 100%
30 Conclusions
- Failure Information Analysis
  - Signature-independent methodology for detecting infected enterprise hosts
- Netfuse system
  - Four components: FIA Engine, DNSMon, Correlation Engine, Clustering
- Correlation metrics: Composite Failure Score, Divergence Score, Failure Entropy Score, Persistence Score
- Useful complement to existing network defenses