A Lustrum of Malware Network Communication: Evolution and Insights


A Lustrum of Malware Network Communication: Evolution and Insights CAMPBELL FOSKIN

Background
- Malware analysis is at the forefront of the fight against Internet threats.
- Over the last decade, many systems have been proposed to statically and dynamically analyze malicious software and produce detailed behavioral reports.
- The vast amounts of data collected provide important reputation information about both IP and domain name system (DNS) infrastructure, which plays an important role in the state-of-the-art detection engines used by the security industry.
- Earlier studies focused on narrower topics such as the role of cloud providers, the infrastructure behind drive-by downloads, or the domains used by a few malware families.
- Despite all this data, little is known about how the infrastructure and methods used by Internet miscreants have evolved over time.

Overview of study
- Conducted a five-year longitudinal study of dynamic analysis traces.
- Analyzed more than 26.8 million unique malware samples and five billion DNS queries.
- Data was collected from multiple malware feeds (two commercial and one academic), providing a unique look at malware networks.
- Largest malware classification effort to date: classified malware samples into families, differentiated PUPs, and correlated domains with malware families.
- First study comparing the network properties of PUP and malware domains.

Data collection
- Malware executions of no more than 5 minutes each.
- Malware samples from 3 datasets, excluding samples with no valid DNS resolution during their execution.
- Passive DNS from a large ISP.
- Timestamped blacklists.
- VirusTotal: analyzes files and URLs submitted by users, scans them with multiple AV engines, and offers an API to query metadata on a malware sample using the sample's hash.
- DGArchive: dataset of domains from reverse-engineered malware that uses a domain generation algorithm (DGA).

Domain filtering
- Two problems: not everything submitted is malicious, and malware does not interact exclusively with malicious infrastructure. Most domains queried by malware are benign (95%).
- Non-malicious samples: removed samples from the VirusTotal dataset that were not flagged as malicious by AV vendors.
- Invalid domains: non-existent domains generated by DGAs (domain generation algorithms).
- Benign domains: the hardest category, caused by use of legitimate services such as Dropbox, Internet connectivity tests, downloads, and email spam. Removed domains in the Alexa top 10,000 (except dynamic DNS domains), then manually sifted through the most popular domains in the dataset (mostly removing CDNs). To remove spam bots, filtered out any MX lookups and domains with mail keywords (mail, imap, etc.).
- Reverse delegation zones (DNS pointer records): these appear when a program connects directly to an IP address without performing a DNS resolution of a service's domain name. Removed benign PTRs, excluding zones used by large ISPs and hosting providers.
- Because most queried domains are benign, this has implications for blacklist approaches based on dynamic analysis, and the filtering takes a lot of manual work.
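The filtering steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input sets, and the two-label e2LD shortcut are all assumptions, and the reverse-zone (PTR) step is left out.

```python
MAIL_KEYWORDS = {"mail", "imap"}  # the slide lists "mail, imap etc"; the full list is not given


def naive_e2ld(domain):
    """Assumption: treat the last two labels as the e2LD (ignores suffixes like co.uk)."""
    return ".".join(domain.split(".")[-2:])


def filter_domains(queries, popular, dynamic_dns, nxdomains):
    """Return the domains that survive the invalid/benign filters.

    queries: iterable of (domain, query_type) pairs seen in sandbox runs.
    popular: set of popular e2LDs (for example a top-10,000 list).
    dynamic_dns: set of dynamic DNS e2LDs that are kept even if popular.
    nxdomains: set of domains that never resolved.
    """
    kept = []
    for domain, qtype in queries:
        domain = domain.lower().rstrip(".")
        if domain in nxdomains:
            continue  # invalid domain, e.g. an unregistered DGA name
        if qtype == "MX":
            continue  # mail lookups, typical of spam bots
        if any(label in MAIL_KEYWORDS for label in domain.split(".")):
            continue  # domain carries a mail keyword
        e2ld = naive_e2ld(domain)
        if e2ld in popular and e2ld not in dynamic_dns:
            continue  # popular and not a dynamic DNS zone: treated as benign
        kept.append(domain)
    return kept
```

A real implementation would use a public-suffix-aware e2LD extractor, and would still need the manual review step the slide describes.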

Classification of domains
- Sample classification: AVClass, an open-source tool for massive malware labeling. It removes noise from AV labels by addressing label normalization, generic token detection, and alias detection.
- PUP/malware family classification: a PUP is a potentially unwanted program. AVClass differentiates between PUP and malware by examining keywords. However, the classification is conservative.
- e2LD classification: a mapping from each e2LD (effective second-level domain) to the family it most likely belongs to.
  - Create a mapping from each e2LD to the number of samples of each family that have resolved it.
  - Assign each e2LD to the family with the most samples resolving it.
  - e2LDs with fewer than 10 samples resolving them are left unclassified.
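The e2LD-to-family assignment described above amounts to a majority vote with a minimum-support threshold. A minimal sketch, assuming each record is a hypothetical `(sample_hash, family, e2ld)` tuple and each sample has exactly one family label:

```python
from collections import defaultdict


def classify_e2lds(resolutions, min_samples=10):
    """Map each e2LD to the family with the most distinct samples resolving it.

    resolutions: iterable of (sample_hash, family, e2ld) tuples.
    e2LDs resolved by fewer than min_samples samples map to None (unclassified).
    """
    per_e2ld = defaultdict(lambda: defaultdict(set))
    for sample, family, e2ld in resolutions:
        per_e2ld[e2ld][family].add(sample)

    labels = {}
    for e2ld, families in per_e2ld.items():
        total = sum(len(samples) for samples in families.values())
        if total < min_samples:
            labels[e2ld] = None  # too little evidence
        else:
            # Ties go to whichever family was seen first (an arbitrary choice).
            labels[e2ld] = max(families, key=lambda f: len(families[f]))
    return labels
```

The threshold of 10 comes from the slide; how ties are broken is not stated there.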

MALWARE DOMAIN ANALYSIS
- Malware uses network communication to exfiltrate data, communicate with C&C servers, or download payloads.
- Malware often uses DNS to avoid IP blacklisting.
- Studied domain queries to determine the temporal DNS properties of malware samples.
- Dynamic malware analysis: find trends and identify the DNS queries used by malware.
- Passive DNS and blacklist analysis: determine whether DNS queries are malicious by comparing them with other datasets.

Dynamic Malware Analysis: Domain polymorphism
- Most malware uses different domains over time to avoid DNS blacklisting.
- Most domains are used only once, by a single malware sample.
- Blacklisting the malware domains observed during dynamic analysis therefore does little to prevent future communication from newly discovered malware samples, so relying on DNS queries from analyzed malware is not a viable threat mitigation.
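One way to measure this kind of polymorphism is to count how many distinct samples query each domain. The sketch below is illustrative only; the input shape (a dict from sample hash to the set of domains it queried) is an assumption.

```python
from collections import defaultdict


def single_use_fraction(sample_domains):
    """Fraction of domains that are queried by exactly one sample.

    sample_domains: dict mapping sample hash -> set of queried domains.
    """
    users = defaultdict(set)
    for sample, domains in sample_domains.items():
        for domain in domains:
            users[domain].add(sample)
    if not users:
        return 0.0
    single = sum(1 for samples in users.values() if len(samples) == 1)
    return single / len(users)
```

A value close to 1.0 means a blacklist built from past executions will rarely match the domains of a new sample.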

Dynamic Malware Analysis: Dynamic DNS
- Allows nameservers to be automatically updated with frequently changing information.
- Many publicly available services provide this functionality.
- Used by 32% of all malware samples with DNS queries.
- Dynamic DNS domains are commonly used across many malware samples, and evasion is performed on the child label of the domain.
- Zone-level blocking also blocks legitimate users.

Dynamic Malware Analysis: CDNs
- CDNs serve content from multiple, geographically distributed data centers to provide increased performance and availability.
- Widely used on the Internet: they reduce the effect of outages, improve performance, etc.
- Massive discrepancy between the most and least queried CDNs; the top ones are some of the most common and popular CDNs.
- The large number of child labels, combined with potentially benign usage, allows malicious content hosted in a CDN to effectively hide in plain sight.

Passive DNS and Blacklists Analysis
- Correlated domains gathered from dynamic analysis with three datasets:
  - Passive DNS dataset from a US ISP
  - Public DNS-based blacklists
  - Set of domain expiration events
- Determined the lag between when a domain is discovered and when it is first resolved in passive DNS.
- Determined the effectiveness of public blacklists.

First appearance
- Aim: determine the effectiveness of public blacklists.
- Entries are added to blacklists through manual inspection or dedicated servers.
- Only 30% of the entries were already in blacklists before the dynamic analysis.
- 20% were reported with a delay of over 500 days.
- This suggests that delays could be reduced by relying on malware analysis like this to populate domain blacklists.
- Even then some domains would be missed for weeks, since many were active for weeks before being analyzed.
- 95% of queried domains are benign, so this would still take a lot of manual work.
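The blacklist comparison can be sketched as a date difference per domain. This is a simplified illustration under stated assumptions: both inputs are hypothetical dicts from domain to `datetime.date`, and the delay is taken as blacklist date minus dynamic-analysis date, which may not match the exact definition used in the study.

```python
from datetime import date


def blacklist_delays(analyzed_on, blacklisted_on):
    """Days from dynamic analysis to blacklisting, for domains that were ever listed.

    A negative value means the domain was already blacklisted before analysis.
    """
    return {
        domain: (blacklisted_on[domain] - seen).days
        for domain, seen in analyzed_on.items()
        if domain in blacklisted_on
    }


def summarize(delays, late_threshold_days=500):
    """Return (share already listed, share listed later than the threshold)."""
    if not delays:
        return 0.0, 0.0
    already = sum(1 for d in delays.values() if d <= 0)
    late = sum(1 for d in delays.values() if d > late_threshold_days)
    return already / len(delays), late / len(delays)
```

The 500-day default mirrors the figure quoted on the slide; everything else is a placeholder.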

Domain lifetime
- Lifetime: the difference between the first-seen and last-seen dates of a domain in passive DNS.
- All three types of domains (malware, PUP, and unclassified) frequently have long lifetimes.
- Three hotspots correspond to the most prevalent resolution behaviors for domains in malware and unclassified malicious software:
  - Bottom left: a large number of domains that are short-lived and rarely resolved.
  - Top right: long-lived and frequently resolved.
  - Bottom right: long lifetime but infrequent resolution.
- Because unclassified domains show the same hotspots as malware domains, the unclassified domains are likely all malware.
- PUPs have been rising in prevalence over the last 2-3 years, so their domains sit at the lower end of the lifetime axis.
- The diagonal shows intense and continuing resolution of PUP domains, a result of organizations failing to block PUP domains.
- Since most domains are resolved only once during dynamic analysis, the long lifetimes mean that many samples remain active on the Internet for extended periods of time.
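The lifetime and resolution-count axes can be derived from passive DNS observations along these lines. A minimal sketch, assuming records arrive as hypothetical `(domain, date)` pairs with one pair per observed resolution:

```python
from datetime import date


def domain_lifetimes(pdns_records):
    """Return {domain: (lifetime_in_days, resolution_count)}.

    Lifetime is last-seen minus first-seen in passive DNS.
    """
    stats = {}
    for domain, day in pdns_records:
        first, last, count = stats.get(domain, (day, day, 0))
        stats[domain] = (min(first, day), max(last, day), count + 1)
    return {
        domain: ((last - first).days, count)
        for domain, (first, last, count) in stats.items()
    }
```

Plotting lifetime against resolution count for each domain class would give the kind of density view the slide describes.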

INFRASTRUCTURE ANALYSIS
- Analyzed the hosting infrastructure of domains used by malware, focusing on IP ranges over time.
- Looked at the IP subnets resolved by samples over the 5 years, counting the samples with domains resolving on each subnet in each year.
- Assigned spikes to malware/PUP families using the e2LD-to-family mapping created during classification.

- PUP families used stable infrastructure over time, which indicates that popular cloud hosting providers do not ban them as they do malware, or are more lenient toward them.
- Sinkholes: for example, the Microsoft-led initiative on domains using no-ip DNS.
- A third group of spikes is caused by multiple malware families rather than a single one. These likely correspond to hosting providers that had a more open policy on acceptable behavior during that timeframe, so the families move around different subnets each year.
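Counting samples per subnet and year can be sketched with the standard `ipaddress` module. The /24 prefix length and the `(sample_hash, ip, year)` record shape are assumptions for illustration; the slides do not state the subnet size used.

```python
import ipaddress
from collections import defaultdict


def samples_per_subnet_year(resolutions, prefix_len=24):
    """Count distinct samples whose domains resolved into each (subnet, year).

    resolutions: iterable of (sample_hash, ip_string, year) tuples.
    """
    buckets = defaultdict(set)
    for sample, ip, year in resolutions:
        # strict=False lets a host address stand in for its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        buckets[(str(net), year)].add(sample)
    return {key: len(samples) for key, samples in buckets.items()}
```

Subnets whose count stays high across several years would correspond to the stable infrastructure noted for PUP families.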

Domain generation algorithms
- Reintroduced failed resolutions into the dataset.
- Checked them against DGArchive.
- 44% (3 M) of e2LDs in malware executions were generated by DGAs.
- This is a lower bound, since DGArchive will miss some families.
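The DGArchive check is essentially a set-membership test. A minimal sketch, assuming the known DGA domains are available as a plain set of strings (the real DGArchive interface is not shown in the slides):

```python
def dga_share(e2lds, known_dga_domains):
    """Fraction of observed e2LDs that appear in a set of known DGA domains."""
    observed = set(e2lds)
    if not observed:
        return 0.0
    return len(observed & set(known_dga_domains)) / len(observed)
```

Because any DGA family missing from the reference set goes uncounted, the result is a lower bound, as the slide notes.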

Criticisms and improvements
- US-centric: the study used US ISP data. Although this is hard to avoid, it would be good to see whether trends differ in other parts of the world.
- Long-running or dormant malware was not analyzed and might behave differently.
- Static analysis could help where a DGA is not used, and could help catch long-running or dormant samples.
- Prototype a general-use tool that uses this technique.
- Benign network communications may have patterns of behavior: what is downloaded, how often, at what times, and whether at regular intervals.
- ML classification could assist in analysis and in classifying families and behaviors by training a model on known malware behavior. For example, many unclassified samples behave more like malware than like PUPs; ML could confirm this with a probability.

Conclusions
- Collected, filtered and analyzed 26.8 million records from dynamic malware executions.
- Made several observations about the behavior and temporal properties of the malware domains used by these samples.
- PUPs are becoming more common and use stable infrastructure: several hundred thousand PUP samples use the same network infrastructure over an entire year, and they are not treated the same as most malware.
- Malware detection based on network communications is only marginally effective: domains are not detected until weeks, or even months, after they become active.
- Most queried domains are benign, so filtering takes a lot of manual work.