Project: Statistical Analysis of DNS Abuse in gTLDs (SADAG). Consortium: SIDN and TU Delft. Requested by: Competition, Consumer Choice, and Trust Review Team. SIDN and TU Delft have formed a consortium to perform this study, which was requested by this review team.
Goal: Comprehensive statistical comparison of rates of DNS abuse in new and legacy gTLDs: spam, phishing, malware. Statistical analysis of potential relationship with abuse drivers: DNSSEC; other drivers as identified by future Review Teams. Count and time-based gTLD-level security metrics. Comprehensive descriptive statistical comparison of rates of DNS abuse in new and legacy gTLDs. Inferential statistical analyses testing driving factors of rates of abuse (e.g. DNSSEC deployment rate).
Motivation: The New Generic Top-Level Domain (gTLD) Program enabled hundreds of new generic top-level domains. Safeguards built into the Program are intended to mitigate rates of abusive, malicious, and criminal activity in these new gTLDs. The motivation for this study was that the New gTLD Program added hundreds of new gTLDs to the root, and that safeguards are built into the Program to mitigate rates of abuse.
Data Providers: Blacklists. Anti Phishing Working Group (APWG): phishing URLs. StopBadware: malware URLs. SURBL (4 blacklists): phishing domains, spam domains, malware domains. To perform this study we are using well-known, reputable data providers. APWG uses accredited reporters such as Facebook.
Data Providers: Blacklists: Spamhaus, CleanMX (3 feeds). Phishing domains, phishing URLs, malware URLs, other.
Data Providers: WHOIS data: Whois XML API (domain data for all new gTLDs and a subset of legacy gTLDs); DomainTools (providing missing domains). Domain data: zone files per gTLD, per day, over a 3-year period. The WHOIS data contains every delegated new gTLD and the most important legacy gTLDs. We suspect the WHOIS provider takes a snapshot of all existing domains at the start of the scanning period and that this list is not updated with new domains during the scanning period. We might therefore be missing maliciously registered, short-lived domains that are deleted quickly after registration; we will use DomainTools (DT) data to fill the gap.
Security metrics: Distribution of malicious content*: number of unique domains, e.g. malicious.com; number of FQDNs, e.g. connect.secure.wellsfargo.malicious.com, bankofamerica.com.malicious.com, (…); number of URLs, e.g. malicious.com/wp-content/file.php, malicious.com/wp-content/gate.php, (…).
* “Reputation Metrics Design to Improve Intermediary Incentives for Security of TLDs”, Maciej Korczyński, Samaneh Tajalizadehkhoob, Arman Noroozian, Maarten Wullink, Cristian Hesselman, and Michel van Eeten, in the IEEE European Symposium on Security and Privacy (Euro S&P)
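The three occurrence metrics can be illustrated with a minimal sketch. This is not the study's implementation: the function names are hypothetical, and the registered domain is approximated by keeping the last two labels of the host name, which only holds for single-label TLDs such as .com (a real pipeline would need a public suffix list).

```python
from urllib.parse import urlsplit


def registered_domain(fqdn: str) -> str:
    """Naive approximation: keep the last two labels (assumes a single-label TLD)."""
    labels = fqdn.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def occurrence_metrics(urls):
    """Count unique registered domains, FQDNs, and URLs in a list of blacklisted URLs."""
    domains, fqdns, unique_urls = set(), set(), set()
    for url in urls:
        # Blacklist entries often lack a scheme; add one so urlsplit can find the host.
        parts = urlsplit(url if "://" in url else "http://" + url)
        host = (parts.hostname or "").rstrip(".")
        if not host:
            continue
        fqdns.add(host)
        domains.add(registered_domain(host))
        unique_urls.add(host + parts.path)
    return {"domains": len(domains), "fqdns": len(fqdns), "urls": len(unique_urls)}


# Sample entries modeled on the examples in the slide.
sample = [
    "connect.secure.wellsfargo.malicious.com/login",
    "bankofamerica.com.malicious.com/login",
    "malicious.com/wp-content/file.php",
    "malicious.com/wp-content/gate.php",
]
metrics = occurrence_metrics(sample)
print(metrics)
```

In this sample a single registered domain hosts three FQDNs and four URLs, which is the kind of difference the FQDN and URL metrics are meant to expose.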
Security metrics for gTLDs: Phishing domains, FQDNs, and URLs (APWG) per legacy gTLDs. We first present the three occurrence security metrics that provide insight into the distribution of abuse across legacy gTLDs (Figure 7) and new gTLDs (Figure 8) over time. We aggregate the phishing incidents on a quarterly basis (x-axis) and present the results using a logarithmic scale (y-axis). Note that the observed “decrease” in the number of abused domains, FQDNs, and URLs (paths) in the fourth quarter of 2015 is caused by changes in the organization of the APWG URL blacklists and not by a decrease in criminal activity. As explained in Section III, starting from September 2015, Facebook data, which represented a significant part of the URLs, was excluded from the feed. We observe a significant difference between the three metrics based on the concentration of abused domains, FQDNs, and URLs blacklisted by APWG. This is because the second and third metrics are mainly affected by legitimate services such as file storage web services or popular URL shortening services [30]. For example, in our previous work [30], we found 44,856 unique *.s3.amazonaws.com FQDNs, which correspond to an online file storage web service offered by Amazon Web Services (AWS), and 377,726 unique t.co/* URLs, where t.co is a popular URL shortener operated by Twitter. The results confirm that the two complementary occurrence metrics (number of FQDNs and URLs) are useful and reveal information that is not captured by the number of unique abused domains.
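The quarterly aggregation described in these notes can be sketched as follows. This is an illustrative assumption about how incidents might be bucketed, not the study's code: the function names and the (date, item) input format are hypothetical, and an item is counted once per quarter in which it appears. Plotting on a logarithmic y-axis is left out.

```python
from collections import defaultdict
from datetime import date


def quarter_label(day: date) -> str:
    """Map a date to a label such as '2015Q4'."""
    return f"{day.year}Q{(day.month - 1) // 3 + 1}"


def unique_per_quarter(incidents):
    """incidents: iterable of (date, item) pairs -> {quarter: number of unique items}."""
    buckets = defaultdict(set)
    for day, item in incidents:
        buckets[quarter_label(day)].add(item)
    return {quarter: len(items) for quarter, items in sorted(buckets.items())}


# Hypothetical blacklist observations (date first seen, abused domain).
incidents = [
    (date(2015, 8, 3), "malicious.com"),
    (date(2015, 9, 14), "malicious.com"),
    (date(2015, 9, 20), "third-example.org"),
    (date(2015, 11, 2), "another-example.net"),
    (date(2016, 1, 20), "malicious.com"),
]
per_quarter = unique_per_quarter(incidents)
print(per_quarter)
```

The same function works for FQDNs or URLs by passing those as the second element of each pair.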
Security metrics for gTLDs: Phishing domains, FQDNs, and URLs (APWG) per legacy gTLDs. The three measures reflect attackers’ profit-maximizing behavior: they abuse free, legitimate services and affect the reputations of those services.
Security metrics for gTLDs: Phishing domains (APWG) per new and legacy gTLDs
Security metrics for gTLDs: Phishing domains (CleanMX ph) per new and legacy gTLDs
Security metrics for gTLDs: Phishing domains (SURBL ph) per new and legacy gTLDs
Security metrics for gTLDs: Malware domains (SURBL mw) per new and legacy gTLDs
Security metrics for gTLDs: Malware domains (CleanMX mw) per new and legacy gTLDs. While the number of abused domains remains approximately constant in legacy gTLDs, we observe a clear upward trend in the absolute number of phishing and malware domains in new gTLDs.
Security metrics for gTLDs: Spam domains (Spamhaus) per new and legacy gTLDs
Security metrics for gTLDs: Spam domains (SURBL ws) per new and legacy gTLDs. The absolute number of spam domains in new gTLDs was higher than in legacy gTLDs at the end of 2016.
Security metrics for gTLDs: Size matters! Phishing domains (APWG) per new and legacy gTLDs
Size: Size estimate: number of 2nd-level domains in each gTLD zone file. Rates: (#blacklisted domains / #all domains) * 10,000. For compromised domains, the size of a TLD can be interpreted as the “attack surface” size for cybercriminals. For malicious registrations, the TLD size can serve as a proxy for the “popularity” of the TLD. What makes it popular?
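A minimal sketch of the rate formula, (#blacklisted domains / #all domains) * 10,000. The function name, the TLD labels, and the counts below are hypothetical and chosen only to show why size matters.

```python
def abuse_rate_per_10k(blacklisted: int, zone_size: int) -> float:
    """Blacklisted domains per 10,000 domains in the zone file."""
    if zone_size <= 0:
        raise ValueError("zone_size must be positive")
    return blacklisted / zone_size * 10_000


# Hypothetical (blacklisted domains, zone size) pairs, not figures from the study.
zones = {
    "legacy-example": (1_200, 120_000_000),
    "new-example": (300, 1_500_000),
}
rates = {tld: abuse_rate_per_10k(b, n) for tld, (b, n) in zones.items()}
for tld, rate in rates.items():
    print(f"{tld}: {rate:.2f} blacklisted domains per 10,000")
```

In this made-up example the smaller zone has a quarter of the blacklisted domains but a rate about twenty times higher, which is the effect that normalizing by size is meant to reveal.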
Abuse rates: Time series of abuse rates of phishing domains in legacy gTLDs and new gTLDs based on the APWG feed
Abuse rates: Time series of abuse rates of phishing domains in legacy gTLDs and new gTLDs based on the APWG feed. .com (82.5%), .net, .org, .info, and .biz legacy gTLDs
Abuse rates: Time series of abuse rates of phishing domains in legacy gTLDs and new gTLDs based on the APWG feed. .com (82.5%), .net, .org, .info, and .biz legacy gTLDs. Other datasets confirm this. The top 5 most abused new gTLDs collectively accounted for 58.7% of all blacklisted domains in all new gTLDs.
Abuse rates: Time series of abuse rates of malware domains in legacy gTLDs and new gTLDs based on the StopBadware feed. For malware, the rates in legacy and new gTLDs are generally close.
Abuse rates: Time series of abuse rates of spam domains in legacy gTLDs and new gTLDs based on the Spamhaus feed
Compromised and maliciously registered domains: Distinguishing between compromised and maliciously registered domains is critical because they require different mitigation actions by different intermediaries.
Assumption: maliciously registered domains are involved in criminal activity within a short time after registration.
Other heuristics: a domain name containing a brand name or a misspelled version of it indicates malicious registration; URLs indicating compromised content management systems; etc.
Definitions:
Maliciously registered domain: domain registered by a miscreant for malicious purposes.
Compromised domain: domain registered by a legitimate user and hacked by a miscreant.
Third-party domains: legitimate services that tend to be misused by miscreants (e.g. file sharing services, blog post services, URL shortening services).
For compromised domains, the TLD size could be interpreted as the “attack surface” size for cybercriminals. For malicious registrations, the TLD size could serve as a proxy for the “popularity” of the TLD. What makes it popular?
Limitation: missing WHOIS data, maliciously registered domains that become involved in criminal activity only a longer time after registration, or delayed blacklisting.
Solution: a more advanced machine learning approach (requires more “features” and “ground truth” data).
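The assumption and heuristics above can be sketched as a simple rule-based classifier. Everything specific here is a labeled assumption rather than the study's method: the 90-day threshold, the brand list, the CMS path markers, the order in which the rules are applied, and the function name are all hypothetical. Misspelled brand names are not handled.

```python
from datetime import date
from typing import Optional

MAX_DAYS_TO_ABUSE = 90  # hypothetical cutoff for "a short time after registration"
BRANDS = ("wellsfargo", "bankofamerica")  # illustrative brand strings only
CMS_MARKERS = ("/wp-content/", "/wp-includes/")  # illustrative signs of a hacked CMS


def classify(domain: str, registered: Optional[date], blacklisted: date, url: str = "") -> str:
    """Label a blacklisted domain as maliciously registered, compromised, or unknown."""
    label = domain.lower().split(".")[0]
    if any(brand in label for brand in BRANDS):
        return "maliciously registered"  # brand string in the registered name
    if any(marker in url.lower() for marker in CMS_MARKERS):
        return "compromised"  # URL path points at a content management system
    if registered is None:
        return "unknown"  # no WHOIS registration date (the limitation noted above)
    if (blacklisted - registered).days <= MAX_DAYS_TO_ABUSE:
        return "maliciously registered"  # abused shortly after registration
    return "compromised"


seen = date(2016, 5, 1)
print(classify("abc123.science", date(2016, 4, 20), seen))
print(classify("oldshop.org", date(2012, 1, 1), seen))
```

A rule set like this misclassifies malicious registrations that are used late or blacklisted late, which is why the slide points to a machine learning approach with more features and ground truth data.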
Compromised domains: Rates of abused domains in legacy gTLDs (StopBadware URL blacklists) are driven by compromised domains.
Maliciously registered domains: Rates of abused domains in new gTLDs (StopBadware URL blacklist) are driven by maliciously registered domains, and can be driven by single campaigns (domains registered in bulk, common patterns in domain names).
Privacy or Proxy Services: Why use privacy or proxy (PP) services? Protecting your personal data; blocking spam; stopping unwanted solicitations. Analyzing the use of PP services: extract the list of registrants; keyword search using “privacy”, “proxy”, “protect”, etc.; manual inspection. How many? We found 570 PP services, identified as combinations of registrant name and organization.
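The keyword search step can be sketched as below. The record field names (registrant_name, registrant_organization) and the sample records are hypothetical, only the three keywords named on the slide are used, and the output is a candidate list that would still need the manual inspection the slide describes.

```python
import re

# Keywords named on the slide; the slide's "etc" implies more were used.
PP_KEYWORDS = re.compile(r"privacy|proxy|protect", re.IGNORECASE)


def candidate_pp_services(whois_records):
    """Return unique (registrant name, organization) pairs that match a keyword."""
    candidates = set()
    for record in whois_records:
        name = (record.get("registrant_name") or "").strip()
        org = (record.get("registrant_organization") or "").strip()
        if PP_KEYWORDS.search(name) or PP_KEYWORDS.search(org):
            candidates.add((name, org))
    return sorted(candidates)


# Hypothetical WHOIS records for illustration.
records = [
    {"registrant_name": "Domain Admin", "registrant_organization": "Example Privacy Service"},
    {"registrant_name": "WHOIS PROXY", "registrant_organization": None},
    {"registrant_name": "Jane Doe", "registrant_organization": "Example Bakery"},
    {"registrant_name": "Domain Admin", "registrant_organization": "Example Privacy Service"},
]
candidates = candidate_pp_services(records)
print(candidates)
```

Treating the (name, organization) pair as the unit mirrors the slide's statement that PP services are combinations of registrant name and organization.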
Privacy or Proxy Services Image source: https://www.name.com/whois-privacy
Privacy or Proxy Services: Usage for newly created domains per month. Legacy gTLDs: 24%. New gTLDs: 19%, with a high standard deviation.
Privacy or Proxy Services: StopBadware. Example for StopBadware (sbw), showing increased use of PP services.
Privacy or Proxy Services Spamhaus
Geographical Location: Using the domain registrar location from WHOIS (registrant details are not reliable).
Method:
1) Extract every unique "registrar name" attribute from the WHOIS data.
2) Using an automated process, combine the extracted "registrar name" attribute with the country information for ICANN-Accredited Registrars, available from the ICANN website [42].
3) Manually match remaining name variants (the automated process is not able to match every registrar name variant to a country) to their corresponding countries.
4) Manually look up the country information for registrars that could not be found automatically (not every registrar is accredited by ICANN) using publicly available information from the corporate website of the registrar or domain industry websites [43].
Result: 5,985 registrars, 99.99% of domains.
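Steps 1 and 2 of the method can be sketched as follows, with the names left unmatched being the input to the manual steps 3 and 4. The normalization rules, the list of legal suffixes, the function names, and the sample registrars are assumptions made for illustration; the source does not describe how its automated matching works.

```python
import re

# Hypothetical list of company-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "limited", "co", "corp", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes so name variants compare equal."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(token for token in cleaned.split() if token not in LEGAL_SUFFIXES)


def match_registrars(whois_names, accredited):
    """Split unique WHOIS registrar names into matched {name: country} and unmatched."""
    index = {normalize(name): country for name, country in accredited.items()}
    matched, unmatched = {}, []
    for name in sorted(set(whois_names)):  # step 1: unique registrar names
        country = index.get(normalize(name))  # step 2: automated lookup
        if country is None:
            unmatched.append(name)  # left for manual matching (steps 3 and 4)
        else:
            matched[name] = country
    return matched, unmatched


# Hypothetical accredited-registrar list and WHOIS values.
accredited = {"Example Registrar, Inc.": "US", "Sample Domains Ltd": "GB"}
whois_names = ["EXAMPLE REGISTRAR INC", "Sample Domains Limited", "Unlisted Names GmbH"]
matched, unmatched = match_registrars(whois_names, accredited)
print(matched, unmatched)
```

Aggressive normalization can merge distinct registrars with similar names, so the manual review in steps 3 and 4 remains necessary.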
Geographical Location WHOIS registrar distribution
Geographical Location Country distribution
Geographical Location: SURBL, legacy vs. new gTLDs. The registrar location of abused domains differs between legacy and new gTLDs: the USA accounts for a low percentage of new gTLD abuse, while Gibraltar accounts for a large share because of a single registrar, Alpnames.
Registrar Reputation: Method: filter out registrars designed for sinkholing domains; count the number of incidents per registrar; calculate the percentage of total abuse linked to each registrar. Note: sinkholing of confiscated abusive domains or preventive registration of botnet C&C infrastructure domains is a common practice, and special registrars have been created for this purpose, e.g. "Afilias Special Projects" or "Verisign Security and Stability".
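The three method steps can be sketched as below. The function name, the (domain, registrar) input format, and the sample data are hypothetical; the sinkhole list contains only the two registrars named in the note, and computing the percentage over the incidents that remain after filtering is an assumption.

```python
from collections import Counter

# The two sinkholing registrars named in the note; a real list may be longer.
SINKHOLE_REGISTRARS = {"Afilias Special Projects", "Verisign Security and Stability"}


def registrar_abuse_share(incidents):
    """incidents: iterable of (domain, registrar) -> {registrar: (count, percent)}."""
    counts = Counter(
        registrar for _, registrar in incidents if registrar not in SINKHOLE_REGISTRARS
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reg: (n, 100.0 * n / total) for reg, n in counts.most_common()}


# Hypothetical blacklisted domains and their registrars.
incidents = [
    ("a.example", "Registrar A"),
    ("b.example", "Registrar A"),
    ("c.example", "Registrar A"),
    ("d.example", "Registrar B"),
    ("e.example", "Afilias Special Projects"),
    ("f.example", "Verisign Security and Stability"),
]
shares = registrar_abuse_share(incidents)
for registrar, (count, percent) in shares.items():
    print(f"{registrar}: {count} incidents, {percent:.1f}% of total")
```

Excluding the sinkholing registrars first keeps domains that were confiscated or registered defensively from being counted against a registrar's reputation.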
Registrar Reputation: SURBL, Alpnames example. Shows that after termination of the Registrar Accreditation Agreement (RAA), the abuse goes down.
Registrar Reputation: Nanjing Imperiosus Technology Co. Ltd. Zoom in on Nanjing Imperiosus Technology Co. Ltd.: after termination of the Registrar Accreditation Agreement (RAA), the abuse goes down. The two big spikes in domains registered over time are for .top and .science.
Schedule: Final report available July 2017. Incorporate WHOIS data from DomainTools. Inferential analysis of potential relationship with abuse drivers (regression analysis of abuse in gTLDs).
Questions?