1
Spamscatter: Characterizing Internet Scam Hosting Infrastructure
David S. Anderson, Chris Fleizach, Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego
Usenix Security 2007, Aug. 9th, 2007
Section: Introduction
2
Motivation
70 billion spam messages are sent every day for a simple reason: to advertise websites
A scam, then, is any website marketed using spam
This online resource is directly implicated in the spam profit cycle, which makes it rarer and more valuable
Characterizing the scam infrastructure helps
– Reveal the dynamics and business pressures exerted on spammers
– Identify means to reduce unwanted sites and spam
3
Spamscatter Approach
Mine a large quantity of spam
– Extract URLs
– Probe machines hosting the scams
This works because URLs must be correct: the link has to reach the scam site for the spam to pay off
– Follow the scent of money…
All we need is a reliably large source of spam
– We have access to a four-letter top-level domain receiving 150K spam messages per day
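The "Extract URLs" step can be sketched as follows. This is a minimal illustration, not the authors' extractor: the regular expression `URL_RE`, the function name `extract_hosts`, and the sample message are all hypothetical, and real spam would also need MIME decoding and de-obfuscation that this sketch ignores.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an http/https URL runs until whitespace or a common delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)

def extract_hosts(message: str) -> set[str]:
    """Return the lowercased hostnames of all URLs found in a message body."""
    hosts = set()
    for url in URL_RE.findall(message):
        host = urlsplit(url).hostname  # None if the URL has no host part
        if host:
            hosts.add(host.lower())
    return hosts

# Made-up message body using reserved example domains.
sample = ('Cheap software! Visit http://Example.com/buy?id=1 or '
          '<a href="http://shop.example.net/">here</a>.')
print(sorted(extract_hosts(sample)))
```

The hostnames collected this way would then be the targets for probing.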
4
Understanding scams
Are scams distributed across different servers?
Do different scams share the same server?
How long do scams stay active?
How reliable is their hosting?
Where are scam servers located?
Why is it useful to study these characteristics?
5
Spamscatter and the Scam
Section: Methodology
6
Methodology
Data collection
– Extract links from large spam feed
– Probe links every 3 hours for 7 days
– Record browser redirection
– Save screenshots
Analysis
– Identify scams across servers and domains
– Report on distributed and shared infrastructure, lifetime, stability, and location
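The probing cadence above (every 3 hours for 7 days) can be written out as a schedule. This is only a sketch: the function name `probe_schedule`, the start date, and the choice to exclude a final probe at exactly the 7-day mark are assumptions, not details from the talk.

```python
from datetime import datetime, timedelta

def probe_schedule(first_seen, interval_hours=3, days=7):
    """Return probe times every `interval_hours`, starting at `first_seen`
    and stopping before `days` days have elapsed."""
    step = timedelta(hours=interval_hours)
    end = first_seen + timedelta(days=days)
    times = []
    t = first_seen
    while t < end:
        times.append(t)
        t += step
    return times

# Arbitrary placeholder start time, not a date from the study.
schedule = probe_schedule(datetime(2000, 1, 1, 0, 0))
print(len(schedule), schedule[0], schedule[-1])
```

Under these assumptions each link is probed 56 times (8 probes per day for 7 days).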
7
Identifying Scams
Goal: Identify multiple hosts in the same scam, since many scams are spread across different IPs and domain names
Naïve approaches:
1. Correlate independent spam emails
2. Use HTML content returned from the web server
Limitations:
– Spam has too much chaff and obfuscation
– HTML is uninteresting and mostly composed of images
– Web crawlers fail with frames, iframes, and JavaScript
8
Image Shingling
Solution: Use rendered screenshots of web pages for correlation
– How to compare upwards of 10,000 images?
Image shingling – based on the text shingling idea [BRO97]
– Fragment images into blocks and hash the blocks
– Two images are similar if T% of the hashed blocks are the same (T = 70-80%)
– Shingling allows us to essentially compare all images in O(N lg N)
– Resilient to small variations among images
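The block-and-hash idea above can be sketched on toy data. Everything specific here is an assumption made for illustration: images are plain 2D lists of integers rather than real screenshots, blocks are 4x4, SHA-256 is the hash, and similarity is the number of shared block hashes divided by the larger of the two hash sets. The sketch compares a single pair only and does not attempt the O(N lg N) all-images comparison.

```python
import hashlib

def block_hashes(image, block=4):
    """Split a 2D grid of pixel values into block x block tiles and hash each tile."""
    hashes = set()
    height, width = len(image), len(image[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [row[left:left + block] for row in image[top:top + block]]
            hashes.add(hashlib.sha256(repr(tile).encode()).hexdigest())
    return hashes

def similarity(a, b, block=4):
    """Fraction of block hashes shared by two images (assumed definition)."""
    ha, hb = block_hashes(a, block), block_hashes(b, block)
    return len(ha & hb) / max(len(ha), len(hb))

def same_scam(a, b, threshold=0.70, block=4):
    """Treat two screenshots as the same scam if enough blocks match."""
    return similarity(a, b, block) >= threshold

# Toy 8x8 "screenshots": every pixel is distinct, so all four tiles differ.
base = [[r * 8 + c for c in range(8)] for r in range(8)]
tweaked = [row[:] for row in base]
tweaked[0][0] = 999                      # small variation in one tile
other = [[v + 1000 for v in row] for row in base]

print(similarity(base, tweaked), same_scam(base, tweaked))
print(similarity(base, other), same_scam(base, other))
```

Changing one pixel alters only the tile that contains it, so three of the four block hashes still match, which is why the approach tolerates small variations among images.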
9
An Example Scam: “Downloadable Software”
Scam Perspective
99 observed virtual hosts
3 IP addresses
Operated for months
85 senders
No forwarding used
5535 probes (97% successful)
Section: An Example Scam
10
Clustering with Image Shingling
Images differ slightly
Some pages rotate content
11
Location
2 web servers in China; 1 web server in Russia
85 senders from 30 countries (28 from US)
Map legend:
– Blue: web servers hosting “Downloadable Software”
– Red: spam relays (hosts that sent us spam)
12
Shared Infrastructure
One of the IPs (221.4.246.3) hosting “Downloadable Software” was also hosting “Toronto Pharmacy”
Server located in Guangzhou, China
13
Summary Statistics
Count      Description
1,087,711  Spam messages
319,700    30% contain links
36,390     11.3% are distinct links
7,029      19.3% resolve to unique IP addresses
2,334      33.2% resolve to distinct scams
1 week of spam collection – Nov. 28th – Dec. 4th
2 weeks of probing – Nov. 28th – Dec. 11th
Section: Results
14
Distributed Infrastructure
To what extent is the infrastructure distributed for scams?
Most scams are not distributed:
– 94% used one IP
Top three distributed scams were extensive
– 22, 30, and 45 IPs
Top three virtual-hosted scams
– 110, 695, and 3029 domain names
Section: Results - Infrastructure
15
Shared Infrastructure
To what extent do multiple scams share infrastructure?
38% of scams hosted on a machine with at least one other scam
10 IPs hosted 10 or more scams
Top three shared IPs
– 15, 18, and 22 scams
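Both infrastructure questions (how many scams use a single IP, and how many scams share a machine with another scam) reduce to grouping (scam, IP) observations in two directions. The sketch below uses made-up scam names and reserved documentation IP addresses; `infrastructure_stats` is a hypothetical helper, not code from the study.

```python
from collections import defaultdict

# Hypothetical observations: one (scam_id, ip) pair per probed host.
observations = [
    ("downloadable-software", "198.51.100.1"),
    ("downloadable-software", "198.51.100.2"),
    ("downloadable-software", "203.0.113.7"),
    ("pharmacy", "203.0.113.7"),
    ("watches", "192.0.2.9"),
]

def infrastructure_stats(pairs):
    """Return (fraction of scams on one IP, fraction of scams sharing an IP)."""
    ips_by_scam = defaultdict(set)
    scams_by_ip = defaultdict(set)
    for scam, ip in pairs:
        ips_by_scam[scam].add(ip)
        scams_by_ip[ip].add(scam)
    total = len(ips_by_scam)
    single_ip = sum(1 for ips in ips_by_scam.values() if len(ips) == 1)
    # A scam "shares" if any of its IPs also hosts a different scam.
    shared = sum(
        1 for ips in ips_by_scam.values()
        if any(len(scams_by_ip[ip]) > 1 for ip in ips)
    )
    return single_ip / total, shared / total

single_fraction, shared_fraction = infrastructure_stats(observations)
print(f"single-IP scams: {single_fraction:.0%}, scams sharing a host: {shared_fraction:.0%}")
```

In this toy data two of the three scams use one IP, and two of the three share a host with another scam.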
16
Scam Lifetime & Stability
How long are scams active, and how reliable are the hosts?
Scam web hosts seem to be taken down shortly after scams disappear
Overall scam lifetime approached two weeks
Reliability is high: usually above 97%
Section: Results - Lifetime
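Lifetime and reliability can both be derived from a per-host probe log. The definitions used here are assumptions for illustration: lifetime is the time between the first and last successful probe, and reliability is the fraction of probes inside that window that succeeded. The function name and sample log are hypothetical.

```python
from datetime import datetime, timedelta

def lifetime_and_reliability(probes):
    """probes: list of (timestamp, succeeded) tuples for one scam host.
    Returns (lifetime, reliability) under the assumed definitions."""
    ok_times = [t for t, ok in probes if ok]
    if not ok_times:
        return timedelta(0), 0.0
    first, last = min(ok_times), max(ok_times)
    in_window = [ok for t, ok in probes if first <= t <= last]
    return last - first, sum(in_window) / len(in_window)

# Made-up log: probes 3 hours apart; one failure mid-life, then the host disappears.
start = datetime(2000, 1, 1)
outcomes = [True, True, True, True, False, True, True, True, False, False]
log = [(start + timedelta(hours=3 * i), ok) for i, ok in enumerate(outcomes)]

lifetime, reliability = lifetime_and_reliability(log)
print(lifetime, f"{reliability:.1%}")
```

Trailing failures after the last success are treated as the scam having ended, so they do not count against reliability.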
17
Spam campaign lifetime
How long do spam campaigns last for a scam?
137 spam messages per scam on average
Most spam campaigns are relatively short
– 88% last 20 hours or less
Only 8% last more than 2 days
Scam lifetimes are considerably longer – on average one week
[Figure annotations: < 20 hours, < 2 days]
18
Location
Where are scam hosting servers located?
Map legend:
– Blue: web servers
– Red: spam relays
Section: Results - Location
19
Location

Web Servers
Rank  Country  Count  Percent
1.    usa      5884   57.40%
2.    chn      741    7.23%
3.    can      379    3.70%
4.    gbr      315    3.07%
5.    fra      314    3.06%
6.    deu      258    2.52%
7.    rus      185    1.80%
8.    kor      181    1.77%

Spam Relays
Rank  Country  Count  Percent
1.    usa      54159  14.50%
2.    fra      26371  7.06%
3.    esp      25196  6.75%
4.    chn      24833  6.65%
5.    pol      21199  5.68%
6.    ind      20235  5.42%
7.    deu      18678  5.00%
8.    kor      17446  4.67%
20
Scam Categorization
Scam category                 % of scams
Uncategorized                 29.57%
Information Technology        16.67%
Dynamic Content               11.52%
Business and Economy          6.23%
Shopping                      4.30%
Financial Data and Services   3.61%
Illegal or Questionable       2.15%
Adult                         1.80%
Message Boards and Clubs      1.80%
Web Hosting                   1.63%
Section: Results - Categorization
21
Lifetime of scams with Categorization
More than 40% of malicious scams disappear before 120 hours
The same is true for less than 15% of all scams
22
Summary
Started with over 1 million spam messages and coalesced them into fewer than 2,500 scams
Image shingling allowed us to scalably determine whether two sites were part of the same scam
Most scams use one web server (vulnerable to blacklisting)
– Scams may use many virtual domains that point to one IP
Most scams are not malicious per se
Scam infrastructure is more stable, longer lived, and more concentrated in the US than spam senders
Section: Conclusion
23
Spammers beware; these boffins are on the prowl
Questions and Answers
24
Spamscope Visibility
Collected spam from news.admin.net-abuse.sightings – a newsgroup for contributing spam
For a 3-day period, we saw
– 6,977 spam messages from the newsgroup: 205 scams
– 113,216 spam messages from our feed: 1,687 scams
12% of the newsgroup scams were in ours
The “largest” scams (most emails and most domains/IPs) were seen in both feeds
Section: Supplementary Information
25
Blacklists
Host type    Classification  % of hosts
Spam relay   Open proxy      72.27%
Spam relay   Spam host       5.86%
Scam host    Open proxy      2.06%
Scam host    Spam host       14.86%
9.7% of the scam hosts also sent us spam
Section: Results - Blacklisting
26
Web Server OS
Rank  OS                         Percent
1     Linux recent 2.4 (1)       11.97%
2     Windows 2000 (SP1+)        11.05%
3     Akamai ???                 10.86%
4     Windows 2000 SP4           8.25%
5     Linux recent 2.4 (2)       7.84%
6     FreeBSD 4.6-4.8            7.72%
7     Slashdot or BusinessWeek   7.04%
8     FreeBSD 5.0                6.49%
9     Windows XP SP1             5.90%
10    Linux older 2.4            5.56%
27
URL Classification
Category                                     Percent
WISP Dynamic Content                         17.931%
WISP Uncategorized                           13.965%
WISP Illegal or Questionable                 10.306%
WISP Information Technology                  9.051%
WISP Shopping                                4.872%
WISP Business and Economy                    4.733%
WISP Financial Data and Services             4.626%
WISP Personals and Dating                    1.867%
WISP Advertisements                          1.249%
WISP Educational Institutions                1.247%
WISP Pay-to-Surf                             1.022%
WISP Search Engines and Portals              0.884%
WISP Supplements and Unregulated Compounds   0.865%
WISP Sex                                     0.862%
28
Image Clustering
Count      Description
2,541,486  Total probes
250,864    9.8% of probes result in a captured image
9,572      3.8% of screenshots are the 'first' screenshot for a scam
2,334      Clusters detected by image shingling
1 week of spam collection – Nov. 28th – Dec. 4th
2 weeks of probing – Nov. 28th – Dec. 11th
29
Image Shingling
For a typical day of screenshots, we tested various thresholds
A 70% threshold provided a good balance between flexibility and accuracy
30
Overlap of pairs of scams on the same server
For scams running on the same server, how much time do they overlap?
96% of all scam pairs overlapped with each other when they remained active
Only 10% of scams fully overlapped each other
[Figure annotation: One week]
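The overlap between two scams on one server can be computed from their active intervals. This sketch assumes each scam is described by a hypothetical (start_hour, end_hour) pair and measures overlap relative to the combined span of the two intervals, so a value of 1.0 means the two scams were active over exactly the same period. The talk does not state which definition was used.

```python
def overlap_hours(a, b):
    """Hours during which both scams were active; 0 if the intervals are disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def overlap_fraction(a, b):
    """Shared hours divided by the combined span of both intervals (assumed metric)."""
    span = max(a[1], b[1]) - min(a[0], b[0])
    return overlap_hours(a, b) / span if span > 0 else 0.0

# Made-up (start_hour, end_hour) intervals for scams hosted on the same server.
scam_a = (0, 100)
scam_b = (50, 200)
scam_c = (300, 400)

print(overlap_hours(scam_a, scam_b), overlap_fraction(scam_a, scam_b))
print(overlap_hours(scam_a, scam_c), overlap_fraction(scam_a, scam_c))
print(overlap_fraction(scam_a, scam_a))
```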
31
IP ranges
What are the network locations of scams and spam relays?
The cumulative distribution of IP addresses is highly non-uniform
Majority of spam relays (60%) fall between 58.* -> 91.*
Most scams (50%) fall between 64.* -> 72.*