Clustering Spam MIT Spam Conference 2008 Phil Tom.

Simple Clustering Algorithm

Expand clusters to include similar messages:
1. Identical originating IP addresses.
2. Identical subject lines.
3. Identical message bodies.

Clustering pseudocode:

    for each cluster in clusters:
        expand cluster
    for each message in unclustered messages:
        create a new cluster
        add message to cluster
        expand cluster
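A minimal in-memory sketch of the pseudocode above, assuming messages fit in memory. The `Message` record and its field names are hypothetical. "Expand cluster" is read here as "repeat until the cluster stops growing", so that chains of shared IPs, subjects and bodies are followed transitively. The size and length filters used in the SQL statements are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    # Hypothetical record: field names are illustrative, not the
    # schema behind the slides.
    sender_ip: str
    subject: str
    body: str
    cluster_id: Optional[int] = None


def expand_cluster(cid: int, messages: List[Message]) -> None:
    """Move every message sharing an IP, subject, or body with cluster `cid`
    into that cluster, repeating until the cluster stops growing. Like the
    SQL statements, this also pulls in messages from other clusters."""
    changed = True
    while changed:
        changed = False
        members = [m for m in messages if m.cluster_id == cid]
        ips = {m.sender_ip for m in members}
        subjects = {m.subject for m in members}
        bodies = {m.body for m in members}
        for m in messages:
            if m.cluster_id != cid and (
                m.sender_ip in ips or m.subject in subjects or m.body in bodies
            ):
                m.cluster_id = cid
                changed = True


def cluster_messages(messages: List[Message]) -> None:
    # Pass 1: grow the clusters that already exist.
    existing = sorted({m.cluster_id for m in messages if m.cluster_id is not None})
    for cid in existing:
        expand_cluster(cid, messages)
    # Pass 2: seed a new cluster from each message that is still unclustered.
    next_id = existing[-1] + 1 if existing else 1
    for m in messages:
        if m.cluster_id is None:
            m.cluster_id = next_id
            expand_cluster(next_id, messages)
            next_id += 1
```

Consider four messages: the first two share an IP and the last two of those three share a subject. The first three then end up in cluster 1, and the unrelated fourth message gets cluster 2.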

Dimensional Model

Expand Cluster By IP

    update sdbf_message
    set cluster_id = ?
    where (cluster_id <> ? or cluster_id is null)
      and sender_ip_id in (select sender_ip_id
                           from sdbf_message
                           where cluster_id = ?)
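The IP expansion statement uses only standard SQL, so it can be tried against SQLite from Python. This is a sketch under two assumptions: all three `?` placeholders receive the id of the cluster being expanded, and the one-table schema (`message_id`, `sender_ip_id`, `cluster_id`) is a hypothetical minimum.

```python
import sqlite3

# The slide's statement; every ? is assumed to be the id of the cluster
# being expanded.
EXPAND_BY_IP = """
update sdbf_message
set cluster_id = ?
where (cluster_id <> ? or cluster_id is null)
  and sender_ip_id in (select sender_ip_id
                       from sdbf_message
                       where cluster_id = ?)
"""


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical minimal fact table holding only the columns used above.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sdbf_message ("
        "message_id integer primary key, sender_ip_id integer, cluster_id integer)"
    )
    conn.executemany(
        "insert into sdbf_message values (?, ?, ?)",
        [(1, 10, 7), (2, 10, None), (3, 11, None), (4, 10, 3)],
    )
    return conn


def expand_by_ip(conn: sqlite3.Connection, cid: int) -> int:
    """Run one expansion step and return how many messages moved into `cid`."""
    return conn.execute(EXPAND_BY_IP, (cid, cid, cid)).rowcount
```

In the demo data, expanding cluster 7 moves two messages. Message 2 is unclustered and message 4 belongs to cluster 3, but both share IP 10 with cluster 7. Message 3 has a different IP and is left alone.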

Expand Cluster By Body

    update sdbf_message m
    set cluster_id = ?
    from sdbd_body b
    where (m.cluster_id <> ? or m.cluster_id is null)
      and m.body_id in (select body_id
                        from sdbf_message
                        where cluster_id = ?)
      and m.body_id = b.body_id
      and b.size_in_bytes > 25
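`update ... from` is PostgreSQL syntax, and SQLite only gained UPDATE FROM in version 3.33.0. The sketch below therefore rewrites the body expansion with a joined subquery. Because the target message and the cluster member share the same `body_id`, checking the body size inside the subquery should select the same rows. The two-table schema is a hypothetical minimum.

```python
import sqlite3

# Subquery form of the body expansion, intended to match the slide's
# PostgreSQL UPDATE ... FROM statement.
EXPAND_BY_BODY = """
update sdbf_message set cluster_id = :cid
where (cluster_id <> :cid or cluster_id is null)
  and body_id in (
      select m.body_id
      from sdbf_message m join sdbd_body b on b.body_id = m.body_id
      where m.cluster_id = :cid and b.size_in_bytes > 25)
"""


def make_body_db() -> sqlite3.Connection:
    # Hypothetical schema: a message fact table plus a body dimension.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table sdbd_body (body_id integer primary key, size_in_bytes integer);
        create table sdbf_message (message_id integer primary key,
                                   body_id integer, cluster_id integer);
    """)
    conn.executemany("insert into sdbd_body values (?, ?)", [(100, 4000), (101, 12)])
    conn.executemany(
        "insert into sdbf_message values (?, ?, ?)",
        [(1, 100, 5), (2, 100, None), (3, 101, 5), (4, 101, None)],
    )
    return conn


def expand_by_body(conn: sqlite3.Connection, cid: int) -> int:
    """One expansion step; tiny bodies (25 bytes or less) never link messages."""
    return conn.execute(EXPAND_BY_BODY, {"cid": cid}).rowcount
```

Message 2 shares the 4000-byte body and joins cluster 5. Message 4 shares only the 12-byte body and stays unclustered.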

Expand Cluster By Subject

    update sdbf_message m
    set cluster_id = ?
    from sdbd_subject s
    where (m.cluster_id <> ? or m.cluster_id is null)
      and m.subject_id in (select subject_id
                           from sdbf_message
                           where cluster_id = ?)
      and m.subject_id = s.subject_id
      and (s.word_count > 1 or length(s.subject) > 10)
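The body and subject statements only link messages through content that is specific enough. Bodies must exceed 25 bytes, and subjects must have more than one word or more than 10 characters, so that short or generic values do not chain unrelated messages together. The predicates below restate those filters in Python. They assume `word_count` was computed by splitting the subject on whitespace; the slides do not say how it was computed.

```python
def subject_can_link(subject: str) -> bool:
    """Subject filter: more than one word, or longer than 10 characters.
    Assumes word_count = number of whitespace-separated tokens."""
    return len(subject.split()) > 1 or len(subject) > 10


def body_can_link(body: bytes) -> bool:
    """Body filter: only bodies larger than 25 bytes link messages."""
    return len(body) > 25
```

Under these rules, a one-word subject such as "hello" never links messages. "Delivery failure" (two words) and "Unsubscribed" (12 characters) do link messages.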

Test Data Set
- Dec 22 to Dec 29, 2007
- Single "Received:" header tag only
- No multi-part messages
- 1.7 million messages
- Roughly 20%
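The two selection rules for the test set could be applied with Python's standard `email` package. This is a sketch, not the authors' loader. It keeps a message only if it has exactly one `Received:` header and is not a MIME multipart message.

```python
from email import message_from_string
from email.message import Message


def in_test_set(msg: Message) -> bool:
    """True for messages with exactly one Received: header and no MIME parts."""
    received = msg.get_all("Received") or []
    return len(received) == 1 and not msg.is_multipart()
```

Typical use is `in_test_set(message_from_string(raw_text))` on each raw message.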

Cluster Results

Messages per Cluster Size
*Not including the big cluster
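One reading of "messages per cluster size" is the total number of messages that fall in clusters of each size, with the single largest cluster excluded as the footnote says. A sketch under that assumption, taking one cluster id per message:

```python
from collections import Counter
from typing import Dict, Iterable


def messages_per_cluster_size(cluster_ids: Iterable[int]) -> Dict[int, int]:
    """Map cluster size -> total messages in clusters of that size,
    leaving out the single largest cluster."""
    sizes = Counter(cluster_ids)  # cluster_id -> number of messages
    if not sizes:
        return {}
    biggest, _ = sizes.most_common(1)[0]
    del sizes[biggest]
    hist: Counter = Counter()
    for size in sizes.values():
        hist[size] += size  # every message in a cluster of this size counts
    return dict(sorted(hist.items()))
```

If the chart instead counted clusters per size, `hist[size] += 1` would be the change.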

Top Clusters by IPs

cluster_id | messages | subject | bodies | ips  | networks | countries
-----------+----------+---------+--------+------+----------+----------
           |          |         |        |      |     8940 |
           |          |     451 |        | 1313 |       57 |         2
        59 |          |      19 |     15 |  962 |        4 |         1
        68 |     1065 |       2 |   1065 |  609 |       12 |         4
        69 |     4476 |      59 |     85 |  514 |       17 |
           |     5521 |       5 |      9 |  283 |        4 |
           |      722 |     149 |    333 |  275 |       16 |
           |      307 |       2 |    306 |  208 |      179 |
           |      240 |       7 |      9 |  184 |        4 |
           |     5581 |      15 |   5212 |  153 |      119 |
           |     2934 |      20 |   2934 |  150 |        1 |
           |      377 |      22 |    377 |  125 |        3 |
           |      307 |       4 |      3 |  124 |        5 |
           |     3399 |      48 |    169 |  114 |       17 |
           |      156 |       4 |    155 |  105 |       96 |
           |     1117 |     174 |   1100 |  101 |        4 |         1
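A table like the one above is a per-cluster count of messages plus distinct subjects, bodies, IPs, networks and countries, sorted by distinct IPs. Below is a Python sketch over a hypothetical per-message record. In particular, `network` stands in for whatever network grouping the original analysis used, which the slides do not define.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Msg(NamedTuple):
    # Hypothetical per-message record; "network" is a placeholder for the
    # original analysis's (unspecified) network grouping.
    cluster_id: int
    subject: str
    body: str
    ip: str
    network: str
    country: str


Row = Tuple[int, int, int, int, int, int, int]


def top_clusters_by_ips(msgs: List[Msg], limit: int = 16) -> List[Row]:
    """Return (cluster_id, messages, subjects, bodies, ips, networks,
    countries) rows, largest distinct-IP count first."""
    counts = defaultdict(int)
    distinct = defaultdict(lambda: [set() for _ in range(5)])
    for m in msgs:
        counts[m.cluster_id] += 1
        for seen, value in zip(distinct[m.cluster_id],
                               (m.subject, m.body, m.ip, m.network, m.country)):
            seen.add(value)
    rows = [(cid, counts[cid], *(len(s) for s in sets))
            for cid, sets in distinct.items()]
    rows.sort(key=lambda r: r[4], reverse=True)  # r[4] = distinct IPs
    return rows[:limit]
```

In SQL the same result would be a `group by cluster_id` with `count(distinct ...)` per column, ordered by the IP count.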

The Big One

Cluster 1 summary:

messages | subject | bodies | ips | networks | countries
---------+---------+--------+-----+----------+----------
         |         |        |     |     8940 |       177

Top 10 countries by IP count:

messages | subjects | bodies | ips  | networks | country_name
---------+----------+--------+------+----------+---------------
         |          |        |      |     1453 | United States
         |     5110 |        |      |      170 | Germany
         |     6558 |        |      |      147 | Spain
         |     4705 |        |      |       48 | Turkey
         |     4624 |        |      |      209 | United Kingdom
         |     3194 |        |      |       42 | Peru
         |     2848 |        |      |      148 | Colombia
         |     3059 |        |      |      152 | Chile
         |     5063 |        | 9664 |       12 | Brazil
         |     4381 |        | 9372 |      126 | Italy

Clustering the Big One

Create clusters on subject and body:

messages | cluster_id | ips  | subjects | bodies | content
---------+------------+------+----------+--------+----------------------
         |            |      |       34 |    136 | fake watches
         |            |      |      330 |        | penis enlargement
         |            |      |       27 |        | online casino
         |            |      |       55 |        | fake name brand goods
         |            | 7190 |       81 |        | viagra
         |            |      |       20 |        | valium
         |            | 5990 |          |        | online pharmacy
         |            | 3391 |       45 |      5 | stock investment
         |            | 4149 |        3 |      5 | porn
         |            | 3483 |        9 |        | software
         |            | 9240 |       17 |   9273 | russian dating

[Chart: messages and unique IPs]
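Re-clustering on subject and body alone amounts to finding connected components. Two messages belong together if a chain of shared subjects or bodies links them, and IPs are ignored. The sketch below uses a union-find (disjoint-set) structure in place of the repeated expansion queries. It is an equivalent technique chosen here for illustration, not necessarily what the authors ran.

```python
from typing import Dict, List, Tuple


def cluster_on_subject_and_body(messages: List[Tuple[str, str]]) -> List[int]:
    """messages: (subject, body) pairs. Returns one label per message; two
    messages share a label if shared subjects or bodies connect them."""
    parent = list(range(len(messages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen: Dict[Tuple[str, str], int] = {}
    for i, (subject, body) in enumerate(messages):
        # Tag keys so a subject never collides with an identical body text.
        for key in (("s", subject), ("b", body)):
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i
    return [find(i) for i in range(len(messages))]
```

Each message is linked only to the first message that had the same subject or body. That keeps the work roughly linear in the number of messages.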

Clustering the Big One (cont.)
Number of overlapping IPs between clusters
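Counting overlapping IPs between clusters is a pairwise set intersection. A small sketch, assuming each cluster's sender IPs are already collected into a set keyed by cluster label:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def ip_overlaps(cluster_ips: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Return {(a, b): shared IP count} for every pair of clusters that
    shares at least one IP; pairs are ordered by sorted cluster label."""
    overlaps = {}
    for a, b in combinations(sorted(cluster_ips), 2):
        shared = len(cluster_ips[a] & cluster_ips[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

With n clusters this performs n(n-1)/2 intersections, which is manageable for a dozen topic clusters.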

Am I Bot or Not?

cluster_id | messages | subjects | bodies | ips  | networks | countries
-----------+----------+----------+--------+------+----------+----------
           |          |      451 |        | 1313 |       57 |         2

- Subject content widely varied
- Many blocks of consecutive IPs
- Some blocks are entire or most of a /24

messages | subjects | bodies | ips  | networks | country_name
---------+----------+--------+------+----------+--------------
         |       87 |   1246 |    5 |        3 | Canada
         |      443 |        | 1308 |       54 | United States
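Claims like "many blocks of consecutive IPs" and "entire or most of a /24" can be checked by grouping a cluster's IPv4 senders by /24 block, which holds 256 addresses. For each block the sketch reports how many addresses appear and the longest run of consecutive addresses. It uses only the standard `ipaddress` module and assumes IPv4 input.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def slash24_coverage(ips: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each /24 block to (addresses seen, longest consecutive run)."""
    blocks = defaultdict(set)
    for ip in ips:
        addr = ipaddress.IPv4Address(ip)
        net = ipaddress.IPv4Network(f"{addr}/24", strict=False)
        blocks[net].add(int(addr))
    report = {}
    for net, addrs in blocks.items():
        ordered = sorted(addrs)
        longest = run = 1
        for prev, cur in zip(ordered, ordered[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        report[str(net)] = (len(addrs), longest)
    return report
```

A block whose count is close to 256 corresponds to the "entire or most of a /24" observation.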

Failure is Success

Delivery Notification cluster:

cluster_id | messages | subject | bodies | ips | networks | countries
-----------+----------+---------+--------+-----+----------+----------
           |     1065 |       2 |   1065 | 609 |       12 |         4

Subject Detail:

messages | subject
---------+------------------
         | Delivery failure
     452 | failure delivery

- Delivery notification from legitimate mail servers
- Not clustered with spam or sources of spam

Chinese Spam

All Chinese messages:

messages | ips  | networks | clusters | country_name
---------+------+----------+----------+--------------
         | 5179 |      197 |      922 | China
     139 |    2 |        1 |        2 | Thailand
      78 |   12 |        3 |        4 | United States
       5 |    4 |        1 |        2 | Germany

Top 10 Chinese Clusters:

cluster_id | messages | subject | bodies | ips | networks | countries
-----------+----------+---------+--------+-----+----------+----------
           |          |      19 |     15 | 962 |        4 |
           |     9987 |    1803 |      8 |  19 |        3 |         1
        12 |     8054 |       9 |      8 |  26 |        1 |
           |     5521 |       5 |      9 | 283 |        4 |         1
        69 |     4476 |      59 |     85 | 514 |       17 |
           |     3399 |      48 |    169 | 114 |       17 |
           |     2347 |      10 |     10 |   1 |        1 |
           |     2187 |      21 |     73 |  41 |        6 |         1
        56 |     2047 |      29 |     45 |  61 |       14 |
           |     1944 |       3 |      4 |   5 |        1 |         1

Small Clusters

- Varied subjects and bodies
- Manual clustering of "online pharmacy" spam

Coalesced clusters:

messages | ips  | subjects | bodies | clusters
---------+------+----------+--------+---------
         | 9685 |          |        |     3651

Example subjects:
- Buy sugar pills online cheap!!!!11one
- Buy sugar pills online cheap!!!1cos(0)
- Buy sugar pills online cheap!111pi^0
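The example subjects differ only in their trailing noise, which defeats exact subject matching. One hypothetical way to coalesce such variants automatically is to normalize each subject to a key before comparing. The rule below is an illustrative heuristic, not the manual process described on the slide: lowercase the subject and drop everything from the first character that is not a letter, digit or space.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_subject(subject: str) -> str:
    """Hypothetical key: text before the first non-alphanumeric, non-space
    character, stripped and lowercased."""
    return re.split(r"[^A-Za-z0-9 ]", subject, maxsplit=1)[0].strip().lower()


def coalesce(subjects: Iterable[str]) -> Dict[str, List[str]]:
    """Group subjects that share a normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in subjects:
        groups[normalize_subject(s)].append(s)
    return dict(groups)
```

All three example subjects map to the key "buy sugar pills online cheap". A rule this blunt would also merge any legitimate subjects that happen to share a prefix before their first punctuation mark.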

What's Next?
- Improve the similarity metrics
- Cluster a population or random sample
- Add time to the analysis
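As one illustration of a fuzzier similarity metric than exact equality, the sketch computes Jaccard similarity over the sets of alphabetic words in two subjects. It is offered as a possible direction only. The tokenization rule and any threshold for "similar" are assumptions.

```python
import re
from typing import Set


def word_set(text: str) -> Set[str]:
    """Lowercased runs of ASCII letters; digits and punctuation are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A intersect B| / |A union B| over word sets; 1.0 if both are empty."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)
```

"Buy sugar pills online cheap!!!!11one" and "Buy sugar pills online cheap!!!1cos(0)" share five of seven distinct words (about 0.71). Unrelated subjects such as "fake watches" and "online casino" score 0.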