Spam May CS239
Taxonomy
  Email (UBE)
    Advertisement
    Phishing
  Webpage
    Content
    Links
  Example message
    From: Thrifty Health-Insurance
    Mailed-By: noticeoption.com
    Reply-To: Thrifty Health-Insurance
    To:
    Date: May 10, :30 PM
    Subject: No obligation Health Insurance Quotes
    Great health insurance quotes. Get a quote from us and let local agents compete for your business. Health insurance is more affordable than you think.
      Health Plans
      Dental Plans
      Prescription Plans
      Vision Plans and more
    Check out the lowest rates in the industry.
    This is a commercial message. ………….
How bad is the situation?
  30-40% of mail traffic is spam
  End user
    Wastes time reading junk (and may fall into a trap)
    ~1 billion in productivity lost per year
  System operator
    Increased running cost
Why do people spam?
  Economic incentive
    Effectiveness = sent × (1 − P_filtered) × P_read × P_clickthrough
  Business strategy?
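The effectiveness formula can be tried out with a few lines of Python. This is only an arithmetic sketch: the function name and every number below are hypothetical values chosen for illustration, not measurements from the lecture.

```python
def expected_clickthroughs(sent, p_filtered, p_read, p_clickthrough):
    """Effectiveness = sent x (1 - P_filtered) x P_read x P_clickthrough."""
    return sent * (1.0 - p_filtered) * p_read * p_clickthrough


# Hypothetical inputs: one million messages, 90% filtered,
# 10% of delivered messages read, 1% of read messages clicked.
responses = expected_clickthroughs(
    sent=1_000_000, p_filtered=0.9, p_read=0.1, p_clickthrough=0.01
)
print(round(responses))
```

Under these assumed rates, a run of one million messages still yields on the order of a hundred click-throughs, which suggests why sending at near-zero cost can pay off even when most messages are filtered.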
How spammers collect email addresses
  Usenet
  Web pages
  Registration forms
  Dictionary attacks
Defense mechanisms
  Authentication
  Challenge/response system
  DNSxL
  Checksum-based filtering
  Statistical filtering
  Micro-payment
  Spam poisoning
  A brand new architecture
Authentication
  Avoid forged sender addresses
  SMTP AUTH
    Verifies that the sender is a legitimate user
  Sender Policy Framework (SPF)
    Verifies that the sender’s IP corresponds to the domain
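The SPF idea, checking the connecting IP against what the sender's domain publishes, can be sketched as follows. This is a deliberately reduced illustration: it handles only plain `ip4:` terms of a record passed in as a string, and it skips the DNS lookup and the other mechanisms and qualifiers of real SPF. The function name and the sample record are assumptions made for this example.

```python
import ipaddress


def spf_ip4_allows(spf_record, sender_ip):
    """Return True if sender_ip falls inside any ip4: term of the record.

    Simplification: only unqualified 'ip4:' terms are examined; a real SPF
    evaluator also handles 'a', 'mx', 'include', qualifiers, and so on.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return True
    return False


# Hypothetical record, as a domain might publish it in a DNS TXT entry.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip4_allows(record, "192.0.2.17"))
print(spf_ip4_allows(record, "198.51.100.5"))
```

The first address lies inside the published range and is accepted; the second does not and would be treated as a possibly forged sender.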
Challenge/response system
  Works together with a white list
  Only senders in the contact list get through
  Otherwise, a challenge is sent to the sender
  Ensures the sender is a human rather than a program
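A minimal sketch of the white list plus challenge flow is shown below. The class and method names are hypothetical, and the "challenge" is just a random token that must be echoed back; a deployed system would pose something only a human can answer.

```python
import secrets


class ChallengeResponseFilter:
    """Deliver mail from known senders; hold everything else until answered."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.pending = {}  # challenge token -> (sender, message)

    def receive(self, sender, message):
        if sender in self.whitelist:
            return ("delivered", None)
        # Unknown sender: hold the message and issue a challenge token.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return ("challenged", token)

    def answer(self, token):
        """Release the held message if the token matches a pending challenge."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, message = entry
        self.whitelist.add(sender)  # the sender passes directly next time
        return message


mail_filter = ChallengeResponseFilter(["alice@example.com"])
print(mail_filter.receive("alice@example.com", "hi")[0])
status, token = mail_filter.receive("bob@example.com", "hello")
print(status, mail_filter.answer(token))
```

Adding the sender to the white list after a successful answer is one possible design choice; it means each legitimate correspondent is challenged at most once.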
DNSxL
  Block list
    A list of IPs/domains observed to be consistently sending spam
    Uses DNS to distribute the list
    Similar to a reverse DNS lookup
  White list
    Similar idea, but works the other way
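The resemblance to a reverse DNS lookup comes from how the query name is built: the IPv4 octets are reversed and the list's zone is appended. The sketch below assumes a hypothetical zone name, `dnsbl.example.org`, and the helper names are made up for this example. `is_listed` performs a live DNS query, so it is defined but not called here.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name to look up: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the list answers for this IP (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True  # any address record means the IP is on the list
    except socket.gaierror:
        return False  # no such name: the IP is not listed


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Because the list is served through ordinary DNS, mail servers get caching and distribution for free; the same mechanism works for a white list by simply inverting what a hit means.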
Checksum-based filtering
  Collaborative filtering
    Distributed Checksum Clearinghouse (DCC)
    Vipul’s Razor
    Brightmail
  A checksum is computed for each reported spam
  The list is constantly updated and distributed
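The report-and-lookup cycle can be sketched with an in-memory set of checksums. This is an assumption-laden toy: it uses SHA-256 over whitespace- and case-normalized text, whereas the systems named above use their own checksum schemes, and the class and function names are hypothetical.

```python
import hashlib


def message_checksum(body):
    """Checksum of the body after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChecksumClearinghouse:
    """Toy stand-in for a shared database of reported spam checksums."""

    def __init__(self):
        self.reported = set()

    def report(self, body):
        self.reported.add(message_checksum(body))

    def is_known_spam(self, body):
        return message_checksum(body) in self.reported


clearinghouse = ChecksumClearinghouse()
clearinghouse.report("Check out the   LOWEST rates\nin the industry.")
print(clearinghouse.is_known_spam("check out the lowest rates in the industry."))
print(clearinghouse.is_known_spam("check out the lowest rates in the industry. xk3f9"))
```

The second lookup fails because any change to the text changes an exact checksum, which hints at why practical systems tend to prefer fuzzier checksums.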
Statistical filtering
  Two-class text classification problem
  Words, phrases
  Training samples
  Adaptive
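One common way to treat this as a two-class text classification problem is a naive Bayes classifier over words. The slide does not name a specific algorithm, so the choice of naive Bayes, the Laplace smoothing, the class name, and the training sentences below are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Word-level naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Adaptive: each newly labeled message simply updates the counts.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        # Requires at least one training message per class (else log(0)).
        score = math.log(self.message_counts[label] / total_messages)
        for word in words:
            score += math.log((counts[word] + 1) / (total_words + vocabulary))
        return score

    def classify(self, text):
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap insurance quotes click now", "spam")
spam_filter.train("lowest rates click here", "spam")
spam_filter.train("meeting notes for the project", "ham")
spam_filter.train("lunch tomorrow at noon", "ham")
print(spam_filter.classify("cheap rates click now"))
print(spam_filter.classify("project meeting tomorrow"))
```

Extending the features from single words to phrases would only change how `text` is split into tokens; the counting and scoring stay the same.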
Statistical filtering
  False positive
  Columns: Classified junk | Classified legitimate | Total
  Rows:
    Actually junk: 36945
    Actually legitimate:
    Total:
Payment
  Increase the cost to spammers
  Micro-payment / e-cash
  “Computational” payment
    HashCash (SHA-1)
      X-Hashcash:
      Takes 1 second to generate
      Takes 1 microsecond to verify (both on a 1 GHz machine)
    CAMRAM
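The asymmetry between generating and verifying a computational payment can be sketched with SHA-1 and a leading-zero-bits target. This is a simplified stand-in, not the actual Hashcash stamp format: the `resource:counter` layout, the function names, and the 12-bit difficulty (kept low so the example runs quickly) are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint(resource, bits):
    """Search for a counter whose SHA-1 has at least `bits` leading zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp, resource, bits):
    """A single hash suffices to check a stamp minted for this resource."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


stamp = mint("bob@example.com", 12)
print(verify(stamp, "bob@example.com", 12))
print(verify(stamp, "carol@example.com", 12))
```

Minting takes about 2^bits hash attempts on average while verification takes one, so the sender pays per recipient and the receiver pays almost nothing.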
Spam poisoning
  Expose addresses in a human-readable format
  Generate fake email addresses dynamically with a CGI script
  Create addresses to harvest spam emails (similar to a honeypot)
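The first two poisoning tactics can be sketched in a few lines. The slide mentions a CGI script; the plain functions below only show what such a script might compute. The function names, the " at " / " dot " spelling, and the `example.invalid` domain are assumptions for this example.

```python
import random
import string


def obfuscate(address):
    """Rewrite an address so people can read it but naive harvesters miss it."""
    local, domain = address.split("@")
    return f"{local} at {domain.replace('.', ' dot ')}"


def fake_addresses(n, domain="example.invalid", rng=None):
    """Generate n random, nonexistent addresses to pollute a harvester's list."""
    rng = rng or random.Random()
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(10)) + "@" + domain
        for _ in range(n)
    ]


print(obfuscate("alice@example.com"))
for address in fake_addresses(3):
    print(address)
```

Serving a fresh batch of fake addresses on every page request fills a harvested list with entries that never reach a reader, which lowers the spammer's return per message sent.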
New architecture
  Internet Mail 2000
    Pull-based
    Sender’s ISP is responsible for storing emails
    Receiver gets a notification only
  Global deployment is unlikely in the near future
How do spammers respond?
  Append a random string to the end of each spam
  Improve spambots to filter out the characters used in spam poisoning
  Use worms to infect client programs
  Analyze users’ patterns