Spam Andy Nguyen 5/17/2004

What is Spam?
- Unsolicited means that the Recipient has not granted verifiable permission for the message to be sent.
- Bulk means that the message is sent as part of a larger collection of messages, all having substantively identical content.
- A message is Spam only if it is both Unsolicited and Bulk (UBE)
  - Unsolicited is normal (examples include first contact enquiries, job enquiries, sales enquiries)
  - Bulk is normal (examples include subscriber newsletters, discussion lists, information lists)
- Technical Definition of "Spam":
  - An electronic message is "spam" IF: (1) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND (2) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent; AND (3) the transmission and reception of the message appears to the recipient to give a disproportionate benefit to the sender.
- Source:

Effects of Spam
- Bandwidth loss
- Connection expense
- Unnecessary disk space usage
- Overflowing user mailboxes
- Loss of productivity
- Fraud
- Costs estimated at $1 billion/year
- Nearly 30% of AOL's mail is spam

Spammers
- Use automated tools that analyze online content
- Methods:
  - Looking through Usenet for addresses
  - Mailing lists
  - Web pages (guest books, forums, etc.)
  - Dictionary attacks on user and domain names, using predictable addresses
  - Directories, white pages (Big Foot)
  - Chat rooms

Spam Defense
- Types of Defense:
  - Educational
  - Technical
  - Legal/Economic
- Issues for Technical Spam Solutions:
  - Deployment

Blacklisting
- Blocking mail from servers that are known to be bad
- Can stop mail before it is sent out
- Uses a DNS-based distribution scheme
- Issues:
  - Account hopping: spammers use free addresses, spoof addresses, and send through open relays or non-blacklisted servers to hide their point of origin
  - Should you trust the administrators of these blacklists?
    - Blacklist listing policies differ
    - A compromised blacklist can blacklist the whole Internet (0/0), or allow everyone through
  - New/unknown mail servers? Blacklisting may also prevent good mail from coming through
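
The DNS-based distribution scheme can be sketched as follows. By the usual DNSBL convention, a receiving server reverses the octets of the connecting IPv4 address, appends the blacklist's zone, and looks the name up; any address answer means "listed", while a failed lookup means "not listed". The zone `dnsbl.example.org` and the function names are placeholders for illustration, not a real service or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone publishes an address record for this IP.

    Any lookup failure is treated as "not listed", so a DNS outage fails open.
    That is a policy choice made for this sketch, not a requirement.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # "dnsbl.example.org" is a hypothetical zone; substitute a list you trust.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Because the verdict is just a DNS answer, whoever controls the zone controls the verdict, which is the trust issue raised on the slide.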

Spam Poisoning
- Defense against harvesting
- Writing an obfuscated form of the address instead of the plain address
- Using images
- Generating fake web pages, with fake addresses
- Issues:
  - Once the address is revealed, all the effort spent concealing it is wasted
  - Harvesters use search engines to find addresses
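
A minimal sketch of the text-based techniques above, assuming a harvester that scans page source for plain `user@domain` patterns. The three helper names are hypothetical, and the reserved `example.invalid` domain is used so the decoy addresses can never reach a real mailbox.

```python
import html
import random


def spell_out(address: str) -> str:
    """Rewrite an address in a human-readable but less pattern-matchable form."""
    return address.replace("@", " at ").replace(".", " dot ")


def entity_encode(address: str) -> str:
    """Encode every character as an HTML numeric entity.

    A browser renders the original address, but the page source no longer
    contains a literal "@" for a naive harvester to match.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


def decoy_addresses(n: int, domain: str = "example.invalid", seed: int = 0) -> list:
    """Generate n fake addresses to pad a decoy page (the poisoning step)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return [
        "".join(rng.choice(letters) for _ in range(8)) + "@" + domain
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(spell_out("user@example.com"))
    encoded = entity_encode("user@example.com")
    print(html.unescape(encoded))  # what a browser would display
    print(decoy_addresses(3))
```

A harvester that decodes entities or understands the spelled-out form defeats both tricks, which matches the slide's caveat that a concealed address stays safe only until it leaks once.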

Distributed, Collaborative Filtering
- When a system receives spam, either from a user or from a "spam trap", the message is hashed and passed to the closest server
- This mechanism maintains a distributed and constantly updated library of bulk mail
- Issues:
  - Users can abuse the service and submit legitimate mail
  - Spammers randomize their spam to change checksums (adding random strings, etc.)
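
The hash-and-report loop, and the randomization problem, can be illustrated with a small sketch. It assumes an exact SHA-256 digest over lightly normalized text; the slides do not say which hash is used, and `SpamCatalog` is a hypothetical in-memory stand-in for the distributed server network.

```python
import hashlib


def normalize(body: str) -> str:
    """Lower-case and collapse whitespace so trivial edits hash identically."""
    return " ".join(body.lower().split())


def fingerprint(body: str) -> str:
    """Exact digest of the normalized message body."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


class SpamCatalog:
    """Counts how many times each fingerprint has been reported."""

    def __init__(self):
        self.reports = {}

    def report(self, body: str) -> None:
        fp = fingerprint(body)
        self.reports[fp] = self.reports.get(fp, 0) + 1

    def is_known_bulk(self, body: str, threshold: int = 2) -> bool:
        # Requiring several reports limits abuse by a single user
        # who submits legitimate mail.
        return self.reports.get(fingerprint(body), 0) >= threshold


if __name__ == "__main__":
    catalog = SpamCatalog()
    catalog.report("Buy   NOW")
    catalog.report("buy now")
    print(catalog.is_known_bulk("BUY now"))        # same fingerprint
    print(catalog.is_known_bulk("buy now xk29q"))  # random suffix changes it
```

The second lookup misses because one appended token changes the whole digest, which is exactly the evasion listed under Issues; an exact hash like this one cannot match near-duplicates.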

Content Filtering
- Destination-based defense
- Based on the content of the message
  - Bayesian approach
- Issues:
  - Processing load on the mail server
  - Doesn't address bandwidth and storage issues
  - Accuracy isn't 100%. Is this acceptable?
  - Spammers may run their messages through the filters in order to bypass them
  - Privacy issues
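
As a rough illustration of the Bayesian approach, here is a minimal naive Bayes sketch with add-one smoothing. The whitespace tokenizer, the class name, and the four training messages are invented for the example; a production filter would use far more data and a better tokenizer.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Posterior P(spam | words), assuming words are independent.

        Needs at least one training message of each class.
        """
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        n_spam = sum(self.counts["spam"].values())
        n_ham = sum(self.counts["ham"].values())
        # Start from the prior odds, then add one log-likelihood ratio per word.
        log_odds = math.log(self.docs["spam"]) - math.log(self.docs["ham"])
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.counts["spam"][word] + 1) / (n_spam + len(vocab))
            p_ham = (self.counts["ham"][word] + 1) / (n_ham + len(vocab))
            log_odds += math.log(p_spam) - math.log(p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


def demo_filter() -> NaiveBayesFilter:
    """Train on a tiny made-up corpus."""
    f = NaiveBayesFilter()
    f.train("buy cheap pills now", "spam")
    f.train("cheap loans click now", "spam")
    f.train("meeting notes attached", "ham")
    f.train("lunch at noon tomorrow", "ham")
    return f


if __name__ == "__main__":
    f = demo_filter()
    print(f.spam_probability("cheap pills"))
    print(f.spam_probability("meeting tomorrow"))
```

The score is a probability rather than a verdict, so the operator still has to pick a threshold; that choice is where the "accuracy isn't 100%" trade-off between missed spam and lost legitimate mail shows up.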

Pricing Functions
- Basic Idea:
  - "If I don't know you and want you to send me a message, then you must prove that you spent, say, ten seconds of CPU time, just for me and just for this message"
- Proof of effort takes some time to compute but is easily verifiable
  - Function based on a large number of scattered memory accesses
- Issues:
  - What about legitimate mailing lists?
  - Attackers could just compromise many machines to send out the mail (similar to DDoS)
  - Where would you deploy this? Between sender and mail server, or server-to-server?
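
The "slow to compute, quick to verify" property can be sketched with a hash-based stamp. Note the substitution: this is a CPU-bound, hashcash-style puzzle, not the memory-bound function the slide refers to, and the stamp format, function names, and 12-bit difficulty are all assumptions chosen to keep the example fast.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint(recipient: str, message_id: str, bits: int = 12) -> str:
    """Search for a nonce whose stamp hashes to at least `bits` leading zero bits.

    Expected work is about 2**bits hash evaluations.
    """
    for nonce in count():
        stamp = f"{recipient}:{message_id}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, recipient: str, message_id: str, bits: int = 12) -> bool:
    """One hash checks the stamp; binding it to recipient and message
    means it cannot be reused for another recipient or another message."""
    if not stamp.startswith(f"{recipient}:{message_id}:"):
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits


if __name__ == "__main__":
    stamp = mint("bob@example.org", "msg-1")
    print(stamp, verify(stamp, "bob@example.org", "msg-1"))
```

Each extra bit of difficulty doubles the sender's expected work while verification stays at one hash. A mailing list with thousands of recipients pays that cost thousands of times, which is the first issue on the slide.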

Internet Mail 2000
- New mailing protocol
- Changes the "push" architecture to a "pull" architecture
  - Mail stored on the sender's server
- Issues:
  - New attacks are possible
  - Global deployment would be required
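
A toy model of the push-to-pull change, under stated assumptions: the sender's server keeps the message body, the recipient receives only a small notice, and the recipient decides which bodies to fetch. The notice fields and class names are invented for the sketch and are not taken from any protocol specification.

```python
class SenderServer:
    """Holds message bodies until a recipient asks for them."""

    def __init__(self, name: str):
        self.name = name
        self.stored = {}

    def submit(self, msg_id: str, sender: str, body: str) -> dict:
        # Storage cost stays with the sender's side.
        self.stored[msg_id] = body
        # Only this small notice travels to the recipient.
        return {"server": self.name, "msg_id": msg_id, "from": sender}

    def fetch(self, msg_id: str):
        return self.stored.get(msg_id)


class Recipient:
    def __init__(self):
        self.notices = []

    def notify(self, notice: dict) -> None:
        self.notices.append(notice)

    def pull(self, servers: dict, wanted) -> list:
        """Fetch bodies only for the notices the recipient accepts."""
        return [
            servers[n["server"]].fetch(n["msg_id"])
            for n in self.notices
            if wanted(n)
        ]


if __name__ == "__main__":
    server = SenderServer("mail.sender.example")
    inbox = Recipient()
    inbox.notify(server.submit("m1", "alice@sender.example", "hello"))
    inbox.notify(server.submit("m2", "bulk@sender.example", "buy now"))
    directory = {server.name: server}
    print(inbox.pull(directory, lambda n: n["from"].startswith("alice")))
```

The unwanted message never crosses the network and keeps occupying the sender's disk, but the notices themselves still arrive unasked, which hints at the "new attacks" item on the slide.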

Discussion
- Certified email?
- National opt-out list?
- Human skill challenges?
- Payment methods (charging a small fee when sending email)
- Possible legislation
- Which approach do you think is best? Or should we use a combination of mechanisms?