1 Abusing the Network: Spam in All its forms Joshua Goodman, Microsoft Research with slides from Geoff Hulten and all the hard work done by other people,

2 Abusing the Network: Spam in All Its Forms. Joshua Goodman, Microsoft Research, with slides from Geoff Hulten and all the hard work done by other people, including Robert Rounthwaite, David Heckerman, John Platt, Carl Kadie, Eric Horvitz, Scott Yih, Geoff Hulten, Nathan Howell, Micah Rupersburg, George Webb, Ryan Hamlin, Kevin Doerr, Elissa Murphy, Derek Hazeur, Bryan Starbuck, and lots of people at Hotmail, Outlook, Exchange, and MSN…

3 Overview
- Introduction to spam
  - There's a ton of it, and people hate it
- Kinds of spam (lots)
- Techniques spammers use
  - Can be used for other kinds of spam
- Solutions to spam
  - Machine learning, fuzzy hashing, Turing tests, blackhole lists, etc.
  - Can apply to other kinds of spam
- Conclusion

4 InfoWorld Poll July 25, 2003

5 Pew Internet Study Numbers
- 25% of email users say spam has reduced their overall email use
- 76% of email users are bothered by the offensive or obscene content of spam
- Economics favor spam
  - 7% of email users report that they have ordered a product advertised in spam
  - The cost of sending spam is only about 0.01 cents/message!
  - If 1 in 100,000 recipients buys and you earn $11 per sale, you make a profit (100,000 messages cost only about $10 to send)
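The profit claim is simple arithmetic, sketched below under the assumption that "0.01 cents/message" means $0.0001 per message and that "$11" is what the spammer earns per sale. It also computes the break-even response rate, which shows how little response a spammer actually needs.

```python
# Back-of-the-envelope spam economics using the slide's figures.
# Assumptions: 0.01 cents/message = $0.0001, and $11 is earned per sale.
COST_PER_MESSAGE = 0.0001      # dollars per message sent (0.01 cents)
RESPONSE_RATE = 1 / 100_000    # fraction of recipients who buy
EARNINGS_PER_SALE = 11.0       # dollars earned per buyer


def campaign_profit(messages: int) -> float:
    """Expected profit in dollars for a campaign of `messages` emails."""
    cost = messages * COST_PER_MESSAGE
    revenue = messages * RESPONSE_RATE * EARNINGS_PER_SALE
    return revenue - cost


def break_even_response_rate() -> float:
    """Response rate at which expected revenue exactly covers sending cost."""
    return COST_PER_MESSAGE / EARNINGS_PER_SALE


print(f"Profit on 100,000 messages: ${campaign_profit(100_000):.2f}")
print(f"Break-even: 1 buyer in {1 / break_even_response_rate():,.0f} recipients")
```

With these numbers, 100,000 messages cost $10 and bring in $11, and the spammer breaks even at roughly 1 buyer in 110,000 recipients.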

6 Kinds of Spam: "Advertising Wants to Be Free"
- Email spam (I'll talk about this in detail)
- Usenet spam (Usenet is now nearly useless in many groups)
- Chat rooms
- Instant Messenger
- Popups (web pages, adware)
- Search engine spam
  - Link spam
  - Word spam
  - Blog spam
- VOIP?
- Conclusion: If you can advertise for free, someone will

7 Chat Room Spam
- MSN closed its free chat rooms
- Spambots come in and pretend to chat
  - But they are really just advertising porn sites
- Some spambots are trivial
  - They don't talk at all, but take up space
  - They link to porn spam in their profile
- Some spambots are very sophisticated
  - You can have a short conversation with them before they try to convince you to go to their website
  - Their conversations are randomized, so they are hard for users to spot

8 Chat Bot Conversation
joshuagood9: hi there
super_christina12: hey there how u doin?
joshuagood9: are you a bot?
super_christina12: im not a bot are u? lol
joshuagood9: are you a bot?
super_christina12: i hate bots lol
joshuagood9: asl?
super_christina12: im 20 f usa and u?
joshuagood9: asl?
super_christina12: im 21 f usa and u?
joshuagood9: 74/M, WA
super_christina12: nice age
joshuagood9: thank you
super_christina12: yw sweety..could u do me a favor..check out my homepage and my profile see if my cam works? brb

9 super_christina12’s home page

10 Instant Messenger Spam ("SPIM")
- Spammers send messages to people via IM
- Microsoft solved this by requiring senders to get permission before IMing someone
- Spammers then put spam in their "name", so the permission request message now contains spam!

11 Popup Spam
- Web page popups
  - You go to a web page and get a popup
  - It may be a "pop-under" that appears under all other windows, so you don't even know where it came from
- Adware (e.g. Gator)
  - Software installed on your computer either without your permission, or with permission hidden deep in a license agreement
  - Creates popups all the time
- Messenger spam (not IM)
  - A mechanism meant to deliver notices like "Printer is out of paper"
  - Spammers exploit it to create notices like "Buy a diploma"

12 Search Engine Spam
- Link spam
  - Search engines use the number of links pointing to a page to determine rankings
  - Spammers create millions of pages that link to their site
  - The fake pages may be realistic and may be returned as search results, too
- Word spam
  - Spammers put misleading words on their page, e.g. celebrity names or technical terms
  - The page is actually porn
- Cloaking
  - Serve useful content when the search engine's crawler visits (recognized by its IP address or user-agent string)
  - Serve spam when people click on the link
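To see why link farms work, the sketch below runs a minimal power-iteration PageRank, a well-known link-based ranking method; the slide does not say which algorithm any particular engine uses, and the page names are hypothetical. Adding 20 throwaway pages that all point at one spam page is enough to lift it above every honest page in this toy web.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A tiny honest web (a -> b -> c -> a) plus a spam page nobody links to.
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": []}

# The same web after the spammer adds 20 fake pages that all link to "spam".
farm = dict(honest)
for i in range(20):
    farm[f"fake{i}"] = ["spam"]

before = pagerank(honest)["spam"]
after = pagerank(farm)["spam"]
print(f"spam page rank: {before:.3f} -> {after:.3f}")
```

Because PageRank weights a link by the rank of the page it comes from, real engines can partly discount farms of worthless pages; plain link counting, as described on the slide, offers no such defense.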

13 VOIP Spam?
- Make free or low-cost internet phone calls
- Internet to internet is free
  - Could be fully automated
  - Speech recognition someday
  - Almost no cost
- Internet to POTS (fixed line): 2 cents/minute
  - Could use cheap foreign labor
  - International calls are too expensive over traditional phone lines for most advertising, but become more affordable with VOIP

14 Techniques Spammers Use
- Examples of tricks
- Sending spam
  - Open proxies
  - Zombies
- (Lots of other nasty stuff, which I won't have time to talk about today)

15 Weather Report Guy
- Content in image
- Good word chaff: "Weather, Sunny, High 82, Low 81, Favorite…"

16 Secret Decoder Ring Dude
- Another spam that looks easy
- Is it?

17 Secret Decoder Ring Dude
- Character encoding: "Pharmacy"
- HTML word breaking: "Produc t s"
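One common countermeasure is to tokenize what the reader would see rather than the raw HTML. The sketch below is a rough approximation using only the standard library: it strips tags with a simple regular expression and then decodes character references with `html.unescape`. The obfuscated input string is a hypothetical example in the style of the slide, not the actual spam.

```python
import html
import re

# Matches any HTML tag, including comments such as <!-- x -->.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[A-Za-z]+")


def visible_text(raw_html: str) -> str:
    """Approximate what the reader sees: drop tags, then decode entities."""
    return html.unescape(TAG_RE.sub("", raw_html))


def tokens(text: str) -> list[str]:
    """Split text into alphabetic tokens, as a simple filter might."""
    return WORD_RE.findall(text)


# Hypothetical obfuscated source: "Pha" written as numeric character
# references, and empty tags splitting "Products" into pieces.
raw = "&#80;&#104;&#97;rmacy Produc<b></b>t<!-- x -->s"

print(tokens(raw))                # what a naive filter sees
print(tokens(visible_text(raw)))  # what the reader sees
```

Tags are removed before entities are decoded so that an escaped `&lt;b&gt;` in the text is not mistaken for a real tag. A production filter would need a real HTML parser, since deleting tags outright also merges words that were separated only by block elements such as `<br>`.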

18 Diploma Guy
- Word obscuring: "Dplmoia Pragorm Caerte a mroe prosoeprus"

19 Diploma Guy
- Word obscuring: "Dipmloa Paogrrm Cterae a more presporous"

20 Diploma Guy
- Word obscuring: "Dimlpoa Pgorram Cearte a more poosperrus"

21 Diploma Guy
- Word obscuring: "Dpmloia Pragorm Caetre a more prorpeosus"

22 Diploma Guy
- Word obscuring: "Dplmoia Pragorm Carete a mroe prorpseous"
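Every Diploma Guy variant keeps the first and last letter of each word and shuffles only the interior. A filter can exploit that pattern with a canonical key: first letter, sorted interior letters, last letter. The sketch below assumes a small hypothetical vocabulary; a real system would use a full dictionary and would have to resolve words that share a key.

```python
def scramble_key(word: str) -> str:
    """Key that is identical for a word and any shuffle of its interior letters."""
    w = word.lower()
    if len(w) <= 3:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(vocabulary: list[str]) -> dict[str, list[str]]:
    """Map each key to the vocabulary words that produce it."""
    index: dict[str, list[str]] = {}
    for word in vocabulary:
        index.setdefault(scramble_key(word), []).append(word)
    return index


def unscramble(text: str, index: dict[str, list[str]]) -> str:
    """Replace each token with the first vocabulary word sharing its key."""
    out = []
    for token in text.split():
        candidates = index.get(scramble_key(token))
        out.append(candidates[0] if candidates else token)
    return " ".join(out)


VOCAB = ["diploma", "program", "create", "a", "more", "prosperous"]  # toy list
INDEX = build_index(VOCAB)

# The five variants shown on the Diploma Guy slides.
VARIANTS = [
    "Dplmoia Pragorm Caerte a mroe prosoeprus",
    "Dipmloa Paogrrm Cterae a more presporous",
    "Dimlpoa Pgorram Cearte a more poosperrus",
    "Dpmloia Pragorm Caetre a more prorpeosus",
    "Dplmoia Pragorm Carete a mroe prorpseous",
]
for v in VARIANTS:
    print(unscramble(v, INDEX))
```

All five variants map back to the same normalized text, which a content filter can then score as usual.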

23 More of Diploma Guy
- Diploma Guy is good at what he does

24 Trends in Spam Exploits (Hulten et al.)
Based on 1,200 spam messages sent to Hotmail.

| Exploit | 2003 spam | 2004 spam | Delta (absolute %) | Description |
|---|---|---|---|---|
| Word obscuring | 4% | 20% | 16% | Misspelling words, putting words into images, etc. |
| URL spamming | 0% | 10% | 10% | Adding URLs that point to non-spam sites (e.g. msn.com). |
| Domain spoofing | 41% | 50% | 9% | Using an invalid or fake domain in the From line. |
| Token breaking | 7% | 15% | 8% | Breaking words with punctuation, spaces, etc. |
| MIME attacks | 5% | 11% | 6% | Putting non-spam content in one body part and spam content in another. |
| Text chaff | 52% | 56% | 4% | Random strings of characters, random series of words, or unrelated sentences. |
| URL obscuring | 22% | 17% | -5% | Encoding a URL in hexadecimal, hiding the true URL with an @ sign, etc. |
| Character encoding | 5% | 0% | -5% | Encoding characters so that obfuscated source text renders as "Pharmacy". |

25 Sending Spam: Open Proxies
- These are web-page proxy servers
  - Used for getting web pages past firewalls
  - Should have nothing to do with email
- Spammers exploit holes
  - One hole lets some proxies be used to send email
  - Another hole lets anyone access the proxy server
  - Both holes must be present to use an open proxy
- Spammers really love these
  - Almost impossible to trace the spammer
  - The spammer uses someone else's bandwidth
  - Owners have less incentive to close an open proxy than to close an open mail relay
- Everyone who abuses things likes open proxies (example: click fraud on Google and Overture advertisements)

26 Sending Spam: Zombies
- As much as 2/3 of spam may now originate from zombies!
- Consumer computers taken over by viruses or trojans
  - The spammer tells them what to send
  - Very difficult to trace
  - Very cheap for the spammer

27 Blogs, Referrers
- For link spam, some spammers manually or automatically add links to blogs
  - Increases the page rank of the spammer's site
  - Blog sites tend to have high page rank
- Some sites put a list of referrer pages or proxy traffic somewhere on the site
  - Spammers generate fake traffic to the site to get entries, which are then indexed by search engines
- Even seemingly safe features like these are vectors for abuse

28 Solutions to Spam
- Filtering
  - Machine learning
  - Matching/fuzzy hashing
  - Blackhole lists (IP addresses)
- Postage
  - Turing tests, money, computation
- SmartProof

29 Filtering Technique: Machine Learning
- Learn spam versus good
- Problem: need a source of training data
  - Get users to volunteer GOOD and SPAM labels
  - 100,000 volunteers at Hotmail
- Should generalize well
- But spammers are adapting to machine learning too
  - Images, different words, misspellings, etc.
- We use machine learning (details later)
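The slide does not say which learning algorithm Hotmail uses, so the sketch below swaps in a multinomial naive Bayes classifier, one of the simplest ways to "learn spam versus good" from labeled messages. The class name and the six training messages are hypothetical stand-ins for volunteer-labeled mail.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text: str, label: str) -> None:
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        total_msgs = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "good"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior for this class.
            score = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            for word in tokenize(text):
                # Add-one smoothing, with one extra slot for unseen words.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Convert the log-odds to a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["good"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
for text in ["earn money working at home", "cheap diploma program",
             "buy pharmacy products now"]:
    spam_filter.train(text, "spam")
for text in ["meeting notes for the project", "lunch at noon tomorrow",
             "draft of the conference paper"]:
    spam_filter.train(text, "good")

print(spam_filter.spam_probability("earn money with a diploma"))
print(spam_filter.spam_probability("notes from the project meeting"))
```

Models like this are exactly what the adaptation tricks on the previous slides target: images, misspellings, and chaff words all change the token counts the classifier relies on.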

30 Filtering Technique: Matching/Fuzzy Hashing
- Use "honeypots": addresses that should never get mail
  - All mail sent to them is spam
- Look for similar messages that arrive in real mailboxes
  - Exact match is easily defeated
  - Use fuzzy hashes
- How effective?
  - The Madlibs attack defeats exact-match filters and fuzzy hashing
  - Spammers are already doing this
  - Good work from IBM and AOL (Josh Alspector) on solutions
- Madlibs example, with alternative phrasings for a slot in braces: "{Make | Earn} {thousands of dollars | lots of money} working at home in the comfort of your own house !!!."
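The sketch below illustrates the Madlibs attack against both kinds of matching. It uses word-shingle Jaccard similarity as a generic stand-in for fuzzy hashing (it is not the IBM or AOL technique mentioned on the slide). The first two slots follow the slide's example; the remaining alternatives are invented for illustration.

```python
import hashlib
import itertools


def exact_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive lowercase words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two messages' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def madlibs_variants(slots: list[list[str]]):
    """Yield every message obtainable by picking one phrase per slot."""
    for choice in itertools.product(*slots):
        yield " ".join(choice)


# First two slots follow the slide; the other alternatives are hypothetical.
SLOTS = [
    ["Make", "Earn"],
    ["thousands of dollars", "lots of money"],
    ["working at home", "from home"],
    ["in the comfort of your own house", "with no boss"],
    ["!!!", "now!"],
]

variants = list(madlibs_variants(SLOTS))
print(len(variants), "variants,",
      len({exact_hash(v) for v in variants}), "distinct exact hashes")
print("one slot changed:", round(similarity(variants[0], variants[1]), 2))
print("every slot changed:", round(similarity(variants[0], variants[-1]), 2))
```

Every one of the 32 variants defeats exact matching. Variants that differ in a single slot still look alike, but two variants that differ in every slot share no shingles at all, which is why a honeypot copy of one variant may not catch the others.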

31 Blackhole Lists
- Lists of IP addresses that send spam
  - Open relays, open proxies, DSL/cable lines, etc.
- Easy to make mistakes
  - Open relays, DSL, and cable lines send both good mail and spam…
- Who makes the lists?
  - Some list-makers are very aggressive
  - Some list-makers are too slow

"MSN blocks e-mail from rival ISPs," by Stefanie Olsen, Staff Writer, CNET News.com, February 28, 2003, 2:34 PM PT:
Microsoft's MSN said its e-mail services had blocked some incoming messages from rival Internet service providers earlier this week, after their networks were mistakenly banned as sources of junk mail. The Redmond, Wash., company, which has nearly 120 million e-mail customers through its Hotmail and MSN Internet services, confirmed Friday it had wrongly placed a group of Internet protocol addresses from AOL Time Warner's RoadRunner broadband service and EarthLink on its "blocklist" of known spammers whose mail should be barred from customer inboxes. Once notified of the error by the two ISPs, MSN moved the IP addresses "over to a safe list immediately," according to a Microsoft spokeswoman.
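Many blackhole lists are published through DNS. By common convention, a mail server reverses the octets of the connecting IPv4 address, appends the list's zone, and looks the name up; if the name resolves, the address is listed. The minimal sketch below assumes that convention. The zone `dnsbl.example.org` is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the list's DNS zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves, i.e. the list has an entry for ip.

    Any resolution failure (including timeouts or a broken resolver) is
    treated as "not listed", which errs on the side of delivering mail.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True


# "dnsbl.example.org" is a placeholder zone, not a real blacklist.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The lookup itself is cheap. The hard part, as the MSN incident shows, is deciding which addresses belong on the list in the first place.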

32 Postage
- The basic problem with email is that it is free
- Force everyone to pay (especially spammers) and spam goes away
- Send payment preemptively with each outbound message, or wait for a challenge
- Multiple kinds of payment: Turing tests, computation, money

33 Turing Tests (Naor '96)
- You send me mail; I don't know you
- I send you a challenge: type these letters
- Your response is sent to my computer
- Your message is moved to my inbox, where I read it
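The four steps on this slide amount to a small state machine: hold mail from unknown senders, issue a challenge, and release the message once the challenge is answered. The sketch below is hypothetical and makes several simplifications. It returns the challenge letters as plain text, whereas a real Turing test would render them as a distorted image. It also skips email transport and remembers senders who answer correctly.

```python
import secrets
import string
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChallengeInbox:
    """Toy challenge-response mailbox (all names here are hypothetical)."""
    known_senders: set[str] = field(default_factory=set)
    inbox: list[str] = field(default_factory=list)
    # token -> (sender, held message, expected answer)
    pending: dict[str, tuple[str, str, str]] = field(default_factory=dict)

    def receive(self, sender: str, message: str) -> Optional[tuple[str, str]]:
        """Deliver mail from known senders; otherwise hold it and issue a challenge."""
        if sender in self.known_senders:
            self.inbox.append(message)
            return None
        token = secrets.token_hex(8)
        # A real system would render these letters as a distorted image.
        answer = "".join(secrets.choice(string.ascii_uppercase) for _ in range(6))
        self.pending[token] = (sender, message, answer)
        return token, answer

    def respond(self, token: str, typed: str) -> bool:
        """Check the sender's answer; on success move the held message to the inbox."""
        entry = self.pending.get(token)
        if entry is None:
            return False
        sender, message, answer = entry
        if typed.strip().upper() != answer:
            return False  # wrong answer: the message stays held
        del self.pending[token]
        self.known_senders.add(sender)
        self.inbox.append(message)
        return True


box = ChallengeInbox()
token, letters = box.receive("alice@example.com", "Lunch on Friday?")
print("held:", box.inbox == [])
print("answered:", box.respond(token, letters))
print("inbox:", box.inbox)
```

Remembering senders who pass the test means a correspondent is challenged only once, which keeps the annoyance to legitimate senders low.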

34 Computational Challenge (Dwork and Naor '92)
- The sender must perform a time-consuming computation
- Example: find a hash collision
  - Easy for the recipient to verify, hard for the sender to find the collision
- Requires, say, 10 seconds (or 5 minutes?) of sender CPU time (in the background)
- Can be done preemptively, or in response to a challenge
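A concrete instance of "easy to verify, hard to find" is a hashcash-style stamp. The sender searches for a nonce whose SHA-256 digest starts with a given number of zero bits, which is a partial collision with the all-zero string. This is a swapped-in illustration, not Dwork and Naor's original pricing function. The stamp format and the difficulty value below are assumptions chosen to keep the example fast.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_digest(message: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()


def mint(message: str, difficulty: int) -> int:
    """Sender side: try nonces until the digest has `difficulty` leading zero bits.

    Expected cost is about 2**difficulty hash evaluations.
    """
    for nonce in itertools.count():
        if leading_zero_bits(stamp_digest(message, nonce)) >= difficulty:
            return nonce


def verify(message: str, nonce: int, difficulty: int) -> bool:
    """Recipient side: a single hash checks the sender's work."""
    return leading_zero_bits(stamp_digest(message, nonce)) >= difficulty


# Hypothetical stamp text; including the recipient means a stamp minted
# for one recipient is not valid for another.
STAMP = "to:bob@example.com;date:2004-01-01"
nonce = mint(STAMP, 16)
print(nonce, verify(STAMP, nonce, 16))
```

Each extra bit of difficulty doubles the sender's expected work but leaves verification at one hash. A real deployment would tune the difficulty toward the slide's "10 seconds" target for the slowest machines it wants to support.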

35 $$$ Money
- Pay actual money (1 cent?) to send a message
- My favorite variation: take the money only when the user hits the "Report Spam" button
  - Otherwise, refund it to the sender
  - Free for non-spammers to send mail, but expensive for spammers
- Requires multiple monetary transactions for every message sent, which is expensive
- Who pays for the infrastructure?
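The refund-unless-reported variation can be modeled as an escrow ledger. Postage is held when a message is sent and then either refunded or forfeited when the recipient hits "Report Spam". The sketch below is hypothetical: it uses integer cents and in-memory accounts, and it does not decide where forfeited postage goes, which the slide also leaves open.

```python
from dataclasses import dataclass, field

POSTAGE_CENTS = 1  # the slide's "1 cent?" figure


@dataclass
class PostageLedger:
    """Toy escrow for per-message postage (names are hypothetical)."""
    balances: dict[str, int] = field(default_factory=dict)  # cents per account
    escrow: dict[str, str] = field(default_factory=dict)    # message id -> sender
    forfeited_cents: int = 0

    def send(self, sender: str, message_id: str) -> None:
        """Move postage from the sender's balance into escrow."""
        if self.balances.get(sender, 0) < POSTAGE_CENTS:
            raise ValueError(f"{sender} cannot pay postage")
        self.balances[sender] -= POSTAGE_CENTS
        self.escrow[message_id] = sender

    def report_spam(self, message_id: str) -> None:
        """Recipient hit "Report Spam": the held postage is kept."""
        self.escrow.pop(message_id)
        self.forfeited_cents += POSTAGE_CENTS

    def settle(self, message_id: str) -> None:
        """No report: refund the postage to the sender."""
        sender = self.escrow.pop(message_id)
        self.balances[sender] += POSTAGE_CENTS


ledger = PostageLedger(balances={"friend": 10, "spammer": 10})
ledger.send("friend", "m1")
ledger.send("spammer", "m2")
ledger.settle("m1")       # friend's message was wanted: postage refunded
ledger.report_spam("m2")  # recipient reported spam: postage kept
print(ledger.balances, ledger.forfeited_cents)
```

Each message here needs two ledger operations (hold, then settle or forfeit). That is the "multiple monetary transactions for every message" cost the slide warns about.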

36 The SmartProof Approach: Overview
- Combines the best aspects of several previous techniques:
  - Machine learning
  - Challenge-response
  - Postage (multiple techniques)

37 SmartProof: Selective Challenging
- Most challenge-response approaches challenge every message
- We use machine learning to challenge only some messages
  - Definite spam is deleted (saves processing costs)
  - Definite good mail is passed through to the inbox (avoids annoying challenges, and avoids many challenges that would never be answered)
  - Only possible spam/possible good mail is challenged
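Selective challenging can be pictured as two thresholds on the filter's spam probability. The sketch below assumes such a scheme; the threshold values are hypothetical, and the slide does not say how SmartProof actually sets its cutoffs.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"        # definite spam
    INBOX = "inbox"          # definite good
    CHALLENGE = "challenge"  # uncertain: ask the sender for proof


# Hypothetical thresholds on the filter's spam probability.
DEFINITE_SPAM = 0.99
DEFINITE_GOOD = 0.10


def route(spam_probability: float) -> Action:
    """Decide what to do with a message given the filter's spam probability."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be in [0, 1]")
    if spam_probability >= DEFINITE_SPAM:
        return Action.DELETE
    if spam_probability <= DEFINITE_GOOD:
        return Action.INBOX
    return Action.CHALLENGE


for p in (0.999, 0.5, 0.02):
    print(p, route(p).value)
```

Widening the band between the thresholds sends more mail to challenges: fewer filtering mistakes, but more annoyed senders and more challenges that go unanswered.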

38 SmartProof: Sender Chooses Type of Proof
- Can auto-respond with computation
  - Least annoying to the sender: he may never see the challenge
  - Usable by people with disabilities
- Can respond by solving a Turing test
  - Works for people with old or incompatible computers, or who do not want to download code
- Future?: Can respond with a micropayment
  - Works for small businesses
  - Hardest for spammers to work around

39 Why Email and Spam Need Their Own Field/Conference
- Email is one of the top two applications
  - Search is the other (TREC, SIGIR)
  - Email is why my grandfather and my wife's grandmother bought computers
- If you are interested in the frontiers of the internet, you should be very interested in the number one application on the internet

40 Example: Anti-Spoofing
- Cryptographic approaches
  - S/MIME, PGP
  - Small adoption because of problems distributing keys; we need solutions that work for email
- Systems/networking approaches
  - DNS/IP address-based approaches
- Combination approaches
  - Put the key in a DNS entry (e.g. Yahoo's DomainKeys)
- We need a conference where the crypto people, systems people, email people, and spam people all come together to compare and learn
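In DomainKeys, a receiving server finds the sender's public key by looking up a DNS TXT record at `selector._domainkey.domain` and parsing the record's `tag=value` list. The sketch below covers only that DNS-facing step. Signature verification needs an RSA library and a TXT lookup, and neither is in Python's standard library. The selector, domain, and key value are placeholders.

```python
def key_record_name(selector: str, domain: str) -> str:
    """DNS name where a DomainKeys-style public key is published."""
    return f"{selector}._domainkey.{domain}"


def parse_tag_list(record: str) -> dict[str, str]:
    """Parse a 'tag=value; tag=value' TXT record into a dict."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # allow a trailing semicolon
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip()] = value.strip()
    return tags


# "mail" and example.com are placeholders; BASE64KEY stands in for a real key.
name = key_record_name("mail", "example.com")
record = parse_tag_list("k=rsa; p=BASE64KEY;")
print(name, record["k"])
```

Publishing the key in DNS sidesteps the key-distribution problem that held back S/MIME and PGP: anyone who can resolve the sender's domain can fetch its key.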

41 Conference on Email and Anti-Spam
- www.ceas.cc
- How it's different:
  - The first academic-style research conference on email or spam
  - There are plenty of informal and industrial conferences
- Thursday and Friday at Stanford
- If you can't come this year, make sure you come next year, and submit papers!

42 Conclusion
- Machine learning filters combined with postage seem like the best approach to stopping spam
- Anything on the network that can be abused will be abused

