Sender Reputation in a Large Webmail Service by Bradley Taylor (2006) Presented by : Manoj Kumar & Harsha Vardhana.

1 Sender Reputation in a Large Webmail Service by Bradley Taylor (2006) Presented by : Manoj Kumar & Harsha Vardhana

2 Overview
- Introduction
- Primitive reputation service
- Gmail reputation service
  - Authentication
  - Reputation calculation
- Results
- Problems
- Simple rules
- Conclusion
- Discussion

3 Introduction
- Gmail is a free email service
- Gmail puts considerable effort into detecting and eliminating spam
- Reputation systems and spam filters

4 Some interesting stats
- 50 million Gmail users in all
- Maximum disk space: 130,000 terabytes
- Assuming 20% usage: 104,000 terabytes (backups included)
- 5-30% of all email received is spam
- Taking an average of 17.5% spam, about 18,200 terabytes of the data stored by Gmail is spam
Source: http://www.zipmail-for-gmail.com/us/to-google-executive.htm#gmail1

5 Reputation-Based & Content-Based classification
- Reputation-based classification uses the sender's reputation, rather than the content of the email, to classify a message as spam or not spam.
- In contrast, content-based classification uses the content of the email to classify it.

6 Rudimentary Reputation System
- Whitelists
- Block lists

10 Working
- Use the connecting IP address to identify the sender (crude authentication)
- Check whether the IP is in the whitelist
- Check whether the IP is in the block list
- If it is in neither list, send the message to the spam filter, which then performs content-based filtering
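
The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the list contents, the `Verdict` names, and the assumption that a whitelist hit means "accept" while a block-list hit means "reject" are all hypothetical, since the slide only says which lists are checked.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"                  # assumed outcome of a whitelist hit
    REJECT = "reject"                  # assumed outcome of a block-list hit
    CONTENT_FILTER = "content_filter"  # fall through to content-based filtering


# Hypothetical lists keyed by connecting IP (documentation-only address ranges).
WHITELIST = {"192.0.2.10"}
BLOCK_LIST = {"203.0.113.66"}


def classify_by_ip(connecting_ip: str) -> Verdict:
    """Classify a message using only the connecting IP address."""
    if connecting_ip in WHITELIST:
        return Verdict.ACCEPT
    if connecting_ip in BLOCK_LIST:
        return Verdict.REJECT
    # In neither list: hand the message to the content-based spam filter.
    return Verdict.CONTENT_FILTER
```

Because the IP is the only key, every sender sharing that address gets the same verdict, which is one of the weaknesses the next slide lists.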

11 Problems with the rudimentary system
- Removing false positives
- Manual whitelist management
- Figuring out the true sender
- Multiple domains sharing a set of IP addresses

12 Gmail reputation system
- Detecting solicited and bulk emails
- What is spam & what is not?

13 Some definitions for spam
- To indiscriminately send unsolicited, unwanted, irrelevant, or inappropriate messages, especially commercial advertising in mass quantities. Noun: electronic "junk mail". (www.tecrime.com/0gloss.htm)
- Spam refers to electronic junk mail or junk newsgroup postings. Some people define spam even more generally as any unsolicited e-mail. In addition to being a nuisance, spam also eats up a lot of network bandwidth. Because the Internet is a public network, little can be done to prevent spam, just as it is impossible to prevent junk mail. However, software filters in e-mail programs can be used to remove most spam sent through e-mail. (nces.ed.gov/pubs2003/secureweb/glossary.asp)
- To crash a program by overrunning a fixed-size buffer with excessively large input data. Also, to cause a person or newsgroup to be flooded with irrelevant or inappropriate messages. (www.tsl.state.tx.us/ld/pubs/compsecurity/glossary.html)
- "SPAM" mail is the practice of sending massive amounts of e-mail promotions or advertisements (and scams) to people who have not asked for it. Spam mail is controversial and there are many levels of definitions for it. Many times, spam e-mail lists are created by "harvesting" e-mail addresses from discussion boards and groups, chat rooms, IRC, and web pages. Pugmarks strictly prohibits sending spam from accounts on our servers. (www.pugmarks.com/support/glossary.htm)

16 Authenticating a domain
- IP addresses do not identify the sender
- Domain-based authentication systems
  - SPF (Sender Policy Framework)
  - DomainKeys

17 Working of SPF
- Domains use DNS to direct requests.
- All domains publish email (MX) records to determine which machines receive mail for the domain.
- SPF works by domains publishing "reverse MX" records to determine which machines send mail from the domain.
- The recipient can check those records to make sure mail is coming from where it should be coming from.
- With SPF, those "reverse MX" records are easy to publish: one line in DNS is all it takes.
Source: www.openspf.org
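
To make the "one line in DNS" idea concrete, here is a minimal sketch of how a recipient might check a connecting IP against an SPF record. It assumes the record text has already been fetched from the domain's DNS TXT record, and it handles only `ip4:` mechanisms; real SPF evaluation also covers `a`, `mx`, `include`, qualifiers, and more. The function name `spf_permits` and the example record are hypothetical.

```python
import ipaddress


def spf_permits(record: str, sender_ip: str) -> bool:
    """Return True if sender_ip falls inside any ip4: range of the record.

    Simplified on purpose: only ip4 mechanisms are evaluated; anything
    else in the record (including the trailing "all" term) is ignored.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF version 1 record
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return True
    return False


# Hypothetical record a domain could publish to list its sending machines.
EXAMPLE_RECORD = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_permits(EXAMPLE_RECORD, "192.0.2.55"))    # inside the listed range
print(spf_permits(EXAMPLE_RECORD, "198.51.100.7"))  # outside the listed range
```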

18 Working of DomainKeys
- DomainKeys adds a header named "DomainKey-Signature" that contains a digital signature of the contents of the mail message.
- Parameters:
  - SHA-1 (cryptographic hash)
  - RSA (public key encryption)
  - Base64 (to encode the encrypted hash)

19 Signature Header
DKIM-Signature: a=rsa-sha1; q=dns; d=example.com;
    i=user@eng.example.com; s=jun2005.eng; c=relaxed/simple;
    t=1117574938; x=1118006938; h=from:to:subject:date;
    b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb
      av+yuU4zGeeruD00lszZVoG4ZHRNiYzR
A DNS query will be made to: jun2005.eng._domainkey.example.com
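
The DNS query name on this slide is built from two tags in the header: the selector (`s=`) and the signing domain (`d=`). The sketch below parses the tag list and rebuilds that name. The helper names `parse_tag_list` and `key_query_name` are hypothetical, and the sketch does no signature verification.

```python
def parse_tag_list(value: str) -> dict:
    """Split a 'tag=value; tag=value' signature header body into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue
        name, _, val = part.partition("=")
        # Drop folding whitespace that may appear inside a value (e.g. in b=).
        tags[name.strip()] = "".join(val.split())
    return tags


def key_query_name(tags: dict) -> str:
    """Build the DNS name where the signer's public key is published."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Header body copied from the slide.
HEADER_VALUE = (
    "a=rsa-sha1; q=dns; d=example.com; i=user@eng.example.com; "
    "s=jun2005.eng; c=relaxed/simple; t=1117574938; x=1118006938; "
    "h=from:to:subject:date; "
    "b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb av+yuU4zGeeruD00lszZVoG4ZHRNiYzR"
)

print(key_query_name(parse_tag_list(HEADER_VALUE)))
# jun2005.eng._domainkey.example.com
```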

20 SPF Authentication Method
- Plain SPF
- Best-guess SPF
- DNS PTR zone

21 An example - DNS PTR Zone
- A message arrives from abc.xyz.com
- The DNS PTR record for the message's IP (found using reverse DNS) is xyz.com or foo.xyz.com
- Result: AUTHENTICATE
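
The comparison in this example can be sketched as follows. The slide does not say how the "zone" is determined, so this sketch assumes, purely for illustration, that two hostnames are in the same zone when their last two labels match; that rule is naive (it breaks for suffixes such as co.uk). The function names are hypothetical, and the PTR hostname is assumed to have been looked up already.

```python
def zone_of(hostname: str) -> str:
    """Naive zone guess: the last two labels of the hostname (an assumption)."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def ptr_zone_matches(sender_host: str, ptr_host: str) -> bool:
    """True if the reverse-DNS name lies in the same zone as the sender."""
    return zone_of(sender_host) == zone_of(ptr_host)


print(ptr_zone_matches("abc.xyz.com", "foo.xyz.com"))       # True
print(ptr_zone_matches("abc.xyz.com", "xyz.com"))           # True
print(ptr_zone_matches("abc.xyz.com", "mail.example.net"))  # False
```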

22 SPF Authentication Method Breakdown
- About half of SPF-authenticated messages used plain SPF

23 Authentication Breakdown: Nonspam
- Most of Gmail’s incoming mail that is not spam is authenticated (~75%)

24 Authentication Breakdown: Spam
- Most of Gmail’s spam is not authenticated (~60%)

25 When an email arrives…
- When an email arrives, it is classified and an event is logged.
- The authentication associated with the message is also logged.
- Manual reclassification (either "Report spam" or "Not spam") is also recorded.

26 Reputation calculation
Four variables are involved in the calculation:
- autospam
- autononspam
- manualspam
- manualnonspam
Reputation is calculated as a number between 0 and 100.

27 A simplified formula
good = autononspam + manualnonspam - manualspam
total = autononspam + autospam
reputation = (100 * good) / total

28 An Example
weliketospam.com sends 100 mails in a day:
- autospam = 40
- autononspam = 60
- manualspam = 8
- manualnonspam = 14
good = 60 + 14 - 8 = 66
total = 60 + 40 = 100
reputation = (100 * 66) / 100 = 66
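
The simplified formula translates directly into a small function. This is a sketch of the slide's simplified version only, not of the production calculation; returning `None` when no mail has been automatically classified is an assumption added here to avoid dividing by zero.

```python
from typing import Optional


def reputation(autospam: int, autononspam: int,
               manualspam: int, manualnonspam: int) -> Optional[float]:
    """Simplified sender reputation, following the formula on the slide."""
    good = autononspam + manualnonspam - manualspam
    total = autononspam + autospam
    if total == 0:
        return None  # assumption: no classified mail means no score yet
    return (100 * good) / total


# The weliketospam.com example from the slide.
print(reputation(autospam=40, autononspam=60, manualspam=8, manualnonspam=14))
# 66.0
```

Note that the simplified formula can fall outside 0 to 100 for some inputs (for example, when manual "Not spam" reports are numerous), so a real system would need additional handling that the slides do not describe.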

29 More on reputation
- Unfortunately, false positives also occur
- Reputation calculation is done over many days
- Only a subset of users is considered for reputation calculation
- No bias towards heavy users over light users

30 How it works
define threshold T
while (1) {
    wait for new mail / collect from queue
    if current mail reputation < T then
        send to SPAM folder
    else if current mail reputation fairly high then
        send to INBOX
    else
        send to spam_filter(reputation)
}
end
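
The loop above can be written as a runnable Python sketch. The two cutoff values are hypothetical: the slides name a threshold T and a "fairly high" level but give no numbers for either. The queue holds `(sender, reputation)` pairs for illustration.

```python
from collections import deque

SPAM_THRESHOLD = 20   # hypothetical value for the threshold T
INBOX_THRESHOLD = 90  # hypothetical cutoff for "fairly high" reputation


def route(reputation: float) -> str:
    """Decide where a message goes based on its sender's reputation."""
    if reputation < SPAM_THRESHOLD:
        return "spam_folder"
    if reputation >= INBOX_THRESHOLD:
        return "inbox"
    # Middle ground: let the content-based filter decide, informed by reputation.
    return "spam_filter"


def process_queue(queue: deque) -> list:
    """Drain a queue of (sender, reputation) pairs and route each message."""
    results = []
    while queue:
        sender, reputation = queue.popleft()
        results.append((sender, route(reputation)))
    return results


mail = deque([("a.example", 5), ("b.example", 95), ("c.example", 66)])
print(process_queue(mail))
# [('a.example', 'spam_folder'), ('b.example', 'inbox'), ('c.example', 'spam_filter')]
```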

31 RESULTS
- Figure: Distribution of SPF reputations
- Figure: Distribution of DomainKey reputations

32 Results Contd.
- Table: Selected SPF domains and their reputations
- Table: Selected DomainKey domains and their reputations

33 Some of the good bulk sending practices
- Use a consistent IP address to send bulk mail
- Automatically unsubscribe users whose addresses bounce multiple pieces of mail
- Provide a 'List-Unsubscribe' header which points to a web form where the user can unsubscribe easily from future mailings
- Messages should indicate that they are bulk mail, using the 'Precedence: bulk' header field
- You must terminate, in a timely fashion, all users and/or clients who use your service to send spam mail
Source: http://www.google.com/mail/help/bulk_mail
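
As an illustration of the two header practices above, the following sketch builds a bulk message with Python's standard `email` package. The addresses, the unsubscribe URL, and the function name `build_bulk_message` are placeholders invented for the example.

```python
from email.message import EmailMessage


def build_bulk_message(sender: str, recipient: str, subject: str,
                       body: str, unsubscribe_url: str) -> EmailMessage:
    """Build a message carrying the bulk-mail headers named on the slide."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Precedence"] = "bulk"                         # marks the mail as bulk
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"   # points to a web form
    msg.set_content(body)
    return msg


message = build_bulk_message(
    "news@example.com",
    "reader@example.org",
    "Monthly newsletter",
    "Hello, here is this month's news.",
    "https://example.com/unsubscribe?u=123",
)
print(message["Precedence"])        # bulk
print(message["List-Unsubscribe"])  # <https://example.com/unsubscribe?u=123>
```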

34 Problems
- Forwarding spam to Gmail using tools that modify the envelope sender
- Mailing lists (granularity)
- Corporate bulk senders who rarely send spammy bulk messages
- Users who report spam on a mailing list they are subscribed to

35 Simple rules for senders
- Authenticate using both SPF and DomainKeys
- Forwarded email should not be authenticated unless spam has been filtered out of it
- Try to keep spammers off the network
- Observe good bulk sending practices

36 CONCLUSIONS
- Using this reputation system, spammers and good senders are easily identified.
- There are some problems, but they can be solved eventually.
- Both SPF and DomainKeys should be used; one cannot replace the other.

37 DISCUSSION
- Are reputation systems vulnerable to attacks? What kinds of attacks can be expected?
- Is Gmail's decision not to share spam information (whitelists/block lists) questionable?
- How feasible is a more granular reputation system?

