Chad Mills, Program Manager, Windows Live Safety Platform, Microsoft

3 Assumption:
• Spam words continue to appear in spam messages
• Good words continue to appear in good messages
Spam message "million dollars transfer guardian": (dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, -0.03), score 0.37
Good message "March community social fellow": (community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08), score -0.11
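
The assumption above is what a linear content filter relies on: a message's score is the sum of the weights of the words it contains. The minimal Python sketch below reproduces the two example scores. The table name `WORD_WEIGHTS`, the lowercase matching, and the treatment of "guardian" as -0.03 (the value that makes the spam-side weights add up to 0.37) are assumptions for illustration.

```python
import math
from typing import Dict, List

# Hypothetical weight table holding the example values.
# Positive weights are spammy, negative weights are good.
WORD_WEIGHTS: Dict[str, float] = {
    "dollars": 0.2, "million": 0.1, "transfer": 0.1, "guardian": -0.03,
    "community": -0.01, "social": -0.01, "fellow": -0.01, "march": -0.08,
}

def content_score(words: List[str]) -> float:
    """Sum the weights of known words; words the filter has never seen add 0."""
    return math.fsum(WORD_WEIGHTS.get(w.lower(), 0.0) for w in words)

spam_score = content_score("million dollars transfer guardian".split())
good_score = content_score("March community social fellow".split())
print(round(spam_score, 2), round(good_score, 2))  # 0.37 -0.11
```

Because unknown words contribute nothing and good words subtract, the score depends only on which known words appear, not on how they are distributed; the later slides exploit exactly this.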

4 From: "Chelsea Clark"
Subject: Get PaidFor yourOpinion
…
opens NRSU syringe /> Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf undertaking paced Liquidation reduction />
…

5 Overall | Group of words
Good | newsletter peers month select these
Good | late click commissioner media
Good | smoothly off close support before
Good | okay sponsor rock go by ads
Good | none cases text membership

6 Good Message + Spammy Words (Free, Nigeria, Viagra) = Borderline Spam Message
Borderline Spam + Unknown Words (late click commissioner) → Inbox, so "late click commissioner" = Good Words
Borderline Spam + Unknown Words (newsletter select month) → Junk Folder, so "newsletter select month" = Non-Good Words
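
The probing idea on this slide can be sketched as a loop against a toy linear filter: the spammer only observes Inbox versus Junk Folder, pads a borderline message with a group of unknown words, and labels the group by where the message lands. Everything numeric here is hypothetical: the weights (apart from "march" and "community", reused from slide 3), the 0.4 Junk threshold, and the helper names `score`, `deliver`, and `probe`. They were picked only so the outcomes match the slide.

```python
from typing import Dict, List

# Hypothetical filter weights (positive = spammy).
FILTER_WEIGHTS: Dict[str, float] = {
    "march": -0.08, "community": -0.01,
    "free": 0.15, "nigeria": 0.15, "viagra": 0.15,
    "late": -0.02, "click": 0.01, "commissioner": -0.03,
    "newsletter": 0.03, "select": 0.02, "month": 0.01,
}
JUNK_THRESHOLD = 0.4  # hypothetical: scores at or above this go to Junk

def score(words: List[str]) -> float:
    """Sum-of-weights content score; unknown words add 0."""
    return sum(FILTER_WEIGHTS.get(w.lower(), 0.0) for w in words)

def deliver(words: List[str]) -> str:
    """The only signal the spammer sees: which folder the message reaches."""
    return "Junk Folder" if score(words) >= JUNK_THRESHOLD else "Inbox"

def probe(borderline: List[str], unknown: List[str]) -> str:
    """Label a group of unknown words by where the padded message lands."""
    landed = deliver(borderline + unknown)
    return "Good Words" if landed == "Inbox" else "Non-Good Words"

good_message = "March is all about the community".split()
borderline = good_message + ["Free", "Nigeria", "Viagra"]  # scores just under the threshold
print(deliver(borderline))                                   # Inbox
print(probe(borderline, ["late", "click", "commissioner"]))  # Good Words
print(probe(borderline, ["newsletter", "select", "month"]))  # Non-Good Words
```

Starting from a borderline message matters: a clearly good message would reach the Inbox whatever is appended, so the probe would reveal nothing.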

8 Chaff Spam
• [spam content]
• newsletter peers month select these
• late click commissioner media
• smoothly off close support before
• okay sponsor rock go by ads
• none cases text membership
Legitimate Mail
March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.
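
To see why good-word chaff defeats a plain sum-of-weights filter, the sketch below scores the spam words from slide 3 with and without the 24 chaff words listed above. The flat -0.02 weight for every chaff word is a hypothetical value, not something the slides state.

```python
from typing import Dict, List

# Spam-content weights come from slide 3; the flat chaff weight is hypothetical.
SPAM_WEIGHTS: Dict[str, float] = {"million": 0.1, "dollars": 0.2, "transfer": 0.1}
CHAFF: List[str] = (
    "newsletter peers month select these late click commissioner media "
    "smoothly off close support before okay sponsor rock go by ads "
    "none cases text membership"
).split()
CHAFF_WEIGHT = -0.02
CHAFF_SET = set(CHAFF)

def score(words: List[str]) -> float:
    """Plain sum-of-weights content score; words the filter doesn't know add 0."""
    total = 0.0
    for w in words:
        if w in SPAM_WEIGHTS:
            total += SPAM_WEIGHTS[w]
        elif w in CHAFF_SET:
            total += CHAFF_WEIGHT
    return total

spam_content = ["million", "dollars", "transfer"]
print(len(CHAFF))                             # 24
print(round(score(spam_content), 2))          # 0.4
print(round(score(spam_content + CHAFF), 2))  # -0.08
```

Each chaff word is only mildly good, but enough of them drag the total below zero, so the message looks about as good as legitimate mail to a filter that only sums weights.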

11 • Sum of weights (content filter score)
• Average weight
• Standard deviation
• Percent of words that are good
• Percent of words that are spam
• Number of features
• Maximum feature weight
• Number of strong spam words
• Etc.
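
The metafeatures above summarize the distribution of a message's feature weights rather than just their total. The Python sketch below computes that list under a few assumptions that are not from the slides: a "good" word has a negative weight, a "spam" word has a positive weight, and a "strong spam word" has a weight of at least a hypothetical cutoff of 0.15.

```python
import statistics
from typing import Dict, List

STRONG_SPAM = 0.15  # hypothetical cutoff for a "strong spam word"

def metafeatures(weights: List[float]) -> Dict[str, float]:
    """Summary statistics over the weights of the features found in one message.

    Assumes positive weights mark spam words and negative weights mark good words.
    """
    n = len(weights)
    if n == 0:
        # No known features: report zeros rather than failing.
        return {k: 0.0 for k in ("sum", "average", "stdev", "pct_good", "pct_spam",
                                 "num_features", "max_weight", "num_strong_spam")}
    total = sum(weights)
    return {
        "sum": total,                                          # ordinary content filter score
        "average": total / n,
        "stdev": statistics.stdev(weights) if n > 1 else 0.0,  # sample standard deviation
        "pct_good": sum(w < 0 for w in weights) / n,
        "pct_spam": sum(w > 0 for w in weights) / n,
        "num_features": n,
        "max_weight": max(weights),
        "num_strong_spam": sum(w >= STRONG_SPAM for w in weights),
    }

# Chaff-style message: three spam words plus 24 mildly good words (weights hypothetical).
chaff_spam = metafeatures([0.1, 0.2, 0.1] + [-0.02] * 24)
print(chaff_spam["num_features"], chaff_spam["num_strong_spam"])  # 27 1
```

For this chaff-style message the sum is slightly negative, yet it still contains a strong spam word and a wide spread of weights. That combination is what the metafeatures make visible.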

12 Features (feature, weight):
• Spam message (million dollars transfer guardian): (dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, -0.03), giving Sum: 0.37, σ: 0.09, Max: 0.2
• Good message (March community social fellow): (community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08), giving Sum: -0.11, σ: 0.04, Max: -0.1
Metafeatures (metafeature, weight):
• Spam message: (Sum: 0.37, 1.0) (σ: 0.09, 0.8) (Max: 0.2, 0.1), score 1.9
• Good message: (Sum: -0.11, -0.8) (σ: 0.04, -0.6) (Max: -0.1, -0.3), score -1.7
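
This slide pairs each metafeature value with its own weight and sums those weights (1.0 + 0.8 + 0.1 = 1.9 for the spam message, -0.8 - 0.6 - 0.3 = -1.7 for the good message), so the metafeatures behave like features in a second linear model. The Python sketch below follows that reading. Several choices are assumptions and do not come from the talk: metafeature values are bucketed by rounding to one decimal place; "Max" is taken as the weight with the largest magnitude, inferred from the good message's Max of -0.1; and σ is the sample standard deviation. The weight table `META_WEIGHTS` is hypothetical, with its weights copied from the slide and its bucket keys invented.

```python
import statistics
from typing import Dict, List, Tuple

def bucket(value: float) -> float:
    """Hypothetical discretization: round each metafeature to one decimal place."""
    return round(value, 1)

def metafeature_values(weights: List[float]) -> Dict[str, float]:
    return {
        "sum": sum(weights),
        "stdev": statistics.stdev(weights),  # sample standard deviation
        "max": max(weights, key=abs),        # largest-magnitude weight (assumed meaning of "Max")
    }

# Second-level weights keyed by (metafeature, bucket). The weights match the slide;
# the bucket keys are invented for this sketch.
META_WEIGHTS: Dict[Tuple[str, float], float] = {
    ("sum", 0.4): 1.0, ("stdev", 0.1): 0.8, ("max", 0.2): 0.1,
    ("sum", -0.1): -0.8, ("stdev", 0.0): -0.6, ("max", -0.1): -0.3,
}

def metafeature_score(weights: List[float]) -> float:
    """Sum the learned weight of each (metafeature, bucket); unseen buckets add 0."""
    return sum(META_WEIGHTS.get((name, bucket(v)), 0.0)
               for name, v in metafeature_values(weights).items())

spam = [0.2, 0.1, 0.1, -0.03]        # dollars, million, transfer, guardian
good = [-0.01, -0.01, -0.01, -0.08]  # community, social, fellow, March
print(round(metafeature_score(spam), 2))  # 1.9
print(round(metafeature_score(good), 2))  # -1.7
```

Bucketing lets the second-level model give non-monotonic weights (for example, penalize an unusually wide spread) that a single weight per continuous metafeature could not express.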

13 • Hotmail Feedback Loop
  ◦ Messages classified by recipients
• Training Set: 1,800,000 messages
  ◦ Ending on 5/20/07
• Evaluation Set: 50,000 messages
  ◦ Data from 5/21/07
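
The training and evaluation sets are split by date, with evaluation on the day after training ends, so the model is scored on mail it could not have seen. The Python sketch below shows that kind of chronological split. The `Message` record, its fields, and the toy messages are hypothetical, and it reads 5/20/07 as May 20, 2007. The slide gives only the set sizes and dates, not how 1,800,000 and 50,000 messages were sampled, so no sampling is modeled here.

```python
from datetime import date
from typing import List, NamedTuple, Tuple

class Message(NamedTuple):
    received: date
    is_spam: bool  # label assigned by the recipient through the feedback loop
    text: str

TRAIN_END = date(2007, 5, 20)  # training data ends on 5/20/07
EVAL_DAY = date(2007, 5, 21)   # evaluation data comes from 5/21/07

def split_by_date(messages: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Train strictly on the past and evaluate on the following day."""
    train = [m for m in messages if m.received <= TRAIN_END]
    evaluation = [m for m in messages if m.received == EVAL_DAY]
    return train, evaluation

msgs = [
    Message(date(2007, 5, 19), True, "spam from the 19th"),
    Message(date(2007, 5, 20), False, "good mail from the 20th"),
    Message(date(2007, 5, 21), True, "spam from the 21st"),
]
train, evaluation = split_by_date(msgs)
print(len(train), len(evaluation))  # 2 1
```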

14 45% improvement in true positives (TP) at low false positive (FP) levels
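
The slide reports only the headline number. One common way to read "TP at low FP" is to pick the score threshold that keeps the false positive rate under a cap and then report the fraction of spam caught. A 45% improvement then means the new catch rate is 1.45 times the old one. The Python sketch below illustrates that reading. The function names and toy data are hypothetical, and it is not necessarily how the talk's numbers were computed.

```python
from typing import List

def tp_rate_at_fp(scores: List[float], labels: List[bool], max_fp_rate: float) -> float:
    """Best spam catch rate (TP rate) among thresholds whose FP rate <= max_fp_rate.

    labels: True = spam, False = good mail. A message is flagged when its
    score >= threshold; every distinct score is tried as a threshold.
    Assumes both classes are present.
    """
    n_spam = sum(labels)
    n_good = len(labels) - n_spam
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        tp = sum(f and l for f, l in zip(flagged, labels))
        fp = sum(f and not l for f, l in zip(flagged, labels))
        if fp / n_good <= max_fp_rate:
            best = max(best, tp / n_spam)
    return best

def relative_improvement(new_tp: float, old_tp: float) -> float:
    """E.g. a TP rate going from 0.40 to 0.58 is a 45% relative improvement."""
    return (new_tp - old_tp) / old_tp

# Toy data (hypothetical): scores from one filter and the true labels.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
labels = [True, True, False, True, False, False]
print(tp_rate_at_fp(scores, labels, max_fp_rate=0.0))  # 0.6666666666666666
```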

15 • At a reasonable False Positive rate:
  ◦ 98% of unique catches are chaff spam
  ◦ Caught 99.5% of the chaff spam missed by the regular content filter
  ◦ False Positives similar in type to those of the regular filter
• Challenges Remaining:
  ◦ Primarily helped only on spam with chaff
  ◦ Relies on the base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff
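
Both percentages on this slide are ratios over sets of messages. The Python sketch below computes them with set operations. It reads "unique catches" as spam caught by the metafeature model but not by the regular filter, which is an interpretation. The message IDs and the function name are hypothetical.

```python
from typing import Set, Tuple

def unique_catch_stats(base_caught: Set[str], meta_caught: Set[str],
                       chaff_spam: Set[str]) -> Tuple[float, float]:
    """Return (share of unique catches that are chaff spam,
    share of chaff spam missed by the base filter that the new model catches).

    Sets hold message IDs; both denominators are assumed non-empty.
    """
    unique = meta_caught - base_caught       # caught only by the metafeature model
    share_chaff = len(unique & chaff_spam) / len(unique)
    missed_chaff = chaff_spam - base_caught  # chaff spam the regular filter let through
    recovered = len(missed_chaff & meta_caught) / len(missed_chaff)
    return share_chaff, recovered

print(unique_catch_stats({"m1", "m2"}, {"m1", "m3", "m4"}, {"m3", "m4", "m5"}))
# (1.0, 0.6666666666666666)
```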

16 • Spam messages with good-word chaff have unnatural weight distributions
• Metafeatures can identify and catch these messages
• This resulted in a 45% improvement in TP
• Gains were limited to spam with good-word chaff

