Published by Giles Clarke. Modified over 9 years ago.
1
Chad Mills, Program Manager, Windows Live Safety Platform, Microsoft
3
Assumption:
◦ Spam words continue to appear in spam messages
◦ Good words continue to appear in good messages

Spam message words: million dollars transfer guardian
(dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03)
Sum of weights: 0.37

Good message words: March community social fellow
(March, -0.08) (community, -0.01) (social, -0.01) (fellow, -0.01)
Sum of weights: -0.11
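The scoring this slide implies can be sketched as a plain sum of per-word weights. This is a minimal illustration, not the actual Hotmail filter: the function name `content_score`, the lowercasing, and the zero weight for unknown words are assumptions. With the weights exactly as listed, the first message sums to 0.43 rather than the 0.37 shown; the slide's total would follow if the guardian weight were -0.03, but that is only a guess, so the code keeps the listed value.

```python
# Word weights copied from the slide. Treating unknown words as weight 0
# is an assumption made for this sketch.
WEIGHTS = {
    "dollars": 0.2, "million": 0.1, "transfer": 0.1, "guardian": 0.03,
    "march": -0.08, "community": -0.01, "social": -0.01, "fellow": -0.01,
}


def content_score(words, weights=WEIGHTS):
    """Sum the weights of a message's words (higher means more spam-like)."""
    return sum(weights.get(word.lower(), 0.0) for word in words)


spam_score = content_score(["million", "dollars", "transfer", "guardian"])
good_score = content_score(["March", "community", "social", "fellow"])
print(round(spam_score, 2), round(good_score, 2))
```

Because the score is a simple sum, appending many words with negative weights pulls a spam message's total down, which is the weakness the following slides exploit.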
4
From: "Chelsea Clark"
Subject: Get PaidFor yourOpinion
…
opens NRSU syringe /> Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf undertaking paced Liquidation reduction />
…
5
Overall   Group of words
Good      newsletter peers month select these
Good      late click commissioner media
Good      smoothly off close support before
Good      okay sponsor rock go by ads
Good      none cases text membership
6
Good Message + Spammy Words (Free, Nigeria, Viagra) = Borderline Spam Message
Borderline Spam + Unknown Words (late click commissioner) = Inbox, so "late click commissioner" are Good Words
Borderline Spam + Unknown Words (newsletter select month) = Junk Folder, so "newsletter select month" are Non-Good Words
8
Chaff Spam
[spam content]
newsletter peers month select these
late click commissioner media
smoothly off close support before
okay sponsor rock go by ads
none cases text membership

Legitimate Mail
March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.
11
◦ Sum of weights (content filter score)
◦ Average weight
◦ Standard Deviation
◦ Percent of words that are good
◦ Percent of words that are spam
◦ Number of features
◦ Maximum feature weight
◦ Number of strong spam words
◦ Etc.
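The statistics listed above can be computed directly from the weights of the words found in a message. The sketch below is illustrative only. The function name `metafeatures`, the dictionary keys, the use of the sample standard deviation, the sign-based definition of good and spam words, and the `strong_spam_threshold` value are all assumptions rather than details from the slides.

```python
import statistics


def metafeatures(weights, strong_spam_threshold=0.15):
    """Summarize the distribution of word weights found in one message."""
    if not weights:
        raise ValueError("message has no weighted features")
    n = len(weights)
    return {
        "sum": sum(weights),                  # content filter score
        "mean": statistics.mean(weights),     # average weight
        # Sample standard deviation; undefined for one value, so use 0.0.
        "stdev": statistics.stdev(weights) if n > 1 else 0.0,
        "pct_good": sum(w < 0 for w in weights) / n,  # negative = good (assumed)
        "pct_spam": sum(w > 0 for w in weights) / n,  # positive = spam (assumed)
        "num_features": n,
        "max_weight": max(weights),
        # The threshold for a "strong" spam word is a hypothetical value.
        "num_strong_spam": sum(w >= strong_spam_threshold for w in weights),
    }


# Weights for March, community, social, fellow from the earlier slide.
good_meta = metafeatures([-0.08, -0.01, -0.01, -0.01])
print(good_meta)
```

A legitimate message tends to give a narrow spread of mildly negative weights, whereas spam padded with chaff mixes strongly positive and negative weights, which is the kind of difference these statistics are meant to expose.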
12
Features (feature, weight)

Spam message: million dollars transfer guardian
(dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03)
Sum: 0.37 σ: 0.09 Max: 0.2

Good message: March community social fellow
(March, -0.08) (community, -0.01) (social, -0.01) (fellow, -0.01)
Sum: -0.11 σ: 0.04 Max: -0.1

Metafeatures (metafeature, weight)

Spam message: (Sum: 0.37, 1.0) (σ: 0.09, 0.8) (Max: 0.2, 0.1)
Metafeature score: 1.9

Good message: (Sum: -0.11, -0.8) (σ: 0.04, -0.6) (Max: -0.1, -0.3)
Metafeature score: -1.7
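One way to read this slide is as a second scoring stage: each metafeature value carries its own learned weight, and the final score is the sum of those weights. The sketch below only reproduces that arithmetic with the pairs shown on the slide. How metafeature values are mapped to weights is not stated in the slides, so the lists and the function name `metafeature_score` are assumptions.

```python
# (metafeature, weight) pairs copied from the slide. Which message each
# group belongs to is inferred from the feature sums it refers to.
SPAM_METAFEATURES = [("Sum: 0.37", 1.0), ("σ: 0.09", 0.8), ("Max: 0.2", 0.1)]
GOOD_METAFEATURES = [("Sum: -0.11", -0.8), ("σ: 0.04", -0.6), ("Max: -0.1", -0.3)]


def metafeature_score(pairs):
    """Add up the weights attached to a message's metafeatures."""
    return sum(weight for _name, weight in pairs)


spam_meta_score = metafeature_score(SPAM_METAFEATURES)
good_meta_score = metafeature_score(GOOD_METAFEATURES)
print(round(spam_meta_score, 1), round(good_meta_score, 1))
```

The two totals match the 1.9 and -1.7 on the slide, a much wider separation than the 0.37 and -0.11 produced by the word weights alone.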
13
Hotmail Feedback Loop
◦ Messages classified by recipients
Training Set: 1,800,000 messages
◦ Ending on 5/20/07
Evaluation Set: 50,000 messages
◦ Data from 5/21/07
14
45% improvement in true positives (TP) at low false positive (FP) levels
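A comparison of this kind is usually made by fixing a false positive budget and measuring how much spam each filter catches within it. The sketch below shows one way to compute that figure from scored, labelled messages. The function name `tp_rate_at_fp` and the toy data are hypothetical, and the slides do not say how the 45% figure was computed.

```python
def tp_rate_at_fp(scores, labels, max_fp_rate):
    """Best true positive rate over thresholds whose false positive rate
    does not exceed max_fp_rate. labels[i] is True when message i is spam."""
    n_spam = sum(labels)
    n_good = len(labels) - n_spam
    if n_spam == 0 or n_good == 0:
        raise ValueError("need at least one spam and one good message")
    best = 0.0
    for threshold in sorted(set(scores)):
        # A message is flagged as spam when its score reaches the threshold.
        tp = sum(s >= threshold and y for s, y in zip(scores, labels))
        fp = sum(s >= threshold and not y for s, y in zip(scores, labels))
        if fp / n_good <= max_fp_rate:
            best = max(best, tp / n_spam)
    return best


# Hypothetical toy data: six messages, three of them spam.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False, False]
strict = tp_rate_at_fp(toy_scores, toy_labels, max_fp_rate=0.0)
relaxed = tp_rate_at_fp(toy_scores, toy_labels, max_fp_rate=0.34)
print(strict, relaxed)
```

Running the same function on the scores of two filters at the same `max_fp_rate` gives a like-for-like comparison of their catch rates.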
15
At a reasonable False Positive rate:
◦ 98% of unique catches are chaff spam
◦ Caught 99.5% of chaff spam missed by regular content filter
◦ Similar types of False Positives as regular filter
Challenges Remaining
◦ Primarily just helped on spam with chaff
◦ Relies on base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff
16
Spam messages with good word chaff have unnatural weight distributions
Metafeatures are able to identify and catch these messages
This resulted in a 45% improvement in TP
Gains were limited to spam with good word chaff