1
PPM-based Spam Filtering in SEWM2008
Liu JuXin, Xu Congfu, Peng Peng, Lu Guanzhong
College of Computer Science, Zhejiang University
April 10, 2008
2
Outline
PPM (Prediction by Partial Matching)
Email Pre-processing
Train PPM Model
Model Classification
3
PPM Data Compression
4
PPM Framework
5
Email Pre-processing
Source alphabet
Merge consecutive spaces
Truncate long messages
6
Email Pre-processing Sample
Alphabet: {a,b,c,d,e,f,_,=, }
Replace char: ?
Truncate length: 20
Raw Data:          Abcd_= Af?/[]=+ safj =ab fe addfe
After Replace:     abcd_= ? Af????=? ?af? =ab fe addfe
After Merge Blank: abcd_= ? Af????=? ?af? =ab fe addfe
After Truncate:    abcd_= ? Af????=? ?a
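The three pre-processing steps on this slide (replace, merge blanks, truncate) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `preprocess` and its parameters are hypothetical, and it assumes the alphabet is matched literally (case-sensitive), since the slide's sample does not make the case handling clear.

```python
import re

def preprocess(text, alphabet, replace_char="?", max_len=20):
    """Hypothetical sketch of the slide's three pre-processing steps."""
    # Step 1 (Replace): map every character outside the source alphabet
    # to the replacement character.
    replaced = "".join(ch if ch in alphabet else replace_char for ch in text)
    # Step 2 (Merge Blank): collapse each run of two or more spaces into one.
    merged = re.sub(r" {2,}", " ", replaced)
    # Step 3 (Truncate): keep only the first max_len characters.
    return merged[:max_len]
```

For example, with `alphabet = set("abcdef_= ")`, the call `preprocess("abc  xyz=  fe", alphabet)` returns `"abc ???= fe"`.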
7
Train PPM Model
Use order-6 PPM* model
Use Method D escape estimation
Train two PPM models: a HAM model and a SPAM model
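To make the training step concrete, here is a simplified character-level PPM sketch with Method D escape estimation: in a context seen n times with t distinct successors, a symbol seen c times gets probability (2c - 1) / (2n) and the escape gets t / (2n). The class name `SimplePPM` and its methods are hypothetical. The sketch uses a fixed maximum order and omits symbol exclusion and the unbounded contexts of PPM*, so it illustrates the idea rather than reproducing the filter described in the slides.

```python
import math
from collections import defaultdict

class SimplePPM:
    """Simplified PPM with Method D escapes (no exclusion, bounded order)."""

    def __init__(self, order=6, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[context][symbol] = times symbol followed context in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        for i, sym in enumerate(text):
            # Update every context length from 0 up to the model order.
            for k in range(min(self.order, i) + 1):
                self.counts[text[i - k:i]][sym] += 1

    def prob(self, history, sym):
        p_escape = 1.0
        # Try the longest available context first, then back off.
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue  # context never seen: back off at no cost
            n = sum(table.values())
            c = table.get(sym, 0)
            if c > 0:
                return p_escape * (2 * c - 1) / (2 * n)  # Method D symbol estimate
            p_escape *= len(table) / (2 * n)             # Method D escape estimate
        # Order -1 fallback: uniform over the whole alphabet.
        return p_escape / self.alphabet_size

    def bits(self, text):
        """Code length of text in bits under the (static) model."""
        return sum(-math.log2(self.prob(text[:i], text[i]))
                   for i in range(len(text)))
```

Training the two models from the slide would then amount to creating two instances and calling `train` on ham messages for one and spam messages for the other.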
8
Model Classification
MCE (Minimum Cross-Entropy)
MDL (Minimum Description Length)
Spam Score
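A minimal sketch of MCE-style classification follows: the message is coded with both static models, and the class whose model needs fewer bits wins. The slide does not give the spam score formula, so the ratio used here is an assumption for illustration only. `UnigramModel` is a stand-in (an order-0 character model with add-one smoothing) so the example runs on its own; any object with a `bits(text)` method, such as a trained PPM model, could take its place. MDL would differ by letting the model adapt while it codes the message, which this sketch does not do.

```python
import math
from collections import Counter

class UnigramModel:
    """Stand-in for a trained compression model (NOT PPM): order-0, add-one smoothing."""

    def __init__(self, training_text, alphabet_size=256):
        self.counts = Counter(training_text)
        self.total = len(training_text)
        self.alphabet_size = alphabet_size

    def bits(self, text):
        # Code length of text in bits under the static model.
        return sum(
            -math.log2((self.counts[ch] + 1) / (self.total + self.alphabet_size))
            for ch in text
        )

def spam_score(message, ham_model, spam_model):
    """Assumed score in [0, 1]: higher means the spam model compresses better."""
    ham_bits = ham_model.bits(message)
    spam_bits = spam_model.bits(message)
    total = ham_bits + spam_bits
    return 0.5 if total == 0 else ham_bits / total

def classify(message, ham_model, spam_model, threshold=0.5):
    # Minimum cross-entropy rule: pick the class whose model gives fewer bits.
    score = spam_score(message, ham_model, spam_model)
    return "spam" if score > threshold else "ham"
```

Raising `threshold` above 0.5 trades missed spam for fewer false positives, which is one way a filter of this kind could be tuned.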
9
Advantages
Simple pre-processing
No decoding needed (avoids obfuscation problems)
Highly self-adaptive
Low false positive rate
10
References
"Spam Filtering Using Statistical Data Compression Models"
"Unbounded Length Contexts for PPM"
11
Questions
Delay
Index
Deliver the filter
ham, Ham and HAM
Active learning
10000
Deliver the filter
12
Thanks for your attention! Q&A