1 An Anti-Spam Filter Based on Adaptive Neural Networks
Alexandru Catalin Cosoi
Researcher, BitDefender AntiSpam Laboratory
2 Neural Networks
- A large number of processing elements, called neurons
- A different approach to problem solving
- Neural networks and conventional algorithmic computers complement each other
3 Adaptive Resonance Theory
- Proposed by Carpenter and Grossberg
- Solves the stability-plasticity dilemma
- ART architecture models can self-organize in real time, producing stable recognition while receiving input patterns beyond those originally stored
- Contains two components: an attentional subsystem and an orienting subsystem
- The orienting subsystem works like a novelty detector
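The interplay between the two subsystems can be illustrated with a toy ART-1-style clusterer for binary vectors. This is a minimal sketch, not the filter's implementation: the class name `TinyART1`, the vigilance value, and the ranking formula are assumptions chosen for illustration.

```python
import numpy as np


class TinyART1:
    """Toy ART-1-style clusterer for binary vectors (illustrative only)."""

    def __init__(self, vigilance=0.7):
        self.vigilance = vigilance
        self.prototypes = []  # one binary prototype per learned category

    def present(self, pattern):
        """Return (category index, is_new_category) for one input pattern."""
        x = np.asarray(pattern, dtype=bool)
        # Attentional subsystem: rank stored categories by overlap with the input.
        order = sorted(
            range(len(self.prototypes)),
            key=lambda j: -np.sum(x & self.prototypes[j])
            / (0.5 + np.sum(self.prototypes[j])),
        )
        for j in order:
            # Vigilance test: fraction of the input explained by the prototype.
            match = np.sum(x & self.prototypes[j]) / max(np.sum(x), 1)
            if match >= self.vigilance:
                # Resonance: refine the prototype without touching other categories.
                self.prototypes[j] = x & self.prototypes[j]
                return j, False
        # Orienting subsystem: nothing matched well enough, so the input is novel.
        self.prototypes.append(x.copy())
        return len(self.prototypes) - 1, True
```

A novel pattern opens a new category instead of overwriting an old one, which is how the sketch keeps earlier categories stable while still learning new ones.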
4 ARTMAP
- ARTMAP
  - A class of neural network architectures
  - Performs incremental supervised learning
  - Learns multi-dimensional maps
  - Input vectors can be presented in arbitrary order
- Fuzzy ARTMAP
  - Features are expressed in fuzzy logic
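A simplified Fuzzy ARTMAP can be sketched as follows, assuming features already scaled to [0, 1]. The class `SimpleFuzzyARTMAP` and its parameter defaults are hypothetical; the sketch collapses the usual two ART modules and map field into a single list of labelled categories, so it illustrates complement coding, the fuzzy AND (element-wise minimum), and match tracking rather than the full architecture.

```python
import numpy as np


def complement_code(a):
    """Append 1 - a so that every coded input has the same total activation."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])


class SimpleFuzzyARTMAP:
    """Simplified Fuzzy ARTMAP: labelled fuzzy categories with match tracking."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance   # baseline vigilance
        self.alpha = alpha     # choice parameter
        self.beta = beta       # learning rate (1.0 = fast learning)
        self.weights = []      # one weight vector per category
        self.labels = []       # class label attached to each category

    def _ranked(self, x):
        # Choice function: |x ^ w| / (alpha + |w|), where ^ is the fuzzy AND.
        scores = [np.minimum(x, w).sum() / (self.alpha + w.sum()) for w in self.weights]
        return sorted(range(len(scores)), key=lambda j: -scores[j])

    def train_one(self, a, label):
        x = complement_code(a)
        rho = self.rho
        for j in self._ranked(x):
            match = np.minimum(x, self.weights[j]).sum() / x.sum()
            if match < rho:
                continue
            if self.labels[j] == label:
                w = self.weights[j]
                self.weights[j] = self.beta * np.minimum(x, w) + (1 - self.beta) * w
                return j
            # Match tracking: wrong label, so raise vigilance above this match.
            rho = match + 1e-6
        # No acceptable category: commit a new one for this label.
        self.weights.append(x.copy())
        self.labels.append(label)
        return len(self.weights) - 1

    def predict(self, a):
        ranked = self._ranked(complement_code(a))
        return self.labels[ranked[0]] if ranked else None
```

Because training is incremental, samples can be presented one at a time and in any order, for example `net.train_one([0.9, 0.8, 0.1], "spam")` followed later by `net.predict([0.85, 0.85, 0.1])`.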
5 System
A complex system that will:
- Gather the spam and ham corpora
- Study their characteristics
- Learn
- Require no human involvement
6 Inputs
- Words like viagra, mortgage, xanax
- Obfuscated words
- Information extracted from headers
- Other heuristics used in Anti-Spam filters
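The inputs listed above could be turned into a numeric vector along these lines. This is a sketch under stated assumptions: the substitution table `LEET`, the function names, and the two header checks are hypothetical examples, not the heuristics the filter actually uses.

```python
import re

KEYWORDS = ("viagra", "mortgage", "xanax")

# Hypothetical look-alike substitutions used to undo simple obfuscation.
LEET = str.maketrans({"1": "i", "0": "o", "@": "a", "4": "a", "3": "e", "$": "s"})


def normalize(token):
    """Map look-alike characters back to letters and drop separators."""
    token = token.lower().translate(LEET)
    return re.sub(r"[^a-z]", "", token)


def extract_features(headers, body):
    """Build a feature vector: two values per keyword, then two header checks."""
    tokens = body.split()
    plain = {t.lower().strip(".,;:!?") for t in tokens}
    deobfuscated = {normalize(t) for t in tokens}
    features = []
    for word in KEYWORDS:
        # Keyword written normally.
        features.append(1.0 if word in plain else 0.0)
        # Keyword present only after undoing obfuscation (e.g. "v1agra").
        features.append(1.0 if word in deobfuscated and word not in plain else 0.0)
    # Header heuristics (illustrative): missing Message-ID, all-caps subject.
    features.append(0.0 if headers.get("Message-ID") else 1.0)
    features.append(1.0 if headers.get("Subject", "").isupper() else 0.0)
    return features
```

Each position in the returned list corresponds to one heuristic, so the vector can be fed directly to a network that expects values in [0, 1].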
7 Hierarchy
- Initial implementation: single neural network
  - Increasing number of heuristics
  - Increasing number of training items
  - Train on both spam and ham
- Improvements
- Next step: multiple neural networks (a hierarchy)
  - Run only requested heuristics
  - Perform a refined classification
  - Split into several categories
  - Increase detection speed
  - Learn new patterns without losing detection on older spam
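The idea of a hierarchy that runs only the requested heuristics can be sketched as a small tree of nodes. Everything here is an assumption made for illustration: the `Node` class, the heuristic names in `REGISTRY`, and the plain decision functions, which stand in for trained neural networks.

```python
class Node:
    """One level of the hierarchy: a decision function plus the heuristics it needs."""

    def __init__(self, heuristics, decide, children=None):
        self.heuristics = heuristics    # names of the heuristics this node requests
        self.decide = decide            # stand-in for a trained network
        self.children = children or {}  # outcome label -> more specific node


def classify(node, message, registry, cache):
    """Walk down the hierarchy, evaluating each heuristic at most once."""
    for name in node.heuristics:
        if name not in cache:
            cache[name] = registry[name](message)
    outcome = node.decide({name: cache[name] for name in node.heuristics})
    child = node.children.get(outcome)
    return classify(child, message, registry, cache) if child else outcome


# Hypothetical heuristics; a real filter would have many more.
REGISTRY = {
    "has_url": lambda m: 1.0 if "http://" in m or "https://" in m else 0.0,
    "mentions_pills": lambda m: 1.0 if "pills" in m.lower() else 0.0,
    "mentions_loans": lambda m: 1.0 if "loan" in m.lower() else 0.0,
}


def spam_kind(f):
    """Toy second-level decision that refines 'spam' into a category."""
    if f["mentions_pills"]:
        return "spam/pharmacy"
    if f["mentions_loans"]:
        return "spam/finance"
    return "spam/other"


spam_node = Node(["mentions_pills", "mentions_loans"], spam_kind)
root = Node(["has_url"], lambda f: "spam" if f["has_url"] else "ham", {"spam": spam_node})
```

Calling `classify(root, message, REGISTRY, {})` on a message that the root labels as ham never evaluates the second-level heuristics, which is where the speed gain of a hierarchy would come from.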
8 Hierarchy
9 Correction module and noise reduction
- Performs noise reduction on the input data before it enters the learning phase
- Increases the discrimination rate between the input patterns
- Eliminates or modifies patterns that can cause misclassification (same pattern for multiple categories)
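One simple form of this noise reduction is to drop every pattern that appears under more than one category before training. The sketch below covers only the "eliminate" case and assumes training data is a list of `(pattern, label)` pairs; the function name is hypothetical.

```python
from collections import defaultdict


def remove_conflicting_patterns(samples):
    """Drop every pattern that occurs with more than one label."""
    labels_by_pattern = defaultdict(set)
    for pattern, label in samples:
        labels_by_pattern[tuple(pattern)].add(label)
    # Keep only samples whose pattern maps to exactly one category.
    return [
        (tuple(pattern), label)
        for pattern, label in samples
        if len(labels_by_pattern[tuple(pattern)]) == 1
    ]
```

For example, if the pattern `[1, 0, 1]` is labelled both spam and ham in the corpus, both copies are removed, since neither can be trusted to teach the network the right category.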
10 Results
11 Results
Table 3: Detection results on an increasing number of training items. Both the training and test corpora were analyzed.
- Detection results on training items
- Detection results on test items
12 Conclusions
- Fast learning method
- Solves the stability-plasticity dilemma (property preserved from the ART modules)
- Consistently improves the heuristic filter
  - Faster
  - The analysis is based on pattern recognition
  - Performs a refined analysis
  - High detection rates
- Advanced categorization
  - Multiple spam categories
  - Can also be used for parental control
  - Can perform classification (business, school, personal)
- In conclusion, this system improves both speed and detection