Spam Filtering Techniques
Arnold Perez and Joseph Tilley
Spam Filters
- CRM114: The Controllable Regex Mutilator
- Bayesian filter: improvements over Pantel and Lin
- A case-based approach to spam filtering that can track concept drift
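As a point of reference for the Bayesian filter, here is a plain multinomial naive Bayes classifier with Laplace (add-one) smoothing. It is a minimal sketch, not Pantel and Lin's exact formulation, and it does not include the improvements this project proposes. The class name, the `tokenize` helper, and the "spam"/"ham" labels are hypothetical choices for illustration. The sketch assumes at least one training message of each class.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (deliberately simple)."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(token | label) over the message."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total = sum(self.word_counts[label].values())
        for token in tokenize(text):
            # Add-one smoothing keeps an unseen token from zeroing the score;
            # summing logs avoids floating-point underflow on long messages.
            score += math.log((self.word_counts[label][token] + 1) / (total + vocab_size))
        return score

    def classify(self, text):
        return "spam" if self.log_score(text, "spam") > self.log_score(text, "ham") else "ham"


f = NaiveBayesSpamFilter()
f.train("win money now, claim your free prize", "spam")
f.train("cheap meds free shipping win big", "spam")
f.train("meeting notes attached for tomorrow", "ham")
f.train("can we move lunch to noon tomorrow", "ham")
print(f.classify("claim your free money"))           # spam
print(f.classify("notes for the meeting tomorrow"))  # ham
```

Working with log scores rather than raw probability products is the usual design choice here, since multiplying many small per-token probabilities quickly underflows to zero.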
Spam Filter Traits
- Text classification: use text classification to identify spam
- Concept drift: use case-based filtering to track concept drift as the content of spam changes over time
- Headers: examine the data in e-mail headers
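One way to picture how case-based filtering can track concept drift is a k-nearest-neighbour classifier whose case base only keeps the most recent labeled messages. This is a simplified stand-in, not the method of the referenced case-based approach, whose case-base maintenance is more elaborate; here drift tracking is approximated by a fixed-size window that forgets the oldest cases. The names `CaseBasedSpamFilter`, `features`, and `jaccard`, and the choice of Jaccard similarity over token sets, are hypothetical.

```python
import heapq
import re
from collections import Counter, deque


def features(text):
    """Represent a message (a "case") as its set of lowercase word tokens."""
    return frozenset(re.findall(r"[a-z0-9$']+", text.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CaseBasedSpamFilter:
    """k-nearest-neighbour filter over a bounded, sliding-window case base."""

    def __init__(self, k=3, max_cases=1000):
        self.k = k
        # A deque with maxlen discards the oldest case when full, so the
        # case base follows recent mail instead of outdated spam patterns.
        self.cases = deque(maxlen=max_cases)

    def add_case(self, text, label):
        """Store a labeled message, e.g. a user's correction of a mistake."""
        self.cases.append((features(text), label))

    def classify(self, text):
        query = features(text)
        # Retrieve the k stored cases most similar to the query.
        neighbours = heapq.nlargest(self.k, self.cases, key=lambda case: jaccard(query, case[0]))
        votes = Counter(label for _, label in neighbours)
        # Ties go to "ham" so uncertain mail is not flagged (an assumption
        # aimed at keeping false positives down).
        return "spam" if votes["spam"] > votes["ham"] else "ham"


f = CaseBasedSpamFilter(k=1, max_cases=3)
f.add_case("cheap replica watches for sale", "spam")
f.add_case("agenda for the friday project meeting", "ham")
f.add_case("are we still on for lunch friday", "ham")
query = "double your bitcoin today friday only"
print(f.classify(query))  # ham: a new kind of spam the case base has not seen
f.add_case("double your bitcoin investment now", "spam")  # oldest case is dropped
print(f.classify(query))  # spam
```

Retraining here is just appending a case, which is what makes a case-based filter cheap to update as spam evolves.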
Other Spam Filtering Techniques
- Blacklists: lists of sender information (such as e-mail or IP addresses) that identify an e-mail as spam
- Greylists: temporarily defer messages from senders that are not recognized
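The sketch below combines a blacklist check with greylisting keyed on the (client IP, sender, recipient) triplet: the first attempt from an unknown triplet gets a temporary failure, and a retry after a minimum delay is accepted, on the reasoning that legitimate mail servers retry while many spam senders do not. It is a minimal sketch under stated assumptions: the `Greylist` class, the `check` method, the 300-second delay, and the string results are hypothetical, and the blacklist is a local set rather than an external blacklist service.

```python
import time


class Greylist:
    """Blacklist check followed by greylisting on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, blacklist=()):
        self.min_delay = min_delay          # seconds a new triplet must wait
        self.blacklist = set(blacklist)     # blocked IPs or sender addresses
        self.first_seen = {}                # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        if client_ip in self.blacklist or sender in self.blacklist:
            return "reject"
        key = (client_ip, sender, recipient)
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "defer"  # answered with an SMTP 4xx temporary failure
        if now - self.first_seen[key] < self.min_delay:
            return "defer"  # retried too soon
        return "accept"


g = Greylist(min_delay=300, blacklist={"203.0.113.9"})
triplet = ("198.51.100.7", "alice@example.org", "bob@example.com")
print(g.check(*triplet, now=0))    # defer: first attempt
print(g.check(*triplet, now=60))   # defer: retry before the delay expires
print(g.check(*triplet, now=400))  # accept: legitimate retry
print(g.check("203.0.113.9", "x@example.net", "bob@example.com", now=0))  # reject
```

The `now` parameter exists only so the behaviour can be shown with fixed timestamps; in normal use it falls back to the current time.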
Goals
- Evaluate different spam filtering techniques
- Create a spam filter that combines the strengths of other spam filters
- Decrease the number of false positives (legitimate e-mails marked as spam)
Testing
- Test our filter against a set of text files that represent e-mails
- Compare our results with statistical data from existing spam filters
- Record statistics at each milestone:
  - Addition of text classification
  - Addition of Bayesian improvements
  - Addition of case-based filtering
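The statistics recorded at each milestone could be computed along these lines: load labeled text files, run a filter over them, and count the four confusion-matrix outcomes, with spam as the positive class so that a false positive is legitimate mail flagged as spam. This is a sketch under assumptions: the directory layout (`spam/*.txt` and `ham/*.txt` under one root) and the names `load_corpus` and `evaluate` are hypothetical.

```python
from pathlib import Path


def load_corpus(root):
    """Yield (text, label) pairs from root/spam/*.txt and root/ham/*.txt (assumed layout)."""
    for label in ("spam", "ham"):
        for path in sorted(Path(root, label).glob("*.txt")):
            yield path.read_text(encoding="utf-8", errors="replace"), label


def evaluate(predicted, actual):
    """Confusion-matrix statistics, treating "spam" as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == "spam" and a == "spam" for p, a in pairs)
    fp = sum(p == "spam" and a == "ham" for p, a in pairs)   # ham wrongly flagged
    tn = sum(p == "ham" and a == "ham" for p, a in pairs)
    fn = sum(p == "ham" and a == "spam" for p, a in pairs)   # spam that got through
    ham_total, spam_total = fp + tn, tp + fn
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "false_positive_rate": fp / ham_total if ham_total else 0.0,
        "false_negative_rate": fn / spam_total if spam_total else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


stats = evaluate(["spam", "spam", "ham", "ham", "ham"],
                 ["spam", "ham", "ham", "ham", "spam"])
print(stats["false_positives"], stats["accuracy"])  # 1 0.6
```

With a real filter, the predictions would come from something like `[classify(text) for text, _ in corpus]` over the loaded pairs, and the same `evaluate` call would be repeated at each milestone so the numbers stay comparable.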
Project Deliverables
- A spam filter that combines text classification, case-based filtering, and an improved Bayesian filter
- Comparisons of our filter with existing statistical data
- Conclusions, lessons learned, and possible future work
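How the three components will be combined is not specified. One simple possibility, shown here only as an assumption, is a vote in which a message is labeled spam only when at least `min_spam_votes` of the component classifiers agree; raising that threshold trades missed spam for fewer false positives. The function name and parameter are hypothetical, and each component is assumed to be any function that maps message text to "spam" or "ham".

```python
from typing import Callable, Sequence

# Any component filter: takes message text, returns "spam" or "ham".
Classifier = Callable[[str], str]


def combined_classify(text: str, classifiers: Sequence[Classifier],
                      min_spam_votes: int = 2) -> str:
    """Label text as spam only if at least min_spam_votes components say spam."""
    votes = sum(1 for classify in classifiers if classify(text) == "spam")
    return "spam" if votes >= min_spam_votes else "ham"


def always_spam(text: str) -> str:
    return "spam"


def always_ham(text: str) -> str:
    return "ham"


print(combined_classify("msg", [always_spam, always_spam, always_ham]))     # spam
print(combined_classify("msg", [always_spam, always_spam, always_ham], 3))  # ham
```

Requiring a unanimous vote (`min_spam_votes=3` with three components) is the most conservative setting with respect to false positives.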