Spyware Detection
Jeff Rosenberg
Advisor: Professor Hemmendinger
Computer Science Senior Project, Winter 2006
Why bother?
- Most spyware removal tools use a list of known spyware programs
  - List must be constantly updated
  - Won’t catch anything that’s not listed
  - At the mercy of those who create it
- Rather than use a reference list, detect spyware based on the patterns it shows
  - Avoids the need to update
The Program
- C++ program that uses MFC
  - Easy to make a dialog-based interface
- Searches a computer, testing all files and directories it encounters
- Displays a list of detected files and directories, along with their probabilities of being spyware
  - Probabilities are based on which tests each item passed
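Searching
[Diagram: directory tree search starting at root and descending through Dir1, Dir2, Dir3, and Dir4; flagged items such as File10 and whole directories are added to the Bad Files List]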
Testing 123
- Patterns: size, name, type
- Tests: combinations of patterns that spyware often exhibits

Pattern                  Bit
SizePattern 400, 800     00000001
NamePattern “spy”        00000010
TypePattern “exe”        00000100

Test          Mask        Looks for
SpywareExe    00000110    executable files with “spy” in the name
SmallExe      00000101    executable files between 400 and 800 bytes

Example: File ThisIsSpyWare.exe, 2KB, matches the name and type patterns, so its bits are 00000110
- SpywareExe: 00000110 & 00000110 = 00000110, which equals the test mask, so the test passes
- SmallExe: 00000110 & 00000101 = 00000100, which does not equal the test mask, so the test fails
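The bit arithmetic on this slide can be reproduced in a few lines of C++. The sketch below is illustrative rather than the project's actual code: the `FileInfo` struct and the helper names `matchPatterns` and `passes` are hypothetical, the bit assigned to each pattern is inferred from the order shown on the slide, and the name match is assumed to be case-insensitive so that "ThisIsSpyWare.exe" matches "spy".

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// One bit per pattern (bit assignment inferred from the slide).
enum Pattern : std::uint8_t {
    SizePattern = 0x01,  // 00000001: size between 400 and 800 bytes
    NamePattern = 0x02,  // 00000010: "spy" appears in the file name
    TypePattern = 0x04   // 00000100: file extension is "exe"
};

// A test is the OR of the patterns it requires.
constexpr std::uint8_t SpywareExe = NamePattern | TypePattern;  // 00000110
constexpr std::uint8_t SmallExe   = SizePattern | TypePattern;  // 00000101

// Hypothetical minimal description of a file.
struct FileInfo {
    std::string name;
    std::uintmax_t sizeBytes;
};

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Build the bit vector of patterns that a file matches.
std::uint8_t matchPatterns(const FileInfo& f) {
    const std::string name = toLower(f.name);
    const std::size_t dot = name.rfind('.');
    const std::string stem = name.substr(0, dot);
    const std::string ext =
        dot == std::string::npos ? std::string() : name.substr(dot + 1);

    std::uint8_t bits = 0;
    if (f.sizeBytes >= 400 && f.sizeBytes <= 800) bits |= SizePattern;
    if (stem.find("spy") != std::string::npos) bits |= NamePattern;
    if (ext == "exe") bits |= TypePattern;
    return bits;
}

// A file passes a test only when every bit of the test mask is set.
bool passes(std::uint8_t fileBits, std::uint8_t testMask) {
    return (fileBits & testMask) == testMask;
}
```

For the slide's example, `matchPatterns({"ThisIsSpyWare.exe", 2048})` sets only the name and type bits, so `passes` is true for `SpywareExe` and false for `SmallExe`. Comparing the AND result with the mask, rather than checking that it is nonzero, is what makes a test require all of its patterns.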
Time is on my side
- Spyware often appears in groups, with all files created at the same time
  - These bad dates can also be used to find spyware in other locations
- Algorithm to find date clusters in a given directory:
  1. Sort a list of files by date
  2. Starting with the first file in the list, look through the files that follow for as long as their dates are within a certain range of each other
  3. Continue until a date is found outside of this range
- The probability of being spyware for files in a cluster depends on how many files are in it
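A minimal sketch of the date-clustering steps follows. Several details are assumptions, because the slide does not specify them: "within a certain range" is interpreted as within `windowSeconds` of the first file in the current cluster, clusters smaller than `minSize` are discarded, and `DatedFile` and `findDateClusters` are hypothetical names. How cluster size maps to a probability is not given in the source, so it is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record: a file name plus its creation time.
struct DatedFile {
    std::string name;
    std::time_t created;
};

// Group the files of one directory into clusters of close creation dates.
std::vector<std::vector<DatedFile>> findDateClusters(
        std::vector<DatedFile> files,
        std::time_t windowSeconds,
        std::size_t minSize) {
    // Step 1: sort the list of files by date.
    std::sort(files.begin(), files.end(),
              [](const DatedFile& a, const DatedFile& b) {
                  return a.created < b.created;
              });

    std::vector<std::vector<DatedFile>> clusters;
    std::size_t start = 0;
    while (start < files.size()) {
        // Steps 2 and 3: extend the cluster until a date falls outside
        // the window measured from the cluster's first file.
        std::size_t end = start + 1;
        while (end < files.size() &&
               files[end].created - files[start].created <= windowSeconds) {
            ++end;
        }
        if (end - start >= minSize) {
            clusters.emplace_back(
                files.begin() + static_cast<std::ptrdiff_t>(start),
                files.begin() + static_cast<std::ptrdiff_t>(end));
        }
        start = end;  // the next cluster begins at the first outside file
    }
    return clusters;
}
```

Because the list is sorted first, each file is visited once after the sort, and the size of each returned cluster is available to a caller that wants to score its files.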
Program Interface
Conclusions/Future Work
- Can be hard to distinguish between good and bad files
  - Still did a good job of finding all the spyware on the test machine
  - Tests were developed from infections in October, but were still able to find spyware from new infections in February
- Learning: adjusting tests on the fly and creating new ones
- Optimization
  - Many of the algorithms used can be sped up significantly
  - Still performs acceptably: took 3 minutes and 35 seconds to scan 131,301 files (37GB)
Questions?