Application Recognition
Sam Larsen
Determina
Process Control
  One method to improve computer security is through process control
    Whitelist: user specifies what is allowed to run
    Blacklist: user specifies what is not allowed to run
  Strong customer interest
  Disadvantages:
    Difficult to administer
    Hackers are learning to circumvent it
The Pesky Gray
  Many applications won’t be black or white
  Whitelist: a lot of work for the administrator
    Currently, we identify applications via a checksum
    New software introduces a new checksum
    Every new upgrade/patch requires intervention
  Blacklist: circumvention is becoming common
    Bad guys now create custom binaries just for you!
    Small modifications defeat checksums
    Many malware payloads are encrypted
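A minimal sketch of why checksum-based lists leave so much in the gray area. The slides do not say which checksum is used; SHA-256, the `classify` helper, and the byte strings standing in for executables are all assumptions made for illustration.

```python
import hashlib

def checksum(binary: bytes) -> str:
    """Identify a binary by its SHA-256 digest (assumed checksum, not from the slides)."""
    return hashlib.sha256(binary).hexdigest()

def classify(binary: bytes, whitelist: set, blacklist: set) -> str:
    """Return 'deny', 'allow', or 'gray' depending on which list holds the checksum."""
    digest = checksum(binary)
    if digest in blacklist:
        return "deny"
    if digest in whitelist:
        return "allow"
    return "gray"

def flip_one_bit(binary: bytes) -> bytes:
    """Model a tiny patch or a custom malware build by flipping one bit."""
    patched = bytearray(binary)
    patched[-1] ^= 0x01
    return bytes(patched)

# Hypothetical stand-ins for a known-good and a known-bad executable image.
known_good = b"MZ" + b"\x90" * 64
known_bad = b"MZ" + b"\xcc" * 64
whitelist = {checksum(known_good)}
blacklist = {checksum(known_bad)}

print(classify(known_good, whitelist, blacklist))                # allow
print(classify(flip_one_bit(known_good), whitelist, blacklist))  # gray
print(classify(flip_one_bit(known_bad), whitelist, blacklist))   # gray
```

A one-bit change moves both the patched application and the modified malware off their lists, which is the administrative burden and the circumvention problem described above.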
Application Recognition
  Can we automatically recognize a different version of a known application?
  Migrate to blacklist/whitelist with little or no user intervention
  Malware identification
    Hackers are lazy: families of malware derived from the same code base
Approach
  Observe runtime program behavior
    Indirect branches pose no problem to analysis
    Focus on the code that actually executes
    Handle self-unpacking binaries
    Potentially, observe runtime data
  Apps derived from the same codebase should have similar runtime behavior
  Different apps should have different behavior
  First attempt: characterize an application by the stream of system calls it generates
Rationale for System Calls
  System calls are the important events
  Nearly identical binaries should generate nearly identical traces
    Factor out small code changes
  Low runtime overhead
    Only take action at system calls
Application Communities
  Application identification is most useful in an application community
  Community data can be aggregated to form more complete application signatures
  Once an application is recognized, it can be approved or disapproved for everyone
    Prevent harm for most community members
    Eliminate most of the overhead of recognition
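One way the aggregation could look, as a rough sketch. The slides do not describe a data model; the report format, the application identifiers, and the `verdicts` table below are hypothetical.

```python
from collections import defaultdict

def aggregate(reports):
    """Union the call sequences reported by each community member, per application."""
    signatures = defaultdict(set)
    for app_id, sequences in reports:
        signatures[app_id] |= set(sequences)
    return dict(signatures)

# Hypothetical reports: (application id, set of observed call sequences).
reports = [
    ("app-1", {("open", "read"), ("read", "close")}),   # member 1
    ("app-1", {("open", "read"), ("read", "write")}),   # member 2 exercised other code
    ("app-2", {("open", "write")}),                     # member 3
]
signatures = aggregate(reports)

# Once an application is recognized, a single verdict is shared by every member.
verdicts = {"app-1": "approved"}

def is_allowed(app_id):
    """Members consult the shared verdict instead of repeating the recognition work."""
    return verdicts.get(app_id) == "approved"

print(len(signatures["app-1"]))  # 3
print(is_allowed("app-1"))       # True
print(is_allowed("app-2"))       # False
```

The aggregated signature for `app-1` holds three sequences, more than either member observed alone, which is the sense in which community data makes signatures more complete.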
Initial Experimental Results
  Use DR to capture system call traces
  Build database of all sequences of N calls
    Example: For N=2 and sequence ABCD → AB, BC, CD
  Measure of similarity between two apps: (T - d) / T
    T = # unique sequences across both apps
    d = # sequences in one and not the other
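The sequence database and similarity measure can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the experiments: traces are modeled as plain Python sequences, d is read as the number of sequences that appear in exactly one of the two applications, and the empty-trace case is handled by an arbitrary convention.

```python
def ngrams(trace, n):
    """Return the set of all runs of n consecutive calls in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def similarity(trace_a, trace_b, n):
    """Compute (T - d) / T for two traces."""
    a, b = ngrams(trace_a, n), ngrams(trace_b, n)
    total = len(a | b)       # T: unique sequences across both apps
    differing = len(a ^ b)   # d: sequences in one and not the other
    if total == 0:
        return 1.0           # assumed convention when neither trace yields a sequence
    return (total - differing) / total

print(sorted(ngrams("ABCD", 2)))      # [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(similarity("ABCD", "ABCE", 2))  # 0.5
print(similarity("ABCD", "ABCD", 3))  # 1.0
```

Under this reading, T - d is the number of sequences the two applications share, so the measure is the ratio of shared sequences to all sequences (the Jaccard index of the two sets).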
[Figures: system call trace results for Firefox, Apache, and Gaim; two slides per application, each with panels N = 2, N = 3, N = 4]
Traces of API calls
  Windows API is the primary system interface for Windows apps
  More sensible to track sequences of API calls
  At each system call, examine the call stack to find the outermost API call
    If not possible, default to the system call
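A sketch of the stack walk, under several assumptions that are not in the slides: the call stack is modeled as a list of (module, function) frames with the innermost frame first, the set of modules treated as the Windows API is an illustrative short list, and "outermost" is read as the last API frame reached before application code when walking outward from the system call.

```python
# Assumed, illustrative list of modules whose exports count as API calls.
API_MODULES = {"ntdll.dll", "kernel32.dll", "user32.dll"}

def trace_event(syscall_name, stack):
    """Return the event to record for one system call.

    stack: list of (module, function) frames, innermost (closest to the
    system call) first. Falls back to the system call name when no API
    frame can be found.
    """
    outermost = None
    for module, function in stack:
        if module.lower() not in API_MODULES:
            break  # reached application code; stop walking outward
        outermost = f"{module}!{function}"
    return outermost if outermost is not None else syscall_name

# Hypothetical stack for a file open issued by an application.
stack = [
    ("ntdll.dll", "NtCreateFile"),
    ("kernel32.dll", "CreateFileW"),
    ("app.exe", "load_config"),
]
print(trace_event("NtCreateFile", stack))  # kernel32.dll!CreateFileW
print(trace_event("NtClose", []))          # NtClose
```

With an unreadable or empty stack the function records the raw system call, matching the fallback on the slide.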
[Figures: API call trace results for Firefox, Apache, and Gaim; two slides per application, each with panels N = 2, N = 3, N = 4]
Comparison with Traditional HIPS
  Syscall-based intrusion detection/prevention [Forrest & Hofmeyr]
    Record traces during training, then monitor and compare in deployment
    Problem with false positives
  Syscall-based application recognition
    Looking at general trends, thus some noise can be tolerated
    False positives are not an issue
    More practical use of system call traces
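The difference in noise tolerance can be illustrated with a toy trace. The strict check below is a simplified stand-in for training-based detection, not the algorithm of Forrest & Hofmeyr, and the traces and the 0.5 recognition threshold are hypothetical.

```python
def ngrams(trace, n):
    """Return the set of all runs of n consecutive calls in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def unseen_sequences(trained, observed, n):
    """Strict check: any sequence absent from training counts as an anomaly."""
    return ngrams(observed, n) - ngrams(trained, n)

def similarity(trace_a, trace_b, n):
    """Recognition score (T - d) / T over the two traces' sequence sets."""
    a, b = ngrams(trace_a, n), ngrams(trace_b, n)
    total = len(a | b)
    return (total - len(a ^ b)) / total if total else 1.0

trained = "ABCDABCDABCD"
observed = "ABCDABXCDABCD"   # same behavior with one stray call X

THRESHOLD = 0.5              # hypothetical recognition threshold

alarms = unseen_sequences(trained, observed, 2)
score = similarity(trained, observed, 2)
print(len(alarms))           # 2: the strict check flags BX and XC
print(score >= THRESHOLD)    # True: the application is still recognized
```

A single stray call produces two unseen sequences and would trigger the strict detector, while the similarity score only drops and the application is still matched.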
Next Steps
  Gather data for more applications
  How can we match applications that make few system calls (e.g., calc)?
  Compare families of malware
  Build a sandbox?
    Malicious code may be recognized too late