Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fingerprinting Malware Authors

Similar presentations


Presentation on theme: "Fingerprinting Malware Authors"— Presentation transcript:

1 Fingerprinting Malware Authors
(until they read this presentation anyway) Rich Cummings

2 A very brief review of why you care about malware
The malware menace

3 The Malware Menace Malware is big business Malware makes money
an entire Industry has formed around the malware business model Malware makes money how much? I have no idea, lots of people have made educated guesses and the magnitude is usually in the Billions Enough to create an Industry and spawn dozens of computer security conferences

4 More than just money Some malware operations may also be driven by other motives: Nation-state sponsored spying Industrial espionage Military offensive and defensive capabilities Political espionage Others?

5 Why does size matter? The scale of the industry brings a greater level of efficiency and organization Professionally Designed & Developed Use cases Design documents Bug tracking QA and testing Support forums Separation of creators, vendors, and operators

6 Malware circa 1990s

7 Malware circa 2010 Testers Country without extradition Drop man
Cashier/Mule Implant Developers Implant Vendor ? Rootkit Developers Drop man Testers Forger Testers Botnet Vendor Stolen Account Buyers Botmaster Affiliate Website/Payment System Developers Testers Exploit Pack Vendor Pay-Per Install Affiliates Victims Exploit Developers

8 Who do you try to stop? Dropman/Cashier/Mules? Stolen Account Buyers?
Probably just one of hundreds Stolen Account Buyers? Probably just one of tens or hundreds PPI? Probably just one of hundreds or thousands Botmaster? May be one of several

9 Pursuing the source Identifying the vendor or developer may be more useful in the long run for locating and preventing future attacks

10 Who do you try to identify?
Vendors Professional companies with office space, stock plans, health benefits, management structure, and other traditional business features Sometimes protected by political connections or organized crime They may sell their product with a EULA and claim they are not responsible for damages Digital fingerprint can vary from very large to very small

11 Who do you try to identify?
Developers Code reuse is common A wide variety of potential digital fingerprints Developer specific forensic tool marks may help you to identify or prevent future attacks

12 Intelligence Spectrum
Blacklists Developer Fingerprints Social Cyberspace DIGINT Physical Surveillance HUMINT  Nearly Useless Nearly Impossible  SSN & Missile Coordinates of the Attacker MD5 Checksum of a single malware sample Sweet Spot IDS signatures with long-term viability Predict the attacker’s next moves

13 Humans Attribution is about the human behind the malware, not the specific malware variants Focus must be on the factors influenced directly or indirectly by a human

14 Why does it work? Rule #1 Humans are lazy
They don’t rewrite code every day They use toolkits and recycle existing code bases They generally do the minimum work necessary to hide from AV or avoid IDS

15 What does it work? Rule #2 Most attackers are focused on rapid response to network-level filtering Think dynamic dns, shifting C2, multiple C2 channels, obfuscation of network traffic They are not as focused on host level stealth Malware is simple in nature Enterprises rely on AV for host based detection and AV is simple to avoid

16 Why does it work? Rule #3 Physical memory is king
Cannot obfuscate from the CPU Code and data must be revealed at some point so it can execute or be processed correctly

17 What can be used to identify a developer?
DIGITAL FINGERPRINT

18 Simplistic Approach Focus on strings and human-readable data within a malware program In many cases, code-level reverse engineering is not required If you can read a packet sniffer, you can attribute malware Yes, this means more people in your organization can do this

19 More Complex: Traits A trait is defined as a logical behavior or ability, for example: Communicates to the internet using HTTP Captures a screen shot Records keystrokes Installs as a service etc…

20 Advanced: Link Analysis
Relationships between source code, vendors, authors, operators, servers, and capabilities Requires a large investment of time and effort to continually monitor underground activities and marketplaces Requires purchasing emerging malware products

21 Traits Traits are often stable across multiple revisions of the same malware Traits can identify previously unknown malware Traits reveal malware capabilities

22 Why not MD5 or checksums? Does not work, malware is too dynamic
Not effective for evaluating programs in memory

23 OS Loader In memory, traditional checksums don’t work
DISK FILE IN MEMORY IMAGE 100% dynamic Copied in full Copied in part OS Loader In memory, traditional checksums don’t work MD5 Checksum is not consistent Software Traits remain consistent MD5 Checksum reliable

24 Multiple Packers To avoid AV, malware may utilize multiple packer/obfuscator utilities Effectively changes signatures, checksums, MD5, etc However, traits remain consistent

25 OS Loader Physical memory tends to get around the ‘packing’ problem
IN MEMORY IMAGE Packer #1 Packer #2 Decrypted Original OS Loader Physical memory tends to get around the ‘packing’ problem Starting Malware Packed Malware Software Traits remain consistent

26 Compilers Compiler vendors, versions, and options can all have a dramatic change on malware Harder for signature based methods to identify Traits remain consistent

27 Same malware compiled in three different ways OS Loader
IN MEMORY IMAGE DISK FILE Same malware compiled in three different ways OS Loader MD5 Checksums all different Software Traits remain consistent

28 Toolkits are identifiable
Toolkit traits are consistent, even across multiple versions, multiple operators, and multiple packers

29 Toolkits can be detected Toolkit traits are apparent
IN MEMORY IMAGE OS Loader Toolkits can be detected Malware Toolkit Different Malware Authors Using Same Toolkit Toolkit traits are apparent Packed

30 Toolmarks Bytes, strings, or algorithms that can be used to identify a piece of the software development tool chain

31 Example Development Toolchain
Developer(s) Machine Machine Core ‘Backbone’ Sourcecode Compiler Linker Tweaks & Mods Malware Time Time 3rd party Sourcecode Paths Paths MAC address Libraries 3rd party libraries Resources

32 Example second phase Toolchain
Variant.A Packer A Variant.B Toolkit Protector A Installer Variant.C Variant.D Packer B Malware Protector B Embedded Variant.E Dropper A Variant.F Protector C Packer C Variant.G Dropper B Variant.H Wrapper Variant.I

33 Toolchain Each step in the toolchain can add unique indicators
The combination of some toolchain indicators can be used to identify or relate malware development

34 detail TOOL MARKS

35 Example Development Toolchain
Developer(s) Machine Machine Core ‘Backbone’ Sourcecode Compiler Linker Tweaks & Mods Malware Time Time 3rd party Sourcecode Paths Paths MAC address Libraries 3rd party libraries Resources

36 Source Code Core that the developer keeps re-using 3rd party sources
‘Hello World’ equivalent for malware Service, Driver, Dropper, Browser Helper Object, etc. 3rd party sources Functions that are cut-and-paste into the malware Encryption, communications, enumeration, etc. Often initially copied from open source repositories

37 Programming Language C C++ Delphi
ECX is used as the *this pointer in function calls Delphi Apparent in embedded data Visual Basic, purebasic, freebasic, etc Very obvious in embedded data

38 Code Idioms Use of exception handlers Use of error checking
Use of logging / error feedback Use of modular components Use of certain methods versus others (for example: fopen vs CreateFile)

39 Filename Creation Log files, EXE’s, DLL’s Subdirectories
Environment Variables Random numbers

40 Format Strings These are written by humans, so they provide good uniqueness

41 Logging Strings Searching for: “Unable to determine” & “Unknown type!”
Reveals that the attacker is using the source-code of BO2k for cut-and-paste material.

42

43 Mutex Names Mutex names remain consistent at least for one infection-push, as they are designed to prevent multiple-infections for the same malware.

44 Tweaks and Mods Small changes to typical code patterns or constants
For example, copying a code sample and removing extraneous cases

45 3rd Party Libraries Static or dynamic linking Compression / Encryption
OpenSSL Gzip InfoZip Communication TFtpServer, TWSocket, FtpSrvT Other Xml parser libraries

46 Copyright & Version Strings
OpenSSL/0.9.6 RAND part of OpenSSL 0.9.8e 23 Feb 2007 MD5 part of OpenSSL 0.9.8k 25 Mar 2009 libdes part of OpenSSL 0.9.7b 10 Apr 2003 inflate Copyright Mark Adler inflate Copyright Mark Adler Etc Compressed by Petite (c) 1999 Ian Luck.

47 zlib Fingerprinting Every new version of zlib has a unique pattern of bits in the data tables – these are modified for each version specifically This pattern is a data constant and can be used even if the copyright notices have been removed

48 inflate library patterns
Not as specific as zlib patterns but can be used to detect the inflate decompressor

49 Example Development Toolchain
Developer(s) Machine Machine Core ‘Backbone’ Sourcecode Compiler Linker Tweaks & Mods Malware Time Time 3rd party Sourcecode Paths Paths MAC address Libraries 3rd party libraries Resources

50 Compiler & Machine Version of compiler Machine-data leakage
GUID V1 leaks MAC address Compile time Source paths Project Names

51 Name Mangling

52 Undecorate Visual C++ demangle: DWORD WINAPI UnDecorateSymbolName(
__in PCTSTR DecoratedName, __out PTSTR UnDecoratedName, __in DWORD UndecoratedLength, __in DWORD Flags ); Also, see source to winedbg GNU C++ demangle see libiberty/cplus-dem.c and include/demangle.h

53 Delphi Give-away strings: SOFTWARE\Borland\Delphi\RTL
This program must be run under Win32

54 Delphi Uses specific function names – easy to identify
Language is derived from Pascal 78 hits for pascal, only 2 for c++

55 Revision Control Often indicated by paths and URLs Search for
cvs svn mss You might be able to get access to the malware source repository or at least the code used as a base for the malware

56 This technique was used to track the author of the Melissa virus
GUID V1: MAC Address The OSF specified algorithm for GUID V1 uses the MAC address of the network card for the last 48 bits of the 128 bit GUID This was deprecated on Windows 2000 and greater, so this has limited value {21EC2020-3AEA-1069-A2DD-08002B30309D} V1 GUIDS have a 1 in this position This is the MAC of the machine This technique was used to track the author of the Melissa virus

57 Country of Origin

58 Country of Origin ZeuS C&C server source code. Written in PHP Specific “Hello” response (note, can be queried from remote to fingerprint server) Clearly written in Russian In many cases, the authors make no attempt to hide…. You can purchase ZeuS and just read the source code…

59 Example Development Toolchain
Developer(s) Machine Machine Core ‘Backbone’ Sourcecode Compiler Linker Tweaks & Mods Malware Time Time 3rd party Sourcecode Paths Paths MAC address Libraries 3rd party libraries Resources

60 Embedded Manifest Contains name, description, platform
Contains list of dependent modules + versions May contain key tokens that identify specific dependent modules (aka strongly named) May contain public key that is tied to the developer if assembly itself is strongly named not likely! Public/private key pair (sn.exe)

61 Visual Studio Static or dynamic linked runtime library?
Single-threaded or multi-threaded? Use of STL? Use of older iostream libraries?* See: * support.microsoft.com/kb/154753

62 Visual Studio – Static Linking
Version Libraries linked with Type Compiler flag VC++ .NET 2003 and earlier LIBC.LIB, LIBCP.LIB Single Threaded Static /ML LIBCD.LIB, LIBCPD.LIB /MLd All LIBCMT.LIB, LIBCPMT.LIB Multi-threaded Static /MT LIBCMTD.LIB, LIBCPMTD.LIB /MTd Visual Studio – Dynamic Linking Version DLL Linked with VC++ 4.2 MSVCRT.DLL/MSVCRTD.DLL VC++ 5.0 MSVCR50.DLL VC++ 6.0 MSVCR60.DLL VC++ .NET 2002 MSVCR70.DLL VC++ .NET 2003 MSVCR71.DLL VC++ .NET 2005 MSVCR80.DLL VC++ .NET 2008 MSVCR90.DLL

63 Static Linking C runtime library strings will be embedded in the EXE itself, as opposed to being in an external DLL DOMAIN error TLOSS error SING error R6027

64 Debug Symbols Debug timestamp (time_t – seconds since 01.01.1970)
Version of the PDB file NB09 - Codeview 4.10 NB11 - Codeview 5.0 NB10 - PDB 2.0 RSDS - PDB 7.0 Age – number of times the malware has been compiled

65 Image Data Directories
PE Timestamps PE file Module timestamp* time_t (32 bit) e_lfanew The ‘lmv’ command in WinDBG will show this value.. Image File Header Optional Header Image Data Directories IMAGE DEBUG DIRECTORY Debug timestamp time_t (32 bit) This is present if an external PDB file is associated with the EXE *This is not the same as NTFS file times, which are 64 bit and stored in the NTFS file structures.

66 Timestamp Formats time_t – 32 bit, seconds since Jan. 1 1970 UTC
0x3DE03E0A  usually start with ‘3’ or ‘4’ ‘3’ started in 1995 and ‘4’ ends in 2012 Use ‘ctime’ function to convert FILETIME – 64 bit, 100-nanosecond intervals since Jan UTC 0x01C195C2.5100E190  usually start with ‘01’ and a letter 01A began in 1972 and 01F ends in 2057 Use FileTimeToSystemTime(), GetDateFormat(), and GetTimeFormat() to convert

67 User Language Native language of the software, expected keyboard layout, etc – intended for use by a specific nationality Be aware some technologies have multiple language support Resource-culture-codes for embedded resources in the EXE

68 Example second phase Toolchain
Variant.A Packer A Variant.B Toolkit Protector A Installer Variant.C Variant.D Packer B Malware Protector B Embedded Variant.E Dropper A Variant.F Protector C Packer C Variant.G Dropper B Variant.H Wrapper Variant.I

69 Software Protection Utilities that offer to help prevent reverse engineering of your software Legitimate? commercial purposes, DRM, protecting intellectual property, licensing Widely used in the malware community

70 Protectors/Encryptors
Themida HASP VMProtect WinLicense Code Virtualizer EXECryptor ASProtect … many more

71 Themida

72 Less legitimate Underground tools built by malware authors or vendors
Commercial tools or open source projects rebuilt for malicious uses

73

74

75 PEID Free tool that detects 600+ PE signatures http://www.peid.info/
Has a great list of what to look for in a binary to determine the protector/encryptor

76 Example second phase Toolchain
Variant.A Packer A Variant.B Toolkit Protector A Installer Variant.C Variant.D Packer B Malware Protector B Embedded Variant.E Dropper A Variant.F Protector C Packer C Variant.G Dropper B Variant.H Wrapper Variant.I

77 Compression Similar to Protectors and sometimes a single tool can provide both Very common in malware UPX or variants of UPX based on modifications to the UPX source/header information

78 UPX: The most common

79 Toolkits Prebuilt packages contain combinations of Trojans, Servers, and Exploits Sold in the underground by vendors to make a profit – generally not used directly by the creator/author

80

81

82 Real world example Case studY

83 Case Study: Gh0stNet

84 Doc/View is usually MFC
gh0st Rootkit e:\gh0st\server\sys\i386\RESSDT.pdb e:\job\gh0st\Release\Loader.pdb Cgh0stView Cgh0stDoc e:\job\gh0st\Release\gh0st.pdb C:\gh0st3.6_src\HACKER\i386\HACKE.pdb \gh0st3.6_src\Server\sys\i386\CHENQI.pdb Dropper GUI (MFC) Doc/View is usually MFC Rootkits Already at version 3.6

85 NOTE: Packing is not fully effective here
GhostNet: Dropper UPX! ¶üÿÿU‹ìƒìSVW3ÿÿ Packer Signature MZx90 This progRy. y cannot be run in DOS mode Embedded executable NOTE: Packing is not fully effective here

86 GhostNet: Dropper UPX! ¶üÿÿU‹ìƒìSVW3ÿÿ Resource Culture Code 0x0804 MZx90 The embedded executable is tagged with Chinese PRC Culture code This progRy. y cannot be run in DOS mode

87 GhostNet: Backdoor The dropped EXE is loaded as svchost.exe on the victim. It then drops another executable, a device driver. UPX! MZx90 MZx90 This program cannot be run in DOS mode E:\gh0st\Server\Release\install.pdb MZx90 MZx90 e:\gh0st\server\sys\i386\RESSDT.pdb Another embedded EXE Another PDB path

88 Link Analysis “gh0st\” The web reveals Chinese hacker sites that reference the “gh0st\” artifact

89 What do we know… i386 directory is common to device drivers. Other clues: sys directory ‘SSDT’ in the name SSDT means System Service Descriptor Table – this is a common place for rootkits and HIPS products to place hooks. Also, embedded strings in the binary are known driver calls: IoXXXX family KeServiceDescriptorTable ProbeForXXXX KeServiceDescriptorTable is used when SSDT hooks are placed. We know this is a hooker.

90 What do we know… IofCompleteRequest, IoCreateDevice, IoCreateSymbolicLink, and friends are used when the driver communicates to usermode. This means there is a usermode module (a process EXE or DLL) that is used in conjunction with the device driver. When communication takes place between usermode & kernelmode, there will be a device path.

91 Link Analysis “RESSDT” A readme file on Kasperky’s site references a Ressdt rootkit.

92 GhostNet: Screen Capture Algorithm
Loops, scanning every 50th line (cY) of the display. Reads screenshot data, creates a special DIFF buffer LOOP: Compare new screenshot to previous, 4 bytes at a time If they differ, enter secondary loop here, writing a ‘data run’ for as long as there is no match. Offset in screenshot Len in bytes Data….

93 GhostNet: Searching for sourcecode
Large grouping of constants Search source code of the ‘Net

94 GhostNet: Refining Search
Has something to do with audio… Further refine the search by including ‘WAVE_FORMAT_GSM610’ in the search requirements…

95 GhostNet: Source Discovery
We discover a nearly perfect ‘c’ representation of the disassembled function. Clearly cut-and-paste. We can assume most of the audio functions are this implementation of ‘CAudio’ class – no need for any further low-level RE work.

96 Conclusion There is a wealth of information available for fingerprinting various phases and aspects of malware and malware authors Generally you will not find someone to arrest, but you will develop a profile of authors that can be used to identify future malware Malware authors will eventually wise up and reduce/remove/subvert some of the information they are leaving in their binaries

97 Thank You HBGary, Inc. (www.hbgary.com)
HBGary Federal (


Download ppt "Fingerprinting Malware Authors"

Similar presentations


Ads by Google