By: Georg Wicherski Presenting: Rasika Bindoo


Introduction
Data collection is no longer a problem because of honeypots.
Honeypots suffer from the drawback of polluting malware databases.
Anti-virus products are slow.
peHash was therefore developed to cluster instances of the same polymorphic specimen.

Other Attempts at Hashing
- Spamsum, mrshash
- n-grams
- Signatures
- Vx-Class

peHash Function Design
The function should have the following design characteristics:
- It should not need to look into the contents of the sections.
- It should have low computational complexity.
Scaling the result of the bzip2 compression ratio to [0…7] ⊂ ℕ leads to the best matches.
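The scaling step on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the slides do not give the exact scaling formula, so the linear mapping, the clamping to 1.0, and the function name `scaled_compression_ratio` are all assumptions.

```python
import bz2


def scaled_compression_ratio(data: bytes) -> int:
    """Map the bzip2 compression ratio of `data` to an integer in 0..7.

    Assumption: a plain linear scaling; the slides only state the
    target range [0...7], not how the ratio is mapped onto it.
    """
    if not data:
        return 0
    ratio = len(bz2.compress(data)) / len(data)
    # bzip2 output can be larger than the input (tiny or random data),
    # so clamp the ratio before scaling.
    ratio = min(ratio, 1.0)
    return min(7, int(ratio * 8))
```

Highly repetitive data lands near 0 and incompressible data near 7, so the value acts as a coarse complexity estimate that tolerates small changes in the underlying bytes.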

Structural properties
Polymorphic malware instances share the same structural Portable Executable properties. The following properties are therefore taken into account to distinguish between binaries:
- Image characteristics
- Subsystem
- Stack commit size
- Heap commit size

Structural properties
Structural information used for each section in the Portable Executable:
- Virtual address
- Raw size
- Section characteristics

Generation of hash values
hash[0] := characteristics[0…7] V characteristics[8…15]
hash[1] := subsystem[0…7] V subsystem[8…15]
hash[2] := stackcommit[0…7] V stackcommit[8…15] V stackcommit[24…31]
hash[3] := heapcommit[0…7] V heapcommit[8…15] V heapcommit[24…31]
'V' symbolizes the XOR operation.
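The four header bytes can be sketched in Python as below. This is an illustration under stated assumptions: the bracket notation is read as an inclusive bit range with bit 0 as the least significant bit, the header fields are passed in as plain integers (parsing the PE file is out of scope), and the helper names `bits` and `header_hash_bytes` are hypothetical. The bit ranges are copied from the slide as written.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def header_hash_bytes(characteristics: int, subsystem: int,
                      stack_commit: int, heap_commit: int) -> bytes:
    """Build hash[0..3] from the PE header fields named on the slide."""
    return bytes([
        bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15),
        bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15),
        bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15)
        ^ bits(stack_commit, 24, 31),
        bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15)
        ^ bits(heap_commit, 24, 31),
    ])
```

Each field is folded into a single byte by XOR-ing its byte-wide slices, so the header part of the hash buffer is always four bytes long.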

Generation of hash values
Sub-hash
shash[0] := virtaddress[9…31]
shash[2] := rawsize[8…31]
shash[4] := characteristics[16…23] V characteristics[24…31]
shash[5] := kolmogorov ∈ [0…7] ⊂ ℕ
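A per-section sub-hash along these lines might look like the sketch below. Several points are assumptions rather than facts from the slides: the first range is read as bits 9…31 of the virtual address, bit 0 is taken as the least significant bit, the `complexity` argument is assumed to be the bzip2 ratio already scaled to 0..7, and the fixed-width packing (`>IIBB`) is an illustrative layout, since the slide's index positions do not fully specify the buffer format. The names `bits` and `section_subhash` are hypothetical.

```python
import struct


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def section_subhash(virt_address: int, raw_size: int,
                    characteristics: int, complexity: int) -> bytes:
    """Pack the four per-section values into a fixed-width byte string."""
    if not 0 <= complexity <= 7:
        raise ValueError("complexity must be in 0..7")
    return struct.pack(
        ">IIBB",  # assumed layout: two 32-bit fields, two single bytes
        bits(virt_address, 9, 31),
        bits(raw_size, 8, 31),
        bits(characteristics, 16, 23) ^ bits(characteristics, 24, 31),
        complexity,
    )
```

Dropping the low bits of the virtual address and raw size means that small differences in those fields do not change the sub-hash.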

Advantages of this hash function
- Complexity is O(1).
- The SHA1 of the hash buffer is calculated to obtain the final hash value, so collisions are difficult to create.
- Constant-length hashes are generated despite the variable number of sections in the executables.
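The constant-length property can be illustrated with a short sketch. It assumes the hash buffer is simply the header bytes followed by the per-section sub-hashes in section order; the slides do not spell out the concatenation order, and the name `final_pehash` is hypothetical.

```python
import hashlib
from typing import Iterable


def final_pehash(header_bytes: bytes, section_subhashes: Iterable[bytes]) -> str:
    """Return the SHA1 hex digest of the concatenated hash buffer."""
    buf = bytearray(header_bytes)
    for sub in section_subhashes:
        buf += sub  # the buffer grows with the number of sections
    # SHA1 always yields 20 bytes (40 hex characters), whatever the input length.
    return hashlib.sha1(bytes(buf)).hexdigest()
```

A binary with one section and a binary with five sections both produce a 40-character digest, which keeps the values easy to store and compare.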

Entry Points and Imports
The entry point value can easily be changed for each instance of a polymorphic specimen.
Most packers specify misleading Import Address Tables, and the import information can also be changed without noteworthy effort.
Thus neither the entry point information nor the imports are included in the hash function.

Evaluation
[Figure: cluster size for the Mwcollect Alliance and Arbor Networks data sets]
peHash helps in clustering polymorphic malware and also helps in detecting broken copies of already known threats.

Evaluation
File            MD5                      Size
diantz.exe      48734e9b45dca36 e8a…
makecab.exe     2740dc2fbefaddb8 91f…
find.exe        09b4e22c86f7e9f1 e5…     9216
print.exe       76b96ed f2 08…           9216
subst.exe       77847ef3cec784b13 7…     9216
bootvrfy.exe    c2ab77d9dc66447 dc1…     5120
comrereg.exe    908f0eda6a49625f 98…     5120
dcomcnfg.exe    1178cd20b d…             5120
Files in a broken cluster share the same size. They can be differentiated only by looking at the actual code or imports, which is not possible for peHash.

Performance
Analysis needs to be carried out for only one sample per peHash cluster.
Performance is not related to binary size or section count.
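The one-sample-per-cluster idea can be sketched as a simple grouping step. This is an illustration only: it assumes each sample already has a hash value attached, and the function names and the choice of the first-seen sample as representative are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_by_hash(samples: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (sample_name, hash_value) pairs into clusters keyed by hash."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name, hash_value in samples:
        clusters[hash_value].append(name)
    return dict(clusters)


def representatives(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick the first-seen sample of each cluster for further analysis."""
    return {hash_value: names[0] for hash_value, names in clusters.items()}
```

With this grouping, a detailed analysis runs once per distinct hash value instead of once per collected sample.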

Conclusion
- peHash provides a performant solution to the problem of seemingly new malware samples.
- peHash can cluster large sets correctly using only basic information from Portable Executables.
- peHash cannot be used to cluster variants of malware families, for which the code structure has to be analyzed.

Thank You