

Slide 1: Research of Alan Sprague: Using Data Mining to Combat Spam, Phishing, and Malware
Department of Computer and Information Sciences, University of Alabama at Birmingham
June 2013, Univ. of Birmingham

Slide 2: Computer Forensics at UAB
We offer BS and MS degrees with an emphasis on forensics; the Criminal Justice Department participates in these programs.
Research center: CIA/JFR: Gary Warner
Blog: “Cyber Crime and Doing Time”
My research:
- Spam
- Phishing
- Malware

Slide 3: Outline
This presentation describes my research interests in spam and malware.
The next 9 slides cover spam; the subsequent slides cover malware.

Slide 4: Spam and the criminal web
70-80% of all email in the world is spam.
Spam enables various classes of antisocial activity:
- Spam advertises opportunities to buy counterfeit goods, for example pills (possibly adulterated pills).
- Spam delivers phish, which are commonly intended to steal credentials for banks and other financial institutions.
- Spam delivers malware.

Slide 5: Spam: Clustering, not Classification
People commonly expect our research to be the classification of emails as ham or spam: desired or undesired. They then expect us to help filter email, so that spam will not be delivered. That is not our research.
Instead, we start with a data file that we expect is entirely spam, and our goal is to cluster it into spam campaigns.
This is an important goal: once we understand the various spam campaigns, we know which are the largest, and we know what type of criminal activity each campaign enables. This enables law enforcement to focus attention on the most harmful campaigns.

Slide 6: Background on Data Mining
Data Mining studies the challenges and opportunities offered by huge data files. Three methods are central to Data Mining:
- Clustering: group together records in the data file if they resemble each other (without knowing the “meaning” of any resulting group, called a cluster).
- Classification: assign each record to one of several “classes”, each of which corresponds to a known type of data.
- Frequent sets and association rules.

Slide 7: Our spam data
Each day: 1 million spam messages, stored in the UAB Spam Data Mine.

Slide 8: Preprocessing of spam data
Parsing extracts:
- Subject
- Sender IP
- Sender name
- If the body contains a URL: its domain and IP
- Word count of the body
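The parsing step on this slide can be sketched with Python's standard `email` module. This is only an illustration, not the UAB pipeline: the function name `parse_spam`, the output field names, and the URL pattern are assumptions, and the two IP lookups (sender IP from the `Received` headers, DNS resolution of the URL domain) are left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Rough URL pattern (an assumption; real spam needs a more tolerant one).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def parse_spam(raw):
    """Extract a few of the fields named on the slide from one raw message."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg.get("From", ""))
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages return a list of parts; ignored in this sketch.
        body = ""
    urls = URL_RE.findall(body)
    return {
        "subject": msg.get("Subject", ""),
        "sender": sender_addr,
        "sender_name": sender_name,
        "url_domain": urlparse(urls[0]).hostname if urls else None,
        "word_count": len(body.split()),
    }
```

Each message becomes one flat record, which is the form the clustering step needs.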

Slide 9: Some spams, parsed
Columns: Subject, Sender, Sender Name, Username
Order HCG online, y5fh6, EfrenGriffith, artq.com
Order HCG online, vfe3ih, Victor, musicradio.com
Pfizer Inc Discount, lefley, uab.edu
Buy Cialis Online, Tam Smith, adeptis.com
Your LinkedIn blocked, John Fial, irs.gov
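Parsed records like these can be grouped into candidate campaigns once two messages share a field value, as the two "Order HCG online" rows do. The sketch below is one simple way to do that (connected components via union-find). The slides do not say which clustering method is used, so treat the function `cluster_by_shared_fields` and the field names `subject` and `url_domain` as hypothetical.

```python
from collections import defaultdict

def cluster_by_shared_fields(records, fields=("subject", "url_domain")):
    """Group record indices that are linked by any shared field value."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of the first record with it
    for i, rec in enumerate(records):
        for field in fields:
            value = rec.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    # Largest clusters first, since the largest campaigns matter most.
    return sorted(groups.values(), key=len, reverse=True)
```

Linking on exact matches is deliberately crude; it shows the idea of campaigns as groups of messages tied together by shared attributes.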

Slide 10: Goal, for the Spam Data Mine
Cluster each day’s emails to find the largest spam campaigns, and then look for clues: where are they coming from?
Relate each day’s clusters to the previous day’s clusters. Any new types of spam are considered “emerging threats”.
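Relating one day's clusters to the previous day's can be sketched as a set-overlap comparison. This assumes each cluster is summarized by a set of features (for example, the URL domains it advertises) and uses Jaccard similarity with an arbitrary threshold of 0.5; the slides specify neither, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def emerging_clusters(today, yesterday, threshold=0.5):
    """Return ids of today's clusters that resemble no cluster from yesterday.

    `today` and `yesterday` map a cluster id to its set of features.
    """
    new_ids = []
    for cluster_id, features in today.items():
        best = max(
            (jaccard(features, old) for old in yesterday.values()),
            default=0.0,
        )
        if best < threshold:
            new_ids.append(cluster_id)
    return new_ids
```

Clusters returned by `emerging_clusters` are the candidates for "emerging threats" in the sense of the slide.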

Slide 11: Largest Cluster on a Particular Day

Slide 12: Why Is This Work Useful?
- Leading spammers use a large number of domains to counter domain blacklisting.
- Shutting down those domains and their hosting servers can greatly cripple spammers’ ability to conduct spam-related cyber crimes.
- Further investigation of domains and IP addresses may lead to the identities of spammers.

Slide 13: Transition
Spam clustering is an ongoing project. A different thrust is the study of malware.
I describe two methods of static analysis of malware: using blocks and jumps (slide 16), and using strings (slides 17-23).

Slide 14: Malware
What is malware?
- A program that performs actions that the user does not want
- An executable file, i.e., machine code
Each day, we add 5000 new malware samples to our database.
Two types of analysis:
- Static analysis
- Dynamic analysis

Slide 15: Goals
Malware samples belong to families, such as Zeus, Reveton, Perfect keylogger.
Eventual goal: put each malware sample into its family.
Current goal: cluster malware samples, based on their strings.

Slide 16: Static Analysis, using Blocks and Jumps
Method to encode malware samples:
- Jumps (e.g. subroutines, and subroutine calls)
- Disassemble each malware sample, split it into “blocks”, and compute a hash value for each block.
- Also find each jump, and record which block it is from and which block it goes to.
Result: each malware sample is a directed graph.
When malware samples are encoded this way, they will be clustered together if their graphs are similar.
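The encoding on this slide can be illustrated with a small sketch. It assumes the disassembler has already produced a list of blocks (each a list of instruction strings) and a list of jumps as `(from_block, to_block)` index pairs; that input format, the choice of SHA-256, and the Jaccard-style similarity are all assumptions, since the slide does not name a hash or a graph-similarity measure.

```python
import hashlib

def block_hash(instructions):
    """Hash one block of disassembled instructions (truncated SHA-256)."""
    text = "\n".join(instructions).encode("utf-8")
    return hashlib.sha256(text).hexdigest()[:16]

def encode_as_graph(blocks, jumps):
    """Encode a sample as (nodes, edges): block hashes and hashed jump pairs."""
    hashes = [block_hash(block) for block in blocks]
    nodes = set(hashes)
    edges = {(hashes[src], hashes[dst]) for src, dst in jumps}
    return nodes, edges

def graph_similarity(g1, g2):
    """Jaccard overlap of the combined node and edge sets of two graphs."""
    a = g1[0] | g1[1]
    b = g2[0] | g2[1]
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Two samples that share most blocks and jumps score close to 1.0, which is the property a clustering step would rely on.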

Slide 17: Static Analysis, using strings
Strings of printable characters, at least 4 characters long, ending with \0. Examples:
cxczxczxczxcc
Enter
%d-%02d-%02d_%02d-%02d-%02d-%d
JPEG Image saved successfully!^
Screenshot saving cancelled because of logging disabled.^
COXJPEGFile::fill_input_buffer : Catching CFileException^
%d-%d-%d_%d-%d-%d
_controlfp
Password:
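The string definition on this slide (printable characters, at least 4 long, terminated by `\0`) maps directly onto a byte-level regular expression. The following is a minimal sketch; the function name `extract_strings` is hypothetical, and "printable" is assumed to mean ASCII 0x20 through 0x7E.

```python
import re

# A run of 4 or more printable ASCII bytes followed by a NUL terminator.
STRING_RE = re.compile(rb"([\x20-\x7e]{4,})\x00")

def extract_strings(data):
    """Return the NUL-terminated printable strings found in a binary blob."""
    return [match.decode("ascii") for match in STRING_RE.findall(data)]
```

Applied to the raw bytes of an executable, this yields the list of strings that forms one row of the daily data file.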

Slide 18: Data File for 1 Day
Each row is the list of strings in one malware sample. A sample file of 5000 malware samples looks like:
m1: cxczxczxczxcc, Enter, _controlfp, ….
m2: …………….
m3: …………….
m4: ……………..
m5000: ………….

Slide 19: Frequent sets
A typical application is retail data.
- Data file: purchases at a large store.
- Each record: list of purchases of one customer.
- Question: which items are often bought together?
Our application: malware.
- Our data file: strings in malware samples.
- Each record: list of strings of one malware sample.
- Question: which strings are often found together?
- Dual question: which malware samples have many common strings?

Slide 20: Frequent sets: Tiny example
6 malware samples (so 6 records), 4 strings. The samples:
r1: a, b, c, d
r2: b, c, d
r3: a, c, d
r4: a, b
r5: c, d
r6: b, d
Incidence matrix:
     a  b  c  d
r1   1  1  1  1
r2   0  1  1  1
r3   1  0  1  1
r4   1  1  0  0
r5   0  0  1  1
r6   0  1  0  1
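The tiny example is small enough to mine by brute force. The sketch below counts, for every candidate set of strings, how many records contain it, and keeps those that reach a minimum support. The function name `frequent_sets` and the choice of a support threshold of 2 are illustrative assumptions; real miners such as Apriori prune the search instead of enumerating every subset.

```python
from itertools import combinations

# The six records from the slide.
RECORDS = [
    {"a", "b", "c", "d"},
    {"b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b"},
    {"c", "d"},
    {"b", "d"},
]

def frequent_sets(records, min_support):
    """Map each itemset contained in >= min_support records to its support."""
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(1 for record in records if itemset <= record)
            if support >= min_support:
                result[itemset] = support
    return result

FREQUENT = frequent_sets(RECORDS, min_support=2)
```

With a threshold of 2, the pair {a, c} qualifies because two records contain both strings, while {a, b, c} does not because only one record contains all three.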

Slide 21: Frequent sets: Tiny example
Strings a, c are a frequent set (records r1 and r3 contain both).
But a, c is not maximal, because d is in both records.
Incidence matrix (* marks the entries of r1 and r3 in columns a, c, d):
     a   b   c   d
r1  *1   1  *1  *1
r2   0   1   1   1
r3  *1   0  *1  *1
r4   1   1   0   0
r5   0   0   1   1
r6   0   1   0   1

Slide 22: Closed frequent sets
A frequent set is closed if it equals the intersection of the records containing it.
Alternate definition: a closed set is a maximal all-ones submatrix. Since rows and columns play the same role in this, one can let malware samples and strings exchange roles.
Example, incidence matrix (* marks the all-ones submatrix with rows r1, r3 and columns a, c, d):
     a   b   c   d
r1  *1   1  *1  *1
r2   0   1   1   1
r3  *1   0  *1  *1
r4   1   1   0   0
r5   0   0   1   1
r6   0   1   0   1
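The first definition translates directly into code: compute the intersection of all records that contain a set, and call the set closed when that intersection is the set itself. The sketch below applies it to the six records of the tiny example (r1 = {a, b, c, d}, r2 = {b, c, d}, r3 = {a, c, d}, r4 = {a, b}, r5 = {c, d}, r6 = {b, d}). The function names and the support threshold of 2 are assumptions, and the brute-force enumeration only works at this toy scale.

```python
from itertools import combinations

RECORDS = [
    {"a", "b", "c", "d"},
    {"b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b"},
    {"c", "d"},
    {"b", "d"},
]

def closure(itemset, records):
    """Intersection of all records containing itemset (None if none do)."""
    containing = [record for record in records if itemset <= record]
    if not containing:
        return None
    return frozenset(set.intersection(*containing))

def closed_frequent_sets(records, min_support):
    """Map each closed itemset with support >= min_support to its support."""
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(1 for record in records if itemset <= record)
            if support >= min_support and closure(itemset, records) == itemset:
                result[itemset] = support
    return result

CLOSED = closed_frequent_sets(RECORDS, min_support=2)
```

Here the closure of {a, c} is {a, c, d}, so {a, c} is frequent but not closed, while {a, c, d} is closed: it is exactly the starred submatrix.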

Slide 23: Closed Frequent Sets for Malware Analysis
We wanted closed frequent sets, with threshold 30.
The lowest threshold the state-of-the-art algorithm could manage was [...].
By being willing to discard strings that appear more than 10 times, we recently managed threshold 20.
This work is ongoing.
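The pruning step mentioned here can be sketched as a simple filter applied before mining. The sketch reads "appear more than 10 times" as "appear in more than 10 records", which is an assumption, as are the function name `drop_common_strings` and the list-of-lists input format.

```python
from collections import Counter

def drop_common_strings(records, max_count=10):
    """Remove every string that occurs in more than max_count records."""
    # Count each string once per record, however often it repeats inside it.
    counts = Counter(s for record in records for s in set(record))
    return [[s for s in record if counts[s] <= max_count] for record in records]
```

Removing the most widespread strings shrinks the incidence matrix, which is presumably what makes the lower threshold tractable.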

Slide 24: The end.