Intelligent Detection of Malicious Script Code
CS194, 2007-08
Benson Luk, Eyal Reuveni, Kamron Farrokh
Advisor: Adnan Darwiche
Sponsored by Symantec


Outline for Project

Phase I: Setup
- Set up machine for testing environment
- Ensure that "whitelist" is clean

Phase II: Crawling
- Modify crawler to output only necessary data. This means:
  - Grab only necessary information from webcrawling results
  - Listen in on Internet Explorer's Javascript interpreter and output relevant behavior

Phase III: Database
- Research and develop an effective structure for storing data and link it to the webcrawler

Phase IV: Analysis
- Research and develop an effective algorithm for learning from massive amounts of data

Completed Tasks – First Quarter

Phase I
- Configured machine with Norton Antivirus and the Heritrix web crawler
  - The web crawler will be used to grab additional URLs, and Norton Antivirus will be used to verify that a URL has not launched an attack
- Created a Python script to ensure that visited sites are clean (see the first sketch below)
  - Captures Norton's web attack logs before and after loading a site in Internet Explorer, then compares the logs for new entries and signals whether or not a site's data should be discarded

Phase II
- Configured Heritrix to run specific crawls that target a set of domains and output minimal information
  - The purpose is to gather as many URLs with scripts as possible for a large sample base
- Created a parser for Heritrix logs to filter out irrelevant websites (see the second sketch below)
  - For example, we are omitting URLs that point to images, since they will not contain scripts
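A minimal sketch of the before/after log comparison described above. The log path, the way Internet Explorer is launched, and the 30-second wait are all assumptions; the slides do not give Norton's actual log location or format.

```python
import subprocess
import time

NORTON_LOG = r"C:\ProgramData\Norton\web_attack_log.txt"  # assumed path

def read_log_entries(path):
    """Return the set of lines currently in Norton's web attack log."""
    try:
        with open(path, "r", errors="replace") as f:
            return set(f)
    except FileNotFoundError:
        return set()

def site_is_clean(url, wait_seconds=30):
    """Load a URL in Internet Explorer and report whether Norton logged
    any new web-attack entries while the page was open."""
    before = read_log_entries(NORTON_LOG)
    browser = subprocess.Popen(["iexplore.exe", url])
    time.sleep(wait_seconds)          # let the page load and its scripts run
    browser.kill()
    after = read_log_entries(NORTON_LOG)
    return len(after - before) == 0   # no new entries: keep this site's data
```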
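The image filter can be as simple as an extension check on each crawled URL. A sketch, assuming Heritrix's crawl.log layout, where the downloaded URI is the fourth whitespace-separated field; the extension list is illustrative.

```python
SKIP_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".ico", ".css")

def script_candidate_urls(crawl_log_path):
    """Yield crawled URLs that could plausibly contain script,
    skipping obvious image and stylesheet resources."""
    with open(crawl_log_path) as log:
        for line in log:
            fields = line.split()
            if len(fields) < 4:
                continue          # skip blank or malformed lines
            url = fields[3]       # downloaded URI column in crawl.log
            if not url.lower().endswith(SKIP_EXTENSIONS):
                yield url
```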

Completed Tasks – Second Quarter

Phase I
- Whitelist: integrated a Symantec component that checks whether a visited site is malicious, so all of the data we gather is from clean sources
- Hard drive: installed a 750 GB hard drive

Completed Tasks – Second Quarter

Phase II
- Crawling: we ran a shallow crawl with 200 domains as the seed, and that is the current base of our data. The result was 18,500 URLs, which we run through our Script Listening component.

Completed Tasks – Second Quarter

Phase II
- Script Listening: received a customizable tool from Symantec that listens to the Javascript interpreter in Internet Explorer
- We modified it to output the information we need:
  GUID -> DISPID -> ArgType -> ArgVal

Completed Tasks – Second Quarter

Example of data:

  DISPID (function)   GUID (object)                          # of Args   Arg Type   Arg Value
                      3050f55f-98b5-11cf-bb82-00aa00bdce0b   1           BSTR       130
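Parsing those records is mechanical. A small helper, assuming one call per line in the GUID -> DISPID -> ArgType -> ArgVal layout shown above (the exact delimiter in the real tool's output is an assumption):

```python
from collections import namedtuple

Call = namedtuple("Call", "guid dispid arg_type arg_value")

def parse_call(line):
    """Parse one listener line of the form
    GUID -> DISPID -> ArgType -> ArgVal; returns None if malformed."""
    parts = [p.strip() for p in line.split("->")]
    if len(parts) != 4:
        return None
    return Call(*parts)

# e.g. parse_call("3050f55f-98b5-11cf-bb82-00aa00bdce0b -> 1013 -> BSTR -> 130")
```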

Completed Tasks – Second Quarter

Phase III
- The amount of data we have gathered is too large to use in a database. The raw text file is 4 GB (~50 million function calls), and querying a database that size is too slow on the computer we have.
- Instead, we are storing the data as a text file and doing operations on it with Python scripts (see the sketch below).
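A sketch of the kind of operation this enables: a single streaming pass over the log that tallies calls per (GUID, DISPID) pair, so the 4 GB file never has to fit in memory. It reuses the hypothetical parse_call helper from above.

```python
from collections import Counter

def most_common_functions(log_path, top_n=20):
    """Stream the listener log once and count calls per (GUID, DISPID) pair."""
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            call = parse_call(line)
            if call is not None:
                counts[(call.guid, call.dispid)] += 1
    return counts.most_common(top_n)
```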

Results and Findings – Second Quarter

Phase IV
- We have analyzed data from our first two result sets:
  - Crawl with 5 initial seeds: 3,476,348 function calls; 109 distinct GUIDs, 7,364 GUID-DispID pairs
  - Crawl with 15 initial seeds: 3,706,454 function calls; 95 distinct GUIDs, 5,575 GUID-DispID pairs
- Looked at the most common functions, the most common int-argument functions, and the distribution of the argument values for these functions (a sketch of this follows)
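How the argument-value distributions might be pulled out, again reusing the hypothetical parse_call helper; the "I4"/"INT" tags are assumptions about how the listener labels integer arguments:

```python
from collections import Counter, defaultdict

INT_TYPES = {"I4", "INT"}   # assumed labels for integer arguments

def int_argument_histograms(log_path):
    """For each (GUID, DISPID) pair called with integer arguments,
    build a histogram of the argument values seen."""
    histograms = defaultdict(Counter)
    with open(log_path) as log:
        for line in log:
            call = parse_call(line)
            if call is not None and call.arg_type in INT_TYPES:
                histograms[(call.guid, call.dispid)][call.arg_value] += 1
    return histograms
```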

Results and Findings – Second Quarter

Function 1:
- GUID: 3050f55d-98b5-11cf-bb82-00aa00bdce0b
- GUID object name: DispHTMLWindow2
- DispID: 1103
- Most popular int-argument function in both result sets
- Mostly random distribution, but signs of regularity
- Results from the two sets show significant differences

Results and Findings – Second Quarter
[Chart: argument-value distribution for Function 1; image not preserved in the transcript]

Function 2:
- GUID: 3050f55f-98b5-11cf-bb82-00aa00bdce0b
- GUID object name: DispHTMLDocument
- DispID: 1013
- Second most popular int-argument function in both result sets
- Shows a regular distribution with distinct characteristics
- Results from the two sets show significant differences

Results and Findings – Second Quarter
[Chart: argument-value distribution for Function 2; image not preserved in the transcript]

Function 3:
- GUID: 3050f51b-98b5-11cf-bb82-00aa00bdce0b
- GUID object name: DispHTMLIFrame
- DispID:
- Third most popular int-argument function in the 1st result set, 95th most popular in the 2nd result set
- Shows a random distribution with distinct characteristics
- Results are dramatically different between data sets
  - All arguments in the 2nd result set are 0

Results and Findings – Second Quarter
[Chart: argument-value distribution for Function 3; image not preserved in the transcript]

- Found significant differences between the data sets in both the frequencies of specific functions and the arguments of specific functions (one way to quantify this is sketched below)
- Suspect that the differences result from biases due to the small number of initial seeds (5 and 15)
- Ran a much broader crawl (200 seeds) in hopes of getting more general, unbiased results
- Just from partial results of this crawl (roughly 8,000 websites), we have so far found:
  - A much larger average number of calls to our listener per website
  - A large percentage of function calls that take 0 arguments
- Will post complete results once the crawl is finished
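For illustration only, since the slides do not say how the comparison was made: one simple way to put a number on such differences is the total variation distance between the normalized call-frequency tables of the two crawls.

```python
def total_variation(counts_a, counts_b):
    """Distance in [0, 1] between two frequency tables (e.g. Counters of
    (GUID, DISPID) pairs); 0 means identical distributions, 1 means disjoint."""
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a.get(k, 0) / total_a -
                         counts_b.get(k, 0) / total_b)
                     for k in keys)

# e.g. compare the (GUID, DISPID) tallies from the 5-seed and 15-seed crawls
```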

Direction for Next Quarter
- Further analyze the gathered data for patterns
- Compare trends in "normal" data to what occurs in malicious scripts