Intelligent Detection of Malicious Script Code CS194, 2007-08 Benson Luk Eyal Reuveni Kamron Farrokh Advisor: Adnan Darwiche Sponsored by Symantec.



Outline for Project
Phase I: Setup
- Set up machine for testing environment
- Ensure that “whitelist” is clean
Phase II: Crawling
- Modify crawler to output only necessary data. This means:
  - Grab only necessary information from webcrawling results
  - Listen in on Internet Explorer’s JavaScript interpreter and output relevant behavior
Phase III: Database
- Research and develop an effective structure for storing data and link it to the webcrawler
Phase IV: Analysis
- Research trends for normalcy and investigate possible heuristics

Approach to Project
- First Quarter: Infrastructure
- Second Quarter: Data Gathering
- Third Quarter: Data Analysis
(Note: some overlap between quarters)

Infrastructure
- Internet Explorer 7, Windows XP SP2 Professional
  - Main testing environment
- Norton Antivirus
  - Protects against malicious files and scripts
  - Can access logs to determine which sites launched attacks
  - Integrated into automated site visiting

Infrastructure
- CanaryCallback.dll
  - Plugin for Internet Explorer
  - Able to access most data received by the low-level JavaScript interpreter
    - The function being called (DISPID)
    - The class that the function belongs to (GUID)
    - The list of types and values of parameters passed into the function. Examples:
      - VT_I4: 4-byte integer
      - VT_BSTR: Byte string
      - VT_DISPATCH: Object
  - A large part of the first and second quarters was spent programming, debugging, and maintaining the functions that handle the data
    - Functions to grab data type
    - Functions to parse data values (some stored in bitstreams)
    - Functions to output data to file
      - If a type did not have an obvious output format (e.g. VT_DISPATCH), we had to create one that would accurately represent as many components of the data as possible

Infrastructure
- Python
  - Scripting language
  - Designed to handle parsing with ease
  - Script for infrastructure was used to perform three tasks:
    - Launch Internet Explorer (uses the cPAMIE engine), load website, then close Internet Explorer
    - Access and parse Norton’s web attack logs for any attacks launched by the website
    - Sort script data from the CanaryCallback DLL based on DLL data and attack logs (Was there an attack? Did any scripts run? Etc.)
- Heritrix
  - Open-source webcrawler with high customizability
  - Can run specific crawls that target a set of domains, and output minimal information
  - Uses HTTP requests; does not render crawled sites
  - The purpose is to gather as many URLs with scripts as possible for a large sample base

Infrastructure: Crawler
[Diagram components: WWW, Crawler, URL queue, Heritrix raw data, Python parser, Heritrix parsed data]
- Step 0: URL queue is “seeded” with domain list
- Step 1: Grab URL from queue
- Step 2: Grab source from URL
- Step 3: Append URLs to log data and URL queue iff they satisfy our set of rules
- Step 4: Get rid of excess data, leaving only URL information for each site, and output to new file
Repeat steps 1-4 until crawl limit is reached.
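The queue-driven loop in steps 0-4 can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Heritrix configuration: `fetch_links` is a hypothetical callable standing in for the fetch-and-extract step, and the "stay inside the seeded domains" rule is an assumed example of the rule set, which the slides do not spell out.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seeds, fetch_links, allowed_domains, crawl_limit):
    """Breadth-first crawl over a URL queue.

    fetch_links(url) is a hypothetical callable that returns the URLs
    found in the source of `url`.
    """
    queue = deque(seeds)              # Step 0: seed the URL queue
    seen = set(seeds)
    log = []
    while queue and len(log) < crawl_limit:
        url = queue.popleft()         # Step 1: grab URL from queue
        log.append(url)
        for link in fetch_links(url):  # Step 2: grab source from URL
            host = urlparse(link).hostname or ""
            # Step 3 (assumed rule): keep only unseen links in allowed domains
            in_scope = any(host == d or host.endswith("." + d)
                           for d in allowed_domains)
            if link not in seen and in_scope:
                seen.add(link)
                queue.append(link)
    return log                        # Step 4: only URL information is kept
```

A crawl limit is used as the stopping condition here, matching the "repeat until crawl limit is reached" note; a depth limit like the ones quoted for the real crawls would need the depth stored alongside each queued URL.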

Infrastructure: Gatherer
[Diagram components: Heritrix parsed data, Python controller, Internet Explorer 7, Norton Antivirus: CanaryCallback data, Norton Antivirus: Logs, Formatted output]
- Step 1: Python script grabs site from crawl data
- Step 2: cPAMIE component loads IE and sends it to specified site
- Step 3: IE7 JavaScript interpreter outputs to file containing all DLL data
- Step 4: IE7 informs PAMIE that it is finished; Python kills IE7
- Step 5: Python analyzes callback data and logs to decide whether a site is clean, dirty, or has no scripts
- Step 6: Python outputs sorted and formatted data to relevant files for future analysis
Repeat steps 1-6 until URL list is exhausted.
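Steps 5 and 6 amount to a three-way classification followed by bucketing. The sketch below shows one plausible decision rule; the precedence (any logged attack makes a site "dirty", otherwise an empty callback record means "no scripts") is an assumption, as are the function names and the input shapes.

```python
def classify_site(callback_records, attack_log_entries):
    """Label one site from its callback data and antivirus log entries."""
    if attack_log_entries:        # assumed: any logged attack marks the site dirty
        return "dirty"
    if not callback_records:      # no interpreter activity was recorded
        return "no scripts"
    return "clean"


def sort_sites(results):
    """Group URLs by label.

    results: iterable of (url, callback_records, attack_log_entries).
    """
    buckets = {"clean": [], "dirty": [], "no scripts": []}
    for url, records, attacks in results:
        buckets[classify_site(records, attacks)].append(url)
    return buckets
```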

Data gathering
- Heritrix crawls
  - First crawl: 5 seeds, depth 5
    - 5 million sites found
  - Second crawl: 10 seeds, depth 3
    - 3 million sites found
  - Third crawl: 200 seeds, depth 1
    - 18,500 sites found
  - Fourth crawl: 200 seeds, depth 2
    - 3 million sites found
  - The first two crawls produced data that was biased towards large, interlinked sites; the last two broad crawls were run to remedy this.
- CanaryCallback gathering
  - For the first and second crawls, a chosen set of about 1,000 sites was run through the gatherer component.
  - For the third crawl, all sites (18,500) were processed by the gatherer
  - For the fourth crawl, several tasks were performed:
    - 20,000 sites were processed by the gatherer
    - In mid-May, the same 1,000 sites were processed 28 times (about 4 times per day) from May 7 to May 13

Data analysis setup
- CanaryCallback data analysis
  - Main choice for parsing data was the Python scripting language
    - Too much data for MS Access or even MySQL
  - Python scripts were developed to facilitate analysis in a manner similar to SQL
    - Scripts to aggregate data sets and frequencies
    - Scripts to calculate various metrics of data sets, such as:
      - Smallest data point
      - Largest data point
      - Average data point
      - Variance of data points
      - Total number of data points
      - Sum of data points
    - Scripts to output to CSV files (readable in Excel) for deeper analysis
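As an illustration of the kind of metric script described above, the following sketch computes the six listed metrics and serializes them as CSV. The function names and column names are hypothetical, and population variance (dividing by N) is an assumption; the slides do not say which form was used.

```python
import csv
import io


def summarize(values):
    """Return the listed metrics for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    mean = total / count
    # Population variance (divide by N); assumed, not stated in the slides.
    variance = sum((v - mean) ** 2 for v in values) / count
    return {"min": min(values), "max": max(values), "mean": mean,
            "variance": variance, "count": count, "sum": total}


def to_csv(rows):
    """Serialize metric dicts (each with an extra 'name' key) as CSV text."""
    fields = ["name", "min", "max", "mean", "variance", "count", "sum"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

For example, `to_csv([dict(name="bstr_length", **summarize([1, 2, 3, 4]))])` yields a header row followed by one row of metrics that a spreadsheet can open directly.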

Individual data analysis
- The third quarter and the last half of the second quarter were spent covering as wide a range of data as possible
- To accomplish this, our group split up and each member pursued a different line of research
- Individual presentations will follow:
  - Eyal: Activity categorization
  - Benson: Integer argument trend analysis
  - Kamron: Byte string argument trend analysis

Activity Categorization

Activity Analysis
- There is an obvious connection between a function and the site using it
- Is it possible to quantify this relationship, and establish whether certain functions are used in a specific kind of site?
- Characterize a site based on how active it is; i.e., how many function calls are made while the site is loaded
- Does there exist a pattern in the data that can distinguish an abnormal usage of any function based on the characteristics of the site?

Site Function Usage Statistics
Total number of sites:
Average function calls per site: 5777
Average function calls per function: 1984
Standard deviation of function calls per function:
Standard deviation of function calls per site:
Normal distribution (minus outliers: none)
- Three standard deviations below: 0
- Two standard deviations below: 0
- One standard deviation below:
- One standard deviation above: 1633
- Two standard deviations above: 510
- Three standard deviations above: 296
- Normal distribution outliers: 323
Box and whisker (minus outliers: none)
- Median: 1456
- First quartile: 438
- Third quartile: 4029
- Interquartile range: 3591
- Lower whisker starts at: 0
- Upper whisker ends at: 9365
- “Box and whisker” outliers: 2048
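The box-and-whisker figures can be related to each other with the conventional Tukey rule, sketched below. The 1.5 × IQR multiplier is an assumption (the slides do not state it), and the function names are illustrative. With the reported quartiles (438 and 4029) the interquartile range is 3591 and the upper fence is 9415.5, which is consistent with an upper whisker ending at 9365 if that is the largest observation under the fence.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for quartiles q1 and q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whiskers(values, q1, q3, k=1.5):
    """Whisker ends: the extreme observations that lie inside the fences."""
    low_fence, high_fence = tukey_fences(q1, q3, k)
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


def outliers(values, q1, q3, k=1.5):
    """Observations outside the fences (the 'box and whisker' outliers)."""
    low_fence, high_fence = tukey_fences(q1, q3, k)
    return [v for v in values if v < low_fence or v > high_fence]
```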

Correlation analysis
- Related each function to the site calling it using the number of function calls on that site
- Each tuple consisted of the number of times a function was called at a particular site, and the total number of function calls made at that site
- The correlation between the two variables in the tuple was computed for each individual function
- Many functions were not common, so not enough data was available to draw a conclusion about them
- For the functions that were called by enough sites (over 100), the correlation values were between 0.004 and -0.01, showing no correlation between the function and the script activity of the site calling it
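A per-function correlation of this kind might be computed as in the sketch below. The Pearson coefficient is an assumption (the slides only say "correlation"), and the record layout `(function_id, calls_on_site, total_calls_on_site)` is hypothetical.

```python
import math
from collections import defaultdict


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; None if either variable is constant."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def correlations_by_function(records, min_sites=100):
    """Correlate per-site call counts with total site activity, per function.

    records: iterable of (function_id, calls_on_site, total_calls_on_site).
    Functions seen on min_sites or fewer sites are skipped.
    """
    by_function = defaultdict(list)
    for function_id, calls, total in records:
        by_function[function_id].append((calls, total))
    return {f: pearson(p) for f, p in by_function.items() if len(p) > min_sites}
```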

Function Usage Amount
- An interesting trend arose when analyzing the correlation data
- There are functions that are called hundreds/thousands of times
- Despite this, sites seem to call a specific function only a couple of times.
- Example:
  - GUID 3050f3fd-98b5-11cf-bb82-00aa00bdec0b, DISPID 1
  - Called 346 times, only in 11 sites is it called more than 3 times (3.2%)

Categorization Approach
- Since no correlation was found, another approach was taken
- According to trends in the script activity data, divide the sites into distinct categories
- Examine the function behavior in each category, as opposed to individual sites
- Three categories were chosen, split roughly at the median and the end of the third quartile
  - This gave one category 50% of the data, while the other two had 25% each
  - An attempt to avoid bias toward the extremely script-heavy sites

Categorization Heuristic
- A heuristic was developed to determine whether a function would be more likely to appear in a certain category
  F = ((avgl - avgsite)*(L - avgfunc) + (avgm - avgsite)*(M - avgfunc) + (avgh - avgsite)*(H - avgfunc)) / 3
- avgl, avgm, and avgh are the average number of function calls per category (542, 2882, and respectively)
- avgsite is the overall average number of function calls per site (5777)
- avgfunc is the average number of function calls per function (1984).
- L, M, and H are the specific number of times the function was called in the low, medium, and high category
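The formula for F translates directly into code. In the sketch below all averages are passed in as parameters, because the high-category average is missing from the transcript; the function and parameter names are illustrative. Read this way, F resembles a covariance between a category's activity level and how often the function is called there, though that interpretation is not stated in the slides.

```python
def category_heuristic(low, mid, high,
                       avg_low, avg_mid, avg_high,
                       avg_site, avg_func):
    """Compute F for one function.

    low, mid, high: times the function was called in each category (L, M, H).
    avg_low, avg_mid, avg_high: average function calls per category.
    avg_site: overall average function calls per site.
    avg_func: average function calls per function.
    """
    return ((avg_low - avg_site) * (low - avg_func)
            + (avg_mid - avg_site) * (mid - avg_func)
            + (avg_high - avg_site) * (high - avg_func)) / 3
```

Sorting all functions by this value would give the ordering that the later slides refer to when they speak of the highest and lowest heuristic values.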

Statistical Variation Among Categories
- The heuristic separated out the functions into three distinct sections
- Along the higher values were mostly functions that had few arguments supplied
- In the middle, there were whole objects represented (a GUID, and all of its related function calls)
- At the lowest negative values were functions that were commonly called with arguments

Argument Distributions
- A further analysis was done on whether there exists a difference in the behavior of a function in the separate categories
- The distributions of BSTR (Byte String) lengths and I4 (4-byte Integer) values were considered
- Several functions were examined, but this specific one (referred to as “Second”, as it had the second highest heuristic value) is exemplary of the trends noticed
- The argument type frequency of “Second”:
  - LOW:
    - 0 arguments:
    - I4 arguments: 0
    - BSTR arguments: 2634
    - DISPATCH arguments: 14
    - NULL arguments: 0
    - BOOL arguments: 0
  - MID:
    - 0 arguments:
    - I4 arguments: 0
    - BSTR arguments: 9888
    - DISPATCH arguments: 1
    - NULL arguments: 0
    - BOOL arguments: 0
  - HIGH:
    - 0 arguments:
    - I4 arguments: 0
    - BSTR arguments: 9447
    - DISPATCH arguments: 19
    - NULL arguments: 0
    - BOOL arguments: 0

Conclusions of Approach
- The trend seen is that there is no major statistical difference in the argument value distribution among the categories, but there are distinct characteristic differences
- Functions that appear more commonly in less-active sites tend to have arguments supplied to them
- No general correlation exists between functions and how active the sites calling them are
- There may exist correlation in some other characteristic, however

Integer analysis

Functions through Three Sets
- Looked through 3 of the runs:
  - 5 seeds, depth 5: 1,324 sites
  - 10 seeds, depth 3: 1,184 sites
  - 200 seeds, depth 1: 15,790 sites
- Picked the three most common functions with integer arguments in the first run to analyze
- Goal: Look for consistency in function behavior across differing sets of sites

Functions through Three Sets
- In all three data sets, the values of the argument had a very large range, from 0 to the millions or billions
- Distributions did not stay consistent through sets; all had differing commonly occurring values

Functions through Three Sets
- Similar pattern in all 3 sets
- Low values were used
- Numbers near 0 most common; occurrences drop off as values get larger

Functions through Three Sets
- Values range from 0 to the hundreds
- Second data set did not have enough data
- Similar common numbers in both sets: 3, 300, and 728

Patterns in DISPID Usage
- Looked at which DISPIDs were used, without regard to the GUIDs of the calling classes
- DISPIDs had a large range, from lows of less than -2 billion to highs of over 3 million
- Out of 743,270 functions analyzed, the vast majority had DISPIDs within 4 distinct ranges
  - 205 of the functions did not fall within these groups, and instead were one of 6 other numbers
- Within each of the four ranges, occurrences at specific numbers formed patterns

DISPID Usage – First Range
The most common range for DISPIDs – 3,000,000-3,001,
- 490,201 functions, about 66%
- 1,067 out of 1,286 different numbers used
- Numbers nearer to 3 million are most common; higher numbers were used less
Number range: Average occurrences
- 3,000,000-3,000,199: 1,121
- 3,000,200-3,000,
- 3,000,400-3,000,
- 3,000,600-3,000,
- 3,000,800+: 1

DISPID Usage – Second Range
Second most common range for DISPIDs – 0-2,
- 164,224 functions, about 22%
- 39 numbers in this range were used
- 0 and 1,103 were the most common
- Numbers clumped around 5 groups: 0-9, , , , and , with 2313 being an exception

DISPID Usage – Third Range
Third range for DISPIDs: -2,147,417,109 to -2,147,411,105
- 50,541 functions, about 7%
- 55 numbers in this range used
- Most occurrences were around numbers ending in round thousands

DISPID Usage – Fourth Range
Fourth range for DISPIDs: 10,001-10,087
- 38,099 functions, about 5%
- 75 numbers in this range were used
- Uniquely used by 3050f55d-98b5-11cf-bb82-00aa00bdce0b
- DISPIDs 10,001-10,007 are most common

Patterns in DISPID Usage
- Looked at which DISPIDs were used, without regard to the GUIDs of the calling classes
- DISPIDs had a large range, from lows of less than -2 billion to highs of over 3 million
- Out of 743,270 functions analyzed, the vast majority had DISPIDs within 4 distinct ranges
- Within each of the four ranges, occurrences at specific numbers formed patterns

Functions with Multiple Integers
- Looked for patterns in the relations among the integer arguments of functions taking multiple arguments
- Not very many functions in this category
  - One took two arguments; the first was always 0
  - One took two arguments, always the same. Arguments were all from (1,1) to (31,31) and (1908,1908) to (1908)
    - All came from 2 signup sites on a particular website
  - Two took two differing arguments; could not find a relation between the arguments
  - Other functions did not have a large enough sample size

Functions with Multiple Integers
- Function itself had consistent patterns in the values it took: 95% of arguments were (1,1) or (3,2)
- No consistent relations between arguments

Function Pairs
- Examined GUID: 3050f55d-98b5-11cf-bb82-00aa00bdce0b DISPIDs:
- Out of 38,099 occurrences, 3,595 were followed by:
  - GUID: c59c6b12-f6c1-11cf a0c911e8b2 DISPID: 0
- Second function had no independent occurrences
- Similar arguments:
  - First function took a variety of numbers and types of arguments
  - Second function always took a DISPATCH argument, followed by the same arguments as the first function
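Counting how often one call is immediately followed by another, and checking whether the second ever occurs on its own, could look like the sketch below. It assumes each call is available in order as a `(guid, dispid)` tuple; that layout and the function names are hypothetical.

```python
from collections import Counter


def count_followers(calls, first):
    """Count which calls immediately follow each occurrence of `first`.

    calls: ordered list of (guid, dispid) tuples.
    """
    followers = Counter()
    for current, following in zip(calls, calls[1:]):
        if current == first:
            followers[following] += 1
    return followers


def independent_occurrences(calls, second, first):
    """Count occurrences of `second` not immediately preceded by `first`."""
    count = 0
    for i, call in enumerate(calls):
        if call == second and (i == 0 or calls[i - 1] != first):
            count += 1
    return count
```

A result of zero from `independent_occurrences` corresponds to the observation that the second function had no independent occurrences.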

Conclusions of Approach
- Function arguments through sets:
  - There seem to be consistent patterns in certain functions
    - Range, values taken, common values, value distribution
- DISPID usage
  - 4 ranges with very few exceptions
  - Common subranges or distribution patterns within each range
- Multiple arguments
  - Uncommon type of function
  - No noticeable relations in arguments
- Function pairs
  - Dependent functions have clear patterns
    - Function position
    - Argument types and values
  - Only one example – do more exist?

Byte string analysis

Byte String Analysis: Buffer overflows are a common method of exploiting a targeted system. One method: create a very long string to break boundary checking, then append shellcode at the end to inject into the assembly code. We are interested in the length of BSTR objects fed into given functions. For any given API, what is considered a normal string length?

Class-based analysis: Initial analyses were done on a class-by-class basis. Samples were grouped together and analyzed according to GUID. Byte strings are typically very small. More than 70% of the commonly called JavaScript classes typically received byte strings of length less than 20 (39 out of 55 functions from this crawl). Less than 10% of these ever receive a string greater than 5000 characters in length (4 out of 55 functions from this crawl).

Class-based analysis: Analysis of individual classes shows the same trend toward smaller strings. However, analyzing based on classes groups the byte strings of all of a class's functions together, which results in inaccuracy and lost information. [Tables: BSTR length / Exact length / At most this length; values not preserved]

Parameter-based analysis: The second analysis split samples into the individual arguments of the unique functions of each class. Given a sample set with values in the interval (a, b) with average μ and standard deviation σ, we expect values to largely lie within the interval (μ – σ, μ + σ). We also expect (μ – σ, μ + σ) to be smaller than (a, b). The smaller (μ – σ, μ + σ) is in proportion to (a, b), the more well-defined our sample set becomes.

Parameter-based analysis: Length of expected interval: 2σ. Length of entire interval: n = b – a + 1. 2σ/n represents the ratio of the expected interval to the entire interval. Since 2σ < n, 0 ≤ 2σ/n < 1. When 2σ/n = 0, σ = 0 and all values in the data set are equal. As 2σ/n approaches 1, σ approaches n/2 and the values in the data set concentrate at a or b. As 2σ/n goes from 0 to 1, the shape of the graph begins to shift.
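As a rough illustration of the ratio defined above, the sketch below computes 2σ/n for a list of observed string lengths. It assumes the population standard deviation (the slides do not say which variant was used), and `interval_ratio` is a hypothetical name.

```python
from statistics import pstdev

def interval_ratio(lengths):
    """Return 2*sigma/n for a non-empty list of integer string lengths."""
    a, b = min(lengths), max(lengths)
    n = b - a + 1            # length of the entire interval
    sigma = pstdev(lengths)  # population standard deviation (an assumption)
    return 2 * sigma / n

print(interval_ratio([12, 12, 12, 12]))  # all values equal, so sigma = 0
print(interval_ratio([1, 1, 100, 100]))  # values sit only at the endpoints a and b
```

The first call gives a ratio of 0, and the second gives a ratio just under 1, which are the two extremes the slide describes.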

[Table: Ratio is no more than / Number of functions / Percentage; values not preserved]

When the ratio is 0, the number of strings is typically low. Otherwise, the ratio increases as the number of strings decreases. The function arguments with the smallest non-zero ratio are the most well-defined.

Analysis with pruning: Only function arguments that see 9 or fewer strings are removed; however, most zero-ratio functions are pruned (2607 to 731), many functions with ratio > 0.5 are pruned (1540 to 883), and functions with ratio < 0.5 are affected minimally (1442 to 1332). [Table: Ratio is no more than / Number of functions / Percentage; values not preserved]

Analysis with pruning: Only function arguments that see 99 or fewer strings are removed; however, almost all zero-ratio functions are pruned (731 to 232), almost all functions with ratio > 0.5 are pruned (883 to 266), and only some functions with ratio < 0.5 are affected (1332 to 979). [Table: Ratio is no more than / Number of functions / Percentage; values not preserved]
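The pruning step can be sketched as a filter on sample count followed by a tally per ratio band. Everything named here is hypothetical: the `samples` layout (one list of lengths per function argument), the band names, and the choice to put a ratio of exactly 0.5 in the "high" band, which the slides do not specify.

```python
from collections import Counter
from statistics import pstdev

def interval_ratio(lengths):
    """2*sigma/n, with n = max - min + 1 and population standard deviation."""
    n = max(lengths) - min(lengths) + 1
    return 2 * pstdev(lengths) / n

def bucket_counts(samples, min_count):
    """Tally function arguments per ratio band, skipping sparse ones.

    `samples` maps an argument identifier to its observed string lengths;
    arguments with fewer than `min_count` strings are pruned.
    """
    counts = Counter()
    for lengths in samples.values():
        if len(lengths) < min_count:
            continue
        ratio = interval_ratio(lengths)
        if ratio == 0:
            counts["zero"] += 1
        elif ratio < 0.5:
            counts["low"] += 1
        else:
            counts["high"] += 1
    return dict(counts)

# Made-up data: keys are placeholder (class, function, argument index) tuples.
samples = {
    ("class-a", 1, 0): [7],
    ("class-a", 2, 0): [70, 75, 76, 76, 76, 77, 77, 77, 78, 80, 76, 77],
    ("class-b", 1, 1): [1, 90],
}
print(bucket_counts(samples, min_count=1))
print(bucket_counts(samples, min_count=10))
```

With `min_count=10` (removing arguments that saw 9 or fewer strings), only the well-sampled argument survives, mirroring how pruning mostly removes zero-ratio and high-ratio entries.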

Analysis with pruning: [Table: number of functions per ratio band for string frequency requirements > 1, > 10, and > 100; values not preserved] As a function is seen in the wild more frequently, the byte string lengths it takes in begin to fall into specific intervals. Functions with substantial evidence are well-defined in the lengths of byte strings they tend to receive!

Comparing w/malicious data: Symantec provided us with test samples used for Canary testing. These samples trigger browser exploits but do not inject actual shellcode; the worst thing they can do is crash the browser. Malicious samples fell into one of three categories: bad BSTR, bad I4, or bad DISPATCH (object). Example: “MSIE Popup Window Address Bar Spoofing Weakness”. Callback data: DISPID: [not preserved]; GUID: f55f-98b5-11cf-bb82-00aa00bdce0b; Params: 1; Type 1: BSTR; Value: 150. Compare with data from the May crawl: 491 strings seen over the 20,416 websites visited during that crawl. Smallest: 70. Largest: 80. Average: 76.32 (the midpoint of the expected interval). Standard deviation: 2.33. Expected interval: [73.99, 78.65]. Entire interval: [70, 80]. Length 150 is 31.6 standard deviations away from the average length!
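The 31.6 figure can be reproduced from the numbers on this slide. The sketch below takes the average as the midpoint of the expected interval (μ – σ, μ + σ); the function name `deviations_from_mean` is hypothetical.

```python
def deviations_from_mean(length, mu, sigma):
    """Number of standard deviations between `length` and the average."""
    return (length - mu) / sigma

low, high = 73.99, 78.65  # expected interval from the slide
sigma = 2.33              # standard deviation from the slide
mu = (low + high) / 2     # midpoint of (mu - sigma, mu + sigma)

z = deviations_from_mean(150, mu, sigma)
print(round(mu, 2), round(z, 1))  # 76.32 31.6
```

Any length inside the crawl's entire interval [70, 80] stays within about three standard deviations of the average, so the malicious length of 150 stands far apart from everything seen in the crawl.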

Trend volatility: How does web activity change over time? 28 crawls of 1000 sites over May 7 to May 13 were performed to investigate this. [Table: Run / Size (KB) / Size (MB) / DLL calls / URLs w/scripts; values not preserved] Each crawl differs by several hundred thousand DLL calls. The number of sites with actual scripts also changes.

Trend volatility: These runs were done ~5.5 hrs apart. Change is very slight. Zero-ratio functions increase. High-ratio functions decrease.

Trend volatility: These runs were done ~1 day apart. Change is also very slight. Zero-ratio functions decrease. Mid-ratio functions (R = 0.5) increase.

Trend volatility: These runs were done ~6 days apart. Change is a little more apparent. Zero-ratio functions decrease. Mid-ratio functions (R = 0.5) increase.

Trend volatility: The state of JavaScript activity on the Web is constantly changing. Changes are somewhat unpredictable (and entirely dependent on the decisions of webmasters). In the long run these changes are not major; however, they still exist and need to be addressed.

Conclusions of Approach: Substantial evidence in favor of existing trends for byte string arguments. This approach can be adapted to anything that can be quantified as a number. Changes in the state of the web will require any heuristic developed to have at least a basic learning capability. Plan to continue research over the summer.
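One very basic form of learning capability would be to keep the average and standard deviation of each argument's string lengths up to date as new crawl data arrives. The sketch below uses Welford's online algorithm for this; it is an illustrative assumption rather than anything described in the slides, and the class name `RunningLengthStats` is hypothetical.

```python
import math

class RunningLengthStats:
    """Incrementally track mean and population std of string lengths."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, length):
        """Fold one newly observed length into the statistics (Welford)."""
        self.count += 1
        delta = length - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (length - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

stats = RunningLengthStats()
for length in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(length)
print(stats.mean, stats.std)
```

Because each update needs only the previous count, mean, and squared-deviation sum, the expected interval for an argument can follow gradual changes on the web without storing every string length seen so far.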