Intelligent Detection of Malicious Script Code
CS194, 2007-08
Benson Luk, Eyal Reuveni, Kamron Farrokh
Advisor: Adnan Darwiche
Introduction
- 3-quarter project, sponsored by Symantec
- Main focuses:
  - Web programming
  - Database development
  - Data mining
  - Artificial intelligence
Overview
- Current security software catches known malicious attacks based on a list of signatures.
- The problem: new attacks are being created every day.
- Developers need to create new signatures for these attacks.
- Until those signatures are made, users remain vulnerable to the attacks.
Overview (cont.)
- Our objective is to build a system that can effectively detect malicious activity without relying on signature lists.
- The goal of our research is to see if, and how, artificial intelligence can discern malicious code from non-malicious code.
Data Gathering
- Gather data using a web crawler (probably a modified crawler based on the Heritrix software).
- The crawler scours a list of known "safe" websites.
- If necessary, it will also branch out into websites linked from those sites for additional data.
- While crawling, we will gather key information on the scripts (function calls, parameter values, return values, etc.).
- This will be done in Internet Explorer.
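The per-page script information described above (which functions were called, and with what) could be harvested in several ways; as a minimal sketch, the snippet below pulls inline `<script>` bodies out of a fetched page with Python's standard-library HTML parser and lists the function names they invoke with a simple regex. This is an illustrative stand-in, not the actual Heritrix/IE instrumentation the project describes, and the sample page is hypothetical.

```python
import re
from html.parser import HTMLParser

class ScriptExtractor(HTMLParser):
    """Collects the text content of <script> elements from a page."""
    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)

# Matches an identifier immediately followed by "(" — a crude call detector.
CALL_RE = re.compile(r"\b([A-Za-z_$][\w$]*)\s*\(")

def extract_calls(html):
    """Return function names invoked in a page's inline scripts, in order."""
    parser = ScriptExtractor()
    parser.feed(html)
    return [m.group(1)
            for script in parser.scripts
            for m in CALL_RE.finditer(script)]

page = "<html><script>document.write(escape(location.href));</script></html>"
print(extract_calls(page))  # -> ['write', 'escape']
```

A real harvester would also need to follow external script URLs and observe runtime behavior (the static text of a script can differ from what it does once executed), which is why the project instruments the browser itself.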
Data Storage
- Gathered data must be stored for the analysis that takes place later.
- We need to develop a database that can efficiently store the script activity of tens of thousands (possibly millions) of websites.
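One plausible shape for such a store, sketched here purely as an assumption (the project has not yet chosen a schema), is a page table plus a call table keyed by page and sequence position, so that call order survives. SQLite keeps the example self-contained; a production store for millions of sites would need something heavier.

```python
import sqlite3

# Hypothetical minimal schema: one row per observed call, keyed by page,
# with a sequence index so call order within a page is preserved.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE call (page_id INTEGER REFERENCES page(id),
                   seq     INTEGER,   -- position within the page
                   name    TEXT,      -- function called
                   arg     TEXT);     -- flattened parameter value
""")

def record_page(url, calls):
    """Store one crawled page's (function, argument) pairs in order."""
    cur = conn.execute("INSERT INTO page(url) VALUES (?)", (url,))
    page_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO call VALUES (?, ?, ?, ?)",
        [(page_id, i, name, arg) for i, (name, arg) in enumerate(calls)])

record_page("http://example.com", [("escape", "abc"), ("eval", "x")])
rows = conn.execute("SELECT name FROM call ORDER BY seq").fetchall()
print(rows)  # -> [('escape',), ('eval',)]
```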
Data Analysis
- Using information from the database, deduce what normal script behavior looks like.
- Find a robust algorithm for generating a heuristic of acceptable behavior.
- The goal is to later weigh new scripts against this heuristic to flag abnormal, and thus potentially malicious, behavior.
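As one toy illustration of the "normality heuristic" idea (an assumption for exposition, not the algorithm the project will use), a profile can count how many crawled pages each function appears on, and a new page can then be scored by the fraction of its calls that are rare in the profile:

```python
from collections import Counter

def build_profile(pages):
    """Count, over a corpus of pages, how many pages each call appears on."""
    doc_freq = Counter()
    for calls in pages:
        doc_freq.update(set(calls))      # count each call once per page
    return doc_freq, len(pages)

def anomaly_score(calls, doc_freq, n_pages):
    """Fraction of a page's calls seen on fewer than 10% of known-good pages."""
    if not calls:
        return 0.0
    rare = sum(1 for c in calls if doc_freq[c] / n_pages < 0.10)
    return rare / len(calls)

# Hypothetical whitelist corpus: 40 benign pages.
corpus = [["getElementById", "write"]] * 20 + [["getElementById"]] * 20
doc_freq, n_pages = build_profile(corpus)
score = anomaly_score(["getElementById", "unescape", "eval"], doc_freq, n_pages)
```

Here `unescape` and `eval` never appear in the benign corpus, so 2 of the page's 3 calls are rare and the score is 2/3. A real heuristic would have to be far richer, but the shape — learn from known-good data, score deviations — matches the slide.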
Challenges
- Gathering
  - How do we grab the relevant information from scripts?
  - How deep do we search? Good websites may inadvertently link to malicious ones, and the traversal graph is effectively unbounded.
- Storage
  - In what form should the data be stored? We need an efficient way to store the data without oversimplifying it.
  - Example: a flat list of function calls does not take call sequence into account.
- Analysis
  - What analysis algorithm can handle this volume of data?
  - How can we ensure that the normality heuristic it generates minimizes false positives and maximizes true positives?
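The storage challenge above — a flat "laundry list" of calls losing sequence information — has a standard mitigation worth sketching: store n-grams of consecutive calls rather than a bag of names. The helper below is a hypothetical illustration; two pages with identical call sets produce different bigrams when their call order differs.

```python
def call_ngrams(calls, n=2):
    """Sliding window of call n-grams, preserving the ordering information
    that a flat bag of call names would lose."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

# Same set of calls, different order: the bags match, the bigrams do not.
a = call_ngrams(["escape", "eval", "write"])
b = call_ngrams(["write", "eval", "escape"])
print(a)  # -> [('escape', 'eval'), ('eval', 'write')]
print(b)  # -> [('write', 'eval'), ('eval', 'escape')]
```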
Milestones
- Phase I: Setup. Set up equipment for research and ensure the whitelist is clean.
- Phase II: Crawler. Modify the crawler to grab and output the necessary data for later storage, then begin crawling for sample information.
- Phase III: Database. Research and develop an effective structure for storing the data, and link it to the web crawler.
- Phase IV: Analysis. Research and develop an effective algorithm for learning from massive amounts of data.
- Phase V: Verification. Using the web crawler, visit a large volume of websites to confirm that the heuristic generated in Phase IV is accurate.
- Certain milestones may need to be revisited depending on the results of each phase.