Automated malware classification based on network behavior

Name: Automated malware classification based on network behavior
Uploaded: 2017-12-18T15:06:59+00:00
Duration: PTM7S27
Channel: Cassandra Nash
Description: Automated malware classification based on network behavior

Automated malware classification based on network behavior
2013 International Conference on Computing, Networking and Communications, Communications and Information Security Symposium
Authors: Saeed Nari, Ali A. Ghorbani
Speaker: Wen Lin Yu

Outline
Introduction, Related works, Automated Malware Classification, Evaluation, Conclusion

Introduction
Malware has long been one of the major security threats on the Internet. Anti-virus programs primarily use content-based signatures to identify malware and classify it into its respective families.
Notes: 1. What malware is. 2. Anti-virus products generally use content-based methods to classify and detect malware. 3. The drawback of content-based methods is that when a virus mutates, the content of the new variant may differ slightly, so it may go undetected.

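Content-based approaches:
This approach is not very accurate because modern malware widely uses obfuscation, polymorphic, and metamorphic techniques.
Behavior-based approaches: analysis by system call, network activity.
Notes: Content-based methods have low accuracy. Polymorphic and metamorphic techniques make malware hard to detect, because both change the code itself as the malware propagates. Polymorphic malware is the easier of the two to detect because the virus body stays the same. Although the code changes, the malware's behavior and communication patterns do not. Behavior-based methods usually analyze system calls or network activity.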

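Related work: Content-based approach
Kolter and Maloof applied machine learning to classify malicious executables using n-grams of byte codes as features. Tian et al. used features from printable strings contained in malware samples to distinguish between malicious and benign executables.
Notes: 1. Byte codes obtained after disassembly are used as malware features; the drawback is that disassembly is required and it is slow. 2. This work analyzes which printable strings a malware sample contains. Both approaches assume that the malware can be unpacked and inspected, and disassembly is slow and inefficient. When a next-generation variant appears with different features and content, the detection rate of these methods drops.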

Behavior-based approaches:
Lee and Mody represent malware samples as sequences of system calls and use string edit distance to classify them. Bailey et al. apply normalized compression distance (NCD) as a similarity measure for classifying malware samples. Rieck et al. use the information contained in the analysis reports created by CWSandbox.
Notes: Behavior-based methods usually compute the distance between two profiles, that is, how similar they are. 1. Malware profiles are built from system calls and classified using string edit distance. 2. Bailey et al. improve on edit distance by using NCD (a different mathematical formula). 3. CWSandbox comes from a paper published by other authors in 2007 on automated binary analysis.
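The two distance measures named on this slide can be sketched in a few lines. This is only an illustration: the system-call traces and function names are hypothetical, and zlib is an assumed compressor for NCD; the cited works may use different sequence encodings and compressors.

```python
import zlib


def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. system-call names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C = compressed size."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical system-call traces of two samples.
trace_a = ["open", "read", "connect", "send", "close"]
trace_b = ["open", "read", "connect", "recv", "close"]
print(edit_distance(trace_a, trace_b))
print(ncd(" ".join(trace_a).encode(), " ".join(trace_b).encode()))
```

In both cases a smaller value means the two profiles are more similar, which is what a distance-based classifier relies on.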

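This paper focuses on how to classify malware automatically by its network behavior, taking the dependencies between network flows into account.
Notes: The focus of this paper is how to classify malware automatically from its network activity. Unlike other papers, it also considers the dependencies between network flows. The authors argue that today's malware communicates with external hosts, so they believe there is a reasonable chance that network behavior alone is enough to classify malware.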

Automated Malware Classification
Network Trace (pcap files) -> Flow Extraction -> Behavior Tree Generation -> Feature Extraction -> Classification
Notes: Network traces are captured as pcap files. Each flow is extracted, and the flows are assembled into a behavior profile in graph form. Graph features are then extracted (the features are defined later). Finally, a known, labeled sample dataset is used as training data and test data.

Flow Extraction
Network flows are extracted from pcap files based on port numbers and protocols using the TShark utility.
Notes: TShark is the command-line tool provided by Wireshark.
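As a minimal sketch of what flow extraction produces, the code below groups packet records into flows keyed by protocol, addresses, and ports. It assumes the packets have already been exported from the pcap file (for example with TShark) into tuples; the record layout, addresses, and function names are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical packet records: (protocol, src_ip, src_port, dst_ip, dst_port, length)
packets = [
    ("UDP", "10.0.0.5", 53124, "192.0.2.53", 53, 74),
    ("UDP", "192.0.2.53", 53, "10.0.0.5", 53124, 90),
    ("TCP", "10.0.0.5", 49200, "203.0.113.7", 80, 60),
    ("TCP", "203.0.113.7", 80, "10.0.0.5", 49200, 60),
    ("TCP", "10.0.0.5", 49200, "203.0.113.7", 80, 512),
]


def flow_key(protocol, src_ip, src_port, dst_ip, dst_port):
    """Direction-independent key, so both directions fall into the same flow."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (protocol, a, b)


def extract_flows(packets):
    """Group packets into flows and keep simple per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for protocol, src_ip, src_port, dst_ip, dst_port, length in packets:
        flow = flows[flow_key(protocol, src_ip, src_port, dst_ip, dst_port)]
        flow["packets"] += 1
        flow["bytes"] += length
    return dict(flows)


flows = extract_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```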

Behavior graphs
Existing works on behavior-based classification use network flow information such as port number and protocol to create profiles. We use dependencies between network flows to create the behavior profile.
Notes: 1. Accurate classification requires a detailed picture of malware behavior; a behavior profile can be built from port numbers and protocols. 2. A complete behavior profile is built from the different kinds of flows.

Example behavior graph
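Notes: Each node is a flow that connects over a particular protocol, and the edges of the graph are directed. When the IP address returned by DNS flow A is the same as the IP address of another flow B, there is an edge A -> B.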
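The edge rule described in the notes can be sketched as follows. The flow records, field names, and addresses are hypothetical, and the sketch covers only the DNS dependency mentioned on this slide; the paper's graph may encode further dependencies.

```python
# Hypothetical flow records; "resolved" lists the IP addresses returned in a DNS answer.
flows = [
    {"id": "dns1", "protocol": "DNS", "dst_ip": "10.0.0.1", "resolved": ["203.0.113.7"]},
    {"id": "http1", "protocol": "HTTP", "dst_ip": "203.0.113.7", "resolved": []},
    {"id": "dns2", "protocol": "DNS", "dst_ip": "10.0.0.1", "resolved": ["198.51.100.9"]},
    {"id": "smtp1", "protocol": "SMTP", "dst_ip": "198.51.100.9", "resolved": []},
    {"id": "tcp1", "protocol": "TCP", "dst_ip": "192.0.2.44", "resolved": []},
]


def build_behavior_graph(flows):
    """Adjacency lists with an edge A -> B when B's address was returned by DNS flow A."""
    graph = {flow["id"]: [] for flow in flows}
    for a in flows:
        for b in flows:
            if a is not b and b["dst_ip"] in a["resolved"]:
                graph[a["id"]].append(b["id"])
    return graph


graph = build_behavior_graph(flows)
print(graph)
```

Flows that no DNS answer points to, such as `tcp1` here, stay in the graph as nodes without incoming edges.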

Feature Extraction
There are two approaches for comparing and classifying malware samples based on the behavior graphs: graph edit distance, and distance based on the maximum common sub-graph.
Features based on the behavior graphs: 1. Graph size 2. Root out-degree 3. Average out-degree 4. Maximum out-degree 5. Number of specific nodes
Notes: The two popular approaches are the ones listed above. Computing the graph edit distance between two graphs, or their maximum common sub-graph, takes a long time and cannot support real-time detection, so the authors choose a feature-based approach instead. The feature-based approach uses features 1 to 5 above. Feature 5, the number of specific nodes, distinguishes graphs that have a similar structure but different node labels.
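The five features listed above can be computed from an adjacency-list graph as in the sketch below. Several details are assumptions made for illustration: graph size is taken as the node count, the graph has a single node named `root`, and the "specific nodes" are counted per protocol label. The graph itself is hypothetical.

```python
from collections import Counter

# Hypothetical behavior graph: adjacency lists plus a protocol label per node.
graph = {
    "root": ["dns1", "dns2", "tcp1"],
    "dns1": ["http1"],
    "dns2": ["smtp1"],
    "http1": [],
    "smtp1": [],
    "tcp1": [],
}
labels = {"root": "ROOT", "dns1": "DNS", "dns2": "DNS",
          "http1": "HTTP", "smtp1": "SMTP", "tcp1": "TCP"}


def extract_features(graph, labels, root="root",
                     node_types=("DNS", "HTTP", "SMTP", "TCP")):
    """Turn a behavior graph into a flat feature dictionary."""
    out_degrees = {node: len(children) for node, children in graph.items()}
    counts = Counter(labels[node] for node in graph)
    features = {
        "graph_size": len(graph),                      # assumed: number of nodes
        "root_out_degree": out_degrees[root],
        "avg_out_degree": sum(out_degrees.values()) / len(graph),
        "max_out_degree": max(out_degrees.values()),
    }
    for node_type in node_types:                       # number of specific nodes
        features[f"num_{node_type.lower()}_nodes"] = counts[node_type]
    return features


features = extract_features(graph, labels)
print(features)
```

Each of these values costs at most one pass over the graph, which is the reason the notes give for preferring features over pairwise graph distances.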

Malware samples are classified into their respective families using the feature vectors extracted in the previous step, with classification algorithms provided by the WEKA library.
Notes: The main goal here is to feed the feature vectors selected in the previous step into WEKA so that each sample is labeled with its family. WEKA is open-source software that provides many machine learning algorithms for classification. The C4.5 algorithm is used to generate a decision tree.

Evaluation: Labeling the Dataset
We used the malware dataset provided by Communication Research Center Canada (CRC). Each malware sample is assigned a label by 11 anti-virus scanners. We identified 13 malware families with this approach.
Notes: 1. The data is unlabeled. 2. Existing anti-virus scanners are therefore used in a kind of label vote: the label reported by the largest number of scanners becomes the sample's label. Different thresholds are also set; the higher the threshold, the smaller the dataset. The authors use the resulting datasets as training data and test data.
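The voting scheme in the notes can be sketched as below. The family names and verdicts are hypothetical, and the exact tie-breaking and label-normalization rules of the paper are not stated in the slides, so this is only one plausible reading.

```python
from collections import Counter


def majority_label(scanner_labels, threshold):
    """Most common family label if at least `threshold` scanners agree, else None."""
    votes = Counter(label for label in scanner_labels if label is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= threshold else None


# Hypothetical verdicts from 11 scanners for one sample (None = no detection).
verdicts = ["Zbot"] * 7 + ["Bredolab"] * 2 + [None] * 2
for threshold in (6, 7, 8):
    print(threshold, majority_label(verdicts, threshold))
```

A sample on which fewer scanners agree than the threshold requires gets no label and drops out of the dataset, which is why a higher threshold yields a smaller dataset.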

Dataset size
Majority threshold    6       7       8
Dataset size          3768    3347    2907

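Classification Accuracy
Notes: 1. The authors split the dataset described above into training data and test data. 2. The authors believe that adding system call analysis in the future would let the framework outperform other anti-virus systems by an even wider margin.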

Conclusion
The framework the authors propose outperforms five anti-virus programs in classifying malware samples. The authors' experiment only shows that the framework classifies better than those five programs; it does not show that it has a better detection rate.
Notes: The authors propose an automated malware classification framework that outperforms five other anti-virus systems. What I am more curious about is how its detection rate compares with the others when it is not known in advance whether a program is malware.

Thank you
