Published by Daniella Viviani. Modified over 5 years ago.
2
CNNs are proven image classifiers that have performed better than humans over the last several years.
They are also used for non-image classification of data such as sound, genes, and malware. They process data organized in vectors, matrices, or tensors. Normally that organization is laid out naturally: spatial location, as in images; temporal location, as in sound waves; physical proximity, as in a DNA strand. But what about non-natural data like machine process metrics? This work extends the research by Dr. Abdelsalam on identifying malware-infected machines in a cloud environment.
3
Metric columns (m), organized by process rows (p), are arranged into a “Process/Metric” image. Process image columns: the metrics sampled.
What’s the best order for the rows and columns?
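The process/metric image can be pictured as a small 2D array. The sketch below is only an illustration: the helper `build_process_image`, the process names, and the metric names are hypothetical and do not come from the study. It shows that the same samples produce different images depending on the chosen row and column order.

```python
def build_process_image(samples, row_order, col_order):
    """Arrange per-process metrics as a 2D list: rows = processes, columns = metrics."""
    return [[float(samples[p][m]) for m in col_order] for p in row_order]


# Hypothetical samples: process name -> {metric name: value}.
samples = {
    "apache2": {"cpu_percent": 12.5, "mem_percent": 3.1},
    "mysqld": {"cpu_percent": 4.0, "mem_percent": 9.8},
}

# Same data, two different row/column orderings.
image_a = build_process_image(samples, ["apache2", "mysqld"], ["cpu_percent", "mem_percent"])
image_b = build_process_image(samples, ["mysqld", "apache2"], ["mem_percent", "cpu_percent"])

print(image_a)
print(image_b)
```

Both images hold identical values; only the neighborhoods seen by a convolution filter change, which is why the ordering question matters.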
4
Process Row Ordering: 129 non-unique processes (of 408). Orderings tested: process name (alphanumeric); sibling relationships; VM then PID count; PID then VM count; 10 randomly generated row orderings.
Metric Column Ordering: 28 numeric metrics and two strings. Orderings tested: correlated; absolute value of correlation; anti-correlated (counterpoint); 10 randomly generated column orderings.
6
Conclusion: Maintaining a consistent order of process rows between experiments greatly improves performance. Ordering the metrics according to statistical correlation improves performance. Future work: Construct a ‘correlation’ ordering for process rows to see whether it improves performance. Expand the malware data pool to include other machines (Windows, iOS, Android, etc.). Use the data sets with model-independent machine learning such as AutoKeras. Identify other non-natural data for testing this methodology. Include a time-based LSTM as the output black-box node for time-based analysis.
7
Data Acquisition: A Linux/WordPress application server virtual machine is infected; samples are taken from all servers.
[Diagram: clients, a firewall, and a load-balanced three-tier deployment (Tier 1, Tier 2, Tier 3) of web servers, application servers, and a database.]
8
Metric samples are taken every 10 seconds for 30 minutes from all machines.
At the 15-minute mark, malware is injected into the application server. 114 different Linux malware samples were injected. Samples come from 912 machines: 29 million process samples, 2.9 million of them from infected machines.
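The sampling schedule implies some simple arithmetic, sketched below. The constant names and the `is_post_injection` helper are hypothetical. Two assumptions are not stated in the slides: that the first sample is taken at t = 0 with none at the 30-minute mark, and that a sample taken exactly at the injection time counts as post-injection.

```python
SAMPLE_INTERVAL_S = 10   # one sample every 10 seconds
RUN_LENGTH_MIN = 30      # each run lasts 30 minutes
INJECTION_MIN = 15       # malware injected at the 15-minute mark

# Sampling instants per machine, assuming samples at t = 0, 10, ..., 1790 s.
samples_per_machine = RUN_LENGTH_MIN * 60 // SAMPLE_INTERVAL_S

# Index of the first sample at or after the injection time (assumed boundary).
injection_index = INJECTION_MIN * 60 // SAMPLE_INTERVAL_S


def is_post_injection(sample_index):
    """True if the sample was taken at or after the injection (assumption)."""
    return sample_index >= injection_index


# Share of process samples that come from infected machines, from the slide totals.
infected_fraction = 2_900_000 / 29_000_000

print(samples_per_machine, injection_index, infected_fraction)
```

Under these assumptions each machine contributes 180 sampling instants, the injection falls at index 90, and about one tenth of the process samples come from infected machines.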
9
Experiment Workflow: Gather Raw Data, Generate TF Records, Run CNN.
This work focuses on the Generate TF Records step, where the data is preprocessed for the CNN.
10
Metric column ordering:
Relationships between metrics needed to be defined. In a picture, image rows are positively correlated across surfaces and negatively correlated at edges. An exhaustive MySQL query identified the correlations between metrics across all samples (912 VMs).
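The study ran this step as a MySQL query; the sketch below swaps in plain Python purely for illustration. It assumes the correlation meant is the Pearson coefficient, which the slides do not state. The helpers `pearson` and `correlation_table`, the metric names, and the values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns):
    """Map each unordered metric pair (a, b), a < b, to its correlation."""
    names = sorted(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical metric columns, one value per sample.
columns = {
    "cpu": [1.0, 2.0, 3.0, 4.0],
    "mem": [2.0, 4.0, 6.0, 8.0],
    "idle": [4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(columns)
for pair, r in sorted(table.items()):
    print(pair, round(r, 3))
```

With 28 numeric metrics the table would hold 378 pairs, each computed over every sample.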
11
Metric column ordering:
The results were ordered by the correlation, over the range [1, -1], relating each pair of metrics. Orderings by statistical correlation: correlated, ABS-correlated, and anti-correlated, plus 10 randomly generated column orderings.
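The slides do not say how a column order was derived from the sorted correlations, so the following is only one possible approach: a greedy chain that repeatedly places next to the last column the remaining metric that scores best against it. The chaining rule, the `order_columns` helper, the metric names, and the correlation values are all assumptions made for illustration.

```python
def order_columns(metrics, corr, mode="correlated"):
    """Greedy chain ordering of metric columns (an assumed rule, not the study's).

    mode "correlated": prefer the most positive correlation with the last column.
    mode "abs":        prefer the largest absolute correlation.
    mode "anti":       prefer the most negative correlation.
    """
    score_fns = {"correlated": lambda r: r, "abs": abs, "anti": lambda r: -r}
    score = score_fns[mode]

    def lookup(a, b):
        # The table stores each unordered pair once.
        return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

    remaining = list(metrics)
    order = [remaining.pop(0)]  # start from the first metric given
    while remaining:
        last = order[-1]
        best = max(remaining, key=lambda m: score(lookup(last, m)))
        remaining.remove(best)
        order.append(best)
    return order


# Hypothetical pairwise correlations between four metrics.
metrics = ["cpu", "mem", "idle", "io"]
corr = {
    ("cpu", "mem"): 0.9, ("cpu", "idle"): -0.95, ("cpu", "io"): 0.2,
    ("mem", "idle"): -0.8, ("mem", "io"): 0.1, ("idle", "io"): -0.3,
}

for mode in ("correlated", "abs", "anti"):
    print(mode, order_columns(metrics, corr, mode))
```

A greedy chain is cheap but depends on the starting metric; other rules, such as hierarchical clustering of the correlation matrix, would be equally defensible.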
12
Process row ordering: The initial paper stabilized process ordering only within each experiment, not between experiments. 129 “non-unique” processes (out of 408) were identified by name as having run on more than one virtual machine. These processes are used as the first 129 rows in a list that fixes the ordering across all experiments. A process missing from an experiment sample is filled with zeros. “Unique” processes not in this list are added as bottom rows when they appear.
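The row-stabilizing rule described above can be sketched as follows. The helper names `stable_row_order` and `build_rows`, the process names, and the metric values are hypothetical; only the rule itself (shared processes first in a fixed order, zero-filled when absent, unique processes appended at the bottom) comes from the slide.

```python
def stable_row_order(shared_processes, sample_processes):
    """Shared processes keep fixed positions; other processes are appended below."""
    order = list(shared_processes)
    placed = set(order)
    for name in sample_processes:
        if name not in placed:  # a "unique" process seen in this sample
            order.append(name)
            placed.add(name)
    return order


def build_rows(row_order, sample, n_metrics):
    """One row per process; processes missing from the sample become zero rows."""
    return [list(sample.get(name, [0.0] * n_metrics)) for name in row_order]


# Hypothetical shared list (the study's list has 129 entries).
shared = ["apache2", "mysqld", "sshd"]

# Hypothetical sample: "mysqld" is absent, "miner" is not in the shared list.
sample = {"apache2": [1.0, 2.0], "sshd": [3.0, 4.0], "miner": [9.0, 9.0]}

row_order = stable_row_order(shared, sample)
rows = build_rows(row_order, sample, n_metrics=2)
print(row_order)
print(rows)
```

Because the shared prefix never changes, a given row index always refers to the same process across experiments.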
13
Process Row Ordering: Running a MySQL query to identify correlations between processes was attempted, but it proved infeasible and gave questionable results. Other relationships between processes were used instead to map out the row order: process name (alphanumeric); sibling relationships; VM then PID count; PID then VM count; 10 randomly generated row orderings.
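Some of these row orderings can be sketched with ordinary sort keys. The slides do not define "VM then PID count", so the version below is one possible reading: sort by how many VMs ran the process, then by how many PIDs it had, both descending. That reading, the helper names, the process names, the counts, and the random seed are all assumptions.

```python
import random

# Hypothetical counts: process name -> (number of VMs, number of PIDs).
counts = {"sshd": (3, 3), "apache2": (2, 12), "cron": (3, 4), "mysqld": (1, 1)}


def alphanumeric_order(names):
    """Order processes by name."""
    return sorted(names)


def vm_then_pid_order(counts):
    """Assumed reading: VM count first, PID count as tie-breaker, descending."""
    return sorted(counts, key=lambda n: (-counts[n][0], -counts[n][1], n))


def pid_then_vm_order(counts):
    """Assumed reading: PID count first, VM count as tie-breaker, descending."""
    return sorted(counts, key=lambda n: (-counts[n][1], -counts[n][0], n))


def random_orders(names, how_many=10, seed=0):
    """Reproducible random permutations of the process names."""
    rng = random.Random(seed)
    base = sorted(names)
    orders = []
    for _ in range(how_many):
        shuffled = base[:]
        rng.shuffle(shuffled)
        orders.append(shuffled)
    return orders


print(alphanumeric_order(counts))
print(vm_then_pid_order(counts))
print(pid_then_vm_order(counts))
print(len(random_orders(counts)))
```

Seeding the generator keeps the random orderings repeatable, so each can be reused unchanged across experiments.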
14
CNN Model: The original 2D-CNN model was used for all experiments. It has 2 convolutional layers, the first with 32 nodes and the second with 64 nodes, each using a 3x3 filter and ReLU activation. A pooling layer that downsizes by a factor of 2 follows each convolutional layer. There are 2 dense layers, the first with 1024 nodes and the second with 512 nodes, each with ReLU activation. The predictive layer is binary and produces a probability. Columns and rows were ordered during construction of the TensorFlow records, before the CNN processed them.
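To get a feel for the size of this model, the sketch below traces output shapes and parameter counts through the layer stack in plain Python. Several details are assumptions because the slides do not give them: a single input channel, stride 1 with "same" padding in the convolutions, non-overlapping 2x2 pooling, a single output unit, and the 120 x 28 input used in the example. The `trace_cnn` helper is hypothetical.

```python
def trace_cnn(rows, cols, channels=1):
    """Return (layer name, output shape, parameter count) for each layer."""
    layers = []
    h, w, c = rows, cols, channels
    for i, filters in enumerate((32, 64), start=1):
        # 3x3 kernels over c input channels, plus one bias per filter.
        params = 3 * 3 * c * filters + filters
        c = filters
        layers.append((f"conv{i}", (h, w, c), params))  # "same" padding keeps h, w
        h, w = h // 2, w // 2  # pooling halves each spatial dimension
        layers.append((f"pool{i}", (h, w, c), 0))
    units = h * w * c
    layers.append(("flatten", (units,), 0))
    for name, size in (("dense1", 1024), ("dense2", 512), ("output", 1)):
        # Fully connected: one weight per input-output pair, plus biases.
        layers.append((name, (size,), units * size + size))
        units = size
    return layers


# Hypothetical input: 120 process rows by 28 metric columns.
layers = trace_cnn(120, 28)
for name, shape, params in layers:
    print(f"{name:8s} {str(shape):16s} {params}")
print("total parameters:", sum(p for _, _, p in layers))
```

Under these assumptions almost all parameters sit in the first dense layer, so the number of process rows fed to the network strongly affects model size.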
15
Processing Machine:
Central Processing Unit: Intel Core i… GHz x 12
Memory: 15.6 GB
Graphics Processing Unit: GeForce GTX 1070i/PCIe/SSE2
OS: 64-bit Ubuntu 18.04 LTS (GNOME)
CUDA: 10.0
Python: 3.6 using TensorFlow and TensorBoard