Categorizing networks using Machine Learning
Ralucca Gera (based on Greg Allen's thesis), Applied Mathematics Department, Naval Postgraduate School, Monterey, California
HDD classification
Which addresses found on a secondary storage device are useful to a forensic analyst? Identify user groups:
useful: addresses that carry useful information about the social network of the device's user
not useful: addresses that could be ignored by an analyst conducting an investigation (i.e. …)
Observation: ~95% of addresses scanned are 'not-useful'.
[Figure: sample useful and sample not-useful addresses]
Data: Collection process
The data consist of 400 graphs from 10 NPS volunteers (details on the following slides). The drives ranged in size and contained a variety of today's most popular operating systems, including Windows, OS X, and Linux.
[Figure: example graph for one HDD]
Data: HDD to Weighted Networks
[Figure: an HDD converted to a network under Model 1 and under Model 2]
Data: the 400 graphs
The data come from 10 NPS volunteers. For each HDD, build a graph under both Model 1 (addresses within 128 bytes) and Model 2 (within 256 bytes): 10 ⋅ 2 = 20 graphs. For each model, create a graph file for each of the top 20 largest connected components of each device: 10 ⋅ 2 ⋅ 20 = 400 graphs. Note: components naturally capture 'similar' addresses (observed in Janina Green's thesis). A minimal sketch of this construction follows.
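A minimal sketch, assuming the drive scan yields each address together with the byte offsets where it was found; the input format and function names here are hypothetical, and NetworkX is the library used elsewhere in the work:

```python
import networkx as nx

def build_window_graph(offsets_by_address, window=128):
    """Model 1 (window=128) or Model 2 (window=256): connect two
    addresses whenever their byte offsets on the drive fall within
    `window` bytes of each other; edge weight counts co-occurrences."""
    G = nx.Graph()
    hits = sorted((off, addr)
                  for addr, offs in offsets_by_address.items()
                  for off in offs)
    for i, (off_i, addr_i) in enumerate(hits):
        for off_j, addr_j in hits[i + 1:]:
            if off_j - off_i > window:
                break  # hits are sorted, so no later pair can qualify
            if addr_i != addr_j:
                w = G.get_edge_data(addr_i, addr_j, default={"weight": 0})
                G.add_edge(addr_i, addr_j, weight=w["weight"] + 1)
    return G

def top_components(G, k=20):
    """The k largest connected components, as standalone subgraphs."""
    comps = sorted(nx.connected_components(G), key=len, reverse=True)
    return [G.subgraph(c).copy() for c in comps[:k]]
```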
Machine Learning Experiments
Used Orange (a GUI-based machine learning toolkit)
Graph attributes
Normalized the attributes that were not already in [0, 1], and ran the classification on both the normalized and the non-normalized data. Many attributes are computationally 'cheap' (seconds to compute); some are costly. The first approach was simply to use intuition to pick out the attributes that seemed most promising; as the research continued, more and more candidates showed up. NetworkX provided a useful repository of algorithms, although many had to be altered to fit the data. A sketch of the attribute computation follows.
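A sketch of how several of the listed attributes could be computed with NetworkX; the whole-image totals passed in for the normalized order and size, and the min-max rescaling, are assumptions about how the normalization was done:

```python
import networkx as nx

def graph_attributes(G, image_nodes, image_edges):
    """A few of the attributes listed on the following slides.
    `image_nodes`/`image_edges` are totals for the whole drive image,
    used for the normalized order/size."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    return {
        "norm_order": n / image_nodes,
        "norm_size": m / max(image_edges, 1),
        "avg_degree": 2 * m / n,
        "density": nx.density(G),
        "transitivity": nx.transitivity(G),
        "pearson": nx.degree_pearson_correlation_coefficient(G),
        "max_betweenness": max(nx.betweenness_centrality(G).values()),
        "avg_neighbor_degree": sum(nx.average_neighbor_degree(G).values()) / n,
        "matching_over_edges": len(nx.maximal_matching(G)) / max(m, 1),
    }

def min_max_normalize(column):
    """Rescale a list of attribute values into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]
```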
Experiment Design
Questions posed:
1. Is it possible to correctly classify a network as being useful or not, based on the graph's underlying topological structure?
2. Does the size of the window used to create the graphs have an impact on our ability to classify them correctly?
3. What attributes are most effective for classifying the graphs in our dataset?
4. Does our ability to correctly classify our graphs improve when we train against a multi-class labeling scheme, as opposed to a binary scheme in which the only labels are 'Useful' and 'Not-Useful'?
Each individual test was repeated 10 times using 5-fold cross-validation, and we present the results averaged over the 10 trials (a sketch of this protocol follows).
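The thesis ran this protocol in the Orange GUI; the following scikit-learn sketch reproduces the same 10 x 5-fold design on a synthetic stand-in for the 400-graph, 41-attribute spreadsheet:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real spreadsheet: 400 graphs x 41 attributes.
X, y = make_classification(n_samples=400, n_features=41, random_state=0)

# 10 repetitions of 5-fold cross-validation, averaged over the trials.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over 10 trials: {scores.mean():.3f}")
```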
Conducted 5 experiments
Graphs used: 400
Attributes used: 41
  4 normalized basics:
  - Normalized order: number of nodes in the component divided by the number of nodes in the entire image
  - Normalized size: number of edges in the component divided by the number of edges in the entire image
  - Average degree
  - Density
  10 further attributes, including:
  - Average neighbor degree (r-normalized)
  - Pearson coefficient
  - Transitivity
  - Highest betweenness
  - Maximal matching (divided by number of edges)
  - Maximal matching
  - Number of nodes (percentage of entire image)
  - Degree distribution (best fit)
  - Degree distribution value
Algorithms used: Classification Tree, SVM, Logistic Regression, Naive Bayes
128 vs. 256 byte windows: yes
Types of categories: Useful vs. not-useful; 9 categories
Results
Measuring accuracy: Precision vs. Recall
Precision: the percentage of instances predicted as relevant that are actually relevant.
Recall: the percentage of relevant instances that are accurately predicted as relevant.
Precision = TruePositives / (TruePositives + FalsePositives)
Recall = TruePositives / (TruePositives + FalseNegatives)
F_1 = 2 / (1/Recall + 1/Precision) = 2 ⋅ (Precision ⋅ Recall) / (Precision + Recall)
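A direct rendering of the three formulas, with a small worked example:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean F1, per the formulas above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Worked example: 12 true positives, 2 false positives, 4 false negatives.
print(precision_recall_f1(12, 2, 4))  # (0.857..., 0.75, 0.8)
```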
Experiment 1: All Graphs, All Attributes
Baseline test to determine whether we could answer question 1: can we accurately classify groups of addresses located close to each other as being useful, based on their formed graph's underlying topological structure?
Labeled each graph as Useful or Not (the ground truth is known).
Used all 400 graphs with all 41 attributes.
Input the spreadsheet into Orange and ran 4 supervised machine learning algorithms (sketched below): Naïve Bayes, Classification Tree, Logistic Regression, SVM.
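A scikit-learn sketch of the four-algorithm comparison; the thesis used Orange's implementations, whose defaults differ, and the data here is again a synthetic stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=41, random_state=0)

# scikit-learn counterparts of the four Orange learners.
models = {
    "Naive Bayes": GaussianNB(),
    "Classification Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:20s} AUC = {auc:.3f}")
```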
Experiment 1: Results
[Figure: Naïve Bayes prediction]
Experiment 2: 128 vs 256 Byte Windows
Identical to Experiment 1, except the dataset was divided into two subsets: one in which the graphs were constructed using a 128-byte window, and one made up of those graphs created with a 256-byte window.
Experiment 2: Results
128-byte window:
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.963  0.929  0.522  0.353      1.000
Logistic Regression  0.747  0.974  0.600  0.750      0.500
SVM                  0.913  0.987  0.833  -          -
Classification Tree  0.832  0.986  0.785  0.960      0.667

256-byte window:
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.957  0.917  0.480  0.316      1.000
Logistic Regression  0.997  0.994  0.923  0.857      -
SVM                  0.900  0.962  0.625  0.500      0.833
Classification Tree  0.699  0.960  0.436  0.463      0.417

Conclusion: both window sizes performed well, although the Logistic Regression results do vary.
Experiment 3: Select Attributes
Intent: determine which attributes work best.
Tried different combinations of attributes, starting with 4 basic ones: Order, Size, Average Degree, Density.
Still labeled each graph as Useful or Not.
Multiple iterations were run with different combinations of attributes (a sketch of such a subset search follows).
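One way such an attribute search could be automated; the DataFrame `df`, label vector `y`, and column names are illustrative assumptions, not the thesis's exact setup:

```python
from itertools import combinations

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def score_subsets(df: pd.DataFrame, y, pool, min_size=4):
    """Evaluate every attribute combination drawn from `pool`.
    `df` holds one row of graph attributes per component graph."""
    results = {}
    for k in range(min_size, len(pool) + 1):
        for cols in combinations(pool, k):
            auc = cross_val_score(SVC(), df[list(cols)], y,
                                  cv=5, scoring="roc_auc").mean()
            results[cols] = auc
    return sorted(results.items(), key=lambda kv: -kv[1])

# e.g. score_subsets(df, y, ["order", "size", "avg_degree",
#                            "density", "avg_neighbor_degree"])
```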
Experiment 3: Results
Minimalist (Order, Size, Average Degree, Density; computationally inexpensive):
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.968  0.861  0.358  0.218      1.000
Classification Tree  0.867  0.974  0.692  0.643      0.750
Logistic Regression  0.500  0.961  0.000  -          -
SVM                  -      -      -      -          -
Clearly not enough attributes to achieve any meaningful results.

Minimalist plus Average Neighbor Degree:
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.968  0.939  0.558  0.387      1.000
Classification Tree  0.872  0.984  0.783  0.818      0.750
Logistic Regression  0.500  0.961  0.000  -          -
SVM                  0.830  0.981  0.727  0.800      0.667
Much better results with the addition of just one attribute.

Top 10 (Density, Nodes (% of drive), Avg Neighbor Degree (r-norm), Maximal Matching/Edges, Min (r-norm), Betweenness, max_core, Pearson, Transitivity, Degree Distribution):
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.966  0.935  0.545  0.375      1.000
Classification Tree  0.767  0.967  0.578  0.627      0.550
Logistic Regression  0.580  0.961  0.250  0.500      0.167
SVM                  0.955  0.990  0.880  0.846      0.917
Performance is best with the top 10 attributes and falls as the number of attributes grows beyond 10.
Degree Distribution
Average Neighbor Degree
Betweenness Centrality
Density
Pearson Correlation Coefficient
Transitivity
Modularity
Experiment 4: Multiple Classes
We saw similar groups that kept showing up on different devices, enough to make 9 classes: Owner (Useful), Database, Ubuntu, Microsoft, Certificates, Broadcast, Username, Mac Artifact, Other.
This reduced the previous 95% of 'Not-Useful' graphs to ~50% (a sketch of the relabeling follows).
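A sketch of moving between the three labeling schemes; the merge rule for the 8-class scheme follows the results slide:

```python
# The nine labels observed across devices.
NINE_CLASSES = ["Owner", "Database", "Ubuntu", "Microsoft", "Certificates",
                "Broadcast", "Username", "Mac Artifact", "Other"]

def to_eight_classes(label):
    """Merge Ubuntu into the Useful (Owner) class."""
    return "Useful" if label in ("Owner", "Ubuntu") else label

def to_binary(label):
    """Collapse everything that is not Useful into Not-Useful."""
    return "Useful" if to_eight_classes(label) == "Useful" else "Not-Useful"
```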
Experiment 4: Results
9 classes:
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.818  0.958  0.552  0.471      0.667
Classification Tree  0.820  0.977  0.684  0.725      0.650
Logistic Regression  0.790  0.981  0.700  0.875      0.583
SVM                  0.912  0.984  0.800  0.769      0.833
Similar to the results from Experiment 2.

8 classes (combine the Useful and Ubuntu classes into one; see next slide for reasoning):
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.986  0.974  0.852  0.742      1.000
Classification Tree  0.925  0.987  0.907  0.971      -
Logistic Regression  0.933  0.909  0.952  0.870      -
SVM                  0.984  0.898  0.846  0.957      -
Better.

Back to a binary classification scheme (2 classes), but now with Ubuntu and Useful combined:
Method               AUC    CA     F1     Precision  Recall
Naïve Bayes          0.980  0.965  0.797  0.663      1.000
Classification Tree  0.941  0.982  0.877  0.861      0.894
Logistic Regression  0.975  0.994  0.954  0.955      -
SVM                  0.970  0.984  0.896  0.843      -
Best.
Extra Slides
Experiment 4: confusion matrix
• Owner: addresses that the owner communicated with
• Database: addresses in the form of …
• Ubuntu: addresses in the form of … or …
• Microsoft: addresses in the form of …
• Certificates: addresses in the form of …
• Broadcast: addresses in the form of …
• Username: addresses in the form of the owner's …
• Mac Artifact: addresses appearing to be MAC commands.
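A sketch of how such a confusion matrix can be produced with scikit-learn; `X` and `y` (the attribute matrix and the nine labels) are assumed to be loaded already:

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Out-of-fold predictions put every graph in the matrix exactly once.
y_pred = cross_val_predict(SVC(), X, y, cv=5)
labels = sorted(set(y))
print(labels)
print(confusion_matrix(y, y_pred, labels=labels))
```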
Classification Tree
[Figure: classification tree for graphs with more than 20 nodes]
A sketch of producing a comparable tree printout follows.
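A sketch of fitting and printing a shallow tree with scikit-learn; `X`, `y`, and `feature_names` are assumed inputs from the attribute table:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree and print its decision rules as text.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(feature_names)))
```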
AUC
References
Greg Allen. Master's in CS (Network Science), NPS. Thesis: Locality Based Clustering.
Janina Green. Master's in CS (Network Science), NPS. Thesis: Constructing Social Networks from Secondary Storage with Bulk Analysis Tools.