Su Zhang 1. Quick Review. Data Source – NVD. Six Most Popular/Vulnerable Vendors For Our Experiments. Why The Six Vendors Are Chosen. Data Preprocessing.

Su Zhang

Quick Review. Data Source – NVD. Six Most Popular/Vulnerable Vendors For Our Experiments. Why The Six Vendors Are Chosen. Data Preprocessing. Functions Available For Our Approach. Statistical Results. Plan For Next Phase.

National Vulnerability Database (NVD): the U.S. government repository of standards-based vulnerability management data. Each NVD entry includes the published date and time and the vulnerable software, given as CPE specifications. Derived data: month and day (from the published date and time); version diff (from the CPE diff(v1, v2) of two adjacent vulnerabilities); software name (from the CPE specification); ttpv and ttnv, the time to the previous and to the next vulnerability (from adjacent, different published dates and times).
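As a rough illustration of the derived attributes listed above, the following sketch splits a CPE 2.2 URI into its components and extracts the month and day from a publication timestamp. The function names, the ISO timestamp format, and the sample date are assumptions made for this example; the slides do not show how the derivation was actually implemented.

```python
from datetime import datetime


def parse_cpe(cpe):
    """Split a CPE 2.2 URI such as 'cpe:/o:linux:linux_kernel:2.6.18'."""
    prefix = "cpe:/"
    if not cpe.startswith(prefix):
        raise ValueError(f"not a CPE URI: {cpe!r}")
    fields = cpe[len(prefix):].split(":")
    # Pad so that missing trailing components become empty strings.
    part, vendor, product, version = (fields + [""] * 4)[:4]
    return {"part": part, "vendor": vendor, "product": product, "version": version}


def derive_attributes(published, cpe):
    """Derive software name, version, month and day for one NVD entry.

    `published` is assumed to be an ISO 8601 timestamp (hypothetical format).
    """
    when = datetime.fromisoformat(published)
    parsed = parse_cpe(cpe)
    return {
        "software": parsed["product"],
        "version": parsed["version"],
        "month": when.month,
        "day": when.day,
    }
```

For example, `derive_attributes("2007-05-02T10:15:00", "cpe:/o:linux:linux_kernel:2.6.18")` yields the software name `linux_kernel`, version `2.6.18`, month 5 and day 2. The version diff and ttpv/ttnv attributes need two adjacent entries and are not covered here.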

Linux: 56925 instances
Sun: 24726 instances
Cisco: 20120 instances
Mozilla: 19965 instances
Microsoft: 16703 instances
Apple: 14809 instances

The huge number of nominal values (vendors and software) would cause a scalability issue. The top six vendors account for 43.4% of all instances. NVD contains too many vendors (10411) to use them all. The seventh most popular/vulnerable vendor has far fewer instances than the sixth. Vendors are treated independently in our approach.

From NVD data to the training/testing dataset: Use data starting from 2005, since the data before that looks unstable. Correct some obvious errors in NVD (e.g. cpe:/o:linux:linux_kernel:390). Attributes: Published time: only the month and day are used. Version diff: a normalized difference between two versions. Vendor: removed.

Attributes (continued): Group vulnerabilities published on the same day, which guarantees that ttnv and ttpv are non-zero. ttnv is the predicted attribute. For each software, delete its first batch of instances and its last batch of instances.
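The grouping and trimming steps above might look like the sketch below. It assumes that a "batch" means all instances published on the same day. It also assumes the first and last batches are dropped because they lack a previous or next vulnerability, so ttpv or ttnv cannot be computed. Neither assumption is stated on the slide. The function name and the sample dates, including their years, are hypothetical.

```python
from datetime import date


def add_time_gaps(published_dates):
    """Return (day, ttpv, ttnv) for each distinct publication day of one software.

    Same-day vulnerabilities collapse into one group, so every gap is at
    least one day. The first and last groups are skipped because one of
    their two gaps is undefined (assumed reason, see lead-in).
    """
    days = sorted(set(published_dates))
    rows = []
    for i in range(1, len(days) - 1):
        ttpv = (days[i] - days[i - 1]).days  # days since previous group
        ttnv = (days[i + 1] - days[i]).days  # days until next group
        rows.append((days[i], ttpv, ttnv))
    return rows


# Hypothetical publication dates for a single software product.
sample = [
    date(2007, 2, 21),
    date(2007, 5, 2),
    date(2007, 5, 2),  # same day as the previous entry: grouped
    date(2007, 5, 7),
    date(2008, 2, 12),
]
rows = add_time_gaps(sample)
```

With these dates, `rows` holds two entries, one for 2 May and one for 7 May. The groups for 21 February 2007 and 12 February 2008 serve only as neighbours and are not emitted.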

Version diff example: v1 = 3.6.4; v2 = 3.6; MaxVersionLength = 4.
v1 = expand(v1, 4) = 3.6.4.0
v2 = expand(v2, 4) = 3.6.0.0
$\mathrm{diff}(v_1, v_2) = (3-3) \cdot 100^{0} + (6-6) \cdot 100^{-1} + (4-0) \cdot 100^{-2} + (0-0) \cdot 100^{-3} = 4 \times 10^{-4}$ (i.e. 4E-4)
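The expand-and-weight scheme in the example above can be sketched as follows. The function names mirror the slide (`expand`, `diff`), but the handling of versions longer than MaxVersionLength and the restriction to purely numeric, dot-separated versions are assumptions of this sketch.

```python
def expand(version, length):
    """Pad a dotted numeric version with zeros up to `length` components."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) > length:
        raise ValueError(f"{version!r} has more than {length} components")
    return parts + [0] * (length - len(parts))


def version_diff(v1, v2, max_version_length=4, base=100):
    """Normalized difference: component i is weighted by base ** -i."""
    a = expand(v1, max_version_length)
    b = expand(v2, max_version_length)
    return sum((x - y) * base ** -i for i, (x, y) in enumerate(zip(a, b)))
```

`version_diff("3.6.4", "3.6")` reproduces the slide's result of 4E-4 (up to floating-point rounding). Real CPE version strings may contain letters or other separators, which this sketch does not handle.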

vendor, soft, version, month, day, vdiff, ttpv, ttnv
linux, kernel, 2.6.18, 05, 02, 0, 70, 5
linux, kernel, 2.6.19.2, 05, 07, 1.02E-4, 5, 281

Functions available for our approach: Least Mean Square. Linear Regression. Multilayer Perceptron. SMOreg. RBF Network. Gaussian Processes.

Function: Linear Regression
Training dataset: 66% of the Linux instances (randomly picked, since 2005). Test dataset: the remaining 34%.
Test result:
Correlation coefficient 0.5127
Mean absolute error 11.2358
Root mean squared error 25.4037
Relative absolute error 107.629 %
Root relative squared error 86.0388 %
Total Number of Instances 17967
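A random 66%/34% split like the one described above can be sketched as follows. The helper name, the fixed seed, and the rounding of the cut point are assumptions for illustration; the slides do not say which tool or procedure produced the split.

```python
import random


def percentage_split(instances, train_fraction=0.66, seed=0):
    """Shuffle a copy of the instances and cut it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Calling `percentage_split(rows)` on a list of instances returns two disjoint lists that together contain every instance exactly once.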

Error measures used in the results: mean absolute error; root mean squared error; relative absolute error; root relative squared error.
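The formulas for these four measures are not preserved in the transcript. The sketch below uses the common textbook definitions, in which the relative measures compare the model against a baseline that always predicts the mean of the actual values. This is an assumption: the tool that produced the numbers in this deck may differ in details, for example in which mean it uses as the baseline.

```python
import math


def evaluate(actual, predicted):
    """Compute MAE, RMSE, RAE (%) and RRSE (%) for paired value lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,
        "rrse_percent": 100 * math.sqrt(sq_err / base_sq),
    }
```

Under these definitions, a relative error above 100 % means the model does worse on that measure than always predicting the mean.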

Function: Least Mean Square
Training dataset: 66% of the Linux instances (randomly picked, since 2005). Test dataset: the remaining 34%.
Test result:
Correlation coefficient -0.1501
Mean absolute error 7.6676
Root mean squared error 30.6038
Relative absolute error 73.449 %
Root relative squared error 103.6507 %
Total Number of Instances 17967

Function: Multilayer Perceptron
Training dataset: 66% of the Linux instances (randomly picked, since 2005). Test dataset: the remaining 34%.
Test result:
Correlation coefficient 0.9886
Mean absolute error 0.4068
Root mean squared error 4.6905
Relative absolute error 3.7802 %
Root relative squared error 15.1644 %
Total Number of Instances 17967

Function: RBF Network
Training dataset: 66% of the Linux instances (randomly picked, since 2005). Test dataset: the remaining 34%.
Test result:
Linear Regression Model: ttnv = -15.3206 * pCluster_0_1 + 21.6205
Correlation coefficient 0.1822
Mean absolute error 10.5857
Root mean squared error 29.048
Relative absolute error 101.4023 %
Root relative squared error 98.3814 %
Total Number of Instances 17967

Linear Regression: Not accurate enough, but looks promising (correlation coefficient: 0.5127). Least Mean Square: Probably not suitable for our approach (negative correlation coefficient). Multilayer Perceptron: Looks good, but it could not provide us with a linear model.

SMOreg: For most vendors, it takes too long to finish (usually more than 80 hours). RBF Network: Not very accurate. Gaussian Processes: Runs out of heap memory in most of our experiments.

Plan for next phase: Add CVSS metrics as predictive attributes. Binarize our predictive attributes (e.g. divide ttnv/ttpv into several categories). Use regression SVM with multiple kernels.
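Dividing ttnv/ttpv into categories, as proposed above, could be sketched like this. The boundaries and labels are purely hypothetical placeholders; the slides do not specify how many categories to use or where to cut.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds in days; not taken from the slides.
BOUNDARIES = [7, 30, 90]
LABELS = ["within_week", "within_month", "within_quarter", "longer"]


def categorize(days, boundaries=BOUNDARIES, labels=LABELS):
    """Map a ttnv/ttpv value in days to a category label."""
    return labels[bisect_left(boundaries, days)]
```

With these placeholder boundaries, a ttnv of 5 days falls in `within_week`, 70 days in `within_quarter`, and 281 days in `longer`.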

Try to find an optimal model for our prediction. If we get a good model, investigate how to apply it with MulVAL. Otherwise, find out why it is not accurate enough.

Thank you!
