Machine Learning for Cyber

Slides:



Advertisements
Similar presentations
Florida International University COP 4770 Introduction of Weka.
Advertisements

Weka & Rapid Miner Tutorial By Chibuike Muoh. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering.
Indian Statistical Institute Kolkata
Neural Networks Chapter Feed-Forward Neural Networks.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Appendix: The WEKA Data Mining Software
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Machine Learning with Weka Cornelia Caragea Thanks to Eibe Frank for some of the slides.
School of Engineering and Computer Science Victoria University of Wellington Copyright: Peter Andreae, VUW Image Recognition COMP # 18.
An Exercise in Machine Learning
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
COMP24111: Machine Learning Ensemble Models Gavin Brown
SUPPLY CHAIN OF BIG DATA. WHAT IS BIG DATA?  A lot of data  Too much data for traditional methods  The 3Vs  Volume  Velocity  Variety.
***Classification Model*** Hosam Al-Samarraie, PhD. CITM-USM.
Machine Learning in Practice Lecture 8 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Unveiling Zeus Automated Classification of Malware Samples Abedelaziz Mohaisen Omar Alrawi Verisign Inc, VA, USA Verisign Labs, VA, USA
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
CMPS 142/242 Review Section Fall 2011 Adapted from Lecture Slides.
Cloud Analytics Platforms Christian Frey. About AIDA Our mission is to advance knowledge in data analytics through research, education and outreach Our.
9/24/2017 7:27 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Usman Roshan Dept. of Computer Science NJIT
Comparing TensorFlow Deep Learning Performance Using CPUs, GPUs, Local PCs and Cloud Pace University, Research Day, May 5, 2017 John Lawrence, Jonas Malmsten,
TensorFlow The Deep Learning Library You Should Be Using.
Big data classification using neural network
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
A Smart Tool to Predict Salary Trends of H1-B Holders
Introduction to Machine Learning
Machine Learning for Computer Security
Data-intensive Computing Algorithms: Classification
Semi-Supervised Clustering
LECTURE 01: Introduction to Algorithms and Basic Linux Computing
Deep learning David Kauchak CS158 – Fall 2016.
AI development using Data Science Virtual Machines (DSVM) in Azure
Computer Hardware and Software
Deep Learning in HEP Large number of applications:
Cloud Big Data Decision Support System for Machine Learning on AWS
Classification with Perceptrons Reading:
COMP61011 : Machine Learning Ensemble Models
Basic machine learning background with Python scikit-learn
Machine Learning Basics
Waikato Environment for Knowledge Analysis
Prepared by Kimberly Sayre and Jinbo Bi
WEKA.
Sampath Jayarathna Cal Poly Pomona
Unsupervised Learning and Autoencoders
Machine Learning & Data Science
Categorizing networks using Machine Learning
Classifying enterprises by economic activity
Machine Learning with Weka
APACHE MXNET By Beni Mulyana.
Integrating Deep Learning with Cyber Forensics
Classification Boundaries
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Creative Activity and Research Day (CARD)
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Lecture 10 – Introduction to Weka
Artificial Intelligence Club
Automatic Handwriting Generation
Usman Roshan Dept. of Computer Science NJIT
Azure Machine Learning
Data Mining CSCI 307, Spring 2019 Lecture 7
Igor Stančin, Alan Jović to: {igor.stancin,
Python4ML An open-source course for everyone
Machine Learning for Cyber
Machine Learning for Cyber
Data Mining CSCI 307, Spring 2019 Lecture 8
Machine Learning for Cyber
Presentation transcript:

Machine Learning for Cyber Unit 1: Introduction

Learning Outcomes Upon completion of this unit: Students will have a better understanding of machine learning approaches. Students will have a better understanding of features. Students will have a better understanding of data sets. Students will have a better understanding of the need for machine learning to solve cyber security problems. Students will have a better understanding of the difference between deep learning and machine learning. Students will have a better understanding of big data and how it relates to machine learning and cyber security.

Terms Machine learning Cyber security Big data Deep learning This sets up the problem we’ll use to demonstrate various control and data structures of scripts. This is a common security need, and various commercial tools such as tripwire and tiger do this. They are not scripts, but they work very much like what is here. You might mention that, in practice, one would put the files we will create in places other than where this exercise puts them. Normally the files would go in a protected area, but here I opt for simplicity.

Tools WEKA Python Numpy Pandas Sklearn Tensorflow Hadoop Spark AWS GPUs, CPUs, TPUs, cognitive processors There are other attributes, such as i-node number, device number, time of last access, and time of last change to i-node (as opposed to time of last change to the file). Getting these requires some fancy commands and formatting (basically, call stat(1) and collapse the output into a line), and it seemed easier not to do this in this unit. All of these can be obtained from the widely used “ls” command.

What is machine learning? Methods for predicting, detecting , or grouping data samples based on a model The model must be learned with data Methods can be geometrical (or not) and the model is based on distance metrics or linear boundaries In practice, we would take special care of the file holding the attributes and names (keep them on write-once media, etc.).

Why machine learning for cyber? Too much data Building models by hand is labor intensive Machine learning can also learn models

What is big data? Lots of data Terabytes So much data that a single computer with 8 RAM and latest CPU cannot do the work Instead, need more powerful computer Better yet, several computers working in parallell Two main approaches: Parallel CPUs GPUs (1 or several also in parallel)

Machine learning Terms - 1 Supervised Classifiers Classifier Y_test Train Data Train Model Evaluation dividing Data X_test Test Data

Machine learning Terms - 2 Unsupervised Clustering Clustering Data Clustered Data Evaluation

Machine learning Terms - 3 Features Data sets Data pre-processing Performance metrics

Machine learning algorithms Naïve Bayes Decision trees Random forest KNN Linear regression Logistic regression Neural networks Support Vector Machines

What is deep learning? Neural networks with more layers between the input and output layers Batch processing for big data Matrix multiplication operation takes advantage of GPUs Have outperformed all others since around 2012

Machine learning pipeline Pre-processing (formatting, featured…) Data Vector Space Model Evaluation Machine Learning Algorithms

Dataset formats .csv .libsvm .arff etc … 0,tcp,http,SF,162,4528,0,0,0,1, … ,normal. .libsvm [label] [index 1]:[value 1] [index 2]:[value 2] … .arff The format of Weka storage data @duration numeric @protocol_type {tcp,udp,icmp} … @data 0,tcp,http,SF,162,4528,0,0,0,1, … ,normal etc …

What is a sample? Defining your sample is critical Examples: A single item: text (Bag-of-Words) Data science is popular. An image: fingerprint.bmp Elements or averages within a time window:

Data sets NSL-KDD network intrusion Unsw big data networking Iris Phishing Honeypot unsupervised Denial of service Malware Ransomware Biometrics

Companies using Machine learning Tesla Facebook Google Amazon (Alexa for instance) Apple  Microsoft

Companies using machine learning for Cyber Northrop Grumman BluVector Banks Etc.

Sample Code Additionally, all the code used in this book can be obtained from GitHub at Prof. Calix's Github and any other complimentary materials

Environment Virtual Machine Tensorflow from (Tensorflow Website ) You can use the latest version of Linux to run your code. Ubuntu 14.04 to 16.04 (64 bit) and Mac Tensorflow from (Tensorflow Website ) Sklearn from (Scikit-learn Website ) AWS

If you want to build a physical box A GPU GeForce gtx980 or better (or 1070 or Titan) A CPU such as the AMD 8 CORE Power supply EVGA SuperNOVA 1200 P2 220 Motherboard for GPU and CPU 32 MB of RAM (DDR3) SSD hard drive 1 TB A case The total cost for 1 device with just 1 CPU and 1 GPU may be between $1,500 and $2,000.

Summary Intro to data science Intro tools for data science Intro machine learning and deep learning Intro cyber security datasets for the course Intro environment for the course