SECURING NETWORKS USING SDN AND MACHINE LEARNING DRAGOS COMANECI
ABOUT ME Software Engineer/Security Researcher at Ixia in the ATI (Application Threat Intelligence) team. Reverse engineering & emulating application protocols and strikes. Doing a PhD on Software-Enabled Adaptive Network Traffic Management (short version: SDN + ML).
SHORT INTRODUCTION Problem: traditional signature-based IPS/IDS approaches won't scale as the network becomes complex. Solution: an adaptive way of defending the network, using SDN & Machine Learning. Allows: anomaly detection, botnet detection, honeypot rerouting.
SYSTEM OVERVIEW
INTEGRATING FLOW CLASSIFICATION INTO AN SDN CONTROLLER Modern SDN controllers are basically event handlers. Streams of events come into the controller from the network and are transformed into forwarding rules. Structure flow classification as events (e.g. flow match).
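The idea on the slide above can be sketched in a few lines. This is a minimal illustration in plain Python, not the API of any real controller: `EventBus`, `build_controller`, `toy_classifier`, and the event names `new_flow`, `flow_match`, and `flow_unknown` are all hypothetical. It only shows how a classifier's verdict can be published as just another event that rule-installing handlers react to.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub standing in for a controller's event loop."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


def build_controller(classify):
    """Wire a classifier into the bus; returns (bus, installed_rules)."""
    bus = EventBus()
    rules = []  # stand-in for forwarding rules pushed to switches

    def on_new_flow(flow):
        # Turn the classifier verdict into a follow-up event.
        label = classify(flow)
        if label is None:
            bus.publish("flow_unknown", {"flow": flow})
        else:
            bus.publish("flow_match", {"flow": flow, "label": label})

    bus.subscribe("new_flow", on_new_flow)
    bus.subscribe(
        "flow_match",
        lambda ev: rules.append((ev["flow"], "forward", ev["label"])),
    )
    bus.subscribe(
        "flow_unknown",
        lambda ev: rules.append((ev["flow"], "inspect", None)),
    )
    return bus, rules


def toy_classifier(flow):
    # Hypothetical classifier: labels a 5-tuple by destination port only.
    return {80: "http", 53: "dns"}.get(flow[3])


bus, rules = build_controller(toy_classifier)
bus.publish("new_flow", ("10.0.0.1", 40000, "10.0.0.2", 80, "tcp"))
bus.publish("new_flow", ("10.0.0.1", 40001, "10.0.0.9", 6667, "tcp"))
```

Keeping classification behind events means the anomaly, botnet, and honeypot logic can each subscribe to the same `flow_unknown` event without knowing about one another.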
NETWORK ANOMALY DETECTION Continually train & refine supervised models for the traffic flows in our network. When a new flow doesn't match any model, flag it as suspicious and add it to the queue for the clustering algorithm. Run clustering with side information to see if there are other flows similar to it. If it's in a separate cluster => anomaly; if not, refine the model for the closest match.
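The decision flow above can be approximated as follows. This is a deliberately simplified sketch, not the clustering-with-side-information step from the talk: each supervised model is reduced to a centroid, and two distance thresholds (`match_radius` and `cluster_radius`, both assumed parameters) stand in for "matches a model" and "lands in the same cluster as a known flow type". `FlowModel` and `triage` are hypothetical names.

```python
import math


class FlowModel:
    """Stand-in for a trained per-flow-type model: a running centroid."""

    def __init__(self, centroid):
        self.centroid = list(centroid)
        self.count = 1

    def refine(self, features):
        # Incremental mean update with the newly attributed flow.
        self.count += 1
        self.centroid = [
            c + (f - c) / self.count
            for c, f in zip(self.centroid, features)
        ]


def triage(features, models, match_radius, cluster_radius):
    """Return ("match" | "refined" | "anomaly", model_name_or_None)."""
    name, model = min(
        models.items(),
        key=lambda kv: math.dist(features, kv[1].centroid),
    )
    d = math.dist(features, model.centroid)
    if d <= match_radius:
        return "match", name       # an existing model recognises the flow
    if d <= cluster_radius:
        model.refine(features)     # close to a known type: refine that model
        return "refined", name
    return "anomaly", None         # isolated from every known type


models = {
    "http": FlowModel([100.0, 10.0]),
    "dns": FlowModel([60.0, 2.0]),
}
print(triage([101.0, 10.0], models, 5.0, 30.0))
print(triage([500.0, 500.0], models, 5.0, 30.0))
```

A real deployment would replace the threshold test with an actual clustering run over the queued suspicious flows; the three outcomes stay the same.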
BOTNET DETECTION Groups of hosts communicate periodically with a C&C server and receive commands from it, which they then execute (e.g. performing DDoS, scanning the network, sending spam). Communication flow with the C&C server => anomaly. Similar communication flows are performed afterwards to carry out the command => group of related flows. Anomaly + group of related flows originating from the same host afterwards => bot.
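The rule "anomaly + group of related flows from the same host afterwards => bot" can be written down directly. The sketch below is illustrative only: the observation format, the `signature` field used to decide that flows are related, and the `min_group_size` and `window` parameters are assumptions, not values from the talk.

```python
from collections import defaultdict


def find_bots(observations, min_group_size=3, window=60.0):
    """Flag hosts that show an anomalous flow followed by a group of related flows.

    observations: iterable of (timestamp, src_host, is_anomaly, signature),
    where `signature` is any hashable value that is equal for related flows.
    """
    by_host = defaultdict(list)
    for ts, host, is_anomaly, signature in sorted(observations, key=lambda o: o[0]):
        by_host[host].append((ts, is_anomaly, signature))

    bots = set()
    for host, events in by_host.items():
        for i, (ts, is_anomaly, _) in enumerate(events):
            if not is_anomaly:
                continue
            # Count related flows that follow the anomaly within the window.
            counts = defaultdict(int)
            for later_ts, _, signature in events[i + 1:]:
                if later_ts - ts > window:
                    break
                counts[signature] += 1
            if any(n >= min_group_size for n in counts.values()):
                bots.add(host)
                break
    return bots


observations = [
    (0.0, "10.0.0.5", True, "c2-beacon"),   # anomalous C&C-like flow
    (1.0, "10.0.0.5", False, "syn-scan"),
    (2.0, "10.0.0.5", False, "syn-scan"),
    (3.0, "10.0.0.5", False, "syn-scan"),   # group of related flows => bot
    (0.5, "10.0.0.6", True, "odd-flow"),    # anomaly but no follow-up group
    (1.5, "10.0.0.6", False, "http"),
    (1.0, "10.0.0.7", False, "http"),       # related flows but no anomaly
    (2.0, "10.0.0.7", False, "http"),
    (3.0, "10.0.0.7", False, "http"),
]
print(find_bots(observations))
```

Neither condition alone flags a host: `10.0.0.6` has only the anomaly and `10.0.0.7` has only the flow group.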
HONEYPOT TRAFFIC REROUTING As before, if the flow doesn't match any supervised model, mark the host which initiated it as suspicious and store the flow 5-tuple. The next time that host tries to communicate, reroute the new flow to a honeypot.
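A minimal sketch of the bookkeeping behind this slide, assuming an in-memory table keyed by the initiating host. `FiveTuple`, `HoneypotRerouter`, and the honeypot address are hypothetical; a real controller would install a rewrite rule on the switch rather than return an address.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


class HoneypotRerouter:
    def __init__(self, honeypot_ip):
        self.honeypot_ip = honeypot_ip
        self.suspicious = {}  # initiating host -> list of stored 5-tuples

    def mark_suspicious(self, flow):
        """Record an unmatched flow against the host that initiated it."""
        self.suspicious.setdefault(flow.src_ip, []).append(flow)

    def destination_for(self, flow):
        """Pick where a new flow should be sent."""
        if flow.src_ip in self.suspicious:
            return self.honeypot_ip  # host was flagged earlier: divert
        return flow.dst_ip           # otherwise forward normally


rerouter = HoneypotRerouter("192.0.2.99")
first = FiveTuple("10.0.0.5", 40000, "10.0.0.20", 6667, "tcp")
print(rerouter.destination_for(first))   # host not yet flagged
rerouter.mark_suspicious(first)          # flow matched no supervised model
second = FiveTuple("10.0.0.5", 40001, "10.0.0.30", 22, "tcp")
print(rerouter.destination_for(second))  # now diverted to the honeypot
```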
SYSTEM ARCHITECTURE
EXPERIMENTAL TESTBED
TESTING & RESULTS Used the Ixia BreakingPoint traffic emulator to simulate Enterprise, Small Business and ISP network traffic, based on three application profiles: Enterprise, SOHO/Small Business, and Sandvine 2H 2013 North America Fixed.
TESTING & RESULTS Along with the normal network traffic, we also emulated application attacks (Critical Strikes strikelist – 607 strikes) as well as botnet traffic (1646 different botnets, the majority of them HTTP based)
EVALUATION & RESULTS For training data, we generated packet captures with 256 streams for each flow type in the application profile. Then, we proceeded to train classification models for Diffuse (C4.5) for each flow type through the WEKA ML framework.
Classification Accuracy:
Application Profile | Without attack/botnet traffic | With attack/botnet traffic
Enterprise | 82% | 68%
SOHO/Small Business | 87% | 71%
Sandvine 2H 2013 North America Fixed | 79% | 63%
CLASSIFICATION TIME How many packets do we have to inspect before we can reach a conclusion about the flow type? (Capped at 20 packets.) Flow features: minimum, mean, maximum, standard deviation and sum of the packet sizes; sizes of the first 10 packets; communication endpoint (initiator/responder) of each of the first 10 packets.
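The feature list above maps naturally onto a fixed-length vector. The sketch below is one possible layout, not the one used by Diffuse: the input format, the use of the population standard deviation, and the padding values for flows shorter than 10 packets (`0` for sizes, `-1` for endpoints) are all assumptions made for illustration.

```python
import statistics

MAX_PACKETS = 20  # inspection cap from the slide
HEAD = 10         # number of leading packets described individually


def extract_features(packets):
    """Build a 25-element feature vector from (size, endpoint) pairs.

    `endpoint` is "initiator" or "responder". Layout: 5 summary statistics,
    then 10 packet sizes, then 10 endpoint flags (1 = initiator, 0 = responder).
    """
    window = packets[:MAX_PACKETS]
    if not window:
        raise ValueError("need at least one packet")

    sizes = [size for size, _ in window]
    stats = [
        min(sizes),
        statistics.fmean(sizes),
        max(sizes),
        statistics.pstdev(sizes),  # assumption: population std deviation
        sum(sizes),
    ]

    head = window[:HEAD]
    pad = HEAD - len(head)
    first_sizes = [size for size, _ in head] + [0] * pad
    first_endpoints = [
        1 if endpoint == "initiator" else 0 for _, endpoint in head
    ] + [-1] * pad

    return stats + first_sizes + first_endpoints


flow = [(60, "initiator"), (60, "responder"), (1500, "responder")]
features = extract_features(flow)
print(len(features), features[:5])
```

Because the vector length is fixed, the same extractor can be called after each new packet to see how early the classifier's verdict stabilises.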
RESOURCE USAGE OVERHEAD Setup: 1 Mininet VM with Diffuse installed, simulating a topology with 4 switches; learning switch SDN controller running on the same machine. CPU usage overhead when enabling Diffuse: 17%. Memory usage overhead: 13%.
CONCLUSIONS Machine learning flow classification & SDN can work together to make the network adaptive. We can extract & use three types of information from the network: flow type classification, new flow type classifiers, and flow groups. Anomaly detection, botnet detection & honeypot rerouting can be done. ML traffic classification overhead is manageable.