Current State of the Dasvis Project and Ideas for Moving Forward
www.sgt-inc.com
6/10/2015
Grant Orndorff, Chris Wolf
Contents
- What is Dasvis?
- Dasvis Demo
- Positives and negatives of current state
- Ideas Moving Forward
- H2O Demo
- Feedback?
What is Dasvis? (Short Version)
- Dasvis is designed as an architecture/platform for processing big data in real time using only FOSS projects
- On top of Dasvis we are designing a network analysis tool for detecting anomalies such as those that occur during:
  - large data exfiltration events
  - DDoS attacks
What is Dasvis?
Main technologies used:
- Storm/Trident: stream processing engine
- Kafka: distributed queuing
- MongoDB: NoSQL database
- CubeDB: time-series data warehouse built on top of MongoDB
What is Dasvis?
Inside the primary processing engine, there are two parts:
- Tracking
  - Monitors incoming packets
  - Aggregates and stores them
- Comparing
  - Looks for anomalies by comparing incoming data to past data
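The two parts above can be illustrated with a minimal, in-memory sketch. This is not the Dasvis implementation (which runs on Storm/Trident and stores aggregates in MongoDB). The `Tracker` class, the `compare` function, the 60-second bucket size, and the 3x threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class Tracker:
    """Tracking: aggregate packet sizes into fixed time buckets per host.

    Hypothetical stand-in for the real aggregation and storage layer.
    """

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        # Maps (host, bucket_start_time) -> total bytes seen in that bucket.
        self.totals = defaultdict(int)

    def record(self, host, timestamp, size_bytes):
        # Round the timestamp down to the start of its bucket.
        bucket = int(timestamp // self.bucket_seconds) * self.bucket_seconds
        self.totals[(host, bucket)] += size_bytes

    def bytes_for(self, host, bucket):
        return self.totals.get((host, bucket), 0)


def compare(current_bytes, baseline_bytes, threshold_ratio=3.0):
    """Comparing: flag a bucket whose traffic is far above the baseline.

    The ratio test is an assumed rule, not the one Dasvis actually uses.
    """
    if baseline_bytes == 0:
        # With no baseline traffic, any traffic at all is unexpected.
        return current_bytes > 0
    return current_bytes / baseline_bytes >= threshold_ratio
```

For example, recording two 1500-byte packets for a host in the first minute and one 12000-byte packet in the second gives bucket totals of 3000 and 12000 bytes, and `compare(12000, 3000)` flags the second bucket because it is four times the baseline.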
Quick Live Demo
- Brief explanation of custom simulator
- Start simulation: see time series graph
- Set baseline data: see comparison graphs and dashboard
- Introduce anomaly: see comparison graphs and dashboard again
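The demo flow (simulate traffic, set a baseline, then introduce an anomaly) can be approximated with a small generator. This is only a sketch of the idea, not the custom simulator used in the demo. The function name `simulate_traffic` and all of its parameters are assumptions.

```python
import random


def simulate_traffic(n_buckets, base_bytes=1000, jitter=100,
                     anomaly_at=None, anomaly_factor=10, seed=0):
    """Return a list of per-bucket byte counts for one simulated host.

    Normal buckets are base_bytes plus or minus a random jitter. If
    anomaly_at is given, that bucket is multiplied by anomaly_factor to
    mimic a sudden burst such as a large exfiltration event.
    """
    rng = random.Random(seed)  # Seeded so repeated runs are reproducible.
    series = []
    for i in range(n_buckets):
        value = base_bytes + rng.randint(-jitter, jitter)
        if anomaly_at is not None and i == anomaly_at:
            value *= anomaly_factor
        series.append(value)
    return series
```

Calling `simulate_traffic(10)` yields a baseline run, and `simulate_traffic(10, anomaly_at=7)` yields the same run with bucket 7 inflated tenfold, which is the kind of pair the comparison graphs would contrast.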
The Good
- It works!
- Uses only Free and Open Source Software
- Runs on a distributed cluster, and in theory should scale well with relatively inexpensive hardware
The Room for Improvement
- Almost everything we've done involving the architecture technologies has been closely tied to the network analysis project
- The network analysis project is mostly a proof of concept in its current state
  - Requires too much user interaction to scale to very large networks
- We've only tested using simulated traffic
  - Ideally, we would be able to see how it handles and responds to a real environment
Moving Forward
- Separate the idea of the platform from the network analysis project
- Continue to work on the platform/architecture as Dasvis
- Continue the network analysis project as RNAAT (Real-time Network Activity and Anomaly Tracker)
Platform Goals
- Make it easier to set up clusters that leverage all of the FOSS we've mentioned today
- Create a library for connecting and leveraging these technologies, so they can easily be used to write new big data processing programs
- Create a project template that comes with all dependencies and is easily configurable and customizable for different applications
RNAAT Goals
- Eliminate most user interaction by replacing the comparing part of the program with a machine learning algorithm
- Create more advanced, easier-to-use visualizations
- Integrate with Splunk
H2O Library
- Machine learning library designed to work with big data
- Would replace the “Comparing” part of the processing engine
- Comes with lots of useful algorithms, including one advertised as an anomaly detection algorithm
- Demo with fake data
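To make the idea of a learned baseline concrete without depending on H2O, here is a dependency-free stand-in that uses a simple z-score test. It is not H2O's anomaly detection algorithm and does not use the H2O API. The function names and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
import statistics


def fit_baseline(history):
    """Learn a baseline (mean and standard deviation) from past values."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return mean, stdev


def anomaly_score(value, mean, stdev):
    """Distance from the mean, measured in standard deviations."""
    if stdev == 0:
        # A perfectly flat history: any deviation at all is anomalous.
        return 0.0 if value == mean else float("inf")
    return abs(value - mean) / stdev


def find_anomalies(history, incoming, threshold=3.0):
    """Return (index, value) pairs from incoming that exceed the threshold."""
    mean, stdev = fit_baseline(history)
    return [
        (i, v)
        for i, v in enumerate(incoming)
        if anomaly_score(v, mean, stdev) > threshold
    ]
```

The point of the sketch is that the baseline is derived from past data automatically rather than set by a user, which is the change the RNAAT goals describe. A real model would learn from many dimensions at once instead of a single series.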
New Visualizations
- Graphs to show the multidimensional data we're collecting
  - http://dataviz.pitchbook.com/founders/
- Feed of anomalies pushed from H2O
Questions/Feedback?