1/27 Ensemble Visualization for Cyber Situation Awareness of Network Security Data Lihua Hao 1, Christopher G. Healey 1, Steve E. Hutchinson 2 1 North.

Slides:



Advertisements
Similar presentations
OLAP over Uncertain and Imprecise Data
Advertisements

An Interactive-Voting Based Map Matching Algorithm
The Components There are three main components of inDepth Lite, inDepth and inDepth+ Real Time Component Reporting Package Configuration Tools.
Systems Analysis and Design 9th Edition
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.
University at BuffaloThe State University of New York Interactive Exploration of Coherent Patterns in Time-series Gene Expression Data Daxin Jiang Jian.
Multi-Variate Analysis of Mobility Models for Network Protocol Performance Evaluation Carey Williamson Nayden Markatchev
1 Presented by Jean-Daniel Fekete. 2  Motivation  Mélange [Elmqvist 2008] Multiple Focus Regions.
Project 4 U-Pick – A Project of Your Own Design Proposal Due: April 14 th (earlier ok) Project Due: April 25 th.
1 Internet Networking Spring 2004 Tutorial 13 LSNAT - Load Sharing NAT (RFC 2391)
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Live Re-orderable Accordion Drawing (LiveRAC) Peter McLachlan, Tamara Munzner Eleftherios Koutsofios, Stephen North AT&T Research Symposium August, 2007.
Reduced Support Vector Machine
Tuple – InfoVis Publication Browser CS533 Project Presentation by Alex Gukov.
Cluster Analysis (1).
CS218 – Final Project A “Small-Scale” Application- Level Multicast Tree Protocol Jason Lee, Lih Chen & Prabash Nanayakkara Tutor: Li Lao.
Component-Based Routing for Mobile Ad Hoc Networks Chunyue Liu, Tarek Saadawi & Myung Lee CUNY, City College.
Chapter 9 Introduction to Arrays
GEOG 111/211A Transportation Planning Trip Distribution Additional suggested reading: Chapter 5 of Ortuzar & Willumsen, third edition November 2004.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #12 LSNAT - Load Sharing NAT (RFC 2391)
Introduction to Bioinformatics Algorithms Clustering and Microarray Analysis.
Info Vis: Multi-Dimensional Data Chris North cs3724: HCI.
Reconstructing the Emission Height of Volcanic SO2 from Back Trajectories: Comparison of Explosive and Effusive Eruptions Modeling trace gas transport.
April 11, 2008 Data Mining Competition 2008 The 4 th Annual Business Intelligence Symposium Hualin Wang Manager of Advanced.
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
FALL 2012 DSCI5240 Graduate Presentation By Xxxxxxx.
SharePoint 2010 Business Intelligence Module 6: Analysis Services.
Information Design and Visualization
SeqExpress: Introduction. Features Visualisation Tools  Data: gene expression, gene function and gene location.  Analysis: probability models, hierarchies.
Tutor: Prof. A. Taleb-Bendiab Contact: Telephone: +44 (0) CMPDLLM002 Research Methods Lecture 8: Quantitative.
ARO–MURI Thoughts on Visualization for Cyber Situation Awareness MURI Meeting July 8–9, 2015 Christopher G. Healey Lihua Hao Steve E. Hutchinson CS Department,
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
*** CONFIDENTIAL *** © Toshiba Corporation 2008 Confidential Creating Report Templates.
Implementing a Speech Recognition System on a GPU using CUDA
Algorithms for Allocating Wavelength Converters in All-Optical Networks Authors: Goaxi Xiao and Yiu-Wing Leung Presented by: Douglas L. Potts CEG 790 Summer.
Particle Filters for Shape Correspondence Presenter: Jingting Zeng.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
Lecture 18: Dynamic Reconfiguration II November 12, 2004 ECE 697F Reconfigurable Computing Lecture 18 Dynamic Reconfiguration II.
INTERACTIVE ANALYSIS OF COMPUTER CRIMES PRESENTED FOR CS-689 ON 10/12/2000 BY NAGAKALYANA ESKALA.
Extending Spatial Hot Spot Detection Techniques to Temporal Dimensions Sungsoon Hwang Department of Geography State University of New York at Buffalo DMGIS.
Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU.
ICPP 2012 Indexing and Parallel Query Processing Support for Visualizing Climate Datasets Yu Su*, Gagan Agrawal*, Jonathan Woodring † *The Ohio State University.
Managerial Economics Demand Estimation & Forecasting.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Modeling Tough Scheduling Problems with Software Alex S. Brown Mitsui Sumitomo Marine Management (USA), Inc.
Fast Kernel-Density-Based Classification and Clustering Using P-Trees Anne Denton Major Advisor: William Perrizo.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
Data Visualization Michel Bruley Teradata Aster EMEA Marketing Director April 2013 Michel Bruley Teradata Aster EMEA Marketing Director.
Intelligent Database Systems Lab Presenter : YAN-SHOU SIE Authors : E.J. Palomo, J. North, D. Elizondo, R.M. Luque, T. Watson NN Application of growing.
Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Geotechnology Geotechnology – one of three “mega-technologies” for the 21 st Century Global Positioning System (Location and navigation) Remote Sensing.
SASMI Self-Awareness and Self-Monitoring for Innovation.
Xin Tong, Teng-Yok Lee, Han-Wei Shen The Ohio State University
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Using decision trees to build an a framework for multivariate time- series classification 1 Present By Xiayi Kuang.
1 Microarray Clustering. 2 Outline Microarrays Hierarchical Clustering K-Means Clustering Corrupted Cliques Problem CAST Clustering Algorithm.
Dynamic Resource Allocation for Shared Data Centers Using Online Measurements By- Abhishek Chandra, Weibo Gong and Prashant Shenoy.
Computational Biology
Machine Learning for the Quantified Self
Lecture 2-2 Data Exploration: Understanding Data
Fast Kernel-Density-Based Classification and Clustering Using P-Trees
Supervised Time Series Pattern Discovery through Local Importance
Cristian Ferent and Alex Doboli
Load Weighting and Priority
A weight-incorporated similarity-based clustering ensemble method based on swarm intelligence Yue Ming NJIT#:
Running the Experiment
Flexible Web Visualization for Alert-Based Network Security Analytics
Measuring the Similarity of Rhythmic Patterns
Presentation transcript:

1/27 Ensemble Visualization for Cyber Situation Awareness of Network Security Data Lihua Hao 1, Christopher G. Healey 1, Steve E. Hutchinson 2 1 North Carolina State University, 2 U.S. Army Research Laboratory ARO MURI Meeting, UCSB, Nov. 18, 2014

2/27 Our Last Presentation Design Constraints -Mental models - Working environment -Configurability - Accessibility -Scalability - Integration Flexible web-based visualization -Data management (MySQL & PHP) -Web-based visualization -Analysts driven 2D charts -Interactive visualization -Correlated views (track requests) Practicability of ensemble visualization in network security domain

3/27 Outline Ensemble & network ensemble -Ensemble visualization is a very active research area -How is it related to network security data analysis -How to define a network ensemble -How to perform ensemble analysis Network ensemble visualization - examples -Alert ensemble visualization -NetFlow ensemble visualization Summarization

4/27 Ensemble Data A collection of related datasets (members), from runs of a simulation or an experiment, with slightly varying parameters or initial conditions Large, time-varying, multi-dimensional, multi-attribute Exploration of inter-member relationships Discover correlated members and/or separate subsets of data that have out of ordinary behavior Time (Hour) Member 1 Member 2Member 3Member 4 … … … … … Precipitation probability forecast ensemble

5/27 Extension to Network Ensemble Motivation of network ensemble analysis -Scalability is also critical -Discovery of related or out of ordinary network traffics (members) -Time dependent alert/NetFlow data -Opportunity to apply ongoing research from ensembles How is a network security dataset an ensemble? -Define a member, e.g., network traffic to a destination -Define correlated time window and/or time-steps, e.g., hour / day -Define a data value, e,g, number of alerts per time-step Are ensemble techniques useful in network security domain? -Determine the value added to this analysis

6/27 K=1 K=2 K=3 K=4 1.Overview of inter-member relationships -Structure the members into clusters based on their similarities -Visualize the cluster hierarchy of a level of detail clustering as a tree 2.Visualizing member sets Two Stages of Ensemble Analysis Cluster 7 Cluster 5 Cluster 1 Cluster 2 Cluster 1 Cluster 2Cluster 3Cluster 4 Cluster 6 Cluster 5

7/27 Network Ensemble Analysis - Procedure 1.Define member, time window and data value (configurability) 2.Determine a methodology to compare members, i.e., a measure of inter-member relationships 3.Perform a level-of-detail clustering, resulted as a cluster tree visualization (scalability) 4.Analysts choose members to visualize from the cluster tree (configurability) 5.Visualize selected member sets using web-based 2D charts visualization (work environment, accessibility)

8/27 Example: Alert Ensemble Visualization Member -Option 1: a combination of table columns (e.g., destination IP, destination port) -Option 2: a number of equal length time windows (e.g., per day) Inter-member Relationship -Similarity of changes in numbers of alerts over time Example -Alerts from source IP = Member: destination IP, destination port

9/27 Define an Alert Ensemble Ensemble Member Time-step Data Value Extract alerts from source IP Member Comparison Time Range

10/27 Source IP – 100 members Time Number of alerts

11/27 Dynamic Time Warping Find the optimal non-linear alignment A warping path that minimizes distances between two sequences Allow shifting & distortion over time Sequences do not have to be equal length

12/ X 100 Dissimilarity Matrix A large number of white or light gray cells indicate most alert traffics are similar. A small portion of dark cells indicate members (destinations) with out of ordinary traffics.

13/27 Cluster Tree Visualization Out of Ordinary Member

14/27 Closest Cluster Distance In Each Iteration A group of identical members Pattern discovered Range to select an optimal k value (number of clusters) Out of ordinary members may exist

15/27 Clusters Visualization k=20 Cluster 173 (65 Members) Highlighted Does the 65 members share a pattern (similar traffic)? Average Number of alerts

16/27 Cluster 173 (65 members) Number of alerts

17/27 Constrains For Cluster 173 Extract the member cluster for a follow on ensemble analysis or general visual request

18/27 Follow-on: NetFlow Ensemble Member -A sequence of NetFlows over time NetFlow Similarity Measure -Time duration - Density of alerts - Distribution of alerts (intervals) Inter-Member Relationship -Similarity between time-series of NetFlows Example -NetFlows related with cluster 173 from the alerts ensemble -Member: destination IP, destination port

19/27 Define a NetFlow Ensemble Extract Traffics Related with the 65 Member Cluster in the Alert Ensemble Time Range Weights of Measurements of NetFlow Dissimilarity Ensemble Member

20/27 All Members – 65 Members destination IP, port time The traffic looks similar Remember: The members are within the alert member cluster Can we further differen- tiate them by comparing their NetFlow traffics?

21/27 65 X 65 Dissimilarity Matrix The dissimilarity matrix speaks: The NetFlow traffics are not as similar as they look like.

22/27 Cluster Tree Visualization

23/27 Closest Cluster Distance In Each Iteration

24/27 K=38 – Cluster 79 (6 Members)

25/27 K=38 Cluster 89, 90 (2 Members Each)

26/27 Summary Opportunity to apply on-going ensemble visualization researches to network security data -Handle scalability and discovery in time dimension -Extract related or out of ordinary traffics Challenges of network ensemble analysis -How to define an ensemble and inter-member relationships -How to compare traffics and analyze their correlations -How to select and visualize member clusters Examples: alert ensemble & NetFlow ensemble visualization -Flexible definition of ensemble -Visualization of inter-member relationships with cluster tree -Member sets visualization based on 2D charts in web browser

27/27 Questions?

28/27 Summary Motivation to extend ensemble visualization to network security domain -Handle scalability and discovery in time dimension -Extract related or out of ordinary traffics -Opportunity to apply on-going ensemble visualization researches Procedure of network ensemble analysis -Define an ensemble and inter-member relationships -Compare members and build cluster tree hierarchy -Select member sets to visualize Examples: alert ensemble & NetFlow ensemble visualization -Flexible definition of ensemble -Cluster tree visualization -Member sets visualization based on 2D charts

29/27 Clusters Visualization k=20 Cluster 16 (1 Member) Highlighted Traffic to IP: Port: 3389 Average Number of alerts