Published by Jonathan Todd. Modified over 9 years ago.
1
Big Data Technologies for InfoSec
Dive Deeper. See Further.
Ram Sripracha (rsriprac@ucla.edu)
UCLA / Sift Security
2
Experiences RR Systems
3
What are “Big Data” systems?
- XXL in size: data volumes in the TBs to PBs
- Computation scalability
- Horizontally scalable
- Multi-host deployment
- Commodity hardware
4
Why now?
- Rich ecosystem
- Well-supported open-source software
- High adoption rate
- Commercial backing: the “Red Hat” model of paid support for open-source software
- Heavily invested
5
Platform Providers
6
Technologies
7
Is it a “Big Data” problem?
- Many moving parts
- Can be overwhelming at first
- 100s of configuration settings
- Requires some level of expertise
- Overkill for some problems
- Larger resource footprint
8
Big Data Stack
10
DFS (Distributed File System)
11
NoSQL Columnar Datastore
- Sits on HDFS
- Millions of rows x millions of columns
- Cell-level security
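Cell-level security means each individual cell carries its own access label, and a scan returns only the cells the caller is authorized to see. Apache Accumulo, for example, stores a visibility expression in every key. The sketch below is a toy in-memory illustration, not any real datastore's API: the `CellStore` class, its methods, and the label names are hypothetical, and labels are simplified to "reader must hold every label" rather than full boolean expressions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    # Simplification (hypothetical): the reader must hold every label in this set.
    labels: frozenset = field(default_factory=frozenset)

class CellStore:
    """Toy wide-column store: a sparse map of (row, column) -> Cell."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, labels=()):
        self._cells[(row, column)] = Cell(row, column, value, frozenset(labels))

    def scan_row(self, row, authorizations):
        """Return {column: value} for the cells in `row` this caller may see."""
        auths = set(authorizations)
        return {
            cell.column: cell.value
            for (r, _col), cell in sorted(self._cells.items())
            if r == row and cell.labels <= auths
        }

store = CellStore()
store.put("10.0.0.5", "dns:last_query", "example.com")
store.put("10.0.0.5", "hr:owner", "jdoe", labels={"pii"})
store.put("10.0.0.5", "ir:case", "IR-17", labels={"pii", "ir"})

print(store.scan_row("10.0.0.5", []))       # {'dns:last_query': 'example.com'}
print(store.scan_row("10.0.0.5", ["pii"]))  # {'dns:last_query': 'example.com', 'hr:owner': 'jdoe'}
```

The same row returns different columns to different readers, which is what lets security data of mixed sensitivity live in one table.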
12
Titan
- Graph-based datastore
- Optimized for edges and vertices (E, V)
- Key/value attributes for vertices and edges
- 100s of millions of vertices x 100s of billions of edges
- Captures relationships
- Sits on top of HBase, Cassandra, …
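A property graph keeps key/value attributes on both vertices and edges, so a relationship question becomes a traversal. Titan itself is queried with the Gremlin traversal language; the class below is plain Python and is not Titan's API. `PropertyGraph`, the edge labels, and the sample data are hypothetical.

```python
from collections import defaultdict

class PropertyGraph:
    """Toy in-memory property graph: vertices and edges both carry key/value attributes."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> properties
        self.out_edges = defaultdict(list)   # vertex id -> [(label, target id, properties)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def neighbors(self, vid, label=None):
        """Targets of outgoing edges from `vid`, optionally filtered by edge label."""
        return [dst for lbl, dst, _props in self.out_edges[vid]
                if label is None or lbl == label]

g = PropertyGraph()
g.add_vertex("alice", kind="user", dept="radiology")
g.add_vertex("ws-42", kind="host")
g.add_vertex("10.1.2.3", kind="ip")
g.add_edge("alice", "logged_into", "ws-42", time="2015-03-01T09:12")
g.add_edge("ws-42", "connected_to", "10.1.2.3", bytes=51200)

# Two-hop question: which IPs did the hosts alice logged into talk to?
ips = [ip for host in g.neighbors("alice", "logged_into")
       for ip in g.neighbors(host, "connected_to")]
print(ips)  # ['10.1.2.3']
```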
13
Map-Reduce
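The map, shuffle, and reduce phases can be illustrated on a single machine. This sketch totals bytes per source IP from flow records; the `src,dst,bytes` record layout and the function names are assumptions for illustration, and a real framework would run each phase in parallel across hosts.

```python
from collections import defaultdict

def map_phase(record):
    """Map: one 'src,dst,bytes' flow record -> (key, value) pairs."""
    src, _dst, nbytes = record.split(",")
    yield src, int(nbytes)

def shuffle(pairs):
    """Shuffle/sort: group values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)

flows = ["10.0.0.1,8.8.8.8,120", "10.0.0.2,1.1.1.1,300", "10.0.0.1,1.1.1.1,80"]
pairs = (kv for rec in flows for kv in map_phase(rec))
bytes_by_src = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(bytes_by_src)  # {'10.0.0.1': 200, '10.0.0.2': 300}
```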
14
Resilient Distributed Dataset (RDD)
- In-memory RDDs
- Iterative algorithms
- Machine learning
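Iterative algorithms benefit from in-memory datasets because every pass reads the same data again. In Spark, calling `RDD.cache()` keeps a dataset in memory across actions; the plain-Python sketch below only mimics that pattern with a list parsed once and reused by each iteration of 1-D k-means (Lloyd's algorithm). The function names and sample values are hypothetical.

```python
def parse(lines):
    """Parse once; an RDD would be kept in cluster memory via .cache()."""
    return [float(x) for x in lines]

def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm in 1-D: each pass re-reads the same in-memory points."""
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep a center where it is if no point was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in clusters.items()]
    return centers

# Made-up bytes-per-flow sample values.
points = parse(["10", "12", "11", "500", "520", "510"])
print(kmeans_1d(points, centers=[0.0, 100.0]))  # [11.0, 510.0]
```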
16
Impala
- Near-real-time analysis
- Micro-batch processing
- Pipelining of micro-batches
- Stream annotations
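Micro-batch processing (the model behind Spark Streaming's discretized streams, for example) cuts a stream into short time windows and runs each window through the pipeline as a small batch. This sketch assumes events arrive already sorted by timestamp; the event shape, the watchlist, and the function names are hypothetical.

```python
from itertools import groupby

def micro_batches(events, batch_seconds):
    """Group time-ordered (timestamp, event) pairs into fixed-size time windows."""
    for window, group in groupby(events, key=lambda e: int(e[0] // batch_seconds)):
        yield window * batch_seconds, [evt for _ts, evt in group]

def annotate(batch, watchlist):
    """Per-batch stage: tag each flow whose destination is on a watchlist."""
    return [(flow, flow["dst"] in watchlist) for flow in batch]

events = [
    (0.5, {"dst": "1.2.3.4"}),
    (1.7, {"dst": "8.8.8.8"}),
    (2.1, {"dst": "1.2.3.4"}),
]
# Pipeline: each micro-batch flows through the annotation stage in turn.
for start, batch in micro_batches(events, batch_seconds=2):
    print(start, annotate(batch, watchlist={"1.2.3.4"}))
```

Because `groupby` only merges consecutive items, out-of-order events would need buffering before batching.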
17
- Sits on top of …
- Distributed indexing and search
- Indexes:
  - Raw text files from HDFS
  - HBase content
  - Titan properties
  - Other replicated data streams
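Search engines such as Apache Solr are built on inverted indexes (via Lucene): a map from each term to the documents containing it. The single-node sketch below shows only the core idea; the tokenizer, the `InvertedIndex` class, and the document ids (prefixed to suggest HDFS, HBase, and Titan sources) are hypothetical, and a distributed engine would shard these postings across hosts.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split into terms; dots are kept so IPs stay whole."""
    return re.findall(r"[a-z0-9_.]+", text.lower())

class InvertedIndex:
    """Toy single-node inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """AND query: ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

idx = InvertedIndex()
idx.add("hdfs:/logs/a.txt", "Failed password for root from 10.0.0.9")
idx.add("hbase:row17", "Accepted password for alice from 10.0.0.4")
idx.add("titan:v42", "Failed login for bob")
print(idx.search("failed password"))  # ['hdfs:/logs/a.txt']
```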
18
Application Log Search
- Full-text indexes
- Flexible faceting
- Automatic field extraction
- Dashboard-able search interface
- Low-cost alternative to Splunk and other search solutions
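Automatic field extraction and faceting can be illustrated together: pull `key=value` pairs out of raw log lines, then count how often each value of a field occurs. The log format, regular expression, and function names are assumptions; real search tools offer configurable extractors for many formats.

```python
import re
from collections import Counter

# key=value, where the value is either a double-quoted string or a run of non-spaces.
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(line):
    """Automatic field extraction: pull key=value pairs out of a raw log line."""
    return {k: v.strip('"') for k, v in KV.findall(line)}

def facet(records, field):
    """Facet counts: how many records have each value of `field`."""
    return Counter(r[field] for r in records if field in r)

logs = [
    'ts=2015-03-01T09:00 app=vpn user=alice action=login result=ok',
    'ts=2015-03-01T09:01 app=vpn user=bob action=login result=fail',
    'ts=2015-03-01T09:02 app=mail user=alice action=read msg="quarterly report"',
    'ts=2015-03-01T09:03 app=vpn user=bob action=login result=fail',
]
records = [extract_fields(line) for line in logs]
print(facet(records, "result").most_common())  # [('fail', 2), ('ok', 1)]
```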
19
Real-time Blacklist Alerting
- Fault tolerance
- Netflow annotation
- Match alerting
- Application access alerting
- Authentication alerting
- Network metrics
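Netflow annotation and match alerting can be sketched as: tag each flow endpoint that falls inside a blacklisted address block, then emit an alert for any flow with a tag. The `Flow` record, blacklist entries, and function names are hypothetical; the addresses come from documentation ranges. The linear scan over the blacklist keeps the sketch short, whereas a large list would call for a prefix-trie or hash lookup.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    dport: int
    nbytes: int

def load_blacklist(entries):
    """Parse blacklist entries (single IPs or CIDR blocks) into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]

def annotate(flow, blacklist):
    """Netflow annotation: list the blacklisted block(s) each endpoint falls in."""
    hits = {}
    for side in ("src", "dst"):
        addr = ipaddress.ip_address(getattr(flow, side))
        matched = [str(net) for net in blacklist if addr in net]
        if matched:
            hits[side] = matched
    return hits

def alerts(flows, blacklist):
    """Match alerting: yield (flow, annotation) for flows touching a blacklisted address."""
    for flow in flows:
        hits = annotate(flow, blacklist)
        if hits:
            yield flow, hits

bl = load_blacklist(["203.0.113.0/24", "198.51.100.7"])
stream = [Flow("10.0.0.5", "203.0.113.40", 443, 5120),
          Flow("10.0.0.6", "8.8.8.8", 53, 90),
          Flow("198.51.100.7", "10.0.0.5", 22, 700)]
for flow, hits in alerts(stream, bl):
    print(flow.src, "->", flow.dst, hits)
```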
20
Netflow Data Warehouse
- 3 nodes
- 2x 8-core Intel E5-2450 per node
- 16 GB RAM per node
- 72 TB storage total
- ~5B Netflow records/day
- >1 year retention
- Supports complex SQL-like queries
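The cluster figures above imply a few useful back-of-envelope numbers. This sketch derives them, assuming decimal terabytes, exactly 365 days of retention (the stated lower bound), and no replication, compression, or index overhead; any of these would change the per-record budget.

```python
SECONDS_PER_DAY = 86_400

def sizing(nodes=3, sockets=2, cores_per_socket=8, ram_gb_per_node=16,
           storage_tb_total=72, records_per_day=5e9, retention_days=365):
    """Derived figures for the spec above (decimal TB; no replication or compression)."""
    retained = records_per_day * retention_days
    return {
        "total_cores": nodes * sockets * cores_per_socket,
        "total_ram_gb": nodes * ram_gb_per_node,
        "storage_tb_per_node": storage_tb_total / nodes,
        "avg_records_per_sec": records_per_day / SECONDS_PER_DAY,
        "records_retained": retained,
        "bytes_per_record_budget": storage_tb_total * 1e12 / retained,
    }

for name, value in sizing().items():
    print(f"{name:>24}: {value:,.1f}")
```

That works out to roughly 58,000 records per second on average and a raw budget of about 39 bytes per stored record over one year; with HDFS-style 3x replication the budget would shrink to about 13 bytes, which suggests compression is doing much of the work.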
21
Netflow Data Warehouse
- Continuous scanning
- Direct querying of delimited files
- Metrics and diffs
- Trend computation
- Firewall rule validation
- Long retention on DFS
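Direct querying of a delimited file and firewall rule validation can be combined: scan a flow extract, find which rule each flow would match, and report rules that never fire. The pipe delimiter, column names, and rule table are hypothetical, and first-match, top-down evaluation is an assumed policy model.

```python
import csv
import io
import ipaddress
from collections import Counter

# Hypothetical rule table, evaluated top-down with first-match semantics:
# (rule name, source CIDR, destination port or None for "any", action)
RULES = [
    ("allow-dns", "10.0.0.0/8", 53, "allow"),
    ("allow-https", "10.0.0.0/8", 443, "allow"),
    ("allow-ssh", "10.9.0.0/16", 22, "allow"),
    ("default", "0.0.0.0/0", None, "deny"),
]

def first_match(src, dport, rules):
    """Return (rule name, action) of the first rule covering this flow."""
    addr = ipaddress.ip_address(src)
    for name, cidr, port, action in rules:
        if addr in ipaddress.ip_network(cidr) and port in (None, dport):
            return name, action
    return None, "deny"  # implicit deny if no rule matches

def validate(delimited_text, rules):
    """Scan a pipe-delimited flow extract directly and count hits per rule."""
    hits = Counter()
    for row in csv.DictReader(io.StringIO(delimited_text), delimiter="|"):
        name, _action = first_match(row["src"], int(row["dport"]), rules)
        hits[name] += 1
    # Rules that never matched any flow are candidates for review or removal.
    unused = [rule[0] for rule in rules if hits[rule[0]] == 0]
    return hits, unused

FLOWS = """src|dst|dport|bytes
10.1.2.3|198.51.100.53|53|90
10.1.2.3|203.0.113.10|443|5120
192.168.1.5|10.1.2.3|3389|700
"""
hits, unused = validate(FLOWS, RULES)
print(dict(hits))  # {'allow-dns': 1, 'allow-https': 1, 'default': 1}
print(unused)      # ['allow-ssh']
```

Run continuously over each day's files, the same hit counts also give per-rule metrics that can be diffed and trended over time.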
22
EMR Access Anomalies
- A category of insider threat
- Relational networks of users/groups, departments, and document access
- Community-structure-based anomaly detection
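One simple way to use community structure is to score each access by how many of the document's other readers belong to the same community as the user, and flag accesses with almost no peers in common. In this sketch, department stands in for community; a real system might instead detect communities from the access graph with an algorithm such as Louvain. The threshold, names, and data are hypothetical.

```python
from collections import defaultdict

def community_scores(accesses, community):
    """Per (user, doc) access: share of the doc's *other* readers in the user's community."""
    readers = defaultdict(set)
    for user, doc in accesses:
        readers[doc].add(user)
    scores = {}
    for user, doc in accesses:
        others = readers[doc] - {user}
        if not others:
            continue  # no peers to compare against
        same = sum(community[o] == community[user] for o in others)
        scores[(user, doc)] = same / len(others)
    return scores

def flag_anomalies(accesses, community, threshold=0.2):
    """Flag accesses where few of the document's other readers share the user's community."""
    scores = community_scores(accesses, community)
    return sorted(k for k, s in scores.items() if s < threshold)

community = {"ann": "cardiology", "ben": "cardiology", "cal": "cardiology",
             "dee": "billing"}
accesses = [("ann", "chart-1"), ("ben", "chart-1"), ("cal", "chart-1"),
            ("dee", "chart-1"), ("dee", "invoice-9"),
            ("ann", "chart-2"), ("ben", "chart-2")]
print(flag_anomalies(accesses, community))  # [('dee', 'chart-1')]
```

A flagged access is a lead for review, not proof of misuse; legitimate cross-department work produces the same pattern.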