Building a Threat-Analytics Multi-Region Data Lake on AWS

Presentation transcript:

Building a Threat-Analytics Multi-Region Data Lake on AWS
Ori Nakar, 2018

About me
  Researcher at Imperva
  Web application and database security
  Software development methodology and architecture
  Cloud computing, AWS, Docker and Big Data

Agenda
  What is a Data Lake?
  Data Lake structure and flow example
  Threat-Analytics Data Lake architecture
  Multi-region queries
  Demo

Our Story
  Data was almost in our hands: we saw it coming and going
  We needed a way to store it so that we could use it
  We did not know:
    How much data we were going to keep, and for how long
    Which new business use-cases were on the way
  Because the data is large and the requirements are unknown, we decided to go with a Data Lake

Data Lake
  Collection of files stored in a distributed file system
  Information is stored in its native form, with little or no processing
  Flexible, and allows a great amount of data to be stored, queried and analyzed

The Data
  Data Lake's Data:
    All data, even unused
    Structured, semi-structured, or unstructured
    Transformed when ready to be used
  Database Data:
    Structured and transformed
    Added per use case

The Users
  Operational: want to get their reports and slice their data
  Advanced: go back to the data source
  Data Experts: deep analysis

Answer new business questions faster
  [Diagram comparing Data Lake and Database. Labels: Create a schema, Plan your queries, Add indices, Store what you get]

Query Engine
  [Diagram comparing Data Lake and Database. Labels: Query Engine 1, Query Engine 2]

Data Structure Example
  raw data/events
    day=2018-1-1
      file1.csv
      file2.csv
      file3.csv
    day=2018-1-2
  tables/events
    day=2018-1-1
      type=1
        file1.parquet
        file2.parquet
      type=2
        file3.parquet
        file4.parquet
      type=3
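
The layout above nests `name=value` folders (day, then type) under a root and a table name. A minimal sketch of how such prefixes could be built and parsed is shown here. The helper names `partition_prefix`, `day_value` and `parse_partitions` are hypothetical and are not taken from the talk; the unpadded date format follows the slide.

```python
from datetime import date


def partition_prefix(root: str, table: str, **partitions: object) -> str:
    """Build a key prefix such as 'tables/events/day=2018-1-1/type=1/'."""
    parts = [f"{name}={value}" for name, value in partitions.items()]
    return "/".join([root, table, *parts]) + "/"


def day_value(d: date) -> str:
    """Format a date as on the slide, without zero padding (2018-1-1)."""
    return f"{d.year}-{d.month}-{d.day}"


def parse_partitions(key: str) -> dict[str, str]:
    """Extract partition values from the 'name=value' segments of a key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


day = day_value(date(2018, 1, 1))
raw_prefix = partition_prefix("raw data", "events", day=day)
table_prefix = partition_prefix("tables", "events", day=day, type=1)

print(raw_prefix)
print(table_prefix)
print(parse_partitions(table_prefix + "file1.parquet"))
```

Because the partition values live in the path, a query engine can skip whole folders when a query filters on `day` or `type`.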

CSV to Parquet Example
  CSV (stored row by row):
    Time            Type  Message  Severity
    1/1/2018 1:00   1     Text 1   Low
    1/1/2018 1:01   1     Text 2   High
    1/1/2018 1:05   2     Text 3   Medium
    1/1/2018 1:11   3     Text 4   Low
  Parquet (stored column by column):
    Metadata: Count: 4; column metadata (place in file, min, max, compression info)
    Time:     1/1/2018 1:00, 1/1/2018 1:01, 1/1/2018 1:05, 1/1/2018 1:11
    Type:     1, 1, 2, 3
    Message:  Text 1, Text 2, Text 3, Text 4
    Severity: Low, High, Medium, Low
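
To make the row-to-column idea concrete, here is a small standard-library sketch that pivots the four example rows into columns and computes a count, min and max per column. It only illustrates the concept and does not write a real Parquet file. The function names are hypothetical, and values are compared as strings for simplicity.

```python
import csv
import io

CSV_TEXT = """Time,Type,Message,Severity
1/1/2018 1:00,1,Text 1,Low
1/1/2018 1:01,1,Text 2,High
1/1/2018 1:05,2,Text 3,Medium
1/1/2018 1:11,3,Text 4,Low
"""


def to_columns(csv_text: str) -> dict[str, list[str]]:
    """Pivot row-oriented CSV into one list of values per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: [row[name] for row in rows] for name in rows[0]}


def column_stats(columns: dict[str, list[str]]) -> dict[str, dict[str, object]]:
    """Compute per-column count, min and max (string comparison)."""
    return {
        name: {"count": len(values), "min": min(values), "max": max(values)}
        for name, values in columns.items()
    }


def may_contain(stats: dict, column: str, value: str) -> bool:
    """Return False when the value lies outside the column's min/max range."""
    return stats[column]["min"] <= value <= stats[column]["max"]


columns = to_columns(CSV_TEXT)
stats = column_stats(columns)

print(columns["Type"])
print(stats["Type"])
print(may_contain(stats, "Type", "2"), may_contain(stats, "Type", "7"))
```

The min/max check shows why column metadata helps: a reader looking for `Type = 7` can rule out this file from the metadata alone, without reading the column values.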

Architecture and Flow
  [Diagram. Clients: AWS Management Console, Python boto3, DBeaver SQL Client. Query engine: Amazon Athena. Regions: ap-northeast-1, eu-west-1, us-east-1. Storage: S3 raw data, S3 aggregated data]

Flow Example
  Input (raw data):
    day=2018-1-1
      file1.csv
      file2.csv
  Query:
    SELECT time, message, severity FROM events WHERE day='2018-1-1' GROUP BY type ORDER BY severity
  Output (table data):
    type=1
      data_file1.parquet
      data_file2.parquet
    type=2
  Amazon Athena processes the data hourly or daily: it filters, joins, partitions, sorts and aggregates the data into Parquet files
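
One way such a processing step might be expressed is an Athena CREATE TABLE AS SELECT (CTAS) statement submitted through boto3. The sketch below is an assumption about how this could look, not the speaker's actual job: the target table name, bucket and function names are hypothetical, and only the filter and partition steps are shown. In a partitioned CTAS, Athena expects the partition column to be last in the SELECT list.

```python
def build_ctas(day: str, target_table: str, location: str) -> str:
    """Build a CTAS statement that writes one day as Parquet, partitioned by type."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        "  partitioned_by = ARRAY['type']\n"
        ")\n"
        # The partition column (type) is listed last in the SELECT.
        "AS SELECT time, message, severity, type\n"
        "FROM events\n"
        f"WHERE day = '{day}'\n"
    )


def submit(sql: str, database: str, output_location: str, region: str) -> str:
    """Submit the statement to Athena and return the query execution id."""
    import boto3  # imported here so build_ctas works without boto3 installed

    athena = boto3.client("athena", region_name=region)
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
sql = build_ctas(
    day="2018-1-1",
    target_table="events_2018_1_1",
    location="s3://example-bucket/tables/events/day=2018-1-1/",
)
print(sql)
```

A scheduler could call `submit` once per hour or per day; `start_query_execution` is asynchronous, so a real job would also poll for completion before relying on the output.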

Multi-region queries
  Data is saved in multiple regions
  Two available options with Athena:
    Single query engine in one of the regions: like in the good old days
    Query engine per region: better performance, but more work
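
The "query engine per region" option could be sketched as a fan-out: run the same SQL in every region in parallel, then merge the rows on the client. This is a minimal illustration under assumptions. The `run_in_region` callback stands in for a real per-region Athena call, and `fake_runner` returns canned data so the example runs without AWS access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

REGIONS = ["ap-northeast-1", "eu-west-1", "us-east-1"]

Row = dict[str, object]


def query_all_regions(
    sql: str,
    run_in_region: Callable[[str, str], list[Row]],
    regions: list[str] = REGIONS,
) -> list[Row]:
    """Run the same query in each region in parallel and merge the rows."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        # pool.map yields results in the same order as the input regions.
        per_region = list(pool.map(lambda region: run_in_region(region, sql), regions))
    merged: list[Row] = []
    for region, rows in zip(regions, per_region):
        merged.extend({**row, "region": region} for row in rows)
    return merged


def fake_runner(region: str, sql: str) -> list[Row]:
    """Stand-in for a real per-region query; returns canned counts."""
    counts = {"ap-northeast-1": 10, "eu-west-1": 20, "us-east-1": 30}
    return [{"severity": "High", "events": counts[region]}]


rows = query_all_regions(
    "SELECT severity, count(*) AS events FROM events GROUP BY severity",
    fake_runner,
)
total = sum(row["events"] for row in rows)
print(rows)
print(total)
```

This is the "more work" part of the trade-off: each region returns partial aggregates, so counts and similar measures have to be combined again after merging.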

Demo Query

Summary
  We got to better analytics by:
    Using more data
    Using the capabilities of SQL and other engines
    Running queries on multiple regions
  Improvements:
    Cost reduction in storage and compute
    No need to maintain servers
  Will it work for you too? Tip: Do a POC with real data

orin@imperva.com