

Securing Big Data KAIZEN APPROACH, INC.

Big Data Defined
Big data is where the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches or requires the use of significant horizontal scaling for efficient processing. (NIST 2012)

Big Data Value
- Value is in the eye of the beholder
- Value is defined through hypotheses and data modeling of the data sets
- Data collected in the normal course of business can now be mined and correlated to find relationships and meaning
- Data sets range from medical records and financial transactions to web cam photos, firewall logs, web logs, web URL searches, and physical security logs

Big Data: the 5 ‘Vs’
- Volume: processing petabytes of data with low overhead and complexity
- Veracity: using data from a variety of domains
- Value: using commodity hardware
- Variety: leveraging flexible schemas to handle structured and unstructured data
- Velocity: performing real-time analytics and ingesting streaming feeds as well as batch processing

Examples of Big Data Users
PRIVATE SECTOR: Wal-Mart, Apple, eBay, Verizon, Bank of America, NYSE, Amazon, Google, Yahoo
PUBLIC SECTOR: DoD, CDC, DoE, GSA, IRS, NASA, NOAA

Big Data Security Issues
- A large aggregated data store is an attractive target for hackers and malicious insiders
- Big Data stored in a public or hybrid cloud environment has a larger attack surface, and the virtual environment has its own security issues
- Sensitive data is being ported from mature and secure relational databases into NoSQL data stores that lack comparable security controls

Big Data Security Concerns
Source: Cloud Security Alliance Big Data Working Group

NoSQL and Big Data
- NoSQL databases are ideal for huge quantities of data, especially unstructured or non-relational data
- Some NoSQL systems do allow a SQL-like query language
- NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage, in exchange for marked gains in scalability and performance
- Challenges include support issues, lack of trained personnel, lack of standardization, immaturity, and lack of a database management system
- Examples: HBase (Hadoop), Cassandra, MongoDB, Riak, CouchDB
- Hadoop is the most popular

Hadoop is a Suite of Tools
- Distributed file system (HDFS)
- Distributed execution framework (MapReduce)
- Query language (Pig)
- Distributed, column-oriented data store (HBase)
- Machine learning (Mahout)
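
To make the "distributed execution framework" item concrete, here is a minimal single-process sketch of the MapReduce model in plain Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real framework this grouping happens between the map and
    reduce stages; here it is a simple in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical web-log style records, used only for illustration.
    logs = ["GET /index", "GET /login", "POST /login"]
    print(word_count(logs))
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the source of the horizontal scalability the slides describe.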

Hadoop Pros
- Processes large data very efficiently
- Distributed storage and computation
- Very flexible: horizontally scalable
- HDFS file system is optimized for high throughput
- Simple API and model
- Parallel processing
- Inexpensive
- NoSQL database model (HBase)

Hadoop Security Cons
- Security is NOT built into Hadoop (or any NoSQL database) at all: it was never built for enterprise security, but for publicly available data
- No native encryption services are offered
- Data is spread across multiple machines in a cluster, which makes securing and hardening individual machines challenging and backup/recovery difficult
- Hadoop tools lack basic security controls
- Data veracity is a challenge given the possible multitude of data sources

Securing Big Data: Products
Several types of products are available:
1. NoSQL/Hadoop products with enhanced security built on top, offering integrated authentication (not just Kerberos!) and encryption options
2. API gateways/proxies controlling which applications can access a database cluster and which data queries can be made against it

Hadoop/NoSQL Security Products
- Cell-level access labels (Sqrrl/Accumulo)
- Kerberos authentication (open source, IBM, Cloudera, MapR)
- Access control lists for tables/column families (all Hadoop vendors)
- Data encryption (Sqrrl/Accumulo, Datameer, Gazzang, DataGuise, Vormetric)
- Authentication integration with LDAP and PKI (Sqrrl/Accumulo, MapR, Datameer)

Hadoop/NoSQL Security Products: Accumulo
- Sorted, distributed key/value store using Hadoop as its file system
- Developed by NSA beginning in 2008, Accumulo is now an open source software project hosted by the Apache Foundation and natively integrates with Hadoop
- Accumulo has three differentiators from Hadoop and other NoSQL databases:
  - Secure: fine-grained security controls allow organizations to control data at the cell level, integrating existing authentication functions in the enterprise (PKI, LDAP, AD…)
  - Scale: proven to operate and perform at massive scale with low administrative overhead
  - Adapt: provides real-time analysis
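
The idea of cell-level access labels can be sketched in a few lines of Python. This is a simplified illustration, not Accumulo's implementation or API: the label syntax (tokens joined by & and |, with parentheses), the rule that an empty label is visible to everyone, and the names is_visible, scan, and CELLS are all assumptions made for the example. Here & binds tighter than |; real systems may define precedence differently or require explicit parentheses.

```python
import re

# A token is either a label name or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a label expression such as 'medical&(doctor|nurse)' into tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad label expression at position {pos}: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, authorizations):
    """Return True if a user holding `authorizations` may read a cell labeled `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an unlabeled cell is readable by everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in {"&", "|", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return token in authorizations

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label expression")
    return result


def scan(cells, authorizations):
    """Return only the (key, value) pairs whose label the caller is allowed to see."""
    return [(key, value) for key, label, value in cells
            if is_visible(label, authorizations)]


# Hypothetical table: (key, visibility label, value)
CELLS = [
    ("patient1:name", "", "Alice"),
    ("patient1:diagnosis", "medical&(doctor|nurse)", "flu"),
    ("patient1:billing", "finance", "$120"),
]

if __name__ == "__main__":
    print(scan(CELLS, {"medical", "nurse"}))
    print(scan(CELLS, {"finance"}))
```

The point of the design is that the filter runs inside the data store on every read, so two users scanning the same table see different cells without the application having to enforce the policy itself.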

Hadoop/NoSQL Products: Accumulo and Sqrrl
- Sqrrl, a startup of developers and engineers from NSA, offers the commercial version of Accumulo: Sqrrl Enterprise
- Sqrrl Enterprise differs from other Big Data tools because security is built into the platform; as a result, cell-level security controls do not cause any significant performance degradation
- Data can be labeled or tagged by cell to provide fine-grained access control
- Sqrrl Enterprise integrates with enterprise Identity and Access Management (IAM) systems, such as Active Directory, LDAP, PKI, and biometrics
- Sqrrl provides encryption of data-at-rest and data-in-motion

Big Data Security Products: API Gateways
- An appliance exposes published APIs, proxying between applications and data in NoSQL or relational databases
- Only approved/published APIs are permitted
- Tied into existing authentication sources
- Authorization and encryption available
- Malware/virus and DLP checking available
- Placed behind the firewall
- Examples: Intel’s EAM, CA’s Layer7, and Mulesoft
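
The gateway behavior described above (authenticate the caller, then permit only published APIs) can be sketched as follows. This is a toy model rather than any vendor's product: the Request type, the PUBLISHED_APIS and VALID_TOKENS tables, the paths, and the token values are all hypothetical, and a real gateway would delegate authentication to an enterprise source such as LDAP or PKI instead of a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str
    path: str
    token: str


# Hypothetical allowlist of published (method, path) pairs.
PUBLISHED_APIS = {
    ("GET", "/v1/customers"),
    ("GET", "/v1/orders"),
}

# Hypothetical token table standing in for an existing authentication source.
VALID_TOKENS = {"token-analytics-app": "analytics"}


def authorize(request):
    """Decide whether the gateway forwards a request to the database cluster.

    Returns an (HTTP-style status code, message) pair.
    """
    client = VALID_TOKENS.get(request.token)
    if client is None:
        return 401, "unauthenticated"
    if (request.method.upper(), request.path) not in PUBLISHED_APIS:
        # Anything not explicitly published is rejected, including
        # ad hoc queries sent straight at the data store.
        return 403, "API not published"
    return 200, f"forwarded for {client}"


if __name__ == "__main__":
    print(authorize(Request("GET", "/v1/customers", "token-analytics-app")))
    print(authorize(Request("DELETE", "/v1/customers", "token-analytics-app")))
    print(authorize(Request("GET", "/v1/customers", "stolen-or-missing")))
```

The default-deny check is the important part: the cluster only ever sees requests that match an approved API from an authenticated client.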

API Gateway Example: Intel EAM

Securing Big Data: General Approaches
- Determine which data should be in a NoSQL database, given the immaturity of Big Data products and implementations
- Firewall off the big data clusters from the rest of the network
- Harden and secure the machines (virtual and physical) across which the database cluster is distributed
- Limit who can access the databases with authentication
- Understand how attractive and powerful consolidated data is to attackers and malicious insiders
- Realize that compliance/regulatory issues are the same for NoSQL databases as for relational databases: backup, auditing, monitoring, and securing data are still required

How Kaizen Can Help
- Our experienced professionals are steeped in security concepts, risk management, technology and principles of data processing
- We separate facts from fads and hype
- We’re vendor neutral, not resellers
- Our staff has extensive private and public sector experience with security: host/server, network and database/applications
- We keep up to date with current technology and events, applying best practices, experience and common sense to examine problems and come up with solutions

How Kaizen Can Help
- The tools to secure Big Data are new or still being developed, but the concepts behind securing the data are not
- Kaizen’s professionals can map the security requirements to the tools and show what is lacking
- We can test and research products, and suggest procedures and practices to maintain and enhance the security of Big Data environments

Summary
- Kaizen can help with big data problem analysis, test technical options, and determine a solution that combines the technical and the procedural
- This presentation surveys the problem space and possible combinations of security solutions:
  - Secure NoSQL database implementations
  - API gateways
  - Encryption
  - Leveraging existing firewall, authentication and authorization technology

Appendix: Vendors
BIG DATA/HADOOP: Apache, IBM, Cloudera, Karmasphere, EMC, Hortonworks
BIG DATA/SECURITY: Sqrrl, Intel/Mashery, MapR, Platfora, Datameer, Mulesoft, CA/Layer7

Appendix: Vendors: Encryption Products for Big Data
- Gazzang
- Vormetric
- Dataguise

Big Data Consortiums and Standards Bodies
- Big Data Working Group