Hybrid Cloud Security: Replication and Direction of Sensitive Data Blocks Glenn Michael Koch Eric Drew Advisor: Dr. XiaoFeng Wang Mentor: Kehuan Zhang.

School of Informatics and Computing, Indiana University, Bloomington, Indiana

INTRODUCTION
Processing large-scale data sets in a cloud computing environment carries inherent security concerns (see FIGURE 2). Data sent out to public commodity servers is at greater risk of being compromised than data kept on local servers. A hybrid cloud solution separates sensitive data, which is confined to a private domain (the private cloud), from public data (the public cloud). This research addresses one component of the hybrid cloud security solution: the replication and direction of sensitive data with varying replication factors. The task was to create and modify Java source code within the Hadoop Distributed File System (HDFS) to implement alternative replication factors, and then to test that data was replicated to the proper domain based on its security tag.

REFERENCES
Hadoop: The Definitive Guide, p

FIGURE 2: Source: Awareness, Trust and Security to Shape Government Cloud Adoption, Lockheed Martin, LM Cyber Security Alliance and Market Connections, Inc., April 2010.

HADOOP AND CLOUD COMPUTING
Hadoop is a set of open-source technologies that supports reliable and cost-efficient ways of dealing with large amounts of data [1]. The exponential growth of individual data footprints, together with the amount of data generated by machines [2], calls for a means of processing that data. Hadoop can handle large amounts of data because it splits the data into subsets and sends them to multiple processors. Working in parallel, these processors handle the data at a much higher rate, and Hadoop then reassembles the partial results into a single result set. Clusters of computers that take on complex data operations in this way are known as clouds [3], and software such as Hadoop orchestrates the merging of these processes.
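The split, process, and reassemble cycle described above can be illustrated in plain Java. This is a minimal sketch, assuming a local thread pool stands in for the cluster's processors and a word count stands in for the data operation; it does not use Hadoop's MapReduce API, and the names SplitProcessMerge, split, countWords and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: local worker threads stand in for cluster nodes.
// This is not Hadoop's API, only the split / process / reassemble pattern.
public class SplitProcessMerge {

    // Split the input into fixed-size subsets (a stand-in for HDFS blocks or input splits).
    static List<List<String>> split(List<String> records, int splitSize) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < records.size(); i += splitSize) {
            splits.add(records.subList(i, Math.min(i + splitSize, records.size())));
        }
        return splits;
    }

    // Process one subset independently of all the others (here: count words).
    static Map<String, Integer> countWords(List<String> split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : split) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Send each subset to a worker, then reassemble the partial results into one result set.
    static Map<String, Integer> run(List<String> records, int splitSize, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> s : split(records, splitSize)) {
                partials.add(pool.submit(() -> countWords(s)));
            }
            Map<String, Integer> result = new TreeMap<>();
            for (Future<Map<String, Integer>> f : partials) {
                f.get().forEach((word, n) -> result.merge(word, n, Integer::sum));
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> input = List.of("the cloud", "the private cloud", "public data", "the data");
        // prints {cloud=2, data=2, private=1, public=1, the=3}
        System.out.println(run(input, 2, 2));
    }
}
```

Each subset is processed without knowledge of the others, which is what lets the work spread across machines; only the final merge step needs to see every partial result.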
Hadoop in its present form does not provide data security. It does provide data replication for performance enhancement and fault tolerance, but it does not distinguish private data from sanitized data. Our work modifies data replication and control in a hybrid cloud structure.

[FIGURE 3: Hybrid cloud structure. A client, the NameNode, and DataNodes spanning a public cloud and a private cloud: 1. Request to allocate block; 2. Set up data replication pipeline; 3. Transfer data to private data node; 4. Replicate sanitized data.]

HYBRID CLOUD DATA REPLICATION
Data replication in a secure hybrid cloud environment involves:
- Replicating data tagged as sensitive only to private nodes, as identified in the NameNode metadata.
- Replicating sanitized (public) data to random nodes, either public or private, so as to provide optimum performance and fault tolerance.

[FIGURE 1: Hadoop original structure. A client, the NameNode, and DataNodes in a single public cloud: request to allocate block; set up data replication pipeline; transfer data to data node; replicate data.]

EDITED HADOOP JAVA CODE
- The original Hadoop system was designed to work on a single cloud (FIGURE 1).
- As a result, Hadoop does not automatically detect sensitive data, nor does it keep such data secure and inaccessible from the public cloud.
- We modified the original Java code to distribute data over the public and private clouds while keeping data that is considered sensitive on the private cloud (FIGURE 3).
- The code makes two distinct calls, one for the public cloud and one for the private cloud, distinguished by a true or false value.

This project is supported in part by NSF CNS
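The two replication rules under HYBRID CLOUD DATA REPLICATION, together with the true/false switch between the public and private calls, can be illustrated with a small stand-alone placement function. This is a minimal sketch under stated assumptions, not the authors' modified HDFS code and not Hadoop's own block-placement API: the names HybridPlacementSketch, DataNode, chooseTargets and placementIsValid are hypothetical, a per-node isPrivate flag stands in for the NameNode metadata, and uniform random shuffling stands in for Hadoop's real placement logic. placementIsValid mirrors the verification step from the introduction. It uses a Java 16+ record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical sketch of hybrid-cloud replica placement; not Hadoop's API.
public class HybridPlacementSketch {

    // Hypothetical view of a data node: a name plus which cloud it belongs to
    // (stands in for what the NameNode metadata would record).
    record DataNode(String name, boolean isPrivate) {}

    // Choose `replication` distinct target nodes for one block.
    // sensitive == true  -> only private-cloud nodes are eligible.
    // sensitive == false -> any node, public or private, is eligible, picked at random.
    static List<DataNode> chooseTargets(List<DataNode> cluster, int replication,
                                        boolean sensitive, Random rng) {
        List<DataNode> eligible = cluster.stream()
                .filter(n -> !sensitive || n.isPrivate())
                .collect(Collectors.toCollection(ArrayList::new));
        if (eligible.size() < replication) {
            // Refuse rather than fall back to public nodes: a sensitive block never leaves the private cloud.
            throw new IllegalStateException("only " + eligible.size()
                    + " eligible nodes for replication factor " + replication);
        }
        Collections.shuffle(eligible, rng);
        return new ArrayList<>(eligible.subList(0, replication));
    }

    // Verification: a sensitive block must have no replica on a public node.
    static boolean placementIsValid(List<DataNode> targets, boolean sensitive) {
        return !sensitive || targets.stream().allMatch(DataNode::isPrivate);
    }

    public static void main(String[] args) {
        List<DataNode> cluster = List.of(
                new DataNode("pub-1", false), new DataNode("pub-2", false),
                new DataNode("pub-3", false), new DataNode("priv-1", true),
                new DataNode("priv-2", true), new DataNode("priv-3", true));
        Random rng = new Random(42);
        List<DataNode> secret = chooseTargets(cluster, 2, true, rng);  // sensitive block, factor 2
        List<DataNode> open = chooseTargets(cluster, 3, false, rng);   // sanitized block, factor 3
        System.out.println("sensitive -> " + secret + " valid=" + placementIsValid(secret, true));
        System.out.println("sanitized -> " + open + " valid=" + placementIsValid(open, false));
    }
}
```

Throwing when the private cloud has too few nodes, instead of silently using public nodes, keeps the sensitive-data guarantee strict; the trade-off in this sketch is that a sensitive write whose replication factor exceeds the number of private nodes cannot proceed.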