Cloud Computing: Hadoop Security Design - 2009

Presentation transcript:

Cloud Computing: Hadoop Security Design - 2009. *All opinions and information are mine and do not represent the view(s) of my employer. Kaveh Noorbakhsh, Kent State: CS. Owen O'Malley | Kan Zhang | Sanjay Radia | Ram Marti | Christopher Harrell | Yahoo!

Brief History: Cloud Computing as a Service
1961: John McCarthy introduces the concept of cloud computing as a business model
1969: ARPANET
1997: "Cloud Computing" coined by Ramnath Chellappa
1999: Salesforce.com delivers enterprise applications via a simple web interface
2002: Amazon Web Services
2004: HDFS & Map/Reduce in Nutch
2006: Google Docs; Amazon EC2; Yahoo hires Doug Cutting
2008: Eucalyptus, the first open-source AWS API for private clouds; OpenNebula for private and hybrid clouds; Hadoop hits web scale
2009: MS Azure; Amazon RDS with MySQL supported
2011: Amazon RDS supports Oracle; Office 365

Hadoop – Funny Name, Big Impact

Map/Reduce: An Introduction. Map/Reduce allows computation to scale out over many "cheap" systems rather than one expensive supercomputer.
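
To make the idea concrete, here is a minimal sketch of the Map/Reduce data flow in plain Python. It is not the Hadoop API; the function names, the toy "splits", and the sample text are invented for illustration only. A map function emits key/value pairs, a shuffle step groups them by key, and a reduce function combines each group.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a final result."""
    return (key, sum(values))

# Toy "splits" standing in for blocks of a large input file.
splits = ["the quick brown fox", "the lazy dog", "the fox"]

intermediate = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle(intermediate)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call only sees its own split and each reduce call only sees one key's values, the work can be spread across many cheap machines.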

Divide and Conquer [diagram: the "Work" is partitioned into w1, w2, w3; each "worker" produces a partial result r1, r2, r3; the partial results are combined into the final "Result"]
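
A hedged sketch of the partition/worker/combine pattern from the diagram above, using a Python multiprocessing pool as stand-in "workers". The work items and the combine step are illustrative, not part of Hadoop.

```python
from multiprocessing import Pool

def worker(partition):
    """Each 'worker' produces a partial result (r1, r2, r3, ...) from its slice of the work."""
    return sum(partition)

def combine(partial_results):
    """Combine the partial results into the final 'Result'."""
    return sum(partial_results)

if __name__ == "__main__":
    work = list(range(1_000_000))                 # the overall "Work"
    partitions = [work[i::3] for i in range(3)]   # partition into w1, w2, w3
    with Pool(processes=3) as pool:
        partials = pool.map(worker, partitions)   # each worker computes independently
    print(combine(partials))                      # 499999500000
```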

Two Layers MapReduce: Code runs here HDFS: Data lives here

Advantages of the Cloud
Service models: Database as a Service (DBaaS), Infrastructure as a Service (IaaS), Software as a Service (SaaS), Platform as a Service (PaaS)
Share hardware and energy costs
Share employee costs
Fast spin-up and tear-down
Expand quickly to meet demand
Costs ideally proportional to usage
Scalability

Cloud Services Spending [chart; values in billions of dollars]

Cloud vs. Total IT Spending [chart; values in billions of dollars]

Security Challenges of the Cloud
Where is my data living? You may not know exactly where your data is, since it can be distributed across many physical disks.
Where is my data going? In the cloud, and especially in Map/Reduce, data is constantly moving from node to node, and the nodes may be spread across multiple mini-clouds.
Who has access to my data? Other clients share the cloud, and the administrators and others who maintain it could access the data if it is not properly protected.

Hadoop Security Concerns
Hadoop services do not authenticate users or other services. (a) A user can access an HDFS or MapReduce cluster as any other user, which makes it impossible to enforce access control in an uncooperative environment; for example, file permission checking on HDFS can be easily circumvented. (b) An attacker can masquerade as a Hadoop service; for example, user code running on a MapReduce cluster can register itself as a new TaskTracker.
DataNodes do not enforce any access control on their data blocks. An unauthorized client can read a data block as long as she can supply its block ID, and anyone can write arbitrary data blocks to DataNodes.
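
To make the DataNode concern concrete, here is a deliberately insecure toy model in plain Python (not Hadoop code); the block ID, data, and function names are invented for illustration. The "DataNode" hands back any block to anyone who can name its ID, with no check of who is asking.

```python
# Toy model of the problem described above: possession of a block ID is
# treated as authorization. Names and data are illustrative only.
blocks = {"blk_1073741825": b"confidential payroll records"}

def read_block(block_id, requesting_user):
    # No authentication and no permission check: any caller who knows
    # (or guesses) the block ID gets the bytes.
    return blocks[block_id]

print(read_block("blk_1073741825", requesting_user="anonymous"))
```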

Security Requirements for Hadoop
Users are only allowed to access HDFS files that they have permission to access.
Users are only allowed to access or modify their own MapReduce jobs.
User-to-service mutual authentication, to prevent unauthorized NameNodes, DataNodes, JobTrackers, or TaskTrackers.
Service-to-service mutual authentication, to prevent unauthorized services from joining a cluster's HDFS or MapReduce service.
Performance degradation of no more than 3%.

Proposed Solution – Use Case 1: Accessing Data
1) The user/app requests access to a data block.
2) The NameNode authenticates the requester and issues a block token.
3) The user/app presents the block token to the DataNode to access the block for READ, WRITE, COPY, or REPLACE.
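
A simplified sketch of this flow, shown below under stated assumptions: the "NameNode" signs (user, block ID, allowed operations, expiry) with a key it shares with the "DataNode", and the DataNode verifies that signature before serving the block. The key handling, field layout, and helper names are illustrative, not Hadoop's actual token format or wire protocol.

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"namenode-datanode-shared-secret"   # assumed out-of-band key exchange

def issue_block_token(user, block_id, operations, ttl_seconds=600):
    """'NameNode' side: authenticate the user, then sign a short-lived block token."""
    payload = {"user": user, "block": block_id,
               "ops": operations, "expires": time.time() + ttl_seconds}
    body = json.dumps(payload, sort_keys=True).encode()
    sig = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify_block_token(token, block_id, operation):
    """'DataNode' side: check signature, expiry, block ID, and requested operation."""
    body = json.dumps(token["payload"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False                               # forged or tampered token
    p = token["payload"]
    return (p["expires"] > time.time()
            and p["block"] == block_id
            and operation in p["ops"])

token = issue_block_token("alice", "blk_42", ["READ"])
print(verify_block_token(token, "blk_42", "READ"))    # True
print(verify_block_token(token, "blk_42", "WRITE"))   # False: operation not granted
```

The design point this illustrates is that the DataNode never has to call back to the NameNode on every read: the signed, short-lived token carries the access decision with it.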

Proposed Solution – Use Case 2: Submitting Jobs
1) The user obtains a delegation token through Kerberos.
2) The token is given to the user's jobs for subsequent authentication to the NameNode as that user.
3) Jobs use the delegation token to access any data the user/app has access to.
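
A hedged sketch of the delegation-token idea in the same toy style: the user authenticates once (via Kerberos in the real design, simulated here), receives a token whose secret is derived from a NameNode master key, and the job's tasks later present that token instead of the user's Kerberos credentials. The field names, lifetimes, and helpers are illustrative assumptions, not Hadoop's actual format.

```python
import hashlib
import hmac
import json
import time

MASTER_KEY = b"namenode-delegation-master-key"    # assumed NameNode-internal secret

def issue_delegation_token(owner, renewer, max_lifetime=86400):
    """Issued to an already-authenticated (e.g. Kerberos) user before job submission."""
    identifier = {"owner": owner, "renewer": renewer,
                  "issued": time.time(), "max_date": time.time() + max_lifetime}
    body = json.dumps(identifier, sort_keys=True).encode()
    password = hmac.new(MASTER_KEY, body, hashlib.sha256).hexdigest()
    return {"identifier": identifier, "password": password}

def authenticate_with_token(token):
    """'NameNode' side: recompute the password from the identifier and compare."""
    body = json.dumps(token["identifier"], sort_keys=True).encode()
    expected = hmac.new(MASTER_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["password"]):
        return None
    if token["identifier"]["max_date"] < time.time():
        return None                                # token has expired
    return token["identifier"]["owner"]            # act as this user

# User obtains the token once, then hands it to the submitted job's tasks.
token = issue_delegation_token(owner="alice", renewer="jobtracker")
print(authenticate_with_token(token))              # 'alice'
```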

Core Principles Analysis – Confidentiality. Users/apps only gain access to the data blocks they are entitled to, via block tokens. Verdict: Pass.

Core Principles Analysis – Integrity. Data is only available at the block level if the block token matches (Pass). However, the blocks themselves are not checked, so the data is simply assumed to be good (Fail).

Core Principles Analysis – Availability. The JobTracker and NameNode are single points of failure for the system (Fail). Tokens persist for a short period, so the system is resilient to brief outages of the NameNode and JobTracker (Pass).

Conclusion. The token method of authentication for both data and process access makes sense in a highly distributed system like Hadoop. However, the fact that tokens carry so much power and are not constantly re-checked leaves this design open to serious TOCTOU (time-of-check to time-of-use) attacks. Compared to the current model (i.e., no security), it still represents a major step forward.
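
The TOCTOU risk the conclusion points to can be shown with the same toy token model used earlier (all names and data are illustrative): permissions are checked only when the token is issued, so revoking the user's access afterwards does not invalidate tokens already in flight until they expire.

```python
import time

permissions = {"alice": {"blk_42"}}               # consulted only at token-issue time

def issue_token(user, block_id, ttl=600):
    if block_id not in permissions.get(user, set()):
        raise PermissionError("denied at time of check")
    return {"user": user, "block": block_id, "expires": time.time() + ttl}

def read_block(token, block_id):
    # Time of use: only the token itself is inspected; permissions are NOT re-checked.
    return token["block"] == block_id and token["expires"] > time.time()

token = issue_token("alice", "blk_42")            # time of check: access allowed
permissions["alice"].clear()                      # access revoked moments later
print(read_block(token, "blk_42"))                # True: the stale token still works (TOCTOU gap)
```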

The End Questions?