Data Grids Jon Ludwig Leor Dilmanian Braden Allchin Andrew Brown.

Presentation transcript:

Data Grids Jon Ludwig Leor Dilmanian Braden Allchin Andrew Brown

Outline
- What is a Data Grid
- Components of a Data Grid
- Data Grids of Today
- Amazon S3 Web Service

What is a Data Grid?
- Distributed storage mechanism providing resources to computational grids
- Cheap, effective, and scalable means of recording information across multiple grid sites
- The resources, tools, and information products that can be used for data discovery and delivery from a variety of sources, typically used for the production of valuable information

Components of a Data Grid
- Case study: NERC
  o CSML: the Climate Science Modelling Language information model
  o The CSML Toolbox: creates and manipulates documents that conform to the CSML schema
  o The CSML Data Services: expose documents and the data they point to
  o The NDG Data Graphical User Interface: uses web services to manipulate data
  o MOLES schema, XQuery definitions, related software, frontend browser
  o Discovery gateways and infrastructure
  o Vocabulary server

Components Diagram

Storage Resource Broker
- Virtual data storage using namespaces
- Maintains metadata on files, users, groups
- Stored in relational DBMS
- Queries supported
- Has an API for other applications (e.g. Globus)
- Sharing, transfer, backup
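The idea of a logical namespace backed by a metadata catalog in a relational DBMS can be sketched in a few lines. This is not SRB's actual schema or API: the table layout, column names, paths, site names, and function names below are all hypothetical, and SQLite merely stands in for the relational DBMS.

```python
import sqlite3


def build_catalog():
    """Create an in-memory metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE replicas ("
        " logical_path TEXT, physical_location TEXT,"
        " owner TEXT, size_bytes INTEGER)"
    )
    # One logical file may have several physical replicas at different sites.
    conn.executemany(
        "INSERT INTO replicas VALUES (?, ?, ?, ?)",
        [
            ("/grid/climate/run1.nc", "siteA:/disk1/run1.nc", "alice", 2048),
            ("/grid/climate/run1.nc", "siteB:/tape/0042", "alice", 2048),
            ("/grid/mri/scan7.img", "siteC:/vol3/scan7.img", "bob", 4096),
        ],
    )
    return conn


def resolve(conn, logical_path):
    """Map a logical name to every physical location that holds a copy."""
    cur = conn.execute(
        "SELECT physical_location FROM replicas"
        " WHERE logical_path = ? ORDER BY physical_location",
        (logical_path,),
    )
    return [row[0] for row in cur]


def paths_owned_by(conn, owner):
    """Example metadata query: which logical files belong to a user?"""
    cur = conn.execute(
        "SELECT DISTINCT logical_path FROM replicas"
        " WHERE owner = ? ORDER BY logical_path",
        (owner,),
    )
    return [row[0] for row in cur]
```

A client would call `resolve(conn, "/grid/climate/run1.nc")` and pick any of the returned locations, without needing to know where the data physically lives.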

Data Grids of Today
- Biomedical Informatics Research Network (BIRN)
- HP's Global File Systems (SFS) collaboration
- NSF's iVDGL (International Virtual Data Grid Laboratory)
  o Now part of the Open Science Grid (OSG)
- European Union's DataGrid Project
  o Now part of Enabling Grids for E-sciencE (EGEE)
- Natural Environment Research Council (NERC)
- Amazon Simple Storage Service (S3)

Amazon S3
- Amazon Simple Storage Service
- Web Service: REST / SOAP / BitTorrent
- Offload storage requirements to Amazon
  o Cost
  o Security
- Scalable: storage, availability, speed
- Reliable: fault tolerance, redundancy
- Fast
- Inexpensive: commodity hardware
- Simple: data grid is abstracted
- Flexible: constraints

Amazon S3 - Design Principles
- Decentralization: avoid single points of failure (SPoF)
- Asynchrony: avoid waiting on communications
- Autonomy / local responsibility: nodes take care of themselves
- Controlled concurrency: exposed operations require little or no concurrency control
- Failure tolerance: automatic recovery, minimal interruption
- Controlled parallelism: recover quickly
- Small building blocks
- Symmetry: nodes are identical in functionality, minimal configuration
- Simplicity

Amazon S3 - Functionality
- Objects: fundamental storage unit
  o 1 B to 5 GB
  o Metadata
  o Keys uniquely identify Objects
- Buckets: namespace for managing objects
  o Users own Buckets
  o Buckets contain Objects
  o Unlimited Objects per Bucket
- Operations
  o Create, Read, Write, List, Delete
- Replication
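The object and bucket model above can be illustrated with a toy in-memory sketch. It is not the S3 API: the `Bucket` class and its method names are hypothetical, nothing is sent over the network, and the 5 GB limit is assumed here to mean 5 * 1024^3 bytes.

```python
# Assumption: "5 GB" is taken as 5 * 1024**3 bytes for this sketch.
MAX_OBJECT_BYTES = 5 * 1024**3


class Bucket:
    """Toy namespace that maps keys to (data, metadata) pairs."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner      # users own buckets
        self._objects = {}      # key -> (bytes, metadata dict)

    def put(self, key, data, metadata=None):
        """Create or overwrite the object stored under `key`."""
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 B and 5 GB")
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        """Read an object's data; raises KeyError if the key is unknown."""
        return self._objects[key][0]

    def metadata(self, key):
        """Return a copy of the metadata stored with the object."""
        return dict(self._objects[key][1])

    def list(self, prefix=""):
        """List keys in sorted order, optionally filtered by prefix."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key):
        """Remove an object; deleting a missing key is a no-op here."""
        self._objects.pop(key, None)
```

Because the namespace inside a bucket is flat, a "directory" is only a key prefix: `bucket.list("runs/")` returns every key that starts with `runs/`.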

Amazon S3 - Security
- Public key authentication + HMAC
- Access Control Lists for Buckets
- Logging for Buckets
- May use SSL
- Integrity: MD5
- No data encryption
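The two mechanisms named above, HMAC request signing and MD5 integrity checks, can be sketched with the Python standard library. This is a minimal illustration, not the S3 signing procedure: the message layout, the choice of SHA-1 as the HMAC hash, and the function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac


def sign(secret_key: bytes, message: bytes) -> str:
    """Return a base64 HMAC of `message` (SHA-1 is assumed here)."""
    digest = hmac.new(secret_key, message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, message), signature)


def md5_hex(data: bytes) -> str:
    """Checksum used to detect corruption in transit or at rest."""
    return hashlib.md5(data).hexdigest()
```

The secret key never travels with the request: the client sends only the signature, and the service, which also knows the key, recomputes it. A changed request or a changed payload then shows up as a signature or checksum mismatch.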

Amazon S3 - Disadvantages
- No renaming or moving of Buckets
- No content-based search
- No capping capabilities
- Cost

Amazon S3 - Costs
- Storage
  o $0.15 per GB-month of storage used
- Data Transfer
  o $0.10 per GB: all data transfer in
  o $0.18 per GB: first 10 TB / month data transfer out
  o $0.16 per GB: next 40 TB / month data transfer out
  o $0.13 per GB: data transfer out / month over 50 TB
- Requests
  o $0.01 per 1,000 PUT or LIST requests
  o $0.01 per 10,000 GET and all other requests
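To see how the tiered prices combine, here is a rough monthly cost estimate using the rates listed above. The function names are hypothetical, 1 TB is assumed to be 1024 GB, and requests are billed pro rata rather than rounded up to whole blocks; actual billing may differ on both points.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price per GB) for data transfer out, in order.
OUT_TIERS = [
    (10 * GB_PER_TB, 0.18),   # first 10 TB
    (40 * GB_PER_TB, 0.16),   # next 40 TB
    (float("inf"), 0.13),     # everything over 50 TB
]


def transfer_out_cost(gb):
    """Walk through the tiers, charging each slice at its own rate."""
    cost, remaining = 0.0, gb
    for tier_size, rate in OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(storage_gb, in_gb, out_gb, put_list_requests, other_requests):
    """Sum storage, transfer, and request charges for one month."""
    storage = storage_gb * 0.15
    transfer_in = in_gb * 0.10
    requests = (put_list_requests / 1000) * 0.01 + (other_requests / 10000) * 0.01
    return storage + transfer_in + transfer_out_cost(out_gb) + requests
```

Under these assumptions, storing 500 GB, uploading 100 GB, downloading 12 TB, and issuing 100,000 PUT and 1,000,000 GET requests in a month comes to about $2,257.88, of which $2,170.88 is transfer out. For read-heavy workloads, outbound transfer dominates the bill.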

References
[2] Baru, C.; Moore, R.; Rajasekar, A. & Wan, M. (1998), "The SDSC Storage Resource Broker", in CASCON '98: Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research, IBM Press, p. 5.
[3] Amazon S3:
[4] Aktas, M. S.; Fox, G. C. & Pierce, M. "Distributed High Performance Grid Information Service", Indiana University, 2007.
[5] Palankar, M.; Iamnitchi, A.; Ripeanu, M. & Garfinkel, S. "Amazon S3 for Science Grids: a Viable Solution?", International Workshop on Data-Aware Distributed Computing, 2008.
