Storage Systems in HPC
John A. Chandy
Department of Electrical and Computer Engineering
University of Connecticut

Research Summary
• Storage Systems
  – Active Storage
  – Parallel File Systems
  – Reliable Data Storage
  – Active Storage Networks

Storage Systems
• Parallel Computing
  – Building parallel file systems to support HPC
  – Computation at the storage node
  – Data organization methods to improve performance
• Reliable Data Storage
  – Customizable and extensible storage for reliability
  – Backup strategies using personal storage devices
  – Data security, trust, and reliability in the cloud

Parallel File Systems
• Network Attached Storage
  – Put the storage on the network, with a computer (server) acting as the go-between
[Figure: network diagram]

Parallel File Systems
• Separate the metadata from the storage
[Figure: network diagram with labels "Network" and "Metadata"]
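
To make the split concrete, here is a minimal Python sketch, assuming round-robin striping and a single metadata server. The class names, `STRIPE_SIZE`, and the object-id scheme are hypothetical, not taken from the slides. The metadata server only hands out layouts; clients move file data directly to and from the storage nodes.

```python
from dataclasses import dataclass, field

STRIPE_SIZE = 4  # bytes per stripe unit (hypothetical; tiny for illustration)


@dataclass
class StorageNode:
    """Stores object data by id; knows nothing about file names."""
    objects: dict = field(default_factory=dict)

    def write(self, oid: str, data: bytes) -> None:
        self.objects[oid] = data

    def read(self, oid: str) -> bytes:
        return self.objects[oid]


@dataclass
class MetadataServer:
    """Maps each path to an ordered list of (node index, object id) stripes."""
    layouts: dict = field(default_factory=dict)

    def allocate(self, path: str, size: int, num_nodes: int) -> list:
        n_stripes = (size + STRIPE_SIZE - 1) // STRIPE_SIZE
        # Round-robin: stripe k lives on node k mod num_nodes.
        self.layouts[path] = [(k % num_nodes, f"{path}#{k}") for k in range(n_stripes)]
        return self.layouts[path]

    def lookup(self, path: str) -> list:
        return self.layouts[path]


def client_write(mds, nodes, path: str, data: bytes) -> None:
    # One metadata call, then each stripe goes straight to its storage node.
    for k, (n, oid) in enumerate(mds.allocate(path, len(data), len(nodes))):
        nodes[n].write(oid, data[k * STRIPE_SIZE:(k + 1) * STRIPE_SIZE])


def client_read(mds, nodes, path: str) -> bytes:
    # File data never passes through the metadata server.
    return b"".join(nodes[n].read(oid) for n, oid in mds.lookup(path))


nodes = [StorageNode() for _ in range(3)]
mds = MetadataServer()
client_write(mds, nodes, "/data/a.txt", b"hello parallel fs")
print(client_read(mds, nodes, "/data/a.txt"))  # b'hello parallel fs'
```

With the data path off the metadata server, adding storage nodes adds bandwidth; the metadata server then becomes the component to scale, which is the question the next slide raises.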

Parallel File Systems
• How do you improve metadata performance?
  – Distribute metadata services on data nodes
  – Use active storage and object services
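
One common way to spread metadata over the data nodes is to hash each path to an owner node. The sketch below assumes simple modulo hash partitioning on the full path; the function name and cluster size are illustrative, and the slides do not say which distribution scheme was used.

```python
import hashlib
from collections import Counter


def metadata_owner(path: str, num_nodes: int) -> int:
    """Return the index of the data node that serves metadata for `path`.

    Uses a stable hash (SHA-1) rather than Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


NUM_DATA_NODES = 4  # hypothetical cluster size
paths = [f"/project/run{r}/out{i}.dat" for r in range(10) for i in range(100)]
load = Counter(metadata_owner(p, NUM_DATA_NODES) for p in paths)
print(sorted(load.items()))  # metadata entries per node, roughly even
```

Hashing the full path balances load, but a rename can move an entry to a different node; hashing on the parent directory instead keeps a directory's entries together at the cost of hot spots for very large directories.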

Active Storage
• Allows us to run applications on storage nodes
• Can dramatically reduce data traffic
  – Avoids large network latencies
• Takes advantage of fast RAID arrays and SSDs
  – Fast drives are otherwise bottlenecked by slow networks
• Runs applications in parallel across multiple nodes
• Makes use of unused processor time on storage nodes
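
The traffic argument can be illustrated by counting bytes. This toy Python sketch assumes a hypothetical object of newline-delimited JSON records and a selective filter; it compares shipping the whole object to the client against running the same filter on the storage node and shipping only the matches. It ignores protocol overhead and compute cost.

```python
import json

# Hypothetical object: 10,000 newline-delimited JSON sensor records.
RECORDS = [{"id": i, "temp": 20 + (i % 50)} for i in range(10_000)]
OBJECT = "\n".join(json.dumps(r) for r in RECORDS).encode()


def matching(obj: bytes, threshold: int) -> list:
    """Parse the object and keep records hotter than `threshold`."""
    return [r for r in map(json.loads, obj.splitlines()) if r["temp"] > threshold]


def pull_then_filter(obj: bytes, threshold: int):
    # Conventional path: the whole object crosses the network first.
    return matching(obj, threshold), len(obj)


def filter_at_storage(obj: bytes, threshold: int):
    # Active storage path: the filter runs on the storage node and
    # only the (much smaller) result crosses the network.
    hits = matching(obj, threshold)
    return hits, len(json.dumps(hits).encode())


hits_a, bytes_a = pull_then_filter(OBJECT, 68)
hits_b, bytes_b = filter_at_storage(OBJECT, 68)
print(len(hits_a), bytes_a, bytes_b)
```

Both paths return the same 200 records; the saving depends entirely on how selective the function is, so a function that returns as much data as it reads gains little from this alone.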

Programming Model
• Based on object storage
• RPC-based
  – Executable objects
  – RPC calls have full access to all object functions: read, write, create, set attribute, etc.
• Functions can be synchronous or asynchronous
• Supports multiple languages (C, Java, Python)
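
A minimal Python sketch of the executable-object idea, under clear assumptions: a Python callable stands in for the uploaded executable object, a direct method call stands in for the RPC, and all method names are hypothetical. What it shows is that the function runs next to the data and can use the same object functions (read, write, create, set attribute) that a client would.

```python
from typing import Callable, Dict


class ObjectStore:
    """Toy object-based storage node with active storage extensions."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.attrs: Dict[str, Dict[str, str]] = {}
        self.executables: Dict[str, Callable] = {}

    # Regular object functions, callable by clients and by AS functions.
    def create(self, oid: str) -> None:
        self.data[oid] = b""
        self.attrs[oid] = {}

    def write(self, oid: str, buf: bytes) -> None:
        self.data[oid] = buf

    def read(self, oid: str) -> bytes:
        return self.data[oid]

    def set_attr(self, oid: str, key: str, value: str) -> None:
        self.attrs[oid][key] = value

    # Active storage extensions: store an executable object, then run it.
    def create_executable(self, eid: str, fn: Callable) -> None:
        self.executables[eid] = fn

    def execute(self, eid: str, *args):
        # The AS function receives the store itself, so it can call read,
        # write, create, set_attr, etc. on the node where the data lives.
        return self.executables[eid](self, *args)


def uppercase_copy(store: ObjectStore, src: str, dst: str) -> int:
    """Example AS function: derive a new object from an existing one."""
    out = store.read(src).upper()
    store.create(dst)
    store.write(dst, out)
    store.set_attr(dst, "derived_from", src)
    return len(out)


osd = ObjectStore()
osd.create("raw")
osd.write("raw", b"hello active storage")
osd.create_executable("upper", uppercase_copy)
print(osd.execute("upper", "raw", "upper_copy"))  # 20
print(osd.read("upper_copy"))  # b'HELLO ACTIVE STORAGE'
```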

Programming Model
• Prior work by Acharya and Riedel is stream-based
• Our model is Remote Procedure Call (RPC) based
  – Uses executable objects
  – Adds a command to begin execution
  – Allows full access to all OSD functions
• Functions can be run synchronously or asynchronously
  – Motivated by the 30-second iSCSI timeout
  – Work in progress: allowing queries on async functions
• Allows parallel execution using async
• Supports multiple languages (C, Java, Python)
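
The sync/async split and status queries can be sketched with a thread pool standing in for each OSD's execution engine. This is an assumption-laden toy: `execute_async`, `query`, and the job-id scheme are hypothetical, and nothing here models iSCSI. The point is only that an asynchronous start returns at once, so a long-running function never holds a command open past a limit such as the 30-second iSCSI timeout, and a client can run jobs on several OSDs in parallel and poll for their results.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncOSD:
    """Toy OSD that runs AS functions in the background and answers status queries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}

    def execute_async(self, fn, *args) -> str:
        # Returns a job id immediately, so the request itself completes
        # long before a transport-level command timeout could expire.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def query(self, job_id: str):
        fut = self._jobs[job_id]
        return ("done", fut.result()) if fut.done() else ("running", None)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def slow_sum(n: int) -> int:
    time.sleep(0.05)  # stands in for a long-running computation
    return sum(range(n))


# Start one job per OSD so they run in parallel, then poll for completion.
osds = [AsyncOSD(f"osd{i}") for i in range(3)]
jobs = [(osd, osd.execute_async(slow_sum, 1000 * (i + 1))) for i, osd in enumerate(osds)]

results = {}
while len(results) < len(jobs):
    for osd, job_id in jobs:
        status, value = osd.query(job_id)
        if status == "done":
            results[osd.name] = value
    time.sleep(0.01)  # polling interval

for osd in osds:
    osd.close()
print(results)
```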

Security
• Multiprocess implementation
  – Prevents AS (active storage) functions from directly accessing objects
  – Limits access to the object services library
  – Enforces use of object security mechanisms
• chroot sandboxing
  – C/Java engines run in a chroot directory
  – Allows only a limited set of system libraries, e.g., libc
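
The object security mechanism itself is not described in detail. As an illustration only, here is a simplified capability check loosely modeled on the idea behind the T10 OSD security model: a security manager signs (object, permissions) with a key it shares with the device, and the object service library rejects any request whose capability fails verification. Since AS functions run in a separate process and reach objects only through that library, they cannot skip the check. The key, capability format, and names are hypothetical; this is not the T10 wire format.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical secret shared by the security manager and the OSD.
DEVICE_KEY = b"example-device-key"


def _sign(cap: str) -> bytes:
    return hmac.new(DEVICE_KEY, cap.encode("utf-8"), hashlib.sha256).digest()


def issue_capability(oid: str, perms: str) -> Tuple[str, bytes]:
    """Security manager: bind an object id to permissions and sign it."""
    cap = f"{oid}:{perms}"
    return cap, _sign(cap)


class SecureOSD:
    """Object service library: every request must carry a valid capability."""

    def __init__(self) -> None:
        self._objects = {"obj1": b"payload"}

    def read(self, oid: str, cap: str, tag: bytes) -> bytes:
        # Constant-time comparison so the tag cannot be guessed byte by byte.
        if not hmac.compare_digest(_sign(cap), tag):
            raise PermissionError("forged or modified capability")
        cap_oid, perms = cap.rsplit(":", 1)
        if cap_oid != oid or "r" not in perms:
            raise PermissionError("capability does not permit this read")
        return self._objects[oid]


osd = SecureOSD()
cap, tag = issue_capability("obj1", "r")
print(osd.read("obj1", cap, tag))  # b'payload'
```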

Security
• Multiprocess implementation
  – Prevents AS functions from directly accessing objects
  – Limits access to the OSD services library
    – Forces the use of RPC
  – Enforces the use of OSD security mechanisms
• chroot sandboxing
  – Applied to engines
  – Confines engines to a single directory
  – Allows the available libraries to be limited
    – AS versions of libraries are possible

Active Storage Code Example

Results: AES Local vs. Active Storage

Results: Scaling with Multiple OSDs

Results: C vs. Java

High Performance Computing
• Active storage network
  – Computing in the network
  – SIMD-like processing of data in motion
  – Adaptive computing network elements
  – Application optimizations for database queries, scientific applications, data mining, sort, etc.
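
"SIMD-like" processing can be read as one operation applied uniformly to many records while they pass through a network element. A minimal Python sketch, assuming records arrive in batches from a storage node; the batch size, record layout, and function names are hypothetical, and a real network element would do this in hardware or native code rather than Python loops.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[int, float]  # hypothetical (key, value) record


def network_element(batches: Iterable[List[Record]],
                    predicate: Callable[[Record], bool],
                    transform: Callable[[Record], Record]) -> Iterator[List[Record]]:
    """Apply the same filter and transform to every record of each batch in flight."""
    for batch in batches:
        out = [transform(r) for r in batch if predicate(r)]
        if out:  # empty batches are not forwarded
            yield out


def storage_stream(n_records: int, batch_size: int) -> Iterator[List[Record]]:
    """Records leaving a storage node, grouped into packet-sized batches."""
    for start in range(0, n_records, batch_size):
        yield [(k, k * 0.5) for k in range(start, min(start + batch_size, n_records))]


forwarded = list(network_element(
    storage_stream(1000, 64),
    predicate=lambda r: r[0] % 10 == 0,      # keep every tenth key
    transform=lambda r: (r[0], r[1] * 2.0),  # rescale the value
))
print(sum(len(b) for b in forwarded))  # 100
```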

Active Storage Networks: Data Sort
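
Only the title of this slide survives, so the following is one plausible reading rather than the design it showed: each storage node sorts its local partition, and a network element merges the sorted streams on their way to the client. The partition sizes and function names are hypothetical.

```python
import heapq
import random
from typing import Iterable, Iterator, List


def node_local_sort(values: List[int]) -> List[int]:
    """Each storage node sorts its own partition before sending it."""
    return sorted(values)


def in_network_merge(runs: Iterable[Iterable[int]]) -> Iterator[int]:
    """Network element: lazy k-way merge of sorted runs into one sorted stream."""
    return heapq.merge(*runs)


random.seed(0)
partitions = [[random.randint(0, 999) for _ in range(50)] for _ in range(4)]
runs = [node_local_sort(p) for p in partitions]
merged = list(in_network_merge(runs))
print(merged[:5])
```

`heapq.merge` keeps only one pending value per input stream, which is why a merge suits data in motion: the network element never needs to buffer a whole partition.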

BECAT Collaboration
• Large Data Problems
• Parallel File Systems Implementation