
1 Cplant I/O
Pang Chen, Lee Ward
Sandia National Laboratories, Scalable Computing Systems
Fifth NASA/DOE Joint PC Cluster Computing Conference, October 6-8, 1999

2 Conceptual Partition Model

3 File I/O Model
Support large-scale unstructured grid applications.
 – Manipulate a single file per application, not per processor.
Support collective I/O libraries.
 – Require fast concurrent writes to a single file (a minimal sketch follows this slide).
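To make the "fast non-overlapping concurrent writes" point concrete, here is a minimal sketch (not Cplant code) in which each process writes its own disjoint byte range of one shared file with pwrite(), so neither locking nor ordered appends are needed; the file name, rank argument, and block size are invented for illustration.

/* Minimal sketch (not Cplant code): each rank writes a disjoint byte
 * range of one shared file, so no locking or ordered append is needed.
 * File name, rank source, and block size are made up. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE (1 << 20)            /* 1 MiB per rank, for illustration */

int main(int argc, char **argv)
{
    int rank = (argc > 1) ? atoi(argv[1]) : 0;   /* rank passed on the command line */
    char *buf = malloc(BLOCK_SIZE);
    if (!buf)
        return 1;
    memset(buf, 'A' + (rank % 26), BLOCK_SIZE);

    int fd = open("shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Non-overlapping region: the offset is determined solely by rank. */
    off_t offset = (off_t)rank * BLOCK_SIZE;
    if (pwrite(fd, buf, BLOCK_SIZE, offset) != BLOCK_SIZE)
        perror("pwrite");

    close(fd);
    free(buf);
    return 0;
}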

4 Problems
Need a file system NOW!
Need scalable, parallel I/O.
Need file management infrastructure.
Need to present the I/O subsystem as a single parallel file system both internally and externally.
Need production-quality code.

5 Approaches
Provide independent access to file systems on each I/O node.
 – Can't stripe across multiple I/O nodes to get better performance.
Add a file management layer to "glue" the independent file systems together so as to present a single file view.
 – Requires users (both on and off Cplant) to differentiate between this "special" file system and other "normal" file systems.
 – Lots of special utilities are required.
Build our own parallel file system from scratch.
 – A lot of work just to reinvent the wheel, let alone the right wheel.
Port other parallel file systems to Cplant.
 – Also a lot of work, with no immediate payoff.

6 Current Approach
Build our I/O partition as a scalable nexus between Cplant and external file systems.
 + Leverage existing and future parallel file systems.
 + Allow immediate payoff, with Cplant accessing existing file systems.
 + Reduce data storage, copies, and management.
 – Expect lower performance with non-local file systems.
 – Waste external bandwidth when accessing scratch files.

7 Building the Nexus
Semantics
 – How can and should the compute partition use this service?
Architecture
 – What are the components, and what protocols run between them?
Implementation
 – What do we have now, and what do we hope to achieve in the future?

8 Compute Partition Semantics
POSIX-like.
 – Allow users to be in a familiar environment.
No support for ordered operations (e.g., no O_APPEND). No support for data locking.
 – Enable fast non-overlapping concurrent writes to a single file.
 – Prevent a job from slowing down the entire system for others.
Additional call to invalidate the buffer cache (sketched below).
 – Allow file views to synchronize when required.
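The slides do not specify the interface for the extra cache-invalidation call, so the following is a hypothetical sketch of what such a call might look like from a compute node; the ioctl name and request code are invented for illustration.

/* Hypothetical sketch of the "invalidate buffer cache" call mentioned
 * above; the real Cplant interface is not given in the slides, so the
 * ioctl name and request code below are made up. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define CPLANT_IOC_INVALIDATE_CACHE 0x4321   /* made-up request code */

/* Drop any cached pages for fd so a subsequent read sees data written
 * by other compute nodes; called only when the application needs to
 * synchronize its view of the file. */
static int cplant_invalidate(int fd)
{
    return ioctl(fd, CPLANT_IOC_INVALIDATE_CACHE, 0);
}

int main(void)
{
    int fd = open("shared.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (cplant_invalidate(fd) < 0)
        perror("ioctl");                     /* expected to fail outside Cplant */
    close(fd);
    return 0;
}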

9 Cplant I/O
(Diagram slide; labels: I/O, Enterprise Storage Services.)

10 Architecture
I/O nodes present a symmetric view.
 – Every I/O node behaves the same (except for the cache).
 – Without any control, a compute node may open a file with one I/O node and write that file via another I/O node (illustrated after this slide).
I/O partition is fault-tolerant and scalable.
 – Any I/O node can go down without the system losing jobs.
 – An appropriate number of I/O nodes can be added to scale with the compute partition.
I/O partition is the nexus for all file I/O.
 – It provides our POSIX-like semantics to the compute nodes and accomplishes tasks on their behalf outside the compute partition.
Links/protocols to external storage servers are server dependent.
 – External implementation is hidden from the compute partition.
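A rough sketch of why the symmetric view matters: with stateless I/O nodes, a compute node is free to send each request to any node. The hash policy below is purely illustrative; the slides do not say how Cplant actually routes requests.

/* Illustration only: with a stateless protocol, a compute node can
 * route each request to any I/O node.  The hash policy is made up;
 * the point is that no I/O node holds state the others lack. */
#include <stdint.h>
#include <stdio.h>

#define NUM_IO_NODES 3

/* Pick an I/O node from a file identifier and request offset. */
static int pick_io_node(uint64_t file_id, uint64_t offset)
{
    return (int)((file_id ^ (offset >> 20)) % NUM_IO_NODES);
}

int main(void)
{
    uint64_t file_id = 42;
    /* Two requests on the same file may legitimately go to different
     * I/O nodes; either one can serve them. */
    printf("open  -> I/O node %d\n", pick_io_node(file_id, 0));
    printf("write -> I/O node %d\n", pick_io_node(file_id, 8 << 20));
    return 0;
}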

11 Compute -- I/O node protocol
Base protocol is NFS version 2.
 – Stateless protocols allow us to repair faulty I/O nodes without aborting applications.
 – Inefficiency/latency between the two partitions is currently moot; the bottleneck is not here.
Extensions/modifications (sketched after this slide):
 – Larger I/O requests.
 – Propagation of a call to invalidate the cache on the I/O node.
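A rough sketch of what the two extensions might look like on the wire, assuming the stock NFS v2 limits (32-byte file handles, 8 KiB maximum transfer per request); the field names, the 1 MiB transfer size, and the extra procedure number are invented for illustration.

/* Sketch of the two protocol extensions listed above, layered on
 * NFS v2: a write request whose count may exceed the usual 8 KiB v2
 * transfer limit, and an extra procedure to propagate the
 * buffer-cache invalidation to the I/O node.  All names and numbers
 * here are invented, not taken from the Cplant sources. */
#include <stdint.h>
#include <stdio.h>

#define NFS2_FHSIZE            32          /* opaque NFS v2 file handle size */
#define CPLANT_MAXDATA         (1 << 20)   /* assumed 1 MiB I/O, vs. 8 KiB in stock v2 */
#define CPLANTPROC_INVALIDATE  18          /* made-up procedure number */

struct cplant_writeargs {
    uint8_t  fhandle[NFS2_FHSIZE];         /* stateless: handle identifies the file */
    uint64_t offset;                       /* widened from v2's 32-bit offset */
    uint32_t count;                        /* may be up to CPLANT_MAXDATA */
    /* followed by `count` bytes of data on the wire */
};

struct cplant_invalidateargs {
    uint8_t  fhandle[NFS2_FHSIZE];         /* drop cached pages for this file */
};

int main(void)
{
    printf("write args: %zu bytes, invalidate args: %zu bytes\n",
           sizeof(struct cplant_writeargs), sizeof(struct cplant_invalidateargs));
    return 0;
}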

12 Current Implementation
Basic implementation of the I/O nodes.
Have straight NFS inside Linux with the ability to invalidate the cache.
I/O nodes have no cache.
I/O nodes are dumb proxies knowing about only one server.
Credentials are rewritten by the I/O nodes and sent to the server as if the requests came from the I/O nodes (see the sketch after this slide).
I/O nodes are attached via 100BaseT links to a Gigabit Ethernet, with an SGI O2K as the (XFS) file server on the other end.
Don't have jumbo packets.
Bandwidth is about 30 MB/s with 18 clients driving 3 I/O nodes, each using about 15% of CPU.
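A sketch of the credential rewriting described above: the proxy replaces the compute node's AUTH_UNIX credential with the I/O node's own identity before forwarding, so the external server only ever sees requests coming from the I/O node. The struct layout and identities are simplified and not taken from the Cplant sources.

/* Simplified sketch of the proxy behavior: overwrite the incoming
 * AUTH_UNIX-style credential with the I/O node's own identity before
 * forwarding the request to the external server.  Field layout, uids,
 * and host names are illustrative only. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct authunix_cred {
    uint32_t uid;
    uint32_t gid;
    char     machinename[32];
};

/* Replace the compute node's credential with the I/O node's own. */
static void rewrite_cred(struct authunix_cred *cred,
                         uint32_t io_node_uid, uint32_t io_node_gid,
                         const char *io_node_name)
{
    cred->uid = io_node_uid;
    cred->gid = io_node_gid;
    strncpy(cred->machinename, io_node_name, sizeof(cred->machinename) - 1);
    cred->machinename[sizeof(cred->machinename) - 1] = '\0';
}

int main(void)
{
    struct authunix_cred cred = { 1234, 1234, "compute-17" };
    rewrite_cred(&cred, 500, 500, "io-node-2");
    printf("forwarded as uid=%u gid=%u host=%s\n",
           cred.uid, cred.gid, cred.machinename);
    return 0;
}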

13 Current Improvements
Put a VFS infrastructure into the I/O node daemon (see the sketch after this slide).
 – Allow access to multiple servers.
 – Allow a Linux /proc interface to tune individual I/O nodes quickly and easily.
 – Allow vnode identification to associate buffer cache with files.
Experiment with a multi-node server (SGI/CXFS).
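One plausible shape for "a VFS infrastructure in the I/O node daemon" is an operations table per back-end server plus a vnode that records which remote file it refers to, so cache entries can be tagged by file. All names below are illustrative and not from the Cplant daemon.

/* Illustrative sketch of a daemon-level VFS layer: each back-end
 * server type supplies an operations table, and each open remote file
 * is a vnode carrying a stable identity that cache entries can be
 * associated with.  Names are invented for this sketch. */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct vnode;

struct vnode_ops {
    ssize_t (*read)(struct vnode *vp, void *buf, size_t len, off_t off);
    ssize_t (*write)(struct vnode *vp, const void *buf, size_t len, off_t off);
    int     (*invalidate)(struct vnode *vp);     /* drop cached pages */
};

struct vnode {
    const struct vnode_ops *ops;    /* which back-end server handles this file */
    uint64_t  file_id;              /* stable identity used to tag cache entries */
    void     *server_private;       /* per-server connection state */
};

/* The daemon dispatches through the table, so supporting a second
 * server type means adding another vnode_ops instance, not new call
 * sites. */
ssize_t vn_read(struct vnode *vp, void *buf, size_t len, off_t off)
{
    return vp->ops->read(vp, buf, len, off);
}

int main(void)
{
    return 0;
}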

14 Future Improvements
Stop retries from going out of the network.
Put in jumbo packets.
Put in a read cache.
Put in a write cache.
Port over Portals 3.0.
Put in bulk data services.
Allow dynamic compute-node-to-I/O-node mapping.

15 Looking for Collaborations
Lee Ward
Pang Chen