Massive High-Performance Global File Systems for Grid Computing - by Phil Andrews, Patricia Kovatch, Christopher Jordan - presented by Han S Kim.


Massive High-Performance Global File Systems for Grid Computing
By Phil Andrews, Patricia Kovatch, Christopher Jordan
Presented by Han S Kim

Outline
I. Introduction
II. GFS via Hardware Assist: SC’02
III. Native WAN-GFS: SC’03
IV. True Grid Prototype: SC’04
V. Production Facility: 2005
VI. Future Work

I. Introduction

1. Introduction - The Original Mode of Operation for Grid Computing
- Submit the user’s job to the ubiquitous grid.
- The job would run on the most appropriate computational platform available.
- Any data required for the computation would be moved to the chosen compute facility’s local disk.
- Output data would be written to the same disk.
- The normal utility used for the data transfer would be GridFTP (a hedged sketch follows).
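For illustration only, a minimal sketch of staging data with GridFTP by shelling out to the Globus Toolkit’s globus-url-copy client; the host names and paths are hypothetical, and -p simply requests parallel TCP streams.

    import subprocess

    def stage_in(src_url: str, dest_url: str, streams: int = 8) -> None:
        # Copy src_url to dest_url using several parallel TCP streams (-p).
        subprocess.run(
            ["globus-url-copy", "-p", str(streams), src_url, dest_url],
            check=True,
        )

    # e.g. pull an input dataset onto the chosen compute site's local scratch disk
    # (hypothetical endpoints):
    stage_in("gsiftp://data.example.org/nvo/input.dat",
             "file:///scratch/user/input.dat")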

1. Introduction - In Grid Supercomputing
- The data sets used are very large.
  - The National Virtual Observatory dataset, approximately 50 Terabytes, is used as input by several applications.
- Some applications write very large amounts of data.
  - The Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run.
- Other applications require extremely high I/O rates.
  - The Enzo application, an AMR cosmological simulation code, routinely writes and reads multiple Terabytes per hour.

1. Introduction - Concerns about Grid Supercomputing
- The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used.
- These sizes and the required transfer rates are not conducive to routine wholesale migration of input and output data between grid sites (a back-of-envelope estimate follows).
- The compute system may not have enough room for a required dataset or its output data.
- The necessary transfer rates may not be achievable.
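A back-of-envelope check of the dataset sizes quoted above, assuming an idealized, fully utilized 10 Gb/s WAN link with no protocol overhead:

    def transfer_hours(terabytes: float, link_gbps: float = 10.0) -> float:
        # Hours needed to move `terabytes` of data over a `link_gbps` link,
        # assuming perfect utilization and no protocol overhead.
        bits = terabytes * 1e12 * 8
        return bits / (link_gbps * 1e9) / 3600

    print(f"NVO input, 50 TB:    ~{transfer_hours(50):.0f} hours")    # ~11 hours
    print(f"SCEC output, 250 TB: ~{transfer_hours(250):.0f} hours")   # ~56 hours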

1. Introduction - In This Paper
The authors show:
- How a Global File System (GFS), in which direct file I/O operations can be performed across a WAN, can obviate these concerns.
- A series of large-scale demonstrations.

II. GFS via Hardware Assist: SC’02

2. GFS via Hardware Assist: SC’02 - At That Time
- Global File Systems were still in the concept stage.
- Two concerns:
  - The latencies involved in a widespread network such as the TeraGrid.
  - The file systems did not yet have the capability of being exported across a WAN.

2. GFS via Hardware Assist: SC’02 - Approach
- Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP).
- FCIP is an Internet Protocol-based storage networking technology developed by the IETF.
- FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks (a simplified sketch follows).
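A deliberately simplified sketch of the tunneling idea: Fibre Channel frames are carried as length-delimited payloads inside a TCP/IP byte stream between two SAN gateways. The real FCIP encapsulation (RFC 3821) adds protocol/version fields, flags, CRCs, and timestamps that are omitted here.

    import socket
    import struct

    def send_fc_frame(sock: socket.socket, fc_frame: bytes) -> None:
        # Prefix each Fibre Channel frame with its length so the receiving
        # gateway can re-frame the TCP byte stream into discrete FC frames.
        sock.sendall(struct.pack("!I", len(fc_frame)) + fc_frame)

    def recv_fc_frame(sock: socket.socket) -> bytes:
        (length,) = struct.unpack("!I", _recv_exact(sock, 4))
        return _recv_exact(sock, length)

    def _recv_exact(sock: socket.socket, n: int) -> bytes:
        # Read exactly n bytes from the TCP stream.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-frame")
            buf += chunk
        return buf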

2. GFS via Hardware Assist: SC’02 - The Goal of This Demo
- That year, the annual Supercomputing conference was held in Baltimore.
- The distance between the show floor and San Diego is greater than any within the TeraGrid.
- This was the perfect opportunity to demonstrate whether latency effects would eliminate any chance of a successful GFS at that distance.

2. GFS via Hardware Assist: SC’02 - Hardware Configuration between San Diego and Baltimore
(Diagram.) Each endpoint had a Force10 GbE switch, a Nishan 4000, and a Brocade Fibre Channel switch; the Nishan boxes encoded and decoded Fibre Channel frames into IP packets for transmission and reception. Two 4xGbE channels at each site fed the 10 Gb/s WAN formed by the TeraGrid backbone and SCinet. The configuration also included a Sun SF6800, a 17 TB FC disk cache, and 6 PB of silos and tape drives.

2. GFS via Hardware Assist: SC’02 - SC’02 GFS Performance between SDSC and Baltimore
- 720 MB/s over the 80 ms round-trip SDSC-Baltimore path.
- Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP.

III. Native WAN-GFS: SC’03

3. Native WAN-GFS: SC’03 - Issue and Approach
- Issue: whether Global File Systems were possible without hardware FCIP encoding.
- SC’03 was the chance to use pre-release software from IBM’s General Parallel File System (GPFS):
  - A true wide-area-enabled file system.
  - Shared-disk architecture.
  - Files are striped across all disks in the file system, giving parallel access to file data and metadata (illustrated below).
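An illustrative sketch of the striping idea only (not GPFS’s actual block allocator): consecutive file-system blocks are mapped round-robin across the shared disks, so clients can drive many disks in parallel. The block size and disk count below are arbitrary assumptions.

    BLOCK_SIZE = 1 << 20   # assume a 1 MiB file-system block for illustration

    def block_to_disk(byte_offset: int, num_disks: int) -> int:
        """Map a file byte offset to the shared disk holding that block (round-robin)."""
        block_index = byte_offset // BLOCK_SIZE
        return block_index % num_disks

    # Consecutive 1 MiB blocks of a file land on disks 0, 1, 2, ... in turn:
    print([block_to_disk(i * BLOCK_SIZE, num_disks=32) for i in range(6)])  # [0, 1, 2, 3, 4, 5]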

3. Native WAN-GFS: SC’03 - WAN-GPFS Demonstration
- The central GFS: 40 two-processor IA64 nodes, which provided sufficient bandwidth to saturate the 10GbE link to the TeraGrid.
- Each server had a single FC HBA and GbE connectors.
- The file system was served across the WAN to SDSC and NCSA (a sketch of a remote GPFS mount follows).
- The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and to visualize it at both SDSC and NCSA.
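The SC’03 demonstration used pre-release GPFS, so the exact procedure is not given here; as a hedged sketch, the multi-cluster feature that later shipped with GPFS exposes a remote file system roughly as below. Cluster, node, device, and mount-point names are hypothetical, and the commands are only printed, not executed.

    import subprocess

    def mount_remote_gpfs(dry_run: bool = True) -> None:
        steps = [
            # Register the serving cluster and two of its contact nodes.
            ["mmremotecluster", "add", "gfs.showfloor.example.org", "-n", "nsd1,nsd2"],
            # Define a local device for the remote file system and its mount point.
            ["mmremotefs", "add", "wangfs", "-f", "gpfs0",
             "-C", "gfs.showfloor.example.org", "-T", "/gpfs-wan"],
            # Mount it on every node of the local cluster.
            ["mmmount", "wangfs", "-a"],
        ]
        for cmd in steps:
            print("+", " ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)

    mount_remote_gpfs()   # dry run: prints the commands only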

3. Native WAN-GFS: SC’03 - Bandwidth Results at SC’03
The visualization application terminated normally whenever it ran out of data, and was then restarted.

3. Native WAN-GFS: SC’03 - Bandwidth Results at SC’03
Over a 10 Gb/s link, the peak transfer rate was almost 9 Gb/s, and over 1 GB/s was easily sustained.

IV. True Grid Prototype: SC’04

4. True Grid Prototype: SC’04 - The Goal of This Demonstration
- To implement a true grid prototype of what a GFS node on the TeraGrid would look like.
- One of the possible dominant modes of operation for grid supercomputing: the output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole.
- The Enzo application:
  - Writes on the order of a Terabyte per hour: enough for the 30 Gb/s TeraGrid connection.
  - With post-processing visualization, the authors could check how quickly the GFS could provide data in such a scenario.
  - Ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh.

4. True Grid Prototype: SC’04 - Prototype Grid Supercomputing at SC’04
(Diagram of the prototype configuration; the links shown are 30 Gb/s and 40 Gb/s.)

4. True Grid Prototype: SC’04 - Transfer Rates
- Three 10 Gb/s connections between the show floor and the TeraGrid backbone.
- The aggregate performance: 24 Gb/s.
- The momentary peak: over 27 Gb/s.
- The rates were remarkably constant.
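For scale, converting the aggregate rate into data volume per hour (assuming the 24 Gb/s were held for the full hour):

    rate_gbps = 24
    gbytes_per_s = rate_gbps / 8                  # 3 GB/s
    tb_per_hour = gbytes_per_s * 3600 / 1000      # ~10.8 TB moved per hour
    print(f"~{tb_per_hour:.1f} TB/hour")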

V. Production Facility: 2005

5. Production Facility: The Need for Large Disk
- By this time, the size of datasets had become large.
  - The NVO dataset was 50 Terabytes per location, which was a noticeable strain on storage resources.
  - If a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could then access it in an efficient manner.
- Therefore, a very large amount of spinning disk would be required.
- Approximately 0.5 Petabytes of Serial ATA disk drives were acquired by SDSC.

5. Production Facility: Network Organization
(Diagram: 0.5 Petabytes of FastT100 disk at SDSC, served across the WAN to NCSA and ANL.)
- The Network Shared Disk (NSD) servers are 64 two-way IBM IA64 systems, each with a single GbE interface and a 2 Gb/s Fibre Channel Host Bus Adapter.
- The disks are 32 IBM FastT100 DS4100 RAID systems with 67 x 250 GB drives in each.
- The total raw storage is 32 x 67 x 250 GB = 536 TB.

5. Production Facility: Serial ATA Disk Arrangement
(Diagram: the drives are organized into 8+P RAID sets, attached over 2 Gb/s FC connections.)
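A rough capacity estimate from the figures on the two slides above, assuming every drive sits in an 8+P (8 data + 1 parity) set and ignoring any hot spares:

    arrays, drives_per_array, drive_gb = 32, 67, 250
    raw_tb = arrays * drives_per_array * drive_gb / 1000   # 536 TB raw
    usable_tb = raw_tb * 8 / 9                             # ~476 TB after 8+P parity
    print(f"raw ~{raw_tb:.0f} TB, usable ~{usable_tb:.0f} TB")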

5. Production Facility: Performance Scaling
(Plot of aggregate transfer rate versus the number of remote nodes.) Maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.
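The slides do not say where the 8 GB/s ceiling comes from; one plausible reading, stated here only as an assumption, is the aggregate GbE bandwidth of the 64 NSD servers:

    # Assumption (not stated on the slide): the ceiling is 64 NSD servers,
    # each limited by a single 1 Gb/s Ethernet link.
    servers, link_gbps = 64, 1
    ceiling_gbytes_per_s = servers * link_gbps / 8   # 8 GB/s
    print(f"theoretical ceiling ~{ceiling_gbytes_per_s:.0f} GB/s; observed peak ~6 GB/s")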

5. Production Facility: Performance Scaling
- The observed discrepancy between read and write rates is not yet understood.
- However, the dominant usage of the GFS is expected to be remote reads.

VI. Future Work

6. Future Work
- Next year (2006), the authors hope to connect to the DEISA computational grid in Europe, which is planning a similar approach to grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system.
- The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of grid computing, outweighing even the sharing or combination of compute resources.

Thank you!