
1 Massive High-Performance Global File Systems for Grid Computing, by Phil Andrews, Patricia Kovatch, and Christopher Jordan. Presented by Han S Kim.

2 Outline: I. Introduction; II. GFS via Hardware Assist: SC'02; III. Native WAN-GFS: SC'03; IV. True Grid Prototype: SC'04; V. Production Facility: 2005; VI. Future Work.

3 I. Introduction

4 1. Introduction - The Original Mode of Operation for Grid Computing: Submit the user's job to the ubiquitous grid. The job would run on the most appropriate computational platform available. Any data required for the computation would be moved to the chosen compute facility's local disk, and output data would be written to the same disk. The normal utility used for the data transfer would be GridFTP.
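As a concrete illustration of this stage-in / compute / stage-out pattern, here is a minimal Python sketch. The gsiftp:// URLs, paths, and job step are placeholders; globus-url-copy is the usual GridFTP command-line client, but site-specific options are omitted.

import subprocess

def gridftp_copy(src, dst):
    # Basic GridFTP transfer of one URL to another via the globus-url-copy client.
    subprocess.run(["globus-url-copy", src, dst], check=True)

# 1. Stage the required input data onto the compute facility's local disk.
gridftp_copy("gsiftp://archive.example.org/nvo/input.dat",
             "file:///scratch/job42/input.dat")

# 2. Run the job against the local copy (site-specific job submission omitted).

# 3. Stage the output data back to the archive when the job completes.
gridftp_copy("file:///scratch/job42/output.dat",
             "gsiftp://archive.example.org/nvo/output.dat")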

5 1. Introduction - In Grid Supercomputing: The data sets used are very large. The National Virtual Observatory dataset consists of approximately 50 Terabytes and is used as input by several applications. Some applications write very large amounts of data: the Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run. Other applications require extremely high I/O rates: the Enzo application, an AMR cosmological simulation code, routinely writes and reads multiple Terabytes per hour.
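A quick back-of-the-envelope Python calculation (decimal units assumed) shows why these sizes matter: even a steady Terabyte per hour is a multi-Gb/s stream, and a 250 Terabyte run would monopolize a 10 Gb/s link for more than two days.

TB = 1e12                     # bytes, decimal Terabyte

def tb_per_hour_to_gbps(tb_per_hour):
    # Sustained rate in Gb/s implied by a given write rate in TB/hour.
    return tb_per_hour * TB * 8 / 3600 / 1e9

print(tb_per_hour_to_gbps(1))      # ~2.2 Gb/s for one Terabyte per hour
print(250 * TB * 8 / 10e9 / 3600)  # ~55.6 hours to move 250 TB over a fully used 10 Gb/s link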

6 1. Introduction - Concerns about Grid Supercomputing: The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used. These sizes and the required transfer rates are not conducive to routine migration of wholesale input and output data between grid sites. The computation site may not have enough room for a required dataset or its output data, and the necessary transfer rates may not be achievable.

7 1. Introduction - In This Paper: Show, through a series of large-scale demonstrations, how a Global File System (GFS), where direct file I/O operations can be performed across a WAN, can obviate these concerns.

8 II. GFS via Hardware Assist: SC'02

9 2. GFS via Hardware Assist: SC'02 - At That Time: Global File Systems were still in the concept stage. There were two concerns: the latencies involved in a widespread network such as the TeraGrid, and the fact that file systems did not yet have the capability of being exported across a WAN.

10 2. GFS via Hardware Assist: SC'02 - Approach: Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP), an Internet Protocol-based storage networking technology developed by the IETF. FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks.
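The Python sketch below is only a conceptual illustration of the tunneling idea (encapsulate a Fibre Channel frame, carry it over TCP/IP, re-frame it at the far gateway); it is not the actual FCIP wire format defined by the IETF.

import struct

def _recv_exact(sock, n):
    # Read exactly n bytes from the TCP tunnel.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("tunnel closed")
        buf += chunk
    return buf

def send_fc_frame(sock, fc_frame):
    # Encapsulate: prefix the Fibre Channel frame with its length and ship it over IP.
    sock.sendall(struct.pack("!I", len(fc_frame)) + fc_frame)

def recv_fc_frame(sock):
    # Decapsulate: recover the original frame and hand it back to the local FC fabric.
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)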

11 2. GFS via Hardware Assist: SC'02 - The Goal of This Demo: That year, the annual Supercomputing conference was in Baltimore, and the distance between the show floor and San Diego is greater than any within the TeraGrid. This was the perfect opportunity to demonstrate whether latency effects would eliminate any chance of a successful GFS at that distance.

12 2. GFS via Hardware Assist: SC'02 - Hardware Configuration between San Diego and Baltimore (diagram): San Diego and Baltimore are linked over the 10 Gb/s TeraGrid backbone / SCinet WAN by two 4GbE channels. At each end, a Force10 GbE switch, a Nishan 4000 (which encoded and decoded Fibre Channel frames into IP packets for transmission and reception), and a Brocade 12000 Fibre Channel switch connect the local equipment, including a Sun SF6800, a 17 TB FC disk cache, and 6 PB of silos and tape drives.

13 2. GFS via Hardware Assist: SC'02 - SC'02 GFS Performance between SDSC and Baltimore: 720 MB/s over the 80 ms round-trip SDSC-Baltimore path. Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP.
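The 80 ms round trip matters for TCP because of the bandwidth-delay product; a rough Python calculation of the in-flight data needed to sustain 720 MB/s at that latency:

rate_bytes_per_s = 720e6   # observed 720 MB/s
rtt_s = 0.080              # San Diego - Baltimore round trip
bdp_bytes = rate_bytes_per_s * rtt_s
print(bdp_bytes / 1e6)     # ~57.6 MB must be in flight (large windows or many parallel streams)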

14 III. Native WAN-GFS: SC'03

15 3. Native WAN-GFS: SC'03 - Issue and Approach: The issue was whether Global File Systems were possible without hardware FCIP encoding. SC'03 was the chance to use pre-release software from IBM's General Parallel File System (GPFS): a true wide-area-enabled file system with a shared-disk architecture, in which files are striped across all disks in the file system, giving parallel access to file data and metadata.
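A simplified model of that striping, assuming plain round-robin placement of fixed-size blocks across the disks (not GPFS's actual block-allocation algorithm), shows how a single large file ends up being served by every disk in parallel:

def block_location(file_offset, block_size, num_disks):
    # Map a byte offset in the file to (disk index, block slot on that disk).
    block_index = file_offset // block_size
    return block_index % num_disks, block_index // num_disks

# A 4 MiB-aligned read at offset 256 MiB on a 32-disk file system:
print(block_location(256 * 2**20, 4 * 2**20, 32))   # (0, 2): disk 0, third block slot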

16 3. Native WAN-GFS: SC'03 - WAN-GPFS Demonstration: The central GFS consisted of 40 two-processor IA64 nodes, which provided sufficient bandwidth to saturate the 10GbE link to the TeraGrid. Each server had a single FC HBA and GbE connections, and served the file system across the WAN to SDSC and NCSA. The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and to visualize it at both SDSC and NCSA.
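A rough capacity check under the stated configuration (40 servers, one 1 Gb/s GbE interface each, an assumption for the arithmetic) confirms the servers collectively have far more than the 10 Gb/s the WAN link can carry:

servers = 40
gbe_per_server_gbps = 1.0
aggregate_gbps = servers * gbe_per_server_gbps
print(aggregate_gbps)        # 40 Gb/s of potential server-side bandwidth
print(aggregate_gbps >= 10)  # True: more than enough to saturate the 10GbE WAN link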

17 3. Native WAN-GFS: SC'03 - Bandwidth Results at SC'03 (plot): The visualization application terminated normally as it ran out of data and was restarted.

18 3. Native WAN-GFS: SC'03 - Bandwidth Results at SC'03 (plot): Over a link with a maximum bandwidth of 10 Gb/s, the peak transfer rate was almost 9 Gb/s, and over 1 GB/s was easily sustained.

19 IV. True Grid Prototype: SC'04

20 4. True Grid Prototype: SC'04 - The Goal of This Demonstration: To implement a true grid prototype of what a GFS node on the TeraGrid would look like, exercising one of the possible dominant modes of operation for grid supercomputing: the output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole. The Enzo application writes on the order of a Terabyte per hour, enough for a 30 Gb/s TeraGrid connection; with the post-processing visualization, they could check how quickly the GFS could provide data in this scenario. Enzo ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh.

21 4. True Grid Prototype: SC'04 - Prototype Grid Supercomputing at SC'04 (diagram): 30 Gb/s and 40 Gb/s links shown.

22 4. True Grid Prototype: SC'04 - Transfer Rates: Over three 10 Gb/s connections between the show floor and the TeraGrid backbone, the aggregate performance was 24 Gb/s, with a momentary peak of over 27 Gb/s, and the rates were remarkably constant.
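The link utilization implied by the numbers quoted above, as a quick check:

link_capacity_gbps = 3 * 10          # three 10 Gb/s connections
print(24 / link_capacity_gbps)       # 0.8 -> 80% sustained utilization
print(27 / link_capacity_gbps)       # 0.9 -> 90% at the momentary peak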

23 V. Production Facility: 2005

24 5. Production Facility: 2005 - The Need for Large Disk: By this time, the size of datasets had become large. The NVO dataset was 50 Terabytes per location, which was a noticeable strain on storage resources; if a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could access it in an efficient manner. Therefore, a very large amount of spinning disk would be required, and approximately 0.5 Petabytes of Serial ATA disk drives was acquired by SDSC.

25 5. Production Facility: 2005 - Network Organization (diagram shows the 0.5 Petabyte FastT100 disk and connections to NCSA and ANL): The Network Shared Disk (NSD) servers are 64 two-way IBM IA64 systems, each with a single GbE interface and a 2 Gb/s Fibre Channel Host Bus Adapter. The disks are 32 IBM FastT100 DS4100 RAID systems with 67 250 GB drives in each, for a total raw storage of 32 x 67 x 250 GB = 536 TB.
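The raw-capacity arithmetic, spelled out (decimal GB/TB):

arrays = 32
drives_per_array = 67
drive_gb = 250
print(arrays * drives_per_array * drive_gb / 1000)  # 536.0 TB raw, i.e. roughly the 0.5 Petabytes quoted above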

26 5. Production Facility: 2005 - Serial ATA Disk Arrangement (diagram): 2 Gb/s FC connections; 8+P RAID.
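Assuming "8+P" means eight data disks plus one parity disk per RAID group, the usable fraction works out as below; this is an assumption-driven estimate, not a figure given in the paper.

data_disks, parity_disks = 8, 1
usable_fraction = data_disks / (data_disks + parity_disks)
print(round(usable_fraction, 3))     # 0.889
print(round(536 * usable_fraction))  # ~476 TB usable out of the 536 TB raw (estimate)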

27 5. Production Facility: 2005 - Performance Scaling (plot: bandwidth versus the number of remote nodes): A maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.
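One plausible reading of the 8 GB/s theoretical ceiling is the 64 NSD servers with one GbE interface each (64 Gb/s total); under that assumption, the observed 6 GB/s is roughly 75% of peak:

servers, gbe_gbps = 64, 1.0
theoretical_gb_per_s = servers * gbe_gbps / 8    # 64 Gb/s == 8 GB/s
print(theoretical_gb_per_s)                      # 8.0
print(6.0 / theoretical_gb_per_s)                # 0.75 -> ~75% of the theoretical peak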

28 5. Production Facility: 2005 - Performance Scaling: The observed discrepancy between read and write rates is not yet understood; however, the dominant usage of the GFS is expected to be remote reads.

29 VI. Future Work

30 6. Future Work: Next year (2006), the authors hope to connect to the DEISA computational grid in Europe, which is planning a similar approach to grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system. The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of Grid computing, outweighing even the sharing or combination of compute resources.

31 Thank you!

