A Study of Caching in Parallel File Systems. Dissertation Proposal. Brad Settlemyer.


A Study of Caching in Parallel File Systems
Dissertation Proposal
Brad Settlemyer

2 Trends in Scientific Research
Scientific inquiry is now information intensive
– Astronomy, Biology, Chemistry, Climatology, and Particle Physics all utilize massive data sets
Data sets under study are often very large
– Genomics databases (50 TB and growing)
– Large Hadron Collider (15 PB/yr)
Time spent manipulating data often exceeds time spent performing calculations
– Checkpointing I/O demands are particularly problematic

3 Typical Scientific Workflow
1. Acquire data
– Observational data (sensor-based, telescope, etc.)
– Information data (gene sequences, protein folding)
2. Stage/reorganize data to a fast file system
– Archive retrieval
– Filtering extraneous data
3. Process data (e.g., feature extraction)
4. Output results data
5. Reorganize data for visualization
6. Visualize data

4 Trends in Supercomputing
CPU performance is increasing faster than disk performance
– Multicore CPUs and increased intra-node parallelism
Main memories are large
– 4 GB cost < $
Networks are fast and wide
– >10 Gb networks and buses available
The number of application processes is increasing rapidly
– Roadrunner: >128K concurrent processes achieving >1 petaflop
– BlueGene/P: >250K concurrent processes achieving >1 petaflop

5 I/O Bottleneck
Application processes can construct I/O requests faster than the storage system can service them
As a result, applications are unable to fully utilize the massive amounts of available computing power

6 Parallel File Systems
Address the I/O bottleneck by providing simultaneous access to a large number of disks
[Figure: compute processes (Process 0–3) on CPU nodes connected through a switched network to PFS Servers 0–3 on I/O nodes]

7 PFS Data Distribution
[Figure: a logical file divided into Strips A–F, distributed round-robin across physical locations on PFS Servers 0–3 (e.g., Strips A and E on Server 0, Strips B and F on Server 1)]
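To make the round-robin distribution above concrete, here is a minimal C sketch of the strip-to-server mapping. The function name, strip size, and server count are illustrative assumptions, not PVFS internals.

```c
/* Illustrative sketch: map a logical file offset to the PFS server and
 * server-local offset that hold it, assuming fixed-size strips placed
 * round-robin across servers (as in the figure above). */
#include <stdio.h>
#include <stdint.h>

struct strip_location {
    unsigned server;        /* which PFS server holds the strip      */
    uint64_t server_offset; /* byte offset within that server's data */
};

static struct strip_location locate(uint64_t file_offset,
                                    uint64_t strip_size,
                                    unsigned num_servers)
{
    uint64_t strip = file_offset / strip_size;    /* strip index: A=0, B=1, ... */
    struct strip_location loc;
    loc.server = (unsigned)(strip % num_servers); /* round-robin placement      */
    loc.server_offset = (strip / num_servers) * strip_size
                      + (file_offset % strip_size);
    return loc;
}

int main(void)
{
    /* Six strips over four servers with 64 KiB strips: Strip E (index 4)
     * wraps back to Server 0, matching the figure. */
    struct strip_location l = locate(4 * 65536ULL, 65536, 4);
    printf("server %u, offset %llu\n",
           l.server, (unsigned long long)l.server_offset);
    return 0;
}
```

The same mapping explains why large, aligned requests perform well: a contiguous request naturally fans out across all servers at once.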

8 Parallel File Systems (cont.)
Aggregate file system bandwidth requirements are largely met
– Large, aligned data requests can be rapidly transferred
– Scalable to hundreds of client processes, and improving
Areas of inadequate performance
– Metadata operations (create, remove, stat)
– Small files
– Unaligned accesses
– Structured I/O

9 Scientific Workflow Performance
1. Acquire or simulate data
– Primarily limited by physical bandwidth characteristics
2. Move or reorganize data for processing
– Often metadata intensive
3. Data analysis or reconstruction
– Small, unaligned accesses perform poorly
4. Move/reorganize data for visualization
– May perform poorly (small, unaligned accesses)
5. Visualize data
– Benefits from reorganization

10 Alleviating the I/O Bottleneck
Avoid data reorganization costs
– Additional work that does not modify results
– Limits use of high-level libraries
Increase contiguity/granularity
– Interconnects and parallel file systems are well tuned for large contiguous file accesses
– Limits use of the low-latency messaging available between cores
Improve locality
– Avoid device accesses entirely
– Difficult to achieve in user applications

11 Benefits of Middleware Caching
Improves locality
– PVFS Acache and Ncache
– Improves write-read and read-read accesses
Small accesses
– Can bundle small accesses into a compound operation (see the sketch below)
Alignment
– Can compress accesses by performing aligned requests
Transparent to the application programmer
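To illustrate the "bundle small accesses" point above, here is a hedged C sketch that sorts request extents by offset and merges adjacent or overlapping ones into larger operations. The types and function names are hypothetical, not the middleware's actual API.

```c
/* Illustrative sketch: coalesce many small (offset, length) requests so
 * fewer, larger operations reach the parallel file system. */
#include <stdlib.h>
#include <stdint.h>

struct extent { uint64_t offset; uint64_t length; };

static int cmp_extent(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Merge in place; returns the number of extents remaining. */
static size_t coalesce(struct extent *reqs, size_t n)
{
    if (n == 0)
        return 0;
    qsort(reqs, n, sizeof *reqs, cmp_extent);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (reqs[i].offset <= reqs[out].offset + reqs[out].length) {
            /* Contiguous or overlapping: grow the current extent. */
            uint64_t end = reqs[i].offset + reqs[i].length;
            if (end > reqs[out].offset + reqs[out].length)
                reqs[out].length = end - reqs[out].offset;
        } else {
            reqs[++out] = reqs[i];   /* gap: start a new extent */
        }
    }
    return out + 1;
}
```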

12 Proposed Caching Techniques
In order to improve the performance of small and unaligned file accesses, we propose middleware designed to enhance parallel file systems with the following:
1. Shared, concurrent-access caching
2. Progressive page-granularity caching
3. MPI file view caching

13 Shared Caching
Single data cache per node
– Leverages the trend toward large numbers of cores
– Improves contiguity of alternating request patterns
Concurrent access
– Single reader/writer
– Page-locking system (see the sketch below)
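A minimal sketch of the shared per-node cache with page locking, assuming a fixed, direct-mapped page table guarded by pthread locks. Sizes, names, and the (omitted) eviction policy are assumptions for illustration, not the proposed implementation.

```c
/* Illustrative sketch: one cache shared by all processes on a node,
 * with a per-page lock enforcing a single reader/writer at a time. */
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 65536
#define NUM_PAGES 1024

struct cache_page {
    uint64_t        file_page; /* logical page number cached in this slot   */
    int             valid;     /* slot holds usable data                    */
    int             dirty;     /* must be written back to the PFS           */
    pthread_mutex_t lock;      /* one accessor (reader or writer) at a time */
    char            data[PAGE_SIZE];
};

static struct cache_page cache[NUM_PAGES];

static void cache_init(void)
{
    for (int i = 0; i < NUM_PAGES; i++)
        pthread_mutex_init(&cache[i].lock, NULL);
}

/* Direct-map a logical page to a slot and take its lock. */
static struct cache_page *acquire_page(uint64_t file_page)
{
    struct cache_page *p = &cache[file_page % NUM_PAGES];
    pthread_mutex_lock(&p->lock);
    return p;
}

static void release_page(struct cache_page *p)
{
    pthread_mutex_unlock(&p->lock);
}

/* Cache a write that stays within one page (crossing writes would be
 * split by the caller). */
static void cached_write(uint64_t offset, const char *buf, size_t len)
{
    uint64_t file_page = offset / PAGE_SIZE;
    struct cache_page *p = acquire_page(file_page);
    if (!p->valid || p->file_page != file_page) {
        /* Eviction of a dirty victim and page fill omitted in this sketch. */
        p->file_page = file_page;
        p->valid = 1;
    }
    memcpy(p->data + offset % PAGE_SIZE, buf, len);
    p->dirty = 1;
    release_page(p);
}
```

Because processes on the same node often touch neighboring regions, sharing one cache turns their alternating requests into more contiguous traffic to the servers, which is the contiguity benefit noted above.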

14–18 File Write Example
[Animation frames: a logical file and the interleaved I/O requests issued by Process 0 and Process 1, shown step by step without caching]

19–23 File Write with Cache
[Animation frames: the same Process 0 and Process 1 I/O requests absorbed into Cache Pages 0–2 before reaching the logical file]

24 Progressive Page Caching
Benefits of paged caching
– Efficient for the file system
– Reduces cache metadata overhead
Issues with paged caching
– Aligned pages may retrieve more data than otherwise required
– Unaligned writes do not cache easily; the options are to read the remaining page fragment, or to not update the cache with small writes
Progressive paged caching addresses these issues while minimizing performance and metadata overhead

25 Unaligned Access Caches
Accesses are independent and not on page boundaries
Requires increased cache overhead
How to organize unaligned data?
– List I/O tree
– Binary space partition (BSP) tree

26 Paged Cache Organization
[Figure: a logical file divided into fixed-size, aligned cache pages]

27 BSP Tree Cache Organization
[Figure: cached regions of a logical file organized as a binary space partition tree (split points 8 and 11 shown)]

28 List I/O Tree Cache Organization
[Figure: cached regions of a logical file stored as (offset, length) extents in a list I/O tree: (0,1), (2,2), (5,3), (10,2)]

29 Progressive Page Organization
[Figure: a paged view of a logical file in which the valid data within each page is tracked as (offset, length) extents: (2,2), (1,3), (0,1), (2,2)]
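The (offset, length) pairs in the figure suggest tracking validity within each page. Below is a hedged C sketch of one such progressive page representation, in which only the byte ranges actually written are recorded, so an unaligned write can be cached without first reading the rest of the page. The structure and limits are illustrative assumptions, not the proposed design.

```c
/* Illustrative sketch: a fixed-size cache page that records which byte
 * ranges are valid, rather than requiring the whole page to be filled. */
#include <stdint.h>

#define PAGE_SIZE   65536
#define MAX_EXTENTS 16

struct extent { uint32_t off; uint32_t len; };   /* range within the page */

struct progressive_page {
    uint64_t      file_page;           /* logical page number     */
    struct extent valid[MAX_EXTENTS];  /* known-valid byte ranges */
    int           nvalid;
    char          data[PAGE_SIZE];
};

/* Record that [off, off+len) now holds valid data.  A fuller version
 * would merge adjacent extents and fall back to whole-page caching
 * when the list overflows. */
static int page_add_extent(struct progressive_page *p,
                           uint32_t off, uint32_t len)
{
    if (p->nvalid == MAX_EXTENTS || off + len > PAGE_SIZE)
        return -1;
    p->valid[p->nvalid].off = off;
    p->valid[p->nvalid].len = len;
    p->nvalid++;
    return 0;
}

/* A read hits only if a single extent covers the whole requested range. */
static int page_covers(const struct progressive_page *p,
                       uint32_t off, uint32_t len)
{
    for (int i = 0; i < p->nvalid; i++)
        if (p->valid[i].off <= off &&
            off + len <= p->valid[i].off + p->valid[i].len)
            return 1;
    return 0;
}
```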

30 View Cache
MPI provides a richer facility for describing file I/O
– Collective I/O
– MPI file views describe file subregions
Use file views as a mechanism for coalescing reads and writes during collective I/O
How to take the union of multiple views?
– Use a heuristic approach to detect structured I/O
(A standard file-view example follows.)
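For reference, the sketch below shows standard MPI-IO usage of a file view with a collective read in C; it is stock MPI, not the proposed ViewCache, and the file name, block length, and counts are arbitrary. The views handed to MPI_File_set_view describe exactly the per-process subregions that a view cache could union and coalesce.

```c
/* Standard MPI-IO: each rank sets a strided file view, then all ranks
 * read collectively.  Together the views tile the file, which is the
 * structured pattern a view-coalescing cache would detect. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank sees every nprocs-th block of 1024 ints, offset by rank. */
    const int blocklen = 1024, nblocks = 64;
    MPI_Datatype filetype;
    MPI_Type_vector(nblocks, blocklen, blocklen * nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "data.bin",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * blocklen * sizeof(int),
                      MPI_INT, filetype, "native", MPI_INFO_NULL);

    int *buf = malloc((size_t)nblocks * blocklen * sizeof(int));
    MPI_File_read_all(fh, buf, nblocks * blocklen, MPI_INT,
                      MPI_STATUS_IGNORE);

    free(buf);
    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}
```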

31–33 Collective Read Example
[Animation frames: a logical file and the read requests issued by Process 0 and Process 1, shown without caching]

34–38 Collective Read with Cache
[Animation frames: the same Process 0 and Process 1 read requests served through Cache Blocks 0–2]

39–43 Collective Read with ViewCache
[Animation frames: the Process 0 and Process 1 read requests coalesced using their file views and served through Cache Blocks 0–2]

44 Study Methodology
Simulation-based study using HECIOS
– Closely modeled on PVFS2 and Linux
– Approximately 40,000 source lines of code
– Leverages OMNeT++ and the INET Framework
Cache organizations studied
– Core sharing
– Aligned page access
– Unaligned page access

45 HECIOS Overview
[Figure: HECIOS system architecture]

46 HECIOS Overview (cont.)
[Screenshot: HECIOS main window]

47 HECIOS Overview (cont.)
[Screenshot: HECIOS simulation, top-level view]

48 HECIOS Overview (cont.)
[Screenshot: HECIOS simulation, detailed view]

49 Contributions
1. HECIOS, the High End Computing I/O Simulator, developed and made available under an open source license
2. Flash I/O and BT-IO traced at large scale, with the traces now publicly available
3. A rigorous study of caching factors in parallel file systems
4. Novel cache designs for unaligned file access and MPI view coalescing

50 The End
Thank you for your time! Questions?
Brad Settlemyer

51 Dissertation Schedule
August – Complete trace parser enhancements; shared cache implementation; complete trace collection
September – Aligned cache sharing study
October – Unaligned cache sharing study
November – SIGMETRICS deadline; view coalescing cache
December – Finalize experiments; finish writing thesis; defend thesis

52 PVFS Scalability
[Figure: read and write bandwidth curves for PVFS]

53 Shared Caching (cont.)
[Figure: Process 0 and Process 1 I/O requests to a logical file through shared Cache Pages 0–2]

54 Bandwidth Effects
[Table: write bandwidth on Adenine (MB/sec) by number of clients, comparing PVFS with 8 I/O nodes to PVFS with replication over 16 I/O nodes, with percent performance; numeric values not preserved]

55 Experimental Data Distribution
[Figure: Strips A–F of a logical file mapped to physical locations on PFS Servers 0–3, with each strip appearing in two locations]

56 Discussion (cont.)
[Figure: an alternative placement of Strips A–F across PFS Servers 0–3]

57 [Figure: compute processes (Process 0–3) on CPU nodes connected through a switched network to PFS Servers 0–3 on I/O nodes]