Parallel I/O Optimizations Sources/Credits:  R. Thakur, W. Gropp, E. Lusk. A Case for Using MPI's Derived Datatypes to Improve I/O Performance. Supercomputing.

Presentation transcript:

Parallel I/O Optimizations
Sources/Credits:
- R. Thakur, W. Gropp, E. Lusk. A Case for Using MPI's Derived Datatypes to Improve I/O Performance. Supercomputing 98.
- (See the bibliography.)
- Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Improving MPI IO output performance with active buffering plus threads. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE Computer Society Press, April 2003.

High Performance with Derived Data Types (Thakur et al.: SC 98)
- The potential of parallel file systems is not fully utilized because of applications' I/O access patterns:
  a. Applications issue many small requests to noncontiguous blocks.
  b. Most parallel file systems perform best when accessing a single large chunk.
- This motivates describing the whole access in a single call using derived datatypes.
- ROMIO (MPICH's MPI-IO implementation) performs 2 optimizations: data sieving and collective I/O.

Datatype Constructors in MPI
1. contiguous
2. vector/hvector
3. indexed/hindexed/indexed_block
4. struct
5. subarray
6. darray
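To make the vector constructor concrete, the sketch below models in plain Python the byte ranges that a vector-style datatype selects. It does not call MPI; the function name `vector_byte_ranges` and the `elem_size` parameter are illustrative assumptions, while `count`, `blocklength`, and `stride` follow the argument names of `MPI_Type_vector` (stride measured in elements).

```python
def vector_byte_ranges(count, blocklength, stride, elem_size):
    """Return the (offset, length) byte ranges of a vector-style layout.

    Models `count` blocks of `blocklength` elements each, with block
    starts `stride` elements apart. Hypothetical helper, not an MPI call.
    """
    if count < 0 or blocklength < 0 or elem_size <= 0:
        raise ValueError("count, blocklength must be >= 0 and elem_size > 0")
    return [(i * stride * elem_size, blocklength * elem_size)
            for i in range(count)]


if __name__ == "__main__":
    # Two columns of a 4 x 8 row-major array of 8-byte values:
    # 4 blocks (one per row), 2 elements each, row starts 8 elements apart.
    for offset, length in vector_byte_ranges(4, 2, 8, 8):
        print(offset, length)
```

A single description like this replaces four separate small requests, which is what lets the I/O layer see the whole noncontiguous access at once.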

Different levels of access

Optimizations in ROMIO for derived-datatype noncontiguous access
1. Data sieving
- Make a few large, contiguous requests to the file system even if the user's request consists of several small, noncontiguous requests.
- Extract (pick out) in memory the data that is really needed.
- Is this acceptable for reads? For writes?
- Use a smaller buffer for writing with data sieving than for reading with data sieving. Why?
  - Writing requires read-modify-write along with locking.
  - The larger the write buffer, the greater the contention among processes for locks.
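The data sieving idea can be sketched in plain Python against an in-memory file. This is a simplified model, not ROMIO's code: the names `sieve_read` and `sieve_write` are hypothetical, the whole extent is handled in one buffer (a real implementation would bound the buffer size), and the locking that a real write path needs is only noted in a comment.

```python
import io


def sieve_read(f, requests):
    """Serve many small (offset, length) reads with one contiguous read."""
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + n for off, n in requests)
    f.seek(start)
    buf = f.read(end - start)  # one large request, holes included
    # Pick out only the bytes that were actually asked for.
    return [buf[off - start: off - start + n] for off, n in requests]


def sieve_write(f, pieces):
    """Write (offset, data) pieces with a read-modify-write of one extent.

    A real implementation must lock the extent for the whole
    read-modify-write, otherwise another process could change the
    holes between the read and the write.
    """
    if not pieces:
        return
    start = min(off for off, _ in pieces)
    end = max(off + len(data) for off, data in pieces)
    f.seek(start)
    buf = bytearray(f.read(end - start))
    buf.extend(bytes(end - start - len(buf)))  # extent may pass end of file
    for off, data in pieces:
        buf[off - start: off - start + len(data)] = data
    f.seek(start)
    f.write(buf)


if __name__ == "__main__":
    f = io.BytesIO(bytes(range(32)))
    print(sieve_read(f, [(2, 2), (10, 3), (20, 1)]))
```

The write path shows why the buffer size matters: everything between the first and last piece is rewritten, so a larger buffer means a larger locked region.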

Optimizations in ROMIO for derived-datatype noncontiguous access
1. Data sieving
2. Collective I/O
- During collective I/O functions, the implementation can analyze and merge the requests of different processes.
- The merged request can be large and contiguous even though the individual requests were noncontiguous.
- Perform I/O in 2 phases:
  - I/O phase: processes perform I/O for the merged request. Some of the data may belong to other processes. If the merged request is not contiguous, use data sieving.
  - Communication phase: processes redistribute data to obtain the desired distribution.
- The additional cost of the communication phase can be offset by the performance gain due to contiguous access.
- Data sieving and collective I/O also help improve caching and prefetching in the underlying file system.
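The two phases can be illustrated with a single-process simulation in Python. This is a sketch under simplifying assumptions, not ROMIO's algorithm: ranks are just list indices, the merged extent is split evenly into one contiguous file domain per rank, and the communication phase is modeled by copying bytes between in-memory buffers. The name `two_phase_read` is hypothetical.

```python
import io


def two_phase_read(f, requests_by_rank):
    """Simulate a two-phase collective read.

    requests_by_rank[r] is the list of (offset, length) requests of rank r.
    Returns, per rank, the list of byte strings it asked for.
    """
    nprocs = len(requests_by_rank)
    merged = [req for reqs in requests_by_rank for req in reqs]
    if not merged:
        return [[] for _ in range(nprocs)]
    lo = min(off for off, _ in merged)
    hi = max(off + n for off, n in merged)

    # I/O phase: each rank reads one contiguous file domain of the merged
    # extent, even though part of that data belongs to other ranks.
    chunk = -(-(hi - lo) // nprocs)  # ceiling division
    domains = []
    for rank in range(nprocs):
        d_lo = min(lo + rank * chunk, hi)
        d_hi = min(d_lo + chunk, hi)
        f.seek(d_lo)
        domains.append((d_lo, f.read(d_hi - d_lo)))

    # Communication phase: every request is assembled from the domains
    # that hold its bytes (domains are in increasing offset order).
    def fetch(off, n):
        out = bytearray()
        for d_lo, data in domains:
            s = max(off, d_lo)
            e = min(off + n, d_lo + len(data))
            if s < e:
                out += data[s - d_lo: e - d_lo]
        return bytes(out)

    return [[fetch(off, n) for off, n in reqs] for reqs in requests_by_rank]


if __name__ == "__main__":
    f = io.BytesIO(bytes(range(16)))
    interleaved = [[(0, 2), (4, 2), (8, 2), (12, 2)],
                   [(2, 2), (6, 2), (10, 2), (14, 2)]]
    print(two_phase_read(f, interleaved))
```

With the interleaved pattern in the demo, each rank would otherwise issue four small reads; here the file sees only two large contiguous reads.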

Collective I/O Illustration [figure: blocks labeled P0 and P1]

Active Buffering with Threads (Xiaosong Ma et al.: IPDPS 2003)
- The above optimizations alone are not enough.
- Active buffering: use of separate I/O nodes
- Overlap I/O access with computation by using threads
- Buffer space is automatically adjusted to the available memory

Original Scheme (Ma: IPDPS 2002)
- Hierarchical buffering scheme
- Dedicated I/O server nodes
- During I/O:
    if (no overflow in compute nodes)
        compute nodes -> local buffers
    else if (no overflow in server nodes)
        compute nodes -> server buffers (using MPI)
    else
        server nodes -> I/O system
- During computation:
  - Server nodes clear their local buffers by writing them to the I/O system
  - Server nodes fetch data from compute nodes (one-sided communication) and write it to the I/O system
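The placement rule of the hierarchical scheme can be written as a small Python function. This is only a reading of the slide's pseudocode: the function name `place_output`, the byte-count parameters, and the returned labels are assumptions made for illustration, and "overflow" is interpreted as "the data does not fit in the free buffer space".

```python
def place_output(nbytes, compute_free, server_free):
    """Decide where `nbytes` of output goes during an I/O call.

    compute_free / server_free: free buffer space (bytes) on the compute
    node and on the I/O server node (hypothetical parameters).
    """
    if nbytes <= compute_free:
        return "compute-node local buffer"
    if nbytes <= server_free:
        return "server buffer (sent with MPI)"
    # Both buffer levels would overflow: the server writes to the file system.
    return "I/O system"


if __name__ == "__main__":
    for size in (10, 500, 5000):
        print(size, "->", place_output(size, compute_free=100, server_free=1000))
```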

Current Scheme
- I/O threads perform collective I/O, overlapped with the main threads' computation and communication
- Uses pthreads with kernel-level scheduling
- Intercepts ROMIO's I/O calls
- Main threads and I/O threads coordinate through a buffer queue
- This is an instance of the producer-consumer (bounded-buffer) problem
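The coordination between a main thread and an I/O thread through a buffer queue can be sketched with Python's standard `threading` and `queue` modules. This is an illustration of the bounded-buffer pattern only, not the authors' implementation: the name `run_with_io_thread`, the fixed `max_buffers` limit (the scheme described above adapts buffer space to available memory), and the list used as a stand-in for the file are all assumptions.

```python
import queue
import threading


def run_with_io_thread(steps, sink, max_buffers=2):
    """Overlap 'computation' in the main thread with 'I/O' in a worker.

    sink: a list standing in for the output file (hypothetical).
    """
    buffers = queue.Queue(maxsize=max_buffers)  # the bounded buffer

    def io_worker():
        # Consumer: drain buffers until the end-of-run marker arrives.
        while True:
            item = buffers.get()
            if item is None:
                break
            sink.append(item)  # stand-in for the actual file write

    worker = threading.Thread(target=io_worker)
    worker.start()

    # Producer: the main thread only blocks when every buffer is full.
    for step in range(steps):
        data = f"timestep {step}".encode()  # stand-in for computed output
        buffers.put(data)

    buffers.put(None)  # tell the I/O thread to finish
    worker.join()
    return sink


if __name__ == "__main__":
    print(run_with_io_thread(5, []))
```

Because `put` blocks when the queue is full and `get` blocks when it is empty, the queue itself provides the synchronization that the bounded-buffer problem requires.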

Execution Timeline

Bibliography  Philip H. Carns, Walter B. Ligon III, Robert B. Ross, and Rajeev Thakur. PVFS: A parallel file system for linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages , Atlanta, GA, October USENIX Association.PVFS: A parallel file system for linux clusters  Jose Aguilar. A graph theoretical model for scheduling simultaneous I/O operations on parallel and distributed environments. Parallel Processing Letters, 12(1): , March 2002.A graph theoretical model for scheduling simultaneous I/O operations on parallel and distributed environments  Rajesh Bordawekar. Implementation of collective I/O in the Intel Paragon parallel file system: Initial experiences. In Proceedings of the 11th ACM International Conference on Supercomputing, pages ACM Press, July 1997.Implementation of collective I/O in the Intel Paragon parallel file system: Initial experiences  Peter Brezany, Marianne Winslett, Denis A. Nicole, and Toni Cortes. Parallel I/O and storage technology. In Proceedings of the Seventh International Euro-Par Conference, volume 2150 of Lecture Notes in Computer Science, pages , Manchester, UK, August Springer- Verlag. Parallel I/O and storage technology  Bradley Broom, Rob Fowler, and Ken Kennedy. KelpIO: A telescope- ready domain-specific I/O library for irregular block-structured applications. In Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, pages , Brisbane, Australia, May IEEE Computer Society PressKelpIO: A telescope- ready domain-specific I/O library for irregular block-structured applications

Bibliography  J. Carretero, F. Pérez, P. de Miguel, F. Garc\'\ia, and L. Alonso. I/O data mapping in \em ParFiSys: support for high-performance I/O in parallel and distributed systems. In Euro-Par '96, volume 1123 of Lecture Notes in Computer Science, pages Springer-Verlag, August 1996I/O data mapping in \em ParFiSys: support for high-performance I/O in parallel and distributed systems  Ying Chen, Marianne Winslett, Y. Cho, and S. Kuo. Automatic parallel I/O performance optimization using genetic algorithms. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, pages IEEE Computer Society Press, July 1998.Automatic parallel I/O performance optimization using genetic algorithms  Ying Chen, Ian Foster, Jarek Nieplocha, and Marianne Winslett. Optimizing collective I/O performance on parallel computers: A multisystem study. In Proceedings of the 11th ACM International Conference on Supercomputing, pages ACM Press, July 1997.Optimizing collective I/O performance on parallel computers: A multisystem study  Avery Ching, Alok Choudhary, Kenin Coloma, Wei keng Liao, Robert Ross, and William Gropp. Noncontiguous I/O accesses through MPI-IO. In Proceedings of the Third IEEE/ACM International Symposium on Cluster Computing and the Grid, pages , Tokyo, Japan, May IEEE Computer Society Press.Noncontiguous I/O accesses through MPI-IO  Phillip M. Dickens and Rajeev Thakur. Evaluation of collective I/O implementations on parallel architectures. Journal of Parallel and Distributed Computing, 61(8): , August 2001.Evaluation of collective I/O implementations on parallel architectures

Bibliography  Félix Garcia-Carballeira, Alejandro Calderon, Jesus Carretero, Javier Fernandez, and Jose M. Perez. The design of the Expand parallel file system. The International Journal of High Performance Computing Applications, 17(1):21-38, 2003The design of the Expand parallel file system  Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages , Bolton Landing, NY, October ACM Press.The Google file system  James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S. Blumenthal. PPFS: A high performance portable parallel file system. In Hai Jin, Toni Cortes, and Rajkumar Buyya, editors, High Performance Mass Storage and Parallel {I/O}: Technologies and Applications, chapter 22, pages IEEE Computer Society Press and Wiley, New York, NY, 2001.PPFS: A high performance portable parallel file system  Meenakshi A. Kandaswamy, Mahmut Kandemir, Alok Choudhary, and David Bernholdt. An experimental evaluation of I/O optimizations on different applications. IEEE Transactions on Parallel and Distributed Systems, 13(7): , July 2002.An experimental evaluation of I/O optimizations on different applications  Mahmut Kandemir. Compiler-directed collective I/O. IEEE Transactions on Parallel and Distributed Systems, 12(12): , December 2001.Compiler-directed collective I/O

Bibliography  Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Improving MPI IO output performance with active buffering plus threads. In Proceedings of the International Parallel and Distributed Processing Symposium. IEEE Computer Society Press, April Improving MPI IO output performance with active buffering plus threads  Tara M. Madhyastha and Daniel A. Reed. Learning to classify parallel input/output access patterns. IEEE Transactions on Parallel and Distributed Systems, 13(8): , August 2002.Learning to classify parallel input/output access patterns  Ethan L. Miller and Randy H. Katz. RAMA: An easy-to-use, high- performance parallel file system. Parallel Computing, 23(4- 5): , June 1997.RAMA: An easy-to-use, high- performance parallel file system  Bill Nitzberg and Virginia Lo. Collective buffering: Improving parallel I/O performance. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing, pages , Portland, OR, August IEEE Computer Society Press. See also later version nitzberg:bcollective.nitzberg:bcollective  Huseyin Simitci and Daniel Reed. A comparison of logical and physical parallel I/O patterns. The International Journal of High Performance Computing Applications, 12(3): , Fall 1998.

Bibliography  Domenico Talia and Pradip K. Srimani. Parallel data- intensive algorithms and applications. Parallel Computing, 28(5): , May 2002.Parallel data- intensive algorithms and applications  Len Wisniewski, Brad Smisloff, and Nils Nieuwejaar. Sun MPI I/O: Efficient I/O for parallel applications. In Proceedings of SC99: High Performance Networking and Computing, Portland, OR, November ACM Press and IEEE Computer Society PressSun MPI I/O: Efficient I/O for parallel applications  K. K. Lee, M. Kallahalla, B. S. Lee, and P. J. Varman. Performance comparison of prefetching and placement policies for parallel I/O. International Journal of Parallel and Distributed Systems and Networks, 5(2):76-84, Performance comparison of prefetching and placement policies for parallel I/O  M. Kallahalla and P. J. Varman. PC-OPT: Optimal offline prefetching and caching for parallel I/O systems. IEEE Transactions on Computers, 51(11): , November 2002.PC-OPT: Optimal offline prefetching and caching for parallel I/O systems